1 00:00:00,790 --> 00:00:02,730 - [Instructor] For our next example we're going to use 2 00:00:02,730 --> 00:00:06,740 a live tweet stream to perform sentiment analysis. 3 00:00:06,740 --> 00:00:08,980 Now, before I talk about this in detail 4 00:00:08,980 --> 00:00:11,810 let me just go ahead to the command line 5 00:00:11,810 --> 00:00:14,310 where I'm going to execute a script 6 00:00:14,310 --> 00:00:17,210 called sentimentlistener.py. 7 00:00:17,210 --> 00:00:19,930 For this script you can provide a topic 8 00:00:19,930 --> 00:00:21,620 that you would like to track, 9 00:00:21,620 --> 00:00:23,330 and you can specify the total number 10 00:00:23,330 --> 00:00:25,180 of tweets that you want to process, 11 00:00:25,180 --> 00:00:27,420 and what it will then do is start up 12 00:00:27,420 --> 00:00:30,000 a live tweet stream tracking that topic, 13 00:00:30,000 --> 00:00:33,940 and once those 10, in this case, tweets come in 14 00:00:33,940 --> 00:00:36,810 it will summarize how many positive tweets there were, 15 00:00:36,810 --> 00:00:38,540 how many negative tweets there were, 16 00:00:38,540 --> 00:00:40,630 and how many neutral tweets there were 17 00:00:40,630 --> 00:00:43,810 based on the values returned by TextBlob. 18 00:00:43,810 --> 00:00:45,770 So, I'm gonna start that executing 19 00:00:45,770 --> 00:00:49,030 so we can see tweets start to come in. 20 00:00:49,030 --> 00:00:52,480 Again, I'm using marvel as my topic at the moment, 21 00:00:52,480 --> 00:00:56,230 and it's going to precede each tweet by a plus sign 22 00:00:56,230 --> 00:00:59,250 if it's positive, a minus sign if it's negative, 23 00:00:59,250 --> 00:01:02,620 and nothing but a space character if it's 24 00:01:02,620 --> 00:01:05,140 considered to be a neutral tweet. 25 00:01:05,140 --> 00:01:06,620 Now, while those tweets come in 26 00:01:06,620 --> 00:01:10,680 let me just jump back over to my slide here. 
27 00:01:10,680 --> 00:01:13,570 So, one of the things that you might use something 28 00:01:13,570 --> 00:01:17,190 like this for is dynamic sentiment analysis. 29 00:01:17,190 --> 00:01:20,090 So, I could potentially be receiving these tweets, 30 00:01:20,090 --> 00:01:22,830 figuring out if they're positive, negative, or neutral, 31 00:01:22,830 --> 00:01:25,050 and maybe graphing that information 32 00:01:25,050 --> 00:01:28,660 in a dynamic visualization that shows 33 00:01:28,660 --> 00:01:31,390 what's going on right up to the minute. 34 00:01:31,390 --> 00:01:33,880 So, for instance political researchers 35 00:01:33,880 --> 00:01:35,710 might use that during elections 36 00:01:35,710 --> 00:01:38,840 to see how people are feeling about specific politicians 37 00:01:38,840 --> 00:01:41,900 and topics that might help them determine 38 00:01:41,900 --> 00:01:45,870 how those folks are likely to vote as well. 39 00:01:45,870 --> 00:01:47,820 Companies might use something like this 40 00:01:47,820 --> 00:01:50,640 to see what people are saying about their products 41 00:01:50,640 --> 00:01:52,550 and other people's products and use that 42 00:01:52,550 --> 00:01:54,730 to their competitive advantage. 43 00:01:54,730 --> 00:01:58,620 And as you saw, the script that I'm executing 44 00:01:58,620 --> 00:02:01,760 is going to enable you to check the sentiment 45 00:02:01,760 --> 00:02:03,940 on whatever specified topic you 46 00:02:03,940 --> 00:02:06,050 provide as a command line argument 47 00:02:06,050 --> 00:02:09,710 and the number of tweets that you specify as well. 
48 00:02:09,710 --> 00:02:11,950 So, let me jump back out, and as you can see, 49 00:02:11,950 --> 00:02:15,740 the script terminated with four positive, four neutral, 50 00:02:15,740 --> 00:02:19,010 and two negative tweets in this particular case, 51 00:02:19,010 --> 00:02:21,130 and one thing to keep in mind as you start 52 00:02:21,130 --> 00:02:22,930 looking through these tweets here 53 00:02:22,930 --> 00:02:24,960 is that even though they're marked 54 00:02:24,960 --> 00:02:27,140 as positive, negative, or neutral, 55 00:02:27,140 --> 00:02:29,970 these are tweets and tweets don't come 56 00:02:29,970 --> 00:02:32,810 in full sentences a lot of the time, 57 00:02:32,810 --> 00:02:36,740 so it may be difficult for a tool like TextBlob 58 00:02:36,740 --> 00:02:39,270 with its default sentiment analyzer 59 00:02:39,270 --> 00:02:43,600 to truly get a sense of the sentiment in certain tweets 60 00:02:43,600 --> 00:02:46,570 based on what their content actually is. 61 00:02:46,570 --> 00:02:49,670 So, you may want to try this out, for example, 62 00:02:49,670 --> 00:02:53,630 with the Naive Bayes sentiment analyzer 63 00:02:53,630 --> 00:02:56,930 that we demonstrated back in the NLP chapter. 64 00:02:56,930 --> 00:02:59,380 In our case in this example we're working 65 00:02:59,380 --> 00:03:02,907 with the default sentiment analyzer in TextBlob, 66 00:03:02,907 --> 00:03:05,060 and you may find that there are other 67 00:03:05,060 --> 00:03:09,150 sentiment analysis tools out there that are better geared 68 00:03:09,150 --> 00:03:12,033 towards detecting sentiment in tweets, 69 00:03:12,890 --> 00:03:14,880 and similarly, you know, when you're 70 00:03:14,880 --> 00:03:17,330 dealing with other types of text 71 00:03:17,330 --> 00:03:19,100 there may be better tools out there 72 00:03:19,100 --> 00:03:21,340 for certain kinds of text that 73 00:03:21,340 --> 00:03:24,300 you want to analyze and manipulate. 
74 00:03:24,300 --> 00:03:26,900 So, now that we saw the script in action 75 00:03:26,900 --> 00:03:31,820 let me jump into a text editor here and show you the script. 76 00:03:31,820 --> 00:03:34,550 Now, just briefly overviewing the script first 77 00:03:34,550 --> 00:03:36,360 before we look at it in detail. 78 00:03:36,360 --> 00:03:38,410 Of course we have the imports for anything 79 00:03:38,410 --> 00:03:41,540 that we're going to use in this script file. 80 00:03:41,540 --> 00:03:43,940 We have our class definition that inherits 81 00:03:43,940 --> 00:03:46,410 from StreamListener so that we can be 82 00:03:46,410 --> 00:03:50,080 notified as tweets arrive in our app. 83 00:03:50,080 --> 00:03:53,033 We have the methods of class SentimentListener, 84 00:03:54,050 --> 00:03:56,950 the initialization method, the on_status method 85 00:03:56,950 --> 00:04:00,190 to receive each tweet, and as I scroll down here 86 00:04:00,190 --> 00:04:02,210 you see after the class we have 87 00:04:02,210 --> 00:04:04,420 a main function that we defined, 88 00:04:04,420 --> 00:04:07,290 which is where we specify, in this case, 89 00:04:07,290 --> 00:04:10,430 the logic of the application itself, 90 00:04:10,430 --> 00:04:13,780 and at the very bottom of the file we have an if statement 91 00:04:13,780 --> 00:04:16,630 that I introduced to you in an earlier lesson, 92 00:04:16,630 --> 00:04:19,060 and just as a matter of review, 93 00:04:19,060 --> 00:04:24,060 recall that when you execute any .py file as a script 94 00:04:25,440 --> 00:04:29,523 there is a global variable called __name__, 95 00:04:30,820 --> 00:04:34,973 and its value is going to be set to the string __main__. 96 00:04:37,160 --> 00:04:40,020 So, this if statement's body will execute 97 00:04:40,020 --> 00:04:45,020 only if I run sentimentlistener.py as a script, 98 00:04:45,370 --> 00:04:46,940 which is what I just demonstrated 99 00:04:46,940 --> 00:04:49,110 to you at the command line. 
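As a quick illustration of the `__name__` check reviewed above, here is a minimal sketch. The body of `main` is a placeholder invented for this example, not the script's actual logic.

```python
def main():
    """Placeholder for the application logic (an assumption for this sketch)."""
    return 'main was called'


# __name__ is the string '__main__' only when this file is run as a script.
# When the file is imported as a module, __name__ is the module's name
# instead, so the call below is skipped.
if __name__ == '__main__':
    print(main())
```

Keeping the logic in a function and calling it from the guard means the file can also be imported without starting the application.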
100 00:04:49,110 --> 00:04:52,130 In that case, we will call the main function. 101 00:04:52,130 --> 00:04:55,730 The main function will set up our authorization, 102 00:04:55,730 --> 00:04:58,050 authentication, excuse me, and then 103 00:04:58,050 --> 00:05:00,160 perform the logic of the application, 104 00:05:00,160 --> 00:05:02,750 and as part of that it will be using 105 00:05:02,750 --> 00:05:06,340 an object of our SentimentListener class. 106 00:05:06,340 --> 00:05:09,430 So, taking it from the top now in more detail, 107 00:05:09,430 --> 00:05:11,750 we're importing our keys.py file 108 00:05:11,750 --> 00:05:13,320 because we need all the information 109 00:05:13,320 --> 00:05:16,310 for authenticating with the Twitter APIs. 110 00:05:16,310 --> 00:05:18,930 We're importing the preprocessor module, 111 00:05:18,930 --> 00:05:22,720 which we demonstrated a few videos back for cleaning tweets, 112 00:05:22,720 --> 00:05:25,120 and we'll use that in this example. 113 00:05:25,120 --> 00:05:27,640 We're using the sys module to access 114 00:05:27,640 --> 00:05:30,020 the command line arguments to the script, 115 00:05:30,020 --> 00:05:33,700 which are going to be the text that we want to track 116 00:05:33,700 --> 00:05:36,220 and the number of tweets to receive 117 00:05:36,220 --> 00:05:38,990 before terminating the livestream, 118 00:05:38,990 --> 00:05:41,140 and we're also importing TextBlob 119 00:05:41,140 --> 00:05:42,930 because we're going to take advantage 120 00:05:42,930 --> 00:05:46,870 of its builtin capabilities for sentiment analysis. 121 00:05:46,870 --> 00:05:49,460 And then finally, of course, we're importing tweepy 122 00:05:49,460 --> 00:05:54,460 because we need to use that to access the Twitter APIs. 123 00:05:54,690 --> 00:05:57,170 So, just as we did in the preceding example, 124 00:05:57,170 --> 00:05:58,920 we're creating a new subclass 125 00:05:58,920 --> 00:06:01,360 of StreamListener from the tweepy module. 
126 00:06:01,360 --> 00:06:03,800 We're calling this one SentimentListener, 127 00:06:03,800 --> 00:06:07,490 and again, it's going to handle an incoming tweet stream, 128 00:06:07,490 --> 00:06:11,360 this time checking the tweets for sentiment. 129 00:06:11,360 --> 00:06:14,780 Now, you'll notice that we have a number of arguments 130 00:06:14,780 --> 00:06:18,480 that we are receiving into our init method. 131 00:06:18,480 --> 00:06:20,880 These arguments are going to represent 132 00:06:20,880 --> 00:06:23,710 the tweepy API object. 133 00:06:23,710 --> 00:06:26,010 This is going to be a dictionary 134 00:06:26,010 --> 00:06:28,550 that keeps track of positive sentiment, 135 00:06:28,550 --> 00:06:31,640 neutral sentiment, and negative sentiment. 136 00:06:31,640 --> 00:06:33,430 We're also going to receive a string 137 00:06:33,430 --> 00:06:36,060 that represents the topic that we're tracking, 138 00:06:36,060 --> 00:06:38,920 and we're going to receive a limit argument 139 00:06:38,920 --> 00:06:42,030 specifying the total number of tweets to process. 140 00:06:42,030 --> 00:06:45,720 If we don't specify a value for that, it would be 10. 141 00:06:45,720 --> 00:06:47,250 As you'll see, we're going to use 142 00:06:47,250 --> 00:06:49,200 a command line argument in this case, 143 00:06:49,200 --> 00:06:52,410 so whatever that argument is is what we will supply 144 00:06:52,410 --> 00:06:55,540 as the last argument to the init method. 145 00:06:55,540 --> 00:06:59,210 Now we're going to initialize a number of instance variables 146 00:06:59,210 --> 00:07:03,470 in our SentimentListener object that we create. 147 00:07:03,470 --> 00:07:05,510 We'll store the sentiment dictionary 148 00:07:05,510 --> 00:07:07,720 so that while those tweets are coming in 149 00:07:07,720 --> 00:07:10,030 we can keep updating that dictionary. 
150 00:07:10,030 --> 00:07:12,530 We will keep track of how many tweets we've processed 151 00:07:12,530 --> 00:07:14,510 so we know when to terminate the stream, 152 00:07:14,510 --> 00:07:18,350 and we'll use the TWEET_LIMIT to help us with that as well. 153 00:07:18,350 --> 00:07:22,540 And we will store the topic that we are looking for also. 154 00:07:22,540 --> 00:07:24,350 The reason for that is we're going 155 00:07:24,350 --> 00:07:26,540 to look at every tweet that comes in, 156 00:07:26,540 --> 00:07:30,030 and if the actual topic is not in the tweet's text 157 00:07:30,030 --> 00:07:33,120 we're going to ignore that tweet in this example. 158 00:07:33,120 --> 00:07:34,820 So, we're only going to show you tweets 159 00:07:34,820 --> 00:07:36,930 that actually physically contain 160 00:07:36,930 --> 00:07:40,310 the topic in the actual tweet_text, 161 00:07:40,310 --> 00:07:42,430 not necessarily in all of the other 162 00:07:42,430 --> 00:07:44,810 metadata in this example. 163 00:07:44,810 --> 00:07:47,930 Now, we also chose to set the preprocessor 164 00:07:47,930 --> 00:07:51,150 module's options to say that we're going 165 00:07:51,150 --> 00:07:55,840 to drop out URLs and any Twitter RESERVED words 166 00:07:55,840 --> 00:07:58,680 that appear in tweets as we clean them, 167 00:07:58,680 --> 00:08:02,110 and then finally, as we did in the last example as well, 168 00:08:02,110 --> 00:08:05,360 we take the tweepy API object and hand that off 169 00:08:05,360 --> 00:08:08,980 to the superclass StreamListener's init method 170 00:08:08,980 --> 00:08:10,880 because the StreamListener class 171 00:08:10,880 --> 00:08:13,100 has all sorts of builtin capability 172 00:08:13,100 --> 00:08:17,063 that is dependent on using that API object. 
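The initialization steps described above can be sketched roughly as follows. This is not the script itself: `ListenerBase` is a hypothetical stand-in for tweepy's StreamListener so that the sketch runs without tweepy installed, and the preprocessor configuration is only noted in a comment.

```python
class ListenerBase:
    """Hypothetical stand-in for tweepy's StreamListener (an assumption)."""

    def __init__(self, api=None):
        self.api = api


class SentimentListenerSketch(ListenerBase):
    """Sketch of the state that the listener keeps between tweets."""

    def __init__(self, api, sentiment_dict, topic, limit=10):
        self.sentiment_dict = sentiment_dict  # shared counts, updated per tweet
        self.tweet_count = 0                  # valid tweets processed so far
        self.topic = topic                    # text that each tweet must contain
        self.TWEET_LIMIT = limit              # stop after this many valid tweets
        # The real script also sets the preprocessor module's options here
        # so that URLs and reserved words are removed when cleaning.
        super().__init__(api)                 # hand the API object to the superclass


counts = {'positive': 0, 'neutral': 0, 'negative': 0}
listener = SentimentListenerSketch(None, counts, 'marvel')
print(listener.TWEET_LIMIT)
```

Because the listener stores the same dictionary object that the caller created, the caller sees every update without the listener having to return anything.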
173 00:08:18,060 --> 00:08:20,500 Now, separately we have our on_status method, 174 00:08:20,500 --> 00:08:22,550 and again, this is the method that's going 175 00:08:22,550 --> 00:08:25,680 to get called every time we receive a status object 176 00:08:25,680 --> 00:08:28,580 from Twitter representing a given tweet, 177 00:08:28,580 --> 00:08:31,410 and as we did in the preceding example 178 00:08:31,410 --> 00:08:34,673 we are once again going to try to get the full_text 179 00:08:34,673 --> 00:08:36,880 of an extended_tweet, which is one 180 00:08:36,880 --> 00:08:40,150 that has 141 to 280 characters. 181 00:08:40,150 --> 00:08:42,710 If that fails we'll get an exception, 182 00:08:42,710 --> 00:08:44,760 in which case we will just fall back 183 00:08:44,760 --> 00:08:48,710 to the default text attribute of the status object, 184 00:08:48,710 --> 00:08:52,270 and that's going to represent a tweet up to 140, 185 00:08:52,270 --> 00:08:55,120 or up through 140 characters. 186 00:08:55,120 --> 00:08:59,360 Now, if the particular tweet that we receive is a retweet 187 00:08:59,360 --> 00:09:01,470 we are simply going to ignore it. 188 00:09:01,470 --> 00:09:05,040 We want 10 unique tweets for this example, 189 00:09:05,040 --> 00:09:07,550 or whatever number of tweets you specify. 190 00:09:07,550 --> 00:09:09,140 We want them to be unique tweets 191 00:09:09,140 --> 00:09:12,060 that have positive, negative, or neutral sentiment. 192 00:09:12,060 --> 00:09:14,470 If it's a bunch of retweets, and it could be 193 00:09:14,470 --> 00:09:17,070 many of them if you're dealing with a viral topic, 194 00:09:17,070 --> 00:09:19,840 you may get all the same value very quickly, 195 00:09:19,840 --> 00:09:24,460 so we're ignoring any tweet that contains RT 196 00:09:24,460 --> 00:09:25,800 at the beginning of the tweet, 197 00:09:25,800 --> 00:09:28,080 which indicates that it's a retweet. 198 00:09:28,080 --> 00:09:30,870 Separately we're going to clean the text. 
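The fallback from the extended tweet's text to the default text, and the retweet check, might look something like the sketch below. The function names are hypothetical, and the status objects are simple stand-ins built with `SimpleNamespace`; the attribute layout of real status objects is assumed to match what was just described.

```python
from types import SimpleNamespace


def get_tweet_text(status):
    """Prefer the extended tweet's full_text; fall back to status.text."""
    try:
        return status.extended_tweet.full_text
    except AttributeError:  # no extended_tweet, so this is a shorter tweet
        return status.text


def is_retweet(tweet_text):
    """Treat text that begins with 'RT' as a retweet."""
    return tweet_text.startswith('RT')


# Stand-in status objects (assumptions for this sketch, not real tweets)
short_status = SimpleNamespace(text='RT something that went viral')
long_status = SimpleNamespace(
    text='a shortened version',
    extended_tweet=SimpleNamespace(full_text='the complete longer text'))

for status in (short_status, long_status):
    text = get_tweet_text(status)
    print(is_retweet(text), text)
```

Catching `AttributeError` specifically, rather than every exception, keeps unrelated problems from being silently hidden by the fallback.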
199 00:09:30,870 --> 00:09:32,930 We're going to get rid of any URLs 200 00:09:32,930 --> 00:09:36,550 and other Twitter keywords, like "fav," 201 00:09:36,550 --> 00:09:39,930 F-A-V, that might be inside of that tweet. 202 00:09:39,930 --> 00:09:43,630 And then we're going to check to see if the topic 203 00:09:43,630 --> 00:09:48,480 that we're searching for is contained in the tweet_text. 204 00:09:48,480 --> 00:09:51,350 So, if the topic in its lowercase form 205 00:09:51,350 --> 00:09:55,030 is not in the tweet_text in its lowercase form, 206 00:09:55,030 --> 00:09:57,210 then we're going to ignore that tweet as well. 207 00:09:57,210 --> 00:10:00,770 So, if I search for marvel, I want the word marvel 208 00:10:00,770 --> 00:10:05,150 to appear in every single tweet in this particular example. 209 00:10:05,150 --> 00:10:07,670 Now, once we have gotten to this point 210 00:10:07,670 --> 00:10:10,060 we have a tweet that we are going to process, 211 00:10:10,060 --> 00:10:12,690 so we create a TextBlob out of it, 212 00:10:12,690 --> 00:10:15,220 and you may recall from the preceding lesson 213 00:10:15,220 --> 00:10:18,190 that every TextBlob has a sentiment attribute, 214 00:10:18,190 --> 00:10:21,910 and inside that sentiment attribute is a polarity value, 215 00:10:21,910 --> 00:10:23,130 which is going to be greater 216 00:10:23,130 --> 00:10:25,470 than zero for positive sentiment, 217 00:10:25,470 --> 00:10:28,400 equal to zero for neutral sentiment, 218 00:10:28,400 --> 00:10:31,090 and less than zero for negative sentiment. 219 00:10:31,090 --> 00:10:32,920 So, we're going to check that polarity, 220 00:10:32,920 --> 00:10:35,390 and if it's a positive value we're going 221 00:10:35,390 --> 00:10:39,560 to set the sentiment variable to a plus sign, 222 00:10:39,560 --> 00:10:43,430 which we'll use as part of what we display with the tweet. 
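The case-insensitive topic check described above can be sketched in a couple of lines. The function name `mentions_topic` is hypothetical, chosen only for this illustration.

```python
def mentions_topic(topic, tweet_text):
    """Return True if the topic appears in the tweet text, ignoring case."""
    return topic.lower() in tweet_text.lower()


print(mentions_topic('marvel', 'Just saw the new MARVEL movie'))
print(mentions_topic('marvel', 'Nothing relevant here'))
```

Note that this is a substring test, so a topic of marvel would also match a word such as marvelous. That may or may not be what you want for a given topic.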
223 00:10:43,430 --> 00:10:47,060 We're going to go into the dictionary that we stored 224 00:10:47,060 --> 00:10:49,610 and set its positive key's value 225 00:10:49,610 --> 00:10:52,920 to whatever the value was previously plus one, 226 00:10:52,920 --> 00:10:54,830 so we're gonna modify the value 227 00:10:54,830 --> 00:10:57,130 associated with the positive key. 228 00:10:57,130 --> 00:11:00,420 If the sentiment is neutral we'll do the same thing, 229 00:11:00,420 --> 00:11:03,910 but we'll use a space character to indicate 230 00:11:03,910 --> 00:11:06,080 neutral sentiment, and we'll add one 231 00:11:06,080 --> 00:11:08,050 to the neutral counter, if you will, 232 00:11:08,050 --> 00:11:10,010 and then finally, if it's not positive 233 00:11:10,010 --> 00:11:12,310 or neutral it has to be negative, 234 00:11:12,310 --> 00:11:14,130 so for the else part we'll increment 235 00:11:14,130 --> 00:11:17,600 the negative sentiment key value pair 236 00:11:17,600 --> 00:11:21,270 and we'll use a minus sign to indicate negative sentiment 237 00:11:21,270 --> 00:11:25,920 in the context of the output of this application. 238 00:11:25,920 --> 00:11:29,480 Now, at that point we're going to display the tweet. 239 00:11:29,480 --> 00:11:32,880 We're going to precede it by its sentiment string, 240 00:11:32,880 --> 00:11:34,740 then we're going to show the screen_name 241 00:11:34,740 --> 00:11:36,840 of the user followed by a colon, 242 00:11:36,840 --> 00:11:41,209 and we're going to show their actual cleaned up tweet_text 243 00:11:41,209 --> 00:11:44,270 as the text that gets dumped out 244 00:11:44,270 --> 00:11:46,820 at the command line in this example. 245 00:11:46,820 --> 00:11:48,970 We will also add one to our tweet_count, 246 00:11:48,970 --> 00:11:51,730 because eventually we're going to want to reach 247 00:11:51,730 --> 00:11:54,130 that TWEET_LIMIT and we want to make sure that 248 00:11:54,130 --> 00:11:57,950 the stream terminates properly in our example. 
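The polarity check, the counter updates, and the display step can be sketched together as shown below. The function name, the sample screen names, the sample text, and the polarity numbers are all invented for illustration; in the script the polarity would come from a TextBlob's sentiment attribute.

```python
def record_sentiment(polarity, sentiment_dict):
    """Update the matching counter and return the display marker."""
    if polarity > 0:
        sentiment_dict['positive'] += 1
        return '+'
    elif polarity == 0:
        sentiment_dict['neutral'] += 1
        return ' '
    else:  # not positive or neutral, so it must be negative
        sentiment_dict['negative'] += 1
        return '-'


counts = {'positive': 0, 'neutral': 0, 'negative': 0}
tweet_count = 0

# (screen_name, cleaned text, polarity) triples made up for this sketch
samples = [('user_a', 'great movie', 0.8),
           ('user_b', 'it opens friday', 0.0),
           ('user_c', 'terrible plot', -1.0)]

for screen_name, tweet_text, polarity in samples:
    sentiment = record_sentiment(polarity, counts)
    print(f'{sentiment} {screen_name}: {tweet_text}')
    tweet_count += 1  # only tweets that reach this point are counted

print(counts, tweet_count)
```

In a Tweepy 3.x listener, the usual way to end the stream once the count reaches the limit is to have on_status return False; treat that detail as an assumption to check against the Tweepy version you are using.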
249 00:11:57,950 --> 00:12:01,630 In the main function we're going to set up our authentication, 250 00:12:01,630 --> 00:12:03,190 so this is the same stuff that 251 00:12:03,190 --> 00:12:04,900 we've done a couple of times now. 252 00:12:04,900 --> 00:12:08,510 First we must create the OAuthHandler object, 253 00:12:08,510 --> 00:12:11,350 which requires both the consumer API key 254 00:12:11,350 --> 00:12:14,310 and the consumer API secret key. 255 00:12:14,310 --> 00:12:16,520 We then have to tell the OAuth object 256 00:12:16,520 --> 00:12:18,170 to set its access token, 257 00:12:18,170 --> 00:12:21,030 which is the additional two pieces of information 258 00:12:21,030 --> 00:12:25,240 that we have stored in our keys.py file. 259 00:12:25,240 --> 00:12:27,470 Once we've configured that OAuth object 260 00:12:27,470 --> 00:12:30,080 we can get our tweepy API object, 261 00:12:30,080 --> 00:12:35,080 and again, that requires the OAuthHandler as an argument, 262 00:12:35,250 --> 00:12:38,390 and once again, we are going to specify that 263 00:12:38,390 --> 00:12:42,060 if we reach those rate limits that Twitter imposes 264 00:12:42,060 --> 00:12:46,630 we want the tweepy API object to automatically issue 265 00:12:46,630 --> 00:12:51,630 a wait so that we don't go past the rate limits. 266 00:12:51,900 --> 00:12:54,570 Next up we're going to get our command line arguments. 267 00:12:54,570 --> 00:12:56,520 So, the search_key is going to be 268 00:12:56,520 --> 00:12:59,460 the argument at index number one. 269 00:12:59,460 --> 00:13:01,020 Remember that when you're working 270 00:13:01,020 --> 00:13:02,770 with command line arguments 271 00:13:02,770 --> 00:13:07,120 the sys module provides this argv list. 272 00:13:07,120 --> 00:13:10,520 Element number zero is the name of your script. 273 00:13:10,520 --> 00:13:14,020 Element number one is the first argument after the script. 
274 00:13:14,020 --> 00:13:17,400 Element number two is the second argument after the script. 275 00:13:17,400 --> 00:13:20,930 By the way, you can use a multiword search_key 276 00:13:20,930 --> 00:13:23,870 if when you type in your command line arguments 277 00:13:23,870 --> 00:13:27,600 you enclose the search_key in double quote characters. 278 00:13:27,600 --> 00:13:28,840 So, whatever the search_key is 279 00:13:28,840 --> 00:13:30,660 will be stored here as a string, 280 00:13:30,660 --> 00:13:33,190 and the last command line argument, 281 00:13:33,190 --> 00:13:35,130 the one in index number two, 282 00:13:35,130 --> 00:13:39,160 is going to represent the total number of tweets to process, 283 00:13:39,160 --> 00:13:43,270 not including all of the ones that we ignore along the way. 284 00:13:43,270 --> 00:13:46,570 We only add one to the counter for tweets 285 00:13:46,570 --> 00:13:50,300 if we get what we are considering to be a valid tweet 286 00:13:50,300 --> 00:13:52,580 for the purpose of this example. 287 00:13:52,580 --> 00:13:55,270 Now, we are assuming that you properly 288 00:13:55,270 --> 00:13:58,190 supply the two command line arguments, 289 00:13:58,190 --> 00:14:02,090 so if you do not, this program will fail 290 00:14:02,090 --> 00:14:05,070 with an exception and terminate, just so you know. 291 00:14:05,070 --> 00:14:06,920 Of course, we could check to make sure 292 00:14:06,920 --> 00:14:09,720 that there really are two command line arguments first, 293 00:14:09,720 --> 00:14:12,430 and then only process them if in fact 294 00:14:12,430 --> 00:14:15,200 there are the appropriate arguments. 
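The argument check mentioned above could be added along these lines. This is one possible approach rather than what the script does: the `parse_args` helper and its usage message are hypothetical, and the list passed to it stands in for `sys.argv`.

```python
def parse_args(argv):
    """Return (search_key, limit) from a list shaped like sys.argv."""
    if len(argv) != 3:  # script name plus exactly two arguments
        raise SystemExit('usage: sentimentlistener.py <topic> <number_of_tweets>')
    search_key = argv[1]   # first argument after the script name
    limit = int(argv[2])   # second argument, converted to an integer
    return search_key, limit


# A real script would pass sys.argv; a literal list stands in for it here.
print(parse_args(['sentimentlistener.py', 'marvel', '10']))

# A multiword topic that was quoted on the command line arrives as one element.
print(parse_args(['sentimentlistener.py', 'captain marvel', '25']))
```

Raising `SystemExit` with a message ends the program with a readable usage line instead of an IndexError traceback. A non-numeric count would still raise a ValueError from `int`.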
295 00:14:15,200 --> 00:14:17,720 Next up we set up the sentiment dictionary, 296 00:14:17,720 --> 00:14:21,410 and as you can see, we've preinitialized it with three keys, 297 00:14:21,410 --> 00:14:25,263 positive with the value zero, neutral with the value zero, 298 00:14:25,263 --> 00:14:27,191 and negative with the value zero, 299 00:14:27,191 --> 00:14:30,640 and this is the dictionary that the SentimentListener object 300 00:14:30,640 --> 00:14:35,220 is going to continuously update as each tweet arrives. 301 00:14:35,220 --> 00:14:38,140 Next up we create our SentimentListener, 302 00:14:38,140 --> 00:14:40,700 and we're giving it the four arguments 303 00:14:40,700 --> 00:14:43,420 that we talked about up above in the init method. 304 00:14:43,420 --> 00:14:45,910 The first is the tweepy API object 305 00:14:45,910 --> 00:14:48,700 that will be used to interact with Twitter. 306 00:14:48,700 --> 00:14:52,100 The second is this sentiment dictionary that we just created 307 00:14:52,100 --> 00:14:55,940 that the SentimentListener will store and repeatedly update. 308 00:14:55,940 --> 00:14:59,210 The third is the topic that we're actually searching for, 309 00:14:59,210 --> 00:15:01,830 the search_key in the main function, 310 00:15:01,830 --> 00:15:04,950 and finally, limit, which we just created back here 311 00:15:04,950 --> 00:15:09,103 at line 73, will represent the number of tweets to process. 312 00:15:10,270 --> 00:15:12,630 Next up we're going to set up our tweepy.Stream, 313 00:15:12,630 --> 00:15:16,540 and again, we have to tell it the OAuthHandler object 314 00:15:16,540 --> 00:15:19,870 that it's going to use to authenticate with Twitter, 315 00:15:19,870 --> 00:15:23,650 and then we also need to specify the listener object 316 00:15:23,650 --> 00:15:26,940 that will be notified as each tweet arrives, 317 00:15:26,940 --> 00:15:30,020 and that's the listener that we just created. 
318 00:15:30,020 --> 00:15:32,440 And finally, to get the stream going 319 00:15:32,440 --> 00:15:34,650 we are going to invoke the filter method 320 00:15:34,650 --> 00:15:36,890 once again on the stream object. 321 00:15:36,890 --> 00:15:39,340 So, as you can see here, we are tracking 322 00:15:39,340 --> 00:15:42,290 a list of terms that contains one term, 323 00:15:42,290 --> 00:15:45,527 the search_key that we specified as a command line argument, 324 00:15:45,527 --> 00:15:48,080 and again, this can be a comma-separated list 325 00:15:48,080 --> 00:15:49,630 of a whole bunch of terms, 326 00:15:49,630 --> 00:15:52,490 so for instance if I was a political researcher 327 00:15:52,490 --> 00:15:55,230 and I was tracking two presidential candidates 328 00:15:55,230 --> 00:15:59,650 I might want to have both of their names in the list here 329 00:15:59,650 --> 00:16:01,600 so that I get all the tweets containing 330 00:16:01,600 --> 00:16:04,870 each name along the way, or not all of them. 331 00:16:04,870 --> 00:16:07,870 Remember, it's only one percent of the livestream, 332 00:16:07,870 --> 00:16:12,470 so a randomly selected one percent of the tweets 333 00:16:12,470 --> 00:16:14,810 that have those names is what I should've said. 334 00:16:14,810 --> 00:16:17,160 Now, in this case we happen to be looking 335 00:16:17,160 --> 00:16:19,820 only for tweets in the English language, 336 00:16:19,820 --> 00:16:21,750 but we did demonstrate to you 337 00:16:21,750 --> 00:16:23,980 in the previous streaming example 338 00:16:23,980 --> 00:16:26,670 that we can receive tweets in multiple languages, 339 00:16:26,670 --> 00:16:29,040 and that we can use things like TextBlob 340 00:16:29,040 --> 00:16:32,360 to automatically translate those tweets as well. 
341 00:16:32,360 --> 00:16:36,630 So, the languages argument, again, is going to be a list, 342 00:16:36,630 --> 00:16:38,360 so you can actually track tweets 343 00:16:38,360 --> 00:16:40,430 in several different languages 344 00:16:40,430 --> 00:16:43,980 by putting a comma-separated list of values here. 345 00:16:43,980 --> 00:16:48,000 And finally, in this case we set is_async to False 346 00:16:48,000 --> 00:16:51,300 because we only want to display the final results 347 00:16:51,300 --> 00:16:55,190 after all the tweets that we specified have been read in, 348 00:16:55,190 --> 00:16:57,860 so in this example we did 10 tweets 349 00:16:57,860 --> 00:17:00,180 and I only want to show you the contents 350 00:17:00,180 --> 00:17:04,130 of the sentiment dictionary after we have all 10 of them. 351 00:17:04,130 --> 00:17:08,280 Once this call returns, which could take 352 00:17:08,280 --> 00:17:10,900 a short amount of time or a long amount of time, 353 00:17:10,900 --> 00:17:14,453 or somewhere in between depending on what you are tracking, 354 00:17:15,400 --> 00:17:18,010 at that point we will display our final summary 355 00:17:18,010 --> 00:17:20,650 of the results where we show you the tweet sentiment 356 00:17:20,650 --> 00:17:22,570 for whatever the search_key was, 357 00:17:22,570 --> 00:17:25,040 and for the positive we'll access 358 00:17:25,040 --> 00:17:26,700 the positive key in the dictionary, 359 00:17:26,700 --> 00:17:29,080 and similarly the neutral and negative keys 360 00:17:29,080 --> 00:17:32,893 in the dictionary to display the final results.
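The final summary step might be sketched as follows. The `format_summary` name and the exact wording of the output are assumptions; the counts are the ones reported in the demonstration run earlier (four positive, four neutral, two negative).

```python
def format_summary(search_key, sentiment_dict):
    """Build the summary text from the three counters."""
    return '\n'.join([
        f'Tweet sentiment for "{search_key}"',
        f'Positive: {sentiment_dict["positive"]}',
        f' Neutral: {sentiment_dict["neutral"]}',
        f'Negative: {sentiment_dict["negative"]}',
    ])


# Counts taken from the demonstration run with the topic marvel
results = {'positive': 4, 'neutral': 4, 'negative': 2}
print(format_summary('marvel', results))
```

Building the text in a function that returns a string, rather than printing directly, makes the summary easy to check or reuse, for example in a log file.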