1 00:00:00,970 --> 00:00:02,080 - [Instructor] Now that we've shown you 2 00:00:02,080 --> 00:00:05,510 our subclass of StreamListener called TweetListener, 3 00:00:05,510 --> 00:00:09,330 we're going to use that in an interactive IPython session 4 00:00:09,330 --> 00:00:13,460 in which we're going to create an asynchronous stream 5 00:00:13,460 --> 00:00:16,200 that delivers tweets as they arrive 6 00:00:16,200 --> 00:00:19,200 right into the IPython environment, 7 00:00:19,200 --> 00:00:24,200 and we're simply going to display each tweet as it arrives. 8 00:00:24,350 --> 00:00:27,300 And because it's going to be an asynchronous stream, 9 00:00:27,300 --> 00:00:29,500 you'll see that even though 10 00:00:29,500 --> 00:00:32,200 the stream is still being processed, 11 00:00:32,200 --> 00:00:36,620 we will have access to the IPython command prompt 12 00:00:36,620 --> 00:00:38,390 for the next snippet of input 13 00:00:38,390 --> 00:00:41,990 that we want to read in and execute. 14 00:00:41,990 --> 00:00:43,990 One of the benefits of that is that 15 00:00:43,990 --> 00:00:45,920 we could, if we needed to, 16 00:00:45,920 --> 00:00:48,480 specify that we want to terminate the stream 17 00:00:48,480 --> 00:00:52,610 by setting a specific attribute on our stream object. 18 00:00:52,610 --> 00:00:54,950 Now we're going to be performing a number of steps 19 00:00:54,950 --> 00:00:56,600 in this interactive session. 20 00:00:56,600 --> 00:00:59,760 We will start out by authenticating once again with Twitter 21 00:00:59,760 --> 00:01:02,460 because this will be a new IPython session. 22 00:01:02,460 --> 00:01:04,430 We'll create a Tweepy API object 23 00:01:04,430 --> 00:01:07,140 for interacting with the Twitter APIs.
24 00:01:07,140 --> 00:01:10,220 We'll create an instance of our TweetListener class 25 00:01:10,220 --> 00:01:12,690 that I just looked at with you in the preceding video, 26 00:01:12,690 --> 00:01:16,670 and then we'll use that as we create a Tweepy Stream object 27 00:01:16,670 --> 00:01:20,010 that's going to be used to manage the connection 28 00:01:20,010 --> 00:01:21,540 to the Twitter stream. 29 00:01:21,540 --> 00:01:23,450 Once we have that stream object, 30 00:01:23,450 --> 00:01:25,830 we're going to invoke its filter method 31 00:01:25,830 --> 00:01:28,190 to initiate the stream processing, 32 00:01:28,190 --> 00:01:31,070 and as you'll see, the filter method receives 33 00:01:31,070 --> 00:01:34,610 a track parameter in which you can specify 34 00:01:34,610 --> 00:01:36,050 the terms that you would like 35 00:01:36,050 --> 00:01:39,307 to track and receive live tweets for. 36 00:01:39,307 --> 00:01:41,230 In fact, there's a whole bunch 37 00:01:41,230 --> 00:01:44,800 of potential arguments that you can supply to filter, 38 00:01:44,800 --> 00:01:46,660 so I will show you those 39 00:01:46,660 --> 00:01:49,430 in the online documentation as well. 40 00:01:49,430 --> 00:01:51,400 We'll come back to the slides in just a moment, 41 00:01:51,400 --> 00:01:54,830 but for now let's go ahead and jump over to 42 00:01:54,830 --> 00:01:57,140 a new IPython session, 43 00:01:57,140 --> 00:01:58,810 and in this IPython session, 44 00:01:58,810 --> 00:02:01,670 I've already imported the Tweepy module 45 00:02:01,670 --> 00:02:04,670 as well as my keys.py file, 46 00:02:04,670 --> 00:02:06,820 in which I have my 47 00:02:06,820 --> 00:02:08,890 Deitel-specific keys, 48 00:02:08,890 --> 00:02:11,010 and of course you would import the keys file 49 00:02:11,010 --> 00:02:14,400 that has your keys from earlier in this lesson.
50 00:02:14,400 --> 00:02:16,040 Next up, we're going to create 51 00:02:16,040 --> 00:02:18,740 that OAuthHandler object once again, 52 00:02:18,740 --> 00:02:21,360 and just as a review, 53 00:02:21,360 --> 00:02:24,820 you're going to access the keys file's consumer key 54 00:02:24,820 --> 00:02:27,020 and the keys file's consumer secret 55 00:02:27,020 --> 00:02:30,210 as you create that OAuthHandler object, 56 00:02:30,210 --> 00:02:31,840 and those are going to be used 57 00:02:31,840 --> 00:02:34,810 to authenticate with Twitter. 58 00:02:34,810 --> 00:02:36,630 And then separately, don't forget 59 00:02:36,630 --> 00:02:40,220 that you also need to set your access token, 60 00:02:40,220 --> 00:02:41,560 and for that purpose, you're going 61 00:02:41,560 --> 00:02:43,770 to use the keys file's access token 62 00:02:43,770 --> 00:02:47,600 and access token secret variables. 63 00:02:47,600 --> 00:02:50,700 And again, you would've had to update that file 64 00:02:50,700 --> 00:02:53,059 with your specific information 65 00:02:53,059 --> 00:02:56,610 from when we demonstrated how to create an app 66 00:02:56,610 --> 00:02:59,803 through the developer portal at Twitter.com. 67 00:03:01,010 --> 00:03:04,080 At this point, we've configured our OAuthHandler object, 68 00:03:04,080 --> 00:03:06,900 and remember that next we can use that object 69 00:03:06,900 --> 00:03:10,340 as we configure our Tweepy API object. 70 00:03:10,340 --> 00:03:12,650 The OAuthHandler is the first argument, 71 00:03:12,650 --> 00:03:15,230 and that's going to enable the API object 72 00:03:15,230 --> 00:03:18,300 to pass the credentials along to Twitter 73 00:03:18,300 --> 00:03:21,200 as we invoke the various API methods.
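For reference, the authentication steps just described, plus the rate-limit keyword arguments mentioned next, can be sketched as follows. This is a minimal sketch assuming Tweepy 3.x (where `OAuthHandler`, `set_access_token`, and the `wait_on_rate_limit`/`wait_on_rate_limit_notify` keyword arguments of `tweepy.API` exist; `wait_on_rate_limit_notify` was removed in Tweepy 4.0). The attribute names on the keys module (`consumer_key`, `consumer_secret`, `access_token`, `access_token_secret`) and the helper name `make_api` are assumptions for illustration. Tweepy is imported inside the function so that defining it does not require Tweepy.

```python
def make_api(keys):
    """Authenticate with Twitter and return a Tweepy API object.

    Assumes Tweepy 3.x and a keys module with hypothetical attribute
    names holding the app credentials from the developer portal.
    """
    import tweepy  # imported here so defining the function needs no Tweepy

    # The consumer key and secret identify your app to Twitter
    auth = tweepy.OAuthHandler(keys.consumer_key, keys.consumer_secret)
    # The access token and secret authorize requests for your account
    auth.set_access_token(keys.access_token, keys.access_token_secret)

    # Wait (rather than fail) when a rate limit is hit, and print a notice
    return tweepy.API(auth, wait_on_rate_limit=True,
                      wait_on_rate_limit_notify=True)
```

In the IPython session this would be used as `import keys` followed by `api = make_api(keys)`.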
74 00:03:21,200 --> 00:03:25,170 Once again, we have used the additional keyword arguments 75 00:03:25,170 --> 00:03:27,280 that specify that we want to wait 76 00:03:27,280 --> 00:03:31,110 if we encounter any rate limit violations, 77 00:03:31,110 --> 00:03:35,410 so that we don't accidentally get our account suspended. 78 00:03:35,410 --> 00:03:37,400 At this point, we're now ready 79 00:03:37,400 --> 00:03:40,960 to create our TweetListener object, 80 00:03:40,960 --> 00:03:42,610 and of course, in order to do that, 81 00:03:42,610 --> 00:03:47,160 we first have to import its definition into our session. 82 00:03:47,160 --> 00:03:49,900 I am working from the ch12 folder 83 00:03:49,900 --> 00:03:53,090 that corresponds to this lesson's source code, 84 00:03:53,090 --> 00:03:55,473 and within that folder is the file tweetlistener.py, 85 00:03:57,568 --> 00:04:00,700 and within that file is the TweetListener class 86 00:04:00,700 --> 00:04:03,770 that we looked at in the preceding video. 87 00:04:03,770 --> 00:04:05,930 Now once we have that class imported, 88 00:04:05,930 --> 00:04:08,410 we can create a new instance of that class. 89 00:04:08,410 --> 00:04:12,510 We simply have to pass the API object as an argument. 90 00:04:12,510 --> 00:04:15,450 Now you may recall that there was an optional argument 91 00:04:15,450 --> 00:04:18,560 representing the total number of tweets 92 00:04:18,560 --> 00:04:20,040 that you want to process. 93 00:04:20,040 --> 00:04:22,360 That keyword argument was called limit, 94 00:04:22,360 --> 00:04:24,400 and if you don't specify that argument, 95 00:04:24,400 --> 00:04:25,770 which is the case here, 96 00:04:25,770 --> 00:04:29,030 we're going to process only 10 tweets 97 00:04:29,030 --> 00:04:30,480 from the live tweet stream, 98 00:04:30,480 --> 00:04:34,400 which is good enough to demonstrate the overall concept 99 00:04:34,400 --> 00:04:37,490 of streaming live tweets onto your system.
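The limit behavior described above can be illustrated without a network connection. The class below is a hypothetical, Tweepy-free stand-in for the counting logic of the lesson's TweetListener, not its actual source: it counts incoming statuses and returns False from `on_status` once `limit` (default 10) is reached, which is the value a Tweepy 3.x stream treats as a request to disconnect. The `api` parameter is only stored, to mirror how TweetListener receives the API object.

```python
class CountingListener:
    """Hypothetical stand-in for TweetListener's tweet-counting logic."""

    def __init__(self, api=None, limit=10):
        self.api = api          # stored only to mirror TweetListener(api)
        self.limit = limit      # total number of tweets to process
        self.tweet_count = 0

    def on_status(self, status):
        """Handle one tweet; return False to ask the stream to stop."""
        self.tweet_count += 1
        print(f'{self.tweet_count}: {status}')
        # True keeps the stream open; False once the limit is reached
        return self.tweet_count < self.limit
```

Feeding it ten statuses yields True for the first nine calls and False on the tenth, so a stream driving this listener would close after exactly 10 tweets.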
100 00:04:37,490 --> 00:04:40,330 So we'll create that TweetListener object 101 00:04:40,330 --> 00:04:42,400 and then we're going to use that 102 00:04:42,400 --> 00:04:45,690 to initialize a Tweepy Stream object, 103 00:04:45,690 --> 00:04:49,150 which is going to actually manage the stream. 104 00:04:49,150 --> 00:04:51,440 Now, you need to supply two arguments 105 00:04:51,440 --> 00:04:53,810 when you create a Tweepy Stream object. 106 00:04:53,810 --> 00:04:57,200 You need to give it the OAuthHandler object. 107 00:04:57,200 --> 00:04:58,900 Now we chose to do that 108 00:04:58,900 --> 00:05:01,760 by accessing it through the API object, 109 00:05:01,760 --> 00:05:05,030 but we actually could have simply passed in auth 110 00:05:05,030 --> 00:05:07,510 because it's part of our current session as well; 111 00:05:07,510 --> 00:05:09,440 api.auth is simply a reference to that same object 112 00:05:09,440 --> 00:05:13,910 as stored in the Tweepy API object. 113 00:05:13,910 --> 00:05:15,930 Then separately, we have to specify 114 00:05:15,930 --> 00:05:18,220 our listener object as well, 115 00:05:18,220 --> 00:05:20,460 which is the TweetListener object 116 00:05:20,460 --> 00:05:23,393 that we created in the preceding snippet. 117 00:05:24,250 --> 00:05:28,030 Now we have configured the stream object 118 00:05:28,030 --> 00:05:31,500 and we are ready to start streaming. 119 00:05:31,500 --> 00:05:33,780 Now, I'm going to paste in the next statement, 120 00:05:33,780 --> 00:05:35,870 but I'm not going to execute it yet. 121 00:05:35,870 --> 00:05:38,350 The way that you initiate the stream 122 00:05:38,350 --> 00:05:40,960 is by calling the filter method 123 00:05:40,960 --> 00:05:42,950 on the stream object. 124 00:05:42,950 --> 00:05:46,630 Now in this case we're supplying two keyword arguments.
125 00:05:46,630 --> 00:05:48,500 The first is called track, 126 00:05:48,500 --> 00:05:51,140 and as you can see, it's a list 127 00:05:51,140 --> 00:05:55,010 of the words that you would like to search for 128 00:05:55,010 --> 00:05:56,920 in the live stream of tweets 129 00:05:56,920 --> 00:05:59,070 to get a sampling of the tweets 130 00:05:59,070 --> 00:06:00,200 that are coming in live. 131 00:06:00,200 --> 00:06:03,030 Remember, you have access to only a maximum 132 00:06:03,030 --> 00:06:05,210 of one percent of the tweets 133 00:06:05,210 --> 00:06:08,240 that are happening at any given time, 134 00:06:08,240 --> 00:06:12,430 and when you request a particular tracking 135 00:06:12,430 --> 00:06:14,240 word or set of words 136 00:06:14,240 --> 00:06:17,440 as a comma-delimited list in a list object, 137 00:06:17,440 --> 00:06:20,020 it's going to give you back a subset 138 00:06:20,020 --> 00:06:22,460 of the overall tweets, randomly selected 139 00:06:22,460 --> 00:06:25,200 from what's going on right now. 140 00:06:25,200 --> 00:06:27,110 Here I'm tracking Marvel 141 00:06:27,110 --> 00:06:30,820 because I'm a big Marvel superhero movie fan, 142 00:06:30,820 --> 00:06:33,870 and I'm also supplying, as a second argument, 143 00:06:33,870 --> 00:06:37,380 is_async, which I'm setting to True. 144 00:06:37,380 --> 00:06:39,210 What that means is that 145 00:06:39,210 --> 00:06:42,850 this snippet, executing inside of IPython, 146 00:06:42,850 --> 00:06:44,610 once I actually do execute it, 147 00:06:44,610 --> 00:06:46,990 is not going to block 148 00:06:46,990 --> 00:06:49,810 and wait for all the tweets to come in. 149 00:06:49,810 --> 00:06:51,980 I will initiate the stream 150 00:06:51,980 --> 00:06:54,880 and I may immediately see some tweets come in.
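The Stream construction and the filter call just described can be sketched like this. It is a minimal sketch assuming Tweepy 3.x, where `tweepy.Stream` takes `auth` and `listener` keyword arguments and `filter` accepts `track` and `is_async` (with `is_async=True`, Tweepy 3.x runs the read loop on a background thread). Tweepy 4.0 reworked streaming (StreamListener was merged into Stream), so this will not run unchanged on 4.x. The helper name `start_tracking` is hypothetical.

```python
def start_tracking(api, listener, terms, is_async=True):
    """Create a Tweepy 3.x Stream and start filtering for the given terms.

    Hypothetical helper; assumes Tweepy 3.x's Stream(auth=..., listener=...)
    and Stream.filter(track=..., is_async=...).
    """
    import tweepy  # imported here so defining the function needs no Tweepy

    # api.auth is the same OAuthHandler object that was passed to tweepy.API
    stream = tweepy.Stream(auth=api.auth, listener=listener)
    # track: list of terms to match; is_async=True returns immediately
    # while tweets keep arriving at the listener in the background
    stream.filter(track=terms, is_async=is_async)
    return stream
```

With the objects from the session, the call would look like `tweet_stream = start_tracking(api, tweet_listener, ['Marvel'])`, keeping a reference to the stream so it can be stopped later.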
151 00:06:54,880 --> 00:06:57,690 But you'll also see, once I execute this, 152 00:06:57,690 --> 00:07:00,320 that a new In prompt will show up 153 00:07:00,320 --> 00:07:02,510 at which I could actually continue 154 00:07:02,510 --> 00:07:05,860 typing additional code in IPython 155 00:07:05,860 --> 00:07:08,660 as the live stream is being passed 156 00:07:08,660 --> 00:07:10,510 into IPython as well. 157 00:07:10,510 --> 00:07:12,880 Every single time we get a new tweet, 158 00:07:12,880 --> 00:07:15,450 the TweetListener will simply display, 159 00:07:15,450 --> 00:07:18,010 wherever I am in the IPython session, 160 00:07:18,010 --> 00:07:19,910 the contents of that tweet, 161 00:07:21,000 --> 00:07:23,140 as we are about to demonstrate. 162 00:07:23,140 --> 00:07:26,220 Now the default value for is_async is False. 163 00:07:26,220 --> 00:07:28,130 If I were to go with the default value, 164 00:07:28,130 --> 00:07:30,610 then this snippet would start executing 165 00:07:30,610 --> 00:07:33,620 and would not give us back a new In prompt 166 00:07:33,620 --> 00:07:36,250 until the maximum number of tweets 167 00:07:36,250 --> 00:07:39,250 that we configured the TweetListener for is reached. 168 00:07:39,250 --> 00:07:41,060 And again, in snippet seven, 169 00:07:41,060 --> 00:07:43,640 because we did not specify the limit, 170 00:07:43,640 --> 00:07:46,950 we will get a maximum of 10 tweets 171 00:07:46,950 --> 00:07:48,910 for demonstration purposes. 172 00:07:48,910 --> 00:07:50,980 Now before I execute that 173 00:07:50,980 --> 00:07:52,470 and show you what's going to happen, 174 00:07:52,470 --> 00:07:55,010 I just want to switch back to the slides for a moment 175 00:07:55,010 --> 00:07:57,440 and talk about the filter method. 176 00:07:57,440 --> 00:07:59,520 In addition to the track parameter, 177 00:07:59,520 --> 00:08:00,850 there's actually a whole bunch 178 00:08:00,850 --> 00:08:03,373 of other filter method parameters.
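To see why `is_async` controls when the In prompt comes back, here is a small simulation that uses only the standard library. `run_stream` is a hypothetical function, not part of Tweepy: with `is_async=False` it processes every item before returning, like the default synchronous stream, and with `is_async=True` it hands the work to a background thread and returns at once. Handing the read loop to a thread is roughly what Tweepy 3.x does when `is_async=True`.

```python
import threading
import time


def run_stream(tweets, on_status, is_async=False, delay=0.01):
    """Feed each item in tweets to on_status, blocking unless is_async.

    Hypothetical simulation: tweets stands in for the live stream and
    delay mimics the wait between arriving tweets.
    """
    def loop():
        for tweet in tweets:
            time.sleep(delay)       # pretend to wait for the next tweet
            on_status(tweet)

    if is_async:
        worker = threading.Thread(target=loop, daemon=True)
        worker.start()
        return worker               # control returns to the caller at once
    loop()                          # synchronous: returns only when done
    return None


run_stream(range(5), print)                       # blocks until all 5 arrive
worker = run_stream(range(5), print, is_async=True)
print('prompt is back while tweets keep arriving')
worker.join()                                     # wait so the demo finishes
```

In the synchronous call nothing after it runs until all five items are printed; in the asynchronous call the "prompt is back" line can appear before the streamed items.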
179 00:08:04,430 --> 00:08:06,720 This is the webpage where you can learn 180 00:08:06,720 --> 00:08:08,867 about filtering real-time tweets 181 00:08:08,867 --> 00:08:10,890 through the Twitter APIs, 182 00:08:10,890 --> 00:08:12,350 and if you scroll down here, 183 00:08:12,350 --> 00:08:14,530 you'll see that there's actually quite a number 184 00:08:14,530 --> 00:08:17,630 of different parameters that we can supply 185 00:08:17,630 --> 00:08:18,780 to the filter method, 186 00:08:18,780 --> 00:08:22,250 and all of these are actually described below 187 00:08:22,250 --> 00:08:23,880 on this webpage. 188 00:08:23,880 --> 00:08:26,360 If you're interested in filtering your tweets 189 00:08:26,360 --> 00:08:28,270 at a more refined level, 190 00:08:28,270 --> 00:08:30,000 for example, if you want to filter tweets 191 00:08:30,000 --> 00:08:32,240 based on location, you can go ahead 192 00:08:32,240 --> 00:08:34,040 and take a look at all these parameters, 193 00:08:34,040 --> 00:08:37,050 and the corresponding keyword arguments 194 00:08:37,050 --> 00:08:40,320 with these names are what you would supply 195 00:08:40,320 --> 00:08:42,690 as arguments to the filter method 196 00:08:42,690 --> 00:08:45,223 as you initiate your stream processing. 197 00:08:46,140 --> 00:08:47,930 Just coming back here for a moment, 198 00:08:47,930 --> 00:08:51,000 what the filter method is going to give you back 199 00:08:51,000 --> 00:08:55,170 are the full JSON objects representing each tweet, 200 00:08:55,170 --> 00:08:57,550 and it's important that you understand 201 00:08:57,550 --> 00:08:59,700 that it's doing the filtering 202 00:08:59,700 --> 00:09:02,910 based not just on the tweet's text 203 00:09:02,910 --> 00:09:06,200 but also on all of the other metadata 204 00:09:06,200 --> 00:09:08,810 that's part of those tweet objects.
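As an example of one of those other parameters, the real-time filter documentation describes a `locations` parameter that takes bounding boxes as longitude/latitude pairs, southwest corner first. The sketch below assumes Tweepy 3.x's `Stream.filter`, which accepts `locations` and `languages` keyword arguments alongside `track` and `is_async`. The San Francisco box is the sample value used in Twitter's documentation, and `start_location_stream` is a hypothetical helper.

```python
# Bounding box: southwest lon, southwest lat, northeast lon, northeast lat
SAN_FRANCISCO = [-122.75, 36.8, -121.75, 37.8]


def start_location_stream(stream, box=SAN_FRANCISCO, languages=('en',)):
    """Start a non-blocking location filter on an existing Tweepy 3.x Stream.

    Hypothetical helper; stream is an already-constructed tweepy.Stream.
    """
    if len(box) % 4 != 0:
        # Each bounding box needs exactly four coordinates
        raise ValueError('locations must contain groups of four numbers')
    stream.filter(locations=list(box), languages=list(languages),
                  is_async=True)
```

Several boxes can be supplied by concatenating their four-number groups into one list.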
205 00:09:08,810 --> 00:09:10,220 Just looking at the text, 206 00:09:10,220 --> 00:09:13,090 you might not actually see the terms 207 00:09:14,290 --> 00:09:15,680 that you're filtering for. 208 00:09:15,680 --> 00:09:19,782 They could be embedded in @-mentions, hashtags, 209 00:09:19,782 --> 00:09:23,500 URLs that are expanded to their full URLs 210 00:09:23,500 --> 00:09:27,150 versus the shorthand versions that often show up in tweets, 211 00:09:27,150 --> 00:09:29,603 and other information as well. 212 00:09:30,450 --> 00:09:33,840 Now let's go back over to our IPython session 213 00:09:33,840 --> 00:09:36,810 and actually execute the call to filter. 214 00:09:36,810 --> 00:09:39,050 You can see I got "Connection successful," 215 00:09:39,050 --> 00:09:42,100 and tweets are now flying by on the screen here, 216 00:09:42,100 --> 00:09:45,250 and eventually everything stopped scrolling 217 00:09:45,250 --> 00:09:47,770 because 218 00:09:47,770 --> 00:09:49,950 I'd reached my limit of 10. 219 00:09:49,950 --> 00:09:51,930 Now I'm scanning quickly 220 00:09:51,930 --> 00:09:53,190 through what's on the screen here, 221 00:09:53,190 --> 00:09:56,150 and everything looks to be pretty tame. 222 00:09:56,150 --> 00:09:57,650 No swear words showing up. 223 00:09:57,650 --> 00:09:59,260 But please keep in mind, 224 00:09:59,260 --> 00:10:01,490 you are dealing with live people 225 00:10:01,490 --> 00:10:03,100 sending live tweets 226 00:10:03,100 --> 00:10:05,080 that contain whatever they're thinking 227 00:10:05,080 --> 00:10:08,290 at any given time, so it's quite 228 00:10:08,290 --> 00:10:11,400 possible that you will see stuff you don't like 229 00:10:11,400 --> 00:10:15,620 in all of this stuff that's flying by on the screen.
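The point about matching on metadata can be made concrete with a simplified check over a tweet's JSON. This is an illustration only, not Twitter's actual matching algorithm: `matching_fields` is a hypothetical function that inspects the places the lesson mentions, using Twitter API v1.1 field names (`text`, `entities.hashtags[].text`, `entities.user_mentions[].screen_name`, `entities.urls[].expanded_url`) and a case-insensitive substring test. The sample tweet is made up.

```python
def matching_fields(tweet, term):
    """Return the names of tweet fields that contain term (case-insensitive).

    Simplified illustration over a v1.1-style tweet dict; the real
    streaming filter applies its own, more detailed matching rules.
    """
    term = term.lower()
    entities = tweet.get('entities', {})
    candidates = {
        'text': [tweet.get('text', '')],
        'hashtags': [h['text'] for h in entities.get('hashtags', [])],
        'mentions': [m['screen_name']
                     for m in entities.get('user_mentions', [])],
        'urls': [u['expanded_url'] for u in entities.get('urls', [])],
    }
    return [name for name, values in candidates.items()
            if any(term in value.lower() for value in values)]


# Made-up sample: "Marvel" never appears in the visible text
tweet = {
    'text': "Can't wait for opening night! https://t.co/abc123",
    'entities': {
        'hashtags': [{'text': 'MarvelStudios'}],
        'user_mentions': [],
        'urls': [{'expanded_url': 'https://www.marvel.com/movies'}],
    },
}
print(matching_fields(tweet, 'Marvel'))   # ['hashtags', 'urls']
```

A tweet like this would arrive on a stream tracking Marvel even though a reader scanning only its text would not see the word.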
230 00:10:15,620 --> 00:10:18,440 Just keep that in mind as you're working with Twitter 231 00:10:18,440 --> 00:10:21,170 and other social network data mining, 232 00:10:21,170 --> 00:10:24,460 because the reality is that you will encounter things 233 00:10:24,460 --> 00:10:27,440 that are probably not appropriate, for example, 234 00:10:27,440 --> 00:10:30,960 for your school-age children along the way. 235 00:10:30,960 --> 00:10:32,050 As you can see here, 236 00:10:32,050 --> 00:10:33,450 a bunch of tweets scrolled by. 237 00:10:33,450 --> 00:10:34,360 For each of the tweets, 238 00:10:34,360 --> 00:10:37,170 we're showing you the screen name, the language, 239 00:10:37,170 --> 00:10:40,950 and the actual tweet text, which in this case was a retweet. 240 00:10:40,950 --> 00:10:44,460 In some cases the tweets came in in other languages. 241 00:10:44,460 --> 00:10:47,573 This looks like it might be Korean, in this case. 242 00:10:48,625 --> 00:10:52,520 In the case of a tweet in another language, 243 00:10:52,520 --> 00:10:55,323 we were able to translate that as well. 244 00:10:56,400 --> 00:10:59,610 In the end, once it finished processing 245 00:10:59,610 --> 00:11:03,020 the maximum number of tweets that we specified, 246 00:11:03,020 --> 00:11:06,450 the stream was automatically closed 247 00:11:06,450 --> 00:11:10,910 when our StreamListener's on_status method returned False, 248 00:11:10,910 --> 00:11:12,280 which is the indication 249 00:11:12,280 --> 00:11:14,930 that the stream should no longer be kept open. 250 00:11:14,930 --> 00:11:17,690 In fact, 251 00:11:17,690 --> 00:11:20,240 let's just go ahead and do it again.
252 00:11:20,240 --> 00:11:22,380 Watch down at the bottom of the screen here, 253 00:11:22,380 --> 00:11:25,490 because you'll see that the In prompt for 254 00:11:25,490 --> 00:11:27,960 the next snippet, which will be snippet 11, 255 00:11:27,960 --> 00:11:29,620 is going to be displayed 256 00:11:29,620 --> 00:11:32,503 while the stream is still executing. 257 00:11:33,430 --> 00:11:36,600 I could actually go ahead and type code here 258 00:11:37,482 --> 00:11:39,120 (typing) 259 00:11:39,120 --> 00:11:42,900 while I'm waiting, and it will enable me 260 00:11:42,900 --> 00:11:45,310 to execute that code as well. 261 00:11:45,310 --> 00:11:47,510 Now the reason that's actually handy, 262 00:11:47,510 --> 00:11:49,400 and by the way, notice there are no more tweets 263 00:11:49,400 --> 00:11:50,970 coming in at the moment here. 264 00:11:50,970 --> 00:11:53,880 But the reason that's actually handy is that, 265 00:11:54,810 --> 00:11:56,970 if you're dealing with, let's say, 266 00:11:56,970 --> 00:11:59,560 reading in massive numbers of tweets, 267 00:11:59,560 --> 00:12:01,020 you may get to a point 268 00:12:01,020 --> 00:12:04,520 where you want to terminate the tweet stream. 269 00:12:04,520 --> 00:12:05,640 I mentioned the difference 270 00:12:05,640 --> 00:12:08,520 between asynchronous and synchronous tweet streams 271 00:12:08,520 --> 00:12:09,680 a little bit ago, 272 00:12:09,680 --> 00:12:11,940 and one of the things that you can do 273 00:12:11,940 --> 00:12:14,410 when you have an asynchronous stream 274 00:12:14,410 --> 00:12:18,180 is set its running attribute to False, 275 00:12:18,180 --> 00:12:22,160 which will terminate the asynchronous stream. 276 00:12:22,160 --> 00:12:24,230 Let's just assume for argument's sake 277 00:12:24,230 --> 00:12:26,760 that we were using our StreamListener 278 00:12:26,760 --> 00:12:29,410 to acquire ten thousand tweets, 279 00:12:29,410 --> 00:12:31,980 and then we decide we want to cut it off early.
280 00:12:31,980 --> 00:12:33,850 If it's coming in asynchronously, 281 00:12:33,850 --> 00:12:35,190 we can cut it off early 282 00:12:35,190 --> 00:12:37,760 simply by setting running to False, 283 00:12:37,760 --> 00:12:42,360 rather than killing the actual execution of the app. 284 00:12:42,360 --> 00:12:44,000 Just as a reminder again, 285 00:12:44,000 --> 00:12:47,160 if we don't use is_async=True, 286 00:12:47,160 --> 00:12:48,730 we get a synchronous stream, 287 00:12:48,730 --> 00:12:52,360 in which case the next In prompt will not appear 288 00:12:52,360 --> 00:12:54,710 until the maximum number of tweets 289 00:12:54,710 --> 00:12:57,563 is actually reached in our code.
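Both ways of stopping a stream described above can be modeled with a small, standard-library-only simulation. `SimulatedStream` and `Collector` are hypothetical classes, not Tweepy's: the design is loosely modeled on how a Tweepy 3.x Stream uses its `running` attribute, where a listener returning False clears the flag, and a caller can clear the same flag on an asynchronous stream to cut it off early.

```python
import threading
import time


class SimulatedStream:
    """Hypothetical stand-in for a Tweepy 3.x Stream, for illustration only.

    The read loop keeps going while self.running is True. The listener
    returning False from on_status, or the caller setting running to
    False, both end the loop.
    """

    def __init__(self, listener, source, delay=0.01):
        self.listener = listener
        self.source = source        # iterable standing in for live tweets
        self.delay = delay          # simulated wait between tweets
        self.running = False

    def _run(self):
        for tweet in self.source:
            if not self.running:
                break               # stopped by listener or caller
            time.sleep(self.delay)
            if self.listener.on_status(tweet) is False:
                self.running = False  # listener asked to close the stream
        self.running = False

    def filter(self, is_async=False):
        self.running = True
        if is_async:
            thread = threading.Thread(target=self._run, daemon=True)
            thread.start()
            return thread           # caller gets control back immediately
        self._run()                 # synchronous: blocks until the loop ends
        return None


class Collector:
    """Hypothetical listener that stops after `limit` tweets."""

    def __init__(self, limit):
        self.limit, self.seen = limit, []

    def on_status(self, status):
        self.seen.append(status)
        return len(self.seen) < self.limit


# 1) Listener-driven stop: closes after 10 of 100 simulated tweets
collector = Collector(limit=10)
SimulatedStream(collector, range(100)).filter()
print(len(collector.seen))             # 10

# 2) Caller-driven stop: cut an async "10,000-tweet" run short
collector = Collector(limit=10_000)
stream = SimulatedStream(collector, range(10_000))
worker = stream.filter(is_async=True)
time.sleep(0.1)                        # let a few tweets arrive
stream.running = False                 # analogous to setting running to False
worker.join()
print(len(collector.seen) < 10_000)    # True
```

Setting the flag is a cooperative stop: the loop finishes the tweet it is handling and then exits on its next check, which is why this is gentler than killing the whole program.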