1
00:00:00,790 --> 00:00:02,730
- [Instructor] For our next
example we're going to use

2
00:00:02,730 --> 00:00:06,740
a live tweet stream to
perform sentiment analysis.

3
00:00:06,740 --> 00:00:08,980
Now, before I talk about this in detail

4
00:00:08,980 --> 00:00:11,810
let me just go ahead to the command line

5
00:00:11,810 --> 00:00:14,310
where I'm going to execute a script

6
00:00:14,310 --> 00:00:17,210
called sentimentlistener.py.

7
00:00:17,210 --> 00:00:19,930
For this script you can provide a topic

8
00:00:19,930 --> 00:00:21,620
that you would like to track,

9
00:00:21,620 --> 00:00:23,330
and you can specify the total number

10
00:00:23,330 --> 00:00:25,180
of tweets that you want to process,

11
00:00:25,180 --> 00:00:27,420
and what it will then do is start up

12
00:00:27,420 --> 00:00:30,000
a live tweet stream tracking that topic,

13
00:00:30,000 --> 00:00:33,940
and once those 10, in
this case, tweets come in

14
00:00:33,940 --> 00:00:36,810
it will summarize how many
positive tweets there were,

15
00:00:36,810 --> 00:00:38,540
how many negative tweets there were,

16
00:00:38,540 --> 00:00:40,630
and how many neutral tweets there were

17
00:00:40,630 --> 00:00:43,810
based on the values returned by TextBlob.

18
00:00:43,810 --> 00:00:45,770
So, I'm gonna start that executing

19
00:00:45,770 --> 00:00:49,030
so we can see tweets start to come in.

20
00:00:49,030 --> 00:00:52,480
Again, I'm using marvel
as my topic at the moment,

21
00:00:52,480 --> 00:00:56,230
and it's going to precede
each tweet by a plus sign

22
00:00:56,230 --> 00:00:59,250
if it's positive, a minus
sign if it's negative,

23
00:00:59,250 --> 00:01:02,620
and nothing but a space character if it's

24
00:01:02,620 --> 00:01:05,140
considered to be a neutral tweet.

25
00:01:05,140 --> 00:01:06,620
Now, while those tweets come in

26
00:01:06,620 --> 00:01:10,680
let me just jump back
over to my slide here.

27
00:01:10,680 --> 00:01:13,570
So, one of the things that
you might use something

28
00:01:13,570 --> 00:01:17,190
like this for is dynamic sentiment analysis.

29
00:01:17,190 --> 00:01:20,090
So, I could potentially
be receiving these tweets,

30
00:01:20,090 --> 00:01:22,830
figuring out if they're
positive, negative, or neutral,

31
00:01:22,830 --> 00:01:25,050
and maybe graphing that information

32
00:01:25,050 --> 00:01:28,660
in a dynamic visualization that shows

33
00:01:28,660 --> 00:01:31,390
what's going on right up to the minute.

34
00:01:31,390 --> 00:01:33,880
So, for instance political researchers

35
00:01:33,880 --> 00:01:35,710
might use that during elections

36
00:01:35,710 --> 00:01:38,840
to see how people are feeling
about specific politicians

37
00:01:38,840 --> 00:01:41,900
and topics that might help them determine

38
00:01:41,900 --> 00:01:45,870
how those folks are
likely to vote as well.

39
00:01:45,870 --> 00:01:47,820
Companies might use something like this

40
00:01:47,820 --> 00:01:50,640
to see what people are
saying about their products

41
00:01:50,640 --> 00:01:52,550
and other people's products and use that

42
00:01:52,550 --> 00:01:54,730
to their competitive advantage.

43
00:01:54,730 --> 00:01:58,620
And as you saw, the
script that I'm executing

44
00:01:58,620 --> 00:02:01,760
is going to enable you
to check the sentiment

45
00:02:01,760 --> 00:02:03,940
on whatever specified topic you

46
00:02:03,940 --> 00:02:06,050
provide as a command line argument

47
00:02:06,050 --> 00:02:09,710
and the number of tweets
that you specify as well.

48
00:02:09,710 --> 00:02:11,950
So, let me jump back
out, and as you can see,

49
00:02:11,950 --> 00:02:15,740
the script terminated with
four positive, four neutral,

50
00:02:15,740 --> 00:02:19,010
and two negative tweets
in this particular case,

51
00:02:19,010 --> 00:02:21,130
and one thing to keep in mind as you start

52
00:02:21,130 --> 00:02:22,930
looking through these tweets here

53
00:02:22,930 --> 00:02:24,960
is that even though they're marked

54
00:02:24,960 --> 00:02:27,140
as positive, negative, or neutral,

55
00:02:27,140 --> 00:02:29,970
these are tweets and tweets don't come

56
00:02:29,970 --> 00:02:32,810
in full sentences a lot of the time,

57
00:02:32,810 --> 00:02:36,740
so it may be difficult
for a tool like TextBlob

58
00:02:36,740 --> 00:02:39,270
with its default sentiment analyzer

59
00:02:39,270 --> 00:02:43,600
to truly get a sense of the
sentiment in certain tweets

60
00:02:43,600 --> 00:02:46,570
based on what their content actually is.

61
00:02:46,570 --> 00:02:49,670
So, you may want to try
this out, for example,

62
00:02:49,670 --> 00:02:53,630
with the Naive Bayes sentiment analyzer

63
00:02:53,630 --> 00:02:56,930
that we demonstrated
back in the NLP chapter.

64
00:02:56,930 --> 00:02:59,380
In our case in this example we're working

65
00:02:59,380 --> 00:03:02,907
with the default sentiment
analyzer in TextBlob,

66
00:03:02,907 --> 00:03:05,060
and you may find that there are other

67
00:03:05,060 --> 00:03:09,150
sentiment analysis tools out
there that are better geared

68
00:03:09,150 --> 00:03:12,033
towards detecting sentiment in tweets,

69
00:03:12,890 --> 00:03:14,880
and similarly, you know, when you're

70
00:03:14,880 --> 00:03:17,330
dealing with other types of text

71
00:03:17,330 --> 00:03:19,100
there may be better tools out there

72
00:03:19,100 --> 00:03:21,340
for certain kinds of text that

73
00:03:21,340 --> 00:03:24,300
you want to analyze and manipulate.

74
00:03:24,300 --> 00:03:26,900
So, now that we saw the script in action

75
00:03:26,900 --> 00:03:31,820
let me jump into a text editor
here and show you the script.

76
00:03:31,820 --> 00:03:34,550
Now, just briefly
overviewing the script first

77
00:03:34,550 --> 00:03:36,360
before we look at it in detail.

78
00:03:36,360 --> 00:03:38,410
Of course we have the imports for anything

79
00:03:38,410 --> 00:03:41,540
that we're going to use
in this script file.

80
00:03:41,540 --> 00:03:43,940
We have our class definition that inherits

81
00:03:43,940 --> 00:03:46,410
from StreamListener so that we can be

82
00:03:46,410 --> 00:03:50,080
notified as tweets arrive in our app.

83
00:03:50,080 --> 00:03:53,033
We have the methods of
class SentimentListener,

84
00:03:54,050 --> 00:03:56,950
the initialization method,
the on_status method

85
00:03:56,950 --> 00:04:00,190
to receive each tweet,
and as I scroll down here

86
00:04:00,190 --> 00:04:02,210
you see after the class we have

87
00:04:02,210 --> 00:04:04,420
a main function that we defined,

88
00:04:04,420 --> 00:04:07,290
which is where we specify, in this case,

89
00:04:07,290 --> 00:04:10,430
the logic of the application itself,

90
00:04:10,430 --> 00:04:13,780
and at the very bottom of the
file we have an if statement

91
00:04:13,780 --> 00:04:16,630
that I introduced to you
in an earlier lesson,

92
00:04:16,630 --> 00:04:19,060
and just as a matter of review,

93
00:04:19,060 --> 00:04:24,060
recall that when you execute
any .py file as a script

94
00:04:25,440 --> 00:04:29,523
there is a global
variable called __name__,

95
00:04:30,820 --> 00:04:34,973
and its value is going to be
set to the string __main__.

96
00:04:37,160 --> 00:04:40,020
So, this if statement's body will execute

97
00:04:40,020 --> 00:04:45,020
only if I run
sentimentlistener.py as a script,

98
00:04:45,370 --> 00:04:46,940
which is what I just demonstrated

99
00:04:46,940 --> 00:04:49,110
to you at the command line.

100
00:04:49,110 --> 00:04:52,130
In that case, we will
call the main function.
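As a quick review of that idiom, here is a minimal sketch. The body of main below is a placeholder invented for illustration, not the logic of the real script.

```python
def main():
    """Placeholder for the script's main function (hypothetical body)."""
    return 'main ran'


# Python sets __name__ to '__main__' only when this file is executed
# as a script; when the file is imported, __name__ is the module name.
if __name__ == '__main__':
    print(main())
```

If the file were imported from another module instead, the if statement's condition would be False and main would not be called automatically.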

101
00:04:52,130 --> 00:04:55,730
The main function will
set up our authentication,

102
00:04:55,730 --> 00:04:58,050
and then

103
00:04:58,050 --> 00:05:00,160
perform the logic of the application,

104
00:05:00,160 --> 00:05:02,750
and as part of that it will be using

105
00:05:02,750 --> 00:05:06,340
an object of our SentimentListener class.

106
00:05:06,340 --> 00:05:09,430
So, taking it from the
top now in more detail,

107
00:05:09,430 --> 00:05:11,750
we're importing our keys.py file

108
00:05:11,750 --> 00:05:13,320
because we need all the information

109
00:05:13,320 --> 00:05:16,310
for authenticating with the Twitter APIs.

110
00:05:16,310 --> 00:05:18,930
We're importing the preprocessor module,

111
00:05:18,930 --> 00:05:22,720
which we demonstrated a few
videos back for cleaning tweets,

112
00:05:22,720 --> 00:05:25,120
and we'll use that in this example.

113
00:05:25,120 --> 00:05:27,640
We're using the sys module to access

114
00:05:27,640 --> 00:05:30,020
the command line arguments to the script,

115
00:05:30,020 --> 00:05:33,700
which are going to be the
text that we want to track

116
00:05:33,700 --> 00:05:36,220
and the number of tweets to receive

117
00:05:36,220 --> 00:05:38,990
before terminating the livestream,

118
00:05:38,990 --> 00:05:41,140
and we're also importing TextBlob

119
00:05:41,140 --> 00:05:42,930
because we're going to take advantage

120
00:05:42,930 --> 00:05:46,870
of its built-in capabilities
for sentiment analysis.

121
00:05:46,870 --> 00:05:49,460
And then finally, of course,
we're importing tweepy

122
00:05:49,460 --> 00:05:54,460
because we need to use that
to access the Twitter APIs.

123
00:05:54,690 --> 00:05:57,170
So, just as we did in
the preceding example,

124
00:05:57,170 --> 00:05:58,920
we're creating a new subclass

125
00:05:58,920 --> 00:06:01,360
of StreamListener from the tweepy module.

126
00:06:01,360 --> 00:06:03,800
We're calling this one SentimentListener,

127
00:06:03,800 --> 00:06:07,490
and again, it's going to handle
an incoming tweet stream,

128
00:06:07,490 --> 00:06:11,360
this time checking the
tweets for sentiment.

129
00:06:11,360 --> 00:06:14,780
Now, you'll notice that we
have a number of arguments

130
00:06:14,780 --> 00:06:18,480
that we are receiving
into our init method.

131
00:06:18,480 --> 00:06:20,880
The first of these arguments is going to represent

132
00:06:20,880 --> 00:06:23,710
the tweepy API object.

133
00:06:23,710 --> 00:06:26,010
The second is going to be a dictionary

134
00:06:26,010 --> 00:06:28,550
that keeps track of positive sentiment,

135
00:06:28,550 --> 00:06:31,640
neutral sentiment, and negative sentiment.

136
00:06:31,640 --> 00:06:33,430
We're also going to receive a string

137
00:06:33,430 --> 00:06:36,060
that represents the topic
that we're tracking,

138
00:06:36,060 --> 00:06:38,920
and we're going to
receive a limit argument

139
00:06:38,920 --> 00:06:42,030
specifying the total number
of tweets to process.

140
00:06:42,030 --> 00:06:45,720
If we don't specify a value
for that, it would be 10.

141
00:06:45,720 --> 00:06:47,250
As you'll see, we're going to use

142
00:06:47,250 --> 00:06:49,200
a command line argument in this case,

143
00:06:49,200 --> 00:06:52,410
so whatever that argument
is is what we will supply

144
00:06:52,410 --> 00:06:55,540
as the last argument to the init method.

145
00:06:55,540 --> 00:06:59,210
Now we're going to initialize
a number of instance variables

146
00:06:59,210 --> 00:07:03,470
in our SentimentListener
object that we create.

147
00:07:03,470 --> 00:07:05,510
We'll store the sentiment dictionary

148
00:07:05,510 --> 00:07:07,720
so that while those tweets are coming in

149
00:07:07,720 --> 00:07:10,030
we can keep updating that dictionary.

150
00:07:10,030 --> 00:07:12,530
We will keep track of how
many tweets we've processed

151
00:07:12,530 --> 00:07:14,510
so we know when to terminate the stream,

152
00:07:14,510 --> 00:07:18,350
and we'll use the TWEET_LIMIT
to help us with that as well.

153
00:07:18,350 --> 00:07:22,540
And we will store the topic
that we are looking for also.

154
00:07:22,540 --> 00:07:24,350
The reason for that is we're going

155
00:07:24,350 --> 00:07:26,540
to look at every tweet that comes in,

156
00:07:26,540 --> 00:07:30,030
and if the actual topic
is not in the tweet's text

157
00:07:30,030 --> 00:07:33,120
we're going to ignore that
tweet in this example.

158
00:07:33,120 --> 00:07:34,820
So, we're only going to show you tweets

159
00:07:34,820 --> 00:07:36,930
that actually physically contain

160
00:07:36,930 --> 00:07:40,310
the topic in the actual tweet_text,

161
00:07:40,310 --> 00:07:42,430
not necessarily in all of the other

162
00:07:42,430 --> 00:07:44,810
metadata in this example.

163
00:07:44,810 --> 00:07:47,930
Now, we also chose to set the preprocessor

164
00:07:47,930 --> 00:07:51,150
module's options to say that we're going

165
00:07:51,150 --> 00:07:55,840
to drop out URLs and any
Twitter RESERVED words

166
00:07:55,840 --> 00:07:58,680
that appear in tweets as we clean them,

167
00:07:58,680 --> 00:08:02,110
and then finally, as we did
in the last example as well,

168
00:08:02,110 --> 00:08:05,360
we take the tweepy API
object and hand that off

169
00:08:05,360 --> 00:08:08,980
to the superclass
StreamListener's init method

170
00:08:08,980 --> 00:08:10,880
because the StreamListener class

171
00:08:10,880 --> 00:08:13,100
has all sorts of built-in capability

172
00:08:13,100 --> 00:08:17,063
that is dependent on
using that API object.

173
00:08:18,060 --> 00:08:20,500
Now, separately we have
our on_status method,

174
00:08:20,500 --> 00:08:22,550
and again, this is the method that's going

175
00:08:22,550 --> 00:08:25,680
to get called every time
we receive a status object

176
00:08:25,680 --> 00:08:28,580
from Twitter representing a given tweet,

177
00:08:28,580 --> 00:08:31,410
and as we did in the preceding example

178
00:08:31,410 --> 00:08:34,673
we are once again going to
try to get the full_text

179
00:08:34,673 --> 00:08:36,880
of an extended_tweet, which is one

180
00:08:36,880 --> 00:08:40,150
that has 141 to 280 characters.

181
00:08:40,150 --> 00:08:42,710
If that fails we'll get an exception,

182
00:08:42,710 --> 00:08:44,760
in which case we will just fall back

183
00:08:44,760 --> 00:08:48,710
to the default text attribute
of the status object,

184
00:08:48,710 --> 00:08:52,270
and that's going to
represent a tweet up to 140,

185
00:08:52,270 --> 00:08:55,120
or up through 140 characters.
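The fallback just described can be sketched roughly as follows. This is not the script's actual code: the function name get_tweet_text is hypothetical, and the status objects are simple stand-ins built with SimpleNamespace, so the exact shape of a real tweepy status object is an assumption here.

```python
from types import SimpleNamespace


def get_tweet_text(status):
    """Prefer the extended tweet's full text; fall back to status.text."""
    try:
        return status.extended_tweet.full_text
    except AttributeError:
        # No extended_tweet on this stand-in, so use the regular text
        return status.text


# Stand-in objects (assumptions for illustration, not real tweepy objects)
short_status = SimpleNamespace(text='a short tweet')
long_status = SimpleNamespace(
    text='a truncated tweet',
    extended_tweet=SimpleNamespace(full_text='the complete longer tweet'),
)

print(get_tweet_text(short_status))
print(get_tweet_text(long_status))
```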

186
00:08:55,120 --> 00:08:59,360
Now, if the particular tweet
that we receive is a retweet

187
00:08:59,360 --> 00:09:01,470
we are simply going to ignore it.

188
00:09:01,470 --> 00:09:05,040
We want 10 unique tweets for this example,

189
00:09:05,040 --> 00:09:07,550
or whatever number of tweets you specify.

190
00:09:07,550 --> 00:09:09,140
We want them to be unique tweets

191
00:09:09,140 --> 00:09:12,060
that have positive, negative,
or neutral sentiment.

192
00:09:12,060 --> 00:09:14,470
If it's a bunch of
retweets, and it could be

193
00:09:14,470 --> 00:09:17,070
many of them if you're
dealing with a viral topic,

194
00:09:17,070 --> 00:09:19,840
you may get all the
same value very quickly,

195
00:09:19,840 --> 00:09:24,460
so we're ignoring any
tweet that contains RT

196
00:09:24,460 --> 00:09:25,800
at the beginning of the tweet,

197
00:09:25,800 --> 00:09:28,080
which indicates that it's a retweet.

198
00:09:28,080 --> 00:09:30,870
Separately we're going to clean the text.

199
00:09:30,870 --> 00:09:32,930
We're going to get rid of any URLs

200
00:09:32,930 --> 00:09:36,550
and other Twitter
keywords, like "fav,"

201
00:09:36,550 --> 00:09:39,930
F-A-V, that might be inside of that tweet.

202
00:09:39,930 --> 00:09:43,630
And then we're going to
check to see if the topic

203
00:09:43,630 --> 00:09:48,480
that we're searching for is
contained in the tweet_text.

204
00:09:48,480 --> 00:09:51,350
So, if the topic in its lowercase form

205
00:09:51,350 --> 00:09:55,030
is not in the tweet_text
in its lowercase form,

206
00:09:55,030 --> 00:09:57,210
then we're going to
ignore that tweet as well.

207
00:09:57,210 --> 00:10:00,770
So, if I search for marvel,
I want the word marvel

208
00:10:00,770 --> 00:10:05,150
to appear in every single tweet
in this particular example.
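Those two filters, skipping retweets and skipping tweets that do not mention the topic, might look something like this sketch. The function name should_process is hypothetical, and the cleaning step that happens between the two checks is left out to keep the example self-contained.

```python
def should_process(tweet_text, topic):
    """Return True only for non-retweets whose text contains the topic."""
    # Retweets begin with RT, so ignore them
    if tweet_text.startswith('RT'):
        return False

    # Case-insensitive check that the topic appears in the tweet's text
    return topic.lower() in tweet_text.lower()


print(should_process('RT @fan: Marvel rocks', 'marvel'))
print(should_process('I love MARVEL movies', 'marvel'))
print(should_process('Great movie tonight', 'marvel'))
```

Only the second call passes both checks. The first is a retweet and the third never mentions the topic.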

209
00:10:05,150 --> 00:10:07,670
Now, once we have gotten to this point

210
00:10:07,670 --> 00:10:10,060
we have a tweet that we
are going to process,

211
00:10:10,060 --> 00:10:12,690
so we create a TextBlob out of it,

212
00:10:12,690 --> 00:10:15,220
and you may recall from
the preceding lesson

213
00:10:15,220 --> 00:10:18,190
that every TextBlob has
a sentiment attribute,

214
00:10:18,190 --> 00:10:21,910
and inside that sentiment
attribute is a polarity value,

215
00:10:21,910 --> 00:10:23,130
which is going to be greater

216
00:10:23,130 --> 00:10:25,470
than zero for positive sentiment,

217
00:10:25,470 --> 00:10:28,400
equal to zero for neutral sentiment,

218
00:10:28,400 --> 00:10:31,090
and less than zero for negative sentiment.

219
00:10:31,090 --> 00:10:32,920
So, we're going to check that polarity,

220
00:10:32,920 --> 00:10:35,390
and if it's a positive value we're going

221
00:10:35,390 --> 00:10:39,560
to set the sentiment
variable to a plus sign,

222
00:10:39,560 --> 00:10:43,430
which we'll use as part of
what we display with the tweet.

223
00:10:43,430 --> 00:10:47,060
We're going to go into the
dictionary that we stored

224
00:10:47,060 --> 00:10:49,610
and set its positive key's value

225
00:10:49,610 --> 00:10:52,920
to whatever the value
was previously plus one,

226
00:10:52,920 --> 00:10:54,830
so we're gonna modify the value

227
00:10:54,830 --> 00:10:57,130
associated with the positive key.

228
00:10:57,130 --> 00:11:00,420
If the sentiment is neutral
we'll do the same thing,

229
00:11:00,420 --> 00:11:03,910
but we'll use a space
character to indicate

230
00:11:03,910 --> 00:11:06,080
neutral sentiment, and we'll add one

231
00:11:06,080 --> 00:11:08,050
to the neutral counter, if you will,

232
00:11:08,050 --> 00:11:10,010
and then finally, if it's not positive

233
00:11:10,010 --> 00:11:12,310
or neutral it has to be negative,

234
00:11:12,310 --> 00:11:14,130
so for the else part we'll increment

235
00:11:14,130 --> 00:11:17,600
the negative sentiment key value pair

236
00:11:17,600 --> 00:11:21,270
and we'll use a minus sign to
indicate negative sentiment

237
00:11:21,270 --> 00:11:25,920
in the context of the
output of this application.
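Here is a minimal sketch of that three-way check. To keep it runnable without TextBlob installed, it takes the polarity as a plain number; in the script that number would come from the TextBlob's sentiment attribute. The function name classify is a hypothetical one chosen for illustration.

```python
def classify(polarity, sentiment_dict):
    """Return '+', ' ', or '-' and add one to the matching counter."""
    if polarity > 0:
        sentiment_dict['positive'] += 1
        return '+'
    if polarity == 0:
        sentiment_dict['neutral'] += 1
        return ' '
    # Not positive and not neutral, so it has to be negative
    sentiment_dict['negative'] += 1
    return '-'


counts = {'positive': 0, 'neutral': 0, 'negative': 0}
signs = [classify(p, counts) for p in (0.5, 0.0, -0.25, 0.8)]
print(signs)
print(counts)
```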

238
00:11:25,920 --> 00:11:29,480
Now, at that point we're
going to display the tweet.

239
00:11:29,480 --> 00:11:32,880
We're going to precede it
by its sentiment string,

240
00:11:32,880 --> 00:11:34,740
then we're going to show the screen_name

241
00:11:34,740 --> 00:11:36,840
of the user followed by a colon,

242
00:11:36,840 --> 00:11:41,209
and we're going to show their
actual cleaned up tweet_text

243
00:11:41,209 --> 00:11:44,270
as the text that gets dumped out

244
00:11:44,270 --> 00:11:46,820
at the command line in this example.

245
00:11:46,820 --> 00:11:48,970
We will also add one to our tweet_count,

246
00:11:48,970 --> 00:11:51,730
because eventually we're
going to want to reach

247
00:11:51,730 --> 00:11:54,130
that TWEET_LIMIT and we
want to make sure that

248
00:11:54,130 --> 00:11:57,950
the stream terminates
properly in our example.
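One way the counting could end the stream is sketched below. The class is a stand-in that does not inherit from tweepy's StreamListener, and the idea that returning False from on_status tells the stream to stop is an assumption about the tweepy version used with this script.

```python
class CountingListener:
    """Stand-in listener (hypothetical); it does not use tweepy."""

    def __init__(self, limit=10):
        self.tweet_count = 0  # tweets processed so far
        self.TWEET_LIMIT = limit  # 10 if no limit is specified

    def on_status(self, status):
        self.tweet_count += 1
        # Assumption: returning False asks the stream to terminate
        return self.tweet_count < self.TWEET_LIMIT


listener = CountingListener(limit=3)
results = [listener.on_status(f'tweet {n}') for n in range(3)]
print(results)
```

The third call returns False because the count has reached the limit.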

249
00:11:57,950 --> 00:12:01,630
In the main function we're going
to set up our authentication,

250
00:12:01,630 --> 00:12:03,190
so this is the same stuff that

251
00:12:03,190 --> 00:12:04,900
we've done a couple of times now.

252
00:12:04,900 --> 00:12:08,510
First we must create
the OAuthHandler object,

253
00:12:08,510 --> 00:12:11,350
which requires both the consumer API key

254
00:12:11,350 --> 00:12:14,310
and the consumer API secret key.

255
00:12:14,310 --> 00:12:16,520
We then have to tell the OAuth object

256
00:12:16,520 --> 00:12:18,170
to set its access token,

257
00:12:18,170 --> 00:12:21,030
which is the additional
two pieces of information

258
00:12:21,030 --> 00:12:25,240
that we have stored in our keys.py file.

259
00:12:25,240 --> 00:12:27,470
Once we've configured that OAuth object

260
00:12:27,470 --> 00:12:30,080
we can get our tweepy API object,

261
00:12:30,080 --> 00:12:35,080
and again, that requires the
OAuthHandler as an argument,

262
00:12:35,250 --> 00:12:38,390
and once again, we are
going to specify that

263
00:12:38,390 --> 00:12:42,060
if we reach those rate
limits that Twitter imposes

264
00:12:42,060 --> 00:12:46,630
we want the tweepy API
object to automatically issue

265
00:12:46,630 --> 00:12:51,630
a wait so that we don't
go past the rate limits.

266
00:12:51,900 --> 00:12:54,570
Next up we're going to get
our command line arguments.

267
00:12:54,570 --> 00:12:56,520
So, the search_key is going to be

268
00:12:56,520 --> 00:12:59,460
the argument at index number one.

269
00:12:59,460 --> 00:13:01,020
Remember that when you're working

270
00:13:01,020 --> 00:13:02,770
with command line arguments

271
00:13:02,770 --> 00:13:07,120
the sys module provides this argv list.

272
00:13:07,120 --> 00:13:10,520
Element number zero is
the name of your script.

273
00:13:10,520 --> 00:13:14,020
Element number one is the first
argument after the script.

274
00:13:14,020 --> 00:13:17,400
Element number two is the second
argument after the script.

275
00:13:17,400 --> 00:13:20,930
By the way, you can use
a multiword search_key

276
00:13:20,930 --> 00:13:23,870
if when you type in your
command line arguments

277
00:13:23,870 --> 00:13:27,600
you enclose the search_key
in double quote characters.

278
00:13:27,600 --> 00:13:28,840
So, whatever the search_key is

279
00:13:28,840 --> 00:13:30,660
will be stored here as a string,

280
00:13:30,660 --> 00:13:33,190
and the last command line argument,

281
00:13:33,190 --> 00:13:35,130
the one in index number two,

282
00:13:35,130 --> 00:13:39,160
is going to represent the total
number of tweets to process,

283
00:13:39,160 --> 00:13:43,270
not including all of the ones
that we ignore along the way.

284
00:13:43,270 --> 00:13:46,570
We only add one to the counter for tweets

285
00:13:46,570 --> 00:13:50,300
if we get what we are
considering to be a valid tweet

286
00:13:50,300 --> 00:13:52,580
for the purpose of this example.

287
00:13:52,580 --> 00:13:55,270
Now, we are assuming that you properly

288
00:13:55,270 --> 00:13:58,190
supply the two command line arguments,

289
00:13:58,190 --> 00:14:02,090
so if you do not, this program will fail

290
00:14:02,090 --> 00:14:05,070
with an exception and
terminate, just so you know.

291
00:14:05,070 --> 00:14:06,920
Of course, we could check to make sure

292
00:14:06,920 --> 00:14:09,720
that there really are two
command line arguments first,

293
00:14:09,720 --> 00:14:12,430
and then only process them if in fact

294
00:14:12,430 --> 00:14:15,200
there are the appropriate arguments.
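A check like that could be sketched as follows. The helper name parse_args and the usage message are hypothetical; the script as shown reads the arguments directly without checking them.

```python
def parse_args(argv):
    """Return (search_key, limit) from an argv-style list."""
    # argv[0] is the script name, so two arguments means three elements
    if len(argv) != 3:
        raise SystemExit(
            'usage: sentimentlistener.py <topic> <number_of_tweets>')
    return argv[1], int(argv[2])


# In the real script the list would be sys.argv; a literal list is used
# here so the sketch runs the same way everywhere.
search_key, limit = parse_args(['sentimentlistener.py', 'marvel', '10'])
print(search_key, limit)
```

With fewer or more than two arguments, the function exits with the usage message rather than failing with an IndexError.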

295
00:14:15,200 --> 00:14:17,720
Next up we set up the
sentiment dictionary,

296
00:14:17,720 --> 00:14:21,410
and as you can see, we've
preinitialized it with three keys,

297
00:14:21,410 --> 00:14:25,263
positive with the value zero,
neutral with the value zero,

298
00:14:25,263 --> 00:14:27,191
and negative with the value zero,

299
00:14:27,191 --> 00:14:30,640
and this is the dictionary that
the SentimentListener object

300
00:14:30,640 --> 00:14:35,220
is going to continuously
update as each tweet arrives.

301
00:14:35,220 --> 00:14:38,140
Next up we create our SentimentListener,

302
00:14:38,140 --> 00:14:40,700
and we're giving it the four arguments

303
00:14:40,700 --> 00:14:43,420
that we talked about up
above in the init method.

304
00:14:43,420 --> 00:14:45,910
The first is the tweepy API object

305
00:14:45,910 --> 00:14:48,700
that will be used to
interact with Twitter.

306
00:14:48,700 --> 00:14:52,100
The second is this sentiment
dictionary that we just created

307
00:14:52,100 --> 00:14:55,940
that the SentimentListener will
store and repeatedly update.

308
00:14:55,940 --> 00:14:59,210
The third is the topic that
we're actually searching for,

309
00:14:59,210 --> 00:15:01,830
the search_key in the main function,

310
00:15:01,830 --> 00:15:04,950
and finally, limit, which
we just created back here

311
00:15:04,950 --> 00:15:09,103
at line 73, will represent the
number of tweets to process.

312
00:15:10,270 --> 00:15:12,630
Next up we're going to
set up our tweepy.Stream,

313
00:15:12,630 --> 00:15:16,540
and again, we have to tell
it the OAuthHandler object

314
00:15:16,540 --> 00:15:19,870
that it's going to use to
authenticate with Twitter,

315
00:15:19,870 --> 00:15:23,650
and then we also need to
specify the listener object

316
00:15:23,650 --> 00:15:26,940
that will be notified
as each tweet arrives,

317
00:15:26,940 --> 00:15:30,020
and that's the listener
that we just created.

318
00:15:30,020 --> 00:15:32,440
And finally, to get the stream going

319
00:15:32,440 --> 00:15:34,650
we are going to invoke the filter method

320
00:15:34,650 --> 00:15:36,890
once again on the stream object.

321
00:15:36,890 --> 00:15:39,340
So, as you can see here, we are tracking

322
00:15:39,340 --> 00:15:42,290
a list of terms that contains one term,

323
00:15:42,290 --> 00:15:45,527
the search_key that we specified
as a command line argument,

324
00:15:45,527 --> 00:15:48,080
and again, this can be
a comma-separated list

325
00:15:48,080 --> 00:15:49,630
of a whole bunch of terms,

326
00:15:49,630 --> 00:15:52,490
so for instance if I was
a political researcher

327
00:15:52,490 --> 00:15:55,230
and I was tracking two
presidential candidates

328
00:15:55,230 --> 00:15:59,650
I might want to have both of
their names in the list here

329
00:15:59,650 --> 00:16:01,600
so that I get all the tweets containing

330
00:16:01,600 --> 00:16:04,870
each name along the
way, or not all of them.

331
00:16:04,870 --> 00:16:07,870
Remember, it's only one
percent of the livestream,

332
00:16:07,870 --> 00:16:12,470
so a randomly selected
one percent of the tweets

333
00:16:12,470 --> 00:16:14,810
that have those names is
what I should've said.

334
00:16:14,810 --> 00:16:17,160
Now, in this case we happen to be looking

335
00:16:17,160 --> 00:16:19,820
only for tweets in the English language,

336
00:16:19,820 --> 00:16:21,750
but we did demonstrate to you

337
00:16:21,750 --> 00:16:23,980
in the previous streaming example

338
00:16:23,980 --> 00:16:26,670
that we can receive tweets
in multiple languages,

339
00:16:26,670 --> 00:16:29,040
and that we can use things like TextBlob

340
00:16:29,040 --> 00:16:32,360
to automatically translate
those tweets as well.

341
00:16:32,360 --> 00:16:36,630
So, the languages argument,
again, is going to be a list,

342
00:16:36,630 --> 00:16:38,360
so you can actually track tweets

343
00:16:38,360 --> 00:16:40,430
in several different languages

344
00:16:40,430 --> 00:16:43,980
by putting a comma-separated
list of values here.

345
00:16:43,980 --> 00:16:48,000
And finally, in this case
we set is_async to False

346
00:16:48,000 --> 00:16:51,300
because we only want to
display the final results

347
00:16:51,300 --> 00:16:55,190
after all the tweets that we
specified have been read in,

348
00:16:55,190 --> 00:16:57,860
so in this example we did 10 tweets

349
00:16:57,860 --> 00:17:00,180
and I only want to show you the contents

350
00:17:00,180 --> 00:17:04,130
of the sentiment dictionary
after we have all 10 of them.

351
00:17:04,130 --> 00:17:08,280
Once this call returns,
which could take

352
00:17:08,280 --> 00:17:10,900
a short amount of time
or a long amount of time,

353
00:17:10,900 --> 00:17:14,453
or somewhere in between depending
on what you are tracking,

354
00:17:15,400 --> 00:17:18,010
at that point we will
display our final summary

355
00:17:18,010 --> 00:17:20,650
of the results where we
show you the tweet sentiment

356
00:17:20,650 --> 00:17:22,570
for whatever the search_key was,

357
00:17:22,570 --> 00:17:25,040
and for the positive we'll access

358
00:17:25,040 --> 00:17:26,700
the positive key in the dictionary,

359
00:17:26,700 --> 00:17:29,080
and similarly the
neutral and negative keys

360
00:17:29,080 --> 00:17:32,893
in the dictionary to
display the final results.
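That final summary might be produced along these lines. The function name summary_lines and the exact wording of the output are assumptions for illustration; the counts are the ones from the run demonstrated earlier.

```python
def summary_lines(search_key, sentiment_dict):
    """Build the lines of the final summary (hypothetical format)."""
    return [
        f'Tweet sentiment for "{search_key}"',
        f'Positive: {sentiment_dict["positive"]}',
        f' Neutral: {sentiment_dict["neutral"]}',
        f'Negative: {sentiment_dict["negative"]}',
    ]


# Four positive, four neutral, and two negative tweets, as in the demo
lines = summary_lines('marvel', {'positive': 4, 'neutral': 4, 'negative': 2})
print('\n'.join(lines))
```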