1 00:00:06,900 --> 00:00:09,920 - Welcome to Lesson 12: Data Mining Twitter. 2 00:00:09,920 --> 00:00:13,300 This lesson is based on chapter 12 of our professional book 3 00:00:13,300 --> 00:00:14,750 Python for Programmers 4 00:00:14,750 --> 00:00:17,010 and chapter 13 of our textbook 5 00:00:17,010 --> 00:00:20,700 Intro to Python for Computer Science and Data Science. 6 00:00:20,700 --> 00:00:23,880 In this lesson, we're going to focus on data mining 7 00:00:23,880 --> 00:00:26,590 one of the biggest big data sources there is, 8 00:00:26,590 --> 00:00:27,940 which is Twitter. 9 00:00:27,940 --> 00:00:31,520 Every day, hundreds of millions of tweets are sent 10 00:00:31,520 --> 00:00:33,710 and you actually can tap into that 11 00:00:33,710 --> 00:00:37,370 and acquire as many as 1% of those tweets, 12 00:00:37,370 --> 00:00:41,470 something on the order of 7.5 to 8 million tweets a day 13 00:00:41,470 --> 00:00:44,120 are available to you if you would like. 14 00:00:44,120 --> 00:00:46,950 So we're going to be working with a library called Tweepy, 15 00:00:46,950 --> 00:00:51,290 which makes it super easy to authenticate with Twitter. 16 00:00:51,290 --> 00:00:54,580 You will need a Twitter developer account for this purpose 17 00:00:54,580 --> 00:00:57,620 and you'll have to go ahead and set up what they call an app, 18 00:00:57,620 --> 00:00:59,810 which will give you the credentials you need 19 00:00:59,810 --> 00:01:01,320 for interacting with Twitter, 20 00:01:01,320 --> 00:01:04,860 and you'll simply feed those into the Tweepy library. 21 00:01:04,860 --> 00:01:06,980 It will do the authentication for you, 22 00:01:06,980 --> 00:01:09,420 making it really easy to connect. 23 00:01:09,420 --> 00:01:10,980 And then from that point, 24 00:01:10,980 --> 00:01:14,780 you can start interacting with the other Twitter APIs.
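As a rough sketch of feeding app credentials into Tweepy, something like the following could work. The environment variable names and both helper functions are hypothetical, chosen only for illustration, and the Tweepy calls assume the OAuth 1 handler interface (`tweepy.OAuthHandler`, `set_access_token`, `tweepy.API`); the lesson's own code may differ.

```python
import os

# Hypothetical environment variable names; use whatever names suit your setup.
CREDENTIAL_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four app credentials as a dict, or raise if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_VARS}


def connect(credentials):
    """Authenticate with Tweepy and return an API object (assumes Tweepy is installed)."""
    import tweepy  # imported lazily so load_credentials works without Tweepy

    auth = tweepy.OAuthHandler(
        credentials["TWITTER_CONSUMER_KEY"],
        credentials["TWITTER_CONSUMER_SECRET"],
    )
    auth.set_access_token(
        credentials["TWITTER_ACCESS_TOKEN"],
        credentials["TWITTER_ACCESS_TOKEN_SECRET"],
    )
    # wait_on_rate_limit asks Tweepy to pause rather than fail at a rate limit
    return tweepy.API(auth, wait_on_rate_limit=True)
```

With the four variables set, `api = connect(load_credentials())` would give you the object used for later calls. Keeping the keys out of the source file is a design choice here, not something the lesson requires.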
25 00:01:14,780 --> 00:01:17,350 So here are some of the things that we're going to do in this lesson. 26 00:01:17,350 --> 00:01:20,840 We're going to search tweets from the past seven days, 27 00:01:20,840 --> 00:01:23,270 and you only have access to the past seven days 28 00:01:23,270 --> 00:01:25,000 using the free APIs, 29 00:01:25,000 --> 00:01:28,330 but there are services out there that you can pay 30 00:01:28,330 --> 00:01:32,210 in order to gain access to the full tweet database 31 00:01:32,210 --> 00:01:33,430 if you would like. 32 00:01:33,430 --> 00:01:34,750 We're also going to show you 33 00:01:34,750 --> 00:01:38,000 how to sample the live stream of tweets, 34 00:01:38,000 --> 00:01:39,620 and Twitter does that randomly, 35 00:01:39,620 --> 00:01:41,910 so you're never going to be guaranteed 36 00:01:41,910 --> 00:01:44,750 to get the exact same tweets that somebody else gets 37 00:01:44,750 --> 00:01:48,170 if they too are tapping into the live stream. 38 00:01:48,170 --> 00:01:50,520 You will just get a random sample of the tweets 39 00:01:50,520 --> 00:01:54,193 that match the criteria you happen to be searching for. 40 00:01:55,740 --> 00:01:56,660 Also in this lesson, 41 00:01:56,660 --> 00:01:58,640 we're going to take a look at the so-called 42 00:01:58,640 --> 00:02:01,100 tweet object metadata. 43 00:02:01,100 --> 00:02:02,750 Every tweet that happens 44 00:02:02,750 --> 00:02:05,570 also has a ton of additional information 45 00:02:05,570 --> 00:02:08,470 that is packaged together with that tweet 46 00:02:08,470 --> 00:02:11,670 into a JavaScript Object Notation (JSON) object, 47 00:02:11,670 --> 00:02:15,530 which you get access to when you receive that tweet 48 00:02:15,530 --> 00:02:17,380 from the Twitter APIs.
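To make the idea of a tweet's JSON object concrete, here is a minimal sketch that parses a hand-written stand-in for one and pulls out a few fields. The sample values are invented, the `summarize` helper is hypothetical, and the field names are assumed to follow Twitter's v1.1 tweet object; a real tweet carries far more fields than this.

```python
import json

# Hand-written stand-in for a tweet's JSON; all values are invented.
sample_json = """
{
  "created_at": "Wed Oct 10 20:19:24 +0000 2018",
  "text": "Learning to mine tweets with Python",
  "lang": "en",
  "retweet_count": 3,
  "user": {"screen_name": "example_user", "location": "Boston, MA"},
  "coordinates": null
}
"""


def summarize(tweet):
    """Pull a few metadata fields out of a tweet dictionary."""
    user = tweet.get("user", {})
    return {
        "sender": user.get("screen_name"),
        "sent_at": tweet.get("created_at"),
        "retweets": tweet.get("retweet_count", 0),
        "language": tweet.get("lang"),
        # coordinates is null unless the sender shared an exact position
        "has_coordinates": tweet.get("coordinates") is not None,
    }


tweet = json.loads(sample_json)  # JSON text becomes nested dicts and lists
summary = summarize(tweet)
print(summary)
```

Using `.get` rather than indexing keeps the sketch from failing on tweets that omit a field, which is a defensive choice and not something the lesson prescribes.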
49 00:02:17,380 --> 00:02:20,220 So we're going to take a look at some of that metadata, 50 00:02:20,220 --> 00:02:22,440 which includes things like who sent the tweet, 51 00:02:22,440 --> 00:02:24,060 whether it was retweeted, 52 00:02:24,060 --> 00:02:26,120 when it happened, where it happened, 53 00:02:26,120 --> 00:02:28,530 and a lot more as well. 54 00:02:28,530 --> 00:02:31,010 We'll also use some natural language processing 55 00:02:31,010 --> 00:02:32,110 in this lesson. 56 00:02:32,110 --> 00:02:35,020 We'll do that for both pre-processing tweets 57 00:02:35,020 --> 00:02:37,740 to get them ready for analysis 58 00:02:37,740 --> 00:02:40,210 and we'll also do some sentiment analysis 59 00:02:40,210 --> 00:02:42,880 using the same TextBlob capabilities 60 00:02:42,880 --> 00:02:46,020 that we introduced back in Lesson 11. 61 00:02:46,020 --> 00:02:47,520 In addition, we're going to show you 62 00:02:47,520 --> 00:02:50,450 how to look at the Twitter trends. 63 00:02:50,450 --> 00:02:53,740 Twitter maintains trending topics worldwide 64 00:02:53,740 --> 00:02:56,970 for about 450 different locations, 65 00:02:56,970 --> 00:02:59,680 and one of those is for the world at large 66 00:02:59,680 --> 00:03:02,530 and then the others are for individual cities 67 00:03:02,530 --> 00:03:04,530 and landmarks as well. 68 00:03:04,530 --> 00:03:07,310 And we're going to finish off this lesson 69 00:03:07,310 --> 00:03:11,620 by showing you how to find the location information 70 00:03:11,620 --> 00:03:14,310 in the context of the tweet metadata 71 00:03:14,310 --> 00:03:16,980 and then we're going to actually plot tweets 72 00:03:16,980 --> 00:03:18,400 on an interactive map 73 00:03:18,400 --> 00:03:19,950 where you can click a map marker 74 00:03:19,950 --> 00:03:22,613 to see the tweet that came from that location.
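The pre-processing and sentiment analysis steps mentioned above might look roughly like this. The regular expressions are a simple stand-in and not necessarily the cleaning approach the lesson itself uses, and `clean_tweet` and `polarity` are hypothetical helper names. The sentiment part assumes TextBlob is installed and relies on its `sentiment.polarity` property, which ranges from -1.0 (negative) to 1.0 (positive).

```python
import re

RETWEET_PREFIX = re.compile(r"^\s*RT\s+@\w+:\s*")  # e.g. "RT @user: "
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweet(text):
    """Drop a retweet prefix, URLs, and @mentions, then collapse whitespace."""
    text = RETWEET_PREFIX.sub("", text)
    text = URL_PATTERN.sub("", text)
    text = MENTION_PATTERN.sub("", text)
    return " ".join(text.split())


def polarity(text):
    """Return TextBlob's polarity score for the cleaned tweet text."""
    from textblob import TextBlob  # imported lazily; requires TextBlob

    return TextBlob(clean_tweet(text)).sentiment.polarity


print(clean_tweet("RT @nasa: Great launch today! https://t.co/abc123"))
```

Cleaning before scoring keeps URLs and user names from being treated as words; whether that actually improves the scores depends on the tweets being analyzed.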