1 00:00:00,550 --> 00:00:01,620 - [Narrator] Next up, we're going to take 2 00:00:01,620 --> 00:00:04,593 a look at a case study using a MongoDB 3 00:00:04,593 --> 00:00:07,200 JSON document database. 4 00:00:07,200 --> 00:00:09,570 And in particular what we're going to do 5 00:00:09,570 --> 00:00:13,420 is set up a Twitter streaming listener, 6 00:00:13,420 --> 00:00:15,110 where we're going to be looking for 7 00:00:15,110 --> 00:00:18,700 Tweets to, from and about the 8 00:00:18,700 --> 00:00:20,950 100 current U.S. Senators, 9 00:00:20,950 --> 00:00:24,010 every one of whom has a Twitter handle. 10 00:00:24,010 --> 00:00:26,490 And we'll be storing the actual 11 00:00:26,490 --> 00:00:29,390 JavaScript Object Notation objects that 12 00:00:29,390 --> 00:00:31,750 represent those Tweets directly 13 00:00:31,750 --> 00:00:34,010 in a MongoDB database. 14 00:00:34,010 --> 00:00:37,440 Now, we're going to capture 10,000 such Tweets. 15 00:00:37,440 --> 00:00:39,592 Then we're going to use the pandas library 16 00:00:39,592 --> 00:00:43,000 to help us summarize that information. 17 00:00:43,000 --> 00:00:44,850 And we'll do things like display the 18 00:00:44,850 --> 00:00:48,040 top 10 Senators by their Tweet activity. 19 00:00:48,040 --> 00:00:49,900 And again that will be Tweets to them, 20 00:00:49,900 --> 00:00:52,730 from them and about them as well. 21 00:00:52,730 --> 00:00:54,870 And once we have all that information, 22 00:00:54,870 --> 00:00:56,950 we're also going to summarize it on a 23 00:00:56,950 --> 00:00:59,150 state-by-state basis on an 24 00:00:59,150 --> 00:01:01,237 interactive U.S. Folium map, showing 25 00:01:01,237 --> 00:01:05,182 the names of the states, the names of the Senators 26 00:01:05,182 --> 00:01:08,760 and the Tweet activity associated with 27 00:01:08,760 --> 00:01:13,760 each Senator in decreasing order by number of Tweets. 
28 00:01:14,070 --> 00:01:16,730 Now, to do all of this we're going to be working 29 00:01:16,730 --> 00:01:21,470 with a free cloud-based MongoDB Atlas cluster. 30 00:01:21,470 --> 00:01:24,710 This is their online MongoDB service that 31 00:01:24,710 --> 00:01:27,400 enables you to work with MongoDB 32 00:01:27,400 --> 00:01:29,440 without having to install anything 33 00:01:29,440 --> 00:01:31,784 locally on your computer other than 34 00:01:31,784 --> 00:01:34,730 the library that we need in order 35 00:01:34,730 --> 00:01:37,360 to interact with MongoDB. 36 00:01:37,360 --> 00:01:40,800 As we'll discuss, it only allows you to 37 00:01:40,800 --> 00:01:43,700 store up to 512 megabytes of data, 38 00:01:43,700 --> 00:01:45,940 which nowadays is of course tiny. 39 00:01:45,940 --> 00:01:47,970 And this is to give you a taste of 40 00:01:47,970 --> 00:01:49,798 what MongoDB is all about. 41 00:01:49,798 --> 00:01:52,210 And of course they would then expect 42 00:01:52,210 --> 00:01:56,110 you to sign up to use their paid service. 43 00:01:56,110 --> 00:01:58,330 Or, if you need to store more data 44 00:01:58,330 --> 00:01:59,910 and don't want to pay for the service, 45 00:01:59,910 --> 00:02:02,140 you can also download their free 46 00:02:02,140 --> 00:02:05,080 MongoDB Community Server and you can run 47 00:02:05,080 --> 00:02:06,583 that locally on your own systems. 48 00:02:06,583 --> 00:02:09,600 And again of course they do have their 49 00:02:09,600 --> 00:02:13,110 paid online Atlas service. 50 00:02:13,110 --> 00:02:14,850 Now, for the purpose of this example 51 00:02:14,850 --> 00:02:17,440 you will need two additional libraries. 52 00:02:17,440 --> 00:02:19,580 So, you'll need to install pymongo, 53 00:02:19,580 --> 00:02:22,020 which is the Python module for 54 00:02:22,020 --> 00:02:24,310 interacting with MongoDB databases. 
55 00:02:24,310 --> 00:02:26,740 And the pymongo module requires 56 00:02:26,740 --> 00:02:30,080 dnspython when we are connecting 57 00:02:30,080 --> 00:02:32,159 to a MongoDB Atlas cluster. 58 00:02:32,159 --> 00:02:34,900 So, you will need to install both of those 59 00:02:34,900 --> 00:02:38,670 libraries before continuing with this example. 60 00:02:38,670 --> 00:02:41,810 Now, in the ch16 examples folder there's 61 00:02:41,810 --> 00:02:44,170 a subfolder called TwitterMongoDB. 62 00:02:44,170 --> 00:02:47,920 And within there you'll find the keys.py file, 63 00:02:47,920 --> 00:02:50,324 similar to the one that we created back 64 00:02:50,324 --> 00:02:52,790 when we were discussing 65 00:02:52,790 --> 00:02:54,270 Twitter data mining. 66 00:02:54,270 --> 00:02:57,060 And you will need your Twitter credentials 67 00:02:57,060 --> 00:03:00,486 and your OpenMapQuest key from back in the 68 00:03:00,486 --> 00:03:02,910 Twitter Data Mining lesson. 69 00:03:02,910 --> 00:03:04,550 So, you'll want to copy that stuff 70 00:03:04,550 --> 00:03:06,235 into this keys.py file. 71 00:03:06,235 --> 00:03:09,600 And, separately, we will be adding to this 72 00:03:10,451 --> 00:03:14,500 keys.py file a MongoDB connection string, 73 00:03:14,500 --> 00:03:16,780 which is a key piece of information 74 00:03:16,780 --> 00:03:18,320 that we'll need to be able to 75 00:03:18,320 --> 00:03:21,313 communicate with the cluster properly.
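The setup described above might look roughly like the following sketch. It is only an illustration under stated assumptions: the variable name `mongo_connection_string`, the placeholder URI, and the helper functions `needs_dnspython` and `connect` are hypothetical and are not taken from the lesson's keys.py file. The real connection string comes from your own Atlas cluster. The sketch assumes both libraries were installed first, for example with `pip install pymongo dnspython`.

```python
# Hypothetical stand-in for the connection string stored in keys.py.
# The angle-bracket parts are placeholders, not real credentials or hosts.
mongo_connection_string = (
    'mongodb+srv://<user>:<password>@<cluster-host>/'
    '?retryWrites=true&w=majority'
)


def needs_dnspython(connection_string):
    """Return True if the URI uses the mongodb+srv scheme.

    pymongo resolves mongodb+srv URIs through DNS SRV records,
    which is why it needs the dnspython package for them.
    """
    return connection_string.startswith('mongodb+srv://')


def connect(connection_string):
    """Create a pymongo client for the given connection string.

    The import is placed here so the rest of this sketch can be
    read and run even before pymongo is installed.
    """
    from pymongo import MongoClient  # requires: pip install pymongo
    return MongoClient(connection_string)
```

Once a real connection string is in place, `connect(mongo_connection_string)` would return the client object used for all later database work. `needs_dnspython` is just a quick way to see why the second library is required for an Atlas-style URI.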