1
00:00:00,710 --> 00:00:01,620
- [Instructor] So to finish up

2
00:00:01,620 --> 00:00:04,260
our trending topics discussion,

3
00:00:04,260 --> 00:00:09,060
let's take the top trending topics from New York City

4
00:00:09,060 --> 00:00:13,260
and turn them into a word cloud visualization.

5
00:00:13,260 --> 00:00:15,480
So for that purpose, we're going to define

6
00:00:15,480 --> 00:00:19,630
a dictionary called topics, and we're going to be using

7
00:00:19,630 --> 00:00:22,730
this dictionary to store key-value pairs

8
00:00:22,730 --> 00:00:25,980
where the keys are the trending topic texts

9
00:00:25,980 --> 00:00:29,130
and the values are the number of tweets

10
00:00:29,130 --> 00:00:31,810
associated with that trending topic.

11
00:00:31,810 --> 00:00:33,810
And remember, in a word cloud

12
00:00:33,810 --> 00:00:37,190
the visualization is going to make bigger words

13
00:00:37,190 --> 00:00:40,210
for the items that are more frequent.

14
00:00:40,210 --> 00:00:44,510
So to create the dictionary, let's use a little loop here.

15
00:00:44,510 --> 00:00:45,640
We're going to walk through

16
00:00:45,640 --> 00:00:48,050
the New York City trends list that we created

17
00:00:48,050 --> 00:00:51,130
back up above here in the preceding video.

18
00:00:51,130 --> 00:00:55,000
And for each of the items in that list

19
00:00:55,000 --> 00:00:57,840
we're going to create a new key

20
00:00:57,840 --> 00:01:01,400
in the topics dictionary, which is initially empty.

21
00:01:01,400 --> 00:01:05,820
The key is going to be the given trend's name,

22
00:01:05,820 --> 00:01:10,820
so that will often be a hashtag or just a string of text.

23
00:01:10,920 --> 00:01:13,010
And the value is going to be

24
00:01:13,010 --> 00:01:15,940
the corresponding trend's tweet volume.

25
00:01:15,940 --> 00:01:18,130
And remember that up above here

26
00:01:18,130 --> 00:01:20,810
we filtered out any of the items

27
00:01:20,810 --> 00:01:23,990
that have None as their tweet volume.

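The loop just described can be sketched roughly as follows. The list name `nyc_list` and the sample entries are assumptions made for illustration; in the video the list comes from the live Twitter results built in the preceding lesson, with each trend carrying a `name` and a `tweet_volume`.

```python
# Hypothetical stand-in for the New York City trends list from the
# preceding video; real entries would come from the Twitter API.
nyc_list = [
    {'name': '#ExampleTagA', 'tweet_volume': 2400000},
    {'name': 'Example Phrase', 'tweet_volume': 58000},
    {'name': '#ExampleTagB', 'tweet_volume': 11000},
]

topics = {}  # initially empty

# Each trend's name becomes a key; its tweet volume becomes the value.
for trend in nyc_list:
    topics[trend['name']] = trend['tweet_volume']

print(topics)
```

A dictionary comprehension would do the same job in one line; the explicit loop simply mirrors the narration.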
28
00:01:23,990 --> 00:01:25,950
So all of the trends that are going to be

29
00:01:25,950 --> 00:01:28,790
processed in this case are going to have

30
00:01:28,790 --> 00:01:32,080
10,000 or more tweets associated with them.

31
00:01:32,080 --> 00:01:34,560
All of the trends will have 10,000 or more tweets,

32
00:01:34,560 --> 00:01:38,320
and because it's a relatively small number

33
00:01:38,320 --> 00:01:40,700
of topics that we're dealing with,

34
00:01:40,700 --> 00:01:43,000
the word cloud is going to probably be

35
00:01:43,000 --> 00:01:46,360
somewhat sparse when it is completed.

36
00:01:46,360 --> 00:01:48,340
So let's go ahead and execute that

37
00:01:48,340 --> 00:01:50,320
to acquire the key-value pairs.

38
00:01:50,320 --> 00:01:53,890
And we can take a quick look at what topics looks like,

39
00:01:53,890 --> 00:01:56,540
and you'll see that we actually got

40
00:01:56,540 --> 00:01:59,050
a decent number of key-value pairs.

41
00:01:59,050 --> 00:02:03,690
The top trending topic at the moment is whatever Hanbin is,

42
00:02:03,690 --> 00:02:07,780
and there are over 2.4 million tweets for that right now,

43
00:02:07,780 --> 00:02:11,010
and that goes all the way down to the smallest

44
00:02:11,010 --> 00:02:13,430
of the topics, which has as of right this moment

45
00:02:13,430 --> 00:02:16,280
11,000-plus tweets today.

46
00:02:16,280 --> 00:02:19,350
And there are some other trending topics probably as well,

47
00:02:19,350 --> 00:02:23,000
but those had fewer than 10,000 tweets

48
00:02:23,000 --> 00:02:25,470
and therefore did not show up in the list.

49
00:02:25,470 --> 00:02:28,520
And always keep in mind that when you look at stuff

50
00:02:28,520 --> 00:02:30,980
like this coming back live from Twitter

51
00:02:30,980 --> 00:02:34,900
you could get some really crazy stuff back,

52
00:02:34,900 --> 00:02:37,380
including lots of curse words, for example.

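To make the "top topic down to the smallest" inspection concrete, here is one possible sketch. The filtering step is not shown in this part of the video, so the list comprehension below is only an assumption about how trends with a tweet volume of `None` might be dropped; the sample data and the name `raw_trends` are hypothetical.

```python
# Hypothetical trend data; a tweet_volume of None stands for a trend
# whose volume Twitter did not report.
raw_trends = [
    {'name': '#ExampleTagA', 'tweet_volume': 2400000},
    {'name': '#NoVolume', 'tweet_volume': None},
    {'name': '#ExampleTagB', 'tweet_volume': 11000},
    {'name': 'Example Phrase', 'tweet_volume': 58000},
]

# Keep only the trends that report a tweet volume.
with_volume = [t for t in raw_trends if t['tweet_volume'] is not None]

topics = {t['name']: t['tweet_volume'] for t in with_volume}

# Rank from the largest tweet volume down to the smallest.
ranked = sorted(topics.items(), key=lambda item: item[1], reverse=True)

for name, volume in ranked:
    print(f'{name:<16}{volume:>10,}')
```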
53
00:02:37,380 --> 00:02:41,350
So just be aware that you may see things you don't like

54
00:02:41,350 --> 00:02:43,650
in the results that you get back from Twitter.

55
00:02:44,520 --> 00:02:48,290
So at this point we have our dictionary

56
00:02:48,290 --> 00:02:50,800
that we're going to use to create the word cloud.

57
00:02:50,800 --> 00:02:52,900
Previously we didn't use a dictionary,

58
00:02:52,900 --> 00:02:54,700
so this is a little bit different

59
00:02:54,700 --> 00:02:57,950
from what we showed you back in the NLP chapter.

60
00:02:57,950 --> 00:03:01,390
So next up we're going to import the WordCloud class

61
00:03:01,390 --> 00:03:02,960
from the wordcloud library,

62
00:03:02,960 --> 00:03:05,530
and this time when I create the word cloud

63
00:03:05,530 --> 00:03:09,340
I'm going to include some additional keyword arguments.

64
00:03:09,340 --> 00:03:11,010
We do have a width and a height,

65
00:03:11,010 --> 00:03:12,730
so we're specifying that we want

66
00:03:12,730 --> 00:03:15,920
600 pixels wide and 900 pixels tall.

67
00:03:15,920 --> 00:03:18,270
We're not going to use a mask image this time,

68
00:03:18,270 --> 00:03:23,140
so our actual word cloud will be rectangular

69
00:03:23,140 --> 00:03:26,070
and using the dimensions that we've specified.

70
00:03:26,070 --> 00:03:29,390
This keyword argument is new to us in this example:

71
00:03:29,390 --> 00:03:32,770
prefer_horizontal with the value 0.5

72
00:03:32,770 --> 00:03:36,330
means to try to use as a guideline

73
00:03:36,330 --> 00:03:41,270
that 50% of the words should be horizontally oriented

74
00:03:41,270 --> 00:03:43,640
versus vertically oriented.

75
00:03:43,640 --> 00:03:46,189
And it's free to ignore that

76
00:03:46,189 --> 00:03:49,200
in order to fit the content appropriately,
 
77
00:03:49,200 --> 00:03:52,740
but it's just a suggestion that you can provide.

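The size and orientation settings mentioned so far might look like this in code. This is a minimal sketch that uses the values as narrated; the dictionary name `size_and_layout` is hypothetical, and the import is guarded only so the snippet still runs where the third-party wordcloud package is not installed.

```python
# Settings as stated in the narration; no mask image is supplied,
# so the resulting cloud is a plain rectangle of this size.
size_and_layout = {
    'width': 600,              # pixels
    'height': 900,             # pixels
    'prefer_horizontal': 0.5,  # guideline: about half the words horizontal
}

try:
    from wordcloud import WordCloud  # third-party: pip install wordcloud
except ImportError:
    WordCloud = None
    print('wordcloud is not installed; skipping object creation')

if WordCloud is not None:
    wordcloud = WordCloud(**size_and_layout)
    print(wordcloud.width, wordcloud.height, wordcloud.prefer_horizontal)
```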
78
00:03:52,740 --> 00:03:54,420
We also indicated that we didn't want

79
00:03:54,420 --> 00:03:58,120
any fonts smaller than 10-point size.

80
00:03:58,120 --> 00:04:01,140
And we did give a color map to choose colors in this case,

81
00:04:01,140 --> 00:04:04,900
and we set the background color to white as well.

82
00:04:04,900 --> 00:04:07,910
Now once we've created the WordCloud object

83
00:04:07,910 --> 00:04:10,770
we can use it to generate the word cloud.

84
00:04:10,770 --> 00:04:13,610
Previously we used the generate method for that;

85
00:04:13,610 --> 00:04:15,660
here we're using fit_words,

86
00:04:15,660 --> 00:04:18,060
which is capable of receiving as its argument

87
00:04:18,060 --> 00:04:20,310
a dictionary of key-value pairs

88
00:04:20,310 --> 00:04:22,890
like the one that we created up above.

89
00:04:22,890 --> 00:04:24,950
So we'll go ahead and execute that

90
00:04:24,950 --> 00:04:28,630
to create the word cloud, and let's also write that out

91
00:04:29,610 --> 00:04:32,460
to a file on disk as a PNG image

92
00:04:32,460 --> 00:04:36,190
by using the to_file method of the WordCloud object.

93
00:04:36,190 --> 00:04:38,990
And I'm now going to go ahead and open up that image

94
00:04:38,990 --> 00:04:40,910
and bring it onto my screen here.

95
00:04:40,910 --> 00:04:43,560
So this is, whoops, this is the word cloud

96
00:04:43,560 --> 00:04:47,310
that we just generated, and as you can see,

97
00:04:47,310 --> 00:04:50,650
some of the words are horizontal, some of them are vertical.

98
00:04:50,650 --> 00:04:53,040
The one that's the biggest is the one

99
00:04:53,040 --> 00:04:55,070
that had the largest tweet count,

100
00:04:55,070 --> 00:04:57,440
and these smaller items that you see

101
00:04:57,440 --> 00:05:00,510
in various locations throughout the word cloud

102
00:05:00,510 --> 00:05:04,683
represent the ones that had smaller overall tweet counts.
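Putting the remaining steps together, a rough sketch of building the cloud from a dictionary and saving it as a PNG could look like the following. The sample `topics` values, the `'prism'` colormap name, and the output file name are all assumptions (the narration does not name the colormap or the file); `fit_words` and `to_file` are the WordCloud methods mentioned above. The import is guarded so the snippet still runs without the third-party wordcloud package.

```python
import os
import tempfile

# Hypothetical topic-to-tweet-volume pairs standing in for the live data.
topics = {
    '#ExampleTagA': 2400000,
    'Example Phrase': 58000,
    '#ExampleTagB': 11000,
}

try:
    from wordcloud import WordCloud  # third-party: pip install wordcloud
except ImportError:
    WordCloud = None
    print('wordcloud is not installed; skipping image generation')

# Assumed output location; the video writes a PNG to disk.
output_path = os.path.join(tempfile.gettempdir(), 'TrendingTopicsSketch.png')

if WordCloud is not None:
    wordcloud = WordCloud(
        width=600, height=900,
        prefer_horizontal=0.5,
        min_font_size=10,          # no fonts smaller than 10 points
        colormap='prism',          # assumed colormap name
        background_color='white',
    )
    # fit_words sizes each word by its dictionary value, unlike
    # generate, which counts word frequencies in a text string.
    wordcloud.fit_words(topics)
    wordcloud.to_file(output_path)
    print(f'Saved {output_path}')
```

With only a handful of entries the resulting image will be sparse, which matches what the instructor expects for a short trends list.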