1 00:00:00,680 --> 00:00:01,940 - [Instructor] In the preceding video, 2 00:00:01,940 --> 00:00:04,360 we showed you a way to use a dictionary 3 00:00:04,360 --> 00:00:06,850 to summarize word counts. 4 00:00:06,850 --> 00:00:10,000 Now, when you're doing things like that, 5 00:00:10,000 --> 00:00:12,910 which are kind of common types of tasks, 6 00:00:12,910 --> 00:00:17,200 it's often the case that Python already has a library class 7 00:00:17,200 --> 00:00:19,660 that can help you with that task. 8 00:00:19,660 --> 00:00:22,570 Summarizing word counts is a common technique 9 00:00:22,570 --> 00:00:25,710 for text analysis and it turns out 10 00:00:25,710 --> 00:00:30,210 that there is a type in the collections module, 11 00:00:30,210 --> 00:00:33,470 which is part of the Python Standard Library 12 00:00:33,470 --> 00:00:37,293 that can help us perform that task easily. 13 00:00:38,344 --> 00:00:40,040 So far we've looked at a number 14 00:00:40,040 --> 00:00:41,810 of different built-in collections. 15 00:00:41,810 --> 00:00:45,550 We looked at lists, and tuples, and now dictionaries, 16 00:00:45,550 --> 00:00:48,060 and soon, we're going to look at sets as well. 17 00:00:48,060 --> 00:00:50,510 But there are other types of collections out there 18 00:00:50,510 --> 00:00:53,660 and this particular module actually has a bunch 19 00:00:53,660 --> 00:00:55,970 of additional collection types 20 00:00:55,970 --> 00:00:59,060 that you can use in your Python code. 21 00:00:59,060 --> 00:01:01,950 One of which is this Counter class. 22 00:01:01,950 --> 00:01:05,550 And, it's a type of dictionary that knows how to, 23 00:01:05,550 --> 00:01:09,350 as they say, count hashable objects. 
24 00:01:09,350 --> 00:01:12,110 Hashable objects are typically immutable objects 25 00:01:12,110 --> 00:01:15,670 that you can use as the keys in a dictionary 26 00:01:15,670 --> 00:01:18,530 or that you can use as the elements in a set, 27 00:01:18,530 --> 00:01:21,590 where the set elements must be unique. 28 00:01:21,590 --> 00:01:23,930 So anything hashable, which would be things 29 00:01:23,930 --> 00:01:27,120 like strings and the numeric values, 30 00:01:27,120 --> 00:01:29,580 like integers and floating point numbers, 31 00:01:29,580 --> 00:01:33,810 typically they're going to be immutable types of objects. 32 00:01:33,810 --> 00:01:36,150 Most commonly, they're going to be strings. 33 00:01:36,150 --> 00:01:39,700 But we can take advantage of this built-in Counter class 34 00:01:39,700 --> 00:01:43,410 in the collections module to perform the work 35 00:01:43,410 --> 00:01:47,090 that we showed you in the preceding script, in that loop. 36 00:01:47,090 --> 00:01:50,550 So it's going to do all of that work for us. 37 00:01:50,550 --> 00:01:52,590 Now, to demonstrate this concept, 38 00:01:52,590 --> 00:01:56,030 let's switch over to the terminal window here. 39 00:01:56,030 --> 00:02:00,140 And, let's go ahead and from that collections module, 40 00:02:00,140 --> 00:02:03,200 let's import the Counter class. 41 00:02:03,200 --> 00:02:06,050 And, just to demonstrate, by the way, 42 00:02:06,050 --> 00:02:08,630 we're going to take a string, 43 00:02:08,630 --> 00:02:11,870 paste it in the same text statement 44 00:02:11,870 --> 00:02:14,040 that we created in the script. 45 00:02:14,040 --> 00:02:17,370 So we're going to once again have this string, 46 00:02:17,370 --> 00:02:18,420 well actually two strings, 47 00:02:18,420 --> 00:02:21,540 that are going to be assembled together into one string.
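As a rough illustration of the point about hashable objects, the sketch below checks a few values with the built-in hash function and shows that a list is rejected as a dictionary key. The sample values and variable names are illustrative assumptions, not taken from the video.

```python
# Strings, integers, floats, and tuples of hashable items can all be hashed,
# so they can serve as dictionary keys or set elements.
samples = ['word', 42, 3.14, ('a', 1)]  # illustrative values only
all_hashable = all(isinstance(hash(value), int) for value in samples)
print('all samples hashable:', all_hashable)

# A list is mutable and unhashable, so using one as a key raises TypeError.
list_rejected = False
try:
    lookup = {['a', 'list']: 1}
except TypeError as error:
    list_rejected = True
    print('TypeError:', error)
```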
48 00:02:21,540 --> 00:02:24,080 And, remember what we did in the script is, 49 00:02:24,080 --> 00:02:27,260 we told that string to split itself 50 00:02:27,260 --> 00:02:29,620 into a list of individual strings. 51 00:02:29,620 --> 00:02:31,450 So just to show you what that does, 52 00:02:31,450 --> 00:02:36,160 if I say text.split, you get a list. 53 00:02:36,160 --> 00:02:37,320 There's the square brackets 54 00:02:37,320 --> 00:02:39,370 denoting a list of string values, 55 00:02:39,370 --> 00:02:41,070 comma delimited of course, 56 00:02:41,070 --> 00:02:43,100 and these are all the individual words 57 00:02:43,100 --> 00:02:46,860 that you see in the text body up above here. 58 00:02:46,860 --> 00:02:47,980 So what we're going to do is 59 00:02:47,980 --> 00:02:52,410 feed text.split into the creation of a Counter object 60 00:02:52,410 --> 00:02:54,490 and as part of creating the object, 61 00:02:54,490 --> 00:02:56,700 it's automatically going to summarize 62 00:02:56,700 --> 00:02:59,570 all of these words into their word counts. 63 00:02:59,570 --> 00:03:01,760 So to do that, we're going to create an object, 64 00:03:01,760 --> 00:03:06,760 I'll call counter, with a lower case c to differentiate it 65 00:03:07,040 --> 00:03:08,220 from the class name. 66 00:03:08,220 --> 00:03:11,310 And, we'll say text.split here, 67 00:03:11,310 --> 00:03:14,640 and that's going to go create the Counter object. 68 00:03:14,640 --> 00:03:17,230 And, to save me a tiny bit of typing time here, 69 00:03:17,230 --> 00:03:20,420 I'm going to go ahead and put in a little for loop, 70 00:03:20,420 --> 00:03:22,210 that's going to iterate 71 00:03:22,210 --> 00:03:26,070 through my new Counter object's items. 72 00:03:26,070 --> 00:03:28,100 So it is still a dictionary 73 00:03:28,100 --> 00:03:31,510 and therefore I can still call the items method on it.
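The split-then-count step described here can be sketched roughly as follows. The transcript does not show the string used in the video, so the text below is a stand-in chosen only for illustration.

```python
from collections import Counter

# Stand-in text (an assumption): two adjacent string literals that Python
# joins into one string, mirroring the "two strings assembled into one".
text = ('the cat saw the dog '
        'and the dog saw the cat')

words = text.split()      # split on whitespace into a list of word strings
counter = Counter(words)  # the counts are tallied as the object is created

print(words)
print(counter)
```

Because Counter is a subclass of dict, the usual dictionary operations, such as indexing by key or calling items, remain available on the result.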
74 00:03:31,510 --> 00:03:33,370 And, the items method, of course, 75 00:03:33,370 --> 00:03:36,850 is going to give me back key value pairs as tuples. 76 00:03:36,850 --> 00:03:39,600 I'm going to take those tuples and pass them off 77 00:03:39,600 --> 00:03:42,180 to the sorted built-in function, 78 00:03:42,180 --> 00:03:46,860 which will give me those tuples in lexicographical order. 79 00:03:46,860 --> 00:03:48,330 These are all lower-case strings 80 00:03:48,330 --> 00:03:50,950 so they will be in alphabetical order. 81 00:03:50,950 --> 00:03:53,760 And, for each word and corresponding count, 82 00:03:53,760 --> 00:03:55,610 we're going to display a line of text 83 00:03:55,610 --> 00:03:57,790 containing the word and the count. 84 00:03:57,790 --> 00:03:59,820 So let's go ahead and execute that. 85 00:03:59,820 --> 00:04:02,520 There are all of the words and all of the counts, 86 00:04:02,520 --> 00:04:04,328 and if you were to go back and compare them 87 00:04:04,328 --> 00:04:07,200 to the output of the preceding script, 88 00:04:07,200 --> 00:04:08,660 where we used the same text, 89 00:04:08,660 --> 00:04:11,770 you'll see we got the same exact results. 90 00:04:11,770 --> 00:04:15,000 Now, separately we can also go ahead, 91 00:04:15,000 --> 00:04:16,870 like we did in the preceding script, 92 00:04:16,870 --> 00:04:21,010 and display the number of unique keys once again. 93 00:04:21,010 --> 00:04:23,871 So to do that we're going to use the length 94 00:04:23,871 --> 00:04:26,910 of the counter.keys method call. 95 00:04:26,910 --> 00:04:30,950 So counter.keys is going to give us the actual collection 96 00:04:30,950 --> 00:04:34,220 of keys which is a set of unique values, 97 00:04:34,220 --> 00:04:35,600 and the length of that, of course, 98 00:04:35,600 --> 00:04:37,300 is the number of unique keys. 
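A minimal sketch of the sorted loop described above, again using a stand-in string because the text from the video is not shown in the transcript. The column width in the format string is an arbitrary choice.

```python
from collections import Counter

# Stand-in text (an assumption), all lowercase so that the sort order
# of the tuples is plain alphabetical order by word.
text = 'the cat saw the dog and the dog saw the cat'
counter = Counter(text.split())

# items() yields (word, count) tuples; sorted() orders them by word first.
pairs = sorted(counter.items())

for word, count in pairs:
    print(f'{word:<6}{count}')
```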
99 00:04:37,300 --> 00:04:39,090 We can also simply take the length 100 00:04:39,090 --> 00:04:40,780 of the Counter object itself 101 00:04:40,780 --> 00:04:42,630 because it is a dictionary 102 00:04:42,630 --> 00:04:44,290 and the number of key value pairs 103 00:04:44,290 --> 00:04:47,260 would also be the number of unique keys. 104 00:04:47,260 --> 00:04:49,000 So as you can see once again, 105 00:04:49,000 --> 00:04:51,213 the number of unique keys is 10.
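The two ways of counting unique keys mentioned above can be compared in a short sketch. The text is a stand-in, so the total here differs from the 10 reported in the video.

```python
from collections import Counter

# Stand-in text (an assumption); the video's text produces 10 unique keys.
text = 'the cat saw the dog and the dog saw the cat'
counter = Counter(text.split())

unique_via_keys = len(counter.keys())  # length of the view of keys
unique_via_len = len(counter)          # number of key-value pairs

print('Number of unique keys:', unique_via_keys)
print('Same result from len(counter):', unique_via_len)
```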