1 00:00:00,688 --> 00:00:02,560 - (Instructor) So now that you've been introduced 2 00:00:02,560 --> 00:00:06,820 to the statistics module and a little bit to lists, 3 00:00:06,820 --> 00:00:08,860 we'd like you to go ahead and create a list 4 00:00:08,860 --> 00:00:11,270 using the values that you see here 5 00:00:11,270 --> 00:00:14,160 and use the statistics module's capabilities 6 00:00:14,160 --> 00:00:17,360 to calculate the mean, the median, and the mode. 7 00:00:17,360 --> 00:00:19,830 So go ahead and pause the video and give that a shot 8 00:00:19,830 --> 00:00:21,903 and then come back to see the results. 9 00:00:25,820 --> 00:00:28,920 Okay, let's go ahead and take a look at the results here. 10 00:00:28,920 --> 00:00:32,680 So, of course, in order to use the statistics module, 11 00:00:32,680 --> 00:00:35,840 we need to first import it, which we will do, 12 00:00:35,840 --> 00:00:38,680 and we then want you just to define 13 00:00:38,680 --> 00:00:41,950 a list containing those six values in this case. 14 00:00:41,950 --> 00:00:43,493 So let's go ahead and do that. 15 00:00:44,950 --> 00:00:48,270 Again, comma-separated values within square brackets. 16 00:00:48,270 --> 00:00:51,350 And now we have three additional cells here, 17 00:00:51,350 --> 00:00:54,050 each of which uses, from the statistics module, 18 00:00:54,050 --> 00:00:56,650 the mean, the median, or the mode function. 19 00:00:56,650 --> 00:00:59,610 And by the way, notice here in Jupyter Notebooks 20 00:00:59,610 --> 00:01:03,750 that it's color coding the names of the functions 21 00:01:03,750 --> 00:01:06,620 that come from the module to indicate 22 00:01:06,620 --> 00:01:09,840 that they are part of the statistics module.
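The steps just described (import the module, define the list, call the three functions) can be sketched roughly as follows. The transcript never states the six values shown on screen, so the list here is an assumption, chosen only to be consistent with what is said about them: six items, two 88s, one 73, and 84 and 88 as the two middle values.

```python
import statistics  # the module must be imported before its functions can be used

# Assumed values: the transcript does not list them, so these are
# illustrative numbers that match the details mentioned in the video.
values = [47, 95, 88, 73, 88, 84]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

In a notebook, each of the three calls would typically sit in its own cell so that each result is displayed separately.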
23 00:01:09,840 --> 00:01:12,150 So the color coding that you get 24 00:01:12,150 --> 00:01:15,097 may vary from IPython at the command line 25 00:01:15,097 --> 00:01:19,150 to Jupyter Notebooks to the different IDEs 26 00:01:19,150 --> 00:01:22,850 that you might use to implement your Python code. 27 00:01:22,850 --> 00:01:26,310 So we can check the mean of those values by executing this. 28 00:01:26,310 --> 00:01:28,925 Notice that we're getting an infinitely repeating sequence. 29 00:01:28,925 --> 00:01:32,630 By default, it shows a maximum of this number of digits 30 00:01:32,630 --> 00:01:33,970 to the right of the decimal point, 31 00:01:33,970 --> 00:01:37,210 rounding at the last position, and of course 32 00:01:37,210 --> 00:01:39,730 we could use format strings to get 33 00:01:39,730 --> 00:01:42,450 a nicer looking format of the data. 34 00:01:42,450 --> 00:01:44,400 The median value in this case, 35 00:01:44,400 --> 00:01:46,908 well, they're not sorted up above here, 36 00:01:46,908 --> 00:01:50,520 but we could, of course, add another cell 37 00:01:50,520 --> 00:01:52,250 to see what it would look like sorted.
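The remark about format strings can be illustrated with a short sketch. The video does not show this formatting step, so the two-decimal-place format below is just one possible choice, and the list again uses assumed values that are consistent with the transcript rather than taken from it.

```python
import statistics

# Assumed values (not listed in the transcript).
values = [47, 95, 88, 73, 88, 84]

mean_value = statistics.mean(values)  # a repeating decimal, shown with many digits

# An f-string with the .2f format specifier rounds the displayed value
# to two digits to the right of the decimal point.
formatted = f'{mean_value:.2f}'
print(formatted)
```

The format specifier only changes how the number is displayed; `mean_value` itself keeps its full precision.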
38 00:01:52,250 --> 00:01:55,740 So let me go ahead and do that and let's do sorted 39 00:01:57,470 --> 00:02:01,850 with values as an argument, and if I execute that, 40 00:02:01,850 --> 00:02:03,840 you can see these in sorted order, 41 00:02:03,840 --> 00:02:07,040 and remember the median is the middle value, 42 00:02:07,040 --> 00:02:10,320 but there's not a single middle value here 43 00:02:10,320 --> 00:02:13,444 because we have an even number of items, 44 00:02:13,444 --> 00:02:16,940 so what's happening is it's taking the two middle values, 45 00:02:16,940 --> 00:02:19,720 calculating the average of those two values, 46 00:02:19,720 --> 00:02:23,710 and displaying the average, so 86 47 00:02:23,710 --> 00:02:27,560 is the average of 84 and 88, of course, 48 00:02:27,560 --> 00:02:29,052 and going back up here for a moment, 49 00:02:29,052 --> 00:02:32,650 when we evaluate the mode, whatever appears the most 50 00:02:32,650 --> 00:02:37,090 times, 88 in this case, is the mode value. So again, 51 00:02:37,090 --> 00:02:39,860 sorted order helps us see things 52 00:02:39,860 --> 00:02:41,800 like the mode as well because we can see 53 00:02:41,800 --> 00:02:43,860 all of the items of the same value 54 00:02:43,860 --> 00:02:46,350 side by side with one another. 55 00:02:46,350 --> 00:02:50,780 Now, as long as we are looking at this, 56 00:02:50,780 --> 00:02:53,000 I just want to take a moment and make 57 00:02:53,000 --> 00:02:55,370 one other modification here to show you 58 00:02:55,370 --> 00:03:00,370 what happens if, in fact, you don't have a unique mode value. 59 00:03:00,810 --> 00:03:04,160 So we have two 88s, let's go back up here 60 00:03:04,160 --> 00:03:08,300 and add in another 73 and I'll re-execute that cell 61 00:03:08,300 --> 00:03:11,800 so that I now have an updated list in memory, 62 00:03:11,800 --> 00:03:15,750 and let's now go re-execute the mode and when we do, 63 00:03:15,750 --> 00:03:18,550 notice we get a StatisticsError.
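The median calculation for an even number of items, as walked through with the sorted list, can be reproduced by hand. This is a minimal sketch using the original six values before the extra 73 is added; the values themselves are assumed, since the transcript only mentions that 84 and 88 are the two middle ones.

```python
import statistics

# Assumed values (not listed in the transcript), before the extra 73 is added.
values = [47, 95, 88, 73, 88, 84]

ordered = sorted(values)  # sorted returns a new list; values is unchanged

# With an even number of items there is no single middle value,
# so take the two items on either side of the midpoint.
middle = len(ordered) // 2
lower, upper = ordered[middle - 1], ordered[middle]

manual_median = (lower + upper) / 2  # average of the two middle values
print(ordered, lower, upper, manual_median)
```

The hand-computed result agrees with what `statistics.median(values)` returns for the same list.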
64 00:03:18,550 --> 00:03:22,490 The reason is that there was no unique mode value. 65 00:03:22,490 --> 00:03:24,860 There were two values that 66 00:03:24,860 --> 00:03:27,990 were equally common, and if we go back above, 67 00:03:27,990 --> 00:03:31,410 we can see we have two 88s and we have two 73s, 68 00:03:31,410 --> 00:03:34,950 so the mode is not always the best measure 69 00:03:34,950 --> 00:03:38,440 of central tendency to use because you can't 70 00:03:38,440 --> 00:03:40,250 ever guarantee, when you're dealing 71 00:03:40,250 --> 00:03:42,940 with potentially millions of values, 72 00:03:42,940 --> 00:03:45,350 that you have a unique mode value, 73 00:03:45,350 --> 00:03:48,743 so you may not see that one used quite as frequently.
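The non-unique mode case can be explored with the sketch below. Two assumptions apply: the list values are illustrative, and the extra 73 is appended at the end because the transcript does not say where it was inserted. Note also that the error shown in the video reflects Python 3.7 and earlier; from Python 3.8 onward, `statistics.mode` returns the first mode encountered instead of raising `StatisticsError`, and `statistics.multimode` reports every tied value.

```python
import statistics

# Assumed values, with a second 73 appended (position is a guess).
values = [47, 95, 88, 73, 88, 84, 73]

try:
    # Python 3.8+: returns the first of the tied modes.
    single_mode = statistics.mode(values)
except statistics.StatisticsError:
    # Python 3.7 and earlier: raised when there is no unique mode.
    single_mode = None

# multimode (added in Python 3.8) returns all of the most common values
# as a list, in the order they were first encountered.
all_modes = statistics.multimode(values)
print(single_mode, all_modes)
```

Checking the length of the list returned by `multimode` is one way to detect a tie before relying on a single mode value.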