1 00:00:00,910 --> 00:00:02,750 - To finish up our discussion of 2 00:00:02,750 --> 00:00:04,960 the Titanic data set for now, 3 00:00:04,960 --> 00:00:08,820 we'd like to take a look at how we can visualize 4 00:00:08,820 --> 00:00:10,540 a little bit of this data, 5 00:00:10,540 --> 00:00:11,670 and for that purpose, 6 00:00:11,670 --> 00:00:14,880 we're going to take advantage of a built-in 7 00:00:14,880 --> 00:00:18,520 Pandas capability for displaying a histogram 8 00:00:18,520 --> 00:00:23,010 based on specifically the age column in this case. 9 00:00:23,010 --> 00:00:25,210 So in order to do that, 10 00:00:25,210 --> 00:00:28,490 we need to enable Matplotlib support. 11 00:00:28,490 --> 00:00:31,990 Pandas uses Matplotlib to implement its graphics. 12 00:00:31,990 --> 00:00:36,590 We're in a current IPython session in which I did not 13 00:00:36,590 --> 00:00:40,430 enable Matplotlib support as I launched the session, 14 00:00:40,430 --> 00:00:43,360 so here we want to talk about using 15 00:00:43,360 --> 00:00:46,373 the magic called %matplotlib. 16 00:00:47,590 --> 00:00:49,700 And %matplotlib, 17 00:00:49,700 --> 00:00:52,050 I'm just making sure I spelled it correctly here. 18 00:00:52,050 --> 00:00:55,690 And that's going to enable Matplotlib support 19 00:00:55,690 --> 00:00:57,210 in the current session, 20 00:00:57,210 --> 00:01:01,760 so even if you didn't launch IPython with Matplotlib support, 21 00:01:01,760 --> 00:01:05,670 you can enable it after the fact by using this magic.
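As a rough sketch of the step just described: inside IPython the magic is simply typed as `%matplotlib` on a line by itself. Magics are not plain Python, so the snippet below, which is only an illustration, runs the same magic programmatically when it detects an IPython session and otherwise falls back to Matplotlib's interactive mode. The helper name `enable_matplotlib` is hypothetical and not something from the video.

```python
import matplotlib.pyplot as plt


def enable_matplotlib():
    """Enable interactive plotting after the session has already started.

    Hypothetical helper: inside IPython it runs the %matplotlib magic;
    in a plain Python interpreter it switches on interactive mode instead.
    """
    try:
        shell = get_ipython()  # only defined inside an IPython session
    except NameError:
        shell = None

    if shell is not None:
        # Same effect as typing "%matplotlib" at the IPython prompt.
        shell.run_line_magic("matplotlib", "")
        return "ipython-magic"

    # No magics outside IPython, so turn on interactive mode directly.
    plt.ion()
    return "interactive-mode"


mode = enable_matplotlib()
print(mode)
```

Either path leaves the session ready to display figures without restarting it, which is the point made in the narration.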
22 00:01:05,670 --> 00:01:08,920 So now that we have that set up, 23 00:01:08,920 --> 00:01:12,330 we can go ahead and create a histogram, 24 00:01:12,330 --> 00:01:15,770 and to do that, we'll use a very simple method 25 00:01:15,770 --> 00:01:18,790 called hist, which stands for histogram, 26 00:01:18,790 --> 00:01:22,400 and when you invoke that on a DataFrame, 27 00:01:22,400 --> 00:01:24,890 it's going to create histograms for 28 00:01:24,890 --> 00:01:28,540 all of the numeric columns by default. 29 00:01:28,540 --> 00:01:29,900 So there's only one, 30 00:01:29,900 --> 00:01:32,410 which means it's only going to create one histogram, 31 00:01:32,410 --> 00:01:34,950 and in this case we'll hit enter and let that 32 00:01:34,950 --> 00:01:35,920 create the histogram, 33 00:01:35,920 --> 00:01:38,290 which will pop open in a separate window here. 34 00:01:38,290 --> 00:01:41,210 And now what you see is a histogram 35 00:01:41,210 --> 00:01:45,310 of everything created for you auto-magically, if you will. 36 00:01:45,310 --> 00:01:46,770 Along the left-hand axis, 37 00:01:46,770 --> 00:01:51,020 we have the number of passengers in a given age range, 38 00:01:51,020 --> 00:01:53,890 and down along the bottom we have the age ranges. 39 00:01:53,890 --> 00:01:56,980 So you can see this first block here 40 00:01:56,980 --> 00:02:01,570 represents what looks like about zero through eight or so, 41 00:02:01,570 --> 00:02:05,410 so there were somewhere around maybe 70 people 42 00:02:05,410 --> 00:02:08,320 who were in the zero to eight range. 43 00:02:08,320 --> 00:02:09,700 In this next block here, 44 00:02:09,700 --> 00:02:11,770 maybe we had about 60 people 45 00:02:11,770 --> 00:02:16,370 who were in the eight to approximately 16 or 17 range.
46 00:02:16,370 --> 00:02:18,300 In this next block here, 47 00:02:18,300 --> 00:02:23,280 we had over 250 people, maybe 275 or so, 48 00:02:23,280 --> 00:02:26,390 280 people that were in approximately 49 00:02:26,390 --> 00:02:31,010 the 17 to maybe 24 range, et cetera, 50 00:02:31,010 --> 00:02:34,840 so you can see we get this kind of contiguous presentation 51 00:02:34,840 --> 00:02:37,348 of ranges of values and 52 00:02:37,348 --> 00:02:41,230 the number of people inside those ranges. 53 00:02:41,230 --> 00:02:43,730 So, again, this is a key aspect of 54 00:02:43,730 --> 00:02:45,660 getting to know your data, 55 00:02:45,660 --> 00:02:49,070 to visualize that data, 56 00:02:49,070 --> 00:02:51,900 and sometimes by looking at it in a picture, 57 00:02:51,900 --> 00:02:56,580 you can understand the data a little bit better as well. 58 00:02:56,580 --> 00:03:00,010 So we will of course do lots of visualizations 59 00:03:00,010 --> 00:03:03,010 in subsequent examples and we just wanted to give you 60 00:03:03,010 --> 00:03:05,120 a basic demonstration here, 61 00:03:05,120 --> 00:03:09,480 and of course, Pandas has other visualization capabilities 62 00:03:09,480 --> 00:03:11,930 above and beyond simple histograms 63 00:03:11,930 --> 00:03:13,863 that we're showing in this example.
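The histogram walkthrough can be approximated in a few lines. This is a minimal sketch, not the session from the video: the `passengers` DataFrame holds a handful of made-up rows standing in for the Titanic data, and the `Agg` backend is an assumption chosen so the code runs without opening a window. It calls `hist` on the DataFrame, then uses `numpy.histogram` with 10 bins (the default bin count for `DataFrame.hist`) to print the counts per age range instead of estimating them from the bars.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen backend, so no window opens
import numpy as np
import pandas as pd

# Hypothetical stand-in rows, NOT the real Titanic data set.
passengers = pd.DataFrame({
    "sex": ["female", "male", "female", "male", "male", "female", "male", "female"],
    "age": [2.0, 5.0, 15.0, 22.0, 23.0, 24.0, 41.0, 80.0],
})

# hist() draws one histogram per numeric column; "sex" is skipped,
# so only "age" gets a plot.
axes = passengers.hist()
age_axes = axes.ravel()[0]
print(age_axes.get_title())

# Compute the same kind of bins numerically: 10 equal-width ranges
# spanning the minimum to the maximum age.
counts, edges = np.histogram(passengers["age"].dropna(), bins=10)
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {count} passengers")
```

Printing the bin edges alongside the counts gives exact ranges where the narration could only say "about zero through eight or so".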