1 00:00:00,001 --> 00:00:04,234 Hey, welcome to this new section, and in 2 00:00:04,259 --> 00:00:06,599 this section, you'll learn how to use 3 00:00:06,629 --> 00:00:09,659 pandas, which is a very important Python 4 00:00:09,659 --> 00:00:11,789 library, and you really don't want to 5 00:00:11,789 --> 00:00:15,419 miss that. So what is pandas? Well, 6 00:00:15,449 --> 00:00:18,149 pandas is a library providing data 7 00:00:18,149 --> 00:00:20,879 structures and data analysis tools 8 00:00:21,214 --> 00:00:24,784 within Python. Or if the word tools 9 00:00:24,809 --> 00:00:27,419 confuses you, then you can say pandas is 10 00:00:27,419 --> 00:00:29,459 a library providing data structures and 11 00:00:29,459 --> 00:00:32,639 data analysis code. So basically, pandas 12 00:00:32,669 --> 00:00:35,309 allows you to load data from different 13 00:00:35,309 --> 00:00:38,519 sources into Python, and then use Python 14 00:00:38,519 --> 00:00:41,489 code to analyze those data and produce 15 00:00:41,489 --> 00:00:44,159 results, which can be in the form of 16 00:00:44,159 --> 00:00:47,279 tables, text, and also visualizations 17 00:00:47,309 --> 00:00:49,049 with the help of visualization 18 00:00:49,049 --> 00:00:52,589 libraries, such as Bokeh, 19 00:00:52,589 --> 00:00:56,039 which is covered later in the course. So 20 00:00:56,039 --> 00:00:58,559 for now, we'll focus on data without 21 00:00:58,559 --> 00:01:01,319 visualizing them, and pandas is great 22 00:01:01,319 --> 00:01:03,869 for that. So practically, how do we use 23 00:01:03,869 --> 00:01:08,009 pandas? Well, you learned how to open text 24 00:01:08,009 --> 00:01:10,499 files using Python's built-in file 25 00:01:10,499 --> 00:01:12,449 handling methods earlier in the course. 26 00:01:12,904 --> 00:01:15,244 Now, what we opened from text files was 27 00:01:15,269 --> 00:01:17,849 just plain text. 
But what if you want to 28 00:01:17,849 --> 00:01:20,369 load text files with data constructed 29 00:01:20,369 --> 00:01:23,849 of rows and columns? Then things get a 30 00:01:23,849 --> 00:01:25,889 bit complicated. But here is where 31 00:01:25,889 --> 00:01:28,139 pandas comes into play. So you can 32 00:01:28,139 --> 00:01:30,599 probably do that using built-in Python 33 00:01:30,629 --> 00:01:33,059 methods that we have learned in the 34 00:01:33,059 --> 00:01:36,149 course. But to be more efficient, in fact 35 00:01:36,149 --> 00:01:38,579 much more efficient, you need to 36 00:01:38,579 --> 00:01:40,319 have a high-level library such as 37 00:01:40,319 --> 00:01:43,409 pandas, which is able to recognize such 38 00:01:43,439 --> 00:01:46,559 data structures automatically. So I use 39 00:01:46,559 --> 00:01:49,829 pandas for loading data from data mining 40 00:01:49,829 --> 00:01:52,829 activities, such as web scraping. So 41 00:01:52,829 --> 00:01:55,079 you scrape data from a website with 42 00:01:55,079 --> 00:01:57,209 Python, and then store those data in 43 00:01:57,209 --> 00:02:00,329 pandas dataframes. So you use pandas to 44 00:02:00,329 --> 00:02:02,159 provide data structures for you in 45 00:02:02,189 --> 00:02:05,339 Python. I use pandas for loading data 46 00:02:05,339 --> 00:02:08,339 from Excel files, and also use pandas 47 00:02:08,339 --> 00:02:10,169 for analyzing those data instead of 48 00:02:10,169 --> 00:02:12,929 using Excel. Excel can be good for 49 00:02:12,929 --> 00:02:15,059 analyzing a small table of data that 50 00:02:15,059 --> 00:02:17,099 fits on your screen, on your computer 51 00:02:17,099 --> 00:02:20,171 screen. But for data larger than 52 00:02:20,196 --> 00:02:22,529 that, you really want to use code. 
53 00:02:23,524 --> 00:02:25,774 So you write Python code once and then 54 00:02:25,799 --> 00:02:29,099 you reuse it with other data as well, and 55 00:02:29,129 --> 00:02:31,859 you don't want to do selections, 56 00:02:31,884 --> 00:02:34,709 and dragging, and many other cumbersome 57 00:02:34,739 --> 00:02:37,529 operations that you normally do in 58 00:02:37,529 --> 00:02:40,259 a graphical-based program, such as Excel. 59 00:02:40,619 --> 00:02:42,719 So code is the way to go if you want to 60 00:02:42,719 --> 00:02:46,439 be efficient with data, and Python is great 61 00:02:46,439 --> 00:02:49,529 for that with pandas. So you really want 62 00:02:49,529 --> 00:02:52,229 to get a good hang of pandas, and you 63 00:02:52,229 --> 00:02:55,259 will learn that in this section, and 64 00:02:55,259 --> 00:02:57,539 also practice it with real-world 65 00:02:57,539 --> 00:02:59,879 applications that we will be building as 66 00:02:59,879 --> 00:03:02,609 you progress through the course. Let's 67 00:03:02,609 --> 00:03:05,369 go ahead and dive into some code 68 00:03:05,429 --> 00:03:07,660 in the next lectures. See you.
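
As a rough sketch of the idea mentioned in this lecture, that pandas can recognize data made of rows and columns automatically, here is what loading such data might look like. The table contents, the column names `item` and `quantity`, and the use of an in-memory string instead of a real file are all made up for illustration; only `pandas.read_csv` and `io.StringIO` are real library calls.

```python
import io

import pandas as pd

# Made-up sample data standing in for a text file with rows and columns.
csv_text = """item,quantity
apples,3
pears,5
"""

# read_csv treats the first row as column names and infers each column's type.
df = pd.read_csv(io.StringIO(csv_text))

# The result is a DataFrame: a table you can analyze with Python code.
print(df)
print(df["quantity"].sum())
```

With a real file you would pass its path to `pd.read_csv` instead of the `StringIO` object.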