Now that you have a basic understanding of pandas and know how to use Jupyter Notebooks, we can go ahead and learn how to load various kinds of files in Python using pandas and Jupyter. I've got five files here, and they all contain exactly the same dataset. This one is a text version. Let me open the xlsx version, which gives a nice overview of the data. As you can see, we have 7 rows, including the header, and 7 columns. It's just some basic data about supermarkets: ID, Address, City, State, Country, the Name of the supermarket, and Number of Employees. We have exactly the same data in different formats. We have a CSV file, and CSV means comma-separated values. It's basically a text file in which the column values are separated by commas, as you can see here: every column is separated by a comma. It has a .csv extension, and it can also be opened with Excel. If I open it now, you'll see the same dataset that you saw in the xlsx file.
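To make the comma convention concrete, here is a minimal sketch that parses a few comma-separated lines with pandas.read_csv. The rows are hypothetical stand-ins, not the actual contents of supermarkets.csv, and the column name Employees is an assumption. io.StringIO lets read_csv treat a string as if it were a file.

```python
import io

import pandas as pd

# Hypothetical sample rows (NOT the real supermarkets.csv);
# the header mirrors the seven columns described in the video.
csv_text = """ID,Address,City,State,Country,Name,Employees
1,123 Main St,Springfield,IL,USA,Store A,10
2,45 Oak Ave,Shelbyville,IL,USA,Store B,25
"""

# read_csv splits each line on commas by default (sep=",")
# and uses the first line as the column names.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names taken from the header line
```

Note that a value containing a comma (for example an address like "123 Main St, Suite 4") would have to be quoted in the file, otherwise it would be split into two columns.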
We also have the same data separated by semicolons, as you can see here. If you're working with data, you're probably familiar with these kinds of files. To store data, you need to follow some convention, and because the file follows that convention, other programs such as Python can load the data. So when you load a CSV file with pandas, it assumes by default that the values are separated by commas, and it knows how to split each line and extract the values. We'll open all of these one by one. We also have a JSON file, which is yet another format for storing data and actually looks like a Python dictionary. We'll learn how to convert it to a pandas DataFrame as well, so all of these will end up as pandas DataFrames. Now I'll go ahead and start Jupyter by running jupyter notebook. Here are my files. I'll create a new Jupyter Notebook for Python 3.
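A sketch of the other two formats, using small hypothetical samples: a semicolon-separated file needs sep=";" passed to read_csv, and a JSON list of records can be read with pandas.read_json. The JSON layout here (a list of objects) is an assumption; the real supermarkets.json may be laid out differently, in which case read_json's orient parameter may be needed.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated sample
semicolon_text = """ID;City;Employees
1;Springfield;10
2;Shelbyville;25
"""

# Without sep=";", read_csv looks for commas, finds none,
# and puts everything into a single column.
wrong = pd.read_csv(io.StringIO(semicolon_text))

# Telling read_csv which separator to use gives three proper columns.
df_semi = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# Hypothetical JSON sample: a list of records, each resembling a Python dict
json_text = (
    '[{"ID": 1, "City": "Springfield", "Employees": 10},'
    ' {"ID": 2, "City": "Shelbyville", "Employees": 25}]'
)
df_json = pd.read_json(io.StringIO(json_text))

print(wrong.columns.tolist())
print(df_semi)
print(df_json)
```

Wrapping the JSON string in io.StringIO avoids the deprecation warning that recent pandas versions emit when a literal JSON string is passed directly to read_json.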
Before I load those files in Python, there's a trick I usually use: I import os and then run os.listdir(). Pressing Alt+Enter executes the cell and inserts a new cell below it. What you get is a list of the file and folder names in the current directory. Now I don't have to switch to my folder to look up the names; I have everything here. Next, I import pandas, and we can start loading these files one by one. Let's call the first one df1, for DataFrame 1, and set it equal to pandas.read_csv(). You pass in the name of the file you want to open, supermarkets.csv, and press Enter. We'll probably want to display it as well, so I type df1 on the next line. What happened here is that we first loaded the DataFrame and then displayed it, so we got this nice table. That's how easy it is to load data from a CSV file. The benefit of having the data in Python is that once it's loaded, you can perform many operations on it.
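The listdir trick and the first read_csv call can be sketched as a self-contained script. Since the course folder isn't available here, the sketch assumes a temporary directory holding a hypothetical three-column supermarkets.csv. In the notebook itself you would call os.listdir() with no argument and pass the bare file name to read_csv.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the course folder: a temporary directory containing a
# hypothetical supermarkets.csv (the real file has more rows and columns).
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "supermarkets.csv"), "w", newline="") as f:
    f.write("ID,City,Employees\n1,Springfield,10\n2,Shelbyville,25\n")

# os.listdir() with no argument lists the current working directory;
# here the folder is passed explicitly.
print(os.listdir(folder))

# Load the file into a DataFrame and show it.
df1 = pd.read_csv(os.path.join(folder, "supermarkets.csv"))
print(df1)
```

In Jupyter, leaving df1 as the last line of a cell renders the DataFrame as an HTML table, whereas print(df1) shows a plain-text version.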
You can compute statistics and add new columns. You can merge columns, or add the numbers from one column to the numbers in another column, and at the end, you can export the data back to formats like CSV, Excel, and so on. We're going to do that later. But first, in the next videos, let's see how to load data from several file formats.
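A minimal sketch of the operations mentioned here: summary statistics, a new column computed by adding two existing columns, and export back to CSV. The data and the Managers and Total columns are hypothetical, added only for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data
df = pd.read_csv(io.StringIO(
    "ID,City,Employees\n1,Springfield,10\n2,Shelbyville,25\n3,Springfield,15\n"
))

# Summary statistics (count, mean, std, min, quartiles, max) for one column
print(df["Employees"].describe())

# Add a new column, then a column that adds the numbers of two other columns
df["Managers"] = [2, 3, 1]
df["Total"] = df["Employees"] + df["Managers"]

# Export back to CSV; with no file name, to_csv returns the text as a string.
# Pass a path such as "output.csv" to write a file instead.
csv_out = df.to_csv(index=False)
print(csv_out)
```

Exporting to Excel works the same way with df.to_excel("output.xlsx"), but it requires an Excel engine such as openpyxl to be installed.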