1 00:00:00,000 --> 00:00:02,693 All right, now that you know how to load 2 00:00:02,699 --> 00:00:05,579 data in Python via Pandas, and you know 3 00:00:05,579 --> 00:00:07,289 how to do that using different 4 00:00:07,289 --> 00:00:10,229 data sources, so CSV, JSON, and text 5 00:00:10,229 --> 00:00:13,649 files, and Excel, now you want to 6 00:00:13,679 --> 00:00:15,539 understand and learn how to 7 00:00:15,539 --> 00:00:18,869 manipulate these DataFrames, and by 8 00:00:18,869 --> 00:00:21,569 manipulation, what I mean is deleting 9 00:00:21,569 --> 00:00:23,759 rows and columns from your DataFrame, 10 00:00:24,119 --> 00:00:26,969 adding new rows and columns, and 11 00:00:26,969 --> 00:00:29,429 also modifying existing rows and 12 00:00:29,429 --> 00:00:31,379 columns. So that's what you're going to 13 00:00:31,379 --> 00:00:34,529 learn throughout these lectures. But 14 00:00:34,529 --> 00:00:37,079 first of all, I'd like you to understand 15 00:00:37,144 --> 00:00:40,144 how DataFrames are indexed, and by 16 00:00:40,169 --> 00:00:43,319 indexing, I mean, you know, we have this 17 00:00:43,319 --> 00:00:45,539 DataFrame here, and this could be a big 18 00:00:45,539 --> 00:00:47,219 one, although this happens to be a 19 00:00:47,219 --> 00:00:49,919 shorter one with only six rows. But if 20 00:00:49,919 --> 00:00:51,419 you have big DataFrames with lots of 21 00:00:51,419 --> 00:00:53,969 columns and rows, then you may want to 22 00:00:53,999 --> 00:00:56,821 extract information out of the DataFrame, 23 00:00:56,845 --> 00:00:58,619 and to extract information, you 24 00:00:58,619 --> 00:01:01,589 need to have something like a coordinate system, 25 00:01:02,429 --> 00:01:04,949 and a DataFrame has an embedded 26 00:01:04,949 --> 00:01:07,379 coordinate system. 
So that if you want 27 00:01:07,379 --> 00:01:10,409 to access, let's say, these two rows 28 00:01:10,409 --> 00:01:13,289 here, this portion here, you want to 29 00:01:13,289 --> 00:01:15,689 know how to do that. So that's what 30 00:01:15,689 --> 00:01:17,609 you're going to learn now: how DataFrames 31 00:01:17,639 --> 00:01:19,439 are indexed and how you can slice them. 32 00:01:19,769 --> 00:01:23,339 So let's try to extract that portion of 33 00:01:23,339 --> 00:01:24,959 the DataFrame. There might be different 34 00:01:24,959 --> 00:01:27,521 ways to access that portion of the DataFrame. 35 00:01:27,966 --> 00:01:30,988 The first way is to use label-based 36 00:01:31,012 --> 00:01:33,149 indexing. The other way is to use 37 00:01:33,149 --> 00:01:36,145 position-based indexing. So your DataFrame 38 00:01:36,169 --> 00:01:38,909 has column labels and index 39 00:01:38,909 --> 00:01:41,999 labels. So you can use labels from 40 00:01:41,999 --> 00:01:44,759 your index column and labels from 41 00:01:44,759 --> 00:01:47,849 your header, your column names, to access 42 00:01:47,879 --> 00:01:50,849 portions of your DataFrame. With label 43 00:01:50,874 --> 00:01:53,394 indexing, you want to use loc in there, 44 00:01:54,029 --> 00:01:56,489 so the loc indexer, and then you pass 45 00:01:56,489 --> 00:01:58,529 square brackets in there, and that 46 00:01:58,529 --> 00:02:01,769 gets two elements, and the first element 47 00:02:01,769 --> 00:02:06,179 could be a range of the index column. So 48 00:02:06,179 --> 00:02:07,799 we're talking about labels here, not 49 00:02:07,799 --> 00:02:10,469 positions, so you will have to pass, you 50 00:02:10,469 --> 00:02:16,649 know, 735 Dolores St, and then a 51 00:02:16,649 --> 00:02:23,121 range, so with a colon there, 332 Hill St, 52 00:02:23,145 --> 00:02:24,859 and then from Country 53 00:02:24,883 --> 00:02:27,234 [No audio] 54 00:02:27,259 --> 00:02:29,249 to ID, and execute 55 00:02:29,249 --> 00:02:32,279 that. 
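The label-based slice described above might look roughly like the sketch below. The DataFrame here is only a stand-in for the lecture's df7: the index labels "735 Dolores St" and "332 Hill St" and the column names Country, Employees, and ID come from the lecture, while every other label and value is invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's df7 (six rows, addresses as index).
# Only "735 Dolores St", "332 Hill St" and the Country/Employees/ID column
# names come from the lecture; everything else is made up.
df7 = pd.DataFrame(
    {
        "City": ["City A", "City A", "City A", "City A", "City A", "City A"],
        "Country": ["USA", "USA", "USA", "USA", "USA", "USA"],
        "Employees": [8, 15, 25, 10, 12, 20],
        "ID": [1, 2, 3, 4, 5, 6],
        "Name": ["Shop 1", "Shop 2", "Shop 3", "Shop 4", "Shop 5", "Shop 6"],
    },
    index=[
        "100 Example St",
        "735 Dolores St",
        "332 Hill St",
        "23rd St",
        "200 Example St",
        "300 Example St",
    ],
)

# Rows from one index label to another, columns from one column label to another.
# With loc, both ends of each range are included in the result.
portion = df7.loc["735 Dolores St":"332 Hill St", "Country":"ID"]
print(portion)
```

The result should contain two rows and the three columns Country, Employees, and ID, because label ranges keep their last label.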
And now this is our portion. So when 56 00:02:32,279 --> 00:02:34,529 you use labels, you're including the 57 00:02:34,529 --> 00:02:36,749 first label that you pass there and the 58 00:02:36,749 --> 00:02:39,509 last one as well. So everything between 59 00:02:39,509 --> 00:02:42,569 those, like here, Country and 60 00:02:42,594 --> 00:02:44,814 Employees are included, and ID 61 00:02:45,034 --> 00:02:48,904 as well. And of course, similarly, almost 62 00:02:48,929 --> 00:02:52,949 similarly, you can access, you know, 63 00:02:52,949 --> 00:02:56,489 single cells from your DataFrame, just 64 00:02:56,489 --> 00:02:59,219 like that. So the intersection between 65 00:02:59,219 --> 00:03:03,629 this index label and this column name is 66 00:03:03,629 --> 00:03:06,959 USA, which would be this one here. If 67 00:03:06,959 --> 00:03:10,949 you want all the USAs, then you just 68 00:03:10,949 --> 00:03:13,619 select all the rows there, and you get 69 00:03:13,769 --> 00:03:16,889 the whole column here, which of course, if you 70 00:03:16,889 --> 00:03:19,374 want, you can convert to a list. 71 00:03:19,398 --> 00:03:21,655 [No audio] 72 00:03:21,685 --> 00:03:25,864 So a simple list using the Python built-in 73 00:03:25,894 --> 00:03:29,014 function, which is list, and that's it for 74 00:03:29,039 --> 00:03:31,559 label-based indexing. Now, this is not 75 00:03:31,559 --> 00:03:34,799 the most common way to extract data 76 00:03:34,799 --> 00:03:37,589 from a DataFrame. More common could be 77 00:03:37,589 --> 00:03:40,949 to access data based on position, not 78 00:03:40,949 --> 00:03:46,889 based on labels. So to do that, you do 79 00:03:48,042 --> 00:03:51,184 df7, and instead of loc, you do iloc. 80 00:03:53,014 --> 00:03:56,794 That, again, expects two items. So the first 81 00:03:56,819 --> 00:03:59,099 would be the range of your row positions. 
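Before moving on to positions, the single-cell and whole-column lookups mentioned for loc could be sketched as follows. The DataFrame is again a hypothetical stand-in for df7 with invented values, and the choice of "332 Hill St" for the single cell is arbitrary, since the lecture does not say which label was used.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's df7; all values are invented.
df7 = pd.DataFrame(
    {
        "City": ["City A", "City A", "City A", "City A", "City A", "City A"],
        "Country": ["USA", "USA", "USA", "USA", "USA", "USA"],
        "Employees": [8, 15, 25, 10, 12, 20],
        "ID": [1, 2, 3, 4, 5, 6],
    },
    index=[
        "100 Example St",
        "735 Dolores St",
        "332 Hill St",
        "23rd St",
        "200 Example St",
        "300 Example St",
    ],
)

# A single cell: the intersection of one index label and one column name.
cell = df7.loc["332 Hill St", "Country"]

# A whole column: a bare colon selects every row.
country_column = df7.loc[:, "Country"]

# The built-in list() turns the resulting Series into a plain Python list.
countries = list(country_column)
print(cell)
print(countries)
```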
82 00:03:59,123 --> 00:04:01,354 [No audio] 83 00:04:01,379 --> 00:04:04,755 Actually, let me print the whole DataFrame 84 00:04:04,779 --> 00:04:05,921 here so that you 85 00:04:05,945 --> 00:04:08,195 [No audio] 86 00:04:08,220 --> 00:04:13,690 can refer to that. So let me access from 87 00:04:13,715 --> 00:04:17,189 Dolores to 23rd Street, and that 88 00:04:17,189 --> 00:04:23,909 would be 1 to 3, I believe, yep, and also 89 00:04:23,909 --> 00:04:28,337 from Country to ID. So again 1 to 90 00:04:28,361 --> 00:04:30,361 [No audio] 91 00:04:30,371 --> 00:04:31,455 3, 92 00:04:31,479 --> 00:04:34,385 [No audio] 93 00:04:34,410 --> 00:04:36,269 and here, you can see the difference now, 94 00:04:36,869 --> 00:04:38,939 you know, ID wasn't included there 95 00:04:38,969 --> 00:04:42,449 and neither was 23rd Street, because 96 00:04:42,449 --> 00:04:45,119 just as with lists, this is upper 97 00:04:45,119 --> 00:04:47,697 bound exclusive. So as with a Python list, 98 00:04:47,723 --> 00:04:49,829 3 is not included in the 99 00:04:49,829 --> 00:04:53,279 slice, but with labels, the last 100 00:04:53,309 --> 00:04:56,009 item of the range was included in the 101 00:04:56,284 --> 00:04:58,984 slice. So in this case, you want to pass 102 00:04:59,009 --> 00:05:01,289 4 there and 4 there, and 103 00:05:01,289 --> 00:05:03,809 that's how you get your portion, and of 104 00:05:03,809 --> 00:05:06,329 course, similarly, you can do things 105 00:05:06,329 --> 00:05:09,959 like that. So you get all the rows or 106 00:05:09,959 --> 00:05:11,455 only one of them. 107 00:05:11,479 --> 00:05:14,826 [No audio] 108 00:05:14,851 --> 00:05:16,139 So that would be the row 109 00:05:16,139 --> 00:05:19,169 at position 3, which is this one, but 110 00:05:19,169 --> 00:05:21,299 only for the columns Country, 111 00:05:21,599 --> 00:05:24,779 Employees, and ID. So USA, 10, and 4. 
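The difference between the two slicing styles can be checked with a small sketch like the one below. The DataFrame is a hypothetical stand-in for df7: the row at position 3 is given the Country, Employees, and ID values mentioned in the lecture (USA, 10, 4), and all other values and the placeholder index labels are invented.

```python
import pandas as pd

# Hypothetical stand-in for the lecture's df7; apart from the row at
# position 3 (USA, 10, 4) the values are invented for illustration.
df7 = pd.DataFrame(
    {
        "City": ["City A", "City A", "City A", "City A", "City A", "City A"],
        "Country": ["USA", "USA", "USA", "USA", "USA", "USA"],
        "Employees": [8, 15, 25, 10, 12, 20],
        "ID": [1, 2, 3, 4, 5, 6],
    },
    index=[
        "100 Example St",
        "735 Dolores St",
        "332 Hill St",
        "23rd St",
        "200 Example St",
        "300 Example St",
    ],
)

# iloc slices are upper-bound exclusive, like Python list slices:
# 1:3 keeps positions 1 and 2 only, so "23rd St" and ID are left out.
too_short = df7.iloc[1:3, 1:3]

# Passing 4 as the upper bound includes position 3 on both axes.
by_position = df7.iloc[1:4, 1:4]

# The equivalent loc slice names the last label directly, because it is included.
by_label = df7.loc["735 Dolores St":"23rd St", "Country":"ID"]

# A single row by position, restricted to the same three columns.
row = df7.iloc[3, 1:4]
print(by_position)
print(row)
```

In this sketch `by_position` and `by_label` hold the same rows and columns, which is the upper-bound point made above: iloc needs 4 where loc names the last label.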
112 00:05:25,354 --> 00:05:27,634 All right, that is position-based 113 00:05:27,659 --> 00:05:29,579 indexing, and yeah, that's what I wanted 114 00:05:29,579 --> 00:05:32,009 to teach you about DataFrame indexing 115 00:05:32,009 --> 00:05:35,199 and slicing, and I'll talk to you in the next lecture.