1 00:00:00,000 --> 00:00:01,155 [No audio] 2 00:00:01,180 --> 00:00:03,600 Good, we're continuing our series of 3 00:00:03,629 --> 00:00:06,387 lectures about manipulating pandas DataFrames. 4 00:00:06,678 --> 00:00:08,460 So you learned how to drop a 5 00:00:08,460 --> 00:00:10,740 row or column out of the DataFrame in 6 00:00:10,740 --> 00:00:12,750 the previous lecture. Now let's see how 7 00:00:12,750 --> 00:00:16,655 we can add a column or row to a DataFrame. 8 00:00:17,526 --> 00:00:19,146 So we have df7 here with 9 00:00:19,175 --> 00:00:22,775 Address as the index column and a series 10 00:00:22,800 --> 00:00:26,070 of column labels. Now, just to clarify 11 00:00:26,070 --> 00:00:28,110 something, I know it is difficult to 12 00:00:28,110 --> 00:00:30,300 wrap your mind around how you'll be using 13 00:00:30,300 --> 00:00:32,940 these operations in real life. So I just 14 00:00:32,940 --> 00:00:35,370 want you to know the syntax for doing 15 00:00:35,370 --> 00:00:37,860 these operations, and throughout the 16 00:00:37,860 --> 00:00:40,170 course, you will learn how to actually 17 00:00:40,200 --> 00:00:42,510 put this to work with real-life 18 00:00:42,510 --> 00:00:44,400 examples. So don't worry about that. 19 00:00:45,035 --> 00:00:47,495 Anyway, let's add a column there, and to do 20 00:00:47,520 --> 00:00:49,860 that, you'd want to say df and in 21 00:00:49,860 --> 00:00:51,960 square brackets, you pass the name of 22 00:00:51,960 --> 00:00:55,920 the new column, let's say, Continent, 23 00:00:56,615 --> 00:00:58,865 and that'd be equal to, you know, now you 24 00:00:58,890 --> 00:01:01,320 would have to pass a list of values that 25 00:01:01,320 --> 00:01:03,330 you want to populate that column with, 26 00:01:04,290 --> 00:01:08,430 let's say North America. Now, if you 27 00:01:08,455 --> 00:01:10,285 execute that, you'll get an error, 28 00:01:10,590 --> 00:01:12,330 name df is not defined. 
Actually, I 29 00:01:12,330 --> 00:01:15,180 didn't mean this error. That was a different 30 00:01:15,180 --> 00:01:17,940 issue. This is the error you get. It says 31 00:01:17,940 --> 00:01:20,520 that the length of values does not match 32 00:01:20,520 --> 00:01:24,690 the length of index. So the length of 33 00:01:24,720 --> 00:01:28,800 the index, you know, the length of 34 00:01:29,730 --> 00:01:33,390 df7.index, the length of your 35 00:01:33,390 --> 00:01:36,000 index is 5, but you're trying to pass 36 00:01:36,000 --> 00:01:39,420 a list with a length of 1. So 37 00:01:39,420 --> 00:01:41,970 you've got only one element there. So 38 00:01:41,970 --> 00:01:44,040 the solution here is to pass a list with 39 00:01:44,040 --> 00:01:47,310 exactly as many items as the rows you have 40 00:01:47,310 --> 00:01:50,190 in your table, in your DataFrame. So we 41 00:01:50,190 --> 00:01:53,340 have 5 rows there, 5, and yeah, 42 00:01:53,340 --> 00:01:56,580 you could add four more items here, 43 00:01:56,580 --> 00:01:58,950 North America, North America, etc., or 44 00:01:58,950 --> 00:02:00,660 you can do something fancier there. 45 00:02:00,900 --> 00:02:05,460 So you could say df7.shape, then 46 00:02:05,460 --> 00:02:11,790 0 in square brackets, times that, execute, df7, print that 47 00:02:11,790 --> 00:02:15,390 out. Here, we have a new column. So if 48 00:02:15,390 --> 00:02:16,882 you're confused by this shape 49 00:02:16,907 --> 00:02:20,280 0 times North America, well, what I 50 00:02:20,280 --> 00:02:23,130 did is, you know, df7.shape, 51 00:02:23,160 --> 00:02:26,070 what you get is 5, 7, which means 52 00:02:26,070 --> 00:02:28,200 you have 5 rows and 7 columns. 53 00:02:29,580 --> 00:02:33,270 Now, what I want to get is the first 54 00:02:33,300 --> 00:02:36,420 item of that tuple. So I get 5 by 55 00:02:36,420 --> 00:02:39,300 doing shape 0. 
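To make the length-mismatch error and the shape-based fix concrete, here is a minimal sketch. The lecture's actual data is not shown, so the DataFrame below is a hypothetical stand-in for df7 with only three columns (the City, Country and Employees values and the "Addr N" labels are assumptions); only the five-row shape and the Address index follow the lecture.

```python
import pandas as pd

# Hypothetical stand-in for df7: the values are made up for illustration.
df7 = pd.DataFrame(
    {
        "City": ["City A", "City B", "City C", "City D", "City E"],
        "Country": ["USA"] * 5,
        "Employees": [8, 15, 8, 20, 10],
    },
    index=pd.Index([f"Addr {i}" for i in range(1, 6)], name="Address"),
)

# A one-element list does not match the five-row index, so pandas raises ValueError.
try:
    df7["Continent"] = ["North America"]
    length_error = None
except ValueError as err:
    length_error = err
    print("Error:", err)

# shape is a (rows, columns) tuple; shape[0] is the number of rows.
n_rows = df7.shape[0]

# Multiplying a one-element list by n_rows repeats it to the required length.
df7["Continent"] = n_rows * ["North America"]
print(df7)
```

The assignment changes df7 directly, so there is no need to reassign the result to a new variable.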
With this, I always 56 00:02:39,300 --> 00:02:41,880 make sure that I'm getting the number of 57 00:02:41,880 --> 00:02:45,180 rows that my DataFrame has, and then 58 00:02:45,180 --> 00:02:50,040 when you multiply 5 by a list with 59 00:02:50,310 --> 00:02:52,530 one element, you get a list with 5 60 00:02:52,555 --> 00:02:56,185 elements. North America, that's the idea, 61 00:02:56,910 --> 00:02:59,730 delete that, and note that this is 62 00:02:59,730 --> 00:03:03,210 actually an in-place operation. So that 63 00:03:03,210 --> 00:03:05,400 will update your DataFrame, and that 64 00:03:05,400 --> 00:03:07,470 was all about adding a new column. Now, 65 00:03:07,470 --> 00:03:08,915 how about modifying an existing column? 66 00:03:10,595 --> 00:03:14,075 So modifying the Continent column, you 67 00:03:14,100 --> 00:03:15,960 know, that could be something like 68 00:03:16,950 --> 00:03:20,910 Country, and you can also add some 69 00:03:20,935 --> 00:03:24,391 strings in there, let's say plus a comma, 70 00:03:25,352 --> 00:03:29,380 plus, maybe another string, North America. 71 00:03:29,404 --> 00:03:32,126 [No audio] 72 00:03:32,151 --> 00:03:35,010 Execute that, print out 73 00:03:35,010 --> 00:03:39,120 the DataFrame, and yeah, see what we 74 00:03:39,120 --> 00:03:42,150 got. So what we did here is we updated 75 00:03:42,180 --> 00:03:44,820 the Continent column by referring 76 00:03:44,850 --> 00:03:46,470 to an existing column, which is 77 00:03:46,470 --> 00:03:49,920 Country, and then for each value of the 78 00:03:49,920 --> 00:03:52,200 Country column, we 79 00:03:52,200 --> 00:03:54,990 concatenated the comma string, which you 80 00:03:54,990 --> 00:03:57,900 can see in here, just after USA, and we 81 00:03:57,900 --> 00:04:01,650 also added another string, so North 82 00:04:01,650 --> 00:04:04,260 America. Now this could also be another 83 00:04:04,260 --> 00:04:07,350 column if you liked. 
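The column update just described can be sketched as follows. The DataFrame is again a hypothetical stand-in for df7 (the values and the "Addr N" labels are assumptions, not the lecture's data). The second assignment shows one way to combine with a numeric column, assuming Employees is stored as integers, in which case a conversion such as astype(str) is likely needed before concatenating.

```python
import pandas as pd

# Hypothetical stand-in for df7: the values are made up for illustration.
df7 = pd.DataFrame(
    {
        "Country": ["USA"] * 5,
        "Employees": [8, 15, 8, 20, 10],
        "Continent": ["North America"] * 5,
    },
    index=pd.Index([f"Addr {i}" for i in range(1, 6)], name="Address"),
)

# Overwrite Continent: each Country value gets ", " and "North America" appended.
df7["Continent"] = df7["Country"] + ", " + "North America"
print(df7["Continent"])

# Combining with a numeric column: convert it to strings first (assumes integer Employees).
df7["Label"] = df7["Country"] + ", " + df7["Employees"].astype(str)
print(df7["Label"])
```

Label is a hypothetical column name used only so that the Continent result stays visible.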
So if you pass 84 00:04:07,350 --> 00:04:10,201 here, let's say Employees, you'd get 85 00:04:10,505 --> 00:04:12,305 8 instead of the first North 86 00:04:12,330 --> 00:04:16,950 America, and then 15, and so on. So that's 87 00:04:16,950 --> 00:04:20,010 how you update a column. Now how about 88 00:04:20,040 --> 00:04:22,260 adding a new row? Well, this can be a 89 00:04:22,260 --> 00:04:24,630 bit tricky, but still understandable. 90 00:04:26,040 --> 00:04:28,200 What you could do here, because there's 91 00:04:28,200 --> 00:04:31,800 no easy method to pass a row to a 92 00:04:31,800 --> 00:04:34,980 DataFrame, what you could do is you 93 00:04:34,980 --> 00:04:38,340 could say df7_t, so I'm creating a 94 00:04:38,340 --> 00:04:43,388 new variable, that would be equal to df7.T. 95 00:04:43,713 --> 00:04:46,080 So T is actually a property 96 00:04:46,080 --> 00:04:48,900 that transposes your 97 00:04:48,925 --> 00:04:52,534 DataFrame, and by transposition I mean, 98 00:04:52,714 --> 00:04:54,635 you know, df7_t, you check 99 00:04:54,660 --> 00:04:57,600 your new DataFrame. What you get is 100 00:04:57,600 --> 00:05:00,810 this. So your rows have become 101 00:05:00,810 --> 00:05:03,330 columns, and your columns have become 102 00:05:03,330 --> 00:05:06,390 rows. So spend a few seconds looking at 103 00:05:06,390 --> 00:05:11,610 this, yep, and what we can do now is, you 104 00:05:11,610 --> 00:05:15,816 know, we can use the same syntax, df7_t, 105 00:05:16,049 --> 00:05:18,210 and then add a new column in there, 106 00:05:18,240 --> 00:05:20,460 so for the name of the column, you'll 107 00:05:20,460 --> 00:05:27,288 have to pass an address, let's say My Address, 108 00:05:28,287 --> 00:05:30,180 let it be equal again to a 109 00:05:30,180 --> 00:05:33,810 list, and you want the list to reflect 110 00:05:34,571 --> 00:05:39,235 this order. 
So City first, let's say, My City, 111 00:05:40,438 --> 00:05:41,988 Country, My Country, 112 00:05:42,012 --> 00:05:44,536 [No audio] 113 00:05:44,581 --> 00:05:48,875 and 10 for Employees, and 7 for ID, 114 00:05:50,268 --> 00:05:54,487 My Shop for the shop name, My State, 115 00:05:54,511 --> 00:05:58,437 [No audio] 116 00:05:58,462 --> 00:05:59,618 and My Continent, 117 00:06:00,155 --> 00:06:06,545 execute that, and df7_t. If you look at 118 00:06:06,570 --> 00:06:08,813 this now, you'll see that 119 00:06:08,837 --> 00:06:10,837 [No audio] 120 00:06:10,862 --> 00:06:13,521 you get a new column in your DataFrame, 121 00:06:14,271 --> 00:06:20,919 and now what you do to complete the trick is, let 122 00:06:21,000 --> 00:06:26,955 me say df7 equals df7_t.T. 123 00:06:26,979 --> 00:06:29,723 [No audio] 124 00:06:29,748 --> 00:06:31,110 So df7 now 125 00:06:31,260 --> 00:06:35,190 will have the new row added at the end. So 126 00:06:35,190 --> 00:06:37,410 let me wrap this up. Again, what I did 127 00:06:37,410 --> 00:06:40,110 is I transposed the original DataFrame, 128 00:06:40,140 --> 00:06:43,860 df7, then I added a column there. 129 00:06:44,040 --> 00:06:46,020 And now if I transpose the DataFrame 130 00:06:46,020 --> 00:06:48,000 back again to its original orientation, 131 00:06:48,360 --> 00:06:50,610 this column that I added will be 132 00:06:50,610 --> 00:06:54,300 converted to a row, and that does the 133 00:06:54,300 --> 00:06:58,020 trick, and similarly, you can modify a 134 00:06:58,020 --> 00:07:00,780 row if you like. So in that case, you 135 00:07:00,780 --> 00:07:02,880 don't point to My Address, but you'd 136 00:07:02,880 --> 00:07:06,630 point to an existing column with an 137 00:07:06,630 --> 00:07:08,550 existing address name. 
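The transpose, add a column, transpose back trick can be sketched like this. The DataFrame is a hypothetical three-column stand-in for df7 (the values and the "Addr N" labels are assumptions), so the new row's list has three items instead of the seven used in the lecture. The final infer_objects() call is an optional extra not mentioned in the lecture: transposing a DataFrame with mixed column types generally yields object columns, and this is one way to let pandas re-infer numeric types afterwards.

```python
import pandas as pd

# Hypothetical stand-in for df7: the values are made up for illustration.
df7 = pd.DataFrame(
    {
        "City": ["City A", "City B", "City C", "City D", "City E"],
        "Country": ["USA"] * 5,
        "Employees": [8, 15, 8, 20, 10],
    },
    index=pd.Index([f"Addr {i}" for i in range(1, 6)], name="Address"),
)

# Step 1: transpose, so addresses become column labels and column names become the index.
df7_t = df7.T

# Step 2: add a column named after the new address; the list follows the
# order of the transposed index (City, Country, Employees).
df7_t["My Address"] = ["My City", "My Country", 10]

# Step 3: transpose back, which turns the new column into the last row.
df7 = df7_t.T

# Optional (an assumption, not from the lecture): re-infer column dtypes.
df7 = df7.infer_objects()
print(df7)
```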
138 00:07:08,765 --> 00:07:11,135 For instance, this one, if I pass it 139 00:07:11,160 --> 00:07:15,450 here, and execute that, execute that, and that, 140 00:07:17,015 --> 00:07:20,915 and you'll see that the values of this row 141 00:07:20,940 --> 00:07:23,610 with this address name have been updated 142 00:07:23,610 --> 00:07:27,360 now, as you can see in here. Now, that 143 00:07:27,360 --> 00:07:29,940 closes this lecture as well, and yeah, 144 00:07:29,940 --> 00:07:31,540 I'll see you later.
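Modifying an existing row with the same transpose approach might look like the sketch below. As before, the DataFrame is a hypothetical stand-in for df7, and "Addr 2" and the replacement values are made-up examples rather than the address used in the lecture.

```python
import pandas as pd

# Hypothetical stand-in for df7: the values are made up for illustration.
df7 = pd.DataFrame(
    {
        "City": ["City A", "City B", "City C", "City D", "City E"],
        "Country": ["USA"] * 5,
        "Employees": [8, 15, 8, 20, 10],
    },
    index=pd.Index([f"Addr {i}" for i in range(1, 6)], name="Address"),
)

# Transpose, then assign to an address that already exists as a column label.
df7_t = df7.T
df7_t["Addr 2"] = ["New City", "USA", 99]

# Transpose back: the row labelled "Addr 2" now holds the new values.
df7 = df7_t.T
print(df7.loc["Addr 2"])
```

Because the label already exists, the assignment overwrites that column of the transposed DataFrame instead of adding a new one, so the row count stays the same.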