1 00:00:00,000 --> 00:00:01,200 [no audio] 2 00:00:01,200 --> 00:00:05,300 Friends, here we are going to work with CSV files using Python. 3 00:00:06,000 --> 00:00:07,900 So first of all, what is CSV? 4 00:00:09,200 --> 00:00:12,700 See guys simply, CSV is a Comma Separated Values. 5 00:00:13,400 --> 00:00:16,500 That means whatever the values you have in a CSV file 6 00:00:17,000 --> 00:00:21,900 they all are separated by a comma, or separator is comma, 7 00:00:21,900 --> 00:00:28,400 or delimiter is comma. Separator or delimiter is same, right. 8 00:00:28,600 --> 00:00:33,900 Fine. Now why we are using this? So you are going to use this 9 00:00:33,900 --> 00:00:36,200 just to store tabular type of data, 10 00:00:37,600 --> 00:00:43,900 such as your Excel or database data. Of course, we can store 11 00:00:43,900 --> 00:00:49,100 our data as a tabular form in your Excel sheets as well. 12 00:00:49,900 --> 00:00:54,400 But compared to excel sheets, CSV is the simple way to store 13 00:00:54,400 --> 00:00:56,000 your tabular type of data. 14 00:00:57,700 --> 00:00:58,600 Let me do one thing. 15 00:00:58,600 --> 00:01:02,800 First of all, I am going to open a simple file. 16 00:01:03,200 --> 00:01:07,300 Let me open a file called, simply any text file 17 00:01:07,300 --> 00:01:08,300 I am going to open it. 18 00:01:10,000 --> 00:01:11,500 Right. Fine. See that. 19 00:01:12,700 --> 00:01:14,000 Let me open it first. 20 00:01:15,400 --> 00:01:23,100 See by default your CSV data is separated with comma. So 21 00:01:23,100 --> 00:01:26,800 basically CSV, you can assume that CSV is like your Excel sheet, 22 00:01:27,800 --> 00:01:31,800 or tabular data. So the tabular data is nothing but you have 23 00:01:31,800 --> 00:01:33,600 rows and columns. Your data, 24 00:01:33,700 --> 00:01:36,100 your data is like a rows and column. 25 00:01:36,100 --> 00:01:39,700 Let's say I have suppose, column as serial number, 26 00:01:40,900 --> 00:01:47,700 name, suppose salary, then skill, so I am going to store in 27 00:01:47,700 --> 00:01:49,400 this way. Suppose 28 00:01:49,400 --> 00:01:52,800 let's say 'John', 'Python', 29 00:01:53,600 --> 00:01:57,100 sorry. 'salary', right. Some $2000. 30 00:01:58,000 --> 00:02:01,500 Let's say 'Python'. Then I have 'Jason', 31 00:02:03,300 --> 00:02:04,900 then $1000, 32 00:02:07,500 --> 00:02:09,600 then I have suppose, Bash. 33 00:02:11,000 --> 00:02:14,700 Let me save it as, as a CSV file. 34 00:02:17,000 --> 00:02:19,900 Some 'info.csv', that's it. 35 00:02:20,500 --> 00:02:26,800 So I saved your file as '.csv' extension, as a CSV file. 36 00:02:27,400 --> 00:02:29,300 See now this is your CSV file. 37 00:02:29,800 --> 00:02:34,400 I can open this CSV file with any 'Notepad', right? 38 00:02:34,400 --> 00:02:35,100 See that. Your data 39 00:02:35,100 --> 00:02:40,500 is there as it is, or by default I can able to open that using 40 00:02:40,500 --> 00:02:42,600 your Excel sheets as well. 41 00:02:44,000 --> 00:02:47,100 See that, automatically your data is arranged in terms of 42 00:02:47,100 --> 00:02:48,300 rows and columns. 43 00:02:49,600 --> 00:02:51,400 In terms of rows and columns. 44 00:02:52,000 --> 00:02:57,000 So guys if you observe while storing your data, right, we 45 00:02:57,000 --> 00:03:03,100 stored our data by taking comma as the separator between your columns. 46 00:03:05,200 --> 00:03:08,700 Right. Not only that, you can also store your data by taking 47 00:03:08,800 --> 00:03:10,400 any other delimiter as well. 48 00:03:10,400 --> 00:03:15,900 Let's say, I am going to take again same data with a delimiter as 49 00:03:15,900 --> 00:03:20,900 something like '|' 'serial number', name', suppose 'salary', 50 00:03:21,500 --> 00:03:23,200 then I am going to take 'skill'. 51 00:03:24,000 --> 00:03:26,000 Let's say serial number - 1, 52 00:03:27,500 --> 00:03:35,100 'John', I'm taking salary suppose $2000, 'skill' - 53 00:03:35,400 --> 00:03:37,500 let's say Python. Same way 54 00:03:37,500 --> 00:03:39,500 I am going to store one more data. 55 00:03:39,700 --> 00:03:40,600 You can take anything. 56 00:03:40,900 --> 00:03:44,800 So here my intention is, in case if we take separator as other 57 00:03:44,800 --> 00:03:49,200 than comma how you can work with your CSV files, that's important. 58 00:03:50,500 --> 00:03:55,900 Let me now save this file as some 'new_info.csv'. 59 00:03:57,500 --> 00:04:00,700 Let me save it as '.csv' file. 60 00:04:00,900 --> 00:04:05,300 Let's say 'name' as 'new_info.csv'. 61 00:04:05,500 --> 00:04:06,500 That's it. 62 00:04:06,800 --> 00:04:14,700 Now previously you stored your data by taking separator as comma, 63 00:04:14,900 --> 00:04:18,899 but now I have taken separator as pipeline. By default 64 00:04:18,899 --> 00:04:22,899 if I open that, see that, now you are not getting your data 65 00:04:23,200 --> 00:04:24,899 in terms of columns. 66 00:04:25,600 --> 00:04:29,200 See that your Excel sheet is not able to identify your data 67 00:04:29,400 --> 00:04:30,600 as in terms of columns. 68 00:04:30,600 --> 00:04:34,900 It has given only one column. But you know the delimiter, right? 69 00:04:34,900 --> 00:04:38,000 What is the delimiter you used to create your CSV file? 70 00:04:38,300 --> 00:04:43,200 Just select your column, go to 'Data-> Text to Columns'. 71 00:04:43,200 --> 00:04:44,500 [no audio] 72 00:04:44,500 --> 00:04:48,800 delimiter, 'Delimited', right. Just select other, and provide 73 00:04:48,800 --> 00:04:54,200 your '|' as. See that. Now you're able to fetch your 74 00:04:54,200 --> 00:04:57,900 data, I mean you are able to convert your data into number 75 00:04:57,900 --> 00:05:00,500 of columns based on the separator 76 00:05:00,500 --> 00:05:02,800 you used for your data. 77 00:05:02,800 --> 00:05:05,800 [no audio] 78 00:05:05,800 --> 00:05:08,800 So this is also one type of encoding. 79 00:05:09,900 --> 00:05:12,200 See, if you know pipeline then only you can able to retrieve 80 00:05:12,200 --> 00:05:14,100 the data from your file. 81 00:05:15,000 --> 00:05:17,600 So if third person don't know what is the delimiter you 82 00:05:17,600 --> 00:05:21,600 used for your data, then they can't separate your data. 83 00:05:22,800 --> 00:05:27,300 Right. So it's like a one type of encoding. Of course, not that much 84 00:05:27,300 --> 00:05:29,600 good but up to certain limits 85 00:05:29,600 --> 00:05:30,700 it is good practice. 86 00:05:32,000 --> 00:05:36,500 That's fine. And guys if you store your data in CSV file, it is very easy 87 00:05:36,500 --> 00:05:40,600 to transfer your data to your database, or from database to 88 00:05:40,600 --> 00:05:43,700 if you want to download it is very easy to download into CSV files. 89 00:05:44,500 --> 00:05:45,600 That's why in real time 90 00:05:45,800 --> 00:05:49,300 most of the cases you will work with CSV files instead of 91 00:05:49,500 --> 00:05:50,600 excel sheets. 92 00:05:50,600 --> 00:05:52,500 [no audio] 93 00:05:52,500 --> 00:05:57,600 That's fine. Now guys assume that you have one CSV file. 94 00:05:58,300 --> 00:06:02,400 Now you want to read the data using python. Then how you can read it? 95 00:06:03,600 --> 00:06:08,900 Right. See let me save a script name called 'read_csv'. 96 00:06:10,500 --> 00:06:11,800 Anything you can take it. Now, 97 00:06:11,800 --> 00:06:12,900 I am going to take it, 98 00:06:13,900 --> 00:06:15,100 I'm going to store somewhere. 99 00:06:15,300 --> 00:06:16,500 You can store anywhere. 100 00:06:16,800 --> 00:06:18,400 Let's say I want to store here. 101 00:06:20,000 --> 00:06:27,500 'read_csv_file.py'. So guys if you want to read a CSV 102 00:06:27,500 --> 00:06:30,700 file, or if you want to write a data into your CSV file, in 103 00:06:30,700 --> 00:06:35,200 your python you have a default module called CSV'. First you need to 104 00:06:35,200 --> 00:06:38,500 import that. This is the default module. You can directly use 105 00:06:38,500 --> 00:06:44,700 that. Then let me take 'req_file' to read. So suppose that 106 00:06:44,700 --> 00:06:45,400 file is, suppose 107 00:06:45,400 --> 00:06:50,800 let's say 'info.csv', right. But guys whenever if you are 108 00:06:50,800 --> 00:06:54,600 working with files, it is a good practice to provide complete path. 109 00:06:54,600 --> 00:06:56,500 [no audio] 110 00:06:56,500 --> 00:07:00,300 Right. See, let me take the complete path for my file. 111 00:07:00,700 --> 00:07:03,200 So actually it is there in somewhere, right? 112 00:07:03,300 --> 00:07:06,400 So I'm going to provide complete path for my file. 113 00:07:07,900 --> 00:07:10,300 And guys you are working with Windows, 114 00:07:10,700 --> 00:07:14,400 that's why please, please provide '\\' in your path. 115 00:07:14,400 --> 00:07:18,300 [no audio] 116 00:07:18,300 --> 00:07:22,900 Yeah, fine. Now you have in your hand your CSV file. 117 00:07:22,900 --> 00:07:24,500 Next you need to read it. 118 00:07:25,300 --> 00:07:27,400 So guys forget about the CSV, 119 00:07:27,500 --> 00:07:30,600 if you want to read a text file, how we can read? Already we 120 00:07:30,600 --> 00:07:32,200 have seen our previous videos, right? 121 00:07:32,400 --> 00:07:35,800 Let's say I'm going to open your file in 'read' mode. Your file 122 00:07:35,800 --> 00:07:37,900 is there in the variable called 'req_file' 123 00:07:38,200 --> 00:07:41,100 I want to read the content, then open your file in 'read' mode. 124 00:07:41,100 --> 00:07:42,700 Finally just to close your file. 125 00:07:43,600 --> 00:07:44,700 Now I want to get the data 126 00:07:44,700 --> 00:07:47,400 what is there with your CSV file, 127 00:07:47,600 --> 00:07:51,700 I mean with your file, so 'content' or 'data' equals to simply 128 00:07:51,700 --> 00:07:54,000 'fo.', I am taking 'readlines'. 129 00:07:54,000 --> 00:07:56,000 [no audio] 130 00:07:56,000 --> 00:07:58,000 Then let me print your data. 131 00:07:58,500 --> 00:08:04,200 So I am going to take one by one line from your 'content'. 132 00:08:04,200 --> 00:08:06,800 Then, I want to print that. See the result. 133 00:08:06,800 --> 00:08:09,500 [no audio] 134 00:08:09,500 --> 00:08:11,700 Yes, you are getting your data line by line. 135 00:08:12,600 --> 00:08:16,000 Right. So, you know, if you want to remove that space, empty 136 00:08:16,000 --> 00:08:17,300 line, you can strip it. 137 00:08:17,900 --> 00:08:22,100 Yes, you are getting. But the thing is this is like a string. 138 00:08:22,200 --> 00:08:26,700 String means this entire data is like a one column but we 139 00:08:26,700 --> 00:08:28,800 know CSV is nothing but tabular data. 140 00:08:29,700 --> 00:08:33,799 So now you are not getting your data as a separate columns. 141 00:08:33,900 --> 00:08:35,000 Your entire line you're 142 00:08:35,000 --> 00:08:37,700 getting as a one line only. I should not get like that. 143 00:08:37,700 --> 00:08:38,700 Because it is a CSV, 144 00:08:38,700 --> 00:08:40,200 I need to get in terms of columns. 145 00:08:40,200 --> 00:08:42,000 [no audio] 146 00:08:42,000 --> 00:08:46,000 Right. Now what you have to do is first you have to read your, you 147 00:08:46,000 --> 00:08:49,500 have to open your file in 'read' mode, then whatever the data 148 00:08:49,500 --> 00:08:52,400 you have inside of your file that data 149 00:08:52,400 --> 00:08:54,500 you need to convert it into 150 00:08:54,500 --> 00:08:56,400 [no audio] 151 00:08:56,400 --> 00:08:57,600 rows and columns. 152 00:08:58,400 --> 00:09:00,600 Anyway you're getting rows, but you need to convert that 153 00:09:00,600 --> 00:09:01,800 data into columns. 154 00:09:02,200 --> 00:09:05,500 That's why what you have to do is, you need to parse, you need 155 00:09:05,500 --> 00:09:08,200 to send your data into CSV, 156 00:09:08,200 --> 00:09:11,900 so that CSV will convert your data in terms of columns. 157 00:09:13,300 --> 00:09:19,100 Right. So now see that what I'm doing is 'csv_reader', no need to take 158 00:09:19,100 --> 00:09:20,200 this much lengthy variable, 159 00:09:20,200 --> 00:09:26,000 simply you can take, 'csv', 'csv_reader', 'cr', or simply 160 00:09:27,300 --> 00:09:33,800 'csv', or, 'data', equals to, what you have to do is from CSV module, 161 00:09:34,400 --> 00:09:38,300 right, use a function called 'reader'. 162 00:09:39,100 --> 00:09:43,100 So from your file try to convert your rows, 163 00:09:43,100 --> 00:09:45,300 anyway file is nothing but, normal file is also you're going 164 00:09:45,300 --> 00:09:48,200 to get line by line. Line is nothing but a row in your CSV. 165 00:09:48,600 --> 00:09:54,000 But that line consists of, that line consists of columns. 166 00:09:54,400 --> 00:09:59,600 So CSV will automatically convert your data into columns, 167 00:10:00,100 --> 00:10:01,900 right. Now see the result. 168 00:10:01,900 --> 00:10:04,100 [no audio] 169 00:10:04,100 --> 00:10:08,400 So guys in your data you have all your rows of your CSV in 170 00:10:08,400 --> 00:10:09,500 terms of columns. 171 00:10:09,900 --> 00:10:12,600 Let me take, suppose if I directly print your 'data'. 172 00:10:13,900 --> 00:10:17,600 So that is the object, but inside of that object you have 173 00:10:17,600 --> 00:10:21,400 number of rows and each row consists of some columns. 174 00:10:21,400 --> 00:10:25,500 So if you want to see that, 'for each in data', let's say, 175 00:10:25,600 --> 00:10:27,700 I want to print that. Now see the result. 176 00:10:29,600 --> 00:10:30,600 What happened? 177 00:10:32,900 --> 00:10:33,900 'for each in data'. 178 00:10:33,900 --> 00:10:37,200 [no audio] 179 00:10:37,200 --> 00:10:39,700 oh, we close that, right? Yeah. 180 00:10:39,700 --> 00:10:44,300 [no audio] 181 00:10:44,300 --> 00:10:47,500 Let me close after this. 182 00:10:47,500 --> 00:10:51,600 [no audio] 183 00:10:51,600 --> 00:10:52,600 Now see that. 184 00:10:54,100 --> 00:10:55,700 You are able to get your data. 185 00:10:56,900 --> 00:11:01,000 See, each line has a list. List means number of values, separate 186 00:11:01,000 --> 00:11:02,500 values. That means this is one column, 187 00:11:02,500 --> 00:11:03,600 this is one column, this is 188 00:11:03,600 --> 00:11:05,600 one column, separate columns you are getting. 189 00:11:05,600 --> 00:11:08,600 [no audio] 190 00:11:08,600 --> 00:11:10,600 Right. Separate columns you are getting, 191 00:11:12,300 --> 00:11:14,400 or in case, in case if you 192 00:11:14,400 --> 00:11:18,300 don't want to take this, if you don't want to use your CSV, 193 00:11:19,300 --> 00:11:22,700 right, see what I am doing is let's say I want the content 194 00:11:22,700 --> 00:11:27,600 of my file, you know 'fo.readlines'. Without CSV module 195 00:11:27,600 --> 00:11:29,400 also you can convert your data, 196 00:11:29,400 --> 00:11:31,700 [no audio] 197 00:11:31,700 --> 00:11:35,600 into number of columns. See that. I am going to print, 198 00:11:37,200 --> 00:11:41,100 first of all your data line by line. Yes, you are getting. 199 00:11:41,700 --> 00:11:46,100 But you know, this is the a single row. In this do you have 200 00:11:46,100 --> 00:11:47,300 four columns. Now, 201 00:11:47,300 --> 00:11:50,400 if you observe here, this string somewhere you have comma. 202 00:11:50,500 --> 00:11:51,800 I can split it no, 203 00:11:51,800 --> 00:11:56,300 simply. Actually this operation is doing by your CSV module. 204 00:11:56,300 --> 00:12:00,600 That's it. Now see the result. The same data what you are getting from 205 00:12:00,600 --> 00:12:03,800 your CSV module same thing you can able to get it but no 206 00:12:03,800 --> 00:12:06,900 need to do this split operation by yourself, that will taken 207 00:12:06,900 --> 00:12:09,400 care by your CSV module. 208 00:12:09,500 --> 00:12:11,300 That's why what we are doing is, 209 00:12:11,300 --> 00:12:13,800 [no audio] 210 00:12:13,800 --> 00:12:15,300 right, see that. 211 00:12:15,300 --> 00:12:17,300 [no audio] 212 00:12:17,300 --> 00:12:22,300 Now 'content' is nothing but from your 'csv' module create a 213 00:12:22,300 --> 00:12:27,900 'reader' object, or convert your string, string of rows into 214 00:12:28,200 --> 00:12:29,600 some columns. 215 00:12:29,600 --> 00:12:32,500 So for that you can do in this way. That's it. 216 00:12:32,500 --> 00:12:37,200 Now 'for each in content'. Right. 217 00:12:37,200 --> 00:12:41,200 [no audio] 218 00:12:41,200 --> 00:12:43,700 Let me 'print(each)'. 219 00:12:43,700 --> 00:12:45,600 [no audio] 220 00:12:45,600 --> 00:12:47,600 Now see the result. Yeah, finally 221 00:12:47,600 --> 00:12:48,700 you need to close your file. 222 00:12:48,700 --> 00:12:49,800 Don't forget about that. 223 00:12:49,800 --> 00:12:52,300 'fo.close'. 224 00:12:52,300 --> 00:12:54,100 [no audio] 225 00:12:54,100 --> 00:12:55,100 That's it. 226 00:12:56,100 --> 00:13:00,900 Right. So guys actually CSV is splitting your data based 227 00:13:00,900 --> 00:13:04,000 on delimiter, but we are not providing anywhere delimiter, 228 00:13:04,000 --> 00:13:07,800 right. Then how your CSV module is splitting your line into 229 00:13:08,100 --> 00:13:11,600 a list which consists of your values as columns? 230 00:13:11,600 --> 00:13:13,400 [no audio] 231 00:13:13,400 --> 00:13:17,400 We know by default CSV will consider separator as comma. 232 00:13:17,500 --> 00:13:21,300 CSV is nothing but Comma Separated Values. But in case if 233 00:13:21,300 --> 00:13:24,300 you want to mention to your CSV, if you want to tell to your 234 00:13:24,300 --> 00:13:30,800 CSV, "Use this as a separator, delimiter". You can mention, 'delimiter' 235 00:13:30,900 --> 00:13:34,600 is nothing but ",". That's it. Now see the result. 236 00:13:34,600 --> 00:13:37,200 [no audio] 237 00:13:37,200 --> 00:13:42,100 Right. Suppose if I don't mention this, and anyway by default 238 00:13:42,100 --> 00:13:45,600 your delimiter is comma, by default with your CSV 239 00:13:45,600 --> 00:13:47,900 that's why CSV will assume that comma is the separator. 240 00:13:48,500 --> 00:13:50,000 Now we have one more file, right? 241 00:13:50,400 --> 00:13:54,300 If you observe this file, 'new_info', in that delimiter is not a comma. 242 00:13:54,700 --> 00:13:57,200 Let me try with that 'new_info'. 243 00:13:57,200 --> 00:14:00,400 [no audio] 244 00:14:00,400 --> 00:14:03,300 You're getting your data as one value on your list. 245 00:14:03,300 --> 00:14:05,100 That means again, you have one column. 246 00:14:05,300 --> 00:14:07,500 Actually these are four columns in each line, right? 247 00:14:08,500 --> 00:14:12,200 But you're getting as one column. Then in this case, you 248 00:14:12,200 --> 00:14:13,200 need a delimiter. 249 00:14:13,700 --> 00:14:18,000 So for better practice always try to write 'delimiter' as your required thing. 250 00:14:18,000 --> 00:14:20,600 I mean you just mention your 'delimiter' option as well while 251 00:14:20,600 --> 00:14:24,700 converting your data into rows and columns. 252 00:14:25,400 --> 00:14:26,400 That's it. 253 00:14:27,200 --> 00:14:29,500 Now it is not separated, comma as the separator, 254 00:14:29,500 --> 00:14:31,800 it has a pipeline as a separator. Now, let me run it. 255 00:14:33,300 --> 00:14:35,400 What happened? Sorry 256 00:14:35,400 --> 00:14:38,900 [no audio] 257 00:14:38,900 --> 00:14:39,900 That's it. 258 00:14:40,900 --> 00:14:46,800 'delimiter'. So likewise, whatever the delimiter you have in your CSV 259 00:14:46,800 --> 00:14:49,900 file, if you know that you can able to read your data using 260 00:14:49,900 --> 00:14:51,800 your Python very easily in this way. 261 00:14:51,800 --> 00:14:53,700 [no audio] 262 00:14:53,700 --> 00:14:55,300 Right. Fine. 263 00:14:55,300 --> 00:15:00,100 So this is the way how we can read your content from your 264 00:15:00,100 --> 00:15:01,100 CSV file. 265 00:15:01,400 --> 00:15:02,400 So guys nothing is there. 266 00:15:03,800 --> 00:15:09,100 Just open your CSV file in 'read' mode like a text file, after 267 00:15:09,100 --> 00:15:12,900 that you are going to convert your data into tabular data 268 00:15:12,900 --> 00:15:18,600 by using this operation from CSV '.reader', then provide 269 00:15:18,600 --> 00:15:21,000 your file object information, 270 00:15:21,100 --> 00:15:23,400 and what is the delimiter in your CSV file. 271 00:15:23,900 --> 00:15:26,800 Then you are going to get content. Then in that content 272 00:15:26,800 --> 00:15:29,300 you have all your rows and columns. Then you need to take 273 00:15:29,300 --> 00:15:30,300 one by one row. 274 00:15:31,000 --> 00:15:36,000 See, if it is a CSV file, right, it is having rows. Each row is having 275 00:15:36,000 --> 00:15:37,000 some columns. 276 00:15:37,500 --> 00:15:42,500 So all rows if I, I can take by using 'for' loop from your 'content', 277 00:15:42,500 --> 00:15:44,400 not only 'content', you can take any variable here. 278 00:15:46,000 --> 00:15:47,000 Right. 279 00:15:47,500 --> 00:15:50,200 So after reading that then you can close your file. 280 00:15:50,200 --> 00:15:52,800 [no audio] 281 00:15:52,800 --> 00:15:56,800 Okay. So guys this is a simple way to read your CSV files. 282 00:15:56,900 --> 00:15:57,900 While going forward we will 283 00:15:57,900 --> 00:16:01,900 see different ways to work with your CSV files. 284 00:16:02,200 --> 00:16:03,300 Okay. In next video 285 00:16:03,300 --> 00:16:05,900 we'll see how to write a content to your CSV file. 286 00:16:07,200 --> 00:16:08,200 Okay. 287 00:16:08,700 --> 00:16:11,000 Okay guys. Thank you for watching this video. 288 00:16:11,000 --> 00:16:23,400 [no audio]