1 00:00:00,000 --> 00:00:02,160 In this lesson, I'm going to show you how you can 2 00:00:02,160 --> 00:00:05,310 read some text from a file with Python and how to 3 00:00:05,340 --> 00:00:07,740 organize this text so you can use it in your 4 00:00:07,740 --> 00:00:11,130 program. So I have just created a new Python file 5 00:00:11,130 --> 00:00:14,940 here, read_write.py, for this section. And 6 00:00:14,940 --> 00:00:18,330 then in the same directory here, at the same 7 00:00:18,360 --> 00:00:21,870 level, I have created the file file_test, okay, this is 8 00:00:21,870 --> 00:00:25,170 just the name of the file, which contains a 9 00:00:25,170 --> 00:00:27,450 list of numbers, and actually, you can see we have 10 00:00:27,450 --> 00:00:30,360 just one number per line, okay. Those are integer 11 00:00:30,360 --> 00:00:33,510 numbers, both positive and negative. One thing you 12 00:00:33,510 --> 00:00:36,720 can see is that this file doesn't have any 13 00:00:36,750 --> 00:00:39,780 extension, okay. You can create a file without any 14 00:00:39,780 --> 00:00:42,240 extension, that's not a problem. On Windows, 15 00:00:42,240 --> 00:00:44,220 it may say that it doesn't 16 00:00:44,220 --> 00:00:47,490 recognize the file. But you actually don't need an 17 00:00:47,490 --> 00:00:50,400 extension to create a text file. You're going to 18 00:00:50,400 --> 00:00:53,550 be able to read it from Python without any 19 00:00:53,550 --> 00:00:56,190 problem. So you can create a file, put it in the 20 00:00:56,190 --> 00:00:58,830 same folder here, if you want to do that, you can 21 00:00:58,830 --> 00:01:01,980 just do New, File here, don't create a Python 22 00:01:01,980 --> 00:01:05,069 file, just create a new file, give it a name. And 23 00:01:05,069 --> 00:01:08,790 that's it, or do it from the File Manager. And now, how are 24 00:01:08,820 --> 00:01:11,940 we going to read that file from Python? 
So I'm 25 00:01:11,940 --> 00:01:15,450 first going to write the code structure, okay, and 26 00:01:15,450 --> 00:01:17,760 that's going to be the same every time. And then 27 00:01:17,760 --> 00:01:21,870 I'm going to explain it. So the with keyword, with, then the 28 00:01:21,930 --> 00:01:25,800 open function. The first parameter is the name of 29 00:01:25,800 --> 00:01:30,630 the file, the path. The second is "r" for reading, 30 00:01:30,660 --> 00:01:35,880 and then you have as f, colon, go to a new line, 31 00:01:35,910 --> 00:01:39,510 with indentation. And if you want, for example, to 32 00:01:39,510 --> 00:01:44,275 print the entire file, you can do print and then f.read(). 33 00:01:44,275 --> 00:01:47,730 So let's actually run that. And you can 34 00:01:47,730 --> 00:01:50,970 see we have the contents of the file here. Okay, 35 00:01:51,000 --> 00:01:53,670 all the numbers you see here are here. So that 36 00:01:53,670 --> 00:01:56,970 worked. And now let me explain that a bit more. So 37 00:01:56,970 --> 00:01:59,190 this structure, you are going to use it to read 38 00:01:59,190 --> 00:02:01,980 from any file, okay. You start with the with 39 00:02:01,980 --> 00:02:05,610 keyword, okay. And then the open function, okay, the 40 00:02:05,730 --> 00:02:09,270 thing with a file is that you need to first open it, 41 00:02:09,509 --> 00:02:11,880 and then you need to close the file. That's very 42 00:02:11,910 --> 00:02:14,250 important. Here, you can see we don't close the 43 00:02:14,250 --> 00:02:17,160 file, we don't call a close function. Why is that? 44 00:02:17,160 --> 00:02:19,770 Because the with keyword is going to take care of 45 00:02:19,770 --> 00:02:22,860 that. Okay. So inside the with block, you can 46 00:02:22,860 --> 00:02:25,020 be sure that you can do whatever you want with 47 00:02:25,050 --> 00:02:28,470 this file. 
And then when the code leaves the with block, or 48 00:02:28,470 --> 00:02:30,810 if you have an error, the with keyword will make 49 00:02:30,810 --> 00:02:33,720 sure that the file is closed, so you can open it 50 00:02:33,810 --> 00:02:37,020 again safely the next time. So the function 51 00:02:37,050 --> 00:02:39,450 open is going to open the file. You first need to 52 00:02:39,450 --> 00:02:42,540 provide the path. Okay, so this is the path, this 53 00:02:42,540 --> 00:02:44,730 is directly the name of the file because we are in 54 00:02:44,730 --> 00:02:48,150 the same folder. And then you need to provide, so 55 00:02:48,150 --> 00:02:50,610 between quotes, you need to provide the 56 00:02:50,640 --> 00:02:52,890 mode, what you want to do with the file. 57 00:02:53,100 --> 00:02:55,950 Here, we just want to read. So you're going to put 58 00:02:55,950 --> 00:02:59,310 the letter r, as simple as that. And then you have 59 00:02:59,400 --> 00:03:02,250 the as keyword, and then f, f is going to be the 60 00:03:02,250 --> 00:03:06,300 variable you can use inside the with block, and 61 00:03:06,300 --> 00:03:09,660 this actually represents the file that was opened 62 00:03:09,690 --> 00:03:13,770 here. So you write this and then you can do 63 00:03:13,770 --> 00:03:17,124 whatever you want with the file. And now we have f.read(), 64 00:03:17,124 --> 00:03:19,230 which is a method you can use on a 65 00:03:19,230 --> 00:03:23,340 file, which is simply going to read the file. So 66 00:03:23,340 --> 00:03:27,360 it reads everything, and we just print that. And 67 00:03:27,360 --> 00:03:30,780 now what if we want, for example, to read each line 68 00:03:30,810 --> 00:03:34,170 separately and not just get the whole text in one 69 00:03:34,170 --> 00:03:37,020 block? What you can do for that is 70 00:03:37,020 --> 00:03:40,860 for line in f. 
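The with structure described so far can be sketched like this. It is a minimal sketch and not the exact code on screen: so that it runs anywhere, it first writes a small stand-in for file_test into a temporary folder. The sample numbers and the temporary folder are assumptions made for illustration.

```python
import os
import tempfile

# Assumption: a small stand-in for file_test, one integer per line.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "file_test")
with open(path, "w") as f:
    f.write("12\n-7\n305\n")

# "r" opens the file for reading; f is the opened file inside the block.
with open(path, "r") as f:
    content = f.read()  # the whole file as one string

print(content)
print(f.closed)  # True: leaving the with block closed the file
```

Checking f.closed after the block is one way to see that the with keyword really did close the file without any explicit close call.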
So this is also a very 71 00:03:40,860 --> 00:03:44,250 common structure, for line in f, and let's do print 72 00:03:44,940 --> 00:03:48,810 line. So for line in f, so the file that you get 73 00:03:48,810 --> 00:03:52,410 here is going to give you all the lines of 74 00:03:52,410 --> 00:03:55,230 the file, one by one. So I can run this and you can see we 75 00:03:55,230 --> 00:03:59,730 still have the same numbers. But we also have a 76 00:03:59,760 --> 00:04:03,120 new line every time. Okay, we have an additional 77 00:04:03,150 --> 00:04:06,660 new line. Okay, so how to get rid of this? So this 78 00:04:06,660 --> 00:04:09,120 is very common when you use this structure. How to 79 00:04:09,120 --> 00:04:13,500 get rid of this? Well, line is a string, okay. And 80 00:04:13,530 --> 00:04:16,440 the thing is that every time you read a line 81 00:04:16,440 --> 00:04:20,100 with this structure, it still ends with a newline 82 00:04:20,100 --> 00:04:22,590 character, and print adds its own. So you need to remove that newline 83 00:04:22,590 --> 00:04:26,160 character for each line you read. And to do that, 84 00:04:26,160 --> 00:04:31,560 you simply do line.rstrip, okay. This is 85 00:04:31,560 --> 00:04:35,700 a method on strings, and you can pass it, in quotes, 86 00:04:35,700 --> 00:04:40,320 backslash n. So actually what is this backslash n? 87 00:04:40,350 --> 00:04:44,340 Okay, if I do a print, I'm 88 00:04:44,340 --> 00:04:50,430 going to do A backslash n B backslash n C. Okay, 89 00:04:50,520 --> 00:04:53,160 and I'm going to run that and you can see we have 90 00:04:53,190 --> 00:04:57,870 A, B, C each on a new line. Okay, the backslash n 91 00:04:57,870 --> 00:05:00,240 character is actually a special character. So 92 00:05:00,240 --> 00:05:03,150 when you do backslash plus a letter, you 93 00:05:03,180 --> 00:05:06,000 usually have a special character. And this one 94 00:05:06,000 --> 00:05:08,580 simply goes to a new line. 
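Here is a small sketch of the line-by-line structure and of what backslash n does. As an assumption for illustration, it writes a stand-in for file_test into a temporary folder with made-up numbers, so it can run on its own.

```python
import os
import tempfile

# Assumption: a small stand-in for file_test, one integer per line.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "file_test")
with open(path, "w") as f:
    f.write("12\n-7\n305\n")

cleaned = []
with open(path, "r") as f:
    for line in f:
        print(repr(line))                 # repr makes the trailing '\n' visible
        cleaned.append(line.rstrip("\n"))  # drop the newline at the end

print(cleaned)
print("A\nB\nC")  # A, B and C each end up on their own line
```

Using repr in the print is just a convenient way to see the newline character that each line still carries before rstrip removes it.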
So the problem 95 00:05:08,580 --> 00:05:11,160 with this structure is simply that you get a 96 00:05:11,520 --> 00:05:14,910 newline character for each line, and the rstrip 97 00:05:14,910 --> 00:05:17,790 is simply going to remove it from the end of the 98 00:05:17,790 --> 00:05:20,670 string. So if you have this string, for example, 99 00:05:20,940 --> 00:05:23,880 and you use that method, it's going to remove 100 00:05:24,180 --> 00:05:28,650 the trailing backslash n. Okay, so now I'm going to 101 00:05:28,650 --> 00:05:33,810 run it like this. And you can see, we have each 102 00:05:33,840 --> 00:05:37,890 line without any additional backslash n or 103 00:05:37,890 --> 00:05:40,680 newline character. Great. So now you can open the 104 00:05:40,680 --> 00:05:43,950 file and process each line separately. What I'm 105 00:05:43,950 --> 00:05:46,290 going to show you now is about this, I'm going to 106 00:05:46,290 --> 00:05:51,000 come back to this file name or file path. So you 107 00:05:51,000 --> 00:05:55,140 have two ways to provide the path here. First, the 108 00:05:55,170 --> 00:05:58,590 relative path, which is what we've done here. So 109 00:05:58,590 --> 00:06:02,190 relative simply means where this file is 110 00:06:02,280 --> 00:06:06,450 relative to that one; this is in the same folder, 111 00:06:06,450 --> 00:06:08,910 so just write the name of the file. If this is 112 00:06:08,910 --> 00:06:11,520 somewhere else, you need to change the path. Okay. 113 00:06:12,030 --> 00:06:15,060 So if you create a new folder, and you put this in 114 00:06:15,060 --> 00:06:17,130 the folder, you need to put the name of the folder 115 00:06:17,130 --> 00:06:20,610 first. Now, what you can also do is provide the 116 00:06:20,640 --> 00:06:25,020 absolute path. So I'm going to go here and right 117 00:06:25,020 --> 00:06:29,820 click, and do Copy Path. Okay. 
And you can see, so 118 00:06:29,820 --> 00:06:32,940 we have the relative path, which is that one, Path 119 00:06:32,970 --> 00:06:37,920 From Content Root, and we have the absolute path, 120 00:06:37,950 --> 00:06:41,520 which is actually where the file is exactly. Okay. 121 00:06:41,520 --> 00:06:47,007 So c:\Users\ my user name, and then PyCharmProjects\ 122 00:06:47,007 --> 00:06:50,070 my_first_project and the filename. So 123 00:06:50,070 --> 00:06:53,650 I'm going to copy this, and I'm going to paste it here in place of the file name. 124 00:06:54,630 --> 00:06:56,880 And using this, I'm going to be able to open the 125 00:06:56,880 --> 00:07:00,180 file the same way. But for Windows, we need to 126 00:07:00,180 --> 00:07:03,270 change something, okay, if you're using Linux or 127 00:07:03,270 --> 00:07:06,330 macOS, you don't need to change anything in the 128 00:07:06,360 --> 00:07:09,510 path because it's going to use forward slashes, 129 00:07:09,510 --> 00:07:12,690 okay, so a forward slash is simply like this, okay. 130 00:07:13,020 --> 00:07:16,110 But on Windows, you have backslashes. And as you 131 00:07:16,110 --> 00:07:18,510 have seen with the backslash n, when you use a 132 00:07:18,510 --> 00:07:21,060 backslash, usually it means that you have a 133 00:07:21,060 --> 00:07:24,570 special character. So if you 134 00:07:24,570 --> 00:07:27,030 want to use a backslash in a string 135 00:07:27,390 --> 00:07:30,750 without triggering a special character, you need 136 00:07:30,750 --> 00:07:34,140 to double each backslash. So that's what I'm going 137 00:07:34,140 --> 00:07:37,140 to do. Every time I see a backslash, I double it 138 00:07:37,440 --> 00:07:40,530 with another backslash. As you can see here, 139 00:07:40,530 --> 00:07:43,800 backslash f is recognized as a special character, which is going to 140 00:07:43,800 --> 00:07:47,130 do something. If I put another backslash here, it's 141 00:07:47,130 --> 00:07:51,150 simply a normal backslash, okay. 
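The doubling of backslashes can be checked with a short sketch. The path below is hypothetical and is only used as a string, no file is opened. The raw string with the r prefix is an alternative that is not shown in the lesson; it is included here only for comparison.

```python
# Hypothetical Windows path, used only as a string for illustration.
escaped = "C:\\Users\\me\\file_test"  # every backslash is doubled
raw = r"C:\Users\me\file_test"        # raw string: backslashes kept as typed

print(escaped)         # each doubled backslash is stored as a single one
print(escaped == raw)  # both spellings give the same string

# A single backslash followed by f is one special character (form feed),
# while a doubled backslash followed by f is two ordinary characters.
print(len("\f"), len("\\f"))
```

Either spelling gives open the same path; which one to use is mostly a matter of readability.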
And now that I have the 142 00:07:51,180 --> 00:07:54,510 absolute path, I'm going to run this and this is 143 00:07:54,510 --> 00:07:57,630 going to work the same. So you can either provide 144 00:07:57,690 --> 00:08:01,380 the absolute path or the relative path, depending 145 00:08:01,410 --> 00:08:04,710 on what is more convenient. So now what can you do 146 00:08:04,710 --> 00:08:07,290 with this? Well, you can do whatever you want. For 147 00:08:07,290 --> 00:08:10,650 example, you could create a number list, an 148 00:08:10,680 --> 00:08:14,610 empty number list, before the with structure. And 149 00:08:14,610 --> 00:08:17,130 then for each line, you're going to add the new 150 00:08:17,130 --> 00:08:20,760 line, or the new number, to the list. Okay, so make 151 00:08:20,760 --> 00:08:24,840 sure you also cast the number, so cast this to 152 00:08:24,870 --> 00:08:29,790 integer or float, so you can make them usable, okay, 153 00:08:29,790 --> 00:08:32,610 in your list when you want to make computations 154 00:08:32,640 --> 00:08:35,669 with numbers. All right, so now you can read from 155 00:08:35,669 --> 00:08:38,909 a file, and you have also seen how to organize the 156 00:08:38,909 --> 00:08:41,820 content you read so you can then process it in 157 00:08:41,820 --> 00:08:43,299 your Python code.
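The number list idea can be sketched as follows. The name numbers, the sample values, and the temporary stand-in for file_test are all assumptions made for illustration, so the example can run on its own.

```python
import os
import tempfile

# Assumption: a small stand-in for file_test, one integer per line.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "file_test")
with open(path, "w") as f:
    f.write("12\n-7\n305\n")

numbers = []  # empty list created before the with structure
with open(path, "r") as f:
    for line in f:
        # Remove the newline, then cast the text to an integer.
        numbers.append(int(line.rstrip("\n")))

print(numbers)       # [12, -7, 305]
print(sum(numbers))  # 310
```

Once the values are integers rather than strings, computations such as sum work directly on the list. For numbers with a decimal part, float would take the place of int.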