1 00:00:00,000 --> 00:00:01,200 [no audio] 2 00:00:01,200 --> 00:00:03,400 Friends here we are going to discuss about 3 00:00:03,400 --> 00:00:05,000 what are the rules to create 4 00:00:05,000 --> 00:00:08,300 a pattern, such that you can use that pattern in your Regular 5 00:00:08,300 --> 00:00:10,300 Expressions. See already 6 00:00:10,300 --> 00:00:14,400 we know simply what is the RegEx or Regular Expression. See 7 00:00:14,400 --> 00:00:17,900 the 'regex' is a procedure in any language to look for a specified 8 00:00:17,900 --> 00:00:19,500 pattern in a given text. 9 00:00:20,700 --> 00:00:24,300 Now if you want to work with the Regular Expressions using 10 00:00:24,300 --> 00:00:28,400 your Python, then you have to use a module called 're', 11 00:00:29,400 --> 00:00:30,800 and that is the default module. 12 00:00:31,200 --> 00:00:33,500 So if you want to work with 're' in your scripts you have 13 00:00:33,500 --> 00:00:37,800 to import that. Simply 'import re'. Now in this 're' module you 14 00:00:37,800 --> 00:00:39,800 are having different types of operations. 15 00:00:39,900 --> 00:00:43,200 Now, I am going to show you some of them like, search, match, 16 00:00:43,200 --> 00:00:48,900 finditer, findall, sub, split, compile, right, and 17 00:00:49,300 --> 00:00:50,300 see as of now 18 00:00:50,300 --> 00:00:53,600 we are at very beginning stage to work with your Regular 19 00:00:53,600 --> 00:00:57,400 Expressions, that's why I am going to show you simple syntaxes 20 00:00:57,400 --> 00:00:58,600 for your 'search'. 21 00:00:58,800 --> 00:01:02,900 'match', 'finditer', and 'findall'. See 22 00:01:03,600 --> 00:01:05,000 this is one Regular Expression, 23 00:01:05,700 --> 00:01:07,700 and here we are using 'search' operation. 
24 00:01:07,700 --> 00:01:09,400 Now the purpose of 'search' 25 00:01:09,400 --> 00:01:11,900 operation from your Regular Expression is different, same 26 00:01:11,900 --> 00:01:15,400 way the 'match' is different, 'finditer' is different, 'findall' is different. 27 00:01:16,600 --> 00:01:19,400 But whatever it may be all these operations are going to 28 00:01:19,400 --> 00:01:20,900 use something called a 'pattern'. 29 00:01:22,500 --> 00:01:25,700 Now before going to work with your Regular Expressions operations, 30 00:01:25,700 --> 00:01:28,200 first of all we should be good with 'pattern', 31 00:01:28,300 --> 00:01:30,300 I mean how to create a 'pattern', or 32 00:01:30,300 --> 00:01:32,000 what are the rules to create your 'pattern'. 33 00:01:33,100 --> 00:01:36,200 Right. Once you are good with pattern, then you can very 34 00:01:36,200 --> 00:01:40,300 easily use your required operation from your RegEx or 're' module. 35 00:01:41,700 --> 00:01:46,700 So first I'm showing you very simple examples for your pattern. 36 00:01:47,000 --> 00:01:48,800 And already we know that guys, what is a pattern? 37 00:01:50,200 --> 00:01:54,500 Pattern is simply a sequence of characters which represents 38 00:01:54,500 --> 00:01:58,700 multiple strings. Now see that I am taking one of the patterns 39 00:01:58,700 --> 00:02:01,000 called "Python" inside of a quotation. 40 00:02:01,400 --> 00:02:05,700 First of all this is a string, your pattern, but this is 41 00:02:05,700 --> 00:02:08,500 normally a string, but whenever you use this string 42 00:02:08,600 --> 00:02:12,000 inside of your Regular Expressions, then it is going to become a pattern. 43 00:02:12,900 --> 00:02:15,000 Now generally, I am saying that pattern is going to represent 44 00:02:15,000 --> 00:02:16,000 multiple strings. 45 00:02:16,600 --> 00:02:20,200 But of course sometimes your pattern is also going to represent 46 00:02:20,300 --> 00:02:22,300 only one string, no problem. 
47 00:02:22,700 --> 00:02:25,600 But if you use your string inside of Regular Expressions, 48 00:02:25,600 --> 00:02:29,400 then this is going to become a pattern. Now this pattern is 49 00:02:29,400 --> 00:02:30,900 going to represent only one string, 50 00:02:31,000 --> 00:02:34,900 but if I take this pattern "python", in square brackets 2, 51 00:02:34,900 --> 00:02:39,100 and 3, '[23]'. First of all, this is a string. Normally this is 52 00:02:39,100 --> 00:02:42,700 a string, but if you use this string inside of your Regular Expressions, 53 00:02:42,700 --> 00:02:45,000 now it is going to become a pattern. Now if it is a 54 00:02:45,000 --> 00:02:49,300 pattern it is going to represent two strings, "Python2", and "Python3". 55 00:02:50,500 --> 00:02:55,000 That's it. Now here we have to know how the square brackets are going 56 00:02:55,000 --> 00:02:57,800 to work here. See based on square brackets 57 00:02:57,800 --> 00:03:00,500 now this entire string is going to represent two strings. 58 00:03:01,800 --> 00:03:03,100 That's why this is called a pattern. 59 00:03:03,800 --> 00:03:06,900 Now we have to learn this type of rules to create a pattern. 60 00:03:07,400 --> 00:03:11,100 Anyway, we will see them one by one, step by step, and guys 61 00:03:11,100 --> 00:03:12,900 sometimes for your pattern 62 00:03:13,000 --> 00:03:17,000 you can also use 'r', that is called 'raw string', but there 63 00:03:17,000 --> 00:03:18,300 is some purpose for that. 64 00:03:18,300 --> 00:03:21,600 We'll discuss that in your advanced level of your Regular 65 00:03:21,600 --> 00:03:27,000 Expressions. 
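As a small illustration of the idea above, that one pattern can stand for more than one string, here is a minimal sketch. The `sample` text is a hypothetical sentence made up for this example; it does not appear in the video.

```python
import re

# Hypothetical sample text, chosen only for illustration.
sample = "Python2 and Python3 are covered, Python4 is not"

# "Python[23]" stands for two strings: "Python2" and "Python3".
matches = re.findall("Python[23]", sample)
print(matches)  # ['Python2', 'Python3']
```

"Python4" is not returned, because 4 is not one of the characters listed inside the square brackets.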
Now guys, just to get an idea on how to create 66 00:03:27,000 --> 00:03:30,700 a pattern or to understand the rules to create your pattern 67 00:03:30,900 --> 00:03:33,700 I am going to take one of the operations from your Regular 68 00:03:33,700 --> 00:03:36,800 Expressions that is called 'findall', because compared to all 69 00:03:37,000 --> 00:03:38,400 this 'findall' is very easy. 70 00:03:39,100 --> 00:03:41,800 So I'm going to take this operation just to give you 71 00:03:41,800 --> 00:03:46,700 clarity on to create rules for your patterns. Then later 72 00:03:46,700 --> 00:03:48,100 we will work with all operations. 73 00:03:48,600 --> 00:03:51,000 See once you are good with pattern, then we can work with 74 00:03:51,000 --> 00:03:53,400 any operation from your Regular Expressions. 75 00:03:53,400 --> 00:03:57,100 [no audio] 76 00:03:57,100 --> 00:03:58,200 Fine. Now see that. 77 00:03:58,200 --> 00:04:00,100 [no audio] 78 00:04:00,100 --> 00:04:03,500 'findall', this is one of the operations from your Regular Expression. 79 00:04:03,900 --> 00:04:07,700 Now how to use that 'findall'? Simply you have to import your 80 00:04:07,700 --> 00:04:11,100 're' module inside of your scripts on your command 81 00:04:11,100 --> 00:04:16,300 line. Then 're.findall(pattern,text)'. 82 00:04:16,600 --> 00:04:19,200 So this is your given text, in this text you are going to 83 00:04:19,200 --> 00:04:23,200 search for something, search for some string. That we are going 84 00:04:23,200 --> 00:04:26,899 to use, that we are going to find by using 'findall' operation 85 00:04:26,899 --> 00:04:27,899 from your 're'. 86 00:04:28,200 --> 00:04:30,000 So finally whatever you are going to get, that 87 00:04:30,000 --> 00:04:31,000 I'm printing. That's it. 88 00:04:31,000 --> 00:04:33,000 [no audio] 89 00:04:33,000 --> 00:04:35,400 Right. See that. 
Just to give simple idea 90 00:04:35,500 --> 00:04:37,100 I am going to write a simple 91 00:04:37,100 --> 00:04:39,900 [no audio] 92 00:04:39,900 --> 00:04:41,900 Python script. Let me save this. 93 00:04:42,000 --> 00:04:44,900 Of course, you can practice this on your command line as well. 94 00:04:46,200 --> 00:04:53,500 So 'practice_for_regex.py'. See first of all we are going 95 00:04:53,500 --> 00:04:56,300 to work with Regular Expressions in your Python. 96 00:04:56,300 --> 00:05:00,300 So we need to 'import re', and make sure that this is a default 97 00:05:00,300 --> 00:05:03,700 module. You don't need to install your 're' module, by default it 98 00:05:03,700 --> 00:05:04,800 is there with your Python. 99 00:05:05,600 --> 00:05:09,700 Now, first of all let me take 'my_str', or let me say simply 100 00:05:09,700 --> 00:05:11,200 'text'. You can take anything, 101 00:05:11,200 --> 00:05:12,200 this is just a variable. 102 00:05:12,600 --> 00:05:13,700 So, "This is a 103 00:05:13,700 --> 00:05:16,600 [no audio] 104 00:05:16,600 --> 00:05:21,900 python and it is easy to learn". That's it. 105 00:05:22,100 --> 00:05:23,100 This is my string. 106 00:05:23,400 --> 00:05:26,700 Now, my intention is, I want to search 107 00:05:28,300 --> 00:05:30,200 a string called "is" in 108 00:05:30,200 --> 00:05:31,200 a given text. 109 00:05:32,900 --> 00:05:35,800 See that. What I am doing is, simply I'm going to create 110 00:05:35,800 --> 00:05:37,300 'my_pat'. Pattern is nothing 111 00:05:37,300 --> 00:05:39,500 but, what is your required string. 112 00:05:40,400 --> 00:05:42,500 What is the required string you want to search in a given 113 00:05:42,500 --> 00:05:44,100 text, that is a pattern. 114 00:05:44,200 --> 00:05:45,900 So I want to search only "is". 115 00:05:47,400 --> 00:05:53,700 Now, let me simply 'print(re.findall())', your pattern in 116 00:05:53,700 --> 00:05:54,700 a given text. 117 00:05:55,300 --> 00:05:57,200 So guys these are the variables. 
So directly 118 00:05:57,200 --> 00:05:59,800 if you want to write your pattern here, and your text here 119 00:05:59,800 --> 00:06:01,700 you can write it. I will show you that as well. 120 00:06:02,200 --> 00:06:06,500 Now, let me run this and see the output. See in your entire text 121 00:06:06,700 --> 00:06:08,300 in how many places "is" there? 122 00:06:08,500 --> 00:06:10,500 See, guys be clear. 123 00:06:10,500 --> 00:06:15,500 We are not going to work with word. Word means, before your 124 00:06:15,600 --> 00:06:16,800 starting letter of a word 125 00:06:16,800 --> 00:06:19,400 you should have some space, after ending letter of your word 126 00:06:19,400 --> 00:06:21,800 you should have some space, then that is a word. But as of 127 00:06:21,800 --> 00:06:24,700 now you're not going to work with a word, you are going to 128 00:06:24,700 --> 00:06:27,100 work with a string. See 129 00:06:27,100 --> 00:06:30,800 "is" is there with the combination of this word, but we are looking 130 00:06:30,800 --> 00:06:32,600 just simply "is" string. 131 00:06:33,300 --> 00:06:38,100 So here "is" is there, here "is" is there, here "is" is there, so 132 00:06:38,100 --> 00:06:44,000 all those "is" you're getting. See what I am saying is, yeah 133 00:06:45,400 --> 00:06:48,700 while working with your Regular Expressions, directly 134 00:06:48,700 --> 00:06:53,100 you can write your pattern here and your text here. See 135 00:06:53,100 --> 00:06:54,300 I am going to write this thing. 136 00:06:55,400 --> 00:06:58,100 But instead of writing, just I stored in some variables, and 137 00:06:58,100 --> 00:06:59,400 now I am working with variables. 138 00:06:59,500 --> 00:07:01,400 There is no change. Right. 139 00:07:01,400 --> 00:07:04,800 Anyway, this is not a good practice. First store your text into 140 00:07:05,100 --> 00:07:07,200 some variable and your pattern into some 141 00:07:07,200 --> 00:07:08,700 variable then use that variable. 
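The practice script described above can be sketched roughly as follows. The variable names `text` and `my_pat` follow the ones spoken in the video; the exact capitalization of the sentence is an assumption.

```python
import re

# The text to search in (capitalization assumed).
text = "This is a python and it is easy to learn"

# The pattern: here simply the string we are looking for.
my_pat = "is"

# findall returns every non-overlapping match as a list of strings.
result = re.findall(my_pat, text)
print(result)  # ['is', 'is', 'is']
```

The first "is" comes from inside "This", which shows that the search works on strings, not on whole words.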
142 00:07:08,700 --> 00:07:10,800 [no audio] 143 00:07:10,800 --> 00:07:14,300 Fine. Now how it is going to work? 144 00:07:15,400 --> 00:07:17,800 Very simple. What is your pattern? "is". 145 00:07:18,600 --> 00:07:22,800 So a two-letter string you're searching. What your Python will 146 00:07:22,800 --> 00:07:27,800 do, from your given text it will take first two characters. Then 147 00:07:27,800 --> 00:07:31,800 are these two characters equal to the two characters of the pattern? 148 00:07:31,800 --> 00:07:36,500 No. That's why it will skip that part. Then it will take second 149 00:07:37,800 --> 00:07:40,600 character and third character, 150 00:07:40,700 --> 00:07:42,600 "h-i". Are they equal to "is"? 151 00:07:42,800 --> 00:07:46,100 No. Then it is going to skip. Then third and fourth it will 152 00:07:46,100 --> 00:07:47,900 take. Yes, "is" and "is". Now 153 00:07:47,900 --> 00:07:50,600 it is going to match, and you are going to get it. So likewise 154 00:07:50,600 --> 00:07:53,900 it is going to compare in your entire text, wherever there 155 00:07:53,900 --> 00:07:57,800 is a match for "is" those strings you are going to get. 156 00:07:58,500 --> 00:08:02,000 Be clear. Not a word, just simply you're going to match for a string. 157 00:08:02,000 --> 00:08:03,700 Even a word is also a string. 158 00:08:04,800 --> 00:08:08,700 See, here "is" is a word but here "is" is not a word. That is a combination 159 00:08:08,700 --> 00:08:11,300 with some other word. But we are not looking for a word. 160 00:08:11,300 --> 00:08:14,000 We are looking for a string. That's it. 161 00:08:15,300 --> 00:08:17,500 Now, can you tell me, very simple logic here, 162 00:08:17,500 --> 00:08:20,000 [no audio] 163 00:08:20,000 --> 00:08:23,500 that is, how many times "is" is there in a given text? 164 00:08:23,900 --> 00:08:28,600 How many times "is" is there in a given text? 
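The step-by-step comparison described above can be imitated with a simple loop. This is only a simplified sketch for a plain literal pattern; the helper name `naive_findall` is hypothetical, and the real `re` engine is implemented differently.

```python
import re

def naive_findall(pat, text):
    """Slide over the text and collect every non-overlapping copy of pat."""
    if not pat:
        raise ValueError("pattern must not be empty in this sketch")
    matches = []
    i = 0
    n = len(pat)
    while i + n <= len(text):
        if text[i:i + n] == pat:
            # Match: keep it and continue after the matched part.
            matches.append(text[i:i + n])
            i += n
        else:
            # No match: move one character to the right and try again.
            i += 1
    return matches

text = "This is a python and it is easy to learn"
print(naive_findall("is", text))
print(naive_findall("is", text) == re.findall("is", text))
```

For this text, "Th" and "hi" are rejected first, and the third and fourth characters give the first match.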
See if you observe, 165 00:08:28,600 --> 00:08:32,299 the operation result, 'findall' operation result is a list 166 00:08:32,299 --> 00:08:37,299 no. We know the length we can find for a list by using 'len()' 167 00:08:37,299 --> 00:08:39,200 operation so that you can conclude that 168 00:08:39,200 --> 00:08:42,400 there are three "is" strings are there in a given text. 169 00:08:43,000 --> 00:08:44,400 It's very, very important one. 170 00:08:44,400 --> 00:08:47,000 [no audio] 171 00:08:47,000 --> 00:08:48,000 That's fine. 172 00:08:48,000 --> 00:08:51,600 [no audio] 173 00:08:51,600 --> 00:08:56,400 So first let me write once again, 'print(re.findall())', 174 00:08:56,799 --> 00:08:59,000 your pattern, given text. 175 00:08:59,000 --> 00:09:00,800 [no audio] 176 00:09:00,800 --> 00:09:06,200 Fine. Now my intention is, actually, the basic purpose of your Regular 177 00:09:06,200 --> 00:09:07,700 Expression is at a time 178 00:09:07,800 --> 00:09:10,900 if you want to search multiple strings, then we are going 179 00:09:10,900 --> 00:09:12,300 to work with Regular Expressions. 180 00:09:13,500 --> 00:09:15,300 Now I want to print 181 00:09:17,000 --> 00:09:21,800 from my given text wherever "is" is there, I mean "is" string 182 00:09:21,800 --> 00:09:22,800 and "it" string, 183 00:09:24,400 --> 00:09:26,400 all those things I want to print. 184 00:09:27,400 --> 00:09:29,000 See my intention, be clear, 185 00:09:29,500 --> 00:09:34,200 I want to search for "is" and "it" in a given 186 00:09:35,500 --> 00:09:39,400 text. So if "is" is there it has to print, if "it" is there it has 187 00:09:39,400 --> 00:09:42,800 to print. Suppose if "is" is there multiple times, all times 188 00:09:42,800 --> 00:09:46,500 it has to print. If "it" is there multiple times, all "its" it has to print. 
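Since `findall` returns a list, counting the matches is just a `len()` call, as mentioned above. A minimal sketch, assuming the same text as in the practice script:

```python
import re

text = "This is a python and it is easy to learn"

matches = re.findall("is", text)

# The number of matches is the length of the returned list.
count = len(matches)
print(count)  # 3
```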
189 00:09:48,000 --> 00:09:50,200 But now you have to understand that 190 00:09:51,400 --> 00:09:54,100 this, here you are having two strings, 191 00:09:54,300 --> 00:09:57,700 but from this you need to make a pattern 192 00:09:58,900 --> 00:10:01,800 such that that pattern has to represent these two strings, 193 00:10:02,000 --> 00:10:06,200 so that by using that pattern you can search in a given string 194 00:10:06,200 --> 00:10:07,900 for these two strings. 195 00:10:09,300 --> 00:10:11,900 See, if you observe first character is common 196 00:10:11,900 --> 00:10:15,700 no. That's why I can take simply "i". Then second character 197 00:10:15,700 --> 00:10:20,100 it maybe "s" or "t". So whenever there is a possibility of this 198 00:10:20,100 --> 00:10:23,400 one or that one then just simply write square brackets, inside 199 00:10:23,400 --> 00:10:24,600 of that just write 200 00:10:24,600 --> 00:10:27,600 "st" or "ts", no problem, both are same. 201 00:10:28,900 --> 00:10:32,000 Right. Now see that. What I am doing is, instead of "is", 202 00:10:33,200 --> 00:10:36,800 first character is "is", sorry "i", but second character 203 00:10:36,800 --> 00:10:38,200 maybe "s" or maybe "t". 204 00:10:39,000 --> 00:10:42,400 Now see the result - "is", "is", "it", "is". 205 00:10:42,900 --> 00:10:45,600 Yes, you are getting. But you don't need to follow order here. 206 00:10:45,600 --> 00:10:47,800 I mean first "s", second "t". No, not like that. 207 00:10:47,800 --> 00:10:50,000 You can also write "ts" also, no problem. See that. 208 00:10:51,500 --> 00:10:55,400 Now you are looking for a string which consists of 209 00:10:55,400 --> 00:10:57,200 two characters first of all, 210 00:10:58,400 --> 00:11:01,100 and that string, two character string 211 00:11:01,100 --> 00:11:05,600 maybe "is" or maybe "it". 
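The "is" or "it" search above can be sketched like this, assuming the same text as before. The order of the characters inside the square brackets makes no difference.

```python
import re

text = "This is a python and it is easy to learn"

# First character must be "i"; second may be "s" or "t".
result_st = re.findall("i[st]", text)
result_ts = re.findall("i[ts]", text)

print(result_st)  # ['is', 'is', 'it', 'is']
print(result_st == result_ts)  # True
```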
Now you are going to search for two 212 00:11:05,600 --> 00:11:10,000 strings with length of two, that means two characters in each 213 00:11:10,000 --> 00:11:15,300 string. Those strings are "it" and "is" in a given text. Now if 214 00:11:15,300 --> 00:11:18,500 they are in a given text then your 'findall' is going to find 215 00:11:18,500 --> 00:11:20,400 them. Let's say suppose 216 00:11:20,400 --> 00:11:21,400 I am trying to find, 217 00:11:23,000 --> 00:11:25,700 let me comment this, I want to find like 218 00:11:27,100 --> 00:11:29,000 'my_pat="x"'. 219 00:11:29,400 --> 00:11:31,000 Do you have "x" in your given text? 220 00:11:31,000 --> 00:11:32,400 No, that's why you're getting empty, '[]'. 221 00:11:32,400 --> 00:11:34,500 [no audio] 222 00:11:34,500 --> 00:11:38,400 Right. Do you have "a"? Yes. How many times "a" is there, that many 223 00:11:38,400 --> 00:11:40,800 number of times you are getting. Here "a" 224 00:11:40,800 --> 00:11:43,400 is there. Be clear. Very, very important. Here "a" 225 00:11:43,400 --> 00:11:46,700 is there, here "a" is there, here "a" is there. 226 00:11:48,200 --> 00:11:52,600 So wherever you have "a" all that "a" you are getting here. 227 00:11:54,100 --> 00:11:58,700 So if you go with advanced operations like 'search', 'findall', 228 00:11:58,800 --> 00:11:59,800 sorry, 'finditer', 229 00:12:00,500 --> 00:12:03,900 then you are going to get the index position as well. 230 00:12:04,000 --> 00:12:08,000 What is the index of first matching character, first 231 00:12:08,000 --> 00:12:10,100 matching string, and second matching string. 232 00:12:10,400 --> 00:12:12,200 We will see that in advanced level. 233 00:12:12,600 --> 00:12:15,300 So for time being we have to understand how to create a pattern. 234 00:12:16,500 --> 00:12:20,000 Now here I am going to show you the simple rules 235 00:12:21,000 --> 00:12:25,800 to create your pattern but this is just first level rules, 236 00:12:26,300 --> 00:12:27,400 basic rules. 
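A short sketch of the two cases above, a pattern that is not in the text and one that occurs several times. The last part uses `re.finditer` only as a preview of the index positions mentioned here; it is covered properly later in the course.

```python
import re

text = "This is a python and it is easy to learn"

# No "x" anywhere in the text, so the result is an empty list.
no_match = re.findall("x", text)
print(no_match)  # []

# Every "a" in the text is returned, one entry per occurrence.
all_a = re.findall("a", text)
print(all_a)  # ['a', 'a', 'a', 'a']

# Preview: finditer yields match objects that also carry positions.
starts = [m.start() for m in re.finditer("is", text)]
print(starts)  # [2, 5, 24]
```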
237 00:12:27,400 --> 00:12:31,800 So we will see advanced level rules in next video. Fine. 238 00:12:31,800 --> 00:12:34,400 The very first thing is, suppose 239 00:12:34,400 --> 00:12:36,900 I want to match "a" with "a", then directly you have to write 240 00:12:36,900 --> 00:12:38,400 "a", there is no other option. 241 00:12:38,500 --> 00:12:40,700 I mean, suppose here I have written "a", right? 242 00:12:41,000 --> 00:12:43,400 So you are going to match "a" with "a", then you are getting. 243 00:12:43,800 --> 00:12:47,300 Suppose I want to match with some number, say the number 244 00:12:47,300 --> 00:12:51,900 9, that too only 9; if that number 9 is there in a given string, then, given 245 00:12:51,900 --> 00:12:53,000 string means text guys. 246 00:12:53,400 --> 00:12:57,200 In this 'text' variable if you have a 9, then Python 247 00:12:57,200 --> 00:13:00,200 will find that. But as of now, you don't have any 9 248 00:13:00,200 --> 00:13:02,200 in a given text, that's why it is not finding. 249 00:13:03,400 --> 00:13:06,200 So simple rule is, if you want to match one particular 250 00:13:06,200 --> 00:13:08,800 character with that particular character, then you 251 00:13:08,800 --> 00:13:11,400 need to directly write your pattern as that character only. 252 00:13:11,800 --> 00:13:12,900 Let's say somewhere 253 00:13:12,900 --> 00:13:14,100 I'm having "@". 254 00:13:15,200 --> 00:13:18,100 Now I want to search how many times "@" 255 00:13:18,100 --> 00:13:19,400 is there in a given text. 256 00:13:20,600 --> 00:13:22,100 See that. One time. 257 00:13:22,300 --> 00:13:23,400 Let me write somewhere 258 00:13:23,800 --> 00:13:26,200 once again this "@". Now 259 00:13:26,300 --> 00:13:29,500 let me rerun your code and see the result. In two places you 260 00:13:29,500 --> 00:13:31,800 have "@", now those two "@" you are getting. 261 00:13:31,800 --> 00:13:34,100 [no audio] 262 00:13:34,100 --> 00:13:38,700 Fine. 
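The first rule above, a plain character matches only itself, can be sketched as follows. The video does not show where exactly the two "@" characters were typed, so the text used here, with "@" in two places, is an assumption.

```python
import re

# Assumed text: the original sentence with "@" added in two places.
text = "This is a python @ and it is easy to learn @"

# There is no 9 in the text, so nothing is found.
nines = re.findall("9", text)
print(nines)  # []

# "@" matches only "@"; it occurs twice.
ats = re.findall("@", text)
print(ats)  # ['@', '@']
```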
Now if you want to match any character with particular 263 00:13:38,700 --> 00:13:42,900 character then you have to write that character itself. 264 00:13:43,600 --> 00:13:49,300 That is, anyway, we know that generally. Second rule, see just 265 00:13:49,300 --> 00:13:55,000 now we have seen that if I write something like '[ts]' or '[st]', 266 00:13:55,800 --> 00:14:00,800 what is the meaning for this? It is going to look for either 267 00:14:00,800 --> 00:14:07,000 "t" or for "s". But same way if I write '[abc]', now it 268 00:14:07,000 --> 00:14:11,200 is going to look for either "a" or "b" or "c" in a given text. 269 00:14:11,200 --> 00:14:13,600 Let me show you that so that you can understand. 270 00:14:14,200 --> 00:14:16,400 Suppose I am writing as of now 271 00:14:16,400 --> 00:14:17,700 'my_pat="@"'. 272 00:14:18,100 --> 00:14:22,400 Now your 'findall' is going to match or look in a given text only for "@". 273 00:14:23,000 --> 00:14:27,200 Now, what I am doing is, if I write something like "@a", 274 00:14:27,200 --> 00:14:31,000 what is the meaning for this? You are going to look for 275 00:14:31,000 --> 00:14:35,200 a string which consists of "@", and "a" immediately, side by side 276 00:14:35,200 --> 00:14:38,400 "@", and "a". Do you have that anywhere in a given text? No, nowhere 277 00:14:38,400 --> 00:14:41,700 we have "@" and "a" immediately in a given text. Now see 278 00:14:41,700 --> 00:14:47,200 the result, '[]'. But I want to look either for "@", 279 00:14:47,400 --> 00:14:49,400 or for "a", or for "s", 280 00:14:49,900 --> 00:14:53,500 you can write any number of characters inside of square brackets. 281 00:14:53,500 --> 00:14:57,800 Now the meaning is, assume that your Python is going to search 282 00:14:57,800 --> 00:15:03,200 for "@", also for "a", and also for "s". Now see the result. 283 00:15:04,600 --> 00:15:06,800 How many times "s" is there? That many times you're going to 284 00:15:06,800 --> 00:15:09,700 get "s". 
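The difference between "@a" and "[@as]" can be seen in a small sketch. The text below, with a single "@" followed by a space, is an assumption; the video's exact text is not shown.

```python
import re

# Assumed text with one "@" that is not directly followed by "a".
text = "This is a python @ and it is easy to learn"

# "@a" means "@" immediately followed by "a": not present here.
side_by_side = re.findall("@a", text)
print(side_by_side)  # []

# "[@as]" means one character that is "@" or "a" or "s".
any_of = re.findall("[@as]", text)
print(any_of)  # ['s', 's', 'a', '@', 'a', 's', 'a', 's', 'a']
```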
How many times "a" is there? That many times you are 285 00:15:09,700 --> 00:15:12,100 getting. How many times "@" is there? You are getting. 286 00:15:13,800 --> 00:15:17,500 Now, first of all you're looking for only one character string 287 00:15:17,500 --> 00:15:21,100 in a given string. That character maybe "@", maybe 288 00:15:21,100 --> 00:15:22,100 "a", or maybe "s". 289 00:15:22,400 --> 00:15:24,700 So all these three are there some number of times, you're 290 00:15:24,700 --> 00:15:27,300 going to get all of them that many number of times. 291 00:15:27,300 --> 00:15:29,900 [no audio] 292 00:15:29,900 --> 00:15:33,100 Right. Fine. Now see here. 293 00:15:33,100 --> 00:15:36,200 [no audio] 294 00:15:36,200 --> 00:15:40,600 So sometimes I want to, just assume that I want to look for 295 00:15:40,900 --> 00:15:44,800 either a or b or c or d or e or f. 296 00:15:44,800 --> 00:15:46,600 [no audio] 297 00:15:46,600 --> 00:15:51,100 So what is the meaning of, if you write any characters 298 00:15:51,100 --> 00:15:53,900 inside of a square bracket means, first of all this entire 299 00:15:53,900 --> 00:15:55,300 thing is one character. 300 00:15:56,400 --> 00:16:01,500 But that character maybe "a" or maybe "b" or maybe "c", "d", likewise. 301 00:16:03,000 --> 00:16:05,500 So I can write in this way if I want to look for either "a" 302 00:16:05,500 --> 00:16:11,100 or "b" or "c" or "d" or "e" or "f", but all these are sequence characters, 303 00:16:11,100 --> 00:16:12,900 right, "a" to "f". 304 00:16:13,300 --> 00:16:14,700 So instead of writing in this way, 305 00:16:14,700 --> 00:16:18,400 there is a shortcut for this. You can also write simply "a-f". 306 00:16:19,500 --> 00:16:22,400 Now the meaning for this is you are going to look for one 307 00:16:22,400 --> 00:16:27,100 character string but that character maybe "a" or maybe "b", or 308 00:16:27,100 --> 00:16:28,300 up to "f". 309 00:16:28,500 --> 00:16:31,600 That's it. 
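A minimal sketch of the range shortcut just described, assuming the same text as in the practice script. It checks that "[a-f]" and "[abcdef]" give the same result.

```python
import re

text = "This is a python and it is easy to learn"

# Writing out every character explicitly.
long_form = re.findall("[abcdef]", text)

# The same set written as a range.
short_form = re.findall("[a-f]", text)

print(short_form)  # ['a', 'a', 'd', 'e', 'a', 'e', 'a']
print(long_form == short_form)  # True
```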
Now assume that 310 00:16:32,900 --> 00:16:33,900 something like this, 311 00:16:34,000 --> 00:16:36,200 I want to search a string, 312 00:16:36,200 --> 00:16:39,200 [no audio] 313 00:16:39,200 --> 00:16:52,000 let's say 'abcd', either "a" or "b" or "c" or "d", or some 'hijkl', 314 00:16:52,600 --> 00:17:00,000 or 'xyz'. No spaces guys. Just assume that I have written 315 00:17:00,000 --> 00:17:04,800 '[abcdhijklxyz]'. Now 316 00:17:04,800 --> 00:17:07,300 what is first of all the meaning for this? If you write anything 317 00:17:07,300 --> 00:17:11,800 inside of a square bracket, it is going to look 318 00:17:11,800 --> 00:17:16,200 for one character string, but that character maybe "a", or maybe 319 00:17:16,200 --> 00:17:18,598 "b", something like that from this group. 320 00:17:19,500 --> 00:17:23,500 But if you observe, first four characters are in sequence, next 321 00:17:23,500 --> 00:17:26,000 five characters are also in sequence, next three characters 322 00:17:26,098 --> 00:17:28,098 are also in sequence. In that case 323 00:17:28,098 --> 00:17:30,900 you can also write shortcut as 'a-d', 324 00:17:32,300 --> 00:17:39,400 now by default 'a-d' means 'abcd'. Then 'h-l'. The meaning 325 00:17:39,400 --> 00:17:45,300 for this is 'hijkl'. Then 'x-z'. Now the meaning for this 326 00:17:45,300 --> 00:17:49,000 is 'xyz'. This is just a shortcut. 327 00:17:49,000 --> 00:17:50,900 [no audio] 328 00:17:50,900 --> 00:17:55,500 Right. Now see that. Let me take a given text. 329 00:17:57,000 --> 00:18:01,100 So my intention is, let me write a, let me comment this code 330 00:18:01,100 --> 00:18:04,900 as of now. Actually my intention is, just I want to look for 331 00:18:05,400 --> 00:18:06,900 either "a" or "b" or "c", 332 00:18:08,300 --> 00:18:14,700 or "d", let me take it. 
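Several ranges can sit inside one pair of square brackets. The sketch below assumes the combined form is written as "[a-dh-lx-z]" with no spaces, and compares it with the long form on the same practice text. The last lines show why spaces must be left out: a space inside the brackets is itself treated as one of the allowed characters.

```python
import re

text = "This is a python and it is easy to learn"

# Long form: every allowed character listed one by one.
long_form = re.findall("[abcdhijklxyz]", text)

# Short form: three ranges written side by side, no spaces.
short_form = re.findall("[a-dh-lx-z]", text)

print(long_form == short_form)  # True

# A space inside the brackets becomes part of the set.
with_space = re.findall("[a-d ]", "a b")
print(with_space)  # ['a', ' ', 'b']
```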
So 'my_pat', I want to look only for 333 00:18:14,900 --> 00:18:19,900 either "a" or "b" or "c" or "d". Whenever you want to look for 334 00:18:20,100 --> 00:18:23,100 either "a" or "b" or "c" or "d", then you have to write your characters 335 00:18:23,100 --> 00:18:26,600 inside of square brackets. Now, let me 'print()', 336 00:18:29,100 --> 00:18:33,100 your operation is from 're.findall()', from your pattern, 337 00:18:33,900 --> 00:18:36,300 your given text. Now see the result. 338 00:18:38,000 --> 00:18:42,200 [abcd]. "c" is not there anywhere, and "b" also not there, but "a" 339 00:18:42,200 --> 00:18:46,300 and "d" are there, that's why you're getting that - 'a', 'a', 'd', 'a', 'a'. 340 00:18:49,000 --> 00:18:51,600 But the same thing I can also write '[a-d]'. 341 00:18:51,600 --> 00:18:53,700 [no audio] 342 00:18:53,700 --> 00:18:58,700 See the result. First two "a", one "d", two "a". Now see the result, 343 00:18:58,700 --> 00:19:02,700 I am going to rerun by changing my pattern. Same thing you're 344 00:19:02,700 --> 00:19:08,600 getting. So this is exactly equal to '[abcd]'. 345 00:19:09,500 --> 00:19:12,800 See sometimes you're going to look for any character, suppose '[a-z]', 346 00:19:13,100 --> 00:19:15,200 no need to write 'abcdefg' up to 'z'. 347 00:19:15,200 --> 00:19:18,200 Simply 'a-z' you can write. '[a-z]'. That's it. 348 00:19:18,200 --> 00:19:20,700 [no audio] 349 00:19:20,700 --> 00:19:22,000 So this is one of the rules. 350 00:19:23,100 --> 00:19:24,900 Then, so that's what I am giving here. 351 00:19:25,600 --> 00:19:28,700 So this already we discussed. Suppose if you have 'abcd', 352 00:19:28,700 --> 00:19:32,900 some 'hijk', and 'xyz', you can also write shortcut in this way. 353 00:19:33,900 --> 00:19:41,500 Now, let me take '\w'. Matches any single letter, or digit, 354 00:19:41,500 --> 00:19:42,800 or underscore. 355 00:19:42,800 --> 00:19:44,700 [no audio] 356 00:19:44,700 --> 00:19:49,500 See that. 
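The comparison run above, "[abcd]" against "[a-d]", can be reproduced with a short sketch, assuming the same practice text:

```python
import re

text = "This is a python and it is easy to learn"

explicit = re.findall("[abcd]", text)
ranged = re.findall("[a-d]", text)

print(explicit)  # ['a', 'a', 'd', 'a', 'a']
print(explicit == ranged)  # True
```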
See guys if I write simply pattern as 'a', "a" is going 357 00:19:49,500 --> 00:19:53,200 to match with only "a". Your 'findall' is going to look 358 00:19:53,200 --> 00:19:58,000 in a given text for "a" only. But if you write instead 359 00:19:58,000 --> 00:20:02,000 of "a" if I write some special symbol called '\w', now the 360 00:20:02,000 --> 00:20:07,000 meaning for this is any alphabet from a to z, or 361 00:20:07,000 --> 00:20:13,400 capital A to Z, then number 0 to 9 or maybe underscore also. 362 00:20:13,400 --> 00:20:15,300 [no audio] 363 00:20:15,300 --> 00:20:19,900 See suppose if we have a text called 'abc', and if I write a 364 00:20:19,900 --> 00:20:23,700 pattern "a", now if I look into this you are going to get "a" only 365 00:20:23,700 --> 00:20:29,000 one time, but if I write in place of "a", '\w', "a" is also a '\w', 366 00:20:29,200 --> 00:20:33,100 or "b" also comes under '\w', "c" also comes under '\w'. Now you're 367 00:20:33,100 --> 00:20:37,600 going to get three characters because '\w' means any alphabet, 368 00:20:38,200 --> 00:20:40,300 and number, and underscore as well. 369 00:20:41,100 --> 00:20:43,600 Let me show you this on your script. 370 00:20:45,800 --> 00:20:47,200 See my intention is, 371 00:20:48,800 --> 00:20:50,000 let me take, now, 372 00:20:50,000 --> 00:20:51,700 first of all let me comment this code. 373 00:20:53,000 --> 00:20:54,000 Guys be clear, 374 00:20:54,000 --> 00:20:57,500 [no audio] 375 00:20:57,500 --> 00:21:01,700 I am going to take 'my_pat' as 're', sorry. 376 00:21:03,200 --> 00:21:05,600 I'm not taking raw string as of now, but we have to take 377 00:21:05,600 --> 00:21:08,700 it. I will tell you when you have to take 'r'. Fine. 378 00:21:09,800 --> 00:21:11,900 Now, what I am doing simply I am writing '\w'. 379 00:21:13,500 --> 00:21:18,900 Now see the result. If I do 'print(re.findall())', your 380 00:21:18,900 --> 00:21:22,300 pattern, in a given text, see the result. 
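The 'abc' comparison above can be sketched as follows. Note one deliberate difference from the video: the pattern is written as a raw string (`r"\w"`), which is a choice made for this sketch because newer Python versions warn about a backslash such as `\w` in a normal string.

```python
import re

sample = "abc"

# A literal "a" matches only the single "a".
only_a = re.findall("a", sample)
print(only_a)  # ['a']

# \w matches any letter, digit, or underscore, so all three match.
word_chars = re.findall(r"\w", sample)
print(word_chars)  # ['a', 'b', 'c']
```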
381 00:21:23,600 --> 00:21:27,500 You're going to get all matching things except space. 382 00:21:27,500 --> 00:21:32,400 Are we getting space matching? See after this "is", between "This" 383 00:21:32,400 --> 00:21:35,200 and "is", we have a space. Are we getting a space anywhere here? 384 00:21:35,300 --> 00:21:39,200 No, because space does not come under '\w', 385 00:21:39,300 --> 00:21:41,100 that's why you're not getting matching for that. 386 00:21:42,500 --> 00:21:46,400 And "@" does not come under your '\w', that's why this 387 00:21:46,400 --> 00:21:49,400 is not a match for your '\w'. That's why you're not getting 388 00:21:49,400 --> 00:21:53,500 that. Let me write somewhere "-", and let me write somewhere 389 00:21:53,500 --> 00:21:56,000 "_". Now see the result. 390 00:21:57,100 --> 00:22:00,200 You're getting "_". Between "o", and "l", there is 391 00:22:00,200 --> 00:22:03,900 an "_", but between "o" and "_" you have a 392 00:22:03,900 --> 00:22:05,800 space, that space is not a match. 393 00:22:06,400 --> 00:22:10,800 You're not getting that. And "-" does not come under this 394 00:22:10,800 --> 00:22:14,600 '\w', but "_" comes under '\w', that's why you are getting here. 395 00:22:14,600 --> 00:22:16,500 [no audio] 396 00:22:16,500 --> 00:22:21,500 Now I am looking for two letter string or two character 397 00:22:21,500 --> 00:22:25,800 string. If I write '\w\w', the meaning for that is, just 398 00:22:25,800 --> 00:22:31,300 assume, "a" means, sorry, '\w' means 'a-z' small letters, 399 00:22:31,300 --> 00:22:35,100 and 'A-Z' capital letters, and numbers 0 to 9, and underscore. 
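Which characters count as '\w' can be checked with a small sketch. The `sample` string is hypothetical, made up so that it contains a space, "@", "-" and "_" together; the raw string `r"\w"` is again a choice of this sketch.

```python
import re

# Hypothetical sample containing space, "@", "-" and "_".
sample = "to_learn @ python-3"

found = re.findall(r"\w", sample)
print(found)

# Underscore and digits are matched; space, "@" and "-" are not.
print("_" in found, "3" in found)  # True True
print(" " in found, "@" in found, "-" in found)  # False False False
```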
400 00:22:35,600 --> 00:22:39,600 Now if I write two '\w' the meaning for that is, it may be 401 00:22:39,600 --> 00:22:46,600 'aa', 'ab', 'ac', likewise 'A-Z' and 'ba', 'bb', 'bc', 402 00:22:46,600 --> 00:22:53,500 likewise, or maybe 'a0', 'a1', 'a2' up to 'a9', or 'a_', 403 00:22:54,400 --> 00:22:57,700 right, 'b_', 'c_', likewise you are going to get 404 00:22:57,700 --> 00:23:00,500 number of combinations. Among all those combinations 405 00:23:01,700 --> 00:23:04,800 whatever matching is there in a given string you're going to get them. 406 00:23:06,400 --> 00:23:10,600 See if I write '\w\w' the meaning for that is, you're going 407 00:23:10,600 --> 00:23:16,000 to look for a string which consists of two characters. 408 00:23:17,100 --> 00:23:22,000 Now if I run this, see "th" is also one match for this. First, 409 00:23:22,000 --> 00:23:24,000 let me run this and see the output so that you can understand. 410 00:23:24,100 --> 00:23:27,000 See that. "Th", two letter string, 411 00:23:27,000 --> 00:23:31,200 yes, match. That's why we are getting that. "is", yes you are getting, 412 00:23:31,900 --> 00:23:35,700 then "is" you're, in between you have a space, right, space does not 413 00:23:35,700 --> 00:23:38,500 come under your '\w', that's why you are not getting that. "is", 414 00:23:38,500 --> 00:23:44,100 yes you are getting. See "a" is only one single character. Before 415 00:23:44,100 --> 00:23:45,700 "a" and after "a" you have a space. 416 00:23:46,600 --> 00:23:49,900 So combination with "a", you don't have any valid character 417 00:23:49,900 --> 00:23:51,600 which comes under '\w'. 418 00:23:52,700 --> 00:23:56,500 In case if space comes under '\w', you're going to get "a " 419 00:23:56,500 --> 00:24:00,700 also, but space not comes under '\w' matching, 420 00:24:00,800 --> 00:24:05,500 that's why for "a" you don't have any combination to get two 421 00:24:05,500 --> 00:24:06,500 character strings. 
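The two-character search above can be sketched like this, assuming the practice text starts with a capital "T". The single "a" and the final "n" of "learn" are left out because they have no second word character next to them.

```python
import re

text = "This is a python and it is easy to learn"

# Two word characters side by side.
pairs = re.findall(r"\w\w", text)
print(pairs)
# ['Th', 'is', 'is', 'py', 'th', 'on', 'an', 'it', 'is',
#  'ea', 'sy', 'to', 'le', 'ar']
```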
422 00:24:06,500 --> 00:24:09,200 That's why this is not going to match for your pattern. 423 00:24:09,200 --> 00:24:13,000 [no audio] 424 00:24:13,000 --> 00:24:19,100 Right. And one more thing, see that last "le" suppose, two 425 00:24:19,100 --> 00:24:23,700 character string, "ar", two character string, but for "n" 426 00:24:23,700 --> 00:24:27,300 you don't have again any matching to get a two character string. 427 00:24:27,500 --> 00:24:30,000 After "n" suppose if we have any character, 428 00:24:30,000 --> 00:24:33,900 let's say "a", now if I run this. See the last thing, you're 429 00:24:33,900 --> 00:24:37,900 getting "ar". After "ar" now I already here you "a" letter, 430 00:24:38,000 --> 00:24:41,800 "a" character. Now if I rerun you're going to get "na" also. 431 00:24:42,600 --> 00:24:45,200 Not only "a", you can take "_" also, because "_" 432 00:24:45,200 --> 00:24:46,800 also comes under '\w', 433 00:24:47,000 --> 00:24:49,700 you are going to get matched by using these two characters 434 00:24:49,700 --> 00:24:54,700 combination. Now, let me write three, three character 435 00:24:56,300 --> 00:24:57,800 matching. Guys, 436 00:24:57,900 --> 00:25:00,500 now the meaning for this is, you're going to look for a string 437 00:25:00,500 --> 00:25:03,700 which consists of three characters, but what are those three 438 00:25:03,700 --> 00:25:09,600 characters? We know '\w' means any character 'a-z' small or 439 00:25:09,600 --> 00:25:14,200 capital, 0 to 9, and underscore. Only from these three groups 440 00:25:14,600 --> 00:25:17,300 it is going to look for three character string. 441 00:25:17,800 --> 00:25:22,300 Now, let me run this and see the result. "thi" three-letter character, 442 00:25:22,300 --> 00:25:26,100 yes, you are getting. 
But for "s", after that you have a space, 443 00:25:26,300 --> 00:25:29,900 so space does not come under '\w', that's why there is no possibility 444 00:25:29,900 --> 00:25:33,900 for this "s" to make three letter character. Then it is going 445 00:25:33,900 --> 00:25:37,500 to skip. Then for "is" you have only two characters, but you 446 00:25:37,500 --> 00:25:40,800 don't have any three characters string, because if you want 447 00:25:40,800 --> 00:25:43,900 to get a three character string here we have a space, but 448 00:25:43,900 --> 00:25:46,900 space does not come under '\w', that's why you're not getting 449 00:25:46,900 --> 00:25:50,000 a part for this. Same way for "a" you don't have any three character 450 00:25:50,000 --> 00:25:52,100 string, but "python", "pyt", 451 00:25:53,600 --> 00:25:57,900 three characters, valid characters, "hon", three characters. That's why 452 00:25:57,900 --> 00:26:02,000 you are getting that. "and", three characters. For "it" 453 00:26:02,000 --> 00:26:03,600 you don't have any valid three characters, 454 00:26:03,600 --> 00:26:08,000 that's why you're not getting. "eas", yes, "e-a-s", three characters, 455 00:26:08,000 --> 00:26:11,400 but for "y" we don't have any three character string combination, 456 00:26:11,400 --> 00:26:15,300 that's why this is not a part. Likewise you are getting. I 457 00:26:15,300 --> 00:26:16,500 hope you're good with this. 458 00:26:16,500 --> 00:26:18,300 [no audio] 459 00:26:18,300 --> 00:26:22,000 Now, let me go with the capital '\W' - Matches 460 00:26:22,000 --> 00:26:24,600 any character not part of '\w'. 461 00:26:26,000 --> 00:26:28,400 See whatever you have in this '\w', 462 00:26:29,100 --> 00:26:33,000 other than that all are going to come under '\W'. See that. 463 00:26:34,400 --> 00:26:35,900 Now, let me comment this. 
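
The three-character walkthrough above can be checked with a short sketch. Again the sentence is an assumed stand-in for the text used in the video. Words with fewer than three word characters ("is", "a", "it", "to") give nothing, and leftovers such as the "s" of "This" or the "y" of "easy" are skipped.

```python
import re

# Assumed sample sentence, similar to the one discussed in the video.
text = "This is a python and it is easy to learn"

# \w\w\w needs three word characters in a row; a space breaks the run.
triples = re.findall(r"\w\w\w", text)
print(triples)
# ['Thi', 'pyt', 'hon', 'and', 'eas', 'lea']
```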
464 00:26:35,900 --> 00:26:41,100 [no audio] 465 00:26:41,100 --> 00:26:43,200 First let me take 'my_pat' as small, 466 00:26:43,200 --> 00:26:45,900 [no audio] 467 00:26:45,900 --> 00:26:51,300 '\w' only, and see the result first. 'print(re.findall())'. Guys 468 00:26:51,300 --> 00:26:54,000 you can get clarity by doing practice. 469 00:26:54,100 --> 00:26:55,300 Please do same thing 470 00:26:55,600 --> 00:26:59,300 once, I mean whatever we've done here same thing you just practice 471 00:26:59,300 --> 00:27:02,600 once, and try to understand how you're going to get your output. 472 00:27:02,600 --> 00:27:04,800 [no audio] 473 00:27:04,800 --> 00:27:08,100 Now, let me run this and see the output. But if I make it 474 00:27:08,100 --> 00:27:13,100 as a capital '\W', see "T" and "h", "i", "s", whatever you are getting here 475 00:27:13,100 --> 00:27:15,500 no, these are not the part of '\W'. 476 00:27:15,500 --> 00:27:16,900 They are part of '\w'. 477 00:27:17,500 --> 00:27:20,600 That's why if I write '\W', see what you are going to 478 00:27:20,600 --> 00:27:24,900 get. You're going to get spaces, "@", and "-", because 479 00:27:24,900 --> 00:27:27,500 these are not the part of your '\w'. 480 00:27:29,000 --> 00:27:33,100 It is going to look for a character which is not there in 481 00:27:33,100 --> 00:27:34,700 this group, '\w' group. 482 00:27:36,400 --> 00:27:41,100 Space is the match for '\W', "@" is the match 483 00:27:41,100 --> 00:27:42,100 for '\W'. 484 00:27:42,500 --> 00:27:44,900 See how many spaces are there, all those spaces you are 485 00:27:44,900 --> 00:27:45,900 going to get here. 486 00:27:46,900 --> 00:27:48,800 So that is the use of '\W'. 487 00:27:48,800 --> 00:27:53,500 Simply the members which are there in this, other 488 00:27:53,500 --> 00:27:56,200 than that remaining all come under '\W'. 489 00:27:56,700 --> 00:27:58,000 Very, very important. 
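
As a sketch of the capital '\W' rule, here is the same kind of text run through both patterns. The text is a made-up example, not the one from the video. Every character is either a '\w' match or a '\W' match, never both, so the two result lists together account for the whole string.

```python
import re

# Hypothetical sample text containing spaces, "@", "-" and "_".
text = "This is a python @ it-is easy_to learn"

word_chars = re.findall(r"\w", text)      # letters, digits, underscore
non_word_chars = re.findall(r"\W", text)  # everything else

print(non_word_chars)
# [' ', ' ', ' ', ' ', '@', ' ', '-', ' ', ' ']

# Each character falls into exactly one of the two groups.
print(len(word_chars) + len(non_word_chars) == len(text))  # True
```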
490 00:27:59,300 --> 00:28:03,200 Then '\d' - Matches a decimal digit 0-9, 491 00:28:03,200 --> 00:28:07,800 any number. See any way by default '\w' 492 00:28:07,800 --> 00:28:11,800 means it also includes your numbers but along with the numbers 493 00:28:11,800 --> 00:28:15,400 '\w' includes alphabets and underscore. But I want to match 494 00:28:15,400 --> 00:28:16,700 with only numbers. 495 00:28:17,500 --> 00:28:21,300 Let's say I have somewhere, let me write "PYTHON3". 496 00:28:21,300 --> 00:28:25,200 [no audio] 497 00:28:25,200 --> 00:28:26,200 "python3", 498 00:28:27,100 --> 00:28:29,500 and let's say, somewhere just for, here 499 00:28:29,500 --> 00:28:31,600 I am writing "python2", and somewhere 500 00:28:31,600 --> 00:28:33,600 I'm writing some number let's say 4. 501 00:28:35,200 --> 00:28:39,100 Okay. Now what I am doing is, I am going to take a pattern as some 502 00:28:39,100 --> 00:28:43,700 '\d'. You know what is meant by '\d'. Any number 0 to 9. 503 00:28:44,200 --> 00:28:49,900 See I'm going to take pattern as '\d', and let me print it. 504 00:28:51,100 --> 00:28:53,700 're.findall()'. Sorry. 505 00:28:54,800 --> 00:28:57,700 'findall()', your pattern, in a given text. 506 00:28:58,200 --> 00:29:01,000 So wherever number is there, because you're looking for only 507 00:29:01,000 --> 00:29:02,400 one number, one string, 508 00:29:02,400 --> 00:29:06,900 sorry, a character which consists of only one digit, string 509 00:29:06,900 --> 00:29:10,400 which consists of only one character that is a digit. Now 510 00:29:10,400 --> 00:29:12,600 see the result. [2, 4, 3]. Yes, 511 00:29:12,600 --> 00:29:16,100 we have somewhere [2, 4, 3]. You are getting that. Now my intention 512 00:29:16,100 --> 00:29:21,700 is, "python" with some number. So before '\d' 513 00:29:21,700 --> 00:29:25,500 I should have a word called "python", after that any number 514 00:29:25,500 --> 00:29:27,200 maybe 2 or 3. 
Now, 515 00:29:27,200 --> 00:29:30,600 if I run this you are going to get "python", after that number 516 00:29:30,600 --> 00:29:34,600 is there, no. Now this is the match for this pattern. At 517 00:29:34,600 --> 00:29:39,600 the same time "python3" is also match for '\d'. Suppose 518 00:29:39,600 --> 00:29:44,300 if I write only "python2", you're going to get only "python2" 519 00:29:44,300 --> 00:29:48,600 because directly you're mentioning match "2" with "2", "python2" 520 00:29:48,600 --> 00:29:54,200 with "python2". But I want to match with "python\d" means, 521 00:29:54,700 --> 00:29:58,400 the meaning for this, it is going to represent "python0", 522 00:29:58,400 --> 00:30:02,000 "python1", "python2", "python3", up to "python9". But among 523 00:30:02,000 --> 00:30:05,100 all those whatever the strings are there in a given text 524 00:30:05,100 --> 00:30:08,000 those things, those strings only it is going to match. 525 00:30:08,900 --> 00:30:09,900 That's it. 526 00:30:09,900 --> 00:30:13,400 [no audio] 527 00:30:13,400 --> 00:30:16,200 Right. Now let's say instead of this pattern 528 00:30:16,500 --> 00:30:17,800 I want to look for 529 00:30:20,600 --> 00:30:25,500 a string which consists of two digits. Do we have anything 530 00:30:25,500 --> 00:30:29,000 in this? No. Sequentially, 531 00:30:29,000 --> 00:30:31,700 I mean immediately two digits nowhere we are having. That's 532 00:30:31,700 --> 00:30:33,700 why you're not getting that. Let's say I want to write here 533 00:30:33,700 --> 00:30:37,900 45, something. Now you are getting that because two digits are 534 00:30:37,900 --> 00:30:41,300 there. You're looking for '\d\d' means a string 535 00:30:41,300 --> 00:30:43,200 which consists of two digits. 536 00:30:43,600 --> 00:30:44,600 Yes you are getting. 537 00:30:46,100 --> 00:30:48,500 Now, so in this basic rules, let me go 538 00:30:48,500 --> 00:30:50,100 with the last one, '.'. 
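
The '\d' examples above can be sketched like this, assuming a small sample text of my own (not the one in the video). One thing worth noticing: `re.findall` returns the digits as strings such as '2', not as integers.

```python
import re

# Assumed sample text for illustration only.
text = "I use python2 and python3, plus the number 4"

# \d matches a single digit.
digits = re.findall(r"\d", text)
print(digits)       # ['2', '3', '4']

# "python" followed by exactly one digit.
versions = re.findall(r"python\d", text)
print(versions)     # ['python2', 'python3']

# \d\d needs two digits side by side; this text has none ...
two_digits = re.findall(r"\d\d", text)
print(two_digits)   # []

# ... until a two-digit number is added.
two_digits_after = re.findall(r"\d\d", text + " and 45")
print(two_digits_after)  # ['45']
```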
539 00:30:51,700 --> 00:30:55,800 Very, very important one, at the same time a confusing one. Very 540 00:30:55,800 --> 00:31:01,100 useful in your real-time work, '.'. '.' matches any single character, 541 00:31:01,100 --> 00:31:03,100 except the new line character. 542 00:31:03,100 --> 00:31:05,200 [no audio] 543 00:31:05,200 --> 00:31:10,200 See if you take '\W', other than the '\w' members, everything comes 544 00:31:10,200 --> 00:31:11,800 under '\W'. 545 00:31:12,100 --> 00:31:16,400 But if you go with the '.', all valid characters which are 546 00:31:16,400 --> 00:31:20,100 there on your keyboard or which are there in your computer 547 00:31:20,100 --> 00:31:23,400 language, all are going to match with your '.' except new 548 00:31:23,400 --> 00:31:25,100 line. See that. 549 00:31:25,100 --> 00:31:28,100 [no audio] 550 00:31:28,100 --> 00:31:30,600 Let me comment these things first. 551 00:31:30,600 --> 00:31:34,200 [no audio] 552 00:31:34,200 --> 00:31:37,000 Guys these are very basic rules to create your patterns. 553 00:31:37,000 --> 00:31:38,400 We have advanced levels as well, 554 00:31:38,400 --> 00:31:40,500 we will see them in our next video. 555 00:31:42,200 --> 00:31:45,400 So now my intention is, I am taking a pattern as 556 00:31:45,400 --> 00:31:47,400 [no audio] 557 00:31:47,400 --> 00:31:52,900 '.' means, match with any character. Now see what you are going to get. 558 00:31:52,900 --> 00:31:55,500 [no audio] 559 00:31:55,500 --> 00:32:03,100 Your pattern, 'text'. We are going to get everything. See that. We're 560 00:32:03,100 --> 00:32:05,400 going to get each and every character which is there in a given 561 00:32:05,400 --> 00:32:09,500 text. Anyway, we don't have any '\n', I mean new line in our 562 00:32:09,500 --> 00:32:13,200 given text; except for that, it is going to match all things. 563 00:32:13,600 --> 00:32:18,400 Now if you observe "T", "h", "i", "s", after that space is there in a given 564 00:32:18,400 --> 00:32:21,800 text.
Yes, that space is also matching for '.'. '.' means, 565 00:32:21,800 --> 00:32:25,000 a '.' means anything simply other than '\n'. 566 00:32:26,500 --> 00:32:28,600 Now I want to match 567 00:32:29,000 --> 00:32:32,500 any two characters, then write two '.'. Now see the result. 568 00:32:33,600 --> 00:32:35,500 I want to match for three '.', 569 00:32:35,600 --> 00:32:39,000 I mean any three characters, any three symbols which are there 570 00:32:39,000 --> 00:32:40,300 in computer language. 571 00:32:41,200 --> 00:32:42,200 See that. 572 00:32:43,800 --> 00:32:46,300 "33" you're going to get. Now 573 00:32:46,300 --> 00:32:50,300 my intention is, I want to match '.' with '.' only. 574 00:32:50,300 --> 00:32:52,100 [no audio] 575 00:32:52,100 --> 00:32:56,600 Generally '.' means, in your pattern '.' means for any matching 576 00:32:57,100 --> 00:33:00,300 except new line. But I want to match '.' with '.'. 577 00:33:00,500 --> 00:33:06,200 Let's say in a given text somewhere I have '.', "python 2." 578 00:33:07,100 --> 00:33:09,100 Now if I run this what you are getting? 579 00:33:09,100 --> 00:33:11,100 [no audio] 580 00:33:11,100 --> 00:33:14,100 See now you're also going to get somewhere "." also. 581 00:33:15,600 --> 00:33:16,600 You're looking for 582 00:33:16,600 --> 00:33:18,600 [no audio] 583 00:33:18,600 --> 00:33:21,800 three digit string, yes. 584 00:33:22,100 --> 00:33:23,500 Sorry, three character string. 585 00:33:23,500 --> 00:33:27,700 Yes, you are getting, 'n2.'. But along with the '.' you're also 586 00:33:27,700 --> 00:33:28,700 getting some other strings 587 00:33:28,700 --> 00:33:30,100 no, but I don't want to get them. 588 00:33:30,100 --> 00:33:32,500 I want to get exactly '.' with '.' only. 589 00:33:32,600 --> 00:33:34,500 I want to match my '.' with '.'. 590 00:33:35,300 --> 00:33:39,700 See if you remember, if you want to match "a" with "a", let's say 591 00:33:39,700 --> 00:33:41,000 this is your text "abc". 
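
A small sketch of how '.' behaves, using a short made-up string that does contain a new line so the one exception is visible. The variable names and the text are assumptions for illustration.

```python
import re

# Hypothetical text with a new line in the middle.
text = "Hi 2.\nok"

# "." matches every character except the new line.
singles = re.findall(r".", text)
print(singles)   # ['H', 'i', ' ', '2', '.', 'o', 'k']

# Two dots: any two characters side by side, taken without overlap.
doubles = re.findall(r"..", "abcde")
print(doubles)   # ['ab', 'cd']
```

The space, the digit and the literal "." are all returned, which is exactly why a plain '.' is too loose when only a real dot should match.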
592 00:33:41,400 --> 00:33:44,400 This is your text, and you want to write a pattern. How you 593 00:33:44,400 --> 00:33:46,600 can write a pattern if you want to match "a" with "a"? 594 00:33:46,700 --> 00:33:47,800 "a" only you're writing. 595 00:33:48,400 --> 00:33:51,500 Now in my text somewhere I have '.', but I want to match 596 00:33:51,500 --> 00:33:55,100 "." with ".". Simply if I write ".", "." is going to match with 597 00:33:55,100 --> 00:33:56,100 any character. 598 00:33:56,300 --> 00:33:59,800 But I want to match this "." with this "." only. At that time 599 00:33:59,800 --> 00:34:04,100 you have to escape your special purpose of your "." by placing 600 00:34:04,200 --> 00:34:05,200 '\' symbol. 601 00:34:06,100 --> 00:34:07,100 Let me run this. 602 00:34:07,100 --> 00:34:09,500 [no audio] 603 00:34:09,500 --> 00:34:12,100 '\.', now see the result. 604 00:34:12,100 --> 00:34:14,100 [no audio] 605 00:34:14,199 --> 00:34:19,100 Let me write some other ways, in other places ".". 606 00:34:19,100 --> 00:34:21,100 See that. That's it. 607 00:34:21,100 --> 00:34:23,400 [no audio] 608 00:34:23,400 --> 00:34:27,000 But if I remove this '\', now you're going to get all characters 609 00:34:27,000 --> 00:34:29,500 which are going to match for your ".". By simply "." means 610 00:34:29,500 --> 00:34:31,400 anything. That's it. Except new line. 611 00:34:32,199 --> 00:34:33,500 So it's very, very important guys. 612 00:34:33,500 --> 00:34:36,100 If you want to match "." with ".", you have to use in your 613 00:34:36,100 --> 00:34:37,500 pattern '\.'. 614 00:34:38,300 --> 00:34:39,900 Very, very useful example 615 00:34:39,900 --> 00:34:42,199 I am going to give in your real time. 616 00:34:43,000 --> 00:34:45,900 Let's say you have a text in this way. 617 00:34:47,500 --> 00:34:51,000 "This is my ip of a db server. 
618 00:34:51,800 --> 00:34:56,199 Guys just assume that some, '255.', 619 00:34:57,500 --> 00:34:58,500 let's say 620 00:34:58,500 --> 00:35:00,800 [no audio] 621 00:35:00,800 --> 00:35:01,800 '100.' 622 00:35:01,800 --> 00:35:04,000 [no audio] 623 00:35:04,000 --> 00:35:06,400 '102.' Of course 624 00:35:06,400 --> 00:35:08,400 [no audio] 625 00:35:08,400 --> 00:35:12,699 as of now I am taking in my IP in all places, in all places three digits. 626 00:35:14,000 --> 00:35:17,900 But see, you may have sometimes here only two digits or one digit, right. 627 00:35:18,500 --> 00:35:21,699 But as of now based on our rules, if I go with this example 628 00:35:21,699 --> 00:35:23,000 you can understand. Later 629 00:35:23,000 --> 00:35:24,699 we will see in our advanced level 630 00:35:25,199 --> 00:35:27,800 if you have only one digit here how to match, everything 631 00:35:27,800 --> 00:35:31,600 we will see. Now I am going to tell you one important thing 632 00:35:31,600 --> 00:35:32,600 from your real time. 633 00:35:33,100 --> 00:35:36,100 See you are going to run some script, or some information 634 00:35:36,100 --> 00:35:38,800 is there somewhere in your text file, and you are going to 635 00:35:38,800 --> 00:35:42,400 read that, but you don't know where your IP is there. 636 00:35:44,199 --> 00:35:47,800 Assume that you're getting some output somewhere. In that 637 00:35:47,800 --> 00:35:51,800 output you have IP, but you don't know where exactly your IP is there, 638 00:35:51,800 --> 00:35:54,100 [no audio] 639 00:35:54,100 --> 00:35:57,200 where exactly your IP is there but I want to fetch only my 640 00:35:57,200 --> 00:36:00,000 IP. How you can write a pattern? 641 00:36:01,100 --> 00:36:05,500 See, I can write a pattern, any variable, just take any variable. 642 00:36:05,500 --> 00:36:08,300 I am taking simply now 'pat' or 'my_pat', or 'pat', or 643 00:36:08,300 --> 00:36:12,900 'x=', also you can take. 
See as of now 644 00:36:12,900 --> 00:36:16,300 I am assuming that in my IP in all places three digit number 645 00:36:16,300 --> 00:36:20,100 is there. That's why '\d\d\d' means, 646 00:36:20,100 --> 00:36:22,800 [no audio] 647 00:36:22,800 --> 00:36:24,200 you are looking for 648 00:36:24,200 --> 00:36:27,200 three digit string in a given text. 649 00:36:28,200 --> 00:36:31,700 But first let me run this so that you can understand. 650 00:36:31,700 --> 00:36:34,000 [no audio] 651 00:36:34,000 --> 00:36:35,100 What happened? Yeah. 652 00:36:35,300 --> 00:36:39,800 Sorry, I have written at the end one more extra '\'. Fine. Now see that. 653 00:36:40,900 --> 00:36:44,800 'findall()', your pattern, your text. 654 00:36:45,900 --> 00:36:46,900 See the output. 655 00:36:46,900 --> 00:36:49,000 [no audio] 656 00:36:49,000 --> 00:36:53,200 Okay, let me write three digits. Now see the output. '255', 657 00:36:53,200 --> 00:36:55,800 '100', '102', '103', you're getting separately. 658 00:36:55,800 --> 00:36:57,500 But this entire thing is one IP. 659 00:36:57,500 --> 00:37:01,900 I want to get as only one value, not like this. Then what I have to do? 660 00:37:02,800 --> 00:37:06,400 I want to write a pattern such that it is going to match with this thing. 661 00:37:07,500 --> 00:37:10,800 I don't know exact number what is there but I know in each and every 662 00:37:10,800 --> 00:37:19,600 position there are three digits then three '\d\d\d', then '.\d\d\d 663 00:37:19,600 --> 00:37:26,200 .\d\d\d.\d\d\d' 664 00:37:27,300 --> 00:37:28,300 Now see the output. 665 00:37:28,900 --> 00:37:30,100 It's working perfectly, right? 666 00:37:30,100 --> 00:37:32,100 [no audio] 667 00:37:32,100 --> 00:37:35,700 It's working perfectly. Now see the magic. 668 00:37:37,300 --> 00:37:38,300 Now let's say 669 00:37:38,300 --> 00:37:42,600 [no audio] 670 00:37:42,600 --> 00:37:47,900 here instead of '\d', what I am doing is '\w' I am writing. 
671 00:37:47,900 --> 00:37:51,300 [no audio] 672 00:37:51,300 --> 00:37:54,500 Just assume that. Because under '\w' 673 00:37:54,500 --> 00:37:56,000 you're also going to get digits, right? 674 00:37:56,000 --> 00:38:00,000 [no audio] 675 00:38:00,000 --> 00:38:01,600 Now see the result what you are getting. 676 00:38:02,700 --> 00:38:03,700 You are getting. 677 00:38:05,000 --> 00:38:10,600 Now I am writing "this", something, in this 678 00:38:10,600 --> 00:38:13,300 way I'm writing. Now see the result what you are getting. 679 00:38:13,300 --> 00:38:16,800 [no audio] 680 00:38:16,800 --> 00:38:18,000 Nowhere you have "." here 681 00:38:18,000 --> 00:38:23,000 no, then why it is matching with your given pattern? Because 682 00:38:23,000 --> 00:38:27,100 "." means anything, but I want to match "." with "." only. 683 00:38:28,800 --> 00:38:30,900 I want to match "." with "." only. 684 00:38:32,100 --> 00:38:35,700 Otherwise, let me take in terms of numbers only so that you will get idea. 685 00:38:36,300 --> 00:38:39,400 Let's say 1, 2, 3. 686 00:38:39,400 --> 00:38:41,900 [no audio] 687 00:38:41,900 --> 00:38:43,000 One extra number. 688 00:38:43,000 --> 00:38:44,300 Let me write some numbers, 689 00:38:44,300 --> 00:38:46,400 [no audio] 690 00:38:46,400 --> 00:38:47,400 some number. 691 00:38:47,400 --> 00:38:49,200 [no audio] 692 00:38:49,200 --> 00:38:51,400 This is not IP, right, but I'm writing some number. 693 00:38:52,400 --> 00:38:54,400 Now let me write instead of this thing. 694 00:38:55,500 --> 00:38:59,200 Guys be clear why I am giving this means, in your real time 695 00:38:59,200 --> 00:39:01,800 you're going to get sometimes confusion, for that 696 00:39:02,200 --> 00:39:05,700 I am giving some clarity here to use your '\d'. 697 00:39:05,700 --> 00:39:08,900 [no audio] 698 00:39:08,900 --> 00:39:12,900 Now if I run this, see the output. This is IP, but this is 699 00:39:12,900 --> 00:39:13,900 not a IP right? 
700 00:39:15,100 --> 00:39:18,800 Our intention is, three digits, then '.', three digits, then 701 00:39:18,800 --> 00:39:22,800 '.', three digits, then '.', three digits. In this way if 702 00:39:22,800 --> 00:39:25,400 I have any string in a given text that only I want to fetch. 703 00:39:25,400 --> 00:39:29,100 Yes, I'm fetching that. Along with that, why am I getting this? 704 00:39:30,700 --> 00:39:33,100 See now you are not matching "." with ".". 705 00:39:33,700 --> 00:39:37,700 If you want to match this "." with this ".", you don't 706 00:39:37,700 --> 00:39:39,700 write '.', you have to write '\.'. 707 00:39:40,900 --> 00:39:44,500 Simply if I write a '.', '.' means any digit, any character. 708 00:39:44,900 --> 00:39:47,900 That's why see '\d\d\d', this is going to match for 709 00:39:47,900 --> 00:39:49,100 first three digits. 710 00:39:49,600 --> 00:39:50,600 Then '.' means anything 711 00:39:50,600 --> 00:39:53,900 no. Now this is also going to match. Again three digits. 712 00:39:53,900 --> 00:39:57,500 Yes, three digits. Then "." means anything. '5' is also going 713 00:39:57,500 --> 00:40:01,000 to match for ".". But our intention is not for this type 714 00:40:01,000 --> 00:40:05,800 of numbers. Some three digits, then ".", some three digits, then 715 00:40:05,800 --> 00:40:09,600 ".", then some three digits, ".", likewise. 716 00:40:09,600 --> 00:40:13,000 That's why if you want to match "." with ".", 717 00:40:13,200 --> 00:40:15,400 you have to use '\.'. 718 00:40:16,100 --> 00:40:17,800 That's it. Now see the result. 719 00:40:17,800 --> 00:40:20,000 [no audio] 720 00:40:20,000 --> 00:40:23,300 So that's why there is a small difference between '.' and 721 00:40:23,300 --> 00:40:25,400 '\.'. You need to understand that clearly. 722 00:40:27,000 --> 00:40:30,600 Simple thing, I want to match "." with "." only, then you 723 00:40:30,600 --> 00:40:31,900 have to write '\.'. That's it. 724 00:40:33,300 --> 00:40:35,700 Okay.
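
The difference between '.' and '\.' in the IP example can be sketched as below. The text, the long run of digits and the variable names are assumptions for illustration, and the pattern follows the video's simplification that every part of the IP has exactly three digits. It does not check that each part is in the range 0 to 255.

```python
import re

# Hypothetical text: one real-looking IP and one long run of digits.
text = "This is my ip of a db server 255.100.102.103 and 123456789012345 is not an ip"

# Unescaped ".": each dot matches ANY character, so a digit can fill that slot.
loose_pat = r"\d\d\d.\d\d\d.\d\d\d.\d\d\d"
loose = re.findall(loose_pat, text)
print(loose)    # ['255.100.102.103', '123456789012345']

# Escaped "\.": each dot must be a literal "." in the text.
strict_pat = r"\d\d\d\.\d\d\d\.\d\d\d\.\d\d\d"
strict = re.findall(strict_pat, text)
print(strict)   # ['255.100.102.103']
```

Writing the patterns as raw strings (the `r` prefix) keeps Python from treating the backslashes as string escapes, so '\.' and '\d' reach the `re` module unchanged.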
Okay guys, don't get confused. 725 00:40:36,000 --> 00:40:39,500 We are going to practice with a number of examples so that 726 00:40:39,500 --> 00:40:42,000 you will get a good idea about your Regular Expressions. 727 00:40:42,200 --> 00:40:44,200 So as of now, these are the simple 728 00:40:45,300 --> 00:40:48,000 rules to create your pattern. In the next video 729 00:40:48,000 --> 00:40:52,500 we'll discuss advanced level rules to create your pattern. 730 00:40:53,500 --> 00:40:55,200 Okay, thank you for watching this video. 731 00:40:55,200 --> 00:41:01,600 [no audio]