1 00:00:00,000 --> 00:00:01,700 Friends, here our concept is 2 00:00:01,700 --> 00:00:04,400 'findall' and 'finditer' operations of 're' module. 3 00:00:05,300 --> 00:00:08,300 So let me explain this with a simple script. 4 00:00:08,300 --> 00:00:10,000 Let me open my editor. 5 00:00:10,000 --> 00:00:12,100 [no audio] 6 00:00:12,100 --> 00:00:16,900 So here let me import 're' module first. 7 00:00:16,900 --> 00:00:19,400 I have given some script name. Now 8 00:00:19,400 --> 00:00:22,900 let me take 'my_str' as suppose. 9 00:00:22,900 --> 00:00:27,100 [no audio] 10 00:00:27,100 --> 00:00:30,200 "This is python and we are having 11 00:00:30,200 --> 00:00:32,600 [no audio] 12 00:00:32,600 --> 00:00:35,000 python2 and python3 versions". 13 00:00:35,000 --> 00:00:41,600 [no audio] 14 00:00:41,600 --> 00:00:45,400 Now, let me take 'my_pat' as, so actually when you are going 15 00:00:45,400 --> 00:00:47,600 to take pattern based on your requirement. Suppose 16 00:00:47,600 --> 00:00:52,300 my requirement is I need to find out either 'python', or 'python2', 17 00:00:52,300 --> 00:00:56,900 or 'python3'. See how can I write? 'python', 18 00:00:56,900 --> 00:00:58,700 [no audio] 19 00:00:58,700 --> 00:01:03,800 So if you are strictly looking for 'python' word or 'python2' 20 00:01:03,800 --> 00:01:08,200 'python3' words, then you have to take '\b' then 'python'. 21 00:01:08,900 --> 00:01:13,500 So our requirement is 'python' or '2' or '3', that's why in 22 00:01:13,500 --> 00:01:18,100 square brackets just write '[23]', then your '?'. 23 00:01:18,100 --> 00:01:23,300 '?' means once or none. Then '\b'. That's it. 24 00:01:24,200 --> 00:01:28,300 Now it is going to represent three words - 'python', 'python2', 25 00:01:28,300 --> 00:01:34,100 and 'python3'. 
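A minimal sketch of the pattern just described, assuming the sample string is exactly the sentence dictated above. The list of candidate words and the use of 're.fullmatch' are only an illustration and were not typed in the video. Note the raw string: in a normal Python string '\b' would mean a backspace character, not a word boundary.

```python
import re

# Sample string and pattern as dictated in the video.
my_str = "This is python and we are having python2 and python3 versions"
my_pat = r"\bpython[23]?\b"  # raw string so that \b means "word boundary"

# Illustrative check (not from the video): which whole words does the pattern accept?
candidates = ["python", "python2", "python3", "python4", "pythons"]
accepted = [word for word in candidates if re.fullmatch(my_pat, word)]
print(accepted)
```

Only 'python', 'python2', and 'python3' should be accepted, because '[23]?' allows at most one trailing '2' or '3' and the closing '\b' rejects any further word character.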
Among these three, right, if you go with suppose 26 00:01:34,100 --> 00:01:38,000 're.match', you know what is the usage of 're.match', 27 00:01:39,400 --> 00:01:41,400 so 're.match' with 'my_pat', 28 00:01:42,700 --> 00:01:45,000 then your string. 29 00:01:45,000 --> 00:01:46,800 [no audio] 30 00:01:46,800 --> 00:01:49,000 See the result. What are you getting? You are getting 'None'. 31 00:01:49,100 --> 00:01:53,300 The reason is, 'match' will always look only at the very start of a given 32 00:01:53,300 --> 00:01:55,000 string for your given pattern. 33 00:01:55,200 --> 00:01:58,500 We don't have any 'python', or 'python2', or 'python3' at the very 34 00:01:58,500 --> 00:02:01,700 start, that's why 'match' is not able to find that. Now if I 35 00:02:01,700 --> 00:02:04,600 go with the 'search', then you are going to get information 36 00:02:04,600 --> 00:02:08,400 about the very first 'python' because, anyway you know that, 37 00:02:08,400 --> 00:02:12,400 'search' is going to look for your given pattern in a given string 38 00:02:12,500 --> 00:02:15,000 from left to right. While going from left to right 39 00:02:15,100 --> 00:02:16,400 we have somewhere 'python', 40 00:02:17,100 --> 00:02:21,100 so among these three patterns - 'python', 'python2', and 'python3' here 41 00:02:21,100 --> 00:02:22,800 you've got, your 'search' got 'python'. 42 00:02:22,800 --> 00:02:26,300 That's why it is giving that. But my requirement is, I need 43 00:02:26,300 --> 00:02:30,700 to find all the matchings for your pattern which are there 44 00:02:30,700 --> 00:02:32,300 in a given string. 45 00:02:32,300 --> 00:02:34,700 [no audio] 46 00:02:34,400 --> 00:02:36,300 Right. So in a given string, 'python' 47 00:02:36,300 --> 00:02:38,900 is there, 'python2' is there, and 'python3' is also there. 48 00:02:39,700 --> 00:02:41,300 I want to find all these three. 49 00:02:41,300 --> 00:02:44,400 [no audio] 50 00:02:44,400 --> 00:02:49,100 Then our operations are, your 'findall' operations. 
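The difference between 'match' and 'search' explained above can be sketched like this, assuming the same sample string and pattern. The variable names 'match_result' and 'search_result' are made up for this illustration.

```python
import re

my_str = "This is python and we are having python2 and python3 versions"
my_pat = r"\bpython[23]?\b"

# re.match only tries the pattern at the very start of the string.
match_result = re.match(my_pat, my_str)
print(match_result)  # None, because the string starts with "This"

# re.search scans from left to right and stops at the first match.
search_result = re.search(my_pat, my_str)
if search_result:
    print(search_result.group(), search_result.start())
```

Neither call reports the later 'python2' and 'python3', which is why an operation that returns every match is needed.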
51 00:02:49,600 --> 00:02:55,100 Let me print with 're.findall' your pattern. Your pattern 52 00:02:55,100 --> 00:02:59,900 may represent one string or multiple strings, right, so that 53 00:02:59,900 --> 00:03:04,200 I am looking in a given string. Now see the result. You're 54 00:03:04,200 --> 00:03:07,500 getting all the matchings for your given pattern. Suppose 55 00:03:07,500 --> 00:03:11,900 if I take 'pyx'. Now, you don't have any matching for your 56 00:03:11,900 --> 00:03:15,700 pattern in a given string. You're getting an empty list. Be clear. 57 00:03:15,700 --> 00:03:17,700 'findall' will always give you a list. 58 00:03:17,800 --> 00:03:20,100 If there is a match it will give the list of values, if 59 00:03:20,100 --> 00:03:21,300 there is no match it 60 00:03:21,300 --> 00:03:26,300 will give an empty list. Now by using 'findall' operation directly 61 00:03:26,300 --> 00:03:30,200 you can tell how many matchings are there for your given 62 00:03:30,200 --> 00:03:34,000 pattern in a given string simply by applying the 'len' function 63 00:03:34,000 --> 00:03:36,900 on your 'findall'. See the result. 64 00:03:37,900 --> 00:03:40,100 If there is no matching directly it will give 0. 65 00:03:40,100 --> 00:03:42,500 [no audio] 66 00:03:42,500 --> 00:03:45,200 Because we know that for a list we can apply the 'len' function, 67 00:03:45,400 --> 00:03:48,500 and from that you can find out how many matches are there. That's it. 68 00:03:48,800 --> 00:03:51,900 So that is the use of your 'findall', or basically 'findall' 69 00:03:51,900 --> 00:03:55,400 is useful just to understand 'regex' operations. 70 00:03:57,100 --> 00:04:01,100 But if you go with your 'findall' we have a small drawback, 71 00:04:01,200 --> 00:04:07,400 of course directly we are able to find how many matches are 72 00:04:07,400 --> 00:04:11,000 there for a given, for your given pattern in a given string, 73 00:04:11,500 --> 00:04:17,899 but yeah, let me take this as 'python'. 
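The 'findall' behaviour described above can be sketched as follows, assuming the same sample string. The names 'all_matches' and 'no_matches' are made up for this illustration, and the '\b' boundaries around 'pyx' are an assumption, since the video only says the pattern was changed to 'pyx'.

```python
import re

my_str = "This is python and we are having python2 and python3 versions"

# findall returns a plain list of the matched strings.
all_matches = re.findall(r"\bpython[23]?\b", my_str)
print(all_matches)       # ['python', 'python2', 'python3']
print(len(all_matches))  # 3

# With no match the result is still a list, just an empty one.
no_matches = re.findall(r"\bpyx\b", my_str)
print(no_matches)        # []
print(len(no_matches))   # 0
```

The list holds only the matched text, so the position of each match in the string is not available from this result.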
74 00:04:17,899 --> 00:04:22,200 Yeah, now see that. But the disadvantage is it is not giving 75 00:04:22,200 --> 00:04:24,800 any information about your matching. 76 00:04:24,899 --> 00:04:27,000 I mean, what is the starting index of this matching, 77 00:04:27,000 --> 00:04:28,700 what is the ending index of this matching. 78 00:04:29,900 --> 00:04:33,600 So whenever you want to look at all the matchings in a given 79 00:04:33,600 --> 00:04:36,500 string for your given pattern, and if you want to see at the 80 00:04:36,500 --> 00:04:40,500 same time the starting index or ending index of your matching, 81 00:04:40,800 --> 00:04:43,000 then you have to go with the 'finditer' operation. 82 00:04:43,500 --> 00:04:47,500 Let me print first the information for your 'finditer', 83 00:04:48,600 --> 00:04:51,200 'my_pat, my_str'. 84 00:04:52,600 --> 00:04:54,200 First observe the output what you are getting. 85 00:04:54,900 --> 00:04:57,700 Let me comment out your remaining 'print' statements. 86 00:04:59,100 --> 00:05:00,700 See the output. You are getting 87 00:05:02,200 --> 00:05:05,300 'callable_iterator object at', something, some information 88 00:05:05,300 --> 00:05:10,600 is there. Just observe that. Now I am writing 'py', some, let's 89 00:05:10,600 --> 00:05:16,200 say 'pyxthon'. Now, we don't have any matching for your 90 00:05:16,200 --> 00:05:20,400 given pattern because it is not 'python', it is 'pyxthon'. For this 91 00:05:20,400 --> 00:05:25,000 related word we don't have anything in our given string. Now if I look 92 00:05:25,000 --> 00:05:27,200 in this way, see the result what you are getting? Still you 93 00:05:27,200 --> 00:05:28,700 are getting some object. 94 00:05:29,600 --> 00:05:33,200 So first of all, 'finditer' will always give some object. Whether 95 00:05:33,200 --> 00:05:35,900 there is a match or not, whenever you run 'finditer' it 96 00:05:35,900 --> 00:05:39,000 will first give some object. You need to remember this point. 
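A small sketch of the point above: 'finditer' hands back an iterator object whether or not anything matches. It assumes the same sample string; the names 'with_matches' and 'without_matches' are made up for this illustration, and the exact text printed for the object (including the address) depends on the Python build.

```python
import re

my_str = "This is python and we are having python2 and python3 versions"

# A pattern that matches three times.
with_matches = re.finditer(r"\bpython[23]?\b", my_str)
print(with_matches)  # some iterator object, not a list

# A pattern that matches nothing still produces an iterator object.
without_matches = re.finditer(r"\bpyxthon\b", my_str)
print(without_matches)

# Converting to a list shows how many match objects each iterator yields.
with_count = len(list(with_matches))
without_count = len(list(without_matches))
print(with_count, without_count)
```

Because the object is never 'None', checking it with an 'if' condition says nothing about whether a match exists.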
97 00:05:39,600 --> 00:05:43,700 Then, see that I am using a for loop for this: for each 98 00:05:45,400 --> 00:05:49,000 object in this pattern, 99 00:05:49,500 --> 00:05:53,800 I mean in this operation, let me print each object. Now, see the result. 100 00:05:54,200 --> 00:05:58,400 You're not getting anything, because first of all whenever 101 00:05:58,400 --> 00:06:01,300 you run the 'finditer' operation it will give some object; 102 00:06:01,900 --> 00:06:05,200 whether there is a match or not is secondary. If there is a match 103 00:06:05,200 --> 00:06:08,100 this loop will repeat. If there is no match you're not getting 104 00:06:08,100 --> 00:06:10,200 anything. Be clear. 105 00:06:11,000 --> 00:06:14,300 See in your 'match' and 'search' operation you are using an 'if' condition. 106 00:06:14,700 --> 00:06:17,400 If your object is present then what is the match, 107 00:06:17,400 --> 00:06:18,300 what is the starting index, 108 00:06:18,300 --> 00:06:19,300 what is the ending index. 109 00:06:19,900 --> 00:06:23,000 But if you go with 'finditer', instead of using an 'if' condition, 110 00:06:23,500 --> 00:06:26,700 because 'finditer' is for multiple matchings, directly 111 00:06:26,700 --> 00:06:30,500 use a for loop. If there is a match then the loop will repeat. If 112 00:06:30,500 --> 00:06:33,700 there is at least one match your loop will repeat 113 00:06:33,700 --> 00:06:37,300 at least one time. If there are no matches then your loop, 114 00:06:37,400 --> 00:06:40,400 your loop won't run even one time. 115 00:06:40,400 --> 00:06:42,700 [no audio] 116 00:06:42,700 --> 00:06:47,200 Now see that. I am making now my pattern as 'p-y-t-h-o-n'. Now see 117 00:06:47,200 --> 00:06:49,100 the output. You're getting three objects. 118 00:06:50,300 --> 00:06:54,200 Suppose I am taking my pattern as only 'python2', 119 00:06:54,200 --> 00:06:57,800 be clear 'python2' or 'python3', now see the result. 
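The loop behaviour described above can be sketched by counting how many times the for loop body runs for each pattern. The helper 'count_loop_runs' is hypothetical and not from the video, and it is an assumption that the plain patterns were typed without '\b' boundaries.

```python
import re

my_str = "This is python and we are having python2 and python3 versions"


def count_loop_runs(pattern, text):
    """Return how many times a for loop over re.finditer runs its body."""
    runs = 0
    for each_ob in re.finditer(pattern, text):
        runs += 1
    return runs


# Patterns tried in this part of the video (boundaries assumed absent).
run_counts = {
    pat: count_loop_runs(pat, my_str)
    for pat in ("pyxthon", "python", "python2")
}
print(run_counts)
```

With no match the loop body is simply skipped, so no 'if' check is needed before looping.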
120 00:06:58,100 --> 00:07:00,200 Your loop is repeating only one time and it is giving you 121 00:07:00,200 --> 00:07:02,500 object. Right. 122 00:07:03,000 --> 00:07:04,000 So be clear. 123 00:07:05,000 --> 00:07:08,900 Now, I have given my pattern as 'python', in square brackets '[23]', 124 00:07:08,900 --> 00:07:12,000 then '?'. Now see the result, three objects 125 00:07:12,000 --> 00:07:16,300 it is giving. So 'finditer' will consist of the number of 126 00:07:16,300 --> 00:07:19,800 objects based on how many matches are there for your pattern 127 00:07:19,800 --> 00:07:23,700 in a given string. And whenever if you use for loop for that 128 00:07:23,700 --> 00:07:26,500 iterator it will give the different objects for each 129 00:07:26,500 --> 00:07:31,100 match. Right. Now see that. Now I can directly write, 130 00:07:31,100 --> 00:07:32,100 what is the match. 131 00:07:32,100 --> 00:07:34,000 Let me write some message. 132 00:07:34,000 --> 00:07:36,400 [no audio] 133 00:07:36,400 --> 00:07:37,600 "The match is: ", 134 00:07:37,600 --> 00:07:39,800 [no audio] 135 00:07:39,800 --> 00:07:41,600 so here I will print the match. 136 00:07:43,000 --> 00:07:44,100 What is your match? 137 00:07:44,800 --> 00:07:47,700 '.group', right. See that. 138 00:07:47,700 --> 00:07:52,400 [no audio] 139 00:07:52,400 --> 00:07:56,000 See that. Then I want to find what is the starting index for 140 00:07:56,000 --> 00:07:57,300 each and every object. 141 00:07:57,900 --> 00:08:00,100 So "starting index", 142 00:08:01,800 --> 00:08:06,000 your 'each ob.start()', if you remember from your 143 00:08:07,100 --> 00:08:10,100 'match' and 'search' operation. Then what is the ending index. 144 00:08:10,800 --> 00:08:17,000 Be clear, ending index is nothing but your object, 145 00:08:17,000 --> 00:08:23,900 let me write it, your object, '.end() - 1'. Be clear. 146 00:08:24,000 --> 00:08:26,000 Don't forget about that. That's it. 
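Putting the pieces above together, a minimal sketch of the 'finditer' loop might look like this. It assumes the same sample string and pattern; the loop variable name 'each_ob' and the list 'found' are assumptions made for this illustration. Note that '.end()' returns the position just after the match, which is why 1 is subtracted to get the index of the last matched character.

```python
import re

my_str = "This is python and we are having python2 and python3 versions"
my_pat = r"\bpython[23]?\b"

found = []  # collected here only so the result can be inspected afterwards
for each_ob in re.finditer(my_pat, my_str):
    # Each loop pass receives one match object.
    print("The match is:", each_ob.group())
    print("starting index:", each_ob.start())
    print("ending index:", each_ob.end() - 1)  # end() is one past the match
    found.append((each_ob.group(), each_ob.start(), each_ob.end() - 1))
```

Slicing with the raw values, as in 'my_str[each_ob.start():each_ob.end()]', gives back the matched text without any adjustment, because Python slices also exclude the end position.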
147 00:08:26,000 --> 00:08:28,900 [no audio] 148 00:08:28,900 --> 00:08:31,500 Right. So guys, this is the usage of 'finditer'. 149 00:08:33,200 --> 00:08:35,700 So in most of the cases in your real time, you're going to 150 00:08:35,700 --> 00:08:39,400 work with 'search' and 'match' operations, but you have to know 151 00:08:39,400 --> 00:08:43,900 the functionality or working operation of your 'findall' 152 00:08:43,900 --> 00:08:45,000 and 'finditer' as well. 153 00:08:45,100 --> 00:08:47,600 Sometimes you may use, based on requirement you will use, but 154 00:08:47,600 --> 00:08:48,600 based on my experience 155 00:08:48,600 --> 00:08:51,799 I am telling that most of the cases we'll use 'search' and 'match'. 156 00:08:51,799 --> 00:08:53,799 [no audio] 157 00:08:53,799 --> 00:08:55,600 Okay. Okay guys, 158 00:08:55,600 --> 00:08:57,200 thank you for watching this video. 159 00:08:57,200 --> 00:09:05,088 [no audio]