1
00:00:00,000 --> 00:00:07,433
[No Audio]

2
00:00:07,434 --> 00:00:09,666
Welcome back to the course. In the

3
00:00:09,667 --> 00:00:11,500
previous tutorial, we started our

4
00:00:11,501 --> 00:00:14,500
discussion of regexes. We will continue

5
00:00:14,501 --> 00:00:17,366
with the topic of regexes in this tutorial.

6
00:00:17,666 --> 00:00:20,233
The next concept in regexes is that of

7
00:00:20,234 --> 00:00:23,100
quantifiers. Quantifiers are used to denote

8
00:00:23,101 --> 00:00:26,100
repetition. There are three quantifiers

9
00:00:26,101 --> 00:00:28,300
that are commonly used: the question

10
00:00:28,301 --> 00:00:32,000
mark, the plus, and the star. The question mark is

11
00:00:32,001 --> 00:00:34,433
used to indicate zero or one

12
00:00:34,434 --> 00:00:37,333
occurrence, the plus is used to indicate one

13
00:00:37,334 --> 00:00:40,366
or more occurrences, and the star is

14
00:00:40,367 --> 00:00:42,866
used to indicate zero or more

15
00:00:42,867 --> 00:00:45,200
occurrences. So, let us learn the question

16
00:00:45,201 --> 00:00:47,666
mark first. I will write a simple Regex.

17
00:00:47,667 --> 00:00:52,166
[No Audio]

18
00:00:52,167 --> 00:00:55,133
This Regex means that we need zero or

19
00:00:55,134 --> 00:00:57,566
one instance of the letter a at the beginning,

20
00:00:57,567 --> 00:00:59,900
followed by two mandatory a's.

21
00:01:00,233 --> 00:01:02,933
The question mark after the letter a means that

22
00:01:02,934 --> 00:01:04,965
the letter a is optional at the

23
00:01:04,966 --> 00:01:07,100
beginning, that is, it appears zero or one times.

24
00:01:07,300 --> 00:01:09,300
Let us define some input text.

25
00:01:09,301 --> 00:01:13,000
[No Audio]

26
00:01:13,001 --> 00:01:14,700
I will also include the code for

27
00:01:14,701 --> 00:01:16,300
displaying the possible matches.

28
00:01:17,766 --> 00:01:19,600
Let us cargo run this now.

29
00:01:19,601 --> 00:01:23,033
[No Audio]

30
00:01:23,034 --> 00:01:24,866
It has returned two matches.

31
00:01:24,867 --> 00:01:27,300
The first word matched because

32
00:01:27,301 --> 00:01:30,166
the first a is optional, and it is

33
00:01:30,167 --> 00:01:32,866
not present in the word. The second one also

34
00:01:32,867 --> 00:01:34,800
matched, because the optional first a is

35
00:01:34,801 --> 00:01:37,833
present in it. Let us do one more example.

36
00:01:37,834 --> 00:01:41,166
I will define a new Regex and a new input text.

37
00:01:41,167 --> 00:01:47,500
[No Audio]

38
00:01:47,501 --> 00:01:49,800
In this case, the Regex means that at the

39
00:01:49,801 --> 00:01:52,233
start, we need a mandatory letter

40
00:01:52,234 --> 00:01:55,700
b followed by an optional a. In the input text,

41
00:01:55,701 --> 00:01:58,333
the first word will not match, because it does

42
00:01:58,334 --> 00:02:01,433
not contain the mandatory letter b.

43
00:02:01,533 --> 00:02:04,233
The second word, containing ba, will match.

44
00:02:04,234 --> 00:02:06,733
The next word, containing only the letter b, will also

45
00:02:06,734 --> 00:02:10,100
match, because the letter a is optional. Finally,

46
00:02:10,101 --> 00:02:13,466
the last word, containing ba, will also match.

47
00:02:13,467 --> 00:02:14,733
Let us cargo run this.

48
00:02:14,734 --> 00:02:19,133
[No Audio]

49
00:02:19,134 --> 00:02:21,200
Let us now cover a nice use case

50
00:02:21,201 --> 00:02:23,266
of this quantifier, where you can use

51
00:02:23,267 --> 00:02:25,466
it to find files of a particular

52
00:02:25,467 --> 00:02:27,400
type. Let me write a Regex.

53
00:02:27,401 --> 00:02:32,833
[No Audio]

54
00:02:32,834 --> 00:02:35,866
This Regex will match any text that contains

55
00:02:35,867 --> 00:02:39,500
the mandatory .rs at the end.
The name of the

56
00:02:39,501 --> 00:02:41,833
file, before the extension, may have from zero

57
00:02:41,834 --> 00:02:44,666
up to three letters. Let us write some

58
00:02:44,667 --> 00:02:46,600
input text to test the Regex.

59
00:02:46,601 --> 00:02:51,733
[No Audio]

60
00:02:51,734 --> 00:02:53,766
In this case, it will match the first two

61
00:02:53,767 --> 00:02:56,166
file names, but the third one will not match,

62
00:02:56,167 --> 00:02:58,633
since the length of its name is greater than three

63
00:02:58,634 --> 00:03:00,600
characters.

64
00:03:00,601 --> 00:03:02,100
Let us cargo run this to confirm.

65
00:03:02,101 --> 00:03:06,500
[No Audio]

66
00:03:06,501 --> 00:03:07,966
The next quantifier is that

67
00:03:07,967 --> 00:03:10,100
of the plus, which is used to indicate one or

68
00:03:10,101 --> 00:03:11,900
more occurrences. Let us comment out the

69
00:03:11,901 --> 00:03:13,400
previous code and start fresh.

70
00:03:13,401 --> 00:03:17,566
[No Audio]

71
00:03:17,567 --> 00:03:19,033
I will define a simple Regex.

72
00:03:19,034 --> 00:03:23,866
[No Audio]

73
00:03:23,867 --> 00:03:26,033
This Regex means that we need

74
00:03:26,034 --> 00:03:28,333
one or more occurrences of the letter a in the input

75
00:03:28,334 --> 00:03:31,800
text for a match to occur. Let us add some input text.

76
00:03:31,801 --> 00:03:36,133
[No Audio]

77
00:03:36,134 --> 00:03:37,933
I will also add the code for

78
00:03:37,934 --> 00:03:40,500
displaying the matches. This will now match

79
00:03:40,501 --> 00:03:42,966
the first word, containing a single a, the

80
00:03:42,967 --> 00:03:45,766
second word, containing a double a, and the third

81
00:03:45,767 --> 00:03:48,933
word, containing aaa. However, for the

82
00:03:48,934 --> 00:03:52,466
word baab, it will only match the pair of

83
00:03:52,467 --> 00:03:55,533
a's, and from the last word, it will only match

84
00:03:55,566 --> 00:03:58,500
a single a. Let us execute to confirm.

85
00:03:58,501 --> 00:04:03,900
[No Audio]

86
00:04:03,901 --> 00:04:06,300
To find a file whose name may contain any number

87
00:04:06,301 --> 00:04:10,766
of characters but has a .gif extension, we will

88
00:04:10,800 --> 00:04:13,200
use a Regex which will look something like this.

89
00:04:13,201 --> 00:04:17,300
[No Audio]

90
00:04:17,301 --> 00:04:19,166
This Regex will match anything that

91
00:04:19,167 --> 00:04:21,500
starts with one or more characters

92
00:04:21,501 --> 00:04:25,166
of any kind and ends with the letters

93
00:04:25,167 --> 00:04:29,133
.gif. The backslash before the dot is used to

94
00:04:29,134 --> 00:04:31,833
make sure that the dot is taken in its literal

95
00:04:31,834 --> 00:04:33,966
meaning and not as a match for any

96
00:04:33,967 --> 00:04:37,800
character. Let me add some text to test this Regex.

97
00:04:37,801 --> 00:04:43,366
[No Audio]

98
00:04:43,367 --> 00:04:45,900
Let us execute the code now to see the results.

99
00:04:45,901 --> 00:04:50,566
[No Audio]

100
00:04:50,567 --> 00:04:52,500
The next quantifier is that of the

101
00:04:52,501 --> 00:04:55,033
star, which is used to indicate zero or more

102
00:04:55,034 --> 00:04:57,633
occurrences. For instance, I will

103
00:04:57,634 --> 00:05:00,166
write a simple Regex, which will match any

104
00:05:00,167 --> 00:05:02,633
text that starts with a mandatory letter

105
00:05:02,666 --> 00:05:05,200
a, followed by zero or more b's.

106
00:05:05,201 --> 00:05:10,166
[No Audio]

107
00:05:10,167 --> 00:05:12,600
Let us add some sample text to test it.

108
00:05:12,601 --> 00:05:18,100
[No Audio]

109
00:05:18,101 --> 00:05:20,100
In this case, it will match all three

110
00:05:20,101 --> 00:05:22,700
words in the text. Let us cargo run this to confirm.

111
00:05:22,701 --> 00:05:27,700
[No Audio]

112
00:05:27,701 --> 00:05:29,700
Let us now move to the next topic, but before

113
00:05:29,701 --> 00:05:31,433
that, I will comment out the previous code

114
00:05:31,434 --> 00:05:32,666
and start fresh again.

115
00:05:32,667 --> 00:05:36,200
[No Audio]

116
00:05:36,201 --> 00:05:38,133
Sometimes, we may not

117
00:05:38,134 --> 00:05:40,500
want an unlimited number of repetitions,

118
00:05:40,501 --> 00:05:42,600
but rather we are interested in a limited number of

119
00:05:42,601 --> 00:05:45,066
repetitions. To specify a limited number of

120
00:05:45,067 --> 00:05:47,400
repetitions, we write the minimum and

121
00:05:47,401 --> 00:05:49,633
maximum number of repetitions inside

122
00:05:49,634 --> 00:05:53,000
curly brackets, separated by a comma. Let us use this inside a Regex.

123
00:05:53,001 --> 00:05:57,933
[No Audio]

124
00:05:57,934 --> 00:06:00,066
This Regex will match any text

125
00:06:00,067 --> 00:06:02,100
containing between three and five

126
00:06:02,101 --> 00:06:04,933
characters. Let us add some simple text.

127
00:06:04,934 --> 00:06:09,566
[No Audio]

128
00:06:09,567 --> 00:06:12,300
Let us also add the code for displaying the matches.

129
00:06:13,200 --> 00:06:14,633
Let us cargo run this now.

130
00:06:14,634 --> 00:06:19,700
[No Audio]

131
00:06:19,701 --> 00:06:22,000
Note that it matches the words

132
00:06:22,001 --> 00:06:25,266
that fall within the given range of characters.

133
00:06:25,300 --> 00:06:27,166
For words that are longer than that,

134
00:06:27,200 --> 00:06:29,166
only their first five characters have

135
00:06:29,167 --> 00:06:31,766
matched.
You may use the word boundary

136
00:06:31,767 --> 00:06:34,100
metacharacter, that is \b, at the start and

137
00:06:34,101 --> 00:06:36,233
at the end of the Regex to ensure that it only

138
00:06:36,234 --> 00:06:38,600
matches whole words and does not return partial

139
00:06:38,601 --> 00:06:41,866
words from the text. Let me add it and cargo run again.

140
00:06:41,867 --> 00:06:47,566
[No Audio]

141
00:06:47,567 --> 00:06:49,733
Note that the partial words are not

142
00:06:49,734 --> 00:06:52,533
being returned now. A nice use case of

143
00:06:52,534 --> 00:06:54,566
limited repetition is to limit the number of

144
00:06:54,567 --> 00:06:57,033
digits in the whole and fractional parts

145
00:06:57,034 --> 00:07:00,100
of a number. For instance, let me add a Regex.

146
00:07:00,101 --> 00:07:05,666
[No Audio]

147
00:07:05,667 --> 00:07:07,500
This Regex will make sure that the number

148
00:07:07,501 --> 00:07:10,500
of digits in the whole and fractional parts is

149
00:07:10,501 --> 00:07:13,400
between one and three, with a dot

150
00:07:13,401 --> 00:07:16,966
between the two parts. Let me add some text for testing.

151
00:07:16,967 --> 00:07:22,433
[No Audio]

152
00:07:22,434 --> 00:07:24,166
Let us cargo run this now.

153
00:07:24,167 --> 00:07:27,833
[No Audio]

154
00:07:27,834 --> 00:07:29,300
Again, to exclude the last

155
00:07:29,301 --> 00:07:31,500
number and make sure that

156
00:07:31,501 --> 00:07:33,900
it only returns the valid numbers, we will use

157
00:07:33,901 --> 00:07:35,400
the word boundary at the beginning

158
00:07:35,401 --> 00:07:36,700
and at the end of the Regex.

159
00:07:36,701 --> 00:07:39,433
[No Audio]

160
00:07:39,434 --> 00:07:41,100
Now the last number is

161
00:07:41,101 --> 00:07:44,500
being excluded.
Sometimes we may know the

162
00:07:44,501 --> 00:07:47,166
minimum number of repetitions but not

163
00:07:47,167 --> 00:07:49,233
the maximum number of times the repetition

164
00:07:49,234 --> 00:07:51,666
will occur. In that case, we only write

165
00:07:51,667 --> 00:07:53,366
the first number in the curly brackets and

166
00:07:53,367 --> 00:07:56,033
leave the second number after the comma empty. Let me add a Regex.

167
00:07:56,034 --> 00:08:02,166
[No Audio]

168
00:08:02,167 --> 00:08:04,566
This will match any text containing three or

169
00:08:04,567 --> 00:08:08,666
more digits. Let us also add some text for testing.

170
00:08:08,667 --> 00:08:14,233
[No Audio]

171
00:08:14,234 --> 00:08:16,100
In this case, it will only match the first

172
00:08:16,101 --> 00:08:18,166
three numbers, since they contain three or

173
00:08:18,167 --> 00:08:20,766
more digits. The last number will not be

174
00:08:20,767 --> 00:08:23,500
matched. Lastly, we can specify a fixed

175
00:08:23,501 --> 00:08:25,800
number of repetitions by placing a

176
00:08:25,801 --> 00:08:28,266
single number within the curly brackets. This

177
00:08:28,267 --> 00:08:30,200
is quite simple and will be covered in the

178
00:08:30,201 --> 00:08:33,000
next example. Let us now move to the next

179
00:08:33,033 --> 00:08:35,533
topic, which is capturing groups. I will

180
00:08:35,534 --> 00:08:36,900
comment out the code first.

181
00:08:36,901 --> 00:08:42,433
[No Audio]

182
00:08:42,434 --> 00:08:45,333
Sometimes the pattern we want to define may

183
00:08:45,334 --> 00:08:47,666
be a bit more complex, and therefore we may

184
00:08:47,667 --> 00:08:50,300
want to break it down into its individual

185
00:08:50,301 --> 00:08:52,700
components for the sake of simplification.

186
00:08:52,833 --> 00:08:54,900
This can be done by placing parts of a

187
00:08:54,901 --> 00:08:56,966
regular expression inside parentheses to

188
00:08:56,967 --> 00:08:59,133
create groups within the regular

189
00:08:59,134 --> 00:09:02,100
expression. Consider the case of detecting

190
00:09:02,101 --> 00:09:04,066
dates that are given in the form of year,

191
00:09:04,067 --> 00:09:06,733
month, and day. The year consists of

192
00:09:06,734 --> 00:09:09,000
four digits, followed by a month containing

193
00:09:09,001 --> 00:09:11,333
two digits, and lastly the day containing two

194
00:09:11,334 --> 00:09:14,233
digits. We will write a Regex for each part,

195
00:09:14,234 --> 00:09:16,266
the year, month, and day, separately using

196
00:09:16,267 --> 00:09:18,833
capturing groups. The Regex is going to

197
00:09:18,834 --> 00:09:21,366
be simple, so let us write it first and then explain it.

198
00:09:21,367 --> 00:09:29,733
[No Audio]

199
00:09:29,734 --> 00:09:32,466
The Regex in the first pair of parentheses will

200
00:09:32,467 --> 00:09:35,200
match the year. The year contains exactly four

201
00:09:35,201 --> 00:09:37,133
digits, so inside the curly brackets we wrote

202
00:09:37,134 --> 00:09:40,266
four, which will match exactly four digits.

203
00:09:40,300 --> 00:09:42,933
Next, we have a mandatory dash. This is

204
00:09:42,934 --> 00:09:45,400
followed by another group which will contain

205
00:09:45,401 --> 00:09:47,366
two digits for the month, followed by

206
00:09:47,367 --> 00:09:50,166
another mandatory dash, and finally another

207
00:09:50,167 --> 00:09:52,800
group for the day containing two digits. Let

208
00:09:52,801 --> 00:09:54,800
us write some dates in the input text now.

209
00:09:54,801 --> 00:10:00,900
[No Audio]

210
00:10:00,901 --> 00:10:02,900
Lastly, we will add the code for displaying

211
00:10:02,901 --> 00:10:06,233
the result.
The value at index zero of

212
00:10:06,234 --> 00:10:08,500
the captures will contain the full match, and

213
00:10:08,501 --> 00:10:10,833
the individual groups will be at the

214
00:10:10,834 --> 00:10:13,366
remaining indexes. In particular, the

215
00:10:13,367 --> 00:10:15,466
value at index 1 will contain the year

216
00:10:15,467 --> 00:10:17,633
group, index 2 will contain the value

217
00:10:17,634 --> 00:10:20,233
for the month, and index 3 will contain

218
00:10:20,234 --> 00:10:22,500
the value for the day. Let us add a suitable

219
00:10:22,501 --> 00:10:24,266
print statement to display the result.

220
00:10:24,267 --> 00:10:32,466
[No Audio]

221
00:10:32,467 --> 00:10:34,266
Let us cargo run this now.

222
00:10:34,267 --> 00:10:38,333
[No Audio]

223
00:10:38,334 --> 00:10:39,900
With this, we end

224
00:10:39,901 --> 00:10:41,733
this tutorial. There are many more details of regexes,

225
00:10:41,734 --> 00:10:43,833
but we intentionally leave them out for now

226
00:10:43,933 --> 00:10:46,166
for the sake of brevity. Since this is a

227
00:10:46,167 --> 00:10:48,400
Rust course and not a regular expression

228
00:10:48,401 --> 00:10:50,533
course, we will limit ourselves to these

229
00:10:50,534 --> 00:10:52,766
details. For those of you who are interested

230
00:10:52,767 --> 00:10:55,066
in learning more about regexes, you may take

231
00:10:55,067 --> 00:10:57,433
our other course on regular expressions. You

232
00:10:57,434 --> 00:11:00,333
can ask me for a coupon, and I will give

233
00:11:00,334 --> 00:11:02,866
you a free one for that course. For now,

234
00:11:02,867 --> 00:11:04,733
the details covered in these tutorials are

235
00:11:04,734 --> 00:11:06,600
enough to give you a quick start and let you build

236
00:11:06,601 --> 00:11:09,100
simple regular expressions. See you again, and

237
00:11:09,101 --> 00:11:11,866
until the next tutorial, enjoy Rust programming.

238
00:11:11,867 --> 00:11:15,400
[No Audio]