1 00:00:06,720 --> 00:00:10,050 - In this video, you will learn about regular expressions. 2 00:00:10,050 --> 00:00:12,180 So, regular expressions are text patterns 3 00:00:12,180 --> 00:00:14,670 that are used by tools like grep and others. 4 00:00:14,670 --> 00:00:15,990 And why is that useful? 5 00:00:15,990 --> 00:00:17,280 Well, that is useful in cases 6 00:00:17,280 --> 00:00:20,130 where you don't know exactly what you are looking for. 7 00:00:20,130 --> 00:00:22,620 Now, you can specify the pattern, 8 00:00:22,620 --> 00:00:26,790 a little bit like wildcards in a shell environment. 9 00:00:26,790 --> 00:00:29,040 If you are going to use regular expressions, 10 00:00:29,040 --> 00:00:31,530 you should always put your regular expression, 11 00:00:31,530 --> 00:00:35,853 also referred to as "regex" for short, between single quotes. 12 00:00:36,780 --> 00:00:38,670 That is because regular expressions 13 00:00:38,670 --> 00:00:40,620 are using characters that can 14 00:00:40,620 --> 00:00:42,540 also be interpreted by the shell. 15 00:00:42,540 --> 00:00:44,490 And you want to avoid any problems 16 00:00:44,490 --> 00:00:46,743 because of misinterpretation. 17 00:00:48,000 --> 00:00:51,090 That's also one of the big things in regular expressions. 18 00:00:51,090 --> 00:00:52,260 Regular expressions 19 00:00:52,260 --> 00:00:55,590 sometimes look like shell characters, 20 00:00:55,590 --> 00:00:57,000 but they are not. 21 00:00:57,000 --> 00:00:59,670 So, don't confuse regular expressions 22 00:00:59,670 --> 00:01:03,780 with globbing, which is shell wildcards, and so on, 23 00:01:03,780 --> 00:01:06,570 because they look the same but they are not. 24 00:01:06,570 --> 00:01:08,272 Look at this example, for instance: 25 00:01:08,272 --> 00:01:13,272 grep 'a*' a*. 26 00:01:13,290 --> 00:01:15,900 So the single quotes are identifying 27 00:01:15,900 --> 00:01:18,720 that the first 'a*' is a regular expression.
28 00:01:18,720 --> 00:01:23,340 And the second a* is shell globbing. 29 00:01:23,340 --> 00:01:26,010 Regular expressions are very powerful 30 00:01:26,010 --> 00:01:28,097 and you are going to appreciate them. 31 00:01:28,097 --> 00:01:30,690 But they only work in specific utilities 32 00:01:30,690 --> 00:01:32,970 like grep and vim and awk and sed. 33 00:01:32,970 --> 00:01:35,190 There are a couple more utilities; you will find 34 00:01:35,190 --> 00:01:36,453 your way to them automatically. 35 00:01:37,290 --> 00:01:38,370 What you should remember: 36 00:01:38,370 --> 00:01:41,460 if your utility is about finding text 37 00:01:41,460 --> 00:01:43,380 and about treating text, 38 00:01:43,380 --> 00:01:44,340 then there's a chance 39 00:01:44,340 --> 00:01:47,100 that it works with regular expressions. 40 00:01:47,100 --> 00:01:48,650 I will give you an introduction 41 00:01:48,650 --> 00:01:51,210 to the most useful regular expressions. 42 00:01:51,210 --> 00:01:53,460 For more detailed information, 43 00:01:53,460 --> 00:01:57,150 consider reading "man 7 regex", 44 00:01:57,150 --> 00:02:01,110 where the seven is referring to man section seven, 45 00:02:01,110 --> 00:02:03,993 to get information about regular expressions. 46 00:02:05,160 --> 00:02:07,410 Now, regular expressions are powerful, 47 00:02:07,410 --> 00:02:10,860 but regular expressions are also confusing. 48 00:02:10,860 --> 00:02:12,000 Now, why is that? 49 00:02:12,000 --> 00:02:14,010 Well, that is because regular expressions work 50 00:02:14,010 --> 00:02:18,630 with many tools, but there is also this thing 51 00:02:18,630 --> 00:02:21,330 called extended regular expressions, 52 00:02:21,330 --> 00:02:23,280 and extended regular expressions 53 00:02:23,280 --> 00:02:24,690 don't always work. 54 00:02:24,690 --> 00:02:27,270 And a regular expression is either a normal 55 00:02:27,270 --> 00:02:29,460 or an extended regular expression.
56 00:02:29,460 --> 00:02:31,920 And if you use an extended regular expression 57 00:02:31,920 --> 00:02:36,150 without using an extended regular expression option, 58 00:02:36,150 --> 00:02:38,100 then it's not going to work. 59 00:02:38,100 --> 00:02:39,510 Starting to feel confused? 60 00:02:39,510 --> 00:02:43,080 I can imagine, but I will clarify once I demonstrate. 61 00:02:43,080 --> 00:02:46,380 You should also notice that some scripting languages, 62 00:02:46,380 --> 00:02:47,610 like Perl for instance, 63 00:02:47,610 --> 00:02:49,743 come with their own regular expressions. 64 00:02:50,730 --> 00:02:52,860 And that means that if you're going to look 65 00:02:52,860 --> 00:02:55,442 up regular expressions on Google, for instance, 66 00:02:55,442 --> 00:02:57,300 you will find information 67 00:02:57,300 --> 00:02:59,812 that doesn't match your environment 68 00:02:59,812 --> 00:03:02,133 and it won't work. 69 00:03:03,709 --> 00:03:07,260 Also confusing is that there is so much information 70 00:03:07,260 --> 00:03:09,120 about regular expressions 71 00:03:09,120 --> 00:03:11,190 that it is difficult to find the difference 72 00:03:11,190 --> 00:03:12,420 between the things that matter 73 00:03:12,420 --> 00:03:14,670 and the things that don't matter. 74 00:03:14,670 --> 00:03:17,790 Now, I've put together a slide 75 00:03:17,790 --> 00:03:21,390 that contains all of the most common regular expressions. 76 00:03:21,390 --> 00:03:22,223 And you know what? 77 00:03:22,223 --> 00:03:26,300 I am just going to put the entire slide 78 00:03:27,240 --> 00:03:30,360 on the screen so that we can talk about these examples 79 00:03:30,360 --> 00:03:35,360 and you can use the slide for your own reference purposes. 80 00:03:41,598 --> 00:03:46,098 All right, let me start with grep: grep Anna on users. 81 00:03:48,330 --> 00:03:49,260 What is that doing? 82 00:03:49,260 --> 00:03:53,313 This is showing all lines that contain the text Anna.
83 00:03:54,180 --> 00:03:57,210 Notice that Anna in this case is a regular expression. 84 00:03:57,210 --> 00:04:00,090 In order to have it interpreted as a regular expression, 85 00:04:00,090 --> 00:04:01,860 if you really want to do it the pure way, 86 00:04:01,860 --> 00:04:03,960 put it between single quotes. 87 00:04:03,960 --> 00:04:06,060 No difference, but it's good practice 88 00:04:06,060 --> 00:04:09,600 to identify your regular expressions using single quotes. 89 00:04:09,600 --> 00:04:12,242 The first regex I want to introduce is the caret (^), 90 00:04:12,242 --> 00:04:16,713 which is looking for the pattern at the beginning of a line. 91 00:04:17,730 --> 00:04:20,760 Likewise, we can also use the dollar ($), 92 00:04:20,760 --> 00:04:22,170 which is doing the opposite, 93 00:04:22,170 --> 00:04:23,610 which is looking for the pattern 94 00:04:23,610 --> 00:04:24,990 at the end of the line, 95 00:04:24,990 --> 00:04:26,580 and you can combine them as well. 96 00:04:26,580 --> 00:04:29,160 So caret and dollar together are looking 97 00:04:29,160 --> 00:04:32,493 for lines that both start and end with the pattern. 98 00:04:33,462 --> 00:04:36,210 Now another one that is interesting 99 00:04:36,210 --> 00:04:38,970 is backslash b (\b). Backslash b 100 00:04:38,970 --> 00:04:40,890 is for end of word. 101 00:04:40,890 --> 00:04:43,320 Let's get back to the original start. 102 00:04:43,320 --> 00:04:44,640 Here we go. 103 00:04:44,640 --> 00:04:47,343 Look at the second line, which is Annabelle. 104 00:04:48,210 --> 00:04:50,040 I want to do a regular expression 105 00:04:50,040 --> 00:04:52,350 that will work on words only. 106 00:04:52,350 --> 00:04:54,120 So how are we going to do that? 107 00:04:54,120 --> 00:04:58,830 Well, I am going to use grep 'Anna\b' 108 00:04:58,830 --> 00:05:02,370 on users, where \b is the word marker.
109 00:05:02,370 --> 00:05:06,330 And there we can see that it prints all lines 110 00:05:06,330 --> 00:05:07,803 that have Anna as a word, 111 00:05:08,670 --> 00:05:11,280 but not lines that contain it somewhere 112 00:05:11,280 --> 00:05:12,780 in another string. 113 00:05:12,780 --> 00:05:14,695 Now we have other regular expressions, 114 00:05:14,695 --> 00:05:18,030 which are basically repeating operators, 115 00:05:18,030 --> 00:05:20,970 and the repeating operators are interesting, 116 00:05:20,970 --> 00:05:24,900 but I need to modify my users file a little bit 117 00:05:24,900 --> 00:05:28,080 to show you how it works. 118 00:05:28,080 --> 00:05:33,080 So I am adding bt and bit and bite and bot and boot 119 00:05:36,360 --> 00:05:39,750 and booot and, well, something like this. 120 00:05:39,750 --> 00:05:41,433 That should be good enough. 121 00:05:42,990 --> 00:05:46,560 So let's talk about these repeating operators. 122 00:05:46,560 --> 00:05:50,053 To start with, I am going to show you grep 123 00:05:53,193 --> 00:05:55,280 'bi*t' on users. 124 00:05:58,080 --> 00:05:59,400 Now this is an interesting one, 125 00:05:59,400 --> 00:06:00,263 because here we have the star, 126 00:06:00,263 --> 00:06:04,590 and the star means zero or more times. 127 00:06:04,590 --> 00:06:05,423 So what do we see? 128 00:06:05,423 --> 00:06:08,550 We see we get a match on bt, on bit, and on bite. 129 00:06:08,550 --> 00:06:11,719 So we are looking for zero or more occurrences 130 00:06:11,719 --> 00:06:14,733 of the i. 131 00:06:16,200 --> 00:06:19,440 You can also do that in another way, by using a plus. 132 00:06:19,440 --> 00:06:23,133 If you replace the star with a plus, 133 00:06:23,133 --> 00:06:26,443 then you are looking for one or more times, 134 00:06:26,443 --> 00:06:28,650 but as you can see, that doesn't work. 135 00:06:28,650 --> 00:06:29,483 How come?
136 00:06:29,483 --> 00:06:30,840 Well, that is because this is one 137 00:06:30,840 --> 00:06:34,290 of these famous extended regular expressions. 138 00:06:34,290 --> 00:06:37,140 And if you are going to use extended regular expressions 139 00:06:37,140 --> 00:06:37,973 with grep, 140 00:06:37,973 --> 00:06:39,793 you need grep -E. 141 00:06:39,793 --> 00:06:42,480 The -E option will look 142 00:06:42,480 --> 00:06:45,030 for the extended regular expression, 143 00:06:45,030 --> 00:06:46,770 and without -E 144 00:06:46,770 --> 00:06:49,706 it is not interpreted the right way. 145 00:06:49,706 --> 00:06:51,780 Now we also have the question mark. 146 00:06:51,780 --> 00:06:52,740 What is the question mark? 147 00:06:52,740 --> 00:06:54,930 The question mark is zero or one time, 148 00:06:54,930 --> 00:06:57,960 also an extended regular expression. 149 00:06:57,960 --> 00:07:02,960 So you want to get a match on zero or one i. 150 00:07:04,770 --> 00:07:06,510 Well, this is what we get. 151 00:07:06,510 --> 00:07:08,370 That's not the most convincing example. 152 00:07:08,370 --> 00:07:12,030 Let's do that on 'bo?t', 153 00:07:12,030 --> 00:07:15,900 which is giving us bt and bot. 154 00:07:15,900 --> 00:07:20,900 Whereas if we use 'bo+t', 155 00:07:22,620 --> 00:07:26,640 we also see boot and booot, with many o's. 156 00:07:26,640 --> 00:07:28,710 Now we can also use a regular expression 157 00:07:28,710 --> 00:07:30,810 that's looking for an exact number of occurrences. 158 00:07:30,810 --> 00:07:34,650 So I am using grep on 'bo 159 00:07:34,650 --> 00:07:39,300 and I want to find lines that have exactly two o's, 160 00:07:39,300 --> 00:07:40,650 not more, not less. 161 00:07:40,650 --> 00:07:41,730 How do we do that?
162 00:07:41,730 --> 00:07:44,400 By using a backslash and an opening curly brace, 163 00:07:44,400 --> 00:07:45,540 followed by a two, 164 00:07:45,540 --> 00:07:47,160 followed by another backslash, 165 00:07:47,160 --> 00:07:49,170 followed by a closing curly brace, 166 00:07:49,170 --> 00:07:50,370 followed by a t 167 00:07:50,370 --> 00:07:53,433 and a single quote, on users. 168 00:07:54,420 --> 00:07:56,940 So this is giving me an exact match. 169 00:07:56,940 --> 00:08:00,210 Now, it's a little bit 170 00:08:00,210 --> 00:08:01,200 hard to understand, 171 00:08:01,200 --> 00:08:02,850 but what it really comes down 172 00:08:02,850 --> 00:08:04,956 to is a two between braces, 173 00:08:04,956 --> 00:08:07,680 but these braces need to be escaped. 174 00:08:07,680 --> 00:08:09,630 And that's why there's this backslash before them, 175 00:08:09,630 --> 00:08:14,100 and that makes it a little bit harder to read. 176 00:08:14,100 --> 00:08:15,343 But this is how it works. 177 00:08:15,343 --> 00:08:17,670 What happens if you don't escape them? 178 00:08:17,670 --> 00:08:21,360 Well, then you don't get a result and it's just wrong. 179 00:08:21,360 --> 00:08:23,881 Now there are other regular expressions as well. 180 00:08:23,881 --> 00:08:28,290 Like, for instance, when you are looking for a string 181 00:08:28,290 --> 00:08:32,003 that must be a word. You use grep with 182 00:08:35,700 --> 00:08:39,930 '\bAnna\b' on users, 183 00:08:39,930 --> 00:08:40,763 for instance. 184 00:08:40,763 --> 00:08:43,743 And now we can see all occurrences where Anna is a word. 185 00:08:43,743 --> 00:08:47,490 It's not really necessary to put the \b at the beginning, 186 00:08:47,490 --> 00:08:52,350 because it works out well either way. 187 00:08:52,350 --> 00:08:55,770 You are also going to appreciate the dot, which matches one character. 188 00:08:55,770 --> 00:09:00,770 So if you use grep '....' on users, 189 00:09:03,630 --> 00:09:05,790 can you predict what is going to happen?
190 00:09:05,790 --> 00:09:08,340 It is just showing all of the lines. 191 00:09:08,340 --> 00:09:09,173 Why? 192 00:09:09,173 --> 00:09:12,540 Well, because it's looking for four characters, 193 00:09:12,540 --> 00:09:13,556 any characters. 194 00:09:13,556 --> 00:09:16,020 That's probably not what you wanted. 195 00:09:16,020 --> 00:09:18,840 If you are looking for four characters in a word, 196 00:09:18,840 --> 00:09:20,133 for instance, 197 00:09:20,133 --> 00:09:23,250 that's becoming more interesting. 198 00:09:23,250 --> 00:09:27,900 Then you use \b behind it. 199 00:09:27,900 --> 00:09:30,480 Then it's considering individual words. 200 00:09:30,480 --> 00:09:31,980 Is that convincing? 201 00:09:31,980 --> 00:09:35,490 Well, maybe not, but if we put a \b at the beginning as well, 202 00:09:35,490 --> 00:09:37,860 then we get something that really makes sense. 203 00:09:37,860 --> 00:09:41,520 We are looking for four-character words right here, 204 00:09:41,520 --> 00:09:44,490 and likewise, you can use caret and dollar 205 00:09:44,490 --> 00:09:47,943 as the start and end of the lines. 206 00:09:49,668 --> 00:09:51,750 The last regular expression 207 00:09:51,750 --> 00:09:56,220 I want to show you is another extended regular expression: 208 00:09:56,220 --> 00:10:01,220 grep -E, followed by 'svm|vmx', 209 00:10:06,690 --> 00:10:11,610 on /proc/cpuinfo. 210 00:10:13,710 --> 00:10:16,746 And this is actually a useful regular expression, 211 00:10:16,746 --> 00:10:20,700 because /proc/cpuinfo is a file that is provided by the kernel. 212 00:10:20,700 --> 00:10:24,480 And this file contains CPU information. 213 00:10:24,480 --> 00:10:28,659 Let me show you cat on /proc/cpuinfo. 214 00:10:28,659 --> 00:10:30,240 This is the file.
215 00:10:30,240 --> 00:10:32,430 If you want to do virtualization, 216 00:10:32,430 --> 00:10:37,290 then in order to do virtualization, you need either svm 217 00:10:37,290 --> 00:10:40,740 or vmx to be in the output of /proc/cpuinfo. 218 00:10:40,740 --> 00:10:44,730 And this is how you can do an either-or regular expression. 219 00:10:44,730 --> 00:10:46,200 Let me just fake it, 220 00:10:46,200 --> 00:10:48,660 because here we don't see a result at all. 221 00:10:48,660 --> 00:10:50,220 That's not very convincing. 222 00:10:50,220 --> 00:10:51,450 I'm just going to fake it. 223 00:10:51,450 --> 00:10:54,990 And instead of vmx, I'm looking for vme, and there we go: 224 00:10:54,990 --> 00:10:58,930 you can see that that is actually giving a match. 225 00:10:58,930 --> 00:11:02,190 Now, what if we also include pse? 226 00:11:02,190 --> 00:11:04,650 Then you can see that both are giving a match. 227 00:11:04,650 --> 00:11:06,480 This either-or regular expression 228 00:11:06,480 --> 00:11:09,840 in fact is a very useful regular expression. 229 00:11:09,840 --> 00:11:11,250 Are you feeling confused? 230 00:11:11,250 --> 00:11:12,300 I can imagine. 231 00:11:12,300 --> 00:11:15,210 Regular expressions are confusing, 232 00:11:15,210 --> 00:11:16,860 and even the best make errors 233 00:11:16,860 --> 00:11:18,780 while using regular expressions. 234 00:11:18,780 --> 00:11:21,780 My recommendation: practice what we have seen right here 235 00:11:21,780 --> 00:11:23,550 and you will get familiar with them 236 00:11:23,550 --> 00:11:25,083 sooner or later.
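For practice, the repeating operators and the either-or pattern from this video can be tried out with a small script like the one below. This is only a sketch: the contents of the users file are never shown in full in the video, so the sample file here (the lowercase names and the test words) is an assumption made for illustration, and the variable names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data: the real "users" file from the video is not shown,
# so these lines are assumed for practice purposes only.
users=$(mktemp)
printf '%s\n' anna annabelle bt bit bite bot boot booot > "$users"

# Basic regex: * means zero or more of the preceding character.
star=$(grep 'bi*t' "$users")

# Extended regex (needs -E): + means one or more, ? means zero or one.
plus=$(grep -E 'bo+t' "$users")
qmark=$(grep -E 'bo?t' "$users")

# Exactly two o's: in a basic regex the braces are escaped with a backslash.
exact=$(grep 'bo\{2\}t' "$users")

# Either-or (alternation) is also an extended regex.
either=$(grep -E 'bit|bot' "$users")

# Word marker \b (supported by GNU grep): anna as a word, not inside annabelle.
word=$(grep 'anna\b' "$users")

printf 'star:\n%s\nplus:\n%s\nqmark:\n%s\nexact:\n%s\neither:\n%s\nword:\n%s\n' \
  "$star" "$plus" "$qmark" "$exact" "$either" "$word"

rm -f "$users"
```

Leaving out -E on the plus and question mark lines is a quick way to see for yourself why the extended option matters. The same either-or form is what the video uses on the CPU flags, as in grep -E 'svm|vmx' /proc/cpuinfo.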