1 00:00:00,000 --> 00:00:01,110 [No audio] 2 00:00:01,110 --> 00:00:03,870 As we continue with the text processing 3 00:00:03,870 --> 00:00:07,920 commands, we will go over the grep and egrep 4 00:00:07,920 --> 00:00:10,920 commands, which are also among the most powerful 5 00:00:10,920 --> 00:00:14,640 commands in terms of text processing. So what 6 00:00:14,640 --> 00:00:19,260 exactly is grep? The grep command, which stands 7 00:00:19,260 --> 00:00:23,130 for global regular expression print, processes 8 00:00:23,130 --> 00:00:26,940 text line by line, and prints any lines which 9 00:00:26,940 --> 00:00:32,369 match a specified pattern. So basically in simple 10 00:00:32,369 --> 00:00:36,210 words, it's a search technology or a search 11 00:00:36,210 --> 00:00:40,470 feature in Linux that allows you to search for a 12 00:00:40,470 --> 00:00:44,970 specific feature or specific keyword actually, sorry, 13 00:00:44,970 --> 00:00:48,810 not a feature, a specific keyword that you look for 14 00:00:49,080 --> 00:00:54,750 in a file or an output from a command. So let's 15 00:00:54,750 --> 00:00:57,510 get into a Linux machine and I will show you a few 16 00:00:57,510 --> 00:01:00,090 of the commands and examples. So this way you have 17 00:01:00,090 --> 00:01:02,490 a better understanding of how the grep or egrep 18 00:01:02,490 --> 00:01:06,209 command works. I have my Linux machine right here 19 00:01:06,239 --> 00:01:11,160 open and again I am logged in as myself, the 20 00:01:11,160 --> 00:01:16,380 machine, the hostname is MyFirstLinuxVM and 21 00:01:17,460 --> 00:01:21,290 which directory I am in, I am in /home/iafzal. 22 00:01:21,290 --> 00:01:23,310 Now one thing I do want to tell you, you're probably 23 00:01:23,310 --> 00:01:25,770 thinking to yourself, hey, why does Imran always run 24 00:01:25,770 --> 00:01:29,460 whoami and hostname when he sees himself here 25 00:01:29,670 --> 00:01:32,970 as iafzal, and right here is the hostname?
The 26 00:01:32,970 --> 00:01:37,770 reason I do that is because a lot of systems, when 27 00:01:37,770 --> 00:01:39,600 you're going to get into the corporate world, a 28 00:01:39,600 --> 00:01:43,050 lot of systems are not going to have this type of 29 00:01:43,050 --> 00:01:45,720 prompt, they're probably going to show you either 30 00:01:45,750 --> 00:01:49,500 this symbol or the hash sign. So you're probably not 31 00:01:49,500 --> 00:01:51,670 going to know whether you're logged in as iafzal, 32 00:01:51,670 --> 00:01:53,400 or whether you're logged into the right 33 00:01:53,400 --> 00:01:55,950 machine. But if you do see it, perfect, that's 34 00:01:55,950 --> 00:01:59,010 great. But if you don't, that's why I want you to 35 00:01:59,010 --> 00:02:03,240 get used to running these few commands, whoami, 36 00:02:03,240 --> 00:02:05,520 hostname, and which directory you are in, before 37 00:02:05,520 --> 00:02:07,650 you start making any changes to your system. 38 00:02:08,100 --> 00:02:11,610 Anyway, let's clear the screen. And the first 39 00:02:11,610 --> 00:02:14,550 command we will run for grep, as always, is to check the 40 00:02:14,550 --> 00:02:17,940 version and check for help. So for grep dash dash 41 00:02:17,940 --> 00:02:21,180 version, it will give you the version and now you 42 00:02:21,180 --> 00:02:24,180 could read it, you know where to find the version. 43 00:02:24,420 --> 00:02:27,180 The next one is grep dash dash help. This will 44 00:02:27,180 --> 00:02:31,110 give you all the information about grep and 45 00:02:31,170 --> 00:02:36,210 its options.
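The version and help checks just described might look like the sketch below. It assumes GNU grep, where --help prints to standard output; other grep implementations may behave differently, and the variable name version_line is only for illustration.

```shell
#!/usr/bin/env bash
# Capture the first line of the version banner and print it.
version_line=$(grep --version | head -n 1)
echo "$version_line"

# Quick list of options (assumption: GNU grep; only the first lines are shown).
grep --help 2>/dev/null | head -n 5 || true

# Full manual page (interactive, so left commented out in this sketch):
# man grep
```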
If you want to get a quick list of 46 00:02:36,210 --> 00:02:38,220 all the options that are available, you always use 47 00:02:38,220 --> 00:02:42,990 grep dash dash help. If you want to get the longer 48 00:02:42,990 --> 00:02:47,040 description of every option, you could use 49 00:02:47,040 --> 00:02:51,360 man and do grep, and this will give you detailed 50 00:02:51,360 --> 00:02:55,410 information about the grep command. Anyway, so we 51 00:02:55,410 --> 00:02:57,720 know the version, we know how to get help, and now 52 00:02:57,720 --> 00:03:01,560 let's get to the second lab question, which is to 53 00:03:01,560 --> 00:03:05,850 search for a keyword in a file. So I am in my home 54 00:03:05,850 --> 00:03:08,520 directory. I'll go back to the seinfeld directory, 55 00:03:08,550 --> 00:03:11,250 in there I have a file called seinfeld-characters, 56 00:03:11,280 --> 00:03:15,360 we'll cat it out, and these are the contents of 57 00:03:15,360 --> 00:03:19,320 the seinfeld-characters file. Okay, so now if I just 58 00:03:19,320 --> 00:03:25,170 clear the screen, go back up, and what I will do is I 59 00:03:25,170 --> 00:03:30,817 will grep for Seinfeld in the file named 60 00:03:30,817 --> 00:03:35,010 seinfeld-characters. It means only give me the 61 00:03:35,010 --> 00:03:37,380 lines that match the word Seinfeld. 62 00:03:37,680 --> 00:03:39,930 When I hit enter, there are two lines that match 63 00:03:39,930 --> 00:03:43,920 the word Seinfeld, which are Jerry Seinfeld and 64 00:03:43,920 --> 00:03:47,460 Morty Seinfeld. In the same way, I could do this with 65 00:03:47,490 --> 00:03:50,130 other files as well. Let's say if I want to grep 66 00:03:50,130 --> 00:03:53,880 for my username iafzal in /etc/passwd, 67 00:03:54,180 --> 00:03:56,015 I could do the same thing, grep iafzal 68 00:03:56,015 --> 00:03:59,190 /etc/passwd, and it will go into that file 69 00:03:59,190 --> 00:04:02,250 and search for that specific keyword.
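A minimal sketch of the keyword search described above. The file contents are a hypothetical stand-in for seinfeld-characters (the lecture only confirms a few of its lines), and root stands in for your own username in the /etc/passwd search.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the seinfeld-characters file.
chars=$(mktemp)
cat > "$chars" <<'EOF'
Jerry Seinfeld
Elaine Benes
George Costanza
Cosmo Kramer
Frank Costanza
Estelle Costanza
Susan Ross
Morty Seinfeld
EOF

# Print only the lines that contain the keyword Seinfeld.
matches=$(grep Seinfeld "$chars")
echo "$matches"
# Jerry Seinfeld
# Morty Seinfeld

# Same idea against /etc/passwd; "root" is an assumed username for illustration.
grep root /etc/passwd || echo "no matching user"

rm -f "$chars"
```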
And it 70 00:04:02,250 --> 00:04:05,190 brings back one line because there is only one 71 00:04:05,220 --> 00:04:09,680 user in /etc/passwd that has the word iafzal, 72 00:04:09,680 --> 00:04:12,450 and that's me, of course, right. The next 73 00:04:12,450 --> 00:04:16,920 command or the next syntax that we'll learn is to 74 00:04:16,920 --> 00:04:20,579 search for a keyword and count it. So if I 75 00:04:20,579 --> 00:04:24,089 hit the up arrow key twice, this is 76 00:04:24,089 --> 00:04:26,459 the command we ran before, which is to grep Seinfeld 77 00:04:26,459 --> 00:04:29,730 from seinfeld-characters, and it gives me two. But 78 00:04:29,730 --> 00:04:33,300 what if you have a file that has thousands of lines 79 00:04:33,450 --> 00:04:36,120 that match your criteria, and you want to know 80 00:04:36,120 --> 00:04:38,430 how many lines it shows up on? Then you could just 81 00:04:38,430 --> 00:04:41,580 simply do grep -c, and it would tell you that 82 00:04:41,610 --> 00:04:46,590 it is showing up on two lines. So that's what -c 83 00:04:46,620 --> 00:04:50,460 is used for. The next one is -i, which is the 84 00:04:50,460 --> 00:04:54,060 one that you should always use, and I have a habit 85 00:04:54,180 --> 00:04:57,870 of always using grep with -i. What -i 86 00:04:57,870 --> 00:05:00,330 does is it actually searches for the keyword and 87 00:05:00,360 --> 00:05:05,130 ignores case. Meaning if I have grep 88 00:05:05,130 --> 00:05:08,400 Seinfeld seinfeld-characters, it gave me 2 Seinfelds. 89 00:05:08,400 --> 00:05:13,290 What if, inside of this file, I search for 90 00:05:13,290 --> 00:05:16,560 seinfeld with lowercase s? So let's say if I'm 91 00:05:16,560 --> 00:05:19,740 grepping for seinfeld with lowercase s, then hit 92 00:05:19,740 --> 00:05:22,770 enter. It's not giving me anything, even though I 93 00:05:22,770 --> 00:05:25,680 know there is a word seinfeld in it.
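The counting option and the case-sensitive miss can be sketched like this. The sample file is an assumed stand-in for seinfeld-characters. Note that -c counts matching lines, not individual occurrences of the word.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the seinfeld-characters file.
chars=$(mktemp)
cat > "$chars" <<'EOF'
Jerry Seinfeld
Elaine Benes
George Costanza
Cosmo Kramer
Frank Costanza
Estelle Costanza
Susan Ross
Morty Seinfeld
EOF

# -c prints the number of matching lines instead of the lines themselves.
count=$(grep -c Seinfeld "$chars")
echo "$count"        # 2

# Lowercase keyword without -i: nothing matches, so the count is 0.
# grep exits with status 1 on no match, hence the "|| true".
lower_count=$(grep -c seinfeld "$chars" || true)
echo "$lower_count"  # 0

rm -f "$chars"
```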
So that's 94 00:05:25,680 --> 00:05:29,790 why I always use the -i option to tell my system, 95 00:05:29,790 --> 00:05:33,870 hey, ignore any uppercase or lowercase in that 96 00:05:33,870 --> 00:05:38,007 keyword and in that file. And when you do -i, 97 00:05:38,007 --> 00:05:40,860 it actually greps and gets you the 98 00:05:40,860 --> 00:05:43,620 result, regardless of whether they are written in 99 00:05:43,620 --> 00:05:48,240 uppercase or lowercase. Next one is -n, which displays the 100 00:05:48,240 --> 00:05:52,590 matched lines and their line numbers. So, let's 101 00:05:52,590 --> 00:05:55,500 clear the screen, and if you remember, we used the -c 102 00:05:55,500 --> 00:05:59,220 option, which gives you the total number of lines 103 00:05:59,220 --> 00:06:02,190 that matched our criteria. Now I want to see 104 00:06:02,190 --> 00:06:04,980 those lines and the line number where 105 00:06:05,010 --> 00:06:09,300 exactly each one occurred. So in that case, I hit the up 106 00:06:09,300 --> 00:06:12,660 arrow key and replace -c with -n, hit 107 00:06:12,660 --> 00:06:15,510 enter. And you will see, it will give me the two 108 00:06:15,510 --> 00:06:18,180 lines that match my criteria. And it is also telling 109 00:06:18,180 --> 00:06:21,720 me the first line is actually on line number 110 00:06:21,720 --> 00:06:24,720 one, and the second matched line is line number 111 00:06:24,750 --> 00:06:28,410 eight. Isn't that awesome? Yeah, I like it. Okay. 112 00:06:28,650 --> 00:06:31,740 Next one is -v, also very powerful. Meaning 113 00:06:31,740 --> 00:06:37,140 what if you want to get everything but that search 114 00:06:37,320 --> 00:06:40,470 keyword. So let's say if I want to get all the 115 00:06:40,470 --> 00:06:44,550 lines that do not match seinfeld.
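The -i and -n options covered above might be tried out as follows. The file is again an assumed stand-in, arranged so the two Seinfeld lines sit on lines 1 and 8 as in the lecture.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the seinfeld-characters file.
chars=$(mktemp)
cat > "$chars" <<'EOF'
Jerry Seinfeld
Elaine Benes
George Costanza
Cosmo Kramer
Frank Costanza
Estelle Costanza
Susan Ross
Morty Seinfeld
EOF

# -i ignores case, so a lowercase keyword still matches "Seinfeld".
insensitive=$(grep -i seinfeld "$chars")
echo "$insensitive"
# Jerry Seinfeld
# Morty Seinfeld

# -n prefixes each matching line with its line number.
numbered=$(grep -n Seinfeld "$chars")
echo "$numbered"
# 1:Jerry Seinfeld
# 8:Morty Seinfeld

rm -f "$chars"
```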
So let's do 116 00:06:44,550 --> 00:06:51,240 grep -v seinfeld, or I could just do -vi or 117 00:06:51,240 --> 00:06:58,920 -iv to include -i, as in to ignore case, seinfeld from 118 00:06:58,920 --> 00:07:02,160 seinfeld-characters. And now you will see I have a 119 00:07:02,160 --> 00:07:05,400 list of every line, the whole file content, 120 00:07:05,640 --> 00:07:10,560 but this does not have those two, Jerry Seinfeld and 121 00:07:10,560 --> 00:07:15,030 Morty Seinfeld. These two lines are omitted 122 00:07:15,300 --> 00:07:19,680 or are excluded from it because I used the -v 123 00:07:19,680 --> 00:07:23,100 option. So -v is basically telling the system, 124 00:07:23,280 --> 00:07:27,660 give me everything except the lines with this search keyword. All 125 00:07:27,660 --> 00:07:31,470 right. The next one is what if you want to combine 126 00:07:31,470 --> 00:07:34,890 grep with the other text processing commands 127 00:07:34,890 --> 00:07:37,200 that we have learned previously. Right. Yes, we 128 00:07:37,200 --> 00:07:40,230 could definitely do that. So let's say if I type 129 00:07:40,530 --> 00:07:44,520 grep seinfeld, or let me run the same command, 130 00:07:44,520 --> 00:07:47,250 the last command, grep -vi seinfeld 131 00:07:47,250 --> 00:07:51,060 seinfeld-characters, right? Right. So what if, 132 00:07:51,060 --> 00:07:54,240 from this output, I only wanted to 133 00:07:54,240 --> 00:07:56,940 print the first column? Then what you 134 00:07:56,940 --> 00:07:59,400 could do is you could hit the up arrow key, pipe it 135 00:07:59,400 --> 00:08:04,230 right here, awk it and, by the way, the terminal wrapped 136 00:08:04,230 --> 00:08:07,050 the line, it is not actually putting it into a 137 00:08:07,050 --> 00:08:09,870 new line. It's a continuation of the same line. If 138 00:08:09,870 --> 00:08:11,610 you make your window bigger, you will see it 139 00:08:11,610 --> 00:08:14,280 coming up in one line. Let me show you real quick.
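The inverted match with -v, combined with -i, can be sketched as below. The sample file is a hypothetical stand-in for seinfeld-characters.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the seinfeld-characters file.
chars=$(mktemp)
cat > "$chars" <<'EOF'
Jerry Seinfeld
Elaine Benes
George Costanza
Cosmo Kramer
Frank Costanza
Estelle Costanza
Susan Ross
Morty Seinfeld
EOF

# -v inverts the match; -i ignores case. Together: every line without "seinfeld".
remaining=$(grep -vi seinfeld "$chars")
echo "$remaining"
# Elaine Benes
# George Costanza
# Cosmo Kramer
# Frank Costanza
# Estelle Costanza
# Susan Ross

rm -f "$chars"
```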
140 00:08:14,670 --> 00:08:19,050 You see, if I do this, see, it went back. So print 141 00:08:20,070 --> 00:08:25,890 dollar one, and curly braces, and quote close. And 142 00:08:25,890 --> 00:08:30,270 now it's giving me only the first column of my 143 00:08:30,360 --> 00:08:34,320 matched criteria. Now, what if I wanted to get only 144 00:08:34,320 --> 00:08:38,308 the first three letters of this output? Can you 145 00:08:38,308 --> 00:08:41,370 guess what that could be? I'm sure you know how to 146 00:08:41,370 --> 00:08:44,610 do it. But if you forgot, that's fine, you hit the up 147 00:08:44,610 --> 00:08:48,090 arrow key, now you could put another pipe. And now 148 00:08:48,090 --> 00:08:51,990 you could do cut -c for characters, and 149 00:08:51,990 --> 00:08:54,870 you want the first three, so that will be one dash 150 00:08:54,900 --> 00:08:57,120 three, one through three, because you are picking 151 00:08:57,120 --> 00:09:00,450 the range. And now you will see it is giving you 152 00:09:00,480 --> 00:09:03,690 the first three letters of the first column for every line except those that match 153 00:09:03,690 --> 00:09:07,350 Seinfeld. Alright, let's clear the screen. Let me 154 00:09:07,350 --> 00:09:11,580 make my window a little smaller. And I'll move on 155 00:09:11,580 --> 00:09:16,230 to the next one, which is ls -l. So grep does 156 00:09:16,230 --> 00:09:20,670 not only search for a keyword from a file, it 157 00:09:20,670 --> 00:09:24,120 actually searches for any keyword from any output of 158 00:09:24,120 --> 00:09:28,920 a command. So let's say if I have, let me go to my 159 00:09:28,920 --> 00:09:32,400 home directory, so it's /home/iafzal/seinfeld. 160 00:09:32,400 --> 00:09:35,640 I'll go one step back with cd .., it will bring me 161 00:09:35,640 --> 00:09:38,820 back to my /home/iafzal directory. Now in 162 00:09:38,820 --> 00:09:41,790 here, when I do ls -l, you're gonna see the 163 00:09:41,790 --> 00:09:44,580 listing of all the files and directories.
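The grep, awk, and cut pipeline built up above might look like this sketch. The sample file is an assumed stand-in for seinfeld-characters, and the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the seinfeld-characters file.
chars=$(mktemp)
cat > "$chars" <<'EOF'
Jerry Seinfeld
Elaine Benes
George Costanza
Cosmo Kramer
Frank Costanza
Estelle Costanza
Susan Ross
Morty Seinfeld
EOF

# Drop the Seinfeld lines, then keep only the first column (first name).
first_names=$(grep -vi seinfeld "$chars" | awk '{print $1}')
echo "$first_names"

# Add one more pipe: keep only characters 1 through 3 of each first name.
first_three=$(grep -vi seinfeld "$chars" | awk '{print $1}' | cut -c1-3)
echo "$first_three"
# Ela
# Geo
# Cos
# Fra
# Est
# Sus

rm -f "$chars"
```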
Now, what 164 00:09:44,580 --> 00:09:48,420 if, in this output, I only wanted to 165 00:09:48,420 --> 00:09:52,140 see the line that matched the file or directory 166 00:09:52,140 --> 00:09:56,790 Desktop? So I hit the up arrow key, pipe it, and now 167 00:09:56,820 --> 00:10:02,280 I'll grep it for Desktop, and let's say just for 168 00:10:02,850 --> 00:10:06,630 giggles, use the lowercase d, and hit enter and 169 00:10:06,630 --> 00:10:09,360 it's not going to give me anything. Why? Because if 170 00:10:09,360 --> 00:10:12,720 you notice, Desktop has an uppercase D, so you 171 00:10:12,720 --> 00:10:15,450 could either hit the up arrow key and replace the 172 00:10:15,480 --> 00:10:18,660 lowercase d with uppercase or leave it 173 00:10:18,660 --> 00:10:22,980 as is, and put the -i option to ignore 174 00:10:22,980 --> 00:10:25,230 case, hit enter, and you're gonna see that it 175 00:10:25,230 --> 00:10:28,470 is pulling only that line that matched the word 176 00:10:28,500 --> 00:10:32,610 Desktop. Isn't that cool? All right, now let me go 177 00:10:32,610 --> 00:10:35,520 into the grep and egrep functionality. Now you 178 00:10:35,520 --> 00:10:39,030 know grep works perfectly on every keyword. Now 179 00:10:39,390 --> 00:10:42,570 egrep comes into play when you want to search for 180 00:10:42,630 --> 00:10:47,370 two words in a file. So let's clear the screen, 181 00:10:47,430 --> 00:10:50,580 let's go into the seinfeld directory. In the seinfeld 182 00:10:50,580 --> 00:10:53,040 directory, we have a file called seinfeld-characters, and let's 183 00:10:53,040 --> 00:10:56,190 cat that seinfeld-characters file, hit enter. 184 00:10:56,700 --> 00:10:59,610 And you'll see these are the contents of that 185 00:11:00,210 --> 00:11:04,350 file. Now what if I want to get the list of lines 186 00:11:04,350 --> 00:11:07,800 that match not only Seinfeld, but also Costanza?
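Filtering the output of ls -l with grep, as just shown, can be sketched like this. The temporary directory and its entries are hypothetical stand-ins for the home directory in the lecture.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for a home directory that contains a Desktop folder.
home_dir=$(mktemp -d)
mkdir "$home_dir/Desktop" "$home_dir/Documents"
touch "$home_dir/notes.txt"

# Lowercase keyword without -i: no line matches, so this stays empty.
case_sensitive=$(ls -l "$home_dir" | grep desktop || true)

# With -i the Desktop line is found regardless of case.
case_insensitive=$(ls -l "$home_dir" | grep -i desktop)
echo "$case_insensitive"

rm -rf "$home_dir"
```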
187 00:11:08,730 --> 00:11:11,580 For that we will use egrep, which is a little 188 00:11:11,580 --> 00:11:15,180 more powerful command than grep. So it will be egrep 189 00:11:15,600 --> 00:11:19,170 -i of course, I want to ignore uppercase or 190 00:11:19,170 --> 00:11:22,690 lowercase, here you put a double quote, 191 00:11:22,690 --> 00:11:25,209 [No audio] 192 00:11:25,209 --> 00:11:28,230 the keyword, and the first keyword I want to search for is 193 00:11:28,230 --> 00:11:33,000 Seinfeld, then for the second keyword you have 194 00:11:33,000 --> 00:11:36,360 to put a pipe in the middle and then put the 195 00:11:36,360 --> 00:11:42,060 second keyword, which would be Costanza, close with the 196 00:11:42,060 --> 00:11:45,120 double quotes, and of course specify the file name 197 00:11:45,120 --> 00:11:49,500 from where you are trying to search. You hit 198 00:11:49,530 --> 00:11:52,650 enter, and now you will see it is picking up, let me 199 00:11:52,650 --> 00:11:55,920 make this bigger and hit the up arrow key, so 200 00:11:55,920 --> 00:11:58,500 you can see exactly what the output is. Now you see it's 201 00:11:58,500 --> 00:12:01,603 giving me Jerry Seinfeld, George Costanza, 202 00:12:01,603 --> 00:12:05,790 Frank Costanza, Estelle Costanza, and Morty Seinfeld. So all 203 00:12:05,790 --> 00:12:09,570 the lines that match Seinfeld or Costanza are 204 00:12:09,840 --> 00:12:14,790 there. So again, if you wanted to awk it out for 205 00:12:14,790 --> 00:12:18,570 a second column, if you wanted to cut it out for 206 00:12:18,570 --> 00:12:22,140 certain characters, you could definitely do that 207 00:12:22,140 --> 00:12:26,370 by putting one pipe after another.
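A sketch of the two-keyword search described above. It uses grep -E, which is the equivalent of egrep; recent GNU grep releases print a deprecation warning when egrep itself is called, so grep -E is the safer spelling. The sample file is an assumed stand-in for seinfeld-characters.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the seinfeld-characters file.
chars=$(mktemp)
cat > "$chars" <<'EOF'
Jerry Seinfeld
Elaine Benes
George Costanza
Cosmo Kramer
Frank Costanza
Estelle Costanza
Susan Ross
Morty Seinfeld
EOF

# The pipe inside the quotes means "or": lines matching either keyword, any case.
both=$(grep -E -i "seinfeld|costanza" "$chars")
echo "$both"
# Jerry Seinfeld
# George Costanza
# Frank Costanza
# Estelle Costanza
# Morty Seinfeld

rm -f "$chars"
```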
So this 208 00:12:26,370 --> 00:12:29,970 lecture is all about grep and egrep, and again, 209 00:12:30,000 --> 00:12:33,300 I want you to try man on grep and try the different 210 00:12:33,330 --> 00:12:36,240 options that are available to you, like these are 211 00:12:36,240 --> 00:12:39,660 the different options. Try it if you have time, 212 00:12:39,690 --> 00:12:42,750 otherwise the ones I showed you are more 213 00:12:42,750 --> 00:12:45,930 than enough for you to perform your everyday 214 00:12:45,930 --> 00:12:47,915 tasks while you are on the job. 215 00:12:47,915 --> 00:12:50,091 [No audio]