Hello, and welcome to the lecture on awk, one of the text-processing commands. awk is a very powerful command, and we are going to try a few of the things it has to offer. So what exactly is awk? awk is a utility, and also a small programming language, designed for data extraction. Most of the time it is used to extract fields from a file or from the output of another command. So let's try a few awk commands. I have my Linux machine open right here on the side. I am logged in as myself through PuTTY, and the hostname (the server name of my machine) is MyFirstLinuxVM. I am in my home directory, perfect. Go ahead and clear the screen by typing `clear`. Now let's go back to the slide. The first command checks the version of awk: `awk --version` shows the version of the awk utility, when it was written, who wrote it, and a short description. This option works with GNU awk (gawk); other awk implementations may not accept it. Clear the screen and try the next command, which lists the first column of a file.
So let's go to our seinfeld directory, which is in our home directory. In that directory there is a file called seinfeld-characters, and this is the file we will use for this lecture. When you run `cat seinfeld-characters`, you will see the contents of the file: a list of all the characters. The file has two columns, first name and last name. What if you want only the first name, which is the first column of this file? Then you run `awk '{print $1}' seinfeld-characters`, and you will see only the first column. If I want the last name, which is the second column, or second field, then I use `print $2`. By the way, you can simply press the Up arrow key to recall the previous command and use the Left arrow key to move back and change the value. In this case I change it from 1 to 2, leave everything else as is, and press Enter.
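Here is a minimal sketch of the two commands above, runnable as a POSIX shell script. The lecture never shows the full contents of seinfeld-characters, so the script builds a hypothetical stand-in in a temporary directory. The names in it are assumed, taken from the characters mentioned in this lecture.

```shell
# Build a hypothetical stand-in for seinfeld-characters (contents assumed).
dir=$(mktemp -d)
file="$dir/seinfeld-characters"
printf '%s\n' 'Jerry Seinfeld' 'George Costanza' 'Elaine Benes' \
    'Cosmo Kramer' 'Morty Seinfeld' 'Estelle Costanza' \
    'George Steinbrenner' > "$file"

# By default awk splits each line on runs of spaces/tabs:
# $1 is the first field (first name), $2 the second (last name).
first_names=$(awk '{print $1}' "$file")
last_names=$(awk '{print $2}' "$file")

printf '%s\n' "$first_names"
printf '%s\n' "$last_names"

rm -r "$dir"   # clean up the temporary directory
```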
And you will see it gives me the second column, which is the last name of every character in the seinfeld-characters file. The next command: what if you want to list a column of a command's output, not just the contents of a file? If I run `ls -l` in my directory, /home/iafzal/seinfeld, you will see these columns: permissions, links, user (owner), group, size, month, day, time, and file name. Each of these columns is a field. If I want to print the first field, the permissions, and the third field, the user who owns the file, I simply pipe the output to awk: `ls -l | awk '{print $1,$3}'`. You get only the first and third columns. See how powerful awk is. Moving on: what if you want the last column of an output? Run `ls -ltr`, and suppose you want this last column. You could count the columns, 1, 2, 3, 4, 5, 6, 7, 8, 9, and find that the ninth is the last one.
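Real `ls -l` output differs from machine to machine. To keep the result reproducible, this sketch of `ls -l | awk '{print $1,$3}'` feeds awk two hypothetical lines in the same nine-column layout instead. On a real system you would pipe `ls -l` rather than `printf`.

```shell
# Two hypothetical lines shaped like `ls -l` output (values assumed).
listing='-rw-rw-r-- 1 iafzal iafzal 120 Jan 10 09:15 seinfeld-characters
drwxrwxr-x 2 iafzal iafzal 6 Jan 10 09:20 seinfeld'

# Fields 1 and 3: permissions and owner. The comma in print emits the
# output field separator (a single space by default) between them.
perm_owner=$(printf '%s\n' "$listing" | awk '{print $1, $3}')

# Field 9 is the file name in this layout.
names=$(printf '%s\n' "$listing" | awk '{print $9}')

printf '%s\n' "$perm_owner" "$names"
```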
So you press the Up arrow key, put in `$9`, and it gives you the last column. But there is an easier way than counting every column: use `$NF`. NF holds the number of fields on the current line, so `$NF` is always the last column of the output. Here, `print $9` gives the same result as `print $NF`. Next, what if you want to search for a specific word in a file using awk? awk can do the search for you too. Say I have a file, `cat seinfeld-characters`, with all these characters in it, and I only want the lines that contain the word Jerry. I can run `awk '/Jerry/ {print}' seinfeld-characters`. Make sure you type the word exactly, with an uppercase J, because the match is case-sensitive. It gives us Jerry Seinfeld, the first line of the file, and no other line contains Jerry. What if I do the same thing but search for Seinfeld instead? I change the pattern to `/Seinfeld/` and press Enter, and now there are two Seinfelds: Jerry Seinfeld and Morty Seinfeld.
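This sketch covers both ideas from the passage above: `$NF` as the last field, and `/pattern/` as a line filter. It again uses a hypothetical stand-in for seinfeld-characters and one assumed `ls -l`-style line.

```shell
# $NF: NF is the field count, so $NF is the last field on any line.
line='drwxrwxr-x 2 iafzal iafzal 6 Jan 10 09:20 seinfeld'   # assumed sample
last=$(printf '%s\n' "$line" | awk '{print $NF}')
ninth=$(printf '%s\n' "$line" | awk '{print $9}')

# Hypothetical stand-in for seinfeld-characters (contents assumed).
dir=$(mktemp -d)
file="$dir/seinfeld-characters"
printf '%s\n' 'Jerry Seinfeld' 'George Costanza' 'Elaine Benes' \
    'Cosmo Kramer' 'Morty Seinfeld' 'Estelle Costanza' \
    'George Steinbrenner' > "$file"

# /regex/ {print} prints every line containing a match.
# Matching is case-sensitive, so /jerry/ finds nothing here.
jerry=$(awk '/Jerry/ {print}' "$file")
seinfelds=$(awk '/Seinfeld/ {print}' "$file")
lowercase=$(awk '/jerry/ {print}' "$file")

printf '%s\n' "$last" "$jerry" "$seinfelds"
rm -r "$dir"
```

The `{print}` action is optional: a pattern with no action, such as `awk '/Jerry/' file`, prints matching lines by default.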
So awk printed every line that contains the keyword we searched for. The next command: what if you want the fields of a file that are separated by a delimiter? In this case we will use /etc/passwd again, just as we did with the cut command. If you run `cat /etc/passwd`, you will see that every field in this file is separated by a colon. So you can use awk instead of cut to get these fields. To get the first field, which is the user name, use the -F option to set the delimiter, which here is a colon: `awk -F: '{print $1}' /etc/passwd`. You get all of the user names from that file. Same thing for the home directory: we know it is the sixth field, so you type 6 and you get all the home directories. In this regard, cut and awk work similarly. The next one we'll learn is how to replace the contents of a particular column.
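This sketch runs `-F:` on two hypothetical lines in /etc/passwd format (name:password:UID:GID:comment:home:shell) rather than on the real file, so the output is predictable. It also runs the equivalent `cut` command for comparison.

```shell
# Hypothetical /etc/passwd-style lines (user names and IDs assumed).
passwd_lines='root:x:0:0:root:/root:/bin/bash
iafzal:x:1000:1000::/home/iafzal:/bin/bash'

# -F: makes the colon the field separator.
users=$(printf '%s\n' "$passwd_lines" | awk -F: '{print $1}')
homes=$(printf '%s\n' "$passwd_lines" | awk -F: '{print $6}')

# The same home-directory field extracted with cut.
homes_cut=$(printf '%s\n' "$passwd_lines" | cut -d: -f6)

printf '%s\n' "$users" "$homes"
```

On a real system, replace the `printf` pipeline with the file name, as in `awk -F: '{print $6}' /etc/passwd`.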
Say you run `echo Hello Tom`. Here `$2` is Tom, so awk will replace `$2` with Adam and print the result. Let's try it. When you run `echo Hello Tom` and press Enter, it simply echoes the text back to you. Now press the Up arrow key to recall the command and pipe it to awk: `echo Hello Tom | awk '{$2="Adam"; print $0}'`. We know `$2` is Tom, and I want to replace Tom with, let's say, Adam, so I type "Adam" in double quotes, then a semicolon (not a colon), and then `print $0`. `$0` means the entire line, so after the change awk prints the whole line. Close the curly brace and the single quote, press Enter, and you will see that instead of Hello Tom it now prints Hello Adam, because we replaced the second field with Adam. We can do the same with our seinfeld-characters file. Let's run `cat seinfeld-characters` again. What if I want to replace every last name with my name? My name is Imran, as you all know, so I want Imran to be the last name, the second column, on every line.
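Here is a short sketch of the `$2="Adam"` replacement shown above, plus two details it relies on. First, assigning to a field makes awk rebuild `$0` with the output field separator, so extra spacing is collapsed. Second, the lecture's command replaces field 2 on every line. The conditional form at the end, which replaces it only when it equals a given word, is an added variant and not one of the lecture's commands.

```shell
# Replace field 2, then print the whole (rebuilt) line.
greeting=$(echo 'Hello Tom' | awk '{$2 = "Adam"; print $0}')

# Changing a field rebuilds $0 using OFS (one space by default),
# so the run of spaces in the input becomes a single space.
spaced=$(echo 'Hello     Tom' | awk '{$2 = "Adam"; print $0}')

# Added variant: replace field 2 only when it is exactly "Tom".
conditional=$(printf '%s\n' 'Hello Tom' 'Hello Jerry' |
    awk '$2 == "Tom" { $2 = "Adam" } { print }')

printf '%s\n' "$greeting" "$spaced" "$conditional"
```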
So in this case I run `cat seinfeld-characters | awk '{$2="Imran"; print $0}'`. It is the same as before: `$2`, the second column, equals "Imran" in double quotes, then a semicolon, `print $0`, the closing curly brace, and the closing single quote. Press Enter, and now every last name is my name. That's how powerful awk is for changing data. Note that awk only prints the changed lines to the screen. The file itself is not modified unless you redirect the output to a new file. Moving on, what if you want to replace the second... oh, I already did that, perfect. I'm one step ahead; that's what I just showed you. The next one: what if you want only the lines that are longer than 15 characters? If you run `cat seinfeld-characters`, you will probably notice that some names are short and some are much longer, like George Steinbrenner and Estelle Costanza. What if I only want the lines whose names are longer than 15 characters?
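The `$2="Imran"` replacement described above can be checked with a small sketch. It also shows that the file on disk is left unchanged until you write awk's output somewhere yourself. The file contents and the `.new` temporary name are hypothetical.

```shell
# Hypothetical stand-in for seinfeld-characters (contents assumed).
dir=$(mktemp -d)
file="$dir/seinfeld-characters"
printf '%s\n' 'Jerry Seinfeld' 'George Costanza' 'Elaine Benes' > "$file"

# awk writes the modified lines to standard output only.
renamed=$(awk '{$2 = "Imran"; print $0}' "$file")

# The file itself still holds the original last names.
unchanged=$(head -n 1 "$file")

# To keep the change: write to a new file, then move it into place.
awk '{$2 = "Imran"; print $0}' "$file" > "$file.new" && mv "$file.new" "$file"
saved=$(head -n 1 "$file")

printf '%s\n' "$renamed" "$unchanged" "$saved"
rm -r "$dir"
```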
First I could run `cat seinfeld-characters` and pipe it to awk, but instead I'll start directly with awk and put the file name at the end: `awk 'length($0) > 15' seinfeld-characters`. That is: single quote, length, parenthesis, `$0`, closing parenthesis, greater than 15, closing single quote, and the file name. You will see it gives me the two lines whose length is more than 15 characters. If you use 14, it finds a few more; this time it also finds George Costanza. That's how you can use awk to filter lines by length. Moving on: list only the entry named seinfeld in /home/iafzal. Go to your home directory by typing `cd`, then run `pwd`, and you will see you are in your home directory. In my home directory, when I run `ls -l`, I have all these files and directories. Now what if I want only the entry that matches seinfeld?
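This sketch applies the `length($0) > 15` filter from the paragraph above to a hypothetical seinfeld-characters file. A pattern with no action prints the matching lines, and `length($0)` counts characters (for plain ASCII names like these, characters and bytes are the same).

```shell
# Hypothetical stand-in for seinfeld-characters (contents assumed).
dir=$(mktemp -d)
file="$dir/seinfeld-characters"
printf '%s\n' 'Jerry Seinfeld' 'George Costanza' 'Elaine Benes' \
    'Cosmo Kramer' 'Morty Seinfeld' 'Estelle Costanza' \
    'George Steinbrenner' > "$file"

# Lines strictly longer than 15 characters.
over15=$(awk 'length($0) > 15' "$file")
# Lowering the threshold to 14 also admits "George Costanza" (15 chars).
over14=$(awk 'length($0) > 14' "$file")

printf '%s\n' "$over15" "---" "$over14"
rm -r "$dir"
```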
I run `ls -l | awk '{if ($9 == "seinfeld") print $0;}'`. Inside the single quotes and curly braces, type if, an opening parenthesis, and `$9`. We know `$9` is the last column here, so you could also use `$NF`. Then comes equals equals and "seinfeld" in double quotes. If the field matches seinfeld, we print it: `print $0`, a semicolon, then close the curly brace and the single quote, and press Enter. Oops, I got an error message. Let's see: `ls -l`, awk, if `$9` equals seinfeld... oh, I didn't put the closing parenthesis here. I press the Up arrow key, add the parenthesis to close the condition, and press Enter. Now it gives me only the file or directory whose last field, the ninth field, is exactly seinfeld. The last awk example: what if you want to find the total number of fields on each line of a file or an output? Say you run `ls -l` here and count the columns to find out which one you want.
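Here is a sketch of the `if ($9 == "seinfeld")` filter described above, run on assumed `ls -l`-style lines. Because `==` is an exact string comparison, seinfeld-characters is not matched. The same test can also be written as a bare pattern on `$NF`.

```shell
# Hypothetical lines shaped like `ls -l` output (values assumed).
listing='-rw-rw-r-- 1 iafzal iafzal 120 Jan 10 09:15 seinfeld-characters
drwxrwxr-x 2 iafzal iafzal 6 Jan 10 09:20 seinfeld'

# Form used in the lecture: an if statement inside the action.
by_ninth=$(printf '%s\n' "$listing" | awk '{ if ($9 == "seinfeld") print $0 }')

# Equivalent pattern form: print lines whose last field is exactly "seinfeld".
by_last=$(printf '%s\n' "$listing" | awk '$NF == "seinfeld"')

printf '%s\n' "$by_ninth" "$by_last"
```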
If you are looking for the last column and want to know how many columns this output has, you could count 1, 2, 3, 4, 5, 6, 7, 8, 9, or you could simply run `ls -l | awk '{print NF}'`. Note that it's just NF, not `$NF`: NF is the number of fields, while `$NF` is the value of the last field. Press Enter, and it tells you each line has nine fields. The 2 at the top comes from the first line of `ls -l`, which says total 280. That line has two fields, so you can ignore it; every other line has nine fields. So that's all about awk. There is so much more you can do with awk that I don't have time to cover, so run `man awk` to find out what other options you have. There are also related programs, such as gawk (GNU awk) and, in older gawk releases, pgawk and dgawk, each with its own options. Try every one of them if you have time, so you get a better grip on the awk command.
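This sketch illustrates the `print NF` count described above, using an assumed `ls -l`-style listing that starts with a "total" line. The second command skips the first line with `NR > 1`, which assumes the total line always comes first, and keeps only the distinct field counts.

```shell
# Hypothetical `ls -l` output, including the leading "total" line.
listing='total 280
-rw-rw-r-- 1 iafzal iafzal 120 Jan 10 09:15 seinfeld-characters
drwxrwxr-x 2 iafzal iafzal 6 Jan 10 09:20 seinfeld'

# NF (no dollar sign) is the number of fields on each line.
counts=$(printf '%s\n' "$listing" | awk '{print NF}')

# Skip line 1 (NR is the current line number) and list distinct counts.
body_counts=$(printf '%s\n' "$listing" | awk 'NR > 1 {print NF}' | sort -u)

printf '%s\n' "$counts" "$body_counts"
```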