1 00:00:06,840 --> 00:00:10,020 - The next utility that I wanna talk about is awk. 2 00:00:10,020 --> 00:00:11,387 So what is awk? 3 00:00:11,387 --> 00:00:13,590 Awk is a great utility 4 00:00:13,590 --> 00:00:17,147 for performing more complex operations on text. 5 00:00:17,147 --> 00:00:20,190 It has advanced features that make it 6 00:00:20,190 --> 00:00:21,813 into a scripting language, 7 00:00:23,100 --> 00:00:26,940 but these advanced features are not used that often anymore. 8 00:00:26,940 --> 00:00:30,540 Still, awk can make complex tasks easier. 9 00:00:30,540 --> 00:00:34,320 If you are going to try to find more information about awk, 10 00:00:34,320 --> 00:00:36,600 you will find entire books that have been written 11 00:00:36,600 --> 00:00:37,433 about awk. 12 00:00:37,433 --> 00:00:38,640 Now, why is that? 13 00:00:38,640 --> 00:00:41,130 That's because of the history of the command. 14 00:00:41,130 --> 00:00:43,770 Awk was invented in a time when computers did not 15 00:00:43,770 --> 00:00:45,720 have graphical screens next to them, 16 00:00:45,720 --> 00:00:49,920 and working with computers was mainly line oriented. 17 00:00:49,920 --> 00:00:51,450 In those days, 18 00:00:51,450 --> 00:00:56,310 we needed very powerful tools to program text files. 19 00:00:56,310 --> 00:00:59,370 That is basically to automatically process text files 20 00:00:59,370 --> 00:01:02,550 and to program manipulations into your text files. 21 00:01:02,550 --> 00:01:04,320 And that's where awk is coming from. 22 00:01:04,320 --> 00:01:06,810 And likewise, where sed is coming from. 23 00:01:06,810 --> 00:01:10,170 That's the utility that is going to be up next. 
24 00:01:10,170 --> 00:01:12,510 The confusing thing about it is that whenever 25 00:01:12,510 --> 00:01:13,343 you feel like, 26 00:01:13,343 --> 00:01:16,800 Hey, I wanna know more about awk and about sed, 27 00:01:16,800 --> 00:01:20,130 then you will probably end up with one of these books 28 00:01:20,130 --> 00:01:22,260 that are describing every single feature 29 00:01:22,260 --> 00:01:23,700 of awk as well as sed. 30 00:01:23,700 --> 00:01:26,970 But the thing is, most of these features 31 00:01:26,970 --> 00:01:30,420 are kind of not so useful anymore. 32 00:01:30,420 --> 00:01:34,230 So what I've done, I've listed four examples for awk 33 00:01:34,230 --> 00:01:37,560 and these are the examples that I want you to 34 00:01:37,560 --> 00:01:38,661 understand. 35 00:01:38,661 --> 00:01:40,680 Here on the slide, you can find them, 36 00:01:40,680 --> 00:01:42,690 if you want to experiment for yourself. 37 00:01:42,690 --> 00:01:43,980 Let me just demonstrate so 38 00:01:43,980 --> 00:01:45,780 that we can see what they are doing. 39 00:01:48,150 --> 00:01:50,160 So to start with, we have awk, 40 00:01:50,160 --> 00:01:52,920 and then the awk operation. 41 00:01:52,920 --> 00:01:57,920 The awk operation is between single quotes and curly braces 42 00:01:59,010 --> 00:02:00,450 as we can see here. 43 00:02:00,450 --> 00:02:05,310 So we have print dollar zero on /etc/passwd. 44 00:02:05,310 --> 00:02:06,180 Now, what is that doing? 45 00:02:06,180 --> 00:02:08,430 Well, that's just printing everything. 46 00:02:08,430 --> 00:02:11,610 How about if you make that print dollar one, 47 00:02:11,610 --> 00:02:13,470 that's also printing. 48 00:02:13,470 --> 00:02:14,550 Is that everything? 49 00:02:14,550 --> 00:02:15,960 No, it's not everything. 
50 00:02:15,960 --> 00:02:18,270 It's printing the first field, 51 00:02:18,270 --> 00:02:19,800 but if you look closely, 52 00:02:19,800 --> 00:02:21,570 then you can see it has a hard time 53 00:02:21,570 --> 00:02:24,655 recognizing the fields. 54 00:02:24,655 --> 00:02:27,330 So if you want to help awk a little bit, 55 00:02:27,330 --> 00:02:29,700 and if you want to print a specific field, 56 00:02:29,700 --> 00:02:33,300 then you need to add a field separator. 57 00:02:33,300 --> 00:02:35,880 Minus F is that in an awk environment, 58 00:02:35,880 --> 00:02:39,360 so if you use awk, minus F colon print dollar one, 59 00:02:39,360 --> 00:02:43,350 then it knows that it needs colons as the field separator 60 00:02:43,350 --> 00:02:47,823 and it'll be successful in finding this field. 61 00:02:49,350 --> 00:02:51,450 Now let's do another manipulation and that's 62 00:02:51,450 --> 00:02:52,990 awk length 63 00:02:57,000 --> 00:03:01,683 dollar zero greater than 40 on /etc/passwd. 64 00:03:04,560 --> 00:03:06,300 Now, what is that? 65 00:03:06,300 --> 00:03:08,610 Well, it's looking for length dollar zero. 66 00:03:08,610 --> 00:03:10,890 Dollar zero is the entire line. 67 00:03:10,890 --> 00:03:14,817 And it'll only print if the length is greater than 40, 68 00:03:14,817 --> 00:03:15,780 and you can see 69 00:03:15,780 --> 00:03:18,390 that it limits the number of lines that we get. 70 00:03:18,390 --> 00:03:21,720 If you make it even longer, let's make it 50, 71 00:03:21,720 --> 00:03:22,980 you can see that the number 72 00:03:22,980 --> 00:03:25,620 of lines is getting even smaller. 73 00:03:25,620 --> 00:03:28,170 And how far can we go? 74 00:03:28,170 --> 00:03:29,520 Let's try 70. 75 00:03:29,520 --> 00:03:33,120 And there, you can see only a couple of lines. 76 00:03:33,120 --> 00:03:36,570 So this is allowing you to apply filtering criteria. 77 00:03:36,570 --> 00:03:40,140 You only want to see lines that are longer than whatever. 
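The commands demonstrated so far can be sketched roughly as follows. This is not the exact slide content: the temporary file and its two entries are hypothetical sample data in passwd format, used so the result does not depend on the local /etc/passwd, and the variable names are only for illustration.

```shell
# Hypothetical sample data in passwd format (an assumption, not the real /etc/passwd).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'linda:x:1001:1001:Linda Example:/home/linda:/bin/bash' > "$sample"

# $0 is the entire line, so this prints the file unchanged.
awk '{ print $0 }' "$sample"

# Without -F, awk splits fields on whitespace, so $1 is everything up to the first space.
awk '{ print $1 }' "$sample"

# With -F: the fields are split on colons, and $1 is the user name.
first_fields=$(awk -F: '{ print $1 }' "$sample")
echo "$first_fields"

# A condition without an action prints each matching line:
# here, only lines longer than 40 characters.
long_lines=$(awk 'length($0) > 40' "$sample")
echo "$long_lines"

rm -f "$sample"
```

On a real system, the same commands would be run against /etc/passwd instead of the sample file.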
78 00:03:40,140 --> 00:03:41,874 This is what you wanna do. 79 00:03:41,874 --> 00:03:45,900 Let's try to understand the historical background of that. 80 00:03:45,900 --> 00:03:48,180 The historical background is that 81 00:03:48,180 --> 00:03:51,180 if you have a text file with lines longer than 40 82 00:03:51,180 --> 00:03:54,300 then they might not print correctly. 83 00:03:54,300 --> 00:03:58,230 So you might need to do an operation and split these lines. 84 00:03:58,230 --> 00:04:01,860 And that is why a command like this is useful. 85 00:04:01,860 --> 00:04:04,590 Okay. Last example with awk. 86 00:04:04,590 --> 00:04:09,590 Let's print the entire contents of the passwd file again, 87 00:04:11,010 --> 00:04:12,393 so that I can explain. 88 00:04:14,070 --> 00:04:16,470 If you look at the last line, 89 00:04:16,470 --> 00:04:20,340 that is the line that contains the text Linda right here. 90 00:04:20,340 --> 00:04:23,430 In this last line, you can see that in passwd 91 00:04:23,430 --> 00:04:25,800 we have a couple of fields. 92 00:04:25,800 --> 00:04:29,490 The fields are separated by a colon. 93 00:04:29,490 --> 00:04:31,200 Now it's field number three, 94 00:04:31,200 --> 00:04:33,600 field number four, these are useful. 95 00:04:33,600 --> 00:04:35,010 They contain the user ID 96 00:04:35,010 --> 00:04:37,590 and the group ID, which are unique IDs. 97 00:04:37,590 --> 00:04:40,500 And that's what I want to be looking for. 98 00:04:40,500 --> 00:04:43,290 Now how can awk help us with that? 99 00:04:43,290 --> 00:04:48,290 Well, I want to use awk to find the user ID for user Linda. 100 00:04:49,080 --> 00:04:51,180 So how am I going to do that using awk? 101 00:04:51,180 --> 00:04:54,600 Well, I am going to use awk minus F colon 102 00:04:54,600 --> 00:04:58,540 and then I'm using my search pattern slash Linda slash 103 00:04:59,550 --> 00:05:04,550 print dollar three, for instance, on /etc/passwd. 104 00:05:06,810 --> 00:05:07,890 And what is that doing? 
105 00:05:07,890 --> 00:05:10,980 Well, that is showing the user ID 106 00:05:10,980 --> 00:05:13,410 for user Linda using awk, 107 00:05:13,410 --> 00:05:14,550 and in case you're wondering 108 00:05:14,550 --> 00:05:16,770 do we really need to use awk for that? 109 00:05:16,770 --> 00:05:17,910 Well, of course not. 110 00:05:17,910 --> 00:05:21,342 You can also use different commands to build it up. 111 00:05:21,342 --> 00:05:26,342 Grep Linda on /etc/passwd, showing you the entire line. 112 00:05:27,120 --> 00:05:32,120 And if we next use cut minus D colon minus F three, 113 00:05:34,080 --> 00:05:36,003 then we get the same result. 114 00:05:37,050 --> 00:05:39,330 That is something that you will frequently see 115 00:05:39,330 --> 00:05:40,980 in Linux environments. 116 00:05:40,980 --> 00:05:45,810 There are different solutions to get to the same result. 117 00:05:45,810 --> 00:05:47,040 And if you think 118 00:05:47,040 --> 00:05:48,600 that awk is too complex, 119 00:05:48,600 --> 00:05:49,950 I can understand that. 120 00:05:49,950 --> 00:05:51,240 Just use a combination 121 00:05:51,240 --> 00:05:53,910 of utilities that you are more comfortable with 122 00:05:53,910 --> 00:05:55,680 and you will get there as well. 123 00:05:55,680 --> 00:05:57,990 And that's the most important lesson to be learned 124 00:05:57,990 --> 00:06:01,740 about awk and also about sed, which will be coming up next. 125 00:06:01,740 --> 00:06:05,040 You can do pretty nice things with these utilities, 126 00:06:05,040 --> 00:06:08,250 but you don't have to do it with these utilities. 127 00:06:08,250 --> 00:06:09,870 You can do it in a different way. 128 00:06:09,870 --> 00:06:11,043 That's fine as well.
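The two ways of finding the user ID described above can be compared in a small sketch. It assumes the account name is stored in lowercase as linda and that the UID is 1001; both values, the sample file, and the variable names are hypothetical and only serve the illustration.

```shell
# Hypothetical sample data in passwd format (an assumption, not the real /etc/passwd).
sample=$(mktemp)
printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'linda:x:1001:1001:Linda Example:/home/linda:/bin/bash' > "$sample"

# awk: split on colons, select lines matching the pattern, print the third field (the UID).
uid_awk=$(awk -F: '/linda/ { print $3 }' "$sample")

# grep selects the matching line; cut then extracts the third colon-separated field.
uid_cut=$(grep linda "$sample" | cut -d: -f3)

echo "awk: $uid_awk"
echo "grep and cut: $uid_cut"

rm -f "$sample"
```

Both approaches match the pattern anywhere in the line, so on a real /etc/passwd a different account whose entry happens to contain the same text would also be printed.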