- In this video we are going to talk about I/O monitoring and tuning.

So, if you want to know what's going on with your I/O, you should start with the top utility, because top gives generic system usage information. You should particularly check out the wa (wait) parameter.

Then there is the iotop utility. iotop focuses specifically on I/O performance and offers a top-like interface, which is quite convenient. You can press Q to exit. You might need to change a kernel setting first, but we will see that iotop complains if that setting has not been made, and you can do it at that moment.

Also convenient is vmstat. vmstat gives an overview of current I/O activity. Let me show you.

All right, so, before we explore these utilities, let me write a small shell script, iostress, because it's so much nicer to look at these tools if something really is going on.

So, as you can imagine, this is an infinite loop.
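The script itself only flashes by on screen; a minimal reconstruction of what it likely contains (the /tmp/iostress path is my choice so the sketch is self-contained; in the video the file just sits in the working directory, and /dev/sda should be whatever disk your system actually has):

```shell
# Recreate the iostress script as a sketch of what was typed in the video.
# Assumptions: bash, and /dev/sda as the disk to read from.
cat > /tmp/iostress <<'EOF'
#!/bin/bash
# Endless read load: copy the whole disk to /dev/null, forever.
while true
do
    dd if=/dev/sda of=/dev/null
done
EOF
chmod +x /tmp/iostress
```

Running it in the background (for example `/tmp/iostress &`) produces continuous read load; since the loop never ends, stop it with Ctrl+C or kill when you are done measuring.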
And in this infinite loop, we are copying the entire contents of the input file /dev/sda to the output file /dev/null. So, to nowhere. That should create at least some read performance stress.

So let's make this executable, and run it. Now we can have a look at top to see if there's anything going on.

So, what do we see? We see a very busy dd process, and we see a lot of activity in system space on the third line. And there we go: the wa parameter. wa is waiting for I/O, and if the system is waiting for I/O, that's an indicator that something is not going all right with your I/O performance.

So, once you have noticed that this wa parameter shows up, you need to zoom in further, and that is where iotop comes in. iotop in most cases is not installed by default. I might have already installed it before, but here we go: that would be sudo dnf install iotop. Just ignore the output of the dd command that is appearing on my screen. Just a quick check to verify.
Yeah, I was verifying that the script is still running, and dd is going to show us output every now and then. Now here we have iotop. And what do we see? We see that iotop isn't showing anything. But if you look at the lower part of the screen, you can see the message that the kernel.task_delayacct sysctl is not enabled in the kernel. So, that's something that we need to fix.

So, let's do that, by going to /proc/sys/kernel. The parameter that we are looking for is task_delayacct, for task delay accounting. So, what does it contain? It has a zero. And if you want to see what is going on, this parameter needs to be on.

So, now I'm going to try iotop again. And at this point we can see iotop clearly indicating what is going on. So, what do we see? Well, we see the dd process, and the dd process is generating a lot of activity. Oops, that's the dd output messing up what we are looking at. You can see the amount of disk reads. There are no disk writes, because we are just reading and redirecting to the null device.
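For reference, the delay-accounting fix from a moment ago as commands. These need root, and the setting is reset at reboot; the /etc/sysctl.d file name is only a suggestion:

```shell
# Turn on task delay accounting so iotop can report per-process I/O.
cat /proc/sys/kernel/task_delayacct        # prints 0 while disabled
echo 1 > /proc/sys/kernel/task_delayacct   # enable at runtime (root only)
# Equivalent: sysctl -w kernel.task_delayacct=1
# To persist across reboots (file name is a suggestion):
#   echo 'kernel.task_delayacct = 1' > /etc/sysctl.d/90-delayacct.conf
```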
The iotop display indicates what is happening I/O-wise. It also shows the percentage of I/O. So, this is a pretty nice overview of what is happening with the I/O on your system.

Now, related to iotop, there's also vmstat. Let me do vmstat 2 10, where 2 indicates that I want a 2-second interval, and 10 that I want to do it 10 times. And there we can see exactly what is happening. This is what we are looking at: si and so for swap in and swap out, bi and bo for blocks in and blocks out.

Now, the generic thing that you should be looking at, again, is top. So, what do we see in top? Well, in top we need to draw a conclusion: are we suffering from poor performance? If we are, there's some optimization to be done. Now, what is the indicator? Indicator number one is that we have some significant waiting going on. Indicator number two is that we also have 42% in the idle loop, and that means that this system is not completely saturated.
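As a reference, the vmstat invocation from this demo, with one annotated sample line. The numbers are illustrative, not taken from the video:

```shell
# Print statistics every 2 seconds, 10 samples total; the first line is the
# average since boot. The sample output below is illustrative only.
vmstat 2 10
# procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
#  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
#  2  1      0 612304  14336 498812    0    0 96500     0 1450 2600  4 41 42 13  0
#
# si/so: memory swapped in from / out to disk
# bi/bo: blocks received from / sent to a block device
# wa:    percentage of CPU time spent waiting for I/O
```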
If the system is completely saturated, you need to look at how to reorganize your storage channel. I can show you one thing. That is in the /sys directory. There is a block subdirectory, and in this block subdirectory we have one subdirectory for every single disk device. So, here we have the subdirectory for sda, in which we have a queue subdirectory. And in this queue subdirectory, we have the scheduler file.

So, if we cat the scheduler file, we can see the current scheduler. That's mq-deadline, and there are other schedulers available as well: kyber, bfq, and none. What you can try as a quick fix is to change the scheduler. I'm going to echo none to this scheduler file. And as you can see, that changes it immediately.

The reason I'm using none as a scheduler is that this is a virtual machine, and in virtual machines it's typically the hypervisor taking care of the I/O scheduling. That means that Linux shouldn't be involved, and that is why none might give you better performance.
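The scheduler switch from this demo as commands. sda is the device used in the video; substitute your own disk. Changing the file needs root, and the change does not survive a reboot:

```shell
# Inspect the current I/O scheduler for sda; the active one is in brackets.
cat /sys/block/sda/queue/scheduler
# e.g.  [mq-deadline] kyber bfq none

# Switch to the "none" scheduler (root only, not persistent).
echo none > /sys/block/sda/queue/scheduler
cat /sys/block/sda/queue/scheduler
# e.g.  mq-deadline kyber bfq [none]
```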
Next, we can check in top if there is any significant improvement. And do we see significant improvement? Probably not, but that's because our test is too simple. With this, you now know a little bit about monitoring I/O, and also a little bit about tuning I/O performance. Let's have a look at the next video.