- In modern Linux, we have this important feature, cgroups. Cgroups is all about resource allocation, reservation, and limitation, and systemd is taking care of it. So cgroups, or control groups, places resources in controllers that represent the type of resource. The most significant default controllers are cpu, memory, and blkio, and they allow you to work with, well, CPU restrictions, memory restrictions, as well as block I/O.

These controllers are subdivided in a tree structure where different weights or limits are applied to each branch. Each of these branches is a cgroup, and one or more processes are assigned to a cgroup. Cgroups can be applied from the command line or from systemd. In the past, you could manually create cgroups using the cgconfig service and the cgred process, but then we are talking about the early 2000s, before systemd became a relevant thing.

In all cases, cgroup settings are written to /sys/fs/cgroup. So that's the sys file system that we talked about before, the pseudo file system that is used for managing hardware properties.
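If you want to take that look under the hood yourself, a quick exploratory sketch from the shell is below. Note that the exact layout under /sys/fs/cgroup depends on whether your system runs cgroup v1 (one subdirectory per controller) or v2 (one unified tree), so treat the comments as assumptions about a typical setup rather than a guarantee.

```shell
# List the cgroup hierarchy: on cgroup v1 you see one directory per
# controller (cpu, memory, blkio, ...), on v2 a single unified tree
# containing the slices (system.slice, user.slice, ...).
ls /sys/fs/cgroup

# The kernel also reports which cgroup controllers it knows about.
cat /proc/cgroups
```

Both paths are read-only inspection here; nothing in this sketch changes any limits.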
And that is where you can go if you want to know what is happening under the hood.

In a cgroups environment, we have slices, and slices are the primary division of your operating system. They apply to CPU, to blkio, to memory, and there are three of them: the system slice, which is for system processes and daemons; the machine slice, which is for virtual machines as well as containers; and the user slice, which is for user sessions. Every user gets its own slice by default, which perfectly isolates the resources allocated to one user from the resources allocated to another user, and which guarantees that every user has the same claim to system resources. Apart from these default slices, you can also create custom slices.

All right, let me make a drawing to explain how the systemd slices relate to one another. So we have the system slice, we have the user slice, and there is the machine slice, and these are peers to one another. And when we talk about cgroups, you might be working with CPU shares, and the CPU shares are assigned to the different slices.
So if all of the slices have CPU shares of 1024, that is the relative weight between the different slices. That means that at the slice level, if you have full activity in all of the slices, each slice will get one third of the available CPU resources.

Now, the interesting thing is that within a slice, and that goes for each of these slices, you can work with scopes. And on these scopes you can set CPU shares as well. So let's say that we have 1024 here and 1024 here, but in the machine slice we also have scopes, and maybe in the machine slice we have four of them with 1024 each. Then this 1024 relates to this one. So the scopes decide within the slice how much of the CPU shares they get, and likewise for here. But even if this machine slice has twice the amount of scopes, at the slice level it's still 1024. So if you have four processes running here in the scopes and two processes running here, this one will get half of the CPU resources, and all these processes in the machine slice together get half of the CPU resources as well.

Within a scope, or directly within a slice, you can have the different services or units.
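The relative nature of these shares can be made concrete with a little arithmetic. This sketch only computes percentages from the hypothetical 1024/1024/1024 values used in the drawing; it does not talk to systemd at all.

```shell
#!/bin/sh
# Hypothetical CPU shares for the three peer slices from the example.
system=1024 user=1024 machine=1024
total=$((system + user + machine))

# Under full load, each slice gets shares/total of the CPU.
echo "system slice:  $((100 * system / total))%"   # 33%
echo "user slice:    $((100 * user / total))%"     # 33%
echo "machine slice: $((100 * machine / total))%"  # 33%

# Doubling one slice's shares doubles its claim relative to its peers.
machine=2048
total=$((system + user + machine))
echo "machine slice now: $((100 * machine / total))%"  # 50%
```

The same proportional logic then repeats one level down, between the scopes inside a slice, and again between the units inside a scope.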
And each systemd unit, well, typically that will be a service, will get its CPU shares as well. So 1024, and 512, and 2048, where the numbers express the relative weight between these different services within the same level, but still within the context of the limitation that applies to the slice, which is within the context of the limitation that applies one level up.

As a result, some surprising things might be happening. Let's say we have user Bob, and user Bob is starting a process. Now, what do we get? We get a bob slice at that moment. And if user Bob is the only user around, and user Bob is starting a very active job, he will get all the CPU shares within the user slice. And that means that one user is capable of getting an equal amount of CPU shares as all of the units within the system slice. And that's definitely something to be aware of.

So let me show you how to use cgroups in a systemd environment. All right, in order to run this demo, you need access to the course Git repository.
In case you have not yet installed it: git clone https://github.com/sandervanvugt/lfcs. And in that course Git repository, you will find stress1.service and stress2.service, which are custom systemd unit files.

So what is in there? Well, stress1.service is a very simple service. The type is simple, and it is running a dd process, and this dd process is going to cause 100% system load. And CPUShares, that's the parameter that this demo is all about; CPUShares is set to 1024. If you have a look at the contents of stress2.service, you can see it's very similar, with the only difference that CPUShares is set to 2048. The meaning is that the service with CPUShares 2048 is getting twice the amount of CPU cycles as the other one.

Now, in order to run these custom services, we should copy them to the appropriate location, which will be /etc/systemd/system. You remember there's /usr/lib/systemd/system; that's where unit files that come from packages live, so you shouldn't touch them yourself. And there is /etc/systemd/system; that's for your own stuff, and this is typically my own stuff.
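Based on that description, a stress1.service along the following lines would do the job. Treat this as a sketch rather than the exact file from the repository: the dd arguments and the [Unit]/[Install] sections are my assumptions, and on newer systemd versions CPUShares has been superseded by CPUWeight.

```ini
[Unit]
Description=Stress service 1 (cgroup CPUShares demo)

[Service]
Type=simple
# dd copying /dev/zero to /dev/null burns CPU without touching disk.
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
# Relative CPU weight; stress2.service would use CPUShares=2048.
CPUShares=1024

[Install]
WantedBy=multi-user.target
```

After copying both files into /etc/systemd/system, run systemctl daemon-reload so that systemd picks them up before you start them.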
Now, let's run it: systemctl start on stress1, as well as stress2. And then let's observe in top what is going on. And what do we see? Well, we see something that might surprise you, and that is that the dd processes are both consuming almost 100%. So we don't see the difference in CPUShares. And there's a very good reason for that, and that is because this is a multi-CPU system. Let me press 1 in the top interface. Let's have a look at the third line, where you can now see Cpu0 and Cpu1. We have multiple CPUs, and that is why this demo seems to be failing.

But fortunately, thanks to the sys pseudo file system, there's something that we can do about it. I am going to use echo 0 > /sys/bus/cpu/devices/cpu1/online. Now what is this? Well, the /sys/bus/cpu/devices/cpu1/online file is what you use to enable or disable Cpu1. So if I echo a zero to that, then suddenly we have a one-CPU system instead of a two-CPU system. And if we get back to top, I press 1 again in the top interface.
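Before flipping a CPU offline like this, it can be reassuring to check what the kernel currently reports. A sketch, with the caveat that the cpu1 path only exists on multi-CPU systems (hence the guard), and that cpu0 typically cannot be offlined at all:

```shell
# Which CPUs does the kernel know about, and which are currently online?
cat /sys/devices/system/cpu/present
cat /sys/devices/system/cpu/online

# Per-CPU online flag: 1 = online, 0 = offline (writable as root).
# Guarded, since cpu1 does not exist on a single-CPU machine.
if [ -e /sys/bus/cpu/devices/cpu1/online ]; then
    cat /sys/bus/cpu/devices/cpu1/online
fi
```

Writing 0 or 1 to that per-CPU file, as done in the demo, requires root privileges.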
Now we can see that indeed there is only one CPU, and you can also see the cgroups in action. So we have one dd process getting twice the amount of CPU cycles as the other one.

But, hey, there is one thing that you need to be aware of. Let me open a new window, and in this new window, let me move it to the lower-right corner so that we can still see top in the background. As an ordinary user, I'm going to use while true; do true; done. And now we need to observe what is happening. You see what's happening? The bash process is getting about 50%, and both dd processes together are getting about 50% as well. How come? Well, that is for the simple reason that we are in the user slice for this while true; do true; done thing, and we are in the system slice for the dd processes. And by default, the user slice and the system slice have an equal weight. And that is why one ordinary user process is capable of pushing away these other processes, and that's not good. Let me use Control + C, and let me quit these things so that we can move forward.
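If that equal weighting between user.slice and system.slice is not what you want, it can be changed with a drop-in file. The path and the value 256 below are just an illustration of the mechanism, not a recommendation:

```ini
# /etc/systemd/system/user.slice.d/override.conf
# Lower the relative CPU weight of all user sessions compared to
# system.slice, which keeps the default weight of 1024.
[Slice]
CPUShares=256
```

A similar change can also be made at runtime with systemctl set-property user.slice CPUShares=256; either way, the weight only matters when slices are actually competing for CPU time.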
Oh, one thing, by the way. I want to use systemd-cg, for cgroup, and there we have a cgroup top and a cgroup ls. So here is systemd-cgtop, where you can see where the activity is. And you can see clearly indicated that stress2.service and stress1.service are the most active processes right now. An alternative view is systemd-cgls. Cgls is showing a list of everything that is happening in all of the cgroups. This is not convenient for monitoring what's going on right now and which process is busiest, but it is convenient to get an overview of all these different slices and services.

Now, let me finish this by using kill $(pidof dd). And that should make them go away. Back to top. And, oh boy, the kill $(pidof dd) didn't work, so let me kill them manually. Oh, there is number one, and k to kill, and there is number two. And now they are gone.

There's only one thing remaining, and that's from a few commands ago: I offlined my Cpu1.
Now, by echoing a 1 into /sys/bus/cpu/devices/cpu1/online, I am getting it back online again. And you can probably still hear my computer making noise because it has been so busy. That will calm down before long, because the processes that were really keeping it occupied are no longer active. As you can see, the load average is already decreasing. It's going slowly, but it will reach a value below one before long.

Now, one more thing about these cgroups. You remember the modification that we made earlier to sshd.service? Let me show it again. There we go: MemoryMax is four megabytes. That's also cgroup functionality. In Linux, on a process level, you can set the maximum amount of memory.

And you know where else cgroups are being used? In containers. We'll later talk about containers as isolated processes. And cgroups are one of the important pillars in the working of containers, together with namespaces, but we will talk about that in more detail later.
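As a reminder of what such a memory limit looks like, a MemoryMax setting is normally applied through a drop-in. This is a sketch: the 4M value is the one mentioned here, but the file path and everything else is my assumption about a typical setup.

```ini
# /etc/systemd/system/sshd.service.d/override.conf
[Service]
# Hard memory ceiling, enforced by the memory cgroup controller;
# processes in the unit are OOM-killed if they exceed it.
MemoryMax=4M
```

The convenient way to create such a drop-in is systemctl edit sshd.service, followed by systemctl daemon-reload and a restart of the service.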