- In modern Linux, we have this important feature, cgroups. Cgroups is all about resource allocation, reservation, and limitation, and systemd is taking care of it. So cgroups, or control groups, places resources in controllers that represent the type of resource. The most significant default controllers are cpu, memory, and blkio, and they allow you to work with, well, CPU restrictions, memory restrictions, as well as block I/O.

These controllers are subdivided in a tree structure where different weights or limits are applied to each branch. Each of these branches is a cgroup, and one or more processes are assigned to a cgroup. Cgroups can be applied from the command line or from systemd. In the past, you could manually create cgroups using the cgconfig service and the cgred process, but then we are talking about the early 2000s, before systemd became a relevant thing.

In all cases, cgroup settings are written to /sys/fs/cgroup. So that's the sys file system that we talked about before, the pseudo file system that is used for managing hardware properties.
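If you want to take that look under the hood yourself, a quick exploratory sketch from the shell is below. Note that the exact layout under /sys/fs/cgroup depends on whether your system runs cgroup v1 (one subdirectory per controller) or v2 (one unified tree), so treat the comments as assumptions about a typical setup rather than a guarantee.

```shell
# List the cgroup hierarchy: on cgroup v1 you see one directory per
# controller (cpu, memory, blkio, ...), on v2 a single unified tree
# containing the slices (system.slice, user.slice, ...).
ls /sys/fs/cgroup

# The kernel also reports which cgroup controllers it knows about.
cat /proc/cgroups
```

Both paths are read-only inspection here; nothing in this sketch changes any limits.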
And that is where you can go if you want to know what is happening under the hood.

In a cgroups environment, we have slices, and slices are the primary division of your operating system. They apply to CPU, to blkio, to memory, and there are three of them: the system slice, which is for system processes and daemons; the machine slice, which is for virtual machines as well as containers; and the user slice, which is for user sessions. Every user gets its own slice by default, which perfectly isolates the resources allocated to one user from the resources allocated to another user, and which guarantees that every user has the same claim to system resources. Apart from these default slices, you can also create custom slices.

All right, let me make a drawing to explain how the systemd slices relate to one another. So we have the system slice, we have the user slice, and there is the machine slice, and these are peers to one another. And when we talk about cgroups, you might be working with CPU shares, and the CPU shares are assigned to the different slices.
So if all of the slices have CPU shares of 1024, that is the relative weight between the different slices. That means that at the slice level, if you have full activity in all of the slices, each slice will get one third of the available CPU resources.

Now, the interesting thing is that within a slice, and that goes for each of these slices, you can work with scopes. And on these scopes you can set CPU shares as well. So let's say that we have 1024 here and 1024 here, but in the machine slice we also have scopes, and maybe in the machine slice we have four of them with 1024 each. Then this 1024 relates to this one. So the scopes decide within the slice how much of the CPU shares they get, and likewise for here. But even if this machine slice has twice the amount of scopes, at the slice level it's still 1024. So if you have four processes running here in the scopes and two processes running here, this one will get half of the CPU resources, and all these processes in the machine slice together get half of the CPU resources as well.

Within a scope, or directly within a slice, you can have the different services or units.
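The relative nature of these shares can be made concrete with a little arithmetic. This sketch only computes percentages from the hypothetical 1024/1024/1024 values used in the drawing; it does not talk to systemd at all.

```shell
#!/bin/sh
# Hypothetical CPU shares for the three peer slices from the example.
system=1024 user=1024 machine=1024
total=$((system + user + machine))

# Under full load, each slice gets shares/total of the CPU.
echo "system slice:  $((100 * system / total))%"   # 33%
echo "user slice:    $((100 * user / total))%"     # 33%
echo "machine slice: $((100 * machine / total))%"  # 33%

# Doubling one slice's shares doubles its claim relative to its peers.
machine=2048
total=$((system + user + machine))
echo "machine slice now: $((100 * machine / total))%"  # 50%
```

The same proportional logic then repeats one level down, between the scopes inside a slice, and again between the units inside a scope.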
And each systemd unit, well, typically that will be a service, will get its CPU shares as well. So 1024, and 512, and 2048, where the numbers express the relative weight between these different services within the same level, but still within the context of the limitation that applies to the slice, which is within the context of the limitation that applies one level up.

As a result, some surprising things might be happening. Let's say we have user Bob, and user Bob is starting a process. Now, what do we get? We get a bob slice at that moment. And if user Bob is the only user around, and user Bob is starting a very active job, he will get all the CPU shares within the user slice. And that means that one user is capable of getting an equal amount of CPU shares as all of the units within the system slice. And that's definitely something to be aware of.

So let me show you how to use cgroups in a systemd environment. All right, in order to run this demo, you need access to the course Git repository.
In case you have not yet installed it: git clone https://github.com/sandervanvugt/lfcs. And in that course Git repository, you will find stress1.service and stress2.service, which are custom systemd unit files.

So what is in there? Well, stress1.service is a very simple service. The type is simple, and it is running a dd process, and this dd process is going to cause 100% system load. And CPUShares, that's the parameter that this demo is all about; CPUShares is set to 1024. If you have a look at the contents of stress2.service, you can see it's very similar, with the only difference that CPUShares is set to 2048. The meaning is that the service with CPUShares 2048 is getting twice the amount of CPU cycles as the other one.

Now, in order to run these custom services, we should copy them to the appropriate location, which will be /etc/systemd/system. You remember there's /usr/lib/systemd/system; that's where unit files that come from packages live, so you shouldn't touch them yourself. And there is /etc/systemd/system; that's for your own stuff, and this is typically my own stuff.
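Based on that description, a stress1.service along the following lines would do the job. Treat this as a sketch rather than the exact file from the repository: the dd arguments and the [Unit]/[Install] sections are my assumptions, and on newer systemd versions CPUShares has been superseded by CPUWeight.

```ini
[Unit]
Description=Stress service 1 (cgroup CPUShares demo)

[Service]
Type=simple
# dd copying /dev/zero to /dev/null burns CPU without touching disk.
ExecStart=/usr/bin/dd if=/dev/zero of=/dev/null
# Relative CPU weight; stress2.service would use CPUShares=2048.
CPUShares=1024

[Install]
WantedBy=multi-user.target
```

After copying both files into /etc/systemd/system, run systemctl daemon-reload so that systemd picks them up before you start them.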
Now, let's run it: systemctl start on stress1, as well as stress2. And then let's observe in top what is going on. And what do we see? Well, we see something that might surprise you, and that is that the dd processes are both consuming almost 100%. So we don't see the difference in CPUShares. And there's a very good reason for that, and that is because this is a multi-CPU system. Let me press 1 in the top interface. Let's have a look at the third line, where you can now see Cpu0 and Cpu1. We have multiple CPUs, and that is why this demo seems to be failing.

But fortunately, thanks to the sys pseudo file system, there's something that we can do about it. I am going to use echo 0 > /sys/bus/cpu/devices/cpu1/online. Now what is this? Well, the /sys/bus/cpu/devices/cpu1/online file is what you use to enable or disable Cpu1. So if I echo a zero to that, then suddenly we have a one-CPU system instead of a two-CPU system. And if we get back to top, I press 1 again in the top interface.
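Before flipping a CPU offline like this, it can be reassuring to check what the kernel currently reports. A sketch, with the caveat that the cpu1 path only exists on multi-CPU systems (hence the guard), and that cpu0 typically cannot be offlined at all:

```shell
# Which CPUs does the kernel know about, and which are currently online?
cat /sys/devices/system/cpu/present
cat /sys/devices/system/cpu/online

# Per-CPU online flag: 1 = online, 0 = offline (writable as root).
# Guarded, since cpu1 does not exist on a single-CPU machine.
if [ -e /sys/bus/cpu/devices/cpu1/online ]; then
    cat /sys/bus/cpu/devices/cpu1/online
fi
```

Writing 0 or 1 to that per-CPU file, as done in the demo, requires root privileges.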
Now we can see that indeed there is only one CPU, and you can also see the cgroups in action. So we have one dd process getting twice the amount of CPU cycles as the other one.

But, hey, there is one thing that you need to be aware of. Let me open a new window, and in this new window, let me move it to the lower-right corner so that we can still see top in the background. As an ordinary user, I'm going to use while true; do true; done. And now we need to observe what is happening. You see what's happening? The bash process is getting about 50%, and both dd processes together are getting about 50% as well. How come? Well, that is for the simple reason that we are in the user slice for this while true; do true; done thing, and we are in the system slice for the dd processes. And by default, the user slice and the system slice have an equal weight. And that is why one ordinary user process is capable of pushing away these other processes, and that's not good. Let me use Control + C, and let me quit these things so that we can move forward.
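If that equal weighting between user.slice and system.slice is not what you want, it can be changed with a drop-in file. The path and the value 256 below are just an illustration of the mechanism, not a recommendation:

```ini
# /etc/systemd/system/user.slice.d/override.conf
# Lower the relative CPU weight of all user sessions compared to
# system.slice, which keeps the default weight of 1024.
[Slice]
CPUShares=256
```

A similar change can also be made at runtime with systemctl set-property user.slice CPUShares=256; either way, the weight only matters when slices are actually competing for CPU time.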
Oh, one thing, by the way. I want to use systemd-cg, for cgroup, and there we have a cgroup top and a cgroup ls. So here is systemd-cgtop, where you can see where the activity is. And you can see clearly indicated that stress2.service and stress1.service are the most active processes right now. An alternative view is systemd-cgls. Cgls is showing a list of everything that is happening in all of the cgroups. This is not convenient for monitoring what's going on right now and which process is busiest, but it is convenient to get an overview of all these different slices and services.

Now, let me finish this by using kill $(pidof dd). And that should make them go away. Back to top. And, oh boy, the kill $(pidof dd) didn't work, so let me kill them manually. Oh, there is number one, and k to kill, and there is number two. And now they are gone.

There's only one thing remaining, and that's from a few commands ago: I offlined my Cpu1.
Now, by echoing a 1 into /sys/bus/cpu/devices/cpu1/online, I am getting it back online again. And you can probably still hear my computer making noise because it has been so busy. That will calm down before long, because the processes that were really keeping it occupied are no longer active. As you can see, the load average is already decreasing. It's going slowly, but it will reach a value below one before long.

Now, one more thing about these cgroups. You remember the modification that we made earlier to sshd.service? Let me show it again. There we go: MemoryMax is four megabytes. That's also cgroup functionality. In Linux, on a process level, you can set the maximum amount of memory.

And you know where else cgroups are being used? In containers. We'll later talk about containers as isolated processes. And cgroups are one of the important pillars in the working of containers, together with namespaces, but we will talk about that in more detail later.
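As a reminder of what such a memory limit looks like, a MemoryMax setting is normally applied through a drop-in. This is a sketch: the 4M value is the one mentioned here, but the file path and everything else is my assumption about a typical setup.

```ini
# /etc/systemd/system/sshd.service.d/override.conf
[Service]
# Hard memory ceiling, enforced by the memory cgroup controller;
# processes in the unit are OOM-killed if they exceed it.
MemoryMax=4M
```

The convenient way to create such a drop-in is systemctl edit sshd.service, followed by systemctl daemon-reload and a restart of the service.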