1
00:00:00,000 --> 00:00:07,166
[No Audio]

2
00:00:07,167 --> 00:00:09,000
In this video we will be explaining the

3
00:00:09,001 --> 00:00:11,833
concept of a barrier. A barrier enables

4
00:00:11,834 --> 00:00:13,866
multiple threads to synchronize the beginning

5
00:00:13,867 --> 00:00:16,433
of some computation. You may consider it like

6
00:00:16,434 --> 00:00:19,266
an obstacle or a point in the code, which will

7
00:00:19,267 --> 00:00:21,900
halt the execution of some threads until all

8
00:00:21,901 --> 00:00:24,400
the threads have executed the code up to that

9
00:00:24,401 --> 00:00:27,266
particular point. When all the threads reach

10
00:00:27,267 --> 00:00:29,400
that point, they are allowed to continue

11
00:00:29,401 --> 00:00:31,366
executing the remainder of their code.

12
00:00:31,966 --> 00:00:33,900
This is typically used to synchronize the

13
00:00:33,901 --> 00:00:36,600
beginning of some computation. Let us look at

14
00:00:36,601 --> 00:00:39,166
an example to understand this. I will first

15
00:00:39,167 --> 00:00:41,700
bring the relevant modules into scope.

16
00:00:41,701 --> 00:00:46,666
[No Audio]

17
00:00:46,667 --> 00:00:48,666
Next, I will create a vector of threads.

18
00:00:48,667 --> 00:00:54,433
[No Audio]

19
00:00:54,434 --> 00:00:58,033
Now I will create a new barrier using the new function.

20
00:00:58,034 --> 00:01:03,200
[No Audio]
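A minimal sketch of what the code might look like at this point, assuming the standard library's std::sync::Barrier and a count of 5 as used later in the video. The helper name make_barrier is hypothetical and only there for illustration.

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// Hypothetical helper (not from the video): builds a barrier that can be
// shared between threads by wrapping it in an Arc.
fn make_barrier(count: usize) -> Arc<Barrier> {
    Arc::new(Barrier::new(count))
}

fn main() {
    // Vector that will hold the join handles of the spawned threads.
    let mut threads: Vec<thread::JoinHandle<()>> = Vec::new();

    let barrier = make_barrier(5);

    // Cloning the Arc creates another owner of the same barrier.
    let barrier_for_thread = Arc::clone(&barrier);
    threads.push(thread::spawn(move || {
        // The thread owns its clone; nothing calls wait here, so nothing blocks.
        let _shared = barrier_for_thread;
    }));

    for t in threads {
        t.join().unwrap();
    }
    println!("owners after join: {}", Arc::strong_count(&barrier));
}
```

Cloning the Arc does not copy the barrier itself; every clone points at the same barrier, which is what lets several threads meet at it.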

21
00:01:03,201 --> 00:01:05,633
Please note that since this variable needs to

22
00:01:05,634 --> 00:01:07,166
be passed between multiple threads,

23
00:01:07,167 --> 00:01:09,866
I have to wrap it inside the

24
00:01:09,900 --> 00:01:12,900
Atomic Reference Counting smart pointer. This

25
00:01:12,901 --> 00:01:14,933
will allow multiple threads to use it in a

26
00:01:14,934 --> 00:01:18,000
correct way. Next, I will iterate 10 times,

27
00:01:18,001 --> 00:01:20,066
and during each iteration, I will create a

28
00:01:20,067 --> 00:01:22,566
thread which will do some computation. So I

29
00:01:22,567 --> 00:01:25,166
will include a for loop for this purpose.

30
00:01:25,167 --> 00:01:29,833
[No Audio]

31
00:01:29,834 --> 00:01:31,766
Inside the loop, I will first create a clone

32
00:01:31,767 --> 00:01:35,400
of the variable barrier. We can do this

33
00:01:35,401 --> 00:01:38,033
because it is of type Arc, a smart pointer

34
00:01:38,066 --> 00:01:40,600
which has this function for allowing multiple

35
00:01:40,633 --> 00:01:42,900
owners of the data in a thread-safe way.

36
00:01:44,866 --> 00:01:47,566
Next, I will create a thread during each iteration.

37
00:01:47,567 --> 00:01:53,166
[No Audio]

38
00:01:53,167 --> 00:01:55,133
Inside the thread, I will have some code to

39
00:01:55,134 --> 00:01:57,300
represent some sort of computation inside the

40
00:01:57,301 --> 00:01:59,566
thread. So I will include a simple print statement.

41
00:01:59,567 --> 00:02:05,233
[No Audio]

42
00:02:05,234 --> 00:02:06,166
I want all the

43
00:02:06,167 --> 00:02:08,133
threads that are being created inside the

44
00:02:08,134 --> 00:02:10,765
loop to execute this print command, and then

45
00:02:10,766 --> 00:02:12,966
wait until all the threads are done with

46
00:02:13,000 --> 00:02:15,433
executing this command. In other words, I

47
00:02:15,434 --> 00:02:18,066
want the code of all the threads to

48
00:02:18,067 --> 00:02:20,700
get synchronized at this particular point. To

49
00:02:20,701 --> 00:02:23,466
do so, I will call the wait function on the barrier.

50
00:02:23,467 --> 00:02:28,566
[No Audio]

51
00:02:28,567 --> 00:02:30,800
This will block the current thread until all

52
00:02:30,801 --> 00:02:32,700
the threads have executed up to this

53
00:02:32,701 --> 00:02:34,800
particular point. In other words, it will

54
00:02:34,801 --> 00:02:37,333
block the threads until all the threads meet

55
00:02:37,366 --> 00:02:41,333
at this point in code. After this line, I

56
00:02:41,334 --> 00:02:43,200
will add one more print statement.

57
00:02:43,201 --> 00:02:47,800
[No Audio]

58
00:02:47,801 --> 00:02:50,400
This statement will only execute once all the

59
00:02:50,401 --> 00:02:52,500
threads are done executing the respective

60
00:02:52,501 --> 00:02:55,466
code up to the line on which each

61
00:02:55,467 --> 00:02:57,600
specific thread is making a call to the

62
00:02:57,601 --> 00:03:00,633
wait function. Outside the thread, I will

63
00:03:00,634 --> 00:03:02,533
collect all the threads in the thread vector.

64
00:03:02,534 --> 00:03:07,200
[No Audio]

65
00:03:07,201 --> 00:03:09,300
Finally, I will make sure that all the

66
00:03:09,301 --> 00:03:11,066
threads run to completion. So I will

67
00:03:11,067 --> 00:03:13,866
call join on each of them inside a loop.

68
00:03:13,867 --> 00:03:23,033
[No Audio]
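A sketch of how the complete first example might look, assuming 10 threads and a barrier count of 5 as described in the video. The function name run_with_barrier and the arrived counter are my own additions for illustration; the counter lets us check that no thread gets past wait before at least five threads have arrived.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

// Hypothetical helper: spawns `num_threads` threads that meet at a barrier
// of size `barrier_size`. Returns the smallest number of arrivals that any
// thread observed right after it was released from wait.
// Note: `num_threads` should be a multiple of `barrier_size`, otherwise the
// leftover threads wait forever.
fn run_with_barrier(num_threads: usize, barrier_size: usize) -> usize {
    let mut threads = Vec::new();
    let barrier = Arc::new(Barrier::new(barrier_size));
    let arrived = Arc::new(AtomicUsize::new(0));

    for i in 0..num_threads {
        let barrier = Arc::clone(&barrier);
        let arrived = Arc::clone(&arrived);
        let t = thread::spawn(move || {
            println!("thread {i}: before wait");
            arrived.fetch_add(1, Ordering::SeqCst);

            // Blocks until `barrier_size` threads have called wait.
            barrier.wait();

            let seen = arrived.load(Ordering::SeqCst);
            println!("thread {i}: after wait");
            seen
        });
        threads.push(t);
    }

    // Join every thread and keep the smallest observed arrival count.
    threads
        .into_iter()
        .map(|t| t.join().unwrap())
        .min()
        .unwrap()
}

fn main() {
    let min_seen = run_with_barrier(10, 5);
    println!("every released thread saw at least {min_seen} arrivals");
}
```

The order of the printed lines varies from run to run, because the scheduler decides which threads reach the barrier first.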

69
00:03:23,034 --> 00:03:25,000
Now let us execute the code, and then we will

70
00:03:25,001 --> 00:03:26,800
explain some of the output.

71
00:03:26,801 --> 00:03:30,161
[No Audio]

72
00:03:30,162 --> 00:03:31,200
You may note that

73
00:03:31,201 --> 00:03:33,533
the first print line, which comes before the

74
00:03:33,534 --> 00:03:35,866
wait on the barrier in each of the threads

75
00:03:36,300 --> 00:03:38,433
is executed for the first five threads, and

76
00:03:38,434 --> 00:03:40,300
afterwards the remainder of the code is being

77
00:03:40,301 --> 00:03:42,666
executed. At the end of the terminal, you may

78
00:03:42,667 --> 00:03:44,733
note that all the second print statements are

79
00:03:44,734 --> 00:03:47,833
being executed. Let us explain the behavior

80
00:03:47,834 --> 00:03:51,300
of the results now. In main,

81
00:03:51,301 --> 00:03:53,433
the barrier is created with a count of 5.

82
00:03:53,434 --> 00:03:56,100
This means that the first five threads

83
00:03:56,101 --> 00:03:58,233
which will call this function will be blocked

84
00:03:58,500 --> 00:04:00,666
until all five of them have reached the barrier.

85
00:04:01,366 --> 00:04:03,666
This further means that when all five threads

86
00:04:03,667 --> 00:04:06,000
have executed their respective code until the

87
00:04:06,001 --> 00:04:07,800
point where they are calling the wait on

88
00:04:07,801 --> 00:04:10,700
the barrier, then they will be unblocked.

89
00:04:11,033 --> 00:04:13,966
Therefore, the first five threads that call

90
00:04:13,967 --> 00:04:17,366
wait are blocked together. When those

91
00:04:17,367 --> 00:04:19,065
five threads have displayed the message

92
00:04:19,066 --> 00:04:21,500
before wait, which is an indication that

93
00:04:21,501 --> 00:04:23,566
they are done with the code before the call

94
00:04:23,567 --> 00:04:26,666
to wait, they will be unblocked, and

95
00:04:26,667 --> 00:04:28,966
Rust will allow the remainder of the code in

96
00:04:28,967 --> 00:04:31,233
the respective threads to execute. However,

97
00:04:31,466 --> 00:04:33,733
what happens is that, since we are creating 10

98
00:04:33,734 --> 00:04:36,100
threads, when the remaining five

99
00:04:36,101 --> 00:04:38,966
threads make a call to wait, they are

100
00:04:38,967 --> 00:04:41,800
also blocked, and they will be released once

101
00:04:41,801 --> 00:04:43,833
all of them are done with the code up to the

102
00:04:43,834 --> 00:04:46,666
point where they are making a

103
00:04:46,667 --> 00:04:49,300
call to the wait function. In summary, the

104
00:04:49,301 --> 00:04:51,266
first five threads will be blocked, and once

105
00:04:51,267 --> 00:04:53,433
they are released the next five threads will

106
00:04:53,434 --> 00:04:55,700
be blocked. If we change the value of the

107
00:04:55,701 --> 00:04:58,800
barrier to a value of 10, then in that case all 10

108
00:04:58,801 --> 00:05:01,400
threads will be blocked. Now what will happen

109
00:05:01,401 --> 00:05:03,500
if I change the number of loop iterations to a value

110
00:05:03,501 --> 00:05:05,500
of 3 instead of 10? Let me change it.

111
00:05:07,333 --> 00:05:09,800
In this case, we are only creating three

112
00:05:09,801 --> 00:05:11,866
threads. So all the three threads will be

113
00:05:11,867 --> 00:05:14,533
blocked when they reach the line where

114
00:05:14,534 --> 00:05:17,566
they are calling the wait function. The

115
00:05:17,567 --> 00:05:20,066
barrier has a count of five, so it will

116
00:05:20,067 --> 00:05:23,166
only release them once the execution of

117
00:05:23,167 --> 00:05:25,366
five threads up to this particular point is

118
00:05:25,367 --> 00:05:28,733
complete. Since we are not creating any more

119
00:05:28,734 --> 00:05:30,466
threads, all the threads will

120
00:05:30,467 --> 00:05:32,300
keep on waiting and will never go to

121
00:05:32,301 --> 00:05:35,033
completion. And this is because we are

122
00:05:35,034 --> 00:05:37,533
only creating three threads and not five

123
00:05:37,534 --> 00:05:40,300
threads. This example highlights a possible

124
00:05:40,301 --> 00:05:42,866
issue with the barrier, that is, it may lead

125
00:05:42,867 --> 00:05:45,433
to blocking of your program if misused or

126
00:05:45,434 --> 00:05:48,533
used in a wrong way. But let us now cover a

127
00:05:48,534 --> 00:05:50,700
nice use case where the barriers will be

128
00:05:50,701 --> 00:05:52,600
useful. I will comment out the code and

129
00:05:52,601 --> 00:05:53,666
we'll start fresh again.

130
00:05:53,667 --> 00:06:00,066
[No Audio]
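One possible way to guard against the mismatch just described, sketched under my own assumptions rather than taken from the video: derive the barrier count from the same value that controls how many threads are spawned, so the two cannot drift apart. The function name run_all is hypothetical.

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// Hypothetical helper: the barrier count and the number of spawned threads
// come from the same parameter, so every wait call is eventually released.
// Returns how many threads ran to completion.
fn run_all(num_threads: usize) -> usize {
    let barrier = Arc::new(Barrier::new(num_threads));

    // Collect the handles first so that all threads are spawned
    // before we start joining any of them.
    let handles: Vec<thread::JoinHandle<usize>> = (0..num_threads)
        .map(|i| {
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                println!("thread {i}: before wait");
                barrier.wait();
                println!("thread {i}: after wait");
                i
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).count()
}

fn main() {
    println!("{} threads completed", run_all(3));
}
```

With a hard-coded count of 5 and only three threads, the same program would never get past join, which is the blocking behavior shown above.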

131
00:06:00,067 --> 00:06:02,600
I will first bring the relevant modules into scope.

132
00:06:02,601 --> 00:06:08,366
[No Audio]

133
00:06:08,367 --> 00:06:10,200
Next I will create a vector of threads.

134
00:06:10,201 --> 00:06:14,566
[No Audio]

135
00:06:14,567 --> 00:06:17,366
Next, I will create a barrier variable wrapped

136
00:06:17,367 --> 00:06:20,033
inside the Atomic Reference Counting smart pointer.

137
00:06:20,034 --> 00:06:28,166
[No Audio]

138
00:06:28,167 --> 00:06:30,266
We want to simulate a scenario where we have

139
00:06:30,267 --> 00:06:32,633
some huge computation which is being spread

140
00:06:32,634 --> 00:06:35,233
and divided among multiple threads. And we

141
00:06:35,234 --> 00:06:36,966
would like to stop at some predetermined

142
00:06:36,967 --> 00:06:38,733
points during the execution to check the

143
00:06:38,734 --> 00:06:41,033
result. To achieve this functionality, we

144
00:06:41,034 --> 00:06:42,900
will assume that we have three arrays

145
00:06:42,901 --> 00:06:45,433
containing some values. We would like to

146
00:06:45,434 --> 00:06:48,300
first add up the first half of each array's

147
00:06:48,900 --> 00:06:51,300
values, and then add all those sums

148
00:06:51,301 --> 00:06:53,766
together and store the result. This will be followed

149
00:06:53,767 --> 00:06:57,366
by the second half of the arrays. We will

150
00:06:57,367 --> 00:06:59,700
create three threads where each thread will

151
00:06:59,701 --> 00:07:03,000
be responsible for one of the arrays. So I

152
00:07:03,001 --> 00:07:05,066
will create a data vector which will contain

153
00:07:05,067 --> 00:07:07,233
the arrays, and I will wrap it inside

154
00:07:07,234 --> 00:07:08,897
the Arc smart pointer.

155
00:07:08,898 --> 00:07:15,466
[No Audio]

156
00:07:15,467 --> 00:07:17,400
Please note that since this vector will be

157
00:07:17,401 --> 00:07:19,266
passed to different threads, we

158
00:07:19,267 --> 00:07:22,200
need to wrap it inside the Arc smart pointer.

159
00:07:22,533 --> 00:07:24,200
Next I will create a variable which will

160
00:07:24,201 --> 00:07:27,066
store the final summation of the values. Since

161
00:07:27,067 --> 00:07:29,000
multiple threads will be updating the value

162
00:07:29,001 --> 00:07:31,266
of this variable, to make sure that

163
00:07:31,267 --> 00:07:33,133
the update to this variable is being done

164
00:07:33,134 --> 00:07:35,766
in a correct way, we will use a mutex, and

165
00:07:35,767 --> 00:07:38,266
we'll wrap it inside the Arc smart pointer.

166
00:07:38,267 --> 00:07:45,600
[No Audio]

167
00:07:45,601 --> 00:07:47,566
Please note that the vector of data is not

168
00:07:47,567 --> 00:07:50,200
going to be updated and will be read only,

169
00:07:50,201 --> 00:07:52,366
so we do not need to wrap it inside

170
00:07:52,367 --> 00:07:54,533
a mutex. However, the result variable will be

171
00:07:54,534 --> 00:07:56,933
updated inside multiple threads, so we

172
00:07:56,934 --> 00:08:00,600
have wrapped it inside a mutex. Next, we will

173
00:08:00,601 --> 00:08:02,300
iterate three times for creating three

174
00:08:02,301 --> 00:08:05,633
threads. I will include a for loop for this purpose.

175
00:08:05,634 --> 00:08:11,800
[No Audio]

176
00:08:11,801 --> 00:08:13,933
During each iteration, we need to pass the

177
00:08:13,934 --> 00:08:17,266
variables barrier, data, and result to the

178
00:08:17,267 --> 00:08:19,100
threads. Therefore, we will make clone

179
00:08:19,101 --> 00:08:20,666
copies of these variables.

180
00:08:20,667 --> 00:08:27,500
[No Audio]

181
00:08:27,566 --> 00:08:30,000
The cloned copies will shadow the original

182
00:08:30,001 --> 00:08:32,200
variables. Next I will create a thread.

183
00:08:32,201 --> 00:08:36,900
[No Audio]

184
00:08:36,901 --> 00:08:40,466
In each thread i, I will add the first

185
00:08:40,467 --> 00:08:43,332
three values of the respective ith vector.

186
00:08:44,000 --> 00:08:45,600
And next I will add the summation of the

187
00:08:45,601 --> 00:08:48,700
first three values of the vector or

188
00:08:48,701 --> 00:08:51,300
array to the variable of result. To do so, I

189
00:08:51,301 --> 00:08:53,933
will first obtain a lock on the variable of result.

190
00:08:53,934 --> 00:08:57,466
[No Audio]

191
00:08:57,467 --> 00:08:59,266
Next, I will add the variable x to

192
00:08:59,267 --> 00:09:01,566
the already existing value of the result.

193
00:09:01,567 --> 00:09:06,633
[No Audio]

194
00:09:06,634 --> 00:09:08,766
I will next add a print statement to tell the

195
00:09:08,767 --> 00:09:11,000
user that the thread has done its respective

196
00:09:11,001 --> 00:09:12,400
part of the computation.

197
00:09:12,401 --> 00:09:17,079
[No Audio]

198
00:09:17,080 --> 00:09:18,200
We will call a wait

199
00:09:18,201 --> 00:09:20,666
on the barrier next to make sure that all the

200
00:09:20,667 --> 00:09:23,166
threads synchronize on this particular point.

201
00:09:25,066 --> 00:09:28,200
Once this part is done for all the threads, we

202
00:09:28,201 --> 00:09:31,433
will write similar code for part two of

203
00:09:31,434 --> 00:09:34,300
all the threads. First, we will compute

204
00:09:34,301 --> 00:09:36,000
the summation of the remaining values.

205
00:09:36,001 --> 00:09:40,766
[No Audio]

206
00:09:40,767 --> 00:09:41,864
This will be followed by the

207
00:09:41,865 --> 00:09:43,466
update of the variable result.

208
00:09:43,467 --> 00:09:49,200
[No Audio]

209
00:09:49,201 --> 00:09:51,000
And finally, we will indicate that the

210
00:09:51,001 --> 00:09:53,200
computation has been completed by the thread

211
00:09:53,201 --> 00:09:54,433
using a print statement.

212
00:09:54,434 --> 00:10:00,966
[No Audio]

213
00:10:00,967 --> 00:10:03,066
Outside the thread we will collect the thread

214
00:10:03,067 --> 00:10:04,166
in the thread vector.

215
00:10:04,167 --> 00:10:09,233
[No Audio]

216
00:10:09,234 --> 00:10:12,066
Moreover, outside the loop, we

217
00:10:12,067 --> 00:10:14,000
will call join on all the threads to make

218
00:10:14,001 --> 00:10:16,100
sure that all of them run to completion.

219
00:10:16,101 --> 00:10:24,866
[No Audio]

220
00:10:24,867 --> 00:10:26,966
Finally, once all the threads are done, we

221
00:10:26,967 --> 00:10:28,466
will print the value of the result.

222
00:10:28,467 --> 00:10:33,866
[No Audio]

223
00:10:33,867 --> 00:10:35,666
Okay, let us cargo run this.

224
00:10:35,700 --> 00:10:39,466
[No Audio]
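A sketch of how the second example might fit together, assuming three arrays of six integers each. The actual values are not given in the narration, so the numbers below are placeholders, and the function name two_phase_sum is hypothetical.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

// Hypothetical helper: one thread per array. Each thread adds the first half
// of its array to the shared result, waits at the barrier, then adds the
// second half.
fn two_phase_sum(data: Vec<Vec<i32>>) -> i32 {
    let n = data.len();
    let mut threads = Vec::new();
    let barrier = Arc::new(Barrier::new(n));
    let data = Arc::new(data); // read-only, so no Mutex is needed
    let result = Arc::new(Mutex::new(0)); // updated by every thread

    for i in 0..n {
        // The clones shadow the originals inside the loop body.
        let barrier = Arc::clone(&barrier);
        let data = Arc::clone(&data);
        let result = Arc::clone(&result);

        let t = thread::spawn(move || {
            let half = data[i].len() / 2;

            // Part one: first half of array i.
            let x: i32 = data[i][..half].iter().sum();
            *result.lock().unwrap() += x;
            println!("thread {i}: part one is done");

            // No thread starts part two before all have finished part one.
            barrier.wait();

            // Part two: remaining values of array i.
            let x: i32 = data[i][half..].iter().sum();
            *result.lock().unwrap() += x;
            println!("thread {i}: part two is done");
        });
        threads.push(t);
    }

    for t in threads {
        t.join().unwrap();
    }

    let total = *result.lock().unwrap();
    total
}

fn main() {
    // Placeholder values, assumed for illustration.
    let data = vec![
        vec![1, 2, 3, 4, 5, 6],
        vec![1, 2, 3, 4, 5, 6],
        vec![1, 2, 3, 4, 5, 6],
    ];
    println!("final result: {}", two_phase_sum(data));
}
```

Here the barrier count is taken from the number of arrays, so it always matches the number of threads that call wait.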

225
00:10:39,467 --> 00:10:40,433
You may note that

226
00:10:40,434 --> 00:10:42,700
it only allows the second part of the

227
00:10:42,701 --> 00:10:45,400
computation once all the threads are done

228
00:10:45,401 --> 00:10:47,300
with the first part of the computations.

229
00:10:48,266 --> 00:10:50,266
Okay, let me explain one final point before

230
00:10:50,267 --> 00:10:52,300
we end this tutorial, and this is with regard

231
00:10:52,301 --> 00:10:54,333
to the blocking behavior of the lock function

232
00:10:54,334 --> 00:10:56,333
on mutex. You may recall that the lock

233
00:10:56,334 --> 00:10:58,166
function on mutex will block the current

234
00:10:58,167 --> 00:11:00,900
thread until the lock is free, and the lock stays held while its guard remains in

235
00:11:00,901 --> 00:11:03,566
scope. This means that if I create a variable

236
00:11:03,567 --> 00:11:05,600
that locks a value, then as long as that

237
00:11:05,601 --> 00:11:08,000
variable remains in scope, the lock remains

238
00:11:08,001 --> 00:11:11,000
intact. For instance, I will comment out the

239
00:11:11,001 --> 00:11:13,500
two lines in the start of the thread and also

240
00:11:13,501 --> 00:11:16,300
the two lines after the wait function. Now

241
00:11:16,301 --> 00:11:18,266
instead of making an in-place change to the

242
00:11:18,267 --> 00:11:21,000
value, I will create a variable x first,

243
00:11:21,133 --> 00:11:22,766
which will acquire the lock.

244
00:11:22,767 --> 00:11:25,766
[No Audio]

245
00:11:25,767 --> 00:11:28,065
Next, I will update the value of the variable x

246
00:11:28,066 --> 00:11:29,366
in the next line.

247
00:11:29,367 --> 00:11:35,100
[No Audio]

248
00:11:35,101 --> 00:11:37,200
After the wait I will update the value of the

249
00:11:37,201 --> 00:11:38,752
variable x again.

250
00:11:38,753 --> 00:11:41,927
[No Audio]
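A sketch contrasting the two styles, under my own assumptions about the surrounding code. The variant that keeps the guard in a variable across wait is shown only as comments, because with two or more threads it would never finish. The function name in_place_updates is hypothetical.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

// Hypothetical helper: every thread increments the shared counter once
// before and once after the barrier, never holding the lock across wait.
fn in_place_updates(num_threads: usize) -> i32 {
    let barrier = Arc::new(Barrier::new(num_threads));
    let result = Arc::new(Mutex::new(0));
    let mut threads = Vec::new();

    for _ in 0..num_threads {
        let barrier = Arc::clone(&barrier);
        let result = Arc::clone(&result);
        threads.push(thread::spawn(move || {
            // Problematic variant (commented out on purpose):
            //
            //   let mut x = result.lock().unwrap(); // guard lives to end of scope
            //   *x += 1;
            //   barrier.wait();                     // waits while holding the lock
            //   *x += 1;
            //
            // The first thread would wait at the barrier with the lock held,
            // and the others could never acquire it to reach the barrier.

            // Option 1: limit the guard to an inner scope before waiting.
            {
                let mut x = result.lock().unwrap();
                *x += 1;
            } // guard dropped here, mutex unlocked

            barrier.wait();

            // Option 2: in-place update; the temporary guard is dropped
            // at the end of this statement.
            *result.lock().unwrap() += 1;
        }));
    }

    for t in threads {
        t.join().unwrap();
    }

    let total = *result.lock().unwrap();
    total
}

fn main() {
    println!("final value: {}", in_place_updates(3));
}
```

Calling drop on the guard before wait would be another way to release the lock early; the in-place form is simply the shortest when the update fits in one statement.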

251
00:11:41,928 --> 00:11:43,000
This will result in a

252
00:11:43,001 --> 00:11:46,200
blocking state because whichever thread runs

253
00:11:46,201 --> 00:11:48,533
first will acquire the

254
00:11:48,534 --> 00:11:51,100
lock and then go into the waiting state

255
00:11:51,101 --> 00:11:53,600
without releasing the lock. This will

256
00:11:53,601 --> 00:11:56,333
essentially cause all other threads to be

257
00:11:56,334 --> 00:11:58,833
blocked whenever they try to acquire the lock

258
00:11:58,834 --> 00:12:01,333
resulting in an overall blocking behavior.

259
00:12:01,866 --> 00:12:03,933
The correct approach is to first acquire the

260
00:12:03,934 --> 00:12:06,600
lock without assigning it to a variable,

261
00:12:06,601 --> 00:12:09,266
and then deref the value. This avoids the

262
00:12:09,267 --> 00:12:12,033
deadlock: by not assigning the lock

263
00:12:12,034 --> 00:12:14,400
to a variable, we are not obliging

264
00:12:14,433 --> 00:12:17,166
ourselves to unlock the value. The mutex will

265
00:12:17,167 --> 00:12:19,733
be automatically unlocked after the line of

266
00:12:19,734 --> 00:12:23,100
code is executed. The lock in other words is

267
00:12:23,101 --> 00:12:25,066
limited to a single line of code in this

268
00:12:25,067 --> 00:12:28,566
case. The essential idea is that, if you want

269
00:12:28,567 --> 00:12:31,500
to change the value in a single line and want

270
00:12:31,501 --> 00:12:34,600
to immediately unlock after that line, then

271
00:12:34,601 --> 00:12:36,800
you will first acquire the lock and then use

272
00:12:36,801 --> 00:12:39,933
the deref to update the value. I call this

273
00:12:39,934 --> 00:12:43,166
in-place lock and update. This does not come

274
00:12:43,167 --> 00:12:45,533
with the extra headache of properly unlocking

275
00:12:45,534 --> 00:12:47,866
the mutex, and is typically used in simple

276
00:12:47,867 --> 00:12:50,933
scenarios just like the one we have. That is

277
00:12:50,934 --> 00:12:52,633
it for this particular tutorial. See you

278
00:12:52,634 --> 00:12:55,233
again, and until then, enjoy Rust programming.

279
00:12:55,234 --> 00:13:00,566
[No Audio]