1 00:00:00,000 --> 00:00:07,166 [No Audio] 2 00:00:07,167 --> 00:00:09,000 In this video we will be explaining the 3 00:00:09,001 --> 00:00:11,833 concept of a barrier. A barrier enables 4 00:00:11,834 --> 00:00:13,866 multiple threads to synchronize the beginning 5 00:00:13,867 --> 00:00:16,433 of some computation. You may consider it like 6 00:00:16,434 --> 00:00:19,266 an obstacle or a point in the code, which will 7 00:00:19,267 --> 00:00:21,900 halt the execution of some threads until all 8 00:00:21,901 --> 00:00:24,400 the threads have executed the code up to that 9 00:00:24,401 --> 00:00:27,266 particular point. When all the threads reach 10 00:00:27,267 --> 00:00:29,400 that point, they will be allowed to continue 11 00:00:29,401 --> 00:00:31,366 executing the remainder of their code. 12 00:00:31,966 --> 00:00:33,900 This is typically used to synchronize the 13 00:00:33,901 --> 00:00:36,600 beginning of some computation. Let us look at 14 00:00:36,601 --> 00:00:39,166 an example to understand this. I will first 15 00:00:39,167 --> 00:00:41,700 bring the relevant modules into scope. 16 00:00:41,701 --> 00:00:46,666 [No Audio] 17 00:00:46,667 --> 00:00:48,666 Next, I will create a vector of threads. 18 00:00:48,667 --> 00:00:54,433 [No Audio] 19 00:00:54,434 --> 00:00:58,033 Now I will create a new barrier using the new function. 20 00:00:58,034 --> 00:01:03,200 [No Audio] 21 00:01:03,201 --> 00:01:05,633 Please note that since this variable needs to 22 00:01:05,634 --> 00:01:07,166 be passed between multiple threads, 23 00:01:07,167 --> 00:01:09,866 I have to wrap it inside the 24 00:01:09,900 --> 00:01:12,900 Atomic Reference Counting (Arc) smart pointer. This 25 00:01:12,901 --> 00:01:14,933 will allow multiple threads to use it in a 26 00:01:14,934 --> 00:01:18,000 correct way.
Next, I will iterate 10 times, 27 00:01:18,001 --> 00:01:20,066 and during each iteration, I will create a 28 00:01:20,067 --> 00:01:22,566 thread which will do some computation. So I 29 00:01:22,567 --> 00:01:25,166 will include a for loop for this purpose. 30 00:01:25,167 --> 00:01:29,833 [No Audio] 31 00:01:29,834 --> 00:01:31,766 Inside the loop, I will first create a clone 32 00:01:31,767 --> 00:01:35,400 of the variable barrier. We can do this 33 00:01:35,401 --> 00:01:38,033 because it is wrapped in the Arc smart pointer, 34 00:01:38,066 --> 00:01:40,600 which has the clone function for allowing multiple 35 00:01:40,633 --> 00:01:42,900 owners of the data in a thread-safe way. 36 00:01:44,866 --> 00:01:47,566 Next, I will create a thread during each iteration. 37 00:01:47,567 --> 00:01:53,166 [No Audio] 38 00:01:53,167 --> 00:01:55,133 Inside the thread, I will have some code to 39 00:01:55,134 --> 00:01:57,300 represent some sort of computation inside the 40 00:01:57,301 --> 00:01:59,566 thread. So I will include a simple print statement. 41 00:01:59,567 --> 00:02:05,233 [No Audio] 42 00:02:05,234 --> 00:02:06,166 I want all the 43 00:02:06,167 --> 00:02:08,133 threads that are being created inside the 44 00:02:08,134 --> 00:02:10,765 loop to execute this print command, and then 45 00:02:10,766 --> 00:02:12,966 wait until all the threads are done 46 00:02:13,000 --> 00:02:15,433 executing this command. In other words, I 47 00:02:15,434 --> 00:02:18,066 want the code of all the threads to 48 00:02:18,067 --> 00:02:20,700 get synchronized at this particular point. To 49 00:02:20,701 --> 00:02:23,466 do so, I will call the wait function on the barrier. 50 00:02:23,467 --> 00:02:28,566 [No Audio] 51 00:02:28,567 --> 00:02:30,800 This will block the current thread until all 52 00:02:30,801 --> 00:02:32,700 the threads have executed up to this 53 00:02:32,701 --> 00:02:34,800 particular point.
In other words, it will 54 00:02:34,801 --> 00:02:37,333 block the threads until all the threads meet 55 00:02:37,366 --> 00:02:41,333 at this point in the code. After this line, I 56 00:02:41,334 --> 00:02:43,200 will add one more print statement. 57 00:02:43,201 --> 00:02:47,800 [No Audio] 58 00:02:47,801 --> 00:02:50,400 This statement will only execute once all the 59 00:02:50,401 --> 00:02:52,500 threads are done executing their respective 60 00:02:52,501 --> 00:02:55,466 code up to the line on which each 61 00:02:55,467 --> 00:02:57,600 specific thread is making a call to the 62 00:02:57,601 --> 00:03:00,633 wait function. Outside the thread I will 63 00:03:00,634 --> 00:03:02,533 collect all the threads in the thread vector. 64 00:03:02,534 --> 00:03:07,200 [No Audio] 65 00:03:07,201 --> 00:03:09,300 Finally, I will make sure that all the 66 00:03:09,301 --> 00:03:11,066 threads run to completion. So I will 67 00:03:11,067 --> 00:03:13,866 call join on all of them inside a loop. 68 00:03:13,867 --> 00:03:23,033 [No Audio] 69 00:03:23,034 --> 00:03:25,000 Now let us execute the code, and then we will 70 00:03:25,001 --> 00:03:26,800 explain some of the output. 71 00:03:26,801 --> 00:03:30,161 [No Audio] 72 00:03:30,162 --> 00:03:31,200 You may note that 73 00:03:31,201 --> 00:03:33,533 the first print line, which was before the 74 00:03:33,534 --> 00:03:35,866 wait on the barrier in each of the threads, 75 00:03:36,300 --> 00:03:38,433 is executed for the first five threads, and 76 00:03:38,434 --> 00:03:40,300 afterwards the remainder of the code is being 77 00:03:40,301 --> 00:03:42,666 executed. At the end of the terminal output, you may 78 00:03:42,667 --> 00:03:44,733 note that all the second print statements are 79 00:03:44,734 --> 00:03:47,833 being executed. Let us explain the behavior 80 00:03:47,834 --> 00:03:51,300 of the results now. In main, 81 00:03:51,301 --> 00:03:53,433 the value of the barrier variable is set to 5.
82 00:03:53,434 --> 00:03:56,100 This means that the first five threads 83 00:03:56,101 --> 00:03:58,233 which call this function will be blocked 84 00:03:58,500 --> 00:04:00,666 until all five of them are synchronized. 85 00:04:01,366 --> 00:04:03,666 This further means that when all five threads 86 00:04:03,667 --> 00:04:06,000 have executed their respective code up to the 87 00:04:06,001 --> 00:04:07,800 point where they are calling wait on 88 00:04:07,801 --> 00:04:10,700 the barrier, then they will be unblocked. 89 00:04:11,033 --> 00:04:13,966 Therefore the first five threads that reach 90 00:04:13,967 --> 00:04:17,366 the barrier are being blocked. When the first 91 00:04:17,367 --> 00:04:19,065 five threads have displayed the message 92 00:04:19,066 --> 00:04:21,500 before wait, which is an indication that 93 00:04:21,501 --> 00:04:23,566 they are done with the code before the call 94 00:04:23,567 --> 00:04:26,666 to wait, they will be unblocked, and 95 00:04:26,667 --> 00:04:28,966 Rust will allow the remainder of the code in 96 00:04:28,967 --> 00:04:31,233 the respective threads to execute. However, 97 00:04:31,466 --> 00:04:33,733 since we are creating 10 98 00:04:33,734 --> 00:04:36,100 threads, when the remaining five 99 00:04:36,101 --> 00:04:38,966 threads make a call to wait, they are 100 00:04:38,967 --> 00:04:41,800 also blocked, and they will be released once 101 00:04:41,801 --> 00:04:43,833 all of them are done with the code up to the 102 00:04:43,834 --> 00:04:46,666 point where they are making a 103 00:04:46,667 --> 00:04:49,300 call to the wait function. In summary, the 104 00:04:49,301 --> 00:04:51,266 first five threads will be blocked, and once 105 00:04:51,267 --> 00:04:53,433 they are released the next five threads will 106 00:04:53,434 --> 00:04:55,700 be blocked.
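The first example described above can be sketched roughly as follows. This is a minimal sketch and not the exact code typed in the video: the names `run_batches`, `BARRIER_SIZE` and `THREAD_COUNT` are hypothetical, and the printed messages are only an approximation of the ones shown on screen.

```rust
use std::sync::{Arc, Barrier};
use std::thread;

// Hypothetical constants, chosen to match the values mentioned in the narration.
const BARRIER_SIZE: usize = 5;
const THREAD_COUNT: usize = 10;

/// Spawns `thread_count` threads that each call `wait` once on a barrier of
/// size `barrier_size`, and returns how many threads ran to completion.
fn run_batches(thread_count: usize, barrier_size: usize) -> usize {
    // If the thread count is not a multiple of the barrier size, the
    // leftover threads would wait forever, so this sketch refuses to start.
    assert!(thread_count % barrier_size == 0);

    let barrier = Arc::new(Barrier::new(barrier_size));
    let mut handles = Vec::with_capacity(thread_count);

    for i in 0..thread_count {
        // Each thread gets its own Arc handle to the same barrier.
        let barrier = Arc::clone(&barrier);
        handles.push(thread::spawn(move || {
            println!("thread {}: before wait", i);
            // Blocks until `barrier_size` threads have called `wait`.
            barrier.wait();
            println!("thread {}: after wait", i);
        }));
    }

    let mut finished = 0;
    for handle in handles {
        handle.join().unwrap();
        finished += 1;
    }
    finished
}

fn main() {
    let finished = run_batches(THREAD_COUNT, BARRIER_SIZE);
    println!("{} threads passed the barrier", finished);
}
```

Which five threads form the first group depends on scheduling, so the exact order of the printed lines can differ between runs. The barrier is reusable, which is why the second group of five is released in the same way as the first.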
If we change the value of the 107 00:04:55,701 --> 00:04:58,800 barrier to 10, then in that case all 10 108 00:04:58,801 --> 00:05:01,400 threads will be blocked together. Now what will happen 109 00:05:01,401 --> 00:05:03,500 if I change the iteration count of the loop to 110 00:05:03,501 --> 00:05:05,500 3 instead of 10? Let me change it. 111 00:05:07,333 --> 00:05:09,800 In this case, we are only creating three 112 00:05:09,801 --> 00:05:11,866 threads. So all three threads will be 113 00:05:11,867 --> 00:05:14,533 blocked when they reach the line where 114 00:05:14,534 --> 00:05:17,566 they are calling the wait function. Since 115 00:05:17,567 --> 00:05:20,066 the barrier has a value of five, execution will 116 00:05:20,067 --> 00:05:23,166 only resume once 117 00:05:23,167 --> 00:05:25,366 five threads have reached this particular 118 00:05:25,367 --> 00:05:28,733 point. Since we are not creating any more 119 00:05:28,734 --> 00:05:30,466 threads, all the threads will 120 00:05:30,467 --> 00:05:32,300 keep on waiting and will never run to 121 00:05:32,301 --> 00:05:35,033 completion. And this is because we are 122 00:05:35,034 --> 00:05:37,533 only creating three threads and not five 123 00:05:37,534 --> 00:05:40,300 threads. This example highlights a possible 124 00:05:40,301 --> 00:05:42,866 issue with the barrier, that is, it may lead 125 00:05:42,867 --> 00:05:45,433 to blocking of your program if misused or 126 00:05:45,434 --> 00:05:48,533 used in a wrong way. But let us now cover a 127 00:05:48,534 --> 00:05:50,700 nice use case where barriers will be 128 00:05:50,701 --> 00:05:52,600 useful. I will comment out the code and 129 00:05:52,601 --> 00:05:53,666 we'll start fresh again. 130 00:05:53,667 --> 00:06:00,066 [No Audio] 131 00:06:00,067 --> 00:06:02,600 I will first bring the relevant modules into scope.
132 00:06:02,601 --> 00:06:08,366 [No Audio] 133 00:06:08,367 --> 00:06:10,200 Next I will create a vector of threads. 134 00:06:10,201 --> 00:06:14,566 [No Audio] 135 00:06:14,567 --> 00:06:17,366 Next, I will create a barrier variable wrapped 136 00:06:17,367 --> 00:06:20,033 inside the Atomic Reference Counting smart pointer. 137 00:06:20,034 --> 00:06:28,166 [No Audio] 138 00:06:28,167 --> 00:06:30,266 We want to simulate a scenario where we have 139 00:06:30,267 --> 00:06:32,633 some huge computation which is being spread 140 00:06:32,634 --> 00:06:35,233 and divided among multiple threads. And we 141 00:06:35,234 --> 00:06:36,966 would like to stop at some predetermined 142 00:06:36,967 --> 00:06:38,733 points during the execution to check the 143 00:06:38,734 --> 00:06:41,033 result. To achieve this functionality, we 144 00:06:41,034 --> 00:06:42,900 will assume that we have three arrays 145 00:06:42,901 --> 00:06:45,433 containing some values. We would like to 146 00:06:45,434 --> 00:06:48,300 first sum the first half of each array's 147 00:06:48,900 --> 00:06:51,300 values and then add all these sums 148 00:06:51,301 --> 00:06:53,766 together and store the total. This will be followed 149 00:06:53,767 --> 00:06:57,366 by the second half of the arrays. We will 150 00:06:57,367 --> 00:06:59,700 create three threads where each thread will 151 00:06:59,701 --> 00:07:03,000 be responsible for one of the arrays. So I 152 00:07:03,001 --> 00:07:05,066 will create a data vector which will contain 153 00:07:05,067 --> 00:07:07,233 the arrays, and will wrap it inside 154 00:07:07,234 --> 00:07:08,897 the Arc smart pointer. 155 00:07:08,898 --> 00:07:15,466 [No Audio] 156 00:07:15,467 --> 00:07:17,400 Please note that since this vector will be 157 00:07:17,401 --> 00:07:19,266 passed to different threads, we 158 00:07:19,267 --> 00:07:22,200 need to wrap it inside the Arc smart pointer.
159 00:07:22,533 --> 00:07:24,200 Next I will create a variable which will 160 00:07:24,201 --> 00:07:27,066 store the final summation of the values. Since 161 00:07:27,067 --> 00:07:29,000 multiple threads will be updating the value 162 00:07:29,001 --> 00:07:31,266 of this variable, to make sure that 163 00:07:31,267 --> 00:07:33,133 the update to this variable is being done 164 00:07:33,134 --> 00:07:35,766 in a correct way, we will use a mutex, and 165 00:07:35,767 --> 00:07:38,266 we'll wrap it inside the Arc smart pointer. 166 00:07:38,267 --> 00:07:45,600 [No Audio] 167 00:07:45,601 --> 00:07:47,566 Please note that the vector of data is not 168 00:07:47,567 --> 00:07:50,200 going to be updated and will be read only, 169 00:07:50,201 --> 00:07:52,366 therefore we do not need to wrap it inside 170 00:07:52,367 --> 00:07:54,533 a mutex. However, this variable will be 171 00:07:54,534 --> 00:07:56,933 updated inside multiple threads, therefore we 172 00:07:56,934 --> 00:08:00,600 have wrapped it inside the mutex. Next we will 173 00:08:00,601 --> 00:08:02,300 iterate three times for creating three 174 00:08:02,301 --> 00:08:05,633 threads. I will include a for loop for this purpose. 175 00:08:05,634 --> 00:08:11,800 [No Audio] 176 00:08:11,801 --> 00:08:13,933 During each iteration, we need to pass the 177 00:08:13,934 --> 00:08:17,266 variables barrier, data, and result to the 178 00:08:17,267 --> 00:08:19,100 threads. So we will make clone 179 00:08:19,101 --> 00:08:20,666 copies of these variables. 180 00:08:20,667 --> 00:08:27,500 [No Audio] 181 00:08:27,566 --> 00:08:30,000 The clone copies will shadow the original 182 00:08:30,001 --> 00:08:32,200 variables. Next I will create a thread. 183 00:08:32,201 --> 00:08:36,900 [No Audio] 184 00:08:36,901 --> 00:08:40,466 In each thread i, I will add the first 185 00:08:40,467 --> 00:08:43,332 three values of the respective ith vector.
186 00:08:44,000 --> 00:08:45,600 And next I will add the summation of the 187 00:08:45,601 --> 00:08:48,700 first three values of the vector or 188 00:08:48,701 --> 00:08:51,300 array to the variable result. To do so, I 189 00:08:51,301 --> 00:08:53,933 will first obtain a lock on the variable result. 190 00:08:53,934 --> 00:08:57,466 [No Audio] 191 00:08:57,467 --> 00:08:59,266 Next, I will add the variable x to 192 00:08:59,267 --> 00:09:01,566 the already existing value of the result. 193 00:09:01,567 --> 00:09:06,633 [No Audio] 194 00:09:06,634 --> 00:09:08,766 I will next add a print statement to tell the 195 00:09:08,767 --> 00:09:11,000 user that the thread has done its respective 196 00:09:11,001 --> 00:09:12,400 part of the computation. 197 00:09:12,401 --> 00:09:17,079 [No Audio] 198 00:09:17,080 --> 00:09:18,200 We will call wait 199 00:09:18,201 --> 00:09:20,666 on the barrier next to make sure that all the 200 00:09:20,667 --> 00:09:23,166 threads synchronize at this particular point. 201 00:09:25,066 --> 00:09:28,200 Once this part is done for all the threads, we 202 00:09:28,201 --> 00:09:31,433 will write similar code for part two of 203 00:09:31,434 --> 00:09:34,300 all the threads. First we will compute 204 00:09:34,301 --> 00:09:36,000 the summation of the remaining values. 205 00:09:36,001 --> 00:09:40,766 [No Audio] 206 00:09:40,767 --> 00:09:41,864 This will be followed by the 207 00:09:41,865 --> 00:09:43,466 update of the variable result. 208 00:09:43,467 --> 00:09:49,200 [No Audio] 209 00:09:49,201 --> 00:09:51,000 And finally, we will indicate that the 210 00:09:51,001 --> 00:09:53,200 computation has been completed by the thread 211 00:09:53,201 --> 00:09:54,433 using a print statement. 212 00:09:54,434 --> 00:10:00,966 [No Audio] 213 00:10:00,967 --> 00:10:03,066 Outside the thread we will collect the thread 214 00:10:03,067 --> 00:10:04,166 in the thread vector.
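The two-part summation described above could look roughly like the sketch below. It is an approximation under stated assumptions rather than the code from the video: the function name `two_phase_sum`, the sample values in `main`, and the assumption that each array holds six values (so that the first three form the first half) are all illustrative. The join loop and the final read of the result are included only so that the sketch runs on its own.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Sums every inner vector in two phases, one thread per vector, with a
/// barrier between the phases. Returns the grand total.
fn two_phase_sum(data: Vec<Vec<i32>>) -> i32 {
    let thread_count = data.len();
    // One barrier slot per thread, so the count can never drift from the
    // number of threads that are actually spawned.
    let barrier = Arc::new(Barrier::new(thread_count));
    let data = Arc::new(data); // read-only, so no Mutex is needed
    let result = Arc::new(Mutex::new(0)); // updated by every thread
    let mut handles = Vec::with_capacity(thread_count);

    for i in 0..thread_count {
        // The clones shadow the originals inside the loop body.
        let barrier = Arc::clone(&barrier);
        let data = Arc::clone(&data);
        let result = Arc::clone(&result);

        handles.push(thread::spawn(move || {
            // Part 1: first three values of the i-th vector.
            let x: i32 = data[i][0..3].iter().sum();
            *result.lock().unwrap() += x;
            println!("thread {}: part 1 done", i);

            // No thread starts part 2 before all have finished part 1.
            barrier.wait();

            // Part 2: the remaining values.
            let x: i32 = data[i][3..].iter().sum();
            *result.lock().unwrap() += x;
            println!("thread {}: part 2 done", i);
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    // Copy the total out so the lock guard is released before returning.
    let total = *result.lock().unwrap();
    total
}

fn main() {
    // Sample values are made up for illustration.
    let data = vec![
        vec![1, 2, 3, 4, 5, 6],
        vec![7, 8, 9, 10, 11, 12],
        vec![13, 14, 15, 16, 17, 18],
    ];
    println!("total = {}", two_phase_sum(data));
}
```

Sizing the barrier from `data.len()` is a design choice of this sketch: it avoids the situation from the first example in which fewer threads exist than the barrier expects.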
215 00:10:04,167 --> 00:10:09,233 [No Audio] 216 00:10:09,234 --> 00:10:12,066 Moreover, outside the loop, we 217 00:10:12,067 --> 00:10:14,000 will call join on all the threads to make 218 00:10:14,001 --> 00:10:16,100 sure that all of them run to completion. 219 00:10:16,101 --> 00:10:24,866 [No Audio] 220 00:10:24,867 --> 00:10:26,966 Finally, once all the threads are done we 221 00:10:26,967 --> 00:10:28,466 will print the value of the result. 222 00:10:28,467 --> 00:10:33,866 [No Audio] 223 00:10:33,867 --> 00:10:35,666 Okay, let us cargo run this. 224 00:10:35,700 --> 00:10:39,466 [No Audio] 225 00:10:39,467 --> 00:10:40,433 You may note that 226 00:10:40,434 --> 00:10:42,700 it only allows the second part of the 227 00:10:42,701 --> 00:10:45,400 computation once all the threads are done 228 00:10:45,401 --> 00:10:47,300 with the first part of the computation. 229 00:10:48,266 --> 00:10:50,266 Okay, let me explain one final point before 230 00:10:50,267 --> 00:10:52,300 we end this tutorial, and this is with regard 231 00:10:52,301 --> 00:10:54,333 to the blocking behavior of the lock function 232 00:10:54,334 --> 00:10:56,333 on mutex. You may recall that the lock 233 00:10:56,334 --> 00:10:58,166 function on mutex will block the current 234 00:10:58,167 --> 00:11:00,900 thread until the lock can be acquired, and the lock is held while the locking variable remains in 235 00:11:00,901 --> 00:11:03,566 scope. This means that if I create a variable 236 00:11:03,567 --> 00:11:05,600 that locks a value, then as long as that 237 00:11:05,601 --> 00:11:08,000 variable remains in scope, the lock remains 238 00:11:08,001 --> 00:11:11,000 intact. For instance, I will comment out the 239 00:11:11,001 --> 00:11:13,500 two lines at the start of the thread and also 240 00:11:13,501 --> 00:11:16,300 the two lines after the wait function.
Now 241 00:11:16,301 --> 00:11:18,266 instead of doing an in-place change to the 242 00:11:18,267 --> 00:11:21,000 value, I will create a variable x first, 243 00:11:21,133 --> 00:11:22,766 which will acquire the lock. 244 00:11:22,767 --> 00:11:25,766 [No Audio] 245 00:11:25,767 --> 00:11:28,065 Next, I will update the value of the variable x 246 00:11:28,066 --> 00:11:29,366 in the next line. 247 00:11:29,367 --> 00:11:35,100 [No Audio] 248 00:11:35,101 --> 00:11:37,200 After the wait I will update the value of the 249 00:11:37,201 --> 00:11:38,752 variable x again. 250 00:11:38,753 --> 00:11:41,927 [No Audio] 251 00:11:41,928 --> 00:11:43,000 This will result in a 252 00:11:43,001 --> 00:11:46,200 blocking state because whichever thread runs 253 00:11:46,201 --> 00:11:48,533 first will acquire the 254 00:11:48,534 --> 00:11:51,100 lock and then go into the waiting state 255 00:11:51,101 --> 00:11:53,600 without releasing the lock. This will 256 00:11:53,601 --> 00:11:56,333 cause all other threads to be 257 00:11:56,334 --> 00:11:58,833 blocked whenever they try to acquire the lock, 258 00:11:58,834 --> 00:12:01,333 resulting in an overall blocking behavior. 259 00:12:01,866 --> 00:12:03,933 The correct approach is to acquire the 260 00:12:03,934 --> 00:12:06,600 lock without assigning it to some variable 261 00:12:06,601 --> 00:12:09,266 and then deref the value. This avoids the 262 00:12:09,267 --> 00:12:12,033 deadlock: by not assigning the lock 263 00:12:12,034 --> 00:12:14,400 to some variable, we are not binding 264 00:12:14,433 --> 00:12:17,166 ourselves to unlock the value. The mutex will 265 00:12:17,167 --> 00:12:19,733 be automatically unlocked after the line of 266 00:12:19,734 --> 00:12:23,100 code is executed. The lock, in other words, is 267 00:12:23,101 --> 00:12:25,066 limited to a single line of code in this 268 00:12:25,067 --> 00:12:28,566 case.
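The difference between the two locking styles can be illustrated with the sketch below. It is a hedged illustration, not the code from the video: the function name `add_in_two_steps` and the counter being incremented by 1 are assumptions. The blocking variant is shown only as a comment, because running it would hang the program.

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Each thread adds 1 to a shared counter before the barrier and 1 after it.
/// Returns the final counter value (2 * thread_count).
fn add_in_two_steps(thread_count: usize) -> i32 {
    let barrier = Arc::new(Barrier::new(thread_count));
    let result = Arc::new(Mutex::new(0));
    let mut handles = Vec::with_capacity(thread_count);

    for _ in 0..thread_count {
        let barrier = Arc::clone(&barrier);
        let result = Arc::clone(&result);

        handles.push(thread::spawn(move || {
            // Named guard inside an explicit scope: the lock is held only
            // until the closing brace, so it is released before `wait`.
            {
                let mut guard = result.lock().unwrap();
                *guard += 1;
            }

            // Blocking variant (do NOT do this with more than one thread):
            //     let mut x = result.lock().unwrap();
            //     *x += 1;
            //     barrier.wait(); // x is still alive, so the lock is still held
            // The other threads could never acquire the lock, so they would
            // never reach the barrier, and the program would hang.

            barrier.wait();

            // In-place lock and update: the guard is a temporary that is
            // dropped at the end of this statement.
            *result.lock().unwrap() += 1;
        }));
    }

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *result.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", add_in_two_steps(3));
}
```

If a named guard is preferred, calling `drop(guard)` before `barrier.wait()` has the same effect as the explicit scope used here.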
The essential idea is that if you want 269 00:12:28,567 --> 00:12:31,500 to change the value in a single line and want 270 00:12:31,501 --> 00:12:34,600 to immediately unlock after that line, then 271 00:12:34,601 --> 00:12:36,800 you will first acquire the lock and then use 272 00:12:36,801 --> 00:12:39,933 the deref to update the value. I call this 273 00:12:39,934 --> 00:12:43,166 in-place lock and update. This does not come 274 00:12:43,167 --> 00:12:45,533 with the extra headache of properly unlocking 275 00:12:45,534 --> 00:12:47,866 the mutex, and is typically used in simple 276 00:12:47,867 --> 00:12:50,933 scenarios just like the one we have. That is 277 00:12:50,934 --> 00:12:52,633 it for this particular tutorial. See you 278 00:12:52,634 --> 00:12:55,233 again, and until then enjoy Rust programming. 279 00:12:55,234 --> 00:13:00,566 [No Audio]