1 00:00:06,560 --> 00:00:08,360 - [Instructor] Okay, so, we finished talking 2 00:00:08,360 --> 00:00:11,470 about the cache coherency and fault sharing issues 3 00:00:11,470 --> 00:00:12,740 which are really interesting when 4 00:00:12,740 --> 00:00:14,630 it comes to multithreaded software. 5 00:00:14,630 --> 00:00:15,860 But what I want to do is start off 6 00:00:15,860 --> 00:00:20,190 with a very simple data race problem. 7 00:00:20,190 --> 00:00:23,670 I think if we can look at a simple data race 8 00:00:23,670 --> 00:00:25,130 then you'll get a better understanding 9 00:00:25,130 --> 00:00:27,750 of what we mean about synchronization here. 10 00:00:27,750 --> 00:00:29,610 So, in this piece of code, what we're gonna do 11 00:00:29,610 --> 00:00:31,150 is we're gonna have our global 12 00:00:31,150 --> 00:00:33,330 variable right there on line 16. 13 00:00:33,330 --> 00:00:35,530 And it's called counter. 14 00:00:35,530 --> 00:00:37,390 So there's my counter variable. 15 00:00:37,390 --> 00:00:39,760 It's my global counter, right? 16 00:00:39,760 --> 00:00:42,880 And what I wanna do is launch two goroutines. 17 00:00:42,880 --> 00:00:44,740 Goroutine one 18 00:00:44,740 --> 00:00:46,920 and goroutine two. 19 00:00:46,920 --> 00:00:48,400 What these goroutines are gonna be doing 20 00:00:48,400 --> 00:00:52,030 is incrementing this local counter, right. 21 00:00:52,030 --> 00:00:54,020 I now have a synchronization issue. 22 00:00:54,020 --> 00:00:56,630 You get to go, you get to go, you get to go, you get to go. 23 00:00:56,630 --> 00:00:58,270 This is not orchestration. 24 00:00:58,270 --> 00:01:00,070 They're not talking to each other. 25 00:01:00,070 --> 00:01:02,974 They need to get in line and take a turn. 26 00:01:02,974 --> 00:01:06,570 Incrementing the shared state, which is counter. 27 00:01:06,570 --> 00:01:07,600 Okay, great. 28 00:01:07,600 --> 00:01:09,470 So, I go ahead on line 21. 29 00:01:09,470 --> 00:01:11,600 I define a constant of two for two goroutines. 30 00:01:11,600 --> 00:01:15,080 I use my WaitGroup orchestration mechanism. 31 00:01:15,080 --> 00:01:16,140 'Cause we're gonna have two goroutines. 32 00:01:16,140 --> 00:01:18,340 I wanna manage that once they're done 33 00:01:18,340 --> 00:01:20,280 then we can shut this program down. 34 00:01:20,280 --> 00:01:21,350 So we add two. 35 00:01:21,350 --> 00:01:23,290 And then in the loop, I go ahead 36 00:01:23,290 --> 00:01:25,590 and I use our literal function. 37 00:01:25,590 --> 00:01:27,750 Okay, literal function construction here. 38 00:01:27,750 --> 00:01:30,420 I call all the function down here, right here. 39 00:01:30,420 --> 00:01:33,420 And we put the keyword go in front of it, there it is. 40 00:01:33,420 --> 00:01:36,380 I now have my goroutine one and goroutine two. 41 00:01:36,380 --> 00:01:37,490 There it is, right there. 42 00:01:37,490 --> 00:01:41,650 And then, these goroutines do three operations here. 43 00:01:41,650 --> 00:01:43,050 It's a read, 44 00:01:43,050 --> 00:01:44,350 modify, 45 00:01:44,350 --> 00:01:45,460 write. There it is, 46 00:01:45,460 --> 00:01:46,610 read, modify, write. 47 00:01:46,610 --> 00:01:48,250 So basically the goroutine comes in, 48 00:01:48,250 --> 00:01:52,910 it does a read, does a modify, writes it back. 49 00:01:52,910 --> 00:01:53,993 Right, read, 50 00:01:54,920 --> 00:01:57,450 modify, write it back. 51 00:01:57,450 --> 00:02:02,450 And this in come in, do a read, modify, write it back. 52 00:02:02,580 --> 00:02:05,020 Read, modify, 53 00:02:05,020 --> 00:02:06,100 write it back. 54 00:02:06,100 --> 00:02:08,840 So every time we run this program, 55 00:02:08,840 --> 00:02:11,670 we should get the answer of 56 00:02:11,670 --> 00:02:12,610 four. 57 00:02:12,610 --> 00:02:16,460 So let's navigate to where this program is, real quick. 58 00:02:16,460 --> 00:02:18,950 And, we're gonna do a go build. 59 00:02:18,950 --> 00:02:20,520 And we're gonna run it. 60 00:02:20,520 --> 00:02:22,900 Now every time I run the program, 61 00:02:22,900 --> 00:02:26,220 we see exactly the output that I'm expecting. 62 00:02:26,220 --> 00:02:30,100 We have the read, the modify, and the write. 63 00:02:30,100 --> 00:02:31,680 Just like we drew it here. 64 00:02:31,680 --> 00:02:33,680 Two goroutines doing two increments, 65 00:02:33,680 --> 00:02:36,340 the answer should always be four. 66 00:02:36,340 --> 00:02:38,500 The answer should always be four. 67 00:02:38,500 --> 00:02:40,990 And every time I run this program, 68 00:02:40,990 --> 00:02:43,030 we get the answer of 69 00:02:43,030 --> 00:02:44,260 four. 70 00:02:44,260 --> 00:02:46,460 I'm happy, this program's working. 71 00:02:46,460 --> 00:02:48,900 We put it in production. 72 00:02:48,900 --> 00:02:51,450 But, we've been running on bare metal. 73 00:02:51,450 --> 00:02:54,730 And, one day we decide we're gonna move to the cloud. 74 00:02:54,730 --> 00:02:58,190 Which are much noisier and busier machines. 75 00:02:58,190 --> 00:03:01,500 And once we do that, things start to change. 76 00:03:01,500 --> 00:03:03,390 Because understand that there's nothing 77 00:03:03,390 --> 00:03:06,120 atomic going on with these statements. 78 00:03:06,120 --> 00:03:07,870 And to prove it what I'm gonna do 79 00:03:07,870 --> 00:03:11,720 is just add one more line of code. 80 00:03:11,720 --> 00:03:14,840 I'm gonna add a print statement here. 81 00:03:14,840 --> 00:03:16,670 Which is a system call. 82 00:03:16,670 --> 00:03:19,510 To display value, the local variable, 83 00:03:19,510 --> 00:03:23,360 before we write it back to our global state. 84 00:03:23,360 --> 00:03:25,430 Watch what happens just by adding 85 00:03:25,430 --> 00:03:28,680 this one line of code on line 38. 86 00:03:28,680 --> 00:03:29,623 Let's build it. 87 00:03:31,180 --> 00:03:32,163 And let's run it. 88 00:03:33,740 --> 00:03:35,820 Do you notice that the final counter 89 00:03:35,820 --> 00:03:37,890 is no longer four, it's two. 90 00:03:37,890 --> 00:03:39,260 And do you notice that this doesn't 91 00:03:39,260 --> 00:03:41,080 say one, two, three, four. 92 00:03:41,080 --> 00:03:44,197 It says one, two, one, two. 93 00:03:44,197 --> 00:03:47,700 Hum, how is it that just this line of code here 94 00:03:47,700 --> 00:03:51,160 has suddenly changed the behavior 95 00:03:51,160 --> 00:03:54,020 of this program and for the worse. 96 00:03:54,020 --> 00:03:56,280 How is it and how would that affect us? 97 00:03:56,280 --> 00:03:57,400 How could, when I said we went 98 00:03:57,400 --> 00:03:59,780 from bare metal to the cloud, it's a noisy machine. 99 00:03:59,780 --> 00:04:02,890 How could I'm, how am I really trying to simulate that? 100 00:04:02,890 --> 00:04:06,100 Let's go back to our chart here, okay. 101 00:04:06,100 --> 00:04:08,840 Let's do this all over again. 102 00:04:08,840 --> 00:04:11,880 Now with the print statement in place. 103 00:04:11,880 --> 00:04:13,770 So we can see it. 104 00:04:13,770 --> 00:04:14,810 Okay, brilliant. 105 00:04:14,810 --> 00:04:18,920 So what do we do, we again do our local read. 106 00:04:18,920 --> 00:04:21,170 There it is, read zero. 107 00:04:21,170 --> 00:04:23,180 And we do our 108 00:04:23,180 --> 00:04:25,810 local write, there it is. 109 00:04:25,810 --> 00:04:27,640 But when it comes to the print statement, 110 00:04:27,640 --> 00:04:30,060 remember that is a system call. 111 00:04:30,060 --> 00:04:34,030 And now what we're seeing is the scheduler going 112 00:04:34,030 --> 00:04:35,970 oh, you want to make a system call? 113 00:04:35,970 --> 00:04:38,280 Well we're gonna now move you from a 114 00:04:38,280 --> 00:04:41,580 running state into a waiting state. 115 00:04:41,580 --> 00:04:44,740 Guess what we just got, context switch. 116 00:04:44,740 --> 00:04:47,870 This goroutine comes off the P. 117 00:04:47,870 --> 00:04:51,160 Hum, which now means that this goroutine gets to run. 118 00:04:51,160 --> 00:04:52,460 No problem. 119 00:04:52,460 --> 00:04:55,890 It reads the global state, zero. 120 00:04:55,890 --> 00:04:58,590 It modifies it's local variable. 121 00:04:58,590 --> 00:04:59,900 And then it goes to print. 122 00:04:59,900 --> 00:05:04,810 And, the scheduler goes okay, context switch. 123 00:05:04,810 --> 00:05:07,410 Now, once we've done the context switch. 124 00:05:07,410 --> 00:05:10,100 This system call's probably done. 125 00:05:10,100 --> 00:05:12,780 And the scheduler chose G1 again. 126 00:05:12,780 --> 00:05:16,200 Which now means that it's time to write it back out. 127 00:05:16,200 --> 00:05:18,460 Which it does, but this goroutine 128 00:05:18,460 --> 00:05:21,180 has no idea that it's stopped running. 129 00:05:21,180 --> 00:05:23,680 These goroutines are kinda blind to that, right. 130 00:05:23,680 --> 00:05:25,150 They have no clue. 131 00:05:25,150 --> 00:05:28,870 So it does the write, and it comes back and does the read. 132 00:05:28,870 --> 00:05:32,040 Does the modify and then what happens? 133 00:05:32,040 --> 00:05:36,400 Well we've got another context switch. 134 00:05:36,400 --> 00:05:38,700 Which now means we're back here. 135 00:05:38,700 --> 00:05:41,080 Now remember we talked about how 136 00:05:42,040 --> 00:05:44,140 the hardware level in the caches 137 00:05:44,140 --> 00:05:46,260 once one core changed something 138 00:05:46,260 --> 00:05:48,860 it marked those cache lines dirty. 139 00:05:48,860 --> 00:05:51,780 Well guess what, I don't have any special snooping protocols 140 00:05:51,780 --> 00:05:55,520 here to tell the goroutine that its copy 141 00:05:55,520 --> 00:05:58,140 of the global state has changed. 142 00:05:58,140 --> 00:06:00,480 We have nothing here indicating to the goroutine 143 00:06:00,480 --> 00:06:03,900 that what it's working on right now is dirty. 144 00:06:03,900 --> 00:06:05,100 Fact this goroutine doesn't even 145 00:06:05,100 --> 00:06:06,920 know that it's stopped working. 146 00:06:06,920 --> 00:06:08,530 So what does it do? 147 00:06:08,530 --> 00:06:10,933 It goes ahead and writes back, 148 00:06:11,870 --> 00:06:13,320 one. 149 00:06:13,320 --> 00:06:14,233 And it reads it, 150 00:06:15,470 --> 00:06:17,060 modifies it, 151 00:06:17,060 --> 00:06:19,210 and then we have another context switch. 152 00:06:19,210 --> 00:06:23,900 So this is why we're seeing some very nasty behavior. 153 00:06:23,900 --> 00:06:27,060 Because this goroutine here has no idea 154 00:06:27,060 --> 00:06:29,400 that the global state has changed. 155 00:06:29,400 --> 00:06:34,160 And we start in this situation where only one goroutine. 156 00:06:34,160 --> 00:06:38,220 Only one goroutine is really doing the modification. 157 00:06:38,220 --> 00:06:41,500 Now the print statements can happen almost in any order. 158 00:06:41,500 --> 00:06:43,590 You're seeing one, two, one, two. 159 00:06:43,590 --> 00:06:46,550 Don't read into that with what I did on the board. 160 00:06:46,550 --> 00:06:48,610 Okay, because now we're dealing with 161 00:06:50,240 --> 00:06:51,510 running really in parallel. 162 00:06:51,510 --> 00:06:52,950 I am running against two P's. 163 00:06:52,950 --> 00:06:56,810 Each one of these goroutines is running on it's own M. 164 00:06:56,810 --> 00:06:59,270 Which is really running on it's own core. 165 00:06:59,270 --> 00:07:02,110 And we've got other synchronization issues 166 00:07:02,110 --> 00:07:04,040 with the output, don't read into that. 167 00:07:04,040 --> 00:07:06,700 What's important here is that print statement caused 168 00:07:06,700 --> 00:07:09,930 a context switch in the middle of these three lines of code. 169 00:07:09,930 --> 00:07:11,930 And therefore these three lines were 170 00:07:11,930 --> 00:07:14,750 no longer atomic the way they were before. 171 00:07:14,750 --> 00:07:16,460 But we were getting lucky before. 172 00:07:16,460 --> 00:07:18,910 There were no guarantees, remember I told you 173 00:07:18,910 --> 00:07:23,020 synchronization orchestration requires that we demand. 174 00:07:23,020 --> 00:07:24,330 Demand, right? 175 00:07:24,330 --> 00:07:25,380 The synchronization, the orchestration. 176 00:07:25,380 --> 00:07:28,010 You have to demand it, you can't just ask for it 177 00:07:28,010 --> 00:07:29,570 or hope it's going to happen. 178 00:07:29,570 --> 00:07:30,780 We've lost it. 179 00:07:30,780 --> 00:07:31,980 And I need you to understand that 180 00:07:31,980 --> 00:07:34,930 no line of coding go is atomic either. 181 00:07:34,930 --> 00:07:39,080 You see this ++ which is a read, modify, write. 182 00:07:39,080 --> 00:07:41,260 A read, modify, write operation. 183 00:07:41,260 --> 00:07:43,320 I've kinda simulated a larger scale. 184 00:07:43,320 --> 00:07:47,360 But value++ is also, at thought, at read, modify, write. 185 00:07:47,360 --> 00:07:48,720 You have to understand there are three lines 186 00:07:48,720 --> 00:07:51,670 of assembly language code behind that. 187 00:07:51,670 --> 00:07:53,770 And, even at the operating system level, 188 00:07:53,770 --> 00:07:55,800 any one of those three lines of code 189 00:07:55,800 --> 00:07:58,000 can be context switched. 190 00:07:58,000 --> 00:08:01,570 And then that assembly into machine code, machine code can 191 00:08:01,570 --> 00:08:05,580 even be broken down into more like lower level instructions. 192 00:08:05,580 --> 00:08:08,740 That any sort of preemptive context switch can happen. 193 00:08:08,740 --> 00:08:11,060 There is no one line of code, not even the one 194 00:08:11,060 --> 00:08:13,730 I've highlighted here, that is atomic. 195 00:08:13,730 --> 00:08:17,620 You have to demand the synchronization by coding it. 196 00:08:17,620 --> 00:08:20,890 That is your responsibility, alright. 197 00:08:20,890 --> 00:08:24,830 So we've got this logic here, and what I wanna know then is. 198 00:08:24,830 --> 00:08:28,150 How do we demand that these three instructions, 199 00:08:28,150 --> 00:08:30,620 or at least the increment of the counter, 200 00:08:30,620 --> 00:08:32,940 happens synchronously? 201 00:08:32,940 --> 00:08:36,000 Okay, especially in our multithreaded environment. 202 00:08:36,000 --> 00:08:37,950 Where we're running on multiple cores. 203 00:08:37,950 --> 00:08:39,520 Now you've got two choices here. 204 00:08:39,520 --> 00:08:42,060 You can either use the atomic instructions 205 00:08:42,060 --> 00:08:43,470 or you can use mutexes. 206 00:08:43,470 --> 00:08:46,470 Now atomic instructions are your fastest way to go. 207 00:08:46,470 --> 00:08:48,850 Because they're at the hardware level, okay. 208 00:08:48,850 --> 00:08:51,010 The hardware takes care of synchronization. 209 00:08:51,010 --> 00:08:54,240 But they're limited to just four or eight bytes of memory. 210 00:08:54,240 --> 00:08:56,620 They're great for counters like we have here. 211 00:08:56,620 --> 00:08:58,800 When you really have multiple lines of code 212 00:08:58,800 --> 00:09:01,770 that need to be synchronized in an atomic fashion. 213 00:09:01,770 --> 00:09:03,870 That's where mutexes are gonna come in. 214 00:09:03,870 --> 00:09:07,010 So let's start with the atomic instructions 215 00:09:07,010 --> 00:09:08,400 to clean this up. 216 00:09:08,400 --> 00:09:12,370 Start with the atomic instructions to clean this code up. 217 00:09:12,370 --> 00:09:15,750 Now, if I want to use the atomic instructions. 218 00:09:15,750 --> 00:09:19,280 I'm gonna be using the sync/atomic package. 219 00:09:19,280 --> 00:09:21,630 The atomic package under sync. 220 00:09:21,630 --> 00:09:24,760 And the atomic package has a really nice API, 221 00:09:24,760 --> 00:09:26,250 I'm gonna show it to you. 222 00:09:26,250 --> 00:09:28,140 But when we use the atomic package 223 00:09:28,140 --> 00:09:30,290 we need precision based integers. 224 00:09:30,290 --> 00:09:32,210 We can't just use int anymore. 225 00:09:32,210 --> 00:09:34,260 We've now gotta specify specifically 226 00:09:34,260 --> 00:09:36,920 are we using 64 bit or 32 bit integers. 227 00:09:36,920 --> 00:09:38,710 It has to be consistent across 228 00:09:38,710 --> 00:09:40,500 every platform that we're working on. 229 00:09:40,500 --> 00:09:42,960 That's just a constraint of the API. 230 00:09:42,960 --> 00:09:46,653 So I've moved counter now to a 64 bit integer. 231 00:09:47,730 --> 00:09:50,330 And we still got our constant of two. 232 00:09:50,330 --> 00:09:53,330 We got our wait group for our orchestration there. 233 00:09:53,330 --> 00:09:55,520 But notice that I've taken all 234 00:09:55,520 --> 00:09:57,980 three lines of code out of the program. 235 00:09:57,980 --> 00:10:01,373 And reduced it down to one call to AddInt64. 236 00:10:02,510 --> 00:10:05,280 Now if you've worked with the atomic instructions before 237 00:10:05,280 --> 00:10:07,720 in other languages, this API should look familiar. 238 00:10:07,720 --> 00:10:12,210 Look add a 32 or 64 bit, or add an address. 239 00:10:12,210 --> 00:10:14,210 Compare and swap, what that does 240 00:10:14,210 --> 00:10:16,580 is it compares two values atomically. 241 00:10:16,580 --> 00:10:19,710 If they're the same, then we swap the values out. 242 00:10:19,710 --> 00:10:22,300 Load is purely just for reading. 243 00:10:22,300 --> 00:10:24,350 Store is for writing. 244 00:10:24,350 --> 00:10:27,150 And this swap is a swap regardless of the compare. 245 00:10:27,150 --> 00:10:29,530 Basic atomic API set. 246 00:10:29,530 --> 00:10:31,770 And the way synchronization works here, 247 00:10:31,770 --> 00:10:35,850 is by accepting an address in the first parameter. 248 00:10:35,850 --> 00:10:38,710 And all synchronization is at the address level. 249 00:10:38,710 --> 00:10:42,210 So if you've got multiple goroutines like we're gonna have. 250 00:10:42,210 --> 00:10:45,360 Calling AddInt64 for the same memory location, 251 00:10:45,360 --> 00:10:46,580 like we're gonna have. 252 00:10:46,580 --> 00:10:48,530 Then these two goroutines get in line. 253 00:10:48,530 --> 00:10:50,260 We have synchronization, they get in line. 254 00:10:50,260 --> 00:10:51,690 But if these two goroutines called 255 00:10:51,690 --> 00:10:55,610 the same AddInt64 for different memory locations, 256 00:10:55,610 --> 00:10:57,580 they can continue to run in parallel. 257 00:10:57,580 --> 00:11:01,060 So it's the address, right, that the hardware is going 258 00:11:01,060 --> 00:11:04,330 to pick up on and then cause everything to go in line. 259 00:11:04,330 --> 00:11:05,683 This is our fastest way. 260 00:11:07,130 --> 00:11:09,410 But, what if we really wanted 261 00:11:09,410 --> 00:11:11,460 to maintain our logic as before. 262 00:11:11,460 --> 00:11:12,880 I wouldn't necessarily do that here. 263 00:11:12,880 --> 00:11:15,250 This is the best way to do that counter work. 264 00:11:15,250 --> 00:11:17,440 But let's say we did have those three lines of code. 265 00:11:17,440 --> 00:11:19,270 I wanted to maintain the three lines of code. 266 00:11:19,270 --> 00:11:20,103 And I had to make sure that 267 00:11:20,103 --> 00:11:21,900 nothing interrupted those three lines of code. 268 00:11:21,900 --> 00:11:24,030 They had to run one after the other 269 00:11:24,030 --> 00:11:26,280 every time without any interruption. 270 00:11:26,280 --> 00:11:29,773 Alright, this is where your mutex comes in.