So let's bring everything together with this really nasty program. It's more practical, though, in showing how data races really end up in production, especially if you're not using the race detector. It also shows how, in the past, without something as beautiful as the race detector, data races could get into production-level code. So imagine this scenario: Ben and Jerry are fighting with each other. They got into an argument because each thinks he's more popular than the other. Ben thinks he's the most popular. Jerry thinks he's the most popular. So they hire a marketing company to track likes, and they start a like campaign to find out who is more popular. Let's take a look at some code we might write to solve this problem. There it is on line 16: we have what's called the speaker interface, with one behavior, Speak. What we want is for users to speak up when they like Ben, or speak up when they like Jerry. Normally what we'd be doing is incrementing a like counter, but I want to be able to detect our bad data races here.
So we're going to do something a little different, but imagine that every time you call Speak, we're really speaking up and showing our support for Jerry or Ben. So the Ben data structure has a name, and it's Ben. What we do in Speak, instead of incrementing a like against the Ben value, is check Ben's name. If Ben is Ben, life is okay, but if we had a data race, we could end up in a situation where Ben is not Ben; Ben might end up being Jerry. We'll see that. When that happens, we'll print that information out and return false so we can shut the program down. So there's Ben, and here's Jerry. Jerry looks exactly like Ben: it just has a name, and the code is implemented the same, except Jerry wants to be Jerry all the time. So there it is. We have our Speak, which traditionally would increment a like, but Ben checks that it's always Ben, and Jerry checks that it's always Jerry. Now we start the program up, and I create my Ben value (here it is, there's Ben) and here's Jerry. This is where we'd really keep our likes.
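The types described so far can be sketched in Go as follows. This is a reconstruction from the narration, not the original program: the interface name speaker, the name field, and the exact message format are assumptions. Each Speak returns false when the value it runs against is not who it claims to be.

```go
package main

import "fmt"

// speaker has one behavior. In a real campaign, Speak would increment a like.
type speaker interface {
	Speak() bool
}

// Ben and Jerry have identical layouts: a single name field.
type Ben struct{ name string }
type Jerry struct{ name string }

// Speak reports a mismatch instead of counting a like, so a torn
// interface value becomes visible.
func (b *Ben) Speak() bool {
	if b.name != "Ben" {
		fmt.Printf("Ben says, \"Hello my name is %s\"\n", b.name)
		return false
	}
	return true
}

func (j *Jerry) Speak() bool {
	if j.name != "Jerry" {
		fmt.Printf("Jerry says, \"Hello my name is %s\"\n", j.name)
		return false
	}
	return true
}

func main() {
	ben := Ben{name: "Ben"}
	jerry := Jerry{name: "Jerry"}

	// Pointer semantics: the interface stores the address of each value.
	var person speaker = &ben
	fmt.Println(person.Speak()) // true
	person = &jerry
	fmt.Println(person.Speak()) // true
}
```

If Jerry's Speak ever runs against Ben's data, which is what a half-written interface value causes, this sketch prints Jerry says, "Hello my name is Ben" and returns false.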
And remember that we have implemented the Speak methods against these concrete data values. Right out of the box, I start with our person interface value, and if you notice, I'm storing Ben inside it using pointer semantics. So when we call Speak against the interface, we call Speak against either Ben or Jerry, depending on which data is in there; in this case, Ben. Call Speak, and we would be incrementing that to one. Okay, great. So we start this campaign. Now imagine this is really a web service, with tens of thousands of requests coming in from Jerry's and Ben's supporters. Here's a goroutine that simulates one request, or a bunch of requests, coming in. You can see that this goroutine loads Ben into the interface and then calls Speak. So line 65 is our write, and it's a two-word write, in order to get Ben inside the interface. Then the call on line 66 is our read. We've got a write, a two-word write, and a read. Then we have another goroutine simulating other requests coming in, saying no, no, no, Jerry is the right one; Jerry is who I support.
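The two goroutines described here share one interface variable with no synchronization. Below is a minimal sketch of one conventional fix, assuming a sync.Mutex guards both the two-word write and the Speak call so they form one critical section. The campaign function, the iteration count, and the mismatch counter are hypothetical additions so the program terminates and can report a result.

```go
package main

import (
	"fmt"
	"sync"
)

type speaker interface{ Speak() bool }

type Ben struct{ name string }

func (b *Ben) Speak() bool { return b.name == "Ben" }

type Jerry struct{ name string }

func (j *Jerry) Speak() bool { return j.name == "Jerry" }

// campaign runs two goroutines that share one interface value. The mutex
// makes "store into the interface, then call Speak" a single critical section.
func campaign(iterations int) int {
	ben := Ben{name: "Ben"}
	jerry := Jerry{name: "Jerry"}

	var (
		mu         sync.Mutex
		wg         sync.WaitGroup
		person     speaker = &ben
		mismatches int
	)

	support := func(s speaker) {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			mu.Lock()
			person = s // two-word write: type word and data pointer
			if !person.Speak() {
				mismatches++
			}
			mu.Unlock()
		}
	}

	wg.Add(2)
	go support(&ben)
	go support(&jerry)
	wg.Wait() // after Wait, reading mismatches is safe
	return mismatches
}

func main() {
	fmt.Println("mismatches:", campaign(100000))
}
```

Removing the Lock and Unlock calls reproduces the unsynchronized pattern from the video, which the race detector should then report.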
Which means we've got to do the two-word write that stores Jerry, and then we can call Speak. So I've got a data race here. I want to show you how quickly this code suddenly has data corruption. Let's build it. (types on keyboard) And let's run it. It doesn't take long for the program to say: Jerry says, "Hello my name is Ben." This is an integrity issue, and understand that the code didn't blow up. The very worst thing that can ever happen to you in a data race is that your code keeps running. We'll explain why this code doesn't blow up in a second. But for it to say "Jerry says" means that the first word of the interface says there is a Jerry pointer inside it, because that word tells us which concrete implementation to call, which is Jerry. What's happened is that only half of this interface had been written before the other goroutine called Speak, so the pointer was still pointing to Ben. Remember, it takes two operations here: you've got to change out what type of data is in the interface, and then you've got to change out the pointer to the concrete data.
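The claim that an interface value is two words can be checked directly. A small sketch, assuming the gc toolchain: a non-empty interface holds one word identifying the concrete type (via its method table) and one word pointing at the data. This is an implementation detail rather than something the language specification promises, and the interfaceWords helper is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

type speaker interface{ Speak() bool }

// interfaceWords reports how many machine words an interface value occupies.
// On the gc toolchain this is 2: the type word and the data word, which is
// why storing a new value into an interface is a two-word write.
func interfaceWords() uintptr {
	var s speaker
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	fmt.Println("words in an interface value:", interfaceWords())
}
```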
Only half of the write operation took place before the goroutine called Speak. And so now, suddenly, guess what? Ben got a like meant for Jerry. We wouldn't see this in the code, because it didn't blow up. The reason it didn't blow up is that these two data structures are identical: they have exactly the same memory layout. So nothing is going to cause this code to blow up. Incrementing here is going to be the same as incrementing here. In our case, both types have the same name field. Since the data models are the same across all the concrete types, guess what: this code doesn't blow up, it keeps running. And we wouldn't even know about it until somebody in marketing said, you know what, look, I like Ben as much as the next person, but there's no way Ben is that much more popular than Jerry; it just statistically doesn't make sense. And now you're being asked: hey, you've got a bug in your program, and I need you to find it. Yet there is no stack trace, no code blowing up, nothing to indicate where the data corruption is. This is why data races are nasty, nasty.
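Why the torn value runs without crashing can be illustrated by comparing the two layouts. This sketch, which assumes the Ben and Jerry types each have a single name string field, checks that both types have the same size and field offset. It also shows that Go permits a direct conversion between them because their underlying types are identical. The sameLayout helper is hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Ben struct{ name string }
type Jerry struct{ name string }

// sameLayout checks that Ben and Jerry have the same size and place their
// only field at the same offset, so a method written for one can read the
// other's memory without faulting.
func sameLayout() bool {
	var b Ben
	var j Jerry
	return unsafe.Sizeof(b) == unsafe.Sizeof(j) &&
		unsafe.Offsetof(b.name) == unsafe.Offsetof(j.name)
}

func main() {
	fmt.Println("same layout:", sameLayout())

	// Identical underlying types also make a direct conversion legal:
	// a Jerry value that carries Ben's name.
	j := Jerry(Ben{name: "Ben"})
	fmt.Println(j.name) // Ben
}
```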
Now, let's build this program using the race detector. There it is. When you use the race detector on a build, you have to build the binary and then run it. You can see how quickly the race detector identified that there is a race: goroutine 7 and goroutine 6. In this case there were two writes that happened at the same time without synchronization, on line 76 and line 65. So we can see that these two calls happened to run at the same time on this pass of execution. Yay for the race detector. Now you might say, oh Bill, forget about the race detector, I would have found this regardless. And I'm going to say to you: no, you wouldn't have. What if I use GOMAXPROCS to use only a single core? Now I'm a single-threaded Go program. Maybe I'm running this stuff in Docker on my local machine, and I've restricted my containers to just one core because I'm trying to use resources cleanly, and I've put all of this into my Docker containers. Now I've run the race detector, and this is really funny.
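The transcript does not show exactly how GOMAXPROCS was set. Exporting the GOMAXPROCS=1 environment variable before running the binary is one common way, and the runtime call below is another. The sketch uses runtime.GOMAXPROCS, which returns the previous setting and only queries the current value when its argument is below 1. Limiting the scheduler this way changes how many goroutines execute Go code in parallel, but it adds no synchronization. The singleCore helper is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// singleCore limits the scheduler so only one goroutine executes Go code at
// any instant. It returns the previous setting and the new one.
func singleCore() (prev, now int) {
	prev = runtime.GOMAXPROCS(1) // set to 1, get the old value back
	now = runtime.GOMAXPROCS(0)  // an argument < 1 just queries
	return prev, now
}

func main() {
	prev, now := singleCore()
	fmt.Println("GOMAXPROCS was", prev, "now", now)
}
```

Because the race detector reasons about missing happens-before ordering rather than observed parallelism, it can still report the race under this setting, as the video shows.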
Notice that the race detector still found a race here, but let's do something. Let's rebuild the binary without the race detector, run it again, and look what happens. The race technically never happens, because we're running on one thread, so all of these reads and writes happen in a synchronous way. But it's an accident. It's an accident. We're getting very lucky that each of these write-then-call pairs is happening atomically right now, while I'm single-threaded. We don't see anything: our tests are passing in a single-threaded environment, and our code is working in a single-threaded environment. But what's probably going to happen is that when this code moves to production, we really start having a problem, because in production we're not running single-threaded; we may well be running multithreaded. So without the race detector, we're going to be in a lot of trouble here. And this, again, is a very classic, historical way that data races have entered production code.
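Instead of relying on a single thread to make the two-word write look atomic by accident, the whole interface value can be published with a one-word atomic pointer store. This sketch is an alternative to a mutex, not the approach from the video. It assumes Go 1.19 or later (for atomic.Pointer and atomic.Int64) and a hypothetical box type that wraps the interface and is never modified after it is published. Unlike the mutex version, a goroutine may Load the other supporter's box. Each box is internally consistent, however, so the type word and data word always match.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type speaker interface{ Speak() bool }

type Ben struct{ name string }

func (b *Ben) Speak() bool { return b.name == "Ben" }

type Jerry struct{ name string }

func (j *Jerry) Speak() bool { return j.name == "Jerry" }

// box holds a complete interface value. It is fully built before it is
// published and never written again, so readers never see half of it.
type box struct{ s speaker }

func atomicCampaign(iterations int) int64 {
	ben, jerry := &Ben{name: "Ben"}, &Jerry{name: "Jerry"}

	var current atomic.Pointer[box]
	var mismatches atomic.Int64
	var wg sync.WaitGroup

	support := func(s speaker) {
		defer wg.Done()
		b := &box{s: s}
		for i := 0; i < iterations; i++ {
			current.Store(b) // one-word atomic publish of a whole interface
			if !current.Load().s.Speak() {
				mismatches.Add(1)
			}
		}
	}

	wg.Add(2)
	go support(ben)
	go support(jerry)
	wg.Wait()
	return mismatches.Load()
}

func main() {
	fmt.Println("mismatches:", atomicCampaign(100000))
}
```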
It also shows how nasty they are to find, and why it could take years to find a data race: there's no indication in the code, and no indication from stack traces or things blowing up. It's purely a problem of data corruption, and data corruption can be very difficult to find just by looking at code.