To illustrate how powerful a lot of the Go concepts are, such as channels, goroutines and interfaces, I've got a 62-line program here which is actually a scalable work system. What it can do is read, line by line from standard in, a list of jobs to do, then run those jobs in parallel, collate the output, and write that to standard out. All of that is in 62 lines. It makes use of goroutines, channels for fan-out, and interfaces for defining how work can be done. I'm going to show you how it works, and then we'll make an actual program using it that does something. So, first of all, I've defined an interface called task. A task is able to do two things. It has a process function, which is called to process that task, to do whatever is actually needed, and it has a function called output, which is used to write the output, so it will actually send out the output. There is also an interface which I call a factory, and that is a thing that can create a task given some string. That should be a string that's received from standard in. So the idea is that this program will read from standard in, call create on your factory to make tasks, then run them in parallel and use output to get the output. So let's look at how this operates.
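As a rough sketch of the two interfaces just described: the on-screen code isn't captured in the transcript, so the names `Task`, `Factory`, `Process`, `Output` and `Create` and their signatures are assumptions based on the narration, and `upperTask` and `upperFactory` are hypothetical types that exist only to show something satisfying each interface.

```go
package main

import (
	"fmt"
	"strings"
)

// Task is one unit of work. Method names follow the narration;
// the exact signatures are an assumption.
type Task interface {
	Process() // do whatever the task actually needs
	Output()  // write the result
}

// Factory turns one line of input into a Task.
type Factory interface {
	Create(line string) Task
}

// upperTask is a hypothetical task that upper-cases its input.
type upperTask struct {
	in, out string
}

func (u *upperTask) Process() { u.out = strings.ToUpper(u.in) }
func (u *upperTask) Output()  { fmt.Println(u.out) }

// upperFactory is a hypothetical factory producing upperTasks.
type upperFactory struct{}

func (upperFactory) Create(line string) Task { return &upperTask{in: line} }

func main() {
	var f Factory = upperFactory{}
	task := f.Create("hello")
	task.Process()
	task.Output()
}
```

Anything with those two methods counts as a task, which is what lets the same runner be reused for unrelated kinds of work.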
So the Run function takes a factory, which is an interface, and it runs a couple of goroutines. The first goroutine is using bufio.NewScanner. Now, the bufio Scanner is a very nice thing; let's bring that up here. So there's a Scanner, which is a thing for receiving, essentially, newline-delimited pieces of text, and it automatically wraps all the usual things you do for reading. It doesn't actually scan fields out, that's up to you, but it gives you the input line by line, and you just make a simple for loop like this. Each time around that for loop, if there's a line available, it will be available here in s.Text() and you can do something with it. In this instance, what we'll do is call the create method on the factory that we have to create a task, and pass it down a channel. That channel, this guy, will just get filled with tasks to do, and it's an unbuffered channel, so somebody has to be willing to handle that task before it can actually be transmitted. If anything goes wrong, then we'll send an error, and when we're done we close that channel to say there's no more work. And I'm using a WaitGroup to count goroutines, just so I get clean termination at the end.
So I've created a WaitGroup. For every goroutine I add one, and every time one terminates I call Done. Now, to do the work I actually need some workers, and I've hard-coded in here that I'm going to create a thousand. So I'm going to create a thousand goroutines, and I'm going to use the WaitGroup to count them. Each of these goroutines is really very simple. It's going to pull a task off of the input channel, in; the task is t. If you remember, a task is an interface that has a method called process, so for each task it's going to call process to do whatever that task requires, and then it's going to send it down an output channel. So what's going to happen, completely naturally in Go, is load balancing across a thousand goroutines. A thousand goroutines are going to try to listen on in, that's what this range in does, and one goroutine, this one that's reading from standard in, is going to write to in. So imagine a thousand goroutines start, somebody starts writing, and the tasks just get picked up automatically as there's work to do. When in is closed, this loop will end, we'll call Done on the WaitGroup, and the goroutine will terminate. So as soon as we close the in channel to say that we're done, there's no more work to be done.
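The worker stage might be sketched as below. To make it runnable it also includes the goroutine that closes the output channel and the loop that drains it, which the walkthrough turns to next. The function name `runWorkers`, the `squareTask` type and the worker count of 3 are all assumptions made for the example; the screencast hard-codes a thousand workers.

```go
package main

import (
	"fmt"
	"sync"
)

// Task mirrors the interface from the narration (signatures assumed).
type Task interface {
	Process()
	Output()
}

// squareTask is a hypothetical task used only for this sketch.
type squareTask struct{ n, result int }

func (s *squareTask) Process() { s.result = s.n * s.n }
func (s *squareTask) Output()  { fmt.Println(s.n, "squared is", s.result) }

// runWorkers starts count goroutines that all range over in, process
// each task they receive, and pass it on to the returned channel.
// The returned channel is closed once every worker has finished.
func runWorkers(count int, in <-chan Task) <-chan Task {
	out := make(chan Task)
	var wg sync.WaitGroup
	for i := 0; i < count; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range in { // ends when in is closed and drained
				t.Process()
				out <- t
			}
		}()
	}
	go func() {
		wg.Wait() // every worker has returned
		close(out)
	}()
	return out
}

func main() {
	in := make(chan Task)
	go func() {
		defer close(in)
		for i := 1; i <= 5; i++ {
			in <- &squareTask{n: i}
		}
	}()
	for t := range runWorkers(3, in) {
		t.Output() // order depends on scheduling
	}
}
```

No worker is assigned tasks explicitly. Whichever goroutine is free receives the next value from `in`, which is where the load balancing comes from.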
All those goroutines will terminate. We also have a separate goroutine here, and it's waiting for all of the goroutines to terminate, and then it closes the out channel. And what's happening in the output is that, finally, the program reads from the out channel. So you can imagine, now this is the fan-in side: those thousand goroutines are all writing to the channel, and this output will output all of the output from all of the tasks that have been processed. So right here, if you just implement your factory and your process and output methods on some type, then you're done; you have something you can scale. Obviously, a thousand is hard-coded; it could be any other number. We could use the flag package to set that, and typically you might want to do that because you might want to control how many simultaneous tasks run. But really, in 62 lines here you've got something which could be quite complicated to write in another language. There's no explicit synchronization and no need to worry about how that's going to work. There's no load balancer; that's all part of the program itself. So now that I've got this flexible work system, which can scale up or scale down, let's do something concrete with it. What I'm going to do with it is accept on standard in a list of URLs, web pages.
Go get those web pages in parallel and output whether they were successful or not, whether there was any sort of error getting those pages. To do that, I'm going to have to define a task and a factory. So the task I've started here: the task is going to have a URL, that's the URL to get, and a boolean which indicates whether getting that page worked correctly, and you've got a couple of functions. The output function is going to be pretty simple: I'm just going to print out the URL and a truth value, did it work or not. The process function is a little more complicated; we're going to have to do some network stuff here. If we have a look in the net/http package, there's a really nice little function called Get. What Get does is get a URL and give you a response and an error, so I'm going to use that, and I'm going to import that up here. So we're processing a task. I'm going to say response, error equals http.Get of the URL. And if any sort of actual error occurs, then clearly things are not okay, and we're going to return from there. Then I'm going to take a look at the response, because anything other than an HTTP status 200 might indicate a problem. So we can have a look at that, and let's just use go doc to look at that.
So, the Response structure actually has the StatusCode here as an integer. Okay, we can do that, so we can say response.StatusCode equals 200. And in fact, within the net/http package there are actually nice defines for all those things; you can find them in here. So let's be good programmers and not use magic numbers: http.StatusOK. So if it's okay, I'm going to set it to true, and we'll be done. Otherwise I can set it to false explicitly. It's not necessary to do that, because by default it will be false, but there you go. So that's process, I think. An individual task may be processed like that, and because it supports process and output, it's a task. Now all we need is a factory, and if you remember, the factory is the thing that's going to create a task from every line. So let's just make one of these. This isn't going to be very interesting, because it doesn't have anything really to do, but I want to define a function on it, and remember, it's this thing create. So I'm going to get a line given to me and I'm going to return a task, and in this case I'm going to return an HTTP task. So we're just going to say task, and we do this.
So, very simply, we'll say: let's create an HTTP task, set its URL to be the line, and return it. Okay. Let's see if we can type things correctly. And I'm going to need fmt as well. Okay, so that compiles nicely. So now it's just a question of hooking this whole thing up. So I say run, and the Run thing requires a factory, so let's make ourselves a factory. Okay. That guy's running. So it's running now, waiting for some input, so let's actually try something out. And let's build it as a program, and then it's good. So it went off and did that: it started a thousand goroutines, ran the whole thing, and it managed to get that particular page. So that's good for one thing. Now let's have a look at doing something like this, and that one's false, because in fact that would have produced a 404 error on that particular web page. But what will be more interesting is if we had a number of URLs, so let's just make ourselves a quick little file here. Okay, so there's a file of them, and we can do this. And so out they come as it works its way through these things. See, O'Reilly. Okay, so there they all are now.
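The task and factory built up over the last few minutes could look something like this. It is a sketch under assumptions: the type and field names (`httpTask`, `httpFactory`, `url`, `ok`) are guesses from the narration, closing the response body is not mentioned in the talk but is added here because `http.Get` requires it, and a local `httptest` server (the hypothetical `demoServer` helper) stands in for real websites so the example runs without network access.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Task and Factory mirror the interfaces from the narration (assumed).
type Task interface {
	Process()
	Output()
}

type Factory interface {
	Create(line string) Task
}

// httpTask fetches one URL and records whether it returned 200 OK.
type httpTask struct {
	url string
	ok  bool
}

func (h *httpTask) Process() {
	resp, err := http.Get(h.url)
	if err != nil {
		h.ok = false // already the zero value; set explicitly for clarity
		return
	}
	defer resp.Body.Close() // release the underlying connection
	h.ok = resp.StatusCode == http.StatusOK
}

func (h *httpTask) Output() { fmt.Printf("%s %t\n", h.url, h.ok) }

// httpFactory makes an httpTask from each input line.
type httpFactory struct{}

func (httpFactory) Create(line string) Task { return &httpTask{url: line} }

// demoServer is a hypothetical local server: /ok answers 200 and
// every other path answers 404.
func demoServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/ok", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello")
	})
	return httptest.NewServer(mux)
}

func main() {
	srv := demoServer()
	defer srv.Close()
	var f Factory = httpFactory{}
	for _, line := range []string{srv.URL + "/ok", srv.URL + "/missing"} {
		task := f.Create(line)
		task.Process()
		task.Output()
	}
}
```

A page that answers with a 404 still comes back from `http.Get` with a nil error, which is why the status code has to be checked separately.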
Those were all run in parallel, but we could actually slow things down. Let's just change this to only have one goroutine. Now it's going to have to do them one by one, and this would be much more obvious with many, many thousands of URLs. So, just to cheat a bit, I'm going to do this. Let's clean this up. All right. I've got all those blank lines, so let's just say in here that if the URL is blank, then we just give up, like this: if the URL equals blank, then we give up. That way we're covered in case there's anything silly in the data. Let's build it, and now it's going to go through fairly slowly, because it's still doing only one at a time. So let's do this. I'll have this int value n (let me stop that while I'm doing that), and let's make n the number of things we're going to run simultaneously, and use the flag package. This is count, by default 10 of those, number of workers, like that. And n equals that, so we just take whatever that is and then we'll be able to use that in here. Okay, so go back to that. Now we've got 10 running, so you can see things are spinning along a bit faster.
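Setting the worker count from the command line might be done along these lines. The flag name `count`, its default of 10 and the help text "number of workers" are taken from the narration, but the surrounding code is an assumption: `parseWorkers` is a made-up helper, and it uses its own `flag.FlagSet` and a lower bound check purely so the sketch is easy to exercise.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseWorkers reads a -count flag from args and returns the number of
// workers to start. The real program's flag handling is not shown in
// the transcript; this is one plausible shape for it.
func parseWorkers(args []string) (int, error) {
	fs := flag.NewFlagSet("worker", flag.ContinueOnError)
	count := fs.Int("count", 10, "number of workers")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *count < 1 {
		return 0, fmt.Errorf("count must be at least 1, got %d", *count)
	}
	return *count, nil
}

func main() {
	n, err := parseWorkers(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("starting", n, "workers")
}
```

Run with no arguments this starts 10 workers, and with `-count=1000` it starts a thousand, which matches the two runs shown in the talk.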
And what I can do is go in here and say count equals 1000. So now it's running a thousand goroutines simultaneously, running through those things as fast as it can. As you can see, some sites are actually faster to reply to us than others. So it's actually done everything in the entire list that was in there. Let's throw in a bad one to make it a bit more interesting. So you can see there, it was right there, and you can see it's running them in pretty much any order; they're coming through and just getting executed as they're available. So, 62 lines of code and you've got a completely flexible system. You could obviously change this to not take input from standard in, but the basic concept is here. And in fact, this program I've used hundreds of times for different purposes, because it makes a very, very quick parallel system. So if you look at the output from before, you might have been surprised that everything seems to be in order in some way: all the Cloudflare ones were coming first, and then I think it was Google and O'Reilly and Yahoo, and you may have been wondering if there was some sorting happening inside. In fact, the ordering is coming about because of the relative speed of those websites, and to find that out, I can actually change the program a little bit.
So what I've done is I've added to the task a couple of other values. These are time durations: one is elapsed, which will be how long it actually took to get the web page, and the other one is started, which is when this particular task started processing. It should be the case that, given the number of URLs we have and that we have lots and lots of workers, they should all start at pretty much the same time. Then, when I output it, I'm going to say when it started relative to the beginning of the program, and also how long it took to run. So if we just build that. Oh, and also I keep track of when the program started here: when the factory starts up, I keep track of what the time now is. So if we do that, and then go back and run through those URLs with the thousand workers, we can run through, and again you can kind of see the ordering. So they started somewhere between three and six milliseconds into the start of the program; various things started running. So they all started running within a few milliseconds of each other, a couple of milliseconds, but the times to actually finish downloading varied a lot. So for Cloudflare, it was in the sort of four to six hundred milliseconds to get the complete page, Google about a second, O'Reilly a little bit longer.
You can see there was one Google that was long there, Yahoo a bit longer, et cetera. So the ordering actually came about just because of the speed of downloading the pages, and you can see that the faster the page, the nearer the beginning it was. But looking at the timings here, you can see that it was running simultaneously, that all of those requests were executing at the same time with really no effort on our part.