1 00:00:06,810 --> 00:00:09,720 - So benchmarking in Go, is incredibly powerful. 2 00:00:09,720 --> 00:00:11,140 I'm going to show you some aspects 3 00:00:11,140 --> 00:00:13,150 of basic benchmarking now, 4 00:00:13,150 --> 00:00:15,150 and then when we get into profiling, 5 00:00:15,150 --> 00:00:16,800 I'm going to show you some more. 6 00:00:16,800 --> 00:00:19,670 But, let's just start with a basic benchmark here. 7 00:00:19,670 --> 00:00:21,180 Let's look at the mechanics 8 00:00:21,180 --> 00:00:22,750 and look at how we can write these 9 00:00:22,750 --> 00:00:25,830 and run a couple of them right there on the command line. 10 00:00:25,830 --> 00:00:28,650 So, what I'm starting out with is on line 17. 11 00:00:28,650 --> 00:00:31,270 In fact, look at line 17 and line 28. 12 00:00:31,270 --> 00:00:36,050 The idea of this benchmark is to do a CPU profile 13 00:00:36,050 --> 00:00:37,590 of how fast 14 00:00:37,590 --> 00:00:42,050 the S print or S print f function run. 15 00:00:42,050 --> 00:00:45,070 And one thing about benchmarking, is that we gotta really 16 00:00:45,070 --> 00:00:47,450 make sure that we're not guessing. 17 00:00:47,450 --> 00:00:50,800 We gotta make sure that the benchmark we're writing 18 00:00:50,800 --> 00:00:51,860 is accurate. 19 00:00:51,860 --> 00:00:53,550 And one of the things I worry about, 20 00:00:53,550 --> 00:00:55,490 is since this code's going to be compiled, 21 00:00:55,490 --> 00:00:59,550 and the compiler is capable of throwing dead code away. 22 00:00:59,550 --> 00:01:03,490 With the SSA backend, it is capable of identifying, 23 00:01:03,490 --> 00:01:06,910 like this scenario, you call a function like S print, 24 00:01:06,910 --> 00:01:09,490 and S print doesn't really mutate state. 25 00:01:09,490 --> 00:01:10,900 It's using value semantics. 26 00:01:10,900 --> 00:01:12,880 It gets its own copy of the string, 27 00:01:12,880 --> 00:01:14,710 returns a new string. 28 00:01:14,710 --> 00:01:17,490 So, if you don't capture the new string coming out 29 00:01:17,490 --> 00:01:19,460 of S print or S print f, 30 00:01:19,460 --> 00:01:21,890 technically the compiler can identify that, 31 00:01:21,890 --> 00:01:25,630 let's not waste any CPU cycles making this function call. 32 00:01:25,630 --> 00:01:27,880 And suddenly, you're trying to do a benchmark like this 33 00:01:27,880 --> 00:01:30,400 to figure out what's faster, S print or S print f. 34 00:01:30,400 --> 00:01:32,340 And it's blazing fast for both, 35 00:01:32,340 --> 00:01:34,740 because you didn't capture the output. 36 00:01:34,740 --> 00:01:36,670 So, what we're doing in this test 37 00:01:36,670 --> 00:01:38,880 is also making sure that the code 38 00:01:38,880 --> 00:01:40,610 is not getting thrown away, 39 00:01:40,610 --> 00:01:44,170 but also that we're using code the same way 40 00:01:44,170 --> 00:01:47,700 as accurately as we can, like we do in production. 41 00:01:47,700 --> 00:01:48,920 Now, here's the test. 42 00:01:48,920 --> 00:01:52,010 I want to know what's faster, S print or S print f, 43 00:01:52,010 --> 00:01:53,900 for the string, hello. 44 00:01:53,900 --> 00:01:57,720 My guess is, is that S print is going to be faster 45 00:01:57,720 --> 00:02:00,330 because we're not doing any formatting. 46 00:02:00,330 --> 00:02:02,720 And remember, any time you guess about something, 47 00:02:02,720 --> 00:02:04,910 you got a good chance you're going to be wrong, 48 00:02:04,910 --> 00:02:06,080 but this one's going to be obvious, right? 49 00:02:06,080 --> 00:02:09,600 I'm just showing you a basic test, S print versus S print f. 50 00:02:09,600 --> 00:02:11,010 Now, look at the structure of the test, 51 00:02:11,010 --> 00:02:12,900 we've got a function named Benchmark, 52 00:02:12,900 --> 00:02:15,940 this time it takes a testing B pointer, 53 00:02:15,940 --> 00:02:17,960 not a testing T, a testing B pointer. 54 00:02:17,960 --> 00:02:20,020 It's the word Benchmark, we have the same rules, 55 00:02:20,020 --> 00:02:22,700 it has to be exported S print. 56 00:02:22,700 --> 00:02:25,230 'Kay, S print also becomes the name for that particular 57 00:02:25,230 --> 00:02:27,330 test Benchmark function. 58 00:02:27,330 --> 00:02:29,583 And the big thing here is the loop. 59 00:02:30,600 --> 00:02:33,490 Now, the way the loop works, is the first time 60 00:02:33,490 --> 00:02:36,700 the testing framer calls this Benchmark function, 61 00:02:36,700 --> 00:02:38,910 b.N is going to be one. 62 00:02:38,910 --> 00:02:41,880 And then it's going to increase b.N over time. 63 00:02:41,880 --> 00:02:44,840 The idea is, we want enough iterations of the loop 64 00:02:44,840 --> 00:02:47,540 to feel confident of the results. 65 00:02:47,540 --> 00:02:50,140 Now, by default, the test will run for about one second. 66 00:02:50,140 --> 00:02:52,130 Which means it runs for a few seconds. 67 00:02:52,130 --> 00:02:54,120 We're going to up it to about three seconds, 68 00:02:54,120 --> 00:02:55,870 just to make sure we get enough iterations. 69 00:02:55,870 --> 00:02:57,590 You gotta figure out what enough iterations are. 70 00:02:57,590 --> 00:02:58,980 A million, a hundred million. 71 00:02:58,980 --> 00:02:59,950 It's up to you. 72 00:02:59,950 --> 00:03:03,700 So you can increase the bench time to get more iterations 73 00:03:03,700 --> 00:03:05,830 to feel more confident, but again, 74 00:03:05,830 --> 00:03:07,480 it's going to take longer to run. 75 00:03:08,570 --> 00:03:11,410 So, look at this Benchmark function S print. 76 00:03:11,410 --> 00:03:13,260 We've got our loop b.N. 77 00:03:13,260 --> 00:03:14,860 We are calling S print. 78 00:03:14,860 --> 00:03:18,000 I'm assigning the string coming back out, to s. 79 00:03:18,000 --> 00:03:19,360 Which is a local variable. 80 00:03:19,360 --> 00:03:21,770 We know that if we don't use the local variable, 81 00:03:21,770 --> 00:03:23,460 Go's complier's going to complain. 82 00:03:23,460 --> 00:03:25,890 I don't want to add extra CPU cycles here. 83 00:03:25,890 --> 00:03:29,410 So, we assign it to a global variable, just so the compiler 84 00:03:29,410 --> 00:03:30,390 won't throw it away. 85 00:03:30,390 --> 00:03:32,620 That's all we're doing here with gs, 86 00:03:32,620 --> 00:03:35,040 is to validate that we don't throw anything away. 87 00:03:35,040 --> 00:03:37,140 That's not going to give us any extra cost 88 00:03:37,140 --> 00:03:39,880 on this benchmark, because the real cost that we're 89 00:03:39,880 --> 00:03:42,070 looking at here is inside the loop. 90 00:03:42,070 --> 00:03:45,010 Okay, so, we got S print and we got S print f. 91 00:03:45,010 --> 00:03:46,890 So, I want you to make a guess on which ones 92 00:03:46,890 --> 00:03:48,100 going to be faster. 93 00:03:48,100 --> 00:03:49,330 I'm guessing S print, 94 00:03:49,330 --> 00:03:52,050 because we're not doing any formatting. 95 00:03:52,050 --> 00:03:54,880 Again, this is in an underscore test file. 96 00:03:54,880 --> 00:03:58,490 So, if I move back over to our 97 00:03:58,490 --> 00:03:59,580 terminal, there it is. 98 00:03:59,580 --> 00:04:01,080 Basic underscore test. 99 00:04:01,080 --> 00:04:04,530 And we're going to use go test like we did before. 100 00:04:04,530 --> 00:04:09,290 But since there are no functions that start with test, 101 00:04:09,290 --> 00:04:11,410 I don't really want to waste the tools time 102 00:04:11,410 --> 00:04:12,450 looking for them. 103 00:04:12,450 --> 00:04:15,103 And if there were functions that started with test, 104 00:04:15,103 --> 00:04:17,460 last thing I want is them to run. 105 00:04:17,460 --> 00:04:21,210 By default, go test will always look for 106 00:04:21,210 --> 00:04:24,220 and run test functions. 107 00:04:24,220 --> 00:04:26,550 So, if we only want to run the benchmarks, 108 00:04:26,550 --> 00:04:28,280 then we gotta give it a regular expression 109 00:04:28,280 --> 00:04:29,530 where it won't find anything. 110 00:04:29,530 --> 00:04:31,200 That's why I like using none. 111 00:04:31,200 --> 00:04:34,130 None isn't special, just a string that I know there are 112 00:04:34,130 --> 00:04:35,630 no test functions around. 113 00:04:35,630 --> 00:04:37,570 So, we'll filter out all those test functions 114 00:04:37,570 --> 00:04:39,200 even though there are none. 115 00:04:39,200 --> 00:04:42,330 It doesn't matter, I still like to say none as practice. 116 00:04:42,330 --> 00:04:45,180 Then, we're going to use the bench flag, 117 00:04:45,180 --> 00:04:46,820 and I'm going to use dot for everything, 118 00:04:46,820 --> 00:04:48,230 remember, it's regular expression. 119 00:04:48,230 --> 00:04:52,650 So, run none tests, but all the benchmarks. 120 00:04:52,650 --> 00:04:56,060 Let's increase the benchtime to three seconds. 121 00:04:56,060 --> 00:04:59,350 So, that uses basic time formatting that we would get out 122 00:04:59,350 --> 00:05:01,170 of the time package, three s. 123 00:05:01,170 --> 00:05:03,590 You know, you can do your hours, you can do your minutes. 124 00:05:03,590 --> 00:05:04,710 There it is three seconds. 125 00:05:04,710 --> 00:05:07,260 So, we're going to go from one second to three seconds. 126 00:05:07,260 --> 00:05:09,820 And I'm just looking at CPU right now, 127 00:05:09,820 --> 00:05:10,653 but you know what? 128 00:05:10,653 --> 00:05:12,880 Let me show you what else we can do. 129 00:05:12,880 --> 00:05:16,170 I can also tell it to look at memory allocations 130 00:05:16,170 --> 00:05:18,860 just by saying benchmem. 131 00:05:18,860 --> 00:05:20,780 So if I do benchmem, I'm also going to tell it 132 00:05:20,780 --> 00:05:22,820 report allocations. 133 00:05:22,820 --> 00:05:24,840 Remember, an allocation is anytime a value escapes 134 00:05:24,840 --> 00:05:27,320 the stack and goes into the heap. 135 00:05:27,320 --> 00:05:28,520 So, I think we're good here. 136 00:05:28,520 --> 00:05:32,710 Go test, run none, bench dot, bench time three seconds. 137 00:05:32,710 --> 00:05:34,030 Bench the memory. 138 00:05:34,030 --> 00:05:35,170 Okay, brilliant. 139 00:05:35,170 --> 00:05:36,650 So, I'm running this now, 140 00:05:36,650 --> 00:05:38,010 and then those that were on Darwin, 141 00:05:38,010 --> 00:05:40,770 we know we're in a amd64 architecture. 142 00:05:40,770 --> 00:05:44,690 S print, now, the first number here as it runs, 143 00:05:44,690 --> 00:05:48,000 is showing us how many times we executed that code 144 00:05:48,000 --> 00:05:48,833 inside the loop. 145 00:05:48,833 --> 00:05:51,930 That was 50 million times over that three seconds. 146 00:05:51,930 --> 00:05:52,763 If you really noticed, 147 00:05:52,763 --> 00:05:55,600 that whole test took almost nine seconds. 148 00:05:55,600 --> 00:05:56,880 Okay. 149 00:05:56,880 --> 00:05:58,940 Three, at least three seconds for each, 150 00:05:58,940 --> 00:06:01,880 it looks like almost four seconds for each. 151 00:06:01,880 --> 00:06:04,760 So, we ran this 50,000 times. 152 00:06:04,760 --> 00:06:08,650 We identified that S print ran at 68 nanoseconds 153 00:06:08,650 --> 00:06:10,200 per operation. 154 00:06:10,200 --> 00:06:13,540 With five bites allocated over one object. 155 00:06:13,540 --> 00:06:14,950 So there's one object 156 00:06:14,950 --> 00:06:17,590 on the heap worth five bites of memory. 157 00:06:17,590 --> 00:06:20,730 Notice, however, that we were able to run S print f 158 00:06:20,730 --> 00:06:23,520 100 million times. 159 00:06:23,520 --> 00:06:26,540 Because it ran faster, at 53 nanoseconds per op, 160 00:06:26,540 --> 00:06:28,180 with the same memory allocation. 161 00:06:28,180 --> 00:06:30,700 In other words, our guess was wrong. 162 00:06:30,700 --> 00:06:34,750 S print f is actually faster than S print 163 00:06:34,750 --> 00:06:37,480 by almost, what, in this particular run, 164 00:06:37,480 --> 00:06:40,680 we're looking at about 15 nanoseconds per op. 165 00:06:40,680 --> 00:06:42,330 I mean, we're talking small. 166 00:06:42,330 --> 00:06:44,670 But this is why I'm always going to tell you 167 00:06:44,670 --> 00:06:45,560 we cannot guess. 168 00:06:45,560 --> 00:06:46,960 Do not guess about performance. 169 00:06:46,960 --> 00:06:47,860 Do not guess. 170 00:06:47,860 --> 00:06:51,400 When you are writing code, we optimize for correctness, 171 00:06:51,400 --> 00:06:54,570 readability, simplicity, integrity, 172 00:06:54,570 --> 00:06:57,660 because that's all you can really know about 173 00:06:57,660 --> 00:06:59,040 at the time you're writing the code. 174 00:06:59,040 --> 00:07:00,590 If you write code for performance, 175 00:07:00,590 --> 00:07:03,180 especially if it's new code, you are guessing, 176 00:07:03,180 --> 00:07:04,330 you're going to be wrong. 177 00:07:04,330 --> 00:07:05,510 Get something working, 178 00:07:05,510 --> 00:07:08,150 we can use these benchmarks to validate. 179 00:07:08,150 --> 00:07:10,380 So, now we know we probably would have used S print 180 00:07:10,380 --> 00:07:13,080 in our code, when from a performance standpoint, 181 00:07:13,080 --> 00:07:14,890 S print f would have been better. 182 00:07:14,890 --> 00:07:16,420 I mean, 15 nanoseconds per op, 183 00:07:16,420 --> 00:07:18,180 but that's my point is, 184 00:07:18,180 --> 00:07:20,760 we don't want to guess about performance. 185 00:07:20,760 --> 00:07:23,910 So, we've just seen how we can create a basic benchmark 186 00:07:24,750 --> 00:07:26,400 and run that on the command line.