1
00:00:06,810 --> 00:00:09,720
- So benchmarking in Go,
is incredibly powerful.

2
00:00:09,720 --> 00:00:11,140
I'm going to show you some aspects

3
00:00:11,140 --> 00:00:13,150
of basic benchmarking now,

4
00:00:13,150 --> 00:00:15,150
and then when we get into profiling,

5
00:00:15,150 --> 00:00:16,800
I'm going to show you some more.

6
00:00:16,800 --> 00:00:19,670
But, let's just start with
a basic benchmark here.

7
00:00:19,670 --> 00:00:21,180
Let's look at the mechanics

8
00:00:21,180 --> 00:00:22,750
and look at how we can write these

9
00:00:22,750 --> 00:00:25,830
and run a couple of them right
there on the command line.

10
00:00:25,830 --> 00:00:28,650
So, what I'm starting
out with is on line 17.

11
00:00:28,650 --> 00:00:31,270
In fact, look at line 17 and line 28.

12
00:00:31,270 --> 00:00:36,050
The idea of this benchmark
is to do a CPU profile

13
00:00:36,050 --> 00:00:37,590
of how fast

14
00:00:37,590 --> 00:00:42,050
the S print or S print f function run.

15
00:00:42,050 --> 00:00:45,070
And one thing about benchmarking,
is that we gotta really

16
00:00:45,070 --> 00:00:47,450
make sure that we're not guessing.

17
00:00:47,450 --> 00:00:50,800
We gotta make sure that
the benchmark we're writing

18
00:00:50,800 --> 00:00:51,860
is accurate.

19
00:00:51,860 --> 00:00:53,550
And one of the things I worry about,

20
00:00:53,550 --> 00:00:55,490
is since this code's going to be compiled,

21
00:00:55,490 --> 00:00:59,550
and the compiler is capable
of throwing dead code away.

22
00:00:59,550 --> 00:01:03,490
With the SSA backend, it
is capable of identifying,

23
00:01:03,490 --> 00:01:06,910
like this scenario, you call
a function like S print,

24
00:01:06,910 --> 00:01:09,490
and S print doesn't really mutate state.

25
00:01:09,490 --> 00:01:10,900
It's using value semantics.

26
00:01:10,900 --> 00:01:12,880
It gets its own copy of the string,

27
00:01:12,880 --> 00:01:14,710
returns a new string.

28
00:01:14,710 --> 00:01:17,490
So, if you don't capture
the new string coming out

29
00:01:17,490 --> 00:01:19,460
of S print or S print f,

30
00:01:19,460 --> 00:01:21,890
technically the compiler
can identify that,

31
00:01:21,890 --> 00:01:25,630
let's not waste any CPU cycles
making this function call.

32
00:01:25,630 --> 00:01:27,880
And suddenly, you're trying
to do a benchmark like this

33
00:01:27,880 --> 00:01:30,400
to figure out what's faster,
S print or S print f.

34
00:01:30,400 --> 00:01:32,340
And it's blazing fast for both,

35
00:01:32,340 --> 00:01:34,740
because you didn't capture the output.

36
00:01:34,740 --> 00:01:36,670
So, what we're doing in this test

37
00:01:36,670 --> 00:01:38,880
is also making sure that the code

38
00:01:38,880 --> 00:01:40,610
is not getting thrown away,

39
00:01:40,610 --> 00:01:44,170
but also that we're
using code the same way

40
00:01:44,170 --> 00:01:47,700
as accurately as we can,
like we do in production.

41
00:01:47,700 --> 00:01:48,920
Now, here's the test.

42
00:01:48,920 --> 00:01:52,010
I want to know what's
faster, S print or S print f,

43
00:01:52,010 --> 00:01:53,900
for the string, hello.

44
00:01:53,900 --> 00:01:57,720
My guess is, is that S
print is going to be faster

45
00:01:57,720 --> 00:02:00,330
because we're not doing any formatting.

46
00:02:00,330 --> 00:02:02,720
And remember, any time
you guess about something,

47
00:02:02,720 --> 00:02:04,910
you got a good chance
you're going to be wrong,

48
00:02:04,910 --> 00:02:06,080
but this one's going to be obvious, right?

49
00:02:06,080 --> 00:02:09,600
I'm just showing you a basic
test, S print versus S print f.

50
00:02:09,600 --> 00:02:11,010
Now, look at the structure of the test,

51
00:02:11,010 --> 00:02:12,900
we've got a function named Benchmark,

52
00:02:12,900 --> 00:02:15,940
this time it takes a testing B pointer,

53
00:02:15,940 --> 00:02:17,960
not a testing T, a testing B pointer.

54
00:02:17,960 --> 00:02:20,020
It's the word Benchmark,
we have the same rules,

55
00:02:20,020 --> 00:02:22,700
it has to be exported S print.

56
00:02:22,700 --> 00:02:25,230
'Kay, S print also becomes
the name for that particular

57
00:02:25,230 --> 00:02:27,330
test Benchmark function.

58
00:02:27,330 --> 00:02:29,583
And the big thing here is the loop.

59
00:02:30,600 --> 00:02:33,490
Now, the way the loop
works, is the first time

60
00:02:33,490 --> 00:02:36,700
the testing framer calls
this Benchmark function,

61
00:02:36,700 --> 00:02:38,910
b.N is going to be one.

62
00:02:38,910 --> 00:02:41,880
And then it's going to
increase b.N over time.

63
00:02:41,880 --> 00:02:44,840
The idea is, we want enough
iterations of the loop

64
00:02:44,840 --> 00:02:47,540
to feel confident of the results.

65
00:02:47,540 --> 00:02:50,140
Now, by default, the test
will run for about one second.

66
00:02:50,140 --> 00:02:52,130
Which means it runs for a few seconds.

67
00:02:52,130 --> 00:02:54,120
We're going to up it
to about three seconds,

68
00:02:54,120 --> 00:02:55,870
just to make sure we
get enough iterations.

69
00:02:55,870 --> 00:02:57,590
You gotta figure out what
enough iterations are.

70
00:02:57,590 --> 00:02:58,980
A million, a hundred million.

71
00:02:58,980 --> 00:02:59,950
It's up to you.

72
00:02:59,950 --> 00:03:03,700
So you can increase the bench
time to get more iterations

73
00:03:03,700 --> 00:03:05,830
to feel more confident, but again,

74
00:03:05,830 --> 00:03:07,480
it's going to take longer to run.

75
00:03:08,570 --> 00:03:11,410
So, look at this Benchmark
function S print.

76
00:03:11,410 --> 00:03:13,260
We've got our loop b.N.

77
00:03:13,260 --> 00:03:14,860
We are calling S print.

78
00:03:14,860 --> 00:03:18,000
I'm assigning the string
coming back out, to s.

79
00:03:18,000 --> 00:03:19,360
Which is a local variable.

80
00:03:19,360 --> 00:03:21,770
We know that if we don't
use the local variable,

81
00:03:21,770 --> 00:03:23,460
Go's complier's going to complain.

82
00:03:23,460 --> 00:03:25,890
I don't want to add extra CPU cycles here.

83
00:03:25,890 --> 00:03:29,410
So, we assign it to a global
variable, just so the compiler

84
00:03:29,410 --> 00:03:30,390
won't throw it away.

85
00:03:30,390 --> 00:03:32,620
That's all we're doing here with gs,

86
00:03:32,620 --> 00:03:35,040
is to validate that we
don't throw anything away.

87
00:03:35,040 --> 00:03:37,140
That's not going to give us any extra cost

88
00:03:37,140 --> 00:03:39,880
on this benchmark, because
the real cost that we're

89
00:03:39,880 --> 00:03:42,070
looking at here is inside the loop.

90
00:03:42,070 --> 00:03:45,010
Okay, so, we got S print
and we got S print f.

91
00:03:45,010 --> 00:03:46,890
So, I want you to make
a guess on which ones

92
00:03:46,890 --> 00:03:48,100
going to be faster.

93
00:03:48,100 --> 00:03:49,330
I'm guessing S print,

94
00:03:49,330 --> 00:03:52,050
because we're not doing any formatting.

95
00:03:52,050 --> 00:03:54,880
Again, this is in an underscore test file.

96
00:03:54,880 --> 00:03:58,490
So, if I move back over to our

97
00:03:58,490 --> 00:03:59,580
terminal, there it is.

98
00:03:59,580 --> 00:04:01,080
Basic underscore test.

99
00:04:01,080 --> 00:04:04,530
And we're going to use go
test like we did before.

100
00:04:04,530 --> 00:04:09,290
But since there are no
functions that start with test,

101
00:04:09,290 --> 00:04:11,410
I don't really want to
waste the tools time

102
00:04:11,410 --> 00:04:12,450
looking for them.

103
00:04:12,450 --> 00:04:15,103
And if there were functions
that started with test,

104
00:04:15,103 --> 00:04:17,460
last thing I want is them to run.

105
00:04:17,460 --> 00:04:21,210
By default, go test will always look for

106
00:04:21,210 --> 00:04:24,220
and run test functions.

107
00:04:24,220 --> 00:04:26,550
So, if we only want to run the benchmarks,

108
00:04:26,550 --> 00:04:28,280
then we gotta give it a regular expression

109
00:04:28,280 --> 00:04:29,530
where it won't find anything.

110
00:04:29,530 --> 00:04:31,200
That's why I like using none.

111
00:04:31,200 --> 00:04:34,130
None isn't special, just a
string that I know there are

112
00:04:34,130 --> 00:04:35,630
no test functions around.

113
00:04:35,630 --> 00:04:37,570
So, we'll filter out
all those test functions

114
00:04:37,570 --> 00:04:39,200
even though there are none.

115
00:04:39,200 --> 00:04:42,330
It doesn't matter, I still
like to say none as practice.

116
00:04:42,330 --> 00:04:45,180
Then, we're going to use the bench flag,

117
00:04:45,180 --> 00:04:46,820
and I'm going to use dot for everything,

118
00:04:46,820 --> 00:04:48,230
remember, it's regular expression.

119
00:04:48,230 --> 00:04:52,650
So, run none tests,
but all the benchmarks.

120
00:04:52,650 --> 00:04:56,060
Let's increase the
benchtime to three seconds.

121
00:04:56,060 --> 00:04:59,350
So, that uses basic time
formatting that we would get out

122
00:04:59,350 --> 00:05:01,170
of the time package, three s.

123
00:05:01,170 --> 00:05:03,590
You know, you can do your
hours, you can do your minutes.

124
00:05:03,590 --> 00:05:04,710
There it is three seconds.

125
00:05:04,710 --> 00:05:07,260
So, we're going to go from
one second to three seconds.

126
00:05:07,260 --> 00:05:09,820
And I'm just looking at CPU right now,

127
00:05:09,820 --> 00:05:10,653
but you know what?

128
00:05:10,653 --> 00:05:12,880
Let me show you what else we can do.

129
00:05:12,880 --> 00:05:16,170
I can also tell it to
look at memory allocations

130
00:05:16,170 --> 00:05:18,860
just by saying benchmem.

131
00:05:18,860 --> 00:05:20,780
So if I do benchmem, I'm
also going to tell it

132
00:05:20,780 --> 00:05:22,820
report allocations.

133
00:05:22,820 --> 00:05:24,840
Remember, an allocation
is anytime a value escapes

134
00:05:24,840 --> 00:05:27,320
the stack and goes into the heap.

135
00:05:27,320 --> 00:05:28,520
So, I think we're good here.

136
00:05:28,520 --> 00:05:32,710
Go test, run none, bench dot,
bench time three seconds.

137
00:05:32,710 --> 00:05:34,030
Bench the memory.

138
00:05:34,030 --> 00:05:35,170
Okay, brilliant.

139
00:05:35,170 --> 00:05:36,650
So, I'm running this now,

140
00:05:36,650 --> 00:05:38,010
and then those that were on Darwin,

141
00:05:38,010 --> 00:05:40,770
we know we're in a amd64 architecture.

142
00:05:40,770 --> 00:05:44,690
S print, now, the first
number here as it runs,

143
00:05:44,690 --> 00:05:48,000
is showing us how many
times we executed that code

144
00:05:48,000 --> 00:05:48,833
inside the loop.

145
00:05:48,833 --> 00:05:51,930
That was 50 million times
over that three seconds.

146
00:05:51,930 --> 00:05:52,763
If you really noticed,

147
00:05:52,763 --> 00:05:55,600
that whole test took almost nine seconds.

148
00:05:55,600 --> 00:05:56,880
Okay.

149
00:05:56,880 --> 00:05:58,940
Three, at least three seconds for each,

150
00:05:58,940 --> 00:06:01,880
it looks like almost
four seconds for each.

151
00:06:01,880 --> 00:06:04,760
So, we ran this 50,000 times.

152
00:06:04,760 --> 00:06:08,650
We identified that S print
ran at 68 nanoseconds

153
00:06:08,650 --> 00:06:10,200
per operation.

154
00:06:10,200 --> 00:06:13,540
With five bites allocated over one object.

155
00:06:13,540 --> 00:06:14,950
So there's one object

156
00:06:14,950 --> 00:06:17,590
on the heap worth five bites of memory.

157
00:06:17,590 --> 00:06:20,730
Notice, however, that we
were able to run S print f

158
00:06:20,730 --> 00:06:23,520
100 million times.

159
00:06:23,520 --> 00:06:26,540
Because it ran faster,
at 53 nanoseconds per op,

160
00:06:26,540 --> 00:06:28,180
with the same memory allocation.

161
00:06:28,180 --> 00:06:30,700
In other words, our guess was wrong.

162
00:06:30,700 --> 00:06:34,750
S print f is actually faster than S print

163
00:06:34,750 --> 00:06:37,480
by almost, what, in this particular run,

164
00:06:37,480 --> 00:06:40,680
we're looking at about
15 nanoseconds per op.

165
00:06:40,680 --> 00:06:42,330
I mean, we're talking small.

166
00:06:42,330 --> 00:06:44,670
But this is why I'm
always going to tell you

167
00:06:44,670 --> 00:06:45,560
we cannot guess.

168
00:06:45,560 --> 00:06:46,960
Do not guess about performance.

169
00:06:46,960 --> 00:06:47,860
Do not guess.

170
00:06:47,860 --> 00:06:51,400
When you are writing code,
we optimize for correctness,

171
00:06:51,400 --> 00:06:54,570
readability, simplicity, integrity,

172
00:06:54,570 --> 00:06:57,660
because that's all you
can really know about

173
00:06:57,660 --> 00:06:59,040
at the time you're writing the code.

174
00:06:59,040 --> 00:07:00,590
If you write code for performance,

175
00:07:00,590 --> 00:07:03,180
especially if it's new
code, you are guessing,

176
00:07:03,180 --> 00:07:04,330
you're going to be wrong.

177
00:07:04,330 --> 00:07:05,510
Get something working,

178
00:07:05,510 --> 00:07:08,150
we can use these benchmarks to validate.

179
00:07:08,150 --> 00:07:10,380
So, now we know we probably
would have used S print

180
00:07:10,380 --> 00:07:13,080
in our code, when from a
performance standpoint,

181
00:07:13,080 --> 00:07:14,890
S print f would have been better.

182
00:07:14,890 --> 00:07:16,420
I mean, 15 nanoseconds per op,

183
00:07:16,420 --> 00:07:18,180
but that's my point is,

184
00:07:18,180 --> 00:07:20,760
we don't want to guess about performance.

185
00:07:20,760 --> 00:07:23,910
So, we've just seen how we
can create a basic benchmark

186
00:07:24,750 --> 00:07:26,400
and run that on the command line.