1
00:00:00,000 --> 00:00:06,300
[No Audio]

2
00:00:06,301 --> 00:00:08,266
Performance is important for many Rust

3
00:00:08,267 --> 00:00:10,900
programs. This section of the course contains

4
00:00:10,901 --> 00:00:12,933
many techniques that can improve the

5
00:00:12,934 --> 00:00:15,000
performance in terms of speed and memory

6
00:00:15,001 --> 00:00:18,300
usage. Some techniques are entirely Rust

7
00:00:18,301 --> 00:00:21,266
specific, and some involves ideas that can be

8
00:00:21,267 --> 00:00:23,266
applied with modifications to programs

9
00:00:23,267 --> 00:00:26,066
written in other languages. In this first

10
00:00:26,067 --> 00:00:28,066
tutorial, we will be talking about

11
00:00:28,067 --> 00:00:31,600
benchmarking. Benchmarking typically involves

12
00:00:31,601 --> 00:00:33,733
comparing the performance of two or more

13
00:00:33,734 --> 00:00:36,900
programs that do the same thing or same task.

14
00:00:37,533 --> 00:00:40,066
Sometimes this might involve comparing two

15
00:00:40,067 --> 00:00:42,666
different implementations, and at other times,

16
00:00:42,700 --> 00:00:44,633
it may involve comparing two different

17
00:00:44,634 --> 00:00:48,466
versions of the same implementation. The key

18
00:00:48,467 --> 00:00:50,833
question that we would like to answer in this

19
00:00:50,834 --> 00:00:53,833
context is, did the change speed up things or

20
00:00:53,834 --> 00:00:56,400
not, and benchmarking different factors may

21
00:00:56,401 --> 00:00:58,366
be taken into consideration such as

22
00:00:58,367 --> 00:01:00,833
processing time, memory usage, and disk

23
00:01:00,834 --> 00:01:04,166
access time. However, in this tutorial, we

24
00:01:04,167 --> 00:01:06,400
will be primarily focusing on processing

25
00:01:06,401 --> 00:01:09,600
time. Let us look at an example for

26
00:01:09,601 --> 00:01:11,766
understanding the benchmarking in Rust.

27
00:01:12,200 --> 00:01:14,366
Consider a scenario where we would like to

28
00:01:14,367 --> 00:01:16,800
compare two implementations of sorting

29
00:01:16,801 --> 00:01:21,300
elements of an array. In the lib.rs file, I

30
00:01:21,301 --> 00:01:23,300
have added these two implementations as

31
00:01:23,301 --> 00:01:25,500
functions. The first implementation is

32
00:01:25,501 --> 00:01:27,700
sorting algo_1, and the second

33
00:01:27,701 --> 00:01:30,866
implementation is sorting algo_2. These two

34
00:01:30,867 --> 00:01:32,700
functions are representing two different

35
00:01:32,701 --> 00:01:35,100
implementations of sorting algorithms.

36
00:01:35,533 --> 00:01:37,866
The input to both the functions are a mutable

37
00:01:37,867 --> 00:01:39,633
reference to a vector, and inside the

38
00:01:39,634 --> 00:01:42,300
function, these functions are sorting the

39
00:01:42,301 --> 00:01:45,233
respective vector elements. Now, let us see

40
00:01:45,234 --> 00:01:47,833
how we can use benchmarking in order to track

41
00:01:47,834 --> 00:01:50,000
performance and make sure that performance is

42
00:01:50,001 --> 00:01:52,833
not regressed. Rust actually has a built in

43
00:01:52,834 --> 00:01:55,500
benchmark tests. However, at this moment,

44
00:01:55,501 --> 00:01:58,466
they are unstable features and are therefore

45
00:01:58,467 --> 00:02:00,766
only available in nightly versions of Rust.

46
00:02:00,767 --> 00:02:03,000
So instead, we are going to use a library

47
00:02:03,001 --> 00:02:06,300
called criterion. To properly use the library,

48
00:02:06,301 --> 00:02:08,466
we will open the cargo.toml file.

49
00:02:09,765 --> 00:02:13,033
We will add Criterion as dev dependency.

50
00:02:13,034 --> 00:02:16,133
[No Audio]

51
00:02:16,134 --> 00:02:19,266
Next, we will configure our benchmark target, which is

52
00:02:19,267 --> 00:02:21,033
the name of the file that contains our

53
00:02:21,034 --> 00:02:23,533
benchmark tests. I will use the name of

54
00:02:23,534 --> 00:02:25,233
sorting_benchmark.

55
00:02:25,234 --> 00:02:28,333
[No Audio]

56
00:02:28,334 --> 00:02:30,633
Next we will set harness to false.

57
00:02:30,634 --> 00:02:33,933
[No Audio]

58
00:02:33,934 --> 00:02:35,066
This will disable the

59
00:02:35,067 --> 00:02:37,633
default benchmarking system, so we could use

60
00:02:37,634 --> 00:02:41,266
Criterion's provided benchmarking system. Next,

61
00:02:41,267 --> 00:02:43,500
we will create a folder called benches in the

62
00:02:43,501 --> 00:02:44,900
root of our project.

63
00:02:44,901 --> 00:02:47,066
[No Audio]

64
00:02:47,067 --> 00:02:48,233
Please make sure that the

65
00:02:48,234 --> 00:02:50,700
folder name must be benches, otherwise,

66
00:02:50,701 --> 00:02:53,100
the command for executing the benchmark may

67
00:02:53,101 --> 00:02:55,533
not be successful as cargo may not be able to

68
00:02:55,534 --> 00:02:57,766
find the relevant file for the benchmark.

69
00:02:58,500 --> 00:03:01,000
Inside that folder, we will create a new file

70
00:03:01,033 --> 00:03:04,466
called sorting_benchmark.rs.

71
00:03:04,467 --> 00:03:07,433
[No Audio]

72
00:03:07,434 --> 00:03:09,700
This name must be the same as the name that we

73
00:03:09,701 --> 00:03:11,900
mentioned in the cargo.toml file.

74
00:03:12,966 --> 00:03:17,366
Now we will add some code to the file. First let's

75
00:03:17,367 --> 00:03:20,133
let's bring our sought array functions into scope.

76
00:03:20,134 --> 00:03:22,433
[No Audio]

77
00:03:22,434 --> 00:03:24,766
We will also bring some items from the

78
00:03:24,767 --> 00:03:26,633
Criterion crate into scope.

79
00:03:26,634 --> 00:03:31,366
[No Audio]

80
00:03:31,367 --> 00:03:35,000
Next, we will set up our benchmarking test function.

81
00:03:35,001 --> 00:03:38,800
[No Audio]

82
00:03:38,801 --> 00:03:41,933
The benchmark function gets access to an instance

83
00:03:41,934 --> 00:03:44,400
of the Criterion struct, which allows us to

84
00:03:44,401 --> 00:03:47,133
configure and execute benchmarks. In this

85
00:03:47,134 --> 00:03:49,500
case, the variable c is a mutable reference

86
00:03:49,501 --> 00:03:52,800
of type Criterion. We will use it to create a

87
00:03:52,801 --> 00:03:55,300
new benchmark. Inside the function, I will

88
00:03:55,301 --> 00:03:58,100
first define a vector containing some numbers.

89
00:03:58,101 --> 00:04:02,333
[No Audio]

90
00:04:02,334 --> 00:04:04,233
The vector contains a list of

91
00:04:04,234 --> 00:04:06,800
unsorted numbers which will be sorted using

92
00:04:06,801 --> 00:04:09,300
the sorting algorithm, that we will be using

93
00:04:09,301 --> 00:04:11,933
for benchmarking, and now let's create our

94
00:04:11,934 --> 00:04:14,866
benchmark. To create a new benchmark we will

95
00:04:14,867 --> 00:04:16,366
call the benchmark function on the

96
00:04:16,367 --> 00:04:18,132
Criterion struct instance.

97
00:04:18,133 --> 00:04:20,600
[No Audio]

98
00:04:20,601 --> 00:04:22,333
This function takes two

99
00:04:22,334 --> 00:04:26,166
arguments, an id, which I will mention inside quotes.

100
00:04:26,167 --> 00:04:28,100
[No Audio]

101
00:04:28,101 --> 00:04:30,233
And the second argument of a closure.

102
00:04:31,333 --> 00:04:34,133
Inside the closure, we get access to an

103
00:04:34,134 --> 00:04:36,100
instance of Bencher which allows us to

104
00:04:36,101 --> 00:04:38,566
iterate a benchmark function to measure its

105
00:04:38,567 --> 00:04:41,166
performance. Inside the closure body we will

106
00:04:41,167 --> 00:04:42,666
call the iter method.

107
00:04:42,667 --> 00:04:46,800
[No Audio]

108
00:04:46,801 --> 00:04:48,166
This takes another closure

109
00:04:48,167 --> 00:04:50,266
that executes our sort function.

110
00:04:50,267 --> 00:04:52,500
[No Audio]

111
00:04:52,501 --> 00:04:55,466
The sort array function will execute many times

112
00:04:55,467 --> 00:04:57,300
for its performance to be accurately

113
00:04:57,301 --> 00:05:00,166
measured, and with that our benchmark is now

114
00:05:00,167 --> 00:05:04,333
complete. Next, we will use the criterion_group macro.

115
00:05:04,334 --> 00:05:07,033
[No Audio]

116
00:05:07,034 --> 00:05:09,000
This macro is used to define a

117
00:05:09,001 --> 00:05:11,166
collection of functions to call within a

118
00:05:11,167 --> 00:05:13,633
common Criterion configuration. The first

119
00:05:13,666 --> 00:05:16,100
argument is the name of the group, which in

120
00:05:16,101 --> 00:05:18,233
this case is benches and the remaining

121
00:05:18,266 --> 00:05:20,866
arguments or names of functions. Since we have

122
00:05:20,867 --> 00:05:24,100
only a single function, so we will mention that.

123
00:05:24,101 --> 00:05:26,766
[No Audio]

124
00:05:26,767 --> 00:05:28,933
Finally, we will need to use one macro

125
00:05:28,934 --> 00:05:30,900
called a criterion_main.

126
00:05:30,901 --> 00:05:34,933
[No Audio]

127
00:05:34,934 --> 00:05:36,433
This macro expands to

128
00:05:36,434 --> 00:05:38,733
a main function which runs all the benchmarks

129
00:05:38,734 --> 00:05:40,900
in a given group, in this case, the benches

130
00:05:40,901 --> 00:05:44,433
group. We are all set up, so let's go ahead and

131
00:05:44,434 --> 00:05:47,300
rent a benchmark tests. To execute, we will use

132
00:05:47,301 --> 00:05:49,066
the command of cargo bench.

133
00:05:49,067 --> 00:05:54,666
[No Audio]

134
00:05:54,667 --> 00:05:57,500
It provides information about 100 runs of the

135
00:05:57,501 --> 00:06:00,000
same program and compute their average

136
00:06:00,001 --> 00:06:02,400
running performance in terms of nanoseconds.

137
00:06:02,700 --> 00:06:04,666
In this case, the average running times turns

138
00:06:04,667 --> 00:06:08,200
out to be this much nanoseconds. It also

139
00:06:08,201 --> 00:06:11,000
shows in some information about the outliers,

140
00:06:11,033 --> 00:06:13,966
which out of the 100 runs were quite deviant

141
00:06:13,967 --> 00:06:17,066
in performance. Let us run the benchmark again.

142
00:06:17,067 --> 00:06:21,166
[No Audio]

143
00:06:21,167 --> 00:06:23,100
It says this time no change in

144
00:06:23,101 --> 00:06:25,433
performance is detected. This means that the

145
00:06:25,434 --> 00:06:27,600
performance is quite stable and does not

146
00:06:27,601 --> 00:06:30,533
change during different runs. Now, let us

147
00:06:30,534 --> 00:06:32,733
change the algorithm from algorithm 1 to

148
00:06:32,734 --> 00:06:35,433
date of algorithm 1 and see the performance again.

149
00:06:35,434 --> 00:06:41,800
[No Audio]

150
00:06:41,801 --> 00:06:44,533
The performance has regressed. This clearly

151
00:06:44,534 --> 00:06:46,833
indicates that the performance of algorithm 1

152
00:06:46,834 --> 00:06:49,433
is superior to that of algorithm 2. It

153
00:06:49,434 --> 00:06:51,533
also gives useful information about the

154
00:06:51,534 --> 00:06:54,166
percentage change in the performance. In this

155
00:06:54,167 --> 00:06:56,400
case, it is quite high, and therefore provides

156
00:06:56,433 --> 00:06:59,100
a clear indication to the implementers that

157
00:06:59,101 --> 00:07:01,466
they should prefer to use algorithm 1 over

158
00:07:01,467 --> 00:07:04,300
algorithm 2. That brings us to the end of

159
00:07:04,301 --> 00:07:06,733
this tutorial. See you again with more

160
00:07:06,734 --> 00:07:09,166
performance improvement tutorials and until

161
00:07:09,167 --> 00:07:10,966
then happy Rust programming.