1 00:00:00,000 --> 00:00:06,300 [No Audio] 2 00:00:06,301 --> 00:00:08,266 Performance is important for many Rust 3 00:00:08,267 --> 00:00:10,900 programs. This section of the course contains 4 00:00:10,901 --> 00:00:12,933 many techniques that can improve the 5 00:00:12,934 --> 00:00:15,000 performance in terms of speed and memory 6 00:00:15,001 --> 00:00:18,300 usage. Some techniques are entirely Rust 7 00:00:18,301 --> 00:00:21,266 specific, and some involves ideas that can be 8 00:00:21,267 --> 00:00:23,266 applied with modifications to programs 9 00:00:23,267 --> 00:00:26,066 written in other languages. In this first 10 00:00:26,067 --> 00:00:28,066 tutorial, we will be talking about 11 00:00:28,067 --> 00:00:31,600 benchmarking. Benchmarking typically involves 12 00:00:31,601 --> 00:00:33,733 comparing the performance of two or more 13 00:00:33,734 --> 00:00:36,900 programs that do the same thing or same task. 14 00:00:37,533 --> 00:00:40,066 Sometimes this might involve comparing two 15 00:00:40,067 --> 00:00:42,666 different implementations, and at other times, 16 00:00:42,700 --> 00:00:44,633 it may involve comparing two different 17 00:00:44,634 --> 00:00:48,466 versions of the same implementation. The key 18 00:00:48,467 --> 00:00:50,833 question that we would like to answer in this 19 00:00:50,834 --> 00:00:53,833 context is, did the change speed up things or 20 00:00:53,834 --> 00:00:56,400 not, and benchmarking different factors may 21 00:00:56,401 --> 00:00:58,366 be taken into consideration such as 22 00:00:58,367 --> 00:01:00,833 processing time, memory usage, and disk 23 00:01:00,834 --> 00:01:04,166 access time. However, in this tutorial, we 24 00:01:04,167 --> 00:01:06,400 will be primarily focusing on processing 25 00:01:06,401 --> 00:01:09,600 time. Let us look at an example for 26 00:01:09,601 --> 00:01:11,766 understanding the benchmarking in Rust. 27 00:01:12,200 --> 00:01:14,366 Consider a scenario where we would like to 28 00:01:14,367 --> 00:01:16,800 compare two implementations of sorting 29 00:01:16,801 --> 00:01:21,300 elements of an array. In the lib.rs file, I 30 00:01:21,301 --> 00:01:23,300 have added these two implementations as 31 00:01:23,301 --> 00:01:25,500 functions. The first implementation is 32 00:01:25,501 --> 00:01:27,700 sorting algo_1, and the second 33 00:01:27,701 --> 00:01:30,866 implementation is sorting algo_2. These two 34 00:01:30,867 --> 00:01:32,700 functions are representing two different 35 00:01:32,701 --> 00:01:35,100 implementations of sorting algorithms. 36 00:01:35,533 --> 00:01:37,866 The input to both the functions are a mutable 37 00:01:37,867 --> 00:01:39,633 reference to a vector, and inside the 38 00:01:39,634 --> 00:01:42,300 function, these functions are sorting the 39 00:01:42,301 --> 00:01:45,233 respective vector elements. Now, let us see 40 00:01:45,234 --> 00:01:47,833 how we can use benchmarking in order to track 41 00:01:47,834 --> 00:01:50,000 performance and make sure that performance is 42 00:01:50,001 --> 00:01:52,833 not regressed. Rust actually has a built in 43 00:01:52,834 --> 00:01:55,500 benchmark tests. However, at this moment, 44 00:01:55,501 --> 00:01:58,466 they are unstable features and are therefore 45 00:01:58,467 --> 00:02:00,766 only available in nightly versions of Rust. 46 00:02:00,767 --> 00:02:03,000 So instead, we are going to use a library 47 00:02:03,001 --> 00:02:06,300 called criterion. To properly use the library, 48 00:02:06,301 --> 00:02:08,466 we will open the cargo.toml file. 49 00:02:09,765 --> 00:02:13,033 We will add Criterion as dev dependency. 50 00:02:13,034 --> 00:02:16,133 [No Audio] 51 00:02:16,134 --> 00:02:19,266 Next, we will configure our benchmark target, which is 52 00:02:19,267 --> 00:02:21,033 the name of the file that contains our 53 00:02:21,034 --> 00:02:23,533 benchmark tests. I will use the name of 54 00:02:23,534 --> 00:02:25,233 sorting_benchmark. 55 00:02:25,234 --> 00:02:28,333 [No Audio] 56 00:02:28,334 --> 00:02:30,633 Next we will set harness to false. 57 00:02:30,634 --> 00:02:33,933 [No Audio] 58 00:02:33,934 --> 00:02:35,066 This will disable the 59 00:02:35,067 --> 00:02:37,633 default benchmarking system, so we could use 60 00:02:37,634 --> 00:02:41,266 Criterion's provided benchmarking system. Next, 61 00:02:41,267 --> 00:02:43,500 we will create a folder called benches in the 62 00:02:43,501 --> 00:02:44,900 root of our project. 63 00:02:44,901 --> 00:02:47,066 [No Audio] 64 00:02:47,067 --> 00:02:48,233 Please make sure that the 65 00:02:48,234 --> 00:02:50,700 folder name must be benches, otherwise, 66 00:02:50,701 --> 00:02:53,100 the command for executing the benchmark may 67 00:02:53,101 --> 00:02:55,533 not be successful as cargo may not be able to 68 00:02:55,534 --> 00:02:57,766 find the relevant file for the benchmark. 69 00:02:58,500 --> 00:03:01,000 Inside that folder, we will create a new file 70 00:03:01,033 --> 00:03:04,466 called sorting_benchmark.rs. 71 00:03:04,467 --> 00:03:07,433 [No Audio] 72 00:03:07,434 --> 00:03:09,700 This name must be the same as the name that we 73 00:03:09,701 --> 00:03:11,900 mentioned in the cargo.toml file. 74 00:03:12,966 --> 00:03:17,366 Now we will add some code to the file. First let's 75 00:03:17,367 --> 00:03:20,133 let's bring our sought array functions into scope. 76 00:03:20,134 --> 00:03:22,433 [No Audio] 77 00:03:22,434 --> 00:03:24,766 We will also bring some items from the 78 00:03:24,767 --> 00:03:26,633 Criterion crate into scope. 79 00:03:26,634 --> 00:03:31,366 [No Audio] 80 00:03:31,367 --> 00:03:35,000 Next, we will set up our benchmarking test function. 81 00:03:35,001 --> 00:03:38,800 [No Audio] 82 00:03:38,801 --> 00:03:41,933 The benchmark function gets access to an instance 83 00:03:41,934 --> 00:03:44,400 of the Criterion struct, which allows us to 84 00:03:44,401 --> 00:03:47,133 configure and execute benchmarks. In this 85 00:03:47,134 --> 00:03:49,500 case, the variable c is a mutable reference 86 00:03:49,501 --> 00:03:52,800 of type Criterion. We will use it to create a 87 00:03:52,801 --> 00:03:55,300 new benchmark. Inside the function, I will 88 00:03:55,301 --> 00:03:58,100 first define a vector containing some numbers. 89 00:03:58,101 --> 00:04:02,333 [No Audio] 90 00:04:02,334 --> 00:04:04,233 The vector contains a list of 91 00:04:04,234 --> 00:04:06,800 unsorted numbers which will be sorted using 92 00:04:06,801 --> 00:04:09,300 the sorting algorithm, that we will be using 93 00:04:09,301 --> 00:04:11,933 for benchmarking, and now let's create our 94 00:04:11,934 --> 00:04:14,866 benchmark. To create a new benchmark we will 95 00:04:14,867 --> 00:04:16,366 call the benchmark function on the 96 00:04:16,367 --> 00:04:18,132 Criterion struct instance. 97 00:04:18,133 --> 00:04:20,600 [No Audio] 98 00:04:20,601 --> 00:04:22,333 This function takes two 99 00:04:22,334 --> 00:04:26,166 arguments, an id, which I will mention inside quotes. 100 00:04:26,167 --> 00:04:28,100 [No Audio] 101 00:04:28,101 --> 00:04:30,233 And the second argument of a closure. 102 00:04:31,333 --> 00:04:34,133 Inside the closure, we get access to an 103 00:04:34,134 --> 00:04:36,100 instance of Bencher which allows us to 104 00:04:36,101 --> 00:04:38,566 iterate a benchmark function to measure its 105 00:04:38,567 --> 00:04:41,166 performance. Inside the closure body we will 106 00:04:41,167 --> 00:04:42,666 call the iter method. 107 00:04:42,667 --> 00:04:46,800 [No Audio] 108 00:04:46,801 --> 00:04:48,166 This takes another closure 109 00:04:48,167 --> 00:04:50,266 that executes our sort function. 110 00:04:50,267 --> 00:04:52,500 [No Audio] 111 00:04:52,501 --> 00:04:55,466 The sort array function will execute many times 112 00:04:55,467 --> 00:04:57,300 for its performance to be accurately 113 00:04:57,301 --> 00:05:00,166 measured, and with that our benchmark is now 114 00:05:00,167 --> 00:05:04,333 complete. Next, we will use the criterion_group macro. 115 00:05:04,334 --> 00:05:07,033 [No Audio] 116 00:05:07,034 --> 00:05:09,000 This macro is used to define a 117 00:05:09,001 --> 00:05:11,166 collection of functions to call within a 118 00:05:11,167 --> 00:05:13,633 common Criterion configuration. The first 119 00:05:13,666 --> 00:05:16,100 argument is the name of the group, which in 120 00:05:16,101 --> 00:05:18,233 this case is benches and the remaining 121 00:05:18,266 --> 00:05:20,866 arguments or names of functions. Since we have 122 00:05:20,867 --> 00:05:24,100 only a single function, so we will mention that. 123 00:05:24,101 --> 00:05:26,766 [No Audio] 124 00:05:26,767 --> 00:05:28,933 Finally, we will need to use one macro 125 00:05:28,934 --> 00:05:30,900 called a criterion_main. 126 00:05:30,901 --> 00:05:34,933 [No Audio] 127 00:05:34,934 --> 00:05:36,433 This macro expands to 128 00:05:36,434 --> 00:05:38,733 a main function which runs all the benchmarks 129 00:05:38,734 --> 00:05:40,900 in a given group, in this case, the benches 130 00:05:40,901 --> 00:05:44,433 group. We are all set up, so let's go ahead and 131 00:05:44,434 --> 00:05:47,300 rent a benchmark tests. To execute, we will use 132 00:05:47,301 --> 00:05:49,066 the command of cargo bench. 133 00:05:49,067 --> 00:05:54,666 [No Audio] 134 00:05:54,667 --> 00:05:57,500 It provides information about 100 runs of the 135 00:05:57,501 --> 00:06:00,000 same program and compute their average 136 00:06:00,001 --> 00:06:02,400 running performance in terms of nanoseconds. 137 00:06:02,700 --> 00:06:04,666 In this case, the average running times turns 138 00:06:04,667 --> 00:06:08,200 out to be this much nanoseconds. It also 139 00:06:08,201 --> 00:06:11,000 shows in some information about the outliers, 140 00:06:11,033 --> 00:06:13,966 which out of the 100 runs were quite deviant 141 00:06:13,967 --> 00:06:17,066 in performance. Let us run the benchmark again. 142 00:06:17,067 --> 00:06:21,166 [No Audio] 143 00:06:21,167 --> 00:06:23,100 It says this time no change in 144 00:06:23,101 --> 00:06:25,433 performance is detected. This means that the 145 00:06:25,434 --> 00:06:27,600 performance is quite stable and does not 146 00:06:27,601 --> 00:06:30,533 change during different runs. Now, let us 147 00:06:30,534 --> 00:06:32,733 change the algorithm from algorithm 1 to 148 00:06:32,734 --> 00:06:35,433 date of algorithm 1 and see the performance again. 149 00:06:35,434 --> 00:06:41,800 [No Audio] 150 00:06:41,801 --> 00:06:44,533 The performance has regressed. This clearly 151 00:06:44,534 --> 00:06:46,833 indicates that the performance of algorithm 1 152 00:06:46,834 --> 00:06:49,433 is superior to that of algorithm 2. It 153 00:06:49,434 --> 00:06:51,533 also gives useful information about the 154 00:06:51,534 --> 00:06:54,166 percentage change in the performance. In this 155 00:06:54,167 --> 00:06:56,400 case, it is quite high, and therefore provides 156 00:06:56,433 --> 00:06:59,100 a clear indication to the implementers that 157 00:06:59,101 --> 00:07:01,466 they should prefer to use algorithm 1 over 158 00:07:01,467 --> 00:07:04,300 algorithm 2. That brings us to the end of 159 00:07:04,301 --> 00:07:06,733 this tutorial. See you again with more 160 00:07:06,734 --> 00:07:09,166 performance improvement tutorials and until 161 00:07:09,167 --> 00:07:10,966 then happy Rust programming.