1 00:00:06,630 --> 00:00:07,890 - In this section we're going to look 2 00:00:07,890 --> 00:00:09,750 at string handling in Rust. 3 00:00:09,750 --> 00:00:11,670 And it turns out to be a bit more complicated 4 00:00:11,670 --> 00:00:13,500 than you might imagine. 5 00:00:13,500 --> 00:00:15,060 It turns out there are two different kinds 6 00:00:15,060 --> 00:00:19,369 of string in Rust, and the first type is a string literal, 7 00:00:19,369 --> 00:00:22,140 where you just literally have some text enclosed 8 00:00:22,140 --> 00:00:24,900 inside double quotes, like that, 9 00:00:24,900 --> 00:00:26,940 that's a string literal. 10 00:00:26,940 --> 00:00:29,460 String literals are statically allocated. 11 00:00:29,460 --> 00:00:33,210 The text is part of your program forever 12 00:00:33,210 --> 00:00:34,230 but you can't change it. 13 00:00:34,230 --> 00:00:37,680 It's immutable, so it's not very flexible. 14 00:00:37,680 --> 00:00:40,500 If you want flexibility and potentially to change 15 00:00:40,500 --> 00:00:43,800 or resize a string, then you can create an instance 16 00:00:43,800 --> 00:00:45,570 of the String type. 17 00:00:45,570 --> 00:00:49,860 String is a structure in Rust and represents 18 00:00:49,860 --> 00:00:51,450 dynamically allocated memory. 19 00:00:51,450 --> 00:00:53,600 The text is actually allocated on the heap. 20 00:00:55,050 --> 00:00:57,000 You can potentially change it 21 00:00:57,000 --> 00:01:00,120 if you declare your string object as mutable 22 00:01:00,120 --> 00:01:03,900 and when your string object disappears, 23 00:01:03,900 --> 00:01:05,223 when it goes outta scope, 24 00:01:06,207 --> 00:01:08,520 then the text is dynamically de-allocated. 25 00:01:08,520 --> 00:01:11,760 So this string structure in Rust 26 00:01:11,760 --> 00:01:16,760 is kind of similar to the string class in C++.
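As a rough sketch of the two kinds of string described so far: a literal is fixed text built into the program, while a `String` owns a heap buffer that can grow when the variable is declared `mut`. The helper `build_greeting` and the variable names are made up for illustration and are not taken from the course code.

```rust
// Hypothetical helper: builds a heap-allocated String and appends to it.
fn build_greeting(name: &str) -> String {
    // String::from copies the literal's bytes into a new heap buffer.
    let mut text = String::from("hello");
    // Appending is allowed only because `text` is declared `mut`.
    text.push_str(", ");
    text.push_str(name);
    text
}

fn main() {
    // A string literal: statically allocated and immutable.
    let literal = "hello";
    println!("{}", literal);

    {
        let greeting = build_greeting("world");
        println!("{}", greeting);
    } // `greeting` goes out of scope here, so its heap buffer is freed.
}
```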
27 00:01:16,890 --> 00:01:20,520 When you create an instance of the string class, 28 00:01:20,520 --> 00:01:22,140 when you create a string object, 29 00:01:22,140 --> 00:01:26,260 the memory is allocated on the heap like so 30 00:01:27,330 --> 00:01:30,300 and you can potentially change it if you want to 31 00:01:30,300 --> 00:01:33,090 as long as you declared the string as mutable. 32 00:01:33,090 --> 00:01:36,190 You could, for example, append something at the end 33 00:01:37,110 --> 00:01:41,227 like this, my name and the O like that. 34 00:01:41,227 --> 00:01:42,450 And at the end of the lifetime 35 00:01:42,450 --> 00:01:44,580 when the string object is destroyed, 36 00:01:44,580 --> 00:01:47,730 it automatically de-allocates its memory like that. 37 00:01:47,730 --> 00:01:50,309 Effectively there's an equivalent of a destructor 38 00:01:50,309 --> 00:01:54,150 in the string class to de-allocate its buffer. 39 00:01:54,150 --> 00:01:57,630 So we need to look at both of these two types separately. 40 00:01:57,630 --> 00:01:59,423 String literals first. 41 00:02:00,469 --> 00:02:02,970 So a string literal is literally text 42 00:02:02,970 --> 00:02:05,580 enclosed in double quotes, okay? 43 00:02:05,580 --> 00:02:07,653 So that seems quite straightforward. 44 00:02:08,490 --> 00:02:11,910 You might be wondering what type is s 45 00:02:11,910 --> 00:02:15,445 and it turns out to be a bit complicated. 46 00:02:15,445 --> 00:02:18,900 When you declare a variable to hold a string literal, 47 00:02:18,900 --> 00:02:23,900 the variable is a string slice, okay? 48 00:02:24,090 --> 00:02:29,090 So str is a primitive type in Rust and the ampersand, 49 00:02:30,390 --> 00:02:32,850 when you say ampersand str, 50 00:02:32,850 --> 00:02:34,850 when you look at that bit of the syntax, 51 00:02:36,060 --> 00:02:38,130 I'll explain the static bit in a minute.
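The declaration being discussed is on a slide that the transcript doesn't reproduce, so here is a guess at its shape. The function names `inferred` and `explicit` are hypothetical; the point is only that both spellings give a variable of the same type.

```rust
// Type inference: the compiler works out that `s` is a &'static str.
fn inferred() -> &'static str {
    let s = "hello";
    s
}

// Explicit typing: the same type written out in full.
fn explicit() -> &'static str {
    let s: &'static str = "hello";
    s
}

fn main() {
    // Both variables hold a string slice referring to the same text.
    println!("{} {}", inferred(), explicit());
}
```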
52 00:02:38,130 --> 00:02:40,477 When you see ampersand str, 53 00:02:40,477 --> 00:02:43,770 basically that's called a string slice. 54 00:02:43,770 --> 00:02:47,310 A string slice is sometimes called a fat pointer. 55 00:02:47,310 --> 00:02:49,710 Underneath the surface, the text 56 00:02:49,710 --> 00:02:53,820 that we're trying to refer to is static storage 57 00:02:53,820 --> 00:02:56,850 allocated alongside the code segment. 58 00:02:56,850 --> 00:02:59,580 And when you declare a string slice like I've done, 59 00:02:59,580 --> 00:03:03,930 either using type inference or when you use explicit typing, 60 00:03:03,930 --> 00:03:08,523 effectively what you get, s, is a string slice. 61 00:03:10,260 --> 00:03:14,860 It has a pointer to the first byte in the text 62 00:03:16,200 --> 00:03:20,160 and it has a count to tell you how big the string is. 63 00:03:20,160 --> 00:03:22,500 There are five characters in hello. 64 00:03:22,500 --> 00:03:26,010 Okay, there's no null terminator at the end of strings 65 00:03:26,010 --> 00:03:28,620 in Rust, they just kind of stop. 66 00:03:28,620 --> 00:03:32,493 So a string slice is a fat pointer. 67 00:03:33,420 --> 00:03:35,700 It points to the first byte 68 00:03:35,700 --> 00:03:38,040 and it knows how many bytes there are. 69 00:03:38,040 --> 00:03:43,040 This bit of syntax here, the quote and the static, 70 00:03:45,300 --> 00:03:47,940 it's called a lifetime specifier. 71 00:03:47,940 --> 00:03:51,360 Rust is very careful about only referring to things 72 00:03:51,360 --> 00:03:53,820 that it knows are still allocated. 73 00:03:53,820 --> 00:03:55,410 You can think of a string slice 74 00:03:55,410 --> 00:03:59,400 as being a bit like a pointer into memory 75 00:03:59,400 --> 00:04:03,270 and Rust needs to know that the memory isn't gonna disappear 76 00:04:03,270 --> 00:04:05,550 before the pointer does.
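One way to see the fat pointer idea for yourself is to measure a `&str`. This sketch assumes the usual layout on current compilers, where a string slice is a pointer plus a length and so takes up two machine words; the helper name `slice_width_in_words` is made up for illustration.

```rust
use std::mem::size_of;

// Width of a string slice measured in machine words (usize-sized units).
fn slice_width_in_words() -> usize {
    size_of::<&str>() / size_of::<usize>()
}

fn main() {
    let s = "hello";
    println!("a &str occupies {} machine words", slice_width_in_words());

    // The bytes behind the slice: five of them, with no trailing zero byte.
    println!("{:?}", s.as_bytes());
}
```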
77 00:04:05,550 --> 00:04:09,840 Imagine that this storage was dynamically allocated 78 00:04:09,840 --> 00:04:13,500 then you'd be left with a dangling pointer. 79 00:04:13,500 --> 00:04:18,500 So this syntax basically says it is a string slice 80 00:04:18,990 --> 00:04:21,720 to static storage, okay? 81 00:04:21,720 --> 00:04:24,030 Meaning that the memory that you're pointing to 82 00:04:24,030 --> 00:04:27,480 is statically allocated and will always be there. 83 00:04:27,480 --> 00:04:30,240 You can always guarantee that the underlying text 84 00:04:30,240 --> 00:04:31,440 will be there, 85 00:04:31,440 --> 00:04:34,110 so that means you can always rely on this slice 86 00:04:34,110 --> 00:04:37,113 always pointing to a valid address in memory. 87 00:04:38,010 --> 00:04:38,843 So there we are. 88 00:04:38,843 --> 00:04:41,010 I did say it was gonna be a bit complicated, didn't I? 89 00:04:41,010 --> 00:04:43,080 So anyway, in essence, 90 00:04:43,080 --> 00:04:47,310 the ampersand str syntax is a string slice. 91 00:04:47,310 --> 00:04:49,800 It contains the pointer to the text 92 00:04:49,800 --> 00:04:53,490 and it knows how many bytes there are in the text, okay? 93 00:04:53,490 --> 00:04:57,150 So basically the type of a string literal 94 00:04:57,150 --> 00:05:00,903 in Rust is a static string slice. 95 00:05:02,460 --> 00:05:05,460 Right, so a string slice knows the address 96 00:05:05,460 --> 00:05:08,340 of the first byte and it knows the number of bytes. 97 00:05:08,340 --> 00:05:10,980 And it turns out if you looked at the documentation 98 00:05:10,980 --> 00:05:15,980 for str online, you'd find out that the str type 99 00:05:16,650 --> 00:05:20,610 has an as_ptr method. 100 00:05:20,610 --> 00:05:23,250 It'll basically give you the pointer, 101 00:05:23,250 --> 00:05:25,953 it'll give you the address of the first byte.
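To make the static lifetime a bit more concrete, here is a small sketch contrasting a slice of static text with a slice borrowed from heap storage. The functions `default_name` and `first_word` are hypothetical and only there to show the two lifetimes side by side.

```rust
// Returning a literal is fine: the text lives in static storage
// for the whole run of the program.
fn default_name() -> &'static str {
    "hello"
}

// A slice borrowed from other text is only valid while that text is alive,
// so its lifetime is tied to the argument rather than being 'static.
fn first_word(text: &str) -> &str {
    text.split(' ').next().unwrap_or("")
}

fn main() {
    let s: &'static str = default_name();
    println!("{}", s);

    let owned = String::from("hello world");
    let word = first_word(&owned);
    println!("{}", word);
    // Dropping `owned` while `word` is still in use would be rejected
    // by the compiler, which is how dangling slices are prevented.
}
```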
102 00:05:27,060 --> 00:05:29,246 It also has a len method, 103 00:05:29,246 --> 00:05:34,246 which gives you the length in bytes of the string like so. 104 00:05:35,640 --> 00:05:37,500 So have a look at this bit of code 105 00:05:37,500 --> 00:05:40,440 in my bottom yellow box here. 106 00:05:40,440 --> 00:05:41,400 You can output. 107 00:05:41,400 --> 00:05:45,210 If you output a string, a string slice I should say, 108 00:05:45,210 --> 00:05:46,743 it outputs hello. 109 00:05:48,030 --> 00:05:52,350 If you call the as_ptr function, 110 00:05:52,350 --> 00:05:56,190 it gives you back the address. 111 00:05:56,190 --> 00:05:59,910 You've gotta print that address using the print, sorry, 112 00:05:59,910 --> 00:06:02,040 the pointer formatter. 113 00:06:02,040 --> 00:06:06,990 The colon p syntax means print out an address as a pointer. 114 00:06:06,990 --> 00:06:11,370 It'll actually print out an address as a pointer. 115 00:06:11,370 --> 00:06:14,280 And then finally I print out the length of the string. 116 00:06:14,280 --> 00:06:16,590 The length of the string here would be five. 117 00:06:16,590 --> 00:06:17,940 I'm gonna show you an example of this 118 00:06:17,940 --> 00:06:19,203 before we go any further. 119 00:06:20,070 --> 00:06:24,540 Here's the code example, lesson six, scope ownership. 120 00:06:24,540 --> 00:06:27,240 In the main code, I've got a function call 121 00:06:27,240 --> 00:06:28,830 demo string handling. 122 00:06:28,830 --> 00:06:30,660 I've commented it. 123 00:06:30,660 --> 00:06:35,660 String handling is here and in this example 124 00:06:37,560 --> 00:06:40,200 I've got three separate parts to the demo. 125 00:06:40,200 --> 00:06:45,200 A demo of string literals, first of all, and I'm gonna call, 126 00:06:45,660 --> 00:06:47,040 I've got two other functions here 127 00:06:47,040 --> 00:06:51,150 using string objects and using mutable string objects. 128 00:06:51,150 --> 00:06:52,560 We'll come to those later.
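The yellow-box code isn't captured in the transcript, so the following is a reconstruction under assumptions rather than the original. It uses the real `as_ptr` and `len` methods on `str` and the `{:p}` pointer formatter; the helper `describe` and the output wording are invented for illustration.

```rust
// Formats the three things printed in the walkthrough: the text itself,
// the address of its first byte, and its length in bytes.
fn describe(s: &str) -> String {
    format!("s={}, ptr={:p}, len={}", s, s.as_ptr(), s.len())
}

fn main() {
    let s = "hello";
    // The address varies between builds and runs; the length is 5.
    println!("{}", describe(s));
}
```

Note that `len` counts bytes rather than characters, so text containing non-ASCII characters reports a length larger than its character count.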
129 00:06:52,560 --> 00:06:54,900 We'll just have a look at string literals first 130 00:06:54,900 --> 00:06:57,963 and I'm gonna comment out these other two statements. 131 00:06:59,130 --> 00:07:03,150 So here is a string literal using type inference. 132 00:07:03,150 --> 00:07:04,890 Here is another string literal 133 00:07:04,890 --> 00:07:09,300 using explicit typing if you want to. 134 00:07:09,300 --> 00:07:12,300 And here I output string one. 135 00:07:12,300 --> 00:07:15,900 I get the address of the text of the slice. 136 00:07:15,900 --> 00:07:20,460 Basically that's the address of the letter H as a pointer. 137 00:07:20,460 --> 00:07:24,390 And I get the length of s1, that's going to be five. 138 00:07:24,390 --> 00:07:26,190 And then I do the same thing for s2, 139 00:07:27,210 --> 00:07:28,810 to print out the contents of s2, 140 00:07:29,790 --> 00:07:34,790 the address where the string starts in memory, as_ptr, 141 00:07:35,250 --> 00:07:38,430 and then the length of the string s2 dot len. 142 00:07:38,430 --> 00:07:39,783 Okay, so let's run this. 143 00:07:44,820 --> 00:07:45,903 Cargo run, 144 00:07:49,350 --> 00:07:53,550 so it outputs s1 is hello. 145 00:07:53,550 --> 00:07:55,140 Okay, so that's just basically printed out 146 00:07:55,140 --> 00:07:57,780 the string slice as it is. 147 00:07:57,780 --> 00:08:01,390 Get the address of the first byte in s1 148 00:08:02,340 --> 00:08:04,740 and gimme the length of s1 in bytes. 149 00:08:04,740 --> 00:08:09,460 And then the same thing for the second string slice, s2. 150 00:08:09,460 --> 00:08:12,360 Okay, so basically when you have a string literal, 151 00:08:12,360 --> 00:08:14,730 technically it is a string slice. 152 00:08:14,730 --> 00:08:16,890 It knows the address and the length 153 00:08:16,890 --> 00:08:18,590 of the text that it's pointing to.