1 00:00:06,660 --> 00:00:07,920 - In the previous section, 2 00:00:07,920 --> 00:00:10,050 we started looking at string slices. 3 00:00:10,050 --> 00:00:12,540 As we know, a string slice refers to some text 4 00:00:12,540 --> 00:00:14,340 you've allocated elsewhere. 5 00:00:14,340 --> 00:00:15,750 We're gonna look at some techniques 6 00:00:15,750 --> 00:00:18,300 with string slices now in this section. 7 00:00:18,300 --> 00:00:21,600 So when you create a string slice into some text, 8 00:00:21,600 --> 00:00:24,780 the string slice contains two pieces of information. 9 00:00:24,780 --> 00:00:28,397 It contains the address of one of the bytes in the text, 10 00:00:28,397 --> 00:00:32,370 and it also knows the length of the slice in bytes, 11 00:00:32,370 --> 00:00:33,720 and it's quite useful. 12 00:00:33,720 --> 00:00:35,490 You can get this information. 13 00:00:35,490 --> 00:00:38,557 Like here, I've got a string literal, 14 00:00:38,557 --> 00:00:41,160 "hello" with a smiley face, 15 00:00:41,160 --> 00:00:43,987 and so let me just draw that string literal here, 16 00:00:43,987 --> 00:00:48,150 "hello" smiley face. 17 00:00:48,150 --> 00:00:50,640 Our s is a string slice. 18 00:00:50,640 --> 00:00:53,640 So a string slice, remember, is a fat pointer. 19 00:00:53,640 --> 00:00:56,790 It has a pointer to a byte, 20 00:00:56,790 --> 00:00:58,650 and then it has the length of the string. 21 00:00:58,650 --> 00:01:02,040 Let's call it n like that, the length of the slice, 22 00:01:02,040 --> 00:01:03,840 and when you've got a string slice, 23 00:01:03,840 --> 00:01:06,510 there's an as_ptr function, as pointer. 24 00:01:06,510 --> 00:01:08,880 It'll give you the address that it's pointing to, 25 00:01:08,880 --> 00:01:10,980 and then you can print that using the print formatter, 26 00:01:10,980 --> 00:01:13,200 the pointer formatter, okay? 27 00:01:13,200 --> 00:01:15,150 So it'll give you the address of the first byte. 
28 00:01:15,150 --> 00:01:17,340 You can also get the length of the slice. 29 00:01:17,340 --> 00:01:20,610 How big is this slice in bytes? 30 00:01:20,610 --> 00:01:22,770 And also, of course, you can print out the slice 31 00:01:22,770 --> 00:01:24,690 as it stands, like I've done here. 32 00:01:24,690 --> 00:01:26,070 So I'm gonna run lots of demos. 33 00:01:26,070 --> 00:01:28,140 I'm gonna run four separate examples in this section. 34 00:01:28,140 --> 00:01:30,810 So let's have a look at this one first. 35 00:01:30,810 --> 00:01:35,100 So here's the code for the chapter, lesson07_borrowing, 36 00:01:35,100 --> 00:01:39,450 and in main, I'm gonna uncomment this statement here, 37 00:01:39,450 --> 00:01:41,970 demo_string_slice_techniques. 38 00:01:41,970 --> 00:01:44,013 Okay, and that's here. 39 00:01:45,000 --> 00:01:46,020 I've got four functions. 40 00:01:46,020 --> 00:01:48,060 We're going to look at basic slice usage, 41 00:01:48,060 --> 00:01:50,580 and then some other techniques later on. 42 00:01:50,580 --> 00:01:54,120 So I'll just comment out the other functions just for now. 43 00:01:54,120 --> 00:01:56,100 We'll come and look at those later, 44 00:01:56,100 --> 00:01:59,223 and let's have a look at the slice_usage function. 45 00:02:00,390 --> 00:02:04,350 Okay, so I create a string slice, s1, 46 00:02:04,350 --> 00:02:06,270 into that literal text. 47 00:02:06,270 --> 00:02:08,910 I could have also created an actual String object. 48 00:02:08,910 --> 00:02:10,620 This would've worked equally well 49 00:02:10,620 --> 00:02:14,100 for an actual String object, not just a slice. 50 00:02:14,100 --> 00:02:18,570 So I can get the address of the first byte in the text 51 00:02:18,570 --> 00:02:20,520 and display it as a pointer. 52 00:02:20,520 --> 00:02:22,380 I can get the length of my slice 53 00:02:22,380 --> 00:02:25,290 in bytes and display it there, 54 00:02:25,290 --> 00:02:28,590 and I can display the whole text for the string as well. 
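As a rough sketch of the basic slice usage just described, the snippet below reads the address with as_ptr, the byte length with len, and then prints the text. The helper name slice_info and the exact emoji are assumptions made for illustration; the lesson's own demo code may look different.

```rust
/// Hypothetical helper: returns the start address and the byte length of a string slice.
fn slice_info(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    // A string literal is a string slice into text stored in the program itself.
    // The exact emoji is an assumption; any four-byte emoji behaves the same way.
    let s1: &str = "hello😀";

    let (ptr, len) = slice_info(s1);
    println!("address: {:p}", ptr); // {:p} is the pointer formatter
    println!("length:  {} bytes", len); // len() counts bytes, not characters
    println!("text:    {}", s1);

    // The same calls are available on an owned String.
    let owned = String::from(s1);
    println!("String length: {} bytes", owned.len());
}
```

The address will differ from run to run and from machine to machine, so only the length and the text are stable output.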
55 00:02:28,590 --> 00:02:30,210 So if I run this, 56 00:02:30,210 --> 00:02:32,550 I don't think there's gonna be any great surprises here. 57 00:02:32,550 --> 00:02:35,223 Let's just see. Well, I say that. 58 00:02:36,600 --> 00:02:39,540 It outputs the pointer. That's interesting. 59 00:02:39,540 --> 00:02:41,370 Notice though that the length is nine. 60 00:02:41,370 --> 00:02:43,420 Now, you probably weren't expecting that. 61 00:02:44,880 --> 00:02:46,590 We'll talk about how this works in a moment. 62 00:02:46,590 --> 00:02:50,910 It actually required nine bytes to represent that string, 63 00:02:50,910 --> 00:02:54,840 not six like you might imagine, but nine, 64 00:02:54,840 --> 00:02:58,920 and then it output the text hello smiley face. 65 00:02:58,920 --> 00:03:02,790 So we need to understand how strings actually work 66 00:03:02,790 --> 00:03:04,950 in terms of encoding. 67 00:03:04,950 --> 00:03:08,940 So in Rust, strings are encoded as UTF-8. 68 00:03:08,940 --> 00:03:12,210 In other words, the text of a string is stored 69 00:03:12,210 --> 00:03:16,860 as a sequence of unsigned eight-bit integers, okay? 70 00:03:16,860 --> 00:03:21,860 And some characters can be simply represented in one byte, 71 00:03:22,440 --> 00:03:26,760 like the letter h, which only requires one byte to represent it, 72 00:03:26,760 --> 00:03:29,700 but more complex characters like a smiley face 73 00:03:29,700 --> 00:03:31,110 might require more bytes. 74 00:03:31,110 --> 00:03:36,030 So the smiley face requires four bytes to represent it. 75 00:03:36,030 --> 00:03:39,780 So that means when you iterate over a string, 76 00:03:39,780 --> 00:03:42,060 you've got two different ways you can iterate. 77 00:03:42,060 --> 00:03:44,123 You can iterate over the raw bytes if you want to. 78 00:03:44,123 --> 00:03:48,510 You can say, what's the byte? What's the value of each byte?
79 00:03:48,510 --> 00:03:50,400 Or you can iterate over the characters, 80 00:03:50,400 --> 00:03:52,590 which might be a bit more useful. 81 00:03:52,590 --> 00:03:54,000 So I'm gonna show you an example 82 00:03:54,000 --> 00:03:56,340 of both of those types of iteration. 83 00:03:56,340 --> 00:03:59,250 First of all, have a look here. 84 00:03:59,250 --> 00:04:03,960 S is my string slice, "hello" smiley face, 85 00:04:03,960 --> 00:04:07,710 and the string slice has a bytes method, 86 00:04:07,710 --> 00:04:10,020 and it'll give you each individual byte. 87 00:04:10,020 --> 00:04:12,750 So this loop is actually going to iterate nine times. 88 00:04:12,750 --> 00:04:16,140 It'll iterate over the simple characters, 89 00:04:16,140 --> 00:04:19,410 and it'll iterate four times over the smiley face 90 00:04:19,410 --> 00:04:22,860 because that smiley face is actually four bytes internally, 91 00:04:22,860 --> 00:04:25,800 and it'll output each byte individually. 92 00:04:25,800 --> 00:04:26,940 That might be useful. 93 00:04:26,940 --> 00:04:29,430 Accessing bytes might be useful if you're interacting 94 00:04:29,430 --> 00:04:31,380 with some kind of low-level API, 95 00:04:31,380 --> 00:04:33,960 might actually want to call some kind of C API 96 00:04:33,960 --> 00:04:36,720 where you need to access individual bytes. 97 00:04:36,720 --> 00:04:38,700 This might be the way to do it, 98 00:04:38,700 --> 00:04:41,130 but from a more high-level point of view, 99 00:04:41,130 --> 00:04:42,900 it's probably easier if you iterate 100 00:04:42,900 --> 00:04:44,730 over the actual characters. 101 00:04:44,730 --> 00:04:47,460 So here, if I iterate over the characters, 102 00:04:47,460 --> 00:04:49,080 there's a .chars method. 103 00:04:49,080 --> 00:04:51,690 It'll give you back a whole character at a time. 
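The two styles of iteration can be sketched roughly like this. The helper byte_and_char_counts and the choice of emoji are assumptions for illustration only, and printing each byte in decimal, hexadecimal and octal is just one possible way to display it.

```rust
/// Hypothetical helper: counts the bytes and the characters in a string slice.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.bytes().count(), s.chars().count())
}

fn main() {
    // Assumed text: five one-byte letters followed by one four-byte emoji.
    let s = "hello😀";

    // bytes() yields each u8 of the UTF-8 encoding, so the emoji shows up as four values.
    for b in s.bytes() {
        println!("{:>3}  0x{:02x}  0o{:03o}", b, b, b);
    }

    // chars() yields whole characters, so the emoji comes back as a single item.
    for c in s.chars() {
        println!("{} takes {} byte(s) in UTF-8", c, c.len_utf8());
    }

    let (bytes, chars) = byte_and_char_counts(s);
    println!("{} bytes, {} characters", bytes, chars);
}
```

Both loops work the same way on an owned String, because bytes and chars are methods of str and a String dereferences to str.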
104 00:04:51,690 --> 00:04:54,300 So this will give me back an h, then an e, 105 00:04:54,300 --> 00:04:57,060 then the l, then the other l, then the o, 106 00:04:57,060 --> 00:04:59,940 and then the smiley face will come back as a single character. 107 00:04:59,940 --> 00:05:02,700 So this is going to iterate six times. 108 00:05:02,700 --> 00:05:06,390 This smiley face will be one character. 109 00:05:06,390 --> 00:05:09,810 So let's see an example of both of these techniques. 110 00:05:09,810 --> 00:05:11,460 Here we are back in the code. 111 00:05:11,460 --> 00:05:15,580 I'm going to uncomment the second function, slice_iteration, 112 00:05:16,530 --> 00:05:20,880 and here it is, slice_iteration. 113 00:05:20,880 --> 00:05:24,573 I'm gonna run it so we can then see 114 00:05:25,440 --> 00:05:26,790 what the output looks like. 115 00:05:28,290 --> 00:05:31,770 So first of all, here's my string slice. 116 00:05:31,770 --> 00:05:35,610 It would work just as well with actual String objects. 117 00:05:35,610 --> 00:05:39,660 So you can take your string slice or the String object 118 00:05:39,660 --> 00:05:41,820 and treat it as bytes. 119 00:05:41,820 --> 00:05:43,800 That's gonna iterate nine times. 120 00:05:43,800 --> 00:05:48,360 There were nine bytes in this slice, as we saw just now. 121 00:05:48,360 --> 00:05:50,010 What I've done is, for each byte, 122 00:05:50,010 --> 00:05:53,673 I output the value in decimal, hexadecimal, and octal. 123 00:05:54,750 --> 00:05:57,690 So the same byte appears in all three formats. 124 00:05:57,690 --> 00:06:01,127 So that's the letter h in decimal, 125 00:06:02,233 --> 00:06:04,680 hexadecimal, and octal, 126 00:06:04,680 --> 00:06:09,480 and this is the letter e in decimal, hexadecimal, octal. 127 00:06:09,480 --> 00:06:12,720 That's the letter l. That's the other letter l.
128 00:06:12,720 --> 00:06:16,650 That's the letter o, and then these last four values 129 00:06:16,650 --> 00:06:19,680 are the four bytes that constitute the smiley face 130 00:06:19,680 --> 00:06:21,243 as individual bytes. 131 00:06:22,440 --> 00:06:24,300 Okay, or alternatively, 132 00:06:24,300 --> 00:06:26,790 you can iterate over the actual characters themselves, 133 00:06:26,790 --> 00:06:29,160 which is kind of more user friendly. 134 00:06:29,160 --> 00:06:32,850 So in that case, we get back the whole character. 135 00:06:32,850 --> 00:06:36,510 That's four bytes' worth, but it's one logical character. 136 00:06:36,510 --> 00:06:39,543 Rust understands the UTF-8 encoding. 137 00:06:40,710 --> 00:06:44,670 All right, so that was actually quite useful to discuss. 138 00:06:44,670 --> 00:06:46,350 The next thing we're going to look at is 139 00:06:46,350 --> 00:06:48,393 how you can get back part of a string. 140 00:06:49,500 --> 00:06:52,170 Okay, so let's discuss the theory for that. 141 00:06:52,170 --> 00:06:54,330 This is a bit like what you can do in Python. 142 00:06:54,330 --> 00:06:56,820 You can actually get back not the whole text 143 00:06:56,820 --> 00:06:58,830 but part of the text. 144 00:06:58,830 --> 00:07:02,280 You can specify the start index and the end index 145 00:07:02,280 --> 00:07:04,440 that you want to get back as byte positions, 146 00:07:04,440 --> 00:07:07,290 not as character positions, but as byte positions. 147 00:07:07,290 --> 00:07:08,700 So it's quite low level. 148 00:07:08,700 --> 00:07:11,400 What you do is you specify the start index 149 00:07:11,400 --> 00:07:14,730 as an inclusive number, and the default is zero. 150 00:07:14,730 --> 00:07:17,340 The end index is exclusive, 151 00:07:17,340 --> 00:07:19,650 and it defaults to the end of the string. 152 00:07:19,650 --> 00:07:22,893 The syntax looks like this. You have some kind of string.
153 00:07:23,730 --> 00:07:26,370 You can specify there's my string message. 154 00:07:26,370 --> 00:07:28,050 This is a string slice. 155 00:07:28,050 --> 00:07:30,840 You can say I want the start index 156 00:07:30,840 --> 00:07:32,763 and whatever number that might be, 157 00:07:32,763 --> 00:07:35,490 .., and then some end index, 158 00:07:35,490 --> 00:07:38,010 and what you do is you basically take an & there. 159 00:07:38,010 --> 00:07:40,860 This means gimme back a slice of the text 160 00:07:40,860 --> 00:07:43,680 starting at this position and ending at that position, 161 00:07:43,680 --> 00:07:46,260 or rather, one before that position 162 00:07:46,260 --> 00:07:48,363 because the end index is exclusive. 163 00:07:49,200 --> 00:07:51,950 So I'm gonna show you some examples of that. Let's see. 164 00:07:53,593 --> 00:07:55,343 Slice_part_of_string. 165 00:07:57,060 --> 00:08:01,320 So let me let you see what the code looks like, 166 00:08:01,320 --> 00:08:02,940 and while you're looking at that, 167 00:08:02,940 --> 00:08:04,590 I'm just going to do a cargo run. 168 00:08:09,630 --> 00:08:13,140 Okay then, so my string is howdy. 169 00:08:13,140 --> 00:08:17,010 I think that's going to be nine bytes, five for the letters, 170 00:08:17,010 --> 00:08:20,253 and then four bytes for the sunglass man. 171 00:08:21,090 --> 00:08:25,200 So here I take a slice starting at position zero 172 00:08:25,200 --> 00:08:27,780 up to but not including three. 173 00:08:27,780 --> 00:08:31,260 So that's going to be position zero, one, two. 174 00:08:31,260 --> 00:08:33,000 That's going to be "how." 175 00:08:33,000 --> 00:08:34,950 That slice is what I'm gonna get back. 176 00:08:34,950 --> 00:08:37,770 S3 is going to be that slice there. 177 00:08:37,770 --> 00:08:42,770 I output for s3 the pointer where it starts. 178 00:08:44,640 --> 00:08:46,950 I output the length of the slice. 179 00:08:46,950 --> 00:08:49,020 Okay, the length of the slice there. 
180 00:08:49,020 --> 00:08:50,070 Well, the length of the slice 181 00:08:50,070 --> 00:08:52,290 is going to be three, isn't it? 182 00:08:52,290 --> 00:08:54,030 Zero, one and two. 183 00:08:54,030 --> 00:08:55,890 It has a length of three, 184 00:08:55,890 --> 00:08:58,470 and the text that it contains in this slice 185 00:08:58,470 --> 00:09:01,590 from bytes zero to two, really, 186 00:09:01,590 --> 00:09:03,930 is going to be zero, one, two. 187 00:09:03,930 --> 00:09:07,110 That's the text that's going to be in this slice. 188 00:09:07,110 --> 00:09:09,690 So s3 just gives me that piece of text. 189 00:09:09,690 --> 00:09:12,330 The whole text was "howdy" smiley face, 190 00:09:12,330 --> 00:09:15,303 but this slice is only looking at part of it. 191 00:09:17,010 --> 00:09:19,050 S4 is going to be the same, actually. 192 00:09:19,050 --> 00:09:21,180 If you don't specify the start index, 193 00:09:21,180 --> 00:09:22,800 then it defaults to zero. 194 00:09:22,800 --> 00:09:24,420 So that would actually be the same. 195 00:09:24,420 --> 00:09:26,760 S4 would be the same as s3. 196 00:09:26,760 --> 00:09:29,850 S4 is also "how," 197 00:09:29,850 --> 00:09:31,770 and then here I'm getting 198 00:09:31,770 --> 00:09:35,790 from byte position two up to four, really, 199 00:09:35,790 --> 00:09:38,850 so zero, one, two. 200 00:09:38,850 --> 00:09:43,140 So it'll be two, three, four. It'll be "wdy." 201 00:09:43,140 --> 00:09:46,072 That's also got a length of three bytes. 202 00:09:46,072 --> 00:09:49,410 Okay, so a length of three, "wdy," 203 00:09:49,410 --> 00:09:52,380 and then finally, I get a slice that starts at position two 204 00:09:52,380 --> 00:09:53,790 and goes to the end. 205 00:09:53,790 --> 00:09:57,720 Position two is the w, and it goes to the end. 206 00:09:57,720 --> 00:10:00,780 So that's going to be seven bytes, isn't it? 207 00:10:00,780 --> 00:10:02,430 Because that's four bytes. 
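The byte-range slices walked through in this part of the demo can be sketched as follows. The helper name parts, the variable names other than s3 and s4, and the exact emoji are assumptions for illustration. The last lines add one detail that follows from the positions being byte positions: a range that cuts through the middle of a multi-byte character is not a valid slice, so indexing with it panics, while get returns None instead.

```rust
/// Hypothetical helper: takes the four slices discussed in the demo.
/// It assumes the text is at least five bytes long and that bytes 2, 3 and 5
/// are character boundaries, which holds for the sample text used below.
fn parts(s: &str) -> [&str; 4] {
    [
        &s[0..3], // bytes 0, 1, 2 (the end index is exclusive)
        &s[..3],  // the start index defaults to 0, so this is the same slice
        &s[2..5], // bytes 2, 3, 4
        &s[2..],  // the end index defaults to the end of the text
    ]
}

fn main() {
    // Assumed text: five one-byte letters followed by one four-byte emoji.
    let s = "howdy😎";

    for part in parts(s) {
        println!("{:p}  len {}  {}", part.as_ptr(), part.len(), part);
    }

    // Bytes 5 to 8 hold the emoji. A range starting at byte 6 would split it,
    // so &s[6..8] would panic. get() reports the problem as None instead.
    println!("{:?}", s.get(6..8));
    println!("{:?}", s.get(5..9));
}
```

Each slice borrows from the original text, so the printed addresses all fall inside the same nine bytes; nothing is copied.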
208 00:10:02,430 --> 00:10:06,210 The smiley face is four bytes, and plus those three, 209 00:10:06,210 --> 00:10:08,310 that's gonna be seven bytes altogether. 210 00:10:08,310 --> 00:10:09,930 That's the length of that slice, 211 00:10:09,930 --> 00:10:14,310 and the text of that slice is "wdy" 212 00:10:14,310 --> 00:10:17,250 man with sunglasses or person sunglasses. 213 00:10:17,250 --> 00:10:19,230 So that's really the point of string slices. 214 00:10:19,230 --> 00:10:20,490 You don't own the text. 215 00:10:20,490 --> 00:10:23,520 You just have a pointer into a viewport, 216 00:10:23,520 --> 00:10:26,223 and you know the length of that viewport in bytes. 217 00:10:27,450 --> 00:10:28,283 Very good. 218 00:10:28,283 --> 00:10:30,900 Right, one last thing to look at, slice_mutability. 219 00:10:30,900 --> 00:10:34,680 So that's going to be the last thing we'll discuss 220 00:10:34,680 --> 00:10:38,070 in this little section, slice_mutability. 221 00:10:38,070 --> 00:10:40,230 Let's look at the theory first. 222 00:10:40,230 --> 00:10:41,493 So imagine you've got a mutable string, 223 00:10:41,493 --> 00:10:45,030 a string that you've created using the mut keyword. 224 00:10:45,030 --> 00:10:47,850 You can define a mutable string slice. 225 00:10:47,850 --> 00:10:51,450 You say &mut to make it a mutable reference, 226 00:10:51,450 --> 00:10:54,270 and you use str to mean it's a string slice 227 00:10:54,270 --> 00:10:58,020 not a whole string, but just a slice of a string. 228 00:10:58,020 --> 00:10:59,120 This is what it looks like. 229 00:10:59,120 --> 00:11:02,553 In this example, I've got a mutable string called message. 230 00:11:05,520 --> 00:11:07,650 Okay, and it points to "croeso," 231 00:11:07,650 --> 00:11:10,953 which you know means welcome in Wales, 232 00:11:12,397 --> 00:11:17,397 "croeso," and then s is a mutable string slice. 
233 00:11:17,520 --> 00:11:19,800 So you take a mutable reference to the string, 234 00:11:19,800 --> 00:11:22,920 but the slice, remember, is like a fat pointer 235 00:11:22,920 --> 00:11:24,570 to the text itself. 236 00:11:24,570 --> 00:11:26,760 So the slice will have a pointer 237 00:11:26,760 --> 00:11:28,590 that points to the letter c, 238 00:11:28,590 --> 00:11:32,370 and it'll have a length of six, okay? 239 00:11:32,370 --> 00:11:35,280 So that basically is my string slice s, 240 00:11:35,280 --> 00:11:38,550 and then I can use my string slice. 241 00:11:38,550 --> 00:11:39,810 On my string slice, 242 00:11:39,810 --> 00:11:42,180 I can call in-place functions like make_ascii_uppercase. 243 00:11:42,180 --> 00:11:45,153 It would convert this string into uppercase, 244 00:11:47,077 --> 00:11:50,100 "CROESO," like so. 245 00:11:50,100 --> 00:11:51,420 I'll run an example of that. 246 00:11:51,420 --> 00:11:53,100 It's quite straightforward actually, 247 00:11:53,100 --> 00:11:54,750 but it's nice to see an example. 248 00:11:54,750 --> 00:11:57,270 So slice_mutability, 249 00:11:57,270 --> 00:11:59,370 I'll run the example so you can see the output, 250 00:11:59,370 --> 00:12:01,530 and then we'll discuss. 251 00:12:01,530 --> 00:12:04,950 So here's my initial mutable string, "croeso," 252 00:12:04,950 --> 00:12:07,660 and then, oh, I append some other text. 253 00:12:07,660 --> 00:12:10,830 "Croeso o gymru" means welcome from Wales. 254 00:12:10,830 --> 00:12:14,160 So then in here, 255 00:12:14,160 --> 00:12:18,300 I've got a mutable reference only visible inside this block, 256 00:12:18,300 --> 00:12:21,390 which goes out of scope there, and my mutable reference, 257 00:12:21,390 --> 00:12:25,080 my mutable string slice, 258 00:12:25,080 --> 00:12:29,820 is a mutable reference into byte position nine onwards, 259 00:12:29,820 --> 00:12:31,320 so not the whole text.
260 00:12:31,320 --> 00:12:32,700 It just looks at part of the text 261 00:12:32,700 --> 00:12:34,410 from position nine onwards. 262 00:12:34,410 --> 00:12:38,823 Well, let's see, zero, one, two, three, four, five, 263 00:12:40,680 --> 00:12:42,480 six, seven, eight, nine. 264 00:12:42,480 --> 00:12:46,980 So basically it starts, my slice starts at the g, 265 00:12:46,980 --> 00:12:48,810 and it goes to the end of the string. 266 00:12:48,810 --> 00:12:51,780 The g is position nine. 267 00:12:51,780 --> 00:12:55,050 So my slice is just looking at "gymru," 268 00:12:55,050 --> 00:12:57,840 and it converts that slice to uppercase. 269 00:12:57,840 --> 00:13:01,440 So just this part of the actual text itself 270 00:13:01,440 --> 00:13:04,657 will be converted to uppercase, as we can see here, 271 00:13:04,657 --> 00:13:06,660 "croeso o GYMRU." 272 00:13:06,660 --> 00:13:08,400 Okay, so hopefully by now, 273 00:13:08,400 --> 00:13:11,700 you've got a better feel for why string slices are useful. 274 00:13:11,700 --> 00:13:14,700 They give you access to a window of text 275 00:13:14,700 --> 00:13:18,033 which you can access and slice and dice as you need.
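A minimal sketch of the mutable slice example might look like this. The helper name uppercase_from is an assumption for illustration, and the sketch assumes the in-place conversion is done with make_ascii_uppercase, which is the method str offers for changing case without allocating new text; the lesson's own code may be written differently.

```rust
/// Hypothetical helper: uppercases the ASCII letters from byte position `start` onwards.
/// It panics if `start` is past the end or not on a character boundary.
fn uppercase_from(text: &mut String, start: usize) {
    // A mutable string slice: a fat pointer to part of the String's own bytes.
    let tail: &mut str = &mut text[start..];
    // The change is made in place, so the original String is modified.
    tail.make_ascii_uppercase();
}

fn main() {
    let mut message = String::from("croeso");
    message.push_str(" o gymru");

    {
        // The mutable slice only lives inside this block.
        // Byte 9 is the letter g, so the slice covers "gymru".
        let s: &mut str = &mut message[9..];
        s.make_ascii_uppercase();
    }

    // The mutable borrow has ended, so message can be used again here.
    println!("{}", message);

    // The same idea wrapped in the helper, applied to the first word.
    let mut other = String::from("croeso");
    uppercase_from(&mut other, 0);
    println!("{}", other);
}
```

While the mutable slice is alive, the String cannot be read or changed through any other name, which is why the demo keeps the slice inside its own block.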