I want to show you some more practical slicing code and, at the same time, teach you a little more about what strings are in Go. So let's take a look at this sample program. Strings in Go are UTF-8 based. In fact, the entire language is UTF-8 based: if you are not saving your source code as UTF-8, you're going to have a real problem with the literal strings and raw strings stored inside your code files. Everything has to be UTF-8 encoded, all the way through. Now, what's interesting about UTF-8 text is that it has three layers. You have bytes at the very bottom, and really, we should always consider strings to be just bytes at the end of the day. In the middle you have what are called code points, and a code point is held in a 32-bit, or 4-byte, value. Then, on top of code points, you have characters. The idea is that an encoded code point takes anywhere from one to four bytes, and a character is made of anywhere from one to multiple code points. So you have this kind of n-tiered character set. Let's take a look at some code.
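To make the three layers concrete, here is a small sketch (the helper name `layerCounts` is hypothetical) that counts bytes with `len` and code points with `utf8.RuneCountInString`. The standard library does not segment user-perceived characters for you, so the character layer is illustrated with a comment: an "e" followed by a combining accent displays as one character but is two code points.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// layerCounts reports how many bytes and how many code points (runes) s holds.
func layerCounts(s string) (bytes, codePoints int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// "e" followed by U+0301 COMBINING ACUTE ACCENT renders as one
	// character but is two code points, encoded in three bytes.
	b, cp := layerCounts("e\u0301")
	fmt.Println(b, cp) // 3 2

	// The precomposed "é" (U+00E9) is a single code point in two bytes.
	b, cp = layerCounts("\u00e9")
	fmt.Println(b, cp) // 2 1
}
```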
On line 38 here, I have a literal string, and this string is going to be 18 bytes long. Why is that? Let's look at the array of bytes behind this string; I'm just going to draw a bunch of them, indexes 0 through 17. That first Chinese character you see, the very first one, requires three bytes of encoding in order to store it: three bytes representing one code point, for that one character. The next Chinese character is also one code point that requires three bytes. That's how we get to 18: three bytes, three bytes, and then everything else is one byte all the way through. Now, I want you to look at the next variable. Notice that we're creating a variable named buf, but buf is an array, not a slice, okay? Its length is utf8.UTFMax, a constant in the unicode/utf8 package that represents the maximum number of bytes you need to encode a code point. So buf is an array of four bytes. That's our buf.
That array is also set to its zero value through the use of the keyword var. Now, things start to get interesting: I'm about to range over the string. You might think, "Ah man, wait, I can range over a string?" Yeah, yeah, yeah. You can range over a string, and we're using value semantics here. But what's important is that we're going to get the index position and then a copy of... what? I love this question. When we range over the string, we get the index position of what, and a copy of what? You have three choices: we could get a copy of every byte, a copy of every code point, or a copy of every character on each iteration. A lot of people tell me they think we're going to iterate by character. What we're really doing is iterating code point by code point, and the index position respects that. So what does that mean? It means that on the first iteration we start at index zero. Absolutely start at zero.
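A quick way to see that `range` iterates by code point is to collect the index variable. This sketch again assumes the literal `"世界 means world"`, and `runeStarts` is a hypothetical helper: the string holds 18 bytes, but the loop runs only 14 times, and the indexes jump by three over each Chinese character.

```go
package main

import "fmt"

// runeStarts returns the byte index where each code point begins,
// as reported by the index variable of a for-range loop over s.
func runeStarts(s string) []int {
	var starts []int
	for i := range s {
		starts = append(starts, i)
	}
	return starts
}

func main() {
	s := "世界 means world"
	fmt.Println(len(s))        // 18 bytes...
	fmt.Println(runeStarts(s)) // ...but 14 iterations: [0 3 6 7 8 9 10 11 12 13 14 15 16 17]
}
```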
But on the second iteration, the next code point doesn't start at index one. This is the first code point; the next code point starts here, three bytes in, because that second Chinese character has its own code point that's also three bytes. Which means that on the third iteration our next code point is here, at index six, then here, then here, then here, each one byte away. So when we range over a string, we're ranging code point by code point, which means r gives us a copy of the code point we just iterated over. r is of type rune, and rune really isn't its own type in Go: it's an alias for int32. In fact, the type byte is also an alias, for uint8. So r represents a rune, our 32-bit, 4-byte value. Now, look at how this code works. What happens on the first iteration? On the first iteration, i equals zero; that's where we start, right here at index zero. Then we get the rune length.
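Because `rune` and `byte` are aliases rather than distinct defined types, values move between them and `int32`/`uint8` with no conversion. In this sketch (the function `sameTypes` is hypothetical) the assignments would be compile errors if the types were merely similar rather than identical.

```go
package main

import "fmt"

// sameTypes compiles only because rune is int32 and byte is uint8:
// no conversions are needed in the assignments below.
func sameTypes() (int32, uint8) {
	var r rune = '世'
	var b byte = 'A'
	var i32 int32 = r // rune is an alias for int32
	var u8 uint8 = b  // byte is an alias for uint8
	return i32, u8
}

func main() {
	i32, u8 := sameTypes()
	fmt.Printf("%T %d %#x\n", i32, i32, i32) // int32 19990 0x4e16
	fmt.Printf("%T %d\n", u8, u8)            // uint8 65
}
```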
The rune length tells us how many bytes we need for this code point. That's going to be three; remember, we needed three bytes for the first code point. Then I created si, the string index position, which is the sum of those two values: i plus the rune length. That's three. And now we get to line 54, and line 54 is interesting to me. We're using the built-in function copy, and copy takes a destination and then a source. I want you to notice that copy works with both slices and strings, but since a string is immutable, it can only be the source. I also want you to notice that we're slicing the string here. That's not going to create a slice value; it creates a new string value. So look at what we're doing. As the source for copy, s[i:si] says "create me a new string value." Here it is. It starts at index position i, which is zero, and goes up to, but not including, index three, which gives us a length of three. So basically, what we end up with is a pointer that points here, at the first byte, with a length of three.
That's our source. Now, what is our destination? Our destination is the buffer itself: we want to copy this into the buffer. But there's a problem. copy only works with slices and strings; it doesn't work with arrays, and buf is an array. But notice something here: the syntax in Go allows us to apply slicing syntax to an array. One of my favorite sayings in Go is that every array is just a slice waiting to happen, and this is what we see here. In order to use the array as the backing array, the syntax I've highlighted, buf[:], creates a new slice value that uses the array as its storage, with length and capacity set to four. This is brilliant. I can now use our local array in the copy, and because copy moves only as many bytes as the shorter of its two arguments holds, it copies just the first three bytes. So these bytes here come into here, there you go, and we've copied this data. I'm trying to give you a sense of what this program is going to do: range over the string code point by code point.
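The "slice waiting to happen" idea can be sketched like this. The helper `copyViaSlice` is hypothetical; it takes a pointer so that the caller's array, not a copy of it, is the backing storage. The slice `buf[:]` has length and capacity 4, writes through it land in the array, and `copy` never writes past the destination's length.

```go
package main

import "fmt"

// copyViaSlice slices the array behind buf to get a []byte that shares its
// storage, copies src through that slice, and reports the count plus the
// slice's length and capacity.
func copyViaSlice(buf *[4]byte, src string) (n, length, capacity int) {
	sl := buf[:] // slice value backed by the array; len 4, cap 4
	return copy(sl, src), len(sl), cap(sl)
}

func main() {
	var buf [4]byte
	n, l, c := copyViaSlice(&buf, "abc")
	fmt.Println(n, l, c, buf) // 3 4 4 [97 98 99 0]

	// copy stops at len(dst): a longer source is truncated.
	var small [4]byte
	n, _, _ = copyViaSlice(&small, "abcdef")
	fmt.Println(n, small) // 4 [97 98 99 100]
}
```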
We store each code point inside our local buffer, and then we display information about it. And if you notice here, I'm slicing the buffer one more time, and this time I want to make sure the length is only three. So I'm saying from the beginning up to rl, which is three: buf[:rl]. That way we only display those three bytes. Now, what happens on the next iteration? On the next iteration we come back and we're here, which means i isn't zero anymore; i is three. The rune length is still three, so si is now six. This time we create a string value that still has a length of three, but starts here, at index three. And we do it again. After that, the rune lengths are one, one, one, one. So if you look at the output here, you can see that we're ranging code point by code point through this string data.
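Putting the pieces together, here is a sketch of the whole loop as the walkthrough describes it. The literal, the helper name `describe`, and the exact format string are assumptions rather than the program shown on screen. Each iteration computes `rl` and `si`, copies `s[i:si]` into the array through `buf[:]`, and formats only `buf[:rl]`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// describe walks s code point by code point, copying each encoded rune
// into a fixed 4-byte buffer, and returns one formatted line per rune.
func describe(s string) []string {
	var buf [utf8.UTFMax]byte
	var lines []string
	for i, r := range s {
		rl := utf8.RuneLen(r) // bytes needed for this code point
		si := i + rl          // end of this code point in s
		copy(buf[:], s[i:si]) // slice the array so copy can use it
		// buf[:rl] limits the output to the bytes just copied.
		lines = append(lines, fmt.Sprintf("%2d: %q; codepoint: %#6x; encoded bytes: %#v",
			i, r, r, buf[:rl]))
	}
	return lines
}

func main() {
	for _, line := range describe("世界 means world") {
		fmt.Println(line)
	}
}
```

The array is reused on every iteration; reslicing it to `buf[:rl]` is what keeps stale bytes from a longer previous code point out of the output.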
You can see the three bytes for the first Chinese character, the next three bytes for the next Chinese character, and I'm even able to display those characters the way you see them right there. Then we go into single-byte characters all the way through, and you can see what those values are. So, really cool, more practical code, and we learned a little bit more about how strings work in Go.