Now let's talk about the Amazon Simple Storage Service, otherwise known as S3.

As we've mentioned before, S3 is object storage. That's a bit different from block storage, and we're going to talk about that difference in just a minute. But first, it's important to know that the Amazon S3 service is inherently highly available and fault tolerant. We get that because the service already spans the entire region: S3 stores data redundantly across multiple availability zones in that region (at least three for the standard storage class), and so it's generally durable to the loss of two availability zones.

One of the most fundamental architectural questions we should always be asking ourselves when we design and architect solutions for AWS is: what happens when this fails, whatever that component may be? What happens when my storage fails? Well, if we're using S3, we can ask what happens when an availability zone becomes unavailable. Our data is already stored in two or more other availability zones.
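To make the "what happens when this fails?" question concrete, here is a deliberately simplified Python model. It assumes each object is copied in full to three hypothetical availability zones (`az-a`, `az-b`, `az-c`); AWS does not document S3's internal data layout this way, so treat it as a sketch of the idea rather than of S3's implementation. As long as one zone still holds the data, a read succeeds.

```python
from itertools import combinations
from typing import Dict, Optional, Set

# Hypothetical AZ names; a simplified model, not S3's real internal layout.
AZS = ["az-a", "az-b", "az-c"]


def store(data: bytes) -> Dict[str, bytes]:
    """Place a full copy of the object in every modeled availability zone."""
    return {az: data for az in AZS}


def read(replicas: Dict[str, bytes], failed: Set[str]) -> Optional[bytes]:
    """Return the object from any zone that is still up, or None if every copy is gone."""
    for az, data in replicas.items():
        if az not in failed:
            return data
    return None


replicas = store(b"cat1.jpg bytes")
# Losing any two of the three zones still leaves a readable copy.
for lost in combinations(AZS, 2):
    assert read(replicas, set(lost)) == b"cat1.jpg bytes"
# Only losing all three zones makes the object unreadable in this model.
assert read(replicas, set(AZS)) is None
```

The point of the model is the design question, not the mechanism: with S3, surviving a zone failure is built in rather than something we have to engineer ourselves.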
So we have durability to the loss of an availability zone, or to the loss of a data center, baked in with S3. There's nothing we need to do to make that happen.

It's also important to know that S3 has no file system. Once we upload a file to S3, it's no longer really considered a file; it's considered an object. And there's a very simple, flat hierarchy: buckets, and objects within buckets. That's it. It doesn't go any deeper than that.

Now, let's take a look up here at a URL through which we could download an object from S3. You'll notice that it contains a slash, and that we can use slashes to form the semblance of a file-based structure. But ultimately, there are no folders or files; there are objects. In this example, we have a bucket called mybucket, and an object called images/cat1.jpg. That string is the object's key, and it just so happens to include a slash.
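The flat bucket-and-key model can be sketched in Python. This is a minimal in-memory stand-in, not the S3 API: a bucket is a plain dictionary of keys, and the hypothetical `list_keys` helper imitates how a prefix-and-delimiter listing makes `images/` look like a folder even though no folder exists. The hypothetical `object_url` helper builds one common URL form (virtual-hosted style, with the bucket name in the domain); the region `us-east-1` is an assumption for the example.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

# A bucket modeled as a flat mapping of key -> object bytes.
# "images/" is not a directory; it is just the first part of some keys.
bucket: Dict[str, bytes] = {
    "images/cat1.jpg": b"...",
    "images/cat2.jpg": b"...",
    "images/2023/dog.jpg": b"...",
    "readme.txt": b"...",
}


def list_keys(objects: Dict[str, bytes], prefix: str = "",
              delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Hypothetical helper: keys directly under `prefix`, plus folder-like common prefixes."""
    keys: List[str] = []
    common_prefixes = set()
    for key in sorted(objects):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Group everything up to the next delimiter; this is what looks like a subfolder.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            keys.append(key)
    return keys, sorted(common_prefixes)


def object_url(bucket_name: str, key: str, region: str = "us-east-1") -> str:
    """Virtual-hosted-style URL; slashes in the key are kept, other special characters escaped."""
    return f"https://{bucket_name}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


print(list_keys(bucket))             # (['readme.txt'], ['images/'])
print(list_keys(bucket, "images/"))  # (['images/cat1.jpg', 'images/cat2.jpg'], ['images/2023/'])
print(object_url("mybucket", "images/cat1.jpg"))
# https://mybucket.s3.us-east-1.amazonaws.com/images/cat1.jpg
```

The real S3 listing operation (ListObjectsV2) takes `Prefix` and `Delimiter` parameters and returns `CommonPrefixes` in the same spirit, which is how consoles and tools draw "folders" on top of a flat key space.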
By including slashes in the key, we can form the semblance of a hierarchy, but ultimately the namespace is still flat.

It's also important to note that in this example we're using HTTPS. There are times when you can reach S3 over plain, unencrypted HTTP, but it's recommended that anytime we're transmitting data to or from S3, we use HTTPS.

Our bucket name also needs to be globally unique across all of S3. So once you create a bucket, no one else in the world, in any region, will be able to use that name for their bucket. That has to do with DNS: in some cases, you'll see the name of the bucket right in the domain name.

There is no limit to the size of a bucket, or to the number of objects in it. You can store millions, billions, or more objects in a bucket and grow into the petabytes and beyond; Amazon does not impose any kind of limit on that.

Now, recall that with Elastic Block Store, you pay for allocated storage.
With Amazon S3, by contrast, you pay for what you actually use. There's no need to pre-allocate anything. If you upload a megabyte, you pay for a megabyte; if you upload a terabyte or a petabyte, that's what you pay for.

We do have some limits on the size of individual objects. An individual object can be up to five terabytes in size, but remember that a single upload, or PUT request, is limited to five gigabytes. So how do we get objects larger than five gigabytes into S3? Through multipart upload, and many of the Amazon tools, such as the command line interface and the SDKs, will handle multipart upload for us.

Amazon S3 also lets us use server-side encryption. We do that by setting a flag on each object at the time of upload, and S3 will encrypt the object with 256-bit AES, the Advanced Encryption Standard.
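The size limits above turn into simple arithmetic when planning an upload. The sketch below is a hypothetical Python helper, not the CLI's or SDK's actual logic. It assumes the published multipart limits (each part 5 MiB to 5 GiB, except that the last part may be smaller, and at most 10,000 parts per upload) and a preferred part size of 100 MiB chosen arbitrarily. Real tools usually switch to multipart at a much smaller, configurable threshold; here the 5 GiB single-PUT limit is used as the cutoff only to mirror the lecture.

```python
import math
from typing import List, Tuple

MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4

MAX_SINGLE_PUT = 5 * GiB  # largest object one PUT request can upload
MAX_OBJECT = 5 * TiB      # largest object S3 stores
MIN_PART = 5 * MiB        # minimum size of every part except the last
MAX_PART = 5 * GiB        # maximum size of any single part
MAX_PARTS = 10_000        # maximum number of parts in one multipart upload


def choose_part_size(size: int, preferred: int = 100 * MiB) -> int:
    """Use the preferred part size, growing it if 10,000 parts would not be enough."""
    part = max(preferred, math.ceil(size / MAX_PARTS), MIN_PART)
    return min(part, MAX_PART)


def plan_upload(size: int, preferred: int = 100 * MiB) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges; a single range means one ordinary PUT."""
    if size > MAX_OBJECT:
        raise ValueError("S3 objects are limited to 5 TiB")
    if size <= MAX_SINGLE_PUT:
        return [(0, size)]
    part = choose_part_size(size, preferred)
    return [(offset, min(part, size - offset)) for offset in range(0, size, part)]


parts = plan_upload(12 * GiB)
print(len(parts), parts[-1][1] // MiB)  # 123 88
```

A 12 GiB file, too big for one PUT, becomes 122 parts of 100 MiB plus a final 88 MiB part. At the 5 TiB ceiling the helper grows the part size to roughly 524 MiB so the upload still fits in 10,000 parts.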
AWS also uses a key per object: each object gets its own encryption key, and that key is in turn encrypted with a master key that is regularly rotated. So we have a strong level of encryption available to us: we can make sure our data is encrypted in transit by using HTTPS, and at rest by using server-side encryption.

Now let's take a look at object storage versus block storage, and what it really means to store things in S3.

For those who aren't familiar with the difference, a good way to visualize it is to take the example of changing just one character in a one-gigabyte file. Somewhere within this file, we have some bits that represent that character. With object storage, if we wanted to change that one character, we would have to upload the entire one-gigabyte file all over again.
With block storage, by contrast, we can go in and change just the individual blocks that contain the bits that make up that character.

Another difference is that S3, being object storage, uses HTTP or HTTPS for transfer, which is a text-based request/response protocol. That's going to be a lot slower than the more efficient protocols used by block storage, such as instance store and the Elastic Block Store service.

Earlier we did talk about S3 being able to be mounted, but that's something of an illusion. At the systems level, S3 can't be mounted the way block storage can. There are pieces of software that can create the illusion of S3 being mounted, but they're still using HTTP for the transfer. So if you use that software to create the illusion of a mounted S3 bucket, you're not going to get anywhere near the performance of block storage. That's why S3 is not at all the type of storage we want to use for a database.
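The one-character example can be simulated in Python to compare how much data each kind of storage has to move. Both classes are hypothetical in-memory stand-ins, not AWS APIs, and the 4 KiB block size and the 8 MiB file (smaller than the lecture's one gigabyte, to keep the example light) are assumptions.

```python
from typing import Dict

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, common but not universal


class ObjectStore:
    """Hypothetical stand-in for object storage: whole-object PUT and GET only."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.bytes_uploaded = 0

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = bytes(data)
        self.bytes_uploaded += len(data)

    def get(self, key: str) -> bytes:
        return self.objects[key]


class BlockDevice:
    """Hypothetical stand-in for a block volume: fixed-size blocks rewritten individually."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(size)
        self.bytes_written = 0

    def write_block(self, index: int, block: bytes) -> None:
        assert len(block) == BLOCK_SIZE
        start = index * BLOCK_SIZE
        self.data[start:start + BLOCK_SIZE] = block
        self.bytes_written += BLOCK_SIZE


def change_char_object(store: ObjectStore, key: str, offset: int, char: int) -> None:
    data = bytearray(store.get(key))  # download the whole object
    data[offset] = char
    store.put(key, bytes(data))       # upload the whole object again


def change_char_block(dev: BlockDevice, offset: int, char: int) -> None:
    index = offset // BLOCK_SIZE
    start = index * BLOCK_SIZE
    block = bytearray(dev.data[start:start + BLOCK_SIZE])  # read one block
    block[offset % BLOCK_SIZE] = char
    dev.write_block(index, bytes(block))                   # write that one block back


size = 8 * 1024 * 1024
store = ObjectStore()
store.put("file", bytes(size))
store.bytes_uploaded = 0  # count only the edit, not the initial upload
change_char_object(store, "file", 123_456, ord("X"))

dev = BlockDevice(size)
change_char_block(dev, 123_456, ord("X"))

print(store.bytes_uploaded, dev.bytes_written)  # 8388608 4096
```

The gap grows with file size: for the lecture's one-gigabyte file, the object store would still re-upload the whole file, while the block device would still rewrite a single 4 KiB block.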
For a database, we'd want to use block storage: an EBS volume or an instance store volume. S3 is perfect for database backups, but not for the disk the database reads and writes its data to.

With object storage, we don't get random I/O. We either download the entire object or upload the entire object, and that's it. (S3 does let you read a byte range of an object, but there's no way to modify part of an object in place.) If you have applications that need random I/O, block storage is going to be your best bet.

So that is the Amazon Simple Storage Service.