1 00:00:06,760 --> 00:00:08,660 - Now, let's review an introduction 2 00:00:08,660 --> 00:00:12,280 to Amazon Simple Storage Service, or S3. 3 00:00:12,280 --> 00:00:16,470 With Amazon S3, we gain access to object storage. 4 00:00:16,470 --> 00:00:19,870 Object storage is a perfect place 5 00:00:19,870 --> 00:00:24,470 for relatively static type of data. 6 00:00:24,470 --> 00:00:25,745 You know, 7 00:00:25,745 --> 00:00:27,100 if your data doesn't change very often at all, 8 00:00:27,100 --> 00:00:29,350 then S3 is an ideal storage. 9 00:00:29,350 --> 00:00:33,150 When we think about static assets for websites, for example. 10 00:00:33,150 --> 00:00:34,980 Those are classic examples. 11 00:00:34,980 --> 00:00:39,980 CSS, JavaScript, images, PDF documents, 12 00:00:40,020 --> 00:00:41,280 those things, 13 00:00:41,280 --> 00:00:43,620 and some of those may update once a day 14 00:00:43,620 --> 00:00:44,760 or a couple times a day, 15 00:00:44,760 --> 00:00:46,970 but oftentimes when they do, 16 00:00:46,970 --> 00:00:49,610 you want them to get the new name anyway. 17 00:00:49,610 --> 00:00:52,180 You're not necessarily reusing the existing name, 18 00:00:52,180 --> 00:00:54,413 due to caching, right? 
19 00:00:54,413 --> 00:00:58,470 With object storage storing these things in S3, 20 00:00:58,470 --> 00:01:01,620 S3 makes a really great central place 21 00:01:01,620 --> 00:01:03,920 for other things to access 22 00:01:03,920 --> 00:01:08,910 and those other things could be other services within AWS, 23 00:01:08,910 --> 00:01:11,720 they could be your end users out there on the web 24 00:01:11,720 --> 00:01:13,869 downloading CSS and JavaScript, 25 00:01:13,869 --> 00:01:16,560 and so with S3, 26 00:01:16,560 --> 00:01:19,210 when we store data there, 27 00:01:19,210 --> 00:01:21,320 and whether it's static assets 28 00:01:21,320 --> 00:01:25,688 or even keep in mind that snapshots are also stored in S3, 29 00:01:25,688 --> 00:01:30,688 that S3 is a distributed system. 30 00:01:30,770 --> 00:01:33,930 When we write data to it in the standard storage class, 31 00:01:33,930 --> 00:01:36,220 and we'll talk more about storage classes later, 32 00:01:36,220 --> 00:01:39,950 but the standard storage class replicates that data 33 00:01:39,950 --> 00:01:43,920 across numerous devices in numerous availability zones. 34 00:01:43,920 --> 00:01:47,220 So there's inherently a very high degree 35 00:01:47,220 --> 00:01:48,943 of durability in our data. 36 00:01:49,860 --> 00:01:54,860 Now, there is no file system within S3. 37 00:01:54,971 --> 00:01:59,971 There's basically a flat structure inside the bucket. 38 00:02:00,472 --> 00:02:04,270 If we look here within the S3 service, 39 00:02:04,270 --> 00:02:06,648 we can create buckets, 40 00:02:06,648 --> 00:02:08,750 and we may create a different bucket 41 00:02:08,750 --> 00:02:10,930 for different types of data. 
42 00:02:10,930 --> 00:02:13,030 You might have a bucket for web assets, 43 00:02:13,030 --> 00:02:14,210 a bucket for logs, 44 00:02:14,210 --> 00:02:16,760 maybe a bucket for a data lake, 45 00:02:16,760 --> 00:02:19,310 and then once we have a bucket, 46 00:02:19,310 --> 00:02:22,580 we can store objects in that bucket. 47 00:02:22,580 --> 00:02:27,580 And so again, objects exist inside the bucket. 48 00:02:28,100 --> 00:02:30,380 And so that's as flat as it goes. 49 00:02:30,380 --> 00:02:32,080 There's no more hierarchy than that. 50 00:02:32,080 --> 00:02:33,563 Just objects in a bucket. 51 00:02:34,740 --> 00:02:38,100 It's also worth noting that buckets are private by default. 52 00:02:38,100 --> 00:02:41,540 The only credentials that can be used 53 00:02:41,540 --> 00:02:44,800 to access a bucket would be the master credentials. 54 00:02:44,800 --> 00:02:47,440 Those credentials that own the account 55 00:02:47,440 --> 00:02:49,511 and any other IAM user 56 00:02:49,511 --> 00:02:53,030 or any other user for that matter would have 57 00:02:53,030 --> 00:02:55,290 to be granted explicit permissions 58 00:02:55,290 --> 00:02:57,100 in order to access that bucket 59 00:02:57,100 --> 00:02:58,780 and we'll talk about more ways 60 00:02:58,780 --> 00:03:01,840 to do that in a later lesson. 61 00:03:01,840 --> 00:03:05,606 Now, when we do upload files to a bucket, 62 00:03:05,606 --> 00:03:08,810 every bucket has its own URL 63 00:03:08,810 --> 00:03:09,880 and you'll notice here 64 00:03:09,880 --> 00:03:11,922 that the URL in this particular case, 65 00:03:11,922 --> 00:03:16,790 we have there are several different types 66 00:03:16,790 --> 00:03:17,960 of URLs that we can get. 
67 00:03:17,960 --> 00:03:21,881 This is one option where we have the region, 68 00:03:21,881 --> 00:03:24,330 you'll notice here that in this particular case, 69 00:03:24,330 --> 00:03:27,070 this bucket in this example would exist 70 00:03:27,070 --> 00:03:28,520 in the US West 2 region, 71 00:03:28,520 --> 00:03:30,570 so this is a regional endpoint. 72 00:03:30,570 --> 00:03:32,843 There are other ways of getting to that bucket. 73 00:03:32,843 --> 00:03:35,721 Now, in this kind of URL, 74 00:03:35,721 --> 00:03:39,205 you'll see here that we have the bucket name 75 00:03:39,205 --> 00:03:41,110 that's a part of the URL 76 00:03:41,110 --> 00:03:44,130 and then we'll have a slash 77 00:03:44,130 --> 00:03:46,972 and then you'll notice here the key here 78 00:03:46,972 --> 00:03:51,349 is images/logo.png. 79 00:03:51,349 --> 00:03:55,239 We can create the illusion of a file system 80 00:03:55,239 --> 00:04:00,239 by including slashes and periods in the name, 81 00:04:00,750 --> 00:04:03,721 but ultimately, there is no folder called images, 82 00:04:03,721 --> 00:04:07,160 there's no file called logo.png, 83 00:04:07,160 --> 00:04:11,320 there is an object that contains a slash 84 00:04:11,320 --> 00:04:13,610 and a period in its name, 85 00:04:13,610 --> 00:04:16,050 and we can use that kind of pattern 86 00:04:16,050 --> 00:04:17,630 to organize our buckets, 87 00:04:17,630 --> 00:04:19,330 and when we do that, 88 00:04:19,330 --> 00:04:24,136 if we had a number of different objects back here, 89 00:04:24,136 --> 00:04:26,510 a number of different images back here, 90 00:04:26,510 --> 00:04:28,420 all with the same, 91 00:04:28,420 --> 00:04:30,570 all under images/, 92 00:04:30,570 --> 00:04:31,500 then in that regard, 93 00:04:31,500 --> 00:04:34,550 images/ becomes a common prefix 94 00:04:34,550 --> 00:04:36,440 and we can use that common prefix 95 00:04:36,440 --> 00:04:40,573 to help find things in a bucket. 
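As a rough illustration of the flat namespace and common prefixes just described, here is a minimal Python sketch. It makes no AWS calls: the bucket is modeled as a plain list of key strings, and the key names and the helpers `keys_with_prefix` and `common_prefixes` are hypothetical, invented only for this example. It shows that "folders" are nothing more than shared leading text in object keys.

```python
def keys_with_prefix(keys, prefix):
    """Return the keys that start with the given prefix, sorted."""
    return sorted(k for k in keys if k.startswith(prefix))


def common_prefixes(keys, delimiter="/"):
    """Group keys by the text up to and including the first delimiter.

    Keys that do not contain the delimiter are grouped under "".
    """
    groups = {}
    for key in keys:
        head, sep, _ = key.partition(delimiter)
        prefix = head + sep if sep else ""
        groups.setdefault(prefix, []).append(key)
    return groups


# Hypothetical object keys; the bucket itself is just a flat collection.
bucket_keys = [
    "images/logo.png",
    "images/banner.png",
    "css/site.css",
    "index.html",
]

image_keys = keys_with_prefix(bucket_keys, "images/")
groups = common_prefixes(bucket_keys)
print(image_keys)
print(sorted(groups))
```

Nothing here creates a directory. Filtering on `images/` is only a string comparison against each key, which is why the slash is a naming convention rather than a file system.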
96 00:04:41,412 --> 00:04:46,040 There is no limit to the size of a bucket. 97 00:04:46,040 --> 00:04:49,630 We can store as many objects as we want in a bucket 98 00:04:49,630 --> 00:04:52,830 and we can store as much data as we want in a bucket, 99 00:04:52,830 --> 00:04:56,543 far into the petabytes or beyond, if we need to. 100 00:04:57,960 --> 00:05:00,960 There is a limit per object. 101 00:05:00,960 --> 00:05:05,960 So any one object can be up to five terabytes in size. 102 00:05:06,250 --> 00:05:08,770 There is also an upload limit 103 00:05:08,770 --> 00:05:11,600 of five gigs per put operation, 104 00:05:11,600 --> 00:05:14,000 and a common question is, 105 00:05:14,000 --> 00:05:16,876 well, if you can only upload five gigs at a time, 106 00:05:16,876 --> 00:05:21,300 how do you possibly get an object that's five terabytes? 107 00:05:21,300 --> 00:05:24,140 And the answer to that is multipart upload. 108 00:05:24,140 --> 00:05:28,040 S3 does support multipart upload and Amazon recommends 109 00:05:28,040 --> 00:05:32,220 that for any objects greater than 100 meg or so, 110 00:05:32,220 --> 00:05:33,899 that multipart upload is, 111 00:05:33,899 --> 00:05:38,640 can be a faster way of getting that object to S3. 112 00:05:38,640 --> 00:05:39,690 Simply because, 113 00:05:39,690 --> 00:05:40,800 especially if you're running 114 00:05:40,800 --> 00:05:43,410 in a multi-threaded environment 115 00:05:43,410 --> 00:05:48,290 where you can dedicate multiple threads to multiple parts. 116 00:05:48,290 --> 00:05:50,790 That way you have multiple parts 117 00:05:50,790 --> 00:05:55,790 of this file being uploaded essentially in parallel. 
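To make the multipart idea above concrete, here is a small Python sketch that splits an object into byte ranges and hands them to a thread pool. It is an illustration under stated assumptions, not a real uploader: `plan_parts` and `fake_upload_part` are hypothetical helpers, no data is sent anywhere, and the part limits used (5 MiB minimum for every part except the last, 5 GiB maximum, 10,000 parts at most) are the S3 multipart limits as I understand them, so check them against the current documentation.

```python
from concurrent.futures import ThreadPoolExecutor

MIB = 1024 ** 2
GIB = 1024 ** 3
MIN_PART_SIZE = 5 * MIB   # assumed minimum for every part except the last
MAX_PART_SIZE = 5 * GIB   # assumed maximum size of a single part
MAX_PARTS = 10_000        # assumed maximum number of parts per upload


def plan_parts(total_size, part_size):
    """Return (part_number, start, end) tuples; end is exclusive."""
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    parts = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size)
        parts.append((len(parts) + 1, start, end))
        start = end
    if len(parts) > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part size")
    return parts


def fake_upload_part(part):
    """Stand-in for a real part upload; returns the byte count 'sent'."""
    _, start, end = part
    return end - start


# A hypothetical 250 MiB object split into 100 MiB parts.
parts = plan_parts(250 * MIB, 100 * MIB)

# Hand each part to its own worker thread, as the lesson describes.
with ThreadPoolExecutor(max_workers=4) as pool:
    sent = list(pool.map(fake_upload_part, parts))

print(len(parts), sum(sent) == 250 * MIB)
```

In a real client each worker would send its own byte range, and the upload would be completed once every part had been acknowledged. The sketch only shows why independent parts lend themselves to parallel threads.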
118 00:05:55,850 --> 00:06:00,850 S3 also supports server-side encryption using AES-256, 119 00:06:01,930 --> 00:06:03,921 the Advanced Encryption Standard, 120 00:06:03,921 --> 00:06:08,780 and when S3 performs that server-side encryption, 121 00:06:08,780 --> 00:06:10,120 we have a number of options 122 00:06:10,120 --> 00:06:12,250 in regards to where the keys come from 123 00:06:12,250 --> 00:06:14,604 and where the encryption keys are managed, 124 00:06:14,604 --> 00:06:15,835 but either way, 125 00:06:15,835 --> 00:06:18,820 wherever the key is actually managed, 126 00:06:18,820 --> 00:06:22,770 S3 uses a key per object. 127 00:06:22,770 --> 00:06:25,630 Every object gets its own encryption key. 128 00:06:25,630 --> 00:06:29,470 That way if any one key were compromised, 129 00:06:29,470 --> 00:06:32,010 only that one object would be compromised, 130 00:06:32,010 --> 00:06:34,660 and not an entire bucket, right? 131 00:06:34,660 --> 00:06:36,700 So that's S3. 132 00:06:36,700 --> 00:06:37,540 And keep in mind, 133 00:06:37,540 --> 00:06:39,130 again, the takeaway from this lesson is 134 00:06:39,130 --> 00:06:43,110 that S3 provides object storage 135 00:06:43,110 --> 00:06:47,900 and it's really great for relatively static types of data, 136 00:06:47,900 --> 00:06:51,940 like static assets, log files, data lakes, and so on, 137 00:06:51,940 --> 00:06:54,610 and we will take a closer look at S3 138 00:06:54,610 --> 00:06:56,880 and its relationship to other services 139 00:06:56,880 --> 00:07:00,680 and how we can use S3 in the bigger picture 140 00:07:00,680 --> 00:07:02,440 as the course goes on. 141 00:07:02,440 --> 00:07:04,260 Now a few other features 142 00:07:04,260 --> 00:07:07,660 that are worth knowing about Amazon S3, 143 00:07:07,660 --> 00:07:09,980 would be one, versioning. 
144 00:07:09,980 --> 00:07:12,000 With versioning you can, 145 00:07:12,000 --> 00:07:15,703 as you write objects with the same name 146 00:07:15,703 --> 00:07:17,855 with the same object key, 147 00:07:17,855 --> 00:07:19,470 every time you do that, 148 00:07:19,470 --> 00:07:21,304 you get a new version, 149 00:07:21,304 --> 00:07:23,753 and by having versioning turned on, 150 00:07:23,753 --> 00:07:27,291 you can always roll back to a previous version. 151 00:07:27,291 --> 00:07:30,310 So, if you were to accidentally update something 152 00:07:30,310 --> 00:07:31,830 and needed to get back, 153 00:07:31,830 --> 00:07:33,950 you can roll back to a previous version. 154 00:07:33,950 --> 00:07:36,890 If you delete something and wanted to get it back, 155 00:07:36,890 --> 00:07:40,054 you can undo a deletion with versioning. 156 00:07:40,054 --> 00:07:44,450 Another feature would be what we would call query in place, 157 00:07:44,450 --> 00:07:47,470 using a feature called S3 Select. 158 00:07:47,470 --> 00:07:49,040 With S3 Select, 159 00:07:49,040 --> 00:07:53,030 you can actually extract ranges 160 00:07:53,030 --> 00:07:57,290 or subsets of data from an object in S3. 161 00:07:57,290 --> 00:07:59,590 And that would obviate the need 162 00:07:59,590 --> 00:08:02,679 for you to use your own compute resources 163 00:08:02,679 --> 00:08:05,630 to perform some type of extract, 164 00:08:05,630 --> 00:08:08,250 transform, load, and query operation. 165 00:08:08,250 --> 00:08:12,040 You can perform that kind of SQL query operation 166 00:08:12,040 --> 00:08:16,423 directly on an object in S3, using S3 Select. 167 00:08:17,470 --> 00:08:21,133 Another useful feature is cross-region replication. 
168 00:08:21,990 --> 00:08:24,730 If you turn on cross-region replication 169 00:08:24,730 --> 00:08:27,271 and you specify the two different buckets 170 00:08:27,271 --> 00:08:29,630 in two different regions, 171 00:08:29,630 --> 00:08:33,260 any changes you make to one bucket will be reflected 172 00:08:33,260 --> 00:08:34,430 in that other region, 173 00:08:34,430 --> 00:08:37,793 so that's very useful for disaster recovery. 174 00:08:39,000 --> 00:08:43,350 Another useful feature would be event notifications. 175 00:08:43,350 --> 00:08:45,260 With event notifications, 176 00:08:45,260 --> 00:08:48,590 as things happen in your S3 buckets, 177 00:08:48,590 --> 00:08:51,450 for example, if you upload a new object 178 00:08:51,450 --> 00:08:53,905 or you update an object or delete an object, 179 00:08:53,905 --> 00:08:58,810 each one of those actions would be an event notification 180 00:08:58,810 --> 00:09:01,800 and you can subscribe to those notifications 181 00:09:01,800 --> 00:09:03,963 in a number of different ways. 182 00:09:04,900 --> 00:09:09,900 A common use for these is to subscribe Lambda functions 183 00:09:10,062 --> 00:09:12,460 to those notifications, 184 00:09:12,460 --> 00:09:16,313 so that AWS Lambda can programmatically respond 185 00:09:16,313 --> 00:09:19,800 to say, the addition of a new object 186 00:09:19,800 --> 00:09:23,990 and perform some type of an analysis on that object. 187 00:09:23,990 --> 00:09:27,973 Another feature would be S3 Transfer Acceleration. 
188 00:09:28,860 --> 00:09:32,830 When you are moving data between regions 189 00:09:32,830 --> 00:09:36,520 or from large physical distances, 190 00:09:36,520 --> 00:09:40,590 it's worth noting that AWS has their own backbone, 191 00:09:40,590 --> 00:09:45,160 the fiber that connects Amazon regions, 192 00:09:45,160 --> 00:09:47,410 much of that is owned by AWS, 193 00:09:47,410 --> 00:09:50,720 and so they operate essentially their own backbone 194 00:09:50,720 --> 00:09:53,807 and with S3 Transfer Acceleration, 195 00:09:53,807 --> 00:09:58,060 AWS is able to optimize the network paths 196 00:09:58,060 --> 00:10:01,990 to enable us to accelerate a connection 197 00:10:01,990 --> 00:10:05,890 over long distances that might ordinarily incur 198 00:10:05,890 --> 00:10:08,530 a much higher degree of latency, 199 00:10:08,530 --> 00:10:11,140 and so the thing to keep in mind 200 00:10:11,140 --> 00:10:13,680 about S3 Transfer Acceleration is 201 00:10:13,680 --> 00:10:16,090 that the longer the distance, 202 00:10:16,090 --> 00:10:18,460 the greater benefit you actually get, 203 00:10:18,460 --> 00:10:23,460 and so if we were to go from say the East Coast of the US 204 00:10:23,750 --> 00:10:28,210 over to somewhere in Eastern Europe, 205 00:10:28,210 --> 00:10:30,400 then we would see a significant improvement 206 00:10:30,400 --> 00:10:34,170 in the latencies during that transfer acceleration. 207 00:10:34,170 --> 00:10:39,170 If we were to go from say the Northern US to Southern US, 208 00:10:40,060 --> 00:10:42,280 we would see less of an improvement, 209 00:10:42,280 --> 00:10:44,620 because it's a shorter distance. 210 00:10:44,620 --> 00:10:47,270 And the last feature I'll leave you 211 00:10:47,270 --> 00:10:51,050 with here is static websites. 
212 00:10:51,050 --> 00:10:56,050 S3 can also be configured to simply host a static website, 213 00:10:56,110 --> 00:10:58,737 directly from that bucket, 214 00:10:58,737 --> 00:11:02,080 and that can be very useful in many cases. 215 00:11:02,080 --> 00:11:02,913 For example, 216 00:11:02,913 --> 00:11:06,555 I've worked with clients who have a dynamic API, 217 00:11:06,555 --> 00:11:09,460 but the marketing component 218 00:11:09,460 --> 00:11:10,770 of their domain, 219 00:11:10,770 --> 00:11:11,803 the triple W, 220 00:11:11,803 --> 00:11:14,500 is relatively static, 221 00:11:14,500 --> 00:11:16,570 and so I've configured this 222 00:11:16,570 --> 00:11:21,570 for clients where their api.domain.com runs 223 00:11:22,190 --> 00:11:24,051 from an elastic load balancer, 224 00:11:24,051 --> 00:11:25,550 is dynamic, 225 00:11:25,550 --> 00:11:30,420 but triple W dot their domain dot com is served 226 00:11:30,420 --> 00:11:32,520 directly from an S3 bucket, 227 00:11:32,520 --> 00:11:34,050 and one of the benefits of doing 228 00:11:34,050 --> 00:11:37,147 that is that you could potentially run something 229 00:11:37,147 --> 00:11:39,709 like WordPress for example 230 00:11:39,709 --> 00:11:42,604 on a machine on premises 231 00:11:42,604 --> 00:11:45,757 and by making the updates there 232 00:11:45,757 --> 00:11:50,320 and publishing that content as static HTML, 233 00:11:50,320 --> 00:11:52,660 you could greatly lower your costs, 234 00:11:52,660 --> 00:11:55,038 simply by serving it from a static website in S3, 235 00:11:55,038 --> 00:12:00,038 rather than running WordPress on EC2, right? 236 00:12:00,220 --> 00:12:03,520 That's Amazon S3 and we will take a closer look 237 00:12:03,520 --> 00:12:05,210 at some of these things through demos 238 00:12:05,210 --> 00:12:08,063 and use cases as the course progresses.