1 00:00:06,530 --> 00:00:10,520 Now let's talk about Amazon EBS Snapshots. 2 00:00:10,520 --> 00:00:13,700 So a really powerful feature of EBS 3 00:00:13,700 --> 00:00:16,090 is the ability to take snapshots. 4 00:00:16,090 --> 00:00:18,960 Like we talked about earlier, EBS volumes exist 5 00:00:18,960 --> 00:00:21,120 in a single availability zone 6 00:00:21,120 --> 00:00:24,810 and they are durable to the loss of a device 7 00:00:24,810 --> 00:00:27,130 within that availability zone 8 00:00:27,130 --> 00:00:30,050 but they are not durable to the loss 9 00:00:30,050 --> 00:00:31,840 of that availability zone. 10 00:00:31,840 --> 00:00:35,721 So if that availability zone is offline, 11 00:00:35,721 --> 00:00:37,890 your data is unavailable. 12 00:00:37,890 --> 00:00:42,890 If that availability zone is destroyed by natural disaster, 13 00:00:43,070 --> 00:00:47,830 the data that is on that volume will also be destroyed. 14 00:00:47,830 --> 00:00:52,830 And so volumes can be restored from snapshots, right? 15 00:00:52,870 --> 00:00:55,710 So to mitigate that kind of thing 16 00:00:55,710 --> 00:00:57,550 by taking regular snapshots, 17 00:00:57,550 --> 00:01:00,610 we know that we have a backup of that volume somewhere 18 00:01:00,610 --> 00:01:04,420 so that if the volume is accidentally deleted, 19 00:01:04,420 --> 00:01:07,170 if the availability zone were to be destroyed, 20 00:01:07,170 --> 00:01:09,663 then we still have a copy of that data somewhere 21 00:01:09,663 --> 00:01:12,390 in the form of a snapshot. 22 00:01:12,390 --> 00:01:15,476 When we take a snapshot of that volume, 23 00:01:15,476 --> 00:01:20,113 the data is stored, the snapshot is stored, in S3. 24 00:01:21,430 --> 00:01:24,840 So looking here, if we were to initiate a snapshot 25 00:01:24,840 --> 00:01:29,589 of this volume then the data that comprises these snapshots 26 00:01:29,589 --> 00:01:34,589 all gets stored over here within S3. 
27 00:01:34,680 --> 00:01:38,820 Now this is not to say that we have to create 28 00:01:38,820 --> 00:01:40,951 and manage an S3 bucket. 29 00:01:40,951 --> 00:01:43,420 AWS does that for us, right. 30 00:01:43,420 --> 00:01:46,730 So the management of S3, the creation of buckets 31 00:01:46,730 --> 00:01:49,380 and all of that, that's handled by AWS. 32 00:01:49,380 --> 00:01:54,070 We just know that snapshots, according to AWS documentation, 33 00:01:54,070 --> 00:01:57,730 we know that snapshots are stored within S3, 34 00:01:57,730 --> 00:02:02,030 and we know that S3 gives us numerous copies 35 00:02:02,030 --> 00:02:04,490 across numerous availability zones 36 00:02:04,490 --> 00:02:07,085 so that S3 is durable to the loss 37 00:02:07,085 --> 00:02:09,330 of an availability zone, right? 38 00:02:09,330 --> 00:02:13,463 So if we're storing our data in the form of snapshots, 39 00:02:13,463 --> 00:02:17,575 if an availability zone is lost we will still have a copy 40 00:02:17,575 --> 00:02:20,130 in the form of a snapshot. 41 00:02:20,130 --> 00:02:24,471 Another great benefit of snapshots is the ability 42 00:02:24,471 --> 00:02:27,230 to copy them to another region. 43 00:02:27,230 --> 00:02:30,470 So if we take a snapshot of a volume, 44 00:02:30,470 --> 00:02:32,190 that snapshot will be stored 45 00:02:32,190 --> 00:02:34,804 in the same region as that volume. 46 00:02:34,804 --> 00:02:37,474 We could then copy that snapshot 47 00:02:37,474 --> 00:02:40,930 to any other region and by doing that, 48 00:02:40,930 --> 00:02:43,500 now we have multiple snapshots, 49 00:02:43,500 --> 00:02:46,200 now we have some geographical diversity. 50 00:02:46,200 --> 00:02:49,470 We have backups of the data potentially thousands 51 00:02:49,470 --> 00:02:54,146 of miles apart so that if an entire region were affected 52 00:02:54,146 --> 00:02:58,050 by natural disaster, then we would still have a copy 53 00:02:58,050 --> 00:03:00,265 of that data in another region.
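To make the snapshot-then-copy flow just described concrete, here is a minimal Python sketch. It assumes the two arguments `src_ec2` and `dst_ec2` are EC2 clients of the kind returned by `boto3.client("ec2", region_name=...)`, one for the volume's region and one for the destination region. The function name, the description strings, and the region names in the usage note are illustrative assumptions, not something from the lecture.

```python
def snapshot_and_copy(src_ec2, dst_ec2, volume_id, src_region):
    """Snapshot a volume, then copy that snapshot into a second region.

    Assumption: src_ec2 and dst_ec2 are EC2 clients, for example
    boto3.client("ec2", region_name="us-east-1") for the source and
    boto3.client("ec2", region_name="us-west-2") for the destination.
    """
    # The snapshot is created in the same region as the volume.
    snap = src_ec2.create_snapshot(
        VolumeId=volume_id,
        Description="backup of " + volume_id,
    )
    snapshot_id = snap["SnapshotId"]

    # Wait until the snapshot has completed before asking for a copy.
    src_ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])

    # The copy is requested through the destination region's client,
    # naming the source region and the source snapshot.
    copy = dst_ec2.copy_snapshot(
        SourceRegion=src_region,
        SourceSnapshotId=snapshot_id,
        Description="cross-region copy of " + snapshot_id,
    )
    return snapshot_id, copy["SnapshotId"]
```

Called as `snapshot_and_copy(src, dst, "vol-...", "us-east-1")`, this would return the ID of the original snapshot and the ID of its copy in the other region. Passing the clients in, rather than creating them inside the function, keeps credentials and region choices outside the sketch.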
54 00:03:00,265 --> 00:03:03,879 We can also share those snapshots 55 00:03:03,879 --> 00:03:08,450 so again it's common practice for organizations 56 00:03:08,450 --> 00:03:11,120 to leverage multiple accounts. 57 00:03:11,120 --> 00:03:14,760 And again you may have an account for admin purposes, 58 00:03:14,760 --> 00:03:17,450 you may have an account for different environments 59 00:03:17,450 --> 00:03:19,276 or accounts for different business units 60 00:03:19,276 --> 00:03:23,670 and it's possible that you can take a snapshot 61 00:03:23,670 --> 00:03:28,540 in one account and share that snapshot to other accounts. 62 00:03:28,540 --> 00:03:31,080 Now the way that snapshots work 63 00:03:31,080 --> 00:03:32,940 if we were to take a look at the mechanics 64 00:03:32,940 --> 00:03:37,030 if we take a look here at this particular slide, 65 00:03:37,030 --> 00:03:39,710 let's say here we have a block level volume, right, 66 00:03:39,710 --> 00:03:42,920 and each one of these squares represents a block 67 00:03:42,920 --> 00:03:44,820 on that volume. 68 00:03:44,820 --> 00:03:48,500 If for let's say a fairly new volume we have a lot 69 00:03:48,500 --> 00:03:50,980 of blocks but only some of them have data. 70 00:03:50,980 --> 00:03:54,621 Right, we've only written data to a number of blocks here. 71 00:03:54,621 --> 00:03:58,190 The first backup that we do will be a full backup 72 00:03:58,190 --> 00:04:02,033 that includes all the data on that volume, right? 73 00:04:02,033 --> 00:04:05,360 So the first backup includes all the data 74 00:04:05,360 --> 00:04:06,893 that currently exists. 75 00:04:07,780 --> 00:04:11,797 When we perform subsequent backups or snapshots 76 00:04:11,797 --> 00:04:16,494 those are incremental, that's only recording deltas. 
77 00:04:16,494 --> 00:04:21,494 And so here perhaps we've changed some data, right, 78 00:04:21,890 --> 00:04:25,800 in these blocks, and perhaps we've added some data 79 00:04:25,800 --> 00:04:30,000 in some other blocks and so only the data that has changed 80 00:04:30,000 --> 00:04:33,140 or been added will be included 81 00:04:33,140 --> 00:04:35,940 in that subsequent snapshot. 82 00:04:35,940 --> 00:04:39,020 And so a common question here is, well, 83 00:04:39,020 --> 00:04:40,860 if you're only storing deltas, 84 00:04:40,860 --> 00:04:45,220 what happens if you go back and delete, like, 85 00:04:45,220 --> 00:04:46,793 an earlier snapshot? 86 00:04:47,640 --> 00:04:50,320 And the answer to that is that 87 00:04:50,320 --> 00:04:52,550 you may be taking regular snapshots 88 00:04:52,550 --> 00:04:55,280 and each one only recording deltas. 89 00:04:55,280 --> 00:04:59,590 If you go back and delete an earlier one, 90 00:04:59,590 --> 00:05:03,780 or several, or several dozen earlier ones, it doesn't matter, 91 00:05:03,780 --> 00:05:08,780 any later snapshot will maintain a reference 92 00:05:09,550 --> 00:05:12,890 to all of the data that it needs, right? 93 00:05:12,890 --> 00:05:15,472 So just because you delete an old snapshot 94 00:05:15,472 --> 00:05:20,472 doesn't mean that a later snapshot is now unusable, right? 95 00:05:21,080 --> 00:05:24,410 Any snapshot that you take will include 96 00:05:24,410 --> 00:05:28,400 all of the data necessary to restore that volume. 97 00:05:28,400 --> 00:05:31,110 Another feature that we have access to 98 00:05:31,110 --> 00:05:32,600 that we should keep in mind 99 00:05:32,600 --> 00:05:34,580 is the Data Lifecycle Manager, 100 00:05:34,580 --> 00:05:37,050 and this was released fairly recently 101 00:05:37,050 --> 00:05:39,313 in the latter part of 2018.
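The point about deleting earlier snapshots can be illustrated with a toy model in Python. This is only a sketch of the idea that each snapshot holds references to the blocks it needs. It is not how AWS implements snapshots internally, and every name in it (`SnapshotStore`, `take`, `delete`, `restore`) is hypothetical.

```python
class SnapshotStore:
    """Toy model of incremental snapshots (an illustration, not AWS internals)."""

    def __init__(self):
        self.blocks = {}     # block_id -> stored data
        self.refcount = {}   # block_id -> number of snapshots referencing it
        self.snapshots = {}  # snapshot name -> {block index: block_id}
        self._next_id = 0

    def take(self, name, volume):
        """Snapshot `volume` ({block index: bytes}); return how many blocks were newly stored."""
        # Compare against the most recent remaining snapshot, if there is one.
        previous = next(reversed(self.snapshots.values()), {})
        mapping, newly_stored = {}, 0
        for index, data in volume.items():
            old_id = previous.get(index)
            if old_id is not None and self.blocks[old_id] == data:
                block_id = old_id            # unchanged: just reference the stored block
            else:
                block_id = self._next_id     # changed or added: store the delta
                self._next_id += 1
                self.blocks[block_id] = data
                self.refcount[block_id] = 0
                newly_stored += 1
            self.refcount[block_id] += 1
            mapping[index] = block_id
        self.snapshots[name] = mapping
        return newly_stored

    def delete(self, name):
        """Delete a snapshot, freeing only blocks no other snapshot references."""
        for block_id in self.snapshots.pop(name).values():
            self.refcount[block_id] -= 1
            if self.refcount[block_id] == 0:
                del self.blocks[block_id]
                del self.refcount[block_id]

    def restore(self, name):
        """Rebuild the full volume contents from one snapshot."""
        return {i: self.blocks[b] for i, b in self.snapshots[name].items()}


store = SnapshotStore()
volume = {0: b"boot", 1: b"logs"}
first = store.take("snap-1", volume)    # full: every block with data is stored
volume[1] = b"logs-v2"                  # change one block
volume[2] = b"new"                      # add one block
second = store.take("snap-2", volume)   # incremental: only the two deltas are stored
store.delete("snap-1")                  # the earlier snapshot goes away
restored = store.restore("snap-2")      # the later snapshot is still complete
```

In this model, deleting `snap-1` frees only the old contents of block 1. Block 0 stays in the store because `snap-2` still references it, which is why the later snapshot remains usable on its own.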
102 00:05:39,313 --> 00:05:44,313 And for a long time people in my classes would ask me, 103 00:05:46,450 --> 00:05:48,647 or clients of mine would also ask me, 104 00:05:48,647 --> 00:05:51,579 "Hey, is there a way to automate snapshots?" 105 00:05:51,579 --> 00:05:55,020 And I would say "Yeah, we can write a script to do that." 106 00:05:55,020 --> 00:05:57,730 And this was a feature that we've been looking for 107 00:05:57,730 --> 00:05:59,646 for a long time and now it's finally here, 108 00:05:59,646 --> 00:06:02,694 the Data Lifecycle Manager allows us 109 00:06:02,694 --> 00:06:06,790 to automate these snapshots 110 00:06:06,790 --> 00:06:10,070 so we can create a schedule here, right? 111 00:06:10,070 --> 00:06:11,590 We can create a timer that says 112 00:06:11,590 --> 00:06:16,590 every so often I want to automate these snapshots 113 00:06:16,640 --> 00:06:19,350 and regularly take a snapshot of that volume. 114 00:06:19,350 --> 00:06:22,482 So we as a customer, we define the backup 115 00:06:22,482 --> 00:06:24,120 and retention schedules. 116 00:06:24,120 --> 00:06:26,890 How often do we want those backups to be performed? 117 00:06:26,890 --> 00:06:30,966 How long do we want those backups to be maintained? 118 00:06:30,966 --> 00:06:35,966 And of course we can create those rules with tags. 119 00:06:35,970 --> 00:06:38,410 We haven't really talked a whole lot about tags so far 120 00:06:38,410 --> 00:06:42,320 but tags are a way of organizing our environment. 121 00:06:42,320 --> 00:06:46,730 And tags are essentially arbitrary key-value pairs. 122 00:06:46,730 --> 00:06:49,720 And so we could tag something with, say, 123 00:06:49,720 --> 00:06:52,250 environment equals production, 124 00:06:52,250 --> 00:06:56,100 or project equals whatever, cost center 125 00:06:56,100 --> 00:06:58,880 and owner and so on.
126 00:06:58,880 --> 00:07:01,400 And so we could use those lifecycle rules 127 00:07:01,400 --> 00:07:03,636 to say, well, every volume that's tagged 128 00:07:03,636 --> 00:07:07,820 in a particular way should be, you know, 129 00:07:07,820 --> 00:07:11,990 we should take a snapshot on a particular schedule. 130 00:07:11,990 --> 00:07:14,090 So for those of us who have data 131 00:07:14,090 --> 00:07:16,053 that is really important to us, 132 00:07:17,310 --> 00:07:21,420 and we need to maintain a high degree 133 00:07:21,420 --> 00:07:23,050 of durability in that data, 134 00:07:23,050 --> 00:07:25,480 if we want to rest assured knowing 135 00:07:25,480 --> 00:07:28,950 that we have not only copies of the data live 136 00:07:28,950 --> 00:07:31,250 but also multiple copies of the data 137 00:07:31,250 --> 00:07:33,220 in the form of backups 138 00:07:33,220 --> 00:07:36,263 then we should take a look at EBS snapshots.
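The three decisions a tag-based lifecycle rule encodes (which volumes, how often, how many backups to keep) can be sketched in plain Python. This is a simplified model and not the Data Lifecycle Manager API. `SnapshotPolicy`, `volumes_in_scope`, `is_due`, and `expired`, along with the single-tag matching and the count-based retention, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SnapshotPolicy:
    """Hypothetical stand-in for a lifecycle rule: one target tag, a schedule, a retention count."""
    tag_key: str
    tag_value: str
    interval_hours: int   # how often a snapshot should be taken
    retain_count: int     # how many of the newest snapshots to keep


def volumes_in_scope(policy, volumes):
    """Return IDs of volumes carrying the policy's tag; `volumes` maps volume ID -> tag dict."""
    return sorted(
        volume_id
        for volume_id, tags in volumes.items()
        if tags.get(policy.tag_key) == policy.tag_value
    )


def is_due(policy, last_snapshot, now):
    """A volume is due if it has never been snapshotted or the interval has elapsed."""
    if last_snapshot is None:
        return True
    return now - last_snapshot >= timedelta(hours=policy.interval_hours)


def expired(policy, snapshot_times):
    """Return the snapshot timestamps that fall outside the retention count, newest first."""
    newest_first = sorted(snapshot_times, reverse=True)
    return newest_first[policy.retain_count:]
```

With a policy such as `SnapshotPolicy("environment", "production", 24, 3)`, only volumes tagged environment equals production would be selected, each would be due once 24 hours had passed since its last snapshot, and anything older than the three newest snapshots would be marked for deletion.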