Here we'll talk about auto scaling. The auto scaling service is an incredibly powerful service that allows us to handle changes in traffic or changes in the demands of our application. Now, one of the really key features of not only Amazon Web Services but cloud computing as a whole is the ability to be elastic. We're operating from a paradigm of use what you need, and pay for what you use. So if we don't need it, why should we be paying for it? If we don't need massive servers, we should be paying for smaller ones. If we don't need 1,000 machines, we should be paying for fewer. We want to be looking at our traffic, looking at the demands of our application, and really asking: what's the minimum that we need to fulfill these requests and meet this demand? And then, as that demand rises, we ask that question again. Okay, from this point on, what's the minimum that we need?
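The "what's the minimum we need?" question is ceiling arithmetic. Divide demand by what one machine can serve, then round up. The sketch below uses hypothetical requests-per-second figures and a hypothetical per-instance capacity. It also assumes we always keep at least one instance running.

```python
import math


def minimum_instances(demand_rps: float, capacity_per_instance_rps: float) -> int:
    """Smallest whole number of instances whose combined capacity covers demand."""
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity per instance must be positive")
    # Round up: needing 1.8 machines still means running 2.
    # max(1, ...) is an assumption: we never drop below a single instance.
    return max(1, math.ceil(demand_rps / capacity_per_instance_rps))


# Re-ask the question each time demand changes (hypothetical numbers).
for demand in (150, 900, 2400, 600):
    print(demand, minimum_instances(demand, capacity_per_instance_rps=500))
```

As demand goes 150, 900, 2400, 600 requests per second, the answer goes 1, 2, 5, 2. The count rises with demand and falls again when demand drops.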
And we keep doing that as demand rises. Then, of course, as demand falls, we want to be reevaluating and going, you know what, demand is coming down, we don't need this amount of compute power, so let's pull these machines out and save money. The auto scaling service can help us with that. We have our VPC leveraging multiple availability zones. We have an Elastic Load Balancer leveraging multiple availability zones, and our application runs on multiple EC2 instances across multiple availability zones as well. What we have here is an auto scaling group that is designed to launch machines into multiple availability zones. As our traffic increases, we may start to see that we're going over some threshold: maybe CPU, or network, or the number of processes. Then we can allow the auto scaling service to start adding machines for us. Perhaps we want to start at three and go up from there. Say that after a certain period of time our metrics have not come down. We're still above a certain amount of CPU usage. We're still above a certain amount of network usage.
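Here is a minimal sketch of the "still above the threshold after a certain period of time" rule. It is a simplified stand-in for an alarm that watches several consecutive periods. The function name, the sample values, and the step size of one instance are all hypothetical. This is not the AWS API.

```python
from typing import Sequence


def evaluate_scale_out(samples: Sequence[float], threshold: float, periods: int,
                       current: int, step: int = 1) -> int:
    """Return the new instance count after checking the most recent metric periods."""
    if periods < 1:
        raise ValueError("periods must be at least 1")
    recent = list(samples)[-periods:]
    # Scale out only if every one of the last `periods` samples breached the
    # threshold, i.e. the metric has stayed high rather than spiking once.
    if len(recent) == periods and all(s > threshold for s in recent):
        return current + step
    return current


# Hypothetical CPU percentages, one per evaluation period, starting from 3 instances.
print(evaluate_scale_out([55, 72, 78, 81], threshold=70, periods=3, current=3))  # 4
print(evaluate_scale_out([55, 72, 65, 81], threshold=70, periods=3, current=3))  # 3
```

Requiring several consecutive breaches is a design choice. It avoids launching machines for a single momentary spike.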
Whatever it is, whatever we determine the bottleneck to be, the auto scaling service will continue to add machines for us and will continue to scale. Now, we do have some limits: we can set a minimum and a maximum. Typically, the minimum is set based on how many machines we need to handle the minimum load, the lowest amount of traffic that we ever see. The maximum is determined simply by how much we're willing to spend. Amazon does have some initial limits on the number of instances we can launch, but we can request to have those limits raised. Even so, perhaps we aren't willing to pay for 100 machines running. Perhaps we're only willing to spend money on 12 or 20, and so we can set that as our maximum. Now we can see that our system has scaled out and has maintained a balance of machines across multiple availability zones. If we were to lose one of these machines, auto scaling would automatically replace it.
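The minimum and maximum act as a clamp on whatever capacity scaling asks for. Here is a small sketch. The limits of 3 and 12 come from the example above, and the function name is hypothetical.

```python
def clamp_capacity(desired: int, minimum: int, maximum: int) -> int:
    """Keep the requested capacity within the group's [minimum, maximum] limits."""
    if minimum > maximum:
        raise ValueError("minimum cannot exceed maximum")
    return max(minimum, min(desired, maximum))


capacity = 3
for _ in range(20):  # the metric stays high, so scaling keeps asking for one more
    capacity = clamp_capacity(capacity + 1, minimum=3, maximum=12)
print(capacity)  # 12: the budget-driven maximum caps the scale-out
```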
If we were to lose one of these availability zones, the auto scaling group is smart enough to know that the zone is unreachable. It will automatically rebalance and put these two machines into the other two availability zones that are still in operation. Now, as our traffic starts to come down and fall off, we can allow the auto scaling service to start terminating machines. Our traffic comes down even further, and we've removed three machines. We're back to our original minimum. So the auto scaling service helps us scale out to meet demand, and then, during times when we don't have that demand, it helps us scale in to save money. Scaling in is just as important. One of the primary reasons so many organizations, so many Fortune 500 companies and government agencies, are moving to Amazon Web Services is the opportunity to save money. It doesn't do us any good to scale out to dozens of machines and just leave them there.
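The rebalancing idea boils down to spreading a target count as evenly as possible over the zones that are still reachable. The sketch below only computes that target distribution. It does not model how AWS actually launches and terminates instances to reach it. The zone names are example values.

```python
from typing import Dict, Iterable, List


def distribute(count: int, zones: List[str]) -> Dict[str, int]:
    """Spread `count` instances as evenly as possible over the given zones."""
    if not zones:
        raise ValueError("need at least one healthy availability zone")
    base, extra = divmod(count, len(zones))
    # The first `extra` zones each get one additional instance.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def rebalance(count: int, zones: List[str], unreachable: Iterable[str]) -> Dict[str, int]:
    """Target distribution after dropping zones that are unreachable."""
    down = set(unreachable)
    return distribute(count, [z for z in zones if z not in down])


zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
print(distribute(6, zones))                              # 2 per zone
print(rebalance(6, zones, unreachable={"us-east-1c"}))   # 3 each in 1a and 1b
print(distribute(3, zones))                              # scaled back in to the minimum
```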
We want to be sure that we configure our auto scaling group to scale in as a way to lower our costs. Now, in order to create an auto scaling group, two things are required: the auto scaling group itself and a launch configuration. Optional components include scheduled actions and scaling policies, and we'll talk about the details of these coming up. The auto scaling service can replace failed instances, and it can change the capacity, or number of instances, in the group. In some cases, maybe we don't need to scale out and in at all. Maybe we just want to maintain some fixed number of instances: two, three, or four. If one of them fails, auto scaling will replace it and maintain that number of machines. We'll see in a little bit how auto scaling works with Amazon CloudWatch so that we can use alarms to trigger scaling events.
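The fixed-size case is a reconciliation loop. Drop instances that have failed, then launch replacements until the desired count is running again. The `Instance` class, the `launch()` stand-in, and the `i-0001`-style IDs are hypothetical. Real instance IDs are assigned by AWS.

```python
from dataclasses import dataclass
from itertools import count
from typing import List

_next_id = count(1)  # hypothetical ID generator for the simulation


@dataclass
class Instance:
    instance_id: str
    healthy: bool = True


def launch() -> Instance:
    """Stand-in for launching a machine from the launch configuration."""
    return Instance(instance_id=f"i-{next(_next_id):04d}")


def reconcile(instances: List[Instance], desired: int) -> List[Instance]:
    """Drop unhealthy instances, then launch replacements until `desired` are running."""
    fleet = [inst for inst in instances if inst.healthy]
    while len(fleet) < desired:
        fleet.append(launch())
    # Trim any excess so the group holds exactly `desired` instances.
    return fleet[:desired]


fleet = [launch() for _ in range(3)]   # i-0001, i-0002, i-0003
fleet[1].healthy = False               # i-0002 fails
fleet = reconcile(fleet, desired=3)
print([inst.instance_id for inst in fleet])  # ['i-0001', 'i-0003', 'i-0004']
```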
As machines come and go, as they are launched and terminated, we can capture those events with Amazon Simple Notification Service (SNS), as well as AWS Lambda, and do some kind of processing or coordination on each event. So let's take a look at the two things that are required: the launch configuration and the auto scaling group. The launch configuration defines what gets launched, such as the instance type. Maybe we want to launch a t2.small, an m4.large, or a c4.xlarge. So we define the type of instance that we're going to launch. We can specify the EBS volumes or the instance store volumes. We can specify the Amazon Machine Image (AMI) and any user data that we want to use to bootstrap the machine. Once we create a launch configuration, it cannot be edited. If we need to make a change, we have to create a new launch configuration with that change and point the group at it, deleting the old one if we no longer need it. The other component that's required is the auto scaling group itself.
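Immutability maps naturally onto a frozen dataclass. You cannot change a field in place. "Editing" means producing a new object with a new name. The field names, the AMI ID, and the volume field are hypothetical simplifications. They are not the AWS API shape.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class LaunchConfiguration:
    """What gets launched (hypothetical, simplified fields)."""
    name: str
    image_id: str                          # AMI to boot from (placeholder ID below)
    instance_type: str                     # e.g. "t2.small", "m4.large", "c4.xlarge"
    user_data: str = ""                    # bootstrap script or environment setup
    volume_sizes_gib: Tuple[int, ...] = ()


web_v1 = LaunchConfiguration("web-v1", "ami-0123456789abcdef0", "t2.small",
                             volume_sizes_gib=(8,))

try:
    web_v1.instance_type = "m4.large"  # type: ignore[misc]
except FrozenInstanceError:
    print("launch configurations are immutable")

# The "change" is a brand-new configuration; the group would then be pointed at it.
web_v2 = replace(web_v1, name="web-v2", instance_type="m4.large")
print(web_v2.instance_type, web_v1.instance_type)  # m4.large t2.small
```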
Where the launch configuration determines what gets launched, the auto scaling group determines where it gets launched. Here we specify which subnets are used, and by way of those subnets, which availability zones are used. We also specify which Elastic Load Balancers are going to be used. And yes, we can specify multiple Elastic Load Balancers for an auto scaling group, so that when instances are launched, the auto scaling group automatically registers them with the ELBs that we've defined. Within the auto scaling group we also specify the health check type. An auto scaling group needs some way of knowing whether a machine is healthy or unhealthy. If a machine is considered unhealthy, the auto scaling group will terminate it and replace it with a new machine. We can get those health checks either from the EC2 status checks or from the ELB itself, where we can configure a health check.
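The health check type changes which signals count. As we understand the AWS behaviour, with the ELB type an instance is treated as unhealthy if it fails either the EC2 status check or the load balancer check. With the EC2 type only the EC2 status check is used. Treat this as an assumption to confirm against the AWS documentation. The `HealthReport` class and the function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HealthReport:
    ec2_status_ok: bool   # result of the EC2 status checks
    elb_check_ok: bool    # result of the load balancer's health check


def is_unhealthy(report: HealthReport, health_check_type: str) -> bool:
    """Decide whether the group should terminate and replace this instance."""
    if health_check_type == "EC2":
        return not report.ec2_status_ok
    if health_check_type == "ELB":
        # Assumed behaviour: failing either check marks the instance unhealthy.
        return not (report.ec2_status_ok and report.elb_check_ok)
    raise ValueError(f"unknown health check type: {health_check_type}")


# The VM is up, but the web server behind the load balancer is not responding.
report = HealthReport(ec2_status_ok=True, elb_check_ok=False)
print(is_unhealthy(report, "EC2"))  # False: the EC2 check alone misses the app failure
print(is_unhealthy(report, "ELB"))  # True: the ELB check catches it
```

This is why ELB health checks are often preferred for web tiers. They notice an application that has stopped serving even while the underlying machine looks fine.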
If we specify tags, as we should be doing in order to keep our environments organized, we have the option of adding these tags to the auto scaling group. The auto scaling group can then propagate those tags to the instances it launches. One optional feature within auto scaling is scheduled actions. Let's say that you're planning some large marketing event. Perhaps your business is going to be featured on the news, or perhaps you're launching some kind of campaign, and you expect a spike in traffic as a result. If you know about it ahead of time, we can schedule a one-time event. Or, for recurring patterns: I've worked with plenty of customers whose traffic looks like a regular sine wave.
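Tag propagation can be pictured as a per-tag flag that decides whether a group tag is copied onto each new instance. AWS exposes such a flag (`PropagateAtLaunch`). The Python names and the example tags below are hypothetical.

```python
from typing import Dict, List, NamedTuple


class GroupTag(NamedTuple):
    key: str
    value: str
    propagate_at_launch: bool  # mirrors the idea of AWS's PropagateAtLaunch flag


def tags_for_new_instance(group_tags: List[GroupTag]) -> Dict[str, str]:
    """Tags a newly launched instance inherits from its auto scaling group."""
    return {t.key: t.value for t in group_tags if t.propagate_at_launch}


tags = [
    GroupTag("Environment", "production", True),
    GroupTag("Team", "web", True),
    GroupTag("GroupOnlyNote", "not-copied", False),
]
print(tags_for_new_instance(tags))  # {'Environment': 'production', 'Team': 'web'}
```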
Every day at a certain time they see the same spike, and every day at another time they see the same trough. When you know you have those kinds of rises and falls in traffic, we can use recurring schedules: every day at four o'clock we're going to scale out, and every day at eight o'clock we're going to scale back in. That's really helpful, because when you know about the events ahead of time, you can get out ahead of that curve and have those machines available before the traffic starts coming in. The other thing we can use is what's called a scaling policy, which allows us to scale based on dynamic load. It's really useful when you can't predict when that load is going to arrive. Maybe out of the blue your website is featured on CNN, or it lands on the front page of Reddit, and all of a sudden you have a million more users today than you did yesterday, and you didn't expect them. We can use scaling policies to scale our system in a number of different ways. Now, a scaling policy has to be triggered by something.
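The daily "scale out at four, scale in at eight" pattern can be modelled as a list of (hour, desired capacity) actions. At any hour, the most recent action is the one in effect. The hours (16:00 and 20:00) and capacities are hypothetical. AWS itself expresses recurring scheduled actions with cron-style recurrence strings rather than this structure.

```python
from typing import List, Tuple


def scheduled_capacity(hour: int, actions: List[Tuple[int, int]]) -> int:
    """Desired capacity in effect at `hour` (0-23), given daily (hour, desired) actions."""
    if not actions:
        raise ValueError("need at least one scheduled action")
    ordered = sorted(actions)
    # Before the day's first action fires, the last action from "yesterday" still applies.
    current = ordered[-1][1]
    for action_hour, desired in ordered:
        if action_hour <= hour:
            current = desired
    return current


daily = [(16, 10), (20, 3)]  # scale out at 16:00, back in at 20:00 (hypothetical)
for h in (10, 16, 17, 21):
    print(h, scheduled_capacity(h, daily))  # 10->3, 16->10, 17->10, 21->3
```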
We could trigger it via a CloudWatch alarm. We could trigger it manually, which we might call the push-button approach: you just go into the console, trigger it, and allow the scaling policy to do what it does. We could trigger it programmatically, perhaps even from within our own application. Maybe the application is smart enough to know that it's time to scale, and it can trigger the scaling policy itself. So by combining the Elastic Load Balancer, auto scaling, and our launch configurations, we can achieve what we call self-healing services. I don't know about you, but I personally don't like being woken up at two o'clock in the morning to go in and reinstall and reconfigure Apache because our server failed. I don't want to have to do that. Our customers don't want to wait for us to do that. Our business owners don't want to be losing money because we were down for an hour or more. We want a system that can heal itself in the case of a fault.
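The point about triggers is that one policy can have several callers: an alarm handler, a person pressing a button, or application code. The sketch models a simple "add N instances within limits" policy. The class, the handler, and the source labels are hypothetical. "ALARM" is used here as the alarm state name.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical simple policy: add `adjustment` instances, within the group limits."""
    adjustment: int
    minimum: int
    maximum: int

    def execute(self, current: int, source: str) -> int:
        new = max(self.minimum, min(current + self.adjustment, self.maximum))
        print(f"[{source}] capacity {current} -> {new}")
        return new


policy = ScalingPolicy(adjustment=2, minimum=3, maximum=12)


def on_alarm_state_change(state: str, current: int) -> int:
    # Only act when the alarm has entered the ALARM state.
    return policy.execute(current, "cloudwatch-alarm") if state == "ALARM" else current


capacity = 3
capacity = on_alarm_state_change("ALARM", capacity)      # alarm-driven: 3 -> 5
capacity = policy.execute(capacity, "console-button")    # manual push button: 5 -> 7
capacity = policy.execute(capacity, "application-code")  # programmatic: 7 -> 9
```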
So if one of these instances were to fail for whatever reason, we want the auto scaling service to be able to replace it. We can use user data, as we've talked about, either to add environment variables or to run a script that installs what needs to be installed, configures what needs to be configured, and gets the machine bootstrapped and running as quickly as possible. Here's an example where perhaps we have a front-end web server tier that needs a little more CPU than memory. In this tier we have a launch configuration that uses a c4.large, and the user data just sets environment variables and then starts the process. The process reads those environment variables to determine what environment it's in, what backend service to connect to, and so on. For our private backend service, behind the internal load balancer, we also have an auto scaling group with a launch configuration that uses a different AMI.
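On the application side, "the process reads those environment variables" can be as simple as the loader below. The variable names `APP_ENV` and `BACKEND_URL` and the hostname are hypothetical. This works only if the user-data script exports the variables and then starts the process from that same script, so the process inherits them.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AppConfig:
    environment: str
    backend_url: str


def load_config(env: Optional[Mapping[str, str]] = None) -> AppConfig:
    """Build the app's settings from environment variables set by user data."""
    source = os.environ if env is None else env
    return AppConfig(
        environment=source.get("APP_ENV", "development"),
        backend_url=source["BACKEND_URL"],  # fail fast if user data did not set it
    )


# Simulates what the (hypothetical) user data would export before starting the app:
#   export APP_ENV=production
#   export BACKEND_URL=http://internal-backend.example:8080
config = load_config({"APP_ENV": "production",
                      "BACKEND_URL": "http://internal-backend.example:8080"})
print(config.environment, config.backend_url)
```

Keeping environment-specific values out of the AMI means the same image can serve every environment. Only the launch configuration's user data differs.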
Perhaps this is an application AMI that already has our application installed and configured, but this one will use an m4.large, which has more memory than the c4.large, because perhaps our application needs more memory. Here we would use the user data to, first, pull the latest version of our code down from GitHub; second, start our application; and third, using the Simple Notification Service, notify some monitoring system, or the person on call, that an instance has been launched and is now up and running. So by combining our launch configurations, multiple availability zones, and the auto scaling service, we can achieve a system that is not only highly available but also fault tolerant and self-healing.
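The backend tier's three bootstrap steps could be rendered into a user-data script as below. Several things are assumptions: the checkout directory, the start command, and the SNS topic ARN (which uses AWS's documentation placeholder account ID). The AMI is also assumed to already contain a git checkout and the AWS CLI, and the instance role is assumed to allow publishing to the topic. `git -C ... pull` and `aws sns publish --topic-arn ... --message ...` are real commands. Everything else is illustrative.

```python
import shlex


def backend_user_data(app_dir: str, start_command: str, topic_arn: str) -> str:
    """Build a bash user-data script for the (hypothetical) backend tier."""
    return "\n".join([
        "#!/bin/bash",
        "set -euo pipefail",
        "# 1. Pull the latest code into the checkout baked into the application AMI.",
        f"git -C {shlex.quote(app_dir)} pull --ff-only",
        "# 2. Start the application (the command is an assumption).",
        start_command,
        "# 3. Notify monitoring or the on-call engineer that this instance is ready.",
        f"aws sns publish --topic-arn {shlex.quote(topic_arn)} "
        '--message "Backend instance $(hostname) is up and running"',
        "",
    ])


script = backend_user_data(
    app_dir="/opt/app",
    start_command="systemctl start backend-app",
    topic_arn="arn:aws:sns:us-east-1:123456789012:on-call",
)
print(script)
```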