1
00:00:06,604 --> 00:00:10,141
- Here we'll talk about auto scaling.

2
00:00:10,141 --> 00:00:13,882
The auto scaling service is an
incredibly powerful service

3
00:00:13,882 --> 00:00:17,944
that allows us to handle
changes in traffic

4
00:00:17,944 --> 00:00:21,289
or changes in the demands
of our application.

5
00:00:21,289 --> 00:00:25,122
Now one of the really
key features of not only

6
00:00:26,609 --> 00:00:29,897
Amazon Web Services but
cloud computing as a whole,

7
00:00:29,897 --> 00:00:32,362
is the ability to be elastic.

8
00:00:32,362 --> 00:00:34,235
We're operating from a paradigm

9
00:00:34,235 --> 00:00:38,331
of use what you need,
and pay for what you use.

10
00:00:38,331 --> 00:00:41,194
So if we don't need it, why
should we be paying for it?

11
00:00:41,194 --> 00:00:44,748
If we don't need massive servers,

12
00:00:44,748 --> 00:00:46,843
we should be paying for smaller ones.

13
00:00:46,843 --> 00:00:49,000
If we don't need 1,000 machines,

14
00:00:49,000 --> 00:00:51,114
we should be paying for fewer ones.

15
00:00:51,114 --> 00:00:53,917
We want to be looking at our traffic,

16
00:00:53,917 --> 00:00:56,059
looking at the demands of our application

17
00:00:56,059 --> 00:00:58,041
and really saying, what's the minimum

18
00:00:58,041 --> 00:01:02,208
that we need to fulfill these
requests and meet this demand?

19
00:01:03,581 --> 00:01:06,077
And then as that demand rises,

20
00:01:06,077 --> 00:01:07,763
again, we ask that question.

21
00:01:07,763 --> 00:01:11,773
Okay, from this point on what's
the minimum that we need?

22
00:01:11,773 --> 00:01:14,601
And we keep doing that as demand rises,

23
00:01:14,601 --> 00:01:17,032
and then of course as demand falls,

24
00:01:17,032 --> 00:01:19,517
we wanna be reevaluating that

25
00:01:19,517 --> 00:01:21,868
and going, you know what,
demand is coming down,

26
00:01:21,868 --> 00:01:25,305
we don't need this amount of compute power

27
00:01:25,305 --> 00:01:28,298
so let's pull these
machines out and save money.

28
00:01:28,298 --> 00:01:31,742
So the auto scaling service
can help us with that.

29
00:01:31,742 --> 00:01:36,652
We have our VPC leveraging
multiple availability zones.

30
00:01:36,652 --> 00:01:38,973
We have an elastic load balancer

31
00:01:38,973 --> 00:01:41,085
leveraging multiple availability zones.

32
00:01:41,085 --> 00:01:45,514
As is our application in
multiple EC2 instances

33
00:01:45,514 --> 00:01:48,219
across multiple availability zones.

34
00:01:48,219 --> 00:01:51,197
What we have here is an auto scaling group

35
00:01:51,197 --> 00:01:54,762
that is designed to launch machines

36
00:01:54,762 --> 00:01:58,122
into multiple availability zones.

37
00:01:58,122 --> 00:02:00,205
As our traffic increases,

38
00:02:01,885 --> 00:02:05,414
and we start to see perhaps
that we're going over

39
00:02:05,414 --> 00:02:06,540
some threshold,

40
00:02:06,540 --> 00:02:10,619
maybe it's CPU, or network,
or the number of processes,

41
00:02:10,619 --> 00:02:14,779
then we can allow the auto scaling service

42
00:02:14,779 --> 00:02:18,170
to start to add machines for us,

43
00:02:18,170 --> 00:02:21,853
and perhaps we wanna start at
three and go up from there.

44
00:02:21,853 --> 00:02:24,282
So after a certain period of time

45
00:02:24,282 --> 00:02:26,855
we see that our metrics have not lowered.

46
00:02:26,855 --> 00:02:30,826
We're still above a certain
amount of CPU usage.

47
00:02:30,826 --> 00:02:33,211
We're still above a certain
amount of network usage.

48
00:02:33,211 --> 00:02:36,774
Whatever it is, whatever we
determine the bottleneck to be,

49
00:02:36,774 --> 00:02:40,164
then the auto scaling
service will continue

50
00:02:40,164 --> 00:02:44,987
to add machines for us,
and will continue to scale.
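
The threshold check described here can be sketched in a few lines of Python. This is only an illustration, not the service's actual logic: the function name, the 70 percent CPU threshold, the three-period window and the step of one machine are all assumptions made for the example.

```python
def next_capacity(current, cpu_samples, threshold=70.0, periods=3, step=1):
    """Return the instance count after looking at recent CPU samples.

    Hypothetical rule: add `step` machines only when the last `periods`
    samples are all above `threshold`, so a single spike does not scale us out.
    """
    recent = cpu_samples[-periods:]
    sustained = len(recent) == periods and all(s > threshold for s in recent)
    return current + step if sustained else current


# Start at three machines; CPU has stayed above the threshold for three periods.
capacity = next_capacity(3, [82.0, 88.5, 91.0])
print(capacity)
```

Calling the function again on each new batch of samples keeps adding machines for as long as the metric stays above the threshold.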

51
00:02:44,987 --> 00:02:46,493
Now we do have some limits.

52
00:02:46,493 --> 00:02:49,053
We can set a minimum,
we can set a maximum.

53
00:02:49,053 --> 00:02:52,199
Typically the minimum
would be set based on

54
00:02:52,199 --> 00:02:55,261
how many machines do we need to maintain

55
00:02:55,261 --> 00:02:58,589
the minimum load or the
lowest amount of traffic

56
00:02:58,589 --> 00:03:00,544
that we ever see.

57
00:03:00,544 --> 00:03:04,760
The maximum would be determined simply by

58
00:03:04,760 --> 00:03:06,757
how much we're willing to spend.

59
00:03:06,757 --> 00:03:10,022
Within Amazon, while we do
have some initial limits

60
00:03:10,022 --> 00:03:12,342
on the number of instances we launch,

61
00:03:12,342 --> 00:03:15,955
we can very easily get
those limits lifted,

62
00:03:15,955 --> 00:03:18,608
and perhaps we aren't willing to pay for

63
00:03:18,608 --> 00:03:20,638
100 machines running.

64
00:03:20,638 --> 00:03:25,614
Perhaps we're only willing
to spend money on 12 or 20,

65
00:03:25,614 --> 00:03:27,862
and so we can set that as our maximum.
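
As a rough sketch of how a minimum and a maximum bound the group size, assuming a hypothetical helper named clamp_capacity and example limits of 3 and 20:

```python
def clamp_capacity(desired, minimum, maximum):
    """Keep the desired instance count inside the configured limits."""
    if minimum > maximum:
        raise ValueError("minimum must not exceed maximum")
    return max(minimum, min(desired, maximum))


# Demand asks for 100 machines, but we are only willing to pay for 20.
print(clamp_capacity(100, minimum=3, maximum=20))
```

However far demand climbs or falls, the group never leaves the range we agreed to pay for.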

66
00:03:27,862 --> 00:03:32,029
And now we could see that
our system has scaled out

67
00:03:33,138 --> 00:03:36,342
and it's maintained a balance of machines

68
00:03:36,342 --> 00:03:38,838
across multiple availability zones.

69
00:03:38,838 --> 00:03:42,255
If we were to lose one of these machines,

70
00:03:43,701 --> 00:03:48,118
then auto scaling would
automatically replace it.

71
00:03:48,118 --> 00:03:51,440
If we were to lose one of
these availability zones,

72
00:03:51,440 --> 00:03:54,182
the auto scaling group
is smart enough to know

73
00:03:54,182 --> 00:03:57,544
if that availability zone is unreachable,

74
00:03:57,544 --> 00:04:00,939
then it will automatically rebalance

75
00:04:00,939 --> 00:04:03,680
and put these two machines over here

76
00:04:03,680 --> 00:04:06,156
in the other two availability zones

77
00:04:06,156 --> 00:04:09,179
that are still in operation.
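
The rebalancing idea can be illustrated with a small Python sketch. The even-split rule and the zone names below are assumptions for the example; the real service makes its own placement decisions.

```python
def spread_across_zones(instance_count, healthy_zones):
    """Divide instances as evenly as possible across the reachable zones."""
    if not healthy_zones:
        raise ValueError("at least one availability zone must be reachable")
    base, extra = divmod(instance_count, len(healthy_zones))
    # The first `extra` zones take one additional instance each.
    return {zone: base + (1 if i < extra else 0)
            for i, zone in enumerate(healthy_zones)}


# Six machines across three zones, then the same six after one zone is lost.
print(spread_across_zones(6, ["zone-a", "zone-b", "zone-c"]))
print(spread_across_zones(6, ["zone-a", "zone-b"]))
```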

78
00:04:09,179 --> 00:04:12,699
Now as our traffic starts to come down,

79
00:04:12,699 --> 00:04:15,041
and starts to fall off,

80
00:04:15,041 --> 00:04:17,861
we can allow the auto scaling service

81
00:04:17,861 --> 00:04:20,580
to start terminating machines.

82
00:04:20,580 --> 00:04:24,099
So our traffic comes down even further

83
00:04:24,099 --> 00:04:26,994
and we've removed three machines.

84
00:04:26,994 --> 00:04:29,099
We're back to our original minimum.

85
00:04:29,099 --> 00:04:32,822
So auto scaling service
can help us scale out

86
00:04:32,822 --> 00:04:36,961
to meet demand and then during times where

87
00:04:36,961 --> 00:04:38,662
we don't have that demand,

88
00:04:38,662 --> 00:04:40,721
it helps us scale in to save money,

89
00:04:40,721 --> 00:04:44,297
and the scaling in is just as important,

90
00:04:44,297 --> 00:04:47,436
because one of the primary
reasons so many people,

91
00:04:47,436 --> 00:04:50,194
so many Fortune 500s
and government agencies

92
00:04:50,194 --> 00:04:52,299
are moving to Amazon Web Services

93
00:04:52,299 --> 00:04:55,049
is the opportunity to save money,

94
00:04:55,921 --> 00:04:57,942
and so it doesn't really do us any good

95
00:04:57,942 --> 00:05:00,001
to scale out to dozens of machines

96
00:05:00,001 --> 00:05:01,638
and just leave them there.

97
00:05:01,638 --> 00:05:04,699
We want to be sure that we
configure our auto scaling group

98
00:05:04,699 --> 00:05:08,032
to scale in as a way to lower our costs.

99
00:05:09,899 --> 00:05:13,649
Now in order to create
an auto scaling group,

100
00:05:15,076 --> 00:05:17,499
two things are required.

101
00:05:17,499 --> 00:05:19,499
We're going to have what we call

102
00:05:19,499 --> 00:05:21,819
the auto scaling group itself

103
00:05:21,819 --> 00:05:25,302
and then a launch configuration.

104
00:05:25,302 --> 00:05:27,659
Some things that are optional would be

105
00:05:27,659 --> 00:05:31,015
scheduled actions, and scaling policies,

106
00:05:31,015 --> 00:05:33,441
and we'll talk about the
details of these things

107
00:05:33,441 --> 00:05:35,222
here coming up.

108
00:05:35,222 --> 00:05:39,389
The auto scaling service can
replace failed instances.

109
00:05:40,801 --> 00:05:44,699
It can change the capacity
or the number of instances

110
00:05:44,699 --> 00:05:45,866
in that group.

111
00:05:47,078 --> 00:05:50,022
We could, in some cases,
maybe we don't need

112
00:05:50,022 --> 00:05:51,599
to scale out and in,

113
00:05:51,599 --> 00:05:54,998
maybe we just wanna
maintain two instances,

114
00:05:54,998 --> 00:05:56,497
or three, or four instances.

115
00:05:56,497 --> 00:05:58,998
Some fixed number of instances,

116
00:05:58,998 --> 00:06:02,502
and so if one of them fails,
auto scaling will replace it

117
00:06:02,502 --> 00:06:06,982
and just maintain that
certain number of machines.
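
Maintaining a fixed number of machines amounts to a simple reconcile loop. The sketch below is a simplification with made-up names (reconcile, launch and the instance dictionaries are hypothetical), and it only replaces failed machines; it does not scale in.

```python
def reconcile(instances, desired, launch):
    """Drop unhealthy instances and launch replacements up to `desired`."""
    fleet = [inst for inst in instances if inst["healthy"]]
    while len(fleet) < desired:
        fleet.append(launch())
    return fleet


fleet = [
    {"id": "i-1", "healthy": True},
    {"id": "i-2", "healthy": False},  # this machine has failed
    {"id": "i-3", "healthy": True},
]
fleet = reconcile(fleet, desired=3,
                  launch=lambda: {"id": "i-new", "healthy": True})
print([inst["id"] for inst in fleet])
```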

118
00:06:06,982 --> 00:06:09,299
We're gonna see here in a little bit

119
00:06:09,299 --> 00:06:13,622
how auto scaling can work
with Amazon CloudWatch

120
00:06:13,622 --> 00:06:17,789
so that we can leverage alarms
to trigger scaling events.

121
00:06:19,040 --> 00:06:21,222
As our machines come and go,

122
00:06:21,222 --> 00:06:24,699
as machines are launched,
as machines are terminated,

123
00:06:24,699 --> 00:06:28,881
we can capture those
events with the Amazon

124
00:06:28,881 --> 00:06:30,777
Simple Notification Service,

125
00:06:30,777 --> 00:06:34,998
as well as Lambda and do
some kind of processing

126
00:06:34,998 --> 00:06:37,498
or coordination on that event.

127
00:06:39,636 --> 00:06:43,061
So let's take a look at the
two things that are required.

128
00:06:43,061 --> 00:06:47,228
The launch configuration
and the auto scaling group.

129
00:06:48,537 --> 00:06:52,704
The launch configuration
defines what gets launched.

130
00:06:53,601 --> 00:06:55,899
Such as the instance type.

131
00:06:55,899 --> 00:07:00,241
Maybe we wanna launch a
T2 small, or an M4 large,

132
00:07:00,241 --> 00:07:02,058
or a C4 extra large.

133
00:07:02,058 --> 00:07:04,081
So we define the type of instance

134
00:07:04,081 --> 00:07:05,782
that we're going to launch.

135
00:07:05,782 --> 00:07:08,282
We can specify the EBS volumes

136
00:07:09,217 --> 00:07:11,382
or the instance store volumes.

137
00:07:11,382 --> 00:07:14,001
We can specify the Amazon machine image,

138
00:07:14,001 --> 00:07:18,481
any user data that we wanna
use to bootstrap that machine,

139
00:07:18,481 --> 00:07:21,318
and once we create the
launch configuration,

140
00:07:21,318 --> 00:07:23,478
it cannot be edited.

141
00:07:23,478 --> 00:07:27,121
We have to delete and recreate
a launch configuration

142
00:07:27,121 --> 00:07:29,499
if we need to make a change to it.

143
00:07:29,499 --> 00:07:31,739
The other component that's required is

144
00:07:31,739 --> 00:07:34,001
the auto scaling group itself.

145
00:07:34,001 --> 00:07:36,337
Where the launch configuration determines

146
00:07:36,337 --> 00:07:39,659
what gets launched, the auto
scaling group determines

147
00:07:39,659 --> 00:07:41,142
where it gets launched,

148
00:07:41,142 --> 00:07:44,993
and that is, here we specify what subnets

149
00:07:44,993 --> 00:07:48,076
and by way of subnets we thus specify

150
00:07:49,099 --> 00:07:52,349
what availability zones are being used.

151
00:07:53,318 --> 00:07:57,158
We also specify which
elastic load balancers

152
00:07:57,158 --> 00:07:58,617
are going to be used,

153
00:07:58,617 --> 00:08:03,073
and yes we can specify
multiple elastic load balancers

154
00:08:03,073 --> 00:08:05,238
for an auto scaling group to use

155
00:08:05,238 --> 00:08:08,195
so that when instances are launched,

156
00:08:08,195 --> 00:08:10,696
the auto scaling group will automatically

157
00:08:10,696 --> 00:08:14,863
register those instances to
the ELBs that we've defined.

158
00:08:16,422 --> 00:08:19,238
Within the auto scaling
group we also specify

159
00:08:19,238 --> 00:08:20,961
the health check type.

160
00:08:20,961 --> 00:08:24,502
Now an auto scaling group needs some way

161
00:08:24,502 --> 00:08:27,339
of knowing whether or
not a machine is healthy

162
00:08:27,339 --> 00:08:29,121
or unhealthy.

163
00:08:29,121 --> 00:08:32,038
If a machine is considered
to be unhealthy,

164
00:08:32,038 --> 00:08:34,796
then the auto scaling
group will terminate it

165
00:08:34,796 --> 00:08:37,921
and replace it with a new machine.

166
00:08:37,921 --> 00:08:41,441
We can get those health checks from either

167
00:08:41,441 --> 00:08:43,739
EC2 system health checks

168
00:08:43,739 --> 00:08:48,214
or we can get the health
checks from the ELB itself,

169
00:08:48,214 --> 00:08:52,241
and we can configure a
health check on the ELB.

170
00:08:52,241 --> 00:08:55,314
If we specify tags, as we should be doing

171
00:08:55,314 --> 00:08:58,758
in order to keep our
environments organized,

172
00:08:58,758 --> 00:09:02,037
we have the option of adding these tags

173
00:09:02,037 --> 00:09:03,878
to the auto scaling group

174
00:09:03,878 --> 00:09:07,782
and then the auto scaling
group can propagate those tags

175
00:09:07,782 --> 00:09:11,409
to the instances that are launched.
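
One way to picture the split between "what gets launched" and "where it gets launched" is a pair of Python data classes. This is a conceptual sketch only: the class and field names are illustrative and are not the AWS API, and the image ID, subnet and load balancer names are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen: a launch configuration cannot be edited
class LaunchConfiguration:
    image_id: str
    instance_type: str
    user_data: str = ""


@dataclass
class AutoScalingGroup:
    launch_configuration: LaunchConfiguration
    subnets: list
    load_balancers: list = field(default_factory=list)
    health_check_type: str = "EC2"  # or "ELB"
    tags: dict = field(default_factory=dict)

    def launch_instance(self, subnet):
        """Describe an instance launched into one of the group's subnets."""
        if subnet not in self.subnets:
            raise ValueError("subnet is not part of this group")
        return {
            "image_id": self.launch_configuration.image_id,
            "instance_type": self.launch_configuration.instance_type,
            "subnet": subnet,
            "tags": dict(self.tags),  # tags propagate to the instance
            "registered_with": list(self.load_balancers),
        }


config = LaunchConfiguration(image_id="ami-example", instance_type="m4.large")
group = AutoScalingGroup(config, subnets=["subnet-a", "subnet-b"],
                         load_balancers=["web-elb"], tags={"env": "prod"})
print(group.launch_instance("subnet-a"))
```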

176
00:09:11,409 --> 00:09:13,259
A couple of things that are optional

177
00:09:13,259 --> 00:09:17,579
within auto scaling would be
things like scheduled actions.

178
00:09:17,579 --> 00:09:21,746
So let's say that you're planning
some large marketing event.

179
00:09:23,757 --> 00:09:27,259
Perhaps your business is
gonna be featured on the news,

180
00:09:27,259 --> 00:09:30,401
or perhaps you're launching
some kind of campaign

181
00:09:30,401 --> 00:09:33,656
and you know that you expect
a certain amount of traffic

182
00:09:33,656 --> 00:09:36,957
or a spike in traffic as a result.

183
00:09:36,957 --> 00:09:39,297
If you know about it ahead of time,

184
00:09:39,297 --> 00:09:42,714
we can schedule perhaps a one time event,

185
00:09:43,562 --> 00:09:46,812
or I've worked with plenty of customers

186
00:09:47,718 --> 00:09:51,899
who have, if you look at
their traffic patterns

187
00:09:51,899 --> 00:09:53,979
it's like a regular sine wave.

188
00:09:53,979 --> 00:09:57,302
Every day at a certain period
of time they see the same spike

189
00:09:57,302 --> 00:10:00,699
and everyday at another time
they see the same troff,

190
00:10:00,699 --> 00:10:04,758
and so when you know you have
those types of rises and falls

191
00:10:04,758 --> 00:10:08,264
in traffic, then we can leverage intervals

192
00:10:08,264 --> 00:10:12,995
to schedule, all right,
every day at four o'clock

193
00:10:12,995 --> 00:10:14,433
we're gonna scale out,

194
00:10:14,433 --> 00:10:17,062
and every day at eight o'clock
we're gonna scale back in,

195
00:10:17,062 --> 00:10:20,219
and that's really helpful when you know

196
00:10:20,219 --> 00:10:22,118
the events ahead of time,

197
00:10:22,118 --> 00:10:25,478
then you can get out ahead of that curve

198
00:10:25,478 --> 00:10:28,102
and go ahead and have
those machines available

199
00:10:28,102 --> 00:10:31,142
before that traffic starts coming in.
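
A recurring schedule like this can be sketched as a function of the hour of day. The capacities below, and the reading of "four o'clock" and "eight o'clock" as 16:00 and 20:00, are assumptions made for the example.

```python
def scheduled_capacity(hour, baseline=3, peak=9,
                       scale_out_hour=16, scale_in_hour=20):
    """Return the planned instance count for a given hour (0 to 23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    in_peak_window = scale_out_hour <= hour < scale_in_hour
    return peak if in_peak_window else baseline


# Capacity just before, during, and after the daily peak window.
for hour in (15, 16, 19, 20):
    print(hour, scheduled_capacity(hour))
```

Because the schedule is known ahead of time, the extra machines are already running when the traffic arrives instead of being launched in reaction to it.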

200
00:10:31,142 --> 00:10:33,714
The other thing that we
could do is what's called

201
00:10:33,714 --> 00:10:34,939
a scaling policy,

202
00:10:34,939 --> 00:10:38,742
and this allows us to scale
based on a dynamic load.

203
00:10:38,742 --> 00:10:42,787
It's really useful for those
times when you can't predict

204
00:10:42,787 --> 00:10:45,558
when that load is gonna come in.

205
00:10:45,558 --> 00:10:49,725
Maybe just out of the blue
your website is featured on CNN

206
00:10:51,158 --> 00:10:53,221
or out of the blue it's featured,

207
00:10:53,221 --> 00:10:55,019
it's on the front page of Reddit,

208
00:10:55,019 --> 00:10:57,702
and all of a sudden you have
a million more users today

209
00:10:57,702 --> 00:11:00,775
than you did yesterday and
you didn't expect them,

210
00:11:00,775 --> 00:11:04,358
and so we can leverage
these scaling policies

211
00:11:04,358 --> 00:11:08,902
to scale our system in a
number of different ways.

212
00:11:08,902 --> 00:11:12,699
Now a scaling policy has to
be triggered by something.

213
00:11:12,699 --> 00:11:15,782
We could trigger it
via a CloudWatch alarm.

214
00:11:15,782 --> 00:11:17,739
We could trigger it manually.

215
00:11:17,739 --> 00:11:20,558
Which is what we might call
the push button approach.

216
00:11:20,558 --> 00:11:23,381
You just go into the
console and say trigger,

217
00:11:23,381 --> 00:11:26,299
and allow the scaling
policy to do what it does.

218
00:11:26,299 --> 00:11:28,262
We could do it programmatically.

219
00:11:28,262 --> 00:11:30,699
Perhaps even from within
our own application.

220
00:11:30,699 --> 00:11:33,361
Maybe the application
is smart enough to know

221
00:11:33,361 --> 00:11:34,998
that it's time to scale

222
00:11:34,998 --> 00:11:37,458
and it can trigger the scaling policy.

223
00:11:37,458 --> 00:11:40,481
So leveraging the elastic load balancer,

224
00:11:40,481 --> 00:11:42,481
leveraging auto scaling,

225
00:11:42,481 --> 00:11:44,801
leveraging our launch configurations,

226
00:11:44,801 --> 00:11:48,379
we can achieve what we
call self-healing services.

227
00:11:48,379 --> 00:11:52,321
I don't know about you but I
personally don't really like

228
00:11:52,321 --> 00:11:54,822
being woken up at two
o'clock in the morning

229
00:11:54,822 --> 00:11:57,841
to go in and reinstall
and reconfigure Apache

230
00:11:57,841 --> 00:11:59,979
because our server failed.

231
00:11:59,979 --> 00:12:01,819
I don't wanna have to do that.

232
00:12:01,819 --> 00:12:04,833
Our customers don't wanna
wait for us to do that.

233
00:12:04,833 --> 00:12:08,038
Our business owners don't
want to be losing money

234
00:12:08,038 --> 00:12:11,542
because we were down for an hour or more.

235
00:12:11,542 --> 00:12:15,121
We want a system that can heal itself

236
00:12:15,121 --> 00:12:16,758
in the case of a fault.

237
00:12:16,758 --> 00:12:19,238
So if one of these instances were to fail

238
00:12:19,238 --> 00:12:22,992
for whatever reason, then
we want the auto scaling

239
00:12:22,992 --> 00:12:25,259
service to be able to replace it.

240
00:12:25,259 --> 00:12:29,342
We can leverage user data,
as we've talked about,

241
00:12:30,779 --> 00:12:33,078
either adding environment variables,

242
00:12:33,078 --> 00:12:36,918
or adding a script that installs
what needs to be installed,

243
00:12:36,918 --> 00:12:39,222
configures what needs to be configured,

244
00:12:39,222 --> 00:12:42,241
and gets that machine
bootstrapped and running

245
00:12:42,241 --> 00:12:44,158
as quickly as possible.

246
00:12:45,201 --> 00:12:48,432
So here's an example where perhaps we have

247
00:12:48,432 --> 00:12:51,735
a front end web server tier that needs

248
00:12:51,735 --> 00:12:54,561
a little more CPU than it does memory.

249
00:12:54,561 --> 00:12:57,621
So perhaps in this tier we
have a launch configuration

250
00:12:57,621 --> 00:12:59,681
that uses a C4 large,

251
00:12:59,681 --> 00:13:03,073
and the user data just
has environment variables

252
00:13:03,073 --> 00:13:04,961
and then starts the process,

253
00:13:04,961 --> 00:13:07,757
and the process looks to
those environment variables

254
00:13:07,757 --> 00:13:10,358
to determine what environment it's in,

255
00:13:10,358 --> 00:13:14,582
what backend service to
connect to, and so on.

256
00:13:14,582 --> 00:13:17,082
In our private backend service

257
00:13:18,379 --> 00:13:20,539
behind the internal load balancer,

258
00:13:20,539 --> 00:13:23,281
we also have an auto scaling group

259
00:13:23,281 --> 00:13:25,678
that is with a launch configuration,

260
00:13:25,678 --> 00:13:28,640
that leverages a different AMI.

261
00:13:28,640 --> 00:13:31,155
Perhaps this is an application AMI

262
00:13:31,155 --> 00:13:34,561
that already has our application
installed and configured,

263
00:13:34,561 --> 00:13:37,394
but this one will use the M4 large

264
00:13:38,358 --> 00:13:40,561
rather than a smaller server,

265
00:13:40,561 --> 00:13:44,022
because perhaps our
application needs more memory.

266
00:13:44,022 --> 00:13:47,689
Here we would leverage
the user data to one,

267
00:13:48,737 --> 00:13:51,659
perhaps pull the latest
version of our code

268
00:13:51,659 --> 00:13:55,462
down from GitHub, start our application,

269
00:13:55,462 --> 00:13:58,721
and then leveraging the
Simple Notification Service,

270
00:13:58,721 --> 00:14:01,579
notify perhaps some monitoring system,

271
00:14:01,579 --> 00:14:04,503
or the person on call that hey,

272
00:14:04,503 --> 00:14:07,222
an instance has been launched,

273
00:14:07,222 --> 00:14:10,117
and is now up and running and ready to go.

274
00:14:10,117 --> 00:14:13,645
So combining our launch configurations,

275
00:14:13,645 --> 00:14:16,056
multiple availability zones,

276
00:14:16,056 --> 00:14:17,953
and auto scaling service,

277
00:14:17,953 --> 00:14:22,120
we can achieve a system that
is not only highly available,

278
00:14:23,158 --> 00:14:26,825
but fault tolerant and
self-healing as well.