- [Speaker] Now let's review a use case of using databases for e-commerce.

You'll see that there's a fair amount going on in this diagram. We have a number of different types of databases for different types of data and different usage patterns. First, over here in the top left corner, you'll see that we have our e-commerce application, which is a load balanced application. We're using here perhaps an Application Load Balancer; we could also use a Network Load Balancer. And then, of course, we would have an auto scaled fleet of EC2 instances, and on those we can run a monolithic application or a series of microservices.

Now, moving down and across, we can see that we're leveraging DynamoDB for a number of different things. First, we might be storing campaign events: your marketing team may be running different marketing campaigns to target different demographics.
And we want to track all of the people that come in through those campaigns. Perhaps they land on a particular landing page, or they carry with them some information about that campaign as to where it actually came from: where they clicked on that ad, or what kind of link brought them to our site and to that particular landing page. We can record that kind of information very quickly and very easily with DynamoDB.

The same is true for something like affiliate tracking. Let's say that you have an affiliate program for your online store, and your affiliates out there are putting links on their websites, their blogs, or what have you. When someone clicks a link, follows it through to your site, and makes a purchase, the affiliate gets a commission. So you have to track which users came in through which affiliate link, tie each purchase back to that affiliate, and then be able to pay them that commission. Again, DynamoDB is very well suited for very quickly ingesting that kind of data.
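As a sketch of what that ingestion might look like, here is the shape of an event item you might write to DynamoDB with `put_item`. The table name, attribute names, and key design here are assumptions for illustration, not something from the diagram.

```python
from datetime import datetime, timezone

def build_affiliate_click_item(affiliate_id, user_id, landing_page):
    """Build a DynamoDB item recording one affiliate click.

    Hypothetical key design: partition key = affiliate id,
    sort key = click timestamp, so an affiliate's clicks can
    be queried in time order.
    """
    now = datetime.now(timezone.utc).isoformat()
    return {
        "affiliate_id": {"S": affiliate_id},  # partition key
        "clicked_at": {"S": now},             # sort key
        "user_id": {"S": user_id},
        "landing_page": {"S": landing_page},
    }

item = build_affiliate_click_item("aff-042", "user-9001", "/spring-sale")
# With boto3 this item would be written as:
#   boto3.client("dynamodb").put_item(TableName="AffiliateClicks", Item=item)
```

The single `put_item` call is why this pattern ingests quickly: each click is one small, independent write with no joins or locking involved.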
And then we might also use DynamoDB to track user history as they view different products and search for different products. We store that in DynamoDB so that we can do a number of different things with it. At the least, we could show them, "Hey, these are the products that you looked at," and make it easy for them to go back to a product they were recently looking at, rather than having to dig and find it all over again. For the user experience, it's a matter of making shopping more convenient for them, greasing the skids, as it were, making it easy for them to find that product and then actually buy it.

And then, of course, that kind of data could also tie into a recommendation engine. If you're searching for something, then some days later, if you haven't purchased it, maybe we want to send an email to say, "Hey, we saw that you were searching for this; it's on sale, the price has changed," or, "If you're searching for this, you might also like these other related things," right?
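As a rough sketch of the "recently viewed" idea, the per-user history you'd persist could behave like this. The cap of five entries and the most-recent-first ordering are invented design choices for illustration.

```python
from collections import OrderedDict

def record_view(history, product_id, limit=5):
    """Record a product view: most recent first, no duplicates,
    capped at `limit` entries (the cap is an assumed design choice)."""
    ordered = OrderedDict((p, None) for p in history)
    ordered.pop(product_id, None)  # re-viewing a product moves it to the front
    items = [product_id] + list(ordered)
    return items[:limit]

history = []
for pid in ["p1", "p2", "p3", "p1"]:
    history = record_view(history, pid)

print(history)  # ['p1', 'p3', 'p2'] -- p1 moved back to the front
```

Stored per user (for example as a list attribute on a DynamoDB item keyed by user id), this is all the application needs to render a "recently viewed" strip.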
And then, of course, as users are actually doing their shopping and placing things in their cart, we can record that information as well. This is all very rich information that we can use to establish trends and to help recommend different products, either to users directly or to users that are somehow related to each other.

Now, for other types of data, looking over here at RDS: in production we would run RDS in a Multi-AZ deployment. Again, keep in mind that a Multi-AZ deployment means that we are running both a primary and a standby. These are two completely physically distinct servers in physically distinct data centers, and AWS handles the synchronous replication between the two, so we have two live copies of our data. Now, it's not shown on this diagram, but we could also say that we are relying on those automated backups going to S3 for the sake of disaster recovery, giving us a greater degree of durability for that data.
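For reference, Multi-AZ is a single flag when the instance is created. Here is a sketch of parameters you might pass to boto3's `rds.create_db_instance`; the identifier, engine, and sizes are made-up examples, and the actual call is left commented out since it would provision real, billable infrastructure.

```python
# Parameters for boto3's rds.create_db_instance; names and sizes are
# illustrative, not taken from the diagram.
multi_az_params = {
    "DBInstanceIdentifier": "shop-primary",  # hypothetical name
    "Engine": "mysql",
    "DBInstanceClass": "db.m5.large",
    "AllocatedStorage": 100,                 # GiB
    "MultiAZ": True,          # AWS provisions a synchronous standby in another AZ
    "BackupRetentionPeriod": 7,  # days of automated backups, stored in S3
}
# boto3.client("rds").create_db_instance(**multi_az_params)
```

With `MultiAZ` set, failover to the standby is automatic; the application keeps using the same endpoint.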
Now, a relational database may be better suited for data like user accounts and the product catalog and inventory. As users place their orders, there's now a relationship between the user, the order that was placed, and the inventory that is now available. And then, of course, there's all of the billing that's related to that as well, the actual financial transactions. All of that may very well need to be ACID compliant; we may want to record all of it using a transaction, with the ability to roll back. Now, keep in mind that a later update to DynamoDB added support for transactions, but your team may feel more comfortable using a relational database for that kind of thing. It's all up to you, right?

Now, in this particular case, because we do have the product catalog there, it would make sense that this application would be performing a lot of reads, so at least some tables would be very read heavy, such as the product catalog, if people are constantly querying it and looking for items.
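The transaction-with-rollback idea works the same on any relational engine. Here is a minimal sketch using SQLite (from the standard library) in place of RDS, with an invented two-table schema: the order insert and the inventory decrement either both commit or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (user_id TEXT, product_id TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('p1', 3)")
conn.commit()

def place_order(conn, user_id, product_id, qty):
    """Insert the order and decrement inventory atomically."""
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                         (user_id, product_id, qty))
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE product_id = ? AND stock >= ?",
                (qty, product_id, qty))
            if cur.rowcount == 0:
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False

ok = place_order(conn, "u1", "p1", 2)      # succeeds: stock 3 -> 1
failed = place_order(conn, "u1", "p1", 2)  # fails: rollback removes the order row
```

After the failed attempt, the orders table still holds only the first order: the rollback undid the insert along with the attempted decrement, which is exactly the ACID guarantee discussed above.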
Now, we could potentially store the product catalog in DynamoDB as well, so that we could read it from there, or we could create read replicas so that all of our application reads are served by the read replicas. As users browse the site and search for products, those reads are pulled from the read replicas, and that reduces the load on the primary. And then, of course, we could allow the primary to handle only writes. So by splitting reads off to read replicas, we can increase the performance of the entire system.

Now, as far as a cache: we may have different types of pages, or entire HTML pages, built out of both a slower, very complex query and the rendered HTML. We may also have parts of pages, such as related products, that are only sub-components of different pages. The related products may be either a SQL query and its result set, or the result set plus the rendered HTML. Either way, if those kinds of things are running a little bit slower, then we can store them in memory.
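A common way to implement this kind of in-memory caching is the cache-aside pattern. Here is a minimal sketch, with a plain dict standing in for ElastiCache (Redis or Memcached) and a stub function standing in for the slow query against the read replica; all names are invented.

```python
cache = {}  # stands in for ElastiCache (Redis/Memcached)

def render_related_products(product_id):
    """Stub for the expensive part: a complex SQL query against the
    read replica plus HTML rendering."""
    return f"<ul><li>related to {product_id}</li></ul>"

def get_related_products_html(product_id):
    key = f"related:{product_id}"
    html = cache.get(key)            # 1. check the cache first
    if html is None:                 # 2. miss: do the expensive work
        html = render_related_products(product_id)
        cache[key] = html            # 3. populate the cache for next time
    return html

first = get_related_products_html("p42")   # miss: computed, then cached
second = get_related_products_html("p42")  # hit: served from memory
```

In a real deployment you would also set a TTL on each key so stale fragments expire, but the lookup flow is the same.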
And our application would first look to the cache and say, "Hey, do you have this thing that I'm looking for?" If it does, then we can return that very, very quickly, maybe even with microsecond latency, depending on the engine that we're using and the nature of that particular data. If it doesn't, then we would, of course, go to the relational database, or rather pull it from the read replica, and then we would do our writes to the master, to the primary.

And then, of course, we have Redshift. Redshift, again, is Amazon's petabyte-scale data warehouse, and with Redshift we want to be able to run analytical queries and analyze all of this data that we have in these various places. We want to be able to pull in all of this information from DynamoDB: campaigns, affiliate tracking, product view history, search history, shopping carts. To pull all of that data in, we could potentially use Data Pipeline or Lambda or something else to help us extract, transform, and load between DynamoDB and Redshift.
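Whatever tool drives the ETL (Data Pipeline, Lambda, or something else), the transform step often amounts to flattening DynamoDB's typed items into tabular rows that Redshift can COPY in from S3. A small illustrative sketch, with assumed attribute names:

```python
import csv
import io

def flatten(item):
    """Turn a DynamoDB-style typed item ({"S": ...} / {"N": ...})
    into a flat dict suitable for a warehouse row."""
    out = {}
    for name, typed in item.items():
        (dynamo_type, value), = typed.items()
        out[name] = int(value) if dynamo_type == "N" else value
    return out

items = [
    {"user_id": {"S": "u1"}, "product_id": {"S": "p42"}, "qty": {"N": "2"}},
    {"user_id": {"S": "u2"}, "product_id": {"S": "p7"},  "qty": {"N": "1"}},
]
rows = [flatten(i) for i in items]

# Write CSV of the kind a Redshift COPY (from S3) could load.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user_id", "product_id", "qty"])
writer.writeheader()
writer.writerows(rows)
```

The load step would then upload this file to S3 and issue a COPY statement against the target Redshift table.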
And then we could do the same thing with our relational database, pulling its data in as well, and now we have all of that data in one place, in one data warehouse. Our data analytics team can then use whatever existing business intelligence tools they want, from their local machines, on premises, at home, or wherever they are. They can connect to Redshift using ODBC or JDBC drivers and run whatever complex ad hoc queries they can think of in order to derive meaning out of all of this data.

Right, so, of course, we didn't show it here in this particular diagram; we didn't really have room. But going back to some of the other databases that we've talked about, such as Neptune, for example: Neptune would have a very clear place here in terms of recording relationships between users and the products that they've purchased or the products that they've searched for, and that would also help us create a recommendation engine.

So, as you can see, we have a number of different great options within AWS to store our data. Many of them are fully managed services, allowing us to offload the operational burden.
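The Neptune idea, users connected to the products they've purchased, can be illustrated with a toy in-memory graph: recommend products bought by other users who share a purchase with you. The data and scoring here are purely illustrative; a graph database would express this as a traversal (for example in Gremlin) rather than Python loops.

```python
from collections import Counter

# Toy purchase graph: user -> set of purchased products.
purchases = {
    "alice": {"p1", "p2"},
    "bob":   {"p1", "p3"},
    "carol": {"p2", "p3", "p4"},
}

def recommend(user, purchases, top_n=2):
    """Two-hop walk: user -> their products -> other buyers -> their products."""
    mine = purchases[user]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == user or not (mine & theirs):
            continue  # no shared purchase, so skip this user
        for product in theirs - mine:
            scores[product] += 1  # one vote per co-purchaser
    return [p for p, _ in scores.most_common(top_n)]

print(recommend("alice", purchases))  # ['p3', 'p4']
```

Here p3 scores highest because both bob and carol, who each share a purchase with alice, bought it; this two-hop relationship query is exactly what graph databases are built to answer at scale.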
If you notice here in this diagram, there's very little operational burden. In fact, the only operational burden that we would have would be, essentially, the operating systems for our EC2 instances over here. And so you can see that we can run a fairly complex type of scenario, in this example e-commerce, with a smaller team, by offloading all of those burdens to AWS, allowing AWS to manage many of these services for us.