- [Speaker] Now let's review a use case of using databases for e-commerce.

You'll see that there's a fair amount going on in this diagram. We have a number of different types of databases for different types of data and different usage patterns. First, over here in the top left corner, you'll see that we have our e-commerce application, which is a load balanced application. We're using here perhaps an Application Load Balancer; we could also use a Network Load Balancer. And then, of course, we would have an auto scaled fleet of EC2 instances, and on those we can run a monolithic application or a series of microservices.

Now, moving down and across, we can see that we're leveraging DynamoDB for a number of different things. First, we might be storing campaign events: your marketing team may be running different marketing campaigns to target different demographics.
And we want to track all of the people that come in through those campaigns. Perhaps they land on a particular landing page, or they carry with them some information about that campaign as to where it actually came from: where they clicked on that ad, or what kind of link brought them to our site and to that particular landing page. We can record that kind of information very quickly and very easily with DynamoDB.

The same is true for something like affiliate tracking. Let's say that you have an affiliate program for your online store, and your affiliates out there are putting links on their websites, their blogs, or what have you. When someone clicks a link, follows it through to your site, and makes a purchase, the affiliate gets a commission. So you have to track which users came in through which affiliate link, tie each purchase back to that affiliate, and then be able to pay them that commission. Again, DynamoDB is very well suited for very quickly ingesting that kind of data.
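As a sketch of what that ingestion might look like, here is the shape of an event item you might write to DynamoDB with `put_item`. The table name, attribute names, and key design here are assumptions for illustration, not something from the diagram.

```python
from datetime import datetime, timezone

def build_affiliate_click_item(affiliate_id, user_id, landing_page):
    """Build a DynamoDB item recording one affiliate click.

    Hypothetical key design: partition key = affiliate id,
    sort key = click timestamp, so an affiliate's clicks can
    be queried in time order.
    """
    now = datetime.now(timezone.utc).isoformat()
    return {
        "affiliate_id": {"S": affiliate_id},  # partition key
        "clicked_at": {"S": now},             # sort key
        "user_id": {"S": user_id},
        "landing_page": {"S": landing_page},
    }

item = build_affiliate_click_item("aff-042", "user-9001", "/spring-sale")
# With boto3 this item would be written as:
#   boto3.client("dynamodb").put_item(TableName="AffiliateClicks", Item=item)
```

The single `put_item` call is why this pattern ingests quickly: each click is one small, independent write with no joins or locking involved.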
And then we might also use DynamoDB to track user history as they view different products and search for different products. We store that in DynamoDB so that we can do a number of different things with it. At the least, we could show them, "Hey, these are the products that you looked at," and make it easy for them to go back to a product they were recently looking at, rather than having to dig and find it all over again. For the user experience, it's a matter of making shopping more convenient for them, greasing the skids, as it were, making it easy for them to find that product and then actually buy it.

And then, of course, that kind of data could also tie into a recommendation engine. If you're searching for something, then some days later, if you haven't purchased it, maybe we want to send an email to say, "Hey, we saw that you were searching for this; it's on sale, the price has changed," or, "If you're searching for this, you might also like these other related things," right?
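As a rough sketch of the "recently viewed" idea, the per-user history you'd persist could behave like this. The cap of five entries and the most-recent-first ordering are invented design choices for illustration.

```python
from collections import OrderedDict

def record_view(history, product_id, limit=5):
    """Record a product view: most recent first, no duplicates,
    capped at `limit` entries (the cap is an assumed design choice)."""
    ordered = OrderedDict((p, None) for p in history)
    ordered.pop(product_id, None)  # re-viewing a product moves it to the front
    items = [product_id] + list(ordered)
    return items[:limit]

history = []
for pid in ["p1", "p2", "p3", "p1"]:
    history = record_view(history, pid)

print(history)  # ['p1', 'p3', 'p2'] -- p1 moved back to the front
```

Stored per user (for example as a list attribute on a DynamoDB item keyed by user id), this is all the application needs to render a "recently viewed" strip.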
And then, of course, as users are actually doing their shopping and placing things in their cart, we can record that information as well. This is all very rich information that we can use to establish trends and to help recommend different products, either to users directly or to users that are somehow related to each other.

Now, for other types of data, looking over here at RDS: in production we would run RDS in a Multi-AZ deployment. Again, keep in mind that a Multi-AZ deployment means that we are running both a primary and a standby. These are two completely physically distinct servers in physically distinct data centers, and AWS handles the synchronous replication between the two, so we have two live copies of our data. Now, it's not shown on this diagram, but we could also say that we are relying on those automated backups going to S3 for the sake of disaster recovery, giving us a greater degree of durability for that data.
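For reference, Multi-AZ is a single flag when the instance is created. Here is a sketch of parameters you might pass to boto3's `rds.create_db_instance`; the identifier, engine, and sizes are made-up examples, and the actual call is left commented out since it would provision real, billable infrastructure.

```python
# Parameters for boto3's rds.create_db_instance; names and sizes are
# illustrative, not taken from the diagram.
multi_az_params = {
    "DBInstanceIdentifier": "shop-primary",  # hypothetical name
    "Engine": "mysql",
    "DBInstanceClass": "db.m5.large",
    "AllocatedStorage": 100,                 # GiB
    "MultiAZ": True,          # AWS provisions a synchronous standby in another AZ
    "BackupRetentionPeriod": 7,  # days of automated backups, stored in S3
}
# boto3.client("rds").create_db_instance(**multi_az_params)
```

With `MultiAZ` set, failover to the standby is automatic; the application keeps using the same endpoint.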
Now, a relational database may be better suited for data like user accounts and the product catalog and inventory. As users place their orders, there's now a relationship between the user, the order that was placed, and the inventory that is now available. And then, of course, there's all of the billing that's related to that as well, the actual financial transactions. All of that may very well need to be ACID compliant; we may want to record all of it using a transaction, with the ability to roll back. Now, keep in mind that a later update to DynamoDB added support for transactions, but your team may feel more comfortable using a relational database for that kind of thing. It's all up to you, right?

Now, in this particular case, because we do have the product catalog there, it would make sense that this application would be performing a lot of reads, so at least some tables would be very read heavy, such as the product catalog, if people are constantly querying it and looking for items.
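The transaction-with-rollback idea works the same on any relational engine. Here is a minimal sketch using SQLite (from the standard library) in place of RDS, with an invented two-table schema: the order insert and the inventory decrement either both commit or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (product_id TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (user_id TEXT, product_id TEXT, qty INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('p1', 3)")
conn.commit()

def place_order(conn, user_id, product_id, qty):
    """Insert the order and decrement inventory atomically."""
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                         (user_id, product_id, qty))
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? "
                "WHERE product_id = ? AND stock >= ?",
                (qty, product_id, qty))
            if cur.rowcount == 0:
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False

ok = place_order(conn, "u1", "p1", 2)      # succeeds: stock 3 -> 1
failed = place_order(conn, "u1", "p1", 2)  # fails: rollback removes the order row
```

After the failed attempt, the orders table still holds only the first order: the rollback undid the insert along with the attempted decrement, which is exactly the ACID guarantee discussed above.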
Now, we could potentially store the product catalog in DynamoDB as well, so that we could read it from there, or we could create read replicas so that all of our application reads are served by the read replicas. As users browse the site and search for products, those reads are pulled from the read replicas, and that reduces the load on the primary. And then, of course, we could allow the primary to handle only writes. So by splitting reads off to read replicas, we can increase the performance of the entire system.

Now, as far as a cache: we may have different types of pages, or entire HTML pages, built out of both a slower, very complex query and the rendered HTML. We may also have parts of pages, such as related products, that are only sub-components of different pages. The related products may be either a SQL query and its result set, or the result set plus the rendered HTML. Either way, if those kinds of things are running a little bit slower, then we can store them in memory.
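A common way to implement this kind of in-memory caching is the cache-aside pattern. Here is a minimal sketch, with a plain dict standing in for ElastiCache (Redis or Memcached) and a stub function standing in for the slow query against the read replica; all names are invented.

```python
cache = {}  # stands in for ElastiCache (Redis/Memcached)

def render_related_products(product_id):
    """Stub for the expensive part: a complex SQL query against the
    read replica plus HTML rendering."""
    return f"<ul><li>related to {product_id}</li></ul>"

def get_related_products_html(product_id):
    key = f"related:{product_id}"
    html = cache.get(key)            # 1. check the cache first
    if html is None:                 # 2. miss: do the expensive work
        html = render_related_products(product_id)
        cache[key] = html            # 3. populate the cache for next time
    return html

first = get_related_products_html("p42")   # miss: computed, then cached
second = get_related_products_html("p42")  # hit: served from memory
```

In a real deployment you would also set a TTL on each key so stale fragments expire, but the lookup flow is the same.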
And our application would first look to the cache and say, "Hey, do you have this thing that I'm looking for?" If it does, then we can return that very, very quickly, maybe even with microsecond latency, depending on the engine that we're using and the nature of that particular data. If it doesn't, then we would, of course, go to the relational database, or rather pull it from the read replica, and then we would do our writes to the master, to the primary.

And then, of course, we have Redshift. Redshift, again, is Amazon's petabyte-scale data warehouse, and with Redshift we want to be able to run analytical queries and analyze all of this data that we have in these various places. We want to be able to pull in all of this information from DynamoDB: campaigns, affiliate tracking, product view history, search history, shopping carts. To pull all of that data in, we could potentially use Data Pipeline or Lambda or something else to help us extract, transform, and load between DynamoDB and Redshift.
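Whatever tool drives the ETL (Data Pipeline, Lambda, or something else), the transform step often amounts to flattening DynamoDB's typed items into tabular rows that Redshift can COPY in from S3. A small illustrative sketch, with assumed attribute names:

```python
import csv
import io

def flatten(item):
    """Turn a DynamoDB-style typed item ({"S": ...} / {"N": ...})
    into a flat dict suitable for a warehouse row."""
    out = {}
    for name, typed in item.items():
        (dynamo_type, value), = typed.items()
        out[name] = int(value) if dynamo_type == "N" else value
    return out

items = [
    {"user_id": {"S": "u1"}, "product_id": {"S": "p42"}, "qty": {"N": "2"}},
    {"user_id": {"S": "u2"}, "product_id": {"S": "p7"},  "qty": {"N": "1"}},
]
rows = [flatten(i) for i in items]

# Write CSV of the kind a Redshift COPY (from S3) could load.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["user_id", "product_id", "qty"])
writer.writeheader()
writer.writerows(rows)
```

The load step would then upload this file to S3 and issue a COPY statement against the target Redshift table.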
And then we could do the same thing with our relational database, pulling its data in as well, and now we have all of that data in one place, in one data warehouse. Our data analytics team can then use whatever existing business intelligence tools they want, from their local machines, on premises, at home, or wherever they are. They can connect to Redshift using ODBC or JDBC drivers and run whatever complex ad hoc queries they can think of in order to derive meaning out of all of this data.

Right, so, of course, we didn't show it here in this particular diagram; we didn't really have room. But going back to some of the other databases that we've talked about, such as Neptune, for example: Neptune would have a very clear place here in terms of recording relationships between users and the products that they've purchased or the products that they've searched for, and that would also help us create a recommendation engine.

So, as you can see, we have a number of different great options within AWS to store our data. Many of them are fully managed services, allowing us to offload the operational burden.
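The Neptune idea, users connected to the products they've purchased, can be illustrated with a toy in-memory graph: recommend products bought by other users who share a purchase with you. The data and scoring here are purely illustrative; a graph database would express this as a traversal (for example in Gremlin) rather than Python loops.

```python
from collections import Counter

# Toy purchase graph: user -> set of purchased products.
purchases = {
    "alice": {"p1", "p2"},
    "bob":   {"p1", "p3"},
    "carol": {"p2", "p3", "p4"},
}

def recommend(user, purchases, top_n=2):
    """Two-hop walk: user -> their products -> other buyers -> their products."""
    mine = purchases[user]
    scores = Counter()
    for other, theirs in purchases.items():
        if other == user or not (mine & theirs):
            continue  # no shared purchase, so skip this user
        for product in theirs - mine:
            scores[product] += 1  # one vote per co-purchaser
    return [p for p, _ in scores.most_common(top_n)]

print(recommend("alice", purchases))  # ['p3', 'p4']
```

Here p3 scores highest because both bob and carol, who each share a purchase with alice, bought it; this two-hop relationship query is exactly what graph databases are built to answer at scale.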
If you notice here in this diagram, there's very little operational burden. In fact, the only operational burden that we would have would be, essentially, the operating systems for our EC2 instances over here. And so you can see that we can run a fairly complex type of scenario, in this example e-commerce, with a smaller team, by offloading all of those burdens to AWS, allowing AWS to manage many of these services for us.