1 00:00:06,690 --> 00:00:08,080 - Now let's take a look at a demo 2 00:00:08,080 --> 00:00:11,033 of performing Scan and Query operations. 3 00:00:12,080 --> 00:00:15,310 All right so we're back here in DynamoDB 4 00:00:15,310 --> 00:00:18,310 and here is the Product Catalog Table 5 00:00:18,310 --> 00:00:20,160 that we created earlier. 6 00:00:20,160 --> 00:00:21,760 And so if we take a look at that 7 00:00:22,620 --> 00:00:25,030 so we created this table and we also 8 00:00:25,030 --> 00:00:28,090 loaded some data into that table 9 00:00:28,090 --> 00:00:32,740 so we can explore that by going to the Items tab 10 00:00:32,740 --> 00:00:36,150 and of course we don't have a ton of data but it's enough 11 00:00:36,150 --> 00:00:40,000 to allow us to take a look at these operations. 12 00:00:40,000 --> 00:00:41,480 And so 13 00:00:41,480 --> 00:00:42,750 with a, 14 00:00:42,750 --> 00:00:47,750 the two primary operations that we have are Scan and Query. 15 00:00:48,700 --> 00:00:52,320 So Scan operations, as the name suggests, 16 00:00:52,320 --> 00:00:55,610 will perform a full table Scan. 17 00:00:55,610 --> 00:00:59,440 And so you know there may be a time to do that 18 00:00:59,440 --> 00:01:00,810 and so if we were to for example 19 00:01:00,810 --> 00:01:03,440 if we were to do a Scan 20 00:01:04,300 --> 00:01:05,680 and look for 21 00:01:06,860 --> 00:01:08,650 let's say the title of a book, right? 22 00:01:08,650 --> 00:01:13,107 Let's look for where title is greater than 18. 23 00:01:14,970 --> 00:01:16,920 Let's see if we can do that. 24 00:01:16,920 --> 00:01:19,120 So the attribute will be title 25 00:01:19,120 --> 00:01:21,810 and the reason that we are doing a Scan here 26 00:01:21,810 --> 00:01:24,730 is because we don't have an index on title. 27 00:01:24,730 --> 00:01:29,253 The only index that we have is ID. 28 00:01:30,340 --> 00:01:35,190 And so without an index the only, we can't do a Query. 29 00:01:35,190 --> 00:01:38,823 And so we have to use a Scan. 30 00:01:39,700 --> 00:01:43,690 And then we're Scanning using a filter. 31 00:01:43,690 --> 00:01:45,310 So we're saying that we're looking for 32 00:01:45,310 --> 00:01:47,400 or we're Scanning the table looking for 33 00:01:49,130 --> 00:01:54,130 items where the title attribute equals 34 00:01:54,320 --> 00:01:58,140 with a, you know we can set it to, you know 35 00:01:58,140 --> 00:02:00,110 we're looking for these as string values 36 00:02:00,110 --> 00:02:04,643 and then we can also say begins with 18. 37 00:02:05,840 --> 00:02:09,260 And so if we were to do that and start the search, 38 00:02:09,260 --> 00:02:11,300 there you go now we're getting all the titles 39 00:02:11,300 --> 00:02:12,570 that begin with 18. 40 00:02:12,570 --> 00:02:14,890 It appeared to come back fast 41 00:02:14,890 --> 00:02:16,140 and you know technically it did, 42 00:02:16,140 --> 00:02:18,400 that's because we don't have a whole lot of records. 43 00:02:18,400 --> 00:02:20,790 But if you were into 44 00:02:21,660 --> 00:02:23,781 you know the millions of records 45 00:02:23,781 --> 00:02:27,800 and larger scale applications, it's very easy 46 00:02:27,800 --> 00:02:29,990 to get into the millions and millions of records 47 00:02:29,990 --> 00:02:33,850 maybe into the hundreds of millions or billions. 48 00:02:33,850 --> 00:02:37,800 And in those cases doing a Scan operation like that 49 00:02:37,800 --> 00:02:40,340 where you're literally doing a full table Scan 50 00:02:40,340 --> 00:02:43,510 analyzing every single record in the table 51 00:02:43,510 --> 00:02:46,230 and you know asking the question 52 00:02:46,230 --> 00:02:47,730 does this begin with 18? 53 00:02:47,730 --> 00:02:49,370 Does this begin with 18? 54 00:02:49,370 --> 00:02:53,750 You know to do that a billion times will take a long time. 55 00:02:53,750 --> 00:02:57,300 And so DynamoDB is capable 56 00:02:57,300 --> 00:02:59,650 of single digit millisecond latency 57 00:02:59,650 --> 00:03:03,310 but only when we use the Query type of operation. 58 00:03:03,310 --> 00:03:04,870 So if we were to do a Query 59 00:03:04,870 --> 00:03:07,280 then you'll notice here the difference. 60 00:03:07,280 --> 00:03:08,880 Watch what happens. 61 00:03:08,880 --> 00:03:12,080 With the Scan we're able to basically say 62 00:03:12,080 --> 00:03:15,865 well I want a filter on just about any attribute. 63 00:03:15,865 --> 00:03:18,010 And you can also combine that. 64 00:03:18,010 --> 00:03:20,700 Let's just try this one more time before we move on. 65 00:03:20,700 --> 00:03:22,280 We can also say that 66 00:03:23,740 --> 00:03:28,740 brand is a string that equals or begins with 67 00:03:28,940 --> 00:03:31,373 or let's just say contains mountain, right? 68 00:03:33,080 --> 00:03:34,270 And then if we start that search 69 00:03:34,270 --> 00:03:37,600 then now we retrieve that one record. 70 00:03:37,600 --> 00:03:39,320 So again that's a full table Scan 71 00:03:39,320 --> 00:03:42,340 and so notice the difference here 72 00:03:42,340 --> 00:03:44,990 that when we go from Scan to Query. 73 00:03:44,990 --> 00:03:49,790 Now it's requiring us to enter the partition key, 74 00:03:49,790 --> 00:03:53,850 or the primary key, and it's saying 75 00:03:53,850 --> 00:03:57,790 you know you can only do a Query on an index 76 00:03:57,790 --> 00:04:01,250 and in this particular case the ID is the only index 77 00:04:01,250 --> 00:04:04,390 the primary key being the only index. 78 00:04:04,390 --> 00:04:08,860 And so you'll also notice that we don't get any choice 79 00:04:08,860 --> 00:04:11,460 other than an equals. 80 00:04:11,460 --> 00:04:15,270 So the scan operations provide us with the ability 81 00:04:15,270 --> 00:04:19,613 to do a number of different kinds of comparison operators, 82 00:04:21,470 --> 00:04:25,320 equals, not equals, less than, greater than, between, 83 00:04:25,320 --> 00:04:26,940 all those kinds of things. 84 00:04:26,940 --> 00:04:28,870 But you don't get that kind of flexibility 85 00:04:28,870 --> 00:04:32,340 with the partition key or the primary key. 86 00:04:32,340 --> 00:04:35,490 So here, if we were to remove these, and the we could say 87 00:04:35,490 --> 00:04:39,430 well why it wants a Query based on the ID 88 00:04:39,430 --> 00:04:43,170 so show me 203, start the search 89 00:04:43,170 --> 00:04:45,520 and just like that is comes back, 90 00:04:45,520 --> 00:04:50,520 that kind of operation where you are basing it on an index 91 00:04:50,700 --> 00:04:53,300 is going to be much much faster. 92 00:04:53,300 --> 00:04:56,310 So that's the general recommended best practices 93 00:04:56,310 --> 00:04:59,810 that we try our best to find a primary key 94 00:04:59,810 --> 00:05:02,400 that allows us to do that kind of Query. 95 00:05:02,400 --> 00:05:05,420 And the other thing to keep in mind with DynamoDB 96 00:05:05,420 --> 00:05:08,040 just based on the distributed nature of it, 97 00:05:08,040 --> 00:05:10,990 we talk a lot more about this in other videos 98 00:05:10,990 --> 00:05:13,560 like the Solutions Architect video, 99 00:05:13,560 --> 00:05:17,253 you know you really need to give some thought to this ID, 100 00:05:18,460 --> 00:05:20,989 the kind of value that you're using there. 101 00:05:20,989 --> 00:05:25,210 If one of these products were to become much more popular, 102 00:05:25,210 --> 00:05:28,870 let's say that this particular product, 103 00:05:28,870 --> 00:05:30,970 ID 203, became so popular that 104 00:05:30,970 --> 00:05:32,960 millions and millions of people were looking at it, 105 00:05:32,960 --> 00:05:35,890 then the fact that this one ID 106 00:05:35,890 --> 00:05:38,460 is getting so much more traffic than the other IDs 107 00:05:38,460 --> 00:05:42,180 would mean that we would end up with a hot key 108 00:05:42,180 --> 00:05:45,510 where one key is getting a lot of traffic 109 00:05:45,510 --> 00:05:48,770 and as a result we would end up with a hot partition. 110 00:05:48,770 --> 00:05:51,820 And we could see some performance degradation 111 00:05:51,820 --> 00:05:53,500 with that kind of thing. 112 00:05:53,500 --> 00:05:58,500 And so ideally we have IDs that allow the demand, 113 00:05:59,450 --> 00:06:02,500 whatever that is, the reads and the rights 114 00:06:02,500 --> 00:06:04,980 to be evenly spread across 115 00:06:06,180 --> 00:06:10,270 the cluster that is powering this DynamoDB table. 116 00:06:10,270 --> 00:06:13,190 Okay let's go back to our scan 117 00:06:13,190 --> 00:06:16,340 and here we're not, we don't have any filters. 118 00:06:16,340 --> 00:06:17,270 Let's go ahead and do that 119 00:06:17,270 --> 00:06:20,320 and then without any filters it just gives us everything. 120 00:06:20,320 --> 00:06:23,633 So now we can see essentially all the records, 121 00:06:24,590 --> 00:06:26,830 they're in the table. 122 00:06:26,830 --> 00:06:31,310 It will paginate and so we can create indexes 123 00:06:31,310 --> 00:06:34,020 after the fact, global indexes after the fact. 124 00:06:34,020 --> 00:06:36,540 There is a local secondary index 125 00:06:36,540 --> 00:06:38,734 and a global secondary indexes. 126 00:06:38,734 --> 00:06:41,770 Local secondary indexes can only be created 127 00:06:41,770 --> 00:06:44,040 at the time you create the table. 128 00:06:44,040 --> 00:06:47,410 Global secondary indexes can be created at any time. 129 00:06:47,410 --> 00:06:49,510 So let's go back to the items real quick 130 00:06:49,510 --> 00:06:53,720 and I'll show you if we wanted to Query this table, 131 00:06:53,720 --> 00:06:55,610 and if we find that we were regularly 132 00:06:55,610 --> 00:06:58,330 querying the table based on the title, 133 00:06:58,330 --> 00:07:00,920 then we would want to create an index for that. 134 00:07:00,920 --> 00:07:03,850 We don't want to rely on the scan operation 135 00:07:03,850 --> 00:07:08,850 to return fast results because once this is big enough, 136 00:07:08,870 --> 00:07:10,110 it just won't. 137 00:07:10,110 --> 00:07:13,870 So in this particular case we could create 138 00:07:13,870 --> 00:07:17,630 a global secondary index for the title attribute. 139 00:07:17,630 --> 00:07:19,250 So let's see if we can do that real quick. 140 00:07:19,250 --> 00:07:22,801 Let's create an index and in this particular case 141 00:07:22,801 --> 00:07:25,210 let's just verify that title. 142 00:07:25,210 --> 00:07:27,190 Yep so it's just title, right. 143 00:07:27,190 --> 00:07:30,580 So Indexes, Create, Title. 144 00:07:30,580 --> 00:07:32,150 And it is a string. 145 00:07:32,150 --> 00:07:35,400 And in terms of projected attributes 146 00:07:35,400 --> 00:07:36,620 we can go ahead and say all. 147 00:07:36,620 --> 00:07:38,153 That way we don't, 148 00:07:38,153 --> 00:07:41,940 if we only projected the keys, 149 00:07:41,940 --> 00:07:45,980 then we would have to pull the key from the 150 00:07:45,980 --> 00:07:48,070 from the index and then use the key 151 00:07:48,070 --> 00:07:49,980 to pull the data from the table. 152 00:07:49,980 --> 00:07:53,480 But if we project all of the attributes into the index, 153 00:07:53,480 --> 00:07:56,720 then we can just get everything from the index, 154 00:07:56,720 --> 00:07:59,420 rather than having to go and join it to the 155 00:07:59,420 --> 00:08:02,480 you know to the table in our application. 156 00:08:02,480 --> 00:08:06,280 And we do have the ability to change the capacity units. 157 00:08:06,280 --> 00:08:10,220 For global index the 158 00:08:10,220 --> 00:08:13,290 the throughput capacity that we provisioned 159 00:08:13,290 --> 00:08:15,070 is separate from the table. 160 00:08:15,070 --> 00:08:16,650 So here I might say well you know 161 00:08:16,650 --> 00:08:18,880 let's just go with two and two for now 162 00:08:18,880 --> 00:08:20,620 for the sake of this demo. 163 00:08:20,620 --> 00:08:23,220 And then we'll go ahead and say Create the Index. 164 00:08:23,220 --> 00:08:25,610 And it will take a few minutes for that to create 165 00:08:25,610 --> 00:08:29,180 so we'll go ahead and give that just a few minutes 166 00:08:29,180 --> 00:08:32,160 and we'll fast forward to that creation. 167 00:08:32,160 --> 00:08:33,330 All right and there we go. 168 00:08:33,330 --> 00:08:36,960 Now our global secondary index 169 00:08:36,960 --> 00:08:39,740 for the title is now active. 170 00:08:39,740 --> 00:08:40,940 That did take a few minutes. 171 00:08:40,940 --> 00:08:42,660 It took a little bit longer 172 00:08:42,660 --> 00:08:45,260 than it took to create the table. 173 00:08:45,260 --> 00:08:49,320 And again you can see here it has one partition key 174 00:08:49,320 --> 00:08:50,700 so you can think of, 175 00:08:50,700 --> 00:08:52,380 for those of us who might be you know 176 00:08:52,380 --> 00:08:55,380 coming from a background in relational databases, 177 00:08:55,380 --> 00:08:58,130 you can think of a global secondary index 178 00:08:58,130 --> 00:09:00,400 as a, sort of a, 179 00:09:01,890 --> 00:09:04,260 It's very similar to a materialized view. 180 00:09:04,260 --> 00:09:07,090 So as changes are made to the table, 181 00:09:07,090 --> 00:09:09,300 those changes are then reflected 182 00:09:09,300 --> 00:09:12,280 in the global secondary index. 183 00:09:12,280 --> 00:09:16,800 And so if we were to come back here to our items, 184 00:09:16,800 --> 00:09:18,300 and we refreshed this, 185 00:09:18,300 --> 00:09:20,743 we don't really get the console, 186 00:09:21,730 --> 00:09:23,090 let's just try to refresh this. 187 00:09:23,090 --> 00:09:25,680 I don't think the console gives us the ability 188 00:09:27,530 --> 00:09:31,300 to query based on, 189 00:09:31,300 --> 00:09:33,580 yeah there we go so it does. 190 00:09:33,580 --> 00:09:35,960 Now that we have our index active 191 00:09:35,960 --> 00:09:37,890 and by refreshing the page here, 192 00:09:37,890 --> 00:09:41,260 now we have the ability to 193 00:09:41,260 --> 00:09:44,070 query based on the title index. 194 00:09:44,070 --> 00:09:46,700 Right so now if we do a Query, 195 00:09:46,700 --> 00:09:50,670 now it's saying title as a string equals 196 00:09:50,670 --> 00:09:53,110 and then of course we have to know the title of the book. 197 00:09:53,110 --> 00:09:55,920 So if we wanted to get information based on, 198 00:09:55,920 --> 00:09:58,600 you know the exact title of a book, 199 00:09:58,600 --> 00:10:03,600 such as Book 101 Title, and we run that search, 200 00:10:03,990 --> 00:10:08,157 then very quickly, in single digit milliseconds, 201 00:10:08,157 --> 00:10:10,540 we could get that result back. 202 00:10:10,540 --> 00:10:14,880 So that's an example of you know using Queries 203 00:10:14,880 --> 00:10:19,210 again Queries can leverage some type of an index 204 00:10:19,210 --> 00:10:23,930 so the, again, the primary key is an index 205 00:10:23,930 --> 00:10:26,560 and then we created a secondary index 206 00:10:26,560 --> 00:10:28,940 on the title attribute. 207 00:10:28,940 --> 00:10:32,210 And so by using Query we can 208 00:10:33,310 --> 00:10:36,990 get very fast, single digit millisecond response 209 00:10:36,990 --> 00:10:39,760 otherwise, maybe there's a time and a place 210 00:10:39,760 --> 00:10:41,610 when you need to, but just remember 211 00:10:41,610 --> 00:10:45,644 that scan operations are full table scans 212 00:10:45,644 --> 00:10:48,900 and you know when you have a lot of items 213 00:10:48,900 --> 00:10:52,503 it will add latency to those results.