- Now let's review an introduction to Amazon Redshift. With Amazon Redshift we gain a fully managed, petabyte-scale data warehouse: upwards of two petabytes can run in a single cluster within Amazon Redshift. And of course, Redshift is based on Postgres 8.0.2. AWS took a fork of Postgres 8.0.2 and made their own modifications in order to support certain types of technologies and allow us to run complex ad hoc queries against those petabyte-sized data sets.

Now, because it is a fork of Postgres 8.0.2, it is SQL compliant; it supports the data types and the SQL language that were supported by Postgres 8.0.2. And because of that, you can connect to a Redshift cluster using existing business intelligence tools that support JDBC and ODBC drivers.

Now, one of the things that make Redshift really performant is that AWS took that fork of Postgres 8.0.2 and added columnar storage. Columnar storage, as opposed to row-based storage, makes much more efficient use of the actual underlying storage, as well as more efficient use of input/output operations.
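Because Redshift speaks the Postgres wire protocol, any Postgres-compatible driver can reach it, not just BI tools with JDBC/ODBC drivers. Here is a minimal sketch using Python's psycopg2; the cluster endpoint, database name, and credentials below are placeholders, not a real cluster:

```python
# Build a libpq-style connection string for a (hypothetical) Redshift cluster.
# Redshift listens on port 5439 by default, whereas Postgres itself uses 5432.

def redshift_dsn(host, dbname, user, password, port=5439):
    """Return a Postgres/libpq connection string for the given cluster."""
    return (
        f"host={host} port={port} dbname={dbname} "
        f"user={user} password={password} sslmode=require"
    )

dsn = redshift_dsn(
    host="examplecluster.abc123.us-east-1.redshift.amazonaws.com",
    dbname="dev",
    user="awsuser",
    password="example-password",
)

# With a live cluster, a standard Postgres driver would connect like this:
# import psycopg2
# conn = psycopg2.connect(dsn)
# cur = conn.cursor()
# cur.execute("SELECT current_database();")
```

The connect call itself is left commented out, since it needs a real cluster endpoint and credentials to succeed.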
So we can retrieve a similar amount of data from columnar storage with fewer I/O operations than with row-based storage, and we can make more efficient use of memory by using columnar storage as well.

And so, along with columnar storage, Redshift also added parallel queries. With Redshift, we can create clusters of nodes so that our queries are being run in parallel across all of those nodes. That is one of the ways we can achieve very fast results across very large data sets: no one server is running the query; it's actually being run in parallel across those many nodes, each using columnar storage.

And so with Amazon Redshift, keep in mind that Redshift is not meant for online transactional processing; it's ideal for online analytical processing. So for workloads such as eCommerce, for example, where you do have transactional processing happening, for the transactional processing you may want to use something like RDS or Aurora or even DynamoDB.
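The two ideas above can be sketched in a few lines of plain Python. This is a toy illustration, not how Redshift is implemented: the "bytes touched" arithmetic shows why scanning one column is cheaper under a columnar layout, and the thread pool stands in for compute nodes each aggregating their slice before a leader combines the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy table: 1,000 rows, three columns, 8 bytes per value.
ROWS, COLS, WIDTH = 1000, 3, 8

# Row-based storage: scanning one column still reads every full row.
row_bytes = ROWS * COLS * WIDTH      # bytes touched for a one-column scan

# Columnar storage: the scan reads only that column's blocks.
col_bytes = ROWS * 1 * WIDTH         # three times fewer bytes here

# Parallel query sketch: split the column across 4 hypothetical nodes,
# each node computes a partial sum, and the leader combines the results.
data = list(range(ROWS))
partitions = [data[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, partitions))

total = sum(partial_sums)            # leader-node combine step
```

The scatter-gather shape — partition, aggregate locally, combine centrally — is the same pattern that lets no single server bear the whole query.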
But then later, when you want to run analytics on that data, you might look to Redshift, because it's ideal for online analytical processing, as opposed to transactional processing. So again, if you have a very large data set, or perhaps smaller data sets that are spread out across numerous databases, say in NoSQL, relational, and graph databases, and you want to pull all of that into one place so that you can run complex ad hoc queries across vast data sets, take a look at Amazon Redshift.
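To make "complex ad hoc queries" concrete, here is the shape of a typical OLAP query: an aggregate with grouping and ordering over a fact table. The sketch uses Python's built-in sqlite3 only so it runs locally; against Redshift you would send the same style of SQL (its dialect derives from Postgres 8.0.2) through a JDBC/ODBC or Postgres driver. The `orders` table and its columns are made up for illustration:

```python
import sqlite3

# In-memory stand-in for a warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('us-east', 120.0), ('us-east', 80.0),
        ('eu-west', 200.0), ('eu-west', 50.0);
""")

# An ad hoc analytical query: revenue and order count per region.
rows = conn.execute("""
    SELECT region, SUM(amount) AS revenue, COUNT(*) AS order_count
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
""").fetchall()
```

In a real warehouse the same query would scan billions of rows, which is exactly where columnar storage and parallel execution across nodes pay off.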