- Now let's review an introduction to Amazon Redshift. With Amazon Redshift we gain a fully managed, petabyte-scale data warehouse: upwards of two petabytes can run in a single cluster within Amazon Redshift. And of course, Redshift is based on Postgres 8.0.2. AWS took a fork of Postgres 8.0.2 and made their own modifications in order to support certain types of technologies and allow us to run complex ad hoc queries against those petabyte-sized data sets.

Now, because it is a fork of Postgres 8.0.2, it is SQL compliant; it supports the data types and the SQL language that were supported by Postgres 8.0.2. And because of that, you can connect to a Redshift cluster using existing business intelligence tools that support JDBC and ODBC drivers.

Now, one of the things that make Redshift really performant is that AWS took that fork of Postgres 8.0.2 and added columnar storage. Columnar storage, as opposed to row-based storage, makes much more efficient use of the actual underlying storage, as well as more efficient use of input/output operations.
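Because Redshift speaks the Postgres wire protocol, any Postgres-compatible driver can reach it, not just BI tools with JDBC/ODBC drivers. Here is a minimal sketch using Python's psycopg2; the cluster endpoint, database name, and credentials below are placeholders, not a real cluster:

```python
# Build a libpq-style connection string for a (hypothetical) Redshift cluster.
# Redshift listens on port 5439 by default, whereas Postgres itself uses 5432.

def redshift_dsn(host, dbname, user, password, port=5439):
    """Return a Postgres/libpq connection string for the given cluster."""
    return (
        f"host={host} port={port} dbname={dbname} "
        f"user={user} password={password} sslmode=require"
    )

dsn = redshift_dsn(
    host="examplecluster.abc123.us-east-1.redshift.amazonaws.com",
    dbname="dev",
    user="awsuser",
    password="example-password",
)

# With a live cluster, a standard Postgres driver would connect like this:
# import psycopg2
# conn = psycopg2.connect(dsn)
# cur = conn.cursor()
# cur.execute("SELECT current_database();")
```

The connect call itself is left commented out, since it needs a real cluster endpoint and credentials to succeed.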
So we can retrieve a similar amount of data from columnar storage with fewer I/O operations than with row-based storage, and we can make more efficient use of memory by using columnar storage as well.

And so, along with columnar storage, Redshift also added parallel queries. With Redshift, we can create clusters of nodes so that our queries are being run in parallel across all of those nodes. That is one of the ways we can achieve very fast results across very large data sets: no one server is running the query; it's actually being run in parallel across those many nodes, each using columnar storage.

And so with Amazon Redshift, keep in mind that Redshift is not meant for online transactional processing; it's ideal for online analytical processing. So for workloads such as eCommerce, for example, where you do have transactional processing happening, for the transactional processing you may want to use something like RDS or Aurora or even DynamoDB.
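The two ideas above can be sketched in a few lines of plain Python. This is a toy illustration, not how Redshift is implemented: the "bytes touched" arithmetic shows why scanning one column is cheaper under a columnar layout, and the thread pool stands in for compute nodes each aggregating their slice before a leader combines the partial results:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy table: 1,000 rows, three columns, 8 bytes per value.
ROWS, COLS, WIDTH = 1000, 3, 8

# Row-based storage: scanning one column still reads every full row.
row_bytes = ROWS * COLS * WIDTH      # bytes touched for a one-column scan

# Columnar storage: the scan reads only that column's blocks.
col_bytes = ROWS * 1 * WIDTH         # three times fewer bytes here

# Parallel query sketch: split the column across 4 hypothetical nodes,
# each node computes a partial sum, and the leader combines the results.
data = list(range(ROWS))
partitions = [data[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(sum, partitions))

total = sum(partial_sums)            # leader-node combine step
```

The scatter-gather shape — partition, aggregate locally, combine centrally — is the same pattern that lets no single server bear the whole query.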
But then later, when you want to run analytics on that data, you might look to Redshift, because it's ideal for online analytical processing, as opposed to transactional processing. So again, if you have a very large data set, or perhaps smaller data sets that are spread out across numerous databases, say in NoSQL, relational, and graph databases, and you want to pull all of that into one place so that you can run complex ad hoc queries across vast data sets, take a look at Amazon Redshift.
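To make "complex ad hoc queries" concrete, here is the shape of a typical OLAP query: an aggregate with grouping and ordering over a fact table. The sketch uses Python's built-in sqlite3 only so it runs locally; against Redshift you would send the same style of SQL (its dialect derives from Postgres 8.0.2) through a JDBC/ODBC or Postgres driver. The `orders` table and its columns are made up for illustration:

```python
import sqlite3

# In-memory stand-in for a warehouse fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('us-east', 120.0), ('us-east', 80.0),
        ('eu-west', 200.0), ('eu-west', 50.0);
""")

# An ad hoc analytical query: revenue and order count per region.
rows = conn.execute("""
    SELECT region, SUM(amount) AS revenue, COUNT(*) AS order_count
    FROM orders
    GROUP BY region
    ORDER BY revenue DESC
""").fetchall()
```

In a real warehouse the same query would scan billions of rows, which is exactly where columnar storage and parallel execution across nodes pay off.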