- Now let's talk about AWS Data Pipeline. AWS Data Pipeline is another very useful service that helps us move data from one service to another. Perhaps we have data in S3 that we want to get into Redshift, or we have data in DynamoDB that we want to copy into RDS, but they're different data types. They're stored in different ways, so we need to transform that data from one data type to another, and Data Pipeline can do that as well. So again, it helps us move data between various services, including those outside of Amazon Web Services, such as on-premises systems. Maybe we have data stored in some type of local database. Perhaps it's Oracle or an on-premises SQL Server, and we want to run queries on that data and then store the results somewhere else within AWS. So we can use Data Pipeline to execute SQL queries and then take the result and store it somewhere. We can use Data Pipeline to execute custom applications. So perhaps we have a custom application running on EC2 that pulls data from somewhere or generates some kind of data from an external API.
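To make the "execute SQL queries and store the result" idea concrete, here is a minimal sketch of the pipeline objects that boto3's `put_pipeline_definition` call accepts for a `SqlActivity`. The identifiers and the query text (`RunQuery`, `SourceDatabase`, `OutputBucket`, the `SELECT` statement) are hypothetical placeholders, and a real definition would also need the database connection and output data node objects defined.

```python
# Minimal sketch: pipeline objects for a SqlActivity, in the list-of-dicts
# format that boto3's datapipeline put_pipeline_definition call accepts.
# All identifiers below (RunQuery, SourceDatabase, OutputBucket) are
# hypothetical placeholders for this example.

def sql_activity_objects():
    """Build pipeline objects that run a SQL query and stage the result."""
    return [
        {
            "id": "RunQuery",
            "name": "RunQuery",
            "fields": [
                {"key": "type", "stringValue": "SqlActivity"},
                # refValue points at another pipeline object by its id
                {"key": "database", "refValue": "SourceDatabase"},
                {"key": "script", "stringValue": "SELECT id, total FROM orders"},
                # Send the query result to an output data node (defined elsewhere)
                {"key": "output", "refValue": "OutputBucket"},
            ],
        },
    ]

# The definition would then be registered with something like:
# import boto3
# client = boto3.client("datapipeline")
# client.put_pipeline_definition(
#     pipelineId="df-EXAMPLE", pipelineObjects=sql_activity_objects()
# )
```

The key pattern is that every pipeline object is just an id, a name, and a list of key/value fields, where `refValue` fields point at other objects in the same definition.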
The application does whatever processing it does, and then Data Pipeline can facilitate moving that data from our application to, perhaps, Redshift or back to DynamoDB. We also get the ability to schedule these operations. So perhaps we want to move data every day from a DynamoDB daily table to Redshift, and then perform daily aggregates and the like. It does have built-in error handling, and of course the execution logs of whatever happened will be stored in S3, so that if something went wrong we can go back, look, and do our troubleshooting from there, or we can just look at those logs and confirm that yes, indeed, our data was pulled from DynamoDB and, yes, it was written into Redshift. So you can see that AWS Data Pipeline is yet another very powerful, very flexible tool that can help us move data from one place to another and transform that data from one data type to another. It's a very useful service, and that is AWS Data Pipeline.
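The scheduled daily copy described above can be sketched in the same pipeline-object format: a `Schedule` that fires once a day, DynamoDB and Redshift data nodes, a `RedshiftCopyActivity` connecting them, and a `pipelineLogUri` so execution logs land in S3. This is a sketch under assumptions — the table, bucket, and schedule values are hypothetical, and a real definition needs additional objects and fields (for example, the Redshift database connection and the copy's insert mode).

```python
# Sketch of a daily DynamoDB -> Redshift copy as Data Pipeline objects,
# in the format accepted by boto3's put_pipeline_definition. Table, bucket,
# and schedule values are hypothetical placeholders.

def daily_copy_objects(start="2024-01-01T00:00:00"):
    """Build a Schedule, two data nodes, and a copy activity between them."""
    return [
        {
            "id": "Default",
            "name": "Default",
            "fields": [
                {"key": "scheduleType", "stringValue": "cron"},
                {"key": "schedule", "refValue": "DailySchedule"},
                # Execution logs are written under this S3 prefix
                {"key": "pipelineLogUri", "stringValue": "s3://my-logs/datapipeline/"},
            ],
        },
        {
            "id": "DailySchedule",
            "name": "DailySchedule",
            "fields": [
                {"key": "type", "stringValue": "Schedule"},
                {"key": "period", "stringValue": "1 day"},
                {"key": "startDateTime", "stringValue": start},
            ],
        },
        {
            "id": "SourceTable",
            "name": "SourceTable",
            "fields": [
                {"key": "type", "stringValue": "DynamoDBDataNode"},
                {"key": "tableName", "stringValue": "daily-events"},
            ],
        },
        {
            "id": "TargetTable",
            "name": "TargetTable",
            "fields": [
                {"key": "type", "stringValue": "RedshiftDataNode"},
                {"key": "tableName", "stringValue": "daily_events"},
                # The Redshift connection object would be defined separately
                {"key": "database", "refValue": "RedshiftDatabase"},
            ],
        },
        {
            "id": "CopyToRedshift",
            "name": "CopyToRedshift",
            "fields": [
                {"key": "type", "stringValue": "RedshiftCopyActivity"},
                {"key": "input", "refValue": "SourceTable"},
                {"key": "output", "refValue": "TargetTable"},
                # Built-in error handling: retry a few times before failing
                {"key": "maximumRetries", "stringValue": "2"},
            ],
        },
    ]
```

Note how the error handling and logging the transcript mentions show up directly in the definition: `maximumRetries` on the activity, and `pipelineLogUri` on the `Default` object pointing at the S3 bucket where run logs accumulate.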