1
00:00:06,860 --> 00:00:10,540
- Welcome. In this demonstration we are going

2
00:00:10,540 --> 00:00:14,310
to create an Athena table that will allow us

3
00:00:14,310 --> 00:00:19,310
to query CloudTrail logs that are located in an S3 bucket.

4
00:00:20,710 --> 00:00:23,680
There are a number of different pieces

5
00:00:23,680 --> 00:00:27,710
that have to be in place for this to work properly.

6
00:00:27,710 --> 00:00:30,130
The first of these is that we are going to

7
00:00:30,130 --> 00:00:35,130
need an S3 bucket created for our query results.

8
00:00:35,370 --> 00:00:37,760
And so we're gonna go ahead and do that first.

9
00:00:37,760 --> 00:00:40,310
We're gonna go to the S3 service

10
00:00:40,310 --> 00:00:43,003
and we're gonna go ahead and create a bucket.

11
00:00:43,860 --> 00:00:46,470
And we are going to make sure that the bucket

12
00:00:46,470 --> 00:00:49,690
is in the same region as the CloudTrail logs,

13
00:00:49,690 --> 00:00:51,453
which are in US East 1.

14
00:00:52,350 --> 00:00:54,480
And we're gonna pick a bucket name,

15
00:00:54,480 --> 00:00:59,480
and I will call this "Live Lessons Athena Query Results,"

16
00:01:01,460 --> 00:01:04,250
and we're gonna accept most of the defaults.

17
00:01:04,250 --> 00:01:07,330
We're gonna leave object ownership at the defaults,

18
00:01:07,330 --> 00:01:09,480
block public access at the defaults,

19
00:01:09,480 --> 00:01:11,630
bucket versioning off.

20
00:01:11,630 --> 00:01:13,240
We'll add a tag to the bucket.

21
00:01:13,240 --> 00:01:14,973
We'll give it a cost center tag.

22
00:01:17,090 --> 00:01:18,910
Default encryption we'll enable

23
00:01:18,910 --> 00:01:22,470
and just allow Amazon managed keys,

24
00:01:22,470 --> 00:01:23,870
and we'll create the bucket.

25
00:01:26,380 --> 00:01:31,380
So now we have a destination for our query results.
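The bucket setup described above could also be scripted. The following is a minimal sketch in Python, not the presenter's method: it only builds the request arguments, and the actual boto3 calls are shown in comments. The bucket name `livelessons-athena-query-results` is an assumption, since S3 bucket names cannot contain spaces or capital letters and the exact name typed in the demo is not shown in the transcript.

```python
import re

# Hypothetical bucket name: a guess at how the spoken name
# "Live Lessons Athena Query Results" was typed into the console.
RESULTS_BUCKET = "livelessons-athena-query-results"


def is_plausible_bucket_name(name: str) -> bool:
    """Rough check of the basic S3 naming rules: 3-63 characters, lowercase
    letters, digits, dots and hyphens, starting and ending with a letter
    or digit. This does not cover every rule S3 enforces."""
    return re.fullmatch(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]", name) is not None


def create_bucket_kwargs(name: str, region: str) -> dict:
    """Build keyword arguments for an S3 CreateBucket request.

    us-east-1 is the default region and must not be passed as a
    LocationConstraint; any other region has to be passed explicitly.
    """
    kwargs = {"Bucket": name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def default_encryption_config() -> dict:
    """Default-encryption settings using Amazon S3 managed keys (SSE-S3)."""
    return {
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    }


if __name__ == "__main__":
    print(is_plausible_bucket_name(RESULTS_BUCKET))
    print(create_bucket_kwargs(RESULTS_BUCKET, "us-east-1"))
    # With boto3 installed and credentials configured, these could be used as:
    #   s3 = boto3.client("s3", region_name="us-east-1")
    #   s3.create_bucket(**create_bucket_kwargs(RESULTS_BUCKET, "us-east-1"))
    #   s3.put_bucket_encryption(
    #       Bucket=RESULTS_BUCKET,
    #       ServerSideEncryptionConfiguration=default_encryption_config(),
    #   )
```

Keeping the region as an explicit argument makes it harder to create the results bucket in a different region from the CloudTrail logs by accident.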
26
00:01:33,300 --> 00:01:37,490
Now we need to be able to create an Athena table

27
00:01:37,490 --> 00:01:40,530
that will be placed in the data catalog.

28
00:01:40,530 --> 00:01:45,530
We could go to the Athena service and do this directly.

29
00:01:46,060 --> 00:01:48,890
We could go to explore the query editor

30
00:01:48,890 --> 00:01:51,720
and we could scroll down here on the left-hand side,

31
00:01:51,720 --> 00:01:52,740
under data catalog,

32
00:01:52,740 --> 00:01:56,560
and realize that we don't have any databases.

33
00:01:56,560 --> 00:01:58,640
We don't have any tables.

34
00:01:58,640 --> 00:02:00,890
And so we could click here on create a table

35
00:02:00,890 --> 00:02:02,540
from a data source,

36
00:02:02,540 --> 00:02:04,770
but then it would require us

37
00:02:04,770 --> 00:02:09,270
to define all of the information that was part

38
00:02:09,270 --> 00:02:12,190
of that data set, including the data format.

39
00:02:12,190 --> 00:02:16,420
But there's an easier way to do this for CloudTrail logs.

40
00:02:16,420 --> 00:02:20,163
So we can cancel out of that entirely,

41
00:02:21,360 --> 00:02:24,310
go back to our Athena query editor main page,

42
00:02:24,310 --> 00:02:26,640
and leave that there.

43
00:02:26,640 --> 00:02:30,140
We are now going to skip over to the CloudTrail dashboard.

44
00:02:30,140 --> 00:02:32,593
Now, assuming that this has been enabled,

45
00:02:33,470 --> 00:02:35,560
you can check in your own account

46
00:02:35,560 --> 00:02:37,370
by going to the dashboard

47
00:02:37,370 --> 00:02:40,450
and seeing if you have a trail created.

48
00:02:40,450 --> 00:02:43,653
I have one called CloudTrail Write Demo.

49
00:02:44,490 --> 00:02:46,390
Now, assuming that you do have one,

50
00:02:46,390 --> 00:02:48,040
and if not you can enable that

51
00:02:48,040 --> 00:02:51,660
by creating an account-wide trail pretty easily.

52
00:02:51,660 --> 00:02:54,410
We're actually gonna talk about this in a later lesson.
53
00:02:55,520 --> 00:02:57,970
We can then go to event history,

54
00:02:57,970 --> 00:02:59,930
and in the event dashboard

55
00:02:59,930 --> 00:03:03,770
we have the ability to do some basic searching here,

56
00:03:03,770 --> 00:03:05,860
but it's not very sophisticated.

57
00:03:05,860 --> 00:03:09,860
And if you notice at the top right, look at that.

58
00:03:09,860 --> 00:03:12,120
Create Athena table.

59
00:03:12,120 --> 00:03:13,800
This is a great example

60
00:03:13,800 --> 00:03:18,320
of how AWS is supposed to be used as an ecosystem

61
00:03:18,320 --> 00:03:21,953
rather than just a bunch of independent parts.

62
00:03:22,860 --> 00:03:24,990
So when we click here,

63
00:03:24,990 --> 00:03:27,570
it's gonna ask us for the storage location,

64
00:03:27,570 --> 00:03:32,570
the S3 bucket that contains the actual CloudTrail files.

65
00:03:33,170 --> 00:03:37,130
So that's easy, right there in the dropdown.

66
00:03:37,130 --> 00:03:41,260
And then it's going to fill out all of that information

67
00:03:41,260 --> 00:03:44,690
for creating the table on our behalf.

68
00:03:44,690 --> 00:03:46,740
So it's gonna create the external table.

69
00:03:46,740 --> 00:03:48,940
We have all of the different fields

70
00:03:48,940 --> 00:03:50,860
that have been listed in there.

71
00:03:50,860 --> 00:03:54,703
And now all we have to do is click Create Table.

72
00:03:56,820 --> 00:03:59,540
And now that that's done,

73
00:03:59,540 --> 00:04:04,270
we can go back to the Athena dashboard.

74
00:04:04,270 --> 00:04:06,963
We'll do a quick refresh.

75
00:04:08,190 --> 00:04:13,190
And from here, we can then scroll down a little bit further.

76
00:04:13,320 --> 00:04:16,660
We see that we have a default database

77
00:04:16,660 --> 00:04:18,473
that did not exist before.
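To give a sense of what the console fills in, here is a heavily abbreviated sketch of a CloudTrail table definition, wrapped in a small Python helper. It is an illustration only: the statement the console generates lists many more columns and a fuller `useridentity` struct, and the SerDe and input format class names below follow the CloudTrail table definition from the Athena documentation, so they should be checked against what the console actually shows. The table name and bucket in the usage lines are hypothetical.

```python
def cloudtrail_table_ddl(table: str, location: str) -> str:
    """Return an abbreviated CREATE EXTERNAL TABLE statement for CloudTrail
    logs. Illustrative only; not a replacement for the generated statement."""
    if not (location.startswith("s3://") and location.endswith("/")):
        raise ValueError("location must look like s3://bucket/prefix/")

    # A small subset of the CloudTrail fields, enough for the queries
    # used later in the demonstration.
    columns = ",\n  ".join(
        [
            "eventversion STRING",
            "useridentity STRUCT<type:STRING, principalid:STRING, "
            "arn:STRING, accountid:STRING, username:STRING>",
            "eventtime STRING",
            "eventsource STRING",
            "eventname STRING",
            "awsregion STRING",
            "sourceipaddress STRING",
            "errorcode STRING",
            "errormessage STRING",
            "additionaleventdata STRING",
        ]
    )
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {columns}\n)\n"
        "ROW FORMAT SERDE 'com.amazon.emr.hive.serde.CloudTrailSerde'\n"
        "STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'\n"
        "OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{location}'"
    )


if __name__ == "__main__":
    # Hypothetical table name and bucket, for illustration.
    print(cloudtrail_table_ddl("cloudtrail_logs_example",
                               "s3://example-cloudtrail-bucket/"))
```

The `LOCATION` value decides how much data every query has to read: pointing it at the bucket root, as in the demo, means all log files in the bucket are scanned.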
78
00:04:19,500 --> 00:04:21,630
We can go down a little bit further

79
00:04:21,630 --> 00:04:24,040
and see we have tables and views,

80
00:04:24,040 --> 00:04:28,340
and under tables, we have one right here

81
00:04:28,340 --> 00:04:30,450
called cloudtrail_logs_

82
00:04:30,450 --> 00:04:32,050
brightkey_cloudtrail_write_demo.

83
00:04:33,000 --> 00:04:36,400
We have all of the information there in the table

84
00:04:36,400 --> 00:04:41,400
and the schema so that we can then perform an actual query.

85
00:04:43,850 --> 00:04:48,850
But first, it says here, before you run your first query,

86
00:04:49,010 --> 00:04:52,270
you need to set up a query result location in S3.

87
00:04:52,270 --> 00:04:53,780
We created the bucket.

88
00:04:53,780 --> 00:04:55,773
Now we need to change our settings.

89
00:04:56,640 --> 00:04:58,100
So we can go ahead and click here.

90
00:04:58,100 --> 00:05:01,533
We can see that there's no query result location.

91
00:05:02,510 --> 00:05:04,750
And so we click on Manage,

92
00:05:04,750 --> 00:05:09,130
and we can browse S3 to find,

93
00:05:09,130 --> 00:05:11,110
let's look for Athena.

94
00:05:11,110 --> 00:05:12,173
There we go.

95
00:05:13,370 --> 00:05:16,483
Live Lessons Athena Query Results.

96
00:05:17,850 --> 00:05:20,310
So we can then choose that.

97
00:05:20,310 --> 00:05:22,480
We already know that it's the same account

98
00:05:22,480 --> 00:05:25,110
so we don't need to make any changes there.

99
00:05:25,110 --> 00:05:30,110
We can enable query result encryption, if we like,

100
00:05:30,250 --> 00:05:32,160
as well as bucket owner full control.

101
00:05:32,160 --> 00:05:34,363
But since it's all in the same account,

102
00:05:35,240 --> 00:05:36,923
we can just go ahead and save.

103
00:05:38,300 --> 00:05:42,190
So now we can go back to our query editor

104
00:05:43,100 --> 00:05:45,360
and we can create a query.

105
00:05:45,360 --> 00:05:49,630
Now I'm gonna provide a couple of example queries.
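The query result settings chosen in the console correspond to the `ResultConfiguration` structure of the Athena API. A minimal sketch, assuming Python and a hypothetical bucket name; the helper only builds the dictionary, and the boto3 call that would consume it is shown in a comment.

```python
def result_configuration(
    bucket: str,
    prefix: str = "",
    encrypt: bool = False,
    owner_full_control: bool = False,
) -> dict:
    """Build an Athena ResultConfiguration dictionary.

    encrypt            -> store results with SSE-S3 encryption
    owner_full_control -> give the bucket owner full control of result
                          objects (mainly useful across accounts)
    """
    location = f"s3://{bucket}/{prefix.strip('/')}"
    if not location.endswith("/"):
        location += "/"

    config = {"OutputLocation": location}
    if encrypt:
        config["EncryptionConfiguration"] = {"EncryptionOption": "SSE_S3"}
    if owner_full_control:
        config["AclConfiguration"] = {"S3AclOption": "BUCKET_OWNER_FULL_CONTROL"}
    return config


if __name__ == "__main__":
    # Hypothetical bucket name standing in for the one created earlier.
    print(result_configuration("livelessons-athena-query-results"))
    # With boto3, the dictionary could be passed per query, for example:
    #   athena = boto3.client("athena", region_name="us-east-1")
    #   athena.start_query_execution(
    #       QueryString="SELECT 1",
    #       QueryExecutionContext={"Database": "default"},
    #       ResultConfiguration=result_configuration(
    #           "livelessons-athena-query-results"),
    #   )
```

Both optional settings default to off here, matching the same-account case in the demo where neither was needed.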
106
00:05:49,630 --> 00:05:54,020
And let me go ahead and copy the text.

107
00:05:54,020 --> 00:05:55,283
And we'll paste this in.

108
00:05:56,150 --> 00:05:59,520
We'll see here, select useridentity.username,

109
00:05:59,520 --> 00:06:02,210
sourceipaddress, eventtime, and additionaleventdata.

110
00:06:02,210 --> 00:06:06,940
What we're gonna do is we're gonna query all AWS logins.

111
00:06:06,940 --> 00:06:09,300
It says where eventname equals ConsoleLogin,

112
00:06:09,300 --> 00:06:14,110
starting in January of 2022.

113
00:06:14,110 --> 00:06:16,350
The only piece that we have to change

114
00:06:16,350 --> 00:06:19,310
is right here, from your table name.

115
00:06:19,310 --> 00:06:23,570
So we can grab the name of this table right here

116
00:06:25,490 --> 00:06:27,223
and paste that in instead.

117
00:06:31,100 --> 00:06:34,320
And all we have to do at this point is click run.

118
00:06:34,320 --> 00:06:36,780
And it is now scanning

119
00:06:36,780 --> 00:06:39,770
all of the various CloudTrail logs

120
00:06:39,770 --> 00:06:43,960
to locate those console login events.

121
00:06:43,960 --> 00:06:48,000
And you can see that it is taking a little bit of time.

122
00:06:48,000 --> 00:06:53,000
And because I didn't identify anything farther

123
00:06:53,230 --> 00:06:57,170
into the S3 bucket than just the name of the bucket,

124
00:06:57,170 --> 00:07:02,170
it's gonna scan all of my CloudTrail logs.

125
00:07:02,240 --> 00:07:05,080
And that might take a little bit of time.

126
00:07:05,080 --> 00:07:07,990
And you can see that as it's going along here

127
00:07:07,990 --> 00:07:09,840
it's updating the run time.

128
00:07:09,840 --> 00:07:12,840
It's also updating the data scanned,

129
00:07:13,770 --> 00:07:17,320
and you can see that that number keeps going up.

130
00:07:17,320 --> 00:07:22,320
Athena as a service is relatively expensive

131
00:07:23,460 --> 00:07:26,133
if you're not optimizing your queries.
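The first query, as described, could look roughly like the sketch below. The exact text pasted in the demo is not shown in the transcript, so the column list and filter are reconstructed from the narration, and the table name in the usage line is hypothetical. The helper is plain Python that returns the SQL string to paste into the query editor.

```python
import re


def console_login_query(table: str, since: str = "2022-01-01T00:00:00Z") -> str:
    """Build a query for console sign-in events on or after `since`.

    CloudTrail stores eventtime as an ISO 8601 UTC string, so a plain
    string comparison orders timestamps correctly.
    """
    # Only allow simple identifiers, since the table name is interpolated.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", since):
        raise ValueError(f"expected an ISO 8601 UTC timestamp, got {since!r}")

    return (
        "SELECT useridentity.username, sourceipaddress, eventtime, "
        "additionaleventdata\n"
        f"FROM {table}\n"
        "WHERE eventname = 'ConsoleLogin'\n"
        f"  AND eventtime >= '{since}'"
    )


if __name__ == "__main__":
    # Hypothetical table name; substitute the one listed under Tables.
    print(console_login_query("cloudtrail_logs_example"))
```

Adding an upper bound on `eventtime` narrows the result set, but with an unpartitioned table Athena still reads every log file under the table's location.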
132
00:07:27,530 --> 00:07:29,810
And so while this runs, I'm gonna pull up

133
00:07:29,810 --> 00:07:34,450
another example of a query that we can execute.

134
00:07:34,450 --> 00:07:36,880
And this one's a little bit different.

135
00:07:36,880 --> 00:07:41,880
This one is going to select all failed operations

136
00:07:43,560 --> 00:07:47,523
based on permissions from the beginning of 2021.

137
00:07:48,450 --> 00:07:51,290
Now with this query that's executing

138
00:07:51,290 --> 00:07:53,660
we could just choose to cancel. Oh, there we go.

139
00:07:53,660 --> 00:07:57,870
So it finished and we have 64 different logins

140
00:07:59,120 --> 00:08:01,970
that have happened since the beginning of 2022.

141
00:08:01,970 --> 00:08:05,023
And it gives all the information on these.

142
00:08:06,950 --> 00:08:09,343
So let's try one more query.

143
00:08:12,850 --> 00:08:17,090
For this one, we're gonna grab all the failed operations.

144
00:08:17,090 --> 00:08:19,610
See this, where errorcode like Denied

145
00:08:19,610 --> 00:08:22,570
or errorcode like Unauthorized.

146
00:08:22,570 --> 00:08:27,220
The only thing we need to do is change the table name again.

147
00:08:31,470 --> 00:08:34,920
Paste that in and execute.

148
00:08:34,920 --> 00:08:39,440
Now notice that previous one scanned 1.49 gigabytes of data.

149
00:08:39,440 --> 00:08:41,760
That might not sound like a lot,

150
00:08:41,760 --> 00:08:46,760
but what if you have an S3 bucket that has petabytes of data

151
00:08:47,670 --> 00:08:50,920
and you're scanning the entire thing for every query?

152
00:08:50,920 --> 00:08:54,483
That's gonna be extraordinarily expensive.

153
00:08:56,720 --> 00:08:58,480
So with this one, when we click run,

154
00:08:58,480 --> 00:09:00,210
this one will take a little bit longer

155
00:09:00,210 --> 00:09:01,750
because it's going all the way back

156
00:09:01,750 --> 00:09:04,000
to the beginning of 2021.
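The second query, selecting failed operations, might be written along these lines. This is a sketch based on the narration only: the selected columns and the `%` wildcards are assumptions, since the transcript gives just the two `LIKE` conditions and the start date, and the table name in the usage line is hypothetical.

```python
import re


def failed_operations_query(table: str, since: str = "2021-01-01T00:00:00Z") -> str:
    """Build a query for permission-related failures on or after `since`."""
    # Only allow simple identifiers, since the table name is interpolated.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z", since):
        raise ValueError(f"expected an ISO 8601 UTC timestamp, got {since!r}")

    return (
        # Assumed column list; the demo's exact SELECT is not in the transcript.
        "SELECT eventtime, eventsource, eventname, useridentity.arn, "
        "errorcode, errormessage\n"
        f"FROM {table}\n"
        # Parentheses keep the date filter applied to both LIKE branches,
        # because AND binds more tightly than OR.
        "WHERE (errorcode LIKE '%Denied%' OR errorcode LIKE '%Unauthorized%')\n"
        f"  AND eventtime >= '{since}'"
    )


if __name__ == "__main__":
    # Hypothetical table name; substitute the one listed under Tables.
    print(failed_operations_query("cloudtrail_logs_example"))
```

Without the parentheses, the date condition would apply only to the `Unauthorized` branch and every `Denied` event in the bucket would be returned regardless of date.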
157
00:09:04,000 --> 00:09:08,250
But notice the data scanned going up pretty quickly.

158
00:09:08,250 --> 00:09:11,240
So we could let this sit until it completes,

159
00:09:11,240 --> 00:09:13,370
but when it's done, it's gonna give us the results

160
00:09:13,370 --> 00:09:16,430
of all of the various failed operations

161
00:09:16,430 --> 00:09:21,430
since January 1st, 2021.

162
00:09:22,090 --> 00:09:24,083
And that completes this demonstration.
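To put the data-scanned figures from the demonstration in perspective, here is a rough cost estimate in Python. The price of 5 USD per terabyte scanned and the 10 MB per-query minimum are assumptions based on Athena's published on-demand pricing at the time of writing, and the use of binary units (1 TB = 1024^4 bytes) is also an assumption; check the current pricing page for your region before relying on the numbers.

```python
BYTES_PER_MB = 1024 ** 2
BYTES_PER_GB = 1024 ** 3
BYTES_PER_TB = 1024 ** 4


def estimate_scan_cost(
    bytes_scanned: float,
    usd_per_tb: float = 5.0,                 # assumed on-demand price
    min_bytes: int = 10 * BYTES_PER_MB,      # assumed per-query minimum
) -> float:
    """Estimate the cost in USD of one query from the bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    billable = max(bytes_scanned, min_bytes)
    return billable / BYTES_PER_TB * usd_per_tb


if __name__ == "__main__":
    demo_scan = 1.49 * BYTES_PER_GB   # the 1.49 GB scan from the demo
    petabyte_scan = 1024 ** 5         # a full scan of 1 PB of data
    print(f"1.49 GB scan: ${estimate_scan_cost(demo_scan):.4f}")
    print(f"1 PB scan:    ${estimate_scan_cost(petabyte_scan):,.2f}")
```

Under these assumptions a single query is cheap at the demo's scale, but the cost grows linearly with the data scanned, which is why narrowing the table location or partitioning the logs matters for large buckets.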