[narrator] Web log analysis is the process of parsing and analyzing a log file from a web server. You can use it to learn when, how, and by whom a web server or website is visited. The data extracted from the log files can also be stored in a database, allowing many reports to be generated on demand.

There are several security event management platforms, like Splunk and Graylog, that can collect and process web server logs, and they do a lot of this type of correlation for you very effectively. Features supported by log analysis packages may include things like "hit filters," which use pattern matching to examine the selected log data.

There are different types of data you can get from the logs, including the number of visits and unique visitors, the visit duration, and the date of the last visit to the website or web service. You can also see which users authenticated and when they last authenticated to the system.
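The parse, store, and report workflow described above can be sketched in Python with only the standard library. This is a minimal sketch, not how Splunk, Graylog, or any particular analysis package works. It assumes the access log uses the "combined" layout, which Apache's stock configuration defines as a named `LogFormat` and which NGINX uses as its default `log_format`; a custom format needs a different regular expression. The functions `parse_line`, `load`, and `report`, the `hits` table, the `hit_filter` parameter, and the sample lines are all hypothetical. `hit_filter` imitates a hit filter by keeping only requests whose path matches a regex, and "unique visitors" is approximated as distinct client IP addresses.

```python
import re
import sqlite3
from datetime import datetime, timezone

# Assumed layout: the "combined" format (Apache's named LogFormat "combined",
# NGINX's default log_format). A custom format needs a different regex.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one combined-format line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # e.g. "10/Oct/2023:13:55:36 +0000"; normalized to UTC so stored strings sort correctly
    stamp = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["time"] = stamp.astimezone(timezone.utc).isoformat()
    rec["status"] = int(rec["status"])
    return rec


def load(conn, lines, hit_filter=None):
    """Insert parsed hits into SQLite; if hit_filter (a regex) is given, keep only matching paths."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(host TEXT, remote_user TEXT, ts TEXT, path TEXT, status INTEGER, agent TEXT)"
    )
    pattern = re.compile(hit_filter) if hit_filter else None
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that are not in the expected format
        if pattern is not None and not pattern.search(rec["path"]):
            continue  # hypothetical hit filter: drop requests outside the selection
        conn.execute(
            "INSERT INTO hits VALUES (?, ?, ?, ?, ?, ?)",
            (rec["host"], rec["user"], rec["time"], rec["path"], rec["status"], rec["agent"]),
        )
    conn.commit()


def report(conn):
    """On-demand report: total hits, distinct client IPs, and each authenticated user's last hit."""
    hits = conn.execute("SELECT COUNT(*) FROM hits").fetchone()[0]
    unique = conn.execute("SELECT COUNT(DISTINCT host) FROM hits").fetchone()[0]
    last_seen = conn.execute(
        "SELECT remote_user, MAX(ts) FROM hits WHERE remote_user != '-' "
        "GROUP BY remote_user ORDER BY remote_user"
    ).fetchall()
    return {"hits": hits, "unique_visitors": unique, "last_seen": dict(last_seen)}


# Made-up sample lines in the combined layout (documentation IP ranges).
SAMPLE = [
    '203.0.113.5 - alice [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"',
    '203.0.113.5 - alice [10/Oct/2023:14:05:00 +0000] "GET /reports/q3.pdf HTTP/1.1" 200 10240 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/118.0"',
    '198.51.100.7 - - [10/Oct/2023:14:10:12 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://www.example.com/" "Mozilla/5.0 (Windows NT 10.0) Chrome/117.0"',
    "not a log line",
]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, SAMPLE)
    print(report(conn))
```

Because the rows live in SQLite, further reports (hits per path or per status code, for example) are just additional `SELECT` queries. Counting distinct IPs undercounts visitors behind a shared NAT and overcounts visitors whose address changes, so treat it as an estimate.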
Other data points include the days of the week and rush hours of the web server, the number of webpages viewed, the most viewed entry and exit pages, the file types requested, and the operating systems and browsers that visitors use. You can also see the HTTP referrers, the search engine key phrases and keywords used to find the analyzed website, and many other things, like HTTP errors.

Two very popular web servers are the Apache HTTP Server and NGINX. The Apache HTTP Server provides very comprehensive and flexible logging capabilities, and NGINX provides a very similar log structure. By default, both NGINX and Apache write activity and information into two types of logs: the error log and the access log.

The error log is where the web server records any type of error, anything it thinks isn't quite right. Sometimes you will also see warnings there that don't indicate a problem but advise you that a particular event or configuration may cause problems later.
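To separate real errors from the warnings described above, a script can read the severity tag on each error-log line. This is a minimal sketch, assuming the servers' default error-log formats: Apache 2.4 writes tags like `[core:error]`, Apache 2.2 writes `[error]`, and NGINX writes `[warn]` after its timestamp. The functions `level_of` and `triage`, the choice of which levels count as problems, and the sample lines (modeled on those formats) are illustrative assumptions.

```python
import re
from collections import Counter

# Severity tag as written by Apache 2.4 ("[core:error]"), Apache 2.2 ("[error]"),
# and NGINX ("[warn]"). Assumes the default error-log formats.
LEVEL = re.compile(r"\[(?:[\w.-]+:)?(emerg|alert|crit|error|warn|notice|info|debug|trace[1-8])\]")
# Assumption: these levels are treated as real problems; "warn" is kept separate.
PROBLEM_LEVELS = {"emerg", "alert", "crit", "error"}


def level_of(line):
    """Return the severity level of an error-log line, or None if no level tag is found."""
    m = LEVEL.search(line)
    return m.group(1) if m else None


def triage(lines):
    """Count lines per level and split actual errors from warnings."""
    counts = Counter()
    problems, warnings = [], []
    for line in lines:
        level = level_of(line)
        if level is None:
            continue  # not a recognizable error-log line
        counts[level] += 1
        if level in PROBLEM_LEVELS:
            problems.append(line)
        elif level == "warn":
            warnings.append(line)  # not a failure yet, but worth reviewing
    return counts, problems, warnings


# Sample lines modeled on the default Apache 2.4 and NGINX error-log formats.
SAMPLE = [
    "[Fri Sep 09 10:42:29.902022 2011] [core:error] [pid 35708:tid 4328636416] "
    "[client 72.15.99.187] File does not exist: /usr/local/apache2/htdocs/favicon.ico",
    "[Tue Oct 10 14:00:00.000000 2023] [mpm_event:notice] [pid 1234:tid 5678] "
    "AH00489: Apache/2.4.57 (Unix) configured -- resuming normal operations",
    '2023/10/10 14:00:00 [warn] 1234#1234: conflicting server name "example.com" on 0.0.0.0:80, ignored',
    '2023/10/10 14:10:12 [error] 1234#1234: *1 open() "/usr/share/nginx/html/favicon.ico" failed '
    "(2: No such file or directory), client: 198.51.100.7",
]

if __name__ == "__main__":
    counts, problems, warnings = triage(SAMPLE)
    print(counts)
    for line in warnings:
        print("review:", line)
```

The conflicting-server-name line is a good example of the kind of warning the narration describes: NGINX keeps running, but the configuration will not behave as intended for one of the server blocks.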
The access log is where your web server records all the visitors to your site. There you can see which files users are accessing, how the web server responded to each request, and other information, like which web browsers and operating systems the visitors are using.

Before you can read your logs, you need to find them. To find the error log, look in your main NGINX or Apache HTTP Server configuration files (the error_log directive in NGINX, the ErrorLog directive in Apache). By default, however, the logs are typically stored under the /var/log directory on your system. Here's an example of the Apache logs, where you can see the access and error logs. As mentioned before, the access log shows which files users are accessing, how the web server responded to each request, and which web browsers the visitors are using.
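Finding the logs and reading the access-log fields mentioned above can also be scripted. This is a minimal sketch, assuming the common default locations (Debian and Ubuntu put Apache logs in `/var/log/apache2/`, RHEL and Fedora in `/var/log/httpd/`, and NGINX packages in `/var/log/nginx/`) and the "combined" access-log layout. The functions `find_logs`, `summarize`, `browser_of`, and `os_of` are hypothetical, and the User-Agent checks are deliberately crude substring tests, not a real User-Agent parser.

```python
import re
from collections import Counter
from pathlib import Path

# Common default locations (distribution-dependent; your config files are authoritative):
#   Debian/Ubuntu Apache: /var/log/apache2/access.log, error.log
#   RHEL/Fedora Apache:   /var/log/httpd/access_log, error_log
#   NGINX:                /var/log/nginx/access.log, error.log


def find_logs(root="/var/log"):
    """List access/error log files one directory below root (rotated copies included)."""
    found = {"access": [], "error": []}
    for path in sorted(Path(root).glob("*/*")):
        if not path.is_file():
            continue
        for kind in found:
            if kind in path.name and "log" in path.name:
                found[kind].append(path)
    return found


# Assumed "combined" layout: host ident user [time] "request" status size "referrer" "agent"
REQUEST = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def browser_of(agent):
    """Rough browser guess; Edge UAs also say Chrome, and Chrome UAs also say Safari."""
    for token, name in (("Edg/", "Edge"), ("Firefox/", "Firefox"),
                        ("Chrome/", "Chrome"), ("Safari/", "Safari")):
        if token in agent:
            return name
    return "other"


def os_of(agent):
    """Rough OS guess; Android UAs also say Linux and iOS UAs say Mac OS X, so check them first."""
    for token, name in (("Windows", "Windows"), ("Android", "Android"), ("iPhone", "iOS"),
                        ("iPad", "iOS"), ("Mac OS X", "macOS"), ("Linux", "Linux")):
        if token in agent:
            return name
    return "other"


def summarize(lines):
    """Count requested files, response status codes, browsers, and operating systems."""
    files, statuses, browsers, systems = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        m = REQUEST.match(line)
        if m is None:
            continue  # skip lines in other formats
        files[m["path"]] += 1
        statuses[int(m["status"])] += 1
        browsers[browser_of(m["agent"])] += 1
        systems[os_of(m["agent"])] += 1
    return {"files": files, "statuses": statuses, "browsers": browsers, "systems": systems}


# Made-up sample lines in the combined layout.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:118.0) Gecko/20100101 Firefox/118.0"',
    '198.51.100.7 - - [10/Oct/2023:14:10:12 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"',
    '198.51.100.7 - - [10/Oct/2023:14:10:13 +0000] "GET /missing.png HTTP/1.1" 404 196 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"',
]

if __name__ == "__main__":
    print(find_logs())
    print(summarize(SAMPLE))
```

The configuration files remain the authoritative source for log paths, so `find_logs` is only a starting point. Rotated copies such as `access.log.1` are included; gzip-compressed ones like `access.log.2.gz` would need `gzip.open` before `summarize` can read them.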