System is running slow.

Okay, this is the lecture where I'm going to cover a lot of commands, so sit tight and get ready to go over all of them. Many, many times you will run into this issue: someone in your group or elsewhere reports that a system is running slow, or you find out yourself that your own system is running slow. What do you have to do?

Well, first of all, you need to understand the problem. As I discussed earlier, understanding the problem, and where exactly it is coming from, is the most important thing before you start troubleshooting any issue. You need to find out whether it's processor related, meaning your processor is the one running slow, or whether the disk is causing the issue. Maybe your system is writing to an external disk, or one of your internal disks has issues such as bad blocks or is running in a degraded state. You need to find that out.
After that, you have to look into networking, such as file transfers. If someone is transferring files from your system to another system anywhere on the network, you have to find out whether the slowness is caused by processing, a disk issue, the network itself, or the memory in use. Memory is part of processing as well. Finally, we have to know whether it is hardware related. So these are the things to keep in mind: processing and memory, disk issues, networking issues, or hardware issues. We have different tools for each of them when we are troubleshooting system slowness.

So let's talk about the troubleshooting steps now. The first step is always to check that the right system was reported and that you are on the right system. Someone may report a problem with system ABC when in fact it's XYZ. So you need to find out which system they are reporting the problem on, whether it's a database server, a web server, or some other server.
Once you find that out and log into that system, always run hostname to check that you are on the right system before you start troubleshooting and digging into it. I don't want you to spend hours and find out at the end that you were not on the right system.

Then we will check the disk space. The first thing to check is whether a disk is full, because most of the time a full disk causes problems for everything: processing, memory, networking, and the disk itself. If your disk is 99% full, your processes cannot do much at that point, because they cannot write to the disk. The commands we'll use are df and du, which we already covered, but I'll quickly go over them in this lecture as well.

Then we check processing and memory, of course. The commands we'll cover are top, free, lsmem, and /proc/meminfo, which shows the memory currently in use.
We'll check vmstat; pmap, which tells you how much memory each individual process is using, not just the entire system; dmidecode, which tells you about the hardware you have installed or added to your system; and lscpu, which describes your processors. /proc/cpuinfo gives you much the same information as lscpu.

Then we'll cover disk issues. The main command I always use is iostat with a five-second interval. While you're troubleshooting, you can run iostat and check disk activity: what is being read, what is being written, and the rate of that traffic.

Then we'll also cover lsof, which stands for list open files: any files that are open at this point by processes taking up memory or CPU. We'll find out which files those are.
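lsof answers the question "which process has this file open?". As a rough illustration of the idea, the sketch below reads /proc directly and prints the PIDs whose working directory or open files sit under a given directory, which is exactly what you need to know when an unmount fails because the file system is busy. It is a minimal Linux-only sketch, not a replacement for lsof +D DIR or fuser -vm MOUNTPOINT, and the function name holders_of is made up for this example.

```sh
#!/bin/sh
# Minimal sketch (Linux only): print the PID of every process whose current
# directory or any open file lies under DIR, by reading /proc directly.
# On a real system, prefer `lsof +D DIR` or `fuser -vm MOUNTPOINT`.
holders_of() {
  dir=$(readlink -f "$1") || return 1
  for p in /proc/[0-9]*; do
    pid=${p#/proc/}
    # Check the working directory first, then each open file descriptor.
    # Entries we lack permission to read are skipped.
    for t in "$p/cwd" "$p"/fd/*; do
      t=$(readlink -f "$t" 2>/dev/null) || continue
      case $t in
        "$dir" | "$dir"/*) echo "$pid"; break ;;
      esac
    done
  done
}
```

For example, holders_of /mnt/data lists candidate PIDs, which you can inspect with ps -p PID -o pid,user,cmd before deciding whether to stop them. Run it as root to see other users' processes.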
The lsof command is also useful when you're trying to unmount a file system and can't, because it tells you it's busy but not why. Maybe a script is running from it, or a file in the mounted file system is in use by another process or another user. You use lsof to find that process, and then you can kill it.

Then comes checking the network. The main networking command we will use is tcpdump with the -i option, where i stands for interface. At the same time, we'll use lsof to see which ports are open and listening, grepping for LISTEN to see which files, protocols, and sockets are open and what they are listening on. We'll also use netstat -plnt. By the way, in newer versions of CentOS and Red Hat, netstat has been deprecated in favor of ss (socket statistics), which accepts most of the same options. Then there's another command, iftop, which is very useful when you are troubleshooting network-related issues.
iftop does not come preinstalled with your operating system. You have to install a couple of packages: first yum install epel-release, and then yum install iftop. Then the command will be available to you. It's an add-on command, but it is very useful: it tells you exactly what is coming in, what is going out, what is being transferred, and whether the transfer rate is high or low.

Then we'll also check the system uptime, because when a system is running slow, sometimes it has crashed and rebooted without you knowing. So you should check whether the system was just booted. uptime tells you how long the system has been up and running, and it also gives you the system load averages over the last 1, 5, and 15 minutes, the same load averages you see in the top command.
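The load averages that uptime and top print come from /proc/loadavg. The sketch below (the function name load_report is hypothetical) reads that file and divides the 1-minute load by the CPU count. As a rough rule of thumb, a per-CPU load that stays above about 1 means more work is waiting than the CPUs can run. On Linux, processes blocked on disk I/O also count toward the load, so a high load with idle CPUs can point at the disk instead.

```sh
#!/bin/sh
# Minimal sketch: show the 1-, 5-, and 15-minute load averages (the same
# numbers uptime and top print) plus the 1-minute load per CPU.
# Both arguments are optional so the function can be tested with sample data:
#   $1 = file in /proc/loadavg format, $2 = CPU count.
load_report() {
  loadfile=${1:-/proc/loadavg}
  cpus=${2:-$(nproc 2>/dev/null || echo 1)}
  read -r l1 l5 l15 _ < "$loadfile"
  awk -v l1="$l1" -v l5="$l5" -v l15="$l15" -v c="$cpus" 'BEGIN {
    printf "1m=%s 5m=%s 15m=%s per_cpu=%.2f\n", l1, l5, l15, l1 / c
  }'
}
```

Called with no arguments, load_report reads the live values for the current machine.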
We will also check the logs. Logs are one of the main resources you have for checking whether something is wrong with your operating system, kernel, memory, CPU, or hardware; errors of any type are reported in your logs.

Then there are other things you could check on the hardware side that are available to your operating system. If you cannot get to the bottom of a hardware issue from the operating system, you need to go into the system console to find that information.

Then there are other tools you could install. Again, these tools are not added when you install the operating system; you have to install them with yum. Some of these tools are htop, iotop, iptraf (IP traffic monitoring), and psacct, which provides process accounting and tracks what users run on the system. If you want to use them, keep them handy and keep them in mind.

So these are the commands you need to know to be a good system administrator: how to use them, how to run them on the system, and, when you get the output, how to read it.
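For the log check, a quick first pass is to search for error-looking lines. The keyword list and the function name log_errors below are assumptions for illustration. On systemd-based systems, journalctl -p err -b (messages of priority err and above from the current boot) and dmesg (kernel messages) cover similar ground.

```sh
#!/bin/sh
# Minimal sketch: count and show the last N error-looking lines in a log file.
# The keyword list is an assumption; tune it for your system. The default
# path is /var/log/messages (RHEL/CentOS); Debian/Ubuntu use /var/log/syslog.
log_errors() {
  logfile=${1:-/var/log/messages}
  pattern='error|fail|out of memory|segfault'
  echo "matches: $(grep -ciE "$pattern" "$logfile")"
  grep -iE "$pattern" "$logfile" | tail -n "${2:-20}"
}
```

For example, log_errors /var/log/messages 50 prints the match count followed by the 50 most recent matching lines (run as root, since the file is usually not world-readable).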
So let's log into my Linux machine, and I'll show you each of these commands one by one; hopefully we'll cover pretty much all of them. As I said, first check that you are on the right system, the one that was reported to you. For that, always run hostname, which tells you, oh yes, this is the right host. In a bigger organization you often have 10 to 20 servers with the same hostname that differ only by 01, 02, or some incremental number at the end. So just make sure you are logged into the right system.

Then check the disk space; that's the first thing you should always do. To do that, run df -h, where h stands for human readable. This shows the file system on the left, then the size, how much is used, how much is available, the usage percentage, and where it is mounted. All the tmpfs entries can be ignored, because those are memory-based file systems. You can run df -h and pipe it to grep -v tmpfs to exclude them and see exactly the status. If you look at my system, I have a pretty healthy system in terms of disk.
Root is at 59%, sd1 is at 1%, and boot is at 17%. So at the disk level, usage looks okay. If you see anything here at 99% or 100%, or above whatever threshold you consider normal, you have to check why it is using that much, and for that you use du. When you run du -a, for all, and specify the file system you're looking at, you'll see every file in that file system along with its size. You can add -h for human readable as well, and it will show sizes in kilobytes, megabytes, or gigabytes. If you want to sort the output, sort it in reverse order so that the files with the highest usage are at the top. Of course, the output goes by very fast, so pipe it to more.

When you run it on the root file system, it also goes into /proc, which is a virtual file system backed by memory, so you'll see some "No such file or directory" messages while it runs. Don't worry about those.
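The df-then-du workflow above can be sketched as two small shell functions (both names are hypothetical): one flags file systems above a usage threshold, the other lists the largest files and directories under a path. The 90% default threshold is an arbitrary example, not a standard.

```sh
#!/bin/sh
# over_threshold reads `df -P` output on stdin (so it can be tested with
# sample text) and prints mount points at or above LIMIT percent, skipping
# memory-backed tmpfs/devtmpfs. Mount points containing spaces are not handled.
over_threshold() {
  awk -v limit="${1:-90}" 'NR > 1 && $1 != "tmpfs" && $1 != "devtmpfs" {
    use = $5; sub(/%/, "", use)
    if (use + 0 >= limit) print $6, $5
  }'
}

# largest lists the N biggest files/directories under PATH, largest first.
# Errors such as vanished /proc entries are discarded; run as root to avoid
# "Permission denied" gaps. Relies on sort -h to order human-readable sizes.
largest() {
  du -ah "${1:-.}" 2>/dev/null | sort -rh | head -n "${2:-10}"
}

# Live usage:
#   df -P | over_threshold 90
#   largest /var 20
```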
You can ignore any of those messages that say cannot access. But if you get an error saying Permission denied, that definitely means you are not root. So make sure you are root before you run du -a or du -ah on any file system, so it can go into every file without Permission denied errors.

Let's clear the screen. The next step is checking processing. Let me make this window a little smaller so we can go over the commands one by one. First we have top, which we have already covered, so I'll go through it quickly again. The first line tells you the system uptime, how many users are logged in, and the load average over the last 1, 5, and 15 minutes. The next line tells you the total number of tasks, how many are running, and how many are sleeping. The next line is the CPU status, then the memory status, then the swap status. Coming down, this is the list of every process running on your system.
The most heavily used processes are mostly shown at the top. The columns are the process ID (PID), the user running it, virtual memory (VIRT), resident memory (RES), shared memory (SHR), %CPU, %MEM, CPU time, and the command, and there's a lot of other information as well.

Then we go to the next command, free. free is specifically for memory utilization, and you can run free -m if you want to see the values in megabytes. It tells you how much memory you have in total, how much is used, how much is free, how much is shared, how much is in buffers/cache, and how much is available to be used by other processes, meaning other scripts or programs.

Okay, next we have lsmem, list memory. This tells you the memory block size and the total online memory available to the system, which is 1 GB here, and the total offline memory that is sitting unused, which is zero; we don't have any. So this also tells you about your memory. Pretty much the same goes for cat /proc/meminfo. It tells you exactly what is going on in the system right now: total memory, free memory, available memory.
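/proc/meminfo is plain text, so the MemTotal and MemAvailable lines that free summarizes can be read directly. A minimal sketch, assuming a kernel new enough (3.14 or later) to report MemAvailable; the function name mem_available is illustrative.

```sh
#!/bin/sh
# Minimal sketch: summarize /proc/meminfo (values there are in kB).
# MemAvailable is the kernel's estimate of memory available to new programs
# without swapping.
mem_available() {
  awk '/^MemTotal:/ { t = $2 }
       /^MemAvailable:/ { a = $2 }
       END { if (t > 0) printf "total=%d kB available=%d kB (%.1f%%)\n", t, a, 100 * a / t }' \
    "${1:-/proc/meminfo}"
}
```

A low and shrinking available percentage, together with swap activity, is the pattern worth investigating further.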
It has a lot of information about your memory and how memory is allocated to processes and pages in the system. If you really want to know exactly what each field means, you have to read through every one of them; if I started explaining them, it would take me forever. So I recommend searching for /proc/meminfo online, and you'll find plenty of information about it.

Then vmstat, which reports virtual memory statistics. It tells you about your swap space, which is disk space the kernel uses as an extension of physical memory: how much swap is used, and how much memory is free, in buffers, and in cache. It also shows process, I/O, and CPU activity. So if you run free -m and see a total of 991 and used of 991, meaning nothing is free, the next thing to find out is whether swap is also used up. By the way, free also shows both regular memory and swap.
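When checking swap with vmstat, the si and so columns (memory swapped in from and out to disk per second) matter more than the swpd total: steady non-zero values mean the system is actively swapping. The sketch below, with a hypothetical function name, counts the samples that show swap activity. It assumes the default procps column layout, where si and so are columns 7 and 8. Note that the first sample vmstat prints reports averages since boot.

```sh
#!/bin/sh
# Minimal sketch: flag swap activity in `vmstat` output read from stdin.
# Skips the two header lines; columns 7 and 8 are si/so in the default layout.
swap_activity() {
  awk 'NR > 2 { if ($7 > 0 || $8 > 0) busy++; n++ }
       END { printf "%d of %d samples show swapping\n", busy, n }'
}
# Live usage: vmstat 5 3 | swap_activity
```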
So check the memory status and make sure memory is not used up. If it is, you may have to assign more memory to this machine, because a lot of processes are using it. If you run top again and see that many different processes have used up a lot of memory in this memory column right here, then you actually have to add more memory.

Or you could use a new command I'm going to show you, pmap. If you notice right here, top has the process ID 6610. If I open another session and type pmap followed by that process ID, it shows me the memory map of the top process I'm running, along with the total memory it uses. So that's memory information for one specific process. You can look up other processes too, say process ID 1. pmap gives you the memory information for one particular process, so instead of going around checking every process, you can check it right here.
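pmap shows one process's full memory map. When you only need the resident total, the kernel also summarizes it in /proc/PID/status. A minimal sketch (the function name proc_rss_kb is hypothetical); its VmRSS value should be in the same ballpark as the RSS total in the last line of pmap -x PID.

```sh
#!/bin/sh
# Minimal sketch: resident memory (VmRSS, in kB) of one process, read from
# /proc/PID/status. Kernel threads have no VmRSS line, so the output is
# empty for them.
proc_rss_kb() {
  [ -n "$1" ] || { echo "usage: proc_rss_kb PID" >&2; return 2; }
  awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}
# Example, using the PID from the lecture: proc_rss_kb 6610
# Compare with:                            pmap -x 6610 | tail -n 1
```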
dmidecode gives you a lot of information about your hardware, meaning what is inside your computer. Say you notice an issue with the CPU or memory and have to call the vendor, and the vendor asks for the CPU serial number, the memory serial number, the size, the release, or the version information: you'll find all of that right here. As you can see, there is the manufacturer information, the product name, and the chassis information, which says the manufacturer is Oracle Corporation. So you know whom to call when you open a case. And if you don't know what kind of server you are on, dmidecode will tell you whether it's HP, Oracle, Dell, or other hardware that runs 64-bit operating systems.

Okay, then lscpu, which is similar to lsmem but for the processor, gives you information about your CPU architecture. Here the architecture is x86_64, and it supports both 32-bit and 64-bit modes.
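The vendor and CPU details from dmidecode and lscpu can be collected for a support ticket with a small script. This is a sketch under assumptions: dmidecode -s system-manufacturer and -s system-product-name need root, /sys/class/dmi/id is used as a fallback, and either source may be absent inside containers or on non-x86 hardware. Both function names are made up for this example.

```sh
#!/bin/sh
# Minimal sketch: vendor/model for a support case. dmidecode needs root;
# /sys/class/dmi/id is usually world-readable on x86. Prints "unknown"
# when neither source is available.
hw_vendor() {
  vendor='' model=''
  if command -v dmidecode >/dev/null 2>&1; then
    vendor=$(dmidecode -s system-manufacturer 2>/dev/null)
    model=$(dmidecode -s system-product-name 2>/dev/null)
  fi
  if [ -z "$vendor" ] && [ -r /sys/class/dmi/id/sys_vendor ]; then
    vendor=$(cat /sys/class/dmi/id/sys_vendor)
    model=$(cat /sys/class/dmi/id/product_name 2>/dev/null)
  fi
  echo "vendor=${vendor:-unknown} model=${model:-unknown}"
}

# CPU count and model from /proc/cpuinfo (what lscpu reports as CPU(s) and
# Model name). "model name" lines exist on x86 but not on every
# architecture, so the model may print as unknown.
cpu_summary() {
  count=$(grep -c '^processor' /proc/cpuinfo)
  model=$(grep -m 1 '^model name' /proc/cpuinfo | cut -d: -f2- | sed 's/^ *//')
  echo "cpus=$count model=${model:-unknown}"
}
```

For memory module details such as sizes and serial numbers, dmidecode -t memory (as root) prints the full memory device records.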
It also shows the byte order, the number of CPUs (one here), the number of sockets and cores per socket, the vendor, the CPU family, the model name (an Intel CPU), and the speed in MHz. Everything you need to know about the CPU can be found in lscpu.

Okay, next is /proc/cpuinfo, which is again similar to lscpu: run cat /proc/cpuinfo. If you notice, it's pretty much the same as the other one, just arranged differently, but it gives you similar information to what you get with lscpu.

Okay, let's clear the screen. So you have found out a lot about your CPU, your memory, and your disk. Now let's move on to disk status. When you ran df -h, you saw that disk usage is fine at 59%. So there isn't a disk capacity issue, but maybe the disk's input/output is slow or the disk itself is running in degraded mode. To find that out, run iostat, meaning input/output statistics, with -y and 5. I always use that because I'm used to it; there are many other options you can use with iostat as well.
In fact, every command I'm showing you has many options, so the best way to learn them is to run man on each command and see every option available to you.

Anyway, let's run iostat -y 5. This asks the disks for their statistics every 5 seconds; -y skips the first report, which shows averages since boot. Here it's showing your disks: I have two, sda and sdb, along with kilobytes read and kilobytes written, so you see how much each disk is reading and writing. Looking at my system, it's running very efficiently, because everything is pretty much 0.0. I don't run any production applications on this system, which is why you see zeros everywhere. On a busy system, you will see how many kB it is reading and writing and how many transfers per second it is doing. All that information shows up here.
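iostat derives its per-second numbers from counters in /proc/diskstats. To show the idea, the sketch below compares two snapshots taken INTERVAL seconds apart. It assumes the standard layout where field 3 is the device name, field 6 is sectors read, and field 10 is sectors written, with sectors counted in 512-byte units (so kB = sectors / 2). The function name is hypothetical, and the output only approximates iostat's kB_read/s and kB_wrtn/s columns.

```sh
#!/bin/sh
# Minimal sketch: kB read/written per second between two /proc/diskstats
# snapshots. Usage: disk_rates SNAPSHOT1 SNAPSHOT2 INTERVAL_SECONDS
disk_rates() {
  awk -v iv="$3" '
    NR == FNR { r[$3] = $6; w[$3] = $10; next }   # first snapshot
    ($3 in r) { printf "%s kB_read/s=%.1f kB_wrtn/s=%.1f\n", $3,
                ($6 - r[$3]) / 2 / iv, ($10 - w[$3]) / 2 / iv }' "$1" "$2"
}
# Live usage:
#   cat /proc/diskstats > /tmp/d1; sleep 5; cat /proc/diskstats > /tmp/d2
#   disk_rates /tmp/d1 /tmp/d2 5
```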
If you see very high numbers, it means either you are genuinely writing or reading a lot, or your disk is having trouble reading or writing and is queuing up transactions. So that's iostat.

Now let's get into the networking part. You have confirmed nothing is wrong with the disk, everything looks good on the processor, and everything looks good on the memory side. Now let's look at the network: if your programs are involved in transferring or receiving data, you have to check networking. The first command we'll run is tcpdump. Despite the name, tcpdump captures every protocol going in and out, not only TCP. The -i option selects the interface, so first you need to find out which interface to run tcpdump on. For that, run ifconfig. You'll notice my interface is enp0s3, so I run tcpdump -i enp0s3. The first time I ran it, it failed because I spelled the interface name wrong.
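If ifconfig is not installed (it comes from the net-tools package; newer systems use ip addr show or ip link show from iproute2), you can still find interface names, along with their received and transmitted byte counters, in /proc/net/dev. A minimal sketch with a hypothetical function name:

```sh
#!/bin/sh
# Minimal sketch: list interfaces with total RX/TX bytes from /proc/net/dev.
# Skips the two header lines and turns "name:" into "name " so the fields
# split cleanly: $2 is received bytes, $10 is transmitted bytes.
list_ifaces() {
  awk 'NR > 2 { sub(/:/, " "); print $1, "rx_bytes=" $2, "tx_bytes=" $10 }' \
    "${1:-/proc/net/dev}"
}
```

Running it twice a few seconds apart and subtracting the counters gives a rough transfer rate for the interface you plan to pass to tcpdump -i.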
Okay, now you see the output going by very fast, because my PC, where I'm recording this lecture, is communicating with this virtual machine. That's why the traffic keeps going back and forth. So I press Ctrl+C to stop it and open another window. By the way, you have to be root to run most of these commands, so always log in as root.

Now, if I ping, let's say, www.google.com, it starts pinging, and while it's pinging I can watch this traffic in tcpdump. So let's grep for, say, google. As it scrolls, nothing shows up: tcpdump doesn't show you the name google here, it shows you the IP address. So let me grep for 162.243.10.151, and there you go. It's telling me my machine, whose name is myfirstlinuxvm, is sending packets to Google with the ICMP protocol. Any time you ping a machine, the protocol used is ICMP.
438 00:25:26,990 --> 00:25:33,830 So my machine is using the ICMP protocol and 439 00:25:33,880 --> 00:25:35,942 pinging this machine, which is Google; 440 00:25:36,076 --> 00:25:39,102 the packet is listed as an echo request. 441 00:25:39,186 --> 00:25:42,986 Then this machine, which is Google, is coming 442 00:25:43,048 --> 00:25:46,434 back to my machine and answering. 443 00:25:46,542 --> 00:25:47,814 That's the echo reply. 444 00:25:47,922 --> 00:25:50,886 So look at this very carefully. 445 00:25:50,958 --> 00:25:54,098 This is my machine going to Google with this 446 00:25:54,184 --> 00:26:00,846 protocol, sending an echo request with a sequence number and a length 447 00:26:00,918 --> 00:26:05,990 of 64 bytes, right here. 448 00:26:06,040 --> 00:26:09,474 ICMP, the sequence number, the length: 449 00:26:09,642 --> 00:26:10,982 that's what it's telling you. 450 00:26:11,056 --> 00:26:14,918 Then, when the request gets there, Google is 451 00:26:14,944 --> 00:26:17,678 coming back to me and saying, hey, I got 452 00:26:17,704 --> 00:26:20,522 it, you are the one who's pinging me, 453 00:26:20,656 --> 00:26:22,718 and I'm replying back to you and 454 00:26:22,744 --> 00:26:25,900 telling you, yes, I'm alive, I'm available. 455 00:26:26,650 --> 00:26:29,490 So that's a simple way of using tcpdump. 456 00:26:29,610 --> 00:26:32,054 If you are trying to troubleshoot an issue where 457 00:26:32,092 --> 00:26:35,114 your machine is talking to a database server, then 458 00:26:35,152 --> 00:26:38,234 I would say put that database server's IP address 459 00:26:38,392 --> 00:26:41,502 or its port in the tcpdump filter. 460 00:26:41,586 --> 00:26:43,950 There are options to capture traffic 461 00:26:44,070 --> 00:26:50,230 specifically for that port, 462 00:26:50,290 --> 00:26:51,642 not just for that host.
463 00:26:51,776 --> 00:26:56,002 If you are connecting to, let's say, 192.168.1.13, 464 00:26:56,146 --> 00:26:59,394 and that is your database server, it is 465 00:26:59,432 --> 00:27:05,142 listening on a specific port. 466 00:27:05,156 --> 00:27:09,738 The port number depends on which database you run, 467 00:27:09,764 --> 00:27:12,910 and I don't remember it offhand. 468 00:27:12,970 --> 00:27:15,786 But I hope you understand what I'm trying to say. 469 00:27:15,848 --> 00:27:20,322 You can specify that port number in tcpdump 470 00:27:20,396 --> 00:27:23,180 to see whether traffic is reaching that port or not. 471 00:27:23,990 --> 00:27:30,418 tcpdump is a topic I could discuss at great length, 472 00:27:30,454 --> 00:27:32,538 but for now I want you to 473 00:27:32,564 --> 00:27:34,578 try it out; it will give you a lot 474 00:27:34,604 --> 00:27:37,160 of information when you're troubleshooting traffic issues. 475 00:27:37,970 --> 00:27:41,922 The next one is lsof, which lists open files, with the options 476 00:27:41,936 --> 00:27:45,534 -i, -P, and -n: -i selects network connections, -P shows port numbers instead of service names, and -n skips hostname lookups. 477 00:27:45,572 --> 00:27:48,778 This shows you which connections are open and which ports are listening. 478 00:27:48,814 --> 00:27:51,582 Let's say my machine, this one here, 479 00:27:51,716 --> 00:27:59,154 is running a service, let's say SSH, and I 480 00:27:59,192 --> 00:28:04,534 cannot connect from that machine to this machine using SSH. 481 00:28:04,642 --> 00:28:07,254 Then I will log in to this machine, run that 482 00:28:07,352 --> 00:28:11,106 command, and grep for 22. 483 00:28:11,288 --> 00:28:13,938 As you can see right here, this will 484 00:28:13,964 --> 00:28:17,554 list all established connections and also listening ones. 485 00:28:17,662 --> 00:28:22,554 Right here it says SSH is listening on port 22. 486 00:28:22,592 --> 00:28:26,866 So I can tell, hey, my machine is listening.
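On Linux, the kernel exposes the IPv4 TCP socket table as the text file /proc/net/tcp, which is where the net-tools netstat reads it from. Parsing that file gives roughly what `lsof -i -P -n | grep 22` shows for listening sockets. This is a minimal Python sketch, assuming Linux. The helper names are hypothetical, and the sketch ignores IPv6 (/proc/net/tcp6) and process ownership.

```python
import socket
import struct

TCP_LISTEN = "0A"  # state code the kernel prints (hex) for listening sockets


def decode_addr(field):
    """Turn an IPv4 entry such as '00000000:0016' into ('0.0.0.0', 22).

    The kernel prints the address as a raw 32-bit value in host byte order,
    so packing it back with native order ('=I') recovers the real bytes.
    The port is printed as ordinary hex.
    """
    ip_hex, port_hex = field.split(":")
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def listening_ports(text):
    """Return sorted (ip, port) pairs in LISTEN state from /proc/net/tcp text."""
    result = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, state = fields[1], fields[3]
        if state == TCP_LISTEN:
            result.append(decode_addr(local))
    return sorted(result)


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as f:
            for ip, port in listening_ports(f.read()):
                print(f"{ip}:{port}")
    except OSError as exc:  # e.g. not running on Linux
        print(f"cannot read /proc/net/tcp: {exc}")
```

An SSH daemon bound to all addresses shows up as `0.0.0.0:22`.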
487 00:28:27,058 --> 00:28:29,610 So the service side is fine. If the 488 00:28:29,660 --> 00:28:33,166 machine were not listening, the service itself would be the problem. 489 00:28:33,238 --> 00:28:36,942 But if the machine is listening and you still can't connect, probably 490 00:28:37,016 --> 00:28:39,870 a firewall is blocking any 491 00:28:39,920 --> 00:28:43,270 traffic coming from that machine to this machine. 492 00:28:43,450 --> 00:28:45,620 So check the firewall as well. 493 00:28:46,550 --> 00:28:53,123 Okay, the next command I have is netstat 494 00:28:53,123 --> 00:28:56,538 [No audio] 495 00:28:56,538 --> 00:28:58,957 -plnt. 496 00:28:59,130 --> 00:29:01,450 This gives pretty much the same information 497 00:29:01,560 --> 00:29:03,934 as the previous command, in a different format. 498 00:29:04,092 --> 00:29:05,642 So this is another option. 499 00:29:05,726 --> 00:29:07,402 Another thing I wanted to tell you 500 00:29:07,416 --> 00:29:11,378 is that netstat is now deprecated, and the newer command 501 00:29:11,414 --> 00:29:14,510 is ss, short for socket statistics. 502 00:29:14,630 --> 00:29:19,078 You can use the same options, ss -plnt, to list 503 00:29:19,164 --> 00:29:23,230 all the ports that are open or listening. 504 00:29:23,790 --> 00:29:26,302 If you run man ss, you will see it 505 00:29:26,316 --> 00:29:31,560 described as another utility to investigate sockets. 506 00:29:32,190 --> 00:29:36,142 A socket is an endpoint that a program opens on your computer. 507 00:29:36,216 --> 00:29:40,654 It is bound to a port number, so every 508 00:29:40,692 --> 00:29:44,254 time traffic comes in on that port, 509 00:29:44,292 --> 00:29:47,158 anything knocking on 510 00:29:47,184 --> 00:29:50,400 the door labeled 22 is handed to the program through that socket. 511 00:29:51,210 --> 00:29:54,062 The last networking check 512 00:29:54,206 --> 00:29:56,674 uses the iftop command.
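You can also test the "not listening or blocked by a firewall" distinction from the client side. This is a minimal Python sketch, where the function name `check_tcp_port` is hypothetical. It tries a TCP connection and classifies the outcome. The result is a heuristic only: a refused connection usually means nothing is listening, and a timeout often means a firewall is silently dropping packets. However, some firewalls are configured to reject rather than drop.

```python
import socket


def check_tcp_port(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port (hypothetical helper).

    'open'    - the handshake completed, something is listening
    'refused' - the host answered with a reset: usually no listener
    'timeout' - no answer at all: often a firewall dropping packets
    'error'   - anything else (name lookup failure, no route, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"


if __name__ == "__main__":
    # Run this from the client machine against the server's SSH port.
    print(check_tcp_port("127.0.0.1", 22))
```

If the server shows port 22 listening but the client gets `timeout`, the firewall is the next thing to check.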
513 00:29:56,832 --> 00:29:59,458 And as I said earlier, iftop is 514 00:29:59,484 --> 00:30:01,918 a utility that does not come with your 515 00:30:01,944 --> 00:30:05,230 Linux operating system by default; you have to install it. 516 00:30:05,280 --> 00:30:07,634 To install it, run the commands 517 00:30:07,682 --> 00:30:11,986 yum install epel-release and then yum install iftop. 518 00:30:12,108 --> 00:30:16,358 I have already installed it on my system, so I'll 519 00:30:16,394 --> 00:30:21,358 run iftop, and you will see right here, this 520 00:30:21,384 --> 00:30:27,046 is showing me the traffic going to 521 00:30:27,168 --> 00:30:31,570 this machine and from this machine to any other host. 522 00:30:31,680 --> 00:30:35,558 So if I'm using FTP to transfer some files 523 00:30:35,594 --> 00:30:38,642 from one machine to another, or I'm using protocols 524 00:30:38,666 --> 00:30:43,954 like NFS or SCP, every time you use 525 00:30:43,992 --> 00:30:46,598 them, every time you run those commands, 526 00:30:46,694 --> 00:30:51,098 if you want to know how much bandwidth 527 00:30:51,134 --> 00:30:56,374 is used and how many megabytes or kilobytes per 528 00:30:56,412 --> 00:31:00,634 second are being transferred, this command is amazing. 529 00:31:00,792 --> 00:31:04,954 It shows you the source and destination of each connection, 530 00:31:05,112 --> 00:31:09,190 the transfer rate, the total amount 531 00:31:09,240 --> 00:31:12,718 transferred so far, the peak rate, and a whole bunch of other 532 00:31:12,744 --> 00:31:15,562 information if you want to learn more about it. 533 00:31:15,696 --> 00:31:18,598 When you install it, it 534 00:31:18,684 --> 00:31:20,880 also installs the man page. 535 00:31:21,270 --> 00:31:24,902 Yes, here it is. And it says: display bandwidth usage 536 00:31:24,986 --> 00:31:26,950 on an interface by host. 537 00:31:27,630 --> 00:31:28,778 So use that command.
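iftop builds its per-host breakdown by capturing packets on the interface. A much cruder per-interface figure can be computed from the byte counters that the kernel keeps in /proc/net/dev: sample them twice and divide the difference by the interval. This is a minimal Python sketch, assuming Linux. The function names are hypothetical, and the result is bytes per second per interface, not per connection as in iftop.

```python
import time


def read_counters(text):
    """Parse /proc/net/dev text into {iface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, data = line.split(":", 1)
        fields = data.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def rates(before, after, seconds):
    """Bytes per second per interface between two counter snapshots."""
    return {
        iface: ((after[iface][0] - rx) / seconds, (after[iface][1] - tx) / seconds)
        for iface, (rx, tx) in before.items()
        if iface in after
    }


def sample(interval=1.0):
    with open("/proc/net/dev") as f:
        first = read_counters(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        second = read_counters(f.read())
    return rates(first, second, interval)


if __name__ == "__main__":
    try:
        for iface, (rx, tx) in sorted(sample().items()):
            print(f"{iface:>10}  rx {rx / 1024:8.1f} KiB/s  tx {tx / 1024:8.1f} KiB/s")
    except OSError as exc:  # e.g. not running on Linux
        print(f"cannot read /proc/net/dev: {exc}")
```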
538 00:31:28,814 --> 00:31:31,258 If you do not have it, I highly recommend 539 00:31:31,284 --> 00:31:34,186 installing it with the instructions I have given you. 540 00:31:34,308 --> 00:31:37,414 The next step is to check the system uptime, to see 541 00:31:37,452 --> 00:31:40,226 if your system has been rebooted recently. 542 00:31:40,358 --> 00:31:42,470 The uptime output also shows the system load 543 00:31:42,530 --> 00:31:46,670 averages. Moving on, check the logs. 544 00:31:46,730 --> 00:31:50,902 All the logs are in /var/log, and you can list them with 545 00:31:50,916 --> 00:31:55,894 ls -l or ls -ltr, whichever option you prefer. 546 00:31:55,932 --> 00:31:59,134 The best log, the one with the most information and the log 547 00:31:59,172 --> 00:32:03,106 that I always look at, is messages. I can 548 00:32:03,168 --> 00:32:09,830 run tail -100 to show the last 100 lines of messages, 549 00:32:09,890 --> 00:32:14,054 and easily grep for error. Here there's no error in those lines, 550 00:32:14,102 --> 00:32:17,882 so I can drop the tail 551 00:32:17,966 --> 00:32:20,066 and grep the entire file for errors. 552 00:32:20,198 --> 00:32:22,882 Here I will see if there are any errors that 553 00:32:22,896 --> 00:32:26,520 have been reported that might help me find the issue. 554 00:32:27,150 --> 00:32:30,394 By the way, always use -i 555 00:32:30,432 --> 00:32:33,310 for a case-insensitive search. I would also grep for warning, 556 00:32:33,630 --> 00:32:36,490 for error, 557 00:32:36,990 --> 00:32:39,710 and for fail, as in failure. 558 00:32:39,770 --> 00:32:42,094 You'll find a lot of information that 559 00:32:42,132 --> 00:32:44,160 could help you troubleshoot the issue. 560 00:32:45,210 --> 00:32:46,774 Check the hardware status by 561 00:32:46,812 --> 00:32:48,554 logging in to the system console.
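The tail-and-grep routine described here corresponds to a shell pipeline such as `tail -100 /var/log/messages | grep -iE 'error|warn|fail'`. The same idea is sketched below in Python, which can be handy when you want to post-process the matches. The function name `scan_log` is hypothetical. /var/log/messages is the file used in the lecture (RHEL/CentOS layout) and usually needs root to read.

```python
import re
from collections import deque

# Case-insensitive like grep -i; matches error, warning, fail, failed, failure.
PATTERN = re.compile(r"error|warn|fail", re.IGNORECASE)


def scan_log(lines, last=None):
    """Return the lines matching PATTERN, optionally only among the last N lines.

    last=None scans everything (grep on the whole file);
    last=100 mimics tail -100 file | grep -iE 'error|warn|fail'.
    """
    if last is not None:
        lines = deque(lines, maxlen=last)  # keeps only the final `last` lines
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    try:
        with open("/var/log/messages", errors="replace") as f:
            for hit in scan_log(f, last=100):
                print(hit)
    except OSError as exc:  # missing file or not running as root
        print(f"cannot read /var/log/messages: {exc}")
```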
562 00:32:48,602 --> 00:32:50,734 You can do that by logging in: 563 00:32:50,772 --> 00:32:54,622 if it's HP, go into iLO: open up a browser, enter 564 00:32:54,636 --> 00:32:57,994 the iLO IP, and you will see the logs in there. 565 00:32:58,032 --> 00:33:00,094 If it's Dell, enter the 566 00:33:00,132 --> 00:33:02,038 iDRAC IP. Other vendors' systems 567 00:33:02,124 --> 00:33:04,920 have their own console access as well. 568 00:33:05,370 --> 00:33:07,078 Find out what yours is, log in, and 569 00:33:07,104 --> 00:33:11,350 look for the hardware information or hardware messages. 570 00:33:11,520 --> 00:33:14,134 The last item is other tools 571 00:33:14,172 --> 00:33:16,114 that are available to you. 572 00:33:16,152 --> 00:33:17,878 You have to install them separately, 573 00:33:17,904 --> 00:33:19,262 just like I installed 574 00:33:19,286 --> 00:33:22,346 iftop, but these tools are also helpful. 575 00:33:22,478 --> 00:33:26,314 Some of them are htop and iotop. iotop is helpful because it 576 00:33:26,352 --> 00:33:29,230 shows disk I/O per process, which complements the per-device view from iostat. 577 00:33:30,090 --> 00:33:31,994 So these are some of the tools. 578 00:33:32,042 --> 00:33:33,838 I want you to go through 579 00:33:33,864 --> 00:33:37,378 them one by one and read about them. 580 00:33:37,464 --> 00:33:39,394 They will definitely help you: 581 00:33:39,492 --> 00:33:43,642 if you are pursuing a system administration job, a 582 00:33:43,656 --> 00:33:47,290 Linux-related job, or any type of computer-related job, 583 00:33:47,340 --> 00:33:52,900 you will be using these types of commands quite often. 584 00:33:52,900 --> 00:33:54,624 [No audio]
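iotop attributes disk I/O to individual processes. On Linux, per-process counters are exposed in /proc/<pid>/io when the kernel has task I/O accounting enabled. Sampling `read_bytes` and `write_bytes` twice gives a rate. This is a minimal sketch under those assumptions, with hypothetical function names; it is not necessarily how iotop itself gathers its data. Reading another user's process usually requires root.

```python
import sys
import time


def parse_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            stats[key.strip()] = int(value)
    return stats


def read_io(pid="self"):
    with open(f"/proc/{pid}/io") as f:
        return parse_io(f.read())


def disk_rates(before, after, seconds):
    """Bytes per second the process caused to be read from / written to storage.

    read_bytes/write_bytes count storage-level I/O; rchar/wchar instead count
    every byte passed through read/write calls, including cached reads.
    """
    return ((after["read_bytes"] - before["read_bytes"]) / seconds,
            (after["write_bytes"] - before["write_bytes"]) / seconds)


if __name__ == "__main__":
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    try:
        first = read_io(pid)
        time.sleep(1.0)
        second = read_io(pid)
        read_rate, write_rate = disk_rates(first, second, 1.0)
        print(f"pid {pid}: read {read_rate:.0f} B/s, write {write_rate:.0f} B/s")
    except (OSError, KeyError) as exc:  # no such pid, no permission, no accounting
        print(f"cannot sample /proc/{pid}/io: {exc}")
```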