1 00:00:06,340 --> 00:00:08,010 - We're gonna get lower level in this lesson. 2 00:00:08,010 --> 00:00:09,810 We're gonna talk about what buffers are 3 00:00:09,810 --> 00:00:11,740 and what a buffer overflow is 4 00:00:11,740 --> 00:00:14,750 and how you can use a buffer overflow to exploit 5 00:00:14,750 --> 00:00:18,163 and gain control of a process or a device. 6 00:00:19,040 --> 00:00:19,990 And first, let's talk a little bit 7 00:00:19,990 --> 00:00:21,830 about what buffers actually are. 8 00:00:21,830 --> 00:00:24,430 Buffers are containers for data. 9 00:00:24,430 --> 00:00:26,220 And really when we're talking about buffers 10 00:00:26,220 --> 00:00:28,960 we're talking about the C programming language. 11 00:00:28,960 --> 00:00:30,240 This is a programming language 12 00:00:30,240 --> 00:00:33,930 that has very manual memory management. 13 00:00:33,930 --> 00:00:36,290 As you bring in messages and you pass these messages 14 00:00:36,290 --> 00:00:39,490 you have to have these temporary storage places 15 00:00:39,490 --> 00:00:42,113 to keep information as you're passing the data. 16 00:00:42,990 --> 00:00:46,930 And so your buffers are your containers for this data. 17 00:00:46,930 --> 00:00:50,070 And in this case we're doing something called a linked list 18 00:00:50,070 --> 00:00:53,090 where we have a buffer, and then we have a pointer 19 00:00:53,090 --> 00:00:56,140 that points to the next structure 20 00:00:56,140 --> 00:00:58,820 that has the same buffer in it and so on. 21 00:00:58,820 --> 00:01:00,410 And when you do a buffer overflow 22 00:01:00,410 --> 00:01:03,590 you're stuffing too much information into that container. 23 00:01:03,590 --> 00:01:06,880 So we've only set aside 14 characters 24 00:01:06,880 --> 00:01:09,960 or 14 bytes to hold that information. 
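The linked-list node described above can be sketched in C. This is a minimal sketch, not the code from the lesson's slide: the names `message_node`, `MSG_BUF_SIZE`, `message_fits`, and `bytes_before_next` are assumptions made for illustration. It shows the 14-byte buffer sitting right in front of the `next` pointer, which is why writing past the buffer damages the link.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MSG_BUF_SIZE 14  /* the 14 bytes set aside in the lesson */

/* Hypothetical node type: a fixed-size buffer followed by the link. */
struct message_node {
    char buffer[MSG_BUF_SIZE];   /* container for the data */
    struct message_node *next;   /* pointer to the next structure */
};

/* Returns 1 if text, plus its terminating NUL, fits in the buffer. */
int message_fits(const char *text)
{
    assert(text != NULL);
    return strlen(text) < MSG_BUF_SIZE;
}

/* Byte offset of the next pointer from the start of the node.
 * Any write that runs this far past the start of buffer reaches
 * the link (possibly after some compiler-inserted padding). */
size_t bytes_before_next(void)
{
    return offsetof(struct message_node, next);
}
```

A caller would check `message_fits(text)` before copying into `node->buffer`; a 14-character string does not fit, because the terminating NUL needs a fifteenth byte.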
25 00:01:09,960 --> 00:01:12,740 And when you write past the end of this buffer, 26 00:01:12,740 --> 00:01:15,820 you are writing into the next field of that structure 27 00:01:15,820 --> 00:01:16,653 in this case. 28 00:01:16,653 --> 00:01:20,240 So you will have lost the information that you needed 29 00:01:20,240 --> 00:01:22,520 to be able to link those buffers together. 30 00:01:22,520 --> 00:01:24,250 And this is a buffer overflow. 31 00:01:24,250 --> 00:01:25,870 In this case, we've lost track 32 00:01:25,870 --> 00:01:28,210 of our linked list data structure, 33 00:01:28,210 --> 00:01:30,810 but you can do things that are much more damaging, 34 00:01:30,810 --> 00:01:32,850 which you'll soon see. 35 00:01:32,850 --> 00:01:34,910 And this is just showing how it happens: 36 00:01:34,910 --> 00:01:38,630 you're using an operation or a function in C, 37 00:01:38,630 --> 00:01:43,570 for instance, strcpy or sprintf, 38 00:01:43,570 --> 00:01:45,580 and you are writing to that buffer, 39 00:01:45,580 --> 00:01:48,940 and if you're not careful with your boundaries 40 00:01:48,940 --> 00:01:51,050 you can actually overwrite that data 41 00:01:51,050 --> 00:01:53,180 and write past the end of the buffer 42 00:01:53,180 --> 00:01:56,050 and into the thing that's next in memory. 43 00:01:56,050 --> 00:02:00,230 When we're talking about processors and processes 44 00:02:00,230 --> 00:02:04,510 you have very low level architectural things 45 00:02:04,510 --> 00:02:07,570 where you have to worry about things like registers. 46 00:02:07,570 --> 00:02:12,360 In this case, we're talking about Intel's x86 processors 47 00:02:12,360 --> 00:02:14,470 and we're running in 32-bit mode. 48 00:02:14,470 --> 00:02:17,260 Whenever you see something with a prefix of E, 49 00:02:17,260 --> 00:02:20,030 that's an extended register, like EIP, the extended instruction pointer.
50 00:02:20,030 --> 00:02:22,090 This is for compatibility reasons: 51 00:02:22,090 --> 00:02:24,730 they had an instruction pointer that was just IP 52 00:02:24,730 --> 00:02:27,630 that was only 16 bits, so when they went to 32 bits, 53 00:02:27,630 --> 00:02:29,750 they gave everything an E prefix. 54 00:02:29,750 --> 00:02:33,110 And then when you go to 64 bits, you have an R prefix. 55 00:02:33,110 --> 00:02:34,900 So we're talking about the instruction pointer, 56 00:02:34,900 --> 00:02:36,550 and the instruction pointer is a register 57 00:02:36,550 --> 00:02:39,120 that points to the address in memory 58 00:02:39,120 --> 00:02:41,190 where the next instruction is, 59 00:02:41,190 --> 00:02:44,020 the next instruction that we're going to execute. 60 00:02:44,020 --> 00:02:47,710 And you manipulate this register indirectly, using calls 61 00:02:47,710 --> 00:02:49,220 and branches. 62 00:02:49,220 --> 00:02:51,930 And when you're returning from a call 63 00:02:51,930 --> 00:02:54,300 you're putting information into the instruction pointer 64 00:02:54,300 --> 00:02:56,820 to resume where you left off. 65 00:02:56,820 --> 00:03:01,060 And it's valid only if it's pointing to an executable region 66 00:03:01,060 --> 00:03:01,980 in memory. 67 00:03:01,980 --> 00:03:06,210 If you start executing parts of memory that are maybe data, 68 00:03:06,210 --> 00:03:09,270 then the processor is actually interpreting that data 69 00:03:09,270 --> 00:03:12,550 as code and it will go off into the weeds 70 00:03:12,550 --> 00:03:15,410 and it will likely trigger an exception. 71 00:03:15,410 --> 00:03:18,600 And an exception in this case is a segmentation fault. 72 00:03:18,600 --> 00:03:21,710 There are other types of exceptions, like dividing by zero, 73 00:03:21,710 --> 00:03:23,290 for example.
74 00:03:23,290 --> 00:03:26,120 And if you execute invalid memory, 75 00:03:26,120 --> 00:03:28,840 if you execute from a location that is, say, 76 00:03:28,840 --> 00:03:31,730 the zero page, then that may not be mapped 77 00:03:31,730 --> 00:03:32,840 into a process at all. 78 00:03:32,840 --> 00:03:35,420 It may not be data, it's just not there. 79 00:03:35,420 --> 00:03:38,140 And if you start trying to run things that are not there 80 00:03:38,140 --> 00:03:39,540 then the processor gets upset 81 00:03:39,540 --> 00:03:42,260 and it'll tell the OS to stop the program. 82 00:03:42,260 --> 00:03:43,860 So talking about the registers, 83 00:03:43,860 --> 00:03:46,780 we have a couple of special purpose registers that are used 84 00:03:46,780 --> 00:03:48,607 for keeping track of the stack. 85 00:03:48,607 --> 00:03:53,220 And the stack is a temporary location to put values 86 00:03:53,220 --> 00:03:56,820 in memory, and stacks grow from higher addresses 87 00:03:56,820 --> 00:03:57,760 to lower addresses. 88 00:03:57,760 --> 00:03:59,470 So this is kind of maybe backwards 89 00:03:59,470 --> 00:04:01,440 from what you may be expecting, in that 90 00:04:01,440 --> 00:04:03,811 you start from the higher addresses, where you have saved 91 00:04:03,811 --> 00:04:06,120 values that are on the stack, 92 00:04:06,120 --> 00:04:08,579 and then you have local values that are stored 93 00:04:08,579 --> 00:04:12,250 inside of the stack frame. 94 00:04:12,250 --> 00:04:15,930 And marking the local frame's end and beginning 95 00:04:15,930 --> 00:04:18,650 are the stack pointer and the base pointer. 96 00:04:18,650 --> 00:04:21,150 And you could see that we have a saved base pointer 97 00:04:21,150 --> 00:04:23,910 that allows you to do like a linked list 98 00:04:23,910 --> 00:04:25,930 of stack frames.
99 00:04:25,930 --> 00:04:29,650 And the stack, then, holds those local variables 100 00:04:29,650 --> 00:04:31,890 and those parameters that are used by a function, 101 00:04:31,890 --> 00:04:33,840 and it's used to save registers. 102 00:04:33,840 --> 00:04:35,140 So what does that mean? 103 00:04:35,140 --> 00:04:37,290 So take some of the registers that we talked about earlier, 104 00:04:37,290 --> 00:04:40,530 like EAX. EAX is a general purpose register 105 00:04:40,530 --> 00:04:43,640 that could be used to hold the result of mathematical functions 106 00:04:43,640 --> 00:04:45,020 or mathematical operations. 107 00:04:45,020 --> 00:04:46,920 Let's say you're adding two numbers together 108 00:04:46,920 --> 00:04:49,320 and you have a result, and you would save that 109 00:04:49,320 --> 00:04:50,270 in that register. 110 00:04:50,270 --> 00:04:54,980 Well, let's say that a function called another function 111 00:04:54,980 --> 00:04:57,440 and it expects EAX to have another value; 112 00:04:57,440 --> 00:04:58,880 you have to store the old value somewhere, 113 00:04:58,880 --> 00:05:01,040 and you would store it on the stack. 114 00:05:01,040 --> 00:05:03,780 The stack forms a chain in which functions are called 115 00:05:03,780 --> 00:05:05,390 and functions return, 116 00:05:05,390 --> 00:05:07,870 and among those values that are stored on the stack 117 00:05:07,870 --> 00:05:10,360 is the saved EIP, the saved instruction pointer; 118 00:05:10,360 --> 00:05:14,290 that's so that we can resume back from where we left off 119 00:05:14,290 --> 00:05:16,860 after we return from a function. 120 00:05:16,860 --> 00:05:19,220 So when a function is called, those local values 121 00:05:19,220 --> 00:05:22,000 and function arguments are pushed onto the stack, 122 00:05:22,000 --> 00:05:24,120 and then when the function returns 123 00:05:24,120 --> 00:05:28,270 those values are popped, so to speak, off the stack. 124 00:05:28,270 --> 00:05:29,780 So let me show you an example of this.
125 00:05:29,780 --> 00:05:31,850 So we're calling a function 126 00:05:31,850 --> 00:05:33,680 and we're decrementing the stack pointer, 127 00:05:33,680 --> 00:05:35,070 and that's our local frame. 128 00:05:35,070 --> 00:05:38,645 So we have saved the base pointer that we had from before 129 00:05:38,645 --> 00:05:41,060 and the instruction pointer, so that we can resume 130 00:05:41,060 --> 00:05:43,300 from where we were when we left off, 131 00:05:43,300 --> 00:05:45,740 and then that base pointer is then incremented back up 132 00:05:45,740 --> 00:05:48,510 and that establishes our new stack frame 133 00:05:48,510 --> 00:05:50,163 and it keeps going and so on. 134 00:05:51,670 --> 00:05:54,200 And so I think you could see where I'm going with this: 135 00:05:54,200 --> 00:05:57,430 if you overflow a buffer that is on the stack, 136 00:05:57,430 --> 00:05:59,540 you can actually overwrite these values 137 00:05:59,540 --> 00:06:02,010 that are saved registers. 138 00:06:02,010 --> 00:06:05,860 And one of the ones that's important is the saved EIP. 139 00:06:05,860 --> 00:06:07,970 So let me give you an example of this. 140 00:06:07,970 --> 00:06:09,900 And this is written again in C 141 00:06:09,900 --> 00:06:12,220 and it's an intentionally vulnerable example. 142 00:06:12,220 --> 00:06:15,670 You probably will not see code that looks like this 143 00:06:15,670 --> 00:06:17,110 in the real world, 144 00:06:17,110 --> 00:06:19,100 but this is just for demonstration purposes 145 00:06:19,100 --> 00:06:21,550 that'll give you an idea about what's going on. 146 00:06:21,550 --> 00:06:25,940 And so what we have here is a main function, 147 00:06:26,913 --> 00:06:30,940 and this main function allocates a buffer called mybuffer.
148 00:06:30,940 --> 00:06:34,590 And you can see where it's declared as char mybuffer[128]. 149 00:06:34,590 --> 00:06:37,280 So that means that I am allocating 128 bytes 150 00:06:37,280 --> 00:06:41,310 for that buffer and I'm using this function later on 151 00:06:41,310 --> 00:06:42,700 called strcpy. 152 00:06:42,700 --> 00:06:45,340 And what strcpy does is it takes two buffers 153 00:06:45,340 --> 00:06:46,640 and it copies a string from one to the other with no length check. 154 00:06:46,640 --> 00:06:49,850 So in this case, we have an array that is called argv 155 00:06:49,850 --> 00:06:51,280 and we're indexing it by one. 156 00:06:51,280 --> 00:06:55,890 That is the first command line argument, which we're copying 157 00:06:55,890 --> 00:06:57,050 to the buffer. 158 00:06:57,050 --> 00:06:59,820 And we've only allocated 128 bytes for this 159 00:06:59,820 --> 00:07:02,330 and the arguments that I could pass to the program 160 00:07:02,330 --> 00:07:03,960 can be much longer. 161 00:07:03,960 --> 00:07:08,600 In my case, on my system, two megabytes was the limit. 162 00:07:08,600 --> 00:07:10,680 And so you could see that I could actually overflow 163 00:07:10,680 --> 00:07:12,820 that buffer far, you know, 164 00:07:12,820 --> 00:07:16,800 far beyond what is actually supposed to fit in there. 165 00:07:16,800 --> 00:07:19,480 And so what happens when we put more than 128 bytes 166 00:07:19,480 --> 00:07:22,717 in this buffer is that we overwrite not only mybuffer, 167 00:07:22,717 --> 00:07:25,320 but we start writing into the saved base pointer 168 00:07:25,320 --> 00:07:26,950 and the saved instruction pointer. 169 00:07:26,950 --> 00:07:29,570 So let's say that I'm putting in just all As 170 00:07:29,570 --> 00:07:31,430 and I'm passing those into the command.
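For contrast with the vulnerable pattern just described (a 128-byte `mybuffer` filled by `strcpy` from `argv[1]`), here is a minimal sketch of a length-checked copy. It is not the code from the lesson's slide: the helper name `copy_arg_checked` and the macro `MYBUFFER_SIZE` are assumptions for illustration, and rejecting an oversized argument is only one of several reasonable policies.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MYBUFFER_SIZE 128  /* same size as mybuffer in the lesson */

/* The vulnerable pattern described in the lesson is roughly:
 *     char mybuffer[128];
 *     strcpy(mybuffer, argv[1]);    // no length check
 *
 * Hypothetical checked variant: copies arg into dst only if arg and
 * its terminating NUL fit in MYBUFFER_SIZE bytes.
 * Returns 0 on success, -1 if arg is too long (dst is left untouched). */
int copy_arg_checked(char dst[MYBUFFER_SIZE], const char *arg)
{
    size_t len;

    assert(dst != NULL && arg != NULL);
    len = strlen(arg);
    if (len >= MYBUFFER_SIZE) {
        return -1;               /* would overflow: refuse the copy */
    }
    memcpy(dst, arg, len + 1);   /* len + 1 includes the NUL */
    return 0;
}
```

A `main` function would call `copy_arg_checked(mybuffer, argv[1])` after confirming `argc > 1`, and exit with an error when it returns -1.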
171 00:07:31,430 --> 00:07:35,720 You could see that I'm writing more than 128 bytes of As 172 00:07:35,720 --> 00:07:38,260 into the process, and then it goes on 173 00:07:38,260 --> 00:07:40,350 and eventually it overwrites the saved base pointer 174 00:07:40,350 --> 00:07:41,350 and the saved instruction pointer 175 00:07:41,350 --> 00:07:44,240 and possibly even data that's after that. 176 00:07:44,240 --> 00:07:46,660 And we're taking control. 177 00:07:46,660 --> 00:07:49,130 So what happens is when the function returns, 178 00:07:49,130 --> 00:07:52,280 it's going to load the value that is the saved EIP, 179 00:07:52,280 --> 00:07:53,790 the saved instruction pointer, 180 00:07:53,790 --> 00:07:56,500 and then it is gonna put that into the instruction pointer 181 00:07:56,500 --> 00:07:58,670 and it's gonna start executing. 182 00:07:58,670 --> 00:08:01,240 So that's really basically the gist 183 00:08:01,240 --> 00:08:04,556 of how a buffer overflow works: you're taking control 184 00:08:04,556 --> 00:08:07,980 of the instruction pointer, and you're running whatever 185 00:08:07,980 --> 00:08:10,500 code you can point it at. 186 00:08:10,500 --> 00:08:13,390 And what you could do, if you know where that buffer is, 187 00:08:13,390 --> 00:08:16,130 you could point the instruction pointer to that buffer 188 00:08:16,130 --> 00:08:17,930 and actually execute it directly, 189 00:08:17,930 --> 00:08:20,163 so what was data becomes code.
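The overwrite of the saved base pointer and saved instruction pointer can be modelled safely, without touching a real stack. The sketch below is a simulation under assumptions, not the lesson's program: it pretends a 32-bit frame is 128 bytes of buffer followed directly by a 4-byte saved EBP and a 4-byte saved EIP, with no padding or stack protection, and every name in it (`fake_frame`, `frame_init`, `simulate_unchecked_copy`, `saved_eip`) is hypothetical. The copy is clamped to the size of the model so the demonstration itself never writes out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of the simulated frame, lowest address first. */
enum { BUF_OFF = 0, EBP_OFF = 128, EIP_OFF = 132, FRAME_SIZE = 136 };

struct fake_frame {
    unsigned char bytes[FRAME_SIZE];  /* buffer | saved EBP | saved EIP */
};

/* Zero the frame and store a return address in the saved EIP slot,
 * little-endian, as 32-bit x86 would. */
void frame_init(struct fake_frame *f, uint32_t return_address)
{
    assert(f != NULL);
    memset(f->bytes, 0, FRAME_SIZE);
    f->bytes[EIP_OFF + 0] = (unsigned char)(return_address & 0xff);
    f->bytes[EIP_OFF + 1] = (unsigned char)((return_address >> 8) & 0xff);
    f->bytes[EIP_OFF + 2] = (unsigned char)((return_address >> 16) & 0xff);
    f->bytes[EIP_OFF + 3] = (unsigned char)((return_address >> 24) & 0xff);
}

/* Mimics an unchecked string copy into the buffer at the start of the
 * frame, but clamped to FRAME_SIZE so the model stays in bounds.
 * Returns the number of bytes written. */
size_t simulate_unchecked_copy(struct fake_frame *f, const char *input)
{
    size_t n;

    assert(f != NULL && input != NULL);
    n = strlen(input) + 1;            /* strcpy also copies the NUL */
    if (n > FRAME_SIZE) {
        n = FRAME_SIZE;
    }
    memcpy(f->bytes + BUF_OFF, input, n);
    return n;
}

/* Read the saved EIP slot back as a little-endian 32-bit value. */
uint32_t saved_eip(const struct fake_frame *f)
{
    const unsigned char *p;

    assert(f != NULL);
    p = f->bytes + EIP_OFF;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

In this model a short input leaves the saved EIP slot alone, while an input of 136 As fills it with the byte 0x41 four times, so `saved_eip` reads back 0x41414141. That is the simulated equivalent of the function returning to an address chosen by whoever supplied the input.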