1 00:00:06,407 --> 00:00:08,072 - We're gonna get lower level in this lesson. 2 00:00:08,072 --> 00:00:09,907 We're gonna talk about what buffers are 3 00:00:09,907 --> 00:00:11,941 and what a buffer overflow is, 4 00:00:11,941 --> 00:00:14,986 and how you can use a buffer overflow to exploit 5 00:00:14,986 --> 00:00:18,886 and gain control of a process or a device. 6 00:00:18,886 --> 00:00:19,934 And first, let's talk a little bit 7 00:00:19,934 --> 00:00:22,131 about what buffers actually are. 8 00:00:22,131 --> 00:00:24,526 Buffers are containers for data, 9 00:00:24,526 --> 00:00:26,372 and really, when we're talking about buffers, 10 00:00:26,372 --> 00:00:29,123 we're talking about the C programming language. 11 00:00:29,123 --> 00:00:30,357 This is a programming language 12 00:00:30,357 --> 00:00:34,096 that has very manual memory management. 13 00:00:34,096 --> 00:00:36,481 As you bring in messages, and you parse these messages, 14 00:00:36,481 --> 00:00:39,618 you have to have these temporary storage places 15 00:00:39,618 --> 00:00:43,044 to keep information as you're parsing the data, 16 00:00:43,044 --> 00:00:46,990 and so your buffers are your containers for this data, 17 00:00:46,990 --> 00:00:48,123 and, in this case, 18 00:00:48,123 --> 00:00:50,356 we're doing something called a linked list 19 00:00:50,356 --> 00:00:53,326 where we have a buffer, and then we have a pointer 20 00:00:53,326 --> 00:00:56,189 that points to the next structure 21 00:00:56,189 --> 00:00:59,020 that has the same buffer in it, and so on, 22 00:00:59,020 --> 00:01:00,561 and when you do a buffer overflow, 23 00:01:00,561 --> 00:01:03,792 you're stuffing too much information into that container, 24 00:01:03,792 --> 00:01:07,061 so we've only set aside 14 characters, 25 00:01:07,061 --> 00:01:10,006 or 14 bytes, to hold that information, 26 00:01:10,006 --> 00:01:12,992 and when you write over this buffer, 27 00:01:12,992 --> 00:01:16,116 you are writing into the next field 
of that structure, 28 00:01:16,116 --> 00:01:19,724 in this case, so you will have lost that information 29 00:01:19,724 --> 00:01:22,825 that you had to be able to link those buffers together, 30 00:01:22,825 --> 00:01:24,426 and this is a buffer overflow. 31 00:01:24,426 --> 00:01:25,916 In this case, we've lost track 32 00:01:25,916 --> 00:01:28,373 of our linked list data structure, 33 00:01:28,373 --> 00:01:30,951 but you can do things that are much more damaging, 34 00:01:30,951 --> 00:01:32,940 which you'll soon see, 35 00:01:32,940 --> 00:01:35,141 and this is just showing how it happens: 36 00:01:35,141 --> 00:01:37,833 you're using an operation or a function 37 00:01:37,833 --> 00:01:41,583 in C, for instance, like a strcpy or sprintf, 38 00:01:43,785 --> 00:01:45,834 and you're writing to that buffer, 39 00:01:45,834 --> 00:01:49,245 and if you're not careful with your boundaries, 40 00:01:49,245 --> 00:01:51,189 you can actually overwrite that data 41 00:01:51,189 --> 00:01:53,350 and write past the end of the buffer 42 00:01:53,350 --> 00:01:55,959 and into whatever is next to it in memory. 43 00:01:55,959 --> 00:02:00,446 When we're talking about processors and processes, 44 00:02:00,446 --> 00:02:04,775 you have very low level architectural things 45 00:02:04,775 --> 00:02:07,845 where you have to worry about, like, registers. 46 00:02:07,845 --> 00:02:12,738 In this case, we're talking about Intel's x86 processors, 47 00:02:12,738 --> 00:02:14,662 and we're running in 32 bit mode. 48 00:02:14,662 --> 00:02:17,472 Whenever you see something with a prefix of E, 49 00:02:17,472 --> 00:02:20,190 that stands for extended, as in the extended instruction pointer. 50 00:02:20,190 --> 00:02:22,231 This is for compatibility reasons.
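To make the linked-list overflow from a moment ago concrete, here is a minimal C sketch. Only the 14-byte buffer followed by a pointer to the next structure comes from the lesson; the names `struct node`, `fill_node`, and `next_is_intact` are made up for illustration. A real overflow through strcpy is undefined behavior, so this sketch only imitates one: it copies bytes starting at the buffer and caps the copy at the size of the whole structure.

```c
#include <stddef.h>
#include <string.h>

/* One list element: a 14-byte buffer followed by the link to the next one.
 * (Hypothetical layout, modeled on the description in the lesson.) */
struct node {
    char buf[14];
    struct node *next;
};

/* Imitates an unchecked write that starts at buf and keeps going.
 * The copy is capped at sizeof(struct node) so the demo itself never
 * writes outside the object it was given. */
void fill_node(struct node *n, const void *data, size_t len) {
    if (len > sizeof *n) {
        len = sizeof *n;
    }
    memcpy(n, data, len); /* buf is the first member, so this starts at buf */
}

/* Returns 1 if n->next still holds the bytes of `expected`, 0 if they
 * were overwritten. Compares raw bytes instead of reading the pointer. */
int next_is_intact(const struct node *n, const struct node *expected) {
    return memcmp(&n->next, &expected, sizeof expected) == 0;
}
```

With a short input the link survives. Once the input is longer than the 14 bytes set aside, the extra bytes land on the next field and the list can no longer be followed.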
51 00:02:22,231 --> 00:02:24,925 They had an instruction pointer that was just IP 52 00:02:24,925 --> 00:02:27,712 that was only 16 bits, so when they went to 32 bits, 53 00:02:27,712 --> 00:02:30,064 they gave everything an E prefix, 54 00:02:30,064 --> 00:02:33,224 and then when you go into 64 bits, you have an R prefix, 55 00:02:33,224 --> 00:02:35,114 so we're talking about the instruction pointer, 56 00:02:35,114 --> 00:02:36,738 and the instruction pointer is a register 57 00:02:36,738 --> 00:02:39,260 that points to the address in memory 58 00:02:39,260 --> 00:02:41,492 where the next instruction is, 59 00:02:41,492 --> 00:02:44,225 the next instruction that we're going to execute, 60 00:02:44,225 --> 00:02:46,706 and you manipulate this register indirectly 61 00:02:46,706 --> 00:02:49,368 using calls and branches, 62 00:02:49,368 --> 00:02:52,090 and when you're returning from a call, 63 00:02:52,090 --> 00:02:54,542 you're putting information into the instruction pointer 64 00:02:54,542 --> 00:02:56,918 to resume where you left off, 65 00:02:56,918 --> 00:02:59,760 and it's valid only if you're pointing 66 00:02:59,760 --> 00:03:02,211 to an executable region in memory. 67 00:03:02,211 --> 00:03:06,290 If you start executing parts of memory that are maybe data, 68 00:03:06,290 --> 00:03:09,456 then the processor is actually interpreting that data 69 00:03:09,456 --> 00:03:12,632 as code, and it will go off into the weeds, 70 00:03:12,632 --> 00:03:15,492 and it will likely trigger an exception, 71 00:03:15,492 --> 00:03:18,875 and an exception, in this case, is a segmentation fault.
72 00:03:18,875 --> 00:03:20,518 There are other types of exceptions, 73 00:03:20,518 --> 00:03:23,418 like dividing by zero, for example, 74 00:03:23,418 --> 00:03:26,329 and if you execute invalid memory, 75 00:03:26,329 --> 00:03:28,738 if you execute from a location that is maybe, 76 00:03:28,738 --> 00:03:31,782 say, the zero page, then that may not be mapped 77 00:03:31,782 --> 00:03:33,121 into a process at all. 78 00:03:33,121 --> 00:03:35,588 It may not be data, it's just not there, 79 00:03:35,588 --> 00:03:37,590 and if you start trying to run things 80 00:03:37,590 --> 00:03:39,782 that are not there, then the processor gets upset 81 00:03:39,782 --> 00:03:42,398 and it'll tell the OS to stop the program. 82 00:03:42,398 --> 00:03:44,012 So, talking about the registers, 83 00:03:44,012 --> 00:03:46,346 we have a couple of special purpose registers 84 00:03:46,346 --> 00:03:49,021 that are used for keeping track of the stack, 85 00:03:49,021 --> 00:03:52,482 and the stack is, it's a temporary location 86 00:03:52,482 --> 00:03:55,831 to put values in memory, and stacks grow 87 00:03:55,831 --> 00:03:58,023 from higher addresses to lower addresses, 88 00:03:58,023 --> 00:03:59,739 so this is kind of maybe backwards. 
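The claim that the stack grows toward lower addresses can be checked with a small experiment. The following is a rough sketch, not something shown in the lesson: the function names are invented, and the result depends on the platform and compiler settings (an optimizer may inline the helper and place both variables in one frame, so it is best built without optimization).

```c
#include <stdint.h>

/* Compares the address of a local in this (callee) frame with the address
 * of a local in the caller's frame.
 * Returns -1 if the callee's local is at the lower address (stack grows
 * down), +1 otherwise. */
static int compare_with_caller(uintptr_t caller_local_addr) {
    char callee_local = 0;
    uintptr_t callee_local_addr = (uintptr_t)(void *)&callee_local;
    return callee_local_addr < caller_local_addr ? -1 : 1;
}

/* Hypothetical helper: reports the observed growth direction of the stack. */
int stack_direction(void) {
    char caller_local = 0;
    return compare_with_caller((uintptr_t)(void *)&caller_local);
}
```

On a typical x86 build without optimization, `stack_direction()` would be expected to report -1, meaning the frame of the called function sits below the frame of its caller.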
89 00:03:59,739 --> 00:04:02,107 Contrary to what you may be expecting, you start 90 00:04:02,107 --> 00:04:04,578 from the higher addresses, where you have the saved 91 00:04:04,578 --> 00:04:06,353 values that are on the stack, 92 00:04:06,353 --> 00:04:08,037 and then you have local values 93 00:04:08,037 --> 00:04:11,537 that are stored inside of the stack frame, 94 00:04:12,371 --> 00:04:16,153 and the local frame's end and beginning 95 00:04:16,153 --> 00:04:18,892 are marked by the stack pointer and the base pointer, 96 00:04:18,892 --> 00:04:21,311 and you can see that we have a saved base pointer 97 00:04:21,311 --> 00:04:26,148 that allows you to do, like, a linked list of stack frames, 98 00:04:26,148 --> 00:04:29,808 and the stack then holds those local variables 99 00:04:29,808 --> 00:04:32,174 and those parameters that are used by a function, 100 00:04:32,174 --> 00:04:35,280 and it's used to save registers, so what does that mean? 101 00:04:35,280 --> 00:04:36,247 So, some of the registers 102 00:04:36,247 --> 00:04:38,641 that we talked about earlier, like EAX, 103 00:04:38,641 --> 00:04:40,632 EAX is a general purpose register 104 00:04:40,632 --> 00:04:43,883 that could be used to hold the result of mathematical functions 105 00:04:43,883 --> 00:04:45,242 or mathematical operations. 106 00:04:45,242 --> 00:04:47,130 Let's say you're adding two numbers together, 107 00:04:47,130 --> 00:04:48,690 and you have a result, 108 00:04:48,690 --> 00:04:50,540 and you would save that in that register. 109 00:04:50,540 --> 00:04:55,145 Well, let's say that a function calls another function, 110 00:04:55,145 --> 00:04:57,666 and that one needs EAX for another value. 111 00:04:57,666 --> 00:04:59,070 You have to store the old value somewhere, 112 00:04:59,070 --> 00:05:01,247 and you would store it on the stack.
113 00:05:01,247 --> 00:05:03,944 The stack forms a chain in which functions are called 114 00:05:03,944 --> 00:05:06,871 and functions return, and those values 115 00:05:06,871 --> 00:05:09,224 that are stored on the stack include the EIP, 116 00:05:09,224 --> 00:05:10,538 which is the saved instruction pointer, 117 00:05:10,538 --> 00:05:14,334 so that we can resume back from where we left off 118 00:05:14,334 --> 00:05:16,898 after we return from a function. 119 00:05:16,898 --> 00:05:18,257 So, when a function is called, 120 00:05:18,257 --> 00:05:20,525 those local values and function arguments 121 00:05:20,525 --> 00:05:22,378 are pushed onto the stack, 122 00:05:22,378 --> 00:05:24,404 and then, when the function returns, 123 00:05:24,404 --> 00:05:28,415 those parameters are popped, so to speak, off the stack. 124 00:05:28,415 --> 00:05:29,939 So let me show you an example of this. 125 00:05:29,939 --> 00:05:31,904 So we're calling a function, 126 00:05:31,904 --> 00:05:33,894 and we're decrementing the stack pointer, 127 00:05:33,894 --> 00:05:35,230 and that's our local frame, 128 00:05:35,230 --> 00:05:39,165 so we have saved the base pointer that we had from before 129 00:05:39,165 --> 00:05:41,119 and the instruction pointer so that we can resume 130 00:05:41,119 --> 00:05:43,469 from where we were when we left off, 131 00:05:43,469 --> 00:05:45,888 and then that base pointer is moved to the new frame, 132 00:05:45,888 --> 00:05:48,391 and that establishes our new stack frame, 133 00:05:48,391 --> 00:05:51,516 and it keeps going and so on. 134 00:05:51,516 --> 00:05:54,424 And so, I think you could see where I'm going with this: 135 00:05:54,424 --> 00:05:57,545 if you overflow a buffer that is on the stack, 136 00:05:57,545 --> 00:05:59,947 you can actually overwrite these values 137 00:05:59,947 --> 00:06:02,177 that are saved registers, 138 00:06:02,177 --> 00:06:06,049 and one of the ones that's important is the EIP.
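The chain of saved base pointers described above behaves like a linked list, and a small C model can make that visible. This is only a sketch under simplifying assumptions: `struct frame_record`, its fields, and `frame_depth` are hypothetical names, and a string label stands in for the saved EIP instead of a real return address.

```c
#include <stddef.h>

/* Hypothetical model of the two values saved at the top of each frame. */
struct frame_record {
    const struct frame_record *saved_bp; /* caller's frame, NULL at the outermost one */
    const char *return_to;               /* stands in for the saved EIP */
};

/* Walks the chain from the current frame outward, the way a debugger
 * follows saved base pointers to build a backtrace.
 * Returns the number of frames visited. */
size_t frame_depth(const struct frame_record *bp) {
    size_t depth = 0;
    while (bp != NULL) {
        depth++;
        bp = bp->saved_bp;
    }
    return depth;
}
```

For three nested calls, each record points at the record of its caller, so walking from the innermost frame visits three frames. Overwriting one `saved_bp` breaks the walk, and overwriting the value that `return_to` stands for changes where execution resumes.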
139 00:06:06,049 --> 00:06:08,156 So let me give you an example of this, 140 00:06:08,156 --> 00:06:10,161 and this is written, again, in C, 141 00:06:10,161 --> 00:06:12,331 and it's an intentionally vulnerable example. 142 00:06:12,331 --> 00:06:14,757 You probably will not see code 143 00:06:14,757 --> 00:06:17,073 that looks like this in the real world, 144 00:06:17,073 --> 00:06:19,224 but this is just for demonstration purposes 145 00:06:19,224 --> 00:06:21,761 that'll give you an idea about what's going on, 146 00:06:21,761 --> 00:06:25,928 and so what we have here is we have a main function, 147 00:06:26,904 --> 00:06:31,134 and this main function allocates a buffer called mybuffer, 148 00:06:31,134 --> 00:06:34,824 and you can see where it's declared as char mybuffer[128], 149 00:06:34,824 --> 00:06:38,661 so that means that I'm allocating 128 bytes for that buffer, 150 00:06:38,661 --> 00:06:42,947 and I'm using this function later on called strcpy, 151 00:06:42,947 --> 00:06:45,396 and what strcpy does is it takes two buffers 152 00:06:45,396 --> 00:06:46,844 and it copies from one to the other without checking the size, 153 00:06:46,844 --> 00:06:48,870 so, in this case, we have a buffer 154 00:06:48,870 --> 00:06:51,464 that is called argv, and we're indexing it by one. 155 00:06:51,464 --> 00:06:55,223 That is the first command-line argument, 156 00:06:55,223 --> 00:06:57,267 the value that we're copying to the buffer, 157 00:06:57,267 --> 00:07:00,068 and we've only allocated 128 bytes for this, 158 00:07:00,068 --> 00:07:01,765 and the arguments that I could pass 159 00:07:01,765 --> 00:07:04,095 to the program can be much longer. 160 00:07:04,095 --> 00:07:08,466 On my system, two megabytes was the limit, 161 00:07:08,466 --> 00:07:10,769 and so you could see that I could actually overflow 162 00:07:10,769 --> 00:07:15,449 that buffer far beyond what is actually supposed 163 00:07:15,449 --> 00:07:16,968 to fit in there.
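The slide itself is not part of this transcript, so the following is a reconstruction of the pattern as described, not the presenter's exact code. `copy_unchecked` mirrors the described body of main (a 128-byte `mybuffer` filled by strcpy from the first argument). `copy_checked` is an added, hypothetical contrast showing the boundary check that the vulnerable version leaves out.

```c
#include <stdio.h>
#include <string.h>

#define MYBUFFER_SIZE 128

/* The pattern described in the lesson: no length check before strcpy.
 * Any arg of 128 characters or more writes past the end of mybuffer.
 * In the lesson this body sits in main and arg is argv[1]. */
void copy_unchecked(const char *arg) {
    char mybuffer[MYBUFFER_SIZE];
    strcpy(mybuffer, arg); /* copies until the terminating '\0', however far away */
    printf("%s\n", mybuffer);
}

/* Bounded variant (an illustration, not from the lesson): refuses input
 * that does not fit in dst, terminator included.
 * Returns 0 on success, -1 if arg is too long. */
int copy_checked(char *dst, size_t dst_size, const char *arg) {
    size_t needed = strlen(arg) + 1; /* +1 for the terminating '\0' */
    if (needed > dst_size) {
        return -1;
    }
    memcpy(dst, arg, needed);
    return 0;
}
```

The only difference between the two is the comparison against the buffer size. That one missing check is what lets a long command-line argument run past the 128 bytes that were set aside.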
164 00:07:16,968 --> 00:07:19,521 And so, what happens when we put more than 128 bytes 165 00:07:19,521 --> 00:07:23,039 in this buffer is that we overwrite not only mybuffer, 166 00:07:23,039 --> 00:07:25,429 but we start writing into the saved base pointer 167 00:07:25,429 --> 00:07:27,170 and the saved instruction pointer, 168 00:07:27,170 --> 00:07:29,769 so let's say that I'm putting just all As, 169 00:07:29,769 --> 00:07:31,644 and I'm passing those into the command. 170 00:07:31,644 --> 00:07:35,899 You could see that I'm writing more than 128 bytes of As 171 00:07:35,899 --> 00:07:38,559 into the process, and then it goes in, 172 00:07:38,559 --> 00:07:40,470 and eventually it overwrites the base pointer 173 00:07:40,470 --> 00:07:41,527 and the instruction pointer, 174 00:07:41,527 --> 00:07:44,227 and possibly even data that's after that, 175 00:07:44,227 --> 00:07:46,826 and we're taking control, 176 00:07:46,826 --> 00:07:49,247 so what happens is, when the function returns, 177 00:07:49,247 --> 00:07:52,478 it's going to load the value that is the saved EIP, 178 00:07:52,478 --> 00:07:54,144 the saved instruction pointer, 179 00:07:54,144 --> 00:07:56,675 and then it is gonna put that into the instruction pointer, 180 00:07:56,675 --> 00:07:58,772 and it's gonna start executing. 181 00:07:58,772 --> 00:08:01,330 So, that's really basically the gist 182 00:08:01,330 --> 00:08:03,041 of how a buffer overflow works 183 00:08:03,041 --> 00:08:06,818 is you're taking control of the instruction pointer, 184 00:08:06,818 --> 00:08:08,484 and you're running whatever, you know, 185 00:08:08,484 --> 00:08:10,769 code you can actually point it at, 186 00:08:10,769 --> 00:08:13,625 and what you could do, if you know where that buffer is, 187 00:08:13,625 --> 00:08:16,311 you could point the instruction pointer to that buffer 188 00:08:16,311 --> 00:08:18,193 and actually execute it directly. 189 00:08:18,193 --> 00:08:20,776 So, what was data becomes code.
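The effect of the run of As can be imitated safely in C without corrupting a real stack. This sketch assumes a simplified 32-bit frame where the 128-byte buffer is followed directly by the saved base pointer and the saved instruction pointer. `struct fake_frame` and `simulate_strcpy` are invented names, and real compilers may add padding or protections that change the layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical picture of the frame, lowest address first. */
struct fake_frame {
    char     mybuffer[128];
    uint32_t saved_ebp;
    uint32_t saved_eip;
};

/* Imitates strcpy(mybuffer, input) running past the end of mybuffer.
 * The copy is capped at the size of the model so the simulation itself
 * never writes outside the struct. */
void simulate_strcpy(struct fake_frame *frame, const char *input) {
    size_t n = strlen(input) + 1; /* strcpy also copies the '\0' */
    if (n > sizeof *frame) {
        n = sizeof *frame;
    }
    memcpy(frame, input, n); /* mybuffer is the first member */
}
```

Feeding this model 140 As leaves both saved values equal to 0x41414141, because 0x41 is the ASCII code for 'A'. On a real 32-bit process, that is the address the processor would try to execute from when the function returns, and since nothing valid is likely to be mapped there, the result is the segmentation fault mentioned earlier.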