1 00:00:06,506 --> 00:00:08,216 - [Instructor] So what a mutation-based fuzzer, 2 00:00:08,216 --> 00:00:09,399 I think it's fairly obvious, 3 00:00:09,399 --> 00:00:11,145 is it takes well-formatted input, 4 00:00:11,145 --> 00:00:13,079 like a message, a string, maybe a file, 5 00:00:13,079 --> 00:00:14,936 and it mutates the data. 6 00:00:14,936 --> 00:00:17,617 And you don't need to have any structure of the input data 7 00:00:17,617 --> 00:00:20,376 for a mutation-based fuzzer, at least for very simple ones. 8 00:00:20,376 --> 00:00:22,427 But for good code coverage, you need that 9 00:00:22,427 --> 00:00:24,577 large collection of well-formed inputs. 10 00:00:24,577 --> 00:00:26,068 This is called a Corpus. 11 00:00:26,068 --> 00:00:28,977 And you can get this information from the internet 12 00:00:28,977 --> 00:00:30,729 when you're fuzzing against a file, 13 00:00:30,729 --> 00:00:31,897 or maybe you're doing fuzzing 14 00:00:31,897 --> 00:00:33,737 against a certain packet format. 15 00:00:33,737 --> 00:00:36,706 You can get just p caps that are from the internet 16 00:00:36,706 --> 00:00:39,525 that are basically representations of the data, 17 00:00:39,525 --> 00:00:41,253 that the process would use, 18 00:00:41,253 --> 00:00:43,476 and then you could mutate that data 19 00:00:43,476 --> 00:00:45,745 and use it as fuzzing test cases. 20 00:00:45,745 --> 00:00:48,255 And those mutations can include flipping bits, 21 00:00:48,255 --> 00:00:50,887 you can duplicate certain fields, 22 00:00:50,887 --> 00:00:53,253 you can randomize data that's in the message, 23 00:00:53,253 --> 00:00:54,646 and then you could use your imagination. 24 00:00:54,646 --> 00:00:57,636 There's many different ways you can mutate the data. 25 00:00:57,636 --> 00:00:59,166 And then the result of the mutation 26 00:00:59,166 --> 00:01:00,645 is to be fed into the target, 27 00:01:00,645 --> 00:01:02,836 and that is your anomalous message. 28 00:01:02,836 --> 00:01:04,465 And then when you have a failure, 29 00:01:04,465 --> 00:01:07,234 you have to have some ways of detecting that failure, 30 00:01:07,234 --> 00:01:09,775 because sometimes, you won't have a hard crash. 31 00:01:09,775 --> 00:01:12,563 Oh maybe, the process is alive, maybe it's stuck, 32 00:01:12,563 --> 00:01:15,202 maybe it is stuck in an infinite loop, 33 00:01:15,202 --> 00:01:17,425 maybe it no longer processes messages, 34 00:01:17,425 --> 00:01:20,495 but the process is waiting for something, basically, 35 00:01:20,495 --> 00:01:22,916 waiting for resource to be freed up. 36 00:01:22,916 --> 00:01:25,632 So, you have to have some way of monitoring this. 37 00:01:25,632 --> 00:01:28,015 And typically, you would monitor the process directly, 38 00:01:28,015 --> 00:01:30,196 and you'd just see whether or not the process is running. 39 00:01:30,196 --> 00:01:32,919 Or you can see whether or not it's consuming CPU. 40 00:01:32,919 --> 00:01:36,484 If it consumes over 90% or 95% CPU, that would be indicative 41 00:01:36,484 --> 00:01:39,575 of an infinite loop scenario. 42 00:01:39,575 --> 00:01:41,636 And then another way to detect failures, 43 00:01:41,636 --> 00:01:43,185 which I would recommend, 44 00:01:43,185 --> 00:01:45,164 is just to send a good message, 45 00:01:45,164 --> 00:01:47,937 rather than sending an anomalous message with modifications 46 00:01:47,937 --> 00:01:50,106 to it, you could just send one that's good. 47 00:01:50,106 --> 00:01:52,512 And if the process acts the way it should, 48 00:01:52,512 --> 00:01:54,045 and responds maybe the way it should, 49 00:01:54,045 --> 00:01:57,005 then you that it's fine, you can move on. 50 00:01:57,005 --> 00:01:58,365 And then if you don't have a response, 51 00:01:58,365 --> 00:02:00,545 then you might have a denial of service condition. 52 00:02:00,545 --> 00:02:01,913 Maybe it doesn't crash anything, 53 00:02:01,913 --> 00:02:04,894 maybe it just went off to lunch. 54 00:02:04,894 --> 00:02:07,804 So in the case of a crash, you have to restart the process, 55 00:02:07,804 --> 00:02:09,585 and then note that information. 56 00:02:09,585 --> 00:02:12,105 And then, the anomalous message and program crash 57 00:02:12,105 --> 00:02:13,294 information you saved, 58 00:02:13,294 --> 00:02:14,225 like I said before, 59 00:02:14,225 --> 00:02:17,025 and then you would use that for further analysis. 60 00:02:17,025 --> 00:02:18,722 Generation-based Fuzzers. 61 00:02:18,722 --> 00:02:20,934 These are your Smart Fuzzers. 62 00:02:20,934 --> 00:02:23,294 And those work with a data model of the protocol. 63 00:02:23,294 --> 00:02:26,014 And that data model has to be created in advance. 64 00:02:26,014 --> 00:02:28,561 So, it requires a lot of hard work, 65 00:02:28,561 --> 00:02:31,720 first, before you can ever do any kind of fuzzing. 66 00:02:31,720 --> 00:02:33,601 Generation-based Fuzzers are 67 00:02:33,601 --> 00:02:36,321 typically something you would do much later on, 68 00:02:36,321 --> 00:02:38,313 after you set up mutation for a little while, 69 00:02:38,313 --> 00:02:39,585 you can do some generation fuzzing, 70 00:02:39,585 --> 00:02:41,244 after you have a model. 71 00:02:41,244 --> 00:02:43,825 Can actually create a message from scratch. 72 00:02:43,825 --> 00:02:45,872 In other words, you're actually creating from the ground up, 73 00:02:45,872 --> 00:02:47,694 a message based on the model. 74 00:02:47,694 --> 00:02:50,933 And then, other times, you can just start from the corpus, 75 00:02:50,933 --> 00:02:53,766 and use the model to modify the corpus, 76 00:02:53,766 --> 00:02:55,534 but also do those fix-ups that you need, 77 00:02:55,534 --> 00:02:57,644 to make the protocol look the way it should, 78 00:02:57,644 --> 00:02:59,883 so the parts surpasses the message. 79 00:02:59,883 --> 00:03:02,203 And then, modifications you add to the data model, 80 00:03:02,203 --> 00:03:04,033 for each anomalous test case. 81 00:03:04,033 --> 00:03:05,531 So you can start from one field, 82 00:03:05,531 --> 00:03:07,161 and you can go to another field, 83 00:03:07,161 --> 00:03:09,852 and then, the data model may include fix-ups, 84 00:03:09,852 --> 00:03:10,940 that you have to do, 85 00:03:10,940 --> 00:03:14,310 and those could be intelligently made by the fuzzer, 86 00:03:14,310 --> 00:03:16,340 based on the data model itself. 87 00:03:16,340 --> 00:03:18,138 Any those could be things like length fields, 88 00:03:18,138 --> 00:03:20,818 for instance, or check sounds. 89 00:03:20,818 --> 00:03:24,021 And, well, I found that Generation-based fuzzers, 90 00:03:24,021 --> 00:03:26,581 do provide better results for code. 91 00:03:26,581 --> 00:03:28,551 It maybe is not as well mature. 92 00:03:28,551 --> 00:03:31,330 If the code has been around for a long time, 93 00:03:31,330 --> 00:03:33,271 you may find that Generation fuzzers 94 00:03:33,271 --> 00:03:34,522 may not find anything. 95 00:03:34,522 --> 00:03:38,202 It's kind of that, you really won't know, until you try. 96 00:03:38,202 --> 00:03:41,358 So to move on, on the newer generations of fuzzers, 97 00:03:41,358 --> 00:03:42,630 the Evolutionary Fuzzers. 98 00:03:42,630 --> 00:03:45,296 This was where a lot of research is happening. 99 00:03:45,296 --> 00:03:48,336 And these actually generate test cases based on 100 00:03:48,336 --> 00:03:50,446 looking at the program itself, 101 00:03:50,446 --> 00:03:54,024 so you have another process that run alongside the program, 102 00:03:54,024 --> 00:03:57,235 and it actually watches what's happening with the program, 103 00:03:57,235 --> 00:04:00,184 so you can see what the code coverage is. 104 00:04:00,184 --> 00:04:02,103 You usually start with something very small, 105 00:04:02,103 --> 00:04:04,435 a very small seed corpus, to get you into that state 106 00:04:04,435 --> 00:04:07,024 to where you can start walking through the program. 107 00:04:07,024 --> 00:04:10,323 And then figure out what you want to do from there. 108 00:04:10,323 --> 00:04:13,464 And it aims for a more complete coverage, 109 00:04:13,464 --> 00:04:17,403 and you also want to eliminate redundant test cases, 110 00:04:17,403 --> 00:04:19,123 that are testing the exact same code. 111 00:04:19,123 --> 00:04:21,094 There's really no point, to running the same code, 112 00:04:21,094 --> 00:04:23,561 over and over again, just to look for those 113 00:04:23,561 --> 00:04:25,318 anomalous test cases. 114 00:04:25,318 --> 00:04:27,188 So it's much quicker to set up, 115 00:04:27,188 --> 00:04:30,079 it may give you better results than 116 00:04:30,079 --> 00:04:34,535 both either mutation, or generation-based fuzzing. 117 00:04:34,535 --> 00:04:36,468 But there are some downsides to running with 118 00:04:36,468 --> 00:04:37,967 an Evolutionary fuzzer. 119 00:04:37,967 --> 00:04:40,898 In some cases, the program has to be written a certain way. 120 00:04:40,898 --> 00:04:43,458 It has to be written in a certain language, 121 00:04:43,458 --> 00:04:45,159 it has to be, you have a source code available, 122 00:04:45,159 --> 00:04:47,610 so you can re-compile the program, 123 00:04:47,610 --> 00:04:49,530 so that you could instrument it, 124 00:04:49,530 --> 00:04:52,780 and actually listen to what's going on in the program. 125 00:04:52,780 --> 00:04:55,754 And so, you may need code available. 126 00:04:55,754 --> 00:04:57,226 And you may not have that code available, 127 00:04:57,226 --> 00:04:59,405 so maybe that's something you can't do. 128 00:04:59,405 --> 00:05:02,239 There are Evolutionary fuzzers that do work 129 00:05:02,239 --> 00:05:03,243 on like QEMU. 130 00:05:03,243 --> 00:05:05,333 One of the examples is an American Fuzzy Lop. 131 00:05:05,333 --> 00:05:08,083 It's an experimental feature, but you can try it. 132 00:05:08,083 --> 00:05:10,784 But this is definitely something to try, 133 00:05:10,784 --> 00:05:14,104 simply, like I said, you really wanna always be fuzzing, 134 00:05:14,104 --> 00:05:15,864 when you have a target. 135 00:05:15,864 --> 00:05:17,494 And Evolutionary fuzzer gives you 136 00:05:17,494 --> 00:05:19,744 a quick way to get started.