1
00:00:06,506 --> 00:00:08,216
- [Instructor] So what
a mutation-based fuzzer,

2
00:00:08,216 --> 00:00:09,399
I think it's fairly obvious,

3
00:00:09,399 --> 00:00:11,145
is it takes well-formatted input,

4
00:00:11,145 --> 00:00:13,079
like a message, a string, maybe a file,

5
00:00:13,079 --> 00:00:14,936
and it mutates the data.

6
00:00:14,936 --> 00:00:17,617
And you don't need to have any
structure of the input data

7
00:00:17,617 --> 00:00:20,376
for a mutation-based fuzzer,
at least for very simple ones.

8
00:00:20,376 --> 00:00:22,427
But for good code coverage, you need that

9
00:00:22,427 --> 00:00:24,577
large collection of well-formed inputs.

10
00:00:24,577 --> 00:00:26,068
This is called a Corpus.

11
00:00:26,068 --> 00:00:28,977
And you can get this
information from the internet

12
00:00:28,977 --> 00:00:30,729
when you're fuzzing against a file,

13
00:00:30,729 --> 00:00:31,897
or maybe you're doing fuzzing

14
00:00:31,897 --> 00:00:33,737
against a certain packet format.

15
00:00:33,737 --> 00:00:36,706
You can get just p caps
that are from the internet

16
00:00:36,706 --> 00:00:39,525
that are basically
representations of the data,

17
00:00:39,525 --> 00:00:41,253
that the process would use,

18
00:00:41,253 --> 00:00:43,476
and then you could mutate that data

19
00:00:43,476 --> 00:00:45,745
and use it as fuzzing test cases.

20
00:00:45,745 --> 00:00:48,255
And those mutations can
include flipping bits,

21
00:00:48,255 --> 00:00:50,887
you can duplicate certain fields,

22
00:00:50,887 --> 00:00:53,253
you can randomize data
that's in the message,

23
00:00:53,253 --> 00:00:54,646
and then you could use your imagination.

24
00:00:54,646 --> 00:00:57,636
There's many different ways
you can mutate the data.

25
00:00:57,636 --> 00:00:59,166
And then the result of the mutation

26
00:00:59,166 --> 00:01:00,645
is to be fed into the target,

27
00:01:00,645 --> 00:01:02,836
and that is your anomalous message.

28
00:01:02,836 --> 00:01:04,465
And then when you have a failure,

29
00:01:04,465 --> 00:01:07,234
you have to have some ways
of detecting that failure,

30
00:01:07,234 --> 00:01:09,775
because sometimes, you
won't have a hard crash.

31
00:01:09,775 --> 00:01:12,563
Oh maybe, the process is
alive, maybe it's stuck,

32
00:01:12,563 --> 00:01:15,202
maybe it is stuck in an infinite loop,

33
00:01:15,202 --> 00:01:17,425
maybe it no longer processes messages,

34
00:01:17,425 --> 00:01:20,495
but the process is waiting
for something, basically,

35
00:01:20,495 --> 00:01:22,916
waiting for resource to be freed up.

36
00:01:22,916 --> 00:01:25,632
So, you have to have some
way of monitoring this.

37
00:01:25,632 --> 00:01:28,015
And typically, you would
monitor the process directly,

38
00:01:28,015 --> 00:01:30,196
and you'd just see whether or
not the process is running.

39
00:01:30,196 --> 00:01:32,919
Or you can see whether or
not it's consuming CPU.

40
00:01:32,919 --> 00:01:36,484
If it consumes over 90% or 95%
CPU, that would be indicative

41
00:01:36,484 --> 00:01:39,575
of an infinite loop scenario.

42
00:01:39,575 --> 00:01:41,636
And then another way to detect failures,

43
00:01:41,636 --> 00:01:43,185
which I would recommend,

44
00:01:43,185 --> 00:01:45,164
is just to send a good message,

45
00:01:45,164 --> 00:01:47,937
rather than sending an anomalous
message with modifications

46
00:01:47,937 --> 00:01:50,106
to it, you could just
send one that's good.

47
00:01:50,106 --> 00:01:52,512
And if the process acts the way it should,

48
00:01:52,512 --> 00:01:54,045
and responds maybe the way it should,

49
00:01:54,045 --> 00:01:57,005
then you that it's fine, you can move on.

50
00:01:57,005 --> 00:01:58,365
And then if you don't have a response,

51
00:01:58,365 --> 00:02:00,545
then you might have a
denial of service condition.

52
00:02:00,545 --> 00:02:01,913
Maybe it doesn't crash anything,

53
00:02:01,913 --> 00:02:04,894
maybe it just went off to lunch.

54
00:02:04,894 --> 00:02:07,804
So in the case of a crash, you
have to restart the process,

55
00:02:07,804 --> 00:02:09,585
and then note that information.

56
00:02:09,585 --> 00:02:12,105
And then, the anomalous
message and program crash

57
00:02:12,105 --> 00:02:13,294
information you saved,

58
00:02:13,294 --> 00:02:14,225
like I said before,

59
00:02:14,225 --> 00:02:17,025
and then you would use
that for further analysis.

60
00:02:17,025 --> 00:02:18,722
Generation-based Fuzzers.

61
00:02:18,722 --> 00:02:20,934
These are your Smart Fuzzers.

62
00:02:20,934 --> 00:02:23,294
And those work with a data
model of the protocol.

63
00:02:23,294 --> 00:02:26,014
And that data model has
to be created in advance.

64
00:02:26,014 --> 00:02:28,561
So, it requires a lot of hard work,

65
00:02:28,561 --> 00:02:31,720
first, before you can ever
do any kind of fuzzing.

66
00:02:31,720 --> 00:02:33,601
Generation-based Fuzzers are

67
00:02:33,601 --> 00:02:36,321
typically something you
would do much later on,

68
00:02:36,321 --> 00:02:38,313
after you set up mutation
for a little while,

69
00:02:38,313 --> 00:02:39,585
you can do some generation fuzzing,

70
00:02:39,585 --> 00:02:41,244
after you have a model.

71
00:02:41,244 --> 00:02:43,825
Can actually create a
message from scratch.

72
00:02:43,825 --> 00:02:45,872
In other words, you're actually
creating from the ground up,

73
00:02:45,872 --> 00:02:47,694
a message based on the model.

74
00:02:47,694 --> 00:02:50,933
And then, other times, you can
just start from the corpus,

75
00:02:50,933 --> 00:02:53,766
and use the model to modify the corpus,

76
00:02:53,766 --> 00:02:55,534
but also do those fix-ups that you need,

77
00:02:55,534 --> 00:02:57,644
to make the protocol
look the way it should,

78
00:02:57,644 --> 00:02:59,883
so the parts surpasses the message.

79
00:02:59,883 --> 00:03:02,203
And then, modifications
you add to the data model,

80
00:03:02,203 --> 00:03:04,033
for each anomalous test case.

81
00:03:04,033 --> 00:03:05,531
So you can start from one field,

82
00:03:05,531 --> 00:03:07,161
and you can go to another field,

83
00:03:07,161 --> 00:03:09,852
and then, the data model
may include fix-ups,

84
00:03:09,852 --> 00:03:10,940
that you have to do,

85
00:03:10,940 --> 00:03:14,310
and those could be intelligently
made by the fuzzer,

86
00:03:14,310 --> 00:03:16,340
based on the data model itself.

87
00:03:16,340 --> 00:03:18,138
Any those could be things
like length fields,

88
00:03:18,138 --> 00:03:20,818
for instance, or check sounds.

89
00:03:20,818 --> 00:03:24,021
And, well, I found that
Generation-based fuzzers,

90
00:03:24,021 --> 00:03:26,581
do provide better results for code.

91
00:03:26,581 --> 00:03:28,551
It maybe is not as well mature.

92
00:03:28,551 --> 00:03:31,330
If the code has been
around for a long time,

93
00:03:31,330 --> 00:03:33,271
you may find that Generation fuzzers

94
00:03:33,271 --> 00:03:34,522
may not find anything.

95
00:03:34,522 --> 00:03:38,202
It's kind of that, you really
won't know, until you try.

96
00:03:38,202 --> 00:03:41,358
So to move on, on the newer
generations of fuzzers,

97
00:03:41,358 --> 00:03:42,630
the Evolutionary Fuzzers.

98
00:03:42,630 --> 00:03:45,296
This was where a lot of
research is happening.

99
00:03:45,296 --> 00:03:48,336
And these actually generate
test cases based on

100
00:03:48,336 --> 00:03:50,446
looking at the program itself,

101
00:03:50,446 --> 00:03:54,024
so you have another process
that run alongside the program,

102
00:03:54,024 --> 00:03:57,235
and it actually watches what's
happening with the program,

103
00:03:57,235 --> 00:04:00,184
so you can see what the code coverage is.

104
00:04:00,184 --> 00:04:02,103
You usually start with
something very small,

105
00:04:02,103 --> 00:04:04,435
a very small seed corpus,
to get you into that state

106
00:04:04,435 --> 00:04:07,024
to where you can start
walking through the program.

107
00:04:07,024 --> 00:04:10,323
And then figure out what
you want to do from there.

108
00:04:10,323 --> 00:04:13,464
And it aims for a more complete coverage,

109
00:04:13,464 --> 00:04:17,403
and you also want to eliminate
redundant test cases,

110
00:04:17,403 --> 00:04:19,123
that are testing the exact same code.

111
00:04:19,123 --> 00:04:21,094
There's really no point,
to running the same code,

112
00:04:21,094 --> 00:04:23,561
over and over again,
just to look for those

113
00:04:23,561 --> 00:04:25,318
anomalous test cases.

114
00:04:25,318 --> 00:04:27,188
So it's much quicker to set up,

115
00:04:27,188 --> 00:04:30,079
it may give you better results than

116
00:04:30,079 --> 00:04:34,535
both either mutation, or
generation-based fuzzing.

117
00:04:34,535 --> 00:04:36,468
But there are some
downsides to running with

118
00:04:36,468 --> 00:04:37,967
an Evolutionary fuzzer.

119
00:04:37,967 --> 00:04:40,898
In some cases, the program has
to be written a certain way.

120
00:04:40,898 --> 00:04:43,458
It has to be written
in a certain language,

121
00:04:43,458 --> 00:04:45,159
it has to be, you have
a source code available,

122
00:04:45,159 --> 00:04:47,610
so you can re-compile the program,

123
00:04:47,610 --> 00:04:49,530
so that you could instrument it,

124
00:04:49,530 --> 00:04:52,780
and actually listen to what's
going on in the program.

125
00:04:52,780 --> 00:04:55,754
And so, you may need code available.

126
00:04:55,754 --> 00:04:57,226
And you may not have that code available,

127
00:04:57,226 --> 00:04:59,405
so maybe that's something you can't do.

128
00:04:59,405 --> 00:05:02,239
There are Evolutionary
fuzzers that do work

129
00:05:02,239 --> 00:05:03,243
on like QEMU.

130
00:05:03,243 --> 00:05:05,333
One of the examples is
an American Fuzzy Lop.

131
00:05:05,333 --> 00:05:08,083
It's an experimental
feature, but you can try it.

132
00:05:08,083 --> 00:05:10,784
But this is definitely something to try,

133
00:05:10,784 --> 00:05:14,104
simply, like I said, you
really wanna always be fuzzing,

134
00:05:14,104 --> 00:05:15,864
when you have a target.

135
00:05:15,864 --> 00:05:17,494
And Evolutionary fuzzer gives you

136
00:05:17,494 --> 00:05:19,744
a quick way to get started.