Until now, I provided an example of identifying whether an object in an image is a dog or not. This type of task is called classification, and it is a very popular use case in supervised learning. Think about an email service: how can the system identify which emails are spam and which are regular, legitimate emails? This is a classic use case of classification. The machine learning solution implemented in such an email service should automatically classify whether a new email is spam or not. We can't even imagine our lives today without such features in every email system. This type of classification is also called binary classification, meaning there are only two options, two classes.

During the training phase, the classification algorithm is given labeled data points: emails that are spam and emails that are not. Using this information, it creates a model with a mapping function from x to y. Then, when provided with a new, unseen email, the model uses this mapping function to determine whether or not the email is spam.
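To make the "mapping function from x to y" concrete, here is a toy sketch in Python. It is not a real spam filter and not the SVM method discussed later; it simply learns word counts from a few invented labeled emails and then labels an unseen email by word overlap, to illustrate training on labeled data and predicting on new data.

```python
# Toy binary classification sketch: learn from labeled emails (x, y),
# then map an unseen email x to a label y. All emails are invented.
from collections import Counter

def train(labeled_emails):
    """Build per-class word counts from (text, label) pairs."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in labeled_emails:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    """Score the email against each class; the larger word overlap wins."""
    words = text.lower().split()
    scores = {label: sum(counter[w] for w in words)
              for label, counter in model.items()}
    return max(scores, key=scores.get)

training_data = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("project status and agenda", "not spam"),
]

model = train(training_data)
print(classify(model, "free prize money"))  # classified as spam
print(classify(model, "monday meeting"))    # classified as not spam
```

A real system would use far richer features and a learned decision boundary, but the shape is the same: labeled examples in, a mapping function out.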
Other classification tasks require multiple values, multiple classes, which is also called multiclass classification, like the task of identifying whether the color of a specific flower is yellow, green, red, or blue. So here there are four classes. We can build a binary classifier or a multiclass classifier using shallow learning or deep learning algorithms. For example, one of the common classification algorithms in the shallow learning category is called Support Vector Machines.

Okay. As a reminder, we are still in the supervised learning category, under the classification task, which means we would like to build a machine learning system that classifies data. SVM, Support Vector Machine, is an algorithm for creating such a classifier, with points in space that are mapped into separate domains. Let's see that in a visual way. Imagine we have a dataset of people with their weight and height. In machine learning terminology, this type of information is called features. Okay, we talked about it: one feature is the weight and another one is the height.
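The four-color flower task can be sketched as a minimal multiclass classifier. This is a hand-built nearest-color rule rather than a trained model (and not an SVM); the reference RGB values are assumptions chosen for illustration. It shows the key difference from the binary case: the output is one of four classes instead of two.

```python
# Minimal multiclass sketch: map an (R, G, B) color to one of four
# classes by picking the nearest reference color. Reference values
# are illustrative assumptions, not measured data.
REFERENCE_COLORS = {
    "yellow": (255, 255, 0),
    "green":  (0, 128, 0),
    "red":    (255, 0, 0),
    "blue":   (0, 0, 255),
}

def classify_color(rgb):
    """Return the class whose reference color is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(REFERENCE_COLORS, key=lambda c: sq_dist(rgb, REFERENCE_COLORS[c]))

print(classify_color((250, 240, 10)))  # -> yellow
print(classify_color((10, 20, 200)))   # -> blue
```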
In addition, each person can be classified as male or female, so this is a binary classification task. Now, that group of data points can be placed in a two-dimensional space, x1 and x2, the weight and the height, like you can see here. Now, can we draw a line that will somehow separate the two groups, male and female? Using this information, the weight and height features, I can of course manually try to draw the line here, and maybe move it a little bit here. There are many lines that might classify the data, but they are not going to be the optimal line. It is better to find the line that represents the largest separation, or margin, between those classes.

The job of the Support Vector Machine algorithm is to search for this optimal line, or better, call it a hyperplane, because in two dimensions it is a line, and in three dimensions it is going to be a plane. In our example, this line should break those points into two groups, two classes, and it is basically a simple math formula. It will find the line that has the maximum margin, meaning the maximum distance between the data points of both classes.
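The idea of learning a separating line from weight/height points can be sketched in a few lines of Python. Note the hedge: this uses the simple perceptron update rule, which finds *a* separating line for linearly separable data; a real SVM additionally maximizes the margin, which this sketch does not do. The data points are invented for illustration.

```python
# Sketch: learn a separating line w1*x1 + w2*x2 + b = 0 for invented
# (weight_kg, height_cm) points. Uses the perceptron rule, which finds
# a separating line but (unlike SVM) does not maximize the margin.
points = [((85, 180), 1), ((90, 185), 1), ((80, 178), 1),    # class A (+1)
          ((55, 160), -1), ((60, 165), -1), ((50, 158), -1)]  # class B (-1)

w = [0.0, 0.0]   # line coefficients
b = 0.0          # intercept
lr = 0.01        # learning rate

for _ in range(100):                                  # passes over the data
    for (x1, x2), y in points:
        if y * (w[0] * x1 + w[1] * x2 + b) <= 0:      # misclassified point
            w[0] += lr * y * x1                       # nudge the line
            w[1] += lr * y * x2
            b += lr * y

def predict(x1, x2):
    """Classify a new point by which side of the line it falls on."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

print(predict(88, 182))   # a heavy, tall point -> class A (+1)
print(predict(52, 159))   # a light, short point -> class B (-1)
```

The prediction step is exactly the "mapping function" idea from before: the learned line turns a new (weight, height) pair into a class label.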
A larger margin will increase the chance of correctly classifying future data points. So when getting a new data point, the machine learning system will use this line to decide whether the point belongs to class A or class B: is it a male or a female? Again, I don't want to go too deep here, so let's zoom out to the big picture. We talked about the first typical task in supervised learning, meaning classification. One common method to build such a classifier in shallow learning is called the Support Vector Machine algorithm. The task of classification is very common in practical machine learning systems. Let's move on to the next type of task in supervised learning.