Until now, I provided an example of identifying whether an object in an image is a dog or not. This type of task is called classification, and it is a very popular use case in supervised learning. Think about an email service: how can the system identify which emails are spam and which are regular, legitimate emails? This is a classic use case of classification. The machine learning solution implemented in such an email service should automatically classify whether a new email is spam or not. We can't even imagine our lives today without such features in every email system. This type of classification is also called binary classification, meaning there are only two options, two classes.

During the training phase, the classification algorithm is given labeled data points: emails that are spam and emails that are not. Using this information, it creates a model with a mapping function from x to y. Then, when provided with a new, unseen email, the model uses this mapping function to determine whether or not the email is spam.
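To make the "mapping function from x to y" concrete, here is a toy sketch in Python. It is not a real spam filter and not the SVM method discussed later; it simply learns word counts from a few invented labeled emails and then labels an unseen email by word overlap, to illustrate training on labeled data and predicting on new data.

```python
# Toy binary classification sketch: learn from labeled emails (x, y),
# then map an unseen email x to a label y. All emails are invented.
from collections import Counter

def train(labeled_emails):
    """Build per-class word counts from (text, label) pairs."""
    counts = {"spam": Counter(), "not spam": Counter()}
    for text, label in labeled_emails:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    """Score the email against each class; the larger word overlap wins."""
    words = text.lower().split()
    scores = {label: sum(counter[w] for w in words)
              for label, counter in model.items()}
    return max(scores, key=scores.get)

training_data = [
    ("win a free prize now", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "not spam"),
    ("project status and agenda", "not spam"),
]

model = train(training_data)
print(classify(model, "free prize money"))  # classified as spam
print(classify(model, "monday meeting"))    # classified as not spam
```

A real system would use far richer features and a learned decision boundary, but the shape is the same: labeled examples in, a mapping function out.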
Other classification tasks require multiple values, multiple classes, which is also called multiclass classification, like the task of identifying whether the color of a specific flower is yellow, green, red, or blue. So here there are four classes. We can build a binary classifier or a multiclass classifier using shallow learning or deep learning algorithms. For example, one of the common classification algorithms in the shallow learning category is called Support Vector Machines.

Okay. As a reminder, we are still in the supervised learning category, under the classification task, which means we would like to build a machine learning system that classifies data. SVM, Support Vector Machine, is an algorithm for creating such a classifier, with points in space that are mapped into separate domains. Let's see that in a visual way. Imagine we have a dataset of people with their weight and height. In machine learning terminology, this type of information is called features. Okay, we talked about it: one feature is the weight and another one is the height.
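The four-color flower task can be sketched as a minimal multiclass classifier. This is a hand-built nearest-color rule rather than a trained model (and not an SVM); the reference RGB values are assumptions chosen for illustration. It shows the key difference from the binary case: the output is one of four classes instead of two.

```python
# Minimal multiclass sketch: map an (R, G, B) color to one of four
# classes by picking the nearest reference color. Reference values
# are illustrative assumptions, not measured data.
REFERENCE_COLORS = {
    "yellow": (255, 255, 0),
    "green":  (0, 128, 0),
    "red":    (255, 0, 0),
    "blue":   (0, 0, 255),
}

def classify_color(rgb):
    """Return the class whose reference color is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(REFERENCE_COLORS, key=lambda c: sq_dist(rgb, REFERENCE_COLORS[c]))

print(classify_color((250, 240, 10)))  # -> yellow
print(classify_color((10, 20, 200)))   # -> blue
```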
In addition, each person can be classified as male or female, so this is a binary classification task. Now, that group of data points can be placed in a two-dimensional space, x1 and x2, the weight and the height, like you can see here. Now, can we draw a line that will somehow separate the two groups, male and female? Using this information, the weight and height features, I can of course manually try to draw the line here, and maybe move it a little bit here. There are many lines that might classify the data, but they are not going to be the optimal line. It is better to find the line that represents the largest separation, or margin, between those classes.

The job of the Support Vector Machine algorithm is to search for this optimal line, or better, call it a hyperplane, because in two dimensions it is a line, and in three dimensions it is going to be a plane. In our example, this line should break those points into two groups, two classes, and it is basically a simple math formula. It will find the line that has the maximum margin, meaning the maximum distance between the data points of both classes.
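The idea of learning a separating line from weight/height points can be sketched in a few lines of Python. Note the hedge: this uses the simple perceptron update rule, which finds *a* separating line for linearly separable data; a real SVM additionally maximizes the margin, which this sketch does not do. The data points are invented for illustration.

```python
# Sketch: learn a separating line w1*x1 + w2*x2 + b = 0 for invented
# (weight_kg, height_cm) points. Uses the perceptron rule, which finds
# a separating line but (unlike SVM) does not maximize the margin.
points = [((85, 180), 1), ((90, 185), 1), ((80, 178), 1),    # class A (+1)
          ((55, 160), -1), ((60, 165), -1), ((50, 158), -1)]  # class B (-1)

w = [0.0, 0.0]   # line coefficients
b = 0.0          # intercept
lr = 0.01        # learning rate

for _ in range(100):                                  # passes over the data
    for (x1, x2), y in points:
        if y * (w[0] * x1 + w[1] * x2 + b) <= 0:      # misclassified point
            w[0] += lr * y * x1                       # nudge the line
            w[1] += lr * y * x2
            b += lr * y

def predict(x1, x2):
    """Classify a new point by which side of the line it falls on."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1

print(predict(88, 182))   # a heavy, tall point -> class A (+1)
print(predict(52, 159))   # a light, short point -> class B (-1)
```

The prediction step is exactly the "mapping function" idea from before: the learned line turns a new (weight, height) pair into a class label.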
A larger margin will increase the chance of correctly classifying future data points. So when getting a new data point, the machine learning system will use this line to decide whether the point belongs to class A or class B: is it a male or a female? Again, I don't want to go too deep here, so let's zoom out to the big picture. We talked about the first typical task in supervised learning, meaning classification. One common method to build such a classifier in shallow learning is called the Support Vector Machine algorithm. The task of classification is very common in practical machine learning systems. Let's move on to the next type of task in supervised learning.