The second very common method in supervised learning is called regression. Maybe you have already encountered it as a statistical method to analyze and predict data; I used it multiple times during my engineering degree a long time ago. It is a very straightforward method to predict a continuous number based on historical data.

Let's say that we would like to build a machine learning system that can predict the price of real estate products like houses. If you ask a real estate expert how to evaluate the price of a specific property, then he or she will use multiple attributes. It can be the size of the property, the number of rooms, overall condition, location, the average price of similar houses, and many more attributes. Going back to our machine learning system, the objective of the algorithm will be to predict the price based on such attributes. It's like making an AI real estate agent.

A price is an example of a continuous number. A continuous number can also be the age of a person, a product's weight, a score in an exam, the income of a person, annual company revenue, and many more.
On the other hand, gender is not a continuous number. It's a group of possible options, like female or male, so we will handle it using classification, as we saw earlier. To be able to predict a continuous number, one of the most relevant types of algorithms is based on regression. The concept of regression analysis is widely used for data analysis, but in addition it is also used as one of the most basic forms of machine learning.

So what is regression? Regression is a set of statistical methods for estimating the strength of the relationship between a dependent variable and one or more independent variables. Such relationships can be linear or nonlinear. The most common form of regression analysis is linear regression, but there are also other types of regression algorithms, for example logistic regression and polynomial regression. At this point, let's talk about the first option, called linear regression.

A linear regression algorithm learns a model which is a linear combination of features coming from the input examples.
There is a dependent variable labeled y, which we would like to predict, and a group of independent variables labeled x1, x2, and so forth. These are the predictors. y is basically a function of the x variables, and the regression model is a linear approximation of that function. The basic assumption, which is not always true, is that there is a linear relationship between the dependent and independent variables.

Looking at this simple one-dimensional linear regression graph, we have one input feature x, which is the independent variable. y is the dependent variable, or the predicted output. All the points in the graph are the training dataset, and we can easily see that there is a linear relationship between x and y. The algorithm will search for the best-fit line by finding w and b, where w is the slope of the line, describing how strong the linear relationship between x and y is, and b is the intercept of the line with the y axis.

Now, how will the algorithm know that this is the best line? Well, it uses something that is called a cost function.
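To make the search for w and b concrete, here is a minimal sketch (not from the lecture, with made-up toy data) of fitting a one-dimensional line with the closed-form least-squares formulas, using NumPy:

```python
import numpy as np

# Hypothetical toy training data with a roughly linear relation
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])  # approximately y = 2x

# Closed-form least-squares estimates:
# w (slope) and b (intercept with the y axis)
w = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - w * x.mean()

print(f"w = {w:.2f}, b = {b:.2f}")  # w = 1.96, b = 0.14

# The fitted line can then predict a new, unseen input
x_new = 6.0
print(w * x_new + b)
```

This closed-form solution only works for the simple one-feature case; in practice, libraries solve the general problem for you.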
It will take every available point and measure the distance between the actual point that we have in the dataset and the corresponding point on the line, the line that the algorithm created as a model. That distance represents the error in the model. In linear regression, the cost function is called Mean Squared Error (MSE), which is basically the average of the squared errors between the predicted values and the actual values. The goal, of course, is to reduce this error, taking into account not just one point but all available points in the training dataset. Finally, this line, which is the trained model, can be used to predict new data points: we insert a new input x1 and get as output the predicted y1, which lies on that specific line.

The generic equation of linear regression with multiple input features will look something like this, where each xi is a feature of the data, and the wi and b are the parameters that are discovered during training. This is the idea behind building a model.
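As a sketch of these two ideas, here is the MSE cost function and the generic multi-feature prediction y = w1*x1 + w2*x2 + ... + wn*xn + b. The parameter values and property attributes below are made-up for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences
    between the actual and the predicted values."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean((y_true - y_pred) ** 2)

def predict(x, w, b):
    """Generic linear model: y_hat = w1*x1 + ... + wn*xn + b."""
    return np.dot(x, w) + b

# Hypothetical learned parameters for three features
w = np.array([50.0, 10000.0, 2000.0])
b = 30000.0

# One property: 120 sqm, 4 rooms, condition score 8 (made-up attributes)
x = np.array([120.0, 4.0, 8.0])
print(predict(x, w, b))  # 50*120 + 10000*4 + 2000*8 + 30000 = 92000.0

# MSE over two actual/predicted pairs: ((3-2)^2 + (5-6)^2) / 2 = 1.0
print(mse([3.0, 5.0], [2.0, 6.0]))
```

During training, the algorithm adjusts w and b to make the MSE over the whole training dataset as small as possible.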
The x values are the features in the input data, and the algorithm is trying to find the parameters w and b that describe that specific linear graph as well as possible. Now imagine x1, x2, x3 are a group of attributes that describe a real estate product. Using regression, we can build a model that can predict the market price of such a real estate property, as an example.

Okay, let's do a quick recap. We talked about the most common type of machine learning, which is supervised learning. In supervised learning, we supervise the learning process by deciding which labeled data instances will be part of the training dataset. There are two main tasks in supervised learning: we can use shallow learning algorithms like SVM for classification tasks and regression algorithms for prediction, or use neural networks under the concept of deep learning.

Let's move to unsupervised learning.