Machine Learning: Univariate Linear Regression

Welcome back! This post continues my previous one, “Machine Learning: Introduction”. Here I will discuss univariate linear regression, which is a supervised learning problem.

Model Representation

Recall that in supervised learning, the goal is to predict some “output” value, given a set of “input” values or features. Also, in a regression problem, the value being predicted is a continuous variable.

The simplest case of regression is univariate linear regression. In this case, for each training example, there is only one feature (say x) along with the “output” (say y). Also, the aim is to fit a linear function to the training data. The following illustration will make it clear.

(Figure: housing prices plotted against house size, with a straight-line fit. Source: Andrew Ng’s ‘Machine Learning’ course on Coursera)

Here, the input feature is the house size, and the “output” is the price of the house. The red crosses are the training examples plotted on a graph, and the green line shows what a good straight-line fit to the data may look like. The figure also shows how the green line can be used to make predictions: the estimated price for a 1250-square-foot house is $220,000.

Let’s introduce some notation before proceeding further:

  • m denotes the number of training examples
  • x is the “input” variable or feature
  • y is the “output” or target variable
  • (x, y) denotes a training example
  • (x(i), y(i)) denotes the i-th training example

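To make this notation concrete, here is a minimal sketch in Python (the numbers are made up for illustration and are not taken from the figure) showing how a training set of m examples might be stored:

```python
import numpy as np

# Toy training set: m = 4 examples of (house size in sq. ft., price in $1000s).
x = np.array([2100.0, 1600.0, 1250.0, 850.0])   # "input" feature x
y = np.array([400.0, 330.0, 220.0, 178.0])      # "output" / target y

m = len(x)               # number of training examples
x_3, y_3 = x[2], y[2]    # the 3rd training example (x(3), y(3)); arrays are 0-indexed in code
print(m, x_3, y_3)       # -> 4 1250.0 220.0
```
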
Hypothesis function

The function used to predict the output value (the housing price in this case) is called the hypothesis function, denoted by hθ(x). In the case of univariate linear regression, we represent the hypothesis as:

hθ(x) = θ0 + θ1x

Here, θ0 and θ1 are called parameters. Clearly, in this case, the function hθ(x) is a linear function of one variable, namely x. For different values of the parameters, the graphs of hθ(x) will be different straight lines. We need to pick suitable values of the parameters so that the hypothesis function fits the data well, or in other words, minimizes the error between the actual and predicted values of y (the output).
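As a quick sketch, the hypothesis can be written directly in Python (the function name h and the parameter values below are my own illustrative choices, not a fitted model):

```python
def h(x, theta0, theta1):
    """Hypothesis h_theta(x) = theta0 + theta1 * x for univariate linear regression."""
    return theta0 + theta1 * x

# With the illustrative parameters theta0 = 50 and theta1 = 0.1 (prices in $1000s),
# a 1250 sq. ft. house is predicted at 50 + 0.1 * 1250 = 175, i.e. $175,000.
print(h(1250, 50.0, 0.1))
```

Different choices of θ0 and θ1 simply shift and tilt this straight line; the rest of the post is about how to score those choices.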

Cost Function

Now the question arises: how exactly should we choose the parameters θ0 and θ1? As I mentioned earlier, we need to minimize the error between the actual and the predicted values. How can we express this mathematically? Observe that, for a given hypothesis hθ(x):

  • Predicted value of the output for training example i = hθ(x(i))
  • Actual value of the output for this training example = y(i)
  • Squared error in the prediction = (hθ(x(i)) – y(i))²

Note that the error is squared because the quantity (hθ(x(i)) – y(i)) can be positive or negative, and only the magnitude of the error, not its sign, matters in this case. If we sum up the squared errors in the predictions over all m training examples, and then divide by m, we get the mean squared error:

(1/m) Σ (hθ(x(i)) – y(i))²,  where the sum runs over i = 1, …, m

Our objective, of course, is to minimize this error. Usually this quantity is multiplied by 1/2 to simplify further calculations, and what results is called the squared error cost function, J:

J(θ0, θ1) = (1/(2m)) Σ (hθ(x(i)) – y(i))²,  where the sum again runs over i = 1, …, m

So, the mathematical formulation of the problem is: find the parameters θ0 and θ1 so that the value of the cost function, J(θ0, θ1) (as defined above), is minimized for the given set of training examples.
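Here is a minimal sketch of the cost function in Python (the function name cost and the toy data below are illustrative assumptions, not part of the original formulation):

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared error cost J(theta0, theta1) = (1 / (2m)) * sum of (h_theta(x(i)) - y(i))^2."""
    m = len(x)
    predictions = theta0 + theta1 * x          # h_theta(x(i)) for every training example
    squared_errors = (predictions - y) ** 2    # (h_theta(x(i)) - y(i))^2
    return squared_errors.sum() / (2 * m)

# Illustrative usage with the toy data from earlier; a smaller J means a better fit.
x = np.array([2100.0, 1600.0, 1250.0, 850.0])
y = np.array([400.0, 330.0, 220.0, 178.0])
print(cost(0.0, 0.20, x, y))    # J for the line h(x) = 0.2x
print(cost(20.0, 0.18, x, y))   # J for the line h(x) = 20 + 0.18x
```

Trying a few parameter pairs by hand like this quickly becomes tedious, which is exactly the problem the algorithm in the next post addresses.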

We have seen the idea of univariate linear regression, as well as its mathematical formulation, in this post. In the next post, I will discuss an algorithm called gradient descent that actually solves this problem – please do stay tuned!
