Simple Linear Regression Models

Hongwei Zhang
http://www.cs.wayne.edu/~hzhang

"Statistics is the art of lying by means of figures."

--- Dr. Wilhelm Stekel

Acknowledgement: this lecture is partially based on the slides of Dr. Raj Jain.

Simple linear regression models

Response Variable: The variable being estimated (predicted)

Predictor Variables: Variables used to predict the response

Also called predictors or factors

Regression Model: Predict a response for a given set of predictor variables

Linear Regression Models: Response is a linear function of the predictors

Simple Linear Regression Models: Only one predictor

Outline

Definition of a Good Model

Estimation of Model parameters

Allocation of Variation

Standard deviation of Errors

Confidence Intervals for Regression Parameters

Confidence Intervals for Predictions

Visual Tests for verifying Regression Assumptions


Definition of a good model?

Good models (contd.)

Regression models attempt to minimize the distance, measured vertically, between each observation point and the model line (or curve). The length of this vertical line segment is called the residual, the modeling error, or simply the error.

The negative and positive errors should cancel out => zero overall error. However, many lines satisfy this criterion.

Choose the line that minimizes the sum of squares of the errors

Good models (contd.)

Formally,

    ŷ = b0 + b1 x

where ŷ is the predicted response when the predictor variable is x. The parameters b0 and b1 are fixed regression parameters to be determined from the data.

Given n observation pairs {(x1, y1), …, (xn, yn)}, the estimated response for the i-th observation is:

    ŷi = b0 + b1 xi

The error is:

    ei = yi − ŷi

Good models (contd.)

The best linear model minimizes the sum of squared errors (SSE):

    SSE = Σi ei² = Σi (yi − b0 − b1 xi)²

subject to the constraint that the overall mean error is zero:

    (1/n) Σi ei = 0

This is equivalent to the unconstrained minimization of the variance of errors (Exercise 14.1)
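The SSE criterion can be illustrated concretely. A minimal Python sketch with made-up data (the points and parameter values below are illustrative assumptions, not from the text), showing that SSE is a function of the candidate parameters (b0, b1) and that moving away from the best-fit line inflates it:

```python
# Made-up data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.1]

def sse(b0, b1):
    """Sum of squared errors: SSE = sum over i of (yi - (b0 + b1*xi))^2."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# For this data the least-squares parameters work out to b0 = 0.1, b1 = 0.98;
# any other candidate line yields a larger SSE.
print(sse(0.1, 0.98))   # ~0.068
print(sse(0.5, 0.98))   # ~0.868 -- shifting the intercept inflates SSE
```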


Estimation of model parameters

Regression parameters that give minimum error variance are:

    b1 = (Σ xi yi − n x̄ ȳ) / (Σ xi² − n x̄²)

    b0 = ȳ − b1 x̄

where

    x̄ = (1/n) Σ xi  and  ȳ = (1/n) Σ yi
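These closed-form estimates are easy to compute directly. A sketch in Python, using hypothetical data (not the data of Example 14.1):

```python
# Hypothetical data; closed-form least-squares estimates:
#   b1 = (sum of xi*yi - n*xbar*ybar) / (sum of xi^2 - n*xbar^2)
#   b0 = ybar - b1*xbar
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.2, 1.9, 3.1, 3.9, 5.1]
n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n
b1 = (sum(x * y for x, y in zip(xs, ys)) - n * xbar * ybar) / \
     (sum(x * x for x in xs) - n * xbar ** 2)
b0 = ybar - b1 * xbar
print(b0, b1)   # ~0.1, ~0.98 for this data

# Sanity check: the fitted line satisfies the zero-mean-error constraint.
mean_err = sum(y - (b0 + b1 * x) for x, y in zip(xs, ys)) / n
```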

Example 14.1

Example (contd.)

Example (contd.)

Derivation of regression parameters?

Derivation (contd.)

Derivation (contd.)

Least Squares Regression vs. Least Absolute Deviations Regression?

    Least Squares Regression     | Least Absolute Deviations Regression
    -----------------------------|----------------------------------------
    Not very robust to outliers  | Robust to outliers
    Simple analytical solution   | No analytical solving method (have to
                                 | use iterative, computation-intensive
                                 | methods)
    Stable solution              | Unstable solution
    Always one unique solution   | Possibly multiple solutions

The instability of the method of least absolute deviations means that, for a small horizontal adjustment of a data point, the regression line may jump a large amount. In contrast, the least squares solution is stable: for any small horizontal adjustment of a data point, the regression line moves only slightly, i.e., continuously.
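The outlier-robustness contrast can be sketched numerically. In the Python sketch below, the data, the fitting helper, and the use of iteratively reweighted least squares (one common iterative scheme for least absolute deviations, since no closed form exists) are all illustrative assumptions, not the text's prescribed method:

```python
# Made-up data: four points on the line y = x plus one gross outlier.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 2.0, 3.0, 4.0, 15.0]   # last point is an outlier

def wls(xs, ys, ws):
    """Weighted least squares for y = b0 + b1*x; returns (b0, b1)."""
    sw = sum(ws)
    xb = sum(w * x for w, x in zip(ws, xs)) / sw
    yb = sum(w * y for w, y in zip(ws, ys)) / sw
    b1 = sum(w * (x - xb) * (y - yb) for w, x, y in zip(ws, xs, ys)) / \
         sum(w * (x - xb) ** 2 for w, x in zip(ws, xs))
    return yb - b1 * xb, b1

# Ordinary least squares: the outlier drags the slope from 1 up to 3.
b0_ls, b1_ls = wls(xs, ys, [1.0] * len(xs))

# Approximate LAD via iteratively reweighted least squares: reweight each
# point by 1/|residual| and refit, repeatedly.
b0, b1 = b0_ls, b1_ls
for _ in range(50):
    ws = [1.0 / max(abs(y - (b0 + b1 * x)), 1e-6) for x, y in zip(xs, ys)]
    b0, b1 = wls(xs, ys, ws)

print(b1_ls, b1)   # LS slope inflated by the outlier; LAD slope stays near 1
```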


Allocation of variation

Allocation of variation (contd.)

The sum of squared errors without regression would be:

    SST = Σi (yi − ȳ)²

This is called the total sum of squares (SST). It is a measure of y's variability and is called the variation of y. SST can be computed as follows:

    SST = Σi (yi − ȳ)² = Σi yi² − n ȳ² = SSY − SS0

where SSY is the sum of squares of y (Σ yi²), and SS0 is the sum of squares of ȳ, equal to n ȳ².
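The identity SST = SSY − SS0 is easy to verify numerically. A minimal sketch with made-up data:

```python
# Made-up data: verify SST = SSY - SS0, where SSY = sum of yi^2 and
# SS0 = n * ybar^2.
ys = [1.2, 1.9, 3.1, 3.9, 5.1]
n = len(ys)
ybar = sum(ys) / n
sst = sum((y - ybar) ** 2 for y in ys)   # direct definition
ssy = sum(y * y for y in ys)
ss0 = n * ybar ** 2
print(sst, ssy - ss0)   # equal, up to floating-point rounding
```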

Allocation of