Linear regression with one predictor variable

Regression (Historically)

Regression means ‘going back’

Francis Galton (1822‐1911) studied “Hereditary Genius” (1869) and other traits

Heights of fathers and sons

Sons of the tallest fathers tended to be taller than average, but shorter than their fathers

Sons of the shortest fathers tended to be shorter than average, but taller than their fathers

The same pattern was observed for many other traits.

Galton was deeply concerned about “regression to mediocrity.”
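The father-and-son pattern above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the population mean, standard deviation, and father-son correlation below are illustrative values chosen for the example, not Galton's data, and `simulate_pairs` and `group_mean` are hypothetical helper names.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

MEAN_HEIGHT = 175.0  # cm, assumed population mean (illustrative)
SD_HEIGHT = 7.0      # cm, assumed population SD (illustrative)
RHO = 0.5            # assumed father-son correlation (illustrative)


def simulate_pairs(n):
    """Return n (father, son) height pairs with correlation RHO."""
    pairs = []
    for _ in range(n):
        father = random.gauss(MEAN_HEIGHT, SD_HEIGHT)
        # The son inherits a fraction RHO of the father's deviation from
        # the mean, plus independent noise scaled so that sons have the
        # same overall SD as fathers.
        noise = random.gauss(0.0, SD_HEIGHT * (1 - RHO ** 2) ** 0.5)
        son = MEAN_HEIGHT + RHO * (father - MEAN_HEIGHT) + noise
        pairs.append((father, son))
    return pairs


def group_mean(group, index):
    """Mean of the fathers (index 0) or sons (index 1) in a group."""
    return statistics.fmean(pair[index] for pair in group)


pairs = simulate_pairs(20000)
pairs.sort(key=lambda pair: pair[0])  # order by father's height
tenth = len(pairs) // 10
shortest, tallest = pairs[:tenth], pairs[-tenth:]

overall_sons = group_mean(pairs, 1)
tall_fathers, tall_sons = group_mean(tallest, 0), group_mean(tallest, 1)
short_fathers, short_sons = group_mean(shortest, 0), group_mean(shortest, 1)

print(f"all sons:         {overall_sons:.1f} cm")
print(f"tallest fathers:  {tall_fathers:.1f} cm, their sons: {tall_sons:.1f} cm")
print(f"shortest fathers: {short_fathers:.1f} cm, their sons: {short_sons:.1f} cm")
```

In this sketch the sons of the tallest tenth of fathers average above the overall mean but below their fathers, and the reverse holds for the shortest tenth. That is the 'going back' toward the mean that the term regression refers to.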

Types of Data

Typically, data come to us in one of four forms:

Categorical (Nominal)

Ordinal

Interval

Ratio

Categorical variables

Take on several levels, none of which have any natural ordering

Sex (M, F, …)

Race (Black, White, Asian, …)

Program major (Stat, CS, Math, Psych, Bio, …)

Type of fertilizer (A, B, …)

Drug (Active, Placebo)

When controlled by the experimenter, called a Factor

Important nomenclature for R

Ordinal variables

Take on several levels which have a natural order, but no consistent distance metric

Grade (A+, A, A-, B+, …)

Professor Rating (5, 4, 3, 2, 1)

Likert item

Level of education (PhD, Masters, Bachelors, HS, Primary, None)

Sports (Rugby, Football, Soccer, … Basketball)

Difficult to deal with, so we usually consider them as either Categorical, or

Interval variables

Numerical variable with a consistent distance metric, but no proper zero point

IQ

Temperature (in °C)

SAT score

Slope and difference are meaningful, but ratios are not
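A short worked example may help show why ratios are not meaningful on an interval scale while differences are. The sketch below uses temperature: the three readings are arbitrary illustrative values, and `c_to_f` is a helper defined here using the standard Celsius-to-Fahrenheit conversion.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


# Three illustrative temperature readings in degrees Celsius.
a_c, b_c, c_c = 10.0, 20.0, 40.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Ratio of two readings: depends on where the scale puts its zero.
ratio_c = b_c / a_c
ratio_f = b_f / a_f

# Comparison of two differences: unaffected by the change of scale.
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)

print("ratio of readings:    ", ratio_c, "in C vs", ratio_f, "in F")
print("ratio of differences: ", diff_ratio_c, "in C vs", diff_ratio_f, "in F")
```

"20 °C is twice as hot as 10 °C" does not survive the change to °F, because the zero point is arbitrary. "The gap from 20 °C to 40 °C is twice the gap from 10 °C to 20 °C" holds in both scales.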

Ratio variables

Interval variable with a proper