Chapter 1 Essay

Submitted By jackykk101

Chapter 1
Linear regression with one predictor variable


Regression (Historically)

Regression means ‘going back’
Francis Galton (1822‐1911) studied “Hereditary Genius” (1869) and other traits
Heights of fathers and sons

Sons of the tallest fathers tended to be taller than average, but shorter than their fathers
Sons of the shortest fathers tended to be shorter than average, but taller than their fathers

This kind of thing was observed for lots of traits.
Galton was deeply concerned about “regression to mediocrity.”
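
The father and son pattern above can be reproduced with a small simulation. This is only an illustrative sketch: the mean, standard deviation, and father-son correlation below are made-up values, not Galton's data, and the model (son = mean + RHO × father's deviation + noise) is an assumption chosen to show the effect.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population values in cm (assumptions, not Galton's data)
MEAN, SD, RHO = 175.0, 7.0, 0.5
N = 20000

pairs = []
for _ in range(N):
    father = random.gauss(MEAN, SD)
    # The son inherits only a fraction RHO of the father's deviation from
    # the mean; the noise is scaled so sons have the same overall SD.
    noise = random.gauss(0.0, SD * (1 - RHO**2) ** 0.5)
    son = MEAN + RHO * (father - MEAN) + noise
    pairs.append((father, son))

# Compare the shortest and tallest 10% of fathers with their sons
pairs.sort(key=lambda p: p[0])
k = N // 10
shortest, tallest = pairs[:k], pairs[-k:]

def mean_heights(group):
    """Return (mean father height, mean son height) for a group of pairs."""
    fathers = statistics.fmean(f for f, _ in group)
    sons = statistics.fmean(s for _, s in group)
    return fathers, sons

tall_f, tall_s = mean_heights(tallest)
short_f, short_s = mean_heights(shortest)

print(f"Tallest fathers:  {tall_f:.1f} cm, their sons: {tall_s:.1f} cm")
print(f"Shortest fathers: {short_f:.1f} cm, their sons: {short_s:.1f} cm")
```

Under these assumptions, the sons of the tallest fathers come out above the population mean but below their fathers, and the sons of the shortest fathers come out below the mean but above their fathers.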

Types of Data

Typically, data come to us in one of four forms: 

Categorical (Nominal)
Ordinal
Interval
Ratio


Categorical variables

Take on several levels, none of which have any natural ordering

Sex (M, F, …)
Race (Black, White, Asian, …)
Program major (Stat, CS, Math, Psych, Bio, …)
Type of fertilizer (A, B, …)
Drug (Active, Placebo)

When controlled by the experimenter, called a Factor

Important nomenclature for R
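
R stores categorical variables as factors: a set of level labels plus an integer code for each observation. The sketch below mimics that idea in plain Python for illustration only; the data are made up, and the variable names are hypothetical rather than taken from any library.

```python
from collections import Counter

# Made-up observations of a categorical variable (program major)
majors = ["Stat", "CS", "Math", "Stat", "Bio", "CS", "Stat"]

# Levels are just labels; sorting them alphabetically is a convention,
# not a natural ordering of the categories.
levels = sorted(set(majors))

# Integer codes identify the level of each observation. The numbers are
# arbitrary, so arithmetic on them (e.g. a mean) has no meaning.
codes = [levels.index(m) for m in majors]

# Counting observations per level is a meaningful summary.
counts = Counter(majors)

print(levels)
print(codes)
print(counts.most_common())
```

Relabelling or reordering the levels would change the codes but not the counts, which is why counts and proportions are the natural summaries for this type of data.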


Ordinal variables

Take on several levels which have a natural order, but no consistent distance metric

Grade (A+, A, A-, B+, …)
Professor Rating (5, 4, 3, 2, 1)

Likert item

Level of education (PhD, Masters, Bachelors, HS, Primary, None)
Sports (Rugby, Football, Soccer, … Basketball) 

Difficult to deal with, so we usually consider them as either Categorical, or


Interval variables

Numerical variable with a consistent distance metric, but no proper zero point

Temperature (in °C)
SAT score

Slope and difference are meaningful, but ratios are not
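
A short worked example of why ratios fail on an interval scale, using temperature. The three readings are arbitrary illustrative values; the only fact relied on is the standard Celsius to Fahrenheit conversion.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Three arbitrary temperatures in Celsius and their Fahrenheit equivalents
a_c, b_c, c_c = 10.0, 20.0, 30.0
a_f, b_f, c_f = c_to_f(a_c), c_to_f(b_c), c_to_f(c_c)

# Ratios of raw values depend on where the scale puts its zero:
ratio_c = b_c / a_c  # 20 / 10
ratio_f = b_f / a_f  # 68 / 50
print(f"20C vs 10C as a ratio: {ratio_c:.2f} in Celsius, {ratio_f:.2f} in Fahrenheit")

# Comparisons of differences survive the change of units:
diff_ratio_c = (c_c - b_c) / (b_c - a_c)
diff_ratio_f = (c_f - b_f) / (b_f - a_f)
print(f"Second gap relative to first: {diff_ratio_c:.2f} (C), {diff_ratio_f:.2f} (F)")
```

Saying that 20 °C is "twice as hot" as 10 °C gives a different answer once the same temperatures are written in °F, whereas the statement that the two gaps are equal holds in both units.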

Ratio variables

Interval variable with a proper