Sri Krishnamurthy, CFA, CAP

Hult.quant.fenway@gmail.com

Hult.quant.backbay@gmail.com

QuantUniversity LLC.

Adjunct Lecturer

Hult International Business School

Session 2 - Part 1

Copyright 2014 QuantUniversity LLC. Cannot be reproduced or used without written permission from QuantUniversity LLC.

Agenda

Regression Analysis: Estimating Statistical Relationships

In-class Exercise

Regression

Topics

Regression Terminology

Scatterplots

Correlation

Simple Linear Regression

Multiple Regression

Modeling relationships

Where can you use regression?

Can you predict starting salaries for graduating students?

Salary = function of {Previous years of education, Experience, GPA, Interview Performance, etc.}

Can you predict the speed of a car?

Speed = function of {Engine parameters, Car age, Tire age, Car model}

Does spending more on advertising increase sales?

Can you predict energy prices or temperatures for the next year?

What is regression ?

Regression analysis is the study of relationships between variables.

Prediction

Unknown variable = function of a finite set of known variables

Types of data:

Cross sectional data

For example: at a point in time, sales = function of {number of promotional TV ads}

Time series

For example: temperature tomorrow = function of {temperature 1 year ago, temperature 2 years ago, etc.}
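In the time series case the "known variables" are earlier values of the same series (lags). The sketch below is a minimal illustration, assuming the lagged values are simply used as regressors and fitted by ordinary least squares; `lagged_design` is a hypothetical helper, and the series is invented so that it follows y[t] = 2 + 0.5·y[t-1] exactly. Each lag is one period of the series, whatever that period is (a day, a year).

```python
import numpy as np

def lagged_design(series, n_lags):
    """Hypothetical helper: rows [1, y[t-1], ..., y[t-n_lags]] and targets y[t]."""
    y = np.asarray(series, dtype=float)
    rows = [np.r_[1.0, y[t - n_lags:t][::-1]] for t in range(n_lags, len(y))]
    return np.array(rows), y[n_lags:]

# Invented series that follows y[t] = 2 + 0.5 * y[t-1] exactly
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

X, target = lagged_design(series, n_lags=1)
coef, *_ = np.linalg.lstsq(X, target, rcond=None)  # least-squares fit on the lags
print(coef)  # approximately [2.0, 0.5]: intercept, then weight on the 1-period lag
```

Real series are noisy, so the recovered weights will only approximate whatever process generated the data.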

Regression terms

Dependent variable / Target variable / Response variable (Y)

The unknown variable we are trying to explain or predict

Independent variable / Explanatory variable / Predictor variable (X)

The variables used to predict the response variable

Type of Relationship:

Linear or nonlinear

Types

Simple Linear Regression

1 dependent variable, 1 independent variable

Ex: Sales = function of {ad spend}

Multiple Linear Regression

1 dependent variable, many independent variables

Ex: Sales = function of {ad spend, number of promotional events, number of sales offices}
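Both types can be fitted with the same routine; only the number of predictor columns changes. The slides have not yet said how the line is estimated, so this sketch assumes ordinary least squares via NumPy's `np.linalg.lstsq`. `fit_ols` is a hypothetical helper, and all numbers are made up so that they lie exactly on known equations, which lets you check the recovered coefficients.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: ordinary least squares, returns [intercept, b1, ..., bk]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # a single predictor becomes one column
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of 1s for the intercept
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

# Simple linear regression: made-up data on Sales = 3 + 2 * ad spend
ad_spend = [1, 2, 3, 4, 5]
sales = [5, 7, 9, 11, 13]
simple_coef = fit_ols(ad_spend, sales)

# Multiple linear regression: made-up data on
# Sales = 10 + 2 * ad spend + 1.5 * events + 3 * offices
ad = [1, 2, 3, 4, 5, 6]
events = [2, 1, 4, 3, 6, 5]
offices = [1, 1, 2, 2, 3, 4]
sales_multi = [18, 18.5, 28, 28.5, 38, 41.5]
multi_coef = fit_ols(np.column_stack([ad, events, offices]), sales_multi)

print(np.round(simple_coef, 3))  # intercept and ad-spend slope
print(np.round(multi_coef, 3))   # intercept, then one slope per predictor
```

With real data the points will scatter around the fitted line, and the coefficients become estimates rather than exact recoveries.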

Scatter Plots

Why Scatterplots?

A scatterplot is a graphical plot of two numerical variables, an X and a Y.

If there is any relationship between the two variables, it is usually apparent from the scatterplot.

Drugstore Sales.xlsx

Goal: use a scatterplot to examine the relationship between promotional expenditures and sales at Pharmex.

Observations:


The scatterplot indicates that there is a positive relationship between Promote and Sales.

The relationship is not perfect: Promote helps predict Sales, but it does not lead to perfect predictions.

Correlation and Causation

Correlation between variables does not imply causation.

A scatterplot only shows whether there is a relationship between the two plotted variables; it cannot show that one causes the other.
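Pearson's correlation coefficient r puts a number on the linear relationship a scatterplot shows: +1 is a perfect upward line, 0 is no linear relationship. Like the scatterplot, it says nothing about causation. The values below are hypothetical, not the Pharmex data, and `pearson_r` is an illustrative helper cross-checked against NumPy's `np.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Illustrative helper: Pearson correlation from centered sums of products."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return (dx @ dy) / np.sqrt((dx @ dx) * (dy @ dy))

promote = [1, 2, 3, 4, 5]  # hypothetical promotional spend
sales = [2, 4, 5, 4, 5]    # hypothetical sales
r = pearson_r(promote, sales)
print(round(float(r), 3))  # 0.775: positive but not perfect
```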

To analyze multiple variables

We can use scatterplots to examine the relationships among the variables in a dataset.

Examine scatterplots between each explanatory variable and the dependent variable.

With multiple explanatory variables, also check for relationships among the explanatory variables themselves.
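A correlation matrix gives a numeric companion to the pairwise scatterplots: one r for every pair of variables. A high correlation between two explanatory variables suggests they carry overlapping information. This is a sketch with hypothetical variable names (`x1`, `x2`, `y`) and invented values, using NumPy's `np.corrcoef`; `correlation_matrix` is an illustrative helper.

```python
import numpy as np

def correlation_matrix(columns):
    """Illustrative helper: Pearson r for every pair of variables (one variable per entry)."""
    data = np.vstack([np.asarray(c, dtype=float) for c in columns])
    return np.corrcoef(data)  # rows are variables, so the result is k x k

names = ["x1", "x2", "y"]                # hypothetical variable names
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.1, 7.9, 10.1, 11.9]    # invented to move almost in lockstep with x1
y = [3, 5, 6, 8, 11, 12]

corr = correlation_matrix([x1, x2, y])
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {corr[i, j]:.3f}")
```

Here the x1 vs x2 entry is close to 1, which is the kind of overlap between explanatory variables worth noticing before fitting a multiple regression.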

Overhead Costs.xlsx

The data file contains observations of overhead costs, machine hours, and production runs at Bendrix, an automobile parts manufacturing company.

To check for linear and nonlinear relationships

Scatterplots are useful for detecting relationships that may not be obvious otherwise.

Some relationships are not linear: the points do not cluster around a straight line.

The scatterplot below indicates a nonlinear relationship between the life expectancy of newborns in 1990 and GNP per capita.
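One way to see that a straight line is the wrong shape is to compare the correlation of Y with X against its correlation with a transformed X. This sketch uses invented values that rise quickly and then flatten out, and it assumes a log transform purely for illustration; other curve shapes call for other transforms. It is not an analysis of the life expectancy data.

```python
import numpy as np

x = np.array([1, 2, 4, 8, 16, 32], dtype=float)  # hypothetical X values
y = 40 + 5 * np.log2(x)                           # hypothetical Y that flattens out as X grows

r_straight = np.corrcoef(x, y)[0, 1]        # linear association with X itself
r_logged = np.corrcoef(np.log(x), y)[0, 1]  # linear association with log(X)

print(f"r(x, y)     = {r_straight:.3f}")
print(f"r(log x, y) = {r_logged:.3f}")
```

A noticeably higher r after the transform suggests the relationship is curved in X but roughly linear in log(X).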

To check for outliers

Scatterplots are especially useful for identifying outliers: observations that fall outside the general pattern of the rest of the observations.

The example below shows an outlier on a scatterplot of the relationship between CEO salary and years of experience.
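A rough numeric check mirrors what the eye does on the scatterplot: fit a straight line and flag points whose residual is unusually large. This is a minimal sketch, assuming a least-squares line (`np.polyfit`) and a cutoff of 2 standard deviations of the residuals; both choices are assumptions, and `flag_outliers` is a hypothetical helper. The salaries are made up, not the CEO data.

```python
import numpy as np

def flag_outliers(x, y, z_threshold=2.0):
    """Hypothetical helper: indices whose residual from a straight-line fit exceeds z_threshold SDs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # least-squares straight line
    residuals = y - (intercept + slope * x)
    z = residuals / residuals.std()         # assumes residuals are not all zero
    return np.flatnonzero(np.abs(z) > z_threshold)

years = np.arange(1, 11)       # made-up years of experience
salary = 50 + 5 * years        # made-up salaries (in $1000s) on a straight line
salary[4] = 300                # one invented extreme salary at 5 years
print(flag_outliers(years, salary))  # [4]
```

Because an outlier pulls the fitted line toward itself, this check is only a screening aid; the scatterplot remains the first place to look.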

How to address outliers?

Bad data or not relevant to the analysis: omit

If not clear, run…