Regression Analysis and Simple Linear Regression Essay

Submitted By rohankapil

Quantitative Methods

Sri Krishnamurthy, CFA, CAP
Hult.quant.fenway@gmail.com
Hult.quant.backbay@gmail.com
QuantUniversity LLC.
Adjunct Lecturer
Hult International Business School

Session 2 - Part 1
Copyright 2014 QuantUniversity LLC. Cannot be reproduced or used without written permission from QuantUniversity LLC.

Agenda
- Regression Analysis: Estimating Statistical Relationships
- In-class Exercise

Regression

Topics
- Regression Terminology
- Scatterplots
- Correlation
- Simple Linear Regression
- Multiple Regression
- Modeling relationships

Where can you use regression?
- Can you predict starting salaries for graduating students?
  - Salary = function of {previous years of education, experience, GPA, interview performance, etc.}
- Can you predict the speed of a car?
  - Speed = function of {engine parameters, car age, tire age, car model}
- Does spending more on advertising increase sales?
- Can you predict energy prices/temperatures for the next year?

What is regression?
- Regression analysis is the study of relationships between variables.
- Prediction
  - Unknown variable = function of a finite number of known variables

Types of data:
- Cross-sectional data
  - Example: at a point in time, sales = function of {number of promotional TV ads}
- Time series data
  - Example: temperature tomorrow = function of {temperature 1 year ago, temperature 2 years ago, etc.}
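
For the time-series case, the past values have to be lined up as predictor columns before a regression can be fit. Below is a minimal Python sketch, assuming an equally spaced series with no gaps; the function name `make_lagged_rows` and the toy numbers are hypothetical. For daily temperatures, "1 year ago" and "2 years ago" would correspond roughly to offsets 365 and 730 (ignoring leap years).

```python
def make_lagged_rows(series, lags):
    """Build (predictor rows, targets) for regressing a series on its own past.

    series: equally spaced observations, oldest first (assumed: no gaps).
    lags:   positive offsets, e.g. [365, 730] for "1 and 2 years ago" in daily data.
    Each target series[t] is paired with [series[t - k] for k in lags].
    """
    if not lags or min(lags) < 1:
        raise ValueError("lags must be positive integers")
    max_lag = max(lags)
    rows, targets = [], []
    # Start at max_lag so that every requested lag exists for each target.
    for t in range(max_lag, len(series)):
        rows.append([series[t - k] for k in lags])
        targets.append(series[t])
    return rows, targets


# Toy usage with made-up values and short lags (1 and 2 steps back).
rows, targets = make_lagged_rows([10, 11, 12, 13, 14], lags=[1, 2])
print(rows)     # [[11, 10], [12, 11], [13, 12]]
print(targets)  # [12, 13, 14]
```

Each entry of `rows` then plays the role of the explanatory variables and the matching entry of `targets` is the dependent variable.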

Regression terms
- Dependent variable / target variable / response variable (Y)
  - The unknown variable we are trying to explain/predict
- Independent variable / explanatory variable / predictor variable (X)
  - The variables used to predict the response variable
- Type of relationship:
  - Linear or nonlinear

Types
- Simple linear regression
  - 1 dependent variable, 1 independent variable
  - Ex: Sales = function of {ad spend}
- Multiple linear regression
  - 1 dependent variable, many independent variables
  - Ex: Sales = function of {ad spend, number of promotional events, number of sales offices}
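
Both model types are commonly fit by ordinary least squares (OLS), which picks the coefficients that minimize the sum of squared differences between observed and fitted Y. The slides have not yet said how the line is estimated, so OLS is an assumption here. The pure-Python sketch below shows the closed form for one predictor and the normal equations for several; the function names and data are hypothetical, and in practice a statistics package would do this.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


def fit_multiple_linear_regression(rows, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk.

    rows holds one list of k predictor values per observation. The normal
    equations (X^T X) b = X^T y are solved by Gauss-Jordan elimination
    with partial pivoting.
    """
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    n, p = len(X), len(X[0])
    if n != len(y) or n < p:
        raise ValueError("need one y per row and at least as many rows as coefficients")
    # Augmented matrix [X^T X | X^T y], size p x (p + 1).
    A = [
        [sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
        + [sum(X[i][j] * y[i] for i in range(n))]
        for j in range(p)
    ]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular (perfectly collinear predictors?)")
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [v - factor * w for v, w in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]


# Hypothetical numbers, not taken from any course data file.
ad_spend = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]
print(fit_simple_linear_regression(ad_spend, sales))  # (1.0, 2.0)

# Two made-up predictors with y = 1 + 2*x1 + 3*x2 exactly.
rows = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [3, 4, 6, 8, 12]
print([round(c, 6) for c in fit_multiple_linear_regression(rows, y)])  # [1.0, 2.0, 3.0]
```

With one predictor, the closed form (b = S_xy / S_xx, a = y_bar - b * x_bar) and the normal equations give the same line; the matrix route is the one that extends to many predictors.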

Scatterplots

Why scatterplots?
- A scatterplot is a graphical plot of two numerical variables, an X and a Y.
- If there is a relationship between the two variables, it is usually apparent from the scatterplot.
- Drugstore Sales.xlsx
  - Goal: use a scatterplot to examine the relationship between promotional expenditures and sales at Pharmex.

Observations (Drugstore Sales.xlsx):
[Figure: scatterplot of Sales versus Promote]
- The scatterplot indicates a positive relationship between Promote and Sales: higher promotional expenditures tend to go with higher sales.
- The relationship is not perfect. While Promote is helpful in predicting Sales, it does not lead to perfect predictions.

Correlation and Causation
- Correlation between variables does not imply causation.
- A scatterplot only shows whether there is a relationship between the two plotted variables, not whether one causes the other.
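
The strength of a linear relationship such as the one between Promote and Sales can be summarized by the Pearson correlation coefficient r, which ranges from -1 to +1. A perfect positive linear relationship gives +1, and an imperfect positive one gives a value strictly between 0 and 1. The sketch below uses made-up numbers, not the Drugstore Sales.xlsx data, and the function name is hypothetical. A high r still says nothing about causation.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("r is undefined when either variable is constant")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical promotion/sales pairs: positive but imperfect relationship.
promote = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
print(round(pearson_correlation(promote, sales), 3))  # 0.775
```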

To analyze multiple variables
- We can use scatterplots to examine the relationships among the dataset variables.
- Examine scatterplots between each explanatory variable and the dependent variable.
- With multiple explanatory variables, also check for relationships among them.

- Overhead Costs.xlsx
  - The data file contains observations of overhead costs, machine hours, and production runs at Bendrix, an automobile parts manufacturing company.
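
Alongside the scatterplots, a correlation matrix gives a numeric summary of every pairwise relationship, including those among the explanatory variables. A high correlation between, say, machine hours and production runs would mean the two predictors carry overlapping information. The sketch below uses made-up values, not the Bendrix data; the column names and function names are hypothetical, and no column may be constant.

```python
import math
from itertools import combinations


def pearson_correlation(x, y):
    """Pearson correlation r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of equally long, non-constant columns."""
    names = list(columns)
    matrix = {a: {a: 1.0} for a in names}  # each variable correlates perfectly with itself
    for a, b in combinations(names, 2):
        r = pearson_correlation(columns[a], columns[b])
        matrix[a][b] = matrix[b][a] = r  # the matrix is symmetric
    return matrix


# Hypothetical observations for three variables.
columns = {
    "overhead": [3, 6, 7, 8, 11],
    "machine_hours": [1, 2, 3, 4, 5],
    "production_runs": [2, 1, 4, 3, 5],
}
m = correlation_matrix(columns)
for row in columns:
    print(row, [round(m[row][col], 2) for col in columns])
```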

To check for linear and nonlinear relationships
- Scatterplots are useful for detecting relationships that may not be obvious otherwise.
- Some relationships are not linear: the points do not cluster around a straight line.
- Example: a scatterplot of the life expectancy of newborns in 1990 against GNP per capita indicates a nonlinear relationship.
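
One way to back up what the eye sees is to fit a straight line anyway and inspect the residuals (observed minus fitted Y). When the relationship is curved, the residuals follow a systematic pattern instead of scattering randomly. For a curve that rises quickly and then levels off, they are typically negative at both ends and positive in the middle. The sketch below uses hypothetical points lying exactly on y = sqrt(x), not real life-expectancy data; the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def linear_residuals(x, y):
    """Observed minus fitted values from a straight-line fit."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical points on a curve that levels off (y = sqrt(x)).
x = [1, 4, 9, 16, 25]
y = [1, 2, 3, 4, 5]
resid = linear_residuals(x, y)
# Print the sign of each residual, left to right along x.
print("".join("+" if r > 0 else "-" for r in resid))  # -+++-
```

With a genuinely linear relationship plus random noise, the signs would not line up in long runs like this.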

To check for outliers
- Scatterplots are especially useful for identifying outliers: observations that fall outside the general pattern of the rest of the observations.
- Example: an outlier on a scatterplot of the relationship between CEO salary and years of experience.
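
A numeric companion to the visual check is to fit a line and flag observations whose residual is large relative to the standard error of estimate s = sqrt(SSE / (n - 2)). The cutoff of 2 used here is a common rule of thumb, assumed rather than taken from the slides. The salary data are invented (in thousands, with one deliberately unusual point), and the function names are hypothetical. Because an outlier pulls the line toward itself and inflates s, this check can miss outliers in small samples, so it supplements the scatterplot rather than replacing it.

```python
import math


def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b


def flag_outliers(x, y, threshold=2.0):
    """Indices whose |residual| exceeds threshold * s, with s = sqrt(SSE / (n - 2))."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need matching x and y with at least 3 points")
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(r * r for r in resid) / (n - 2))  # standard error of estimate
    if s == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(resid) if abs(r) > threshold * s]


# Hypothetical CEO data: salary rises by 10 per year of experience, except at 5 years.
experience = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
salary = [110, 120, 130, 140, 300, 160, 170, 180, 190, 200]
for i in flag_outliers(experience, salary):
    print(experience[i], salary[i])  # 5 300
```

In line with the guidance on addressing outliers, a flagged point is omitted only if it is bad data or not relevant to the analysis.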

How to address outliers?
- Bad data / not relevant to the analysis: omit
- If not clear, run