COURSE PROJECT

AJ DAVIS DEPARTMENT STORES

PART A

EXPLORATORY DATA ANALYSIS

DATE

May 20, 2014

Submitted By

TEAM E

Saadia Aslam

DSI #

Mohammad Sajjadul Islam

DSI # 40176233

Bossman Samuel Quansah

DSI #

Table of Contents

Cover page
Table of Contents
Introduction
INDIVIDUAL VARIABLES
Location
Income
Credit Balance
RELATIONSHIPS
Location and Income
Size and Years
Income and Balance
Conclusion

Introduction

Statistics encompasses the collection, processing, interpretation, presentation and analysis of data. This report presents a detailed statistical analysis of data collected from a sample of credit customers of the AJ Davis department store chain. A sample of 50 credit customers was selected, and data were collected on the following five variables:

1. Location (Rural, Urban, Suburban)

2. Income (in $1,000s)

3. Size (Number of people living in the Household)

4. Years (the number of years that the customer has lived in the current location)

5. Credit Balance (the customer's current balance on the store's credit card, in $)

Individual Variables

Of the five variables, three are analyzed individually in this section: Location, Income and Credit Balance.

Location

The first individual variable considered is Location. It is a categorical variable with three categories: Urban, Suburban and Rural. Because this is a categorical variable, measures of central tendency and other descriptive statistics have not been calculated for it. The frequency distribution and pie chart are given as follows:

Tally for Discrete Variables: Location

Location     Count   Percent
Rural          13     26.00
Suburban       15     30.00
Urban          22     44.00
N              50

Interpretation: After the frequencies were calculated, a pie chart was developed to show which area accounts for the largest share of the sampled customers. The pie chart shows that the Urban area holds the highest percentage of customers (44.0%), while the Rural area holds the lowest (26.0%); the remaining 30.0% live in Suburban areas.
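Each percentage in the tally is the category count divided by the sample size N = 50, times 100. The short Python sketch below recomputes the percentages from the reported counts only (the raw customer records are not reproduced in this report); the function name frequency_table is our own, chosen for illustration.

```python
# Reported tallies for the Location variable (13 Rural, 15 Suburban, 22 Urban).
counts = {"Rural": 13, "Suburban": 15, "Urban": 22}


def frequency_table(counts):
    """Return (category, count, percent) rows and N, where percent = count / N * 100."""
    n = sum(counts.values())
    rows = [(cat, c, round(100 * c / n, 2)) for cat, c in counts.items()]
    return rows, n


rows, n = frequency_table(counts)
for cat, c, pct in rows:
    print(f"{cat:<10}{c:>6}{pct:>8.2f}")
print(f"{'N':<10}{n:>6}")

# Category with the highest share of customers.
largest = max(counts, key=counts.get)
print("Largest share:", largest)
```

Urban works out to 22 / 50 = 44.00%, which matches the percentage quoted in the interpretation.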

Income

The second individual variable considered is Income. It is a quantitative variable. The measures of central tendency, variation and other descriptive statistics have been calculated for this variable and are given as follows:

Descriptive Statistics: Income ($1000)

Statistic      Income ($1000)
Total Count    50
Mean           43.74
StDev          14.64
Variance       214.32
Minimum        21.00
Q1             30.00
Median         43.00
Q3             55.00
Maximum        67.00
Range          46.00
Mode           55

Interpretation: Based on the histogram and the summary statistics, customer incomes range from a low of $21,000 to a high of $67,000, a range of $46,000. The mean income is $43,740 and the median income is $43,000. Because the median is less than the mean, the distribution is skewed to the right: a few higher incomes pull the mean above the median, so at least half of the customers earn less than the mean income. However, the difference between the mean and the median is small ($740), so the skew is slight.
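Because only summary figures are available, the income results can be sanity-checked from the reported table alone. The Python sketch below derives the range and interquartile range, confirms that the variance is the square of the standard deviation, and adds Pearson's second (median) skewness coefficient, 3(mean − median) / SD. That coefficient is not part of the original output; it is a common rule-of-thumb measure used here only to put a number on how slight the skew is. The dictionary layout and the function name check_summary are our own.

```python
import math

# Summary statistics for Income ($1000) as reported in the descriptive statistics table.
income = {
    "mean": 43.74, "median": 43.00, "stdev": 14.64,
    "variance": 214.32, "min": 21.00, "max": 67.00,
    "q1": 30.00, "q3": 55.00,
}


def check_summary(s):
    """Derive range, IQR and Pearson's second skewness coefficient from summary stats."""
    value_range = s["max"] - s["min"]
    iqr = s["q3"] - s["q1"]
    # Variance should be (approximately) the square of the standard deviation.
    sd_matches = math.isclose(s["stdev"] ** 2, s["variance"], rel_tol=1e-3)
    # Pearson's second (median) skewness coefficient: 3 * (mean - median) / sd.
    # Positive -> right skew, negative -> left skew, near 0 -> little skew.
    skew = 3 * (s["mean"] - s["median"]) / s["stdev"]
    return value_range, iqr, sd_matches, skew


value_range, iqr, sd_matches, skew = check_summary(income)
print(f"Range: {value_range:.2f}  IQR: {iqr:.2f}  SD^2 ~ variance: {sd_matches}")
print(f"Pearson median skewness: {skew:.3f}")
```

A coefficient of roughly +0.15 is small and positive, consistent with a slight right skew.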

Credit Balance

The third individual variable considered is Credit Balance. It is a quantitative variable. The measures of central tendency, variation and other descriptive statistics have been calculated for this variable and are given as follows:

Descriptive Statistics: Credit Balance ($)

Statistic      Credit Balance ($)
Count          50
Mean           3970
StDev          932
Variance       868430
Minimum        1864
Q1             3109
Median         4090
Q3             4748
Maximum        5678
Range          3814
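The credit-balance figures can be cross-checked against one another using only the reported summary statistics. The Python sketch below recomputes the range and interquartile range, recovers the standard deviation from the variance, and compares the mean with the median to read off the skew direction. Pearson's second skewness coefficient, 3(mean − median) / SD, is our addition and not part of the original output; the variable names are illustrative.

```python
import math

# Summary statistics for Credit Balance ($) as reported in the descriptive statistics table.
mean, median = 3970, 4090
variance, stdev = 868430, 932
minimum, maximum = 1864, 5678
q1, q3 = 3109, 4748

value_range = maximum - minimum          # spread between the extremes
iqr = q3 - q1                            # spread of the middle 50% of balances
sd_from_variance = math.sqrt(variance)   # should round to the reported StDev

# Mean below median is the usual sign of a left (negative) skew.
# Pearson's second skewness coefficient quantifies it: 3 * (mean - median) / sd.
pearson_skew = 3 * (mean - median) / stdev
direction = "left" if mean < median else "right" if mean > median else "none"

print(value_range, iqr, round(sd_from_variance), round(pearson_skew, 3), direction)
```

The negative coefficient (about −0.39) agrees with a left-skewed distribution, and the middle half of balances spans $1,639.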

Interpretation: The range of credit balances is $3,814, from a low of $1,864 to a high of $5,678. The median credit balance is $4,090 and the mean is $3,970. Because the median is greater than the mean, the distribution is skewed to the left. The middle 50% of cardholders carry balances between $3,109 (Q1) and $4,748 (Q3), so a typical balance is roughly $4,000. This means that a greater number of