Chapter 4 Descriptive Statistics

Central Tendency, Dispersion, and Shape

Where are values concentrated? How much do values vary? Is the distribution symmetric? Skewed?

Is it a population or a sample?

For populations:
Model populations with distributions
Usually use Greek letters for descriptors, such as µ and σ
Referred to as parameters of the distribution
Typically are unknown and so try to find estimates

For sample data:
Usually use Latin letters, such as X̄ and S
Referred to as statistics
Calculated from sample data
Typically used to estimate corresponding parameters
Use sample data to make inferences about the population from which the sample was taken

Central Tendency

Mean

The average of a sample of size n

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \cdots + X_n}{n}$$

The average of a population of size N

$$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

Example 1: sample of n = 8 employees’ ages 21, 33, 47, 51, 68, 29, 31, 44

[Figure: histogram of the eight ages; horizontal axis: ages, 0 to 100; vertical axis: Frequency, 0 to 4]

Sample average age is 40.5
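As a quick check of the sample average, here is a minimal Python sketch using only the standard library. The variable names (`ages`, `x_bar`) are illustrative and do not come from the slides.

```python
from statistics import mean

# Example 1: ages of the n = 8 sampled employees
ages = [21, 33, 47, 51, 68, 29, 31, 44]

n = len(ages)
x_bar = sum(ages) / n        # (X_1 + X_2 + ... + X_n) / n

print(x_bar)                 # 40.5
print(mean(ages) == x_bar)   # True: statistics.mean gives the same value
```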

Median

The value in the middle
How does the median compare to the mean?
What did the histogram look like?

Example 1: n = 8 employees’ ages (sorted) 21, 29, 31, 33, 44, 47, 51, 68
With an even number of values, the median is the average of the two values in the middle.

[Figure: the same histogram of the eight ages; horizontal axis: ages, 0 to 100; vertical axis: Frequency, 0 to 4]

The median is (33 + 44)/2 = 38.5. The average is 40.5.
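The middle-value rule can be sketched in a few lines of Python. This is an illustration only: the odd-n branch is included for completeness even though Example 1 has an even number of values, and the variable names are made up here.

```python
from statistics import median

ages = [21, 33, 47, 51, 68, 29, 31, 44]
ordered = sorted(ages)
n = len(ordered)

# Even n: average the two middle values. Odd n: take the single middle value.
if n % 2 == 0:
    med = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    med = ordered[n // 2]

print(med)            # 38.5
print(median(ages))   # 38.5, the standard-library result agrees
```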

For symmetric data with a single peak, including data that is normally distributed, the mean, the median, and the mode are the same.

[Figure: symmetric distribution; the average, median, and mode are the same]

For skewed data, the three differ:

[Figure: skewed distribution with the average, median, and mode marked at different positions]
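To see the three measures coincide or separate, here is a small Python sketch. Both data sets are hypothetical, invented for this illustration, and are not taken from the slides.

```python
from statistics import mean, median, mode

# Hypothetical data, made up for illustration only
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

for name, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    print(name, "mean:", mean(data), "median:", median(data), "mode:", mode(data))

# symmetric: mean, median, and mode are all 3
# right-skewed: mode (2) < median (3) < mean (5)
```

In the right-skewed sample, the few large values pull the mean upward while the median and mode stay near the bulk of the data.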

Measures of Position

Percentiles
Express ranks as percentages from 0% to 100%
The median is the 50th percentile

The Five-Number Summary
1. Smallest data value (0th percentile): min
2. Lower quartile (25th percentile): Q1
3. Median (50th percentile): Q2
4. Upper quartile (75th percentile): Q3
5. Largest data value (100th percentile): max

Box Plot
A picture of the Five-Number Summary, where outliers are defined to be values that are
larger than Q3 + 1.5*(Q3 – Q1), or
smaller than Q1 – 1.5*(Q3 – Q1)

Example 1: ages of 8 employees (sorted) 21, 29, 31, 33, 44, 47, 51, 68

0th percentile is 21 (the smallest): min
25th percentile is 30 (average of 29 and 31): Q1
50th percentile is 38.5 (average of 33 and 44): Q2
75th percentile is 49 (average of 47 and 51): Q3
100th percentile is 68 (the largest): max

Here Q1 and Q3 are the medians of the lower and upper halves of the sorted data.

[Figure: box plot of the eight ages; horizontal axis: ages, 0 to 80]
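The five-number summary and the 1.5*(Q3 – Q1) outlier rule for Example 1 can be reproduced with the sketch below. It computes quartiles as medians of the lower and upper halves, which matches the values in this example. Other conventions exist, and library percentile functions may return slightly different quartiles. The function name `five_number_summary`, the choice to exclude the median from both halves when n is odd, and the requirement of at least two values are assumptions made for this illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves.

    Assumes at least two values. When n is odd, the middle value is
    left out of both halves (one common convention, assumed here).
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the middle
    upper = ordered[half + n % 2:]    # values above the middle
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]

ages = [21, 29, 31, 33, 44, 47, 51, 68]
smallest, q1, q2, q3, largest = five_number_summary(ages)

iqr = q3 - q1                         # Q3 - Q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in ages if x < lower_fence or x > upper_fence]

print(smallest, q1, q2, q3, largest)  # 21 30.0 38.5 49.0 68
print(lower_fence, upper_fence)       # 1.5 77.5
print(outliers)                       # []
```

None of the eight ages falls outside the fences, so the box plot for this sample would show no outliers.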

Dispersion

Measures of variability tell how the data values differ

Standard Deviation
The square root of the variance
In the same units as the data
The “usual” choice
How far away from the average the data values typically lie

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

sample standard deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

population standard deviation

Range
R = largest value - smallest value
Is easy to calculate
Does not use much data (only the two extreme values)
Is sensitive to extremes
Is often used in control chart calculations
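As a worked illustration of these dispersion measures, the sketch below applies them to the Example 1 ages. The slides do not report these values, so the numbers come from the code rather than from the source. Treating the eight ages as a whole population for the σ formula is an assumption made only to show the N divisor.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, stdev

ages = [21, 33, 47, 51, 68, 29, 31, 44]
n = len(ages)
x_bar = mean(ages)

# Sum of squared deviations from the average
sum_sq = sum((x - x_bar) ** 2 for x in ages)

s = sqrt(sum_sq / (n - 1))   # sample standard deviation (divide by n - 1)
sigma = sqrt(sum_sq / n)     # population formula (divide by N), for comparison only
r = max(ages) - min(ages)    # range

print(sum_sq)                            # 1580.0
print(round(s, 2), round(sigma, 2))      # 15.02 14.05
print(r)                                 # 47

# The standard-library functions agree with the hand-written formulas
print(isclose(s, stdev(ages)), isclose(sigma, pstdev(ages)))   # True True
```

The sample standard deviation is slightly larger than the population version because it divides by n - 1 instead of n.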

Chebyshev’s Theorem: For any population and any k > 1, at least (1 - 1/k^2) × 100% of the values lie within k standard deviations of the mean

At least 75% of values lie within 2 standard deviations of the mean
At least 89% (more precisely 88.9%) of values lie within 3 standard deviations of the mean
At least 93.75% (about 94%) of values lie within 4 standard deviations of the mean
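The percentages above follow directly from 1 - 1/k^2. The sketch below tabulates the bound and, purely as an illustration, checks it against the Example 1 ages treated as a small population. The helper name `chebyshev_bound` is hypothetical.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(k, round(100 * chebyshev_bound(k), 2))   # 75.0, then 88.89, then 93.75

# Illustration: treat the eight ages as a population and count values within 2 sigma
ages = [21, 33, 47, 51, 68, 29, 31, 44]
mu, sigma = mean(ages), pstdev(ages)
within_2 = sum(abs(x - mu) <= 2 * sigma for x in ages) / len(ages)

print(within_2)                          # 1.0
print(within_2 >= chebyshev_bound(2))    # True
```

All eight ages lie within 2 standard deviations here, comfortably above the guaranteed 75%. The theorem gives a lower bound that holds for any distribution, so actual data usually exceeds it.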

Empirical…