Random Variables, Simulations, and the Law of Cosines

Peter Flanagan-Hyde

Phoenix Country Day School, peter.flanagan-hyde@pcds.org

Abstract

Many teachers demonstrate the concepts of random variables through simulation. However, the concept of independence doesn’t lend itself to this approach – randomly generated sets of values will not be exactly independent, even when the conditions of the simulation include independent random variables. This leads to interesting connections to other mathematical topics, including the Law of Cosines, vectors, and projectile motion.

Combining Variation in Random Variables

In an introductory statistics class, students learn the following, which forms the basis of most of statistical inference:

If X and Y are two independent random variables, then the sum X + Y (or difference X – Y) has variance given by:

σ_X² + σ_Y² = σ_{X+Y}²   (or σ_X² + σ_Y² = σ_{X−Y}²)

These bear at least a formal similarity to the more famous Pythagorean Theorem, which forms the basis of most calculations of length:

If a and b are legs of a right triangle, then the third side c has length given by

a2 + b2 = c2

Is there anything to the symmetry between these two important theorems? The discussion that follows answers this question in the affirmative.

A Scenario: Life plus Monopoly

The board game Life uses a spinner numbered 1 – 10, and the game Monopoly uses two dice, numbered 1 – 6. Imagine a hybrid game that borrows from these elements, played by spinning the Life spinner, throwing one of the Monopoly dice, and subtracting the results. Your token is then advanced by this difference, making for an interesting game: you generally move forward, but might move backwards or even not move at all. How do the spaces advanced relate to the individual outcomes of the spinner and the die?
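For the true probability distributions, the variances do add exactly, since every (spin, throw) pair is equally likely. This can be checked by brute force over the full 60-outcome space; the sketch below is an illustrative Python check, not part of the original article.

```python
from itertools import product
from statistics import pvariance

spinner = range(1, 11)   # Life spinner: 1-10
die = range(1, 7)        # one Monopoly die: 1-6

# All 60 equally likely (spin, throw) pairs, and the resulting differences
diffs = [x - y for x, y in product(spinner, die)]

# For the population distributions, Var(X - Y) = Var(X) + Var(Y) exactly:
# 8.25 for the spinner, 35/12 for the die, and their sum for the difference.
print(pvariance(spinner), pvariance(die), pvariance(diffs))
```

This exactness for the population is what makes the sample results below surprising.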

At the right is a simulation of 20 plays of this game done on a TI calculator.

To follow along with exactly these numbers on your calculator, reset the random number generator as shown on the top screen; otherwise, you’ll generate a different simulation.

In the notation of the preceding section, X is the outcome of a spin, with 20 examples stored in L1, and Y is the outcome of 20 throws of a die, stored in L2. Twenty examples of the difference X – Y are stored in L3.
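The same simulation can be sketched in Python; the seed here is arbitrary, so the values will differ from the TI screens, just as a different calculator seed would.

```python
import random
from statistics import mean, stdev, variance

random.seed(2024)  # arbitrary seed; values will differ from the TI screens

spins = [random.randint(1, 10) for _ in range(20)]   # X, like list L1
throws = [random.randint(1, 6) for _ in range(20)]   # Y, like list L2
diffs = [x - y for x, y in zip(spins, throws)]       # X - Y, like list L3

# Sample statistics; stdev and variance use the n-1 divisor, matching
# the Sx reported by the calculator's 1-Var Stats.
for name, data in [("X", spins), ("Y", throws), ("X-Y", diffs)]:
    print(name, mean(data), stdev(data), variance(data))
```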

Can we use this simulation to show that the variances add, s_X² + s_Y² = s_{X−Y}²? Here is a tabulation of the variables that we have:

Random Variable   Mean           Standard deviation     Variance
X (L1)            x̄ = 5.65       s_X = 3.4683           s_X² = 12.0289
Y (L2)            ȳ = 3.45       s_Y = 2.0641           s_Y² = 4.2605
X − Y (L3)        x̄ − ȳ = 2.2    s_{X−Y} = 3.5333       s_{X−Y}² = 12.4842

The means are quite close to the expected values (5.5 and 3.5), and the mean of the difference demonstrates the property, true for all random variables, that μ_{X−Y} = μ_X − μ_Y.
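In fact, the mean of the differences equals the difference of the means exactly for any paired lists, because the mean is linear. A quick check with made-up values (these numbers are hypothetical, not the lists from the simulation):

```python
from statistics import mean

# Hypothetical spin/throw values -- the identity below holds for ANY
# paired lists of equal length, because the mean is a linear operation.
spins = [9, 2, 10, 5]
throws = [3, 6, 1, 4]
diffs = [x - y for x, y in zip(spins, throws)]

print(mean(diffs) == mean(spins) - mean(throws))  # True
```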

The variances, however, are another story: the sum of the variances of X and Y is not even close to the variance of the difference!

s_X² + s_Y² = s_{X−Y}²
12.0289 + 4.2605 ≠ 12.4842

Let’s explore this discrepancy:

s_X² + s_Y² − 3.8052 = s_{X−Y}²
12.0289 + 4.2605 − 3.8052 = 12.4842

The missing quantity is about 3.8052. Does this number have any meaning?

The original theorem stated, “If X and Y are independent, then the variances add.” If the variances don’t add, then this implies that X and Y are not independent – despite the fact that we had set up the simulation that way.

At the right is a scatterplot of the spin (X) and die (Y) values (there are only 17 points, since some combinations are repeated). This doesn’t show a striking pattern. But a linear regression reveals that the two lists of values are not independent, since the correlation between them isn’t zero – the correlation, 0.2658, measures an association.
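This correlation is exactly what accounts for the missing quantity: for sample statistics, the algebraic identity s_{X−Y}² = s_X² + s_Y² − 2·r·s_X·s_Y always holds, since the sample covariance equals r·s_X·s_Y. Plugging in the rounded values quoted above (a numerical check, so the match is only up to rounding):

```python
# Rounded sample statistics quoted in the text
s_x, s_y, r = 3.4683, 2.0641, 0.2658
s_diff_sq = 12.4842   # quoted value of s_{X-Y}^2

# The cross term 2*r*s_X*s_Y is the "missing quantity" of about 3.8052
cross_term = 2 * r * s_x * s_y
print(round(cross_term, 3))

# Reassemble the variance of the difference from the identity
print(round(s_x**2 + s_y**2 - cross_term, 2), "vs quoted", s_diff_sq)
```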

So now we have a case in which we might start a theorem, “If X and Y are not independent, then…