The notion of independence of two random variables, \(X_{1}\) and \(X_{2}\), is defined in section 6 of Chapter 7. In this section we show how the notion of independence may be formulated in terms of expectations. At the same time, by a modification of the condition for independence of random variables, we are led to the notion of uncorrelated random variables.

We begin by considering the properties of expectations of products of random variables. Let \(X_{1}\) and \(X_{2}\) be jointly distributed random variables. By the linearity properties of the operation of taking expectations, it follows that for any two functions, \(g_{1}(\cdot,\cdot)\) and \(g_{2}(\cdot,\cdot)\), \[E\left[g_{1}\left(X_{1}, X_{2}\right)+g_{2}\left(X_{1}, X_{2}\right)\right]=E\left[g_{1}\left(X_{1}, X_{2}\right)\right]+E\left[g_{2}\left(X_{1}, X_{2}\right)\right] \tag{3.1}\] if the expectations on the right side of (3.1) exist. However, it is not true that a similar relation holds for products; namely, it is not true in general that \(E\left[g_{1}\left(X_{1}, X_{2}\right) g_{2}\left(X_{1}, X_{2}\right)\right]=E\left[g_{1}\left(X_{1}, X_{2}\right)\right] E\left[g_{2}\left(X_{1}, X_{2}\right)\right]\). There is one special circumstance in which a relation similar to the foregoing is valid, namely, if the random variables \(X_{1}\) and \(X_{2}\) are independent and if the functions are functions of one variable only. More precisely, we have the following theorem:

Theorem 3A: If the random variables \(X_{1}\) and \(X_{2}\) are independent, then for any two Borel functions \(g_{1}(\cdot)\) and \(g_{2}(\cdot)\) of one real variable the product moment of \(g_{1}\left(X_{1}\right)\) and \(g_{2}\left(X_{2}\right)\) is equal to the product of their means; in symbols, 

\[E\left[g_{1}\left(X_{1}\right) g_{2}\left(X_{2}\right)\right]=E\left[g_{1}\left(X_{1}\right)\right] E\left[g_{2}\left(X_{2}\right)\right] \tag{3.2}\] 

if the expectations on the right side of (3.2) exist. 

To prove equation (3.2), it suffices to prove it in the form

\[E\left[Y_{1} Y_{2}\right]=E\left[Y_{1}\right] E\left[Y_{2}\right] \quad \text { if } Y_{1} \text { and } Y_{2} \text { are independent}, \tag{3.3}\] 

since independence of \(X_{1}\) and \(X_{2}\) implies independence of \(g_{1}\left(X_{1}\right)\) and \(g_{2}\left(X_{2}\right)\). We write out the proof of (3.3) only for the case of jointly continuous random variables. We have

\begin{align} E\left[Y_{1} Y_{2}\right] & =\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_{1} y_{2} f_{Y_{1}, Y_{2}}\left(y_{1}, y_{2}\right) d y_{1} d y_{2} \\[3mm] & =\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y_{1} y_{2} f_{Y_{1}}\left(y_{1}\right) f_{Y_{2}}\left(y_{2}\right) d y_{1} d y_{2} \\[3mm] & =\int_{-\infty}^{\infty} y_{1} f_{Y_{1}}\left(y_{1}\right) d y_{1} \int_{-\infty}^{\infty} y_{2} f_{Y_{2}}\left(y_{2}\right) d y_{2}=E\left[Y_{1}\right] E\left[Y_{2}\right], \end{align} in which the second equality uses the fact that the joint probability density function of independent random variables factors into the product of the individual probability density functions.
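
Theorem 3A, and the remark preceding it, can be checked empirically. The following is a minimal Monte Carlo sketch (in Python with the numpy library; the normal, exponential, and uniform samples and the particular functions chosen are assumptions made only for illustration). It estimates both sides of (3.2) for an independent pair, and also shows the product relation failing when the two functions involve the same random variable.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.normal(size=n)            # X1, drawn independently of X2
x2 = rng.exponential(size=n)       # X2

# Theorem 3A: for independent X1, X2 and functions of one variable each,
# E[g1(X1) g2(X2)] = E[g1(X1)] E[g2(X2)], as in (3.2).
g1, g2 = np.cos(x1), x2 ** 2
print(np.mean(g1 * g2), np.mean(g1) * np.mean(g2))   # approximately equal

# In general the product relation fails; for example, with g1 = g2 applied
# to the same variable, E[U^2] differs from (E[U])^2 whenever Var[U] > 0.
u = rng.uniform(0.0, 1.0, size=n)
print(np.mean(u * u), np.mean(u) ** 2)               # about 1/3 versus about 1/4
```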

Now suppose that we modify (3.2) and ask only that it hold for the functions \(g_{1}(x)=x\) and \(g_{2}(x)=x\) , so that

\[E\left[X_{1} X_{2}\right]=E\left[X_{1}\right] E\left[X_{2}\right]. \tag{3.4}\] 

For reasons that are explained after (3.7), two random variables, \(X_{1}\) and \(X_{2}\) , which satisfy (3.4), are said to be uncorrelated. From (2.10) it follows that \(X_{1}\) and \(X_{2}\) satisfy (3.4) and therefore are uncorrelated if and only if

\[\operatorname{Cov}\left[X_{1}, X_{2}\right]=0. \tag{3.5}\] 

For uncorrelated random variables the formula given by (2.11) for the variance of the sum of two random variables becomes particularly elegant; the variance of the sum of two uncorrelated random variables is equal to the sum of their variances. Indeed, 

\[\operatorname{Var}\left[X_{1}+X_{2}\right]=\operatorname{Var}\left[X_{1}\right]+\operatorname{Var}\left[X_{2}\right] \tag{3.6}\] 

if and only if \(X_{1}\) and \(X_{2}\) are uncorrelated. 
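
The following sketch (Python with numpy; the particular distributions are illustrative assumptions) compares the two sides of (3.6) for an uncorrelated pair and for a correlated pair.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                    # independent of x1, hence uncorrelated

# Uncorrelated case: the two sides of (3.6) agree (up to sampling error).
print(np.var(x1 + x2), np.var(x1) + np.var(x2))

# Correlated case: the two sides differ by 2 Cov[X1, X3], as in (2.11).
x3 = x1 + 0.5 * rng.normal(size=n)
print(np.var(x1 + x3), np.var(x1) + np.var(x3))      # roughly 4.25 versus 2.25
```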

Two random variables that are independent are uncorrelated, for if (3.2) holds then, a fortiori, (3.4) holds. The converse is not true in general; an example of two uncorrelated random variables that are not independent is given in theoretical exercise 3.2. In the important special case in which \(X_{1}\) and \(X_{2}\) are jointly normally distributed, it follows that they are independent if they are uncorrelated (see theoretical exercise 3.3).
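
The example referred to (it is also the hint to theoretical exercise 3.2) can be examined numerically. In the sketch below (Python with numpy), \(X=\sin 2 \pi U\) and \(Y=\cos 2 \pi U\) have vanishing covariance, yet \(X^{2}+Y^{2}=1\) identically, so the two random variables are certainly not independent.

```python
import numpy as np

# Uncorrelated but not independent: X = sin(2*pi*U), Y = cos(2*pi*U),
# with U uniformly distributed on (0, 1).
rng = np.random.default_rng(3)
u = rng.uniform(0.0, 1.0, size=1_000_000)
x, y = np.sin(2 * np.pi * u), np.cos(2 * np.pi * u)

print(np.cov(x, y)[0, 1])                   # close to 0: X and Y are uncorrelated
print(np.max(np.abs(x**2 + y**2 - 1.0)))    # essentially 0 (roundoff only):
                                            # X determines Y up to sign, so the
                                            # variables are not independent
```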

The correlation coefficient \(\rho\left(X_{1}, X_{2}\right)\) of two jointly distributed random variables with finite positive variances is defined by

\[\rho\left(X_{1}, X_{2}\right)=\frac{\operatorname{Cov}\left[X_{1}, X_{2}\right]}{\sigma\left[X_{1}\right] \sigma\left[X_{2}\right]}. \tag{3.7}\] 

In view of (3.7) and (3.5), two random variables \(X_{1}\) and \(X_{2}\) are uncorrelated if and only if their correlation coefficient is zero.
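
In practice the correlation coefficient of observed data is computed by replacing the moments in (3.7) with sample moments. A minimal sketch follows (Python with numpy; the simulated pair is an assumption made for illustration); it also compares the result with the library routine numpy.corrcoef.

```python
import numpy as np

def corr(x1, x2):
    """Sample analogue of (3.7): Cov[X1, X2] divided by sigma[X1] * sigma[X2]."""
    c = np.cov(x1, x2)                      # 2 x 2 sample covariance matrix
    return c[0, 1] / np.sqrt(c[0, 0] * c[1, 1])

rng = np.random.default_rng(4)
n = 100_000
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(size=n)          # positively correlated with x1
print(corr(x1, x2), np.corrcoef(x1, x2)[0, 1])   # the two computations agree
```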

The correlation coefficient provides a measure of how good a prediction of the value of one of the random variables can be formed on the basis of an observed value of the other. It is subsequently shown that

\[\left|\rho\left(X_{1}, X_{2}\right)\right| \leq 1. \tag{3.8}\] 

Further \(\rho\left(X_{1}, X_{2}\right)=1\) if and only if

\[\frac{X_{2}-E\left[X_{2}\right]}{\sigma\left[X_{2}\right]}=\frac{X_{1}-E\left[X_{1}\right]}{\sigma\left[X_{1}\right]}, \tag{3.9}\]

and \(\rho\left(X_{1}, X_{2}\right)=-1\) if and only if

\[\frac{X_{2}-E\left[X_{2}\right]}{\sigma\left[X_{2}\right]}=-\frac{X_{1}-E\left[X_{1}\right]}{\sigma\left[X_{1}\right]}. \tag{3.10}\] 

From (3.9) and (3.10) it follows that if the correlation coefficient equals 1 or -1, then there is perfect prediction; to a given value of one of the random variables there corresponds one and only one value that the other random variable can assume. What is even more striking is that \(\rho\left(X_{1}, X_{2}\right)= \pm 1\) if and only if \(X_{1}\) and \(X_{2}\) are linearly related, in the sense that \(X_{2}=a+b X_{1}\) for some constants \(a\) and \(b\) with \(b \neq 0\); indeed, (3.9) and (3.10) exhibit the constants explicitly.
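
A numerical illustration (Python with numpy; the particular distribution and constants are assumptions): when \(X_{2}\) is an exact linear function of \(X_{1}\), the sample correlation coefficient is \(+1\) or \(-1\) according to the sign of the slope.

```python
import numpy as np

rng = np.random.default_rng(5)
x1 = rng.normal(loc=3.0, scale=2.0, size=100_000)

# An exactly linear relation X2 = a + b*X1 gives rho = +1 or -1 with the sign of b.
print(np.corrcoef(x1, 5.0 + 2.0 * x1)[0, 1])    # 1.0 (up to roundoff)
print(np.corrcoef(x1, 5.0 - 2.0 * x1)[0, 1])    # -1.0 (up to roundoff)
```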

That (3.8), (3.9), and (3.10) hold follows from the following important theorem.

Theorem 3B. For any two jointly distributed random variables, \(X_{1}\) and \(X_{2}\), with finite second moments, \[E^{2}\left[X_{1} X_{2}\right]=\left(E\left[X_{1} X_{2}\right]\right)^{2} \leq E\left[X_{1}^{2}\right] E\left[X_{2}^{2}\right]. \tag{3.11}\] Further, equality holds in (3.11), that is, \(E^{2}\left[X_{1} X_{2}\right]=E\left[X_{1}^{2}\right] E\left[X_{2}^{2}\right]\), if and only if, for some constant \(t\), \(X_{2}=t X_{1}\), which means that the probability mass distributed over the \(\left(x_{1}, x_{2}\right)\)-plane by the joint probability law of the random variables is situated on the line \(x_{2}=t x_{1}\).

Applied to the random variables \(X_{1}-E\left[X_{1}\right]\) and \(X_{2}-E\left[X_{2}\right]\) , (3.11) states that

\[\left|\operatorname{Cov}\left[X_{1}, X_{2}\right]\right|^{2} \leq \operatorname{Var}\left[X_{1}\right] \operatorname{Var}\left[X_{2}\right], \quad\left|\operatorname{Cov}\left[X_{1}, X_{2}\right]\right| \leq \sigma\left[X_{1}\right] \sigma\left[X_{2}\right]. \tag{3.12}\] 

We prove (3.11) as follows. Define, for any real number \(t\), \(h(t)=E\left[\left(t X_{1}-X_{2}\right)^{2}\right]=t^{2} E\left[X_{1}^{2}\right]-2 t E\left[X_{1} X_{2}\right]+E\left[X_{2}^{2}\right]\). Clearly \(h(t) \geq 0\) for all \(t\). Consequently, the quadratic equation \(h(t)=0\) has either no real solutions or exactly one real solution, so that its discriminant \(4 E^{2}\left[X_{1} X_{2}\right]-4 E\left[X_{1}^{2}\right] E\left[X_{2}^{2}\right]\) cannot be positive. The equation \(h(t)=0\) has no real solutions if and only if \(E^{2}\left[X_{1} X_{2}\right]-E\left[X_{1}^{2}\right] E\left[X_{2}^{2}\right]<0\). It has exactly one solution if and only if \(E^{2}\left[X_{1} X_{2}\right]=E\left[X_{1}^{2}\right] E\left[X_{2}^{2}\right]\), in which case the solution \(t\) satisfies \(E\left[\left(t X_{1}-X_{2}\right)^{2}\right]=0\), so that \(X_{2}=t X_{1}\) in the sense described in theorem 3B. From these facts one may immediately infer (3.11) and the sentence following it.

The inequalities given by (3.11) and (3.12) are usually referred to as Schwarz’s inequality or Cauchy’s inequality.
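
The inequalities (3.11) and (3.12) can be checked on simulated data. The following is a minimal sketch (Python with numpy; the dependent pair is an arbitrary assumption made for illustration).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
x1 = rng.exponential(size=n)
x2 = x1 ** 2 + rng.normal(size=n)           # an arbitrary dependent pair

# (3.11): E^2[X1 X2] <= E[X1^2] E[X2^2]
print(np.mean(x1 * x2) ** 2, np.mean(x1 ** 2) * np.mean(x2 ** 2))

# (3.12): |Cov[X1, X2]| <= sigma[X1] sigma[X2]
print(abs(np.cov(x1, x2)[0, 1]), np.std(x1) * np.std(x2))
```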

Conditions for Independence. It is important to note the difference between two random variables being independent and being uncorrelated. They are uncorrelated if and only if (3.4) holds. It may be shown that they are independent if and only if (3.2) holds for all Borel functions \(g_{1}(\cdot)\) and \(g_{2}(\cdot)\) for which the expectations in (3.2) exist. More generally, theorem 3C can be proved.

Theorem 3C. Two jointly distributed random variables \(X_{1}\) and \(X_{2}\) are independent if and only if each of the following equivalent statements is true: 

(i) Criterion in terms of probability functions. For any Borel sets \(B_{1}\) and \(B_{2}\) of real numbers, \(P\left[X_{1} \text{ is in } B_{1}, X_{2} \text{ is in } B_{2}\right]=P\left[X_{1} \text{ is in } B_{1}\right] P\left[X_{2} \text{ is in } B_{2}\right]\).

(ii) Criterion in terms of distribution functions. For any two real numbers, \(x_{1}\) and \(x_{2}\), \(F_{X_{1}, X_{2}}\left(x_{1}, x_{2}\right)=F_{X_{1}}\left(x_{1}\right) F_{X_{2}}\left(x_{2}\right)\).

(iii) Criterion in terms of expectations. For any two Borel functions, \(g_{1}(\cdot)\) and \(g_{2}(\cdot), E\left[g_{1}\left(X_{1}\right) g_{2}\left(X_{2}\right)\right]=E\left[g_{1}\left(X_{1}\right)\right] E\left[g_{2}\left(X_{2}\right)\right]\) if the expectations involved exist. 

(iv) Criterion in terms of moment-generating functions (if they exist). For any two real numbers, \(t_{1}\) and \(t_{2}\)

\[\psi_{X_{1}, X_{2}}\left(t_{1}, t_{2}\right)=E\left[e^{t_{1} X_{1}+t_{2} X_{2}}\right]=\psi_{X_{1}}\left(t_{1}\right) \psi_{X_{2}}\left(t_{2}\right). \tag{3.13}\] 
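
Criterion (iv) can be illustrated by Monte Carlo for a pair known to be independent. The sketch below (Python with numpy; the normal and gamma samples and the values of \(t_{1}\) and \(t_{2}\) are assumptions) estimates the joint moment-generating function and the product of the marginal ones at a fixed point.

```python
import numpy as np

# Monte Carlo check of criterion (iv) for an assumed independent pair.
rng = np.random.default_rng(7)
n = 1_000_000
x1 = rng.normal(size=n)
x2 = rng.gamma(shape=2.0, size=n)            # drawn independently of x1

t1, t2 = 0.3, 0.2                            # fixed real numbers, chosen so that
                                             # the moment-generating functions exist
joint = np.mean(np.exp(t1 * x1 + t2 * x2))   # estimate of psi_{X1,X2}(t1, t2)
product = np.mean(np.exp(t1 * x1)) * np.mean(np.exp(t2 * x2))
print(joint, product)                        # approximately equal, as (3.13) asserts
```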

Theoretical Exercises

3.1. The standard deviation has the properties of the operation of taking the absolute value of a number: show first that for any 2 real numbers, \(x\) and \(y\), \(|x+y| \leq|x|+|y|\) and \(\bigl||x|-|y|\bigr| \leq|x-y|\).

Hint: Square both sides of the inequalities. Show next that for any 2 random variables, \(X\) and \(Y\),

\[\sigma[X+Y] \leq \sigma[X]+\sigma[Y], \quad|\sigma[X]-\sigma[Y]| \leq \sigma[X-Y]. \tag{3.14}\] 

Give an example to prove that the variance does not satisfy similar relationships.

3.2. Show that independent random variables are uncorrelated. Give an example to show that the converse is false.

Hint: Let \(X=\sin 2 \pi U\), \(Y=\cos 2 \pi U\), in which \(U\) is uniformly distributed over the interval 0 to 1.

3.3. Prove that if \(X_{1}\) and \(X_{2}\) are jointly normally distributed random variables whose correlation coefficient vanishes, then \(X_{1}\) and \(X_{2}\) are independent. Hint: Use example 2A.

3.4. Let \(\alpha\) and \(\beta\) be the values of \(a\) and \(b\) which minimize

\[f(a, b)=E\left|X_{2}-a-b X_{1}\right|^{2}.\] 

Express \(\alpha, \beta\) , and \(f(\alpha, \beta)\) in terms of \(\rho\left(X_{1}, X_{2}\right)\) . The random variable \(\alpha+\beta X_{1}\) is called the best linear predictor of \(X_{2}\) , given \(X_{1}\) [see Section 7, in particular, (7.13) and (7.14)].

3.5. Prove that (3.9) and (3.10) hold under the conditions stated.

3.6. Let \(X_{1}\) and \(X_{2}\) be jointly distributed random variables possessing finite second moments. State conditions under which it is possible to find 2 uncorrelated random variables, \(Y_{1}\) and \(Y_{2}\) , which are linear combinations of \(X_{1}\) and \(X_{2}\) (that is, \(Y_{1}=a_{11} X_{1}+a_{12} X_{2}\) and \(Y_{2}=a_{21} X_{1}+a_{22} X_{2}\) for some constants \(a_{11}, a_{12}, a_{21}, a_{22}\) and \(\operatorname{Cov}\left[Y_{1}, Y_{2}\right]=0\) ).

3.7. Let \(X\) and \(Y\) be jointly normally distributed with means 0, arbitrary variances, and correlation coefficient \(\rho\). Show that \begin{align} & P[X \geq 0, Y \geq 0]=P[X \leq 0, Y \leq 0]=\frac{1}{4}+\frac{1}{2 \pi} \sin ^{-1} \rho, \\ & P[X \leq 0, Y \geq 0]=P[X \geq 0, Y \leq 0]=\frac{1}{4}-\frac{1}{2 \pi} \sin ^{-1} \rho. \end{align} Hint: Consult H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946, p. 290.

3.8. Suppose that \(n\) tickets bear arbitrary numbers \(x_{1}, x_{2}, \ldots, x_{n}\) , which are not all the same. Suppose further that 2 of the tickets are selected at random without replacement. Show that the correlation coefficient \(\rho\) between the numbers appearing on the 2 tickets is equal to \((-1) /(n-1)\) .

3.9. In an urn containing \(N\) balls, a proportion \(p\) are white and a proportion \(q=1-p\) are black. A ball is drawn and its color noted. The ball drawn is then replaced, and \(N r\) balls of the same color as the ball drawn are added. The process is repeated until \(n\) balls have been drawn. For \(j=1,2, \ldots, n\) let \(X_{j}\) be equal to 1 or 0, depending on whether the ball drawn on the \(j\)th draw is white or black. Show that the correlation coefficient between \(X_{i}\) and \(X_{j}\), for \(i \neq j\), is equal to \(r /(1+r)\). Note that the case \(r=-1 / N\) corresponds to sampling without replacement, and \(r=0\) corresponds to sampling with replacement.

Exercises

3.1. Consider 2 events \(A\) and \(B\) such that \(P[A]=\frac{1}{4}\), \(P[B \mid A]=\frac{1}{2}\), \(P[A \mid B]=\frac{1}{4}\). Define random variables \(X\) and \(Y\): \(X=1\) or 0, depending on whether the event \(A\) has or has not occurred, and \(Y=1\) or 0, depending on whether the event \(B\) has or has not occurred. Find \(E[X], E[Y], \operatorname{Var}[X], \operatorname{Var}[Y]\), \(\rho(X, Y)\). Are \(X\) and \(Y\) independent?

 

Answer

\(E[X]=\frac{1}{4}, E[Y]=\frac{1}{2}, \operatorname{Var}[X]=\frac{3}{16}, \operatorname{Var}[Y]=\frac{1}{4}, \rho(X, Y)=0; X\) and \(Y\) are independent.

 

3.2. Consider a sample of size 2 drawn with replacement (without replacement) from an urn containing 4 balls, numbered 1 to 4. Let \(X_{1}\) be the smallest and \(X_{2}\) be the largest among the numbers drawn in the sample. Find \(\rho\left(X_{1}, X_{2}\right)\) .

3.3. Two fair coins, each with faces numbered 1 and 2, are thrown independently. Let \(X\) denote the sum of the 2 numbers obtained, and let \(Y\) denote the maximum of the numbers obtained. Find the correlation coefficient between \(X\) and \(Y\) .

 

Answer

\(\sqrt{2 / 3}\) .

 

3.4. Let \(U, V\) , and \(W\) be uncorrelated random variables with equal variances. Let \(X=U+V, Y=U+W\) . Find the correlation coefficient between \(X\) and \(Y\) .

3.5. Let \(X_{1}\) and \(X_{2}\) be uncorrelated random variables. Find the correlation \(\rho\left(Y_{1}, Y_{2}\right)\) between the random variables \(Y_{1}=X_{1}+X_{2}\) and \(Y_{2}=X_{1}-X_{2}\) in terms of the variances of \(X_{1}\) and \(X_{2}\) .

 

Answer

\(\left(\sigma_{1}^{2}-\sigma_{2}^{2}\right) /\left(\sigma_{1}^{2}+\sigma_{2}^{2}\right)\), where \(\sigma_{i}^{2}=\operatorname{Var}\left[X_{i}\right]\).

 

3.6. Let \(X_{1}\) and \(X_{2}\) be uncorrelated normally distributed random variables. Find the correlation \(\rho\left(Y_{1}, Y_{2}\right)\) between the random variables \(Y_{1}=X_{1}^{2}\) and \(Y_{2}=X_{2}^{2}\) .

3.7. Consider the random variables whose joint moment-generating function is given in exercise 2.6. Find \(\rho\left(X_{1}, X_{2}\right)\).

 

Answer

\(4 a-1\) .

 

3.8. Consider the random variables whose joint moment-generating function is given in exercise 2.7. Find \(\rho\left(X_{1}, X_{2}\right)\).

3.9. Consider the random variables whose joint moment-generating function is given in exercise 2.8. Find \(\rho\left(X_{1}, X_{2}\right)\).

 

Answer

\(e^{-\left(a_{2}-a_{1}\right)}\) .

 

3.10. Consider the random variables whose joint moment-generating function is given in exercise 2.9. Find \(\rho\left(X_{1}, X_{2}\right)\).