In the applications of probability theory to real phenomena two results of the mathematical theory of probability play a conspicuous role. These results are known as the law of large numbers and the central limit theorem. At this point in this book we have sufficient mathematical tools available to show how to apply these basic results. In Chapters 9 and 10 we develop the additional mathematical tools required to prove these theorems with a sufficient degree of generality.

A set of \(n\) observations \(X_{1}, X_{2}, \ldots, X_{n}\) is said to constitute a random sample of a random variable \(X\) if \(X_{1}, X_{2}, \ldots, X_{n}\) are independent random variables, identically distributed as \(X\). Let

\[S_{n}=X_{1}+X_{2}+\cdots+X_{n}, \tag{5.1}\] 

be the sum of the observations. Their arithmetic mean

\[M_{n}=\frac{1}{n} S_{n}, \tag{5.2}\] 

is called the sample mean.

By (4.1), (4.6), and (4.7), we obtain the following expressions for the mean, variance, and moment-generating function of \(S_{n}\) and \(M_{n}\) , in terms of the mean, variance, and moment-generating function of \(X\) (assuming these exist):

\begin{align} E\left[S_{n}\right] & =n E[X], & \operatorname{Var}\left[S_{n}\right]=n \operatorname{Var}[X], & \quad \psi_{S_{n}}(t)=\left[\psi_{X}(t)\right]^{n}. \tag{5.3}\\ E\left[M_{n}\right] & =E[X], & \operatorname{Var}\left[M_{n}\right]=\frac{1}{n} \operatorname{Var}[X], & \quad \psi_{M_{n}}(t)=\left[\psi_{X}\left(\frac{t}{n}\right)\right]^{n}. \tag{5.4} \end{align} 

From (5.4) we obtain the striking fact that the variance of the sample mean \(M_{n}=(1 / n) S_{n}\) tends to 0 as the sample size \(n\) tends to infinity. Now, by Chebyshev’s inequality, it follows that if a random variable has a small variance, then it is approximately equal to its mean, in the sense that with probability close to 1 an observation of the random variable will yield an observed value approximately equal to the mean of the random variable; in particular, the probability is at least 0.99 that an observed value of the random variable is within 10 standard deviations of its mean. We have thus established that the sample mean of a random sample \(X_{1}, X_{2}, \ldots, X_{n}\) of a random variable, with a probability that can be made as close to 1 as desired by taking a large enough sample, is approximately equal to the ensemble mean \(E[X]\). This fact, known as the law of large numbers, was first established by Bernoulli in 1713 (see section 5 of Chapter 5). The validity of the law of large numbers is the mathematical expression of the fact that increasingly accurate measurements of a quantity (such as the length of a rod) are obtained by averaging an increasingly large number of observations of the value of the quantity. A precise mathematical statement and proof of the law of large numbers is given in Chapter 10.
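The law of large numbers is easy to observe empirically. The following short simulation (a sketch added here for illustration; the exponential distribution and the tolerance 0.1 are arbitrary choices, not part of the text) draws repeated samples of increasing size \(n\) and estimates the probability that the sample mean \(M_{n}\) falls within 0.1 of \(E[X]\):

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 2000

# X is taken to be exponential with mean E[X] = 1 (an arbitrary choice).
# By the law of large numbers, P[|M_n - E[X]| <= 0.1] should approach 1.
for n in [10, 100, 1000, 10000]:
    means = rng.exponential(1.0, size=(trials, n)).mean(axis=1)
    frac = np.mean(np.abs(means - 1.0) <= 0.1)
    print(f"n = {n:5d}:  estimated P[|M_n - 1| <= 0.1] = {frac:.3f}")
```

The estimated probabilities increase toward 1 as \(n\) grows, as Chebyshev’s inequality applied to \(\operatorname{Var}[M_{n}]=\operatorname{Var}[X]/n\) predicts.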

However, even more can be proved about the sample mean than that it tends to be equal to the mean. One can approximately evaluate, for any interval about the mean, the probability that the sample mean will have an observed value in that interval, since the sample mean is approximately normally distributed. More generally, it may be shown that if \(S_{n}\) is the sum of independent identically distributed random variables \(X_{1}, X_{2}, \ldots, X_{n}\), with finite means and variances, then, for any real numbers \(a < b\), approximately

\begin{align} P\left[a \leq S_{n} \leq b\right] & =P\left[\frac{a-E\left[S_{n}\right]}{\sigma\left[S_{n}\right]} \leq \frac{S_{n}-E\left[S_{n}\right]}{\sigma\left[S_{n}\right]} \leq \frac{b-E\left[S_{n}\right]}{\sigma\left[S_{n}\right]}\right] \tag{5.5}\\[3mm] & =\Phi\left(\frac{b-E\left[S_{n}\right]}{\sigma\left[S_{n}\right]}\right)-\Phi\left(\frac{a-E\left[S_{n}\right]}{\sigma\left[S_{n}\right]}\right). \end{align} 

In words, (5.5) may be expressed as follows: the sum of a large number of independent identically distributed random variables with finite means and variances, normalized to have mean zero and variance 1, is approximately normally distributed. Equation (5.5) represents a rough statement of one of the most important theorems of probability theory. In 1920 G. Pólya gave this theorem the name “the central limit theorem of probability theory”. This name continues to be used today, although a more apt description would be “the normal convergence theorem”. The central limit theorem was first proved by De Moivre in 1733 for the case in which \(X_{1}, X_{2}, \ldots, X_{n}\) are Bernoulli random variables, so that \(S_{n}\) is then a binomial random variable. A proof of (5.5) in this case (with a continuity correction) was given in section 2 of Chapter 6. The determination of the exact conditions for the validity of (5.5) constituted the outstanding problem of probability theory from its beginning until the 1930s. A precise mathematical statement and proof of the central limit theorem is given in Chapter 10.
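For the binomial case first treated by De Moivre, the quality of the approximation (5.5) can be examined directly. The sketch below (an illustration added here; the values of \(n\), \(p\), \(a\), and \(b\) are arbitrary choices) compares (5.5) with the exact binomial probability, computing \(\Phi\) from the error function:

```python
import math

def phi(x):
    """Standard normal distribution function Φ(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# S_n binomial with parameters n and p, so E[S_n] = np and σ[S_n] = sqrt(npq).
n, p = 100, 0.5
mean, sd = n * p, math.sqrt(n * p * (1.0 - p))

a, b = 45, 55
approx = phi((b - mean) / sd) - phi((a - mean) / sd)
exact = sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(a, b + 1))
print(f"normal approximation (5.5): {approx:.4f}")   # 0.6827
print(f"exact binomial probability: {exact:.4f}")    # 0.7287
```

The discrepancy is largely removed by the continuity correction discussed in section 2 of Chapter 6 (replacing \(a\) and \(b\) by \(a-\tfrac{1}{2}\) and \(b+\tfrac{1}{2}\)).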

It may be of interest to outline the basic idea of the proof of (5.5), even though the mathematical tools are not at hand to justify the statements made. To prove (5.5) it suffices to prove that the moment-generating function

\[\psi_{n}(t)= E\left[e^{t\left(S_{n}-E\left[S_{n}\right]\right) / \sigma\left[S_{n}\right]}\right]=\left\{\psi_{X-E[X]}\left(\frac{t}{\sqrt{n} \sigma[X]}\right)\right\}^{n} \tag{5.6}\] 

satisfies for \(t\) in a neighborhood of 0

\[\lim _{n \rightarrow \infty} \log \psi_{n}(t)=\frac{t^{2}}{2}, \tag{5.7}\] 

in which \(t^{2} / 2\) is the logarithm of the moment-generating function of a random variable that is \(N(0,1)\) . Now, expanding in Taylor series,

\[\psi_{X-E[X]}(u)=1+\frac{1}{2} \sigma^{2}[X] u^{2}+A(u), \tag{5.8}\] 

where the remainder \(A(u)\) satisfies the condition \(\lim_{u \rightarrow 0} A(u) / u^{2}=0\) . Similarly, \(\log (1+v)=v+B(v)\) where \(\displaystyle\lim _{v \rightarrow 0} B(v) / v=0\) . Consequently one may show that for values of \(u\) sufficiently close to 0

\[\log \psi_{X-E[X]}(u)=\frac{1}{2} \sigma^{2}[X] u^{2}+C(u), \tag{5.9}\] 

where

\[\lim _{u \rightarrow 0} \frac{C(u)}{u^{2}}=0. \tag{5.10}\] 

It then follows that

\[\log \psi_{n}(t)=n \log \psi_{X-E[X]}\left(\frac{t}{\sqrt{n} \sigma[X]}\right)=\frac{t^{2}}{2}+n C\left(\frac{t}{\sqrt{n} \sigma[X]}\right), \tag{5.11}\] 

where

\[\lim _{n \rightarrow \infty} n C\left(\frac{t}{\sqrt{n} \sigma[X]}\right)=0. \tag{5.12}\] 

From (5.11) and (5.12) one obtains (5.7). Our heuristic outline of the proof of (5.5) is now complete.
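The limit (5.7) can also be observed numerically. As an illustration (a sketch; the choice of distribution is arbitrary), take \(X\) exponential with mean 1, so that \(\sigma[X]=1\) and \(\psi_{X-E[X]}(u)=e^{-u}/(1-u)\) for \(u<1\); then \(\log \psi_{n}(t)=n \log \psi_{X-E[X]}(t/\sqrt{n})\) may be evaluated for increasing \(n\):

```python
import math

# For X exponential with mean 1 (hence σ[X] = 1), the moment-generating
# function of X - E[X] is ψ(u) = exp(-u) / (1 - u), valid for u < 1.
def log_psi_centered(u):
    return -u - math.log(1.0 - u)

t = 1.0
for n in [10, 100, 1000, 10000]:
    u = t / math.sqrt(n)              # the argument appearing in (5.6)
    print(f"n = {n:5d}:  log ψ_n({t}) = {n * log_psi_centered(u):.4f}")
print(f"limit t²/2 = {t * t / 2}")
```

The computed values (0.639, 0.536, 0.511, 0.503) approach \(t^{2}/2=0.5\), in agreement with (5.7).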

Given any random variable \(X\) with finite mean and variance, we define its standardization, denoted by \(X^{*}\), as the random variable

\[X^{*}=\frac{X-E[X]}{\sigma[X]}. \tag{5.13}\] 

The standardization \(X^{*}\) is a dimensionless random variable, with mean \(E\left[X^{*}\right]=0\) and variance \(\sigma^{2}\left[X^{*}\right]=1\) .

The central limit theorem of probability theory can now be formulated: The standardization \(\left(S_{n}\right)^{*}\) of the sum \(S_{n}\) of a large number \(n\) of independent and identically distributed random variables is approximately normally distributed . In Chapter 10 it is shown that this result may be considerably extended to include cases in which \(S_{n}\) is the sum of dependent nonidentically distributed random variables.
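As a simulation check of this formulation (a sketch added for illustration; the uniform distribution and the sample sizes are arbitrary choices), one may compare the distribution function of \((S_{n})^{*}\), estimated from repeated trials, with \(\Phi\):

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n, trials = 500, 50000

# X uniform on (0, 1): E[X] = 1/2 and σ[X] = 1/sqrt(12).
sums = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)
s_star = (sums - n * 0.5) / math.sqrt(n / 12.0)   # (S_n)*, as in (5.13)

for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    empirical = np.mean(s_star <= z)
    target = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))   # Φ(z)
    print(f"P[(S_n)* <= {z:+.0f}]:  simulated {empirical:.4f},  Φ = {target:.4f}")
```

The simulated values agree with \(\Phi(z)\) to within sampling error, even though the individual summands are far from normally distributed.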

Example 5A. Reliability. Evaluation of the reliability of rockets is a problem of obvious importance in the space age. By the reliability of a rocket one means the probability \(p\) that an attempted launching of the rocket will be successful. Suppose that rockets of a certain type have, by many tests, been established as \(90 \%\) reliable. Suppose that a modification of the rocket design is being considered. Which of the following sets of evidence throws more doubt on the hypothesis that the modified rocket is only \(90 \%\) reliable: (i) of 100 modified rockets tested, 96 performed satisfactorily; (ii) of 64 modified rockets tested, 62 (equal to \(96.9 \%\)) performed satisfactorily?

 

Solution

Let \(S_{1}\) be the number of rockets in the group of 100 which performed satisfactorily, and let \(S_{2}\) be the number of rockets in the group of 64 which performed satisfactorily. If \(p\) is the reliability of a rocket, then \(S_{1}\) and \(S_{2}\) have standardizations (since \(S_{1}\) and \(S_{2}\) have binomial distributions):

 

\[\left(S_{1}\right)^{*}=\frac{S_{1}-100 p}{10 \sqrt{p q}}, \quad\left(S_{2}\right)^{*}=\frac{S_{2}-64 p}{8 \sqrt{p q}}.\] 

If \(p=0.9, S_{1}=96\) , and \(S_{2}=62\) , then \(\left(S_{1}\right)^{*}=2\) and \(\left(S_{2}\right)^{*}=1 \frac{5}{6}\) . If \(\left(S_{1}\right)^{*}\) is \(N(0,1)\) , the probability of observing a value of \(\left(S_{1}\right)^{*}\) greater than or equal to 2 is 0.023. If \(\left(S_{2}\right)^{*}\) is \(N(0,1)\) , the probability of observing a value of \(\left(S_{2}\right)^{*}\) greater than or equal to 1.83 is 0.034. Consequently, scoring 96 successes in 100 tries is better evidence than scoring 62 successes in 64 tries for the hypothesis that the modified rocket has a higher reliability than the original rocket.
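The tail probabilities quoted in the solution can be reproduced as follows (a sketch added for illustration, computing \(\Phi\) from the error function):

```python
import math

def phi(x):
    """Standard normal distribution function Φ(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

p, q = 0.9, 0.1
for n, successes in [(100, 96), (64, 62)]:
    z = (successes - n * p) / math.sqrt(n * p * q)   # the standardization
    print(f"n = {n:3d}, successes = {successes}: "
          f"z = {z:.3f},  P[Z >= z] = {1.0 - phi(z):.3f}")
```

This reproduces \(z = 2\) and \(z = 1.833\), with tail probabilities of about 0.023 and 0.033 (the 0.034 quoted above reflects rounding \(z\) to 1.83 before consulting the tables).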

Example 5B. Brownian motion and random walk. A particle (of diameter \(10^{-4}\) centimeter, say) immersed in a liquid or gas exhibits ceaseless irregular motions that are discernible under the microscope. The motion of such a particle is called Brownian, after the Scottish botanist Robert Brown, who noticed the phenomenon in 1827. The same phenomenon is also exhibited in striking fashion by smoke particles suspended in air. The explanation of the phenomenon of Brownian motion was one of the major successes of statistical mechanics and kinetic theory. In 1905 Einstein showed that the Brownian motion could be explained by assuming that the particles are subject to the continual bombardment of the molecules of the surrounding medium. The theoretical results of Einstein were soon confirmed by the exact experimental work of Perrin. To appreciate the importance of these events, the reader should be aware that in the years around 1900 atoms and molecules were far from being accepted as they are today; there were still physicists who did not believe in them. After Einstein’s work this was no longer possible (see Max Born, Natural Philosophy of Cause and Chance, Oxford, 1949, p. 63). If we let \(S_{t}\) denote the displacement after \(t\) minutes of a particle in Brownian motion from its starting point, Einstein showed that \(S_{t}\) has probability density function

\[f_{S_{t}}(x)=\left(\frac{1}{4 \pi D t}\right)^{1 / 2} e^{-x^{2} / 4 D t}, \tag{5.14}\] 

in which \(D\) is a constant, called the diffusion coefficient, which depends on the absolute temperature and friction coefficient of the surrounding medium. In words, \(S_{t}\) is normally distributed with mean 0 and variance

\[E\left[S_{t}^{2}\right]=2 D t. \tag{5.15}\] 

The result given by (5.15) is especially important; it states that the mean square displacement \(E\left[S_{t}^{2}\right]\) of a particle in Brownian motion is proportional to the time \(t\) . A model for Brownian motion is provided by a particle undergoing a random walk. Let \(X_{1}, X_{2}, \ldots, X_{n}\) be independent random variables, identically distributed as a random variable \(X\) , which has mean \(E[X]=0\) and finite variance \(E\left[X^{2}\right]\) . The sum \(S_{n}=X_{1}+X_{2}+\cdots+X_{n}\) represents the displacement from its starting position of a point (or particle) performing a random walk on a straight line by taking at the \(k\)th step a displacement \(X_{k}\) . After \(n\) steps, the total displacement \(S_{n}\) has a mean and mean square given by

\[E\left[S_{n}\right]=0, \quad E\left[S_{n}^{2}\right]=n E\left[X^{2}\right]. \tag{5.16}\] 

Thus the mean-square displacement of a particle undergoing a random walk is proportional to the number of steps \(n\) . Since \(S_{n}\) is approximately normally distributed in the sense that (5.5) holds, it might be thought that the probability density function of \(S_{n}\) is approximately given by

\[f_{S_{n}}(x)=\left(\frac{1}{2 \pi B n}\right)^{1 / 2} e^{-x^{2} / 2 B n}, \tag{5.17}\] 

in which \(B=E\left[X^{2}\right]\) . However, (5.17) represents a stronger conclusion than (5.5). Equation (5.17) is a normal convergence theorem for probability density functions, whereas (5.5) is a normal convergence theorem for distribution functions; (5.17) implies (5.5), but the converse is not true. It may be shown that a sufficient condition for the validity of (5.17) is that the random variable \(X\) possesses a square integrable probability density function. From the fact that \(S_{n}\) is approximately normally distributed in the sense that (5.5) holds it follows that it is very improbable that a value of \(S_{n}\) will be observed more than 3 or 4 standard deviations from its mean. Consequently, in a random walk in which the individual steps have mean 0 it is very unlikely after \(n\) steps that the distance from the origin will be greater than \(4 \sigma[X] \sqrt{n}\) .

Exercises

5.1. Which of the following sets of evidence throws more doubt on the hypothesis that newborn babies are as likely to be boys as girls: (i) of 10,000 newborn babies, 5100 are male; (ii) of 1000 newborn babies, 510 are male?

 

Answer

(i) throws more doubt than (ii).
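A quick check of this answer (a sketch added for illustration), using the standardization of the binomial count under the hypothesis \(p=\frac{1}{2}\):

```python
import math

# Standardized score of the observed number of boys under p = 1/2.
for n, boys in [(10000, 5100), (1000, 510)]:
    z = (boys - 0.5 * n) / math.sqrt(n * 0.25)
    print(f"n = {n:5d}, boys = {boys}:  z = {z:.2f}")
# z = 2.00 for set (i) but only 0.63 for set (ii), so (i) throws more doubt.
```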

 

5.2. The game of roulette is described in example 1D. Find the probability that the total amount of money lost by a gambling house on 100,000 bets made by the public on an odd outcome at roulette will be negative.

5.3. As an estimate of the unknown mean \(E[X]\) of a random variable, it is customary to take the sample mean \(\bar{X}=\left(X_{1}+X_{2}+\cdots+X_{n}\right) / n\) of a random sample \(X_{1}, X_{2}, \ldots, X_{n}\) of the random variable \(X\) . How large a sample should one observe if there is to be a probability of at least 0.95 that the sample mean \(\bar{X}\) will not differ from the true mean \(E[X]\) by more than \(25 \%\) of the standard deviation \(\sigma[X]\) ?

 

Answer

62.
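This answer can be recovered as follows (a sketch added for illustration): since \(\sigma[\bar{X}]=\sigma[X]/\sqrt{n}\), the requirement is \(2\Phi(0.25\sqrt{n})-1 \geq 0.95\), that is, \(0.25\sqrt{n} \geq 1.96\):

```python
import math

# Smallest n with 0.25 * sqrt(n) >= 1.96, the upper 2.5% point of N(0, 1).
n = 1
while 0.25 * math.sqrt(n) < 1.96:
    n += 1
print(n)   # prints 62
```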

 

5.4. A man plays a game in which his probability of winning or losing a dollar is \(\frac{1}{2}\) . Let \(S_{n}\) be the man’s fortune (that is, the amount he has won or lost) after \(n\) independent plays of the game.

(i) Find \(E\left[S_{n}\right]\) and \(\operatorname{Var}\left[S_{n}\right]\) . Hint : Write \(S_{n}=X_{1}+\cdots+X_{n}\) , in which \(X_{i}\) is the change in the man’s fortune on the \(i\) th play of the game.

(ii) Find approximately the probability that after 10,000 plays of the game the change in the man’s fortune will be between -50 and 50 dollars.

5.5. Consider a game of chance in which one may win 10 dollars or lose \(1,2,3\) , or 4 dollars; each possibility has probability 0.20. How many times can this game be played if there is to be a probability of at least \(95 \%\) that in the final outcome the average gain or loss per game will be between -2 and +2?

 

Answer

25 or more.

 

5.6. A certain gambler’s daily income (in dollars) is a random variable \(X\) uniformly distributed over the interval -3 to 3.

(i) Find approximately the probability that after 100 days of independent play he will have won more than 200 dollars.

(ii) Find the quantity \(A\) such that the probability is greater than \(95 \%\) that the gambler’s winnings (which may be negative) in 100 independent days of play will be greater than \(A\) .
(iii) Determine the number of days the gambler can play in order to have a probability greater than \(95 \%\) that his total winnings on these days will be less than 180 dollars in absolute value.

5.7. Add 100 real numbers, each of which is rounded off to the nearest integer. Assume that each rounding-off error is a random variable uniformly distributed between \(-\frac{1}{2}\) and \(\frac{1}{2}\) and that the 100 rounding-off errors are independent. Find approximately the probability that the error in the sum will be between -3 and 3. Find the quantity \(A\) such that the probability is approximately \(99 \%\) that the error in the sum will be less than \(A\) in absolute value.

 

Answer

\(0.70 ; 7.4\) .
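These values can be checked as follows (a sketch added for illustration; the normal approximation (5.5) is applied to the sum of the 100 rounding-off errors, which has mean 0 and variance \(100/12\)):

```python
import math

def phi(x):
    """Standard normal distribution function Φ(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

sd = math.sqrt(100.0 / 12.0)                                      # σ of the total error
print(f"P[-3 <= error <= 3] = {2.0 * phi(3.0 / sd) - 1.0:.2f}")   # 0.70
print(f"A with P[|error| < A] = 0.99:  A = {2.576 * sd:.1f}")     # 7.4
```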

 

5.8. If each strand in a rope has a breaking strength with mean 20 pounds and standard deviation 2 pounds, and the breaking strength of a rope is the sum of the (independent) breaking strengths of all the strands, what is the probability that a rope made up of 64 strands will support a weight of (i) 1280 pounds, (ii) 1240 pounds?

5.9. A delivery truck carries loaded cartons of items. If the weight of each carton is a random variable, with mean 50 pounds and standard deviation 5 pounds, how many cartons can the truck carry so that the probability of the total load exceeding 1 ton will be less than \(5 \%\) ? State any assumptions made.

 

Answer

38.

 

5.10. Consider light bulbs, produced by a machine, whose life \(X\) in hours is a random variable obeying an exponential probability law with a mean lifetime of 1000 hours.

(i) Find approximately the probability that a sample of 100 bulbs selected at random from the output of the machine will contain between 30 and 40 bulbs with a lifetime greater than 1020 hours.

(ii) Find approximately the probability that the sum of the lifetimes of 100 bulbs selected randomly from the output of the machine will be less than 110,000 hours.

5.11. The apparatus known as Galton’s quincunx is described in exercise 2.10 of Chapter 6. Assume that in passing from one row to the next the change \(X\) in the abscissa of a ball is a random variable, with the following probability law: \(P\left[X=\frac{1}{2}\right]=P\left[X=-\frac{1}{2}\right]=\frac{1}{2}-\eta,\; P\left[X=\frac{3}{2}\right]=P\left[X=-\frac{3}{2}\right]=\) \(\eta\) , in which \(\eta\) is an unknown constant. In an experiment performed with a quincunx consisting of 100 rows, it was found that \(80 \%\) of the balls inserted into the apparatus passed through the 21 central openings of the last row (that is, the openings with abscissas \(0, \pm 1, \pm 2, \ldots, \pm 10\) ). Determine the value of \(\eta\) consistent with this result.

 

Answer

\(\eta=0.10\) .

 

5.12. A man invests a total of \(N\) dollars in a group of \(n\) securities, whose rates of return (interest rates) are independent random variables \(X_{1}, X_{2}, \ldots, X_{n}\) , respectively, with means \(i_{1}, i_{2}, \ldots, i_{n}\) and variances \(\sigma_{1}^{2}, \sigma_{2}^{2}, \ldots, \sigma_{n}^{2}\) , respectively. If the man invests \(N_{j}\) dollars in the \(j\) th security, then his return in dollars on this particular portfolio is a random variable \(R\) given by \(R=N_{1} X_{1}+N_{2} X_{2}+\cdots+N_{n} X_{n}\) . Let the standard deviation \(\sigma[R]\) of \(R\) be used as a measure of the risk involved in selecting a given portfolio of securities. In particular, let us consider the problem of distributing investments of 5500 dollars between two securities, one of which has a rate of return \(X_{1}\) , with mean \(6 \%\) and standard deviation \(1 \%\) , whereas the other has a rate of return \(X_{2}\) with mean \(15 \%\) and standard deviation \(10 \%\) .

(i) If it is desired to hold the risk to a minimum, what amounts \(N_{1}\) and \(N_{2}\) should be invested in the respective securities? What are the mean and variance of the return from this portfolio?

(ii) What is the amount of risk that must be taken in order to achieve a portfolio whose mean return is equal to 400 dollars?

(iii) By means of Chebyshev’s inequality, find an interval, symmetric about 400 dollars, that, with probability greater than \(75 \%\) , will contain the return \(R\) from the portfolio with a mean return \(E[R]=400\) dollars. Would you be justified in assuming that the return \(R\) is approximately normally distributed?