Consider an experiment with two possible outcomes, denoted success and failure. Suppose, however, that the probability \(p\) of success at each trial is unknown. According to the frequency interpretation of probability, \(p\) represents the relative frequency of successes in an indefinitely prolonged series of trials. Consequently, one might think that in order to determine \(p\) one need only perform a long series of trials and take as the value of \(p\) the observed relative frequency of success. The question arises: can this procedure be justified, not by appealing to the frequency interpretation of probability, but by appealing to the mathematical theory of probability?

The mathematical theory of probability is a logical construct, consisting of conclusions logically deduced from the axioms of probability theory. These conclusions are applicable to the world of real experience in the sense that they are conclusions about real phenomena, which are assumed to satisfy the axioms. We now show that one can reach a conclusion within the mathematical theory of probability that may be interpreted as justifying the frequency interpretation of probability (and consequently may be used to justify the procedure described for estimating \(p\)). This result is known as the law of large numbers, since it applies to the outcome of a large number of trials. The law of large numbers we are about to investigate may be considerably generalized. Consequently, the version to be discussed is called the Bernoulli law of large numbers, as it was first discovered by Jacob Bernoulli and published in his posthumous book Ars Conjectandi (1713).

The Bernoulli Law of Large Numbers. Let \(S_{n}\) be the observed number of successes in \(n\) independent repeated Bernoulli trials, with probability \(p\) of success at each trial. Let

\[f_{n}=\frac{S_{n}}{n} \tag{5.1}\] 

denote the relative frequency of successes in the \(n\) trials. Then, for any positive number \(\epsilon\), no matter how small, it follows that

\begin{align} & \lim _{n \rightarrow \infty} P\left[\left|f_{n}-p\right| \leq \epsilon\right]=1, \tag{5.2} \\ & \lim _{n \rightarrow \infty} P\left[\left|f_{n}-p\right|>\epsilon\right]=0 . \tag{5.3} \end{align} 

In words, (5.2) and (5.3) state that as the number \(n\) of trials tends to infinity, the relative frequency of successes in \(n\) trials tends to the true probability \(p\) of success at each trial, in the probabilistic sense that it becomes less and less probable that \(f_{n}\) and \(p\) will differ by more than any fixed nonzero amount \(\epsilon\) as the number of trials is increased indefinitely.
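A brief simulation makes this convergence concrete. The following sketch (in Python; the choice \(p=0.3\) and the random seed are purely illustrative) generates \(n\) Bernoulli trials for increasing \(n\) and exhibits \(f_{n}\) settling near \(p\):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.3  # illustrative "true" probability of success

for n in [100, 1_000, 10_000, 100_000]:
    successes = rng.random(n) < p   # n independent Bernoulli trials
    f_n = successes.mean()          # relative frequency of successes
    print(f"n = {n:>7}:  f_n = {f_n:.4f},  |f_n - p| = {abs(f_n - p):.4f}")
```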

Bernoulli proved (5.3) by a tedious direct evaluation of the probability involved. Using Chebyshev’s inequality, one can give a very simple proof of (5.3). By using the fact that the probability law of \(S_{n}\) has mean \(n p\) and variance \(n p q\), where \(q=1-p\), one may prove that the probability law of \(f_{n}\) has mean \(p\) and variance \(p(1-p) / n\). Consequently, for any \(\epsilon>0\)

\[P\left[\left|f_{n}-p\right|>\epsilon\right] \leq \frac{p(1-p)}{n \epsilon^{2}}. \tag{5.4}\] 

Now, for any value of \(p\) in the interval \(0 \leq p \leq 1\) 

\[p(1-p) \leq \frac{1}{4}, \tag{5.5}\] 

using the fact that \(4 p(1-p)-1=-(2 p-1)^{2} \leq 0\). Consequently, for any \(\epsilon>0\)

\[P\left[\left|f_{n}-p\right|>\epsilon\right] \leq \frac{1}{4 n \epsilon^{2}} \rightarrow 0 \quad \text { as } n \rightarrow \infty, \tag{5.6}\] 

no matter what the true value of \(p\). To prove (5.2), one uses (5.3) and the fact that

\[P\left[\left|f_{n}-p\right| \leq \epsilon\right]=1-P\left[\left|f_{n}-p\right|>\epsilon\right]. \tag{5.7}\] 
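It is instructive to see how conservative the bound (5.6) is in a particular case. The sketch below (the values \(n=1000\), \(p=0.3\), \(\epsilon=0.05\) are arbitrary illustrative choices) compares the exact binomial tail probability with the Chebyshev bound:

```python
import math
from scipy.stats import binom

n, p, eps = 1_000, 0.3, 0.05  # arbitrary illustrative values

# Exact tail: P[|S_n/n - p| > eps] = P[S_n < n(p - eps)] + P[S_n > n(p + eps)]
exact = (binom.cdf(math.ceil(n * (p - eps)) - 1, n, p)
         + binom.sf(math.floor(n * (p + eps)), n, p))

bound = 1 / (4 * n * eps**2)  # Chebyshev bound (5.6), valid for every p

print(f"exact tail probability: {exact:.6f}")  # far smaller than the bound
print(f"Chebyshev bound (5.6) : {bound:.6f}")  # 0.100000
```

The bound holds but greatly overstates the tail probability, a point to which we return when the normal approximation is introduced below.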

It is shown in section 5 of Chapter 8 that the foregoing method of proof, using Chebyshev’s inequality, permits one to prove that if \(X_{1}, X_{2}, \ldots, X_{n}, \ldots\) is a sequence of independent observations of a numerical-valued random phenomenon whose probability law has mean \(m\), then for any \(\epsilon>0\)

\[\lim _{n \rightarrow \infty} P\left[\left|\frac{X_{1}+X_{2}+\cdots+X_{n}}{n}-m\right|>\epsilon\right]=0. \tag{5.8}\] 

The result given by (5.8) is known as the law of large numbers.
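Although the proof of (5.8) is deferred to Chapter 8, its content is easy to observe numerically. A minimal sketch follows (the exponential probability law with mean \(m=2\) is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
m = 2.0  # mean of the illustrative exponential probability law

for n in [100, 10_000, 1_000_000]:
    x = rng.exponential(scale=m, size=n)  # n independent observations
    print(f"n = {n:>9}:  sample mean = {x.mean():.4f}   (true mean m = {m})")
```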

The Bernoulli law of large numbers states that the observed relative frequency \(f_{n}\) of successes in \(n\) trials can be employed as an estimate of the unknown value of \(p\); in the sense of (5.2), this estimate becomes arbitrarily accurate as the number of trials becomes infinitely large. In practice, only a finite number of trials can be performed. Consequently, one must determine the number of trials required in order that, with high probability, the observed relative frequency be within a preassigned distance \(\epsilon\) of \(p\). In symbols, given a number \(\alpha\) (say, \(\alpha=0.95\)), one desires to find \(n\) so that

\[P\left[\left|f_{n}-p\right| \leq \epsilon \mid p\right] \geq \alpha \quad \text { for all } p \text { in } 0 \leq p \leq 1 \tag{5.9}\] 

where we write \(P[\cdot \mid p]\) to indicate that the probability is being calculated under the assumption that \(p\) is the true probability of success at each trial.

One may obtain a value of \(n\) satisfying (5.9) by means of Chebyshev’s inequality. Since, by (5.6) and (5.7),

\[P\left[\left|f_{n}-p\right| \leq \epsilon\right] \geq 1-\frac{1}{4 n \epsilon^{2}} \quad \text { for all } p \text { in } 0 \leq p \leq 1, \tag{5.10}\] 

it follows that (5.9) is satisfied if \(n\) is chosen so that \(1-1 /\left(4 n \epsilon^{2}\right) \geq \alpha\), that is, so that

\[n \geq \frac{1}{4 \epsilon^{2}(1-\alpha)}. \tag{5.11}\] 

Example 5A. How many trials of an experiment with two outcomes, called \(A\) and \(B\), should be performed in order that the probability be \(95 \%\) or better that the observed relative frequency of occurrences of \(A\) will differ from the probability \(p\) of occurrence of \(A\) by no more than 0.02? Here \(\alpha=0.95\) and \(\epsilon=0.02\). Therefore, by (5.11), the number \(n\) of trials should be chosen so that \(n \geq 1 /\left[4(0.02)^{2}(0.05)\right]=12,500\).
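The arithmetic of Example 5A is easily mechanized. A minimal sketch of (5.11) follows (the helper name chebyshev_sample_size is our own, not standard):

```python
import math

def chebyshev_sample_size(eps: float, alpha: float) -> int:
    """Smallest n satisfying (5.11), i.e. n >= 1 / (4 * eps**2 * (1 - alpha))."""
    return math.ceil(1 / (4 * eps**2 * (1 - alpha)))

print(chebyshev_sample_size(eps=0.02, alpha=0.95))  # 12500, as in Example 5A
```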

The estimate of \(n\) given by (5.11) can be improved upon. In section 2 of Chapter 6 we prove the normal approximation to the binomial law. In particular, it is shown that if \(p\) is the probability of success at each trial, then the number \(S_{n}\) of successes in \(n\) independent repeated Bernoulli trials approximately satisfies, for any \(h>0\),

\[P\left[\frac{\left|S_{n}-n p\right|}{\sqrt{n p q}} \leq h\right] \doteq 2 \Phi(h)-1. \tag{5.12}\]

Consequently, the relative frequency of successes satisfies, for any \(\epsilon>0\),

\[P\left[\left|f_{n}-p\right| \leq \epsilon\right] \doteq 2 \Phi(\epsilon \sqrt{n / p q})-1. \tag{5.13}\]

To obtain (5.13) from (5.12), let \(h=\epsilon \sqrt{n / p q}\).
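The accuracy of (5.13) may be gauged by comparison with the exact binomial probability. The sketch below does so for the arbitrary illustrative values \(n=2500\), \(p=0.4\), \(\epsilon=0.025\):

```python
import math
from scipy.stats import binom, norm

n, p, eps = 2_500, 0.4, 0.025  # arbitrary illustrative values
q = 1 - p

# Exact: P[|f_n - p| <= eps] = P[n(p - eps) <= S_n <= n(p + eps)]
exact = (binom.cdf(math.floor(n * (p + eps)), n, p)
         - binom.cdf(math.ceil(n * (p - eps)) - 1, n, p))

# Right-hand side of (5.13)
approx = 2 * norm.cdf(eps * math.sqrt(n / (p * q))) - 1

print(f"exact : {exact:.4f}")
print(f"approx: {approx:.4f}")
```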

Define \(K(\alpha)\) as the solution of the equation

\[2 \Phi(K(\alpha))-1=\int_{-K(\alpha)}^{K(\alpha)} \phi(y) d y=\alpha. \tag{5.14}\] 

A table of selected values of \(K(\alpha)\) is given in Table 5A.

TABLE 5A

\[\begin{array}{cc}
\alpha & K(\alpha) \\ \hline
0.50 & 0.675 \\
0.6827 & 1.000 \\
0.90 & 1.645 \\
0.95 & 1.960 \\
0.9546 & 2.000 \\
0.99 & 2.576 \\
0.9973 & 3.000
\end{array}\]
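Values of \(K(\alpha)\) need not be read from a table. Since (5.14) is equivalent to \(\Phi(K(\alpha))=(1+\alpha) / 2\), \(K(\alpha)\) is the \((1+\alpha) / 2\) quantile of the standard normal distribution, which standard libraries compute directly; a sketch reproducing Table 5A:

```python
from scipy.stats import norm

def K(alpha: float) -> float:
    """Solve 2 * Phi(K) - 1 = alpha, i.e. K = Phi^{-1}((1 + alpha) / 2)."""
    return norm.ppf((1 + alpha) / 2)

for alpha in [0.50, 0.6827, 0.90, 0.95, 0.9546, 0.99, 0.9973]:
    print(f"K({alpha}) = {K(alpha):.3f}")  # reproduces Table 5A
```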

From (5.13) we may obtain the conclusion that

\[P\left[\left|f_{n}-p\right| \leq \epsilon\right] \geq \alpha \quad \text {if } \epsilon \sqrt{n / p q} \geq K(\alpha). \tag{5.15}\]

To justify (5.15), note that since \(\Phi\) is an increasing function, \(\epsilon \sqrt{n / p q} \geq K(\alpha)\) implies that the right-hand side of (5.13) is at least equal to the left-hand side of (5.14), which equals \(\alpha\).

Since \(p q \leq \frac{1}{4}\) for all \(p\), we have \(\epsilon \sqrt{n / p q} \geq 2 \epsilon \sqrt{n}\); consequently, it follows from (5.15) that (5.9) will hold if \(2 \epsilon \sqrt{n} \geq K(\alpha)\), that is, if

\[n \geq \frac{K^{2}(\alpha)}{4 \epsilon^{2}}. \tag{5.16}\] 

Example 5B. If \(\alpha=0.95\) and \(\epsilon=0.02\), then according to (5.16) \(n\) should be chosen so that \(n \geq(1.960)^{2} /\left[4(0.02)^{2}\right]=2401\). Thus the number of trials required for \(f_{n}\) to be within 0.02 of \(p\) with probability at least \(95 \%\) is approximately 2400, which is about one-fifth of the number of trials that Chebyshev’s inequality states is required.
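The computation in Example 5B may likewise be scripted. A minimal sketch of (5.16) (the helper name normal_sample_size is our own):

```python
import math
from scipy.stats import norm

def normal_sample_size(eps: float, alpha: float) -> int:
    """Smallest n satisfying (5.16), i.e. n >= K(alpha)**2 / (4 * eps**2)."""
    K = norm.ppf((1 + alpha) / 2)
    return math.ceil(K**2 / (4 * eps**2))

print(normal_sample_size(eps=0.02, alpha=0.95))  # 2401, as in Example 5B
```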

Exercises

5.1. A sample is taken to find the proportion \(p\) of smokers in a certain population. Find a sample size so that the probability is (i) 0.95 or better, (ii) 0.99 or better, that the observed proportion of smokers will differ from the true proportion of smokers by less than (a) \(1 \%\), (b) \(10 \%\).

 

Answer

Chebyshev bound, (i): (a) 50,000, (b) 500; (ii) (a) 250,000, (b) 2500. Normal approximation, (i): (a) 9600, (b) 96; (ii) (a) 16,600, (b) 166.

 

5.2. Consider an urn that contains 10 balls, numbered 0 to 9, each of which is equally likely to be drawn; thus choosing a ball from the urn is equivalent to choosing a number from 0 to 9. This experiment is sometimes described by saying that a random digit has been chosen. Let \(n\) balls be chosen with replacement.

(i) What does the law of large numbers tell you about the occurrence of 9’s in the \(n\) drawings?

(ii) How many drawings must be made in order that, with probability 0.95 or better, the relative frequency of occurrence of 9’s will be between 0.09 and 0.11?

5.3. If you wish to estimate the proportion of engineers and scientists who have studied probability theory, and you wish your estimate to be correct within \(2 \%\) with probability 0.95 or better, how large a sample should you take (i) if you feel confident that the true proportion is less than 0.2, (ii) if you have no idea what the true proportion is?

 

Answer

Chebyshev bound, (i) 8000; (ii) 12,500. Normal approximation, (i) 1537; (ii) 2400.

 

5.4. The law of large numbers, in popular terminology, is called the law of averages. Comment on the following advice: “When you toss a fair coin to decide a bet, let your companion do the calling. ‘Heads’ is called 7 times out of 10. The simple law of averages gives the man who listens a tremendous advantage.”