In section 4 of Chapter 2 the notion of conditional probability was discussed for events defined on a sample description space on which a probability function was defined. However, an important use of the notion of conditional probability is to set up a probability function on the subsets of a sample description space \(S\) , which consists of \(n\) trials that are dependent (or, more correctly, nonindependent). In many applications of probability theory involving dependent trials one will state one’s assumptions about the random phenomenon under consideration in terms of certain conditional probabilities that suffice to specify the probability model of the random phenomenon.

As in section 2, for \(k=1,2, \ldots, n\) , let \(\mathscr{A}_{k}\) be the family of events on \(S\) which depend on the \(k\) th trial. Consider an event \(A\) that may be written as the intersection, \(A=A_{1} A_{2} \ldots A_{n}\) , of events \(A_{1}, A_{2}, \ldots, A_{n}\) , which belong to \(\mathscr{A}_{1}, \mathscr{A}_{2}, \ldots, \mathscr{A}_{n}\) , respectively. Now suppose that a probability function \(P[\cdot]\) has been defined on the subsets of \(S\) and suppose that \(P[A]>0\) . Then, by the multiplicative rule given in theoretical exercise 1.4,

\[P[A]=P\left[A_{1}\right] P\left[A_{2} \mid A_{1}\right] P\left[A_{3} \mid A_{1}, A_{2}\right] \cdots P\left[A_{n} \mid A_{1}, A_{2}, \ldots, A_{n-1}\right] \tag{4.1}\] 

Now, as shown in section 2, any event \(A\) that is a combinatorial product event may be written as the intersection of \(n\) events, each depending on only one trial. Further, as we pointed out there, a probability function defined on the subsets of a space \(S\) , consisting of \(n\) trials, is completely determined by its values on combinatorial product events.

Consequently, to know the value of \(P[A]\) for any event \(A\) it suffices to know, for \(k=2,3, \ldots, n\) , the conditional probability \(P\left[A_{k} \mid A_{1}, \ldots, A_{k-1}\right]\) of any event \(A_{k}\) depending on the \(k\)th trial, given any events \(A_{1}, A_{2}, \ldots, A_{k-1}\) depending on the 1st, 2nd, \(\ldots,(k-1)\)st trials, respectively; one also must know \(P\left[A_{1}\right]\) for any event \(A_{1}\) depending on the first trial. In other words, if one assumes a knowledge of

\begin{align} & P\left[A_{1}\right] \\ & P\left[A_{2} \mid A_{1}\right] \\ & P\left[A_{3} \mid A_{1}, A_{2}\right] \tag{4.2} \\ & \quad\vdots \\ & P\left[A_{n} \mid A_{1}, A_{2}, \ldots, A_{n-1}\right] \end{align} 

for any events \(A_{1}\) in \(\mathscr{A}_{1}, A_{2}\) in \(\mathscr{A}_{2}, \ldots, A_{n}\) in \(\mathscr{A}_{n}\) , one has thereby specified the value of \(P[A]\) for any event \(A\) on \(S\) .

Example 4A . Consider an urn containing \(M\) balls of which \(M_{W}\) are white. Let a sample of size \(n \leq M_{W}\) be drawn without replacement. Let us find the probability of the event that all the balls drawn will be white. The problem was solved in section 3 of Chapter 2; here, let us see how (4.1) may be used to provide insight into that solution. For \(i=1, \ldots, n\) let \(A_{i}\) be the event that the ball drawn on the \(i\)th draw is white. We are then seeking \(P\left[A_{1} A_{2} \ldots A_{n}\right]\) . It is intuitively appealing that the conditional probability of drawing a white ball on the \(i\)th draw, given that white balls were drawn on the preceding \((i-1)\) draws, is described for \(i=2, \ldots, n\) by \[P\left[A_{i} \mid A_{1}, A_{2}, \ldots, A_{i-1}\right]=\frac{M_{W}-(i-1)}{M-(i-1)} \tag{4.3}\] since just before the \(i\)th draw there are \(M-(i-1)\) balls in the urn, of which \(M_{W}-(i-1)\) are white. Let us assume that (4.3) is valid; more generally, we assume a knowledge of all the probabilities in (4.1) by means of the assumption that, whatever the first \((i-1)\) choices, at the \(i\)th draw each of the remaining \(M-i+1\) elements will have probability \(1 /(M-i+1)\) of being chosen. Then, from (4.1) it follows that \[P\left[A_{1} A_{2} \cdots A_{n}\right]=\frac{M_{W}\left(M_{W}-1\right) \cdots\left(M_{W}-n+1\right)}{M(M-1) \cdots(M-n+1)}, \tag{4.4}\] which agrees with (3.1) of Chapter 2 for the case of \(k=n\) .
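As a numerical check of this reasoning, one may multiply the conditional probabilities (4.3) together, as in (4.1), and compare the product with the closed form (4.4). The following Python sketch does so; the function name and the values \(M=10\), \(M_{W}=6\), \(n=3\) are merely illustrative choices of ours.

```python
from fractions import Fraction

def prob_all_white(M, M_W, n):
    """Chain rule (4.1) with the conditional probabilities (4.3):
    just before the i-th draw, M-(i-1) balls remain, M_W-(i-1) of them white."""
    p = Fraction(1)
    for i in range(1, n + 1):
        p *= Fraction(M_W - (i - 1), M - (i - 1))
    return p

# Illustrative values: M = 10 balls, 6 white, a sample of size n = 3.
# The product agrees with the closed form (4.4): (6*5*4)/(10*9*8) = 1/6.
print(prob_all_white(10, 6, 3))   # 1/6
```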

Further illustrations of the specification of a probability function on the subsets of a space of \(n\) dependent trials by means of conditional probability functions of the form given in (4.2) are supplied in examples 4B and 4C.

Example 4B . Consider two urns; urn I contains five white and three black balls, urn II, three white and seven black balls. One of the urns is selected at random, and a ball is drawn from it. Find the probability that the ball drawn will be white.

 

Solution

The sample description space of the experiment described consists of 2-tuples \(\left(z_{1}, z_{2}\right)\) , in which \(z_{1}\) is the number of the urn chosen and \(z_{2}\) is the “name” of the ball chosen. The probability function \(P[\cdot]\) on the subsets of \(S\) is specified by means of the functions listed in (4.2) , with \(n=2\) , which the assumptions stated in the problem enable us to compute. In particular, let \(C_{1}\) be the event that urn I is chosen, and let \(C_{2}\) be the event that urn II is chosen. Then \(P\left[C_{1}\right]=P\left[C_{2}\right]=\frac{1}{2}\) . Next, let \(B\) be the event that a white ball is chosen. Then \(P\left[B \mid C_{1}\right]=\frac{5}{8}\) , and \(P\left[B \mid C_{2}\right]=\frac{3}{10}\) . The events \(C_{1}\) and \(C_{2}\) are the complements of each other. Consequently, by (4.5) of Chapter 2,

 

\[P[B]=P\left[B \mid C_{1}\right] P\left[C_{1}\right]+P\left[B \mid C_{2}\right] P\left[C_{2}\right]=\frac{37}{80} \tag{4.5}\] 
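The arithmetic may be verified in exact rational form; the following brief Python sketch (with variable names of our own choosing) reproduces (4.5).

```python
from fractions import Fraction

P_C1 = P_C2 = Fraction(1, 2)      # each urn equally likely to be chosen
P_B_given_C1 = Fraction(5, 8)     # urn I: 5 white among 8 balls
P_B_given_C2 = Fraction(3, 10)    # urn II: 3 white among 10 balls

P_B = P_B_given_C1 * P_C1 + P_B_given_C2 * P_C2
print(P_B)                        # 37/80, agreeing with (4.5)
```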

Example 4C . A case of hemophilia . 1 The first child born to a certain woman was a boy who had hemophilia. The woman, who had a long family history devoid of hemophilia, was perturbed about having a second child. She reassured herself by reasoning as follows. “My son obviously did not inherit his hemophilia from me. Consequently, he is a mutant. The probability that my second child will have hemophilia, if he is a boy, is consequently the probability that he will be a mutant, which is a very small number \(m\) (equal to, say, \(1 / 100,000\) )”. Actually, what is the conditional probability that a second son will have hemophilia, given that the first son had hemophilia?

 

Solution

Let us write a 3-tuple \(\left(z_{1}, z_{2}, z_{3}\right)\) to describe the history of the mother and her two sons with regard to hemophilia. Let \(z_{1}\) equal \(s\) or \(f\) , depending on whether the mother is or is not a hemophilia carrier. Let \(z_{2}\) equal \(s\) or \(f\) , depending on whether the first son is or is not hemophilic. Let \(z_{3}\) equal \(s\) or \(f\) , depending on whether the second son will or will not have hemophilia. On this sample description space, we define the events \(A_{1}, A_{2}\) , and \(A_{3}: A_{1}\) is the event that the mother is a hemophilia carrier, \(A_{2}\) is the event that the first son has hemophilia, and \(A_{3}\) is the event that the second son will have hemophilia. To specify a probability function on the subsets of \(S\) , we specify all conditional probabilities of the form given in (4.2) :

 

\begin{align} & P\left[A_{1}\right]=2 m, \quad P\left[A_{1}^{c}\right]=1-2 m, \tag{4.6} \\ & P\left[A_{2} \mid A_{1}\right]=\frac{1}{2}, \quad P\left[A_{2}^{c} \mid A_{1}\right]=\frac{1}{2}, \\ & P\left[A_{2} \mid A_{1}^{c}\right]=m, \quad P\left[A_{2}^{c} \mid A_{1}^{c}\right]=1-m, \\ & P\left[A_{3} \mid A_{1}, A_{2}\right]=P\left[A_{3} \mid A_{1}, A_{2}^{c}\right]=\frac{1}{2}, \\ & P\left[A_{3}^{c} \mid A_{1}, A_{2}\right]=P\left[A_{3}^{c} \mid A_{1}, A_{2}^{c}\right]=\frac{1}{2}, \\ & P\left[A_{3} \mid A_{1}^{c}, A_{2}\right]=P\left[A_{3} \mid A_{1}^{c}, A_{2}^{c}\right]=m, \\ & P\left[A_{3}^{c} \mid A_{1}^{c}, A_{2}\right]=P\left[A_{3}^{c} \mid A_{1}^{c}, A_{2}^{c}\right]=1-m . \end{align} 

In making these assumptions (4.6) we have used the fact that the woman has no family history of hemophilia. A boy usually carries an \(X\) chromosome and a \(Y\) chromosome; he has hemophilia if and only if, instead of an \(X\) chromosome, he has an \(X^{\prime}\) chromosome which bears a gene causing hemophilia. Let \(m\) be the probability of mutation of an \(X\) chromosome into an \(X^{\prime}\) chromosome. Now the mother carries two \(X\) chromosomes. Event \(A_{1}\) can occur only if at least one of these \(X\) chromosomes is a mutant; this will happen with probability \(1-(1-m)^{2} \doteq 2 m\) , since \(m^{2}\) is much smaller than \(2m\) . Assuming that the woman is a hemophilia carrier and exactly one of her chromosomes is \(X^{\prime}\) , it follows that her son will have probability \(\frac{1}{2}\) of inheriting the \(X^{\prime}\) chromosome.

We are seeking \(P\left[A_{3} \mid A_{2}\right]\) . Now

\[P\left[A_{3} \mid A_{2}\right]=\frac{P\left[A_{2} A_{3}\right]}{P\left[A_{2}\right]} \tag{4.7}\] 

To compute \(P\left[A_{2} A_{3}\right]\) , we use the formula

\begin{align} P\left[A_{2} A_{3}\right]= & P\left[A_{1} A_{2} A_{3}\right]+P\left[A_{1}^{c} A_{2} A_{3}\right] \tag{4.8} \\ = & P\left[A_{1}\right] P\left[A_{2} \mid A_{1}\right] P\left[A_{3} \mid A_{2}, A_{1}\right] \\ & +P\left[A_{1}^{c}\right] P\left[A_{2} \mid A_{1}^{c}\right] P\left[A_{3} \mid A_{2}, A_{1}^{c}\right] \\ = & 2 m\left(\frac{1}{2}\right)\left(\frac{1}{2}\right)+(1-2 m)(m)(m) \\ \doteq & \frac{1}{2} m \end{align} 

since we may consider \(1-2 m\) as approximately equal to 1 and \(m^{2}\) as approximately equal to 0. To compute \(P\left[A_{2}\right]\) , we use the formula

\begin{align} P\left[A_{2}\right] & =P\left[A_{2} \mid A_{1}\right] P\left[A_{1}\right]+P\left[A_{2} \mid A_{1}^{c}\right] P\left[A_{1}^{c}\right] \tag{4.9} \\ & =\frac{1}{2}(2 m)+m(1-2 m) \\ & \doteq 2 m. \end{align} 

Consequently,

\[P\left[A_{3} \mid A_{2}\right]=\frac{\frac{1}{2} m}{2 m}=\frac{1}{4} \tag{4.10}\] 

Thus the conditional probability that the second son of a woman with no family history of hemophilia will have hemophilia, given that her first son has hemophilia, is approximately \(\frac{1}{4}\) !
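The approximations in (4.8) and (4.9) may be avoided altogether by carrying the model (4.6) through exactly; the short Python sketch below does so for \(m=1/100{,}000\) and recovers a value very close to \(\frac{1}{4}\).

```python
from fractions import Fraction

m = Fraction(1, 100_000)    # mutation probability assumed in the example

# The probabilities (4.6), used exactly (no approximation):
P_A2A3 = 2*m * Fraction(1, 2) * Fraction(1, 2) + (1 - 2*m) * m * m   # (4.8)
P_A2 = Fraction(1, 2) * (2*m) + m * (1 - 2*m)                        # (4.9)

print(float(P_A2A3 / P_A2))  # 0.2500075..., approximately 1/4 as in (4.10)
```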

 

A very important use of the notion of conditional probability derives from the following extension of (4.5) . Let \(C_{1}, C_{2}, \ldots, C_{n}\) be \(n\) events, each of positive probability, which are mutually exclusive and are also exhaustive (that is, the union of all the events \(C_{1}, C_{2}, \ldots, C_{n}\) is equal to the certain event). Then, for any event \(B\) one may express the unconditional probability \(P[B]\) of \(B\) in terms of the conditional probabilities \(P\left[B \mid C_{1}\right], \ldots\) , \(P\left[B \mid C_{n}\right]\) and the unconditional probabilities \(P\left[C_{1}\right], \ldots, P\left[C_{n}\right]\) :

\[P[B]=P\left[B \mid C_{1}\right] P\left[C_{1}\right]+\cdots+P\left[B \mid C_{n}\right] P\left[C_{n}\right] \tag{4.11}\] if \[C_{1} \cup C_{2} \cup \cdots \cup C_{n}=S, \quad C_{i} C_{j}=\emptyset \quad \text { for } i \neq j,\] \[P\left[C_{i}\right]>0.\] Equation (4.11) follows immediately from the relation \[P[B]=P\left[B C_{1}\right]+P\left[B C_{2}\right]+\cdots+P\left[B C_{n}\right] \tag{4.12}\] and the fact that \(P\left[B C_{i}\right]=P\left[B \mid C_{i}\right] P\left[C_{i}\right]\) for any event \(C_{i}\) .

Example 4D . On drawing a sample from a sample. Consider a box containing five radio tubes selected at random from the output of a machine, which is known to be \(20 \%\) defective on the average (that is, the probability that an item produced by the machine will be defective is 0.2).

(i) Find the probability that a tube selected from the box will be defective.

(ii) Suppose that a tube selected at random from the box is defective; what is the probability that a second tube selected at random from the box will be defective?

 

Solution

To describe the results of the experiment that consists in selecting five tubes from the output of the machine and then selecting one tube from among the five previously selected, we write a 6-tuple \(\left(z_{1}, z_{2}, z_{3}, z_{4}, z_{5}, z_{6}\right)\) ; for \(k=1,2, \ldots, 5\) , \(z_{k}\) is equal to \(s\) or \(f\) , depending on whether the \(k\)th tube selected is defective or non-defective, whereas \(z_{6}\) is equal to \(s\) or \(f\) , depending on whether the tube selected from those previously selected is defective or non-defective. For \(j=0, \ldots, 5\) let \(C_{j}\) denote the event that \(j\) defective tubes were selected from the output of the machine.

 

Assuming that the selections were independent, \(P\left[C_{j}\right]=\binom{5}{j}(0.2)^{j}(0.8)^{5-j}\) . Let \(B\) denote the event that the sixth tube selected (the one drawn from the box) is defective. We assume that \(P\left[B \mid C_{j}\right]=j / 5\) ; in words, each of the tubes in the box is equally likely to be chosen. By (4.11), it follows that

\[P[B]=\sum_{j=0}^{5} \frac{j}{5}\binom{5}{j}(0.2)^{j}(0.8)^{5-j} \tag{4.13}\] 

To evaluate the sum in (4.13), we write it as \[\sum_{j=1}^{5} \frac{j}{5}\binom{5}{j}(0.2)^{j}(0.8)^{5-j}=(0.2) \sum_{j=1}^{5}\binom{4}{j-1}(0.2)^{j-1}(0.8)^{4-(j-1)}=0.2 \tag{4.14}\] in which we have used the easily verifiable fact that \[\frac{j}{n}\binom{n}{j}=\binom{n-1}{j-1} \tag{4.15}\] and the fact that the last sum in (4.14) is equal to 1 by the binomial theorem. Combining (4.13) and (4.14), we have \(P[B]=0.2\) . In words, we have proved that selecting an item randomly from a sample which has been selected randomly from a larger population is statistically equivalent to selecting the item from the larger population. Note the fact that \(P[B]=0.2\) does not imply that the box containing five tubes will always contain one defective tube.
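Both the identity (4.15) and the evaluation of (4.13) by (4.14) are easily confirmed by computation; a brief Python sketch in exact rational arithmetic (names of our own choosing) follows.

```python
from fractions import Fraction
from math import comb

p, q = Fraction(1, 5), Fraction(4, 5)     # machine is 20% defective

# Identity (4.15): (j/n) C(n,j) = C(n-1, j-1), here with n = 5.
assert all(Fraction(j, 5) * comb(5, j) == comb(4, j - 1) for j in range(1, 6))

# The sum (4.13):
P_B = sum(Fraction(j, 5) * comb(5, j) * p**j * q**(5 - j) for j in range(6))
print(P_B)                                # 1/5, i.e. P[B] = 0.2
```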

Let us next consider part (ii) of example 4D. To describe the results of the experiment that consists in selecting five tubes from the output of the machine and then selecting two tubes from among the five previously selected, we write a 7-tuple \(\left(z_{1}, z_{2}, \ldots, z_{7}\right)\) , in which \(z_{6}\) and \(z_{7}\) denote the tubes drawn from the box containing the first five tubes selected. Let \(C_{0}, \ldots, C_{5}\) and \(B\) be defined as before. Let \(A\) be the event that the seventh tube is defective. We seek \(P[A \mid B]\) . Now, if two tubes, each of which has probability 0.2 of being defective, are drawn independently, the conditional probability that the second tube will be defective, given that the first tube is defective, is equal to the unconditional probability that the second tube will be defective, which is equal to 0.2. We now proceed to prove that \(P[A \mid B]=0.2\) . In so doing, we are proving a special case of the principle that a sample of size 2, drawn without replacement from a sample of any size whose members are selected independently from a given population, has statistically the same properties as a sample of size 2 whose members are selected independently from the population! More general statements of this principle are given in the theoretical exercises of section 4 of Chapter 4. We prove that \(P[A \mid B]=0.2\) under the assumption that \(P\left[A B \mid C_{j}\right]=(j)_{2} /(5)_{2}\) for \(j=0, \ldots, 5\) . Then, by (4.11), \begin{align} P[A B] & =\sum_{j=0}^{5} \frac{(j)_{2}}{(5)_{2}}\binom{5}{j}(0.2)^{j}(0.8)^{5-j} \\ & =(0.2)^{2} \sum_{j=2}^{5}\binom{3}{j-2}(0.2)^{j-2}(0.8)^{3-(j-2)}=(0.2)^{2} \end{align} 

Consequently, \(P[A \mid B]=P[A B] / P[B]=(0.2)^{2} /(0.2)=0.2\) .
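The same computation may be checked directly; in the sketch below, falling(j, 2) stands for the falling factorial \((j)_{2}=j(j-1)\).

```python
from fractions import Fraction
from math import comb

p, q = Fraction(1, 5), Fraction(4, 5)

def falling(x, k):
    """Falling factorial (x)_k = x(x-1)...(x-k+1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

P_AB = sum(Fraction(falling(j, 2), falling(5, 2)) * comb(5, j) * p**j * q**(5 - j)
           for j in range(6))
P_B = Fraction(1, 5)                       # from part (i)
print(P_AB, P_AB / P_B)                    # 1/25 and 1/5, so P[A|B] = 0.2
```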

 

Bayes’s Theorem. There is an interesting consequence of (4.11) , which has led to much philosophical speculation and has been the source of much controversy. Let \(C_{1}, C_{2}, \ldots, C_{n}\) be \(n\) mutually exclusive and exhaustive events, and let \(B\) be an event for which one knows the conditional probabilities \(P\left[B \mid C_{i}\right]\) of \(B\) , given \(C_{i}\) , and also the absolute probabilities \(P\left[C_{i}\right]\) . One may then compute the conditional probability \(P\left[C_{i} \mid B\right]\) of any one of the events \(C_{i}\) , given \(B\) , by the following formula:

\[P\left[C_{i} \mid B\right]=\frac{P\left[B C_{i}\right]}{P[B]}=\frac{P\left[B \mid C_{i}\right] P\left[C_{i}\right]}{\sum_{j=1}^{n} P\left[B \mid C_{j}\right] P\left[C_{j}\right]} \tag{4.16}\] 

The relation expressed by (4.16) is called “Bayes’s theorem” or “Bayes’s formula”, after the English philosopher Thomas Bayes. 2 If the events \(C_{i}\) are called “causes,” then Bayes’s formula can be regarded as a formula for the probability that the event \(B\) , which has occurred, is the result of the “cause” \(C_{i}\) . In this way (4.16) has been interpreted as a formula for the probabilities of “causes” or “hypotheses”. The difficulty with this interpretation, however, is that in many contexts one will rarely know the probabilities, especially the unconditional probabilities \(P\left[C_{i}\right]\) of the “causes,” which enter into the right-hand side of (4.16). However, Bayes’s theorem has its uses, as the following examples indicate.

Example 4E . Cancer diagnosis. Suppose, contrary to fact, there were a diagnostic test for cancer with the properties that \(P[A \mid C]=0.95\) , \(P\left[A^{c} \mid C^{c}\right]=0.95\) , in which \(C\) denotes the event that a person tested has cancer and \(A\) denotes the event that the test states that the person tested has cancer. Let us compute \(P[C \mid A]\) , the probability that a person who according to the test has cancer actually has it. We have

\[P[C \mid A]=\frac{P[A C]}{P[A]}=\frac{P[A \mid C] P[C]}{P[A \mid C] P[C]+P\left[A \mid C^{c}\right] P\left[C^{c}\right]}. \tag{4.17}\] 

Let us assume that the probability that a person taking the test actually has cancer is given by \(P[C]=0.005\) . Then \begin{align} P[C \mid A] & =\frac{(0.95)(0.005)}{(0.95)(0.005)+(0.05)(0.995)} \tag{4.18} \\ & =\frac{0.00475}{0.00475+0.04975}=0.087 \end{align} 

One should carefully consider the meaning of this result. On the one hand, the cancer diagnostic test is highly reliable, since it will detect cancer in \(95 \%\) of the cases in which cancer is present. On the other hand, in only \(8.7 \%\) of the cases in which the test gives a positive result and asserts cancer to be present is it actually true that cancer is present! (This example is continued in exercise 4.8.)
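The arithmetic of (4.17) and (4.18) is easily reproduced; in the sketch below the labels sensitivity, specificity, and prevalence are modern names (ours) for \(P[A \mid C]\), \(P\left[A^{c} \mid C^{c}\right]\), and \(P[C]\).

```python
sensitivity = 0.95   # P[A | C]
specificity = 0.95   # P[A^c | C^c]
prevalence = 0.005   # P[C]

posterior = sensitivity * prevalence / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
print(round(posterior, 3))   # 0.087, as in (4.18)
```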

Example 4F . Prior and posterior probability. Consider an urn that contains a large number of coins; not all of the coins are necessarily fair. Let a coin be chosen randomly from the urn and tossed independently 100 times. Suppose that in the 100 tosses heads appear 55 times. What is the probability that the coin selected is a fair coin (that is, the probability that the coin will fall heads at each toss is equal to \(\frac{1}{2}\) )?

 

Solution

To describe the results of the experiment we write a 101-tuple \(\left(z_{1}, z_{2}, \ldots, z_{101}\right)\) . The components \(z_{2}, \ldots, z_{101}\) are \(H\) or \(T\) , depending on whether the outcome of the respective toss is heads or tails. What are the possible values that may be assumed by the first component \(z_{1}\) ? We assume that there is a set of \(N\) numbers, \(p_{1}, p_{2}, \ldots, p_{N}\) , each between 0 and 1, such that any coin in the urn has as its probability of falling heads some one of the numbers \(p_{1}, p_{2}, \ldots, p_{N}\) . Having selected a coin from the urn, we let \(z_{1}\) denote the probability that the coin will fall heads; consequently, \(z_{1}\) is one of the numbers \(p_{1}, \ldots, p_{N}\) . Now, for \(j=1,2, \ldots, N\) let \(C_{j}\) be the event that the coin selected has probability \(p_{j}\) of falling heads, and let \(B\) be the event that the coin selected yielded 55 heads in 100 tosses. Let \(j_{0}\) be the number between 1 and \(N\) such that \(p_{j_{0}}=\frac{1}{2}\) . We are now seeking \(P\left[C_{j_{0}} \mid B\right]\) , the conditional probability that the coin selected is a fair coin, given that it yielded 55 heads in 100 tosses. In order to use (4.16) to evaluate \(P\left[C_{j_{0}} \mid B\right]\) , we require a knowledge of \(P\left[C_{j}\right]\) and \(P\left[B \mid C_{j}\right]\) for \(j=1, \ldots, N\) . By the binomial law,

 

\[P\left[B \mid C_{j}\right]=\binom{100}{55}\left(p_{j}\right)^{55}\left(1-p_{j}\right)^{45} \tag{4.19}\] 

The probabilities \(P\left[C_{j}\right]\) cannot be computed but must be assumed. The probability \(P\left[C_{j}\right]\) represents the proportion of coins in the urn which have probability \(p_{j}\) of falling heads. It is clear that the value we obtain for \(P\left[C_{j_{0}} \mid B\right]\) depends directly on the values we assume for \(P\left[C_{1}\right], \ldots, P\left[C_{N}\right]\) . If the latter probabilities are unknown to us, then we must resign ourselves to not being able to compute \(P\left[C_{j_{0}} \mid B\right]\) . However, let us obtain a numerical answer for \(P\left[C_{j_{0}} \mid B\right]\) under the assumption that \(P\left[C_{1}\right]=\cdots=P\left[C_{N}\right]=1 / N\) , so that a coin selected from the urn is equally likely to have any one of the probabilities \(p_{1}, \ldots, p_{N}\) . We then obtain that \[P\left[C_{j_{0}} \mid B\right]=\frac{(1 / N)\binom{100}{55}\left(p_{j_{0}}\right)^{55}\left(1-p_{j_{0}}\right)^{45}}{(1 / N) \sum_{j=1}^{N}\binom{100}{55}\left(p_{j}\right)^{55}\left(1-p_{j}\right)^{45}} \tag{4.20}\] 

Let us next assume that \(N=9\) , and \(p_{j}=j / 10\) for \(j=1,2, \ldots, 9\) . Then \(j_{0}=5\) , and \begin{align} P\left[C_{5} \mid B\right] & =\frac{\binom{100}{55}(1 / 2)^{100}}{\sum_{j=1}^{9}\binom{100}{55}(j / 10)^{55}[(10-j) / 10]^{45}} \tag{4.21} \\ & =\frac{0.048475}{0.097664}=0.496 . \end{align} 

The probability \(P\left[C_{5}\right]=\frac{1}{9}\) is called the prior (or a priori) probability of the event \(C_{5}\) ; the conditional probability \(P\left[C_{5} \mid B\right]=0.496\) is called the posterior (or a posteriori) probability of the event \(C_{5}\) . The prior probability is an unconditional probability that is known to us before any observations are taken. The posterior probability is a conditional probability that is of interest to us only if it is known that the conditioning event has occurred.
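The posterior probability (4.21) may be recomputed as follows; since the uniform prior \(1/N\) cancels between numerator and denominator of (4.20), it is omitted from the sketch.

```python
from math import comb

def likelihood(p):                  # (4.19): P[B | C_j] when p_j = p
    return comb(100, 55) * p**55 * (1 - p)**45

ps = [j / 10 for j in range(1, 10)]           # p_j = j/10, j = 1, ..., 9
posterior_fair = likelihood(0.5) / sum(likelihood(p) for p in ps)
print(round(posterior_fair, 3))               # 0.496, as in (4.21)
```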

 

Our next example illustrates a controversial use of Bayes’s theorem.

Example 4G . Laplace’s rule of succession. Consider a coin that in \(n\) independent tosses yields \(k\) heads. What is the probability that \(n^{\prime}\) subsequent independent tosses will yield \(k^{\prime}\) heads? The problem may also be phrased in terms of drawing balls from an urn. Consider an urn that contains white and red balls in unknown proportions. In a sample of size \(n\) , drawn with replacement from the urn, \(k\) white balls appear. What is the probability that a sample of size \(n^{\prime}\) drawn with replacement will contain \(k^{\prime}\) white balls? A particular case of this problem, in which \(k=n\) and \(k^{\prime}=n^{\prime}\) , can be interpreted as a simple form of the fundamental problem of inductive inference if one formulates the problem as follows: if \(n\) independent trials of an experiment have resulted in success, what is the probability that \(n^{\prime}\) additional independent trials will result in success? Another reformulation is this: if the results of \(n\) independent experiments, performed to test a theory, agree with the theory, what is the probability that \(n^{\prime}\) additional independent experiments will agree with the theory?

 

Solution

To describe the results of our observations, we write an \(\left(n+n^{\prime}+1\right)\)-tuple \(\left(z_{1}, z_{2}, \ldots, z_{n+n^{\prime}+1}\right)\) in which the components \(z_{2}, \ldots, z_{n+1}\) describe the outcomes of the coin tosses which have been made and the components \(z_{n+2}, \ldots, z_{n+n^{\prime}+1}\) describe the outcomes of the subsequent coin tosses. The first component \(z_{1}\) describes the probability that the coin tossed will fall heads; *we assume that there are \(N\) known numbers, \(p_{1}, p_{2}, \ldots, p_{N}\) , which \(z_{1}\) can take as its value*. We have italicized this assumption to indicate that it is considered controversial. For \(j=1,2, \ldots, N\) let \(C_{j}\) be the event that the coin tossed has probability \(p_{j}\) of falling heads. Let \(B\) be the event that the coin yields \(n\) heads in its first \(n\) tosses, and let \(A\) be the event that it yields \(n^{\prime}\) heads in its subsequent \(n^{\prime}\) tosses. We are seeking \(P[A \mid B]\) . Now

 

\begin{align} P[A B] & =\sum_{j=1}^{N} P\left[A B \mid C_{j}\right] P\left[C_{j}\right] \tag{4.22} \\ & =\sum_{j=1}^{N}\left(p_{j}\right)^{n+n^{\prime}} P\left[C_{j}\right], \end{align} 

whereas

\[P[B]=\sum_{j=1}^{N}\left(p_{j}\right)^{n} P\left[C_{j}\right]. \tag{4.23}\] 

Let us now assume that \(p_{j}\) is equal to \(j / N\) and that \(P\left[C_{j}\right]=1 / N\) . Then

\[P[A \mid B]=\frac{(1 / N) \sum_{j=1}^{N}(j / N)^{n+n^{\prime}}}{(1 / N) \sum_{j=1}^{N}(j / N)^{n}} \tag{4.24}\] 

The sums in (4.24) may be approximately evaluated in the case that \(N\) is large by means of the integral calculus. The sums can be regarded as approximating sums of Riemann integrals, and we have

\begin{align} \frac{1}{N} \sum_{j=1}^{N}\left(\frac{j}{N}\right)^{n+n^{\prime}} & \doteq \int_{0}^{1} x^{n+n^{\prime}} d x=\frac{1}{n+n^{\prime}+1} \\ \frac{1}{N} \sum_{j=1}^{N}\left(\frac{j}{N}\right)^{n} & \doteq \int_{0}^{1} x^{n} d x=\frac{1}{n+1} . \tag{4.25} \end{align} 

Consequently, given that the first \(n\) tosses of the coin yielded heads, the conditional probability that \(n^{\prime}\) subsequent tosses will also yield heads, under the assumption that the probability of the coin falling heads is equally likely to be any one of the numbers \(1 / N, 2 / N, \ldots, N / N\) , and that \(N\) is large, is given by

\[P[A \mid B]=\frac{n+1}{n+n^{\prime}+1} \tag{4.26}\] 

Equation (4.26) is known as Laplace’s general rule of succession. If we take \(n^{\prime}=1\) , then

\[P[A \mid B]=\frac{n+1}{n+2}. \tag{4.27}\] 

Equation (4.27) is known as Laplace’s special rule of succession.
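Both rules may be checked against the exact ratio (4.24) by direct summation with a large \(N\); the values \(n=10\) and \(n^{\prime}=1\) below, chosen for illustration, reproduce the probability \(\frac{11}{12}\) that figures in the discussion which follows.

```python
N = 10_000                 # number of equally likely head-probabilities j/N
n, n_prime = 10, 1         # n heads observed; ask about n' further tosses

num = sum((j / N) ** (n + n_prime) for j in range(1, N + 1)) / N
den = sum((j / N) ** n for j in range(1, N + 1)) / N
print(num / den)           # ~0.9167, close to (n+1)/(n+n'+1) = 11/12
```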

Equation (4.27) has been interpreted by some writers on probability theory to imply that if a theory has been verified in \(n\) consecutive trials then the probability of its being verified on the \((n+1)\)st trial is \((n+1)/(n+2)\) . That the rule has a certain appeal at first acquaintance may be seen from the following example:

Consider a tourist in a foreign city who scarcely understands the language. With trepidation, he selects a restaurant in which to eat. After ten meals taken there he has felt no ill effects. Consequently, he goes quite confidently to the restaurant the eleventh time in the knowledge that, according to the rule of succession, the probability is \(\frac{11}{12}\) that he will not be poisoned by his next meal.

However, it is easy to exhibit applications of the rule that lead to absurd answers. A boy is 10 years old today. The rule says that, having lived ten years, he has probability \(\frac{11}{12}\) of living one more year. On the other hand, his 80 -year-old grandfather has probability \(81 / 82\) of living one more year! Yet, in fact, the boy has a greater probability of living one more year.

Laplace gave the following often-quoted application of the special rule of succession. “Assume”, he says, “that history goes back 5000 years, that is, \(1,826,213\) days. The sun rose each day and so you can bet \(1,826,214\) against 1 that the sun will rise again tomorrow”. However, before believing this assertion, ask yourself if you would believe the following consequence of the general rule of succession: the sun having risen on each of the last \(1,826,213\) days, the probability that it will rise on each of the next \(1,826,214\) days is \(\frac{1}{2}\) , which means that the probability is \(\frac{1}{2}\) that on at least one of the next \(1,826,214\) days the sun will not rise.

It is to be emphasized that Bayes’s formula and Laplace’s rule of succession are true theorems of mathematical probability theory. The foregoing examples do not in any way cast doubt on the validity of these theorems. Rather, they serve to illustrate what may be called the fundamental principle of applied probability theory: before applying a theorem, one must carefully ponder whether the hypotheses of the theorem may be assumed to be satisfied.

Theoretical Exercises

4.1 . An urn contains \(M\) balls, of which \(M_{W}\) are white (where \(M_{W} \leq M\) ). Let a sample of size \(m\) (where \(m \leq M_{W}\) ) be drawn from the urn with replacement [without replacement] and deposited in an empty urn. Let a sample of size \(n\) (where \(n \leq m\) ) be drawn from the second urn without replacement. Show that for \(k=0,1, \ldots, n\) the probability that the second sample will contain exactly \(k\) white balls continues to be given by (3.2) [(3.1)] of Chapter 2. The result shows that, as one might expect, drawing a sample of size \(n\) from a sample of larger size is statistically equivalent to drawing a sample of size \(n\) from the urn. An alternate statement of this theorem, and an outline of the proof, is given in theoretical exercise 4.1 of Chapter 4.

4.2 . Consider a box containing \(N\) radio tubes selected at random from the output of a machine; the probability \(p\) that an item produced by the machine is defective is known.

(i) Let \(k \leq n \leq N\) be integers. Show that the probability that \(n\) tubes selected at random from the box will have \(k\) defectives is given by \(\left(\begin{array}{l}n \\ k\end{array}\right) p^{k} q^{n-k}\) .

(ii) Suppose that \(m\) tubes are selected at random from the box and found to be defective. Show that the probability that \(n\) tubes selected at random from the remaining \(N-m\) tubes in the box will contain \(k\) defectives is equal to \(\left(\begin{array}{l}n \\ k\end{array}\right) p^{k} q^{n-k}\) .

(iii) Suppose that \(m+n\) tubes are selected at random from the box and tested. You are informed that at least \(m\) of the tubes are defective; show that the probability that exactly \(m+k\) tubes are defective, where \(k\) is an integer from 0 to \(n\) , is given by (3.13) . Express in words the conclusions implied by this exercise.

4.3 . Consider an urn containing \(M\) balls, of which \(M_{W}\) are white. Let \(N\) be an integer such that \(N \geq M_{W}\) . Choose an integer \(n\) at random from the set \(\{1,2, \ldots, N\}\) , and then choose a sample of size \(n\) without replacement from the urn. Show that the probability that all the balls in the sample will be white (letting \(M_{R}=M-M_{W}\) ) is equal to

\[\frac{1}{N} \sum_{k=1}^{N} \frac{\left(M_{W}\right)_{k}}{(M)_{k}}=\frac{1}{N} \frac{M_{W}}{M_{R}+1}.\] 
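The identity may be verified numerically for particular values; the sketch below uses \(M=10\), \(M_{W}=4\), \(N=6\) (the data of exercise 4.10), noting that the hypothesis \(N \geq M_{W}\) guarantees that every nonvanishing term of the sum is included.

```python
from fractions import Fraction

def falling(x, k):
    """Falling factorial (x)_k = x(x-1)...(x-k+1)."""
    out = 1
    for i in range(k):
        out *= x - i
    return out

M, M_W, N = 10, 4, 6             # illustrative values; note N >= M_W
M_R = M - M_W
lhs = Fraction(1, N) * sum(Fraction(falling(M_W, k), falling(M, k))
                           for k in range(1, N + 1))
rhs = Fraction(1, N) * Fraction(M_W, M_R + 1)
print(lhs, rhs)                  # both 2/21
```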

4.4 . An application of Bayes’s theorem. Suppose that in answering a question on a multiple choice test an examinee either knows the answer or he guesses. Let \(p\) be the probability that he will know the answer, and let \(1-p\) be the probability that he will guess. Assume that the probability of answering a question correctly is unity for an examinee who knows the answer and \(1 / m\) for an examinee who guesses; \(m\) is the number of multiple choice alternatives. Show that the conditional probability that an examinee knew the answer to a question, given that he has correctly answered it, is equal to \[\frac{m p}{1+(m-1) p}\] 
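The stated result follows from Bayes’s formula (4.16) with two “causes,” knowing and guessing; a quick check for the illustrative values \(p=\frac{1}{2}\), \(m=4\) (ours, not the exercise’s):

```python
from fractions import Fraction

def p_knew_given_correct(p, m):
    """Bayes: P[correct | knew] = 1, P[correct | guessed] = 1/m."""
    return p / (p + (1 - p) * Fraction(1, m))

p, m = Fraction(1, 2), 4            # illustrative values
print(p_knew_given_correct(p, m))   # 4/5
print(m * p / (1 + (m - 1) * p))    # 4/5, the closed form above
```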

4.5 . Solution of a difference equation. The difference equation

\[p_{n}=a p_{n-1}+b, \quad n=2,3, \ldots,\] 

in which \(a\) and \(b\) are given constants, arises in the theory of Markov dependent trials (see section 5). By mathematical induction, show that if a sequence of numbers \(p_{1}, p_{2}, \ldots, p_{n}\) satisfies this difference equation, and if \(a \neq 1\) , then

\[p_{n}=\left(p_{1}-\frac{b}{1-a}\right) a^{n-1}+\frac{b}{1-a}.\] 
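The closed form may be compared with direct iteration of the difference equation; the values of \(p_{1}\), \(a\), and \(b\) below are arbitrary illustrative choices with \(a \neq 1\).

```python
def iterate(p1, a, b, n):
    """Apply p_k = a*p_{k-1} + b for k = 2, ..., n."""
    p = p1
    for _ in range(n - 1):
        p = a * p + b
    return p

p1, a, b, n = 0.3, 0.6, 0.2, 12      # illustrative values, a != 1
closed = (p1 - b / (1 - a)) * a ** (n - 1) + b / (1 - a)
print(iterate(p1, a, b, n), closed)  # the two agree (up to rounding)
```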

Exercises

4.1 . Urn I contains 5 white and 7 black balls. Urn II contains 4 white and 2 black balls. Find the probability of drawing a white ball if (i) 1 urn is selected at random, and a ball is drawn from it, (ii) the 2 urns are emptied into a third urn from which 1 ball is drawn.

 

Answer

(i) \(\frac{13}{24}\) ; (ii) \(\frac{1}{2}\) .

 

4.2 . Urn I contains 5 white and 7 black balls. Urn II contains 4 white and 2 black balls. An urn is selected at random, and a ball is drawn from it. Given that the ball drawn is white, what is the probability that urn I was chosen?

4.3 . A man draws a ball from an urn containing 4 white and 2 red balls. If the ball is white, he does not return it to the urn; if the ball is red, he does return it. He draws another ball. Let \(A\) be the event that the first ball drawn is white, and let \(B\) be the event that the second ball drawn is white. State whether each of the following is true or false: (i) \(P[A]=\frac{2}{3}\) ; (ii) \(P[B]=\frac{3}{5}\) ; (iii) \(P[B \mid A]=\frac{3}{5}\) ; (iv) \(P[A \mid B]=\frac{9}{14}\) ; (v) the events \(A\) and \(B\) are mutually exclusive; (vi) the events \(A\) and \(B\) are independent.

 

Answer

(i) \(T\) ; (ii) \(F\) ; (iii) \(T\) ; (iv) \(T\) ; (v) \(F\) ; (vi) \(F\) .

 

4.4 . From an urn containing 6 white and 4 black balls, 5 balls are transferred into an empty second urn. From it 3 balls are transferred into an empty box. One ball is drawn from the box; it turns out to be white. What is the probability that exactly 4 of the balls transferred from the first to the second urn will be white?

4.5 . Consider an urn containing 12 balls, of which 8 are white. Let a sample of size 4 be drawn with replacement [without replacement]. Next, let a ball be selected randomly from the sample of size 4. Find the probability that it will be white.

 

Answer

\(\frac{2}{3}\) .

 

4.6 . Urn I contains 6 white and 4 black balls. Urn II contains 2 white and 2 black balls. From urn I 2 balls are transferred to urn II. A sample of size 2 is then drawn without replacement from urn II. What is the probability that the sample will contain exactly 1 white ball?

4.7 . Consider a box containing 5 radio tubes selected at random from the output of a machine, which is known to be \(20 \%\) defective on the average (that is, the probability that an item produced by the machine will be defective is 0.2). Suppose that 2 tubes are selected at random from the box and tested. You are informed that at least 1 of the tubes selected is defective; what is the probability that both tubes will be defective?

 

Answer

\(\frac{1}{9}\) .

 

4.8 . Let the events \(A\) and \(C\) be defined as in example 4E. Let \(P[A \mid C]=\) \(P\left[A^{c} \mid C^{c}\right]=R\) and \(P[C]=0.005\) . What value must \(R\) have in order that \(P[C \mid A]=0.95\) ? Interpret your answer.

4.9 . In a certain college the geographical distribution of men students is as follows: \(50 \%\) come from the East, \(30 \%\) come from the Midwest, and \(20 \%\) come from the Far West. The following proportions of the men students wear ties: \(80 \%\) of the Easterners, \(60 \%\) of the Midwesterners, and \(40 \%\) of the Far Westerners. What is the probability that a student who wears a tie comes from the East? From the Midwest? From the Far West?

 

Answer

Let the event that a student wears a tie, comes from the East, comes from the Midwest, or comes from the Far West be denoted, respectively, by \(A, B, C, D\) . Then \(P[B \mid A]=\frac{20}{33}, P[C \mid A]=\frac{9}{33}, P[D \mid A]=\frac{4}{33}\) .

 

4.10 . Consider an urn containing 10 balls, of which 4 are white. Choose an integer \(n\) at random from the set \(\{1,2,3,4,5,6\}\) and then choose a sample of size \(n\) without replacement from the urn. Find the probability that all the balls in the sample will be white.

4.11 . Each of 3 boxes, identical in appearance, has 2 drawers. Box \(A\) contains a gold coin in each drawer; box \(B\) contains a silver coin in each drawer; box \(C\) contains a gold coin in 1 drawer and a silver coin in the other. A box is chosen, one of its drawers is opened, and a gold coin is found.

(i) What is the probability that the other drawer contains a silver coin? Write out the probability space of the experiment. Why is it fallacious to reason that the probability is \(\frac{1}{2}\) that there will be a silver coin in the second drawer, since there are 2 possible types of coins, gold or silver, that may be found there?

(ii) What is the probability that the box chosen was box \(A\) ? Box \(B\) ? Box \(C\) ?

 

Answer

(i) \(\frac{1}{3}\) ; (ii) box \(A, \frac{2}{3}\) ; box \(B, 0\) ; box \(C\) , \(\frac{1}{3}\) .

 

4.12 . Three prisoners, whom we may call \(A, B\) , and \(C\) , are informed by their jailer that one of them has been chosen at random to be executed, and the other 2 are to be freed. Prisoner \(A\) , who has studied probability theory, then reasons to himself that he has probability \(\frac{1}{3}\) of being executed. He then asks the jailer to tell him privately which of his fellow prisoners will be set free, claiming that there would be no harm in divulging this information, since he already knows that at least 1 will go. The jailer (being an ethical fellow) refuses to reply to this question, pointing out that if \(A\) knew which of his fellows were to be set free then his probability of being executed would increase to \(\frac{1}{2}\) , since he would then be 1 of 2 prisoners, 1 of whom is to be executed. Show that the probability that \(A\) will be executed is still \(\frac{1}{3}\) , even if the jailer were to answer his question, assuming that, in the event that \(A\) is to be executed, the jailer is as likely to say that \(B\) is to be set free as he is to say that \(C\) is to be set free.

4.13 . A male rat is either doubly dominant \((A A)\) or heterozygous \((A a)\) ; owing to Mendelian properties, the probability of either being true is \(\frac{1}{2}\) . The male rat is bred to a doubly recessive \((a a)\) female. If the male rat is doubly dominant, the offspring will exhibit the dominant characteristic; if heterozygous, the offspring will exhibit the dominant characteristic \(\frac{1}{2}\) of the time and the recessive characteristic \(\frac{1}{2}\) of the time. Suppose that all 3 offspring exhibit the dominant characteristic. What is the probability that the male is doubly dominant?

 

Answer

\(\frac{8}{9}\) .

 

4.14 . Consider an urn that contains 5 white and 7 black balls. A ball is drawn and its color is noted. It is then replaced; in addition, 3 balls of the color drawn are added to the urn. A ball is then drawn from the urn. Find the probability that (i) the second ball drawn will be black, (ii) both balls drawn will be black.

4.15 . Consider a sample of size 3 drawn in the following manner. One starts with an urn containing 5 white and 7 red balls. At each trial a ball is drawn and its color is noted. The ball drawn is then returned to the urn, together with an additional ball of the same color. Find the probability that the sample will contain exactly (i) 0 white balls, (ii) 1 white ball, (iii) 3 white balls.

 

Answer

(i) \(\frac{3}{13}\) ; (ii) \(\frac{5}{13}\) ; (iii) \(\frac{5}{52}\) .

 

4.16 . A certain kind of nuclear particle splits into 0, 1, or 2 new particles (which we call offsprings) with probabilities \(\frac{1}{4}, \frac{1}{2}\) , and \(\frac{1}{4}\) , respectively, and then dies. The individual particles act independently of each other. Given a particle, let \(X_{1}\) denote the number of its offsprings, let \(X_{2}\) denote the number of offsprings of its offsprings, and let \(X_{3}\) denote the number of offsprings of the offsprings of its offsprings.

(i) Find the probability that \(X_{2}>0\) .

(ii) Find the conditional probability that \(X_{1}=1\) , given that \(X_{2}=1\) .

(iii) Find the probability that \(X_{3}=0\) .

4.17 . A number, denoted by \(X_{1}\) , is chosen at random from the set of integers \(\{1,2,3,4\}\) . A second number, denoted by \(X_{2}\) , is chosen at random from the set \(\left\{1,2, \ldots, X_{1}\right\}\) .

(i) For each integer \(k, 1\) to 4, find the conditional probability that \(X_{2}=1\) , given that \(X_{1}=k\) .

(ii) Find the probability that \(X_{2}=1\) .

(iii) Find the conditional probability that \(X_{1}=2\) , given that \(X_{2}=1\) .

 

Answer

(i) \(1 / k\) ; (ii) \(\frac{25}{48}\) ; (iii) 0.24.

 


  1. I am indebted to my esteemed colleague Lincoln E. Moses for the idea of this example. ↩︎ 
  2. A reprint of Bayes’s original essay may be found in Biometrika, Vol. 46 (1958), pp. 293–315. ↩︎ 
  3. The use of Bayes’s formula to evaluate probabilities during the course of play of a bridge game is illustrated in Dan F. Waugh and Frederick V. Waugh, “On Probabilities in Bridge”, Journal of the American Statistical Association , Vol. 48 (1953), pp. 79–87. ↩︎