Of interest in many problems of applied probability theory is the evolution in time of the state of a random phenomenon. For example, suppose one has two urns (I and II), each of which contains one white and one black ball. One ball is drawn simultaneously from each urn and placed in the other urn. One is often concerned with questions such as, what is the probability that after 100 repetitions of this procedure urn I will contain two white balls? The theory of Markov chains is applicable to questions of this type.
The theory of Markov chains has applications in every field of physical and social science (see the forthcoming book by A. T. Bharucha-Reid, Introduction to the Theory of Markov Processes and their Applications ; for applications of the theory of Markov chains to the description of social or psychological phenomena, see the book by Kemeny and Snell cited in the next paragraph). There is an immense literature concerning the theory of Markov chains. In this section and the next we can provide only a brief introduction.
Excellent elementary accounts of this theory are to be found in the works of W. Feller, An Introduction to Probability Theory and Its Applications , second edition, Wiley, New York, 1957, and J. G. Kemeny and J. L. Snell, Finite Markov Chains , Van Nostrand, Princeton, New Jersey, 1959. The reader is referred to these books for proof of the assertions made in section 6 .
The natural generalization of the notion of independent Bernoulli trials is the notion of Markov dependent Bernoulli trials. Given \(n\) trials of an experiment which has only two possible outcomes (denoted by \(s\) or \(f\) , for “success” or “failure”), we recall that they are said to be independent Bernoulli trials if, for any integer \(k\) \((1 \leq k \leq n-1)\) and any \(k+1\) events \(A_{1}\) , \(A_{2}, \ldots, A_{k+1}\) , depending, respectively, on the first, second,…, \((k+1)\) st trials, \[P\left[A_{k+1} \mid A_{k}, A_{k-1}, \ldots, A_{1}\right]=P\left[A_{k+1}\right] \tag{5.1}\] We define the trials as Markov dependent Bernoulli trials if, instead of (5.1) , it holds that \[P\left[A_{k+1} \mid A_{k}, A_{k-1}, \ldots, A_{1}\right]=P\left[A_{k+1} \mid A_{k}\right] \tag{5.2}\] In words, (5.2) says that at the \(k\) th trial the conditional probability of any event \(A_{k+1}\) , depending on the next trial, does not depend on what has happened in past trials but only on what is happening at the present trial. One sometimes says that the trials have no memory.
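As an illustrative sketch (not part of the text), Markov dependent Bernoulli trials can be simulated by drawing each outcome from a conditional distribution that depends only on the outcome of the preceding trial. The transition probabilities used below are hypothetical.

```python
import random

def markov_trials(n, p1_s, P_ss, P_fs, seed=None):
    """Simulate n Markov dependent Bernoulli trials.

    p1_s : probability of success on the first trial
    P_ss : P(s, s), probability of success given success on the previous trial
    P_fs : P(f, s), probability of success given failure on the previous trial
    """
    rng = random.Random(seed)
    outcomes = ['s' if rng.random() < p1_s else 'f']
    for _ in range(n - 1):
        # The next outcome depends only on the current one: the trials "have no memory".
        p = P_ss if outcomes[-1] == 's' else P_fs
        outcomes.append('s' if rng.random() < p else 'f')
    return outcomes

# Hypothetical values in the spirit of example 5A (s = rain, f = no rain)
print(''.join(markov_trials(20, 0.5, 0.7, 0.4, seed=1)))
```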
Suppose that the quantities \begin{align} P(s, s)= & \text { probability of success on the }(k+1) \text {st trial, } \\ & \text { given that there was success on the } k \text {th trial, } \\ P(f, s)= & \text { probability of success at the }(k+1) \text {st trial, } \\ & \text { given that there was failure at the } k \text {th trial, } \tag{5.3} \\ P(f, f)= & \text { probability of failure at the }(k+1) \text {st trial, } \\ & \text { given that there was failure at the } k \text {th trial, } \\ P(s, f)= & \text { probability of failure at the }(k+1) \text {st trial, } \\ & \text { given that there was success at the } k \text {th trial, } \end{align} are independent of \(k\) . We then say that the trials are Markov dependent repeated Bernoulli trials .
Example 5A . Let the weather be observed on \(n\) consecutive days. Let \(s\) describe a day on which rain falls, and let \(f\) describe a day on which no rain falls. Suppose one assumes that weather observations constitute a series of Markov dependent Bernoulli trials (or, in the terminology of section 6, a Markov chain with two states). Then \(P(s, s)\) is the probability of rain tomorrow, given rain today; \(P(s, f)\) is the probability of no rain tomorrow, given rain today; \(P(f, f)\) is the probability of no rain tomorrow, given no rain today; and \(P(f, s)\) is the probability of rain tomorrow, given no rain today. It is now natural to ask for such probabilities as that of rain the day after tomorrow, given no rain today; we denote this probability by \(P_{2}(f, s)\) and obtain a formula for it.
In the case of independent repeated Bernoulli trials the probability function \(P[\cdot]\) on the sample description space of the \(n\) trials is completely specified once we have the probability \(p\) of success at each trial. In the case of Markov dependent repeated Bernoulli trials it suffices to specify the quantities \begin{align} p_{1}(s) & =\text { probability of success at the first trial, } \\ p_{1}(f) & =\text { probability of failure at the first trial, } \tag{5.4} \end{align} as well as the conditional probabilities in (5.3) . The probability of any event can be computed in terms of these quantities. For example, for \(k=1,2, \ldots, n\) let \begin{align} p_{k}(s) & =\text { probability of success at the } k \text { th trial, } \\ p_{k}(f) & =\text { probability of failure at the } k \text { th trial. } \tag{5.5} \end{align}
The quantities \(p_{k}(s)\) satisfy the following equations for \(k=2,3, \ldots, n\) : \begin{align} p_{k}(s) & =p_{k-1}(s) P(s, s)+p_{k-1}(f) P(f, s) \tag{5.6} \\ & =p_{k-1}(s) P(s, s)+\left[1-p_{k-1}(s)\right][1-P(f, f)] \\ & =p_{k-1}(s)[P(s, s)+P(f, f)-1]+[1-P(f, f)] \end{align}
To justify (5.6) , we reason as follows: if \(A_{k}\) is the event that there is success on the \(k\) th trial, then \[P\left[A_{k}\right]=P\left[A_{k-1}\right] P\left[A_{k} \mid A_{k-1}\right]+P\left[A_{k-1}^{c}\right] P\left[A_{k} \mid A^{c}_{k-1}\right] . \tag{5.7}\]
From (5.7) and the fact that (5.8) \(P(s, f)=1-P(s, s) \quad P(f, s)=1-P(f, f), \quad p_{k}(f)=1-p_{k}(s)\) one obtains (5.6) .
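The recursion (5.6) lends itself directly to computation. The following is a minimal sketch with hypothetical numerical values (not from the text):

```python
# Iterate the difference equation (5.6):
#   p_k(s) = p_{k-1}(s) [P(s,s) + P(f,f) - 1] + [1 - P(f,f)].
def success_probabilities(p1_s, P_ss, P_ff, n):
    """Return [p_1(s), ..., p_n(s)] by iterating (5.6)."""
    probs = [p1_s]
    for _ in range(n - 1):
        probs.append(probs[-1] * (P_ss + P_ff - 1) + (1 - P_ff))
    return probs

# Hypothetical values: P(s,s) = 0.7, P(f,f) = 0.6, p_1(s) = 0.5
print(success_probabilities(0.5, 0.7, 0.6, 3))
```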
Equation (5.6) constitutes a recursive relationship for \(p_{k}(s)\) , known as a difference equation. Throughout this section we make the assumption that \[|P(s, s)+P(f, f)-1|<1. \tag{5.9}\]
By using theoretical exercise 4.5, it follows from (5.6) and (5.9) that for \(k=1,2, \ldots, n\) \begin{align} p_{k}(s)=\left[p_{1}(s)\right. & \left.-\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}\right][P(s, s)+P(f, f)-1]^{k-1} \tag{5.10} \\ & +\left[\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}\right]. \end{align}
By interchanging the role of \(s\) and \(f\) in (5.10) , we obtain, similarly, for \(k=1,2, \ldots, n\) \begin{align} p_{k}(f)=\left[p_{1}(f)\right. & \left.-\frac{1-P(s, s)}{2-P(s, s)-P(f, f)}\right][P(s, s)+P(f, f)-1]^{k-1} \tag{5.11} \\ & +\left[\frac{1-P(s, s)}{2-P(s, s)-P(f, f)}\right]. \end{align}
It is readily verifiable that the expressions in (5.10) and (5.11) sum to one, as they ought.
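A short sketch can confirm both facts numerically: the closed forms (5.10) and (5.11) satisfy the recursion and sum to one. The numerical values below are hypothetical.

```python
# Closed forms (5.10) and (5.11) for p_k(s) and p_k(f).
def p_k_success(p1_s, P_ss, P_ff, k):
    r = P_ss + P_ff - 1                       # assumption (5.9): |r| < 1
    limit = (1 - P_ff) / (2 - P_ss - P_ff)    # equilibrium value, cf. (5.24)
    return (p1_s - limit) * r ** (k - 1) + limit

def p_k_failure(p1_f, P_ss, P_ff, k):
    r = P_ss + P_ff - 1
    limit = (1 - P_ss) / (2 - P_ss - P_ff)
    return (p1_f - limit) * r ** (k - 1) + limit

p1_s, P_ss, P_ff = 0.5, 0.7, 0.6              # hypothetical values
for k in range(1, 6):
    total = p_k_success(p1_s, P_ss, P_ff, k) + p_k_failure(1 - p1_s, P_ss, P_ff, k)
    assert abs(total - 1) < 1e-9              # (5.10) + (5.11) = 1
```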
In many problems involving Markov dependent repeated Bernoulli trials we do not know the probability \(p_{1}(s)\) of success at the first trial. We can only compute the quantities
\begin{align} P_{k}(s, s)= & \text { conditional probability of success at the } \\ & (k+1) \text { st trial, given success at the first trial, } \\ P_{k}(s, f)= & \text { conditional probability of failure at the } \\ & (k+1) \text { st trial, given success at the first trial, } \\ P_{k}(f, f)= & \text { conditional probability of failure at the } \tag{5.12} \\ & (k+1) \text { st trial, given failure at the first trial, } \\ P_{k}(f, s)= & \text { conditional probability of success at the } \\ & (k+1) \text { st trial, given failure at the first trial. } \end{align}
Since \[P_{k}(s, f)=1-P_{k}(s, s), \quad P_{k}(f, s)=1-P_{k}(f, f) \tag{5.13}\] it suffices to obtain formulas for \(P_{k}(s, s)\) and \(P_{k}(f, f)\) .
In the same way that we obtained (5.6) we obtain \begin{align} P_{k}(s, s) & =P_{k-1}(s, s) P(s, s)+P_{k-1}(s, f) P(f, s) \tag{5.14} \\ & =P_{k-1}(s, s) P(s, s)+\left[1-P_{k-1}(s, s)\right][1-P(f, f)] \\ & =P_{k-1}(s, s)[P(s, s)+P(f, f)-1]+[1-P(f, f)] \end{align}
By using theoretical exercise 4.5, it follows from (5.14) and (5.9) that for \(k=1,2, \ldots, n\) \begin{align} P_{k}(s, s) & =\left[P_{1}(s, s)-\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}\right] \tag{5.15} \\ & \times[P(s, s)+P(f, f)-1]^{k-1}+\left[\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}\right] \end{align} which can be simplified to \begin{align} P_{k}(s, s)=\frac{1-P(s, s)}{2-P(s, s)-P(f, f)} & {[P(s, s)+P(f, f)-1]^{k} } \tag{5.16} \\ + & {\left[\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}\right] . } \end{align}
By interchanging the role of \(s\) and \(f\) , we obtain, similarly, \begin{align} P_{k}(f, f)=\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}[P(s, s)+P(f, f)-1]^{k} \tag{5.17}\\ \quad+\frac{1-P(s, s)}{2-P(s, s)-P(f, f)} \end{align} By using (5.13) , we obtain \begin{align} P_{k}(s, f)=-\frac{1-P(s, s)}{2-P(s, s)-P(f, f)}&[P(s, s)+P(f, f)-1]^{k}\\ & +\frac{1-P(s, s)}{2-P(s, s)-P(f, f)} \tag{5.18} \end{align} \begin{align} P_{k}(f, s)=-\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}&[P(s, s)+P(f, f)-1]^{k} \\ & +\frac{1-P(f, f)}{2-P(s, s)-P(f, f)}.\tag{5.19} \end{align} Equations (5.16) to (5.19) represent the basic conclusions in the theory of Markov dependent Bernoulli trials (in the case that (5.9) holds).
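The four basic conclusions (5.16) to (5.19) can be collected in a single sketch. For \(k=1\) they reduce to the one-step probabilities (5.3), which provides a convenient sanity check; the numerical values below are hypothetical.

```python
# The k-step transition probabilities (5.16)-(5.19).
def k_step_probabilities(P_ss, P_ff, k):
    r = P_ss + P_ff - 1                                # assumption (5.9): |r| < 1
    d = 2 - P_ss - P_ff
    Pk_ss = (1 - P_ss) / d * r ** k + (1 - P_ff) / d   # (5.16)
    Pk_ff = (1 - P_ff) / d * r ** k + (1 - P_ss) / d   # (5.17)
    return {('s', 's'): Pk_ss, ('s', 'f'): 1 - Pk_ss,  # (5.18), via (5.13)
            ('f', 'f'): Pk_ff, ('f', 's'): 1 - Pk_ff}  # (5.19), via (5.13)

# Sanity check with hypothetical values: one step recovers P(s,s) and P(f,f).
one_step = k_step_probabilities(0.7, 0.6, 1)
assert abs(one_step[('s', 's')] - 0.7) < 1e-9
assert abs(one_step[('f', 'f')] - 0.6) < 1e-9
```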
Example 5B . Consider a communications system which transmits the digits 0 and 1. Each digit transmitted must pass through several stages, at each of which there is a probability \(p\) that the digit that enters will be unchanged when it leaves. Suppose that the system consists of three stages. What is the probability that a digit entering the system as 0 will be (i) transmitted by the third stage as 0, (ii) transmitted by each stage as 0 (that is, never changed from 0)? Evaluate these probabilities for \(p=\frac{1}{3}\) .
Solution
In observing the passage of the digit through the communications system, we are observing a 4-tuple \(\left(z_{1}, z_{2}, z_{3}, z_{4}\right)\) , whose first component \(z_{1}\) is 1 or 0, depending on whether the digit entering the system is 1 or 0. For \(i=2,3,4\) the component \(z_{i}\) is equal to 1 or 0, depending on whether the digit leaving the \(i\) th stage is 1 or 0. We now use the foregoing formulas, identifying \(s\) with 1 and \(f\) with 0, say. Our basic assumption is that \[P(0,0)=P(1,1)=p \tag{5.20}\]
The probability that a digit entering the system as 0 will be transmitted by the third stage as 0 is given by \begin{align} P_{3}(0,0)= & \frac{1-P(0,0)}{2-P(0,0)-P(1,1)}[P(0,0)+P(1,1)-1]^{3} \tag{5.21} \\ & \quad+\frac{1-P(1,1)}{2-P(0,0)-P(1,1)} \\ = & \frac{1-p}{2-2 p}(2 p-1)^{3}+\frac{1-p}{2-2 p} \\ = & \frac{1}{2}\left[1+(2 p-1)^{3}\right] . \end{align} If \(p=\frac{1}{3}\) , then \(P_{3}(0,0)=\frac{1}{2}\left[1-\left(\frac{1}{3}\right)^{3}\right]=\frac{13}{27}\) . The probability that a digit entering the system as 0 will be transmitted by each stage as 0 is given by the product \[P(0,0) P(0,0) P(0,0)=p^{3}=\left(\frac{1}{3}\right)^{3}=\frac{1}{27}.\]
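As a sketch, the arithmetic of example 5B can be checked with exact fractions; the code below simply re-evaluates (5.21) and the product \(p^3\).

```python
from fractions import Fraction

p = Fraction(1, 3)
# (5.21) with P(0,0) = P(1,1) = p:  P_3(0,0) = (1/2)[1 + (2p - 1)^3]
P3_00 = Fraction(1, 2) * (1 + (2 * p - 1) ** 3)
assert P3_00 == Fraction(13, 27)
# Probability the digit is never changed from 0 at any of the three stages.
assert p ** 3 == Fraction(1, 27)
```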
Example 5C . Suppose that the digit transmitted through the communications system described in example 5B (with \(p=\frac{1}{3}\) ) is chosen by a chance mechanism; digit 0 is chosen with probability \(\frac{1}{3}\) and digit 1, with probability \(\frac{2}{3}\) . What is the conditional probability that a digit transmitted by the third stage as 0 in fact entered the system as 0?
Solution
The conditional probability that a digit transmitted by the third stage as 0 entered the system as 0 is given by \[\frac{p_{1}(0) P_{3}(0,0)}{p_{4}(0)} \tag{5.22}\]
To justify (5.22), note that \(p_{1}(0) P_{3}(0,0)\) is the probability that the first digit is 0 and the fourth digit is 0, whereas \(p_{4}(0)\) is the probability that the fourth digit is 0. Now, under the assumption that \[P(1,1)=P(0,0)=\frac{1}{3}, \quad p_{1}(0)=\frac{1}{3}, \quad p_{1}(1)=\frac{2}{3},\] it follows from (5.10) that \begin{align} p_{4}(0)= & {\left[p_{1}(0)-\frac{1-P(1,1)}{2-P(0,0)-P(1,1)}\right][P(0,0)+P(1,1)-1]^{3} } \tag{5.23} \\ & +\frac{1-P(1,1)}{2-P(0,0)-P(1,1)} \\ = & \left(\frac{1}{3}-\frac{1}{2}\right)\left(-\frac{1}{3}\right)^{3}+\frac{1}{2}=\frac{41}{81} \end{align}
From (5.21) and (5.23) it follows that the conditional probability that a digit leaving the system as 0 entered the system as 0 is given by \[\frac{1}{3} \cdot \frac{13}{27} / \frac{41}{81}=\frac{13}{41}\]
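The Bayes computation (5.22) can likewise be sketched with exact fractions, using the values already obtained in (5.21) and (5.23):

```python
from fractions import Fraction

p1_0  = Fraction(1, 3)           # probability the entering digit is 0
P3_00 = Fraction(13, 27)         # from (5.21) with p = 1/3
p4_0  = Fraction(41, 81)         # from (5.23)
posterior = p1_0 * P3_00 / p4_0  # the conditional probability (5.22)
assert posterior == Fraction(13, 41)
```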
Example 5D . Let us use the considerations of examples 5B and 5C to solve the following problem, first proposed by the celebrated cosmologist A. S. Eddington (see W. Feller, “The Problem of \(n\) liars and Markov chains”, American Mathematical Monthly , Vol. 58 (1951), pp. 606–608). “If \(A, B\) , \(C, D\) each speak the truth once in 3 times (independently), and \(A\) affirms that \(B\) denies that \(C\) declares that \(D\) is a liar, what is the probability that \(D\) was telling the truth”?
Solution
We consider a sample description space of 4-tuples \(\left(z_{1}, z_{2}, z_{3}, z_{4}\right)\) in which \(z_{1}\) equals 0 or 1, depending on whether \(D\) is truthful or a liar, \(z_{2}\) equals 0 or 1, depending on whether the statement made by \(C\) implies that \(D\) is truthful or a liar, \(z_{3}\) equals 0 or 1, depending on whether the statement made by \(B\) implies that \(D\) is truthful or a liar, and \(z_{4}\) equals 0 or 1, depending on whether the statement made by \(A\) implies that \(D\) is truthful or a liar. The sample description space thus defined constitutes a series of Markov dependent repeated Bernoulli trials with \[P(0,0)=P(1,1)=\frac{1}{3}, \quad p_{1}(0)=\frac{1}{3}, \quad p_{1}(1)=\frac{2}{3}.\]
The persons \(A, B, C\) , and \(D\) can be regarded as forming a communications system. We are seeking the conditional probability that the digit entering the system was 0 (which is equivalent to \(D\) being truthful), given that the digit transmitted by the third stage was 0 (if \(A\) affirms that \(B\) denies that \(C\) declares that \(D\) is a liar, then \(A\) is asserting that \(D\) is truthful). In view of example 5C , the required probability is \(\frac{13}{41}\) .
Statistical Equilibrium. For large values of \(k\) the values of \(p_{k}(s)\) and \(p_{k}(f)\) are approximately given by \begin{align} p_{k}(s) & =\frac{1-P(f, f)}{2-P(s, s)-P(f, f)} \tag{5.24} \\ p_{k}(f) & =\frac{1-P(s, s)}{2-P(s, s)-P(f, f)} \end{align}
To justify (5.24) , use (5.10) and (5.11) and the fact that (5.9) implies that \[\lim _{k \rightarrow \infty}[P(s, s)+P(f, f)-1]^{k-1}=0.\]
Equation (5.24) has the following significant interpretation: After a large number of Markov dependent repeated Bernoulli trials has been performed, one is in a state of statistical equilibrium, in the sense that the probability \(p_{k}(s)\) of success on the \(k\) th trial is the same for all large values of \(k\) and indeed is functionally independent of the initial conditions, represented by \(p_{1}(s)\) .
From (5.24) one sees that the trials are asymptotically fair in the sense that approximately \[p_{k}(s)=p_{k}(f)=\frac{1}{2} \tag{5.25}\] for large values of \(k\) if and only if \[P(s, s)=P(f, f) \tag{5.26}\]
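The approach to statistical equilibrium can be observed in a sketch: iterating the recursion (5.6) from two different initial probabilities yields the same limit (5.24). The transition probabilities below are hypothetical.

```python
def equilibrium(P_ss, P_ff):
    """The limiting values (5.24), valid under assumption (5.9)."""
    d = 2 - P_ss - P_ff
    return (1 - P_ff) / d, (1 - P_ss) / d    # limiting p_k(s), p_k(f)

P_ss, P_ff = 0.7, 0.6                         # hypothetical values
limit_s, limit_f = equilibrium(P_ss, P_ff)
for p in (0.0, 1.0):                          # two extreme starting points p_1(s)
    for _ in range(60):                       # iterate (5.6) sixty times
        p = p * (P_ss + P_ff - 1) + (1 - P_ff)
    assert abs(p - limit_s) < 1e-9            # both converge to the same value
```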
Example 5E . If the communications system described in example 5B consists of a very large number of stages, then half the digits transmitted by the system will be 0’s, irrespective of the proportion of 0’s among the digits entering the system.
Exercises
5.1 . Consider a series of Markov dependent Bernoulli trials such that \(P(s, s)=\frac{1}{3}, P(f, f)=\frac{1}{2}\) . Find \(P_{3}(s, f), P_{3}(f, s)\) .
Answer
\(P_{3}(s, f)=\frac{31}{54}, P_{3}(f, s)=\frac{31}{72}\) .
5.2 . Consider a series of Markov dependent Bernoulli trials such that \(P(s, s)=\frac{1}{4}, P(f, f)=\frac{1}{2}, p_{1}(s)=\frac{1}{3}\) . Find \(p_{3}(s), p_{3}(f)\) .
5.3 . Consider a series of Markov dependent Bernoulli trials such that \(P(s, s)=\frac{1}{2}, P(f, f)=\frac{3}{4}, p_{1}(s)=\frac{1}{2}\) . Find the conditional probability of a success at the first trial, given that there was a success at the fourth trial.
Answer
\(\frac{22}{43}\) .
5.4 . Consider a series of Markov dependent Bernoulli trials such that \(P(s, s)=\frac{1}{2}, P(f, f)=\frac{3}{4}\) . Find \(\lim _{k \rightarrow \infty} p_{k}(s)\) .
5.5 . If \(A, B, C\) , and \(D\) each speak the truth once in 3 times (independently), and \(A\) affirms that \(B\) denies that \(C\) denies that \(D\) is a liar, what is the probability that \(D\) was telling the truth?
Answer
0.35.
5.6 . Suppose the probability is equal to \(p\) that the weather (rain or no rain) on any arbitrary day is the same as on the preceding day. Let \(p_{1}\) be the probability of rain on the first day of the year. Find the probability \(p_{n}\) of rain on the \(n\) th day. Evaluate the limit of \(p_{n}\) as \(n\) tends to infinity.
5.7 . Consider a game played as follows: a group of \(n\) persons is arranged in a line. The first person starts a rumor by telling his neighbor that the last person in line is a nonconformist. Each person in line then repeats this rumor to his neighbor; however, with probability \(p>0\) , he reverses the sense of the rumor as it is told to him. What is the probability that the last person in line will be told he is a nonconformist if (i) \(n=5\) , (ii) \(n=6\) , (iii) \(n\) is very large?
Answer
(i) \(\frac{1}{2}+\frac{1}{2}(1-2 p)^{3}\) ; (ii) \(\frac{1}{2}+\frac{1}{2}(1-2 p)^{4}\) ; (iii) \(\frac{1}{2}\) if \(p<1\) .
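The answers above can be checked against the symmetric case of (5.16): with \(P(s,s)=P(f,f)=1-p\), the \(k\)-step formula reduces to \(P_k(s,s)=\frac{1}{2}+\frac{1}{2}(1-2p)^k\), and the rumor can be reversed by each of the \(n-2\) intermediate tellers. A sketch:

```python
# Probability the last of n persons is told "nonconformist", given a reversal
# probability p at each of the n - 2 retellings (symmetric case of (5.16)).
def told_nonconformist(n, p):
    return 0.5 + 0.5 * (1 - 2 * p) ** (n - 2)

# Agreement with the stated answers, for an arbitrary illustrative p:
assert abs(told_nonconformist(5, 0.1) - (0.5 + 0.5 * 0.8 ** 3)) < 1e-12
assert abs(told_nonconformist(6, 0.1) - (0.5 + 0.5 * 0.8 ** 4)) < 1e-12
```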
5.8 . Suppose you are confronted with 2 coins, \(A\) and \(B\) . You are to make \(n\) tosses, using the coin you prefer at each toss. You will be paid 1 dollar for each time the coin falls heads. Coin \(A\) has probability \(\frac{1}{2}\) of falling heads, and coin \(B\) has probability \(\frac{1}{4}\) of falling heads. Unfortunately, you are not told which of the coins is coin \(A\) . Consequently, you decide to toss the coins according to the following system. For the first toss you choose a coin at random. For all succeeding tosses you select the coin used on the preceding toss if it fell heads, and otherwise switch coins. What is the probability that coin \(A\) will be the coin tossed on the \(n\) th toss if (i) \(n=2\) , (ii) \(n=4\) , (iii) \(n=6\) , (iv) \(n\) is very large? What is the probability that the coin tossed on the \(n\) th toss will fall heads if (i) \(n=2\) , (ii) \(n=4\) , (iii) \(n=6\) , (iv) \(n\) is very large? Hint: On each trial let \(s\) denote the use of coin \(A\) and \(f\) denote the use of coin \(B\) .
5.9 . A certain young lady is being wooed by a certain young man. The young lady tries not to be late for their dates too often. If she is late on 1 date, she is \(90 \%\) sure to be on time on the next date. If she is on time, then there is a \(60 \%\) chance of her being late on the next date. In the long run, how often is she late?
Answer
\(\frac{2}{5}\) .
5.10 . Suppose that people in a certain group may be classified into 2 categories (say, city dwellers and country dwellers; Republicans and Democrats; Easterners and Westerners; skilled and unskilled workers, and so on). Let us consider a group of engineers, some of whom are Easterners and some of whom are Westerners. Suppose that each person has a certain probability of changing his status: The probability that an Easterner will become a Westerner is 0.04, whereas the probability that a Westerner will become an Easterner is 0.01. In the long run, what proportion of the group will be (i) Easterners, (ii) Westerners, (iii) will move from East to West in a given year, (iv) will move from West to East in a given year? Comment on your answers.
- The theory of Markov chains derives its name from the celebrated Russian probabilist, A. A. Markov (1856–1922). ↩︎