Although, by definition, a random variable \(X\) is a function on a probability space, in probability theory we are rarely concerned with the functional form of \(X\) , for we are not interested in computing the value \(X(s)\) that the function \(X\) assumes at any individual member \(s\) of the sample description space \(S\) on which \(X\) is defined. Indeed, we do not usually wish to know the space \(S\) on which \(X\) is defined. Rather, we are interested in the probability that an observed value of the random variable \(X\) will lie in a given set \(B\) . We are interested in a random variable as a mechanism that gives rise to a numerical valued random phenomenon, and the questions we shall ask about a random variable \(X\) are precisely the same as those asked about numerical valued random phenomena. Similarly, the techniques we use to describe random variables are precisely the same as those used to describe numerical valued random phenomena.

To begin with, we define the probability function of a random variable \(X\), denoted by \(P_{X}[\cdot]\), as a set function defined for every Borel set \(B\) of real numbers, whose value \(P_{X}[B]\) is the probability that \(X\) is in \(B\). We sometimes write the intuitively meaningful expression \(P[X \text{ is in } B]\) for the mathematically correct expression \(P_{X}[B]\). Similarly, we adopt the following expressions for any real numbers \(a\), \(b\), and \(x\):
\begin{align}
P[a<X \leq b] &= P_{X}\left[\left\{x^{\prime}: \ a<x^{\prime} \leq b\right\}\right], \notag \\
P[X=x] &= P_{X}\left[\left\{x^{\prime}: \ x^{\prime}=x\right\}\right], \tag{2.1} \\
P[X \leq x] &= P_{X}\left[\left\{x^{\prime}: \ x^{\prime} \leq x\right\}\right]. \notag
\end{align}

One obtains the probability function \(P_{X}[\cdot]\) of the random variable \(X\) from the probability function \(P[\cdot]\) , which exists on the sample description space \(S\) on which \(X\) is defined as a function, by means of the following basic formula: for any Borel set \(B\) of real numbers \[P_{X}[B]=P[\{s: \ X(s) \text { is in } B\}]. \tag{2.2}\] 

Equation (2.2) represents the definition of \(P_{X}[B]\) ; it is clear that it embodies the intuitive meaning of \(P_{X}[B]\) given above, since the function \(X\) will have an observed value lying in the set \(B\) if and only if the observed value \(s\) of the underlying random phenomenon is such that \(X(s)\) is in \(B\) .
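In computational terms, (2.2) amounts to summing \(P\) over the descriptions \(s\) for which \(X(s)\) falls in \(B\). A minimal sketch, using a hypothetical sample description space of two fair coin tosses (not an example from the text):

```python
from fractions import Fraction

# Hypothetical sample description space: two tosses of a fair coin,
# each of the four descriptions s carrying probability 1/4, and
# X(s) = number of heads.
S = [("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")]
P = {s: Fraction(1, 4) for s in S}

def X(s):
    return s.count("H")

def P_X(B):
    # Formula (2.2): sum P over the descriptions s with X(s) lying in B.
    return sum(P[s] for s in S if X(s) in B)

print(P_X({0}))     # 1/4
print(P_X({1, 2}))  # 3/4
```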

Example 2A. The probability function of the number of white balls in a sample. To illustrate the use of (2.2), let us compute the probability function of the random variable \(X\) defined by (1.1). Assuming equally likely descriptions on \(S\), one determines for any set \(B\) of real numbers that the value of \(P_{X}[B]\) depends on the intersection of \(B\) with the set \(\{0,1,2\}\):

\[\begin{array}{l|cccccccc}
\text{if } B\{0,1,2\}= & \emptyset & \{0\} & \{1\} & \{2\} & \{0,1\} & \{0,2\} & \{1,2\} & \{0,1,2\} \\
\hline
P_{X}[B]= & 0 & \frac{1}{15} & \frac{8}{15} & \frac{6}{15} & \frac{9}{15} & \frac{7}{15} & \frac{14}{15} & 1
\end{array}\]
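The table may be verified by direct enumeration. The definition (1.1) of \(X\) is not repeated here; the sketch below assumes, consistently with the masses in the table, that \(X\) is the number of white balls in a sample of size 2 drawn without replacement from an urn containing 6 balls, of which 4 are white:

```python
from fractions import Fraction
from itertools import permutations

# Assumed setup (consistent with the masses tabulated above): an urn of
# 6 balls, 4 white ('W') and 2 not ('R'), sampled twice without
# replacement; X = number of white balls drawn. The 30 ordered pairs
# of ball indices are the equally likely descriptions.
urn = ["W"] * 4 + ["R"] * 2
samples = list(permutations(range(6), 2))

def X(s):
    return sum(urn[i] == "W" for i in s)

def P_X(B):
    hits = sum(1 for s in samples if X(s) in B)
    return Fraction(hits, len(samples))

for B in [{0}, {1}, {2}, {0, 1}, {0, 2}, {1, 2}, {0, 1, 2}]:
    print(B, P_X(B))
# {0} -> 1/15, {1} -> 8/15, {2} -> 6/15, ..., {0, 1, 2} -> 1
```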

We may represent the probability function \(P_{X}\left[\cdot\right]\) of a random variable as a distribution of a unit mass over the real line in such a way that the amount of mass over any set \(B\) of real numbers is equal to the value \(P_{X}[B]\) of the probability function of \(X\) at \(B\) . We have seen in Chapter 4 that a distribution of probability mass may be specified in various ways by means of probability mass functions, probability density functions, and distribution functions. We now introduce these notions in connection with random variables. However, the reader should bear constantly in mind that, as mathematical functions defined on the real line, these notions have the same mathematical properties, whether they arise from random variables or from numerical valued random phenomena.

The probability law of a random variable \(X\) is defined as a probability function \(P[\cdot]\) over the real line that coincides with the probability function \(P_{X}[\cdot]\) of the random variable \(X\) . By definition, probability theory is concerned with the statements that can be made about a random variable, knowing only its probability law . Consequently, a proposition stated about a probability function \(P[\cdot]\) is, from the point of view of probability theory, a proposition stated about all random variables \(X, Y, \ldots\) , whose probability functions \(P_{X}[\cdot], P_{Y}[\cdot], \ldots\) coincide with \(P[\cdot]\) .

Two random variables \(X\) and \(Y\) are said to be identically distributed if their probability functions are equal; that is, \(P_{X}[B]=P_{Y}[B]\) for all Borel sets \(B\) .

The distribution function of a random variable \(X\) , denoted by \(F_{X}(\cdot)\) , is defined for any real number \(x\) by

\[F_{X}(x)=P[X \leq x]. \tag{2.3}\] 

The distribution function \(F_{X}(\cdot)\) of a random variable possesses all the properties stated in section 3 of Chapter 4 for the distribution function of a numerical valued random phenomenon. The distribution function of \(X\) uniquely determines the probability function of \(X\) .

The distribution function may be used to classify random variables into types. A random variable \(X\) is said to be discrete or continuous, depending on whether its distribution function \(F_{X}(\cdot)\) is discrete or continuous.

The probability mass function of a random variable \(X\) , denoted by \(p_{X}(\cdot)\) , is a function whose value \(p_{X}(x)\) at any real number \(x\) represents the probability that the observed value of the random variable \(X\) will be equal to \(x\) ; in symbols, \[p_{X}(x)=P[X=x]=P_{X}\left[\left\{x^{\prime}: \ x^{\prime}=x\right\}\right]. \tag{2.4}\] 

A real number \(x\) for which \(p_{X}(x)\) is positive is called a probability mass point of the random variable \(X\) . From the distribution function \(F_{X}(\cdot)\) one may obtain the probability mass function \(p_{X}(\cdot)\) by \[p_{X}(x)=F_{X}(x)-\lim _{a \rightarrow x-} F_{X}(a). \tag{2.5}\] 

A random variable \(X\) is discrete if the sum of the probability mass function over the points at which it is positive (there are at most a countably infinite number) is equal to 1; in symbols, \(X\) is discrete if \[\sum_{\substack{\text { over all points } x \text { such } \\ \text { that } p_{X}(x)>0}} p_{X}(x)=1. \tag{2.6}\] 

In other words, a random variable \(X\) is discrete if, when one distributes a unit mass over the real line in accordance with the probability function \(P_{X}[\cdot]\), one does so by attaching a positive mass \(p_{X}(x)\) to each of a finite or countably infinite number of points.

If a random variable \(X\) is discrete, it suffices to know its probability mass function \(p_{X}(\cdot)\) in order to know its probability function \(P_{X}[\cdot]\) , for we have the following formula expressing \(P_{X}[\cdot]\) in terms of \(p_{X}(\cdot)\) . If \(X\) is discrete, then for any Borel set \(B\) of real numbers 

\[P_{X}[B]=P[X \text { is in } B]=\sum_{\substack{\text { over all points } x \text { in } B \\ \text { such that } p_{X}(x)>0}} p_{X}(x). \tag{2.7}\] 

Thus, for a discrete random variable \(X\) , to evaluate the probability \(P_{\mathrm{X}}[B]\) that the random variable \(X\) will have an observed value lying in \(B\) , one has only to list the probability mass points of \(X\) which lie in \(B\) . One then adds the probability masses attached to these probability mass points to obtain \(P_{X}[B]\) .

The distribution function of a discrete random variable \(X\) is given in terms of its probability mass function by

\[F_{X}(x)=\sum_{\substack{\text { over all points } x^{\prime} \leq x \\ \text { such that } p_{X}\left(x^{\prime}\right)>0}} p_{X}\left(x^{\prime}\right). \tag{2.8}\] 

The distribution function \(F_{X}(\cdot)\) of a discrete random variable \(X\) is what might be called a piecewise constant or “step” function, as diagrammed in Fig. 3A of Chapter 4. It consists of a series of horizontal lines over the intervals between probability mass points; at a probability mass point \(x\) , the graph of \(F_{\mathrm{X}}(\cdot)\) jumps upward by an amount \(p_{\mathrm{X}}(x)\) .
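In computational terms, (2.7) and (2.8) are plain summations over the probability mass points; a minimal sketch, reusing the masses of Example 2A:

```python
from fractions import Fraction

# Probability mass function of Example 2A as a dictionary of mass points.
p_X = {0: Fraction(1, 15), 1: Fraction(8, 15), 2: Fraction(6, 15)}

def P_X(B):
    # Formula (2.7): add the masses attached to the mass points lying in B.
    return sum(mass for point, mass in p_X.items() if point in B)

def F_X(x):
    # Formula (2.8): a step function, jumping by p_X(x') at each point x' <= x.
    return sum(mass for point, mass in p_X.items() if point <= x)

print(P_X({0, 2}))  # 7/15
print(F_X(0.5))     # 1/15
print(F_X(1))       # 9/15  (F_X jumps by p_X(1) = 8/15 at x = 1)
print(F_X(5))       # 1
```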

Example 2B. A random variable \(X\) has a binomial distribution with parameters \(n\) and \(p\) if it is a discrete random variable whose probability mass function \(p_{X}(\cdot)\) is given by, for any real number \(x\),

\[p_{X}(x) =\left\{ \begin{aligned} &\binom{n}{x} p^{x}(1-p)^{n-x}, && \text{if } x=0,1, \ldots, n \\[2mm] &0, && \text{otherwise.} \end{aligned} \right. \tag{2.9}\] 

Thus, for a random variable \(X\), which has a binomial distribution with parameters \(n=6\) and \(p=\frac{1}{3}\),

\begin{align}
P[1 \leq X \leq 2] &= p_{X}(1)+p_{X}(2)=\binom{6}{1}\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)^{5}+\binom{6}{2}\left(\frac{1}{3}\right)^{2}\left(\frac{2}{3}\right)^{4} \notag \\
&= \frac{192}{729}+\frac{240}{729}=\frac{432}{729}=\frac{16}{27}. \notag
\end{align}
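This computation is easily checked in exact arithmetic; a minimal sketch, with a helper binomial_pmf implementing (2.9):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    # Formula (2.9): p_X(x) = C(n, x) p^x (1-p)^(n-x) for x = 0, 1, ..., n.
    if x in range(n + 1):
        return comb(n, x) * p**x * (1 - p)**(n - x)
    return Fraction(0)

n, p = 6, Fraction(1, 3)
print(binomial_pmf(1, n, p) + binomial_pmf(2, n, p))  # 16/27, as in Example 2B
```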

Example 2C. Identically distributed random variables. Some insight into the notion of identically distributed random variables may be gained by considering the following simple example of two random variables that are distinct as functions and yet are identically distributed. Suppose one is tossing a fair die; consider the random variables \(X\) and \(Y\), defined as follows:

\[\begin{array}{cc}
\begin{array}{c|c}
\text{Value of } X & \text{Outcome of die} \\ \hline
2 & 1, 2, 3 \\
1 & 4, 5 \\
0 & 6 \\
\end{array}
& \quad
\begin{array}{c|c}
\text{Value of } Y & \text{Outcome of die} \\ \hline
2 & 4, 5, 6 \\
1 & 2, 3 \\
0 & 1 \\
\end{array}
\end{array}\]

It is clear that both \(X\) and \(Y\) are discrete random variables whose probability mass functions agree for all \(x\); indeed, \(p_{X}(2)=p_{Y}(2)=\frac{1}{2}\), \(p_{X}(1)=p_{Y}(1)=\frac{1}{3}\), \(p_{X}(0)=p_{Y}(0)=\frac{1}{6}\), and \(p_{X}(x)=p_{Y}(x)=0\) for \(x\) other than 0, 1, or 2. Consequently, the probability functions \(P_{X}[B]\) and \(P_{Y}[B]\) agree for all Borel sets \(B\).
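A short sketch makes the point concrete: \(X\) and \(Y\) below are distinct as functions on the die outcomes, yet their probability mass functions, and hence all probability statements about them, agree:

```python
from fractions import Fraction

# Fair die: outcomes 1..6, each with probability 1/6.
die = range(1, 7)
X = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 6: 0}   # X as a function on outcomes
Y = {1: 0, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}   # Y: a different function

def pmf(rv):
    p = {}
    for outcome in die:
        value = rv[outcome]
        p[value] = p.get(value, Fraction(0)) + Fraction(1, 6)
    return p

print(pmf(X) == pmf(Y))                 # True: identically distributed
print(any(X[s] != Y[s] for s in die))   # True: distinct as functions
```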

If a random variable \(X\) is continuous, there exists a nonnegative function \(f_{X}(\cdot)\) , called the probability density function of the random variable \(X\) , which has the following property: for any Borel set \(B\) of real numbers

\[P_{X}[B]=P[X \text { is in } B]=\int_{B} f_{X}(x) dx. \tag{2.10}\] 

In words, for a continuous random variable \(X\) , once the probability density function \(f_{X}(\cdot)\) is known, the value \(P_{X}[B]\) of the probability function at any Borel set \(B\) may be obtained by integrating the probability density function \(f_{\mathrm{X}}(\cdot)\) over the set \(B\) .

The distribution function \(F_{X}(\cdot)\) of a continuous random variable is given in terms of its probability density function by

\[F_{X}(x)=\int_{-\infty}^{x} f_{X}\left(x^{\prime}\right) dx^{\prime}. \tag{2.11}\] 

In turn, the probability density function of a continuous random variable can be obtained from its distribution function by differentiation:

\[f_{X}(x)=\frac{d}{d x} F_{X}(x) \tag{2.12}\] 

at all points \(x\) at which the derivative on the right-hand side of (2.12) exists.
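Relations (2.11) and (2.12) may be checked numerically for any continuous probability law; a sketch, using as a stand-in the exponential density \(f_{X}(x)=e^{-x}\) for \(x>0\) (a law not under discussion at this point in the text):

```python
import math

# Stand-in continuous probability law: exponential density, e^{-x} for x > 0.
def f_X(x):
    return math.exp(-x) if x > 0 else 0.0

def F_X(x, steps=100_000):
    # Formula (2.11): integrate the density up to x (midpoint rule;
    # the density vanishes below 0, so the integral starts there).
    if x <= 0:
        return 0.0
    h = x / steps
    return h * sum(f_X((k + 0.5) * h) for k in range(steps))

x = 1.0
print(F_X(x), 1 - math.exp(-x))   # both ~0.6321
# Formula (2.12): differentiating F_X numerically recovers f_X.
eps = 1e-5
print((F_X(x + eps) - F_X(x - eps)) / (2 * eps), f_X(x))  # both ~0.3679
```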

Example 2D. A random variable \(X\) is said to be normally distributed if it is continuous and if constants \(m\) and \(\sigma\) exist, where \(-\infty<m<\infty\) and \(\sigma>0\), such that the probability density function \(f_{X}(\cdot)\) is given by, for any real number \(x\),

\[f_{X}(x)=\frac{1}{\sigma \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x-m}{\sigma}\right)^{2}}. \tag{2.13}\] 

Then for any real numbers \(a\) and \(b\), where \(\Phi(\cdot)\) denotes the standard normal distribution function,

\[P[a \leq X \leq b]=\int_{a}^{b} f_{X}(x) d x=\Phi\left(\frac{b-m}{\sigma}\right)-\Phi\left(\frac{a-m}{\sigma}\right). \tag{2.14}\] 

For a random variable \(X\) , which is normally distributed with parameters \(m=2\) and \(\sigma=2\) ,

\[P[1 \leq X \leq 2]=\Phi\left(\frac{2-2}{2}\right)-\Phi\left(\frac{1-2}{2}\right)=\Phi(0)-\Phi(-0.5)=0.5000-0.3085=0.1915.\]
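Since \(\Phi(\cdot)\) is expressible through the error function, (2.14) is straightforward to evaluate; a minimal sketch checking the value just obtained:

```python
import math

def Phi(z):
    # Standard normal distribution function via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_interval_prob(a, b, m, sigma):
    # Formula (2.14): P[a <= X <= b] for X normal with parameters m, sigma.
    return Phi((b - m) / sigma) - Phi((a - m) / sigma)

print(normal_interval_prob(1, 2, m=2, sigma=2))  # ~0.1915
```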

We conclude this section by making explicit mention of our conventions concerning the use of the letters \(p, f\) , and \(F\) , and the subscripts \(X, Y, \ldots\) . We shall always use \(p(\cdot)\) to denote a probability mass function and then add as a subscript the random variable (which could be denoted by \(X, Y, Z, U\) , \(V, W\) , etc.) of which it is the probability mass function. Thus, \(p_{U}(\cdot)\) denotes the probability mass function of the random variable \(U\) , whereas \(p_{U}(u)\) denotes the value of \(p_{U}(\cdot)\) at the point \(u\) . Similarly, we write \(f_{X}(\cdot)\) , \(f_{Y}(\cdot), f_{Z}(\cdot), f_{U}(\cdot), f_{V}(\cdot), f_{W}(\cdot)\) to denote the probability density function, respectively, of \(X, Y, Z, U, V, W\) . Similarly, we write \(F_{X}(\cdot), F_{Y}(\cdot), F_{Z}(\cdot)\) , \(F_{U}(\cdot), F_{V}(\cdot), F_{W}(\cdot)\) to denote the distribution function, respectively, of \(X, Y, Z, U, V, W\) .

Exercises

In exercises 2.1 to 2.8 describe the probability law of the random variable given.

2.1 . The number of aces in a hand of 13 cards drawn without replacement from a bridge deck.

 

Answer

\(p_{X}(x)=\binom{4}{x}\binom{48}{13-x}\Big/\binom{52}{13}\quad\) for \(x=0,1,\ldots,4\); \(=0\) otherwise.
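As a quick check of this answer, the masses should sum to 1; a sketch:

```python
from fractions import Fraction
from math import comb

# The hypergeometric masses of the answer above; they should sum to 1.
p = [Fraction(comb(4, x) * comb(48, 13 - x), comb(52, 13)) for x in range(5)]
print(sum(p))       # 1
print(float(p[0]))  # ~0.3038: probability of a hand with no ace
```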

 

2.2 . The sum of numbers on 2 balls drawn with replacement (without replacement) from an urn containing 6 balls, numbered 1 to 6.

2.3 . The maximum of the numbers on 2 balls drawn with replacement (without replacement) from an urn containing 6 balls, numbered 1 to 6.

 

Answer

Without replacement \(p_{X}(x)=\frac{2}{30}(x-1)\quad\) for \(x=1,2,\ldots,6\); \(=0\) otherwise; with replacement \(p_{X}(x)=\frac{2x-1}{36}\quad\) for \(x=1,2,\ldots,6\); \(=0\) otherwise.
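Both forms of this answer may be confirmed by enumerating the ordered draws; a sketch:

```python
from fractions import Fraction
from itertools import permutations, product

# Exercise 2.3: X = maximum of the numbers on 2 balls drawn from balls 1..6.
def pmf(draws):
    outcomes = list(draws)
    return {x: Fraction(sum(1 for d in outcomes if max(d) == x), len(outcomes))
            for x in range(1, 7)}

without = pmf(permutations(range(1, 7), 2))       # 30 ordered outcomes
with_repl = pmf(product(range(1, 7), repeat=2))   # 36 ordered outcomes

print(all(without[x] == Fraction(2 * (x - 1), 30) for x in range(1, 7)))  # True
print(all(with_repl[x] == Fraction(2 * x - 1, 36) for x in range(1, 7)))  # True
```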

 

2.4 . The number of white balls drawn in a sample of size 2 drawn with replacement (without replacement) from an urn containing 6 balls, of which 4 are white.

2.5 . The second digit in the decimal expansion of a number chosen on the unit interval in accordance with a uniform probability law.

 

Answer

\(p_{X}(x)=\frac{1}{10} \quad\) for \(x=0,1, \ldots, 9 ;=0\) otherwise.

 

2.6 . The number of times a fair coin is tossed until heads appears (i) for the first time, (ii) for the second time, (iii) the third time.

2.7 . The number of cards drawn without replacement from a deck of 52 cards until (i) a spade appears, (ii) an ace appears.

 

Answer

(i) \(13\binom{39}{x-1}\Big/(53-x)\binom{52}{x-1}\quad\) for \(x=1,2,\ldots,40\); \(=0\) otherwise.

 

(ii) \(4\binom{48}{x-1}\Big/(53-x)\binom{52}{x-1}\quad\) for \(x=1,2,\ldots,49\); \(=0\) otherwise.
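Each of these waiting-time probability laws should sum to 1 over its range; a sketch, with a hypothetical helper first_success_pmf encoding the answers above:

```python
from fractions import Fraction
from math import comb

def first_success_pmf(x, successes, total):
    # P[first success on draw x] when drawing without replacement, per the
    # answers above: k * C(total-k, x-1) / ((total+1-x) * C(total, x-1)).
    k = successes
    return Fraction(k * comb(total - k, x - 1),
                    (total + 1 - x) * comb(total, x - 1))

# (i) first spade: 13 of 52; (ii) first ace: 4 of 52.
print(sum(first_success_pmf(x, 13, 52) for x in range(1, 41)))  # 1
print(sum(first_success_pmf(x, 4, 52) for x in range(1, 50)))   # 1
```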

2.8 . The number of balls in the first urn if 10 distinguishable balls are distributed in 4 urns in such a manner that each ball is equally likely to be placed in any urn.

In exercises 2.9 to 2.16 find \(P[1 \leq X \leq 2]\) for the random variable \(X\) described.

2.9 . \(X\) is normally distributed with parameters \(m=1\) and \(\sigma=1\) .

 

Answer

\(0.3413\) .

 

2.10 . \(X\) is Poisson distributed with parameter \(\lambda=1\) .

2.11 . \(X\) obeys a binomial probability law with parameters \(n=10\) and \(p=0.1\) .

 

Answer

\(0.5811\) .

 

2.12 . \(X\) obeys an exponential probability law with parameter \(\lambda=1\) .

2.13 . \(X\) obeys a geometric probability law with parameter \(p=\frac{1}{3}\) .

 

Answer

\(\frac{5}{9}\).

 

2.14 . \(X\) obeys a hypergeometric probability law with parameters \(N=100\) , \(p=0.1, n=10\) .

2.15 . \(X\) is uniformly distributed over the interval \(\frac{1}{2}\) to \(\frac{3}{2}\) .

 

Answer

\(\frac{1}{2}\) .

 

2.16 . \(X\) is Cauchy distributed with parameters \(\alpha=1\) and \(\beta=1\) .