Probability Theory and Its Applications

In this section we develop formulas for the probability law of a random variable $Y$ that arises as a function of another random variable $X$, so that for some Borel function $g(\cdot)$ \[Y=g(X). \tag{8.1}\] To find the probability law of $Y$, it is best in general first to find its distribution function $F_{Y}(\cdot)$, from which one may obtain the probability density function $f_{Y}(\cdot)$ or the probability mass function $p_{Y}(\cdot)$ in cases in which these functions exist. From (2.2) we obtain the following formula for the value $F_{Y}(y)$ of the distribution function $F_{Y}(\cdot)$ at the real number $y$: \[F_{Y}(y)=P_{X}[\{x: \quad g(x) \leq y\}] \quad \text { if } Y=g(X). \tag{8.2}\] Of great importance is the special case of a linear function $g(x)=ax+b$, in which $a$ and $b$ are given real numbers such that $a>0$ and $-\infty<b<\infty$. The distribution function of the random variable $Y=aX+b$ is given by \[F_{aX+b}(y)=P[a X+b \leq y]=P\left[X \leq \frac{y-b}{a}\right]=F_{X}\left(\frac{y-b}{a}\right). \tag{8.3}\] If $X$ is continuous, so is $Y=aX+b$, with a probability density function given for any real number $y$ by

\[f_{a X+b}(y)=\frac{1}{a} f_{X}\left(\frac{y-b}{a}\right). \tag{8.4}\]

If $X$ is discrete, so is $Y=a X+b$ , with a probability mass function for any real number $y$ given by

\[p_{a X+b}(y)=p_{X}\left(\frac{y-b}{a}\right). \tag{8.5}\]
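Formulas (8.3) and (8.4) can be checked numerically. The following is a minimal sketch using only the Python standard library; it assumes, purely for illustration, that $X$ is exponentially distributed with rate 1 and that $a=2$, $b=3$. The helper names `pdf_linear` and `exp_pdf` are hypothetical.

```python
import math
import random

def pdf_linear(f_X, a, b):
    """Density of Y = aX + b computed by (8.4); assumes a > 0."""
    if a <= 0:
        raise ValueError("(8.4) as stated assumes a > 0")
    return lambda y: f_X((y - b) / a) / a

def exp_pdf(x, lam=1.0):
    """Exponential density with rate lam (illustrative choice of f_X)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

f_Y = pdf_linear(exp_pdf, a=2.0, b=3.0)

# Monte Carlo check of (8.3): P[2X + 3 <= y0] = F_X((y0 - 3)/2).
random.seed(0)
n = 100_000
samples = [2.0 * random.expovariate(1.0) + 3.0 for _ in range(n)]
y0 = 5.0
empirical = sum(s <= y0 for s in samples) / n
exact = 1.0 - math.exp(-(y0 - 3.0) / 2.0)   # F_X(1) for rate-1 exponential
print(f_Y(5.0), empirical, exact)
```

The simulated proportion should agree with $F_X\left(\frac{y-b}{a}\right)$ up to sampling error of order $n^{-1/2}$.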

Next, let us consider $g(x)=x^{2}$, so that $Y=X^{2}$. For $y<0$, $\left\{x: x^{2} \leq y\right\}$ is the empty set of real numbers. Consequently,

\[F_{X^{2}}(y)=0 \quad \text { for } y<0. \tag{8.6}\]

For $y \geq 0$

\begin{align} F_{X^{2}}(y) & =P\left[X^{2} \leq y\right]=P[-\sqrt{y} \leq X \leq \sqrt{y}] \tag{8.7}\\ & =F_{X}(\sqrt{y})-F_{X}(-\sqrt{y})+p_{X}(-\sqrt{y}) \end{align}

One sees from (8.7) that if $X$ possesses a probability density function $f_{X}(\cdot)$, then the distribution function $F_{X^{2}}(\cdot)$ of $X^{2}$ may be expressed as an integral; this is the necessary and sufficient condition that $X^{2}$ possess a probability density function $f_{X^{2}}(\cdot)$. To find the value $f_{X^{2}}(y)$ at a real number $y$, we differentiate (8.7) and (8.6) with respect to $y$. We obtain \begin{align} f_{X^{2}}(y) &= \begin{cases} \left[f_{X}(\sqrt{y}) + f_{X}(-\sqrt{y})\right] \frac{1}{2 \sqrt{y}}, & \text{for } y > 0 \\ 0, & \text{for } y < 0 \end{cases} \tag{8.8} \end{align} The derivation of (8.8) requires the chain rule for differentiating a function of a function; to recall it, note that

\begin{align} \frac{d}{d y} F_{X}(\sqrt{y}) & =\lim _{h \rightarrow 0} \frac{F_{X}(\sqrt{y+h})-F_{X}(\sqrt{y})}{h} \tag{8.9}\\ & =\lim _{h \rightarrow 0} \frac{F_{X}(\sqrt{y+h})-F_{X}(\sqrt{y})}{\sqrt{y+h}-\sqrt{y}} \lim _{h \rightarrow 0} \frac{\sqrt{y+h}-\sqrt{y}}{h} \\ & =F_{X}^{\prime}(\sqrt{y}) \frac{d}{d y} \sqrt{y} \end{align}
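As a numerical check of (8.8), one may differentiate (8.7) by a central difference and compare with the formula. The sketch below assumes, for illustration only, that $X$ is normally distributed with $m=0$ and $\sigma=1$, and uses Python's `statistics.NormalDist`; the function names `F_X2` and `f_X2` are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist(0, 1)  # X assumed standard normal for this illustration

def F_X2(y):
    """Distribution function of X^2 from (8.6)-(8.7); p_X(-sqrt y) = 0 since X is continuous."""
    if y < 0:
        return 0.0
    r = math.sqrt(y)
    return Z.cdf(r) - Z.cdf(-r)

def f_X2(y):
    """Density of X^2 from (8.8); the value at y = 0 is left as 0 arbitrarily."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    return (Z.pdf(r) + Z.pdf(-r)) / (2 * r)

h = 1e-5
for y in (0.25, 1.0, 4.0):
    numeric = (F_X2(y + h) - F_X2(y - h)) / (2 * h)  # derivative of (8.7)
    print(y, f_X2(y), numeric)
```

For a standard normal $X$, (8.8) reduces to $e^{-y/2}/\sqrt{2\pi y}$ for $y>0$.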

If $X$ is discrete, it then follows from (8.7) that $X^{2}$ is discrete, since the distribution function $F_{X^{2}}(\cdot)$ may be expressed entirely as a sum. The probability mass function of $X^{2}$ for any real number $y$ is given by

\begin{align} p_{X^{2}}(y) &= \begin{cases} p_{X}(\sqrt{y}) + p_{X}(-\sqrt{y}), & \text{for } y > 0 \\ p_{X}(0), & \text{for } y = 0 \\ 0, & \text{for } y < 0 \end{cases} \tag{8.10} \end{align}
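For a discrete $X$ the probability mass function of $g(X)$ is obtained by summing $p_X(x)$ over all $x$ with $g(x)=y$; (8.10) is the case $g(x)=x^2$. Note that at $y=0$ the two points $\pm\sqrt{y}$ coincide, so $p_X(0)$ is counted only once. A minimal sketch, assuming for illustration that $X$ is uniform on $\{-2,-1,0,1,2\}$; the helper `pmf_of_function` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def pmf_of_function(p_X, g):
    """pmf of Y = g(X): p_Y(y) is the sum of p_X(x) over all x with g(x) = y."""
    p_Y = defaultdict(Fraction)  # Fraction() == 0
    for x, p in p_X.items():
        p_Y[g(x)] += p
    return dict(p_Y)

# X uniform on {-2, -1, 0, 1, 2}, exact arithmetic via Fraction
p_X = {x: Fraction(1, 5) for x in range(-2, 3)}
p_X2 = pmf_of_function(p_X, lambda x: x * x)
print(p_X2)
```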

Example 8A . The random sine wave . Let \[X=A \sin \theta,\tag{8.11}\] in which the amplitude $A$ is a known positive constant and the phase $\theta$ is a random variable uniformly distributed on the interval $-\pi / 2$ to $\pi / 2$. Since $\sin$ is increasing on this interval, the distribution function $F_{X}(\cdot)$ for $|x|<A$ is given by \begin{align} F_{X}(x) & =P[A \sin \theta \leq x]=P[\sin \theta \leq x / A] \\ & =P\left[\theta \leq \sin ^{-1}(x / A)\right]=F_{\theta}\left(\sin ^{-1} \frac{x}{A}\right) \\ & =\frac{1}{\pi}\left[\sin ^{-1}\left(\frac{x}{A}\right)+\frac{\pi}{2}\right], \end{align} while $F_{X}(x)=0$ for $x \leq -A$ and $F_{X}(x)=1$ for $x \geq A$. Consequently, the probability density function is given by \begin{align} f_{X}(x) & = \begin{cases} \frac{1}{\pi A}\left(1-\left(\frac{x}{A}\right)^{2}\right)^{-\frac{1}{2}}, & \text{for } |x| < A \\ 0, & \text{otherwise.} \end{cases} \tag{8.12} \end{align}
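The distribution function of Example 8A can be compared with a simulation of $A\sin\theta$. This is a sketch only, assuming the illustrative value $A=2$; the names `F_X`, `f_X` and `empirical_cdf` are hypothetical.

```python
import math
import random

A = 2.0  # amplitude; illustrative value

def F_X(x):
    """Distribution function of X = A sin(theta), theta uniform on (-pi/2, pi/2)."""
    if x <= -A:
        return 0.0
    if x >= A:
        return 1.0
    return (math.asin(x / A) + math.pi / 2) / math.pi

def f_X(x):
    """Density (8.12), taken as 0 for |x| >= A."""
    if abs(x) >= A:
        return 0.0
    return 1.0 / (math.pi * A * math.sqrt(1.0 - (x / A) ** 2))

random.seed(1)
n = 100_000
xs = [A * math.sin(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(n)]

def empirical_cdf(x):
    return sum(v <= x for v in xs) / n

for x in (-1.5, 0.0, 1.0):
    print(x, empirical_cdf(x), F_X(x))
```

The density is unbounded near $\pm A$, which is where the simulated values pile up.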

Random variables of the form (8.11) arise in the theory of ballistics. If a projectile is fired at an angle $\alpha$ to the ground with a velocity of magnitude $v$, then it returns to the ground at a distance $R$ from the point at which it was fired, where $R=\left(v^{2} / g\right) \sin 2 \alpha$; here $g$ is the acceleration due to gravity, equal to $980 \mathrm{~cm} / \mathrm{sec}^{2}$ or $32.2 \mathrm{ft} / \mathrm{sec}^{2}$. Thus $R$ has the form (8.11) with amplitude $v^{2}/g$ and phase $2\alpha$. If the firing angle $\alpha$ is a random variable with a known probability law, then the range $R$ of the projectile is also a random variable, whose probability law may be found from that of $\alpha$.

A random variable similar to the one given in (8.11) was encountered in the discussion of Bertrand’s paradox in section 7; namely, the random variable $X=2 r \cos Z$ , in which $Z$ is uniformly distributed over the interval 0 to $\pi / 2$ .

Example 8B . The positive part of a random variable . Given any real number $x$ , we define the symbols $x^{+}$ and $x^{-}$ as follows:

\begin{align} x^{+} &= \begin{cases} x, & \text{if } x \geq 0 \\[2mm] 0, & \text{if } x < 0 \end{cases} \quad\quad & x^{-} = \begin{cases} 0, & \text{if } x \geq 0 \\[2mm] -x, & \text{if } x < 0 \end{cases} \tag{8.13} \end{align}

Then $x=x^{+}-x^{-}$ and $|x|=x^{+}+x^{-}$ . Given a random variable $X$ , let $Y=X^{+}$ . We call $Y$ the positive part of $X$ . The distribution function of the positive part of $X$ is given by

\begin{align} F_{X^{+}}(y) & = \begin{cases} 0, & \text{if } y < 0 \\ F_{X}(0), & \text{if } y = 0 \\ F_{X}(y), & \text{if } y > 0. \end{cases} \tag{8.14} \end{align}

Thus, if $X$ is normally distributed with parameters $m=0$ and $\sigma=1$ ,

\begin{align} F_{X^{+}}(y) & = \begin{cases} 0, & \text{if } y < 0 \\ \Phi(0)=\frac{1}{2}, & \text{if } y = 0 \\ \Phi(y), & \text{if } y > 0. \end{cases} \tag{8.15} \end{align}

The positive part $X^{+}$ of a normally distributed random variable is neither continuous nor discrete but has a distribution function of mixed type.
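The mixed character of $X^{+}$ shows up clearly in simulation: a mass of $\frac{1}{2}$ sits at 0, and the rest is spread continuously over $y>0$. A minimal sketch, assuming a standard normal $X$ as in (8.15); the names `F_Xplus` and `atom` are hypothetical.

```python
import random
from statistics import NormalDist

Phi = NormalDist().cdf

def F_Xplus(y):
    """Distribution function (8.15) of the positive part of a standard normal X."""
    return 0.0 if y < 0 else Phi(y)

random.seed(2)
n = 100_000
ys = [max(random.gauss(0.0, 1.0), 0.0) for _ in range(n)]  # samples of X^+

atom = sum(y == 0.0 for y in ys) / n          # simulated P[X^+ = 0]
jump = F_Xplus(0.0) - F_Xplus(-1e-12)          # jump of (8.15) at y = 0
print(atom, jump)
print(sum(y <= 1.0 for y in ys) / n, F_Xplus(1.0))
```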

The Calculus of Probability Density Functions . Let $X$ be a continuous random variable, and let $Y=g(X)$ . Unless some conditions are imposed on the function $g(\cdot)$ , it is not necessarily true that $Y$ is continuous. For example, $Y=X^{+}$ is not continuous if $X$ has a positive probability of being negative. We now state some conditions on the function $g(\cdot)$ under which $g(X)$ is a continuous random variable if $X$ is a continuous random variable. At the same time, we give formulas for the probability density function of $g(X)$ in terms of the probability density function of $X$ and the derivatives of $g(\cdot)$ .

We first consider the case in which the function $g(\cdot)$ is differentiable at every real number $x$ and, further, either $g^{\prime}(x)>0$ for all $x$ or $g^{\prime}(x)<0$ for all $x$. We may then prove the following facts (see R. Courant, Differential and Integral Calculus, Interscience, New York, 1937, pp. 144–145): (i) as $x$ goes from $-\infty$ to $\infty$, $g(x)$ is either monotone increasing or monotone decreasing; (ii) the limits

\[\begin{array}{ll} \alpha^{\prime}=\displaystyle\lim _{x \rightarrow-\infty} g(x), & \beta^{\prime}=\displaystyle\lim _{x \rightarrow \infty} g(x) \\[2mm] \alpha=\min \left(\alpha^{\prime}, \beta^{\prime}\right), & \beta=\max \left(\alpha^{\prime}, \beta^{\prime}\right) \end{array} \tag{8.16}\]

exist (although they may be infinite); (iii) for every value of $y$ such that $\alpha<y<\beta$ there exists exactly one value of $x$ such that $y=g(x)$ [this value of $x$ is denoted by $g^{-1}(y)$]; (iv) the inverse function $x=g^{-1}(y)$ is differentiable and its derivative is given by

\[\frac{dx}{dy}=\frac{d}{dy} g^{-1}(y)=\left(\left.\frac{d}{d x} g(x)\right|_{x=g^{-1}(y)}\right)^{-1}=\frac{1}{d y / d x}. \tag{8.17}\]

For example, let $g(x)=\tan ^{-1} x$. Then $g^{\prime}(x)=1 /\left(1+x^{2}\right)$ is positive for all $x$. Here $\alpha=-\pi / 2$ and $\beta=\pi / 2$. The inverse function is $\tan y$, defined for $|y|<\pi / 2$. The derivative of the inverse function is given by $d x / d y=\sec ^{2} y$. One sees that $(d y / d x)^{-1}=1+x^{2}=1+(\tan y)^{2}=\sec ^{2} y$ is equal to $d x / d y$, as asserted by (8.17). We may now state the following theorem:
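Relation (8.17) for $g(x)=\tan^{-1}x$ can also be confirmed numerically: a central-difference derivative of the inverse $\tan y$ should match $1/g'(x)$ evaluated at $x=\tan y$. A small sketch, with the point $y=0.7$ chosen arbitrarily.

```python
import math

y = 0.7   # any point with |y| < pi/2; illustrative choice
h = 1e-6

# dx/dy: derivative of the inverse function g^{-1}(y) = tan y, by central difference
dx_dy = (math.tan(y + h) - math.tan(y - h)) / (2 * h)

# dy/dx: derivative of g(x) = arctan x, evaluated at x = g^{-1}(y)
x = math.tan(y)
dy_dx = 1.0 / (1.0 + x * x)

print(dx_dy, 1.0 / dy_dx, 1.0 / math.cos(y) ** 2)  # all three approximately sec^2 y
```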

If $y=g(x)$ is differentiable for all $x$, and either $g^{\prime}(x)>0$ for all $x$ or $g^{\prime}(x)<0$ for all $x$, and if $X$ is a continuous random variable, then $Y=g(X)$ is a continuous random variable with probability density function given by

\begin{align} f_{Y}(y) & = \begin{cases} f_{X}\left[g^{-1}(y)\right]\left|\frac{d}{d y} g^{-1}(y)\right|, & \text{if } \alpha < y < \beta \\ 0, & \text{otherwise.} \end{cases} \tag{8.18} \end{align}

in which $\alpha$ and $\beta$ are defined by (8.16).
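Formula (8.18) translates directly into a higher-order function that takes $f_X$, the inverse $g^{-1}$, its derivative, and the limits $\alpha,\beta$ of (8.16). The sketch below is illustrative only: `monotone_transform_pdf` and `midpoint_integral` are hypothetical helpers, and $X$ is assumed standard normal. It applies (8.18) to $g(x)=\tan^{-1}x$ and to $g(x)=e^{x}$, and checks that the first result integrates to 1.

```python
import math
from statistics import NormalDist

def monotone_transform_pdf(f_X, g_inv, dg_inv, alpha, beta):
    """Density of Y = g(X) by (8.18), for g with g' > 0 everywhere or g' < 0 everywhere.

    g_inv  : the inverse function g^{-1}
    dg_inv : the derivative of g^{-1}
    alpha, beta : the limits defined in (8.16)
    """
    def f_Y(y):
        if alpha < y < beta:
            return f_X(g_inv(y)) * abs(dg_inv(y))
        return 0.0
    return f_Y

phi = NormalDist().pdf  # X assumed standard normal

# g(x) = arctan x: inverse tan y, derivative sec^2 y, alpha = -pi/2, beta = pi/2
f_atan = monotone_transform_pdf(phi, math.tan, lambda y: 1 / math.cos(y) ** 2,
                                -math.pi / 2, math.pi / 2)

# g(x) = e^x: inverse log y, derivative 1/y, alpha = 0, beta = infinity
f_exp = monotone_transform_pdf(phi, math.log, lambda y: 1 / y, 0.0, math.inf)

def midpoint_integral(f, a, b, n=20_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

print(midpoint_integral(f_atan, -math.pi / 2, math.pi / 2))  # close to 1
print(f_exp(1.0), 1 / math.sqrt(2 * math.pi))
```

Passing $g^{-1}$ and its derivative explicitly avoids numerical inversion; for functions without a closed-form inverse a root finder would be needed instead.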

To illustrate the use of (8.18), let us note the formula: if $X$ is a continuous random variable, then

\begin{align} f_{\tan^{-1} X}(y) & = \begin{cases} f_{X}(\tan y) \sec^2 y, & \text{for } |y| < \frac{\pi}{2} \\ 0, & \text{otherwise.} \end{cases} \tag{8.19} \end{align}

To prove (8.18), we distinguish two cases: the case in which the function $y=g(x)$ is monotone increasing and that in which it is monotone decreasing. In the first case the distribution function of $Y$ for $\alpha<y<\beta$ may be written

\[F_{Y}(y)=P[g(X) \leq y]=P\left[X \leq g^{-1}(y)\right]=F_{X}\left[g^{-1}(y)\right]. \tag{8.20}\]

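In the second case, for $\alpha<y<\beta$,

\[F_{Y}(y)=P[g(X) \leq y]=P\left[X \geq g^{-1}(y)\right]=1-F_{X}\left[g^{-1}(y)\right], \tag{8.20'}\]

where the last step uses the fact that $P\left[X=g^{-1}(y)\right]=0$, since $X$ is continuous.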

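Differentiating (8.20) and its counterpart for the decreasing case with respect to $y$ yields (8.18). In the decreasing case $\frac{d}{dy} g^{-1}(y)<0$, and the derivative of $1-F_{X}\left[g^{-1}(y)\right]$ is $-f_{X}\left[g^{-1}(y)\right] \frac{d}{dy} g^{-1}(y)$, which accounts for the absolute value in (8.18). We leave it to the reader to consider the case in which $y<\alpha$ or $y>\beta$.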

One may extend (8.18) to the case in which the derivative $g^{\prime}(x)$ is continuous and vanishes at only a finite number of points. We leave the proof of the following assertion to the reader.

Let $y=g(x)$ be differentiable for all $x$ and assume that the derivative $g^{\prime}(x)$ is continuous and nonzero at all but a finite number of values of $x$. Then, to every real number $y$, (i) there is a positive integer $m(y)$ and points $x_{1}(y), x_{2}(y), \ldots, x_{m(y)}(y)$ such that, for $k=1,2, \ldots, m(y)$,

\[g\left[x_{k}(y)\right]=y, \quad g^{\prime}\left[x_{k}(y)\right] \neq 0, \tag{8.21}\]

or (ii) there is no value of $x$ such that $g(x)=y$ and $g^{\prime}(x) \neq 0$ ; in this case we write $m(y)=0$ . If $X$ is a continuous random variable, then $Y=g(X)$ is a continuous random variable with a probability density function given by

\[f_{Y}(y) =\begin{cases} \displaystyle\sum_{k=1}^{m(y)} f_{X}\left[x_{k}(y)\right]\left|g^{\prime}\left[x_{k}(y)\right]\right|^{-1}, & \text{if } m(y) > 0 \\ 0, & \text{if } m(y) = 0 . \end{cases} \tag{8.22}\]
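Formula (8.22) can be sketched in the same way, with the caller supplying a function that returns the points $x_k(y)$. The helper names `many_to_one_pdf` and `f_sq_direct` are hypothetical, and $X$ is again assumed standard normal. Taking $g(x)=x^2$ should reproduce (8.8).

```python
import math
from statistics import NormalDist

def many_to_one_pdf(f_X, g_prime, roots):
    """Density of Y = g(X) by (8.22).

    roots(y) must return the points x_k(y) with g(x_k) = y and g'(x_k) != 0;
    an empty list corresponds to m(y) = 0, giving density 0.
    """
    def f_Y(y):
        return sum(f_X(x) / abs(g_prime(x)) for x in roots(y))
    return f_Y

phi = NormalDist().pdf  # X assumed standard normal

# g(x) = x^2 has the two roots +sqrt(y), -sqrt(y) for y > 0 and none for y < 0
f_sq = many_to_one_pdf(phi, lambda x: 2 * x,
                       lambda y: [math.sqrt(y), -math.sqrt(y)] if y > 0 else [])

def f_sq_direct(y):
    """(8.8) specialized to a standard normal X."""
    return math.exp(-y / 2) / math.sqrt(2 * math.pi * y) if y > 0 else 0.0

for y in (0.1, 1.0, 3.0):
    print(y, f_sq(y), f_sq_direct(y))
```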

We obtain as an immediate consequence of (8.22): if $X$ is a continuous random variable, then

\begin{align} f_{|X|}(y) & =\begin{cases} f_{X}(y) + f_{X}(-y), & \text{for } y > 0 \\ 0, & \text{for } y < 0 ; \end{cases} \tag{8.23}\\ f_{\sqrt{|X|}}(y) & =\begin{cases} 2y \left( f_{X}\left(y^{2}\right) + f_{X}\left(-y^{2}\right) \right), & \text{for } y > 0 \\ 0, & \text{for } y < 0 . \end{cases} \tag{8.24} \end{align}
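As a sanity check on (8.24), its density should integrate to 1, and its integral over $(0,1)$ should equal $P[\sqrt{|X|}\le 1]=P[|X|\le 1]$. A sketch assuming a standard normal $X$; `f_sqrt_abs` and `midpoint_integral` are hypothetical names.

```python
import math
import random
from statistics import NormalDist

phi = NormalDist().pdf  # X assumed standard normal

def f_sqrt_abs(y):
    """Density (8.24) of sqrt(|X|)."""
    return 2 * y * (phi(y * y) + phi(-y * y)) if y > 0 else 0.0

def midpoint_integral(f, a, b, n=20_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

total = midpoint_integral(f_sqrt_abs, 0.0, 4.0)  # mass beyond y = 4 (|X| > 16) is negligible
prob = midpoint_integral(f_sqrt_abs, 0.0, 1.0)   # P[sqrt|X| <= 1] = P[|X| <= 1]

random.seed(3)
n = 100_000
mc = sum(math.sqrt(abs(random.gauss(0.0, 1.0))) <= 1.0 for _ in range(n)) / n
print(total, prob, mc, 2 * NormalDist().cdf(1.0) - 1)
```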

Equations (8.23) and (8.24) may also be obtained directly, by using the same technique with which (8.8) was derived.

The Probability Integral Transformation . It is a somewhat surprising fact, of great usefulness both in theory and in practice, that to obtain a random sample of a random variable $X$ it suffices to obtain a random sample of a random variable $U$ that is uniformly distributed over the interval 0 to 1. This follows from the fact that the distribution function $F_{X}(\cdot)$ of the random variable $X$ is nondecreasing and continuous from the right. Consequently, an inverse function $F_{X}^{-1}(\cdot)$ may be defined for values of $y$ with $0<y<1$: $F_{X}^{-1}(y)$ is equal to the smallest value of $x$ satisfying the condition $F_{X}(x) \geq y$ (right continuity guarantees that this smallest value exists).
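For a discrete $X$, the inverse $F_{X}^{-1}(y)=\min\{x: F_{X}(x)\ge y\}$ can be computed by a binary search over the cumulative probabilities. This is a minimal sketch, assuming a fair die as the illustrative distribution; `make_quantile` is a hypothetical helper, and exact `Fraction` arithmetic is used so that values such as $y=\frac12$, where $F_X$ is flat, are handled without rounding error.

```python
import bisect
from fractions import Fraction
from itertools import accumulate

def make_quantile(support, probs):
    """Return F^{-1}(y) = smallest x with F(x) >= y, for 0 < y < 1.

    support must be sorted increasingly; probs are the matching probabilities.
    """
    cdf = list(accumulate(probs))  # F evaluated at each support point

    def F_inv(y):
        if not 0 < y < 1:
            raise ValueError("y must lie strictly between 0 and 1")
        i = bisect.bisect_left(cdf, y)  # first index with cdf[i] >= y
        return support[min(i, len(support) - 1)]  # guard against rounding in probs
    return F_inv

# a fair die, with exact probabilities
F_inv = make_quantile([1, 2, 3, 4, 5, 6], [Fraction(1, 6)] * 6)
print([F_inv(y) for y in (0.01, Fraction(1, 6), 0.17, Fraction(1, 2), 0.99)])
```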

Example 8C . If $X$ is normally distributed with parameters $m$ and $\sigma$, then $F_{X}(x)=\Phi[(x-m) / \sigma]$ and $F_{X}^{-1}(y)=m+\sigma \Phi^{-1}(y)$, in which $\Phi^{-1}(y)$ denotes the value of $x$ satisfying the equation $\Phi(x)=y$, so that $\Phi\left(\Phi^{-1}(y)\right)=y$.
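Example 8C can be checked with Python's `statistics.NormalDist`, whose `inv_cdf` method computes the inverse distribution function. The parameter values $m=10$, $\sigma=2$ below are illustrative assumptions.

```python
from statistics import NormalDist

m, sigma = 10.0, 2.0              # illustrative parameter values
X = NormalDist(mu=m, sigma=sigma)
Phi_inv = NormalDist().inv_cdf    # inverse of the standard normal distribution function

for y in (0.025, 0.5, 0.9):
    # F_X^{-1}(y) computed directly, and via m + sigma * Phi^{-1}(y)
    print(y, X.inv_cdf(y), m + sigma * Phi_inv(y))
```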

In terms of the inverse function $F_{X}^{-1}(y)$ to the distribution function $F_{X}(\cdot)$ of the random variable $X$ , we may state the following theorem, the proof of which we leave as an exercise for the reader.

Theorem 8A . Let $U_{1}, U_{2}, \ldots, U_{n}$ be independent random variables, each uniformly distributed over the interval 0 to 1. The random variables defined by

\[X_{1}=F_{X}^{-1}\left(U_{1}\right), \quad X_{2}=F_{X}^{-1}\left(U_{2}\right), \ldots, X_{n}=F_{X}^{-1}\left(U_{n}\right) \tag{8.25}\]

are then a random sample of the random variable $X$ . Conversely, if $X_{1}, X_{2}, \ldots, X_{n}$ are a random sample of the random variable $X$ and if the distribution function $F_{X}(\cdot)$ is continuous, then the random variables

\[U_{1}=F_{X}\left(X_{1}\right), \quad U_{2}=F_{X}\left(X_{2}\right), \cdots, U_{n}=F_{X}\left(X_{n}\right) \tag{8.26}\]

are a random sample of the random variable $U=F_{X}(X)$ , which is uniformly distributed on the interval 0 to 1.
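Both halves of Theorem 8A can be illustrated with an exponential distribution, for which $F_{X}(x)=1-e^{-\lambda x}$ and $F_{X}^{-1}(u)=-\log(1-u)/\lambda$. The sketch below assumes the illustrative rate $\lambda=1.5$: it generates a sample by (8.25) and then maps it back by (8.26), checking the sample mean against $1/\lambda$ and the transformed values against the uniform law.

```python
import math
import random

lam = 1.5  # rate of an exponential distribution, chosen for illustration

def F(x):
    """Exponential distribution function."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def F_inv(u):
    """Smallest x with F(x) >= u, for 0 <= u < 1."""
    return -math.log(1.0 - u) / lam

random.seed(4)
n = 100_000
xs = [F_inv(random.random()) for _ in range(n)]  # (8.25): sample of X from uniforms
us = [F(x) for x in xs]                            # (8.26): back to uniforms

mean_x = sum(xs) / n                          # should be near 1/lam
frac_below = sum(u <= 0.3 for u in us) / n    # should be near 0.3
print(mean_x, 1 / lam, frac_below)
```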

The transformation of a random variable $X$ into a uniformly distributed random variable $U=F_{X}(X)$ is called the probability integral transformation . It plays an important role in the modern theory of goodness-of-fit tests for distribution functions; see T. W. Anderson and D. Darling, “Asymptotic theory of certain goodness of fit criteria based on stochastic processes”, Annals of Mathematical Statistics, Vol. 23 (1952), pp. 195–212.

Exercises

8.1 . Let $X$ have a $\chi^{2}$ distribution with parameters $n$ and $\sigma$ . Show that $Y=\sqrt{X / n}$ has a $\chi$ distribution with parameters $n$ and $\sigma$ .

8.2 . The temperature $T$ of a certain object, recorded in degrees Fahrenheit, obeys a normal probability law with mean 98.6 and variance 2. The temperature $\theta$ measured in degrees centigrade is related to $T$ by $\theta=\frac{5}{9}(T-32)$ . Describe the probability law of $\theta$ .

8.3 . The magnitude $v$ of the velocity of a molecule with mass $m$ in a gas at absolute temperature $T$ is a random variable, which, according to the kinetic theory of gases, possesses the Maxwell distribution with parameter $\alpha=(2 k T / m)^{1 / 2}$, in which $k$ is Boltzmann’s constant. Find and sketch the probability density function of the kinetic energy $E=\frac{1}{2} m v^{2}$ of a molecule. Describe in words the probability law of $E$.

Answer

$f_{E}(x)=\frac{2}{\sqrt{\pi}} \frac{\sqrt{x}}{(k T)^{3 / 2}} e^{-x / k T} \quad$ for $x>0 ;=0$ otherwise.

$\chi^{2}$ distribution with parameters $n=3$ and $\sigma=\left(\frac{1}{2} k T\right)^{1 / 2}$

8.4 . A hardware store discovers that the number $X$ of electric toasters it sells in a week obeys a Poisson probability law with mean 10. The profit on each toaster sold is 2 dollars. If at the beginning of the week 10 toasters are in stock, the profit $Y$ from the sale of toasters during the week is $Y=2\min(X, 10)$. Describe the probability law of $Y$.

8.5 . Find the probability density function of $X=\cos \theta$ , in which $\theta$ is uniformly distributed on $-\pi$ to $\pi$ .

Answer

$\frac{1}{\pi}\left(1-x^{2}\right)^{-1 / 2}$ for $|x|<1 ;=0$ otherwise.

8.6 . Find the probability density function of the random variable $X=A \sin \omega t$, in which $A$ and $\omega$ are known constants and $t$ is a random variable uniformly distributed on the interval $-T$ to $T$, in which (i) $T$ is a constant such that $0 \leq \omega T \leq \pi / 2$, (ii) $T=n(2 \pi / \omega)$ for some integer $n \geq 2$.

8.7 . Find the probability density function of $Y=e^{X}$ , in which $X$ is normally distributed with parameters $m$ and $\sigma$ . The random variable $Y$ is said to have a lognormal distribution with parameters $m$ and $\sigma$ . (The importance and usefulness of the lognormal distribution is discussed by J. Aitchison and J. A. C. Brown, The Lognormal Distribution , Cambridge University Press, 1957.)

Answer

$(y \sigma \sqrt{2 \pi})^{-1} \exp \left[-\frac{1}{2 \sigma^{2}}(\log y-m)^{2}\right]$ for $y>0 ;=0$ otherwise.

In exercises 8.8 to 8.11 let $X$ be uniformly distributed on (a) the interval 0 to 1, (b) the interval -1 to 1. Find and sketch the probability density function of the functions given.

8.8 . (i) $X^{2}$ , (ii) $\sqrt{|X|}$ .

8.9 . (i) $e^{X}$, (ii) $-\log _{e}|X|$.


8.10 . (i) $\cos \pi X$ , (ii) $\tan \pi X$ .

8.11 . (i) $2 X+1$ , (ii) $2 X^{2}+1$ .

Answer

(i) (a) $\frac{1}{2}$ for $1<y<3$; $=0$ otherwise; (b) $\frac{1}{4}$ for $-1<y<3$; $=0$ otherwise; (ii) in both cases (a) and (b), $\frac{1}{4}\left(\frac{y-1}{2}\right)^{-1 / 2}$ for $1<y<3$; $=0$ otherwise.

In exercises 8.12 to 8.15 let $X$ be normally distributed with parameters $m=0$ and $\sigma=1$ . Find and sketch the probability density functions of the functions given.

8.12 . (i) $X^{2}$, (ii) $e^{X}$.

8.13 . (i) $|X|^{1 / 2}$ , (ii) $|X|^{1 / 3}$ .

Answer

(i) $\frac{4 y}{\sqrt{2 \pi}} e^{-y^{4}/2}$ for $y>0$; 0 otherwise; (ii) $\frac{6 y^{2}}{\sqrt{2 \pi}} e^{-y^{6}/2}$ for $y>0$; 0 otherwise.

8.14 . (i) $2 X+1$ , (ii) $2 X^{2}+1$ .

8.15 . (i) $\sin \pi X$ , (ii) $\tan ^{-1} X$ .

Answer

(i) $\left[2 \pi^{3}\left(1-y^{2}\right)\right]^{-1 / 2} \sum_{k=-\infty}^{\infty} e^{-\frac{1}{2} x_{k}^{2}}$, where the $x_{k}$ are the solutions of $\sin \pi x_{k}=y$, for $|y|<1$; $=0$ otherwise; (ii) $\frac{1}{\sqrt{2 \pi}} \sec ^{2} y \, e^{-\frac{1}{2} \tan ^{2} y}$ for $|y|<\frac{\pi}{2}$; $=0$ otherwise.

8.16 . At time $t=0$, a particle is located at the point $x=0$ on an $x$-axis. At a time $T$ randomly selected from the interval 0 to 1, the particle is suddenly given a velocity $v$ in the positive $x$-direction. For any time $t>0$ let $X(t)$ denote the position of the particle at time $t$. Then $X(t)=0$ if $t<T$, and $X(t)=v(t-T)$ if $t \geq T$. Find and sketch the distribution function of the random variable $X(t)$ for any given time $t>0$.

In exercises 8.17 to 8.20 suppose that the amplitude $X(t)$ at a time $t$ of the signal emitted by a certain random signal generator is known to be a random variable (a) uniformly distributed over the interval $-1$ to $1$, (b) normally distributed with parameters $m=0$ and $\sigma>0$, (c) Rayleigh distributed with parameter $\sigma$.

8.17 . The waveform $X(t)$ is passed through a squaring circuit; the output $Y(t)$ of the squaring circuit at time $t$ is assumed to be given by $Y(t)=X^{2}(t)$ . Find and sketch the probability density function of $Y(t)$ for any time $t>0$ .

Answer

(a) $\frac{1}{2 \sqrt{y}}$ for $0<y<1$; 0 otherwise; (b) $\frac{1}{\sigma \sqrt{2 \pi y}} e^{-y / 2 \sigma^{2}}$ for $y>0$; 0 otherwise; (c) $\frac{1}{2 \sigma^{2}} e^{-y / 2 \sigma^{2}}$ for $y>0$; 0 otherwise.

8.18 . The waveform $X(t)$ is passed through a rectifier, giving as its output $Y(t)=|X(t)|$ . Describe the probability law of $Y(t)$ for any time $t>0$ .

8.19 . The waveform $X(t)$ is passed through a half-wave rectifier, giving as its output $Y(t)=X^{+}(t)$ , the positive part of $X(t)$ . Describe the probability law of $Y(t)$ for any $t>0$ .

Answer

Distribution function $F_{Y(t)}(y)$ of $Y(t)=X^{+}(t)$:

(a) 0 for $y<0$; $\frac{1}{2}$ for $y=0$; $\frac{y+1}{2}$ for $0<y \leq 1$; 1 for $y>1$; (b) 0 for $y<0$; $\frac{1}{2}$ for $y=0$; $\Phi\left(\frac{y}{\sigma}\right)$ for $y>0$; (c) 0 for $y<0$; $1-e^{-y^{2} / 2 \sigma^{2}}$ for $y \geq 0$.

8.20 . The waveform $X(t)$ is passed through a clipper, giving as its output $Y(t)=g[X(t)]$ , where $g(x)=1$ or 0, depending on whether $x>0$ or $x<0$ . Find and sketch the probability mass function of $Y(t)$ for any $t>0$ .

8.21 . Prove that the function given in (8.12) is a probability density function. Does the fact that the function is unbounded cause any difficulty?