The Probability Law of a Function of a Random Variable

In this section we develop formulas for the probability law of a random variable $Y$ that arises as a function of another random variable $X$, so that for some Borel function $g(\cdot)$

Y=g(X). \tag{8.1}

To find the probability law of $Y$, it is best in general first to find its distribution function $F_{Y}(\cdot)$, from which one may obtain the probability density function $f_{Y}(\cdot)$ or the probability mass function $p_{Y}(\cdot)$ in cases in which these functions exist. From (2.2) we obtain the following formula for the value $F_{Y}(y)$ at the real number $y$ of the distribution function $F_{Y}(\cdot)$:

F_{Y}(y)=P_{X}[\{x: \quad g(x) \leq y\}] \quad \text { if } Y=g(X). \tag{8.2}

Of great importance is the special case of a linear function $g(x)=ax+b$, in which $a$ and $b$ are given real numbers such that $a>0$ and $-\infty<b<\infty$. The distribution function of the random variable $Y=aX+b$ is given by

F_{aX+b}(y)=P[a X+b \leq y]=P\left[X \leq \frac{y-b}{a}\right]=F_{X}\left(\frac{y-b}{a}\right). \tag{8.3}

If $X$ is continuous, so is $Y=aX+b$, with a probability density function for any real number $y$ given by

f_{a X+b}(y)=\frac{1}{a} f_{X}\left(\frac{y-b}{a}\right). \tag{8.4} 

If $X$ is discrete, so is $Y=aX+b$, with a probability mass function for any real number $y$ given by

p_{a X+b}(y)=p_{X}\left(\frac{y-b}{a}\right). \tag{8.5} 

Next, let us consider $g(x)=x^{2}$. Then $Y=X^{2}$. For $y<0$, $\{x: x^{2} \leq y\}$ is the empty set of real numbers. Consequently,

F_{X^{2}}(y)=0 \quad \text { for } y<0. \tag{8.6} 

For $y \geq 0$,

\begin{align} F_{X^{2}}(y) & =P\left[X^{2} \leq y\right]=P[-\sqrt{y} \leq X \leq \sqrt{y}] \tag{8.7}\\ & =F_{X}(\sqrt{y})-F_{X}(-\sqrt{y})+p_{X}(-\sqrt{y}) \end{align}

One sees from (8.7) that if $X$ possesses a probability density function $f_{X}(\cdot)$, then the distribution function $F_{X^{2}}(\cdot)$ of $X^{2}$ may be expressed as an integral; this is the necessary and sufficient condition that $X^{2}$ possess a probability density function $f_{X^{2}}(\cdot)$. To evaluate $f_{X^{2}}(y)$ at a real number $y$, we differentiate (8.7) and (8.6) with respect to $y$. We obtain

\begin{align} f_{X^{2}}(y) &= \begin{cases} \left[f_{X}(\sqrt{y}) + f_{X}(-\sqrt{y})\right] \frac{1}{2 \sqrt{y}}, & \text{for } y > 0 \\ 0, & \text{for } y < 0 \end{cases} \tag{8.8} \end{align}

It may help the reader to recall the so-called chain rule for differentiation of a function of a function, required to obtain (8.8), if we point out that

\begin{align} \frac{d}{d y} F_{X}(\sqrt{y}) & =\lim _{h \rightarrow 0} \frac{F_{X}(\sqrt{y+h})-F_{X}(\sqrt{y})}{h} \tag{8.9}\\ & =\lim _{h \rightarrow 0} \frac{F_{X}(\sqrt{y+h})-F_{X}(\sqrt{y})}{\sqrt{y+h}-\sqrt{y}} \lim _{h \rightarrow 0} \frac{\sqrt{y+h}-\sqrt{y}}{h} \\ & =F_{X}^{\prime}(\sqrt{y}) \frac{d}{d y} \sqrt{y} \end{align}

If $X$ is discrete, it then follows from (8.7) that $X^{2}$ is discrete, since the distribution function $F_{X^{2}}(\cdot)$ may be expressed entirely as a sum. The probability mass function of $X^{2}$ for any real number $y$ is given by

\begin{align} p_{X^{2}}(y) &= \begin{cases} p_{X}(\sqrt{y}) + p_{X}(-\sqrt{y}), & \text{for } y > 0 \\ p_{X}(0), & \text{for } y = 0 \\ 0, & \text{for } y < 0 \end{cases} \tag{8.10} \end{align}
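As a numerical sanity check of (8.8), the following Python sketch takes $X$ to be standard normal (an assumption made only for this illustration) and compares the formula with a central-difference derivative of the distribution function given by (8.6) and (8.7). The helper names `phi`, `Phi`, `cdf_square`, `pdf_square`, and `numeric_derivative` are hypothetical and are not part of the text.

```python
import math


def phi(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_square(y):
    """F_{X^2}(y) from (8.6) and (8.7); the p_X term vanishes for continuous X."""
    if y < 0:
        return 0.0
    r = math.sqrt(y)
    return Phi(r) - Phi(-r)


def pdf_square(y):
    """f_{X^2}(y) from (8.8), assuming X is standard normal."""
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    return (phi(r) + phi(-r)) / (2.0 * r)


def numeric_derivative(f, y, h=1e-5):
    """Central-difference approximation to f'(y)."""
    return (f(y + h) - f(y - h)) / (2.0 * h)


for y in (0.25, 1.0, 2.5):
    # The two printed columns should agree to several decimal places.
    print(y, pdf_square(y), numeric_derivative(cdf_square, y))
```

The comparison is made only at points $y>0$, where the derivative exists; near $y=0$ the density is unbounded and a finite-difference check would be unreliable.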

Example 8A. The random sine wave. Let

X=A \sin \theta,\tag{8.11}

in which the amplitude $A$ is a known positive constant and the phase $\theta$ is a random variable uniformly distributed on the interval $-\pi/2$ to $\pi/2$. The distribution function $F_{X}(\cdot)$ for $|x|<A$ is given by

\begin{align} F_{X}(x) & =P[A \sin \theta \leq x]=P[\sin \theta \leq x / A] \\ & =P\left[\theta \leq \sin ^{-1}(x / A)\right]=F_{\theta}\left(\sin ^{-1} \frac{x}{A}\right) \\ & =\frac{1}{\pi}\left[\sin ^{-1}\left(\frac{x}{A}\right)+\frac{\pi}{2}\right]. \end{align}

Consequently, the probability density function is given by

\begin{align} f_{X}(x) & = \begin{cases} \frac{1}{\pi A}\left(1-\left(\frac{x}{A}\right)^{2}\right)^{-\frac{1}{2}}, & \text{for } |x| < A \\ 0, & \text{otherwise.} \end{cases} \tag{8.12} \end{align}
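The distribution function of the random sine wave can be checked by simulation. The sketch below is an illustration only: the amplitude $A=2$, the seed, the sample size, and the names `sine_wave_cdf` and `empirical_cdf` are assumptions chosen for the example, not values taken from the text.

```python
import math
import random


def sine_wave_cdf(x, A):
    """F_X(x) for X = A sin(theta), theta uniform on (-pi/2, pi/2)."""
    if x <= -A:
        return 0.0
    if x >= A:
        return 1.0
    return (math.asin(x / A) + math.pi / 2.0) / math.pi


def empirical_cdf(samples, x):
    """Fraction of the sample that is less than or equal to x."""
    return sum(1 for s in samples if s <= x) / len(samples)


rng = random.Random(0)  # fixed seed so that the run is reproducible
A = 2.0                 # hypothetical amplitude
samples = [
    A * math.sin(rng.uniform(-math.pi / 2.0, math.pi / 2.0))
    for _ in range(100_000)
]

for x in (-1.5, 0.0, 1.0):
    # Formula value next to the simulated relative frequency.
    print(x, sine_wave_cdf(x, A), empirical_cdf(samples, x))
```

With a sample of this size the two columns would be expected to agree to about two decimal places.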

Random variables of the form of (8.11) arise in the theory of ballistics. If a projectile is fired at an angle $\alpha$ to the earth, with a velocity of magnitude $v$, then the point at which the projectile returns to the earth is at a distance $R$ from the point at which it was fired; $R$ is given by the equation $R=(v^{2}/g)\sin 2\alpha$, in which $g$ is the acceleration due to gravity, equal to 980 cm/sec$^{2}$ or 32.2 ft/sec$^{2}$. If the firing angle $\alpha$ is a random variable with a known probability law, then the range $R$ of the projectile is also a random variable with a known probability law.

A random variable similar to the one given in (8.11) was encountered in the discussion of Bertrand’s paradox in section 7; namely, the random variable $X=2r\cos Z$, in which $Z$ is uniformly distributed over the interval $0$ to $\pi/2$.

Example 8B. The positive part of a random variable. Given any real number $x$, we define the symbols $x^{+}$ and $x^{-}$ as follows:

\begin{align} x^{+} &= \begin{cases} x, & \text{if } x \geq 0 \\[2mm] 0, & \text{if } x < 0 \end{cases} \quad\quad & x^{-} = \begin{cases} 0, & \text{if } x \geq 0 \\[2mm] -x, & \text{if } x < 0 \end{cases} \tag{8.13} \end{align} 

Then $x=x^{+}-x^{-}$ and $|x|=x^{+}+x^{-}$. Given a random variable $X$, let $Y=X^{+}$. We call $Y$ the positive part of $X$. The distribution function of the positive part of $X$ is given by

\begin{align} F_{X^{+}}(y) & = \begin{cases} 0, & \text{if } y < 0 \\ F_{X}(0), & \text{if } y = 0 \\ F_{X}(y), & \text{if } y > 0. \end{cases} \tag{8.14} \end{align}

Thus, if $X$ is normally distributed with parameters $m=0$ and $\sigma=1$,

\begin{align} F_{X^{+}}(y) & = \begin{cases} 0, & \text{if } y < 0 \\ \Phi(0)=\frac{1}{2}, & \text{if } y = 0 \\ \Phi(y), & \text{if } y > 0. \end{cases} \tag{8.15} \end{align}

The positive part $X^{+}$ of a normally distributed random variable is neither continuous nor discrete but has a distribution function of mixed type.
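The mixed character of this distribution function can be seen in a short simulation. The following sketch is illustrative only; the seed, the sample size, and the names `Phi` and `cdf_positive_part` are assumptions made for the example. It shows the jump of size $\frac{1}{2}$ at $y=0$ next to the continuous part for $y>0$.

```python
import math
import random


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cdf_positive_part(y):
    """Distribution function of the positive part of a standard normal X."""
    return 0.0 if y < 0 else Phi(y)


rng = random.Random(1)  # fixed seed for reproducibility
samples = [max(rng.gauss(0.0, 1.0), 0.0) for _ in range(100_000)]

# Probability mass concentrated at the single point 0 (the discrete part).
frac_zero = sum(1 for s in samples if s == 0.0) / len(samples)
# Relative frequency of the event that the positive part is at most 1.
frac_le_one = sum(1 for s in samples if s <= 1.0) / len(samples)

print("P[Y = 0]  simulated:", frac_zero, " formula:", cdf_positive_part(0.0))
print("P[Y <= 1] simulated:", frac_le_one, " formula:", cdf_positive_part(1.0))
```

About half of the simulated values are exactly zero, which no probability density function could account for, while the remaining values are spread continuously over the positive axis.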

The Calculus of Probability Density Functions. Let $X$ be a continuous random variable, and let $Y=g(X)$. Unless some conditions are imposed on the function $g(\cdot)$, it is not necessarily true that $Y$ is continuous. For example, $Y=X^{+}$ is not continuous if $X$ has a positive probability of being negative. We now state some conditions on the function $g(\cdot)$ under which $g(X)$ is a continuous random variable if $X$ is a continuous random variable. At the same time, we give formulas for the probability density function of $g(X)$ in terms of the probability density function of $X$ and the derivatives of $g(\cdot)$.

We first consider the case in which the function $g(\cdot)$ is differentiable at every real number $x$ and, further, either $g^{\prime}(x)>0$ for all $x$ or $g^{\prime}(x)<0$ for all $x$. We may then prove the following facts (see R. Courant, Differential and Integral Calculus, Interscience, New York, 1937, pp. 144–145): (i) as $x$ goes from $-\infty$ to $\infty$, $g(x)$ is either monotone increasing or monotone decreasing; (ii) the limits

\begin{array}{ll} \alpha^{\prime}=\displaystyle\lim _{x \rightarrow-\infty} g(x), & \beta^{\prime}=\displaystyle\lim _{x \rightarrow \infty} g(x) \tag{8.16}\\[2mm] \alpha=\min \left(\alpha^{\prime}, \beta^{\prime}\right), & \beta=\max \left(\alpha^{\prime}, \beta^{\prime}\right) \end{array} 

exist (although they may be infinite); (iii) for every value of $y$ such that $\alpha<y<\beta$ there exists exactly one value of $x$ such that $y=g(x)$ [this value of $x$ is denoted by $g^{-1}(y)$]; (iv) the inverse function $x=g^{-1}(y)$ is differentiable and its derivative is given by

\frac{dx}{dy}=\frac{d}{dy} g^{-1}(y)=\left(\left.\frac{d}{d x} g(x)\right|_{x=g^{-1}(y)}\right)^{-1}=\frac{1}{d y / d x}. \tag{8.17} 

For example, let $g(x)=\tan^{-1}x$. Then $g^{\prime}(x)=1/\left(1+x^{2}\right)$ is positive for all $x$. Here $\alpha=-\pi/2$ and $\beta=\pi/2$. The inverse function is $\tan y$, defined for $|y|<\pi/2$. The derivative of the inverse function is given by $dx/dy=\sec^{2}y$. One sees that $(dy/dx)^{-1}=1+(\tan y)^{2}$ is equal to $dx/dy$, as asserted by (8.17). We may now state the following theorem:

If $y=g(x)$ is differentiable for all $x$, and either $g^{\prime}(x)>0$ for all $x$ or $g^{\prime}(x)<0$ for all $x$, and if $X$ is a continuous random variable, then $Y=g(X)$ is a continuous random variable with probability density function given by

\begin{align} f_{Y}(y) & = \begin{cases} f_{X}\left[g^{-1}(y)\right]\left|\frac{d}{d y} g^{-1}(y)\right|, & \text{if } \alpha < y < \beta \tag{8.18} \\ 0, & \text{otherwise.} \end{cases} \end{align} 

in which α and β are defined by (8.16). 

To illustrate the use of (8.18), let us note the formula: if X is a continuous random variable, then

\begin{align} f_{\tan^{-1} X}(y) & = \begin{cases} f_{X}(\tan y) \sec^2 y, & \text{for } |y| < \frac{\pi}{2} \tag{8.19} \\ 0, & \text{otherwise.} \end{cases} \end{align}
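Formula (8.19) can be checked numerically against the distribution function $F_{X}(\tan y)$ of $\tan^{-1}X$. The sketch below assumes, purely for illustration, that $X$ is standard normal; the function names `transformed_pdf` and `integrate` and the test point $y_{0}=0.8$ are hypothetical choices. It builds the density of (8.18) from $f_{X}$, $g^{-1}$, and the derivative of $g^{-1}$, and integrates it with a simple midpoint rule.

```python
import math


def phi(x):
    """Standard normal probability density function."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def transformed_pdf(f_X, g_inv, g_inv_prime, alpha, beta):
    """Return the density of Y = g(X) given by (8.18) for strictly monotone g."""
    def f_Y(y):
        if not (alpha < y < beta):
            return 0.0
        return f_X(g_inv(y)) * abs(g_inv_prime(y))
    return f_Y


def integrate(f, a, b, n=20_000):
    """Midpoint rule; it never evaluates f at the endpoints a and b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


# g(x) = arctan x, so g^{-1}(y) = tan y and its derivative is sec^2 y, as in (8.19).
f_Y = transformed_pdf(
    phi,
    math.tan,
    lambda y: 1.0 / math.cos(y) ** 2,
    -math.pi / 2.0,
    math.pi / 2.0,
)

y0 = 0.8
print(integrate(f_Y, -math.pi / 2.0, y0), Phi(math.tan(y0)))
```

The two printed numbers are the integral of the density (8.19) up to $y_{0}$ and the value $F_{X}(\tan y_{0})$; they should agree closely, since $\tan^{-1}$ is monotone increasing.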

To prove (8.18), we distinguish two cases: the case in which the function $y=g(x)$ is monotone increasing and that in which it is monotone decreasing. In the first case the distribution function of $Y$ for $\alpha<y<\beta$ may be written

F_{Y}(y)=P[g(X) \leq y]=P\left[X \leq g^{-1}(y)\right]=F_{X}\left[g^{-1}(y)\right]. \tag{8.20} 

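In the second case, for $\alpha<y<\beta$,

F_{Y}(y)=P[g(X) \leq y]=P\left[X \geq g^{-1}(y)\right]=1-F_{X}\left[g^{-1}(y)\right].\tag{8.20$'$}

If (8.20) or (8.20$'$) is differentiated with respect to $y$, (8.18) is obtained; in the decreasing case the derivative of $g^{-1}(y)$ is negative, so that $-\frac{d}{dy}g^{-1}(y)$ equals the absolute value appearing in (8.18). We leave it to the reader to consider the case in which $y<\alpha$ or $y>\beta$.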

One may extend (8.18) to the case in which the derivative $g^{\prime}(x)$ is continuous and vanishes at only a finite number of points. We leave the proof of the following assertion to the reader.

Let $y=g(x)$ be differentiable for all $x$ and assume that the derivative $g^{\prime}(x)$ is continuous and nonzero at all but a finite number of values of $x$. Then, to every real number $y$, (i) there is a positive integer $m(y)$ and points $x_{1}(y), x_{2}(y), \ldots, x_{m}(y)$ such that, for $k=1,2,\ldots,m(y)$,

g\left[x_{k}(y)\right]=y, \quad g^{\prime}\left[x_{k}(y)\right] \neq 0, \tag{8.21} 

or (ii) there is no value of $x$ such that $g(x)=y$ and $g^{\prime}(x) \neq 0$; in this case we write $m(y)=0$. If $X$ is a continuous random variable, then $Y=g(X)$ is a continuous random variable with a probability density function given by

\begin{array}{rlr} f_{Y}(y) & =\begin{cases} \displaystyle\sum_{k=1}^{m(y)} f_{X}\left[x_{k}(y)\right]\left|g^{\prime}\left[x_{k}(y)\right]\right|^{-1}, & \text{if } m(y) > 0 \tag{8.22}\\ 0, & \text{if } m(y) = 0 . \end{cases} \end{array} 

We obtain as an immediate consequence of (8.22): if X is a continuous random variable, then

\begin{align} f_{|X|}(y) & =\begin{cases} f_{X}(y) + f_{X}(-y), & \text{for } y > 0 \tag{8.23}\\ 0, & \text{for } y < 0 ; \end{cases} \\ f_{\sqrt{|X|}}(y) & =\begin{cases} 2y \left( f_{X}\left(y^{2}\right) + f_{X}\left(-y^{2}\right) \right), & \text{for } y > 0 \tag{8.24}\\ 0, & \text{for } y < 0 . \end{cases} \end{align}

Equations (8.23) and (8.24) may also be obtained directly, by using the same technique with which (8.8) was derived.
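A quick numerical check of (8.23) is sketched below. It assumes, only for the sake of the example, that $X$ is normally distributed with parameters $m=1$ and $\sigma=1$, so that the two terms $f_{X}(y)$ and $f_{X}(-y)$ differ; the names `normal_pdf`, `normal_cdf`, `pdf_abs`, and `integrate` and the cutoff $c=1.5$ are hypothetical. The integral of the density of $|X|$ from $0$ to $c$ is compared with $P[-c \leq X \leq c]$.

```python
import math


def normal_pdf(x, m=1.0, sigma=1.0):
    """Normal probability density function with parameters m and sigma."""
    z = (x - m) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x, m=1.0, sigma=1.0):
    """Normal distribution function with parameters m and sigma."""
    return 0.5 * (1.0 + math.erf((x - m) / (sigma * math.sqrt(2.0))))


def pdf_abs(y):
    """Density of |X| from (8.23): the two roots of |x| = y are y and -y."""
    if y <= 0:
        return 0.0
    return normal_pdf(y) + normal_pdf(-y)


def integrate(f, a, b, n=10_000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


c = 1.5
lhs = integrate(pdf_abs, 0.0, c)          # P[|X| <= c] from (8.23)
rhs = normal_cdf(c) - normal_cdf(-c)      # P[-c <= X <= c] directly
print(lhs, rhs)
```

Each root $x_{k}(y)$ contributes one term to the sum in (8.22); here $|g^{\prime}(x_{k})|=1$ at both roots, which is why no further factor appears.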

The Probability Integral Transformation. It is a somewhat surprising fact, of great usefulness both in theory and in practice, that to obtain a random sample of a random variable $X$ it suffices to obtain a random sample of a random variable $U$, which is uniformly distributed over the interval 0 to 1. This follows from the fact that the distribution function $F_{X}(\cdot)$ of the random variable $X$ is a nondecreasing function. Consequently, an inverse function $F_{X}^{-1}(\cdot)$ may be defined for values of $y$ between 0 and 1: $F_{X}^{-1}(y)$ is equal to the smallest value of $x$ satisfying the condition that $F_{X}(x) \geq y$.

Example 8C. If $X$ is normally distributed with parameters $m$ and $\sigma$, then $F_{X}(x)=\Phi[(x-m)/\sigma]$ and $F_{X}^{-1}(y)=m+\sigma\Phi^{-1}(y)$, in which $\Phi^{-1}(y)$ denotes the value of $x$ satisfying the equation $\Phi(x)=y$.

In terms of the inverse function $F_{X}^{-1}(y)$ to the distribution function $F_{X}(\cdot)$ of the random variable $X$, we may state the following theorem, the proof of which we leave as an exercise for the reader.

Theorem 8A. Let $U_{1}, U_{2}, \ldots, U_{n}$ be independent random variables, each uniformly distributed over the interval 0 to 1. The random variables defined by

X_{1}=F_{X}^{-1}\left(U_{1}\right), \quad X_{2}=F_{X}^{-1}\left(U_{2}\right), \ldots, X_{n}=F_{X}^{-1}\left(U_{n}\right) \tag{8.25} 

are then a random sample of the random variable $X$. Conversely, if $X_{1}, X_{2}, \ldots, X_{n}$ are a random sample of the random variable $X$ and if the distribution function $F_{X}(\cdot)$ is continuous, then the random variables

U_{1}=F_{X}\left(X_{1}\right), \quad U_{2}=F_{X}\left(X_{2}\right), \cdots, U_{n}=F_{X}\left(X_{n}\right) \tag{8.26} 

are a random sample of the random variable $U=F_{X}(X)$, which is uniformly distributed on the interval 0 to 1.

The transformation of a random variable $X$ into a uniformly distributed random variable $U=F_{X}(X)$ is called the probability integral transformation. It plays an important role in the modern theory of goodness-of-fit tests for distribution functions; see T. W. Anderson and D. Darling, “Asymptotic theory of certain goodness of fit criteria based on stochastic processes”, Annals of Mathematical Statistics, Vol. 23 (1952), pp. 195–212.
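Both directions of Theorem 8A can be tried out in a few lines. The sketch below is an illustration under stated assumptions: it uses the normal case of Example 8C with the hypothetical parameters $m=3$ and $\sigma=2$, takes $\Phi$ and $\Phi^{-1}$ from `statistics.NormalDist` in the Python standard library, and the function name `sample_via_inverse`, the seed, and the sample size are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean


def sample_via_inverse(m, sigma, n, rng):
    """Draw n values X_i = m + sigma * Phi^{-1}(U_i), as in (8.25) and Example 8C."""
    std = NormalDist()  # standard normal: supplies Phi (cdf) and Phi^{-1} (inv_cdf)
    out = []
    for _ in range(n):
        u = rng.random()
        # inv_cdf requires 0 < u < 1, so guard against an exact 0.0.
        u = min(max(u, 1e-12), 1.0 - 1e-12)
        out.append(m + sigma * std.inv_cdf(u))
    return out


rng = random.Random(2)  # fixed seed for reproducibility
m, sigma = 3.0, 2.0     # hypothetical parameters
xs = sample_via_inverse(m, sigma, 50_000, rng)

# Converse direction (8.26): U_i = F_X(X_i) should look uniform on 0 to 1.
target = NormalDist(m, sigma)
us = [target.cdf(x) for x in xs]
frac_quarter = sum(1 for u in us if u <= 0.25) / len(us)

print("sample mean of X:", fmean(xs))
print("fraction of U <= 0.25:", frac_quarter)
```

The sample mean would be expected to lie near $m$, and about a quarter of the transformed values would be expected to fall below $0.25$, as uniformity requires.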

Exercises

8.1. Let $X$ have a $\chi^{2}$ distribution with parameters $n$ and $\sigma$. Show that $Y=\sqrt{X/n}$ has a $\chi$ distribution with parameters $n$ and $\sigma$.

8.2. The temperature $T$ of a certain object, recorded in degrees Fahrenheit, obeys a normal probability law with mean 98.6 and variance 2. The temperature $\theta$ measured in degrees centigrade is related to $T$ by $\theta=\frac{5}{9}(T-32)$. Describe the probability law of $\theta$.

8.3. The magnitude $v$ of the velocity of a molecule with mass $m$ in a gas at absolute temperature $T$ is a random variable, which, according to the kinetic theory of gas, possesses the Maxwell distribution with parameter $\alpha=(2kT/m)^{1/2}$, in which $k$ is Boltzmann’s constant. Find and sketch the probability density function of the kinetic energy $E=\frac{1}{2}mv^{2}$ of a molecule. Describe in words the probability law of $E$.

Answer

$f_{E}(x)=\frac{2}{\sqrt{\pi}}\sqrt{x}\,(kT)^{-3/2}e^{-x/kT}$ for $x>0$; $=0$ otherwise. This is the $\chi^{2}$ distribution with parameters $n=3$ and $\sigma=\left(\frac{1}{2}kT\right)^{1/2}$.

8.4. A hardware store discovers that the number $X$ of electric toasters it sells in a week obeys a Poisson probability law with mean 10. The profit on each toaster sold is 2. If at the beginning of the week 10 toasters are in stock, the profit $Y$ from sale of toasters during the week is $Y=2\min(X,10)$. Describe the probability law of $Y$.

8.5. Find the probability density function of $X=\cos\theta$, in which $\theta$ is uniformly distributed on $-\pi$ to $\pi$.

Answer

$\frac{1}{\pi}\left(1-x^{2}\right)^{-1/2}$ for $|x|<1$; $=0$ otherwise.

8.6. Find the probability density function of the random variable $X=A\sin\omega t$, in which $A$ and $\omega$ are known constants and $t$ is a random variable uniformly distributed on the interval $-T$ to $T$, in which (i) $T$ is a constant such that $0<\omega T \leq \pi/2$, (ii) $T=n(2\pi/\omega)$ for some integer $n \geq 2$.

8.7. Find the probability density function of $Y=e^{X}$, in which $X$ is normally distributed with parameters $m$ and $\sigma$. The random variable $Y$ is said to have a lognormal distribution with parameters $m$ and $\sigma$. (The importance and usefulness of the lognormal distribution is discussed by J. Aitchison and J. A. C. Brown, The Lognormal Distribution, Cambridge University Press, 1957.)

Answer

$\left(y\sigma\sqrt{2\pi}\right)^{-1}\exp\left[-\frac{1}{2\sigma^{2}}(\log y-m)^{2}\right]$ for $y>0$; $=0$ otherwise.

In exercises 8.8 to 8.11 let X be uniformly distributed on (a) the interval 0 to 1, (b) the interval -1 to 1. Find and sketch the probability density function of the functions given.

8.8. (i) $X^{2}$, (ii) $|X|$.

8.9. (i) $e^{X}$, (ii) $\log_{e}|X|$.

 


8.10. (i) $\cos\pi X$, (ii) $\tan\pi X$.

8.11. (i) $2X+1$, (ii) $2X^{2}+1$.

Answer

(i): (a) $\frac{1}{2}$ for $1<y<3$; $=0$ otherwise; (b) $\frac{1}{4}$ for $-1<y<3$; $=0$ otherwise. (ii): (a) and (b) $\frac{1}{4}\left(\frac{y-1}{2}\right)^{-1/2}$ for $1<y<3$; $=0$ otherwise.

In exercises 8.12 to 8.15 let $X$ be normally distributed with parameters $m=0$ and $\sigma=1$. Find and sketch the probability density functions of the functions given.

8.12. (i) $X^{2}$, (ii) $e^{X}$.

8.13. (i) $|X|^{1/2}$, (ii) $|X|^{1/3}$.

Answer

(i) $\frac{4y}{\sqrt{2\pi}}e^{-\frac{1}{2}y^{4}}$ for $y>0$; $0$ otherwise; (ii) $\frac{6y^{2}}{\sqrt{2\pi}}e^{-\frac{1}{2}y^{6}}$ for $y>0$; $0$ otherwise.

8.14. (i) $2X+1$, (ii) $2X^{2}+1$.

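8.15. (i) $\sin\pi X$, (ii) $\tan^{-1}X$.

Answer

(i) $\left[2\pi^{3}\left(1-y^{2}\right)\right]^{-1/2}\sum_{k=-\infty}^{\infty}e^{-\frac{1}{2}x_{k}^{2}}$, where the $x_{k}$ are the solutions of $y=\sin\pi x_{k}$, for $|y|<1$; $=0$ otherwise; (ii) $\frac{1}{\sqrt{2\pi}}\sec^{2}y\,e^{-\frac{1}{2}\tan^{2}y}$ for $|y|<\frac{\pi}{2}$; $=0$ otherwise.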

8.16. At time $t=0$, a particle is located at the point $x=0$ on an $x$-axis. At a time $T$ randomly selected from the interval 0 to 1, the particle is suddenly given a velocity $v$ in the positive $x$-direction. For any time $t>0$ let $X(t)$ denote the position of the particle at time $t$. Then $X(t)=0$ if $t<T$, and $X(t)=v(t-T)$ if $t \geq T$. Find and sketch the distribution function of the random variable $X(t)$ for any given time $t>0$.

In exercises 8.17 to 8.20 suppose that the amplitude $X(t)$ at a time $t$ of the signal emitted by a certain random signal generator is known to be a random variable (a) uniformly distributed over the interval $-1$ to $1$, (b) normally distributed with parameters $m=0$ and $\sigma>0$, (c) Rayleigh distributed with parameter $\sigma$.

8.17. The waveform $X(t)$ is passed through a squaring circuit; the output $Y(t)$ of the squaring circuit at time $t$ is assumed to be given by $Y(t)=X^{2}(t)$. Find and sketch the probability density function of $Y(t)$ for any time $t>0$.

Answer

(a) $\frac{1}{2\sqrt{y}}$ for $0<y<1$; $0$ otherwise; (b) $\frac{1}{\sigma\sqrt{2\pi y}}e^{-y/2\sigma^{2}}$ for $y>0$; $0$ otherwise; (c) $\frac{1}{2\sigma^{2}}e^{-y/2\sigma^{2}}$ for $y>0$; $0$ otherwise.

8.18. The waveform $X(t)$ is passed through a rectifier, giving as its output $Y(t)=|X(t)|$. Describe the probability law of $Y(t)$ for any time $t>0$.

8.19. The waveform $X(t)$ is passed through a half-wave rectifier, giving as its output $Y(t)=X^{+}(t)$, the positive part of $X(t)$. Describe the probability law of $Y(t)$ for any $t>0$.

Answer

Distribution function $F_{Y}(x)$ of $Y(t)$: (a) $0$ for $x<0$; $\frac{1}{2}$ for $x=0$; $\frac{x+1}{2}$ for $0<x<1$; $1$ for $x>1$; (b) $0$ for $x<0$; $\frac{1}{2}$ for $x=0$; $\Phi\left(\frac{x}{\sigma}\right)$ for $x>0$; (c) $0$ for $x<0$; $1-e^{-x^{2}/2\sigma^{2}}$ for $x>0$.

8.20. The waveform $X(t)$ is passed through a clipper, giving as its output $Y(t)=g[X(t)]$, where $g(x)=1$ or $0$, depending on whether $x>0$ or $x<0$. Find and sketch the probability mass function of $Y(t)$ for any $t>0$.

8.21. Prove that the function given in (8.12) is a probability density function. Does the fact that the function is unbounded cause any difficulty?