It has been pointed out that the probability law of a random variable \(X\) may be specified in a variety of ways. To begin with, either its probability function \(P_{X}[\cdot]\) or its distribution function \(F_{X}(\cdot)\) may be stated. Further, if the probability law is known to be continuous or discrete, then it may be specified by stating either its probability density function \(f_{X}(\cdot)\) or its probability mass function \(p_{X}(\cdot)\). We now describe yet another function, denoted by \(\phi_{X}(\cdot)\) and called the characteristic function of the random variable \(X\), which has the property that a knowledge of \(\phi_{X}(\cdot)\) serves to specify the probability law of the random variable \(X\). Further, we shall see that the characteristic function has properties which render it particularly useful for the study of a sum of independent random variables.
To begin our introduction of the characteristic function, let us note the following fact about the probability function \(P_{X}[\cdot]\) and the distribution function \(F_{X}(\cdot)\) of a random variable \(X\): both may be regarded as expectations (with respect to the probability law of \(X\)) of suitably chosen Borel functions \(g(\cdot)\). Thus, for every Borel set \(B\) of real numbers,
\[P_{X}[B]=E_{X}\left[I_{B}(x)\right]=E\left[I_{B}(X)\right], \tag{2.1}\]
in which \(I_{B}(\cdot)\) is a function of a real variable, called the indicator function of the set \(B\) , with value \(I_{B}(x)\) at any point \(x\) given by \begin{align} I_{B}(x) & = \begin{cases} 1, & \text{if } x \text{ belongs to } B, \\ 0, & \text{if } x \text{ does not belong to } B. \end{cases} \tag{2.2} \end{align} On the other hand, for every real number \(y\), \[F_{X}(y)=E_{X}\left[I_{y}(x)\right]=E\left[I_{y}(X)\right], \tag{2.3}\] in which \(I_{y}(\cdot)\) is a function of a real variable, defined by \begin{align} I_{y}(x) & = \begin{cases} 1, & \text{if } x \leq y, \\ 0, & \text{if } x > y. \end{cases} \tag{2.4} \end{align} We thus see that if one knows the expectation \(E_{X}[g(x)]\) of every bounded Borel function \(g(\cdot)\), with respect to the probability law of the random variable \(X\), one will know by (2.1) and (2.3) the probability function and distribution function of \(X\). Conversely, a knowledge of the probability function or of the distribution function of \(X\) yields a knowledge of \(E[g(X)]\) for every function \(g(\cdot)\) for which the expectation exists. Consequently, stating the expectation functional \(E_{X}[\cdot]\) of a random variable [which is a function whose argument is a function \(g(\cdot)\)] constitutes another equivalent way of specifying the probability law of a random variable.
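For example, if \(B\) is the interval \(\{x: a<x \leq b\}\) for real numbers \(a<b\), then \(I_{B}(x)=I_{b}(x)-I_{a}(x)\), so that by (2.1) and (2.3)
\[P[a<X \leq b]=E\left[I_{b}(X)\right]-E\left[I_{a}(X)\right]=F_{X}(b)-F_{X}(a).\]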
The question arises: is there any other family of functions on the real line in addition to those of the form of (2.2) and (2.4) such that a knowledge of the expectations of these functions with respect to the probability law of a random variable \(X\) would suffice to specify the probability law? We now show that the complex exponential functions provide such a family.
We define the expectation, with respect to a random variable \(X\) , of a function \(g(\cdot)\) , which takes values that are complex numbers, by
\[E[g(X)]=E[\operatorname{Re} g(X)]+i E[\operatorname{Im} g(X)] \tag{2.5}\]
in which the symbols \(\mathrm{Re}\) and \(\mathrm{Im}\) , respectively, are abbreviations of the phrases “real part of” and “imaginary part of”. Note that
\[g(x)=\operatorname{Re} g(x)+i \operatorname{Im} g(x).\]
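For example, if \(g(x)=x+i x^{2}\), then \(\operatorname{Re} g(x)=x\) and \(\operatorname{Im} g(x)=x^{2}\), so that by (2.5)
\[E[g(X)]=E[X]+i E\left[X^{2}\right],\]
provided that \(E\left[X^{2}\right]\) exists.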
It may be shown that under these definitions all the usual properties of the operation of taking expectations continue to hold for complex-valued functions whose expectations exist. We define \(E[g(X)]\) as existing if \(E[| g(X) |]\) is finite. If this is the case, it then follows that
\[|E[g(X)]| \leq E[|g(X)|], \tag{2.6}\]
or, more explicitly,
\[\left\{E^{2}[\operatorname{Re} g(X)]+E^{2}[\operatorname{Im} g(X)]\right\}^{1 / 2} \leq E\left[\left\{[\operatorname{Re} g(X)]^{2}+[\operatorname{Im} g(X)]^{2}\right\}^{1 / 2}\right]. \tag{2.7}\]
The validity of (2.7) is proved in theoretical exercise 2.2. In words, (2.6) states that the modulus of the expectation of a complex-valued function is less than or equal to the expectation of the modulus of the function.
The notions are now at hand to define the characteristic function \(\phi_{X}(\cdot)\) of a random variable \(X\) . We define \(\phi_{X}(\cdot)\) as a function of a real variable \(u\) , whose value is the expectation of the complex exponential function \(e^{i u x}\) with respect to the probability law of \(X\) ; in symbols,
\[\phi_{X}(u)=E\left[e^{i u X}\right]=\int_{-\infty}^{\infty} e^{i u x} d F_{X}(x). \tag{2.8}\]
The quantity \(e^{i u x}\) for any real numbers \(x\) and \(u\) is defined by
\[e^{i u x}=\cos u x+i \sin u x, \tag{2.9}\]
in which \(i\) is the imaginary unit, defined by \(i=\sqrt{-1}\) or \(i^{2}=-1\) . Since \(\left|e^{i u x}\right|^{2}=(\cos u x)^{2}+(\sin u x)^{2}=1\) , it follows that, for any random variable \(X, E\left[\left|e^{i u X}\right|\right]=E[1]=1\) . Consequently, the characteristic function always exists.
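Further, by (2.6), the modulus of the characteristic function never exceeds 1; for every real number \(u\),
\[\left|\phi_{X}(u)\right|=\left|E\left[e^{i u X}\right]\right| \leq E\left[\left|e^{i u X}\right|\right]=1, \quad \text { and in particular } \quad \phi_{X}(0)=E[1]=1.\]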
The characteristic function of a random variable has all the properties of the moment-generating function of a random variable. All the moments of the random variable \(X\) that exist may be obtained from a knowledge of the characteristic function by the formula
\[E\left[X^{k}\right]=\frac{1}{i^{k}}\left.\frac{d^{k}}{d u^{k}} \phi_{X}(u)\right|_{u=0}. \tag{2.10}\]
To prove (2.10), one must employ the techniques discussed in section 5.
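The formula (2.10) may be made plausible by differentiating under the expectation sign (a step whose justification is the delicate point); for \(k=1\),
\[\frac{d}{d u} \phi_{X}(u)=\frac{d}{d u} E\left[e^{i u X}\right]=E\left[i X e^{i u X}\right], \quad \text { so that } \quad\left.\frac{d}{d u} \phi_{X}(u)\right|_{u=0}=i E[X],\]
from which \(E[X]=\frac{1}{i}\left.\frac{d}{d u} \phi_{X}(u)\right|_{u=0}\); the higher derivatives are handled in the same way.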
More generally, from a knowledge of the characteristic function of a random variable one may obtain a knowledge of its distribution function, its probability density function (if it exists), its probability mass function, and many other expectations. These facts are established in section 3.
The importance of characteristic functions in probability theory derives from the fact that they have the following basic property. Consider any two random variables \(X\) and \(Y\). If their characteristic functions are approximately equal [that is, \(\phi_{X}(u) \doteq \phi_{Y}(u)\) for every real number \(u\)], then their probability laws are approximately equal over intervals (that is, for any finite numbers \(a\) and \(b\), \(P[a \leq X \leq b] \doteq P[a \leq Y \leq b]\)) or, equivalently, their distribution functions are approximately equal [that is, \(F_{X}(a) \doteq F_{Y}(a)\) for all real numbers \(a\)]. A precise formulation and proof of this assertion are given in Chapter 10.
Characteristic functions represent the ideal tool for the study of the problem of addition of independent random variables, since the sum \(X_{1}+X_{2}\) of two independent random variables \(X_{1}\) and \(X_{2}\) has as its characteristic function the product of the characteristic functions of \(X_{1}\) and \(X_{2}\) ; in symbols, for every real number \(u\)
\[\phi_{X_{1}+X_{2}}(u)=\phi_{X_{1}}(u) \phi_{X_{2}}(u) \tag{2.11}\]
if \(X_{1}\) and \(X_{2}\) are independent. (A proof of (2.11) is sketched below.) It is natural to inquire whether there is some other function that enjoys properties similar to those of the characteristic function. The answer appears to be in the negative. In his paper “An essential property of the Fourier transforms of distribution functions,” Proceedings of the American Mathematical Society, Vol. 3 (1952), pp. 508–510, E. Lukacs has proved the following theorem. Let \(K(x, u)\) be a complex-valued function of two real variables \(x\) and \(u\), which is a bounded Borel function of \(x\). Define for any random variable \(X\) \[\phi_{X}(u)=E[K(X, u)].\]
In order that the function \(\phi_{X}(u)\) satisfy (2.11) and the uniqueness condition \[\phi_{X_{1}}(u)=\phi_{X_{2}}(u) \text { for all } u \quad \text { if and only if } F_{X_{1}}(x)=F_{X_{2}}(x) \text { for all } x, \tag{2.12}\] it is necessary and sufficient that \(K(x, u)\) have the form \[K(x, u)=e^{i u A(x)},\] in which \(A(x)\) is a suitable real-valued function.
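The proof of (2.11) itself is immediate from the definition (2.8): if \(X_{1}\) and \(X_{2}\) are independent, the expectation of a product of (possibly complex-valued) functions of \(X_{1}\) and of \(X_{2}\) factors into the product of the expectations, so that
\[\phi_{X_{1}+X_{2}}(u)=E\left[e^{i u\left(X_{1}+X_{2}\right)}\right]=E\left[e^{i u X_{1}} e^{i u X_{2}}\right]=E\left[e^{i u X_{1}}\right] E\left[e^{i u X_{2}}\right]=\phi_{X_{1}}(u) \phi_{X_{2}}(u).\]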
Example 2A. If \(X\) is \(N(0,1)\) , then its characteristic function \(\phi_{X}(u)\) is given by
\[\phi_{X}(u)=e^{-u^{2} / 2}. \tag{2.13}\]
To prove (2.13), we make use of the Taylor series expansion of the exponential function:
\begin{align} \phi_{X}(u) & =\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} e^{i u x} e^{-x^{2} / 2} d x=\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} \sum_{n=0}^{\infty} \frac{(i u x)^{n}}{n !} e^{-x^{2} / 2} d x \tag{2.14}\\[5mm] & =\sum_{n=0}^{\infty} \frac{(i u)^{n}}{n !} \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} x^{n} e^{-x^{2} / 2} d x \\[5mm] & =\sum_{m=0}^{\infty} \frac{(i u)^{2 m}}{(2 m) !} \frac{(2 m) !}{2^{m} m !}=\sum_{m=0}^{\infty}\left(-\frac{1}{2} u^{2}\right)^{m} \frac{1}{m !}=e^{-u^{2} / 2}. \end{align}
The interchange of the order of summation and integration in (2.14) may be justified by the fact that the infinite series is dominated by the integrable function \(\exp \left(|u x|-\frac{1}{2} x^{2}\right)\). In passing from the sum over \(n\) to the sum over \(m\) we have used the facts that \(\frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} x^{n} e^{-x^{2} / 2} d x\) vanishes when \(n\) is odd and equals \((2 m) ! / (2^{m} m !)\) when \(n=2 m\).
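As a check on (2.13), formula (2.10) may be applied: since \(\phi_{X}^{\prime}(u)=-u e^{-u^{2} / 2}\) and \(\phi_{X}^{\prime \prime}(u)=\left(u^{2}-1\right) e^{-u^{2} / 2}\),
\[E[X]=\frac{1}{i} \phi_{X}^{\prime}(0)=0, \qquad E\left[X^{2}\right]=\frac{1}{i^{2}} \phi_{X}^{\prime \prime}(0)=(-1)(-1)=1,\]
in agreement with the fact that \(X\) is \(N(0,1)\).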
Example 2B. If \(X\) is \(N\left(m, \sigma^{2}\right)\) , then its characteristic function \(\phi_{X}(u)\) is given by
\[\phi_{X}(u)=\exp \left(i m u-\frac{1}{2} \sigma^{2} u^{2}\right). \tag{2.15}\]
To prove (2.15), define \(Y=(X-m) / \sigma\). Then \(Y\) is \(N(0,1)\), and by Example 2A \(\phi_{Y}(u)=e^{-u^{2} / 2}\). Since \(X\) may be written as a linear combination, \(X=\sigma Y+m\), the validity of (2.15) follows from the general formula
\[\phi_{X}(u)=e^{i b u} \phi_{Y}(a u) \quad \text { if } X=a Y+b. \tag{2.16}\]
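The general formula (2.16) follows at once from the definition of the characteristic function:
\[\phi_{X}(u)=E\left[e^{i u(a Y+b)}\right]=e^{i b u} E\left[e^{i(a u) Y}\right]=e^{i b u} \phi_{Y}(a u).\]
Taking \(Y\) to be \(N(0,1)\), \(a=\sigma\), and \(b=m\) then yields (2.15).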
Example 2C. If \(X\) is Poisson distributed with mean \(E[X]=\lambda\) , then its characteristic function \(\phi_{X}(u)\) is given by
\[\phi_{X}(u)=e^{\lambda\left(e^{i u}-1\right)}. \tag{2.17}\]
To prove (2.17), we write
\begin{align} \phi_{X}(u) & =\sum_{k=0}^{\infty} e^{i u k} p_{X}(k)=\sum_{k=0}^{\infty} e^{i u k} \frac{\lambda^{k}}{k !} e^{-\lambda} \tag{2.18}\\[5mm] & =e^{-\lambda} \sum_{k=0}^{\infty} \frac{\left(\lambda e^{i u}\right)^{k}}{k !}=e^{-\lambda} e^{\lambda e^{i u}}. \end{align}
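As a further check, formula (2.10) recovers the mean of the Poisson law: since \(\phi_{X}^{\prime}(u)=i \lambda e^{i u} e^{\lambda\left(e^{i u}-1\right)}\),
\[E[X]=\frac{1}{i} \phi_{X}^{\prime}(0)=\frac{1}{i}(i \lambda)=\lambda,\]
in agreement with the stated mean.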
Example 2D. Consider a random variable \(X\) with a probability density function, for some positive constant \(a\) ,
\[f_{X}(x)=\frac{a}{2} e^{-a|x|}, \quad -\infty<x<\infty, \tag{2.19}\]
which is called Laplace’s distribution. The characteristic function \(\phi_{X}(u)\) is given by
\[\phi_{X}(u)=\frac{a^{2}}{a^{2}+u^{2}}. \tag{2.20}\]
To prove (2.20), we note that since \(f_{X}(x)\) is an even function of \(x\) we may write
\begin{align} \phi_{X}(u) & =2 \int_{0}^{\infty} \cos u x f_{X}(x) d x=a \int_{0}^{\infty} e^{-a x} \cos u x d x \tag{2.21}\\[5mm] & =\left.a \frac{e^{-a x}(u \sin u x-a \cos u x)}{a^{2}+u^{2}}\right|_{0} ^{\infty}=\frac{a^{2}}{a^{2}+u^{2}}. \end{align}
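As a check on (2.20), one may expand the characteristic function in powers of \(u\): for \(|u|<a\),
\[\phi_{X}(u)=\frac{1}{1+u^{2} / a^{2}}=1-\frac{u^{2}}{a^{2}}+\frac{u^{4}}{a^{4}}-\cdots,\]
so that by (2.10) \(E[X]=0\) and \(E\left[X^{2}\right]=2 / a^{2}\), in agreement with direct integration of \(x f_{X}(x)\) and \(x^{2} f_{X}(x)\).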
Theoretical Exercises
2.1. Cumulants and the log-characteristic function . The logarithm (to the base \(e\) ) of the characteristic function of a random variable \(X\) is often easy to differentiate. Its \(n\) th derivative may be used to form the \(n\) th cumulant of \(X\) , written \(K_{n}[X]\) , which is defined by
\[K_{n}[X]=\left.\frac{1}{i^{n}} \frac{d^{n}}{d u^{n}} \log \phi_{X}(u)\right|_{u=0} \tag{2.22}\]
If the \(n\) th absolute moment \(E\left[|X|^{n}\right]\) exists, then both \(\phi_{X}(\cdot)\) and \(\log \phi_{X}(\cdot)\) are differentiable \(n\) times and may be expanded in terms of their first \(n\) derivatives; in particular,
\[\log \phi_{X}(u) = K_{1}[X](i u) + K_{2}[X] \frac{(i u)^{2}}{2!} + \cdots + K_{n}[X] \frac{(i u)^{n}}{n!} + R_{n}(u), \tag{2.23}\]
in which the remainder \(R_{n}(u)\) is such that \(R_{n}(u) /|u|^{n}\) tends to 0 as \(|u|\) tends to 0. From a knowledge of the cumulants of a probability law one may obtain a knowledge both of its moments and of its central moments. Show, by evaluating the derivatives at \(t=0\) of \(e^{K(t)}\), in which \(K(t)=\log \phi_{X}(t)\), that \begin{align} E[X] & =K_{1} \\ E\left[X^{2}\right] & =K_{2}+K_{1}^{2} \\ E\left[X^{3}\right] & =K_{3}+3 K_{2} K_{1}+K_{1}^{3} \tag{2.24}\\ E\left[X^{4}\right] & =K_{4}+4 K_{3} K_{1}+3 K_{2}^{2}+6 K_{2} K_{1}^{2}+K_{1}^{4} \end{align} Show, by evaluating the derivatives of \(e^{K_{m}(t)}\), in which \(K_{m}(t)=\log \phi_{X}(t)-i t m\) and \(m=E[X]\), that \begin{align} & E\left[(X-m)^{2}\right]=K_{2} \\ & E\left[(X-m)^{3}\right]=K_{3} \tag{2.25}\\ & E\left[(X-m)^{4}\right]=K_{4}+3 K_{2}^{2}. \end{align}
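For instance, for the Poisson law of Example 2C one has
\[\log \phi_{X}(u)=\lambda\left(e^{i u}-1\right)=\lambda\left[(i u)+\frac{(i u)^{2}}{2 !}+\frac{(i u)^{3}}{3 !}+\cdots\right],\]
so that \(K_{n}[X]=\lambda\) for every \(n\). Consequently, by (2.24) and (2.25), \(E[X]=\lambda\), \(E\left[X^{2}\right]=\lambda+\lambda^{2}\), \(E\left[(X-m)^{2}\right]=E\left[(X-m)^{3}\right]=\lambda\), and \(E\left[(X-m)^{4}\right]=\lambda+3 \lambda^{2}\).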
2.2. The square root of sum of squares inequality. Prove that (2.7) holds by showing that, for any two random variables \(X\) and \(Y\),
\[\sqrt{E^{2}[X]+E^{2}[Y]} \leq E\left[\sqrt{X^{2}+Y^{2}}\right] \tag{2.26}\]
Hint : Show, and use the fact, that
\[\sqrt{x^{2}+y^{2}}-\sqrt{x_{0}^{2}+y_{0}^{2}} \geq \frac{\left(x-x_{0}\right) x_{0}+\left(y-y_{0}\right) y_{0}}{\sqrt{x_{0}^{2}+y_{0}^{2}}}\] for all real \(x, y, x_{0}, y_{0}\) with \(x_{0}^{2}+y_{0}^{2}>0\).
Exercise
2.1. Compute the characteristic function of a random variable \(X\) that has as its probability law (i) the binomial distribution with mean 3 and standard deviation \(\frac{3}{2}\), (ii) the Poisson distribution with mean 3, (iii) the geometric distribution with parameter \(p=\frac{1}{4}\), (iv) the normal distribution with mean 3 and standard deviation \(\frac{3}{2}\), (v) the gamma distribution with parameters \(r=2\) and \(\lambda=3\).