Consider the probability function $P[\cdot]$ of a numerical-valued random phenomenon. The question arises of convenient ways of stating the function $P[\cdot]$ without actually having to state the value $P[E]$ for every set $E$ of real numbers. In general, to state the function $P[\cdot]$, as with any function, one has to enumerate all the members of its domain and, for each of these members, state the value of the function. In special circumstances (which fortunately cover most of the cases encountered in practice) more convenient methods are available.
For many probability functions there exists a function $f(\cdot)$, defined for all real numbers $x$, from which $P[E]$ can, for any event $E$, be obtained by integration:
\[ P[E]=\int_{E} f(x)\, dx. \tag{2.1} \]
Given a probability function $P[\cdot]$ that may be represented in the form (2.1) in terms of some function $f(\cdot)$, we call $f(\cdot)$ the probability density function of $P[\cdot]$, and we say that $P[\cdot]$ is specified by the probability density function $f(\cdot)$.
A function $f(\cdot)$ must have certain properties in order to be a probability density function. To begin with, it must be sufficiently well behaved that the integral in (2.1) is well defined.$^{1}$ Next, letting $E = R$, the whole real line, in (2.1),
\[ 1=P[R]=\int_{R} f(x)\, dx=\int_{-\infty}^{\infty} f(x)\, dx. \tag{2.2} \]
It is necessary that $f(\cdot)$ satisfy (2.2); in words, the integral of $f(\cdot)$ from $-\infty$ to $\infty$ must be equal to 1.
A function $f(\cdot)$ is said to be a probability density function if it satisfies (2.2) and, in addition,$^{2}$ satisfies the condition
\[ f(x) \geq 0 \quad \text{for all } x \text{ in } R, \tag{2.3} \]
since a function $f(\cdot)$ satisfying (2.2) and (2.3) is the probability density function of a unique probability function $P[\cdot]$, namely the probability function with value $P[E]$ at any event $E$ given by (2.1).
Some typical probability density functions are illustrated in Fig. 2A.

Example 2A. Verifying that a function is a probability density function. Suppose one is told that the time one has to wait for a bus on a certain street corner is a numerical-valued random phenomenon, with a probability function specified by the probability density function $f(\cdot)$ given by
\begin{align} f(x) & = \begin{cases} 4x - 2x^{2} - 1, & \text{if } 0 \leq x \leq 2 \\ 0, & \text{otherwise}. \end{cases} \tag{2.4} \end{align}
The function $f(\cdot)$ is negative for various values of $x$; in particular, it is negative for $0 \leq x < 1 - \frac{1}{2}\sqrt{2}$ (prove this statement). Consequently, $f(\cdot)$ cannot be a probability density function. Next, suppose that the probability density function is given by
\begin{align} f(x) & = \begin{cases} 4x - 2x^{2}, & \text{if } 0 \leq x \leq 2 \\ 0, & \text{otherwise}. \end{cases} \tag{2.5} \end{align}
The function $f(\cdot)$ given by (2.5) is nonnegative (prove this statement). However, its integral from $-\infty$ to $\infty$,
\[ \int_{0}^{2} \left(4x - 2x^{2}\right) dx = \frac{8}{3}, \]
is not equal to 1. Consequently, the function $f(\cdot)$ given by (2.5) is not a probability density function. However, the function $f(\cdot)$ given by
\begin{align*} f(x) & = \begin{cases} \frac{3}{8}\left(4x - 2x^{2}\right), & \text{if } 0 \leq x \leq 2 \\ 0, & \text{otherwise} \end{cases} \end{align*}
is a probability density function.
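The arithmetic of Example 2A can be checked with exact rational arithmetic. The following is a minimal Python sketch using the standard `fractions` module; the helper names `integrate_poly` and `evaluate` are illustrative, and polynomials are stored as coefficient lists with the constant term first.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(c * x**k) for coeffs = [c0, c1, c2, ...]."""
    a, b = F(a), F(b)
    return sum(F(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1) for k, c in enumerate(coeffs))

def evaluate(coeffs, x):
    """Exact value of the polynomial at x."""
    return sum(F(c) * F(x) ** k for k, c in enumerate(coeffs))

g24 = [-1, 4, -2]   # 4x - 2x^2 - 1, the candidate (2.4)
g25 = [0, 4, -2]    # 4x - 2x^2, the candidate (2.5)

print(evaluate(g24, 0))               # -1, so (2.4) violates (2.3)
print(integrate_poly(g25, 0, 2))      # 8/3, so (2.5) violates (2.2)
scaled = [F(3, 8) * c for c in g25]   # the corrected density (3/8)(4x - 2x^2)
print(integrate_poly(scaled, 0, 2))   # 1
```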
Example 2B. Computing probabilities from a probability density function. Let us consider again the numerical-valued random phenomenon, discussed in example 1A, that consists in observing the time one has to wait for a bus at a certain bus stop. Let us assume that the probability function $P[\cdot]$ of this phenomenon may be expressed by (2.1) in terms of the function $f(\cdot)$ whose graph is sketched in Fig. 2B. An algebraic formula for $f(\cdot)$ can be written as follows:
\begin{align} f(x) =\begin{cases} 0, & \text{for } x<0 \\[2mm] \frac{1}{9}(x+1), & \text{for } 0 \leq x < 1 \\[2mm] \frac{4}{9}\left(x-\frac{1}{2}\right), & \text{for } 1 \leq x<\frac{3}{2} \\[2mm] \frac{4}{9}\left(\frac{5}{2}-x\right), & \text{for } \frac{3}{2} \leq x<2 \\[2mm] \frac{1}{9}(4-x), & \text{for } 2 \leq x<3 \\[2mm] \frac{1}{9}, & \text{for } 3 \leq x<6 \\[2mm] 0, & \text{for } 6 \leq x. \end{cases} \tag{2.6} \end{align}

From (2.1) it follows that if $A = \{x\colon 0 \leq x \leq 2\}$ and $B = \{x\colon 1 \leq x \leq 3\}$, then
\begin{gather*} P[A]=\int_{0}^{2} f(x)\, dx=\frac{1}{2}, \quad P[B]=\int_{1}^{3} f(x)\, dx=\frac{1}{2}, \\ P[AB]=\int_{1}^{2} f(x)\, dx=\frac{1}{3}, \end{gather*}
which agree with the values assumed in example 1A.
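Because the density (2.6) is linear between the breakpoints 0, 1, 3/2, 2, 3, 6, the midpoint rule integrates it exactly on each piece. The Python sketch below relies on that observation to reproduce condition (2.2) and the three probabilities of Example 2B; the names `f`, `BREAKS`, and `integral` are introduced here for illustration only.

```python
from fractions import Fraction as F

def f(x):
    """Density (2.6) of Example 2B, evaluated exactly at a Fraction x."""
    if x < 0:
        return F(0)
    if x < 1:
        return F(1, 9) * (x + 1)
    if x < F(3, 2):
        return F(4, 9) * (x - F(1, 2))
    if x < 2:
        return F(4, 9) * (F(5, 2) - x)
    if x < 3:
        return F(1, 9) * (4 - x)
    if x < 6:
        return F(1, 9)
    return F(0)

BREAKS = [F(0), F(1), F(3, 2), F(2), F(3), F(6)]

def integral(a, b):
    """Exact integral of f over [a, b], a <= b: f is linear between consecutive
    breakpoints, and the midpoint rule is exact for linear functions."""
    a, b = F(a), F(b)
    pts = sorted({a, b} | {t for t in BREAKS if a < t < b})
    return sum((r - l) * f((l + r) / 2) for l, r in zip(pts, pts[1:]))

print(integral(-1, 7))                                 # 1
print(integral(0, 2), integral(1, 3), integral(1, 2))  # 1/2 1/2 1/3
```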
Example 2C. The lifetime of a vacuum tube. Consider the numerical-valued random phenomenon that consists in observing the total time a vacuum tube will burn from the moment it is first put into service. Suppose that the probability function of this phenomenon is expressed by (2.1) in terms of the function $f(\cdot)$ given by \begin{align} f(x) = \begin{cases} 0, & \text{for } x < 0 \\[2mm] \frac{1}{1000} e^{-\frac{x}{1000}}, & \text{for } x \geq 0. \end{cases} \end{align}
Let $E$ be the event that the tube burns between 100 and 1000 hours, inclusive, and let $F$ be the event that the tube burns more than 1000 hours. The events $E$ and $F$ may be represented as subsets of the real line: $E = \{x\colon 100 \leq x \leq 1000\}$ and $F = \{x\colon x > 1000\}$. The probabilities of $E$ and $F$ are given by
\begin{align*} P[E] & = \int_{100}^{1000} f(x)\, dx = \frac{1}{1000} \int_{100}^{1000} e^{-x/1000}\, dx \\[2mm] & = \left. -e^{-x/1000}\right|_{100}^{1000} = e^{-0.1}-e^{-1}=0.537, \\[2mm] P[F] & =\int_{1000}^{\infty} f(x)\, dx =\frac{1}{1000} \int_{1000}^{\infty} e^{-x/1000}\, dx \\[2mm] & =\left. -e^{-x/1000}\right|_{1000}^{\infty} =e^{-1}=0.368. \end{align*}
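A small Python sketch of the same computation, assuming nothing beyond the standard `math` module: the probabilities come from the antiderivative $-e^{-x/1000}$ used above, and a crude midpoint-rule sum serves as an independent cross-check. The names `SCALE`, `density`, `prob_interval`, and `midpoint` are hypothetical helpers, not notation from the text.

```python
import math

SCALE = 1000.0  # hours: f(x) = exp(-x / SCALE) / SCALE for x >= 0, else 0

def density(x):
    return math.exp(-x / SCALE) / SCALE if x >= 0 else 0.0

def prob_interval(a, b):
    """Probability of [a, b] for 0 <= a <= b, from the antiderivative -exp(-x / SCALE)."""
    return math.exp(-a / SCALE) - math.exp(-b / SCALE)

def midpoint(func, a, b, n=100_000):
    """Midpoint-rule approximation, used only as an independent cross-check."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

p_E = prob_interval(100, 1000)
p_F = math.exp(-1000 / SCALE)  # on [1000, infinity) the upper antiderivative term vanishes
print(round(p_E, 3), round(p_F, 3))                    # 0.537 0.368
print(abs(midpoint(density, 100, 1000) - p_E) < 1e-8)  # True
```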
For many probability functions there exists a function $p(\cdot)$, defined for all real numbers $x$ but with value $p(x)$ equal to 0 for all $x$ except for a finite or countably infinite set of values of $x$ at which $p(x)$ is positive, such that from $p(\cdot)$ the value of $P[E]$ can be obtained for any event $E$ by summation:
\[ P[E]=\sum_{\substack{\text{over all points } x \text{ in } E \\ \text{such that } p(x)>0}} p(x). \tag{2.7} \]
In order that the sum in (2.7) may be meaningful, it suffices to impose the condition [letting $E = R$ in (2.7)] that
\[ 1=\sum_{\substack{\text{over all points } x \text{ in } R \\ \text{such that } p(x)>0}} p(x). \tag{2.8} \]
Given a probability function $P[\cdot]$ that may be represented in the form (2.7), we call the function $p(\cdot)$ the probability mass function of $P[\cdot]$, and we say that $P[\cdot]$ is specified by the probability mass function $p(\cdot)$.
A function $p(\cdot)$, defined for all real numbers, is said to be a probability mass function if (i) $p(x)$ equals zero for all $x$, except for a finite or countably infinite set of values of $x$ for which $p(x) > 0$, and (ii) the infinite series in (2.8) converges and sums to 1. Such a function is the probability mass function of a unique probability function $P[\cdot]$ defined on the subsets of the real line, namely the probability function with value $P[E]$ at any set $E$ given by (2.7).
Example 2D. Computing probabilities from a probability mass function. Let us consider again the numerical-valued random phenomenon considered in examples 1A and 2B. Let us assume that the probability function $P[\cdot]$ of this phenomenon may be expressed by (2.7) in terms of the function $p(\cdot)$ whose graph is sketched in Fig. 2C.

An algebraic formula for $p(\cdot)$ can be written as follows: \begin{align} p(x) = \begin{cases} 0, & \text{unless } x = 0.3k \text{ for some } k = 0, 1, \ldots, 20 \\[2mm] \frac{1}{24}, & \text{for } x = 0, 0.3, 0.6, 0.9, 2.1, 2.4, 2.7, 3.0 \\[2mm] \frac{1}{9}, & \text{for } x = 1.2, 1.5, 1.8 \\[2mm] \frac{1}{30}, & \text{for } x = 3.3, 3.6, 3.9, 4.2, 4.5, 4.8, 5.1, 5.4, 5.7, 6.0 \end{cases} \end{align}
It then follows that \begin{align} P[A] & =p(0)+p(0.3)+p(0.6)+p(0.9)+p(1.2)+p(1.5)+p(1.8) \\ & =\frac{1}{2} \\ P[B] & =p(1.2)+p(1.5)+p(1.8)+p(2.1)+p(2.4)+p(2.7)+p(3.0) \\ & =\frac{1}{2} \\ P[A B] & =p(1.2)+p(1.5)+p(1.8)=\frac{1}{3} \end{align}which agree with the values assumed in example 1A .
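The same bookkeeping in Python, as a sketch: to avoid floating-point comparisons, the support points $x = 0.3k$ are indexed by the integer $k = 0, 1, \ldots, 20$ and converted to exact fractions, and $A$ and $B$ are taken to be the intervals $[0, 2]$ and $[1, 3]$ whose limits appear in Example 2B. The functions `p` and `prob` are illustrative names.

```python
from fractions import Fraction as F

def p(k):
    """Mass of Example 2D at the point x = 0.3 k (k an integer); zero elsewhere."""
    if k in (0, 1, 2, 3, 7, 8, 9, 10):
        return F(1, 24)
    if k in (4, 5, 6):
        return F(1, 9)
    if 11 <= k <= 20:
        return F(1, 30)
    return F(0)

def prob(event):
    """Formula (2.7): add p over the support points x = 0.3 k that lie in the event."""
    return sum(p(k) for k in range(21) if event(F(3, 10) * k))

print(prob(lambda x: True))          # 1, i.e. condition (2.8)
print(prob(lambda x: 0 <= x <= 2))   # 1/2 = P[A]
print(prob(lambda x: 1 <= x <= 3))   # 1/2 = P[B]
print(prob(lambda x: 1 <= x <= 2))   # 1/3 = P[AB]
```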
The terminology of “density function” and “mass function” comes from the following physical representation of the probability function $P[\cdot]$ of a numerical-valued random phenomenon. We imagine that a unit mass of some substance is distributed over the real line in such a way that the amount of mass over any set $B$ of real numbers is equal to $P[B]$. The distribution of substance possesses a density, to be denoted by $f(x)$, at the point $x$ if, for any interval of length $h$ containing the point $x$ (where $h$ is a sufficiently small number), the mass of substance attached to the interval is approximately equal to $f(x)h$. The distribution of substance possesses a mass, to be denoted by $p(x)$, at the point $x$ if there is a positive amount $p(x)$ of substance concentrated at the point.
We shall see in section 3 that a probability function always possesses a probability density function and a probability mass function. Consequently, in order for a probability function to be specified by either its probability density function or its probability mass function, it is necessary (and, from a practical point of view, sufficient) that one of these functions vanish identically.
Exercises
Verify that each of the functions $f(\cdot)$ given in exercises 2.1 to 2.5 is a probability density function (by showing that it satisfies (2.2) and (2.3)) and sketch its graph.$^{3}$ Hint: use freely the facts developed in the appendix to this section.
2.1. \begin{align} \text{(i)} \quad f(x) & =\begin{cases} 1, & \text{for }0
2.2. \begin{align} \text{(i)} \quad f(x) & =\begin{cases} \frac{1}{2\sqrt{x}}, & \text{for }0
2.3. \begin{align} \text{(i)} \quad f(x) & =\begin{cases} \dfrac{1}{\pi \sqrt{1-x^{2}}}, & \text{for }|x|<1 \\ 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(ii)} \quad f(x) & =\begin{cases} \dfrac{2}{\pi} \frac{1}{\sqrt{1-x^{2}}}, & \text{for }0
2.4. \begin{align} \text{(i)} \quad f(x) & =\begin{cases} e^{-x}, & \text{for }x \geq 0 \\ 0, & x<0 \end{cases} \\ \\ \text{(ii)} \quad f(x) & =\left(\frac{1}{2}\right) e^{-|x|} \\ \text{(iii)} \quad f(x) & =\frac{e^{x}}{\left(1+e^{x}\right)^{2}} \\ \text{(iv)} \quad f(x) & =\frac{2}{\pi} \frac{e^{x}}{1+e^{2 x}} \end{align}
2.5. \begin{align} \text{(i)} \quad f(x) & =\frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} x^{2}} \\ \text{(ii)} \quad f(x) & =\frac{1}{2 \sqrt{2 \pi}} e^{-\frac{1}{2}\left(\frac{x-2}{2}\right)^{2}} \\ \text{(iii)} \quad f(x) & =\begin{cases} \frac{1}{\sqrt{2 \pi x}} e^{-x / 2}, & \text{for }x > 0 \\ 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(iv)} \quad f(x) & =\begin{cases} \frac{1}{4} x e^{-x / 2}, & \text{for }x > 0 \\ 0, & \text{elsewhere.} \end{cases} \\ \end{align}
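For the densities in exercises 2.4 and 2.5 a numerical sanity check is easy to run, although it is no substitute for the analytic verification the exercises ask for. This Python sketch applies a midpoint-rule sum over the truncated range $[-50, 50]$, which discards only a negligible tail mass for these functions. Exercise 2.5(iii) is omitted because its integrand is unbounded near $x = 0$, where a plain midpoint sum converges slowly; that case is better handled with $\Gamma(\frac{1}{2})$ from the appendix. The dictionary keys simply mirror the exercise numbers.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

SQRT_2PI = math.sqrt(2 * math.pi)
candidates = {
    "2.4(i)": lambda x: math.exp(-x) if x >= 0 else 0.0,
    "2.4(ii)": lambda x: 0.5 * math.exp(-abs(x)),
    "2.4(iii)": lambda x: math.exp(x) / (1 + math.exp(x)) ** 2,
    "2.4(iv)": lambda x: (2 / math.pi) * math.exp(x) / (1 + math.exp(2 * x)),
    "2.5(i)": lambda x: math.exp(-0.5 * x * x) / SQRT_2PI,
    "2.5(ii)": lambda x: math.exp(-0.5 * ((x - 2) / 2) ** 2) / (2 * SQRT_2PI),
    "2.5(iv)": lambda x: 0.25 * x * math.exp(-x / 2) if x > 0 else 0.0,
}

for name, f in candidates.items():
    total = midpoint(f, -50.0, 50.0)
    print(f"{name}: integral is approximately {total:.6f}")
```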
Show that each of the functions given in exercises 2.6 and 2.7 is a probability mass function [by showing that it satisfies (2.8) ], and sketch its graph.
Hint : use freely the facts developed in the appendix to this section.
2.6. \begin{align} \text{(i)} \quad p(x) & =\begin{cases} \dfrac{1}{3}, & \text{for } x = 0 \\[6pt] \dfrac{2}{3}, & \text{for } x = 1 \\[6pt] 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(ii)} \quad p(x) & =\begin{cases} \displaystyle \binom{6}{x}\left(\frac{2}{3}\right)^{x}\left(\frac{1}{3}\right)^{6-x}, & \text{for } x= 0, 1, \dots, 6 \\[6pt] 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(iii)} \quad p(x) & =\begin{cases} \dfrac{2}{3}\left(\dfrac{1}{3}\right)^{x-1}, & \text{for } x=1, 2, \dots \\[2mm] 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(iv)} \quad p(x) & =\begin{cases} e^{-2} \dfrac{2^{x}}{x!}, & \text{for } x=0, 1, 2, \dots \\[2mm] 0, & \text{elsewhere.} \end{cases} \end{align}
2.7. \begin{align} \text{(i)} \quad p(x) & =\begin{cases} \dfrac{\binom{8}{x}\binom{4}{6-x}}{\binom{12}{6}}, & \text{for } x=0, 1, \dots, 6 \\ 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(ii)} \quad p(x) & =\begin{cases} \binom{1+x}{x}\left(\frac{2}{3}\right)^{2}\left(\frac{1}{3}\right)^{x}, & \text{for } x=0, 1, 2, \dots \\ 0, & \text{elsewhere.} \end{cases} \\ \\ \text{(iii)} \quad p(x) & =\begin{cases} \dfrac{\binom{-8}{x}\binom{-4}{6-x}}{\binom{-12}{6}}, & \text{for } x=0, 1, \dots, 6 \\ 0, & \text{elsewhere.} \end{cases} \end{align}
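These sums can also be checked by machine. The Python sketch below uses exact `Fraction` arithmetic for the finite sums and floating point for the series $e^{-2}2^{x}/x!$; `gbinom` is a hypothetical helper implementing the generalized binomial coefficient (2.31) from the appendix. The hypergeometric forms are summed over every $x = 0, 1, \ldots, 6$, and the last two lines show that the series $e^{-2}2^{x}/x!$ sums to 1 only when $x = 0$ is included, and to $1 - e^{-2}$ when it starts at $x = 1$.

```python
import math
from fractions import Fraction as F

def gbinom(t, k):
    """Generalized binomial coefficient (2.31), for rational t and integer k >= 0."""
    result = F(1)
    for j in range(k):
        result *= F(t) - j
    return result / math.factorial(k)

# 2.6(ii): binomial masses, summed exactly
binom_total = sum(math.comb(6, x) * F(2, 3) ** x * F(1, 3) ** (6 - x) for x in range(7))
# 2.7(i): hypergeometric masses over x = 0, ..., 6 (math.comb returns 0 when k > n)
hyper_total = sum(F(math.comb(8, x) * math.comb(4, 6 - x), math.comb(12, 6)) for x in range(7))
# 2.7(iii): the same form with negative upper arguments, via (2.31)
neg_total = sum(gbinom(-8, x) * gbinom(-4, 6 - x) for x in range(7)) / gbinom(-12, 6)
# 2.6(iv): partial sums of e^{-2} 2^x / x!, starting at x = 0 and at x = 1
poisson_from_0 = sum(math.exp(-2) * 2 ** x / math.factorial(x) for x in range(60))
poisson_from_1 = poisson_from_0 - math.exp(-2)

print(binom_total, hyper_total, neg_total)                  # 1 1 1
print(round(poisson_from_0, 12), round(poisson_from_1, 6))  # 1.0 0.864665
```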
2.8. The amount of bread (in hundreds of pounds) that a certain bakery is able to sell in a day is found to be a numerical-valued random phenomenon, with a probability function specified by the probability density function $f(\cdot)$ given by
\begin{align} f(x) = \begin{cases} A x, & \text{for } 0 \leq x < 5 \\[2mm] A(10 - x), & \text{for } 5 \leq x < 10 \\[2mm] 0, & \text{otherwise} \end{cases} \end{align}
- Find the value of $A$ that makes $f(\cdot)$ a probability density function.
- Graph the probability density function.
- What is the probability that the number of pounds of bread that will be sold tomorrow is (a) more than 500 pounds, (b) less than 500 pounds, (c) between 250 and 750 pounds?
- Denote, respectively, by , and , the events that the number of pounds of bread sold in a day is ( ) greater than 500 pounds, less than 500 pounds, between 250 and 750 pounds. Find . Are and independent events? Are and independent events?
2.9. The length of time (in minutes) that a certain young lady speaks on the telephone is found to be a random phenomenon, with a probability function specified by the probability density function $f(\cdot)$ given by \begin{align} f(x) = \begin{cases} A e^{-\frac{x}{5}}, & \text{for } x > 0 \\[2mm] 0, & \text{otherwise.} \end{cases} \end{align}
(i) Find the value of $A$ that makes $f(\cdot)$ a probability density function.
(ii) Graph the probability density function.
(iii) What is the probability that the number of minutes that the young lady will talk on the telephone is (a) more than 10 minutes, (b) less than 5 minutes, (c) between 5 and 10 minutes?
(iv) For any real number $a$, let $E_a$ denote the event that the young lady talks longer than $a$ minutes. Find $P[E_a]$. Show that, for $a > 0$ and $b > 0$, $P[E_{a+b} \mid E_a] = P[E_b]$. In words, the conditional probability that a telephone conversation will last more than $a + b$ minutes, given that it has lasted at least $a$ minutes, is equal to the unconditional probability that it will last more than $b$ minutes.
Answer
(i) $A = \frac{1}{5}$; (iii) (a) 0.1353, (b) 0.6321, (c) 0.2325; (iv) $P[E_a] = e^{-a/5}$ for $a \geq 0$, and $P[E_a] = 1$ for $a < 0$.
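A quick numerical check of these answers, as a Python sketch assuming $A = 1/5$; the exact value of (c) is $e^{-1} - e^{-2} \approx 0.23254$. The helper `tail(a)` is an illustrative name for the probability that a conversation lasts more than $a$ minutes, and the loop spot-checks the identity of part (iv) at a few arbitrary pairs $(a, b)$.

```python
import math

MEAN = 5.0  # minutes; with A = 1/5 the density is exp(-x / 5) / 5 for x > 0

def tail(a):
    """Probability that the conversation lasts more than a minutes (1 when a <= 0)."""
    return math.exp(-a / MEAN) if a > 0 else 1.0

print(round(tail(10), 4))            # (a) 0.1353
print(round(1 - tail(5), 4))         # (b) 0.6321
print(round(tail(5) - tail(10), 4))  # (c) 0.2325

# Part (iv): P[more than a + b | more than a] = P[more than b]
for a, b in [(1.0, 2.0), (3.5, 7.25), (10.0, 0.5)]:
    assert math.isclose(tail(a + b) / tail(a), tail(b))
```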
2.10. The number of newspapers that a certain newsboy is able to sell in a day is found to be a numerical-valued random phenomenon, with a probability function specified by the probability mass function $p(\cdot)$ given by
\begin{align} p(x) = \begin{cases} A x, & \text{for } x = 1, 2, \ldots, 50 \\[2mm] A(100 - x), & \text{for } x = 51, 52, \ldots, 100 \\[2mm] 0, & \text{otherwise.} \end{cases} \end{align}
(i) Find the value of $A$ that makes $p(\cdot)$ a probability mass function.
(ii) Sketch the probability mass function.
(iii) What is the probability that the number of newspapers that will be sold tomorrow is (a) more than 50, (b) less than 50, (c) equal to 50, (d) between 25 and 75, inclusive, (e) an odd number?
(iv) Denote, respectively, by , and , the events that the number of newspapers sold in a day is (a) greater than less than equal to 50, (d) between 25 and 75, inclusive. Find , . Are and independent events? Are and independent events? Are and independent events?
2.11. The number of times that a certain piece of equipment (say, a light switch) operates before having to be discarded is found to be a random phenomenon, with a probability function specified by the probability mass function $p(\cdot)$ given by
\begin{align} p(x) = \begin{cases} A\left(\frac{1}{3}\right)^{x}, & \text{for } x = 0, 1, 2, \ldots \\[2mm] 0, & \text{otherwise.} \end{cases} \end{align}
(i) Find the value of $A$ that makes $p(\cdot)$ a probability mass function.
(ii) Sketch the probability mass function.
(iii) What is the probability that the number of times the equipment will operate before having to be discarded is (a) greater than an even number (regard 0 as even), (c) an odd number?
(iv) For any real number $x$, let $E_x$ denote the event that the number of times the equipment operates is greater than or equal to $x$. Find $P[E_x]$. Show that, for any nonnegative integers $u$ and $v$, $P[E_{u+v} \mid E_u] = P[E_v]$. Express in words the meaning of this formula.
Answer
(i) ; (iii) (a) , (b) , (c) ; (iv) .
Appendix: The Evaluation of Integrals and Sums
If (2.1) and (2.7) are to be useful expressions for evaluating the probability of an event, then techniques must be available for evaluating sums and integrals. The purpose of this appendix is to state some of the notions and formulas with which the student should become familiar and to collect some important formulas that the reader should learn to use, even if he lacks the mathematical background to justify them.
To begin with, let us note the following principle. If a function $f(\cdot)$ is defined by different analytic expressions over various regions, then to evaluate an integral whose integrand is this function one must express the integral as a sum of integrals corresponding to the different regions of definition of $f(\cdot)$. For example, consider the probability density function $f(\cdot)$ defined by
\begin{align} f(x) = \begin{cases} x, & \text{for } 0 < x < 1 \\[2mm] 2 - x, & \text{for } 1 < x < 2 \\[2mm] 0, & \text{elsewhere.} \end{cases} \tag{2.10} \end{align}
To prove that $f(\cdot)$ is a probability density function, we need to verify that (2.2) and (2.3) are satisfied. Clearly, (2.3) holds. Next,
\begin{align*} \int_{-\infty}^{\infty} f(x)\, dx & =\int_{0}^{2} f(x)\, dx+\int_{-\infty}^{0} f(x)\, dx+\int_{2}^{\infty} f(x)\, dx \\ & =\int_{0}^{1} f(x)\, dx+\int_{1}^{2} f(x)\, dx+0 \\ & =\left.\frac{x^{2}}{2}\right|_{0}^{1}+\left.\left(2 x-\frac{x^{2}}{2}\right)\right|_{1}^{2}=\frac{1}{2}+\left(2-\frac{3}{2}\right)=1, \end{align*}
and (2.2) has been shown to hold. It might be noted that the function in (2.10) can be written somewhat more concisely in terms of the absolute value notation:
\begin{align} f(x) = \begin{cases} 1 - |1 - x|, & \text{for } 0 \leq x \leq 2 \\[2mm] 0, & \text{otherwise.} \end{cases} \tag{2.11} \end{align}
Next, in order to check his command of the basic techniques of integration, the reader should verify that the following formulas hold:
\[ \int e^{-x-e^{-x}}\, dx=\int e^{-e^{-x}} e^{-x}\, dx=e^{-e^{-x}}. \tag{2.12} \]
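The region-by-region evaluation above, and formula (2.12), can be spot-checked in Python. This is a sketch: `anti_left` and `anti_right` are hypothetical names for the antiderivatives $x^{2}/2$ and $2x - x^{2}/2$ used on the two regions, and (2.12) is checked only numerically, by a central difference at a few points.

```python
import math
from fractions import Fraction as F

def f_piecewise(x):
    """The density (2.10)."""
    if 0 < x < 1:
        return x
    if 1 < x < 2:
        return 2 - x
    return 0

def f_abs(x):
    """The absolute-value form (2.11)."""
    return 1 - abs(1 - x) if 0 <= x <= 2 else 0

def anti_left(x):   # antiderivative of x, used on (0, 1)
    return F(x) ** 2 / 2

def anti_right(x):  # antiderivative of 2 - x, used on (1, 2)
    return 2 * F(x) - F(x) ** 2 / 2

total = (anti_left(1) - anti_left(0)) + (anti_right(2) - anti_right(1))
print(total)  # 1, confirming (2.2)

# (2.10) and (2.11) agree everywhere except at the single point x = 1
grid = [F(k, 100) for k in range(-50, 251) if k != 100]
assert all(f_piecewise(x) == f_abs(x) for x in grid)

def big_g(x):  # the claimed antiderivative in (2.12)
    return math.exp(-math.exp(-x))

h = 1e-5
for x in (-1.0, 0.0, 2.0):
    slope = (big_g(x + h) - big_g(x - h)) / (2 * h)
    assert math.isclose(slope, math.exp(-x - math.exp(-x)), rel_tol=1e-7)
```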
An important integration formula, obtained by integration by parts, is the following, valid for any real number $t$ for which the integrals make sense:
\[ \int x^{t-1} e^{-x}\, dx=-x^{t-1} e^{-x}+(t-1) \int x^{t-2} e^{-x}\, dx. \tag{2.13} \]
Thus, for $t = 2$ we obtain
\[ \int x e^{-x}\, dx=-x e^{-x}+\int e^{-x}\, dx=-e^{-x}(x+1). \tag{2.14} \]
We next consider the Gamma function $\Gamma(t)$, which plays an important role in probability theory. It is defined for every $t > 0$ by
\[ \Gamma(t)=\int_{0}^{\infty} x^{t-1} e^{-x}\, dx. \tag{2.15} \]
The Gamma function is a generalization of the factorial function in the following sense. From (2.13) it follows that, for $t > 1$,
\[ \Gamma(t)=(t-1) \Gamma(t-1). \tag{2.16} \]
Therefore, for any positive integer $r < t$,
\[ \Gamma(t+1)=t \Gamma(t)=t(t-1) \cdots(t-r) \Gamma(t-r). \tag{2.17} \]
Since, clearly, $\Gamma(1) = \int_{0}^{\infty} e^{-x}\, dx = 1$, it follows that for any integer $n \geq 0$
\[ \Gamma(n+1)=n!. \tag{2.18} \]
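Python's standard `math.gamma` makes the recursion (2.16) and the factorial property (2.18) easy to spot-check numerically; the sample points in this sketch are arbitrary.

```python
import math

# (2.16): Gamma(t) = (t - 1) Gamma(t - 1) for t > 1, at a few arbitrary points
for t in (1.5, 2.7, 6.25):
    assert math.isclose(math.gamma(t), (t - 1) * math.gamma(t - 1))

# (2.18): Gamma(n + 1) = n! for integers n >= 0
for n in range(15):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

print(math.gamma(5), math.factorial(4))  # 24.0 24
```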
Next, it may be shown that for any integer $n \geq 1$
\[ \Gamma\left(n+\frac{1}{2}\right)=\frac{1 \cdot 3 \cdot 5 \cdots(2n-1)}{2^{n}} \sqrt{\pi}, \tag{2.19} \]
which may be written, for any even integer $n \geq 2$,
\[ \Gamma\left(\frac{n+1}{2}\right)=\frac{1 \cdot 3 \cdot 5 \cdots(n-1)}{2^{n/2}} \sqrt{\pi}, \tag{2.20} \]
since
\[ \Gamma\left(\frac{1}{2}\right)=\sqrt{\pi}. \tag{2.21} \]
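Formulas (2.19) and (2.21) can be spot-checked the same way. In this Python sketch `odd_product` is a hypothetical helper for $1 \cdot 3 \cdot 5 \cdots (2n-1)$; `math.prod` requires Python 3.8 or later.

```python
import math

def odd_product(n):
    """1 * 3 * 5 * ... * (2n - 1), the numerator in (2.19); equals 1 when n = 0."""
    return math.prod(range(1, 2 * n, 2))

assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))  # (2.21)
for n in range(1, 12):
    rhs = odd_product(n) / 2 ** n * math.sqrt(math.pi)     # (2.19)
    assert math.isclose(math.gamma(n + 0.5), rhs)

print(math.gamma(2.5), 3 / 4 * math.sqrt(math.pi))  # the case n = 2
```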
We prove (2.21) by showing that $\Gamma(\frac{1}{2})$ is equal to another integral whose value we need. In (2.15), make the change of variable $x = y^{2}/2$, and let $t = (n+1)/2$. Then, for any integer $n \geq 0$, we have the formula
\[ \Gamma\left(\frac{n+1}{2}\right)=\frac{1}{2^{(n-1)/2}} \int_{0}^{\infty} y^{n} e^{-\frac{1}{2} y^{2}}\, dy. \tag{2.22} \]
In view of (2.22), to establish (2.21) we need only show that
\[ \Gamma\left(\frac{1}{2}\right)=\sqrt{2} \int_{0}^{\infty} e^{-\frac{1}{2} y^{2}}\, dy=\frac{1}{\sqrt{2}} \int_{-\infty}^{\infty} e^{-\frac{1}{2} y^{2}}\, dy=\sqrt{\pi}. \tag{2.23} \]
We prove (2.23) by proving the following basic formula (its case $u = 1$ gives (2.23)): for any $u > 0$,
\[ \frac{1}{\sqrt{2 \pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2} u y^{2}}\, dy=\frac{1}{\sqrt{u}}. \tag{2.24} \]
Equation (2.24) may be derived as follows. Let $I$ denote the left-hand side of (2.24). Then $I^{2}$ is a product of two single integrals. By the theorem for the evaluation of double integrals, it then follows that
\[ I^{2}=\frac{1}{2 \pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp \left[-\frac{1}{2} u\left(x^{2}+y^{2}\right)\right] dx\, dy. \tag{2.25} \]
We now evaluate the double integral in (2.25) by means of a change of variables to polar coordinates, $x = r\cos\theta$, $y = r\sin\theta$, $dx\, dy = r\, dr\, d\theta$. Then
\[ I^{2}=\frac{1}{2\pi}\int_{0}^{2\pi} d\theta \int_{0}^{\infty} e^{-\frac{1}{2} u r^{2}}\, r\, dr=\int_{0}^{\infty} e^{-\frac{1}{2} u r^{2}}\, r\, dr=\left. -\frac{1}{u} e^{-\frac{1}{2} u r^{2}}\right|_{0}^{\infty}=\frac{1}{u}, \]
so that $I = 1/\sqrt{u}$ (since $I > 0$), which proves (2.24).
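As a numerical cross-check of (2.24) and of the change of variable behind (2.22), here is a Python midpoint-rule sketch. Truncating the range of integration at $|y| = 40$ is an assumption of the sketch; it is harmless here because the integrands are negligible beyond that point.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# (2.24): (1 / sqrt(2 pi)) * integral of exp(-u y^2 / 2) over the line = 1 / sqrt(u)
for u in (0.5, 1.0, 4.0):
    lhs = midpoint(lambda y: math.exp(-0.5 * u * y * y), -40.0, 40.0) / math.sqrt(2 * math.pi)
    assert math.isclose(lhs, 1 / math.sqrt(u), rel_tol=1e-9)

# (2.22): Gamma((n + 1) / 2) = 2^{-(n - 1) / 2} * integral_0^inf y^n exp(-y^2 / 2) dy
for n in range(6):
    rhs = midpoint(lambda y: y ** n * math.exp(-0.5 * y * y), 0.0, 40.0) / 2 ** ((n - 1) / 2)
    assert math.isclose(math.gamma((n + 1) / 2), rhs, rel_tol=1e-6)
```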
For large values of $t$ there is an important asymptotic formula for the Gamma function, known as Stirling’s formula. Taking $t = n + 1$, in which $n$ is a positive integer, this formula can be written
\begin{align} \log n! & =\left(n+\frac{1}{2}\right) \log n-n+\frac{1}{2} \log 2 \pi+\frac{r(n)}{12 n}, \notag \\[2mm] n! & =\left(\frac{n}{e}\right)^{n} \sqrt{2 \pi n}\, e^{r(n)/(12 n)}, \tag{2.26} \end{align}
in which $r(n)$ satisfies $1 - \frac{1}{12n+1} < r(n) < 1$. The proof of Stirling’s formula may be found in many books. A particularly clear derivation is given by H. Robbins, “A Remark on Stirling’s Formula,” American Mathematical Monthly, Vol. 62 (1955), pp. 26–29.
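Solving (2.26) for $r(n)$ gives $r(n) = 12n\left[\log n! - \left(n + \frac{1}{2}\right)\log n + n - \frac{1}{2}\log 2\pi\right]$, which can be evaluated with `math.lgamma`, since $\log n! = \log\Gamma(n+1)$. The Python sketch below checks the stated bounds for a few moderate $n$; for very large $n$ the bracket becomes so small that floating-point cancellation would swamp the check, so the sample stops at $n = 100$. The name `stirling_r` is illustrative.

```python
import math

def stirling_r(n):
    """r(n) obtained by solving the first line of (2.26)."""
    log_fact = math.lgamma(n + 1)  # log n! via the Gamma function
    main = (n + 0.5) * math.log(n) - n + 0.5 * math.log(2 * math.pi)
    return 12 * n * (log_fact - main)

for n in (1, 2, 5, 10, 50, 100):
    r = stirling_r(n)
    assert 1 - 1 / (12 * n + 1) < r < 1, (n, r)
    print(n, round(r, 6))
```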
We next turn to the evaluation of sums and infinite sums. The major tool in the evaluation of infinite sums is Taylor’s theorem, which states that under certain conditions a function $g(x)$ may be expanded in a power series
\[ g(x)=\sum_{k=0}^{\infty} \frac{x^{k}}{k!} g^{(k)}(0), \tag{2.27} \]
in which $g^{(k)}(0)$ denotes the value at $x = 0$ of the $k$th derivative of $g(x)$. Letting $g(x) = e^{x}$, we obtain
\[ e^{x}=\sum_{k=0}^{\infty} \frac{x^{k}}{k!}=1+x+\frac{x^{2}}{2!}+\cdots+\frac{x^{n}}{n!}+\cdots, \quad -\infty < x < \infty. \]
Take next $g(x) = (1 - x)^{n}$, in which $n$ is a positive integer. Clearly, writing $(n)_{k} = n(n-1)\cdots(n-k+1)$, \begin{align} g^{(k)}(x) = \begin{cases} (-1)^{k}(n)_{k}(1 - x)^{n - k}, & \text{for } k = 0, 1, \ldots, n \\[2mm] 0, & \text{for } k > n. \end{cases} \end{align}
Consequently, since $(n)_{k}/k! = \binom{n}{k}$, for any positive integer $n$
\[ (1-x)^{n}=\sum_{k=0}^{n}(-1)^{k}\binom{n}{k} x^{k}, \quad -\infty < x < \infty. \tag{2.30} \]
We obtain an important generalization of the binomial theorem by taking $g(x) = (1-x)^{t}$, in which $t$ is any real number. For any real number $t$ and any integer $k \geq 0$ define the binomial coefficient
\begin{align} \binom{t}{k} = \begin{cases} \frac{t(t-1) \cdots (t-k+1)}{k!}, & \text{for } k = 1, 2, \ldots \\[2mm] 1, & \text{for } k = 0. \end{cases} \tag{2.31} \end{align}
Note that for any positive number $n$
\begin{align} \binom{-n}{k} & =(-1)^{k} \frac{n(n+1) \cdots(n+k-1)}{k!} \notag \\ & =(-1)^{k}\binom{n+k-1}{k}. \tag{2.32} \end{align}
By Taylor’s theorem, we obtain the important formula, valid for every real number $t$ and every real $x$ with $|x| < 1$,
\[ (1-x)^{t}=\sum_{k=0}^{\infty}\binom{t}{k}(-x)^{k}. \tag{2.33} \]
For the case $t = -n$ with $n$ positive we may write, in view of (2.32),
\[ (1-x)^{-n}=\sum_{k=0}^{\infty}\binom{n+k-1}{k} x^{k}, \quad |x|<1. \tag{2.34} \]
Equation (2.34) with $n = 1$ is the familiar formula for the sum of a geometric series:
\[ \sum_{k=0}^{\infty} x^{k}=1+x+x^{2}+\cdots+x^{n}+\cdots=\frac{1}{1-x}, \quad |x|<1. \tag{2.35} \]
Equation (2.34) with $n = 2$ and $n = 3$ yields the formulas
\begin{align} & \sum_{k=0}^{\infty}(k+1) x^{k}=1+2 x+3 x^{2}+\cdots=\frac{1}{(1-x)^{2}}, \quad |x|<1, \tag{2.36} \\ & \sum_{k=0}^{\infty}(k+2)(k+1) x^{k}=\frac{2}{(1-x)^{3}}, \quad |x|<1. \notag \end{align}
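The binomial series (2.33) and its special cases (2.35) and (2.36) can be checked numerically by summing many terms. In this Python sketch, `binomial_series` is an illustrative helper that builds $\binom{t}{k}$ from (2.31) through the ratio $\binom{t}{k+1} = \binom{t}{k}\,(t-k)/(k+1)$; the point $x = 0.4$ and the 200-term cutoff are arbitrary choices inside $|x| < 1$.

```python
import math

def binomial_series(t, x, terms=200):
    """Partial sum of (2.33): sum over k of binom(t, k) * (-x)^k."""
    total, coeff = 0.0, 1.0          # coeff holds binom(t, k), starting at k = 0
    for k in range(terms):
        total += coeff * (-x) ** k
        coeff *= (t - k) / (k + 1)   # binom(t, k + 1) = binom(t, k) (t - k) / (k + 1)
    return total

x = 0.4
assert math.isclose(binomial_series(-1, x), 1 / (1 - x))          # (2.35)
assert math.isclose(binomial_series(-2, x), 1 / (1 - x) ** 2)     # (2.36), first line
assert math.isclose(binomial_series(0.5, x), math.sqrt(1 - x))    # a non-integer t in (2.33)
# second line of (2.36): sum of (k + 2)(k + 1) x^k equals 2 / (1 - x)^3
assert math.isclose(sum((k + 2) * (k + 1) * x ** k for k in range(200)), 2 / (1 - x) ** 3)
```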
From (2.33) we may obtain another important formula. By a comparison of the coefficients of $x^{n}$ on both sides of the equation $(1-x)^{s}(1-x)^{t} = (1-x)^{s+t}$, we obtain, for any real numbers $s$ and $t$ and any positive integer $n$,
\[ \binom{s}{0}\binom{t}{n}+\binom{s}{1}\binom{t}{n-1}+\cdots+\binom{s}{n}\binom{t}{0}=\binom{s+t}{n}. \tag{2.37} \]
If $s$ and $t$ are positive integers, (2.37) could be verified by mathematical induction. A useful special case of (2.37) is when ; we then obtain (5.13) of Chapter 2.
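Since (2.37) is a polynomial identity in $s$ and $t$, it can be verified exactly for rational $s$ and $t$ with Python's `fractions` module. A sketch, in which `gbinom` is a hypothetical helper implementing (2.31) and `vandermonde_holds` compares the two sides of (2.37):

```python
import math
from fractions import Fraction as F

def gbinom(t, k):
    """Generalized binomial coefficient (2.31) for rational t and integer k >= 0."""
    result = F(1)
    for j in range(k):
        result *= F(t) - j
    return result / math.factorial(k)

def vandermonde_holds(s, t, n):
    """Compare the two sides of (2.37) exactly."""
    lhs = sum(gbinom(s, j) * gbinom(t, n - j) for j in range(n + 1))
    return lhs == gbinom(s + t, n)

print(vandermonde_holds(F(1, 2), F(-7, 3), 5))  # True
print(vandermonde_holds(-8, -4, 6))             # True
print(vandermonde_holds(5, 7, 4))               # True
```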
Theoretical Exercises
2.1. Show that for any positive real numbers $\alpha$, $\beta$, and $t$
\begin{align} \int_{0}^{\infty} x^{\beta-1} e^{-\alpha x} d x & =\frac{\Gamma(\beta)}{\alpha^{\beta}} \\ \frac{\alpha^{\beta}}{\Gamma(\beta)} \int_{0}^{\infty} e^{-t x} x^{\beta-1} e^{-\alpha x} d x & =\left(1+\frac{t}{\alpha}\right)^{-\beta} . \tag{2.38} \end{align}
2.2. Show that for any $\sigma > 0$ and $n > -1$
\[ 2 \int_{0}^{\infty} y^{n} e^{-\frac{1}{2}(y / \sigma)^{2}}\, dy=\left(2 \sigma^{2}\right)^{(n+1) / 2} \Gamma\left(\frac{n+1}{2}\right). \tag{2.39} \]
2.3. The integral
\[ B(m, n)=\int_{0}^{1} x^{m-1}(1-x)^{n-1}\, dx, \tag{2.40} \]
which converges if $m$ and $n$ are positive, defines a function of $m$ and $n$, called the beta function. Show that the beta function is symmetrical in its arguments, $B(m, n) = B(n, m)$, and may be expressed [letting $x = \sin^{2}\theta$ and $x = 1/(1+y)$, respectively] by
\begin{align} B(m, n) & =2 \int_{0}^{\pi / 2} \sin^{2m-1}\theta \cos^{2n-1}\theta\, d\theta \notag \\ & =\int_{0}^{\infty} \frac{y^{n-1}}{(1+y)^{m+n}}\, dy. \tag{2.41} \end{align}
Show finally that the beta and gamma functions are connected by the relation
\[ B(m, n)=\frac{\Gamma(m) \Gamma(n)}{\Gamma(m+n)}. \tag{2.42} \]
Hint: By changing to polar coordinates, we have
\begin{align*} \Gamma(m) \Gamma(n) & =4 \int_{0}^{\infty} \int_{0}^{\infty} x^{2m-1} e^{-x^{2}} y^{2n-1} e^{-y^{2}}\, dx\, dy \\ & =4 \int_{0}^{\pi / 2} \cos^{2m-1}\theta \sin^{2n-1}\theta\, d\theta \int_{0}^{\infty} e^{-r^{2}} r^{2m+2n-1}\, dr. \end{align*}
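A numerical illustration of (2.42) and of the symmetry $B(m, n) = B(n, m)$, as a Python midpoint-rule sketch. It is restricted to $m, n \geq 1$ (an assumption of this sketch) so that the integrand of (2.40) stays bounded; `beta_integral` is an illustrative name.

```python
import math

def beta_integral(m, n, steps=100_000):
    """Midpoint-rule approximation to (2.40); intended for m, n >= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (m - 1) * (1 - x) ** (n - 1)
    return h * total

for m, n in [(1, 1), (2, 3), (2.5, 4.0), (3.0, 1.5)]:
    b_mn, b_nm = beta_integral(m, n), beta_integral(n, m)
    via_gamma = math.gamma(m) * math.gamma(n) / math.gamma(m + n)  # (2.42)
    assert math.isclose(b_mn, via_gamma, rel_tol=1e-6)
    assert math.isclose(b_mn, b_nm, rel_tol=1e-6)                  # symmetry
    print(m, n, round(b_mn, 6), round(via_gamma, 6))
```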
2.5. Prove that the integral defining the gamma function converges for any real number $t > 0$.
2.6. Prove that the integral defining the beta function converges for any real numbers $m$ and $n$ such that $m > 0$ and $n > 0$.
2.7. Taylor’s theorem with remainder. Show that if the function $g(\cdot)$ has a continuous $n$th derivative in some interval containing the origin, then for $x$ in this interval
\begin{align} g(x)=g(0)+x g^{\prime}(0)+\frac{x^{2}}{2!} g^{\prime\prime}(0) & +\cdots+\frac{x^{n-1}}{(n-1)!} g^{(n-1)}(0) \notag \\ & +\frac{x^{n}}{(n-1)!} \int_{0}^{1} (1-t)^{n-1} g^{(n)}(x t)\, dt. \tag{2.43} \end{align}
Hint: Show, for $k = 2, 3, \ldots, n$, that
\begin{align*} & -\frac{x^{k}}{(k-1)!} \int_{0}^{1} g^{(k)}(x t)(1-t)^{k-1}\, dt+\frac{x^{k-1}}{(k-2)!} \int_{0}^{1} g^{(k-1)}(x t)(1-t)^{k-2}\, dt \\ & \qquad =\frac{x^{k-1}}{(k-1)!} g^{(k-1)}(0), \end{align*}
and then sum these identities over $k$, using $g(x) - g(0) = x\int_{0}^{1} g^{\prime}(xt)\, dt$.
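For $g(x) = e^{x}$, every derivative at the origin equals 1, so both sides of (2.43) are easy to compute. The Python sketch below evaluates the right-hand side, approximating the remainder integral by a midpoint sum, and compares it with `math.exp`; `taylor_with_remainder` is a hypothetical helper name.

```python
import math

def taylor_with_remainder(x, n, steps=100_000):
    """Right-hand side of (2.43) for g = exp, whose derivatives at 0 all equal 1."""
    poly = sum(x ** k / math.factorial(k) for k in range(n))
    h = 1.0 / steps
    remainder_integral = h * sum(
        (1 - (i + 0.5) * h) ** (n - 1) * math.exp(x * (i + 0.5) * h) for i in range(steps)
    )
    return poly + x ** n / math.factorial(n - 1) * remainder_integral

for x, n in [(1.0, 1), (2.0, 3), (-1.5, 4)]:
    assert math.isclose(taylor_with_remainder(x, n), math.exp(x), rel_tol=1e-8)
```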
2.8. Lagrange’s form of the remainder in Taylor’s theorem. Show that if $g(\cdot)$ has a continuous $n$th derivative in the closed interval from 0 to $x$, where $x$ may be positive or negative, then
\[ \int_{0}^{1} g^{(n)}(x t)(1-t)^{n-1}\, dt=\frac{1}{n} g^{(n)}(\theta x) \tag{2.44} \]
for some number $\theta$ in the interval $0 < \theta < 1$.
1. We usually assume that the integral in (2.1) is defined in the sense of Riemann; to ensure that this is the case, we require that the function $f(\cdot)$ be defined and continuous at all but a finite number of points. The integral in (2.1) is then defined only for events $E$ that are either intervals or unions of a finite number of non-overlapping intervals. In advanced probability theory the integral in (2.1) is defined by means of a theory of integration developed in the early 1900s by Henri Lebesgue. The function $f(\cdot)$ must then be a Borel function, by which is meant that for any real number $c$ the set $\{x\colon f(x) \leq c\}$ is a Borel set. A function that is continuous at all but a finite number of points may be shown to be a Borel function. It may be shown that if a Borel function $f(\cdot)$ satisfies (2.2) and (2.3) then, for any Borel set $B$, the integral of $f(\cdot)$ over $B$ exists as an integral defined in the sense of Lebesgue. If $B$ is an interval, or a union of a finite number of non-overlapping intervals, and if $f(\cdot)$ is continuous on $B$, then the integral of $f(\cdot)$ over $B$ defined in the sense of Lebesgue has the same value as the integral of $f(\cdot)$ over $B$ defined in the sense of Riemann. Henceforth, in this book the word function (unless otherwise qualified) will mean a Borel function and the word set (of real numbers) will mean a Borel set.
2. For the purposes of this book we also require that a probability density function $f(\cdot)$ be defined and continuous at all but a finite number of points.
3. The reader should note the convention used in the exercises of this book. When a function $f(\cdot)$ is defined by a single analytic expression for all $x$ in $-\infty < x < \infty$, the fact that $x$ varies between $-\infty$ and $\infty$ is not explicitly indicated.