The fundamental empirical fact upon which are based all applications of the theory of probability is expressed in the empirical law of large numbers, first formulated by Poisson (in his book, Recherches sur le probabilité des jugements , 1837):
In many different fields, empirical phenomena appear to obey a certain general law, which can be called the Law of Large Numbers. This law states that the ratios of numbers derived from the observation of a very large number of similar events remain practically constant, provided that these events are governed partly by constant factors and partly by variable factors whose variations are irregular and do not cause a systematic change in a definite direction. Certain values of these relations are characteristic of each given kind of event. With the increase in length of the series of observations the ratios derived from such observations come nearer and nearer to these characteristic constants. They could be expected to reproduce them exactly if it were possible to make series of observations of an infinite length.
In the mathematical theory of probability one may prove a proposition, called the mathematical law of large numbers, that may be used to gain insight into the circumstances under which the empirical law of large numbers is expected to hold. For an interesting philosophical discussion of the relation between the empirical and the mathematical laws of large numbers and for the foregoing quotation from Poisson the reader should consult Richard von Mises, Probability, Statistics, and Truth , second revised edition, Macmillan, New York, 1957, pp. 104–134.
A sequence of jointly distributed random variables, , with finite means, is said to obey the (classical) law of large numbers if
Z_{n}=\frac{X_{1}+X_{2}+\cdots+X_{n}}{n}-\frac{E\left(X_{1}+\cdots+X_{n}\right)}{n} \rightarrow 0. \tag{2.1}
in some mode of convergence as tends to . The sequence is said to obey the strong law of large numbers, the weak law of large numbers, or the quadratic mean law of large numbers, depending on whether the convergence in (2.1) is with probability one, in probability, or in quadratic mean. In this section we give conditions, both for independent and dependent random variables, for the law of large numbers to hold.
We consider first the case of independent random variables with finite means. We prove in section 3 that a sequence of independent identically distributed random variables obeys the weak law of large numbers if the common mean is finite. It may be proved (see Loève, Probability Theory , Van Nostrand, New York, 1955, p. 243) that the finiteness of also implies that the sequence of independent identically distributed random variables obeys the strong law of large numbers.
In theoretical exercise 4.2 we indicate the proof of the law of large numbers for independent, not necessarily identically distributed, random variables with finite means: if, for some
\lim _{n \rightarrow \infty} \frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E\left|X_{k}-E\left[X_{k}\right]\right|^{1+\delta}=0, \tag{2.2}
then
\operatorname{plim}_{n \rightarrow \infty} \frac{1}{n} \sum_{k=1}^{n}\left(X_{k}-E\left[X_{k}\right]\right)=0. \tag{2.3}
Equation (2.2) is known as Markov’s condition for the validity of the weak law of large numbers for independent random variables.
In this section we consider the case of dependent random variables , with finite means (which we may take to be 0), and finite variances . We state conditions for the validity of the quadratic mean law of large numbers and the strong law of large numbers, which, while not the most general conditions that can be stated, appear to be general enough for most practical applications. Our conditions are stated in terms of the behavior, as tends to , of the covariance
C_{n}=E\left[Z_{n} X_{n}\right]=\frac{1}{n} \sum_{k=1}^{n} E\left[X_{k} X_{n}\right] \tag{2.4}
between the th summand and the th sample mean
Let us examine the possible behavior of under various assumptions on the sequence and under the assumption that the variances are uniformly bounded; that is, there is a constant such that
\sigma_{n}^{2}=\operatorname{Var}\left[X_{n}\right] \leq M \quad \text { for all } n. \tag{2.5}
If the random variables are independent, then if . Consequently, , which, under condition (2.5), tends to 0 as tends to . This is also the case if the random variables are assumed to be orthogonal. The sequence of random variables is said to be orthogonal if, for any integer and integer . Then, again, .
More generally, let us consider random variables that are stationary (in the wide sense); this means that there is a function , defined for , such that, for any integers and ,
E\left[X_{k} X_{k+m}\right]=R(m). \tag{2.6}
It is clear that an orthogonal sequence of random variables (in which all the random variables have the same variance ) is stationary, with or 0, depending on whether or . For a stationary sequence the value of is given by
C_{n}=\frac{1}{n} \sum_{k=0}^{n-1} R(k). \tag{2.7}
We now show that under condition (2.5) a necessary and sufficient condition for the sample mean to converge in quadratic mean to 0 is that tends to 0. In theorem 2B we state conditions for the sample mean to converge with probability one to 0.
Theorem 2A. A sequence of jointly distributed random variables with zero mean and uniformly bounded variances obeys the quadratic mean law of large numbers (in the sense that ) if and only if
\lim _{n \rightarrow \infty} C_{n}=\lim _{n \rightarrow \infty} E\left[X_{n} Z_{n}\right]=0. \tag{2.8}
Proof
Since , it is clear that if the quadratic mean law of large numbers holds and if the variances are bounded uniformly in , then (2.8) holds. To prove the converse, we prove first the following useful identity:
E\left[Z_{n}^{2}\right]+\frac{1}{n^{2}} \sum_{k=1}^{n} E\left[X_{k}^{2}\right]=\frac{2}{n^{2}} \sum_{k=1}^{n} k E\left[X_{k} Z_{k}\right]. \tag{2.9}
To prove (2.9), we write the familiar formula
\begin{align} E\left[\left(X_{1}+\cdots+X_{n}\right)^{2}\right] & =\sum_{k=1}^{n} E\left[X_{k}^{2}\right]+2 \sum_{k=1}^{n} \sum_{j=1}^{k-1} E\left[X_{k} X_{j}\right] \tag{2.10} \\ & =2 \sum_{k=1}^{n} \sum_{j=1}^{k} E\left[X_{k} X_{j}\right]-\sum_{k=1}^{n} E\left[X_{k}^{2}\right] \\ & =2 \sum_{k=1}^{n} k E\left[X_{k} Z_{k}\right]-\sum_{k=1}^{n} E\left[X_{k}^{2}\right], \end{align}
from which (2.9) follows by dividing through by . In view of (2.9), to complete the proof that (2.8) implies tends to 0, it suffices to show that (2.8) implies
\lim _{n \rightarrow \infty} \frac{1}{n^{2}} \sum_{k=1}^{n} k C_{k}=0. \tag{2.11}
To see (2.11), note that for any
\frac{1}{n^{2}} \sum_{k=1}^{n} k C_{k} \leq \frac{1}{n^{2}} \sum_{k=1}^{N}\left|k C_{k}\right|+\sup _{N \leq k}\left|C_{k}\right|. \tag{2.12}
Letting first tend to infinity and then tend to in (2.12), we see that (2.11) holds. The proof of theorem 2A is complete.
If it is known that tends to 0 as some power of , then we can conclude that convergence holds with probability one.
Theorem 2B. A sequence of jointly distributed random variables with zero mean and uniformly bounded variances obeys the strong law of large numbers (in the sense that ) if positive constants and exist such that for all integers \left|E\left[X_{n} Z_{n}\right]\right|=\left|C_{n}\right| \leq \frac{M}{n^{q}}. \tag{2.13}
Remark: For a stationary sequence of random variables [in which case is given by (2.7)] (2.13) holds if positive constants and exist such that for all integers |R(m)| \leq \frac{M}{m^{q}}. \tag{2.14}
Proof
If (2.13) holds, then (assuming, as we may, that )
\frac{1}{n^{2}} \sum_{k=1}^{n} k C_{k} \leq \frac{M}{n^{2}} \sum_{k=1}^{n} k^{1-q} \leq \frac{M}{n^{2}} \int_{1}^{n+1} x^{1-q} d x \leq \frac{4 M}{2-q} \frac{1}{n^{q}}. \tag{2.15}
By (2.15) and (2.9), it follows that for some constant M^{\prime} and
E\left[Z_{n}^{2}\right] \leq \frac{M^{\prime}}{n^{q}} \quad \text { for all integers } n. \tag{2.16}
Choose now any integer such that and define a sequence of random variables Z_{1}^{\prime}, Z_{2}^{\prime}, \ldots, Z_{m}^{\prime} by taking for Z_{m}^{\prime} the th member of the sequence ; in symbols,
Z_{m}^{\prime}=Z_{m^{r}} \quad \text { for } m=1,2, \cdots. \tag{2.17}
By (2.16), the sequence \left\{Z_{m}^{\prime}\right\} has a mean square satisfying
E\left[Z_{m}^{\prime 2}\right] \leq \frac{M^{\prime}}{m^{r q}}. \tag{2.18}
If we sum (2.18) over all , we obtain a convergent series, since :
\sum_{m=1}^{\infty} E\left[Z_{m}^{\prime 2}\right] \leq M^{\prime} \sum_{m=1}^{\infty} m^{-r q}<\infty. \tag{2.19}
Therefore, by theorem 1A, it follows that
P\left[\lim _{m \rightarrow \infty} Z_{m}^{\prime}=0\right]=P\left[\lim _{m \rightarrow \infty} Z_{m^{r}}=0\right]=1. \tag{2.20}
We have thus shown that a properly selected subsequence of the sequence converges to 0 with probability one. We complete the proof of theorem by showing that the members of the sequence , located between successive members of the subsequence , do not tend to be too different from the members of the subsequence. More precisely, define
\begin{align} U_{m} & =\max _{m^{r} \leq n<(m+1)^{r}}\left|Z_{m^{r}}-\frac{n}{m^{r}} Z_{n}\right| \\ V_{m} & =\max _{m^{r} \leq n<(n+1)^{r}}\left|Z_{n}-\frac{n}{m^{r}} Z_{n}\right| \\ W_{m} & =\max _{m^{r} \leq n<(m+1)^{r}}\left|Z_{m^{r}}-Z_{n}\right|. \end{align}
We claim it is clear, in view of (2.20), that to show that it suffices to show that . Consequently, to complete the proof it suffices to show that
P\left[\lim _{m \rightarrow \infty} U_{m}=0\right]=P\left[\lim _{m \rightarrow \infty} V_{m}=0\right]=1. \tag{2.22}
In view of theorem 1A, to show that (2.22) holds, it suffices to show that
\sum_{m=1}^{\infty} E\left[U_{m}^{2}\right]<\infty, \quad \sum_{m=1}^{\infty} E\left[V_{m}^{2}\right]<\infty. \tag{2.23}
We prove that (2.23) holds by showing that for some constants and
\quad E^{1 / 2}\left[U_{m}^{2}\right] \leq \frac{M_{U}}{m}, \quad E^{1 / 2}\left[V_{m}^{2}\right] \leq \frac{M_{V}}{m} \quad \text{for all integers}\; m. \tag{2.24}
To prove (2.24), we note that
\left|U_{m}\right| \leq \frac{1}{m^{r}} \sum_{k=m^{r}}^{(m+1)^{r}-1}\left|X_{k}\right| \tag{2.25}
from which it follows that
E^{1 /}\left[U_{m}^{2}\right] \leq \frac{1}{m^{r}} \sum_{k=m^{r}}^{(m+1)^{r}-1} E^{1 / 2}\left[X_{k}^{2}\right] \leq \frac{(m+1)^{r}-m^{r}}{m^{r}} M, \tag{2.26}
in which we use as a bound for . By a calculus argument, using the law of the mean, one may show that for and
\left(1+\frac{1}{m}\right)^{r}-1 \leq \frac{1}{m} r 2^{r-1}. \tag{2.27}
Consequently, (2.26) implies the first part of (2.24). Similarly,
\begin{align} \left|V_{m}\right| \leq\left|\left(1+\frac{1}{m}\right)^{r}-1\right| \frac{1}{m^{r}} \sum_{k=1}^{(m+1)^{r}-1}\left|X_{k}\right|, \tag{2.28} \\ E^{1 / 2}\left[V_{m}^{2}\right] \leq\left(\frac{1}{m} r 2^{r-1}\right) \frac{1}{m^{r}} \sum_{k=1}^{(m+1)^{r}-1} E^{1 / 2}\left[X_{k}^{2}\right], \tag{2.29} \end{align}
from which one may infer the second part of (2.24). The.proof of theorem 2B is now complete.
Exercises
2.1. Random digits. Consider a discrete random variable uniformly distributed over the numbers 0 to for any integer ; that is, if . Let be a sequence of independent random variables identically distributed as . For an integer from 0 to define as the fraction of the observations equal to . Prove that
2.2. The distribution of digits in the decimal expansion of a random number. Let be a number chosen at random from the unit interval (that is, is a random variable uniformly distributed over the interval 0 to 1). Let be the successive digits in the decimal expansion of ; that is,
Prove that the random variables are independent discrete random variables uniformly distributed over the integers 0 to 9. Consequently, conclude that for any integer (say, the integer 7) the relative frequency of occurrence of in the decimal expansion of any number in the unit interval is equal to for all numbers , except a set of numbers constituting a set of probability zero. Does the fact that only 3’s occur in the decimal expansion of contradict the assertion?
2.3. Convergence of the sample distribution function and the sample characteristic function of dependent random variables . Let be a sequence of random variables identically distributed as a random variable . The sample distribution function is defined as the fraction of observations among which are less than or equal to . The sample characteristic function is defined by
Show that converges in quadratic mean to , as , if and only if
\frac{1}{n} \sum_{k=1}^{n} P\left[X_{k} \leq y, X_{n} \leq y\right] \rightarrow\left|F_{X}(y)\right|^{2}. \tag{2.30}
Show that converges in quadratic mean to if and only if
C_{n}=\frac{1}{n} \sum_{k=1}^{n} E\left[e^{i u\left(X_{k}-X_{n}\right)}\right] \rightarrow\left|\phi_{X}(u)\right|^{2}. \tag{2.31}
Prove that (2.30) and (2.31) hold if the random variables are independent.
2.4. The law of large numbers does not hold for Cauchy distributed random variables. Let be a sequence of independent identically distributed random variables with probability density functions . Show that no finite constant exists to which the sample means converge in probability.
2.5. Let be a sequence of independent random variables identically distributed as a random variable with finite mean. Show that for any bounded continuous function of a real variable
\lim _{n \rightarrow \infty} E\left[f\left(\frac{X_{1}+\cdots+X_{n}}{n}\right)\right]=f(E[X]). \tag{2.32}
Consequently, conclude that
\begin{align} & \lim _{n \rightarrow \infty} \int_{0}^{1} \cdots \int_{0}^{1} f\left(\frac{x_{1}+\cdots+x_{n}}{n}\right) d x_{1} \cdots d x_{n}=f\left(\frac{1}{2}\right) \tag{2.33} \\ & \lim _{n \rightarrow \infty} \sum_{k=0}^{n} f\left(\frac{k}{n}\right)\left(\begin{array}{l} n \\ k \end{array}\right) t^{k}(1-t)^{n-k}=f(t), \quad 0 \leq t \leq 1. \tag{2.34} \end{align}
2.6. A probabilistic proof of Weierstrass’ theorem: Extend (2.34) to show that to any continuous function on the interval there exists a sequence of polynomials such that uniformly on .