The fundamental empirical fact upon which all applications of the theory of probability are based is expressed in the empirical law of large numbers, first formulated by Poisson (in his book, Recherches sur la probabilité des jugements, 1837):

In many different fields, empirical phenomena appear to obey a certain general law, which can be called the Law of Large Numbers. This law states that the ratios of numbers derived from the observation of a very large number of similar events remain practically constant, provided that these events are governed partly by constant factors and partly by variable factors whose variations are irregular and do not cause a systematic change in a definite direction. Certain values of these relations are characteristic of each given kind of event. With the increase in length of the series of observations the ratios derived from such observations come nearer and nearer to these characteristic constants. They could be expected to reproduce them exactly if it were possible to make series of observations of an infinite length.

In the mathematical theory of probability one may prove a proposition, called the mathematical law of large numbers, that may be used to gain insight into the circumstances under which the empirical law of large numbers is expected to hold. For an interesting philosophical discussion of the relation between the empirical and the mathematical laws of large numbers and for the foregoing quotation from Poisson the reader should consult Richard von Mises, Probability, Statistics, and Truth , second revised edition, Macmillan, New York, 1957, pp. 104–134.

A sequence of jointly distributed random variables, \(X_{1}, X_{2}, \ldots, X_{n}\) , with finite means, is said to obey the (classical) law of large numbers if

\[Z_{n}=\frac{X_{1}+X_{2}+\cdots+X_{n}}{n}-\frac{E\left(X_{1}+\cdots+X_{n}\right)}{n} \rightarrow 0 \tag{2.1}\] 

in some mode of convergence as \(n\) tends to \(\infty\) . The sequence \(\left\{X_{n}\right\}\) is said to obey the strong law of large numbers, the weak law of large numbers, or the quadratic mean law of large numbers, depending on whether the convergence in (2.1) is with probability one, in probability, or in quadratic mean. In this section we give conditions, both for independent and dependent random variables, for the law of large numbers to hold.

We consider first the case of independent random variables with finite means. We prove in section 3 that a sequence of independent identically distributed random variables obeys the weak law of large numbers if the common mean \(E[X]\) is finite. It may be proved (see Loève, Probability Theory , Van Nostrand, New York, 1955, p. 243) that the finiteness of \(E[X]\) also implies that the sequence of independent identically distributed random variables obeys the strong law of large numbers.
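By way of numerical illustration, the following minimal sketch (in Python with NumPy, assuming for concreteness independent exponentially distributed summands with common mean \(E[X]=1\); the distribution is chosen only for illustration) tabulates the sample means \(\left(X_{1}+\cdots+X_{n}\right)/n\) for increasing \(n\); by the strong law of large numbers they approach the common mean.

```python
import numpy as np

# Minimal sketch: sample means of i.i.d. exponential(1) variables
# (finite mean E[X] = 1), illustrating the law of large numbers.
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=100_000)   # X_1, ..., X_100000

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6d}   (X_1 + ... + X_n)/n = {x[:n].mean():.4f}")
```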

In theoretical exercise 4.2 we indicate the proof of the law of large numbers for independent, not necessarily identically distributed, random variables with finite means: if, for some \(\delta>0\) 

\[\lim _{n \rightarrow \infty} \frac{1}{n^{1+\delta}} \sum_{k=1}^{n} E\left|X_{k}-E\left[X_{k}\right]\right|^{1+\delta}=0, \tag{2.2}\] 

then

\[\operatorname{plim}_{n \rightarrow \infty} \frac{1}{n} \sum_{k=1}^{n}\left(X_{k}-E\left[X_{k}\right]\right)=0. \tag{2.3}\] 

Equation (2.2) is known as Markov’s condition for the validity of the weak law of large numbers for independent random variables.
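In particular, taking \(\delta=1\) in (2.2), Markov’s condition reduces to a condition on the variances alone,

\[\lim _{n \rightarrow \infty} \frac{1}{n^{2}} \sum_{k=1}^{n} \operatorname{Var}\left[X_{k}\right]=0,\]

which holds, for example, whenever the variances are uniformly bounded by a constant \(M\), since then the sum is at most \(Mn\) and \(Mn / n^{2}\) tends to 0.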

We consider next the case of dependent random variables \(X_{k}\), with finite means (which we may take to be 0), and finite variances \(\sigma_{k}^{2}=E\left[X_{k}^{2}\right]\). We state conditions for the validity of the quadratic mean law of large numbers and the strong law of large numbers, which, while not the most general conditions that can be stated, appear to be general enough for most practical applications. Our conditions are stated in terms of the behavior, as \(n\) tends to \(\infty\), of the covariance

\[C_{n}=E\left[Z_{n} X_{n}\right]=\frac{1}{n} \sum_{k=1}^{n} E\left[X_{k} X_{n}\right] \tag{2.4}\] 

between the \(n\) th summand \(X_{n}\) and the \(n\) th sample mean

\[Z_{n}=\frac{X_{1}+X_{2}+\cdots+X_{n}}{n}.\] 

Let us examine the possible behavior of \(C_{n}\) under various assumptions on the sequence \(\left\{X_{n}\right\}\) and under the assumption that the variances \(\operatorname{Var}\left[X_{n}\right]\) are uniformly bounded; that is, there is a constant \(M\) such that

\[\sigma_{n}^{2}=\operatorname{Var}\left[X_{n}\right] \leq M \quad \text { for all } n. \tag{2.5}\] 

If the random variables \(\left\{X_{n}\right\}\) are independent, then \(E\left[X_{k} X_{n}\right]=0\) if \(k<n\). Consequently, \(C_{n}=\sigma_{n}^{2} / n\), which, under condition (2.5), tends to 0 as \(n\) tends to \(\infty\). This is also the case if the random variables \(\left\{X_{n}\right\}\) are assumed to be orthogonal. The sequence of random variables \(\left\{X_{n}\right\}\) is said to be orthogonal if, for any integer \(k\) and integer \(m \neq 0\), \(E\left[X_{k} X_{k+m}\right]=0\). Then, again, \(C_{n}=\sigma_{n}^{2} / n\).

More generally, let us consider random variables \(\left\{X_{n}\right\}\) that are stationary (in the wide sense); this means that there is a function \(R(m)\) , defined for \(m=0,1,2, \ldots\) , such that, for any integers \(k\) and \(m\) ,

\[E\left[X_{k} X_{k+m}\right]=R(m). \tag{2.6}\] 

It is clear that an orthogonal sequence of random variables (in which all the random variables have the same variance \(\sigma^{2}\) ) is stationary, with \(R(m)=\sigma^{2}\) or 0, depending on whether \(m=0\) or \(m>0\) . For a stationary sequence the value of \(C_{n}\) is given by

\[C_{n}=\frac{1}{n} \sum_{k=0}^{n-1} R(k). \tag{2.7}\] 
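To make (2.7) concrete, the following minimal sketch computes \(C_{n}\) numerically, assuming (purely for illustration) the geometrically decaying covariance function \(R(m)=\sigma^{2} \rho^{m}\) with \(0<\rho<1\); the printed values decrease toward 0 roughly like \(1/n\).

```python
import numpy as np

# Minimal sketch: C_n = (1/n) * sum_{k=0}^{n-1} R(k), as in (2.7),
# for an assumed covariance function R(m) = sigma2 * rho**m.
sigma2, rho = 1.0, 0.8

def C(n: int) -> float:
    k = np.arange(n)
    return float(np.sum(sigma2 * rho ** k) / n)

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:>5d}   C_n = {C(n):.6f}")
```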

We now show that under condition (2.5) a necessary and sufficient condition for the sample mean \(Z_{n}\) to converge in quadratic mean to 0 is that \(C_{n}\) tends to 0. In theorem 2B we state conditions for the sample mean \(Z_{n}\) to converge with probability one to 0.

Theorem 2A. A sequence of jointly distributed random variables \(\left\{X_{n}\right\}\) with zero mean and uniformly bounded variances obeys the quadratic mean law of large numbers (in the sense that \(\lim _{n \rightarrow \infty} E\left[Z_{n}^{2}\right]=0\)) if and only if

\[\lim _{n \rightarrow \infty} C_{n}=\lim _{n \rightarrow \infty} E\left[X_{n} Z_{n}\right]=0. \tag{2.8}\] 

 

Proof

Since, by the Cauchy–Schwarz inequality, \(E^{2}\left[X_{n} Z_{n}\right] \leq E\left[X_{n}^{2}\right] E\left[Z_{n}^{2}\right]\), it is clear that if the quadratic mean law of large numbers holds and if the variances \(E\left[X_{n}^{2}\right]\) are bounded uniformly in \(n\), then (2.8) holds. To prove the converse, we prove first the following useful identity:

 

\[E\left[Z_{n}^{2}\right]+\frac{1}{n^{2}} \sum_{k=1}^{n} E\left[X_{k}^{2}\right]=\frac{2}{n^{2}} \sum_{k=1}^{n} k E\left[X_{k} Z_{k}\right]. \tag{2.9}\] 

To prove (2.9), we write the familiar formula

\begin{align} E\left[\left(X_{1}+\cdots+X_{n}\right)^{2}\right] & =\sum_{k=1}^{n} E\left[X_{k}^{2}\right]+2 \sum_{k=1}^{n} \sum_{j=1}^{k-1} E\left[X_{k} X_{j}\right] \tag{2.10} \\ & =2 \sum_{k=1}^{n} \sum_{j=1}^{k} E\left[X_{k} X_{j}\right]-\sum_{k=1}^{n} E\left[X_{k}^{2}\right] \\ & =2 \sum_{k=1}^{n} k E\left[X_{k} Z_{k}\right]-\sum_{k=1}^{n} E\left[X_{k}^{2}\right], \end{align} 

from which (2.9) follows by dividing through by \(n^{2}\) . In view of (2.9), to complete the proof that (2.8) implies \(E\left[Z_{n}^{2}\right]\) tends to 0, it suffices to show that (2.8) implies

\[\lim _{n \rightarrow \infty} \frac{1}{n^{2}} \sum_{k=1}^{n} k C_{k}=0. \tag{2.11}\] 

To see (2.11), note that for any \(n>N>0\) 

\[\left|\frac{1}{n^{2}} \sum_{k=1}^{n} k C_{k}\right| \leq \frac{1}{n^{2}} \sum_{k=1}^{N} k\left|C_{k}\right|+\sup _{N \leq k}\left|C_{k}\right|. \tag{2.12}\] 

Letting first \(n\) and then \(N\) tend to \(\infty\) in (2.12), we see that (2.11) holds. The proof of theorem 2A is complete.
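It may be worth noting that (2.10) is an algebraic identity, valid for any real numbers \(x_{1}, \ldots, x_{n}\) (taking expectations merely averages it); the following minimal sketch verifies the pointwise form of (2.9) numerically on arbitrary inputs.

```python
import numpy as np

# Minimal sketch: the pointwise form of identity (2.9),
#   z_n**2 + (1/n**2) * sum_k x_k**2 == (2/n**2) * sum_k k * x_k * z_k,
# where z_k = (x_1 + ... + x_k)/k, holds for arbitrary numbers x_k.
rng = np.random.default_rng(1)
x = rng.normal(size=50)
n = len(x)
k = np.arange(1, n + 1)
z = np.cumsum(x) / k                    # z_1, ..., z_n

lhs = z[-1] ** 2 + np.sum(x ** 2) / n ** 2
rhs = 2.0 * np.sum(k * x * z) / n ** 2
print(lhs, rhs)                         # equal up to rounding error
```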

If it is known that \(C_{n}\) tends to 0 at least as fast as some negative power of \(n\), then we can conclude that the convergence holds with probability one.

Theorem 2B. A sequence of jointly distributed random variables \(\left\{X_{n}\right\}\) with zero mean and uniformly bounded variances obeys the strong law of large numbers (in the sense that \(P\left[\lim _{n \rightarrow \infty} Z_{n}=0\right]=1\) ) if positive constants \(M\) and \(q\) exist such that for all integers \(n\) \[\left|E\left[X_{n} Z_{n}\right]\right|=\left|C_{n}\right| \leq \frac{M}{n^{q}}. \tag{2.13}\] 

Remark: For a stationary sequence of random variables [in which case \(C_{n}\) is given by (2.7)] (2.13) holds if positive constants \(M\) and \(q\) exist such that for all integers \(m \geq 1\) \[|R(m)| \leq \frac{M}{m^{q}}. \tag{2.14}\] 
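To verify the remark, suppose that (2.14) holds with \(0<q<1\) (which involves no loss of generality, since decreasing \(q\) only weakens (2.14)). Then, by (2.7),

\[\left|C_{n}\right| \leq \frac{1}{n}\left(R(0)+M \sum_{m=1}^{n-1} m^{-q}\right) \leq \frac{1}{n}\left(R(0)+\frac{M n^{1-q}}{1-q}\right) \leq\left(R(0)+\frac{M}{1-q}\right) \frac{1}{n^{q}},\]

so that (2.13) holds with a new constant.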

 

Proof

If (2.13) holds, then (assuming, as we may, that \(0<q \leq 1\))

 

\[\frac{1}{n^{2}} \sum_{k=1}^{n} k C_{k} \leq \frac{M}{n^{2}} \sum_{k=1}^{n} k^{1-q} \leq \frac{M}{n^{2}} \int_{1}^{n+1} x^{1-q} d x \leq \frac{4 M}{2-q} \frac{1}{n^{q}}. \tag{2.15}\] 

By (2.15) and (2.9), it follows that for some constant \(M^{\prime}\) and the same \(q>0\) 

\[E\left[Z_{n}^{2}\right] \leq \frac{M^{\prime}}{n^{q}} \quad \text { for all integers } n. \tag{2.16}\] 

Choose now any integer \(r\) such that \(r>(1 / q)\) and define a sequence of random variables \(Z_{1}^{\prime}, Z_{2}^{\prime}, \ldots\) by taking for \(Z_{m}^{\prime}\) the \(m^{r}\)th member of the sequence \(\left\{Z_{n}\right\}\); in symbols,

\[Z_{m}^{\prime}=Z_{m^{r}} \quad \text { for } m=1,2, \cdots. \tag{2.17}\] 

By (2.16), the sequence \(\left\{Z_{m}^{\prime}\right\}\) has a mean square satisfying

\[E\left[Z_{m}^{\prime 2}\right] \leq \frac{M^{\prime}}{m^{r q}}. \tag{2.18}\] 

If we sum (2.18) over all \(m\) , we obtain a convergent series, since \(r q>1\) :

\[\sum_{m=1}^{\infty} E\left[Z_{m}^{\prime 2}\right] \leq M^{\prime} \sum_{m=1}^{\infty} m^{-r q}<\infty. \tag{2.19}\] 

Therefore, by theorem 1A, it follows that

\[P\left[\lim _{m \rightarrow \infty} Z_{m}^{\prime}=0\right]=P\left[\lim _{m \rightarrow \infty} Z_{m^{r}}=0\right]=1. \tag{2.20}\] 

We have thus shown that a properly selected subsequence \(\left\{Z_{m^{r}}\right\}\) of the sequence \(\left\{Z_{n}\right\}\) converges to 0 with probability one. We complete the proof of theorem 2B by showing that the members of the sequence \(\left\{Z_{n}\right\}\), located between successive members of the subsequence \(\left\{Z_{m^{r}}\right\}\), do not tend to be too different from the members of the subsequence. More precisely, define

\begin{align} U_{m} & =\max _{m^{r} \leq n<(m+1)^{r}}\left|Z_{m^{r}}-\frac{n}{m^{r}} Z_{n}\right| \\ V_{m} & =\max _{m^{r} \leq n<(m+1)^{r}}\left|Z_{n}-\frac{n}{m^{r}} Z_{n}\right| \\ W_{m} & =\max _{m^{r} \leq n<(m+1)^{r}}\left|Z_{m^{r}}-Z_{n}\right|. \tag{2.21} \end{align} 

It is clear, in view of (2.20), that to show that \(P\left[\lim _{n \rightarrow \infty} Z_{n}=0\right]=1\) it suffices to show that \(P\left[\lim _{m \rightarrow \infty} W_{m}=0\right]=1\). Since \(Z_{m^{r}}-Z_{n}=\left(Z_{m^{r}}-\frac{n}{m^{r}} Z_{n}\right)-\left(Z_{n}-\frac{n}{m^{r}} Z_{n}\right)\), we have \(W_{m} \leq U_{m}+V_{m}\). Consequently, to complete the proof it suffices to show that

\[P\left[\lim _{m \rightarrow \infty} U_{m}=0\right]=P\left[\lim _{m \rightarrow \infty} V_{m}=0\right]=1. \tag{2.22}\] 

In view of theorem 1A, to show that (2.22) holds, it suffices to show that

\[\sum_{m=1}^{\infty} E\left[U_{m}^{2}\right]<\infty, \quad \sum_{m=1}^{\infty} E\left[V_{m}^{2}\right]<\infty. \tag{2.23}\] 

We prove that (2.23) holds by showing that for some constants \(M_{U}\) and \(M_{V}\) 

\[\quad E^{1 / 2}\left[U_{m}^{2}\right] \leq \frac{M_{U}}{m}, \quad E^{1 / 2}\left[V_{m}^{2}\right] \leq \frac{M_{V}}{m} \quad \text{for all integers}\; m. \tag{2.24}\] 

To prove (2.24), we note that

\[\left|U_{m}\right| \leq \frac{1}{m^{r}} \sum_{k=m^{r}}^{(m+1)^{r}-1}\left|X_{k}\right| \tag{2.25}\] 

from which it follows that

\[E^{1 / 2}\left[U_{m}^{2}\right] \leq \frac{1}{m^{r}} \sum_{k=m^{r}}^{(m+1)^{r}-1} E^{1 / 2}\left[X_{k}^{2}\right] \leq \frac{(m+1)^{r}-m^{r}}{m^{r}} M, \tag{2.26}\] 

in which we use \(M\) as a bound for \(E^{1 / 2}\left[X_{k}^{2}\right]\) . By a calculus argument, using the law of the mean, one may show that for \(r \geq 1\) and \(m \geq 1\) 

\[\left(1+\frac{1}{m}\right)^{r}-1 \leq \frac{1}{m} r 2^{r-1}. \tag{2.27}\] 
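Indeed, applying the law of the mean to \(g(x)=x^{r}\) on the interval from 1 to \(1+\frac{1}{m}\), we obtain a point \(\xi\) satisfying \(1 \leq \xi \leq 1+\frac{1}{m} \leq 2\) for which

\[\left(1+\frac{1}{m}\right)^{r}-1=\frac{1}{m} g^{\prime}(\xi)=\frac{1}{m} r \xi^{r-1} \leq \frac{1}{m} r 2^{r-1}.\]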

Consequently, (2.26) implies the first part of (2.24). Similarly,

\begin{align} \left|V_{m}\right| \leq\left|\left(1+\frac{1}{m}\right)^{r}-1\right| \frac{1}{m^{r}} \sum_{k=1}^{(m+1)^{r}-1}\left|X_{k}\right|, \tag{2.28} \\ E^{1 / 2}\left[V_{m}^{2}\right] \leq\left(\frac{1}{m} r 2^{r-1}\right) \frac{1}{m^{r}} \sum_{k=1}^{(m+1)^{r}-1} E^{1 / 2}\left[X_{k}^{2}\right], \tag{2.29} \end{align} 

from which one may infer the second part of (2.24). The proof of theorem 2B is now complete.

Exercises

2.1. Random digits. Consider a discrete random variable \(X\) uniformly distributed over the numbers 0 to \(N-1\) for any integer \(N \geq 2\) ; that is, \(P[X=k]=1 / N\) if \(k=0,1,2, \ldots, N-1\) . Let \(\left\{X_{n}\right\}\) be a sequence of independent random variables identically distributed as \(X\) . For an integer \(k\) from 0 to \(N-1\) define \(F_{n}(k)\) as the fraction of the observations \(X_{1}, X_{2}, \ldots, X_{n}\) equal to \(k\) . Prove that

\[P\left[\lim _{n \rightarrow \infty} F_{n}(k)=\frac{1}{N}\right]=1.\] 

2.2. The distribution of digits in the decimal expansion of a random number. Let \(Y\) be a number chosen at random from the unit interval (that is, \(Y\) is a random variable uniformly distributed over the interval 0 to 1). Let \(X_{1}, X_{2}, \ldots\) be the successive digits in the decimal expansion of \(Y\) ; that is,

\[Y=\frac{X_{1}}{10}+\frac{X_{2}}{10^{2}}+\cdots+\frac{X_{n}}{10^{n}}+\cdots.\] 

Prove that the random variables \(X_{1}, X_{2}, \ldots\) are independent discrete random variables uniformly distributed over the integers 0 to 9. Consequently, conclude that for any integer \(k\) (say, the integer 7) the relative frequency of occurrence of \(k\) in the decimal expansion of any number \(Y\) in the unit interval is equal to \(\frac{1}{10}\) for all numbers \(Y\) , except a set of numbers \(Y\) constituting a set of probability zero. Does the fact that only 3’s occur in the decimal expansion of \(\frac{1}{3}\) contradict the assertion?

2.3. Convergence of the sample distribution function and the sample characteristic function of dependent random variables . Let \(X_{1}, X_{2}, \ldots, X_{n}\) be a sequence of random variables identically distributed as a random variable \(X\) . The sample distribution function \(F_{n}(y)\) is defined as the fraction of observations among \(X_{1}, X_{2}, \ldots, X_{n}\) which are less than or equal to \(y\) . The sample characteristic function \(\phi_{n}(u)\) is defined by

\[\phi_{n}(u)=M_{n}\left[e^{i u X}\right]=\frac{1}{n} \sum_{k=1}^{n} e^{i u X_{k}}.\] 

Show that \(F_{n}(y)\) converges in quadratic mean to \(F_{X}(y)=P[X \leq y]\) , as \(n \rightarrow \infty\) , if and only if

\[\frac{1}{n} \sum_{k=1}^{n} P\left[X_{k} \leq y, X_{n} \leq y\right] \rightarrow\left|F_{X}(y)\right|^{2}. \tag{2.30}\] 

Show that \(\phi_{n}(u)\) converges in quadratic mean to \(\phi_{X}(u)=E\left[e^{i u X}\right]\) if and only if

\[C_{n}=\frac{1}{n} \sum_{k=1}^{n} E\left[e^{i u\left(X_{k}-X_{n}\right)}\right] \rightarrow\left|\phi_{X}(u)\right|^{2}. \tag{2.31}\] 

Prove that (2.30) and (2.31) hold if the random variables \(X_{1}, X_{2}, \ldots\) are independent.

2.4. The law of large numbers does not hold for Cauchy distributed random variables. Let \(X_{1}, X_{2}, \ldots, X_{n}\) be a sequence of independent identically distributed random variables with probability density functions \(f_{X_{n}}(x)=\) \(\left[\pi\left(1+x^{2}\right)\right]^{-1}\) . Show that no finite constant \(m\) exists to which the sample means \(\left(X_{1}+\cdots+X_{n}\right) / n\) converge in probability.
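A minimal simulation sketch (assuming standard Cauchy variates, generated here with NumPy) makes the failure vivid: the sample means do not settle down as \(n\) grows, in keeping with the fact that \(\left(X_{1}+\cdots+X_{n}\right) / n\) is itself Cauchy distributed for every \(n\).

```python
import numpy as np

# Minimal sketch: sample means of i.i.d. standard Cauchy variables;
# unlike the finite-mean case, they fluctuate without converging.
rng = np.random.default_rng(2)
x = rng.standard_cauchy(size=1_000_000)

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9d}   (X_1 + ... + X_n)/n = {x[:n].mean():+.3f}")
```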

2.5. Let \(\left\{X_{n}\right\}\) be a sequence of independent random variables identically distributed as a random variable \(X\) with finite mean. Show that for any bounded continuous function \(f(\cdot)\) of a real variable \(t\) 

\[\lim _{n \rightarrow \infty} E\left[f\left(\frac{X_{1}+\cdots+X_{n}}{n}\right)\right]=f(E[X]). \tag{2.32}\] 

Consequently, conclude that

\begin{align} & \lim _{n \rightarrow \infty} \int_{0}^{1} \cdots \int_{0}^{1} f\left(\frac{x_{1}+\cdots+x_{n}}{n}\right) d x_{1} \cdots d x_{n}=f\left(\frac{1}{2}\right) \tag{2.33} \\ & \lim _{n \rightarrow \infty} \sum_{k=0}^{n} f\left(\frac{k}{n}\right)\left(\begin{array}{l} n \\ k \end{array}\right) t^{k}(1-t)^{n-k}=f(t), \quad 0 \leq t \leq 1. \tag{2.34} \end{align} 

2.6. A probabilistic proof of Weierstrass’ theorem: Extend (2.34) to show that for any continuous function \(f(\cdot)\) on the interval \(0 \leq t \leq 1\) there exists a sequence of polynomials \(P_{n}(t)\) such that \(\lim_{n \rightarrow \infty} P_{n}(t)=f(t)\) uniformly on \(0 \leq t \leq 1\) .
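The polynomials in question are the Bernstein polynomials \(P_{n}(t)=\sum_{k=0}^{n} f\left(\frac{k}{n}\right)\binom{n}{k} t^{k}(1-t)^{n-k}\) appearing in (2.34). The following minimal sketch evaluates them for an assumed test function \(f(t)=\left|t-\frac{1}{2}\right|\) (continuous but not differentiable at \(t=\frac{1}{2}\), and chosen only for illustration) and prints the maximum error over a grid, which shrinks as \(n\) grows.

```python
import numpy as np
from math import comb

# Minimal sketch: Bernstein polynomials P_n(t) of an assumed test function f,
# illustrating the uniform convergence asserted by Weierstrass' theorem.
f = lambda t: np.abs(t - 0.5)
t = np.linspace(0.0, 1.0, 201)

def bernstein(n: int, t: np.ndarray) -> np.ndarray:
    # P_n(t) = sum_{k=0}^{n} f(k/n) * C(n, k) * t**k * (1 - t)**(n - k)
    return sum(f(k / n) * float(comb(n, k)) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

for n in (5, 20, 100, 400):
    err = np.max(np.abs(bernstein(n, t) - f(t)))
    print(f"n = {n:>4d}   max |P_n(t) - f(t)| = {err:.4f}")
```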