Consider a sequence of jointly distributed random variables \(Z_{1}, Z_{2}, \ldots, Z_{n}, \ldots\) defined on a common probability space \(S\) on which a probability function \(P[\cdot]\) has been defined. Let \(Z\) be another random variable defined on the same probability space. The notion of the convergence of the sequence of random variables \(Z_{n}\) to the random variable \(Z\) can be defined in several ways.
We consider first the notion of convergence with probability one. We say that \(Z_{n}\) converges to \(Z\) with probability one if \(P\left[\lim_{n \rightarrow \infty} Z_{n}=Z\right]=1\) or, in words, if \(\lim _{n \rightarrow \infty} Z_{n}(s)=Z(s)\) for almost all members \(s\) of the probability space \(S\) on which the random variables are defined. To prove that a sequence of random variables \(Z_{n}\) converges with probability one is often a technically difficult problem. Consequently, two other types of convergence of random variables, called, respectively, convergence in mean square and convergence in probability, have been introduced in probability theory. These modes of convergence are simpler to deal with than convergence with probability one and at the same time are conceptually similar to it.
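As a concrete illustration (our own, not part of the text), the strong law of large numbers asserts that the running mean of independent Uniform(0,1) observations converges with probability one to \(1/2\). In the Python sketch below, each simulated row plays the role of a single outcome \(s\), so convergence of a path is ordinary numerical convergence:

```python
import numpy as np

# Each row of `draws` plays the role of one outcome s of the probability
# space; the running mean Z_n(s) is then an ordinary numerical sequence.
# By the strong law of large numbers, almost every such sequence converges
# to E[U] = 1/2 -- convergence with probability one.
rng = np.random.default_rng(0)
n_paths, n_max = 5, 100_000
draws = rng.uniform(0.0, 1.0, size=(n_paths, n_max))
running_means = np.cumsum(draws, axis=1) / np.arange(1, n_max + 1)

for i, path in enumerate(running_means):
    print(f"path {i}: Z_100 = {path[99]:.4f}, Z_100000 = {path[-1]:.4f}")
```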
The sequence \(Z_{1}, Z_{2}, \ldots, Z_{n}, \ldots\) is said to converge in mean square to the random variable \(Z\), denoted \(\operatorname{l.i.m.}_{n \rightarrow \infty} Z_{n}=Z\), if \(\lim_{n \rightarrow \infty} E\left[(Z_{n} - Z)^{2}\right] = 0\) or, in words, if the mean square difference between \(Z_{n}\) and \(Z\) tends to 0.
The sequence \(Z_{1}, Z_{2}, \ldots, Z_{n}, \ldots\) is said to converge in probability to the random variable \(Z\), denoted \(\operatorname{plim}_{n \rightarrow \infty} Z_{n}=Z\), if for every positive number \(\epsilon\) \[\lim _{n \rightarrow \infty} P\left[\left|Z_{n}-Z\right|>\epsilon\right]=0. \tag{1.1}\] Equation (1.1) may be expressed in words: for any fixed difference \(\epsilon\), the probability of the event that \(Z_{n}\) and \(Z\) differ by more than \(\epsilon\) becomes arbitrarily close to 0 as \(n\) tends to infinity.
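The two preceding definitions can be examined numerically. In the sketch below (an illustration of our own choosing, with \(Z_{n}\) the mean of \(n\) independent Uniform(0,1) variables and \(Z=1/2\)), the simulated mean square difference agrees with the exact value \(1/(12n)\), and the estimated probability in (1.1) visibly tends to 0:

```python
import numpy as np

rng = np.random.default_rng(0)
eps, n_trials = 0.02, 2_000

for n in (50, 500, 5_000):
    # n_trials independent realizations of Z_n = mean of n Uniform(0,1) draws
    z_n = rng.uniform(0.0, 1.0, size=(n_trials, n)).mean(axis=1)
    mse = np.mean((z_n - 0.5) ** 2)           # estimates E[(Z_n - Z)^2] = 1/(12n)
    p_dev = np.mean(np.abs(z_n - 0.5) > eps)  # estimates P[|Z_n - Z| > eps]
    print(f"n={n:5d}  E[(Z_n-Z)^2] ~ {mse:.2e} (exact {1/(12*n):.2e})  "
          f"P[|Z_n-Z|>{eps}] ~ {p_dev:.4f}")
```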
Convergence in probability derives its importance in part from the fact that, like convergence with probability one, it can be considered without any moments needing to exist; this is not the case with convergence in mean square. It is immediate that if convergence in mean square holds, then so does convergence in probability; one need only consider the following form of Chebyshev's inequality: for any \(\epsilon>0\)
\[P\left[\left|Z_{n}-Z\right|>\epsilon\right] \leq \frac{1}{\epsilon^{2}} E\left[\left|Z_{n}-Z\right|^{2}\right]. \tag{1.2}\]
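For completeness, we note the standard argument behind (1.2): on the event that \(\left|Z_{n}-Z\right|>\epsilon\), the integrand \(\left|Z_{n}-Z\right|^{2}\) exceeds \(\epsilon^{2}\), so
\[E\left[\left|Z_{n}-Z\right|^{2}\right] \geq E\left[\left|Z_{n}-Z\right|^{2} \mathbf{1}\left\{\left|Z_{n}-Z\right|>\epsilon\right\}\right] \geq \epsilon^{2} P\left[\left|Z_{n}-Z\right|>\epsilon\right].\]
Dividing by \(\epsilon^{2}\) yields (1.2), and letting \(n \rightarrow \infty\) then shows that convergence in mean square implies (1.1).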
The relation that exists between convergence with probability one and convergence in probability is best understood by considering the following characterization of convergence with probability one, which we state without proof. Let \(Z_{1}, Z_{2}, \ldots, Z_{n}, \ldots\) be a sequence of jointly distributed random variables; \(Z_{n}\) converges to the random variable \(Z\) with probability one if and only if for every \(\epsilon>0\)
\[\lim _{N \rightarrow \infty} P\left[\left(\sup _{n \geq N}\left|Z_{n}-Z\right|\right)>\epsilon\right]=0. \tag{1.3}\]
On the other hand, the sequence \(\left\{Z_{n}\right\}\) converges to \(Z\) in probability if and only if (1.1) holds for every \(\epsilon>0\). Now, it is clear that if \(\left|Z_{N}-Z\right|>\epsilon\), then \(\sup _{n \geq N}\left|Z_{n}-Z\right|>\epsilon\). Consequently,
\[P\left[\left|Z_{N}-Z\right|>\epsilon\right] \leq P\left[\sup _{n \geq N}\left|Z_{n}-Z\right|>\epsilon\right],\]
and (1.3) implies (1.1). Thus, if \(Z_{n}\) converges to \(Z\) with probability one, it converges to \(Z\) in probability.
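This inequality can also be checked by simulation. The sketch below (our own illustration, reusing the running mean \(Z_{n}\) of Uniform(0,1) variables with \(Z=1/2\)) truncates the supremum at a finite horizon, so it only approximates the event in (1.3), but the ordering of the two probabilities is apparent, and both tend to 0 as \(N\) grows:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_max, eps = 2_000, 5_000, 0.02

draws = rng.uniform(0.0, 1.0, size=(n_trials, n_max))
z = np.cumsum(draws, axis=1) / np.arange(1, n_max + 1)  # Z_n = running mean
dev = np.abs(z - 0.5)                                   # |Z_n - Z| with Z = 1/2

for N in (100, 500, 2_000):
    p_single = np.mean(dev[:, N - 1] > eps)             # P[|Z_N - Z| > eps]
    # max over N <= n <= n_max: a finite-horizon proxy for the sup in (1.3)
    p_sup = np.mean(dev[:, N - 1:].max(axis=1) > eps)
    print(f"N={N:5d}  P[|Z_N-Z|>eps] ~ {p_single:.3f}  "
          f"P[sup|Z_n-Z|>eps] ~ {p_sup:.3f}")
```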
Convergence with probability one of the sequence \(\left\{Z_{n}\right\}\) to \(Z\) implies that one can make a probability statement simultaneously about all but a finite number of members of the sequence \(\left\{Z_{n}\right\}\) : given any positive numbers \(\epsilon\) and \(\delta\) , an integer \(N\) exists such that
\[P\left[\left|Z_{N}-Z\right|<\epsilon,\left|Z_{N+1}-Z\right|<\epsilon,\left|Z_{N+2}-Z\right|<\epsilon, \ldots\right]>1-\delta. \tag{1.4}\]
On the other hand, convergence in probability of the sequence \(\left\{Z_{n}\right\}\) to \(Z\) implies only that one can make a probability statement individually about each of all but a finite number of members of the sequence \(\left\{Z_{n}\right\}\): given any positive numbers \(\epsilon\) and \(\delta\), an integer \(N\) exists such that \begin{align} P\left[\left|Z_{N}-Z\right|<\epsilon\right]>1-\delta, \quad P\left[\left|Z_{N+1}-Z\right|<\epsilon\right]>1-\delta, \tag{1.5} \\[3mm] P\left[\left|Z_{N+2}-Z\right|<\epsilon\right]>1-\delta, \ldots. \end{align} One thus sees that convergence in probability is implied both by convergence with probability one and by convergence in mean square. However, without additional conditions, convergence in probability implies neither convergence in mean square nor convergence with probability one. Further, convergence with probability one neither implies nor is implied by convergence in mean square.
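A standard counterexample (our addition, not part of the text) makes the last two assertions concrete. Let the \(Z_{n}\) be independent with \(P\left[Z_{n}=n\right]=1/n\) and \(P\left[Z_{n}=0\right]=1-1/n\). Then \(P\left[\left|Z_{n}\right|>\epsilon\right]=1/n \rightarrow 0\), so \(\operatorname{plim}_{n \rightarrow \infty} Z_{n}=0\); yet \(E\left[Z_{n}^{2}\right]=n \rightarrow \infty\), so there is no convergence in mean square, and since the independent events \(\left\{Z_{n}=n\right\}\) have probabilities summing to \(\infty\), the second Borel-Cantelli lemma shows that \(Z_{n}=n\) for infinitely many \(n\) with probability one, so there is no convergence with probability one. A short simulation of one path:

```python
import numpy as np

# One path of the counterexample: independent Z_n with P[Z_n = n] = 1/n.
# With probability one, spikes Z_n = n keep appearing at arbitrarily late
# indices, so the path does not converge to 0, even though
# P[|Z_n| > eps] = 1/n -> 0.
rng = np.random.default_rng(1)
n = np.arange(1, 200_001)
z = np.where(rng.uniform(size=n.size) < 1.0 / n, n, 0)

spikes = n[z > 0]
print("number of spikes:", spikes.size)     # expected about log(200000) ~ 12
print("latest spike indices:", spikes[-5:])
```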
The following theorem gives a condition under which convergence in mean square implies convergence with probability one.
Theorem 1A. If a sequence \(Z_{n}\) converges in mean square to 0 in such a way that \[\sum_{n=1}^{\infty} E\left[Z_{n}^{2}\right]<\infty, \tag{1.6}\] then it follows that \(Z_{n}\) converges to 0 with probability one.
Proof
From (1.6) it follows that \[E\left[\sum_{n=1}^{\infty} Z_{n}^{2}\right]=\sum_{n=1}^{\infty} E\left[Z_{n}^{2}\right]<\infty, \tag{1.7}\] since it may be shown that, for an infinite series of nonnegative summands, the expectation of the sum is equal to the sum of the expectations. Next, from the fact that the infinite series \(\sum_{n=1}^{\infty} Z_{n}^{2}\) has finite mean it follows that it is finite with probability one; in symbols, \[P\left[0 \leq \sum_{n=1}^{\infty} Z_{n}^{2}<\infty\right]=1. \tag{1.8}\] If an infinite series converges, then its general term tends to 0. Therefore, from (1.8) it follows that \[P\left[\lim _{n \rightarrow \infty} Z_{n}=0\right]=1. \tag{1.9}\] The proof of Theorem 1A is complete. Although the proof of Theorem 1A is completely rigorous, its justification requires two basic facts of the theory of integration over probability spaces that have not been established in this book.
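As a numerical companion to Theorem 1A (an illustration of our own choosing), take the \(Z_{n}\) independent with \(P\left[Z_{n}=1\right]=1/n^{2}\) and \(P\left[Z_{n}=0\right]=1-1/n^{2}\), so that \(\sum_{n} E\left[Z_{n}^{2}\right]=\pi^{2}/6<\infty\). The theorem predicts \(Z_{n} \rightarrow 0\) with probability one; correspondingly, each simulated path has only finitely many nonzero terms and a finite value of \(\sum_{n} Z_{n}^{2}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n = np.arange(1, 1_000_001)

for path in range(3):
    # One path of independent Z_n with P[Z_n = 1] = 1/n**2, else Z_n = 0.
    z = (rng.uniform(size=n.size) < 1.0 / n**2).astype(float)
    nonzero = n[z > 0]
    last = nonzero[-1] if nonzero.size else 0
    # sum Z_n^2 is finite and, past `last`, the path is identically 0.
    print(f"path {path}: sum Z_n^2 = {z.sum():.0f}, last nonzero index = {last}")
```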