Limits of Linear Transformations
1. Convergence of vectors
In this chapter we shall indicate the various possibilities for defining the concept of limit for sets of vectors or linear transformations, and we shall give a few of the many possible applications of these concepts.
If \(\mathcal{U}\) is an \(N\)-dimensional inner product space, the two fundamental quantitative concepts, inner product and length, suggest the following two definitions of convergence in \(\mathcal{U}\).
Definition 1. We say that a sequence, \(x_{n}\), of vectors converges to the vector \(x\) in the strong sense if \(|x_{n}-x| \to 0\) as \(n \to \infty\); and \(x_{n} \to x\) in the weak sense if \(\langle x_{n}, y \rangle \to \langle x, y \rangle\) for every \(y\).
It turns out that this distinction is merely pedantic in our finite dimensional case: the two notions of convergence are equivalent. For if \(x_{n} \stackrel{(s)}{\to} x\) (i.e., \(x_{n}\) converges strongly to \(x\)) then \[ \begin{aligned} |\langle x_{n}, y \rangle - \langle x, y \rangle| &= |\langle x_{n}-x, y \rangle|\\ &\leq |x_{n}-x|\cdot|y|\\ &\to 0. \end{aligned} \] Conversely if \(x_{n} \stackrel{(w)}{\to} x\), and if \(z_{1}, \ldots, z_{N}\) is an orthonormal basis in \(\mathcal{U}\), \(x_{n}=\sum_i a_{i}^{n} z_{i}\), \(x=\sum_i a_{i} z_{i}\), then \(a_{i}^{n} = \langle x_{n}, z_{i} \rangle\) converges, as \(n \to \infty\), to \(a_{i} = \langle x, z_{i} \rangle\), for each \(i=1, \ldots, N\). Consequently
\[ \begin{aligned} |x_{n}-x|^{2} &= \Big|\sum_i (a_{i}^{n}-a_{i}) z_{i}\Big|^{2}\\ &= \sum_i |a_{i}^{n}-a_{i}|^{2}\\ &\to 0 \end{aligned} \] as \(n \to \infty\), as was to be proved.
Concerning the convergence (in either of the two equivalent senses) of vectors, we shall use without proof the following facts. The expression \(ax + by\) is a continuous function of all its arguments simultaneously, i.e., if \(a_{n}\) and \(b_{n}\) are sequences of complex numbers and \(x_{n}\) and \(y_{n}\) vectors, then \[ a_{n} \to a, \quad b_{n} \to b, \quad x_{n} \to x, \quad y_{n} \to y \] implies that \[ a_{n} x_{n} + b_{n} y_{n} \to a x+b y. \]
If \(z_{i}\) is an orthonormal basis in \(\mathcal{U}\), \(x_{n}=\sum_i a_{i}^{n} z_{i}\), \(x=\sum_i a_{i} z_{i}\), then a necessary and sufficient condition for the convergence of \(x_{n}\) to \(x\) is that \(a_{i}^{n} \to a_{i}\) for each \(i\). (Thus the notion of convergence here defined coincides with the usual one in \(N\)-dimensional complex Euclidean space.) Finally we shall assume as known the fact that the metric topology defined by the notion of convergence is complete: i.e., if \(x_{n}\) is a sequence of vectors for which \(|x_{n}-x_{m}| \to 0\) as \(n, m \to \infty\), then there is a unique vector \(x\) such that \(x_{n} \to x\) as \(n \to \infty\).
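For readers who wish to experiment numerically, the following sketch (ours, not part of the formal development; it assumes the NumPy library, and the particular vectors and perturbation scheme are illustrative choices) shows the strong and weak differences tending to zero together, the weak one dominated by the strong one in accordance with the Schwarz inequality:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

for n in [1, 10, 100, 1000]:
    # a sequence x_n -> x, perturbed at scale 1/n
    x_n = x + rng.standard_normal(N) / n
    strong = np.linalg.norm(x_n - x)        # |x_n - x|
    weak = abs(np.vdot(y, x_n - x))         # |<x_n - x, y>|
    print(n, strong, weak)                  # both tend to zero together
```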
2. Convergence of linear transformations
The notion of convergence for linear transformations is apparently more complicated than that for vectors. Using the ideas of the preceding section, and remembering our discussion of the norm, \(|A|\), of a linear transformation \(A\), we are led to consider the following three definitions for the convergence of a sequence \(A_{n}\) of linear transformations to the linear transformation \(A\).
Definition 2. \(A_{n} \to A\) in the uniform sense if \(|A_{n}-A| \to 0\); in the strong sense if \(A_{n} x \to A x\) in the strong sense for every vector \(x\); and in the weak sense if \(A_{n} x \to A x\) in the weak sense for every vector \(x\).
Once more the distinctions are illusory: we shall prove that the three notions of convergence coincide. That weak and strong convergence are equivalent follows already from the results of the preceding section. If \(A_{n} \stackrel{(u)}{\to} A\), then for every \(x\) we have
\[ \begin{aligned} |A_{n} x-A x| &= |(A_{n}-A) x|\\ &\leq |A_{n}-A| \cdot |x|\\ &\to 0, \end{aligned} \]
so that uniform convergence implies strong convergence. To prove the converse, let \(x_{1}, \ldots, x_{N}\) be an orthonormal basis in \(\mathcal{U}\), let \(x=\sum_i a_{i} x_{i}\) be an arbitrary vector, and suppose that \(A_{n} \stackrel{(s)}{\to} A\). Then for any \(\varepsilon > 0\) we may find an \(n_{0}\) such that for \(n \geq n_{0}\) and \(i=1, \ldots, N\) we have \(|A_{n} x_{i}-A x_{i}|<\varepsilon\). Since \(|a_{i}| = |\langle x, x_{i} \rangle| \leq |x|\) for each \(i\), it follows that \[ \begin{aligned} |(A_{n}-A) x| &= \bigg|\sum_i a_{i}({A}_{n}-A) x_{i}\bigg|\\ &\leq \sum_i |a_{i}| \cdot |(A_{n}-A) x_{i}|\\ &\leq \varepsilon N|x|, \end{aligned} \] so that for \(n \geq n_{0}\), \(|A_{n} - A| \leq \varepsilon N\). This completes the proof of the equivalence of the three concepts, and yields also an easy proof of the fact that again our topology is complete. If \(|A_{n}-A_{m}| \to 0\) as \(n, m \to \infty\), then for each \(x\), \(|A_{n} x-A_{m} x| \to 0\). Hence to each \(x\) there corresponds a vector, which we may designate by \(A x\), such that \(A_{n} x \to A x\). It is easy to verify that the correspondence from \(x\) to \(A x\) is given by a linear transformation \(A\) and that \(A_{n} \to A\) in the strong sense. Consequently \(A_{n} \to A\) in the uniform sense, and this proves our assertion concerning completeness.
If \(x_{1}, \ldots, x_{N}\) is an orthonormal basis in which the matrices of \(A_{n}\) and \(A\) are \((a_{ij}^{n})\) and \((a_{ij})\) respectively, then a necessary and sufficient condition for the convergence of \(A_{n} \to A\) (in any one of the three equivalent senses given above) is that \(a_{ij}^{n} \to a_{ij}\) as \(n \to \infty\) for each \(i, j=1, \ldots, N\). This follows from the fact that \(a_{ij}^{n} = \langle A_{n} x_{j}, x_{i} \rangle\) and the definition of weak convergence. Consequently the concepts of convergence we discussed reduce to the usual concept in \(N^{2}\)-dimensional complex or \(2N^{2}\)-dimensional real Euclidean space. It would be easy to use this fact in the proof of several statements given below: we prefer a slightly more circuitous route, which sheds more light on the concept of the norm of a transformation.
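The matrix criterion is likewise easy to observe numerically (again a sketch of ours, assuming NumPy; `ord=2` gives the operator norm of the text):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3
A = rng.standard_normal((N, N))

for n in [1, 10, 100, 1000]:
    A_n = A + rng.standard_normal((N, N)) / n    # a sequence A_n -> A
    entrywise = np.max(np.abs(A_n - A))          # max_{i,j} |a_ij^n - a_ij|
    uniform = np.linalg.norm(A_n - A, ord=2)     # the norm |A_n - A|
    print(n, entrywise, uniform)                 # both tend to zero together
```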
3. Inequalities on norms
In this section we fill a gap in our discussion of norms: our purpose is to prove the following four relations: \[ \begin{aligned} |A+B| &\leq |A|+|B|,\\ |A B| &\leq |A||B|,\\ |a A| &= |a||A|,\\ |A^{*}| &= |A|. \end{aligned} \]
All of the proofs are easy from the definition: we sketch the proofs for the product and the dual. We have \[ \begin{aligned} |(A B) x| &= |A(B x)|\\ &\leq |A||B x|\\ &\leq |A||B||x|. \end{aligned} \]
For duals we have \[ \begin{aligned} |\langle A x, y \rangle| &= |\langle x, A^{*} y \rangle|\\ &= |\langle A^{*} y, x \rangle|\\ &\leq |A^{*} y| \cdot |x|\\ &\leq |A^{*}||x||y|. \end{aligned} \] Putting \(y = Ax\) yields \(|Ax|^{2} \leq |A^{*}||x| \cdot |Ax|\), whence \(|Ax| \leq |A^{*}||x|\) and \(|A| \leq |A^{*}|\); applying this result to \(A^{*}\) in place of \(A\) gives the reverse inequality, and the desired equality follows.
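The four relations can be tested directly on random matrices. The following sketch is ours (it assumes NumPy and uses the spectral norm `ord=2`, which agrees with the norm of the text; the tolerance is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
B = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
a = 2.0 - 3.0j
norm = lambda M: np.linalg.norm(M, ord=2)   # operator norm |M|

tol = 1e-10
assert norm(A + B) <= norm(A) + norm(B) + tol
assert norm(A @ B) <= norm(A) * norm(B) + tol
assert abs(norm(a * A) - abs(a) * norm(A)) <= tol
assert abs(norm(A.conj().T) - norm(A)) <= tol   # |A*| = |A|
```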
4. The continuity of the ring operations
\(Ax\) is a continuous function of \(x\); \(|A|\) is a continuous function of \(A\); \(A+B\), \(A B\), \(a A\), and \(A^{*}\) are continuous functions of all their arguments. Of these statements we shall prove only the ones concerning \(|A|\), \(A B\), and \(A^{*}\): the others are proved by very similar methods.
4.1. From the relations \(|A_{n}| \leq |A_{n}-{A}|+|A|\) and \(|A| \leq |A-A_{n}| + |A_{n}|\) we conclude that the absolute value of the difference \(|A_{n}|-|A|\) is dominated by \(|A_{n}-A|\). Since \(A_{n} \to A\) means \(|A_{n}-A| \to 0\), we see that the convergence of \(A_{n}\) to \(A\) implies \(|A_{n}| \to |A|\).
In particular we note that \(A_{n} \to A\) implies that \(|A_{n}|\) remains bounded.
4.2. If \(A_{n} \to A\), \(B_{n} \to B\), then
\[ \begin{aligned} |A_{n} B_{n}-A B| &= |A_{n} B_{n}-A_{n} B+A_{n} B-A B|\\ &\leq |A_{n}| \cdot |B_{n}-B| + |B| \cdot |A_{n}-A|. \end{aligned} \]
Since \(|A_{n}|\) remains bounded and \(|B|\) is a fixed number, while the factors \(|B_{n}-B|\) and \(|A_{n}-A|\) converge to zero, we see that \(A_{n} B_{n} \to A B\).
4.3. If \(A_{n} \to A\) then for each \(x\) and \(y\), \(\langle A_{n} x, y \rangle = \langle x, A_{n}^{*} y \rangle\) converges to \(\langle A x, y \rangle = \langle x, A^{*} y \rangle\). Hence \(A_{n}^{*}\) converges to \(A^{*}\) in the weak and therefore in the uniform sense.
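The continuity statements of 4.1-4.3 can likewise be observed numerically (a sketch of ours, assuming NumPy; real matrices are used so that the adjoint is the transpose):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 3
A = rng.standard_normal((N, N))
B = rng.standard_normal((N, N))
norm = lambda M: np.linalg.norm(M, ord=2)

for n in [1, 10, 100, 1000]:
    A_n = A + rng.standard_normal((N, N)) / n
    B_n = B + rng.standard_normal((N, N)) / n
    print(n,
          abs(norm(A_n) - norm(A)),     # 4.1: | |A_n| - |A| | <= |A_n - A|
          norm(A_n @ B_n - A @ B),      # 4.2: products converge
          norm(A_n.T - A.T))            # 4.3: adjoints converge (real case)
```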
We shall now prove a few theorems about limits of special linear transformations by way of illustration of the general theory.
5. The mean ergodic theorem
Theorem 1. If \(U\) is a unitary transformation and \(M\) is the subspace of all vectors \(x\) for which \(Ux=x\), then \[ \frac{1}{n}(U+U^{2}+\cdots+U^{n}) \to E=P_{M}, \]
where \(P_{M}\) is the projection on \(M\).
Proof. Let \(U=\sum_i u_{i} E_{i}\) be the spectral form of \(U\); we have \(|u_{i}|=1\) and \(U^{k}=\sum_i u_{i}^{k} E_{i}\). Moreover \(E\) is equal to one of the \(E_{i}\): that one for which \(u_{i}=1\). (We do not exclude the possibility that no \(u_{i}=1\), so that \(M=\{0\}\) and \(E=0\).) Hence we have \[ \frac{1}{n}(U+\cdots+U^{n})=\sum_i \frac{1}{n}(u_{i}+\cdots+u_{i}^{n}) E_{i}, \] and it is sufficient to prove that the numbers \(\frac{1}{n}(u_i + \cdots + u_i^n)\) converge to \(1\) or \(0\) according as \(u_i = 1\) or \(u_i \neq 1\). In case \(u_{i}=1\) this is obvious; in any other case we sum the geometric series and obtain \[ \begin{aligned} \bigg|\frac{1}{n}(u_{i}+\cdots+u_{i}^{n})\bigg| &= \bigg|\frac{1}{n} \, \frac{u_{i}^{n+1}-u_{i}}{u_{i}-1}\bigg|\\ &\leq \frac{2}{n|u_{i}-1|}\\ &\to 0. \end{aligned} \]■
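As a concrete check of Theorem 1 (a sketch of ours, assuming NumPy), take a diagonal unitary with eigenvalues \(1\), \(i\), and \(-1\); the fixed space \(M\) is then spanned by the first basis vector, and the averages approach \(P_M\) at the rate \(1/n\) given by the bound in the proof:

```python
import numpy as np

U = np.diag([1.0 + 0j, 1j, -1.0 + 0j])   # unitary: eigenvalues 1, i, -1
P_M = np.diag([1.0, 0.0, 0.0])           # projection on M = {x : Ux = x}

n = 2000
avg = sum(np.linalg.matrix_power(U, k) for k in range(1, n + 1)) / n
print(np.linalg.norm(avg - P_M, ord=2))  # of order 1/n
```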
6. Markoff matrices
To give our next application of the notion of convergence we forsake the theory of linear transformations and consider matrices, and we shall understand by the convergence of a sequence \((a_{ij}^n)\) of matrices to \((a_{ij})\) the numerical convergence \(a_{ij}^n \to a_{ij}\) for each \(i\), \(j\).
Definition 3. A Markoff matrix is a matrix \(P = (p_{ij})\) for which \(p_{ij} \geq 0\) for all \(i\), \(j\) and \(\sum_{j} p_{ij}=1\) for each \(i\).
Concerning Markoff matrices we need the following two lemmas.
Lemma 1. The eigenvalues of a Markoff matrix are in absolute value \(\leq 1\).
Proof. Since the eigenvalues of \((p_{ij})\) and of \((p_{j i})\) are the same, it is sufficient to prove that \(\sum_i p_{ij} \xi_{i} = \lambda \xi_j\) implies \(|\lambda| \leq 1\). We have \[ \sum_i p_{ij}|\xi_{i}| \geq |\lambda| \cdot |\xi_{j}| \] and therefore \[ \sum_{j} \sum_i p_{ij}|\xi_{i}| \geq |\lambda| \sum_{j}|\xi_{j}|. \] Since \(\sum_{j} p_{ij}=1\), we have \[ \sum_i |\xi_{i}| \geq |\lambda| \cdot \sum_{j} |\xi_{j}| \] and from this the desired conclusion is immediate.■
Lemma 2. The product of two Markoff matrices \((p_{ij})\) and \((q_{ij})\) is a Markoff matrix.
Proof. Write \(r_{ij}=\sum_{k} p_{i k} q_{k j}\); it is clear that \(r_{ij} \geq 0\). Also
\[ \sum_j r_{ij} = \sum_k p_{ik} \sum_j q_{kj}=\sum_{k} p_{i k}=1, \]
so that \((r_{ij})\) is a Markoff matrix.■
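Both lemmas are easy to verify numerically; the following sketch (ours, assuming NumPy; the particular matrices are arbitrary) does so:

```python
import numpy as np

P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])
Q = np.array([[1.0, 0.0, 0.0],
              [0.1, 0.8, 0.1],
              [0.3, 0.3, 0.4]])

# Lemma 1: every eigenvalue of a Markoff matrix has absolute value <= 1.
assert (np.abs(np.linalg.eigvals(P)) <= 1 + 1e-10).all()

# Lemma 2: the product of two Markoff matrices is a Markoff matrix.
R = P @ Q
assert (R >= 0).all() and np.allclose(R.sum(axis=1), 1.0)
```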
As a consequence of this lemma we obtain the fact that the totality of all elements in the infinite sequence \(P, P^{2}, P^{3}, \ldots\) is bounded (every element of a Markoff matrix lies between \(0\) and \(1\)), and therefore that the same is true of the sequence of powers of \(SPS^{-1}\), where \(S\) is an arbitrary non-singular matrix. The last statement follows from the fact that \((SPS^{-1})^{n}=SP^{n}S^{-1}\), so that the elements of \((SPS^{-1})^{n}\) are linear combinations, with coefficients formed from the elements of \(S\) and of \(S^{-1}\), of the elements of \(P^{n}\).
7. Limits of Markoff matrices
Our main theorem concerning Markoff matrices is the following:
Theorem 2. For an arbitrary Markoff matrix \(P\), \((P+P^{2}+\cdots+P^{n}) / n\) converges to a limit as \(n \to \infty\); a necessary and sufficient condition that not only the averages, but also the powers \(P^n\) themselves converge to a limit is that \(\lambda=1\) be the only eigenvalue of \(P\) of absolute value one.
Proof. To prove this result it is sufficient to prove it not for \(P\) but for \(SPS^{-1}\), with a suitably chosen \(S\). Since we may choose \(S\) so that \(SPS^{-1}\) is in the classical (Jordan) canonical form, we may restrict our attention to matrices of the form \(A=\lambda 1 + B\), where \[ B=\begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ 0 & 0 & 0 & \cdots & 0 \end{bmatrix}. \]
In this case we have \[ A^{n}=(\lambda{1}+B)^{n}=\sum_{i=0}^{n}\binom{n}{i} \lambda^{n-i} B^{i}. \] Since \(B\) is nilpotent, only the terms with \(i\) smaller than the size of the block survive. Hence the general non-vanishing element in \(A^{n}\) is a number of the form \(\binom{n}{i} \lambda^{n-i}\), where \(i\) is a fixed non-negative integer depending only on the place of the element and independent of \(n\).
Since \(\lambda\) is an eigenvalue of a Markoff matrix we know that \(|\lambda| \leq 1\). If \(|\lambda|<1\) we have \(\big|\binom{n}{i} \lambda^{n-i}\big| \leq Kn^{i}|\lambda|^{n}\) for a suitable constant \({K}\), and therefore in this case not only the averages of the elements of \(A, A^{2}, \ldots, A^{n}\), but even the elements themselves converge to zero.
If, however, \(|\lambda|=1\) and \(i \geq 1\), then \(\big|\binom{n}{i} \lambda^{n-i}\big|=\binom{n}{i} \to \infty\), so that in this case the elements of \(A^{n}\) do not remain bounded. Since \(A\) is a transform of a Markoff matrix, this is impossible: the only possible conclusion is that the disturbing superdiagonal \(1\)'s do not enter in this case, so that the boxes of the classical canonical form which correspond to eigenvalues of absolute value one are of size one. In this case, therefore, the problem is reduced to the convergence properties of \(\lambda^{n}\), where \(|\lambda|=1\): we have already seen that the averages of this sequence always converge, to zero or one according as \(\lambda \neq 1\) or \(\lambda=1\), and that the sequence itself converges if and only if \(\lambda = 1\).
This completes the proof of all our assertions: we see, moreover, that since the limit, after transformation by \(S\), is a diagonal matrix whose diagonal elements consist only of zeros and ones, the limit is idempotent.■
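The two cases of Theorem 2 can be seen side by side in a small experiment (ours, assuming NumPy). The matrix \(P\) below has eigenvalues \(1\) and \(-1\), so its powers oscillate while their averages converge; \(Q\) has eigenvalues \(1\) and \(1/2\), so its powers converge outright:

```python
import numpy as np

P = np.array([[0.0, 1.0],    # eigenvalues 1 and -1: powers oscillate
              [1.0, 0.0]])
Q = np.array([[0.75, 0.25],  # eigenvalues 1 and 0.5: powers converge
              [0.25, 0.75]])

n = 1000
avg = sum(np.linalg.matrix_power(P, k) for k in range(1, n + 1)) / n
print(avg)                                   # ~ [[0.5, 0.5], [0.5, 0.5]]
print(np.linalg.matrix_power(P, n),          # P^n alternates between I ...
      np.linalg.matrix_power(P, n + 1))      # ... and P: no limit
print(np.linalg.matrix_power(Q, 200))        # ~ [[0.5, 0.5], [0.5, 0.5]]
```

Both limits equal the matrix with every entry \(1/2\), which is idempotent, in accordance with the remark at the end of the proof.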
8. The exponential function
Our final example of limits is furnished by \(\exp (A)\), where \(A\) is an arbitrary linear transformation in the inner product space.
Consider the transformations \(B_{n}=\sum_{i=0}^{n} \frac{1}{i!} A^{i}\); let \(k\) be the norm, \(|A|\), of the transformation \(A\). Since we have
\[ |B_{n}-B_{m}|=\bigg|\sum_{i=m+1}^{n} \frac{1}{i!} A^{i}\bigg| \leq \sum_{i=m+1}^{n} \frac{1}{i!} k^{i}, \]
and since the dominant on the right, being a part of the power series for \(\exp(k)\), converges to zero as \(n, m \to \infty\), it follows that the series \(\sum_{i=0}^{\infty} \frac{1}{i!} A^{i}\) converges, or, in other words, that the \(B_{n}\) have a limit. This limit we define to be \(\exp (A)\): we shall mention some of the elementary properties of this exponential function.
Since we may choose a basis in which the matrix of \(A\) is in upper triangular form, so that the diagonal elements are the eigenvalues of \(A\), each occurring with its proper multiplicity, it follows that the eigenvalues of \(\exp(A)\) are the exponentials of those of \(A\). From the consideration of the upper triangular form it follows also that the determinant of \(\exp (A)\), i.e., \(\prod_{i=1}^{N} \exp(\lambda_{i})\), where \(\lambda_{1}, \ldots, \lambda_{N}\) are the diagonal elements of the upper triangular form of \(A\), is the same as \(\exp (\lambda_{1}+\cdots+\lambda_{N})\), i.e., the exponential of the trace of \(A\). This shows incidentally that \(\exp (A)\) is never a singular transformation.
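These properties can be tested directly from the partial sums \(B_n\) (a sketch of ours, assuming NumPy; thirty terms are ample for a small matrix):

```python
import numpy as np
from math import factorial

def exp_series(A, terms=30):
    """Partial sum B_n = sum_{i<=n} A^i / i! of the exponential series."""
    return sum(np.linalg.matrix_power(A, i) / factorial(i) for i in range(terms))

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
expA = exp_series(A)

# det(exp A) = exp(trace A); in particular exp(A) is never singular.
print(np.linalg.det(expA), np.exp(np.trace(A)))

# The eigenvalues of exp(A) are the exponentials of those of A.
print(np.sort_complex(np.linalg.eigvals(expA)))
print(np.sort_complex(np.exp(np.linalg.eigvals(A))))
```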
We conclude by studying the multiplication rule for exponentials. We suppose that \(A\) and \(B\) are commutative transformations. Since \[ \exp (A+B)-\exp (A) \exp (B) \] is the limit of the expression \[ \begin{aligned} &\sum_{p=0}^{n} \frac{1}{p!}(A+B)^{p} - \sum_{q=0}^{n} \frac{1}{q!} A^{q} \, \sum_{r=0}^{n} \frac{1}{r!} B^{r}\\ =& \sum_{p=0}^{n} \frac{1}{p!} \sum_{s=0}^{p} \binom{p}{s} A^{s} B^{p-s} - \sum_{q=0}^{n} \sum_{r=0}^{n} \frac{1}{q!r!} A^{q} B^{r} \end{aligned} \] (the commutativity of \(A\) and \(B\) is used in expanding \((A+B)^{p}\) by the binomial theorem), we will have proved the multiplication rule when we have proved that this expression converges to zero as \(n \to \infty\). An easy verification yields the fact that for \(q+r \leq n\) the terms \(A^{q} B^{r}\) occur in both members with equal coefficients; the terms that do not cancel out are all in the right member and are equal to \[ \sum_{q} \sum_{r} \frac{1}{q!r!} A^{q} B^{r}, \quad q+r>n, \] the summation being extended over those values of \(q, r \leq n\) for which \(q+r>n\). Since \(q+r>n\) implies that at least one of \(q\) and \(r\) is greater than \(n/2\), the norm of this remainder is dominated by
\[ \begin{aligned} &\sum_{q=0}^{\infty} \sum_{r>n/2} \frac{1}{q!r!}|A|^{q}|B|^{r}+\sum_{r=0}^{\infty} \sum_{q>n/2} \frac{1}{q!r!}|A|^{q}|B|^{r} \\ =& \Big(\sum_{q=0}^{\infty} \frac{1}{q!}|A|^{q}\Big) \Big(\sum_{r>n/2} \frac{1}{r!}|B|^{r}\Big) + \Big(\sum_{r=0}^{\infty} \frac{1}{r!}|B|^{r}\Big) \Big(\sum_{q>n/2} \frac{1}{q!}|A|^{q}\Big) \\ =& \exp (|A|) \,b_{n}+\exp (|B|) \,a_{n}, \end{aligned} \] where \(a_{n}\) and \(b_{n}\), being tails of the convergent series for \(\exp(|A|)\) and \(\exp(|B|)\), tend to zero as \(n \to \infty\). This proves the multiplication rule: \(\exp(A+B)=\exp(A)\exp(B)\) whenever \(A\) and \(B\) commute.
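Finally, the multiplication rule, and its failure for non-commutative pairs, can be observed with the same partial sums (a sketch of ours, assuming NumPy; the matrices are arbitrary illustrative choices):

```python
import numpy as np
from math import factorial

def exp_series(A, terms=40):
    """Partial sum of the exponential series for exp(A)."""
    return sum(np.linalg.matrix_power(A, i) / factorial(i) for i in range(terms))

# A commutative pair: polynomials in one and the same matrix always commute.
C = np.array([[0.0, 1.0], [-1.0, 0.0]])
A, B = 2.0 * C, 3.0 * C + np.eye(2)
assert np.allclose(exp_series(A + B), exp_series(A) @ exp_series(B))

# Without commutativity the rule fails in general:
A2 = np.array([[0.0, 1.0], [0.0, 0.0]])
B2 = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.linalg.norm(exp_series(A2 + B2) - exp_series(A2) @ exp_series(B2)))
```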