Spectral Theory of Normal Transformations

1. Determinants

Throughout this chapter we shall work with a fixed inner product space \(\mathcal{U}\) of dimension \(n\), with inner product \(\langle x, y \rangle\). Before starting on the subject of the title we make a comment on determinants. We assume as known the definition of the determinant of a matrix and its elementary properties. If \(A\) is an arbitrary linear transformation and \([A]\) its matrix in any coordinate system, then its matrix in any other coordinate system is \([U][A][U]^{-1}\), where \([U]\) is a unitary matrix. Since the determinant of a product is the product of the determinants, the determinant of \([U][A][U]^{-1}\) is \(D[U]\,D[A]\,D[U]^{-1}=D[A]\), where \(D[A]\) denotes the determinant of \([A]\). Hence \(D[A]\) is independent of the coordinate system chosen and depends only on the linear transformation \(A\). It is therefore proper to speak of the determinant of a linear transformation \(A\): we shall denote it by \(D(A)\). In matricial language we shall make use of the following properties of determinants.

  1. The Laplace expansion. In particular we shall use the fact that if in the first row of a matrix \([A]\) only the first element is different from zero then the determinant of \([A]\) is the product of this first element and the determinant of the \({n}-1\) rowed square matrix obtained from \([A]\) by deleting the first row and first column.
  2. \(D[A]\) is a homogeneous polynomial of degree \(n\) in its elements \(a_{i j}\).
  3. A necessary and sufficient condition that the system of equations \(\sum_{j} a_{i j} c_{j}=0\) have a solution different from \(c_{1}=c_{2}=\cdots=c_{n}=0\) is that \(D[A]=0\). In the language of linear transformations the last fact can be stated as follows: a necessary and sufficient condition that there exists a vector \({x} \neq 0\) for which \(A x=0\) is that \(D(A)=0\). Still in other words: \(A\) has an inverse if and only if \(D(A) \neq 0\).
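By way of concrete illustration, here is a minimal numerical sketch of property 3 (in Python with numpy; the snippet and its particular matrix are an added example, not part of the classical development): a vanishing determinant goes hand in hand with a nontrivial solution of \(\sum_{j} a_{ij} c_{j}=0\).

```python
import numpy as np

# A singular 3x3 matrix: the third row is the sum of the first two.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [5.0, 7.0, 9.0]])
print(np.isclose(np.linalg.det(A), 0.0))   # True: D[A] = 0

# Property 3: a nontrivial solution c of A c = 0 can be read off from the
# singular value decomposition.
_, s, Vh = np.linalg.svd(A)
c = Vh[-1]                                 # right singular vector for the smallest singular value
print(s[-1], np.allclose(A @ c, 0.0))      # ~0, True
```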

2. Invariant linear subspaces

All of the results of this chapter are based on the following very simple comment. If a linear subspace \({M}\) is invariant under the linear transformation \(A\), then \(M^{\perp}\) is invariant under \(A^{*}\). (\(M\) is said to be invariant under \(A\) if \(x\) in \(M\) implies \(A x\) in \(M\): in symbols \(AM \subseteq M\)). For if \(x\) is in \(M\) and \(y\) is in \(M^{\perp}\) then we have \(\langle x, A^{*} y \rangle = \langle A x, y \rangle = 0\), since \(A x\) is in \(M\). Since this is true for all \(x\) in \(M\), \(A^{*} y\) is in \(M^{\perp}\), as was to be proved.
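The comment can be checked numerically. In the sketch below (Python with numpy, an added illustration with an arbitrary example matrix) the matrix is block upper triangular, so that \(M\), spanned by the first two coordinate vectors, is invariant under \(A\); the adjoint is block lower triangular and visibly leaves \(M^{\perp}\) invariant.

```python
import numpy as np

# A block upper triangular matrix leaves M = span(e1, e2) invariant.
A = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [0, 0, 9, 1],
              [0, 0, 2, 3]], dtype=complex)
A_star = A.conj().T                        # the adjoint A*

# A* is block lower triangular, hence maps M-perp = span(e3, e4) into itself:
# the first two coordinates of A*y vanish for every y in M-perp.
for y in (np.array([0, 0, 1, 0]), np.array([0, 0, 0, 1])):
    print(np.allclose((A_star @ y)[:2], 0))   # True, True
```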

3. Determining invariant linear subspaces of an arbitrary transformation

As a consequence of the result of (III.2) we have the following theorem.

Theorem 1. To any linear transformation \(A\) there correspond two sets of linear subspaces, \(M_{i}\) and \(N_{i}\), \(i=1, 2, \ldots, n\), each invariant under \(A\), such that the dimension of \(M_{i}\) is \(i\), the dimension of \(N_{i}\) is \(n-i+1\), \(M_{1} \subseteq M_{2} \subseteq \cdots \subseteq M_{n}\), and \(N_{1} \supseteq N_{2} \supseteq \cdots \supseteq N_{n}\). (In particular of course \(M_{n} = N_{1} = \mathcal{U}\).)

Proof. We proceed by induction on \(n\). For \(n=1\) the result is trivial; we assume that it has been proved for inner product spaces of dimension \(\leq {n}-1\).

Suppose now that \(\mathcal{U}\) is of dimension \(n\), and consider the expression \(D(\lambda)=D(A-\lambda 1)\) as a function of the complex variable \(\lambda\), where \(D(A)\) denotes the determinant of \(A\). It follows from the discussion in (III.1) that \(D(\lambda)\) is a polynomial in \(\lambda\) of degree \(n\). We apply the fundamental theorem of algebra to obtain the result that for a suitable value of \(\lambda\), say \(\lambda=\lambda_{0}\), \(D(\lambda_{0})=0\). It follows that there exists a vector \(x_{0} \neq 0\) such that \((A-\lambda_{0} 1) x_{0}=0\), or, in other words, \(A x_{0}=\lambda_{0} x_{0}\). We denote by \(M_{1}\) the one dimensional linear subspace of all vectors of the form \(ax_0\). Then \(M_{1}\) is invariant under \(A\); hence \(M_{1}^\perp\) is invariant under \(A^{*}\). \(M_{1}^\perp\) is an \(n-1\) dimensional inner product space and \(A^{*}\) is a linear transformation on this space. We may therefore use our induction hypothesis and conclude that we can find a set of linear subspaces \(N_{i}^{*}\), for \(i=2, \ldots, n\), each invariant under \(A^{*}\), \(N_{i}^{*}\) having dimension \(n-i+1\), and \(N_{2}^{*} \supseteq \cdots \supseteq N_{n}^{*}\). Returning to the entire \(n\)-dimensional space \(\mathcal{U}\) we define the subspaces \(M_{i}\) for \(i=2, \ldots, n\) by \(M_{i}=M_{1}+(N_{2}^{*} \cap N_{i+1}^{*\perp})\), where we make the convention that \(N_{n+1}^{*}\) is the subspace consisting of \(0\) alone. An application of (III.2) shows that \(M_{i}\) is invariant under \(A\), and it is clear that it has the required monotonicity and dimensionality properties. This proves the part of the theorem referring to the \(M_{i}\); to prove the part concerned with the \(N_{i}\), apply the result just proved to \(A^{*}\) to obtain the corresponding subspaces \(M_{i}^{*}\), and set \(N_{1}=\mathcal{U}\) and \(N_{i}=M_{i-1}^{*\perp}\) for \(i = 2, \ldots, n\): since \(M_{i-1}^{*}\) is invariant under \(A^{*}\), another application of (III.2) shows that \(N_{i}\) is invariant under \(A\), and \(N_{i}\) has the required dimension \(n-i+1\).

4. Upper triangular and lower triangular forms of a matrix

The matricial interpretation of (III.3) is very useful. Let \(A\) be a linear transformation and find the subspaces \(M_{i}\) as described in (III.3). Let \(x_1\) be an arbitrary vector of length \(1\) in \(M_{1}\); for \(i=2, \ldots, n\) choose \(x_{i}\) to be an arbitrary vector of length \(1\) in \(M_{i} \cap M_{i-1}^\perp\). It is clear that the \(x_{i}\) form a complete orthonormal set: let us examine the matrix of \(A\) in the coordinate system of the \(x_{i}\). Since \(x_{j}\) lies in \(M_{j}\) and \(M_{j}\) is invariant under \(A\), \(A x_{j}\) will also lie in \(M_{j}\). But the vectors of \({M}_{j}\), being orthogonal to the \(x_{i}\) with \(i>{j}\), are characterized by the fact that in their expansion in terms of the \(x_{i}\) the coefficients of the \(x_{i}\) with \(i>j\) all vanish. In other words if \(A x_{j}=\sum_{i} a_{i j} x_{i}\), then for \(i>{j}\), \(a_{i j}=0\). The matrix \([A]\) in this coordinate system has the form \[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1 n} \\ 0 & a_{22} & a_{23} & \ldots & a_{2 n} \\ 0 & 0 & a_{33} & \ldots & a_{3 n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{n n} \end{bmatrix}. \]

We shall refer to this form as the upper triangular form of \(A\). Similarly, by considering the subspaces \(N_{i}\) we can establish that in a suitable coordinate system the matrix of \(A\) becomes lower triangular: i.e., for \(i<j\), \(a_{ij}=0\).
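In numerical practice the upper triangular form is delivered by the (complex) Schur decomposition. The following sketch (Python with numpy and scipy, an added illustration on a random matrix) exhibits the unitary change of coordinates and the triangular matrix.

```python
import numpy as np
from scipy.linalg import schur

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))

# Complex Schur form: A = Z T Z*, with Z unitary and T upper triangular.
# The columns of Z play the role of the orthonormal vectors x_1, ..., x_n.
T, Z = schur(A, output='complex')
print(np.allclose(Z @ T @ Z.conj().T, A))   # True
print(np.allclose(np.tril(T, -1), 0))       # True: a_ij = 0 for i > j
```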

5. Spectral representation of Hermitian transformations

In the case of Hermitian transformations (III.2) implies that whenever \(M\) is invariant under \(A\) so is \(M^{\perp}\). Accordingly the results of (III.3) and (III.4) can be strengthened in this case, as follows.

Theorem 2. If \(A\) is a Hermitian transformation there exist pairwise orthogonal projections \(E_{1}, E_{2}, \ldots, E_{p}\) (i.e., projections with \(E_{i} E_{j}=0\) if \(i \neq j\)) whose sum is \(1\), and pairwise different real numbers \(a_{1}, a_{2}, \ldots, a_{p}\), such that \(A=\sum_{i} a_{i} E_{i}\). This representation (the so-called spectral representation or spectral form of \(A\)) is unique: i.e., if \(A=\sum_{j} b_{j} F_{j}\), where the \(b_{j}\) and \(F_{j}\) satisfy the conditions placed on the \(a_{i}\) and \(E_{i}\), then the number of \(F\)’s is the same as the number of \(E\)’s and, after a suitable permutation of the indices, \(E_{i}={F}_{i}\) and \(a_{i}=b_{i}\) for \(i = 1, 2, \ldots, p\).

(We note again that our original intuitive picture of Hermitian transformations as the analogs of real numbers is justified by the reality of the \(a_{i}\).)

Proof. We first prove that if for a Hermitian transformation \(A\) we have a vector \(x\) and a scalar \(\lambda\) such that \(A x=\lambda x\), \(x \neq 0\), then \(\lambda\) is real. This follows from the relation \(\langle A x, x \rangle = \langle \lambda x, x \rangle = \lambda|x|^{2}\), and the fact that both \(\langle A x, x \rangle\) and \(|x|^{2}\) are real.

We proceed again by induction on \(n\). For \(n=1\) we may take \(p=1\) and \(E_{1}=1\). Suppose then that the theorem has been proved for all inner product spaces of dimension \(\leq n-1\), and consider, as in the proof of (III.3), a number \(\lambda\) and a vector \(x \neq 0\), such that \(A x=\lambda x\). Let \(M_1\) be the linear subspace of all vectors \(y\) for which \(Ay =\lambda y\); write \(a_{1}=\lambda\) and let \(E_{1}\) be the projection which projects on \(M_{1}\). We know already that \(a_1\) is real, and it is clear that \(M_{1}\) is invariant under \(A\). It follows that \(M_{1}^\perp\) is invariant under \(A\), and we may apply our induction hypothesis to the inner product space \(M_{1}^\perp\) (of dimension \(\leq n-1\)): this yields a spectral form \(\sum_{i \geq 2} a_{i} E_{i}\) for \(A\) on \(M_{1}^\perp\), and none of the \(a_{i}\) with \(i \geq 2\) can equal \(a_{1}\), since \(M_{1}\) already contains every solution of \(Ay=a_{1}y\). Since for \(x\) in \(M_{1}^\perp\), \((A-a_{1} E_{1}) x=A x\), the result follows.

To prove uniqueness we observe first that if \(F_{j}\) are orthogonal projections of sum \(1\), \(b_{j}\) complex numbers, and \(x\) a vector \(\neq 0\), then \(\big(\sum_{j} b_{j} F_{j}\big) x=0\) implies that at least one \(b_{j}=0\). For otherwise we may consider the transformation \(\sum_{j} c_{j} F_{j}\), with \(c_{j}=b_{j}^{-1}\), and an easy computation shows that this transformation is the inverse of the original one.

Suppose then that for some complex number \(\lambda\) and vector \(x \neq 0\) we had \(\sum_{i} a_{i} E_{i} x=\lambda x=\sum_{i} \lambda E_{i} x\), so that \(\sum_{i}(a_{i}-\lambda) E_{i} x=0\). Then by the preceding paragraph we should have \(\lambda=a_{i}\) for some \(i\). In other words the numbers \(a_{i}\) occurring in the spectral form are characterized by the fact that to each of them there corresponds an \(x \neq 0\) for which \(A x=a_{i} x\). To prove that the \(E_{i}\) are also determined by \(A\) we note that if \(Ax=a_{i} x\) and \(A y=a_{j} y\), with \(i \neq j\), then \(x\) is orthogonal to \(y\). For \[ a_{i}\langle x, y \rangle = \langle A x, y \rangle = \langle x, A y \rangle = a_{j}\langle x, y \rangle, \] so that since \(a_{i} \neq a_{j}\), \(\langle x, y \rangle = 0\). Consequently if \(A x=a_{i}x\), then \(E_{j} x=0\) for all \(j \neq i\), so that \(E_{i}\) is characterized as the projection on precisely that linear subspace of vectors \(x\) for which \(A x=a_{i} x\).

6. Reduction to principal axes

In the notation of the preceding section let \(A=\sum_{i} a_{i} E_{i}\), and let \({M}_{i}\) be the range of \({E}_{i}\) (i.e., \({M}_{i}\) is the subspace on which \({E}_{i}\) projects). Then the \(M_{i}\) are mutually orthogonal linear subspaces which span \(\mathcal{U}\). We may, accordingly, choose a complete orthonormal set in each \({M}_{i}\); the set of all vectors in all these orthonormal sets is a complete orthonormal set in \(\mathcal{U}\). If we denote this set by \(\{x_{i}\}\) and compute the matrix of \(A\) in this coordinate system, we observe that \(A x_{i}\) is for each \(i\) a scalar multiple of \(x_{i}\), so that in the matrix \([A]\) every element, except possibly those on the main diagonal, is zero. Matrices with this property are called diagonal matrices: our result in (III.5) is equivalent to the statement that to every Hermitian transformation \(A\) there corresponds a coordinate system in which the matrix \([A]\) is a diagonal matrix. This reduction of an arbitrary Hermitian matrix to diagonal form, or, equivalently, of the quadratic form \(\langle Ax, x \rangle\) to a sum of squares, is known as the reduction to principal axes.
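The following sketch (Python with numpy, an added illustration on a random Hermitian matrix) carries out the reduction numerically: it assembles the projections \(E_i\) from orthonormal eigenvectors, and verifies both the spectral form and the diagonal form.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (X + X.conj().T) / 2                    # a Hermitian transformation

w, V = np.linalg.eigh(A)                    # real eigenvalues, orthonormal eigenvectors
a = np.unique(np.round(w, 10))              # the pairwise different a_i
E = [sum(np.outer(V[:, k], V[:, k].conj())  # E_i projects on the a_i-eigenspace
         for k in range(len(w)) if np.isclose(w[k], ai))
     for ai in a]

print(np.allclose(sum(ai * Ei for ai, Ei in zip(a, E)), A))   # A = sum_i a_i E_i
print(np.allclose(sum(E), np.eye(4)))                         # sum_i E_i = 1
print(np.allclose(V.conj().T @ A @ V, np.diag(w)))            # diagonal form
```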

7. Terminology of spectral theory

Before we proceed to a more detailed study of the spectral form of Hermitian transformations we introduce some of the usual terminology of the spectral theory.

Definition 1. For an arbitrary linear transformation \(A\) the polynomial \(D(\lambda)=D(A-\lambda 1)\) of degree \(n\) is called the characteristic polynomial of \(A\), and the equation \(D(\lambda)=0\) its characteristic equation. The roots of the characteristic equation are the eigenvalues of \(A\).

We have already made use of the fact that \(\lambda\) is an eigenvalue of \(A\) if and only if there exists a non-zero vector \(x\) for which \(A x=\lambda x\). The multiplicity of \(\lambda\) as a root of the equation \(D(\lambda)=0\) is called the multiplicity of the eigenvalue \(\lambda\). The set of \(n\) eigenvalues \(\lambda_{1}, \lambda_{2}, \ldots, \lambda_{n}\), with multiplicities counted properly, is the spectrum of \(A\).
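A small numerical sketch of these notions (Python with numpy, an added illustration; note that numpy's convention is \(D(\lambda 1-A)\) rather than \(D(A-\lambda 1)\), which changes the polynomial only by the factor \((-1)^{n}\) and leaves the roots unchanged):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])

coeffs = np.poly(A)            # characteristic polynomial of A: [1., -4., 4.]
print(coeffs)                  # i.e. (lambda - 2)^2
print(np.roots(coeffs))        # eigenvalue 2 with multiplicity 2
print(np.linalg.eigvals(A))    # the spectrum, multiplicities counted properly
```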

8. Linear combinations of orthogonal projections

In order to gain a better insight into the properties of the spectral representation we shall, in this section, study a slightly more general form. We assume that \({E}_{1}, E_{2}, \ldots, {E}_{p}\) are pairwise orthogonal projections whose sum is \(1\), and we consider linear transformations of the form \(A = \sum_{i} a_{i} E_{i}\), where, however, we no longer require that \(A\) be Hermitian (so that the \(a_{i}\) need not be real), nor that the \(a_{i}\) be all different. We shall study a number of elementary properties of such linear transformations \(A\).

8.1. If \(A=\sum_{i} a_{i} E_{i}\), then \[ \begin{aligned} A^{2}&=\Big(\sum_{i} a_{i} E_{i}\Big)\Big(\sum_{j} a_{j} E_{j}\Big)\\ &=\sum_{i} \sum_{j} a_{i} a_{j} E_{i} E_{j}\\ &=\sum_{i} a_{i}^{2} E_{i}, \end{aligned} \] (since \(E_{i} E_{j}=E_{i}\) or \(0\) according as \(i=j\) or \(i \neq j\)). More generally if \(p(\lambda)\) is an arbitrary polynomial, \({p}(A)=\sum_{i} p(a_{i}) E_{i}\). If no \(a_{i}\) vanishes we prove similarly that \(\sum_{i} a_{i}^{-1} E_{i}\) is the inverse of \(A\); and the relation \[ \begin{aligned} \Big\langle \sum_{i} a_{i} E_{i} x, y \Big\rangle &= \sum_{i} a_{i}\langle E_{i} x, y \rangle\\ &= \sum_{i} a_{i}\langle x, E_{i} y \rangle\\ &= \Big\langle x, \sum_{i} \bar{a}_{i} E_{i} y \Big\rangle \end{aligned} \] shows that \(A^{*}=\sum_{i} \bar{a}_{i} E_{i}\). It follows immediately that \(A\) is Hermitian if and only if \(a_i\) is real, unitary if and only if \(|a_{i}|=1\), and idempotent if and only if \(a_{i}=0\) or \(1\): each of these conditions is supposed to hold for \(i = 1, 2, \ldots, {p}\).
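These computations are easy to check numerically. The sketch below (Python with numpy, an added illustration with arbitrarily chosen \(a_i\)) verifies \(p(A)=\sum_i p(a_{i}) E_{i}\) for one polynomial and the formula for the inverse.

```python
import numpy as np

# Build A = sum_i a_i E_i from an orthonormal basis q_1, q_2, q_3.
Q, _ = np.linalg.qr(np.random.default_rng(2).standard_normal((3, 3)))
a = np.array([2.0, -1.0, 0.5])                   # no a_i vanishes
E = [np.outer(Q[:, i], Q[:, i]) for i in range(3)]
A = sum(ai * Ei for ai, Ei in zip(a, E))

# p(A) = sum_i p(a_i) E_i for p(t) = t^2 + 1, and A^{-1} = sum_i a_i^{-1} E_i.
pA = A @ A + np.eye(3)
print(np.allclose(pA, sum((ai**2 + 1) * Ei for ai, Ei in zip(a, E))))       # True
print(np.allclose(np.linalg.inv(A), sum(Ei / ai for ai, Ei in zip(a, E))))  # True
```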

8.2. For any \(x\) we have \[ \begin{aligned} \langle A x, x \rangle &= \Big\langle \sum_{i} a_{i} {E}_{i} x, x \Big\rangle\\ &= \sum_{i} a_{i}\langle E_{i} x, x \rangle\\ &= \sum_{i} a_{i}|E_i x|^{2}. \end{aligned} \]

It follows that if \(a_{i} \geq 0\) for \(i=1,2, \ldots, p\) then \(A\) is non-negative. Conversely if \(A\) is non-negative, we may choose \(x \neq 0\) so that \(E_{i} x=x\), and for this \(x\) we have \(\langle A x, x \rangle = a_{i}|x|^{2}\), so that \(a_{i} \geq 0\).

8.3. Denote by \(a\) the number \(a=\max (|a_{1}|,|a_{2}|, \ldots,|a_{p}|)\). We have, for any vector \(x\), \[ \begin{aligned} |A x|^{2} &= \Big\langle \sum_i a_i E_i x, \sum_{j} a_{j} E_{j} x \Big\rangle\\ &= \sum_i \sum_{j} a_{i} \bar{a}_{j}\langle E_{i} x, E_{j} x\rangle \\ &= \sum_i \sum_{j} a_{i} \bar{a}_{j}\langle x, E_{i} E_{j} x \rangle\\ &= \sum_i|a_{i}|^{2}|E_{i} x|^{2}, \end{aligned} \] so that, since \(\sum_i|E_{i} x|^{2}=|x|^{2}\), \(|A x|^{2} \leq a^{2}|x|^{2}\), or, in other words, \(|A| \leq a\). If the maximum \(a\) is attained by \(a_{i}\), i.e., \(|a_{i}|=a\), then by choosing \(x\) so that \(E_{i} x=x\), as above, it follows that \(|A|= a\). Similarly it is easy to show that in case \(A\) is Hermitian, so that the \(a_{i}\) are real, the upper bound and the lower bound of \(A\) are given, respectively, by \(\max (a_{1}, a_{2}, \ldots, a_{p})\) and \(\min (a_{1}, a_{2}, \ldots, a_{p})\).
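A numerical check of (8.3) (Python with numpy, an added illustration with arbitrarily chosen \(a_i\)):

```python
import numpy as np

Q, _ = np.linalg.qr(np.random.default_rng(3).standard_normal((3, 3)))
a = np.array([3.0, -5.0, 1.0])
A = Q @ np.diag(a) @ Q.T                    # A = sum_i a_i E_i, Hermitian

print(np.isclose(np.linalg.norm(A, 2), 5.0))                 # |A| = max_i |a_i|
w = np.linalg.eigvalsh(A)
print(np.isclose(w.max(), 3.0), np.isclose(w.min(), -5.0))   # upper and lower bounds
```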

8.4. In this last section we suppose that the \(a_{i}\) are all different and real, so that \(\sum_i a_i E_i\) is the spectral form of the Hermitian transformation \(A\). In this case we assert that a necessary and sufficient condition that \(A B=B A\) for a Hermitian \(B\) is that \(E_{i} B=B E_{i}\) for \(i = 1, 2, \ldots, p\). The condition is clearly sufficient: we shall merely prove its necessity. We recall that \(E_i\) is characterized by the fact that \(A x=a_{i} x\) is equivalent to \(E_{i} x=x\). Hence if \(A B=B A\) and \(E_{i} x=x\) then \[ A(B x)=B A x=B a_{i} x=a_{i} B x, \] so that \(E_{i} B x=B x\). Since for an arbitrary \(y\), \(x={E}_{i} y\) does have the property that \(E_{i} x=x\), we have \(E_{i} B E_{i} y={B}{E}_{i} y\) for all \(y\), or, in other words, \(E_{i} B E_{i}=B E_{i}\). Taking the adjoint of this relation and remembering that \(B\) and \(E_{i}\) are Hermitian yields \(E_{i} B E_{i}=E_{i} B\), and hence the desired result.

All of the results of this section are immediately translatable into matrix language: thus (8.1) asserts that a diagonal matrix is Hermitian, unitary, or idempotent, if and only if the diagonal elements are real, of absolute value one, or all zero or one, respectively.

9. Normal transformations and the Cartesian decomposition

Let \(A\) be an arbitrary linear transformation and write \(B=(1 / 2)(A+A^{*})\), \(C=(1 / 2i)(A-A^{*})\). It is immediately verified that \(B\) and \(C\) are Hermitian and \(A=B+i C\). We shall refer to this representation as the Cartesian decomposition of \(A\) and we call \(B\) and \(C\) the real and imaginary parts of \(A\). It might be conjectured that, since every Hermitian transformation has a spectral form and every transformation is a linear combination of Hermitian transformations, every linear transformation has a spectral form, or, equivalently, that every linear transformation can be realized as a diagonal matrix. It is, however, not in general true that the coordinate system which reduces \([B]\) to diagonal form will also reduce \([C]\) to diagonal form, or conversely. In fact a necessary and sufficient condition that \(A\) be realizable as a diagonal matrix, or, equivalently, that there exist mutually orthogonal projections \(E_i\), of sum one, such that \(A=\sum_i a_i E_i\), is that \(B C=C B\). (We remark that \(B\) and \(C\) are uniquely determined by \(A\), for if \(A=B+iC\) with Hermitian \(B\) and \(C\) then \(A^{*}=B-iC\), and solving for \(B\) and \(C\) we obtain the original expressions for them.)

Proof of sufficiency. If \(B C=C B\), let \(B=\sum_i b_{i} F_{i}\) and \(C=\sum_{j} c_{j} G_{j}\) be the spectral representations of \(B\) and \(C\) respectively. From (III.8.4) it follows that \(F_{i}\) and \(G_{j}\) are commutative and therefore that \(F_{i} G_{j}\) is a projection. Since, moreover, \(F_{i} G_{j} F_{r} G_{s}=F_{i} F_{r} G_{j} G_{s}=0\) unless \(i=r\) and \(j=s\), the \(F_{i} G_{j}\) are pairwise orthogonal projections whose sum is one, with the additional property that \(\sum_j F_{i} G_{j}=F_{i}\) and \(\sum_i F_{i} G_{j}=G_{j}\). If we denote this set of projections in some order by \(H_k\), then we obtain representations of the form \(B=\sum_{k} b_{k}^{\prime} H_{k}\) and \(C=\sum_{k} c_{k}^\prime {H}_{k}\). (These are not, of course, the spectral forms of \(B\) and \(C\), since the \(b_{k}^\prime\) and \(c_{k}^{\prime}\) need not be pairwise different.) It follows that \[ A=B+iC=\sum_{k}(b_{k}^{\prime}+ ic_{k}^{\prime}) H_{k}, \] as was to be proved.

Proof of necessity. Rather than proving necessity directly we prove that the condition \(B C=C B\) is equivalent to the condition \(AA^*=A^*A\). For if the latter condition is satisfied it follows from the definitions of \(B\) and \(C\) that the former one is; and conversely if \(B C=C B\) then \(A=B+iC\) and \(A^{*}=B-iC\) are commutative. That \(A A^{*}=A^{*} A\) is a necessary condition for the existence of a spectral form is clear: if \(A=\sum_i a_i E_i\) then \(A^{*}=\sum_i \bar{a}_{i} E_{i}\), so that \(AA^{*}=\sum_i|a_{i}|^{2} E_{i}=A^{*}A\).

Transformations \(A\) satisfying any one of the equivalent conditions discussed in this section are called normal. An arbitrary unitary transformation \(U\) is an example of a normal but not necessarily Hermitian transformation: since \(U^{*}=U^{-1}\), it is clear that \(UU^{*}=U^{*} U\) since both products are equal to the identity transformation.
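The following sketch (Python with numpy, an added illustration on a random matrix) computes the Cartesian decomposition and confirms that the two tests for normality, \(BC=CB\) and \(AA^{*}=A^{*}A\), agree; a generic \(A\) fails both.

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = (A + A.conj().T) / 2                    # real part
C = (A - A.conj().T) / 2j                   # imaginary part

print(np.allclose(B, B.conj().T), np.allclose(C, C.conj().T))  # both Hermitian
print(np.allclose(B + 1j * C, A))                              # A = B + iC

# The two normality tests agree; a generic A fails both.
print(np.allclose(B @ C, C @ B))                               # False
print(np.allclose(A @ A.conj().T, A.conj().T @ A))             # False
```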

10. Functions of normal transformations

If \(A\) is a normal transformation, let \(A=\sum_i a_i E_i\) be the spectral form of \(A\): i.e., the \(E_{i}\) are pairwise orthogonal projections whose sum is one and the \(a_{i}\) are pairwise different complex numbers. Let \(f(\lambda)\) be an arbitrary complex valued function of the complex variable \(\lambda\), which is defined for \(\lambda=a_{i}\), \(i = 1, \ldots, p\). We define the symbol \(f(A)\) to be the transformation \(f(A)=\sum_i f(a_{i}) E_{i}\). The discussion in (8.1) shows that if \(f(\lambda)\) is a polynomial then this definition of \(f(A)\) agrees with the one used previously; if \(f(\lambda)=\lambda^{-1}\) then (assuming that \(f(\lambda)\) is defined for all \(a_{i}\)) \(f(A)=A^{-1}\); and if \(f(\lambda)=\bar{\lambda}\), then \(f(A)=A^{*}\). These statements imply that if \(f(\lambda)\) is an arbitrary rational function of \(\lambda\) and \(\bar{\lambda}\), we obtain \(f(A)\) by the replacements \[ \lambda \to A, \quad \bar{\lambda} \to A^{*}, \quad \lambda^{-1} \to A^{-1}, \quad c \to c1. \]

The symbol \(f(A)\) is defined for much more general functions, however, and we shall in what follows make use of expressions of the type \(e^{A}\) and \(A^{1 / 2}\) without any further explanation. It is, however, worthwhile to observe that this procedure is only a notational advantage: it introduces nothing conceptually new. For we may write, for an arbitrary \(f(\lambda)\), \(f(a_{i})=b_{i}\), and then we may find a polynomial \(p(\lambda)\) which at the finite set of different complex numbers \(a_{i}\) takes, respectively, the values \(b_{i}\). With this polynomial \(p(\lambda)\) we have \(f(A)=p(A)\). Hence the class of transformations defined by \(f(A)\) is nothing essentially new: the notation only saves the trouble of constructing a polynomial \(p(\lambda)\) to fit each special case.
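For a normal \(A\) the recipe \(f(A)=\sum_i f(a_{i}) E_{i}\) is easy to carry out numerically. The sketch below (Python with numpy and scipy, an added illustration) computes \(e^{A}\) this way for a random Hermitian \(A\) and compares it with a general-purpose matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(5)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (X + X.conj().T) / 2                    # Hermitian, hence normal

w, V = np.linalg.eigh(A)
fA = V @ np.diag(np.exp(w)) @ V.conj().T    # f(A) = sum_i f(a_i) E_i with f = exp
print(np.allclose(fA, expm(A)))             # agrees with the matrix exponential
```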

11. The square root of a non-negative transformation

We recall that in (II.7) we mentioned three possible definitions of a non-negative transformation and adopted the weakest one, namely that \(\langle A x, x \rangle \geq 0\) for all \(x\). The strongest of the three possible definitions was that we could write \(A\) in the form \(A=B^{2}\), with a Hermitian (real) \(B\): in other words that \(A\) has a Hermitian square root. It is now trivial to prove that the weak definition implies the strong one, so that they are equivalent. For let \({A}\) be non-negative in the sense that \(\langle Ax, x \rangle\) is non-negative for all \(x\). We know from (III.8.2) that in the spectral form \(\sum_i a_i E_i\) of \(A\) the \(a_{i}\) must be non-negative. Hence if we consider the function \(f(\lambda)\) defined for all non-negative real numbers \(\lambda\) as the non-negative square root of \(\lambda\), we may define \(B=f(A)=\sum_i f(a_{i}) E_i\). It is clear that \(B^{2}=A\), and since the \(f(a_{i})\) are non-negative real numbers \(B\) is not only Hermitian but even non-negative.

At first glance it may seem hopeless to look for any uniqueness in the representation \(A=B^{2}\), since if we consider \(B^{\prime}=\sum_i \pm f(a_{i}) E_i\), with an arbitrary choice of sign in each place, we still have \(A=(B^\prime)^{2}\), with \(B^{\prime}\) Hermitian. The \(B\) we constructed, however, was non-negative, and we can show that this additional property guarantees uniqueness: in other words \(A=(B^{\prime})^{2}\), \(B^{\prime} \geq 0\) implies \(B^{\prime}=B\). For let \(B^{\prime}=\sum_j b_{j} F_{j}\) be the spectral form of \(B^{\prime}\): the \(b_{j}\) are different non-negative numbers. It follows that \[ (B^{\prime})^{2}=\sum_{j} b_{j}^{2} F_{j}=A=\sum_i a_i E_i, \] where the \(F_{j}\) are pairwise orthogonal projections of sum one and the \(b_{j}^{2}\) are pairwise different non-negative numbers. The uniqueness of the spectral form of \(A\) implies that the number of \(F\)’s is the same as the number of \(E\)’s and that after a suitable permutation of indices \(F_{i}=E_{i}\) and \(b_{i}^{2}=a_{i}\), whence \(b_{i}=f(a_{i})\), so that \(B^{\prime}=B\).
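A numerical rendering of the construction (Python with numpy, an added illustration; the non-negative \(A\) is manufactured as a Gram matrix):

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.standard_normal((3, 3))
A = X @ X.T                                 # non-negative: <Ax, x> = |X^T x|^2

w, V = np.linalg.eigh(A)                    # the a_i, all >= 0
B = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T   # B = sum_i f(a_i) E_i
print(np.allclose(B @ B, A))                          # B^2 = A
print(np.linalg.eigvalsh(B).min() >= -1e-10)          # B is non-negative
```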

12. Product of non-negative transformations

In (II.8) we stated the theorem that if \(A\) and \(B\) are commutative non-negative transformations then their product \(AB = BA\) is also non-negative. Using the results of the preceding section it is possible to give an easy proof of this statement. Let \(A=\sum_i a_{i} E_i\) and \(B=\sum_{j} b_{j} F_{j}\) be the spectral forms of \(A\) and \(B\) respectively, and let \(f(\lambda)\) be the same function (the non-negative square root) as we considered above. Then \(f(A)\) and \(f(B)\) are non-negative square roots of \(A\) and \(B\) respectively and since, in view of the comment in (III.10), they can also be written as polynomials in \(A\) and \(B\), the commutativity of \(A\) and \(B\) implies that of \(f(A)\) and \(f(B)\). (This fact is also easy to derive from the general commutativity theorem of (III.8.4).) Hence we have

\[ \begin{aligned} A B&=f(A)^{2} f(B)^{2}\\ &=f(A) f(A) \, f(B) f(B)\\ &=f(A) f(B) \, f(A) f(B)\\ &=\big(f(A) f(B)\big)^2. \end{aligned} \]

Since \(f(A)\) and \(f(B)\) are commutative Hermitian transformations their product is Hermitian, and therefore we have expressed \(A B=B A\) as the square of a Hermitian transformation. It follows that \(A B=B A\) is non-negative.
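The argument can be checked numerically. In the sketch below (Python with numpy and scipy, an added illustration) the commuting non-negative transformations are manufactured as diagonal matrices in a common orthonormal basis.

```python
import numpy as np
from scipy.linalg import sqrtm

# Commuting non-negative A and B: both diagonal in one orthonormal basis.
Q, _ = np.linalg.qr(np.random.default_rng(7).standard_normal((3, 3)))
A = Q @ np.diag([1.0, 2.0, 3.0]) @ Q.T
B = Q @ np.diag([4.0, 0.5, 1.0]) @ Q.T

fA, fB = sqrtm(A), sqrtm(B)                 # the non-negative square roots
print(np.allclose(A @ B, (fA @ fB) @ (fA @ fB)))   # AB = (f(A) f(B))^2
print(np.linalg.eigvalsh(A @ B).min() >= -1e-10)   # AB is non-negative
```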

13. Commutative Hermitian transformations

We have often made use of the fact that if \({C}\) is a Hermitian transformation and \(p(\lambda)\) and \(q(\lambda)\) are any two polynomials then \(A=p(C)\) and \(B=q(C)\) are commutative. In this section we shall prove a converse of this result.

Theorem 3. If \(A\) and \(B\) are commutative Hermitian transformations then there exists a Hermitian transformation \(C\) and polynomials \(p(\lambda)\) and \(q(\lambda)\) such that \(A=p(C)\) and \(B=q(C)\). \(C\) can be written in the form \(C=r(A, B)\), where \(r(\lambda, \mu)\) is a polynomial in the two variables \(\lambda\) and \(\mu\).

(We observe that ordinarily an expression of the form \(r(A, B)\) would be meaningless, since in general \(A\) and \(B\) are not commutative, without some convention as to the order in which \(A\) and \(B\) are to be substituted for \(\lambda\) and \(\mu\). Since in the present case, however, \(A\) and \(B\) are commutative by hypothesis, this difficulty does not enter.)

Proof. Let \(A=\sum_i a_i E_i\) and \(B=\sum_{j} b_{j} F_{j}\) be the spectral forms of \(A\) and \(B\). It follows from (III.8.4) and the commutativity of \(A\) and \(B\) that \(E_{i}\) and \(F_{j}\) are commutative projections for every \(i\) and \(j\). Let the projections \(E_{i} F_{j}\), enumerated in some order, be \(G_{k}\). Then, using the fact that \(\sum_{j} E_{i} F_{j}=E_{i}\) and \(\sum_i E_{i} F_{j}=F_{j}\), we have \(A=\sum_{k} a_{k}^{\prime} G_{k}\) and \(B=\sum_{k} b_{k}^\prime G_{k}\). Although it is true that the \(G_{k}\) are pairwise orthogonal projections of sum one, these representations are not the spectral forms of \(A\) and \(B\), since it need not be true that the \(a_{k}^\prime\) (or the \(b_k^\prime\)) are pairwise different. It is, however, true that for \(h \neq k\) the pair (considered, say, as a point of the real Euclidean plane) \((a_{h}^\prime, b_{h}^\prime)\) is different from the pair \((a_{k}^{\prime}, b_{k}^{\prime})\). Hence we may find a polynomial \(r(\lambda, \mu)\) which at the finite set of different points \((a_{k}^\prime, b_{k}^\prime)\) assumes, respectively, the pairwise different real values \(c_{k}^{\prime}\): \(r(a_{k}^{\prime}, b_{k}^{\prime}) = c_{k}^{\prime}\). We write \(C=r(A, B)\), and we proceed to prove that \(C\) has the required properties.

Since \(A^{i}=\sum_{k} (a_{k}^\prime)^i G_{k}\) and \(B^{j}=\sum_{k}(b_{k}^\prime)^j G_{k}\) it follows that \(A^{i} B^{j}=\sum_{k}(a_{k}^\prime)^{i}(b_{k}^\prime)^{j} G_k\), and, more generally, for any polynomial \(s(\lambda, \mu)\) in two variables, \(s(A, B)=\sum_{k} s(a_{k}^\prime, b_{k}^{\prime}) G_{k}\). Hence, in particular, \(C=\sum_k c_k^{\prime} G_{k}\). If we choose the polynomials \(p(\lambda)\) and \(q(\lambda)\) so that \(p(c_{k}^{\prime})=a_{k}^{\prime}\) and \(q(c_{k}^{\prime})=b_{k}^{\prime}\), we have \(p(C)=A\) and \(q(C)=B\), as was to be proved.
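The construction admits a direct numerical illustration (Python with numpy and scipy, an addition to the text). Here the commuting Hermitian pair is manufactured in a common orthonormal basis, \(C\) is assembled in that same basis (the theorem guarantees that it could equally be produced as a polynomial \(r(A,B)\)), and \(p\) and \(q\) are found by Lagrange interpolation.

```python
import numpy as np
from scipy.interpolate import lagrange

# Commuting Hermitian A and B, diagonal in a common orthonormal basis Q.
Q, _ = np.linalg.qr(np.random.default_rng(8).standard_normal((4, 4)))
A = Q @ np.diag([1.0, 1.0, 2.0, 3.0]) @ Q.T
B = Q @ np.diag([5.0, 6.0, 6.0, 7.0]) @ Q.T
print(np.allclose(A @ B, B @ A))                    # True

# The joint pairs (a'_k, b'_k) = (1,5), (1,6), (2,6), (3,7) are pairwise
# different, so we may assign pairwise different values c'_k = 0, 1, 2, 3.
c = np.array([0.0, 1.0, 2.0, 3.0])
C = Q @ np.diag(c) @ Q.T                            # C = r(A, B)

p = lagrange(c, [1.0, 1.0, 2.0, 3.0])               # p(c'_k) = a'_k
q = lagrange(c, [5.0, 6.0, 6.0, 7.0])               # q(c'_k) = b'_k

def matpoly(poly, M):
    """Evaluate a scalar polynomial at a matrix argument by Horner's rule."""
    R = np.zeros_like(M)
    for coef in poly.coeffs:
        R = R @ M + coef * np.eye(len(M))
    return R

print(np.allclose(matpoly(p, C), A), np.allclose(matpoly(q, C), B))  # True True
```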

This fact is a very useful one in the consideration of groups and algebras of matrices and their reducibility.

14. The correspondence \(U=\exp(2\pi i A)\)

Since a unitary transformation \(U\) is normal it has a spectral representation \(U=\sum_{j} u_{j} E_{j}\), where, of course, \(|u_{j}|=1\). Since we may write \(u_{j}\) in the form \(u_{j}=\exp (2 \pi i a_{j})\) with real numbers \(a_{j}\) (where \(i=(-1)^{1 / 2}\)), it follows that \(U=\exp (2 \pi i A)\), where \(A\) is the Hermitian transformation \(A=\sum_j a_{j} E_{j}\). Because of the periodicity of the exponential function we do not, of course, have uniqueness in this representation. Since, however, we may choose \(a_{j}\) so that \(0 \leq a_{j} <1\), the corresponding \(A\) will have the property that its lower bound is non-negative and its upper bound is \(< 1\), i.e. \(0 \leq A < 1\), and it is easily verified that this additional condition determines \(A\) uniquely. The converse, of course, is clear: if \(A\) is any Hermitian transformation, \(U=\exp (2 \pi i A)\) is unitary. In other words there is a one to one correspondence between all unitary transformations and those Hermitian transformations which are bounded between \(0\) and \(1\), established by the formula \(U=\exp (2 \pi i A)\). This result is the analog, in the algebra of transformations, of the fact that every complex number \(u\) of absolute value one can be written, in one and only one way, in the form \(u=\exp (2 \pi i a)\), with \(0 \leq a < 1\).
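A numerical sketch of the correspondence (Python with numpy and scipy, an added illustration): the eigenvalues of a random unitary \(U\) lie on the unit circle, the exponents \(a_{j}\) are normalized to \([0,1)\), and \(U\) is recovered as \(\exp(2\pi i A)\).

```python
import numpy as np
from scipy.linalg import schur, expm

# A random unitary U, from the QR factorization of a complex matrix.
rng = np.random.default_rng(9)
U, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))

T, Z = schur(U, output='complex')           # U = Z T Z*; T is diagonal since U is normal
u = np.diag(T)                              # eigenvalues, all of absolute value 1
a = np.mod(np.angle(u) / (2 * np.pi), 1.0)  # the unique choice with 0 <= a_j < 1
A = Z @ np.diag(a) @ Z.conj().T             # Hermitian, 0 <= A < 1

print(np.allclose(A, A.conj().T))           # True
print(np.allclose(expm(2j * np.pi * A), U)) # U = exp(2 pi i A)
```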

15. Cayley transform

The result of the preceding section was based on a one to one mapping between the unit circle in the complex plane and the half-open unit interval on the real line. Another mapping of real numbers into complex numbers of absolute value one is given by \(u=f(a)\), where \(f(a)=\dfrac{a+i}{a-i}\). It is easy to verify that this mapping is one to one between all real numbers \(a\) and all complex numbers \(u\) of absolute value one with the exception of \(u= 1\). It follows that the mapping \(U=f(A)\), (whose inverse mapping, incidentally, is \(A=g(U)\), where \(g(u)=i \dfrac{u+1}{u-1}\)), is a one to one correspondence between all Hermitian transformations \(A\) and those unitary transformations \(U\) which do not have the eigenvalue \(1\). Since the structure of unitary transformations is clear geometrically, this correspondence is often used to obtain information about Hermitian transformations that may be rather difficult to obtain in other ways. The unitary transformation \(U\) is said to be the Cayley transform of \(A\).
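A numerical check of the Cayley transform (Python with numpy, an added illustration on a random Hermitian matrix):

```python
import numpy as np

rng = np.random.default_rng(10)
X = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
A = (X + X.conj().T) / 2                       # Hermitian
I = np.eye(3)

U = (A + 1j * I) @ np.linalg.inv(A - 1j * I)   # U = f(A): the Cayley transform
print(np.allclose(U @ U.conj().T, I))          # U is unitary
print(np.allclose(1j * (U + I) @ np.linalg.inv(U - I), A))   # A = g(U)
```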

16. Polar decomposition

Every complex number \(b\) can be written in the form \(b = pu = p\exp(2\pi i a)\), where \(p\) is non-negative and \(u\) has absolute value one, or, equivalently, \(0 \leq a < 1\); if \(b \neq 0\) this can be done in one and only one way. We shall conclude this chapter by proving the analog, in the algebra of transformations, of this theorem. Although most of the results of this chapter are valid for normal transformations only, it is noteworthy that the theorem of this section is true in all generality, for arbitrary transformations \(B\).

Theorem 4. If \(B\) is an arbitrary linear transformation, there exists a non-negative transformation \(P\) and a unitary transformation \(U\) such that \(B=P U\). \(P\) is uniquely determined by \(B\); if \(B\) has an inverse, and only in this case, \(U\) is uniquely determined by \(B\).

Proof. Although it is not logically necessary, we shall first give the proof in the case where \(B\) has an inverse; the proof in the general case is an obvious modification of this one, and it will give greater insight into the structure of arbitrary transformations.

Since the transformation \(BB^{*}\) is non-negative (\(\langle BB^{*}x, x \rangle = |B^{*}x|^{2} \geq 0\)), we can find a non-negative transformation \(P\) such that \(P^{2}=BB^{*}\). Write \(V=B^{-1} P\); since \(B V=P\), the theorem will be proved if we can prove that \(V\) is unitary, for then we may choose \(U=V^{-1}\). Since \(V^{*}=P^{*}(B^{-1})^{*}=P(B^{*})^{-1}\), the relations \[ VV^{*}=B^{-1} P^{2}(B^{*})^{-1}=B^{-1} B B^{*}(B^{*})^{-1}=1 \] imply that \(V\) is unitary. To prove uniqueness observe that \(P U=P_{0} U_{0}\) implies \(U^{*} P=U_{0}^{*} P_{0}\) and therefore \(P UU^{*} P=P_{0} U_{0} U_{0}^{*} P_{0}\), or \(P^{2}=P_{0}^{2}\). Since a non-negative transformation has only one non-negative square root it follows that \(P=P_{0}\), and therefore (multiplying the relation \(P U=P_{0} U_{0}\) on the left by \(P^{-1}\)) \(U=U_{0}\). (\(P^{-1}\) exists since \(B=P U\) implies \(P=B U^{-1}\), a product of invertible transformations.)

We turn now to the general case where we do not assume that \(B^{-1}\) exists. We form \(P\) in exactly the same way as in the preceding proof, so that \(P^{2}=BB^{*}\), and we observe that for any vector \(x\) we have \[ \begin{aligned} |P x|^{2} &= \langle P x, P x \rangle\\ &= \langle P^{*} P x, x \rangle \\ &= \langle P^{2} x, x \rangle\\ &= \langle B B^{*} x, x \rangle\\ &= \langle B^{*} x, B^{*} x \rangle\\ &=|B^{*} x|^{2}. \end{aligned} \] Hence if for every vector \(y\) in the linear subspace of vectors of the form \(Px\) we define \(V y=B^{*} x\), the transformation \(V\) is, wherever it is defined, length preserving. We must show that \(V\) is uniquely determined by the condition given: i.e., that \(Px^{\prime}=Px^{\prime\prime}\) implies \(B^{*} x^{\prime}=B^{*} x^{\prime\prime}\). This is true since \(P(x^{\prime}-x^{\prime\prime})=0\) is equivalent to \(|P(x^{\prime}-x^{\prime\prime})|=0\) and this latter fact implies \(|B^{*}(x^{\prime}-x^{\prime\prime})|=0\). Thus \(V\) maps the set of vectors of the form \(Px\) isometrically onto the set of vectors of the form \(B^{*}x\); these two subspaces consequently have the same dimension, and so do their orthogonal complements. If we define \(V\) on the orthogonal complement of the set of vectors of the form \(Px\) to be an arbitrary length preserving linear transformation onto the orthogonal complement of the set of vectors of the form \(B^{*}x\), the transformation \(V\), thus determined on all of \(\mathcal{U}\), is unitary and has the property that \(VPx = B^{*} x\) for all \(x\). Hence \(B=P V^{*}\) and the theorem follows upon setting \(U=V^{*}\). The uniqueness of \(P\) is proved the same way as in the case when \(B^{-1}\) exists; the extent of non-uniqueness of \(U\) is clear from the proof, in which at one place we were free to make an arbitrary choice for \(V\).

We shall refer to the representation \(B=P U\) as the polar decomposition of \(B\). Applying the theorem just proved to \(B^{*}\) instead of \(B\) we obtain the result that every \(B\) can be written in the form \(B={W P}\) with a unitary \(W\) and a non-negative \(P\).
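Numerically the polar decomposition is a standard routine. The sketch below (Python with numpy and scipy, an added illustration) uses the side='left' option so that the factors appear in the order \(B=PU\) of Theorem 4, and verifies that \(P\) is the non-negative square root of \(BB^{*}\).

```python
import numpy as np
from scipy.linalg import polar, sqrtm

rng = np.random.default_rng(11)
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

U, P = polar(B, side='left')                # B = P U, with P >= 0 and U unitary
print(np.allclose(P @ U, B))                # True
print(np.allclose(U @ U.conj().T, np.eye(3)))   # U is unitary
print(np.allclose(P, sqrtm(B @ B.conj().T)))    # P^2 = B B*
```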

17. Polar conditions for normality

In terms of the polar decomposition we can state another necessary and sufficient condition that \(B\) be normal. If \(B=P U\) is the polar decomposition of \(B\), then \(B\) is normal if and only if \(PU = UP\). (Since \(U\) is not uniquely determined by \(B\), this statement is to be interpreted as follows: if \(P\) and \(U\) commute for some polar decomposition of \(B\), then \(B\) is normal; and if \(B\) is normal, then \(P\) and \(U\) commute for every polar decomposition of \(B\).) Since \(B^{*} B=U^{*} P^{2} U\) and \(B B^{*}=P^{2}\), it is clear that \(B\) is normal if and only if \(U\) and \(P^{2}\) are commutative. If \(U\) and \(P\) commute then certainly \(U\) and \(P^{2}\) do, so that \(B\) is normal. If \(B\) is normal we have \(U^{*} P^{2} U=P^{2}=A\) (say), where \(A\) is non-negative; the relations \(A=(U^{*}PU)^{2}\) and \(A = P^2\), together with the uniqueness theorem for non-negative square roots, imply that \(U^{*}PU=P\), i.e., the commutativity of \(U\) and \(P\).
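Finally, a numerical check of this criterion on a normal, invertible, non-Hermitian \(B\) (Python with numpy and scipy, an added illustration; the eigenvalues are prescribed arbitrarily):

```python
import numpy as np
from scipy.linalg import polar

# A normal, invertible, non-Hermitian B with prescribed complex eigenvalues.
rng = np.random.default_rng(12)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
B = Q @ np.diag([1.0 + 1.0j, 2.0, -3.0j]) @ Q.conj().T
print(np.allclose(B @ B.conj().T, B.conj().T @ B))   # B is normal

U, P = polar(B, side='left')                # B = P U
print(np.allclose(P @ U, U @ P))            # P and U commute
```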