Linear Transformations and Matrices
Table of Contents
- 1. Linear transformations in vector spaces
- 2. The dual of a transformation
- 3. Matrices associated with linear transformations
- 4. Isomorphism of matrices and transformations
- 5. The forms \(\langle Ax, x\rangle\) and \(\langle Ax, y \rangle\)
- 6. Hermitian transformations
- 7. Positive definite transformations
- 8. Algebraic combinations of Hermitian and definite transformations
- 9. Matricial characterizations of Hermitian transformations
- 10. Unitary transformations
- 11. Automorphisms of inner product spaces
- 12. Change of basis in an inner product space
- 13. Matricial characterization of unitary transformations
- 14. Orthogonal projections in an inner product space
- 15. Products of projections
- 16. Sums of projections
- 17. Differences of projections
- 18. Relation between projections and involutions
- 19. The rank of a linear transformation
- 20. The norm of a linear transformation
- 21. Expressions for the norm of a transformation
- 22. Upper and lower bounds of a Hermitian transformation
1. Linear transformations in vector spaces
A linear transformation \(A\) in a vector space \(\mathcal{V}\) is a correspondence which assigns to every vector \(x\) in \(\mathcal{V}\) another vector \(A x\) in \(\mathcal{V}\) in such a way that \(A(ax + by)=aAx + bAy\), for any two vectors \(x\) and \(y\) and any two complex numbers \(a\) and \(b\).
If \(A\) is any linear transformation and \(a\) any complex number, we define a linear transformation \(B=aA\) by \(Bx = aAx\). For any two linear transformations \(A\) and \(B\) we define their sum \(C=A+B\) by \(C x=A x+B x\) and their product \(D=AB\) by \(D x=A(B x)\). Two special linear transformations of interest are \(0\), defined by \(0 x=0\) for all \(x\), and \(1\), defined by \(1 x=x\) for all \(x\). If the correspondence between \(x\) and \(y=Ax\) happens to be one to one, in other words if every vector \(y\) can be written in the form \(Ax\) in one and only one way, we may define a transformation \(B=A^{-1}\) by \(B y=x\). It is easy to verify that \(A^{-1}\) is a linear transformation.
(The transformations we call linear are sometimes called homogeneous linear transformations: i.e., they have the property that \(A0=0\).)
We record, without proof, the following formulas, valid for all linear transformations; the proofs are immediate from the definitions. \[ \begin{aligned} A+(B+C)&=(A+B)+C\\ A+B&=B+A\\ A(B C)&=(A B) C \\ A(B+C)&=A B+A C\\ (A+B) C&=A C+B C\\ a(A+B)&=a A+a B \\ (a+b) A&=a A+b A\\ a(b A)&=(a b) A\\ A+0&=A \\ A1&=1A=A\\ A A^{-1}&=A^{-1} A=1 \end{aligned} \]
The associative law enables us to define for every positive integer \(n\) the transformation \(A^{n}\) by the recursive definition \(A^{n}=A A^{n-1}\), \(A^{1}=A\). Although in general the multiplication of two transformations is not commutative, and in fact much of the difficulty and interest in the theory is due to this fact, for powers of one transformation we do have \(A^n A^m = A^{n + m} = A^m A^n\). If we make the convention that \({A}^{0}=1\) and if we define, in case \(A^{-1}\) exists, \(A^{-n}\), for any positive integer \(n\), by \(A^{-n}=(A^{-1})^{n}\) then the calculus of powers of a single linear transformation is exactly the same as in ordinary arithmetic. In accordance with this comment and the properties of addition and scalar multiplication, for every polynomial \(p(z)=a_{0}+a_{1} z+\cdots+a_{n} z^{n}\) we may write \(p(A)\) as an abbreviation for the linear transformation \(\sum_{i=0}^{n} a_{i} A^{i}\). These ideas will be very useful to us later.
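As a concrete illustration of the calculus of powers and polynomials just described, here is a minimal numerical sketch (assuming Python with numpy; the matrix and the polynomial coefficients are invented for the example) which evaluates \(p(A)=\sum_{i=0}^{n} a_{i}A^{i}\) with the convention \(A^{0}=1\).

```python
# A minimal sketch (assumed: Python with numpy).  The matrix A and the
# polynomial coefficients are invented for the example.
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
coeffs = [1.0, -2.0, 1.0]            # p(z) = 1 - 2z + z**2 = (z - 1)**2

def poly_of(A, coeffs):
    """Return sum_i coeffs[i] * A**i, with A**0 taken to be the identity."""
    result = np.zeros_like(A)
    power = np.eye(A.shape[0])       # A**0 = 1
    for a in coeffs:
        result = result + a * power
        power = power @ A            # next power of A
    return result

print(poly_of(A, coeffs))            # agrees with (A - I) @ (A - I) here
```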
We make also some more comments about \({A}^{-1}\), the inverse of \(A\). We observe first that a necessary and sufficient condition that \(A\) have an inverse (in other words, that \(A\) be a one to one transformation) is that \(Ax=0\) imply \(x=0\). The necessity of this condition is obvious: we proceed to the proof of sufficiency. Since \(A x=A y\) implies \(A(x-y)=0\), the condition yields \(x-y=0\), that is \(x=y\), so that \(A\) is one to one as far as it goes. It remains to prove only that every vector has the form \(A x\). For this purpose let \(x_{1}, x_{2}, \ldots, x_{n}\) be any \(n\) linearly independent vectors in \(\mathcal{V}\): we claim that \(A x_{1}, A x_{2}, \ldots, A x_{n}\) are also linearly independent. For if we had \(\sum_{i} c_{i} A x_{i}=0\), then \(A\big(\sum_{i} c_{i} x_{i}\big)=0\), whence \(\sum_{i} c_{i} x_{i}=0\) and the linear independence of the \(x_{i}\) implies that \(c_{i}=0\) for all \(i\). Hence the set of all vectors of the form \(A x\), which is clearly a linear subspace, contains \(n\) linearly independent vectors and is therefore \(n\)-dimensional. It follows that it must coincide with \(\mathcal{V}\).
We have already stated that \(A A^{-1}=A^{-1} A=1\). We claim now that these equations are characteristic of \(A^{-1}\): in other words if any linear transformation \(B\) exists for which \(A B=1\) then \(B=A^{-1}\). For
\[ B=(A^{-1} A) B=A^{-1}(A B)=A^{-1} 1=A^{-1}. \]
Similarly we could prove that if there is a \(C\) for which \(C A=1\) then \(C=A^{-1}\). It is an immediate consequence of this result that \((A^{-1})^{-1}=A\) and \((A B)^{-1}=B^{-1} A^{-1}\), and since \(c A=(c 1) A\), that \((c A)^{-1}=c^{-1} A^{-1}\).
2. The dual of a transformation
Let \(A\) be a linear transformation in an inner product space \(\mathcal{U}\) and consider the expression \(\langle A x, y \rangle\). Since \[ \big\langle A(a_{1} x_{1}+a_{2} x_{2}), y\big\rangle = a_{1}\langle A x_{1}, y \rangle + a_{2}\langle A x_{2}, y \rangle, \] \(\langle A x, y \rangle\) is for each fixed \(y\) a linear function of \(x\). Hence, by the theorem of (I.15), there exists a uniquely determined vector, say \(y^{*}\), for which \(\langle Ax, y \rangle = \langle x, y^{*} \rangle\) for all \(x\). We denote the correspondence which assigns to every vector \(y\) the vector \(y^*\) as just defined by \(B\): \(B y=y^*\). We prove that \(B\) is a linear transformation. For if \(\langle A x, y_{1} \rangle = \langle x, y_{1}^{*} \rangle\) and \(\langle A x, y_{2} \rangle = \langle x, y_{2}^{*} \rangle\), where \({y}_{1}^{*}=B{y}_{1}\) and \({y}_{2}^{*}=By_{2}\), then by multiplying the two equations involving inner products by \(\bar{a}_{1}\) and \(\bar{a}_{2}\) respectively and adding, we obtain \[ \langle A x, a_{1} y_{1}+a_{2} y_{2} \rangle = \langle x, a_{1} y_{1}^{*}+a_{2} y_{2}^{*} \rangle, \] so that (using the uniqueness statement in (I.15)) \[ B(a_{1} y_{1}+a_{2} y_{2})=a_{1} B y_{1}+a_{2} B y_{2}, \] as was to be proved.
The process just described associates with every linear transformation \(A\) another linear transformation \(B\) which we shall denote by \(A^{*}\) and call the dual of \(A\). (This terminology is justified by the well known geometric language of duality: roughly speaking, when \(A\) is applied in the space of vectors, \(A^{*}\) is applied in the dual space of hyperplanes.) The dual of \(A\) is uniquely characterized by the fundamental relation \(\langle Ax, y \rangle = \langle x, A^{*} y \rangle\). If we denote the dual of the dual of \(A\) by \(A^{**}\) then the relation
\[ \overline{\langle y, A^{*} x \rangle} = \langle A^{*} x, y \rangle = \langle x, A^{**}y \rangle = \overline{\langle A^{**}y, x \rangle} \]
implies (interchanging the roles of \(x\) and \(y\) and removing the conjugates) that for all \(x\) and \(y\) we have \(\langle A^{**}x, y \rangle = \langle x, A^{*} y \rangle = \langle A x, y \rangle\), so that \(A^{**}=A\).
The relation of the dual to the previously introduced operations (among linear transformations) of addition, multiplication, and scalar multiplication is completely described by the following identities: \[ \begin{aligned} (A+B)^{*}&=A^{*}+B^{*},\\ (A B)^{*}&=B^{*}{A}^{*},\\ (a A)^{*}&=\bar{a}A^{*}. \end{aligned} \] The proofs of these identities will be found in the equations: \[ \begin{aligned} \big\langle (A + B)x, y\big\rangle &= \langle A x, y \rangle + \langle B x, y \rangle \\ &= \langle x, A^{*} y \rangle + \langle x, B^{*} y \rangle \\ &= \big\langle x, (A^{*}+B^{*}) y\big\rangle\\ \langle A B x, y \rangle &= \langle B x, A^{*} y \rangle = \langle x, B^{*} A^{*} y\rangle\\ \langle a A x, y \rangle &= a\langle A x, y \rangle = a\langle x, A^{*} y \rangle = \langle x, \bar{a}{A}^{*} y \rangle \end{aligned} \]
We observe that \(0^{*}=0\) and \(1^{*}=1\), and that in case \(A\) has an inverse then \(A^{*}\) does also and \((A^{*})^{-1}=(A^{-1})^{*}\). The latter fact follows from the identity \[ (A^{-1})^{*} A^{*}=\big(A(A^{-1})\big)^{*}=1^{*}=1. \]
3. Matrices associated with linear transformations
Let \(x_{1}, x_{2}, \ldots, x_{n}\) be an orthonormal basis in the inner product space \(\mathcal{U}\) of dimension \(n\), and let \(A\) be a linear transformation on \(\mathcal{U}\). Since for every \(j\), \(j=1,2, \ldots, n\), \(A x_{j}\) is a vector in \(\mathcal{U}\) it may, in virtue of (I.11.4), be written (uniquely) as a linear combination of the \(x_{i}\), say \(A x_{j} = \sum_{i} a_{i j} x_{i}\). The set \((a_{i j})\) of \(n^{2}\) indexed complex numbers is a matrix; a matrix is usually written in the form of a square array, \[ \begin{aligned} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}. \end{aligned} \]
We shall consistently use the following notation. Capital Latin letters denote (as before) linear transformations, the corresponding lower case letters with double subscripts will be the elements of the corresponding matrix, and the capital letters in square brackets will stand for the matrix itself. When a linear transformation is distinguished by symbols, as \(A^{*}\) or \(A^{\prime}\), the corresponding matrix elements and matrices will be denoted by \(a_{i j}^{*}\) or \(a_{ij}^\prime\) and \([A^*]\) or \([A^\prime]\) respectively. Two matrices \([A]\) and \([B]\) are equal if \(a_{i j}=b_{i j}\) for every \(i\) and \(j\).
With the aid of a fixed orthonormal basis \(\{x_{i}\}\) we have associated a matrix \([A]\) with every linear transformation \(A\): the correspondence is described by the relations \(A x_{j}=\sum_{i} a_{ij} x_i\). We assert now that this correspondence is one to one. For let \(A\) be a linear transformation and let \(x\) be any vector. Then \(x\) is a linear combination of the vectors \(x_{j}\) of the basis, say \(x=\sum_{j} c_{j} x_{j}\), and the linearity of \(A\) implies that \[ A x=\sum_{j} c_j Ax_{j}=\sum_{j} c_{j} \sum_{i} a_{i j} x_{i}=\sum_{i} \Big(\sum_{j} a_{i j} c_{j}\Big) x_{i}, \]
so that the vector \(x\) whose \(j\)-th coordinate in the coordinate system \(\{x_i\}\) is \(c_{j}\) becomes, upon application of the transformation \(A\), the vector whose \(i\)-th coordinate in the same coordinate system is \(\sum_{j} a_{i j} c_{j}\). Conversely if \((a_{ij})\) is any matrix we may define a transformation \(A\) by the formula \[ A\Big(\sum_{j} c_{j} x_{j}\Big)=\sum_{i}\Big(\sum_{j} a_{ij} c_{j}\Big) x_{i}; \]
it is easy to verify that \(A\) is a linear transformation whose corresponding matrix is precisely \([A]=(a_{ij})\). We emphasize the fundamental fact that this one to one correspondence was set up by means of a particular coordinate system \(\{x_{i}\}\) and that as we pass from one coordinate system to another the same linear transformation may correspond to several matrices and one matrix may be the correspondent of several linear transformations. In fact the relation between the different matrices that may correspond to one linear transformation in various coordinate systems will be the object of study in much of what follows.
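The passage from a transformation to its matrix is easily illustrated numerically. The following small sketch (an illustration only; it assumes Python with numpy, the standard inner product on complex \(n\)-tuples, and an orthonormal basis obtained from a QR factorization) reads off the matrix \((a_{ij})\) from the relation \(a_{ij}=\langle Ax_{j}, x_{i}\rangle\), which follows from \(Ax_{j}=\sum_{i}a_{ij}x_{i}\) by taking inner products with \(x_{i}\).

```python
# An illustrative sketch (assumed: numpy, standard inner product on complex
# n-tuples).  The basis {x_i} is the set of columns of a unitary matrix Q, and
# a_ij = <A x_j, x_i> recovers the matrix of A in that basis.
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

# entry (i, j) is <A x_j, x_i>; np.vdot conjugates its first argument
matrix_of_A = np.array([[np.vdot(Q[:, i], A @ Q[:, j]) for j in range(n)]
                        for i in range(n)])
assert np.allclose(matrix_of_A, Q.conj().T @ A @ Q)   # same thing, computed at once
```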
4. Isomorphism of matrices and transformations
Although the matrix associated with a linear transformation depends on a particular coordinate system, several properties of the correspondence between transformations and matrices are the same in all coordinate systems. In this section we study some of these properties. Throughout we assume that \(\{x_{i}\}\) is an arbitrary but fixed coordinate system and that the matrices we discuss are related to linear transformations by means of this system.
If \(x=\sum_{j} c_{j} x_{j}\) and \(k\) is any complex number, then \(k x=\sum_{j} k c_{j} x_{j}\), and \(k A x=A k x=\sum_{i}\big(\sum_{j} k a_{ij} c_{j}\big) x_{i}\), so that the matrix \([kA]\), corresponding to \({kA}\), has the elements \({ka}_{ij}\).
Similarly \[ \begin{aligned} (A+B)x &= A x+B x\\ &=\sum_{i}\Big(\sum_{j} a_{ij} c_{j}\Big) x_{i}+\sum_{i}\Big(\sum_{j} b_{ij} c_{j}\Big) x_{i} \\ &=\sum_{i}\Big(\sum_{j}(a_{ij}+b_{ij}) c_{j}\Big) x_{i}, \end{aligned} \] so that the matrix \([A+B]\) has elements \((a_{i j}+b_{i j})\).
Also \[ \begin{aligned} A B x&=A\bigg(\sum_{k}\Big(\sum_{j} b_{k j} c_{j}\Big) x_{k}\bigg)\\ &=\sum_{i}\Big(\sum_{k} a_{i k} \sum_{j} b_{k j} c_{j}\Big) x_{i} \\ &=\sum_{i}\bigg(\sum_{j}\Big(\sum_{k} a_{i k} b_{k j}\Big) c_{j}\bigg) x_{i}, \end{aligned} \] so that \([A B]=\big(\sum_{k} a_{ik} b_{kj}\big)\).
Finally if \(y=\sum_{i} d_{i} x_{i}\) then we have, (using Parseval’s identity, (I.11.5)),
\[
\begin{aligned}
\langle Ax, y \rangle &= \sum_i \sum_j a_{ji} c_i \bar{d}_j\\
&= \sum_i c_i \Big(\sum_j a_{ji} \bar{d}_j\Big)\\
&= \sum_i c_i \overline{\Big(\sum_j b_{ij} d_j\Big)}\\
&= \langle x, By \rangle,
\end{aligned}
\]
where \(b_{ij}=\bar{a}_{j i}\); in other words, the matrix of the dual \(B=A^{*}\) is the conjugate transpose of \([A]\).
We observe that for \(A=0\), \(a_{ij}=0\) and for \(A=1\), \(a_{ij}=\delta_{ij}\).
A simple way of summing up the results of this section is the following. For a matrix \([A]\), (not for a linear transformation!) we define \(k[A]=[B]\) by \(b_{ij}=k a_{ij}\); and we define the conjugate transpose \([C]=[{A}]^{*}\) by \(c_{ij}=\bar{a}_{j i}\). If moreover for any two matrices \([A]\) and \([B]\) we define their sum \([C]=[A]+[B]\) by \(c_{ij}=a_{ij}+b_{ij}\), and their product \([D]=[A][B]\) by \(d_{ij}=\sum_{k} a_{i k} b_{k j}\), then our result is that the correspondence established by means of an arbitrary coordinate system between the set of all linear transformations of \(\mathcal{U}\) and the set of all square matrices of \(n\) rows and \(n\) columns is an isomorphism: i.e., it preserves addition, multiplication, and scalar multiplication, and makes the dual of a transformation correspond to the conjugate transpose of its matrix.
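A short numerical check of the last statement, namely that the dual corresponds to the conjugate transpose (an illustration only, assuming numpy and the standard inner product \(\langle u, v\rangle=\sum_{k}u_{k}\bar{v}_{k}\) on complex \(n\)-tuples):

```python
# Illustrative check (assumed: numpy, standard inner product on complex
# n-tuples): with [A*] the conjugate transpose of [A] we have
# <Ax, y> = <x, A*y> for arbitrary vectors x and y.
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A_star = A.conj().T                      # conjugate transpose

def inner(u, v):
    return np.vdot(v, u)                 # <u, v> = sum_k u_k * conj(v_k)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.isclose(inner(A @ x, y), inner(x, A_star @ y))
```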
We return now to our general study of linear transformations without reference to any orthonormal basis. Periodically we shall stop to interpret our results in terms of the language and notation of matrices.
5. The forms \(\langle Ax, x\rangle\) and \(\langle Ax, y \rangle\)
With every linear transformation \(A\) we may associate the expressions \(\langle A x, x \rangle\) and \(\langle A x, y \rangle\). We may consider these as numerical valued functions, defined by means of \(A\), of a single vector \(x\) or of a pair of vectors \(x\) and \(y\), respectively. Properties of these functions are intimately connected with properties of the linear transformation \(A\): we shall study this subject in more detail later. At present we observe only two simple facts. First: if \(\langle A x, y \rangle = 0\) for all \(x\) and \(y\) then \(A=0\). For in particular we may choose \(y=A x\) and obtain \(|A x|^{2}=0\), whence \(A x=0\) for all \(x\). Second: if \(\langle A x, x \rangle = 0\) for all \(x\) then \(A=0\). The proof of this statement is less trivial: it depends on a standard technique called polarization. If \(\langle A x, x \rangle = 0\) for all \(x\) then for every pair of vectors \(x\) and \(y\) and every complex number \(c\) we have \[ \begin{aligned} 0&=\big\langle A(c x+y), (c x+y)\big \rangle\\ &=|c|^{2}\langle A x, x \rangle + \langle A y, y \rangle + c\langle A x, y \rangle + \bar{c}\langle A y, x \rangle, \end{aligned} \] so that since the first two terms of the right member vanish we obtain \[ c\langle A x, y \rangle +\bar{c}\langle A y, x \rangle = 0. \] Choosing first \(c=1\) and then \(c=i=(-1)^{1 / 2}\) we obtain the equations \[ \langle A x, y \rangle + \langle A y, x \rangle = 0, \quad i\langle A x, y \rangle - i\langle A y, x \rangle = 0. \] Dividing the second equation by \(i\) and then forming the arithmetic mean of the two shows that \(\langle A x, y \rangle = 0\) for all \(x\) and \(y\), so that, by our first result, \(A=0\).
We observe that our second result is not true if we restrict ourselves to real spaces: the proof of course breaks down at our choice, \(c=i\). For example a \(90^{\circ}\) rotation in the plane clearly has the property that it sends every vector \(x\) into a vector \(Ax\) which is orthogonal to \(x\).
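The counterexample just mentioned is easy to check numerically; a minimal sketch (assuming numpy) with the \(90^{\circ}\) rotation of the real plane:

```python
# Illustrative sketch (assumed: numpy).  In the real plane the 90-degree
# rotation R satisfies <Rx, x> = 0 for every x although R != 0, so the
# polarization argument genuinely requires complex scalars.
import numpy as np

R = np.array([[0.0, -1.0],
              [1.0,  0.0]])               # rotation by 90 degrees

rng = np.random.default_rng(2)
for _ in range(5):
    x = rng.standard_normal(2)
    assert abs(np.dot(R @ x, x)) < 1e-12  # <Rx, x> = 0 for all real x
```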
6. Hermitian transformations
In many fundamental respects the algebraic system of all linear transformations on an inner product space \(\mathcal{U}\) resembles the set of all complex numbers. In both systems the notions of addition, multiplication, \(0\) and \(1\) are defined and have similar properties; and, moreover, in both systems there is defined a conjugation (\(A \to A^{*}\) and \(z \to\bar{z}\), respectively): i.e., an involutory conjugate automorphism of the system on itself. We shall use this analogy as a heuristic principle and we shall attempt to carry over to linear transformations some well known concepts of the complex domain.
When is a complex number real? Clearly a necessary and sufficient condition for the reality of \(z\) is that \(z=\bar{z}\). We might accordingly (remembering that the analog of the complex conjugate for linear transformations is the dual) define a linear transformation to be real if \(A=A^{*}\). Actually linear transformations \(A\) for which \(A=A^{*}\) are called Hermitian. (Since the dual of a transformation is sometimes called its adjoint, Hermitian transformations are sometimes called self-adjoint. Other terms are symmetric and Hermitian symmetric. The reason for the latter terminology will appear later when we examine the corresponding matricial concept.) The following theorem shows that Hermitian transformations are tied up with reality in more ways than through the formal analogy that suggested their definition.
Theorem 1. A necessary and sufficient condition that \(A\) be Hermitian is that \(\langle A x, x \rangle\) be real for all \(x\).
Proof. For if \(A=A^{*}\), then \[ \langle A x, x \rangle = \langle x, A^{*} x \rangle = \langle x, Ax \rangle = \overline{\langle Ax, x \rangle}, \] so that \(\langle A x, x \rangle\) is equal to its own conjugate and is therefore real. Conversely, if \(\langle A x, x \rangle\) is always real then \[ \langle A x, x \rangle = \overline{\langle Ax, x \rangle }=\overline{\langle x, A^{*} x \rangle} = \langle A^{*} x, x \rangle, \] so that \(\big\langle (A-A^{*})x, x \big\rangle =0\) for all \(x\), whence, by (II.5), \(A - A^{*}=0\).■
We remark that this theorem is also false in real spaces. For in the first place its proof depends on a lemma that is valid only in complex inner product spaces, and in the second place in a real space the reality of \(\langle Ax, x \rangle\) (in fact of \(\langle Ax, y \rangle\)) is a condition automatically satisfied by all \(A\), whereas the condition \(A=A^{*}\), or equivalently \(\langle A x, y \rangle = \langle x, A y \rangle\), need not be satisfied. It is not difficult to verify that the example given in (II.5) is a counter example to this theorem in real spaces.
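A numerical illustration of Theorem 1 (a sketch only, assuming numpy and the standard inner product on complex \(n\)-tuples; the matrices are invented for the example):

```python
# Illustrative sketch (assumed: numpy, standard inner product).  For a
# Hermitian matrix H the number <Hx, x> is real; for a generic non-Hermitian
# matrix B it usually is not.
import numpy as np

rng = np.random.default_rng(3)
n = 3
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
H = B + B.conj().T                       # Hermitian: H = H*

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
quad_H = np.vdot(x, H @ x)               # equals <Hx, x>
quad_B = np.vdot(x, B @ x)               # equals <Bx, x>
print(abs(quad_H.imag))                  # ~ 0
print(abs(quad_B.imag))                  # generally not 0
```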
7. Positive definite transformations
When is a complex number non-negative? Two equally natural necessary and sufficient conditions are that \(z\) may be written in the form \(z=u^{2}\) for some real \(u\), or that \(z\) may be written in the form \(z=v \bar{v}\) for some complex \(v\). Remembering also the fact that the Hermitian property of a transformation \(A\) can be described in terms of the function \(\langle A x, x \rangle\), we may consider any one of the following three conditions and attempt to use it as the definition of a transformation being non-negative.
7.1. \(A = B^2\), \(B\) Hermitian,
7.2. \(A = C^* C\), \(C\) arbitrary,
7.3. \(\langle Ax, x \rangle \geq 0\) for all \(x\).
Before deciding on which one of these three conditions to use as definition we prove that the following implication relations hold: \[ (7.1) \implies (7.2) \implies (7.3). \]
For if \(A=B^{2}\) with a Hermitian \(B\), i.e., with a \(B\) for which \(B^{*}=B\), then \(A=B B=B^{*} B\). And if \(A=C^{*} C\) then \[ \langle A x, x \rangle = \langle C^{*} C x, x \rangle = \langle C x, C x \rangle = |C x|^{2} \geq 0. \]
It is actually true that (7.3) implies (7.1), so that the three conditions are equivalent, but we shall not be able to prove this till much later. We adopt as our definition the third condition: a linear transformation \(A\) is non-negative, in symbols \(A\geq 0\), if for all \(x\), \(\langle A x, x \rangle \geq 0\).
(Non-negative transformations are usually called positive semidefinite. If \(A\geq 0\) and \(\langle A x, x \rangle = 0\) implies \(x=0\) we call \(A\) positive definite.) It follows from the theorem in (II.6) that \(A\geq 0\) implies that \(A\) is Hermitian. The transformations \(0\) and \(1\) are non-negative.
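A brief numerical illustration of the implication \((7.2)\implies(7.3)\) (a sketch, assuming numpy and the standard inner product; the matrix \(C\) is arbitrary):

```python
# Illustrative sketch (assumed: numpy, standard inner product).  For an
# arbitrary C the transformation A = C*C satisfies <Ax, x> = |Cx|^2 >= 0.
import numpy as np

rng = np.random.default_rng(4)
n = 3
C = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = C.conj().T @ C                       # A = C*C

for _ in range(5):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    val = np.vdot(x, A @ x)              # equals <Ax, x>
    assert abs(val.imag) < 1e-10         # real ...
    assert val.real >= -1e-10            # ... and non-negative
```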
8. Algebraic combinations of Hermitian and definite transformations
We discuss the relation of the two concepts just defined to our preceding notions. If \(A\) and \(B\) are both Hermitian then, since \((A+B)^{*}=A^{*}+B^{*}\), \(A+B\) is also. If \(A\) is Hermitian then \(cA\) is Hermitian if and only if \(c\) is real. This follows immediately from the fact that \((c A)^{*}=\bar{c} A^{*}\). If \(A\) and \(B\) are both Hermitian then \(A B\) is Hermitian if and only if \(A B=B A\). This is a consequence of the relations \((A B)^{*}=B^*{A}^{*}=B A\). Since \((A^{-1})^{*}=(A^{*})^{-1}\), \(A^{-1}\) is Hermitian if and only if \(A\) is. Similarly the relation \((B^*{A B})^{*}=B^{*} A^{*} B\) implies that \(B^{*} AB\) is Hermitian along with \(A\). A converse to the last statement is that if \(B\) has an inverse and \(B^{*} A B\) is Hermitian then so is \(A\). (We remark that both the direct and converse statements are valid for not necessarily Hermitian \(B\).) For if \(B\) has an inverse, every vector \(x\) may be written in the form \(x=B y\), and since \(\langle A x, x \rangle = \langle A B y, B y \rangle = \langle B^{*} A B y, y \rangle\), the reality of the last term for all \(y\) implies the reality of the first for all \(x\).
The formulas used in the proofs in the preceding paragraph prove also that \(A+B\) is non-negative if both \(A\) and \(B\) are, that if \(A\geq 0\) then \(cA\) is non-negative if and only if \({c}\) is, that \({B}^{*} {AB}\) is non-negative along with \(A\), and that, conversely, if \(B\) has an inverse then the non-negativeness of \(B^{*} AB\) implies that of \(A\). It is true also that if \(A\) and \(B\) are non-negative and commutative then \(AB\) is non-negative, but we shall have to postpone the proof of this statement until later. Neither this statement nor the one concerning Hermitian transformations is true without the restriction of commutativity.
9. Matricial characterizations of Hermitian transformations
If \(\{x_{i}\}\) is any coordinate system in \(\mathcal{U}\) then a necessary and sufficient condition that a linear transformation \(A\) be Hermitian is that the matrix \([A]\) corresponding to \(A\) in this coordinate system satisfy the equation \([A]=[A]^{*}\), or, in other words, that we have \(a_{ij} = \bar{a}_{ji}\) for all \(i\) and \(j\). This explains, incidentally, why Hermitian transformations are sometimes called Hermitian symmetric. A similar matricial characterization of non-negative transformations is possible, but the conditions on the \(a_{ij}\) are much more complicated and since we shall not have any occasion to use them we do not enter on this subject here. We shall, however, refer to matrices (not linear transformations!) as Hermitian (or non-negative) if their associated linear transformations are Hermitian (or non-negative).
10. Unitary transformations
When does a complex number \(z\) have absolute value \(1\)? Clearly \(\bar{z}=z^{-1}\) is a necessary and sufficient condition: guided by our heuristic principle we are led to consider linear transformations \(U\) for which \(U^{*}=U^{-1}\). Such transformations are called unitary. Concerning unitary transformations we prove the following theorem.
Theorem 2. 10.1. A necessary and sufficient condition that \(U\) be unitary is that \(\langle Ux, Uy \rangle = \langle x, y \rangle\) for all \(x\) and \(y\).
10.2. A necessary and sufficient condition that \(U\) be unitary is that \(|U x|=|x|\) for all \(x\).
Proof. Since the condition of (10.1) is obviously stronger than that of (10.2), (i.e., if \(\langle Ux, Uy \rangle = \langle x, y \rangle\) then \(\langle Ux, Ux \rangle = \langle x, x \rangle\)) it will be sufficient to prove that the condition of (10.1) is necessary and that the condition of (10.2) is sufficient. If \(|U x|=|x|\) for all \(x\), then \[ \langle x, x \rangle = \langle U x, U x \rangle = \langle U^{*} U x, x \rangle, \] so that \(\big\langle (U^*U - 1) x, x \big\rangle = 0\) for all \(x\). It follows that \(U^*U = 1\), so that \(U^{*}=U^{-1}\). Conversely if \(U\) is unitary, so that \(U^*U = 1\), then for all \(x\) and \(y\), \[ \langle x, y \rangle = \langle U^{*} U x, y \rangle = \langle Ux, Uy \rangle. \]■
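A numerical illustration of Theorem 2 (a sketch, assuming numpy; the unitary transformation is taken to be the \(Q\) factor of a QR factorization of a random complex matrix):

```python
# Illustrative sketch (assumed: numpy).  U is unitary (U*U = 1), hence it
# preserves inner products and norms.
import numpy as np

rng = np.random.default_rng(5)
n = 4
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(M)                   # unitary factor

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.allclose(U.conj().T @ U, np.eye(n))            # U*U = 1
assert np.isclose(np.vdot(y, x), np.vdot(U @ y, U @ x))  # <Ux, Uy> = <x, y>
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))
```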
11. Automorphisms of inner product spaces
In any algebraic system, in particular in vector spaces and in inner product spaces, it is of interest to consider the automorphisms of the systems: i.e., to consider those one to one mappings of the system on itself which preserve all relations between elements of the system. The most general automorphism of a vector space is a one to one transformation that preserves addition and scalar multiplication: in other words it is an arbitrary linear transformation which has an inverse. Of an automorphism \(U\) of an inner product space we should also require that it preserve inner products: i.e., that we have \(\langle Ux, Uy \rangle = \langle {x}, {y} \rangle\). But this, as we have seen, is equivalent to the requirement that \(U\) be unitary. Thus the two questions — ‘What linear transformations are the analogs of complex numbers of absolute value one?’ and ‘What are the most general automorphisms of an inner product space?’ — have the same answer: unitary transformations. In the next paragraph we shall see that unitary transformations furnish the answer to a third question also: ‘What happens to the matrix \([A]\) of the linear transformation \(A\) when we change coordinate systems?’
12. Change of basis in an inner product space
We start with the comment that a necessary and sufficient condition that a linear transformation \(U\) be unitary is that whenever \(\{x_{i}\}\) is a complete orthonormal set then so is \(\{U x_{i}\}\). For the condition is merely the statement that \(\langle Ux, Uy \rangle = \langle x, y \rangle\) for \(x\) and \(y\) lying in a complete orthonormal set, and by linearity the condition extends to all \(x\) and \(y\). Suppose then that \(\{x_i^\prime\}\) and \(\{x_i^{\prime\prime}\}\) are two coordinate systems (i.e., complete orthonormal sets). If \(A\) is an arbitrary linear transformation then throughout this paragraph we shall denote by \([A]^\prime\) (or \([A]^{\prime\prime}\)) the matrix of \(A\) in the coordinate system \(\{x_i^\prime\}\) (or \(\{x_{i}^{\prime\prime}\}\)). The matrices \([A]^\prime\) and \([A]^{\prime \prime}\) are characterized, respectively, by the following two equations: \[ Ax_j^\prime = \sum_i a_{ij}^\prime x_i^\prime, \quad Ax_j^{\prime \prime}= \sum_i a_{ij}^{\prime \prime} x_i^{\prime \prime}. \]
Let us denote by \(U\) the linear transformation defined by the relations \(Ux_i^\prime = x_i^{\prime\prime}\), and, generally, \(U\big(\sum_i c_i x_i^\prime\big) = \sum_i c_i x_i^{\prime\prime}\). It follows from the comment made at the beginning of this paragraph that \(U\) is unitary, and the second of the two equations above implies that \(AU x_j^\prime = U\big(\sum_i a_{ij}^{\prime\prime} x_i^\prime\big)\), or, in other words, that \[ U^{-1}AU x_j^\prime = \sum_i a_{ij}^{\prime\prime} x_i^\prime. \]
But this means exactly that \([U^{-1}AU]^\prime = [A]^{\prime\prime}\). Summing up: if \([A]^\prime\) and \([A]^{\prime\prime}\) are two matrices corresponding to the same linear transformation in different coordinate systems, then there exists a unitary transformation \(U\) such that \([U^{-1} A U]^\prime =[A]^{\prime\prime}\), or, equivalently, there exists a unitary matrix \([U]^\prime\) such that \([U^{-1}]^\prime [A]^\prime [U]^\prime = [A]^{\prime\prime}\).
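The change-of-basis formula can be checked numerically. The following sketch (assuming numpy; the two coordinate systems are the columns of two randomly chosen unitary matrices) verifies \([U^{-1}]^\prime[A]^\prime[U]^\prime = [A]^{\prime\prime}\).

```python
# Illustrative sketch (assumed: numpy).  The columns of Q1 and Q2 are two
# orthonormal bases; T is a fixed transformation; U is the unitary
# transformation carrying the first basis onto the second.
import numpy as np

rng = np.random.default_rng(6)
n = 3
T = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

mat1 = Q1.conj().T @ T @ Q1              # [T]'  in the first basis
mat2 = Q2.conj().T @ T @ Q2              # [T]'' in the second basis
U_mat = Q1.conj().T @ Q2                 # [U]', the matrix of U in the first basis

assert np.allclose(np.linalg.inv(U_mat) @ mat1 @ U_mat, mat2)
```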
13. Matricial characterization of unitary transformations
If \(U\) is a unitary transformation and \(\{x_{i}\}\) a coordinate system in which the matrix of \(U\) is \([U]\), then it follows from the multiplication rule for matrices and the equation \(UU^* = 1\), that \(\sum_{k} u_{i k} \bar{u}_{j k}=\delta_{ij}\). Clearly this condition characterizes \([U]\): i.e., it is necessary and sufficient in order that \([U]\) be unitary. We terminate, temporarily, our discussion of unitary transformations, and turn to the discussion of another special class of linear transformations that will be of great interest to us.
14. Orthogonal projections in an inner product space
We have seen (in I.13) that if \(M\) is a linear subspace then every vector \(x\) may be written, uniquely, in the form \(x=y+z\) with \(y\) in \(M\) and \(z\) in \(M^{\perp}\); \(y\) is called the projection of \(x\) on \(M\). We consider the correspondence \(P_{M}\) which assigns to every vector \(x\) its projection \(y=P_{M}{x}\) on \(M\): \(P_{M}\) is called a projection transformation or simply a projection. On occasions when it is not necessary to denote the dependence of the projection on the linear subspace in terms of which it was defined we shall use the letters \(E, F, \ldots\) for projections. We prove first that \(E=P_{M}\) is a linear transformation. For if \(x_{i}=y_{i}+z_{i}\), \(i=1, 2\), with \(y_{i}\) in \(M\) and \(z_{i}\) in \(M^{\perp}\), then \[ a_{1} x_{1}+a_{2} x_{2}=(a_{1} y_{1}+a_{2} y_{2})+(a_{1} z_{1}+a_{2} z_{2}), \] and from the fact that \(M\) and \(M^{\perp}\) are linear subspaces it follows therefore that \[ E(a_{1} x_{1}+a_{2} x_{2})=a_{1} y_{1}+a_{2} y_{2}=a_{1} Ex_{1}+a_{2} E x_{2}. \]
The following theorem gives a complete algebraic characterization of projections.
Theorem 3. If \(E\) is a projection then \(E=E^{2}=E^{*}\); conversely if \(E\) is a linear transformation for which \(E=E^{2}=E^{*}\) then \(E=P_{M}\) where \(M\) is the linear subspace of all vectors of the form \(Ex\).
Proof. If \({E}={P}_M\) and \(x=y+z\) with \(y\) in \(M\) and \(z\) in \(M^{\perp}\), then \(Ex=y\). Since \(y\) has the representation \(y=y+0\), with \(y\) in \(M\) and \(0\) in \(M^{\perp}\), and since this representation is unique, it follows that \({Ey}={y}\), whence \({EEx} = {E}^{2} {x}={Ex}\) for all \({x}\). To prove that \(E={E}^{*}\), let \(x_{1}\) and \(x_{2}\) be any two vectors and denote their projections on \(M\) (or on \(M^{\perp}\)) by \(y_{1}\) and \(y_{2}\) (or \(z_{1}\) and \(z_{2}\)) respectively. Then \[ \begin{aligned} \langle E x_{1}, x_{2} \rangle &= \langle y_{1}, x_{2} \rangle\\ &= \langle y_{1}, y_{2}+z_{2} \rangle\\ &= \langle y_{1}, y_{2} \rangle\\ &= \langle y_{1}+z_{1}, y_{2} \rangle\\ &= \langle x_{1}, E x_{2} \rangle; \end{aligned} \] and this implies that \({E}={E}^{*}\).
Conversely suppose that \({E}={E}^{2}={E}^{*}\), and let \(M\) be the linear subspace of all vectors of the form \(Ex\). Since for any \(x\) we have \(x=E x+(1-E) x\) with \(Ex\) in \(M\), the proof of the theorem will be complete when we succeed in proving that for every \(x\), \((1-{E}) x\) is in \(M^{\perp}\). But this follows from the equations \[ \begin{aligned} \big\langle E x,(1-E) y\big\rangle &= \langle E x, y \rangle - \langle E x, E y \rangle\\ &= \langle E x, y \rangle - \langle E^{*} E x, y \rangle \\ &= \langle E x, y \rangle - \langle E^{2} x, y \rangle\\ &= \langle E x, y \rangle - \langle E x, y \rangle\\ &= 0 \end{aligned} \] (We remark that the proof shows incidentally that \(M^{\perp}\) consists precisely of all vectors of the form \((1 - E)x\).)■
In different words this theorem states that the characteristic properties of a projection are that it is Hermitian \((E={E}^{*})\) and idempotent \((E=E^{2})\). As a corollary of this theorem we obtain the fact that if \(P_{M}=E\) then \(P_{M^{\perp}}=1-E\). Incidentally this theorem establishes a one to one correspondence between the class of all projections (idempotent and Hermitian transformations) and the class of all linear subspaces. In the following paragraphs we shall investigate this correspondence more closely and obtain conditions in order that certain algebraic combinations of projection transformations be themselves projections.
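Numerically, a projection is easy to construct from an orthonormal basis of \(M\). The sketch below (assuming numpy; the subspace is spanned by random vectors) verifies the characterization \(E=E^{2}=E^{*}\) and the decomposition \(x=Ex+(1-E)x\) into orthogonal summands.

```python
# Illustrative sketch (assumed: numpy).  The columns of Q are an orthonormal
# basis of a subspace M; E = Q Q* is the projection onto M.
import numpy as np

rng = np.random.default_rng(7)
n, m = 5, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))
E = Q @ Q.conj().T                       # projection onto M

assert np.allclose(E, E @ E)             # idempotent: E = E^2
assert np.allclose(E, E.conj().T)        # Hermitian:  E = E*

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y, z = E @ x, (np.eye(n) - E) @ x        # x = y + z, y in M, z in the complement
assert np.isclose(np.vdot(z, y), 0.0)    # y and z are orthogonal
```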
15. Products of projections
Theorem 4. A necessary and sufficient condition that the product \(EF\) of two projections, \(E=P_{M}\) and \(F=P_{N}\), be a projection is that \(EF=FE\); if this commutativity condition is satisfied then \(EF = FE = P_{M \cap N}\).
Proof. If \(EF\) is a projection, along with \(E\) and \(F\), then \[ EF = (EF)^{*} = F^{*} E^{*}={FE}, \] so that \(E\) and \(F\) commute. (In fact we have already seen in (II.8) that if \(A\) and \(B\) are Hermitian then \(A B\) is Hermitian if and only if \(A\) and \(B\) commute.) Conversely if \(EF = FE\) then \[ (EF)^2 = EFEF = EEFF = E^2 F^2 = EF \] and \[ ({EF})^{*}={F}^{*} {E}^{*}={FE}={EF}, \] so that \({EF}\) is Hermitian and idempotent. Finally suppose that \({E F}=P_{K}\); we know from (II.14) that \(K\) is the set of all vectors of the form \(EFx = FEx\). Then every vector in \({K}\) is simultaneously of the form \(Ey\) and \(Fz\), so that \(K\) is contained in the intersection \(M \cap N\). On the other hand if \(x\) is any vector in \({M \cap N}\) then \(F x=x\) and \(E x=x\), whence \(E F x=Ex=x\), so that \(x\) is contained in \(K\). This proves that \(K={M \cap N}\).■
16. Sums of projections
Theorem 5. A necessary and sufficient condition that the sum \(E+F\) of two projections, \({E}=P_{M}\) and \(F=P_{N}\), be a projection is that \(E F=F E=0\); if this condition is satisfied then \(M\) and \(N\) are orthogonal and \(E+F=P_{M+N}\).
Proof. If \(E+F\) is a projection then \[ E+F=(E+F)^{2}=E+{EF}+{FE}+{F}, \] whence \({EF}+{FE}=0\). Multiplying this equation first on the right and then on the left by \(E\) we obtain the equations \[ {E FE}+{FE}=0, \quad E F+E F E=0. \] Upon subtraction it follows that \({E F} - FE = 0\), and this, combined with our original equation, yields \(E F=F E=0\). Conversely, if \(EF = FE = 0\), then
\[ (E+F)^{2} = E + EF + FE + F = E + F, \] so that \(E+F\) is idempotent; being the sum of two Hermitian transformations it is also Hermitian and therefore it is a projection. Finally suppose that \(E+F=P_{K}\). Since \(P_{K}x=Ex+Fx\) and since \(E x\) is in \(M\) and \(F x\) is in \(N\), it follows that \(K\) is contained in \(M+N\). On the other hand if \(x\) is any vector in \(M+N\) then \(x\) has the form \(x=E y+F z\), whence \[ P_{K}x={E x}+{Fx}={Ey}+{EFz}+{FEy}+{Fz}={Ey}+{Fz}={x}, \] (since \({E F}=FE=0\)), so that every \(x\) in \(M+N\) has the form \(P_{K} x\) and is therefore in \(K\). This proves that \(K=M+N\). In order to show that \(M\) and \(N\) are orthogonal we must show that \(x\) in \(M\) and \(y\) in \(N\) implies \(\langle x, y \rangle = 0\). This follows from the equations \[ \begin{aligned} \langle x, y \rangle &= \langle E x, F y \rangle\\ &= \langle F^{*} {Ex}, y \rangle\\ &= \langle F E x, y \rangle\\ &= \langle 0, y \rangle\\ &= 0. \end{aligned} \]■
17. Differences of projections
Theorem 6. A necessary and sufficient condition that the difference \(E - F\) of two projections, \(E=P_{M}\) and \(F=P_{N}\), be a projection is that \(EF = FE = F\); if this condition is satisfied then \(N\) is contained in \(M\) and \(E - F = P_{M \cap N^\perp}\).
Proof. A necessary and sufficient condition in order that \(E - F\) be a projection is that \(1-(E-F)\) be one: i.e., that \((1-{E})+F\) be a projection. According to (II.16) above, this is equivalent to \((1-{E}) F=F(1-E)=0\), i.e., to \(EF=FE=F\). If, moreover, this condition is satisfied and if \(E-F=P_{K}\) then we know, still from (II.16), that \(M^{\perp}\) and \(N\) are orthogonal, whence \(M^{\perp}\) is contained in \(N^\perp\), or, equivalently, \(N\) is contained in \(M\), and finally \[ K=\big(M^\perp + N\big)^\perp = M \cap N^\perp. \]■
18. Relation between projections and involutions
We conclude our discussion of projections by indicating their relation to certain other classes of transformations. We shall call a linear transformation \(A\) for which \(A^{2}=1\) an involution, or an involutory transformation. We assert now that if a linear transformation \(U\) has any two of the three properties—Hermitian, involutory, unitary—then it has the third.
(i). If \(U=U^{*}\) and \(U^{2}=1\) then \(UU^{*}=1\), so that \(U^{*}=U^{-1}\).
(ii). If \(U=U^{*}\) and \(U^{*}=U^{-1}\) then \(U=U^{-1}\) whence \(U^{2}=1\).
(iii). If \(U^{*}=U^{-1}\) and \(U^{2}=1\), then \(U=U^{-1}=U^{*}\).
Transformations having these three properties are related to projections through the following theorem.
Theorem 7. If two transformations \(U\) and \(E\) are related by the two (equivalent) conditions \[ U=2E - 1, \quad E=(1 / 2)(U+1), \] then a necessary and sufficient condition that \(E\) be a projection is that \(U\) be Hermitian, unitary, and involutory.
Proof. If \(E\) is a projection then, since \({E}^{*}=E\), we must also have \(U^{*}=U\), and from \(E^{2}=E\) it follows that \[ \begin{aligned} U^{2}&=4 E^{2}-4 E+1\\ &=4 E-4 E+1\\ &=1, \end{aligned} \] so that \(U\) is involutory, and, being Hermitian and involutory, it is also unitary. Conversely if \(U\) is Hermitian then \(E\) is, and if \(U^{2}=1\) then \[ \begin{aligned} E^{2}&=(1 / 4)(U^{2}+2 U+1)\\ &=(1 / 4)(2 U+2)\\ &=E. \end{aligned} \]■
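A quick numerical illustration of Theorem 7 (a sketch, assuming numpy; the projection is built, as in (II.14), from an orthonormal basis of a subspace):

```python
# Illustrative sketch (assumed: numpy).  With E a projection, U = 2E - 1 is
# Hermitian and involutory (hence unitary), and E = (1/2)(U + 1) recovers E.
import numpy as np

rng = np.random.default_rng(8)
n, m = 4, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m)))
E = Q @ Q.conj().T                        # a projection
U = 2 * E - np.eye(n)

assert np.allclose(U, U.conj().T)         # Hermitian
assert np.allclose(U @ U, np.eye(n))      # involutory, hence also unitary
assert np.allclose((U + np.eye(n)) / 2, E)
```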
19. The rank of a linear transformation
We conclude this chapter by a discussion of two important numerical invariants of linear transformations: the rank and the norm.
If \(A\) is any linear transformation we define two linear subspaces \(M\) and \(N\) as follows: \(M\) is the set of all vectors of the form \(A x\), and \({N}\) is the set of all vectors \(x\) for which \(A x=0\). Let \(n\) be the dimension of the vector space \(\mathcal{V}\) under consideration, and let \(r\) and \(s\) be the dimensions of \(M\) and \(N\), respectively. We shall show that \(r=n-s\). The non-negative integer \(r\) is the rank of the linear transformation \(A\). Let \(x_{1}, x_{2}, \ldots, x_{s}\) be a linear basis in \(N\) (which is of course not necessarily a coordinate system in the sense of this chapter, but merely a maximal linearly independent set in \(N\)), and let the vectors \(x_{1}, x_{2}, \ldots, x_{s}, x_{s+1}, \ldots, x_{n}\) be a linear basis in \(\mathcal{V}\). Then every vector \(x\) in \(\mathcal{V}\) may be written (uniquely) in the form \(x=\sum_{i} a_{i} x_{i}\) and it follows that \[ A x=\sum_{i} a_{i} A x_{i}=\sum_{i=s+1}^{n} a_{i} A x_{i}. \]
In other words every vector of the form \(Ax\) is a linear combination of the \(n - s\) vectors \(A x_{s+1}, \ldots, A x_{n}\), so that \(r \leq n-s\). We shall prove that these \(n-s\) vectors are linearly independent, thereby proving that \(r=n-s\). If we had \[ 0=\sum_{i=s+1}^{n} c_{i} A x_{i}=A\Big(\sum_{i=s+1}^{n} c_{i} x_{i}\Big) \] then we should have \(\sum_{i=s+1}^{n} c_{i} x_{i}\) belonging to \(N\). Accordingly this vector would have an expression of the form \(\sum_{i=s+1}^{n} c_{i} x_{i}=-\sum_{i=1}^{s} c_{i} x_{i}\), so that \(\sum_{i=1}^{n} c_{i} x_{i}=0\). The linear independence of the \(x_{i}\), \(i = 1, 2, \ldots,n\), implies then that \(c_{i}=0\).
If for the moment we denote the dependence of \(M\) and \(N\) on \(A\) by writing \(M=M_{A}\) and \(N=N_{A}\), and if \(B\) is an arbitrary linear transformation which has an inverse, then it is very easy to characterize \(N_{A B}\) and \(N_{B A}\). For \(A B x=0\) if and only if \(B x\) lies in \(N_{A}\) and \(B A x=0\) if and only if \(A x=0\), i.e., if and only if \(x\) lies in \(N_{A}\). Since the image of \(N_{A}\) under \(B^{-1}\) has the same dimension as \(N_{A}\) it follows in particular that \(A B\) and \(B A\) have the same rank as \(A\). If \(A\) is a linear transformation in an inner product space \(\mathcal{U}\) and if \(x\) is any vector in \(N_{A}\) then \(0 = \langle A x, y \rangle = \langle x, A^{*} y \rangle\) for all \(y\). In other words every vector of the form \(A^{*} y\) is orthogonal to \(x\), whence \(M_{A^{*}} \perp {N}_{A}\). It follows that \(M_{A^{*}}\) is contained in \(N_{A}^\perp\), whence, denoting the rank of \(A^{*}\) by \(r^{*}\), \(r^{*} \leq r\). Since this is generally true we may apply this result to \(A^{*}\) obtaining \(r \leq r^{*}\), so that \(r=r^{*}\). Finally we notice that if \(A_{1}\) and \(A_{2}\) are arbitrary linear transformations and \(r_{1}\) and \(r_{2}\) their ranks then, since \({N}_{A_{2}}\) is contained in \({N}_{A_{1} A_{2}}\), it follows that the rank \(r\) of \({A_{1} A_{2}}\) is \(\leq r_{2}\). Applying our previous result on duals and denoting the rank of \((A_{1} A_{2})^{*}=A_{2}^{*} A_{1}^{*}\) by \(r^{*}\), we obtain \(r=r^{*} \leq r_{1}^{*}=r_{1}\), and we obtain the result that the rank of a product of two transformations does not exceed the rank of either factor. (This is Sylvester’s law of nullity: the terminology arises from the fact that if \(A\) is a transformation of rank \(r\), \(s=n-r\) is called the nullity of \(A\).)
We observe that in virtue of (II.14) the rank of a projection \(P_{M}\) is the dimension of the linear subspace \(M\).
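The rank statements of this section are easy to check numerically; a sketch (assuming numpy; the matrices are constructed as products so that their ranks are known generically):

```python
# Illustrative sketch (assumed: numpy).  rank(A) = rank(A*), and the rank of a
# product does not exceed the rank of either factor.
import numpy as np

rng = np.random.default_rng(9)
n = 5
A = rng.standard_normal((n, 3)) @ rng.standard_normal((3, n))   # rank 3 (generically)
B = rng.standard_normal((n, 2)) @ rng.standard_normal((2, n))   # rank 2 (generically)

rank = np.linalg.matrix_rank
assert rank(A) == rank(A.conj().T)
assert rank(A @ B) <= min(rank(A), rank(B))
assert rank(B @ A) <= min(rank(A), rank(B))
```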
20. The norm of a linear transformation
In order to define the norm of a linear transformation \(A\) defined in an inner product space \(\mathcal{U}\) of dimension \(n\) we prove first the following theorem.
Theorem 8. There exists a constant \(K\) such that for all \(x\), \(|A x| \leq K|x|\).
Proof. Let \(\{x_{i}\}\) be an orthonormal basis in \(\mathcal{U}\), and choose a constant \(K_{0}\) so that \(|A x_{i}| \leq K_{0}\) for \(i=1,2,\ldots,n\). If \(x=\sum_{i} c_{i} x_{i}\) is any vector then \(|c_{i}|=|\langle x, x_{i} \rangle| \leq |x|\) by the Schwarz inequality, so that \[ |A x|=\Big|\sum_{i} c_{i} A x_{i}\Big| \leq \sum_{i}|c_{i}|\,|A x_{i}| \leq n K_{0}|x|; \] in other words \(K=n K_{0}\) has the required property.■
The least number \({K}\) with the property described in the theorem is called the norm of \(A\): more formally the norm of \(A\) can be defined as \(|A|=\sup (|Ax| / |x|)\), with the supremum taken over all vectors \(x \neq 0\).
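For matrices the norm just defined is computable (it is the largest singular value); the following sketch (assuming numpy) compares it with the defining bound \(|Ax|\leq|A||x|\) on random vectors.

```python
# Illustrative sketch (assumed: numpy).  |A| = sup |Ax|/|x|; for a matrix this
# is the spectral norm, and no vector exceeds the bound |Ax| <= |A||x|.
import numpy as np

rng = np.random.default_rng(10)
n = 4
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
norm_A = np.linalg.norm(A, 2)            # operator (spectral) norm

for _ in range(1000):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    assert np.linalg.norm(A @ x) <= norm_A * np.linalg.norm(x) + 1e-9
```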
21. Expressions for the norm of a transformation
Along with the norm \(|A|\) of the transformation \(A\) we may consider the following three constants: \[ \begin{aligned} & p=\sup |A x| \text { for } |x|=1, \\ & q=\sup \big(|\langle A x, y \rangle| /|x||y|\big) \text { for } x, y \neq 0, \\ & r=\sup |\langle A x, y \rangle| \text { for } |x|=|y|=1. \end{aligned} \]
We shall prove that \({p}={q}={r}=|{A}|\). We first remark, however, that since \[ |\langle A x, y \rangle| \leq |A x||y| \leq |A||x||y|, \] the expressions defining \(q\) and \(r\) are bounded, and the suprema \(q\) and \(r\) are finite. It follows from this comment, and from the definitions of \(p\), \(q\), and \(r\), that \(r \leq q \leq |A|\), and we also have \({p} \leq |A|\). Accordingly we will have proved the equality of all four constants involved if we succeed in proving that \(p \geq |A|\), \(q \geq |A|\), and \(r \geq q\).
Since for any \(x \neq 0\), \(|A x| /|x|=|{A y}|\), where \(y=x /|x|\), and since \(|y|=1\), it follows that \(|A x| /|x| \leq p\), whence the supremum, \(|A|\), of the expression on the left is also \(\leq {p}\).
If for any vector \({x}\) for which neither \({x}=0\) nor \(A x=0\) we write \(y=A x\), we obtain the equation \[ |A x| /|x| = |\langle A x, y \rangle| /|x||y| \leq q. \] It follows that the supremum, \(|A|\), of the expression on the left is \(\leq {q}\).
If for any pair \(x\), \(y\) of non-vanishing vectors we define \(x^{\prime}=x /|x|\) and \(y^{\prime}=y /|y|\), then we obtain, since \(|x^{\prime}|=|y^{\prime}|=1\), \[ |\langle A x, y \rangle| /|x||y|=|\langle A x^{\prime}, y^{\prime} \rangle| \leq r, \] so that the supremum, \(q\), of the expression on the left is also \(\leq r\).
The equation \(\langle A x, y \rangle = \langle x, A^{*} y \rangle\) implies immediately that \(|A|=|A^{*}|\). It is easy to verify that if \(U\) is unitary \(|U|=1\), and that if \(E\) is a projection, \(E \neq 0\), \(|E|=1\).
22. Upper and lower bounds of a Hermitian transformation
For Hermitian transformations \(A\) we can find still another interesting expression for the norm, \(|A|\). If \(c\) is an arbitrary real number we shall write \(A \geq c\) if the transformation \(A - c1\) is non-negative (Cf. II.7), and \(A \leq c\) if \(c1-A\) is non-negative. In this sense we may write \(-|A| \leq A \leq |A|\), and we may define the upper bound, \(\beta\), of the Hermitian transformation \(A\), to be the least (i.e., the infimum) of the numbers \(c\) for which \(A \leq c\); similarly we define the lower bound, \(\alpha\), of \(A\) to be the greatest (i.e., the supremum) of the numbers \(c\) for which \(c \leq A\). In other words, remembering the definition of non-negative transformations, \(\alpha\) and \(\beta\) are, respectively, the infimum and the supremum of the set of all real numbers of the form \(\langle A x, x \rangle /|x|^{2}\). Our main result is that the upper and lower bounds of \(A\) are related to the norm, \(|A|\), of \(A\) by the relation \[ |A|=\max (|\alpha|,|\beta|)=k. \] Since we have already observed that \(-|A| \leq A \leq |A|\), it follows that \(-|A| \leq \alpha \leq \beta \leq |A|\), so that \(k \leq |A|\). From the definition of \(k\) it follows that \(-{k} \leq A \leq k\), or, in other words, that the transformations \(k1 - A\) and \(k1 + A\) are non-negative. It follows from (II.8) that the transformations \[ (k1 + A)^*(k1 - A)(k1 + A) = (k1 + A)(k1 - A)(k1 + A) \] and \[ (k1 - A)^*(k1 + A)(k1 - A) = (k1 - A)(k1 + A)(k1 - A) \] are also non-negative. Hence so also is their sum, \((2 k)(k^{2} 1-A^{2})\). Since \(k=0\) implies \(|A|=0\), the theorem is trivial in this case; in any other case \(k\) is positive and we obtain therefore the result that \(k^{2} 1-A^{2}\) is a non-negative linear transformation. In other words \[ \begin{aligned} k^{2} |x|^{2}&=k^{2} \langle x, x \rangle\\ &\geq \langle A^{2} x, x \rangle\\ &= \langle A x, A x \rangle\\ &=|Ax|^{2}, \end{aligned} \] whence \(k \geq |A|\); combined with \(k \leq |A|\) this gives \(k=|A|\), and the proof of the theorem is complete. Since it is easy to prove, by the methods of the preceding paragraph, that \[ \sup |\langle A x, x \rangle| /|x|^{2}=\sup |\langle A x, x \rangle|, \] (where the first supremum is extended over all \(x \neq 0\) and the second one over all \(x\) with \(|x|=1\)), our result is equivalent to the statement that for Hermitian transformations \(A\), \(|A|=\sup |\langle A x, x \rangle|\) for \(|x|=1\).
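For a Hermitian matrix the bounds \(\alpha\) and \(\beta\) are the extreme eigenvalues (a standard fact about Hermitian matrices, not proved at this point in the text); the sketch below (assuming numpy) merely checks numerically that \(|A|=\max(|\alpha|,|\beta|)\) and that \(|\langle Ax, x\rangle|\leq|A|\) on unit vectors.

```python
# Illustrative sketch (assumed: numpy).  For a Hermitian A the lower and upper
# bounds alpha, beta of <Ax, x>/|x|^2 are the extreme eigenvalues, and
# |A| = max(|alpha|, |beta|) dominates |<Ax, x>| on unit vectors.
import numpy as np

rng = np.random.default_rng(11)
n = 4
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = B + B.conj().T                        # Hermitian

eigs = np.linalg.eigvalsh(A)              # real eigenvalues, in increasing order
alpha, beta = eigs[0], eigs[-1]
norm_A = np.linalg.norm(A, 2)
assert np.isclose(norm_A, max(abs(alpha), abs(beta)))

for _ in range(1000):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    x = x / np.linalg.norm(x)
    assert abs(np.vdot(x, A @ x)) <= norm_A + 1e-9   # |<Ax, x>| <= |A|
```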