It is sound geometric intuition that makes most of us conjecture that, for linear transformations, being invertible and being in some sense zero are exactly opposite notions. Our disappointment in finding that the range and the null-space need not be disjoint is connected with this conjecture. The situation can be straightened out by relaxing the sense in which we interpret "being zero"; for most practical purposes a linear transformation some power of which is zero (that is, a nilpotent transformation) is as zeroish as we can expect it to be. Although we cannot say that a linear transformation is either invertible or "zero" even in the extended sense of zeroness, we can say how any transformation is made up of these two extreme kinds.

Theorem 1. Every linear transformation \(A\) on a finite-dimensional vector space \(\mathcal{V}\) is the direct sum of a nilpotent transformation and an invertible transformation.

Proof. We consider the null-space of the \(k\)-th power of \(A\); this is a subspace \(\mathcal{N}_{k}=\mathcal{N}(A^{k})\). Clearly \(\mathcal{N}_{1} \subset \mathcal{N}_{2} \subset \cdots\). We assert first that if ever \(\mathcal{N}_{k}=\mathcal{N}_{k+1}\), then \(\mathcal{N}_{k}=\mathcal{N}_{k+j}\) for all positive integers \(j\). Indeed, if \(A^{k+j} x=0\), then \(A^{k+1}(A^{j-1} x)=0\), whence (by the fact that \(\mathcal{N}_{k}=\mathcal{N}_{k+1}\)) it follows that \(A^{k}(A^{j-1} x)=0\), and therefore that \(A^{k+j-1} x=0\). In other words, \(\mathcal{N}_{k+j}\) is contained in (and therefore equal to) \(\mathcal{N}_{k+j-1}\); induction on \(j\) establishes our assertion.

Since \(\mathcal{V}\) is finite-dimensional, the subspaces \(\mathcal{N}_{k}\) cannot continue to increase indefinitely; let \(q\) be the smallest positive integer for which \(\mathcal{N}_{q}=\mathcal{N}_{q+1}\). It is clear that \(\mathcal{N}_{q}\) is invariant under \(A\) (in fact each \(\mathcal{N}_{k}\) is). We write \(\mathcal{R}_{k}=\mathcal{R}(A^{k})\) for the range of \(A^{k}\) (so that, again, it is clear that \(\mathcal{R}_{q}\) is invariant under \(A\)); we shall prove that \(\mathcal{V}=\mathcal{N}_{q} \oplus \mathcal{R}_{q}\) and that \(A\) on \(\mathcal{N}_{q}\) is nilpotent, whereas on \(\mathcal{R}_{q}\) it is invertible. If \(x\) is a vector common to \(\mathcal{N}_{q}\) and \(\mathcal{R}_{q}\), then \(A^{q} x=0\) and \(x=A^{q} y\) for some \(y\). It follows that \(A^{2q} y=0\), and hence, from the definition of \(q\) (which gives \(\mathcal{N}_{2q}=\mathcal{N}_{q}\)), that \(x=A^{q} y=0\). We have thus shown that the range and the null-space of \(A^{q}\) are disjoint; a dimensionality argument (see Section: Rank and nullity, Theorem 1) shows that they span \(\mathcal{V}\), so that \(\mathcal{V}\) is their direct sum. It follows from the definitions of \(q\) and \(\mathcal{N}_{q}\) that \(A\) on \(\mathcal{N}_{q}\) is nilpotent of index \(q\). If, finally, \(x\) is in \(\mathcal{R}_{q}\) (so that \(x=A^{q} y\) for some \(y\)) and if \(A x=0\), then \(A^{q+1} y=0\), whence \(x=A^{q} y=0\); this shows that \(A\) is invertible on \(\mathcal{R}_{q}\). The proof of Theorem 1 is complete. ◻
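Readers who like to experiment may find it useful to trace the proof through a concrete computation. The following sketch (in Python, using the SymPy library; the particular matrix \(A\) is an arbitrary illustration, not part of the text) finds the exponent \(q\) by watching the ranks of the powers of \(A\) stabilize, and then checks that \(\mathcal{V}=\mathcal{N}_{q} \oplus \mathcal{R}_{q}\), with \(A\) nilpotent on the first summand and invertible on the second.

```python
# A computational trace of Theorem 1 (SymPy); the matrix A is an
# arbitrary illustration: an invertible part plus a nilpotent part.
from sympy import Matrix

A = Matrix([[1, 1, 0],
            [0, 1, 0],
            [0, 0, 0]])
n = A.shape[0]

# Find the smallest q with N(A**q) = N(A**(q+1)); since the null-spaces
# increase with q, equality is detected by equality of ranks.
q = 1
while (A**q).rank() != (A**(q + 1)).rank():
    q += 1

N_q = (A**q).nullspace()    # a basis of the null-space of A**q
R_q = (A**q).columnspace()  # a basis of the range of A**q

# V = N_q (+) R_q: the two bases together span the whole space.
assert Matrix.hstack(*(N_q + R_q)).rank() == n

# A is nilpotent on N_q (A**q annuls it) and invertible on R_q
# (A maps a basis of R_q onto a linearly independent set in R_q).
assert all(A**q * v == Matrix.zeros(n, 1) for v in N_q)
assert Matrix.hstack(*[A * v for v in R_q]).rank() == len(R_q)
```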

The decomposition of \(A\) into its nilpotent and invertible parts is unique. Suppose, indeed, that \(\mathcal{V}=\mathcal{H} \oplus \mathcal{K}\), with \(\mathcal{H}\) and \(\mathcal{K}\) invariant under \(A\), so that \(A\) on \(\mathcal{H}\) is nilpotent and \(A\) on \(\mathcal{K}\) is invertible. Since \(\mathcal{H} \subset \mathcal{N}(A^{k})\) for some \(k\), it follows that \(\mathcal{H} \subset \mathcal{N}_{q}\), and, since \(\mathcal{K} \subset \mathcal{R}(A^{k})\) for all \(k\) (because \(A\) maps \(\mathcal{K}\) onto itself), it follows that \(\mathcal{K} \subset \mathcal{R}_{q}\); since, moreover, the dimensions of \(\mathcal{H}\) and \(\mathcal{K}\) add up to the dimension of \(\mathcal{V}\), just as the dimensions of \(\mathcal{N}_{q}\) and \(\mathcal{R}_{q}\) do, these facts together imply that \(\mathcal{H}=\mathcal{N}_{q}\) and \(\mathcal{K}=\mathcal{R}_{q}\).

We can now use our results on nilpotent transformations to study the structure of arbitrary transformations. The method of getting a nilpotent transformation out of an arbitrary one may seem like a conjuring trick, but it is a useful and frequently employed one. What is essential is the guaranteed existence of proper values; for that reason we continue to assume that the scalar field is algebraically closed (see Section: Multiplicity).

Theorem 2. If \(A\) is a linear transformation on a finite-dimensional vector space \(\mathcal{V}\), and if \(\lambda_{1}, \ldots, \lambda_{p}\) are the distinct proper values of \(A\) with respective algebraic multiplicities \(m_{1}, \ldots, m_{p}\), then \(\mathcal{V}\) is the direct sum of \(p\) subspaces \(\mathcal{M}_{1}, \ldots, \mathcal{M}_{p}\) of respective dimensions \(m_{1}, \ldots, m_{p}\), such that each \(\mathcal{M}_{j}\) is invariant under \(A\) and such that \(A-\lambda_{j}\) is nilpotent on \(\mathcal{M}_{j}\).

Proof. Take any fixed \(j=1, \ldots, p\), and consider the linear transformation \(A_{j}=A-\lambda_{j}\). To \(A_{j}\) we may apply the decomposition of Theorem 1 to obtain subspaces \(\mathcal{M}_{j}\) and \(\mathcal{N}_{j}\) such that \(A_{j}\) is nilpotent on \(\mathcal{M}_{j}\) and invertible on \(\mathcal{N}_{j}\). Since \(\mathcal{M}_{j}\) is invariant under \(A_{j}\), it is also invariant under \(A_{j}+\lambda_{j}=A\) (and the same is true of \(\mathcal{N}_{j}\)). Hence, for every \(\lambda\), the determinant of \(A-\lambda\) is the product of the two corresponding determinants for the two linear transformations that \(A\) becomes when we consider it on \(\mathcal{M}_{j}\) and \(\mathcal{N}_{j}\) separately. Since the only proper value of \(A\) on \(\mathcal{M}_{j}\) is \(\lambda_{j}\), and since \(A\) on \(\mathcal{N}_{j}\) does not have the proper value \(\lambda_{j}\) (that is, \(A-\lambda_{j}\) is invertible on \(\mathcal{N}_{j}\)), it follows that the dimension of \(\mathcal{M}_{j}\) is exactly \(m_{j}\). Moreover, each \(\mathcal{M}_{j}\) is disjoint from the span of all the others: if \(i \neq j\), then on \(\mathcal{M}_{i}\) the transformation \(A-\lambda_{j}\) is the sum of the nilpotent transformation \(A-\lambda_{i}\) and the non-zero scalar \(\lambda_{i}-\lambda_{j}\), and is therefore invertible, so that \(A-\lambda_{j}\) maps the span of the \(\mathcal{M}_{i}\) with \(i \neq j\) onto itself; since \(A-\lambda_{j}\) is invertible on that span and nilpotent on \(\mathcal{M}_{j}\), the only vector common to the two is \(0\). A dimension argument (the multiplicities \(m_{j}\) add up to the dimension of \(\mathcal{V}\)) proves that \(\mathcal{M}_{1} \oplus \cdots \oplus \mathcal{M}_{p}=\mathcal{V}\) and thereby concludes the proof of the theorem. ◻
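As a concrete check on Theorem 2, the following sketch (SymPy again, with an illustrative matrix of our own choosing) computes each \(\mathcal{M}_{j}\) as the null-space of \((A-\lambda_{j})^{n}\) (the exponent \(n\) always suffices, since \(q \leq n\)) and verifies that the dimensions are the algebraic multiplicities and that the \(\mathcal{M}_{j}\) together span the space.

```python
# A concrete check on Theorem 2 (SymPy); the matrix A is illustrative,
# with proper values 2 (multiplicity 2) and 3 (multiplicity 1).
from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])
n = A.shape[0]

basis = []
for lam, mult in A.eigenvals().items():        # {proper value: multiplicity}
    M_j = ((A - lam * eye(n))**n).nullspace()  # a basis of M_j
    assert len(M_j) == mult                    # dim M_j = algebraic multiplicity
    basis.extend(M_j)

# The subspaces M_j together give a direct sum decomposition of V.
assert Matrix.hstack(*basis).rank() == n
```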

We proceed to describe the principal results of this section and the preceding one in matricial language. If \(A\) is a linear transformation on a finite-dimensional vector space \(\mathcal{V}\), then with respect to a suitable basis of \(\mathcal{V}\), the matrix of \(A\) has the following form. Every element not on or immediately below the main diagonal vanishes. On the main diagonal there appear the distinct proper values of \(A\), each a number of times equal to its algebraic multiplicity. Below any particular proper value there appear only \(1\)'s and \(0\)'s, and these in the following way: there are chains of \(1\)'s, each chain followed by a single \(0\), with the lengths of the chains non-increasing as we read from top to bottom. This matrix is the Jordan form or the classical canonical form of \(A\); we have \(B=TAT^{-1}\) for some invertible \(T\) if and only if the classical canonical forms of \(A\) and \(B\) are the same except for the order of the proper values. (Thus, in particular, a linear transformation \(A\) is diagonable if and only if its classical canonical form is already diagonal, that is, if and only if every chain of \(1\)'s has length zero.)
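Mechanically, the classical canonical form can be computed with SymPy's jordan_form, as the sketch below illustrates (the matrix is again our own example). One caveat: SymPy, like many texts, places the chains of \(1\)'s immediately above the main diagonal rather than below it; the two conventions differ only by a transpose (compare Exercise 8 below).

```python
# The classical canonical form via SymPy's jordan_form; A is illustrative.
from sympy import Matrix

A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 3]])

P, J = A.jordan_form()   # A = P * J * P**(-1)
print(J)  # one chain consisting of a single 1 for the proper value 2
assert A == P * J * P.inv()
```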

Let us introduce some notation. Let \(A\) have \(p\) distinct proper values \(\lambda_{1}, \ldots, \lambda_{p}\), with algebraic multiplicities \(m_{1}, \ldots, m_{p}\), as before; let the number of chains of \(1\)'s under \(\lambda_{j}\) be \(r_{j}\), and let the lengths of these chains be \(q_{j,1}-1, q_{j,2}-1, \ldots, q_{j,r_{j}}-1\) (so that \(q_{j,1} \geq q_{j,2} \geq \cdots \geq q_{j,r_{j}}\) and \(q_{j,1}+\cdots+q_{j,r_{j}}=m_{j}\)). The polynomial \(e_{ji}\) defined by \(e_{ji}(\lambda)=(\lambda-\lambda_{j})^{q_{j,i}}\) is called an elementary divisor of \(A\) of multiplicity \(q_{j,i}\) belonging to the proper value \(\lambda_{j}\). An elementary divisor is called simple if its multiplicity is \(1\) (so that the corresponding chain length is \(0\)); we see that a linear transformation is diagonable if and only if all its elementary divisors are simple.

To illustrate the power of Theorem 2 we make one application. We may express the fact that the transformation \(A-\lambda_{j}\) on \(\mathcal{M}_{j}\) is nilpotent of index \(q_{j,1}\) by saying that the transformation \(A\) on \(\mathcal{M}_{j}\) is annulled by the polynomial \(e_{j1}\). It follows that \(A\) on \(\mathcal{V}\) is annulled by the product of these polynomials (that is, by the product of the elementary divisors of highest multiplicity, one for each proper value); this product is called the minimal polynomial of \(A\).

It is quite easy to see (since the index of nilpotence of \(A-\lambda_{j}\) on \(\mathcal{M}_{j}\) is exactly \(q_{j,1}\)) that this polynomial is uniquely determined (up to a scalar factor) as the polynomial of smallest degree that annuls \(A\). Since the characteristic polynomial of \(A\) is the product of all the elementary divisors, and therefore a multiple of the minimal polynomial, we obtain the Hamilton-Cayley equation: every linear transformation is annulled by its characteristic polynomial.
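The computation of the minimal polynomial is easily mechanized as well. In the sketch below (SymPy once more; the illustrative matrix has elementary divisors \((\lambda-2)^{2}\) and \(\lambda-2\), so that the minimal polynomial is a proper divisor of the characteristic polynomial) the exponent \(q_{j,1}\) for each proper value is found as the index at which the null-spaces of the powers of \(A-\lambda_{j}\) stabilize; the resulting product annuls \(A\), and so does the characteristic polynomial, in accordance with the Hamilton-Cayley equation.

```python
# Minimal polynomial and the Hamilton-Cayley equation (SymPy);
# the matrix A is illustrative, with elementary divisors
# (lambda - 2)**2 and (lambda - 2).
from sympy import Matrix, eye, symbols, zeros

lam_sym = symbols('lambda')
A = Matrix([[2, 1, 0],
            [0, 2, 0],
            [0, 0, 2]])
n = A.shape[0]

minimal = 1           # the minimal polynomial, built factor by factor
annihilator = eye(n)  # the same polynomial, evaluated at A
for lam in A.eigenvals():
    B = A - lam * eye(n)
    s = 1             # q_{j,1}: smallest s with N(B**s) = N(B**(s+1))
    while (B**s).rank() != (B**(s + 1)).rank():
        s += 1
    minimal = minimal * (lam_sym - lam)**s
    annihilator = annihilator * B**s

print(minimal)                     # (lambda - 2)**2
assert annihilator == zeros(n, n)  # the minimal polynomial annuls A

# Hamilton-Cayley: the characteristic polynomial (the product of all
# the elementary divisors, here (lambda - 2)**3) annuls A as well.
ch = eye(n)
for lam, mult in A.eigenvals().items():
    ch = ch * (A - lam * eye(n))**mult
assert ch == zeros(n, n)
```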

EXERCISES

Exercise 1. Find the Jordan form of \(\begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & -1 \end{bmatrix}\).

Exercise 2. What is the maximum number of pairwise non-similar linear transformations on a three-dimensional vector space, each of which has the characteristic polynomial \((\lambda-1)^{3}\)?

Exercise 3. Does every invertible linear transformation have a square root? (To say that \(A\) is a square root of \(B\) means, of course, that \(A^{2}=B\).)

Exercise 4. 

  1. Prove that if \(\omega\) is a cube root of \(1\) (\(\omega \neq 1\)), then the matrices \[\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} \quad \text { and } \quad \begin{bmatrix} 1 & 0 & 0 \\ 0 & \omega & 0 \\ 0 & 0 & \omega^{2} \end{bmatrix}\] are similar.
  2. Discover and prove a generalization of (a) to higher dimensions.

Exercise 5. 

  1. Prove that the matrices \[\begin{bmatrix} 0 & 1 & \alpha \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} \quad \text{ and } \quad \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}\] are similar.
  2. Discover and prove a generalization of (a) to higher dimensions.

Exercise 6. 

  1. Show that the matrices \[\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \quad \text { and } \quad \begin{bmatrix} 3 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\] are similar (over, say, the field of complex numbers).
  2. Discover and prove a generalization of (a) to higher dimensions.

Exercise 7. If two real matrices are similar over \(\mathbb{C}\), then they are similar over \(\mathbb{R}\).

Exercise 8. Prove that every matrix is similar to its transpose.

Exercise 9. If \(A\) and \(B\) are \(n\)-by-\(n\) matrices such that the \(2n\)-by-\(2n\) matrices \[\begin{bmatrix} A & 0 \\ 0 & A \end{bmatrix} \quad \text{ and } \quad \begin{bmatrix} B & 0 \\ 0 & B \end{bmatrix}\] are similar, then \(A\) and \(B\) are similar.

Exercise 10. Which of the following matrices are diagonable (over the field of complex numbers)?

  1. \(\begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\),
  2. \(\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\),
  3. \(\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}\),
  4. \(\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}\),
  5. \(\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}\).

What about the field of real numbers?

Exercise 11. Show that the matrix \[\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \end{bmatrix}\] is diagonable over the field of complex numbers but not over the field of real numbers.

Exercise 12. Let \(\pi\) be a permutation of the integers \(\{1, \ldots, n\}\); if \(x=(\xi_{1}, \ldots, \xi_{n})\) is a vector in \(\mathbb{C}^{n}\), write \(A x=(\xi_{\pi(1)}, \ldots, \xi_{\pi(n)})\). Prove that \(A\) is diagonable and find a basis with respect to which the matrix of \(A\) is diagonal.

Exercise 13. Suppose that \(A\) is a linear transformation and that \(\mathcal{M}\) is a subspace invariant under \(A\). Prove that if \(A\) is diagonable, then so also is the restriction of \(A\) to \(\mathcal{M}\).

Exercise 14. Under what conditions on the complex numbers \(\alpha_{1}, \ldots, \alpha_{n}\) is the matrix \[\begin{bmatrix} 0 & \cdots & 0 & \alpha_{1} \\ 0 & \cdots & \alpha_{2} & 0 \\ \vdots & \iddots & \vdots & \vdots \\ \alpha_{n} & \cdots & 0 & 0 \end{bmatrix}\] diagonable (over the field of complex numbers)?

Exercise 15. Are the following assertions true or false?

  1. A real two-by-two matrix with a negative determinant is similar to a diagonal matrix.
  2. If \(A\) is a linear transformation on a complex vector space, and if \(A^{k}=1\) for some positive integer \(k\), then \(A\) is diagonable.
  3. If \(A\) is a nilpotent linear transformation on a finite-dimensional vector space, then \(A\) is diagonable.

Exercise 16. If \(A\) is a linear transformation on a finite-dimensional vector space over an algebraically closed field, and if every proper value of \(A\) has algebraic multiplicity \(1\), then \(A\) is diagonable.

Exercise 17. If the minimal polynomial of a linear transformation \(A\) on an \(n\)-dimensional vector space has \(n\) distinct roots, then \(A\) is diagonable.

Exercise 18. Find the minimal polynomials of all projections and all involutions.

Exercise 19. What is the minimal polynomial of the matrix \[\begin{bmatrix} \lambda_{1} & 0 & \cdots & 0 \\ 0 & \lambda_{2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{n} \end{bmatrix}?\] 

Exercise 20. 

  1. What is the minimal polynomial of the differentiation operator on \(\mathcal{P}_{n}\)?
  2. What is the minimal polynomial of the transformation \(A\) on \(\mathcal{P}_{n}\) defined by \((A x)(t)=x(t+1)\)?

Exercise 21. If \(A\) is a linear transformation with minimal polynomial \(p\), and if \(q\) is a polynomial such that \(q(A)=0\), then \(q\) is divisible by \(p\).

Exercise 22. 

  1. If \(A\) and \(B\) are linear transformations, if \(p\) is a polynomial such that \(p(AB)=0\), and if \(q(t)=t p(t)\), then \(q(BA)=0\).
  2. What can be inferred from (a) about the relation between the minimal polynomials of \(AB\) and of \(BA\)?

Exercise 23. A linear transformation is invertible if and only if the constant term of its minimal polynomial is different from zero.