As an aid to getting a representation theorem more informative than the triangular one, we proceed to introduce and to study a very special but useful class of transformations. A linear transformation \(A\) is called nilpotent if there exists a strictly positive integer \(q\) such that \(A^{q}=0\); the least such integer \(q\) is the index of nilpotence.
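By way of a concrete illustration (a computational sketch of our own, not part of the text; the helper `index_of_nilpotence` is a hypothetical name), the "shift" matrix below is nilpotent of index \(3\):

```python
import numpy as np

def index_of_nilpotence(A):
    """Return the least q >= 1 with A^q = 0, or None if A is not nilpotent.
    An n x n nilpotent matrix always satisfies A^n = 0, so n powers suffice."""
    n = A.shape[0]
    P = np.eye(n, dtype=A.dtype)
    for q in range(1, n + 1):
        P = P @ A
        if not P.any():
            return q
    return None

# The shift transformation e1 -> 0, e2 -> e1, e3 -> e2: two applications
# do not yet annihilate everything, but three do.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]])
print(index_of_nilpotence(A))          # 3
print(index_of_nilpotence(np.eye(2)))  # None: the identity is not nilpotent
```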

Theorem 1. If \(A\) is a nilpotent linear transformation of index \(q\) on a finite-dimensional vector space \(\mathcal{V}\), and if \(x_{0}\) is a vector for which \(A^{q-1} x_{0} \neq 0\), then the vectors \(x_{0}, A x_{0}, \ldots, A^{q-1} x_{0}\) are linearly independent. If \(\mathcal{H}\) is the subspace spanned by these vectors, then there exists a subspace \(\mathcal{K}\) such that \(\mathcal{V}=\mathcal{H} \oplus \mathcal{K}\) and such that the pair \((\mathcal{H}, \mathcal{K})\) reduces \(A\) .

Proof. To prove the asserted linear independence, suppose that \(\sum_{i=0}^{q-1} \alpha_{i} A^{i} x_{0}=0\), and let \(j\) be the least index such that \(\alpha_{j} \neq 0\). (We do not exclude the possibility \(j=0\).) Dividing through by \(-\alpha_{j}\) and changing the notation in an obvious way, we obtain a relation of the form \begin{align} A^{j} x_{0} &= \sum_{i=j+1}^{q-1} \alpha_{i} A^{i} x_{0}\\ &= A^{j+1}\Big(\sum_{i=j+1}^{q-1} \alpha_{i} A^{i-j-1} x_{0}\Big)\\ &= A^{j+1} y. \end{align} 

It follows from the definition of \(q\) that \begin{align} A^{q-1} x_{0} &= A^{q-j-1} A^{j} x_{0}\\ &= A^{q-j-1} A^{j+1} y\\ &= A^{q} y\\ &= 0; \end{align} 

since this contradicts the choice of \(x_{0}\), no such least index \(j\) can exist; that is, \(\alpha_{j}=0\) for each \(j\).
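The independence assertion is easy to check numerically. In the sketch below (our own illustration, with a hypothetical choice of \(A\) and \(x_{0}\)), the vectors \(x_{0}, A x_{0}, A^{2} x_{0}\) are stacked as columns and the rank is computed:

```python
import numpy as np

# A nilpotent matrix of index q = 3 (the shift e1 -> 0, e2 -> e1, e3 -> e2).
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [0., 0., 0.]])
q = 3

# x0 = e3 qualifies, since A^(q-1) x0 = A^2 e3 = e1 != 0.
x0 = np.array([0., 0., 1.])
columns = np.column_stack([np.linalg.matrix_power(A, i) @ x0 for i in range(q)])

# The q vectors are linearly independent iff this matrix has rank q.
print(np.linalg.matrix_rank(columns))  # 3
```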

It is clear that \(\mathcal{H}\) is invariant under \(A\); to construct \(\mathcal{K}\) we go by induction on the index \(q\) of nilpotence. If \(q=1\), the result is trivial; we now assume the theorem for \(q-1\). The range \(\mathcal{R}\) of \(A\) is a subspace that is invariant under \(A\); restricted to \(\mathcal{R}\) the linear transformation \(A\) is nilpotent of index \(q-1\). We write \(\mathcal{H}_{0}=\mathcal{H} \cap \mathcal{R}\) and \(y_{0}=A x_{0}\); then \(\mathcal{H}_{0}\) is spanned by the linearly independent vectors \(y_{0}, A y_{0}, \ldots, A^{q-2} y_{0}\). The induction hypothesis may be applied, and we may conclude that \(\mathcal{R}\) is the direct sum of \(\mathcal{H}_{0}\) and some other invariant subspace \(\mathcal{K}_{0}\).

We write \(\mathcal{K}_{1}\) for the set of all vectors \(x\) such that \(A x\) is in \(\mathcal{K}_{0}\); it is clear that \(\mathcal{K}_{1}\) is a subspace. The temptation is great to set \(\mathcal{K}=\mathcal{K}_{1}\) and to attempt to prove that \(\mathcal{K}\) has the desired properties. Unfortunately this need not be true; \(\mathcal{H}\) and \(\mathcal{K}_{1}\) need not be disjoint. (It is true, but we shall not use the fact, that the intersection of \(\mathcal{H}\) and \(\mathcal{K}_{1}\) is contained in the null-space of \(A\).) What makes \(\mathcal{K}_{1}\) useful in spite of this is the fact that \(\mathcal{H}+\mathcal{K}_{1}=\mathcal{V}\). To prove this, observe that \(A x\) is in \(\mathcal{R}\) for every \(x\), and, consequently, \(A x=y+z\) with \(y\) in \(\mathcal{H}_{0}\) and \(z\) in \(\mathcal{K}_{0}\). The general element of \(\mathcal{H}_{0}\) is a linear combination of \(A x_{0}, \ldots, A^{q-1} x_{0}\); hence we have \begin{align} y &= \sum_{i=1}^{q-1} \alpha_{i} A^{i} x_{0}\\ &= A\Big(\sum_{i=0}^{q-2} \alpha_{i+1} A^{i} x_{0}\Big)\\ &= A y_{1}, \end{align} 

where \(y_{1}\) is in \(\mathcal{H}\). It follows that \(A x=A y_{1}+z\), or \(A(x-y_{1})=z\), so that \(A(x-y_{1})\) is in \(\mathcal{K}_{0}\). This means that \(x-y_{1}\) is in \(\mathcal{K}_{1}\), so that \(x\) is the sum of an element (namely \(y_{1}\)) of \(\mathcal{H}\) and an element (namely \(x-y_{1}\)) of \(\mathcal{K}_{1}\).

As far as disjointness is concerned, we can say at least that \(\mathcal{H} \cap \mathcal{K}_{0}=\mathcal{O}\). To prove this, suppose that \(x\) is in \(\mathcal{H} \cap \mathcal{K}_{0}\), and observe first that \(A x\) is in \(\mathcal{H}_{0}\) (since \(x\) is in \(\mathcal{H}\)). Since \(\mathcal{K}_{0}\) is also invariant under \(A\), the vector \(A x\) belongs to \(\mathcal{K}_{0}\) along with \(x\); since \(\mathcal{H}_{0} \cap \mathcal{K}_{0}=\mathcal{O}\), it follows that \(A x=0\). From this we infer that \(x\) is in \(\mathcal{H}_{0}\). (Since \(x\) is in \(\mathcal{H}\), we have \(x=\sum_{i=0}^{q-1} \alpha_{i} A^{i} x_{0}\); and therefore \(0=A x=\sum_{i=1}^{q-1} \alpha_{i-1} A^{i} x_{0}\); from the linear independence of the \(A^{j} x_{0}\) it follows that \(\alpha_{0}=\cdots=\alpha_{q-2}=0\), so that \(x=\alpha_{q-1} A^{q-1} x_{0}\).) We have proved that if \(x\) belongs to \(\mathcal{H} \cap \mathcal{K}_{0}\), then it belongs also to \(\mathcal{H}_{0} \cap \mathcal{K}_{0}\), and hence that \(x=0\).

The situation now is this: \(\mathcal{H}\) and \(\mathcal{K}_{1}\) together span \(\mathcal{V}\), and \(\mathcal{K}_{1}\) contains the two disjoint subspaces \(\mathcal{K}_{0}\) and \(\mathcal{H} \cap \mathcal{K}_{1}\). If we let \(\mathcal{K}_{0}^{\prime}\) be any complement of \(\mathcal{K}_{0} \oplus (\mathcal{H} \cap \mathcal{K}_{1})\) in \(\mathcal{K}_{1}\), that is, if \[\mathcal{K}_{0}^{\prime} \oplus \mathcal{K}_{0} \oplus (\mathcal{H} \cap \mathcal{K}_{1})=\mathcal{K}_{1},\] then we may write \(\mathcal{K}=\mathcal{K}_{0}^{\prime} \oplus \mathcal{K}_{0}\); we assert that this \(\mathcal{K}\) has the desired properties. In the first place, \(\mathcal{K} \subset \mathcal{K}_{1}\) and \(\mathcal{K}\) is disjoint from \(\mathcal{H} \cap \mathcal{K}_{1}\); it follows that \(\mathcal{H} \cap \mathcal{K} = \mathcal{O}\). In the second place, \(\mathcal{H} \oplus \mathcal{K}\) contains both \(\mathcal{H}\) and \(\mathcal{K}_{1}\), so that \(\mathcal{H} \oplus \mathcal{K} = \mathcal{V}\). Finally, \(\mathcal{K}\) is invariant under \(A\), since the fact that \(\mathcal{K} \subset \mathcal{K}_{1}\) implies that \(A\mathcal{K} \subset \mathcal{K}_{0} \subset \mathcal{K}\). The proof of the theorem is complete. ◻

Later we shall need the following remark. If \(\tilde{x}_{0}\) is any other vector for which \(A^{q-1} \tilde{x}_{0} \neq 0\), if \(\tilde{\mathcal{H}}\) is the subspace spanned by the vectors \(\tilde{x}_{0}, A \tilde{x}_{0}, \ldots, A^{q-1} \tilde{x}_{0}\), and if, finally, \(\tilde{\mathcal{K}}\) is any subspace that together with \(\tilde{\mathcal{H}}\) reduces \(A\), then the behavior of \(A\) on \(\tilde{\mathcal{H}}\) and \(\tilde{\mathcal{K}}\) is the same as its behavior on \(\mathcal{H}\) and \(\mathcal{K}\) respectively. (In other words, in spite of the apparent non-uniqueness in the statement of Theorem 1, everything is in fact uniquely determined up to isomorphisms.) The truth of this remark follows from the fact that the index of nilpotence of \(A\) on \(\mathcal{K}\) (\(r\), say) is the same as the index of nilpotence of \(A\) on \(\tilde{\mathcal{K}}\) (\(\tilde{r}\), say). This fact, in turn, is proved as follows. Since \(A^{r}\mathcal{V}=A^{r} \mathcal{H}+A^{r} \mathcal{K}\) and also \(A^{r} \mathcal{V}=A^{r} \tilde{\mathcal{H}}+A^{r} \tilde{\mathcal{K}}\) (these results depend on the invariance of all the subspaces involved), the dimensions of the right sides of these equations may be equated, and hence \((q-r)+0=(q-r)+(\tilde{r}-r)\).

Using Theorem 1 we can find a complete geometric characterization of nilpotent transformations.

Theorem 2. If \(A\) is a nilpotent linear transformation of index \(q\) on a finite-dimensional vector space \(\mathcal{V}\), then there exist positive integers \(r, q_{1}, \ldots, q_{r}\) and vectors \(x_{1}, \ldots, x_{r}\) such that (i) \(q_{1} \geq \cdots \geq q_{r}\), (ii) the vectors \begin{align} x_1, Ax_1, &\ldots, A^{q_1 - 1}x_1,\\ x_2, Ax_2, &\ldots, A^{q_2 - 1}x_2,\\ &\vdots\\ x_r, Ax_r, &\ldots, A^{q_r - 1}x_r \end{align} 

form a basis for \(\mathcal{V}\), and (iii) \(A^{q_{1}} x_{1}=A^{q_{2}} x_{2}=\cdots=A^{q_{r}} x_{r}=0\). The integers \(r, q_{1}, \ldots, q_{r}\) form a complete set of isomorphism invariants of \(A\). If, in other words, \(B\) is any other nilpotent linear transformation on a finite-dimensional vector space \(\mathcal{W}\), then a necessary and sufficient condition that there exist an isomorphism \(T\) between \(\mathcal{V}\) and \(\mathcal{W}\) such that \(T A T^{-1}=B\) is that the integers \(r, q_{1}, \ldots, q_{r}\) attached to \(B\) be the same as the ones attached to \(A\).

Proof. We write \(q_{1}=q\) and we choose \(x_{1}\) to be any vector for which \(A^{q_{1}-1} x_{1} \neq 0\). The subspace spanned by \(x_{1}, A x_{1}, \ldots, A^{q_{1}-1} x_{1}\) is invariant under \(A\), and, by Theorem 1, possesses an invariant complement, which, naturally, has strictly lower dimension than \(\mathcal{V}\). On this complementary subspace \(A\) is nilpotent of index \(q_{2}\), say; we apply the same reduction procedure to this subspace (beginning with a vector \(x_{2}\) for which \(A^{q_{2}-1} x_{2} \neq 0\)).

We continue thus by induction till we exhaust the space. This proves the existential part of the theorem; the remaining part follows from the uniqueness (up to isomorphisms) of the decomposition given by Theorem 1. ◻
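In computational terms, the invariants of Theorem 2 can be read off from the nullities of the powers of \(A\): the number of indices \(i\) with \(q_{i} \geq k\) equals \(\nu(A^{k})-\nu(A^{k-1})\), where \(\nu\) denotes nullity. The sketch below (our own; the function name is hypothetical) recovers \(q_{1}, \ldots, q_{r}\) this way:

```python
import numpy as np

def nilpotent_invariants(A):
    """Return [q_1, ..., q_r] (non-increasing) for a nilpotent matrix A,
    using: #{i : q_i >= k} = nu(A^k) - nu(A^(k-1)), nu = nullity."""
    n = A.shape[0]
    nu = [0]                      # nu(A^0) = nullity of the identity = 0
    P = np.eye(n)
    while nu[-1] < n and len(nu) <= n:
        P = P @ A
        nu.append(n - np.linalg.matrix_rank(P))
    counts = [nu[k] - nu[k - 1] for k in range(1, len(nu))]
    r = counts[0]                 # r = nu(A) = number of chains in the basis
    # q_i is the number of k with at least i+1 chains of length >= k
    # (the "conjugate partition" of the counts).
    return [sum(1 for c in counts if c > i) for i in range(r)]

# Block-diagonal example: one chain of length 3 and one zero block of size 1.
A = np.zeros((4, 4))
A[1, 0] = A[2, 1] = 1.0
print(nilpotent_invariants(A))  # [3, 1]
```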

With respect to the basis \(\{A^{i} x_{j}\}\) described in Theorem 2, the matrix of \(A\) takes on a particularly simple form. Every matrix element not on the diagonal just below the main diagonal vanishes (that is, \(\alpha_{i j} \neq 0\) implies \(j=i-1\)), and the elements below the main diagonal begin (at top) with a string of \(1\)’s followed by a single \(0\), then go on with another string of \(1\)’s followed by a \(0\), and continue in this way to the end, with the lengths of the strings of \(1\)’s monotonely decreasing (or, at any rate, non-increasing).
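For instance (a sketch of our own, with the hypothetical choice \(q_{1}=3\), \(q_{2}=2\)), the matrix just described can be built block by block:

```python
import numpy as np

def nilpotent_canonical_matrix(qs):
    """Matrix of A in the basis x1, A x1, ..., A^(q1-1) x1, x2, ...:
    1's on the subdiagonal, interrupted by a 0 at the end of each block."""
    n = sum(qs)
    M = np.zeros((n, n), dtype=int)
    start = 0
    for q in qs:
        for i in range(1, q):          # A maps A^(i-1) x to A^i x
            M[start + i, start + i - 1] = 1
        start += q
    return M

M = nilpotent_canonical_matrix([3, 2])
print(M)
# The subdiagonal reads 1, 1, 0, 1: a string of 1's, a 0, a shorter string.
print(not np.linalg.matrix_power(M, 3).any())  # True: index of nilpotence is q1
```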

Observe that our standing assumption about the algebraic closure of the field of scalars was not used in this section.

EXERCISES

Exercise 1. Does there exist a nilpotent transformation of index \(3\) on a \(2\)-dimensional space?

Exercise 2. 

  1. Prove that a nilpotent linear transformation on a finite-dimensional vector space has trace zero.
  2. Prove that if \(A\) and \(B\) are linear transformations (on the same finite-dimensional vector space) and if \(C=A B-B A\), then \(1-C\) is not nilpotent.
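A numerical spot check of both parts (our own sketch; the particular random matrices are arbitrary). The trace of a commutator vanishes because \(\operatorname{tr}(AB)=\operatorname{tr}(BA)\), while a nilpotent transformation has all eigenvalues \(0\) and hence trace \(0\); so \(1-C\), with trace \(n \neq 0\), cannot be nilpotent.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
C = A @ B - B @ A

# tr(C) = tr(AB) - tr(BA) = 0, up to floating-point rounding.
print(abs(np.trace(C)) < 1e-10)   # True

# A nilpotent matrix has trace 0 (all its eigenvalues are 0); but
# tr(I - C) = n - tr(C) = n != 0, so I - C is never nilpotent.
print(np.trace(np.eye(n) - C))    # n = 4, up to rounding
```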

Exercise 3. Prove that if \(A\) is a nilpotent linear transformation of index \(q\) on a finite-dimensional vector space, then \[\nu(A^{k+1})+\nu(A^{k-1}) \leq 2 \nu(A^{k})\] for \(k=1, \ldots, q-1\).
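The inequality says that \(k \mapsto \nu(A^{k})\) is concave (assuming the book's notation \(\nu\) for nullity). A quick numerical check on a sample nilpotent matrix (our own sketch):

```python
import numpy as np

def nu(M):
    """Nullity: dimension of the null-space of M."""
    return M.shape[0] - np.linalg.matrix_rank(M)

# Nilpotent of index q = 3: one chain of length 3 plus a zero block of size 1.
A = np.zeros((4, 4))
A[1, 0] = A[2, 1] = 1.0
q = 3

powers = [np.linalg.matrix_power(A, k) for k in range(q + 1)]
print([nu(P) for P in powers])  # [0, 2, 3, 4]: increasing and concave
print(all(nu(powers[k + 1]) + nu(powers[k - 1]) <= 2 * nu(powers[k])
          for k in range(1, q)))  # True
```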

Exercise 4. If \(A\) is a linear transformation (on a finite-dimensional vector space over an algebraically closed field), then there exist linear transformations \(B\) and \(C\) such that \(A=B+C\), \(B\) is diagonable, \(C\) is nilpotent, and \(B C=C B\); the transformations \(B\) and \(C\) are uniquely determined by these conditions.
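For a transformation with a single eigenvalue the decomposition is transparent: \(B\) is the eigenvalue times the identity and \(C\) is the strictly triangular remainder. A minimal sketch (our own example, not a general algorithm):

```python
import numpy as np

# A has the single eigenvalue 2, so B = 2*I (diagonal, hence diagonable)
# and C = A - 2*I (strictly triangular, hence nilpotent) do the job.
A = np.array([[2., 1.],
              [0., 2.]])
B = 2.0 * np.eye(2)
C = A - B

print(np.allclose(A, B + C))      # True: A = B + C
print(not (C @ C).any())          # True: C^2 = 0, so C is nilpotent
print(np.allclose(B @ C, C @ B))  # True: B and C commute
```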