Canonical Forms

Throughout this chapter we shall be concerned with an \(n\)-dimensional vector space \(\mathcal{V}\). We shall not assume that it is a complex vector space: the scalar coefficients may lie in an arbitrary field. The treatment we shall give is taken from N. Jacobson’s Notes on Algebra.

1. The space of linear transformations

The set of all linear transformations defined on \(\mathcal{V}\) is an \(n^{2}\) dimensional vector space. For if \(\{x_i\}\) is a basis in \(\mathcal{V}\) we may define the linear transformations \({E}_{ij}\) in \(\mathcal{V}\) by \({E}_{ij} x_{k}=\delta_{j k} x_{i}\). (We have: \[ E_{ij} E_{pq} x_k = \delta_{qk} E_{ij} x_p = \delta_{qk} \delta_{jp} x_i, \] so that \(E_{ij} E_{pq} = \delta_{jp} E_{iq}\).) The \({E}_{ij}\) are linearly independent, for \(\sum_{ij}a_{ij} {E}_{ij} {x}=0\) for all \(x\) implies \[ \begin{aligned} 0 &= \sum_{ij} a_{ij} E_{ij} x_k\\ &= \sum_{ij} a_{ij} \delta_{jk} x_i\\ &= \sum_i a_{ik} x_i, \end{aligned} \] whence \(a_{ik} = 0\) for all \(i\), \(k\). Moreover every linear transformation \(A\) is a linear combination of the \(E_{ij}\): if the matrix of \(A\) is \((a_{ij})\), so that \(A x_{j}=\sum_i a_{ij} x_{i}\), then \(A x_{j}=\sum_i a_{ij} {E}_{ij} x_{j}\), so that \(A=\sum_{ij}{a}_{ij} {E}_{ij}\).
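These relations are mechanical to check. The following sketch (Python with numpy; all names are ours, chosen for this illustration) verifies the multiplication rule \(E_{ij} E_{pq}=\delta_{jp} E_{iq}\) and the expansion \(A=\sum_{ij} a_{ij} E_{ij}\) for a small random matrix:

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(n, n))

def E(i, j):
    """Matrix unit: E_ij x_k = delta_jk x_i (a single 1 in position (i, j))."""
    M = np.zeros((n, n), dtype=int)
    M[i, j] = 1
    return M

# E_ij E_pq = delta_jp E_iq
for i, j, p, q in np.ndindex(n, n, n, n):
    product = E(i, j) @ E(p, q)
    expected = E(i, q) if j == p else np.zeros((n, n), dtype=int)
    assert np.array_equal(product, expected)

# A = sum_ij a_ij E_ij, so the E_ij span the n^2-dimensional space
assert np.array_equal(sum(A[i, j] * E(i, j) for i, j in np.ndindex(n, n)), A)
```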

2. The minimal polynomial of a transformation

If \(A\) is an arbitrary linear transformation, the \(n^{2}+1\) linear transformations \(1, A, A^{2}, \ldots, A^{n^{2}}\) are linearly dependent, from the result of the preceding section, so that there exists a polynomial \(m(\lambda)\) of degree \(\leq n^{2}\) for which \(m(A)=0\).

Definition 1. The polynomial \(m(\lambda)\) of minimal degree \(m\) and leading coefficient \(1\) for which \(m(A)=0\) is the minimal polynomial of \(A\).

We observe that it is uniquely determined by the conditions we placed on it: if \(f(\lambda)\) is such that \(f(A)=0\), we may divide \(f(\lambda)\) by \(m(\lambda)\) and obtain \(f(\lambda)=q(\lambda) m(\lambda)+r(\lambda)\), where the degree of \(r(\lambda)\) is less than \(m\). From the relation \(f(A)=q(A) m(A)+r(A)=0\) we see that \(r(A)=0\), whence, by the minimality of the degree of \(m(\lambda)\), \(r(\lambda)\) must be identically zero. Consequently \(m(\lambda)\) divides any \(f(\lambda)\) for which \(f(A)=0\): this proves that it is uniquely determined by our conditions. Moreover if \(B={SA}{S}^{-1}\), then \(m(B)={S}\,m(A)\, S^{-1}=0\), so that the minimal polynomial of \(B\) must be a divisor of \(m(\lambda)\). Interchanging the roles of \(A\) and \(B\) we obtain the result that the minimal polynomials of two similar transformations (i.e., of two transformations \(A\) and \(B\) related by an equation of the form \(B=S A S^{-1}\)) are equal.
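The existence argument is constructive: \(m(\lambda)\) appears at the first linear dependence among the powers \(1, A, A^{2}, \ldots\), each flattened to a vector of \(n^{2}\) coordinates. A minimal sketch (Python with sympy, in exact rational arithmetic; `min_poly` is our illustrative name, not a library routine):

```python
import sympy as sp

lam = sp.symbols('lambda')

def min_poly(A):
    """Monic polynomial of least degree with m(A) = 0, found as the first
    linear dependence among the flattened powers 1, A, A^2, ..."""
    n = A.rows
    powers = [sp.eye(n)]
    while True:
        # columns of M are the powers computed so far, as n^2-vectors
        M = sp.Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        null = M.nullspace()
        if null:
            # the newest power must occur in the dependence; normalize on it
            c = null[0] / null[0][-1]
            return sp.Poly(sum(c[k] * lam**k for k in range(len(powers))), lam)
        powers.append(powers[-1] * A)

A = sp.Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 2]])
print(min_poly(A))   # lambda**2 - 4*lambda + 4
```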

3. The order of a vector

Since for an arbitrary vector \(x\), the vectors \(x, A x, A^{2} x, \ldots, {A}^{n} x\), being in number more than \(n\), are linearly dependent, there exists a polynomial \(m_{x}(\lambda)\) for which \(m_{x}(A) x=0\). We verify easily, as in the previous section, that the requirements that \(m_{x}(\lambda)\) have minimal degree and have leading coefficient one uniquely determine \(m_{x}(\lambda)\): we call this polynomial the order of \(x\) (relative to \(A\)). Since, moreover, \(m_{x}(\lambda)\) is a factor of every polynomial \(f(\lambda)\) for which \(f(A) x=0\), the least common multiple (L.C.M.) of all \({m}_{x}(\lambda)\) exists and is a factor of \({m}(\lambda)\). If \(\{x_i\}\) is a basis in \(\mathcal{V}\) and \(m^{*}(\lambda)= \text{L.C.M.}\, m_{x_{i}}(\lambda)\) then for every \(x=\sum_i a_{i} x_{i}\) in \(\mathcal{V}\) we have \[ m^{*}(A) x=\sum_i a_{i} m^{*}(A) x_{i}=0, \] or \(m^{*}(A)=0\), so that \(m^{*}(\lambda)\) is divisible by \(m(\lambda)\). It follows that \(m(\lambda)\) is exactly equal to \(m^{*}(\lambda)\), or equivalently, to the L.C.M. of all \(m_x(\lambda)\).
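The same dependence search, applied to the vectors \(x, A x, A^{2} x, \ldots\) instead of the powers of \(A\), computes the order of a vector, and the identity \(m(\lambda)=\text{L.C.M.}\, m_{x_{i}}(\lambda)\) can then be checked directly. A sketch under the same conventions as above (`order_of_vector` is our name):

```python
import sympy as sp
from functools import reduce

lam = sp.symbols('lambda')

def order_of_vector(A, x):
    """Monic m_x(lambda) of least degree with m_x(A) x = 0: the first
    linear dependence among x, A x, A^2 x, ... (at most n + 1 vectors)."""
    vecs = [x]
    while True:
        null = sp.Matrix.hstack(*vecs).nullspace()
        if null:
            c = null[0] / null[0][-1]
            return sp.Poly(sum(c[k] * lam**k for k in range(len(vecs))), lam)
        vecs.append(A * vecs[-1])

# m(lambda) as the L.C.M. of the orders of the standard basis vectors:
def min_poly_via_orders(A):
    n = A.rows
    orders = [order_of_vector(A, sp.eye(n).col(k)).as_expr() for k in range(n)]
    return reduce(sp.lcm, orders)
```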

4. Quotient spaces

For an arbitrary linear subspace \(M\) in \(\mathcal{V}\) we write \(x_{1} \equiv x_{2} \,\, (\operatorname{mod} M)\) if \(x_{1}-x_{2}\) is in \(M\). The set of all vectors \(x\) in \(\mathcal{V}\), with the same meaning of addition and scalar multiplication as before, but with the notion of equality of two vectors redefined to mean their congruence \(\operatorname{mod} M\), is a vector space: we denote it by \(\mathcal{V} / {M}\). We prove that if the dimension of \(M\) is \(r\) then that of \(\mathcal{V} / M\) is \(n-r\). For let \(N\) be an arbitrary linear subspace whose intersection with \(M\) consists of \(0\) alone and which together with \(M\) spans \(\mathcal{V}\). If \(x_{1}, x_{2}, \ldots, x_{n-r}\) is a basis in \(N\), then the \(x_{i} \operatorname{mod} M\) are a basis in \(\mathcal{V} / M\). If \(\sum_i a_{i} x_{i} \equiv 0 \,\, (\operatorname{mod} M)\), i.e., if \(\sum_i a_{i} x_{i}\) lies in \(M\), then, since \(N \cap M=\{0\}\), we must have \(\sum_i a_{i} x_{i}=0\), and the linear independence of the \(x_{i}\) implies that \(a_{i}=0\) for all \(i\). Hence the \(x_{i} \operatorname{mod} M\) are linearly independent. To prove that they span \(\mathcal{V} / M\) we observe that every \(x\) in \(\mathcal{V}\) may be written in the form \(x=y+z\), with \(y\) in \(M\) and \(z\) in \(N\). In other words, since \(x-z=y\) is in \({M}\), every \(x\) is congruent, \(\operatorname{mod} {M}\), to some \({z}\) in \({N}\), as was to be proved.

If \(A\) is a linear transformation which leaves \(M\) invariant, i.e., for which \(x\) in \(M\) implies \(A x\) in \(M\), then \(x \equiv y \,\, (\operatorname{mod} M)\) implies \(A x \equiv A y \,\, (\operatorname{mod} M)\), so that \(A\) induces a linear transformation in \(\mathcal{V} / M\). The space \(\mathcal{V} / M\) is the quotient space of \(\mathcal{V}\) modulo \(M\) (or \(\mathcal{V}\) by \(M\)).
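Concretely, the induced transformation may be computed by completing a basis of \(M\) to a basis of \(\mathcal{V}\); in the completed basis the matrix of \(A\) is block triangular, and the induced map is the complementary diagonal block. A sketch (sympy), assuming \(A M \subseteq M\) and that `M_basis` is a nonempty list of independent column vectors spanning \(M\):

```python
import sympy as sp

def induced_matrix(A, M_basis):
    """Matrix of the transformation induced by A on V/M (assumes A M <= M)."""
    n, r = A.rows, len(M_basis)
    # complete the basis of M to a basis of V with standard basis vectors
    S = sp.Matrix.hstack(*M_basis)
    for k in range(n):
        cand = sp.Matrix.hstack(S, sp.eye(n).col(k))
        if cand.rank() > S.rank():
            S = cand
    B = S.inv() * A * S     # block upper triangular: B[r:, :r] == 0
    return B[r:, r:]        # the action of A on the cosets mod M
```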

5. Cyclic spaces

Definition 2. For any vector \(x\), and a given linear transformation \(A\), we denote by \(\{x\}\) the linear subspace spanned by all vectors \(x, A x, A^{2} x, A^{3} x, \ldots\): we call such linear subspaces cyclic.

If the order of \(x\) is \(m(\lambda)=\lambda^{m}-\sum_{i=0}^{m-1} a_{i} \lambda^{i}\), then \(x, A x, \ldots, A^{m-1} x\) are linearly independent, but together with \({A}^{m} x\) they form a linearly dependent set, so that the dimension of \(\{x\}\) is \(m\). The matrix of \(A\) in the space \(\{x\}\) with respect to the basis \(x, A x, \ldots, A^{m-1} x\) (the \(i\)-th row listing the coefficients of the image of the \(i\)-th basis vector) is \[ \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_{0} & a_{1} & a_{2} & \cdots & a_{m-1} \end{bmatrix}. \]
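This companion block is determined by the coefficients \(a_{0}, \ldots, a_{m-1}\) alone. A sketch constructing it and checking that it satisfies its own order (`companion` is our name; the identity \(C^{m}=\sum_{i} a_{i} C^{i}\) holds for the displayed matrix and equally for its transpose):

```python
import sympy as sp

def companion(coeffs):
    """The block displayed above, for m(lam) = lam^m - sum_i a_i lam^i,
    with coeffs = [a_0, ..., a_{m-1}]: 1's on the superdiagonal and the
    a_i along the bottom row."""
    m = len(coeffs)
    C = sp.zeros(m, m)
    for i in range(m - 1):
        C[i, i + 1] = 1
    for i, a in enumerate(coeffs):
        C[m - 1, i] = a
    return C

# m(lam) = lam^3 - lam^2 - 2, i.e. [a_0, a_1, a_2] = [2, 0, 1]:
C = companion([2, 0, 1])
assert C**3 == C**2 + 2 * sp.eye(3)
```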

We note that \(m(\lambda)\) is the minimal polynomial of \(A\) in \(\{x\}\) and therefore depends only on the subspace \(\{x\}\) and not on the particular generator \(x\).

6. Vectors of maximal order

The fundamental lemma on which the work of this chapter is based is the following.

Lemma 1. Given any linear transformation \(A\), there exists a cyclic subspace \(\{x\}\) whose order is the minimal polynomial, \(m(\lambda)\), of \(A\).

Proof. Let \(\{x_i\}\) be a basis in \(\mathcal{V}\) and let \(m_{i}(\lambda)\) be the order of \(x_{i}\). If \[ m_{i}(\lambda)=p_{1}(\lambda)^{e_{i1}} \ldots p_{r}(\lambda)^{e_{ir}} \] is the factorization of \(m_{i}(\lambda)\) into powers of distinct irreducible polynomials (zero exponents being allowed) and if we write \(e_{j}=\max (e_{1j}, \ldots, e_{n j})\), for \(j = 1, 2, \ldots, r\), then \[ m(\lambda) = \text{L.C.M.} \, m_{i}(\lambda) = p_{1}(\lambda)^{e_{1}} \ldots p_{r}(\lambda)^{e_{r}} \] is the minimal polynomial of \(A\). We write \[ y_{ij}=p_{1}(A)^{e_{i1}} \ldots p_{j-1}(A)^{e_{i(j - 1)}} p_{j+1}(A)^{e_{i(j + 1)}} \ldots p_{r}(A)^{e_{ir}}x_{i}; \] then the order of \(y_{ij}\) is \(p_{j}(\lambda)^{e_{ij}}\). Hence among the \({y}_{ij}\) we may find vectors \({y}_{1}, {y}_{2}, \ldots, {y}_{{r}}\) whose orders are, respectively, \({p}_{1}(\lambda)^{e_{1}}, {p}_{2}(\lambda)^{e_2}, \ldots, {p}_{{r}}(\lambda)^{e_r}\). We shall prove that \({y}={y}_{1}+{y}_{2}+\cdots+{y}_{{r}}\) has the order \(m(\lambda)\).

For any polynomial \(f(\lambda)\), denote by \(N(f)\) the linear subspace of vectors \(x\) for which \(f(A) x=0\). It is easy to verify that \(N(f)\) is invariant under \({A}\). We assert that if \(f(\lambda)\) and \(g(\lambda)\) are relatively prime then \(N(f)\) and \(N(g)\) have only \(0\) in common. For if \(f(A) x=g(A) x=0\) then \(m_{x}(\lambda)\) must be a divisor of both \(f(\lambda)\) and \(g(\lambda)\), whence \(m_{x}(\lambda)=1\), and \(x=0\). Hence, in our case, the subspaces \[ N_{i} = N\big(p_i(\lambda)^{e_i}\big) \] are invariant under \(A\), and their sum is direct: if \(z_{1}+\cdots+z_{r}=0\) with \(z_{i}\) in \(N_{i}\), then applying \(\prod_{i \neq j} p_{i}(A)^{e_{i}}\) annihilates every term except the \(j\)-th, so that \(z_{j}\) lies in the intersection of \(N\big(p_{j}(\lambda)^{e_{j}}\big)\) and \(N\big(\prod_{i \neq j} p_{i}(\lambda)^{e_{i}}\big)\), whence \(z_{j}=0\). It follows (since \(y_{i}\), and with it \(f(A) y_{i}\), is in \(N_{i}\)) that \[ f(A) y=f(A) y_{1}+f(A) y_{2}+\cdots+f(A) y_{r}=0 \] implies \(f(A) y_{j}=0\), so that \(f(\lambda)\) must be divisible by \(p_{j}(\lambda)^{e_j}\), \(j=1, \ldots, r\), and therefore by \(m(\lambda)\). This completes the proof of our lemma.
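The proof is again a construction: factor each basis vector's order, pick for each prime power \(p_{j}(\lambda)^{e_{j}}\) a basis vector realizing the maximal exponent, strip the other factors off it, and add the pieces. A sketch reusing `order_of_vector` from section 3 (the helper `poly_at`, which evaluates a polynomial at a matrix by Horner's rule, is also ours):

```python
import sympy as sp
from functools import reduce

lam = sp.symbols('lambda')

def poly_at(f, A):
    """Evaluate the polynomial f(lambda) at the matrix A (Horner's rule)."""
    R = sp.zeros(A.rows, A.rows)
    for c in sp.Poly(f, lam).all_coeffs():
        R = R * A + c * sp.eye(A.rows)
    return R

def vector_of_maximal_order(A):
    """A vector y whose order is the minimal polynomial m(lambda)."""
    n = A.rows
    # order_of_vector as in the sketch of section 3
    orders = [order_of_vector(A, sp.eye(n).col(k)).as_expr() for k in range(n)]
    m = reduce(sp.lcm, orders)                 # the minimal polynomial
    y = sp.zeros(n, 1)
    for p, e in sp.factor_list(m, lam)[1]:     # m = prod_j p_j^{e_j}
        for k, o in enumerate(orders):
            if sp.rem(o, p**e, lam) == 0:      # p^e divides the order of x_k
                q = sp.quo(o, p**e, lam)       # y_j = q(A) x_k has order p^e
                y += poly_at(q, A) * sp.eye(n).col(k)
                break
    return y
```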

Since the order of any vector is a polynomial of degree \(\leq n\), we have proved that the degree of \(m(\lambda)\) is \(\leq n\): if it is equal to \(n\) then \(\mathcal{V}\) itself is cyclic.

7. The rational canonical form

On the basis of the lemma of the preceding section we shall now prove that \(\mathcal{V}\) is the direct sum of cyclic subspaces: i.e., that we can find vectors \(x_{1}, \ldots, x_{r}\) of orders \(m_{1}(\lambda), \ldots, m_{r}(\lambda)\) respectively (where \(m_{i}(\lambda)\) is a polynomial of degree \(m_{i}\)) so that the vectors \(A^{j} x_{i}\), \(i=1, \ldots, r\), \(j=0,1, \ldots, m_{i}-1\), are a basis in \(\mathcal{V}\).

Let \(m_{1}(\lambda)=m(\lambda)\) be the minimal polynomial of \(A\) and let \(x_{1}\) be a vector whose order is \(m_{1}(\lambda)\). Let \(m_{2}(\lambda)\) be the minimal polynomial of \(A\) in the quotient space \(\mathcal{V} /\{x_1\}\), and let \(x_{2}^\prime\) be a vector of this space whose order (\(\operatorname{mod} \,\{x_1\}\)) is \(m_{2}(\lambda)\). This means that \(m_{2}(A) x_{2}^{\prime} \equiv 0 \,\, (\operatorname{mod} \,\{x_1\})\), i.e., that \(m_{2}(A) x_{2}^{\prime}\) is in \(\{x_1\}\), so that \(m_{2}(A) x_{2}^{\prime}=h_{1}(A) x_{1}\) for a suitable polynomial \(h_{1}(\lambda)\).

Since \(m_{1}(A) x_{2}^{\prime}=0\) (and, a fortiori, \(m_{1}(A) x_{2}^{\prime} \equiv 0 \,\, (\operatorname{mod} \,\{x_1\})\)) \(m_{2}(\lambda)\) must be a factor of \(m_{1}(\lambda)\): say \(m_{1}(\lambda)=n_{1}(\lambda) m_{2}(\lambda)\). We have then \[ n_{1}(A) h_{1}(A) x_{1}=n_{1}(A) m_{2}(A) x_{2}^{\prime}=m_{1}(A) x_{2}^{\prime}=0 \] so that \(n_{1}(\lambda) h_{1}(\lambda)\) must be divisible by \(m_{1}(\lambda)\): \[ n_{1}(\lambda) h_{1}(\lambda)=k_{1}(\lambda) m_{1}(\lambda)=k_{1}(\lambda) n_{1}(\lambda) m_{2}(\lambda). \]

It follows (upon division by \(n_{1}(\lambda)\)) that \[ {h}_{1}(\lambda)=k_{1}(\lambda) m_{2}(\lambda). \]

We define \(x_{2}=x_{2}^{\prime}-k_{1}(A) x_{1}\). Since \(x_{2} \equiv x_{2}^{\prime} \,\, (\operatorname{mod}\,\{x_1\})\) the order of \(x_{2} \, \operatorname{mod} \,\{x_1\}\) is the same as that of \(x_{2}^{\prime}\), i.e., it is \(m_{2}(\lambda)\). We have moreover \[ \begin{aligned} m_{2}(A) x_{2} &= m_{2}(A) x_{2}^\prime - k_{1}(A) m_{2}(A) x_{1}\\ &= h_{1}(A) x_{1}-h_{1}(A) x_{1}\\ &= 0. \end{aligned} \]

The intersection of the linear subspaces \(\{x_1\}\) and \(\{x_{2}\}\) consists of zero alone, for \(f(A) x_{2}\) in \(\{x_1\}\) implies that \(m_{2}(\lambda)\) is a divisor of \(f(\lambda)\), and therefore that \(f(A) x_{2}=0\).

The above constitutes the first step of an induction proof: we go on to indicate the second step.

Denote by \(\{x_{1}, x_{2}\}\) the linear subspace spanned by \(\{x_1\}\) and \(\{x_{2}\}\) together and consider the quotient space \(\mathcal{V} /\{x_{1}, x_{2}\}\). Let \(x_{3}^{\prime}\) be a vector whose order (\(\operatorname{mod} \,\{x_{1}, x_{2}\}\)) is the minimal polynomial, \(m_{3}(\lambda)\), of \(A\) in this space. Then \[ m_{3}(A) x_{3}^{\prime}=h_{1}(A) x_{1}+h_{2}(A) x_{2}, \] (where the auxiliary polynomial \(h_{1}(\lambda)\) is not related to the one used above). Since \(m_{2}(A) x_{3}^\prime \equiv 0 \,\, (\operatorname{mod} \,\{x_1\})\) and therefore \(\operatorname{mod} \,\{x_{1}, x_{2}\}\), \(m_{3}(\lambda)\) must divide \(m_{2}(\lambda)\): \[ m_{2}(\lambda)=n_{2}(\lambda) m_{3}(\lambda). \] Hence \[ n_{2}(A) h_{2}(A) x_{2}=n_{2}(A) m_{3}(A) x_{3}^{\prime}-n_{2}(A) h_{1}(A) x_{1} \equiv 0 \,\, (\operatorname{mod} \,\{x_1\}), \] so that \[ n_{2}(\lambda) h_{2}(\lambda)=k_{2}(\lambda) m_{2}(\lambda)=k_{2}(\lambda) n_{2}(\lambda) m_{3}(\lambda), \] whence, finally, \[ h_{2}(\lambda)=k_{2}(\lambda) m_{3}(\lambda). \]

Similarly, since \(m_{1}(A) x_{3}^\prime=0\), \(m_{3}(\lambda)\) divides \(m_{1}(\lambda)\): \(m_{1}(\lambda)=n_{1}(\lambda) m_{3}(\lambda)\), so that \[ \begin{aligned} n_{1}(A) h_{1}(A) x_{1} &= n_{1}(A) m_{3}(A) x_{3}^\prime - n_{1}(A) h_{2}(A) x_{2} \\ &= m_{1}(A) x_{3}^\prime - k_{2}(A) n_{1}(A) m_{3}(A) x_{2}\\ &= -k_{2}(A) m_{1}(A) x_{2} = 0, \end{aligned} \] the last term vanishing because \(m_{2}(\lambda)\), the order of \(x_{2}\), divides \(m_{1}(\lambda)\). Consequently \[ n_{1}(\lambda) h_{1}(\lambda)=k_{1}(\lambda) m_{1}(\lambda)=k_{1}(\lambda) n_{1}(\lambda) m_{3}(\lambda), \] so that \[ {h}_{1}(\lambda)=k_{1}(\lambda) m_{3}(\lambda). \]

We write \(x_{3}=x_{3}^{\prime}-k_{1}(A) x_{1}-k_{2}(A) x_{2}\); then \(x_{3} \equiv x_{3}^\prime \,\, (\operatorname{mod} \,\{x_{1}, x_{2}\})\) so that the order of \(x_{3}\) is still \(m_{3}(\lambda)\); and moreover \[ m_{3}(A) x_{3}=m_{3}(A) x_{3}^{\prime}-m_{3}(A) k_{1}(A) x_{1}-m_{3}(A) k_{2}(A) x_{2}=0. \]

It follows that the linear subspace \(\{x_{3}\}\) has only the vector \(0\) in common with \(\{x_{1}, x_{2}\}\), for \(f(A) x_{3} \equiv 0 \,\, (\operatorname{mod} \,\{x_{1}, x_{2}\})\) implies that \(f(\lambda)\) is divisible by \(m_{3}(\lambda)\) and therefore that \(f(A) x_{3}=0\).

Proceeding in this way by induction we obtain finally the desired result. The polynomials \(m_{i}(\lambda)\) (which incidentally have the property that each divides all preceding ones) are the invariant factors of \(A\).

Using the basis \(A^{j} x_{i}\), \(i=1, \ldots, r\), \(j=0,1, \ldots, m_{i}-1\), the matrix of \(A\) has the so-called rational canonical form: \[ [A] = \begin{bmatrix} {[A_{1}]} & 0 & \cdots & 0 \\ 0 & {[A_{2}]} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & {[A_{{r}}]} \end{bmatrix} \] where each \([A_{i}]\) is a companion block of the form described in section 5 above.
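Assembling the canonical matrix from the invariant factors is then immediate; a sketch using `companion` from the sketch of section 5 (the argument is a list of coefficient lists \([a_{0}, \ldots, a_{m_{i}-1}]\), one per invariant factor):

```python
import sympy as sp

def rational_canonical_form(coeff_lists):
    """Block-diagonal matrix with one companion block per invariant factor."""
    # companion as in the sketch of section 5
    return sp.diag(*[companion(cs) for cs in coeff_lists])
```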

It is easily verified, using this form, that the characteristic polynomial, \(f(\lambda)\), is the product of all the \(m_{i}(\lambda)\) and is therefore divisible by \(m_{1}(\lambda)=m(\lambda)\). As a corollary, therefore, we obtain the Hamilton-Cayley theorem:

Theorem 1. For every \(A\), \(f(A)=0\), where \(f(\lambda)\) is the characteristic polynomial of \(A\).
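The theorem is easy to spot-check numerically: compute the coefficients of the characteristic polynomial and evaluate it at the matrix by Horner's rule. A sketch with numpy, in floating point:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-3, 4, size=(4, 4)).astype(float)
f = np.poly(A)                 # coefficients of det(lam*1 - A), leading first
F = np.zeros_like(A)
for c in f:                    # Horner: F <- F*A + c*1
    F = F @ A + c * np.eye(4)
assert np.allclose(F, 0)       # f(A) = 0
```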

8. Elementary transformations

In the present section we consider matrices (not linear transformations) depending on a parameter \(\lambda\).

Definition 3. Let \(A(\lambda)=\big(a_{ij}(\lambda)\big)\) be a matrix whose elements are polynomials in \(\lambda\), and let \(d_{j}\big(A(\lambda)\big)\) be the highest common factor (H.C.F.) of all \((n-j)\)-rowed minors of \(A(\lambda)\). The polynomial \(d_{j}\big(A(\lambda)\big)\) is the \(j\)-th determinantal factor of \(A(\lambda)\).
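For small matrices the determinantal factors can be computed directly from the definition; a brute-force sketch (sympy; `determinantal_factor` is our name, and sympy's `gcd` fixes the H.C.F. only up to a constant factor):

```python
import sympy as sp
from itertools import combinations
from functools import reduce

def determinantal_factor(M, j, lam):
    """d_j(M): H.C.F. of all (n-j)-rowed minors of the matrix polynomial M."""
    n = M.rows
    k = n - j
    minors = [sp.expand(M[list(rows), list(cols)].det())
              for rows in combinations(range(n), k)
              for cols in combinations(range(n), k)]
    minors = [d for d in minors if d != 0]
    return reduce(sp.gcd, minors) if minors else sp.Integer(0)
```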

Two matrix polynomials, \(A(\lambda)\) and \(B(\lambda)\), are called equivalent if there exist non-singular matrix polynomials \(P(\lambda)\) and \(Q(\lambda)\) for which \({B}(\lambda)={P}(\lambda) {A}(\lambda) Q(\lambda)\). Since the rows (or columns) of \({P}(\lambda) A(\lambda)\) (or \(A(\lambda) Q(\lambda)\)) are linear combinations of the rows (or columns) of \(A(\lambda)\), each \((n-j)\)-rowed minor of \(B(\lambda)\) is a linear combination, with polynomial coefficients, of the \((n-j)\)-rowed minors of \(A(\lambda)\), and therefore \(d_{j}\big(B(\lambda)\big)\) is divisible by \(d_{j}\big(A(\lambda)\big)\). Interchanging the roles of \(A\) and \(B\) we see that if \(A(\lambda)\) is equivalent to \(B(\lambda)\) then their determinantal factors are equal.

The matrix polynomial \(B(\lambda)\) is said to be obtained from \(A(\lambda)\) by elementary transformations if it results from a succession of the following operations:

  1. the interchange of any two rows (or columns),
  2. the multiplication of any row (or column) by a constant different from \(0\), or
  3. the addition of \(u(\lambda)\) times any row (or column) to any other row (or column), where \(u(\lambda)\) is an arbitrary polynomial.

It is easily verified that the elementary transformations can be effected by multiplication by suitable non-singular matrix polynomials, so that if \(B(\lambda)\) is obtained from \(A(\lambda)\) by elementary transformations then \(B(\lambda)\) and \(A(\lambda)\) are equivalent. In particular, therefore, \(B(\lambda)\) and \(A(\lambda)\) have the same determinantal factors.

9. Similarity of linear transformations

To apply the results of the preceding section to linear transformations, let \(A\), \(B\), and \(S\) be linear transformations connected by the equation \(B=S A S^{-1}\). Then in any coordinate system we have the matrix equation \[ [S][A-\lambda{1}][S]^{-1}=[B-\lambda{1}], \] so that the determinantal factors of \([A-\lambda{1}]\) and \([B-\lambda 1]\) are equal. In other words the determinantal factors are invariant under a change of basis. Hence if for any linear transformation \(A\) we want to compute the determinantal factors of \(A-\lambda 1\), we may assume that the matrix of \(A\) is in rational canonical form. The generic block in the matrix of \(A - \lambda{1}\) will then have the form \[ \begin{bmatrix} -\lambda & 1 & 0 & \cdots & 0 \\ 0 & -\lambda & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ a_{0} & a_{1} & a_{2} & \cdots & a_{m-1}-\lambda \end{bmatrix}. \]

Adding to the first column \(\lambda\) times the second, \(\lambda^{2}\) times the third, etc., we obtain \[ \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & -\lambda & 1 & \cdots & 0 \\ 0 & 0 & -\lambda & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -m(\lambda) & a_{1} & a_{2} & \cdots & a_{m-1}-\lambda \end{bmatrix}, \] and by further elementary transformations (on the rows, together with a permutation and a change of sign of the columns) this becomes \[ \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & m(\lambda) \end{bmatrix}. \]

After these transformations on each block, and a suitable interchange of rows and of columns, the matrix of \(A-\lambda 1\) takes the diagonal form \[ \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & & m_{1}(\lambda) & & \\ & & & & \ddots & \\ & & & & & m_{r}(\lambda) \end{bmatrix}, \] all entries off the main diagonal being \(0\).

It follows that for \(j=0,1, \ldots, r-1\), \[ d_{j}(A-\lambda 1)=m_{j+1}(\lambda)\ldots m_{r}(\lambda), \] and for \(j \geq r\), \(d_{j}(A-\lambda 1)=1\). Hence for \(j= 1, \ldots, r\), \[ m_{j}(\lambda)=d_{j-1}(A-\lambda 1) \big/ d_{j}(A-\lambda 1), \] which shows, incidentally, that the invariant factors \(m_{j}(\lambda)\) are invariant under change of basis and that we may therefore speak of the invariant factors of a linear transformation. As an easy consequence of our results we obtain the fact that a necessary and sufficient condition that two linear transformations \(A\) and \(B\) be similar, i.e., that there exist a non-singular linear transformation \(S\) for which \(B=S A S^{-1}\), is that \(A\) and \(B\) have the same invariant factors. The necessity of this condition we have just proved; its sufficiency follows from the fact that if \(A\) and \(B\) have the same invariant factors then their matrices, in two suitably chosen coordinate systems, will have the same rational canonical form.
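This gives an effective way of computing the invariant factors without first finding the canonical basis: form \(A-\lambda 1\), compute the \(d_{j}\), and divide. A sketch built on `determinantal_factor` from the sketch of section 8 (with \(d_{n}=1\) by convention, for the empty minor):

```python
import sympy as sp

def invariant_factors(A):
    """The m_j(lambda) = d_{j-1}/d_j of A - lam*1, made monic; only the
    nonconstant quotients are returned."""
    lam = sp.symbols('lambda')
    n = A.rows
    M = A - lam * sp.eye(n)
    # determinantal_factor as in the sketch of section 8
    d = [determinantal_factor(M, j, lam) for j in range(n)] + [sp.Integer(1)]
    ms = [sp.cancel(d[j - 1] / d[j]) for j in range(1, n + 1)]
    return [sp.monic(f, lam) for f in ms if sp.degree(f, lam) > 0]

A = sp.Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 2]])
print(invariant_factors(A))   # [lambda**2 - 4*lambda + 4, lambda - 2]
```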

10. Elementary divisors

Let the minimal polynomial of \(A\) be \(m(\lambda)=p_{1}(\lambda)^{e_1} \ldots p_{s}(\lambda)^{e_{s}}\), and write \(q_{i}(\lambda)=m(\lambda) / p_{i}(\lambda)^{e_{i}}\). Then the H.C.F. of all \(q_{i}(\lambda)\) is \(1\), so that we may find polynomials \(u_{i}(\lambda)\) for which \[ u_{1}(\lambda) q_{1}(\lambda)+\cdots+u_{s}(\lambda) q_{s}(\lambda)=1. \]
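The coefficients \(u_{i}(\lambda)\) come from the extended Euclidean algorithm (for \(s>2\) one iterates the two-term step). An illustration with sympy's `gcdex`, which returns \(u_{1}, u_{2}, g\) with \(u_{1} q_{1}+u_{2} q_{2}=g\); the particular \(q_{i}\) below are ours, chosen only to be relatively prime:

```python
import sympy as sp

lam = sp.symbols('lambda')
q1, q2 = (lam - 2)**2, lam + 1        # q_i = m / p_i^{e_i}, relatively prime
u1, u2, g = sp.gcdex(q1, q2, lam)     # u1*q1 + u2*q2 = g
assert g == 1
assert sp.expand(u1 * q1 + u2 * q2) == 1
```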

Let \(M_i\) be the range of \(q_{i}(A)\), i.e., the linear subspace of all vectors of the form \(q_{i}(A) x\). If \(x_{i}\) is in \(M_i\) then \[ p_{i}(A)^{e_i} x_i=0, \] and if \(x\) is any vector then \[ x=\sum_i u_{i}(A) q_{i}(A) x=\sum_i x_{i}, \] where \(x_{i}\) is in \(M_i\). If \(\sum_i x_{i}=0\) then, for each \(j\), \[ 0=q_{j}(A)\Big(\sum_i x_{i}\Big)=\sum_i q_{j}(A) x_{i}=q_{j}(A) x_{j}, \] since \(q_{j}(\lambda)\) is divisible by \(p_{i}(\lambda)^{e_{i}}\) for every \(i \neq j\). Since \(q_{j}(\lambda)\) and \(p_{j}(\lambda)^{e_{j}}\) are relatively prime we may find polynomials \(v_{j}(\lambda)\) and \(w_{j}(\lambda)\) for which \[ v_{j}(\lambda) p_{j}(\lambda)^{e_j}+ w_{j}(\lambda) q_{j}(\lambda)=1, \] so that \[ x_{j}=v_j(A) p_{j}(A)^{e_j} x_{j}+w_{j}(A) q_{j}(A) x_{j}=0. \]

To sum up: \(\mathcal{V}\) is the direct sum of \(M_{1}, \ldots, M_s\): the linear subspaces \(M_{i}\) span \(\mathcal{V}\), and the representation \(x=\sum_i x_{i}\) (with \(x_{i}\) in \(M_{i}\)) is unique. Moreover the minimal polynomial of \(A\) in the invariant linear subspace \(M_{i}\) is exactly \(p_{i}(\lambda)^{e_i}\): it is certainly a divisor of \(p_{i}(\lambda)^{e_i}\), and if some \(p_{i}(\lambda)^{f}\) with \(f<e_{i}\) annihilated \(M_{i}\), then \(p_{i}(\lambda)^{f} q_{i}(\lambda)\) would annihilate every vector, contradicting the minimality of \(m(\lambda)\).

If we apply the above considerations to each cyclic subspace of \(A\) that is exhibited by the rational canonical form we obtain an even finer decomposition of \(\mathcal{V}\) into cyclic subspaces whose orders are the prime power factors of the \(m_{i}(\lambda)\). These orders are called the elementary divisors of \(A\). The elementary divisors are determined by the invariant factors: we now show that the converse is true. If \(\mathcal{V}\) is the direct sum of cyclic subspaces of orders \(p_{i}^{\prime}(\lambda)^{f_{ij}}\) which are powers of irreducible polynomials, and if the notation is so chosen that \(f_{i1} \geq f_{i2} \geq \cdots\), then it is easy to verify that the \(j\)-th invariant factor, \(m_{j}(\lambda)\), of \(A\) is \[ m_{j}(\lambda)=p_{1}^{\prime}(\lambda)^{f_{1j}} p_{2}^{\prime}(\lambda)^{f_{2 j}}\ldots \]

From the unique factorization theorem it follows that the \(p_{i}^\prime(\lambda)^{f_{ij}}\) are the elementary divisors of \(A\), and therefore that the elementary divisors determine the invariant factors. It follows easily from our preceding results that a necessary and sufficient condition that two linear transformations be similar is that they have the same elementary divisors.
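Computationally the elementary divisors are read off by factoring the invariant factors; a sketch using `invariant_factors` from the sketch of section 9:

```python
import sympy as sp

def elementary_divisors(A):
    """Prime-power factors of the invariant factors of A."""
    lam = sp.symbols('lambda')
    # invariant_factors as in the sketch of section 9
    return [p**e
            for m in invariant_factors(A)
            for p, e in sp.factor_list(m, lam)[1]]
```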

11. The classical canonical form

In the preceding sections we have decomposed \(\mathcal{V}\) into a direct sum of cyclic subspaces and then decomposed each of these subspaces into a direct sum of smaller cyclic subspaces whose orders are powers of irreducible polynomials. In the present section we shall indicate one more decomposition, of these last prime power spaces, which will not in general be into cyclic spaces.

Consider a cyclic space \(\{z\}\) of order \(p(\lambda)^{e}\), where \(p(\lambda)=\lambda^{c}-\sum_{i=0}^{c-1} p_{i} \lambda^{i}\) is an irreducible polynomial. For \(0 \leq k < e\) and \(0 < {h} \leq {c}\), we write \[ z_{k c + h} = p(A)^{k} {A}^{h-1} z. \]

Since the degrees of the polynomials \({p}(\lambda)^{k} \lambda^{{h}-1}\) are all different and \(< ce\), the \(z_{i}\) are linearly independent and therefore form a basis for \(\{z\}\). For \(h<c\) we have \(A z_{kc+h}=z_{kc+h+1}\), while \[ A z_{kc+c}=p(A)^{k} A^{c} z=z_{(k+1)c+1}+\sum_{i=0}^{c-1} p_{i} z_{kc+i+1}, \] the first term being absent when \(k=e-1\) (since \(p(A)^{e} z=0\)). In this basis, therefore, the matrix of \(A\) has the form \[ \begin{bmatrix} {[P]} & [E] & 0 & \cdots & 0 & 0 \\ 0 & {[P]} & {[E]} & \cdots & 0 & 0 \\ 0 & 0 & {[P]} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & [P] & [E]\\ 0 & 0 & 0 & \cdots & 0 & {[P]} \end{bmatrix} \] where \[ [P] = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ p_{0} & p_{1} & p_{2} & \cdots & p_{c-1} \end{bmatrix}, \] and \([E]\) has a single \(1\) in its lower left-hand corner: \[ [E] = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \end{bmatrix}. \]
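The displayed block structure is easy to assemble from \([P]\) and the single \(1\) of each \([E]\); a sketch reusing `companion` from the sketch of section 5 (`hypercompanion` is our name for the builder):

```python
import sympy as sp

def hypercompanion(p_coeffs, e):
    """The matrix displayed above for p(lam)^e, with [P] = companion(p_coeffs)
    repeated e times on the diagonal and the 1 of [E] linking the blocks."""
    c = len(p_coeffs)
    H = sp.zeros(c * e, c * e)
    P = companion(p_coeffs)              # as in the sketch of section 5
    for b in range(e):
        H[b*c:(b+1)*c, b*c:(b+1)*c] = P
        if b < e - 1:
            H[(b+1)*c - 1, (b+1)*c] = 1  # bottom-left corner of [E]
    return H

# p(lam) = lam^2 + 1 (irreducible over the rationals), e = 2:
H = hypercompanion([-1, 0], 2)
```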

In case the ground field is algebraically closed, the only irreducible polynomials are linear (i.e., \(c = 1\)); writing \(p(\lambda)=\lambda-\lambda_{0}\), the blocks \([P]\) and \([E]\) reduce to the \(1 \times 1\) matrices \([\lambda_{0}]\) and \([1]\), and the matrix above becomes \[ \begin{bmatrix} \lambda_{0} & 1 & 0 & \cdots & 0 & 0 \\ 0 & \lambda_{0} & 1 & \cdots & 0 & 0 \\ 0 & 0 & \lambda_{0} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_{0} & 1\\ 0 & 0 & 0 & \cdots & 0 & \lambda_{0} \end{bmatrix}. \] The matrix of \(A\) in the entire space will then consist of a number of blocks of this type on the main diagonal: this form of the matrix of \(A\) is called the classical canonical (or Jordan) form.

As an application of these ideas we mention that (in the case of an algebraically closed field) a necessary and sufficient condition that a matrix be similar to a diagonal matrix is that the elementary divisors be linear, i.e., that the roots of the minimal polynomial be simple. For if these roots are simple then the blocks of the classical canonical form consist of only one element each, so that no superdiagonal \(1\)’s appear, and the matrix is diagonal. This proves the sufficiency of our condition. To prove necessity we observe that for a diagonal matrix whose diagonal elements are \(\lambda_{i}\), the product of the factors \((\lambda-\lambda_{i})\), extended over a set of subscripts \(i\) chosen so as to contain every distinct eigenvalue exactly once, is precisely the minimal polynomial, which therefore has simple roots. As a particular application of this result we see that every idempotent linear transformation \(A\) (one satisfying \(A^{2}=A\)), whose minimal polynomial divides \(\lambda^{2}-\lambda=\lambda(\lambda-1)\) and therefore has simple roots, may be realized by a diagonal matrix whose diagonal elements consist of only \(0\)’s and \(1\)’s.
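The idempotent example can be checked mechanically (sympy's `diagonalize` returns \(P\), \(D\) with \(A=P D P^{-1}\)):

```python
import sympy as sp

A = sp.Matrix([[1, 1], [0, 0]])   # A*A == A: an idempotent
assert A * A == A
P, D = A.diagonalize()            # possible: lam*(lam - 1) has simple roots
assert P.inv() * A * P == D and D.is_diagonal()
```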