Dual Spaces and Tensor Products
Table of Contents
- 1. Transformations of rank one
- 2. The Hadamard product of non-negative matrices
- 3. The dual space of a vector space
- 4. The dual space of an inner product space
- 5. Reflexivity of inner product spaces
- 6. Direct sum of vector spaces
- 7. Tensor product of vector spaces
- 8. Dimension of a tensor product
- 9. The dual of a tensor product
- 10. Tensor product of inner product spaces
- 11. The inner product in a tensor product
- 12. Tensor product of transformations
- 13. Kronecker products of matrices
- 14. Properties of tensor product transformations
1. Transformations of rank one
Before beginning the proper subject matter of the present chapter we digress to a discussion of interest in itself whose results we shall need later. It follows easily from the spectral theory of normal transformations (or, equivalently, from the possibility of representing a normal transformation by a diagonal matrix) that every normal transformation is a sum of normal transformations of rank one; similarly every Hermitian (or non-negative) transformation is a sum of Hermitian (or non-negative) transformations of rank one. It becomes, therefore, of interest to investigate transformations of rank one.
Theorem 1. A necessary and sufficient condition that a linear transformation \(A\) has rank \(1\) is that in every coordinate system the matrix \([A]=(a_{ij})\) has the form \(a_{ij}=b_{i} c_{j}\).
Proof. If \(A\) has rank \(1\) then the set of all vectors of the form \(Ax\) is one dimensional, so that there exists a vector \(x_{0}\) with the property that \(Ax\) is for every \(x\) a constant multiple (depending on \(x\)) of \(x_{0}\): \(A x=f(x) x_{0}\). (It is easily verified that \(f(x)\) is a linear function of \(x\), but we shall not need this fact.) Now if \(\{x_{i}\}\) is a coordinate system the matrix \([A]\) is characterized by \(A x_{j}=\sum_{k} a_{k j} x_{k}\), whence \[ \langle A x_{j}, x_{i} \rangle = \sum_{k} a_{k j}\langle x_{k}, x_{i} \rangle = a_{ij}, \] so that \[ a_{ij}=\big\langle f(x_{j}) x_{0}, x_{i} \big\rangle = f(x_{j})\,\langle x_{0}, x_{i} \rangle = b_{i} c_{j}. \]
Conversely if \(a_{ij}=b_{i} c_{j}\), we may find a linear function \(f(x)\) for which \(f(x_{j})=c_{j}\), and we may define \(x_{0}=\sum_i b_{i} x_{i}\). The linear transformation \(B\) defined by \(B x=f(x) x_{0}\) is clearly of rank \(1\), and we have \[ b_{ij} = \langle B x_{j}, x_{i} \rangle = b_{i} c_{j}=a_{ij}, \] so that \(B=A\).■
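The theorem is easy to test numerically. The following sketch (Python with NumPy; the vectors \(b\) and \(c\) are arbitrary illustrative data, not taken from the text) exhibits a rank-one matrix as an outer product and checks that \(Ax\) is always a multiple of a fixed vector \(x_0\).

```python
import numpy as np

# Rank-one matrix as an outer product: a_ij = b_i * c_j.
b = np.array([2.0, -1.0, 3.0])
c = np.array([1.0, 4.0, 0.5])
A = np.outer(b, c)                       # A[i, j] = b[i] * c[j]

print(np.linalg.matrix_rank(A))          # 1

# Conversely, A x is always a multiple of x0 = b, with coefficient f(x) = c . x.
x = np.array([0.3, -2.0, 1.1])
print(np.allclose(A @ x, (c @ x) * b))   # True
```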
If \(A\) has rank \(1\) and is Hermitian and if the matrix of \(A\) in some coordinate system is \((b_{i} c_{j})\), then we must have \(b_{i} c_{j}=\overline{b_{j} c_{i}}\), i.e., \(\bar{b}_{j}=(b_{i} / \bar{c}_{i}) c_{j}\) whenever \(c_{i} \neq 0\). If, for some \(i\), \(b_{i}=0\) and \(c_{i} \neq 0\), this relation gives \(\bar{b}_{j}=(b_{i} / \bar{c}_{i}) c_{j}=0\) for all \(j\), whence \(A=0\); since we assumed that the rank of \(A\) is \(1\), this is impossible. Because \(A \neq 0\), some \(c_{i}\) is different from zero, and by what we have just seen the corresponding \(b_{i}\) cannot vanish: hence we can find an \(i\) for which \(b_{i} c_{i} \neq 0\). Using this \(i\), the relation \(\bar{b}_{j}=(b_{i} / \bar{c}_{i}) c_{j}\) implies that \(c_{j}=k \bar{b}_{j}\) with some constant \(k\) independent of \(j\). Since the diagonal elements of a Hermitian matrix, i.e., the \(\langle A x_{i}, x_{i} \rangle\), are real, we can even conclude that \(k\) is real, so that in this case (writing \(a_{i}\) in place of \(b_{i}\)) \(a_{ij}\) has the form \(a_{ij}=k a_{i} \bar{a}_{j}\) with a real \(k\).
If \(A\) has rank \(1\) and is non-negative, then the discussion of the preceding paragraph applies and the fact that the diagonal elements of a non-negative matrix are non-negative implies that \(k\) is non-negative. In this case we may write \(h=+k^{1 / 2}\), and the relation \(k a_{i} \bar{a}_{j} = (ha_i)\overline{(ha_j)}\) shows that, after replacing \(a_{i}\) by \(h a_{i}\), \(a_{ij}\) has the form \(a_{ij}=a_{i} \bar{a}_{j}\).
It is easy to see that the conditions given in the last two paragraphs are not only necessary but also sufficient. If \(a_{ij}=k a_{i} \bar{a}_{j}\) then clearly \(A\) is Hermitian and has rank \(1\). If, moreover, \(a_{ij}=a_{i} \bar{a}_{j}\) and \(x=\sum_i c_i x_{i}\) then
\[ \begin{aligned} \langle A x, x \rangle &= \sum_i \sum_{j} c_{j} \bar{c}_{i}\langle A x_{j}, x_{i} \rangle\\ &= \sum_i \sum_{j} c_{j} \bar{c}_{i} a_{i} \bar{a}_{j} \\ &= \Big(\sum_i a_{i} \bar{c}_{i}\Big) \overline{\Big(\sum_{j} a_{j} \bar{c}_{j}\Big)}\\ &= \Big|\sum_i a_{i} \bar{c}_{i}\Big|^{2} \\ &\geq 0 \end{aligned} \]
so that \(A\) is non-negative.
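A small numerical check of the non-negative rank-one case (NumPy; the vector \(a\) is illustrative data): the matrix \((a_{i} \bar{a}_{j})\) is Hermitian, has rank one, and its quadratic form is non-negative.

```python
import numpy as np

a = np.array([1 + 2j, -0.5j, 3.0])
A = np.outer(a, np.conj(a))              # a_ij = a_i * conj(a_j)

print(np.allclose(A, A.conj().T))        # True: Hermitian
print(np.linalg.matrix_rank(A))          # 1

x = np.array([0.4 - 1j, 2.0, 1j])
quad = np.vdot(x, A @ x)                 # <Ax, x> in the text's notation
print(quad.real >= 0, abs(quad.imag) < 1e-12)   # True True: non-negative form
```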
2. The Hadamard product of non-negative matrices
As a consequence of the preceding section it is very easy to prove a remarkable theorem on non-negative matrices, due to I. Schur.
Theorem 2. If \(A\) and \(B\) are non-negative linear transformations whose matrices in some coordinate system are \((a_{ij})\) and \((b_{ij})\) respectively then the linear transformation \(C\) whose matrix in this coordinate system is defined by \(c_{ij}=a_{ij} b_{ij}\) is also non-negative.
Proof. Since we may write both \(A\) and \(B\) as a sum of non-negative transformations of rank \(1\), \(C\) may be written as a sum of transformations the matrices of which are obtained from the matrices of two non-negative transformations of rank \(1\) in the same way as the matrix of \(C\) was obtained from the matrices of \(A\) and \(B\). Since a sum of non-negative transformations is non-negative, it is therefore sufficient to prove the theorem in the case where \(A\) and \(B\) both have rank \(1\). In this case \(a_{ij}=a_{i} \bar{a}_{j}\), \(b_{ij}=b_{i} \bar{b}_{j}\), and therefore \(c_{ij}=c_{i} \bar{c}_{j}\), where \(c_{i}=a_{i} b_{i}\), whence it follows that \(C\) is non-negative (and has rank \(1\)).■
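Theorem 2 can be confirmed numerically. The sketch below (NumPy; the non-negative matrices are generated randomly and are illustrative only) forms the Hadamard product entrywise and checks that its eigenvalues are non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_nonnegative(n):
    """A random non-negative (positive semidefinite) Hermitian matrix M M*."""
    M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return M @ M.conj().T

A = random_nonnegative(4)
B = random_nonnegative(4)
C = A * B                                # Hadamard product: c_ij = a_ij * b_ij

# Schur's theorem: C is again non-negative, i.e., all its eigenvalues are >= 0.
print(np.min(np.linalg.eigvalsh(C)) >= -1e-10)     # True (up to round-off)
```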
3. The dual space of a vector space
Definition 1. Let \(\mathcal{V}\) be an arbitrary vector space; we denote by \(\mathcal{V}^{*}\) the set of all linear functions defined on \(\mathcal{V}\). If in \(\mathcal{V}^{*}\) we define \(0\), \(f+g\), and \(af\) by \(0(x)=0\), \((f+g)(x)=f(x)+g(x)\), and \((af)(x)=a\,f(x)\) respectively, then \(\mathcal{V}^{*}\) becomes a vector space: we call \(\mathcal{V}^{*}\) the dual space of \(\mathcal{V}\).
In the present chapter we shall discuss the theory of dual spaces. We call attention to the fact that all our definitions and theorems will be phrased without reference to any basis or coordinate system and that, although we shall make liberal use of bases, we use them only when that is unavoidable: namely in considerations of dimensionality, where bases enter by definition. Throughout this chapter we shall mean by a basis a linear basis, i.e., a maximal set of linearly independent elements: in case \(\mathcal{V}\) is an inner product space we shall in each case specify whether or not we need an orthogonal basis.
If \(\mathcal{V}\) is of dimension \(n\) so is \(\mathcal{V}^{*}\). For let \(\{x_{i}\}\) be a basis in \(\mathcal{V}\). For each \(i=1,2, \ldots, n\) we may define a linear function \(f_{i}(x)\) by the requirement that \(f_{i}(x_{j})=\delta_{ij}\). Then \(\sum_{i} a_{i} f_{i}=0\), i.e., \(\sum_i a_{i} f_{i}(x) \equiv 0\) implies \[ 0=\sum_i a_{i} f_{i}(x_{j})=\sum_i a_{i} \delta_{ij}=a_{j}, \] so that the \(f_{i}\) are linearly independent. Moreover if \(f\) is arbitrary in \(\mathcal{V}^{*}\) and \(x=\sum_{j} c_{j} x_{j}\) in \(\mathcal{V}\) then \(f(x)=\sum_{j} c_{j} f(x_{j})\) and \(f_{i}(x)=\sum_{j} c_{j} f_{i}(x_{j})=c_{i}\), so that \(f(x)=\sum_{j} f(x_{j}) f_{j}(x)\). In other words \(\{f_{j}\}\) is a basis in \(\mathcal{V}^{*}\), so that \(\mathcal{V}^{*}\) has dimension \(n\).
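In coordinates the dual basis can be computed explicitly. A minimal sketch (NumPy; the basis below is arbitrary illustrative data): if the columns of a matrix \(X\) hold the basis \(\{x_i\}\), the rows of \(X^{-1}\) represent the functionals \(f_i\), and any functional expands as \(f=\sum_j f(x_j) f_j\).

```python
import numpy as np

# Columns of X are a (non-orthogonal) basis {x_i} of C^3; illustrative data.
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
F = np.linalg.inv(X)            # row i of F represents the functional f_i

# The dual basis condition f_i(x_j) = delta_ij.
print(np.allclose(F @ X, np.eye(3)))                 # True

# Any functional f decomposes as f = sum_j f(x_j) f_j (the expansion in the text).
f = np.array([2.0, -1.0, 0.5])                        # f(x) = f . x
coeffs = f @ X                                        # the numbers f(x_j)
print(np.allclose(coeffs @ F, f))                     # True: the same functional
```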
Since \(\mathcal{V}\) and \(\mathcal{V}^{*}\) are both \({n}\) dimensional vector spaces it is possible, in many ways, to set up a one to one correspondence between them that preserves \(0\), sum, and scalar product: in other words \(\mathcal{V}\) and \(\mathcal{V}^{*}\) are isomorphic. These isomorphisms, however, are perfectly arbitrary and yield no information about the structure of vector spaces. If, however, we consider not \(\mathcal{V}^{*}\) but its dual space, which we may denote by \(\mathcal{V}^{**}\), (i.e., \(\mathcal{V}^{**}\) is the set of all linear functions \(X(f)\) defined on \(\mathcal{V}^*\)), then it is possible to set up a ‘natural’ isomorphism between \(\mathcal{V}\) and \(\mathcal{V}^{**}\).
Given any vector \(x_{0}\) in \(\mathcal{V}\), we make correspond to it an element \(X_{0}\) in \(\mathcal{V}^{* *}\) by defining, for every \(f=f(x)\) in \(\mathcal{V}^{*}\), \(X_{0}(f)=f(x_{0})\). (It is easy to verify that \(X_{0}=X_{0}(f)\) is indeed a linear function of \(f\).) The correspondence \(x_{0} \to X_{0}\) is linear: i.e., if \(x_{1} \to X_{1}\), \(x_{2} \to X_{2}\), and \(a_{1} x_{1}+a_{2} x_{2} \to X\), then \(X=a_{1} X_{1}+a_{2} X_{2}\). For, by definition we have for each \(f\) in \(\mathcal{V}^{*}\),
\[ \begin{aligned} X(f)&=f(a_{1} x_{1}+a_{2} x_{2})\\ &=a_{1} f(x_{1})+a_{2} f(x_{2})\\ &=a_{1} X_{1}(f)+a_{2} X_{2}(f). \end{aligned} \]
We now show that the correspondence \(x \to X\) is one to one. If \(x_{1}\) and \(x_{2}\) correspond to the same \(X\), then we have for every \(f\) in \(\mathcal{V}^{*}\), \(f(x_{1})=f(x_{2})\), or \(f(x_{1}-x_{2})=0\). If we introduce, as above, a basis \(\{y_{j}\}\) in \(\mathcal{V}\) and corresponding linear functions \(f_{i}(y)\) in \(\mathcal{V}^{*}\), defined by \(f_{i}(y_{j})=\delta_{ij}\), then we see that \(f(x)=0\) for all \(f\), where \(x=\sum_{j} c_{j} y_{j}\), implies in particular that \(f_{i}(x)=c_{i}=0\), so that \(x=0\). Hence \(x_{1}\) and \(x_{2}\) can correspond to the same \(X\) only if \(x_{1}=x_{2}\), as was to be proved. Finally we remark that every \(X\) in \(\mathcal{V}^{* *}\) corresponds in this correspondence to some \(x\) in \(\mathcal{V}\). The simplest proof of this fact is that since \(\mathcal{V}\) and therefore \(\mathcal{V}^{*}\) are \({n}\) dimensional vector spaces the dual space \(\mathcal{V}^{**}\) of \(\mathcal{V}^{*}\) is also \({n}\) dimensional. Hence if we can exhibit \({n}\) linearly independent elements of \(\mathcal{V}^{* *}\) which do correspond to elements of \(\mathcal{V}\) the desired result will follow. Let \(\{x_{i}\}\) be a basis in \(\mathcal{V}\): then \(X_{i}=X_{i}(f)=f(x_{i})\) is a set of linearly independent elements, and therefore a basis, in \(\mathcal{V}^{* *}\). For \(\sum_i a_{i} X_{i}=0\), i.e., \(\sum_i a_{i} X_{i}(f)=0\) for all \(f\), implies that
\[ 0=\sum_i a_{i} f(x_{i})=f\Big(\sum_i a_{i} x_{i}\Big), \]
whence, as above, \(\sum_i a_{i} x_{i}=0\) and therefore \(a_{i}=0\).
Thus the correspondence \(x \to X\) is an isomorphism, the so-called ‘natural isomorphism’, between \(\mathcal{V}\) and \(\mathcal{V}^{* *}\).
4. The dual space of an inner product space
The considerations of the preceding section apply, of course, to inner product spaces. In the case of inner product spaces, however, it is not necessary to go to \(\mathcal{U}^{**}\): we shall establish a natural correspondence between \(\mathcal{U}\) and \(\mathcal{U}^{*}\).
Let \(\mathcal{U}\) be an \(n\) dimensional inner product space and \(\mathcal{U}^{*}\) its dual space. The theorem on the representation of linear functions (cf. (I.15)) shows that every \(f=f(x)\) in \(\mathcal{U}^{*}\) has the form \(f(x) = \langle x, y \rangle\). This relation establishes a correspondence, which we already know to be one to one, between \(f(x)\) in \(\mathcal{U}^{*}\) and \(y\) in \(\mathcal{U}\). If in this correspondence \(f_{i}\) corresponds to \(y_{i}\), \(i=1,2\), then we have \[ \begin{aligned} f(x) &= a_{1} f_{1}(x)+a_{2} f_{2}(x)\\ &=a_{1} \langle x, y_{1} \rangle + a_{2} \langle x, y_2 \rangle\\ &=\langle x, \bar{a}_{1} y_{1}+\bar{a}_{2} y_{2} \rangle, \end{aligned} \] so that \(f\) corresponds to \(y=\bar{a}_{1} y_{1}+\bar{a}_{2} y_{2}\). Thus the correspondence \(f(x) \to y\) is not an isomorphism but a conjugate isomorphism between \(\mathcal{U}^{*}\) and \(\mathcal{U}\).
This correspondence can also be used to define an inner product in \(\mathcal{U}^{*}\). At first glance it might seem plausible to define \(\langle f_{1}, f_{2} \rangle\) to be \(\langle y_{1}, y_{2} \rangle\), where \(f_{i}(x) = \langle x, y_{i} \rangle\), but due to the fact that the correspondence is a conjugate isomorphism we have the relation
\[ a\langle f_{1}, f_{2} \rangle \neq \langle a f_{1}, f_{2} \rangle = \bar{a}\langle f_{1}, f_{2} \rangle \]
so that this definition does not satisfy the requirements of the definition of an inner product in (I.3). If, however, we define \(\langle f_{1}, f_{2} \rangle = \langle y_{2}, y_{1} \rangle\) then it is readily verified that \(\langle f_{1}, f_{2} \rangle\) is an inner product in \(\mathcal{U}^{*}\) so that \(\mathcal{U}^{*}\) is an inner product space.
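The correspondence \(f \to y\) and the inner product in \(\mathcal{U}^{*}\) can be illustrated numerically. A sketch, assuming the text's convention that \(\langle x, y \rangle\) is linear in \(x\) and conjugate-linear in \(y\) (the vectors and scalars are illustrative data only):

```python
import numpy as np

# Text's inner product: <x, y> is linear in x and conjugate-linear in y.
ip = lambda x, y: np.vdot(y, x)

# Representing vectors: f_i(x) = <x, y_i>.
y1 = np.array([1.0 + 1j, 2.0, -1j])
y2 = np.array([0.5, 1j, 3.0])
f1 = lambda x: ip(x, y1)
f2 = lambda x: ip(x, y2)

# f = a1 f1 + a2 f2 is represented by conj(a1) y1 + conj(a2) y2 (conjugate isomorphism).
a1, a2 = 2.0 - 1j, 0.5j
x = np.array([1.0, -1j, 2.0])
print(np.isclose(a1 * f1(x) + a2 * f2(x),
                 ip(x, np.conj(a1) * y1 + np.conj(a2) * y2)))   # True

# Dual inner product: <f1, f2> = <y2, y1> (note the reversed order).
print(ip(y2, y1))
```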
5. Reflexivity of inner product spaces
If we apply the results of the preceding section not to the inner product space \(\mathcal{U}\) but to its dual \(\mathcal{U}^{*}\) we obtain a conjugate isomorphism between \(\mathcal{U}^{*}\) and \(\mathcal{U}^{* *}\). Thereby we have induced a one to one correspondence between \(\mathcal{U}\) itself and \(\mathcal{U}^{* *}\): it is readily verified (since the operation of conjugation is involutory) that this correspondence is an isomorphism. We now show that this isomorphism is the same as the natural isomorphism between \(\mathcal{U}\) and \(\mathcal{U}^{*}\) described in (IV.3). Let \(y_{0}\) be an arbitrary vector in \(\mathcal{U}\); to it there corresponds an element \(f_{0}(x) = \langle x, y_{0} \rangle\) in \(\mathcal{U}^{*}\); to this element, in turn, there corresponds the element \(Y_{0}(f) = \langle f, f_{0} \rangle\) in \(\mathcal{U}^{* *}\). We must show that \(Y_{0}(f)=f(y_{0})\). Let \(f=f(x)= \langle x, y \rangle\) be an arbitrary element of \(\mathcal{U}^{*}\); we have \[ Y_{0}(f) = \langle f, f_{0} \rangle = \langle y_{0}, y \rangle = f(y_{0}), \] as was to be proved.
6. Direct sum of vector spaces
Definition 2. If \(\mathcal{U}\) and \(\mathcal{V}\) are arbitrary vector spaces we define the direct sum, \(\mathcal{W} = \mathcal{U} \oplus \mathcal{V}\) to be the set of all pairs \((x, y)\) with \(x\) in \(\mathcal{U}\) and \(y\) in \(\mathcal{V}\).
If in \(\mathcal{W}\) we define \[ \begin{aligned} 0 &= (0, 0),\\ (x_{1}, y_{1})+(x_{2}, y_{2})&=(x_{1}+x_{2}, y_{1}+y_{2}),\\ a(x, {y})&=(a x, a y), \end{aligned} \] then \(\mathcal{W}\) becomes a vector space. If, moreover, \(\mathcal{U}\) and \(\mathcal{V}\) are inner product spaces we may define in \(\mathcal{W}\), \(\big\langle (x_{1}, y_{1}), (x_{2}, y_{2})\big\rangle = \langle x_{1}, x_{2} \rangle + \langle y_{1}, y_{2} \rangle\) and \(\mathcal{W}\) becomes thereby an inner product space. In fact although for vector spaces this definition yields something new, for inner product spaces it can be subsumed under the discussion of the projection theorem (I.13). In other words \(\mathcal{U}\) and \(\mathcal{V}\) can be thought of as two orthogonal linear subspaces in \(\mathcal{W}\), and in an arbitrary inner product space a linear subspace and its orthogonal complement are a decomposition of the space into a direct sum.
If \(A\) and \(B\) are linear transformations in \(\mathcal{U}\) and \(\mathcal{V}\) respectively we may define a linear transformation \(C\), the direct sum of \(A\) and \(B\), in \(\mathcal{W}=\mathcal{U} \oplus \mathcal{V}\) by \(C(x, y) = (A x, B y)\). It is easy to discuss the matricial representation of \(C\), its relation to addition, multiplication, scalar multiplication, \(0\), \(1\), inverse, dual, etc. We omit this discussion here, and merely state without proof two propositions that will be useful to us later.
- If \(\mathcal{U}\) and \(\mathcal{V}\) have dimensions \(n\) and \(m\) respectively the dimension of \(\mathcal{W}\) is \(n+m\). If \(\{x_{i}\}\) and \(\{y_{j}\}\) are bases in \(\mathcal{U}\) and \(\mathcal{V}\) respectively then the totality of all vectors of either of the two forms \((x_{i}, 0)\) or \((0, y_{j})\) is a basis in \(\mathcal{W}\). If the matrix of the direct sum transformation \({C}\) is computed in this basis it will have the form
\[ \begin{bmatrix} {[A]} & 0 \\ 0 & {[B]} \end{bmatrix} \]
where \([A]\) and \([B]\) are the matrices of \(A\) and \(B\) in the bases \(\{x_{i}\}\) and \(\{y_{j}\}\) respectively, and where the zeros represent rectangular blocks each element of which is zero (a numerical sketch of this block form appears after this list).
- The most general linear function \(f(x, y)\) on \(\mathcal{W} = \mathcal{U} \oplus \mathcal{V}\) is of the form \(f(x, y)=g(x)+h(y)\), where \(g\) and \(h\) are linear functions in \(\mathcal{U}\) and \(\mathcal{V}\). In other words the dual space of a direct sum is the direct sum of the dual spaces.
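A minimal numerical sketch of the first of these propositions (NumPy; the matrices and vectors are illustrative data): the direct sum acts blockwise, and its matrix in the combined basis is block diagonal.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # transformation on U (n = 2)
B = np.array([[5.0, 6.0, 7.0],
              [8.0, 9.0, 0.0],
              [1.0, 2.0, 3.0]])       # transformation on V (m = 3)

# Matrix of the direct sum C(x, y) = (Ax, By) in the basis (x_i, 0), (0, y_j):
n, m = A.shape[0], B.shape[0]
C = np.zeros((n + m, n + m))
C[:n, :n] = A                          # block-diagonal form [[A, 0], [0, B]]
C[n:, n:] = B

x, y = np.array([1.0, -1.0]), np.array([2.0, 0.0, 1.0])
print(np.allclose(C @ np.concatenate([x, y]),
                  np.concatenate([A @ x, B @ y])))    # True
```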
7. Tensor product of vector spaces
The main purpose of this chapter is to define for vector spaces (and inner product spaces) the notion of a tensor product. In other words if \(\mathcal{U}\) and \(\mathcal{V}\) are given vector spaces we shall define for every vector \(x\) in \(\mathcal{U}\) and \(y\) in \(\mathcal{V}\) a product \(x \otimes y\), which is to be an element of a suitable vector space, in such a way that \(x \otimes y\) depends linearly on either variable if the other one is fixed and so that (in case \(\mathcal{U}\) and \(\mathcal{V}\) are inner product spaces) we have \[ \langle x_{1} \otimes y_{1}, x_{2} \otimes y_{2} \rangle = \langle x_{1}, x_{2} \rangle \langle y_{1}, y_{2} \rangle. \]
In order to clarify the definition we shall give, we proceed heuristically on the basis of the second of the two propositions stated in the preceding section. If we denote the (as yet undefined) tensor product of \(\mathcal{U}\) and \(\mathcal{V}\) by \(\mathcal{W} = \mathcal{U} \otimes \mathcal{V}\), we may expect that \(\mathcal{W}^{*}=\mathcal{U}^{*} \otimes \mathcal{V}^{*}\). Since it is technically easier to do so, instead of defining \(\mathcal{W}\) itself we shall first define \(\mathcal{W}^{*}\); we shall then write, by definition, \(\mathcal{W}=(\mathcal{W}^{*})^{*}\). Also we may expect that if \(f(x)\) and \(g(y)\) are linear functions in \(\mathcal{U}\) and \(\mathcal{V}\) respectively then it is their product, \(f(x) g(y)\), that should in some sense be the general element of \(\mathcal{W}^{*}\). This product is a function \(h(x, y)\), defined for \(x\) in \(\mathcal{U}\) and \(y\) in \(\mathcal{V}\), with the property that for each fixed value of one variable it is a linear function of the other: in other words \(h(x, y)\) is a bilinear function of \(x\) and \(y\). This discussion is meant to motivate the formal work that we begin in the next paragraph.
Let \(\mathcal{U}\) and \(\mathcal{V}\) be vector spaces of dimensions \(n\) and \(m\) respectively; we denote by \(\mathcal{W}^{*}\) the set of all bilinear functions \(f(x, y)\) defined for \(x\) in \(\mathcal{U}\) and \(y\) in \(\mathcal{V}\). Let \(\mathcal{W}\) be the dual space of \(\mathcal{W}^{*}\) (i.e., \(\mathcal{W}\) is the set of all linear functions \(z(f)\) defined for \(f\) in \(\mathcal{W}^{*}\)): we call \(\mathcal{W} = \mathcal{U} \otimes \mathcal{V}\) the tensor product of \(\mathcal{U}\) and \(\mathcal{V}\). To every pair \((x_{0}, y_{0})\) of vectors with \(x_{0}\) in \(\mathcal{U}\) and \(y_{0}\) in \(\mathcal{V}\) we make correspond the element \(z_{0}(f)\) in \(\mathcal{W}\) defined by \(z_{0}(f)=f(x_{0}, y_{0})\). (It is easy to verify that \(z_{0}(f)\) is a linear function of \(f\).) We write \(z_{0}=x_{0} \otimes y_{0}\) and call \(z_{0}\) the tensor product of \(x_{0}\) and \(y_{0}\). We shall consistently use the notation \(x\) for vectors of \(\mathcal{U}\), \(y\) for vectors of \(\mathcal{V}\), and \(z\) for vectors of the vector space \(\mathcal{W}\).
8. Dimension of a tensor product
We observe that the dimension of \(\mathcal{W}^{*}\) is \({nm}\). For, exactly as in (IV.3) above, we may choose bases \(\{x_{i}\}\) and \(\{y_{j}\}\) in \(\mathcal{U}\) and \(\mathcal{V}\) respectively, and then we may find bilinear functions \(f_{\alpha \beta}(x, y)\) subject to the requirement that \(f_{\alpha \beta}(x_{i}, y_{j})=\delta_{i \alpha} \delta_{j \beta}\). It is then easy to show that the \(f_{\alpha \beta}\) are linearly independent and that every bilinear function is a linear combination of them.
We shall also need the fact that the elements \(z_{ij}=x_{i} \otimes y_{j}\) of \(\mathcal{W}\) are a basis in \(\mathcal{W}\). According to the preceding paragraph we need only prove that they are linearly independent. If \(\sum_{ij} a_{ij} z_{ij}(f)=0\) for all \(f\) then we should have, in particular, \(\sum_{ij} a_{ij} f_{\alpha \beta}(x_{i}, y_{j})=a_{\alpha \beta}=0\) for all \(\alpha\) and \(\beta\), as was to be proved.
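In coordinates one may realize \(x \otimes y\) as the Kronecker product of coordinate vectors; this concrete realization is an assumption made only for illustration (the definition in the text is basis-free). The sketch below (NumPy) checks that the \(nm\) products \(x_{i} \otimes y_{j}\) built from bases are linearly independent.

```python
import numpy as np

# Columns of X and Y are bases of C^3 and C^2 (illustrative, non-orthogonal data).
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
Y = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Realize x_i (x) y_j as the Kronecker product of the coordinate vectors.
Z = np.column_stack([np.kron(X[:, i], Y[:, j])
                     for i in range(3) for j in range(2)])

# The nm = 6 products are linearly independent: they span a 6-dimensional space.
print(Z.shape, np.linalg.matrix_rank(Z))     # (6, 6) 6
```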
9. The dual of a tensor product
If \(z_{1}=x_{1} \otimes y\) and \(z_{2}=x_{2} \otimes y\) and if \(z=(a_{1} x_{1}+a_{2} x_{2}) \otimes y\) then \(z=a_{1} z_{1}+a_{2} z_{2}\). For we have, for every bilinear function \(f(x, y)\), \[ \begin{aligned} z(f)&=f(a_{1} x_{1}+a_{2} x_{2}, y)\\ &=a_{1} f(x_{1}, y)+a_{2} f(x_{2}, y)\\ &=a_{1} z_{1}(f)+a_{2} z_{2}(f). \end{aligned} \] Similarly we can show that \[ x \otimes (b_{1} y_{1}+b_{2} y_{2})=b_{1}(x \otimes y_{1})+b_{2}(x \otimes y_{2}), \] so that \(x \otimes y\) depends linearly on each of its factors when the other is held fixed. It follows from this and from the result of the preceding section that every element in \(\mathcal{W}\) is a sum of tensor products \(x \otimes y\) (not necessarily uniquely). It is also easy to prove, using the bilinear character of \(x \otimes y\), that every linear function of \(z\) (i.e., every element in the dual space of \(\mathcal{W}\)) is a bilinear function of \(x\) and \(y\) and consequently a sum of products of the form \(f(x) g(y)\), where \(f\) and \(g\) are linear functions defined on \(\mathcal{U}\) and \(\mathcal{V}\) respectively. Hence for general vector spaces our definition of tensor product fulfills the conditions (heuristically derived above) of our program. Before investigating the relation of tensor product spaces to linear transformations, we examine the situation in inner product spaces.
10. Tensor product of inner product spaces
If \(\mathcal{U}\) and \(\mathcal{V}\) are inner product spaces the construction of the preceding sections applies unaltered: the only new problem is to introduce into the tensor product \(\mathcal{W} = \mathcal{U} \otimes \mathcal{V}\) an inner product related in a suitable way to the given inner products in the factor spaces. It is technically easier to define the inner product not in \(\mathcal{W}\) but in \(\mathcal{W}^{*}\) and then apply the general theory of duals of inner product spaces to obtain an inner product in \(\mathcal{W}\).
If \(f=f(x, y)\) is any element of \(\mathcal{W}^*\), \(f\) can be written as a sum of products of the form \(g(x) h(y)\), or, since \(\mathcal{U}\) and \(\mathcal{V}\) are inner product spaces, \(f\) can be written as a sum of expressions of the form \(\langle x, x_{0} \rangle \langle y, y_{0} \rangle\). Hence if \(f^{\prime}\) and \(f^{\prime \prime}\) are any two elements of \(\mathcal{W}^{*}\) we may write
\[ f^{\prime}(x, y)=\sum_i\langle x, x_{i}^{\prime} \rangle \langle y, y_{i}^{\prime} \rangle \quad \text{ and } \quad f^{\prime \prime}(x, y)=\sum_{j}\langle x, x_{j}^{\prime\prime}\rangle \langle y, y_{j}^{\prime\prime} \rangle. \]
We write, by definition, \[ \langle f^{\prime}, f^{\prime\prime} \rangle = \sum_i \sum_{j}\langle x_{j}^{\prime\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime\prime}, y_{i}^{\prime} \rangle. \]
(The conjugate nature of the relation between vectors and linear functions again necessitates putting \(x_{j}^{\prime \prime}\) before \(x_{i}^{\prime}\).) Before we can even start to prove that this definition fulfills the conditions of the definition of an inner product, we must prove that it defines \(\langle f^{\prime}, {f}^{\prime \prime} \rangle\) independently of the representations as sums. To do this we observe that \[ \sum_j \langle x_j^{\prime\prime}, x_i^\prime \rangle \langle y_j^{\prime\prime}, y_i^\prime \rangle = \overline{f^{\prime\prime}(x_i^\prime, y_i^\prime)}, \] so that \[ \langle f^\prime, f^{\prime\prime} \rangle = \sum_i \overline{f^{\prime\prime}(x_i^\prime, y_i^\prime)}, \] whence \(\langle f^{\prime}, {f}^{\prime \prime} \rangle\) is independent of the particular representation of \(f^{\prime \prime}\). Since, moreover, in any given representations of \(f^\prime\) and \(f^{\prime\prime}\), \(\langle f^{\prime}, f^{\prime\prime} \rangle = \overline{\langle f^{\prime\prime}, f^{\prime} \rangle}\), it follows that \(\langle f^\prime, f^{\prime\prime} \rangle\) is also independent of the representation of \(f^{\prime}\).
It is easy to verify that the expression \(\langle f^\prime, f^{\prime\prime} \rangle\) is linear in \(f^\prime\), conjugate linear in \({f}^{\prime \prime}\), and Hermitian symmetric. It remains to prove that it is positive definite: i.e., that \(\langle f^{\prime}, f^{\prime} \rangle \geq 0\) for all \(f^{\prime}\), and that \(\langle f^{\prime}, f^{\prime} \rangle = 0\) if and only if \(f^{\prime}=0\). This, surprisingly, is not trivial: it requires Schur’s theorem, proved in (IV.2).
We have, for a representation \(f^{\prime}(x, y)=\sum_{i=1}^{p}\langle x, x_{i}^{\prime} \rangle \langle y, y_{i}^{\prime} \rangle\), \[ \langle f^{\prime}, f^{\prime} \rangle = \sum_{i=1}^{p} \sum_{j=1}^{p}\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime}, y_{i}^{\prime} \rangle. \]
Let \(a_{1}, a_{2}, \ldots, a_{p}\) be arbitrary complex numbers. Then \[ \begin{aligned} \sum_i \sum_{j}\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \bar{a}_{i} a_{j} &= \Big\langle \sum_{j} a_{j} x_{j}^{\prime}, \sum_i a_{i} x_{i}^{\prime} \Big\rangle\\ &= \Big|\sum_i a_{i} x_{i}^{\prime}\Big|^{2}\\ &\geq 0, \end{aligned} \] so that the matrix whose general element is \(\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle\) is non-negative. Similarly we may show that the matrix whose general element is \(\langle y_{j}^{\prime}, y_{i}^\prime \rangle\) is non-negative; it follows from Schur’s theorem that the matrix whose general element is the product \(\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime}, y_{i}^{\prime} \rangle\) is also non-negative. Hence \[ \sum_i \sum_{j}\langle x_{j}^{\prime}, x_{i}^{\prime} \rangle \langle y_{j}^{\prime}, y_{i}^{\prime} \rangle a_{i} \bar{a}_{j} \geq 0 \] for every choice of the complex numbers \((a_{1}, a_{2}, \ldots, a_{p})\): choosing \(a_{i}=1\) for all \(i\) proves that \(\langle f^{\prime}, f^{\prime} \rangle \geq 0\).
In order to prove that \(\langle f^{\prime}, f^{\prime} \rangle = 0\) implies \(f^{\prime}=0\) we proceed as follows. For the expression \(\langle f^{\prime}, f^{\prime\prime} \rangle\), which now has all the other properties of an inner product, we may prove the Schwarz inequality as in (I.4): \[ |\langle f^{\prime}, f^{\prime\prime} \rangle| \leq \Big(\langle f^\prime, f^{\prime} \rangle \, \langle f^{\prime\prime}, f^{\prime\prime} \rangle \Big)^{1 / 2}. \]
It follows that the vanishing of \(\langle f^{\prime}, f^{\prime} \rangle\) implies the vanishing of \(\langle f^{\prime}, f^{\prime \prime} \rangle\) for all \(f^{\prime \prime}\). Let \(x^{\prime \prime}\) and \(y^{\prime \prime}\) be arbitrary vectors and take, in particular, \(f^{\prime\prime}=f^{\prime\prime}(x, y)= \langle x, x^{\prime\prime} \rangle \langle y, y^{\prime\prime} \rangle\). The vanishing of \(\langle f^{\prime}, f^{\prime\prime} \rangle\) implies that \[ \begin{aligned} 0 &= \langle f^{\prime}, f^{\prime\prime} \rangle\\ &= \sum_i \langle x^{\prime\prime}, x_{i}^{\prime} \rangle \langle y^{\prime\prime}, y_{i}^{\prime} \rangle\\ &= f^{\prime}(x^{\prime\prime}, y^{\prime\prime}); \end{aligned} \] hence the vanishing of \(\langle f^{\prime}, f^{\prime\prime} \rangle\) for all \(f^{\prime \prime}\) implies that \(f^{\prime}(x^{\prime\prime}, y^{\prime \prime})=0\), for every pair \(x^{\prime \prime}\), \(y^{\prime \prime}\) of vectors, or, in other words, that \(f^{\prime}=0\).
This concludes the introduction of an inner product in \(\mathcal{W}^{*}\). Applying the results of (IV.4) we obtain an inner product in the dual space \(\mathcal{W}\) of \(\mathcal{W}^{*}\), so that \(\mathcal{W}\) becomes an inner product space.
11. The inner product in a tensor product
It is now easy to prove that the inner product defined in \(\mathcal{W}\) has the property that
\[ \langle x^{\prime} \otimes y^{\prime}, x^{\prime \prime} \otimes y^{\prime \prime} \rangle = \langle x^{\prime}, x^{\prime \prime} \rangle \langle y^{\prime}, y^{\prime \prime} \rangle. \]
We write \[ z^{\prime}=x^{\prime} \otimes y^{\prime}, \quad \text{ and } \quad z^{\prime \prime}=x^{\prime \prime} \otimes y^{\prime \prime}, \] and we define \[ z_{0}^{\prime}=z_{0}^{\prime}(f)= \langle f, f^{\prime} \rangle, \qquad z_{0}^{\prime\prime}=z_{0}^{\prime\prime}(f)= \langle f, f^{\prime\prime} \rangle, \] where \(f^{\prime}\) and \(f^{\prime\prime}\) are the particular bilinear functions defined by \[ f^{\prime}(x, y)= \langle x, x^{\prime} \rangle \langle y, y^{\prime} \rangle \quad \text{and} \quad f^{\prime \prime}(x, y)= \langle x, x^{\prime\prime} \rangle \langle y, y^{\prime\prime} \rangle. \]
For an arbitrary \(f=\sum_{i} \langle x, x_{i} \rangle \langle y, y_{i} \rangle\) we have \[ \begin{aligned} z_{0}^{\prime}(f) &= \sum_i \langle x^{\prime}, x_{i} \rangle \langle y^{\prime}, y_{i} \rangle\\ &= f(x^{\prime}, y^{\prime})\\ &= z^{\prime}(f) \end{aligned} \] and \[ \begin{aligned} z_{0}^{\prime\prime}(f) &= \sum_i \langle x^{\prime\prime}, x_{i} \rangle \langle y^{\prime\prime}, y_{i} \rangle\\ &= f(x^{\prime\prime}, y^{\prime\prime})\\ &= z^{\prime\prime}(f). \end{aligned} \] (This is similar to the proof in (IV.5) of the equality of the two natural correspondences between an inner product space and its second dual.) Hence we have, finally, \[ \langle z^{\prime}, z^{\prime\prime} \rangle = \langle z_{0}^{\prime}, z_{0}^{\prime\prime} \rangle = \langle f^{\prime\prime}, f^{\prime} \rangle = \langle x^{\prime}, x^{\prime\prime} \rangle \langle y^{\prime}, y^{\prime \prime} \rangle, \] as was to be proved.
The last proved fact justifies the terminology of tensor product and describes completely the structure of \(\mathcal{W}\) and its relation to \(\mathcal{U}\) and \(\mathcal{V}\). It follows also that if \(\{x_{i}\}\) and \(\{y_{\alpha}\}\) are orthogonal bases in \(\mathcal{U}\) and \(\mathcal{V}\) respectively then \[ \langle x_{i} \otimes y_{\alpha}, x_{j} \otimes y_{\beta} \rangle = \langle x_{i}, x_{j} \rangle \langle y_{\alpha}, y_{\beta} \rangle = \delta_{ij} \delta_{\alpha \beta} \] so that the \(x_{i} \otimes y_{\alpha}\) form an orthonormal set in \(\mathcal{W}\). Since we have already seen that they form a maximal linearly independent set it follows that they are a complete orthonormal set, or an orthogonal basis, in \(\mathcal{W}\).
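Under the Kronecker realization of \(\otimes\) used for illustration earlier (an assumption, not part of the text's basis-free construction), the identity just proved can be verified numerically, with the convention that \(\langle u, v \rangle\) is linear in \(u\) and conjugate-linear in \(v\); the vectors below are random illustrative data.

```python
import numpy as np

ip = lambda u, v: np.vdot(v, u)          # <u, v>, linear in the first argument

rng = np.random.default_rng(2)
cvec = lambda k: rng.standard_normal(k) + 1j * rng.standard_normal(k)
x1, x2 = cvec(3), cvec(3)
y1, y2 = cvec(4), cvec(4)

lhs = ip(np.kron(x1, y1), np.kron(x2, y2))    # <x' (x) y', x'' (x) y''>
rhs = ip(x1, x2) * ip(y1, y2)                 # <x', x''> <y', y''>
print(np.isclose(lhs, rhs))                   # True
```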
12. Tensor product of transformations
We are now in a position to examine the relation of linear transformations to the theory of tensor products. If \(A\) and \(B\) are linear transformations defined in \(\mathcal{U}\) and \(\mathcal{V}\) respectively we define a linear transformation \(C^{*}\) in \(\mathcal{W}^{*}\) by \(C^{*} f(x, y)=f(A x, B y)\), and then a linear transformation \({C}\) in \(\mathcal{W}\) by \({Cz}({f})=z(C^{*} f)\). In brief: \[ C\Big(z\big(f(x, y)\big)\Big) = z\big(f(A x, B y)\big). \]
If we apply \(C\) to a particular \(z\) of the form \(z=x \otimes y\) (i.e., \(z(f)=f(x, y)\)) we obtain \[ (C z)(f)=z(C^{*} f)=f(A x, B y)=(Ax \otimes By)(f), \] so that \(C z = Ax \otimes By\).
Since we have already remarked that every \(z\) is a sum of tensor products \(x \otimes y\) the relation \({Cz}={Ax} \otimes {By}\) completely characterizes \(C\). The linear transformation \(C\) in the space \(\mathcal{W}\) is called the tensor product of the linear transformations \(A\) and \(B\), \(C = A \otimes B\).
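In the Kronecker realization used for illustration above, the characterizing relation \(Cz = Ax \otimes By\) becomes a statement about matrices and can be checked directly (NumPy; the matrices and vectors are random illustrative data).

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((4, 4))
x = rng.standard_normal(3)
y = rng.standard_normal(4)

C = np.kron(A, B)                              # matrix of A (x) B (cf. the next section)
print(np.allclose(C @ np.kron(x, y),
                  np.kron(A @ x, B @ y)))      # True: C(x (x) y) = Ax (x) By
```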
13. Kronecker products of matrices
Let \(A\) and \(B\) be linear transformations and \(\{x_{i}\}\) and \(\{y_{\alpha}\}\) orthogonal bases in \(\mathcal{U}\) and \(\mathcal{V}\) respectively. We find the matrix of the linear transformation \(C=A \otimes B\) in the orthogonal basis \(\{x_{i} \otimes y_{\alpha}\}\) of \(\mathcal{W}\). Naturally the matrix depends on the way in which these \(nm\) vectors are ordered in a linear order: we suppose first that the order is the lexicographical one, i.e., \[ x_1 \otimes y_1, \ldots, x_1 \otimes y_m, \quad x_2 \otimes y_1, \ldots, x_2 \otimes y_m, \quad \ldots, \quad x_n \otimes y_1, \ldots, x_n \otimes y_m. \]
We have \[ \begin{aligned} C(x_{j} \otimes y_{\beta}) &= A x_{j} \otimes B y_{\beta}\\ &= \Big(\sum_i a_{ij} x_{i}\Big) \otimes \Big(\sum_{\alpha} b_{\alpha \beta} y_{\alpha}\Big)\\ &= a_{1j} b_{1\beta}(x_{1} \otimes y_{1}) + a_{1j} b_{2 \beta}(x_{1} \otimes y_{2}) + \cdots + a_{1j} b_{m \beta}(x_{1} \otimes y_{m})+\cdots \end{aligned} \]
so that the matrix of \(C\) has the form \[ \begin{bmatrix} a_{11}b_{11} & a_{11}b_{12} & \cdots & a_{11}b_{1m} \quad & a_{12}b_{11} & a_{12}b_{12} & \cdots & a_{12}b_{1m} \quad & \cdots & \cdots \quad & a_{1n}b_{11} & a_{1n}b_{12} & \cdots & a_{1n}b_{1m}\\ a_{11}b_{21} & a_{11}b_{22} & \cdots & a_{11}b_{2m} \quad & a_{12}b_{21} & a_{12}b_{22} & \cdots & a_{12}b_{2m} \quad & \cdots & \cdots \quad & a_{1n}b_{21} & a_{1n}b_{22} & \cdots & a_{1n}b_{2m}\\ \vdots & \vdots & \ddots & \vdots \quad & \vdots & \vdots & \ddots & \vdots \quad & \cdots & \cdots \quad & \vdots & \vdots & \ddots & \vdots\\ a_{11}b_{m1} & a_{11}b_{m2} & \cdots & a_{11}b_{mm} \quad & a_{12}b_{m1} & a_{12}b_{m2} & \cdots & a_{12}b_{mm} \quad & \cdots & \cdots \quad & a_{1n}b_{m1} & a_{1n}b_{m2} & \cdots & a_{1n}b_{mm}\\ \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \,\\ a_{21}b_{11} & a_{21}b_{12} & \cdots & a_{21}b_{1m} \quad & a_{22}b_{11} & a_{22}b_{12} & \cdots & a_{22}b_{1m} \quad & \cdots & \cdots \quad & a_{2n}b_{11} & a_{2n}b_{12} & \cdots & a_{2n}b_{1m}\\ a_{21}b_{21} & a_{21}b_{22} & \cdots & a_{21}b_{2m} \quad & a_{22}b_{21} & a_{22}b_{22} & \cdots & a_{22}b_{2m} \quad & \cdots & \cdots \quad & a_{2n}b_{21} & a_{2n}b_{22} & \cdots & a_{2n}b_{2m}\\ \vdots & \vdots & \ddots & \vdots \quad & \vdots & \vdots & \ddots & \vdots \quad & \cdots & \cdots \quad & \vdots & \vdots & \ddots & \vdots\\ a_{21}b_{m1} & a_{21}b_{m2} & \cdots & a_{21}b_{mm} \quad & a_{22}b_{m1} & a_{22}b_{m2} & \cdots & a_{22}b_{mm} \quad & \cdots & \cdots \quad & a_{2n}b_{m1} & a_{2n}b_{m2} & \cdots & a_{2n}b_{mm}\\ \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \,\\ \vdots & \vdots & \vdots & \vdots \quad & \vdots & \vdots & \vdots & \vdots \quad & \ddots & \ddots \quad & \vdots & \vdots & \vdots & \vdots\\ \vdots & \vdots & \vdots & \vdots \quad & \vdots & \vdots & \vdots & \vdots \quad & \ddots & \ddots \quad & \vdots & \vdots & \vdots & \vdots\\ \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \, & \,\\ a_{n1}b_{11} & a_{n1}b_{12} & \cdots & a_{n1}b_{1m} \quad & a_{n2}b_{11} & a_{n2}b_{12} & \cdots & a_{n2}b_{1m} \quad & \cdots & \cdots \quad & a_{nn}b_{11} & a_{nn}b_{12} & \cdots & a_{nn}b_{1m}\\ a_{n1}b_{21} & a_{n1}b_{22} & \cdots & a_{n1}b_{2m} \quad & a_{n2}b_{21} & a_{n2}b_{22} & \cdots & a_{n2}b_{2m} \quad & \cdots & \cdots \quad & a_{nn}b_{21} & a_{nn}b_{22} & \cdots & a_{nn}b_{2m}\\ \vdots & \vdots & \ddots & \vdots \quad & \vdots & \vdots & \ddots & \vdots \quad & \cdots & \cdots \quad & \vdots & \vdots & \ddots & \vdots\\ a_{n1}b_{m1} & a_{n1}b_{m2} & \cdots & a_{n1}b_{mm} \quad & a_{n2}b_{m1} & a_{n2}b_{m2} & \cdots & a_{n2}b_{mm} \quad & \cdots & \cdots \quad & a_{nn}b_{m1} & a_{nn}b_{m2} & \cdots & a_{nn}b_{mm}\\ \end{bmatrix} \] or, in a condensed notation whose meaning is clear, \[ \begin{bmatrix} a_{11}[B] & a_{12}[B] & \cdots & a_{1 n}[B] \\ a_{21}[B] & a_{22}[B] & \cdots & a_{2 n}[B] \\ \vdots & \vdots & \ddots & \vdots \\ a_{n 1}[B] & a_{n 2}[B] & \cdots & a_{n n}[B] \end{bmatrix}. \]
If we had adopted, instead, the converse lexicographic ordering, i.e., \[ x_1 \otimes y_1, \ldots, x_n \otimes y_1, \quad x_1 \otimes y_2, \ldots, x_n \otimes y_2, \quad \ldots, \quad x_1 \otimes y_m, \ldots, x_n \otimes y_m, \] we should have found the matrix of \({C}\) to be \[ \begin{bmatrix} b_{11}[A] & b_{12}[A] & \ldots & b_{1 m}[A] \\ b_{21}[A] & b_{22}[A] & \ldots & b_{2 m}[A] \\ \vdots & \vdots & \ddots & \vdots \\ b_{m 1}[A] & b_{m 2}[A] & \ldots & b_{m m}[A] \end{bmatrix}. \]
The first of these two matrices is known as the Kronecker product, \([A] \otimes [B]\), of \([A]\) and \([B]\) (in this order!); the second one is \([B] \otimes [A]\). Since a permutation of the elements of an orthogonal basis is a trivial kind of change of basis (i.e., it is effected by a unitary transformation \(U\)) we obtain that \[ \big([A] \otimes[B]\big)=[U]\big([B] \otimes[A]\big)[U]^{-1}. \]
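NumPy's np.kron follows the first (lexicographic) ordering, so the matrix \([A] \otimes [B]\) is np.kron(A, B). The sketch below builds the permutation that interchanges the two orderings (often called a perfect shuffle, a name not used in the text) and verifies the similarity just stated; the matrices are random illustrative data.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

# Permutation matrix sending the index pair (i, alpha) to (alpha, i).
P = np.zeros((n * m, n * m))
for i in range(n):
    for a in range(m):
        P[a * n + i, i * m + a] = 1.0

# kron(A, B) and kron(B, A) differ only by this reordering of the basis.
print(np.allclose(P @ np.kron(A, B) @ P.T, np.kron(B, A)))   # True
```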
14. Properties of tensor product transformations
We now proceed to describe some of the elementary properties of tensor product transformations.
14.1. If \(A=\sum_i a_{i} A_{i}\) and \(B=\sum_{j} b_{j} B_{j}\) then \(A \otimes B=\sum_i \sum_{j} a_{i} b_{j}(A_{i} \otimes B_{j})\). For we have \[ \begin{aligned} (A \otimes B)(x \otimes y) &= Ax \otimes By\\ &= \Big(\sum_i a_i A_i x\Big) \otimes \Big(\sum_j b_j B_j y\Big)\\ &= \sum_i \sum_j a_i b_j (A_i \otimes B_j)(x \otimes y). \end{aligned} \]
14.2. If \(A=A_{1} A_{2}\) and \(B=B_{1} B_{2}\) then \(A \otimes B = (A_1 \otimes B_1)(A_2 \otimes B_2)\). For \[ \begin{aligned} (A \otimes B)(x \otimes y) &= A x \otimes B y\\ &= A_{1} A_{2} x \otimes B_{1} B_{2} y\\ &= (A_{1} \otimes B_{1})(A_{2} x \otimes B_{2} y)\\ &= (A_{1} \otimes B_{1})(A_{2} \otimes B_{2})(x \otimes y). \end{aligned} \] As immediate consequences of this result we obtain the formulas \[ \begin{aligned} (A \otimes B) &= (A \otimes 1)(1 \otimes B) = (1 \otimes B)(A \otimes 1), \\ (A \otimes B)^{-1} &= (A^{-1} \otimes B^{-1}). \end{aligned} \]
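Both consequences of 14.2 are easy to confirm in coordinates via Kronecker products (NumPy; the matrices are random illustrative data, and for the inverse formula \(A\) and \(B\) are assumed invertible, which holds for generic random matrices).

```python
import numpy as np

rng = np.random.default_rng(5)
A1, A2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
B1, B2 = rng.standard_normal((2, 2)), rng.standard_normal((2, 2))

# (A1 (x) B1)(A2 (x) B2) = (A1 A2) (x) (B1 B2)
print(np.allclose(np.kron(A1, B1) @ np.kron(A2, B2),
                  np.kron(A1 @ A2, B1 @ B2)))                     # True

# (A (x) B)^(-1) = A^(-1) (x) B^(-1), for invertible A and B
A, B = A1, B1
print(np.allclose(np.linalg.inv(np.kron(A, B)),
                  np.kron(np.linalg.inv(A), np.linalg.inv(B))))   # True
```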