Vector Spaces and Inner Product Spaces

1. Real and complex vector spaces

Definition 1. A vector space, \(\mathcal{V}\), is a set of elements, \(x\), \(y\), \(z\), etc., called vectors or points, which satisfies the following axioms.

1.1. To every pair, \(x\) and \(y\), of vectors in \(\mathcal{V}\) there corresponds a vector \(z\), called the sum of \(x\) and \(y\), \(z=x+y\).

1.2. Addition is commutative: \(x+y=y+x\).

1.3. Addition is associative: \((x+y)+z=x+(y+z)\).

1.4. There exists in \(\mathcal{V}\) a unique vector \(0\) such that for all \(x\) in \(\mathcal{V}\), \(x+0=x\).

1.5. To every pair \((a, x)\), where \(a\) is a complex number and \(x\) is in \(\mathcal{V}\), there corresponds a vector \(y\) in \(\mathcal{V}\), called the product of \(a\) and \(x\), \(y=a x\), such that \[ \begin{aligned} a(x+y)&=ax+ay,\\ (a+b) x&=a x+b x,\\ a(b x)&=(a b) x,\\ 0 x&=0,\\ 1 x&=x. \end{aligned} \]

We shall refer to complex numbers as scalars or constants; later in these notes we shall discuss vector spaces in which the scalars may lie in an arbitrary field (not necessarily the field of complex numbers). To differentiate between the various cases we shall refer to the vector spaces just defined as complex vector spaces; if (1.5) is satisfied for real numbers, \(a\), we shall refer to real vector spaces. Even though throughout the first three chapters we shall be concerned with complex vector spaces only, it will be very helpful to keep in mind the best known example of real vector spaces, namely, two-dimensional real Euclidean space. This space, \(E_{2}\), is the set of all pairs, \((a_{1}, a_{2})=x\), of real numbers. Addition, zero, and scalar multiplication are defined by the formulas \[ \begin{aligned} (a_{1}, a_{2})+(b_{1}, b_{2})&=(a_{1}+b_{1}, a_{2}+b_{2}), \\ 0_{2}&=(0,0),\\ b(a_{1}, a_{2})&=(b a_{1}, b a_{2}). \end{aligned} \]

For any vector \(x\) we write, by definition, \(-x=(-1) x\); instead of \(x+(-y)\) we write \(x-y\). We observe that \[ x-x=1 x+(-1) x=(1-1) x=0 x=0 \] (so that the elements of \(\mathcal{V}\) form under addition an Abelian group with \(-x\) playing the role of the inverse of \(x\)) and that for any scalar, \(a\), and an arbitrary \(x\),

\[ a 0=a(0 x)=(a 0) x=0 x=0. \]

(We comment on the fact that the symbol \(0\) is used in two different senses, once as a scalar and once as a vector. Later we shall even assign a third meaning to the symbol, but the relation between the various interpretations of it is such that no confusion should arise from this practice.)

2. Linear dependence and dimension

Definition 2. The vectors \(x_{1}, x_{2}, \ldots, x_{n}\) in \(\mathcal{V}\) are linearly dependent if there exist \(n\) scalars, \(a_{1}, a_{2}, \ldots, a_{n}\), not all zero, such that \[ a_{1} x_{1}+a_{2} x_{2}+\cdots+a_{n} x_{n}=0. \]

If no such scalars, \(a\), exist, the vectors \(x_{i}\) (\(i=1,2, \ldots, n\)) are linearly independent.

We shall assume throughout these notes that there exists a positive integer \(n\) such that every set of more than \(n\) vectors is linearly dependent. If, moreover, a set of \(n\) linearly independent vectors actually exists, we shall say that the linear dimension of \(\mathcal{V}\) is \(n\).
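In a concrete coordinate space, Definition 2 reduces to a rank computation. The following is a minimal numerical sketch, assuming Python with numpy (the particular vectors are illustrative only): a finite set of vectors is linearly dependent exactly when the matrix having them as columns has rank smaller than the number of vectors.

```python
import numpy as np

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([2.0, 4.0, 6.0])        # x2 = 2*x1, so the set is dependent
x3 = np.array([0.0, 1.0, 0.0])

# Dependent exactly when rank < number of vectors.
A = np.column_stack([x1, x2, x3])
print(np.linalg.matrix_rank(A) < A.shape[1])   # True
```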

3. Inner products and inner product spaces

Two fundamental notions in Euclidean geometry, which occur already in the study of the space \(E_{2}\), are the notions of angle and length. In the following, we shall abstract the essential properties of these two notions and carry them over to abstract vector spaces. It turns out to be more convenient for our purposes to study the analog not of the angle between two vectors but of the cosine of this angle. Suppose then that \(x=(a_{1}, a_{2})\) and \(y=(b_{1}, b_{2})\) are two vectors in \(E_{2}\); we denote the angle between \(x\) (or \(y\)) and the positive real axis by \(\alpha\) (or \(\beta\)). Then the cosine of the angle between \(x\) and \(y\) becomes \[ \cos (\alpha-\beta)=\cos \alpha \cos \beta+\sin \alpha \sin \beta=\frac{a_{1} b_{1}+a_{2} b_{2}}{|x||y|} \] (where \(|x|\) denotes the distance of \(x\) from the origin, \(|x|=(a_{1}^{2}+a_{2}^{2})^{1/2}\)). It is the expression \(a_{1} b_{1}+a_{2} b_{2}\), which we shall denote by \(\langle x, y \rangle\), that is of interest to us. It is easy to verify that \(\langle x, y \rangle\), considered as a numerically valued function of the pair of vectors \(x\) and \(y\), is symmetric in \(x\) and \(y\), depends linearly on both \(x\) and \(y\), and has the property that \(\langle x, x \rangle\), which is the square of the distance between \(x\) and the origin, vanishes if and only if \(x=0\) and is otherwise positive.
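As a numerical illustration of the formula just derived (a Python sketch, numpy assumed; the vectors are chosen only so that the angles are obvious):

```python
import numpy as np

x = np.array([1.0, 0.0])                       # alpha = 0
y = np.array([1.0, 1.0])                       # beta = pi/4
inner = x @ y                                  # a1*b1 + a2*b2 = <x, y>
cosine = inner / (np.linalg.norm(x) * np.linalg.norm(y))
print(np.isclose(cosine, np.cos(np.pi / 4)))   # True
```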

The properties of \(\langle x, y \rangle\) are in one respect too special for our purposes. For consider the example of one-dimensional complex Euclidean space: i.e., the set of all complex numbers \(z\). In this space, as well as in \(E_{2}\), the notions of angle and length are defined, but if the expression \(\langle x, y \rangle\), (which in case \(|x|=|y|=1\) is equal to the cosine of the angle between \(x\) and \(y\)), were linear in \(x\) and \(y\) then we should have \[ \langle i,-i \rangle = -i^{2}\langle 1, 1 \rangle = 1, \] (where \(i=(-1)^{1 / 2}\)). This would imply that the angle between \(i\) and \(-i\) is zero. It is easily verified that in fact the cosine of the angle between \(x\) and \(y\) (where \(x\) and \(y\) are two complex numbers of distance one from the origin) is the real part of \(x \bar{y}\), where \(\bar{y}\) denotes the complex conjugate of \(y\). In other words, in this case the expression \(x \bar{y}\), which is neither symmetric nor linear in \(y\), takes the place of the expression \(xy\) which might have been suggested by analogy with the situation in \(E_{2}\). Using the results of the last two paragraphs as a heuristic indication of what we should require in the general case we proceed to the formal definition.

Definition 3. An inner product, \(\langle x, y \rangle\), in a vector space, \(\mathcal{V}\), is a complex numerically valued function of the ordered pair of vectors \(x\), \(y\), such that

3.1. \(\langle x, y \rangle = \overline{\langle y, x \rangle}\),

3.2. \(\langle a_{1} x_{1}+a_{2} x_{2}, y \rangle = a_{1}\langle x_{1}, y \rangle + a_{2}\langle x_{2}, y \rangle\),

3.3. \(\langle x, x \rangle \geq 0\); \(\langle x, x \rangle = 0\) implies \(x=0\).

An inner product space, \(\mathcal{U}\), is a vector space in which an inner product is defined.

If \(\mathcal{U}\) is an inner product space with the inner product \(\langle x, y \rangle\) then we have \[ \begin{aligned} \langle x, b_{1} y_{1}+b_{2} y_{2} \rangle &= \overline{\langle b_{1} y_{1}+b_{2} y_{2}, x \rangle} \\ &=\overline{b_{1}}\,\overline{\langle y_{1}, x \rangle}+\overline{b_{2}}\,\overline{\langle y_{2}, x \rangle}\\ &=\overline{b_{1}}\langle x, y_{1}\rangle + \overline{b_{2}}\langle x, y_{2}\rangle. \end{aligned} \]
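The following sketch illustrates this conjugate-linearity numerically, anticipating the concrete inner product \(\langle x, y \rangle = \sum_{j} a_{j} \bar{b}_{j}\) of Section 6; Python with numpy is assumed, and the helper function `inner` is ours for illustration.

```python
import numpy as np

def inner(u, v):
    # <u, v> = sum_j u_j * conj(v_j); numpy's vdot conjugates its
    # *first* argument, so <u, v> is vdot(v, u).
    return np.vdot(v, u)

rng = np.random.default_rng(0)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y1 = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y2 = rng.standard_normal(3) + 1j * rng.standard_normal(3)
b1, b2 = 2.0 - 1.0j, 0.5 + 3.0j

lhs = inner(x, b1 * y1 + b2 * y2)
rhs = np.conj(b1) * inner(x, y1) + np.conj(b2) * inner(x, y2)
print(np.isclose(lhs, rhs))                    # True: conjugate-linear in the second slot
```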

In analogy with the situation in Euclidean space we define in an inner product space the length, \(|x|\), of a vector \(x\), by the formula \[|x|=\langle x, x \rangle^{1 / 2}.\] We note that \[ |a x|=|a||x|. \]

4. Schwarz Inequality

We shall now prove for an inner product space the following inequality, known as Schwarz's Inequality or Cauchy's Inequality.

Theorem 1. \[ |\langle x, y \rangle| \leq |x||y|. \]

Proof. Since \[ \begin{aligned} 0 &\leq |x-y|^{2}\\ &=\langle x-y, x-y \rangle\\ &=\langle x, x \rangle - \langle y, x \rangle - \langle x, y \rangle + \langle y, y \rangle \\ & =|x|^{2}-\Big[\langle x, y \rangle + \overline{\langle x, y \rangle}\Big]+|y|^{2}\\ &=|x|^{2}-2\operatorname{Re}\langle x, y \rangle + |y|^{2}, \end{aligned} \] (where we use \(\operatorname{Re} z\) and \(\operatorname{Im} z\) to denote the real and imaginary parts of \(z\), respectively) it follows that \[ \operatorname{Re}\langle x, y\rangle \leq \frac{1}{2}\big(|x|^{2}+|y|^{2}\big). \]

Since, furthermore, for any complex number \(z\), we can find a complex number \(a\) of absolute value one such that \(az\) is real and non-negative, we may find such a number \(a\) when \(z = \langle x, y \rangle\). Then \[ |\langle x, y \rangle|=|a\langle x, y\rangle|=\operatorname{Re}a\langle x, y\rangle = \operatorname{Re}\langle a x, y\rangle. \]

Applying the preceding inequality with \(ax\) in place of \(x\) we obtain \[ \begin{aligned} |\langle x, y \rangle|&=\operatorname{Re}\langle ax, y \rangle\\ &\leq \frac{1}{2}\left(|a x|^{2}+|y|^{2}\right)\\ &=\frac{1}{2}\left(|x|^{2}+|y|^{2}\right). \end{aligned} \]

If \(x=0\) or \(y=0\) the inequality is trivial, so we may assume that \(|x|\) and \(|y|\) are both positive. Let \(b\) be the positive number for which \(b^{2}=|y| /|x|\) and apply the last inequality to the vectors \(b x\) and \((1 / b) y\) in place of \(x\) and \(y\). We obtain \[ \begin{aligned} |\langle x, y \rangle|&=\left|\left\langle b x,\Big(\frac{1}{b}\Big) y\right\rangle\right|\\ &\leq \frac{1}{2}\left(b^{2}|x|^{2}+\Big(\frac{1}{b}\Big)^{2}|y|^{2}\right)\\ &=\frac{1}{2}\big(|x||y|+|x||y|\big)\\ &=|x||y|, \end{aligned} \] and this is the desired result.

5. Triangle Inequality in inner product spaces

A geometric interpretation and application of the Schwarz inequality is the following. For any two vectors \(x\) and \(y\) in the inner product space \(\mathcal{U}\) we may define the distance between \(x\) and \(y\) to be \(|x-y|\). It is clear that this distance is non-negative and symmetric in \(x\) and \(y\), and that it vanishes if and only if \(x=y\). It is precisely the Schwarz inequality which implies that this distance has the remaining property usually required of a distance function, namely that the triangle inequality, \[ |x-z| \leq |x-y|+|y-z|, \] is valid. For we have \[ \begin{aligned} |x+y|^{2} &= \langle x+y, x+y\rangle\\ &= \langle x, x \rangle +\langle x, y \rangle + \langle y, x \rangle+ \langle y, y\rangle \\ &= |x|^{2}+2\operatorname{Re}\langle x, y \rangle + |y|^{2}\\ &\leq |x|^{2}+2|\langle x, y \rangle|+|y|^{2} \\ &\leq |x|^{2}+2|x||y|+|y|^{2}\\ &= \big(|x|+|y|\big)^{2}, \end{aligned} \] so that \(|x+y| \leq |x|+|y|\), and replacing \(x\) by \(x-y\) and \(y\) by \(y-z\), we obtain the triangle inequality.

6. \(n\)-dimensional complex Euclidean space

As an example of an inner product space, we consider \(n\)-dimensional complex Euclidean space, \(\overline{E}_n\), which is the set of all ordered \(n\)-tuples of complex numbers. If \(x=(a_{1},a_{2},\ldots, a_{n})\) and \(y=(b_{1}, b_{2}, \ldots, b_{n})\) are two points of this space and \(a\) and \(b\) are complex numbers, we define \(a x+b y\) to be \[ (a a_{1}+b b_{1}, a a_{2}+b b_{2}, \ldots, a a_{n}+b b_{n}); \] if in addition we define \(0=(0,0, \ldots, 0)\) and \[ \langle x, y \rangle = a_{1} \bar{b}_{1}+a_{2} \bar{b}_{2}+\cdots+a_{n} \bar{b}_{n}, \] it is easy to verify that \(\overline{E}_n\) is an inner product space. Applying the Schwarz inequality to a pair of vectors \(x\) and \(y\) in this space we get the arithmetic relation \[ \left|\sum_{j=1}^{n} a_{j} \bar{b}_{j}\right|^{2} \leq \sum_{j=1}^{n}|a_{j}|^{2} \sum_{j=1}^{n}|b_{j}|^{2}. \]
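A numerical check of this arithmetic relation (Python with numpy assumed; the random vectors are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(5) + 1j * rng.standard_normal(5)
b = rng.standard_normal(5) + 1j * rng.standard_normal(5)

lhs = abs(np.sum(a * np.conj(b))) ** 2         # |sum_j a_j * conj(b_j)|^2
rhs = np.sum(np.abs(a) ** 2) * np.sum(np.abs(b) ** 2)
print(lhs <= rhs)                              # True
```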

Since in an arbitrary inner product space the expression \(\langle x, y \rangle /|x||y|\) plays the role of the cosine of the angle between \(x\) and \(y\), the Schwarz inequality can also be considered as a generalization of the fact that for real angles \(t\), \(|\cos t| \leq 1\).

7. Linear subspaces

Definition 4. A linear subspace\(^{1}\) (also called a vector subspace or simply a subspace) in a vector space \(\mathcal{V}\) is a non-empty set, \(M\), of vectors in \(\mathcal{V}\), which contains along with every pair of vectors \(x\) and \(y\) the vector \(a x+b y\), for arbitrary complex numbers \(a\) and \(b\).

It follows immediately from the definitions that a linear subspace in a vector space (or an inner product space) is a vector space (or an inner product space).

8. Orthonormal sets

Definition 5. For vectors \(x\) and \(y\) in an inner product space we say that \(x\) is orthogonal to \(y\), in symbols \(x \perp y\), if \(\langle x, y \rangle = 0\). Two linear subspaces, \(M\) and \(N\), are orthogonal if every vector in each is orthogonal to every vector in the other. A set, \(\{x_{i}\}\), of vectors is an orthonormal set if \(\langle x_{i}, x_{j} \rangle = \delta_{ij}\) (where \(\delta_{ij}\) is the Kronecker symbol which is equal to \(1\) when \(i=j\) and is equal to \(0\) otherwise).

If \(\{x_i\}\) is a finite orthonormal set, then \(\sum_{i} a_{i} x_{i}=0\) implies that \[ a_{j}=\sum_{i} a_{i}\langle x_{i}, x_{j} \rangle = \Big\langle \sum_{i} a_{i} x_{i},\, x_{j} \Big\rangle = \langle 0, x_{j} \rangle = 0, \] so that an orthonormal set is linearly independent. Hence if \(n\) is the linear dimension of the space, no orthonormal set can have more than \(n\) elements; the maximal number of elements in an orthonormal set is called the orthogonal dimension of the space.

If an orthonormal set is not contained in any larger orthonormal set, it is called complete.

It follows from the definition of the terms that the orthogonal dimension is not greater than the linear dimension.

9. Intersection and linear sum of linear subspaces

The intersection of any collection of linear subspaces is again a linear subspace. If \(M\) and \(N\) are linear subspaces we shall denote their intersection by \(M \cap N\). If \(K\) is an arbitrary set of vectors, there exist linear subspaces containing \(K\) (since the whole space is such a subspace), and we may form the intersection of all linear subspaces containing \(K\). This linear subspace is the linear subspace spanned by \(K\); if \(M\) and \(N\) are linear subspaces we shall denote the linear subspace spanned by the vectors of \(M\) and \(N\) by \(M+N\); \(M+N\) is called the linear sum of \(M\) and \(N\).

10. Bessel’s Inequality

Theorem 2. If \(\{x_{i}\}\) is an orthonormal set, \(x\) is any vector, and \(a_{i}=\langle x, x_{i} \rangle\), then \[ \sum_{i}{|a_i|^{2}} \leq |x|^{2}. \]

Proof. Let \(x^{*}\) be the vector \[ x^{*}=x-\sum_{i} a_{i} x_{i}. \] Then, since \(|x^{*}| \geq 0\), we have \[ \begin{aligned} 0 &\leq\bigg\langle x-\sum_i a_i x_i, x-\sum_i a_i x_i\bigg\rangle \\ &=\langle x, x \rangle - \bigg\langle x, \sum_i a_i x_i\bigg\rangle - \bigg\langle \sum_i a_i x_i, x\bigg\rangle + \bigg\langle\sum_i a_i x_i, \sum_i a_i x_i\bigg\rangle \\ &=\langle x, x \rangle -\sum_i \bar{a}_i\langle x, x_i\rangle - \sum_i a_i\langle x_i, x \rangle + \sum_i \sum_j a_i \bar{a}_j\langle x_i, x_j \rangle \\ &=\langle x, x \rangle - \sum_i|a_i|^2-\sum_i|a_i|^2+\sum_i|a_i|^2\\ &=|x|^2-\sum_i|a_i|^2, \end{aligned} \]and this is the desired result.
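The inequality is easy to test numerically. The sketch below (Python with numpy assumed; the orthonormal set and the vector \(x\) are ours for illustration) uses an incomplete orthonormal set, so the inequality is strict.

```python
import numpy as np

def inner(u, v):
    return np.vdot(v, u)                       # <u, v> = sum u_j * conj(v_j)

# An incomplete orthonormal set in complex Euclidean 3-space.
x1 = np.array([1.0, 0.0, 0.0], dtype=complex)
x2 = np.array([0.0, 1.0, 0.0], dtype=complex)
x = np.array([1.0 + 1.0j, 2.0, 3.0 - 2.0j])

a = [inner(x, xi) for xi in (x1, x2)]          # a_i = <x, x_i>
print(sum(abs(ai) ** 2 for ai in a) <= inner(x, x).real)   # True
```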

Corollary 1. \(x^*=x-\sum_{i} a_{i} x_{i}\) is orthogonal to each \(x_{i}\) and therefore to the linear subspace spanned by \(\{x_{i}\}\); a necessary and sufficient condition that \(x\) belong to this linear subspace is that \(x=\sum_{i}a_{i} x_{i}\).

Proof. We have \[ \begin{aligned} \langle x^{*}, x_{j} \rangle &= \langle x, x_{j} \rangle - \sum_{i} a_{i}\langle x_{i}, x_{j} \rangle\\ &=a_{j}-\sum_{i} a_i \delta_{i j}\\ &=0. \end{aligned} \] This proves the first part of the corollary. The sufficiency of the condition of the second part is obvious. If, on the other hand, \(x\) belongs to the linear subspace spanned by \(\{x_{i}\}\), then \(x^{*}\), being a linear combination of \(x\) and the \(x_{i}\), must also belong to this linear subspace. Since, however, \(x^{*}\) is orthogonal to every vector in this linear subspace, it must be orthogonal to itself, i.e., \(x^{*}=0\), and the condition is necessary.

11. Characterization of complete orthonormal sets

We shall now prove that the following five conditions on an orthonormal set \(\{x_{i}\}\) are equivalent to each other.

11.1. \(\{x_{i}\}\) is complete.

11.2. \(\langle x, x_{i} \rangle = 0\) for all \(i\) implies \(x=0\).

11.3. The linear subspace spanned by \(\{x_{i}\}\) is the whole space, \(\mathcal{U}\).

11.4. For every \(x\) in \(\mathcal{U}\), \(x=\sum_{i} a_i x_{i}\), where \(a_{i}=\langle x, x_i \rangle\).

11.5. For every pair \(x\), \(y\) in \(\mathcal{U}\) \[ \langle x, y \rangle = \sum_{i} a_{i} \bar{b}_{i}, \] where \(a_i = \langle x, x_{i} \rangle\) and \(b_{i} = \langle y, x_{i} \rangle\). (11.5 is Parseval’s identity.)

We shall prove this by establishing the following implication relations.

\[ (11.1) \implies (11.2) \implies (11.3) \implies (11.4) \implies (11.5) \implies (11.1) \]

(11.1) \(\implies\) (11.2): If there exists a vector \(x \neq 0\) with \(\langle x, x_{i} \rangle = 0\) for all \(i\), then the set of vectors formed by \(\{x_{i}\}\) and \(x /|x|\) is an orthonormal set properly containing \(\{x_{i}\}\), so that \(\{x_{i}\}\) is not complete.

(11.2) \(\implies\) (11.3): If \(\{x_{i}\}\) does not span the whole space then, by the corollary to Bessel’s inequality, there exists a vector \(x\) which is not of the form \(\sum_{i} a_{i} x_{i}\). It follows from this corollary that the vector \(x^*=x-\sum_{i}\langle x, x_{i} \rangle x_{i}\) is different from zero and is orthogonal to each \(x_{i}\), so that (11.2) fails.

(11.3) \(\implies\) (11.4): This implication is a direct consequence of the corollary to Bessel’s inequality.

(11.4) \(\implies\) (11.5): Assuming (11.4) we have \(x=\sum_i a_i x_{i}\) and \(y=\sum_i b_{i} x_{i}\), whence \[ \begin{aligned} \langle x, y \rangle &= \bigg\langle \sum_{i} a_{i} x_{i}, \sum_{j} b_{j} x_{j}\bigg\rangle\\ &= \sum_{i} \sum_{j} a_{i} \bar{b}_{j}\langle x_{i}, x_{j}\rangle\\ &= \sum_{i}a_i \bar{b}_{i}. \end{aligned} \]

(11.5) \(\implies\) (11.1): If \(\{x_{i}\}\) were contained in a larger orthonormal set, say one containing a unit vector \(x_{0}\) orthogonal to each \(x_{i}\), then taking \(x=y=x_{0}\) in (11.5) we should obtain \(|x_{0}|^{2}=0\), contradicting \(|x_{0}|=1\).

We observe that (11.5) with \(x=y\) is the natural generalization of the Pythagorean theorem.
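A numerical check of Parseval's identity (11.5) for the standard basis of complex Euclidean 3-space, which is a complete orthonormal set (Python with numpy assumed; the helper `inner` encodes the convention of these notes):

```python
import numpy as np

def inner(u, v):
    return np.vdot(v, u)                       # <u, v> = sum u_j * conj(v_j)

basis = np.eye(3, dtype=complex)               # a complete orthonormal set
rng = np.random.default_rng(2)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

a = np.array([inner(x, e) for e in basis])     # a_i = <x, x_i>
b = np.array([inner(y, e) for e in basis])     # b_i = <y, x_i>
print(np.isclose(inner(x, y), np.sum(a * np.conj(b))))     # True
```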

12. Erhard Schmidt orthogonalization process

Although we have proved several relations among properties of complete orthonormal sets, we have not yet established that such sets exist. The purpose of the construction that follows is to prove the existence of complete orthonormal sets.

Let \(\mathcal{U}\) be an inner product space of linear dimension \(n\), and let \(z_{1}\), \(z_{2}, \ldots, z_{n}\) be a set of linearly independent vectors. We define \(x_{1}=z_{1} /|z_{1}|\); then the set \(\{x_{1}\}\) consisting of the single vector \(x_{1}\) is an orthonormal set. We proceed by induction. Suppose that the vectors \(x_{1}, x_{2}, \ldots, x_{r}\) have been defined so that they form an orthonormal set and so that \(x_{i}\) is a linear combination of \(z_{1}, \ldots, z_{i}\) for \(i=1,2, \ldots, r\). We write \(y=c_{1} x_{1}+\cdots+c_{r} x_{r}+z_{r+1}\), and we determine the coefficients \(c_{i}\) so that \(y\) is orthogonal to each \(x_{i}\), \(i=1, \ldots, r\). Since \(\langle y, x_{i} \rangle = c_{i} + \langle z_{r+1}, x_{i} \rangle\), we may choose \(c_{i}=-\langle z_{r+1}, x_{i} \rangle\). Since \(y\) is a linear combination of \(x_{1}, \ldots, x_{r}\) and \(z_{r+1}\), it is a linear combination of \(z_{1}, \ldots, z_{r}, z_{r+1}\) in which the coefficient of \(z_{r+1}\) does not vanish (in fact it is equal to \(1\)); hence the linear independence of the \(z_{i}\) implies that \(y \neq 0\). Hence we may define \(x_{r+1}=y /|y|\), and the set \(x_{1}, \ldots, x_{r}, x_{r+1}\) will again satisfy our induction hypothesis. The vectors \(x_{1}, \ldots, x_{n}\) so constructed are an orthonormal set which must be complete, for if it were contained in a larger orthonormal set then there would exist a set of more than \(n\) linearly independent vectors. This construction proves, therefore, that complete orthonormal sets do indeed exist and that the orthogonal dimension of \(\mathcal{U}\) is equal to its linear dimension.
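The construction transcribes directly into an algorithm. The following sketch (Python with numpy assumed; the function `gram_schmidt` and the sample matrix `Z` are ours for illustration) performs the induction step exactly as above, choosing \(c_{i}=-\langle z_{r+1}, x_{i} \rangle\) and then normalizing.

```python
import numpy as np

def gram_schmidt(Z):
    """Columns of Z are linearly independent; returns a matrix whose
    columns x_1, ..., x_n form an orthonormal set, x_r a linear
    combination of z_1, ..., z_r."""
    X = []
    for r in range(Z.shape[1]):
        y = Z[:, r].astype(complex)
        for x in X:
            # c_i = -<z_{r+1}, x_i>: subtract the component along x_i.
            y = y - np.vdot(x, Z[:, r]) * x
        X.append(y / np.linalg.norm(y))        # y != 0 by independence
    return np.column_stack(X)

Z = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])
X = gram_schmidt(Z)
print(np.allclose(X.conj().T @ X, np.eye(2)))  # True: orthonormal columns
```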

For a complete orthonormal set, we shall also use the terms ‘coordinate system,’ ‘frame of reference,’ and ‘orthogonal basis.’ By ‘basis’ or ‘linear basis’ we mean a set of \(n\) linearly independent vectors, where \(n\) is the dimension of the space. (Since we have proved the equality of the orthogonal and linear dimensions, we shall in the future not distinguish between them.) If \(\{x_{i}\}\) is a coordinate system then, by (11.4), every \(x\) can be written in the form \(x=\sum_{i} a_{i} x_{i}\) where \(a_{i} = \langle x, x_{i} \rangle\). The numbers \(a_{i}\) are the coordinates of \(x\) with respect to \(\{x_{i}\}\).

13. The projection theorem

Theorem 3. Let \(M\) be a linear subspace in \(\mathcal{U}\). Every vector \(x\) in \(\mathcal{U}\) can be written in the form \(x=y+z\), where \(y\) is in \(M\) and \(z\) is orthogonal to every vector in \(M\), in one and only one way.

Proof. \(M\), being a linear subspace in an inner product space, is itself an inner product space, and we may find an orthonormal set \(\{x_{i}\}\) in \(M\) which is complete in \(M\). For any \(x\), write \(y=\sum_{i} a_{i} x_{i}\), where \(a_{i} = \langle x, x_{i} \rangle\). Then clearly \(y\) is in \(M\) and by the corollary to Bessel’s inequality, \(z=x-y\) is orthogonal to every vector in \(M\). This establishes the existence of the representation; to prove uniqueness suppose that \(x=y_{1}+z_{1}\) and \(x=y_{2}+z_{2}\), where \(y_{i}\) and \(z_{i}\) (\(i=1, 2\)) have the properties of \(y\) and \(z\), respectively. Then we should have \(y_{1}-y_{2}=z_{2}-z_{1}\): i.e., a vector in \(M\) is equal to a vector which is orthogonal to \(M\). It follows that this vector is orthogonal to itself and is therefore zero, whence \(y_{1}=y_{2}\) and \(z_{1}=z_{2}\), as was to be proved.
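Numerically, the proof is also a recipe: compute \(y=\sum_{i}\langle x, x_{i}\rangle x_{i}\) from an orthonormal set complete in \(M\), and check that \(z=x-y\) is orthogonal to \(M\). A sketch (Python with numpy assumed; the subspace and vector are illustrative):

```python
import numpy as np

def inner(u, v):
    return np.vdot(v, u)                       # convention of these notes

# M is spanned by the orthonormal set {x1, x2} in complex 3-space.
x1 = np.array([1.0, 0.0, 0.0], dtype=complex)
x2 = np.array([0.0, 1.0, 0.0], dtype=complex)
x = np.array([2.0 + 1.0j, -1.0, 4.0], dtype=complex)

y = inner(x, x1) * x1 + inner(x, x2) * x2      # projection of x on M
z = x - y
print(np.isclose(inner(z, x1), 0), np.isclose(inner(z, x2), 0))  # True True
```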

The vector \(y\) is the projection of \(x\) on \(M\); it is easy to verify that it is a generalization of the well-known notion of perpendicular projection in Euclidean space. We shall investigate properties of projections more thoroughly in the next chapter.

14. Calculus of linear subspaces

We make some comments on the calculus of linear subspaces. If \(M\) is an arbitrary linear subspace, we denote by \(M^\perp\) the set of all vectors orthogonal to \(M\). It is clear that \(M^\perp\) is a linear subspace; we call \(M^\perp\) the orthocomplement of \(M\).

The projection theorem proves that \(M + M^\perp\), in the sense established in (9), is \(\mathcal{U}\). If \(M\) and \(N\) are any two linear subspaces, \(M \cap N^\perp\) is a linear subspace. If \(\{x_{i}\}\) is a complete orthonormal set in \(M\) and \(\{y_{j}\}\) is a complete orthonormal set in \(M^\perp\) then the set of vectors consisting of all \(x_{i}\) and all \(y_{j}\) is a complete orthonormal set in \(\mathcal{U}\). If \(M\) is any linear subspace and \(x\) is a vector such that \(x \perp M^\perp\) (i.e., if \(x\) lies in \(M^{\perp\perp}=\left(M^\perp\right)^\perp\)), then \(x\) is in \(M\). For by the projection theorem, we may write \(x=y+z\), with \(y\) in \(M\) and \(z\) in \(M^\perp\). Since an element, \(y\), of \(M\) is orthogonal to \(M^\perp\), and is therefore in \(M^{\perp\perp}\), \(x-y=z\) must also be in \(M^{\perp\perp}\). Since \(z\), therefore, is both in \(M^\perp\) and orthogonal to \(M^\perp\), \(z\) must be zero. It follows that \(M^{\perp\perp} = M\).
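In coordinates the orthocomplement can be computed as a null space. The following sketch (Python with numpy assumed; the real matrix `M` is illustrative only) obtains \(M^\perp\) from the singular value decomposition and verifies that its dimension complements that of \(M\).

```python
import numpy as np

M = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 1.0]])                     # columns span a plane M in R^3
_, s, Vt = np.linalg.svd(M.T)                  # null space of M.T is M-perp
rank = int(np.sum(s > 1e-12))
M_perp = Vt[rank:].T                           # columns span M-perp
print(M_perp.shape[1] == M.shape[0] - rank)    # dim M-perp = n - dim M: True
print(np.allclose(M.T @ M_perp, 0.0))          # M-perp is orthogonal to M: True
```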

15. Representation of linear functions

Of interest in analysis in general, and in our study in particular, are linear functions defined on vector spaces.

Definition 6. A linear function \(f(x)\) is a (complex) numerically valued function of the vector \(x\) such that for any two vectors \(x\) and \(y\) and any two complex numbers \(a\) and \(b\), \(f(a x+b y)=a f(x)+b f(y)\).

(A geometric interpretation of linear functions is the fact that the set of all vectors \(x\) for which \(f(x)=0\), where \(f(x)\) is a linear function not identically zero, is a hyperplane in \(\mathcal{U}\). Because two linear functions \(f(x)\) and \(g(x)\) related by \(f(x)=a g(x)\) for all \(x\), where \(a\) is any nonzero constant, correspond to the same hyperplane, it turns out to be a little more convenient to study the linear functions themselves and not the hyperplanes, that is, classes of linear functions defined by them. It is easy to verify, but we shall not make use of this fact in this form, that the hyperplanes of a vector space of dimension \(n\) are the linear subspaces of dimension \(n-1\). It is also easy to verify that the set of all linear functions defined on a vector space is itself a vector space; one interpretation of the theorem to be proved below is that the set of all linear functions on an inner product space is itself an inner product space.)

Theorem 4. If \(f(x)\) is a linear function in an inner product space \(\mathcal{U}\), there exists a unique vector \(y\) in \(\mathcal{U}\) such that for all \(x\), \(f(x) = \langle x, y \rangle\).

Proof. Since the theorem is obvious if \(f(x)\) is identically zero, we may assume that this is not the case. Let \(M\) be the set of all vectors \(x\) for which \(f(x)=0\), and let \(N = M^\perp\) be the orthocomplement of \(M\). Since \(f(x)\) is not identically zero, \(M\) is not the whole space, so that by the projection theorem \(N\) contains a vector \(y_{0} \neq 0\); and since a nonzero vector of \(N\) cannot lie in \(M\), \(f(y_{0}) \neq 0\). By multiplication by a suitable constant we may assume \(f(y_{0})=1\).

Consider now any vector \(z\) in \(N\). For any complex number \(a\), \(z-ay_{0}\) is also in \(N\) and we have \[ f(z-a y_{0})=f(z)-a f(y_{0})= f(z)-a. \]

If we choose \(a=f(z)\), then \(f(z-ay_{0})=0\), so that \(z - ay_0\) is in \(M\); since we have already seen that it must be in \(N\), it follows that \(z-ay_{0}=0\), or \(z=f(z) y_{0}\). (This shows that \(N\) is one-dimensional and therefore that the dimension of \(M\) is \(n-1\).) We write \(y=y_{0} /|y_{0}|^{2}\), and if \(x\) is an arbitrary vector in \(\mathcal{U}\) we use the projection theorem to write \(x\) in the form \(x=w+z\), with \(w\) in \(M\) and \(z\) in \(N\). Then \[ \begin{aligned} \langle x, y \rangle &= \langle w+z, y \rangle\\ &= \langle z, y \rangle\\ &=\big\langle f(z) y_{0}, y\big\rangle\\ &=f(z)\big\langle y_{0}, y_{0} /|y_{0}|^{2}\big\rangle\\ &=f(z); \end{aligned} \] and since \(w\) is in \(M\), \(f(w)=0\) so that \(f(x)=f(w+z)=f(z)\). This completes the proof of the existence of \(y\).

To prove uniqueness, suppose that for every \(x\), \(\langle x, y_{1} \rangle = \langle x, y_{2} \rangle\). Then we should have \(\langle x, y_{1}-y_{2}\rangle = 0\) for all \(x\): i.e., the vector \(y_{1} - y_{2}\) is orthogonal to every vector, and therefore in particular to itself, so that \({y}_{1}={y}_{2}\), as was to be proved.
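In complex Euclidean space the representing vector can be written down explicitly: if \(f(x)=\sum_{j} c_{j} a_{j}\) for \(x=(a_{1}, \ldots, a_{n})\), then \(y=(\bar{c}_{1}, \ldots, \bar{c}_{n})\), since \(\langle x, y \rangle = \sum_{j} a_{j} \overline{\bar{c}_{j}} = \sum_{j} c_{j} a_{j}\). A sketch (Python with numpy assumed; the coefficients are illustrative):

```python
import numpy as np

c = np.array([1.0 + 2.0j, -1.0j, 3.0])         # f(x) = c_1 a_1 + c_2 a_2 + c_3 a_3
f = lambda x: c @ x
y = np.conj(c)                                  # the representing vector

rng = np.random.default_rng(3)
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
print(np.isclose(f(x), np.vdot(y, x)))          # f(x) = <x, y>: True
```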


\(^{1}\) Also called a linear manifold.