5 Matrix Algebra
We have so far avoided matrix algebra in our discussion of the linear regression model, one reason being that we can get good intuition from the “summation-notation formulas” for parameter estimators, standard errors, and other statistics. However, we have had to limit ourselves to two regressors, and even then we have had to skip over several results. To go any further, we will need matrix algebra. In this chapter we cover the basics of matrix algebra: definitions, notation, and elementary operations, together with their application to solving systems of linear equations. This chapter is an abridged version of Chapter 7 of Tay, Preve, and Baydur (2025).
5.1 Definitions and Notation
A matrix is a rectangular collection of numbers. The following is a matrix with \(m\) rows and \(n\) columns: \[ \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}. \] Such a matrix is said to have “dimension” or “order” \(m \times n\). The number that appears in the \((i,j)\)th position, i.e., in the \(i\)th row and \(j\)th column, is called the \((i,j)\)th element/entry/component of the matrix. We count rows from top to bottom, and columns from left to right. If \(m=n\), the matrix is a square matrix. If \(m=1\) and \(n>1\), we have a row vector. If \(m>1\) and \(n=1\), we have a column vector. If \(m=n=1\), we have a scalar.
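If you would like to follow along on a computer, here is a minimal sketch in Python using the NumPy library (NumPy is our assumption for all computational illustrations in this chapter; the text itself relies on no software). It shows how the dimension of a matrix and its \((i,j)\)th element look in code; note that NumPy counts rows and columns from 0 rather than from 1.

```python
import numpy as np

# A 3 x 2 matrix: 3 rows, 2 columns
A = np.array([[1, 4],
              [3, 2],
              [6, 5]])

print(A.shape)   # (3, 2) -- the dimension m x n
print(A[0, 1])   # 4 -- the (1,2)th element in the 1-based notation of the text
print(A[2, 0])   # 6 -- the (3,1)th element
```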
The term “vector” is used in many ways in mathematics. Sometimes a vector refers to an ordered list of numbers \((x_1,x_2,\ldots,x_n)\). Such an object has no “shape”. It is merely an ordered sequence of \(n\) elements. Column and row vectors, on the other hand, are “two-dimensional” objects, in the sense of having a “height” (number of rows) and “width” (number of columns). In the context of matrix algebra, the word “vector” alone usually means a column vector, but not always.
Example 5.1 The matrix \(A\) below is a square matrix, \(b\) is a column vector and \(c\) is a row vector. \[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\,\, b = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix},\,\; c = \begin{bmatrix} c_1 & c_2 & \cdots & c_n \end{bmatrix}. \]
Matrices and vectors are often written in bold lettering, or with some sort of mark to distinguish them from scalars and other objects. We will not do so in these notes. The reader will have to rely on context to distinguish scalars from vectors and matrices. Where context is unclear, we will be more explicit.
Some additional notation:
It is often convenient to indicate an \(m\times n\) matrix \(A\) by \((a_{ij})_{m \times n}\).
We can refer to the \((i,j)\)th element of a matrix \(A\) by \((A)_{ij}\) or \((A)_{i,j}\).
The utility of these two notational conventions should become clearer as the chapter progresses.
Two matrices of the same dimension are said to be equal if all of their corresponding elements are equal, i.e., \[ A = B \Leftrightarrow (A)_{ij} = (B)_{ij} \,\,\text{ for all }\, i=1,2,\ldots,m \,\text{ and }\; j=1,2,\ldots,n. \] Two matrices of different dimensions cannot be equal.
A zero matrix is one whose elements are all zero. It is simply written as \(0\) although sometimes subscripts are added to indicate its dimension.
The diagonal of an \(n \times n\) square matrix refers to the \((i,i)\)th elements of the matrix, i.e., to the elements \((A)_{ii}\), \(i=1,2,\ldots,n\). A diagonal matrix is a square matrix with all off-diagonal elements equal to zero, i.e., a square matrix \(A\) is diagonal if \((A)_{ij} = 0\) for all \(i \neq j\), \(i,j=1,2,...,n\). Diagonal matrices are sometimes written \(\mathrm{diag}(a_{11},a_{22},...,a_{nn})\).
Example 5.2 The matrix \[ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \mathrm{diag}(1,4,0) \] is a diagonal matrix. Note that there is nothing in the definition of a diagonal matrix that says its diagonal elements cannot be zero.1
An identity matrix is a square matrix with all diagonal elements equal to one and all off-diagonal elements equal to zero, i.e., \[ I_n = \begin{bmatrix}1&0&\cdots&0\\0&1&\cdots&0\\\vdots & \vdots & \ddots & \vdots\\0&0&\cdots & 1 \end{bmatrix} =\text{diag}(\underbrace{1, 1, \dots, 1}_{n \text{ terms}})\,. \] We will denote an identity matrix by \(I\). A subscript is sometimes added to indicate its dimension, as we did above, although this is often left out. We will see shortly that the identity matrix plays a role in matrix algebra akin to the role played by the number “1” in the real number system.
A symmetric matrix is a square matrix \(A\) such that \((A)_{ij} = (A)_{ji}\) for all \(i,j=1,2,...,n\).
Example 5.3 The matrix \(\begin{bmatrix} 1 & 3 & 2 \\ 3 & 4 & 6 \\ 2 & 6 & 3 \end{bmatrix}\) is symmetric, \(\begin{bmatrix} 1 & 3 & 2 \\ 7 & 4 & 6 \\ 2 & 6 & 3 \end{bmatrix}\) is not.
5.1.1 Addition, Scalar Multiplication and Transpose
Addition: Matrix addition is defined as element-by-element addition, i.e., for two matrices \(A=(a_{ij})_{m\times n}\) and \(B=(b_{ij})_{m\times n}\), we define \[ (A+B)_{ij} = (A)_{ij} + (B)_{ij}\;\text{ for all }\;i=1,\dots,m\,;\;j=1,\dots,n\,. \] Matrix addition is defined only for matrices of the same dimensions.
Example 5.4 \(\begin{bmatrix} 1 & 4 \\ 3 & 2 \\ 6 & 5 \end{bmatrix} + \begin{bmatrix} 6 & 9 \\ 1 & 2 \\ 1 & 10 \end{bmatrix} = \begin{bmatrix} 1+6 & 4+9 \\ 3+1 & 2+2 \\ 6+1 & 5+10 \end{bmatrix} = \begin{bmatrix} 7 & 13 \\ 4 & 4 \\ 7 & 15 \end{bmatrix}\).
It should also be obvious that \[ \begin{aligned} A + B &= B + A \;,\\ (A+B)+C &= A + (B + C)\;. \end{aligned} \] This means that as far as addition is concerned, we can manipulate matrices in the same way we manipulate ordinary numbers (as long as the matrices being added have the same dimensions).
Scalar Multiplication: For a scalar \(\alpha\) and matrix \(A = (a_{ij})_{m\times n}\), we define \[ (\alpha A)_{ij} = (A \alpha)_{ij} = \alpha (A)_{ij}\,\text{ for all }\,i=1,\dots,m\;;\;j=1,\dots,n\,. \] i.e., the product of a scalar and a matrix is defined to be the multiplication of each element of the matrix by the scalar.
Example 5.5 \(b\begin{bmatrix}a_{11} & a_{12} \\a_{21} & a_{22} \\a_{31} & a_{32} \end{bmatrix} = \begin{bmatrix}ba_{11} & ba_{12} \\ba_{21} & ba_{22} \\ba_{31} & ba_{32} \end{bmatrix}\).
We can use scalar multiplication to define matrix subtraction: \[ A-B = A + (-1)B. \] Transpose: When we transpose a matrix, we write its rows as its columns, and its columns as its rows. That is, the transpose of an \((m \times n)\) matrix \(A\), denoted either by \(A^{\mathrm{T}}\) or \(A'\), is defined by \[ (A^\mathrm{T})_{ij} = (A)_{ji} \,\, \text{ for all } \, i=1,2,...,m\,\text{ and }\,\, j=1,2,...,n. \]
Example 5.6 \(\begin{bmatrix}1 & 4 \\ 3 & 2 \\ 6 & 5 \end{bmatrix}^\mathrm{T} = \begin{bmatrix}1 & 3 & 6 \\ 4 & 2 & 5 \end{bmatrix}.\)
In order to use space more efficiently, we will often write a column vector \(x = \begin{bmatrix} x_1 \\ \vdots \\ x_m \end{bmatrix}\) as \[ x = \begin{bmatrix} x_1 & x_2 & \dots & x_m \end{bmatrix}^\mathrm{T}\; \text{ or } \; x^\mathrm{T} = \begin{bmatrix} x_1 & x_2 & \dots & x_m \end{bmatrix}\,. \]
We can use the transpose operator to define symmetric matrices: a symmetric matrix is simply a square matrix \(A\) such that \(A^\mathrm{T} = A\).
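The elementary operations above are all built into NumPy. The following sketch (again an illustration, not part of the exposition) reproduces Example 5.4 and checks the symmetry of the first matrix in Example 5.3.

```python
import numpy as np

A = np.array([[1, 4],
              [3, 2],
              [6, 5]])
B = np.array([[6, 9],
              [1, 2],
              [1, 10]])

print(A + B)      # element-by-element addition, as in Example 5.4
print(3 * A)      # scalar multiplication
print(A - B)      # subtraction, i.e. A + (-1)B
print(A.T)        # transpose: rows written as columns

S = np.array([[1, 3, 2],
              [3, 4, 6],
              [2, 6, 3]])
print(np.array_equal(S, S.T))   # True, so S is symmetric
```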
5.1.2 Exercises
Exercise 5.1 What is the dimension of \(A = \begin{bmatrix}7 & 13 \\ 4 & 4 \\ 7 & 15 \end{bmatrix}\)? What is \((A)_{1,2}\) and \((A)_{3,1}\)?
Exercise 5.2 Suppose \(A=(a_{ij})_{2\times 4}\) where \(a_{ij} = i + j\). Write out the matrix in full.
Exercise 5.3 Express the following matrices in full:
(a) \((a_{ij})_{4 \times 4}\) where \(a_{ij} = 1\) when \(i=j\), \(0\) otherwise.
(b) \((a_{ij})_{4 \times 4}\) where \(a_{ij} = 0\) if \(i \neq j\) (fill the rest of the entries with “\(*\)”).
(c) \((a_{ij})_{4 \times 4}\) where \(a_{ij} = 0\) if \(i < j\) (fill the rest of the entries with “\(*\)”).
(d) \((a_{ij})_{4 \times 4}\) where \(a_{ij} = 0\) if \(i > j\) (fill the rest of the entries with “\(*\)”).
These are all square matrices. Matrix (c) is a “lower triangular matrix” and (d) is an “upper triangular matrix” (so we have in (c) and (d) matrices that are square and triangular!). Matrix (b) is diagonal, which is both upper and lower triangular.
Exercise 5.4 What are \(u\) and \(v\) if \[ \begin{bmatrix}u+2v & 1 & 3 \\ 9 & 0 & 4 \\ 3 & 4 & 7\end{bmatrix} = \begin{bmatrix}1 & 1 & 3 \\ 9 & 0 & u+v \\ 3 & 4 & 7\end{bmatrix}? \]
Exercise 5.5 Let \(v_{1}, v_{2}, v_{3}, v_{4}\) represent cities and suppose there are one-way flights from \(v_1\) to \(v_2\) and \(v_3\), from \(v_2\) to \(v_3\) and \(v_4\), and two-way flights between \(v_1\) and \(v_4\). Write out a matrix \(A\) such that \((A)_{ij}=1\) if there is a flight from \(v_i\) to \(v_j\), and zero otherwise.
Exercise 5.6 Let \(A = \begin{bmatrix}0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}\) and \(B = \begin{bmatrix}0 & 0 \\ 0 & 0 \\ 0 & 0\end{bmatrix}\). Is \(A=B\)?
Exercise 5.7 If \(2A = \begin{bmatrix}3 & 4 \\ 2 & 8 \\ 1 & 5\end{bmatrix}\), what is \(A\)? If \(B - \dfrac{1}{2}\begin{bmatrix}3 & 4 \\ 1 & 8 \\ 1 & 4\end{bmatrix} = \begin{bmatrix}6 & 4 \\ 2 & 5 \\ 3 & 1\end{bmatrix}\), what is \(B\)?
Exercise 5.8 Which of the following matrices are symmetric?
(a) \(\begin{bmatrix}1 & 2 & 3 & 5 \\ 2 & 5 & 4 & b \\ 3 & 4 & 3 & 3 \\ 5 & b & 3 & 1\end{bmatrix}\) (b) \(\begin{bmatrix}1 & 1 & 3 & 5 \\ 2 & 5 & 4 & b \\ 3 & 4 & 3 & 3 \\ 5 & b & 3 & 1\end{bmatrix}\) (c) \(\begin{bmatrix}1 & 0 & 0 & 0 \\ 0 & 5 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1\end{bmatrix}\) (d) \(\begin{bmatrix}1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1\end{bmatrix}\)
Exercise 5.9 True or False?
- Symmetric matrices must be square.
- A scalar is symmetric.
- If \(A\) is symmetric, then \(\alpha A\) is symmetric.
- The sum of symmetric matrices is symmetric.
- All diagonal matrices are symmetric.
- If \((A^\mathrm{T})^\mathrm{T} = A\), then \(A\) is symmetric.
Exercise 5.10 (a) Find \(A\) and \(B\) if they simultaneously satisfy \[ 2A + B = \begin{bmatrix} 1 & 2 & 1\\4 & 3 & 0 \end{bmatrix} \quad \text{and} \quad A + 2B = \begin{bmatrix} 4 & 2 & 3\\5 & 1 & 1 \end{bmatrix}\,. \] (b) If \(A+B=C\) and \(3A - 2B = 0\) simultaneously, find \(A\) and \(B\) in terms of \(C\).
5.2 Matrix Multiplication
Let \(A\) be \(m \times n\) and \(B\) be \(n \times p\) — here we require the number of columns in \(A\) and the number of rows in \(B\) to be the same. Then the product \(AB\) is defined as the \(m \times p\) matrix whose \((i,j)\)th element is \[ (AB)_{ij} = \sum_{k=1}^n a_{ik}b_{kj}\;. \] That is, the \((i,j)\)th element of the product \(AB\) is defined as the sum of the products of the elements of the \(i\)th row of \(A\) with the corresponding elements in the \(j\)th column of \(B\). Put another way, the \((i,j)\)th element of the product \(AB\) is the dot or inner product of the \(i\)th row of \(A\) with the \(j\)th column of \(B\). For example, the \((1,1)\)th element of \(AB\) is \[ (AB)_{11} = \sum_{k=1}^n a_{1k}b_{k1} = a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} + \cdots + a_{1n}b_{n1}\;. \] The \((2,3)\)th element of \(AB\) is \[ (AB)_{23} = \sum_{k=1}^n a_{2k}b_{k3} = a_{21}b_{13} + a_{22}b_{23} + a_{23}b_{33} + \cdots + a_{2n}b_{n3}\,. \] Visually, for a product of a \(3 \times 3\) matrix and a \(3 \times 2\) matrix, we have
\[ \begin{aligned} \begin{bmatrix} \boxed{\begin{matrix} a_{11} & a_{12} & a_{13} \end{matrix}} \\ \begin{matrix} a_{21} & a_{22} & a_{23} \end{matrix} \\ \begin{matrix} a_{31} & a_{32} & a_{33} \end{matrix} \end{bmatrix} \begin{bmatrix} \boxed{\begin{matrix} b_{11} \\ b_{21} \\ b_{31} \end{matrix}} & \begin{matrix} b_{12} \\ b_{22} \\ b_{32} \end{matrix} \end{bmatrix} &= \begin{bmatrix} \boxed{a_{11}b_{11}+a_{12}b_{21}+a_{13}b_{31}} & \bullet \\ \bullet & \bullet \\ \bullet & \bullet \end{bmatrix} \\[1.5ex] \begin{bmatrix} \boxed{\begin{matrix} a_{11} & a_{12} & a_{13} \end{matrix}} \\ \begin{matrix} a_{21} & a_{22} & a_{23} \end{matrix} \\ \begin{matrix} a_{31} & a_{32} & a_{33} \end{matrix} \end{bmatrix} \begin{bmatrix} \begin{matrix} b_{11} \\ b_{21} \\ b_{31} \end{matrix} & \boxed{\begin{matrix} b_{12} \\ b_{22} \\ b_{32} \end{matrix}} \end{bmatrix} &= \begin{bmatrix} \sum_{k=1}^{3} a_{1k}b_{k1} & \boxed{a_{11}b_{12}+a_{12}b_{22}+a_{13}b_{32}} \\ \bullet & \bullet \\ \bullet & \bullet \end{bmatrix} \\[1.5ex] \begin{bmatrix} \begin{matrix} a_{11} & a_{12} & a_{13} \end{matrix} \\ \boxed{\begin{matrix} a_{21} & a_{22} & a_{23} \end{matrix}} \\ \begin{matrix} a_{31} & a_{32} & a_{33} \end{matrix} \end{bmatrix} \begin{bmatrix} \boxed{\begin{matrix} b_{11} \\ b_{21} \\ b_{31} \end{matrix}} & \begin{matrix} b_{12} \\ b_{22} \\ b_{32} \end{matrix} \end{bmatrix} &= \begin{bmatrix} \sum_{k=1}^{3} a_{1k}b_{k1} & \sum_{k=1}^{3} a_{1k}b_{k2} \\ \boxed{a_{21}b_{11}+a_{22}b_{21}+a_{23}b_{31}} & \bullet \\ \bullet & \bullet \end{bmatrix} \end{aligned} \] and so on.
Example 5.7 Let \(A = \begin{bmatrix} 2 & 8 \\ 3 & 0 \\ 5 & 1 \end{bmatrix}\) and \(B = \begin{bmatrix} 4 & 7 \\ 6 & 9 \end{bmatrix}\). Then \[ AB = \begin{bmatrix} 2 & 8 \\ 3 & 0 \\ 5 & 1 \end{bmatrix}\begin{bmatrix} 4 & 7 \\ 6 & 9 \end{bmatrix} = \begin{bmatrix} 2 \cdot 4+ 8 \cdot 6 & 2 \cdot 7 + 8 \cdot 9 \\ 3 \cdot 4 + 0 \cdot 6 & 3 \cdot 7 + 0 \cdot 9 \\ 5 \cdot 4 + 1 \cdot 6 & 5 \cdot 7 + 1 \cdot 9 \end{bmatrix} = \begin{bmatrix} 56 & 86 \\ 12 & 21 \\ 26 & 44 \end{bmatrix}. \]
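In NumPy the `@` operator performs matrix multiplication, so Example 5.7 can be checked in a few lines (an illustrative sketch; the matrices are those of the example).

```python
import numpy as np

A = np.array([[2, 8],
              [3, 0],
              [5, 1]])
B = np.array([[4, 7],
              [6, 9]])

print(A @ B)               # the (3x2)(2x2) product, a 3x2 matrix
# [[56 86]
#  [12 21]
#  [26 44]]

# The (1,1)th element as the inner product of row 1 of A with column 1 of B
print(A[0, :] @ B[:, 0])   # 2*4 + 8*6 = 56
```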
Example 5.8 The system of equations \[ \begin{aligned} 2x_1 - \phantom{2}x_2 &= 4 \\ \phantom{2} x_1 + 2x_2 &= 2 \end{aligned} \] can be written in matrix form as \[ \underbrace{\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}}_{x} = \underbrace{\begin{bmatrix} 4 \\ 2 \end{bmatrix}}_{b}, \, \text{ or } \, Ax = b \,. \]
5.2.1 Exercises
These exercises illustrate crucial aspects of matrix multiplication. You should work through the exercises before proceeding to the next section.
Exercise 5.11 Let \(A=\begin{bmatrix} 2 & 8 \\ 3 & 0 \\ 5 & 1\end{bmatrix}\), \(B=\begin{bmatrix} 2 & 0 \\ 3 & 8 \end{bmatrix}\) and \(C=\begin{bmatrix} 7 & 2 \\ 6 & 3 \end{bmatrix}\).
(a) Compute the products \(BC\), \(CB\), and \(AB\). (b) Can \(BA\) even be computed?
Remark: This exercise shows that for any two matrices \(A\) and \(B\), \(AB \neq BA\) in general. That is, we have to distinguish between pre-multiplication and post-multiplication. In the product \(AB\), we say that \(B\) is pre-multiplied by \(A\), or that \(A\) is post-multiplied by \(B\).
Exercise 5.12 Show that \(x^\mathrm{T}x \geq 0\) for any vector \(x = \begin{bmatrix} x_1 & x_2 & \dots & x_n\end{bmatrix}^\mathrm{T}\). When will \(x^\mathrm{T}x = 0\)?
Remark: For any column vector \(x\), the product \(x^\mathrm{T}x\) is the sum of the squares of its elements. We call it the dot or inner product of the column vector \(x\) with itself.
Exercise 5.13
(a) Compute \(\begin{bmatrix} 2 & 4 \\ 1 & 2 \end{bmatrix}\begin{bmatrix} -2 & 4 \\ 1 & -2 \end{bmatrix}\). (b) Compute \(A^{2} = AA\) where \(A = \begin{bmatrix} 1 & b \\ -\frac{1}{b} & -1 \end{bmatrix}\), \(b \neq 0\).
Remark: This exercise shows that you can multiply two non-zero matrices and end up with a zero matrix. Therefore \(AB = 0\) does not imply \(A=0\) or \(B=0\). It is even possible for the square of a non-zero matrix to be a zero matrix. Of course, if \(A=0\) or \(B=0\), then \(AB=0\).
As you can see, in many ways matrix multiplication does not behave like the usual multiplication of numbers. For instance, the order of multiplication matters, and \(AB=0\) does not imply \(A=0\) or \(B=0\). But in some ways matrix multiplication does behave like regular multiplication of numbers, as the next exercise shows.
Exercise 5.14 Prove that
(a) \((AB)C = A(BC)\) where \(A\), \(B\), and \(C\) are \(m \times n\), \(n \times p\) and \(p \times q\) respectively.
(b) \(A(B+C) = AB + AC\) where \(A\) is \(m \times n\), and \(B\) and \(C\) are \(n \times p\).
(c) \((A+B)C = AC + BC\) where \(A\) and \(B\) are \(m \times n\) and \(C\) is \(n \times p\).
Exercise 5.15 Let \(A\) be an \(m \times n\) matrix, and let \(I_n\) and \(I_m\) be identity matrices of dimensions \(n\times n\) and \(m\times m\) respectively. Show that \(I_m A = A I_n = A\).
Exercise 5.16 Show that \[ \begin{bmatrix}a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \\ \end{bmatrix} \begin{bmatrix}b_{1} \\ b_{2} \\ b_{3}\end{bmatrix} = b_1\begin{bmatrix}a_{11} \\ a_{21} \\ a_{31} \\ a_{41}\end{bmatrix} + b_2\begin{bmatrix}a_{12} \\ a_{22} \\ a_{32} \\ a_{42}\end{bmatrix} + b_3\begin{bmatrix}a_{13} \\ a_{23} \\ a_{33} \\ a_{43}\end{bmatrix} \] i.e., \(Ab\) is a linear combination of the columns of \(A\), with weights given in \(b\).
Exercise 5.17 (a) Show that \((AB)^\mathrm{T} = B^\mathrm{T}A^\mathrm{T}\) for any \(m \times n\) matrix \(A\) and any \(n \times p\) matrix \(B\). Verify this equality for the matrices \[ A = \begin{bmatrix}a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \end{bmatrix} \; \text{ and } \; B = \begin{bmatrix}b_1 & b_2 & b_3 \\ b_4 & b_5 & b_6 \\ b_7 & b_8 & b_9 \end{bmatrix}. \]
(b) Prove that \((ABC)^\mathrm{T} = C^\mathrm{T}B^\mathrm{T}A^\mathrm{T}\).
Exercise 5.18 Explain why \(X^\mathrm{T}X\) is square and symmetric for any general \(n \times k\) matrix \(X\).
Remark: The matrix \(X^\mathrm{T}X\) is encountered frequently in all statistical disciplines.
Exercise 5.19 The trace of an \(n\times n\) matrix \(A = (a_{ij})_{n \times n}\) is defined to be \[ \mathrm{tr}(A) = \sum_{i=1}^n a_{ii}. \] That is, the trace of a square matrix is simply the sum of its diagonal elements. The trace of a scalar is the scalar itself.
(a) If \(A\) and \(B\) are square matrices of the same dimensions, show that \(\mathrm{tr}(A+B) = \mathrm{tr}(A)+\mathrm{tr}(B)\).
(b) If \(A\) is a square matrix, show that \(\mathrm{tr}(A^\mathrm{T})=\mathrm{tr}(A)\).
(c) If \(A\) is \(m \times n\) and \(B\) is \(n \times m\), show that \(\mathrm{tr}(AB) = \mathrm{tr}(BA)\).
(d) If \(x\) is an \(n \times 1\) column vector, show that \(x^\mathrm{T}x = \mathrm{tr}(xx^\mathrm{T})\) by
- direct multiplication,
- using (c) and the fact that the trace of a scalar is the scalar itself.
The trace operation is surprisingly useful in proofs and for deriving and simplifying matrix equations.
Exercise 5.20 Let \(i_n\) be an \(n \times 1\) vector of ones, i.e., \(i_n = \begin{bmatrix}1 & 1 & \cdots & 1 \end{bmatrix}^\mathrm{T}\).
Show that the formula for the sample mean of the elements of the column vector \(y=\begin{bmatrix}y_1 & y_2 & \cdots & y_n \end{bmatrix}^\mathrm{T}\) can be written as \(\overline{y} = (i_n^\mathrm{T}i_n)^{-1} i_n^\mathrm{T}y\).
Show that \(M_{0} = I_{n} - i_n(i_n^\mathrm{T}i_n)^{-1} i_n^\mathrm{T}\) is symmetric, and that \(M_{0}M_{0} = M_{0}\).
Show that the sample variance of the data in \(y\) can be written as \[ \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \overline{y})^2 = \frac{y^\mathrm{T} M_{0} y}{n-1}\,. \]
Exercise 5.21 Prove that \(A(\alpha B) = (\alpha A)B = \alpha(AB)\).
5.3 Partitioned Matrices
We can partition the contents of an \(m \times n\) matrix into blocks of submatrices. For instance, we can write \[ A= \left[\begin{array}{@{}cccc@{}} 1 & 3 & 2 & 6 \\ 2 & 8 & 2 & 1 \\ 3 & 1 & 2 & 4 \\ 4 & 2 & 1 & 3 \\ 3 & 1 & 1 & 7 \end{array}\right] = \left[\begin{array}{@{}c|ccc@{}} 1 & 3 & 2 & 6 \\ 2 & 8 & 2 & 1 \\ \hline 3 & 1 & 2 & 4 \\ 4 & 2 & 1 & 3 \\ 3 & 1 & 1 & 7 \end{array}\right] = \left[\begin{array}{@{}cc@{}} A_{11} & A_{12} \\ A_{21} & A_{22} \end{array}\right] \] where \[ A_{11} = \begin{bmatrix}1\\2\end{bmatrix}, \, A_{21} = \begin{bmatrix}3\\4\\3\end{bmatrix},\, A_{12}=\begin{bmatrix}3&2&6\\8&2&1\end{bmatrix} \,\text{ and }\, A_{22} = \begin{bmatrix}1&2&4\\2&1&3\\1&1&7\end{bmatrix}. \] Partitioned matrices are often called block matrices. Of course, there are many ways of partitioning any given matrix. The following is another partition of the matrix \(A\): \[ A= \left[\begin{array}{@{}cccc@{}} 1 & 3 & 2 & 6 \\ 2 & 8 & 2 & 1 \\ 3 & 1 & 2 & 4 \\ 4 & 2 & 1 & 3 \\ 3 & 1 & 1 & 7 \end{array}\right] = \left[\begin{array}{@{}cc|cc@{}} 1 & 3 & 2 & 6 \\ 2 & 8 & 2 & 1 \\ 3 & 1 & 2 & 4 \\ \hline 4 & 2 & 1 & 3 \\ 3 & 1 & 1 & 7 \end{array}\right]. \] It can be shown that addition and multiplication of partitioned matrices can be carried out as though the blocks are elements, as long as the matrices are partitioned conformably.
Addition of Partitioned Matrices. Consider two \(m \times n\) matrices \(A\) and \(B\) partitioned in the following manner: \[ A = \begin{bmatrix} \underbrace{A_{11}}_{m_1 \times n_1} & \underbrace{A_{12}}_{m_1 \times n_2} \\[2ex] \underbrace{A_{21}}_{m_2 \times n_1} & \underbrace{A_{22}}_{m_2 \times n_2} \end{bmatrix} \,\, \text{ and } \,\, B = \begin{bmatrix} \underbrace{B_{11}}_{m_1 \times n_1} & \underbrace{B_{12}}_{m_1 \times n_2} \\[2ex] \underbrace{B_{21}}_{m_2 \times n_1} & \underbrace{B_{22}}_{m_2 \times n_2} \end{bmatrix} \] where \(n_1 + n_2 = n\) and \(m_1 + m_2 = m\). We emphasize that \(A\) and \(B\) must be of the same size and partitioned identically. Then \[ A + B = \begin{bmatrix} \underbrace{A_{11}+B_{11}}_{m_1 \times n_1} & \underbrace{A_{12}+B_{12}}_{m_1 \times n_2} \\[2ex] \underbrace{A_{21}+B_{21}}_{m_2 \times n_1} & \underbrace{A_{22}+B_{22}}_{m_2 \times n_2} \end{bmatrix}. \tag{5.1}\]
Multiplication of Partitioned Matrices. Now consider two matrices \(A\) and \(B\), with dimensions \(m \times p\) and \(p \times n\) respectively, partitioned as follows: \[ A = \begin{bmatrix} \underbrace{A_{11}}_{m_1 \times p_1} & \underbrace{A_{12}}_{m_1 \times p_2} \\[2ex] \underbrace{A_{21}}_{m_2 \times p_1} & \underbrace{A_{22}}_{m_2 \times p_2} \end{bmatrix} \,\, \text{ and } \,\, B = \begin{bmatrix} \underbrace{B_{11}}_{p_1 \times n_1} & \underbrace{B_{12}}_{p_1 \times n_2} \\[2ex] \underbrace{B_{21}}_{p_2 \times n_1} & \underbrace{B_{22}}_{p_2 \times n_2} \end{bmatrix}\,. \] In particular, the partition is such that the column-wise partition of \(A\) matches the row-wise partition of \(B\). Then \[ \begin{aligned} AB &= \begin{bmatrix} \underbrace{A_{11}}_{m_1 \times p_1} & \underbrace{A_{12}}_{m_1 \times p_2} \\[2ex] \underbrace{A_{21}}_{m_2 \times p_1} & \underbrace{A_{22}}_{m_2 \times p_2} \end{bmatrix} \begin{bmatrix} \underbrace{B_{11}}_{p_1 \times n_1} & \underbrace{B_{12}}_{p_1 \times n_2} \\[2ex] \underbrace{B_{21}}_{p_2 \times n_1} & \underbrace{B_{22}}_{p_2 \times n_2} \end{bmatrix} = \begin{bmatrix} \underbrace{A_{11}B_{11}+A_{12}B_{21}}_{m_1 \times n_1} & \underbrace{A_{11}B_{12}+A_{12}B_{22}}_{m_1 \times n_2} \\[2ex] \underbrace{A_{21}B_{11}+A_{22}B_{21}}_{m_2 \times n_1} & \underbrace{A_{21}B_{12}+A_{22}B_{22}}_{m_2 \times n_2} \end{bmatrix}\,. \end{aligned} \tag{5.2}\] Transposition of Partitioned Matrices. It is straightforward to show that \[ A = \begin{bmatrix} \underbrace{A_{11}}_{m_1 \times n_1} & \underbrace{A_{12}}_{m_1 \times n_2} \\[1ex] \underbrace{A_{21}}_{m_2 \times n_1} & \underbrace{A_{22}}_{m_2 \times n_2} \end{bmatrix} \hspace{0.5cm}\Rightarrow\hspace{0.5cm} A^\mathrm{T} = \begin{bmatrix} \underbrace{A_{11}^\mathrm{T}}_{n_1 \times m_1} & \underbrace{A_{21}^\mathrm{T}}_{n_1 \times m_2} \\[1ex] \underbrace{A_{12}^\mathrm{T}}_{n_2 \times m_1} & \underbrace{A_{22}^\mathrm{T}}_{n_2 \times m_2} \end{bmatrix}. \tag{5.3}\]
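As a numerical illustration (a sketch assuming NumPy, with randomly generated blocks), `np.block` assembles a matrix from conformable blocks, and formulas (5.2) and (5.3) can be verified directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Conformable partitions: A is (m1+m2) x (p1+p2), B is (p1+p2) x (n1+n2)
m1, m2, p1, p2, n1, n2 = 2, 3, 2, 2, 1, 3
A11, A12 = rng.normal(size=(m1, p1)), rng.normal(size=(m1, p2))
A21, A22 = rng.normal(size=(m2, p1)), rng.normal(size=(m2, p2))
B11, B12 = rng.normal(size=(p1, n1)), rng.normal(size=(p1, n2))
B21, B22 = rng.normal(size=(p2, n1)), rng.normal(size=(p2, n2))

A = np.block([[A11, A12], [A21, A22]])
B = np.block([[B11, B12], [B21, B22]])

# Block-wise product, formula (5.2)
AB_blocks = np.block([[A11 @ B11 + A12 @ B21, A11 @ B12 + A12 @ B22],
                      [A21 @ B11 + A22 @ B21, A21 @ B12 + A22 @ B22]])
print(np.allclose(A @ B, AB_blocks))    # True

# Block-wise transpose, formula (5.3)
AT_blocks = np.block([[A11.T, A21.T], [A12.T, A22.T]])
print(np.allclose(A.T, AT_blocks))      # True
```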
Remark on Matrix Multiplication: So far we have spoken of inner products of vectors, scalar multiplication (multiplication of matrices and vectors by a scalar), and regular matrix multiplication. There are yet other kinds of matrix products. For instance, the Hadamard product, denoted \(\circ\) or \(\odot\), refers to element-wise multiplication, e.g., \[ \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6\end{bmatrix} \odot \begin{bmatrix} 2 & 3\\ 4 & 5 \\ 6 & 7\end{bmatrix} = \begin{bmatrix} 1 \cdot 2 & 2 \cdot 3\\ 3 \cdot 4 & 4 \cdot5 \\ 5 \cdot 6 & 6 \cdot 7\end{bmatrix} = \begin{bmatrix} 2 & 6\\ 12 & 20 \\ 30 & 42\end{bmatrix}. \] The Kronecker product, denoted \(\otimes\), of an \(m \times n\) matrix \(A\) with a \(p \times q\) matrix \(B\) is the \(mp \times nq\) block matrix formed by multiplying each element of \(A\) by the entire \(B\) matrix. For example \[ \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}\end{bmatrix} \otimes \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \left[\begin{array}{@{}cc|cc|cc@{}} a_{11} & 0 & a_{12} & 0 & a_{13} & 0 \\ 0 & a_{11} & 0 & a_{12} & 0 & a_{13} \\[0.5ex] \hline a_{21} & 0 & a_{22} & 0 & a_{23} & 0 \\ 0 & a_{21} & 0 & a_{22} & 0 & a_{23} \end{array} \right]. \]
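Both products are also available in NumPy (an illustrative sketch): `*` acts element-wise on arrays of the same shape, which is exactly the Hadamard product, and `np.kron` computes the Kronecker product.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4],
              [5, 6]])
B = np.array([[2, 3],
              [4, 5],
              [6, 7]])

print(A * B)                   # Hadamard (element-wise) product, as in the example above
print(np.kron(A, np.eye(2)))   # Kronecker product of A with the 2x2 identity
```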
5.3.1 Exercises
Exercise 5.22 Let \[ A = \left[\begin{array}{@{}c|ccc@{}} 1 & 3 & 2 & 6 \\ 2 & 8 & 2 & 1 \\ \hline 3 & 1 & 2 & 4 \\ 4 & 2 & 1 & 3 \\ 3 & 1 & 1 & 7 \end{array}\right] \quad\text{ and }\quad B = \left[\begin{array}{@{}c|ccc@{}} 2 & 0 & 1 \\ \hline 3 & 1 & 3 \\ 1 & 5 & 4 \\ 4 & 1 & 1 \end{array}\right]. \] Verify the partitioned matrix multiplication formulas by computing \(AB\) in the usual way, then compute \(AB\) using (5.2). Verify the transposition formula (5.3) for matrix \(A\).
Exercise 5.23 Let \(A\) be a \(m \times n\) matrix and \(b\) be a \(n \times 1\) vector. We have shown earlier that \(Ab\) is a linear combination of the columns of \(A\). In terms of partitioned matrices, we have \[ \begin{aligned} Ab &= \left[\begin{array}{@{}c|c|c|c@{}} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \\ \end{array}\right] \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n} \end{bmatrix} = \begin{bmatrix} A_{*1} & A_{*2} & \cdots & A_{*n} \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_{n} \end{bmatrix} = A_{*1} b_1 + A_{*2} b_2 + \dots + A_{*n} b_n \end{aligned} \] Let \(c = \begin{bmatrix} c_1 & c_2 & \dots & c_m \end{bmatrix}^\mathrm{T}\). Show that \(c^\mathrm{T}A\) is a linear combination of the rows of \(A\).
Exercise 5.24 Let \(X\) be a \(n \times 3\) data matrix containing \(n\) observations of three variables: \[ X = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ \vdots & \vdots & \vdots \\ x_{n1} & x_{n2} & x_{n3} \end{bmatrix} \] where \(x_{ij}\) represents the \(i\)th observation of variable \(j\). We can partition this matrix to emphasize the variables by writing \(X\) as \(X = \begin{bmatrix} X_{*1} & X_{*2} & X_{*3} \end{bmatrix}\) where \[ X_{*1} = \begin{bmatrix}x_{11} \\ x_{21} \\ x_{31} \\ \vdots \\ x_{n1} \end{bmatrix},\; X_{*2} = \begin{bmatrix}x_{12} \\ x_{22} \\ x_{32} \\ \vdots \\ x_{n2} \end{bmatrix} \, \text{ and } \; X_{*3} = \begin{bmatrix}x_{13} \\ x_{23} \\ x_{33} \\ \vdots \\ x_{n3} \end{bmatrix}. \] Alternatively, we can partition the data matrix to emphasize the observations: \[ X = \begin{bmatrix} X_{1*} \\ X_{2*} \\ X_{3*} \\ \vdots \\ X_{n*} \end{bmatrix} \] where \(X_{i*} = \begin{bmatrix} x_{i1} & x_{i2} & x_{i3} \end{bmatrix}\) is the row vector containing the \(i\)th observations of all three variables, \(i=1,2,...,n\). Show that the matrix \(X^\mathrm{T}X\) can be written as \[ \begin{aligned} X^\mathrm{T}X = \begin{bmatrix} X_{*1}^\mathrm{T}X_{*1} & X_{*1}^\mathrm{T}X_{*2} & X_{*1}^\mathrm{T}X_{*3} \\[1ex] X_{*2}^\mathrm{T}X_{*1} & X_{*2}^\mathrm{T}X_{*2} & X_{*2}^\mathrm{T}X_{*3} \\[1ex] X_{*3}^\mathrm{T}X_{*1} & X_{*3}^\mathrm{T}X_{*2} & X_{*3}^\mathrm{T}X_{*3} \end{bmatrix} = \sum_{i=1}^n X_{i*}^{\mathrm{T}} X_{i*}^{\phantom{T}} = \begin{bmatrix} \sum_{i=1}^n x_{i1}^2 & \sum_{i=1}^n x_{i1}x_{i2} & \sum_{i=1}^n x_{i1}x_{i3} \\[0.5ex] \sum_{i=1}^n x_{i1}x_{i2} & \sum_{i=1}^n x_{i2}^2 & \sum_{i=1}^n x_{i2}x_{i3} \\[0.5ex] \sum_{i=1}^n x_{i1}x_{i3} & \sum_{i=1}^n x_{i2}x_{i3} & \sum_{i=1}^n x_{i3}^2 \end{bmatrix} \end{aligned} \]
5.4 Introduction to Inverses and Determinants
5.4.1 The Inverse Matrix
The \(n \times m\) matrix \(B\) is said to be a left-inverse of an \(m \times n\) matrix \(A\) if \(BA = I_n\). The \(n \times m\) matrix \(C\) is a right-inverse of \(A\) if \(AC = I_m\). If \(A\) is \(n \times n\), and \(BA = AC = I_n\), then it must be the case that \(B = C\) since \[ BA = I_n \Rightarrow BAC = I_nC \Rightarrow BI_n = C \Rightarrow B = C. \] In this case, we call \(B=C\) the two-sided inverse, or simply the **inverse**, of \(A\), and give it the special notation \(A^{-1}\). That is, the inverse of an \(n \times n\) matrix \(A\), if it exists, is the unique matrix \(A^{-1}\) such that \[ A^{-1}A = I_n = AA^{-1}\,. \] We could leave out the second equality from the definition since, for square matrices, \(A^{-1}A = I\) implies \(AA^{-1} = I\).
Example 5.9 The inverse of the matrix \[ A = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} \quad \text{is} \quad A^{-1} = -\frac{1}{2}\begin{bmatrix} 4 & -3 \\ -2 & 1 \end{bmatrix}\,. \] This can be verified by direct multiplication: \[ A^{-1}A = -\frac{1}{2}\begin{bmatrix} 4 & -3 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\,. \] We do not have to show \(AA^{-1}=I_2\), since it is implied. You may wish to do so nonetheless, as an exercise.
Example 5.10 Let \(A\) and \(B\) be the matrices \[ A = \begin{bmatrix} 1 & 1 \\ 2 & 1 \\ 4 & 2 \end{bmatrix} \;\text{ and }\; B = \begin{bmatrix} -1 & 0.2 & 0.4 \\ 2 & -0.2 & -0.4 \\ \end{bmatrix}. \] You can easily verify (by direct multiplication) that \[ BA = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \;\text{ but }\; AB = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0.2 & 0.4 \\ 0 & 0.4 & 0.8\end{bmatrix}. \] The matrix \(B\) is a left-inverse of \(A\). We give left-inverses the special notation \(A_{left}^{-1}\). Likewise, right-inverses are given the special notation \(A_{right}^{-1}\). We will say more about left- and right-inverses in a later chapter. For this chapter we will focus on (two-sided) inverses. The term “inverse” will always mean a two-sided inverse.
We emphasize that \(A\) has a (two-sided) inverse only if it is square. Furthermore, not all square matrices have an inverse. The inverse of an arbitrary \(2 \times 2\) matrix \(A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}\), if it exists, is \[ A^{-1} = \frac{1}{\text{det}(A)}\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix} \;\;\text{where}\; \text{det}(A) = a_{11}a_{22}-a_{12}a_{21}. \tag{5.4}\] You can easily verify this by direct multiplication. It is worth your while to commit formula (5.4) to memory.
The expression \(\det(A)\) in (5.4) is called the determinant of the \(2 \times 2\) matrix \(A\). Notice that the inverse exists only if \(\text{det}(A) \neq 0\). If the inverse of \(A\) does not exist, we say that \(A\) is singular. If the inverse exists, we say that \(A\) is non-singular. An alternative notation for \(\text{det}(A)\) is \(|A|\). We will use both notations in these notes. In particular, we use the latter when indicating the determinant of a matrix written out in full. For instance, the determinant of the matrix \((a_{ij})_{2 \times 2}\) is \[ \text{det}(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}\,. \]
Example 5.11 The inverse of the matrix \(A = \begin{bmatrix} 1 & 4 \\ 5 & 6 \end{bmatrix}\) is \[ A^{-1} = \frac{1}{\det(A)} \begin{bmatrix} 6 & -4 \\ -5 & 1 \end{bmatrix} = -\frac{1}{14} \begin{bmatrix} 6 & -4 \\ -5 & 1 \end{bmatrix} =\begin{bmatrix} -\frac{3}{7} & \frac{2}{7} \\[0.5ex] \frac{5}{14} & -\frac{1}{14} \end{bmatrix}. \]
Example 5.12 The determinant of the matrix \(A = \begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}\) is \(\det(A) = 1 \cdot 6 - 2 \cdot 3 = 0\), so \(A\) does not have an inverse.
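The last two examples are easy to reproduce numerically. The following sketch (Python/NumPy assumed; the helper name `inv2x2` is ours) implements formula (5.4) and compares it with a library inverse:

```python
import numpy as np

def inv2x2(A):
    """Inverse of a 2x2 matrix via formula (5.4); fails if A is singular."""
    det = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    if det == 0:
        raise ValueError("matrix is singular: det(A) = 0")
    return (1.0 / det) * np.array([[ A[1, 1], -A[0, 1]],
                                   [-A[1, 0],  A[0, 0]]])

A = np.array([[1.0, 4.0],
              [5.0, 6.0]])                       # the matrix of Example 5.11
print(inv2x2(A))
print(np.allclose(inv2x2(A), np.linalg.inv(A)))  # True

B = np.array([[1.0, 3.0],
              [2.0, 6.0]])                       # the singular matrix of Example 5.12
print(np.linalg.det(B))                          # 0 (up to rounding)
```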
When will \(\text{det}(A)=0\)? Examining the expression for \(\text{det}(A)\) in (5.4), we see that the determinant is zero if either row or either column consists entirely of zeros, or if one row is a multiple of the other, or if one column is a multiple of the other.
The inverse of a scalar is obviously just its reciprocal. The following example shows the inverse of a particular \(3 \times 3\) matrix.
Example 5.13 The inverse of \(A = \begin{bmatrix} 0 & 2 & 4 \\ 3 & 1 & 2 \\ 6 & 2 & 1 \end{bmatrix}\) is \(A^{-1} = \begin{bmatrix} -\frac{1}{6} & \frac{1}{3} & 0 \\[0.5ex] \frac{1}{2} & -\frac{4}{3} & \frac{2}{3} \\[0.5ex] 0 & \frac{2}{3} & -\frac{1}{3} \end{bmatrix}\). This can be seen by direct multiplication: \[ \begin{bmatrix} -\tfrac{1}{6} & \frac{1}{3} & 0 \\[0.5ex] \tfrac{1}{2} & -\tfrac{4}{3} & \tfrac{2}{3} \\[0.5ex] 0 & \tfrac{2}{3} & -\tfrac{1}{3} \end{bmatrix} \begin{bmatrix} 0 & 2 & 4 \\ 3 & 1 & 2 \\ 6 & 2 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
We’ll omit from these notes any discussion of how to find the inverse of a general \(n \times n\) square matrix; see Tay, Preve, and Baydur (2025) for details. Nonetheless, even without seeing the formula or algorithms for computing the inverse of a matrix, we are able to prove the following general statements. Suppose the \(n \times n\) matrices \(A\) and \(B\) are non-singular, i.e., their inverses exist. Then \[ \text{i.} \;\;(A^{-1})^\mathrm{T} = (A^\mathrm{T})^{-1}\,,\quad\, \text{ii.} \;\;(AB)^{-1} = B^{-1} A^{-1}\,. \] Proof: For i., start with \(A A^{-1} = I\). Transpose both sides to get \((A^{-1})^\mathrm{T}A^\mathrm{T} = I\). Finally post-multiply both sides by \((A^\mathrm{T})^{-1}\) to get \[ (A^{-1})^\mathrm{T}A^\mathrm{T}(A^\mathrm{T})^{-1} = I(A^\mathrm{T})^{-1} \;\Rightarrow\; (A^{-1})^\mathrm{T} = (A^\mathrm{T})^{-1}\,. \] For ii., pre-multiply \(AB\) first by \(A^{-1}\) and then by \(B^{-1}\). This gives \[ \begin{gathered} A^{-1}AB = B \\ B^{-1}A^{-1}AB = B^{-1}B = I. \end{gathered} \] This says that \(B^{-1}A^{-1}\) is the inverse of \(AB\) since multiplying the two gives the identity matrix.
One implication of the first result is that the inverse of a symmetric matrix is symmetric: if \(A\) is symmetric, then \(A^\mathrm{T} = A\), so we have
\[
(A^{-1})^\mathrm{T} = (A^\mathrm{T})^{-1} = A^{-1}
\] which says that \(A^{-1}\) is symmetric. For the second result, it is important to keep in mind that this result holds only if \(A\) and \(B\) are both square. It is possible for \(A\) to be \(n \times k\) and \(B\) to be \(k \times n\) such that the square matrix \(AB\) is non-singular. But since \(A\) and \(B\) are not square, they do not have inverses. In that case the statement \((AB)^{-1} = B^{-1}A^{-1}\) is obviously meaningless.
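Both results are easy to check numerically on randomly generated square matrices (a sketch assuming NumPy; a random square matrix is non-singular with probability one):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

# i. (A^{-1})^T = (A^T)^{-1}
print(np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T)))    # True

# ii. (AB)^{-1} = B^{-1} A^{-1}  (note the reversed order)
print(np.allclose(np.linalg.inv(A @ B),
                  np.linalg.inv(B) @ np.linalg.inv(A)))       # True
```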
5.4.2 Systems of Linear Equations
One application of matrix inverses is to find solutions to systems of linear equations. Consider a system of \(n\) equations in \(n\) unknowns \(x_{1}\), \(x_{2}\), \(\dots\), \(x_{n}\), \[ \begin{array}{ccccccccc} a_{11} x_{1} &+& a_{12} x_{2} &+& \dots &+& a_{1n} x_{n} &=& b_{1} \\ a_{21} x_{1} &+& a_{22} x_{2} &+& \dots &+& a_{2n} x_{n} &=& b_{2} \\ \vdots & & \vdots & & \vdots & & \vdots & & \vdots \\ a_{n1} x_{1} &+& a_{n2} x_{2} &+& \dots &+& a_{nn} x_{n} &=& b_{n} \\ \end{array} \tag{5.5}\] which can be written as \[ \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nn} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} = \begin{bmatrix} b_{1} \\ b_{2} \\ \vdots \\ b_{n} \end{bmatrix} \;\;\;\text{or} \;\;\; Ax = b\,. \] To be clear, we are speaking here of systems where there are as many equations as there are unknowns. If the inverse of \(A\) exists, then the system has a unique solution: \[ Ax = b \;\;\Rightarrow\;\; A^{-1}Ax = A^{-1}b \;\;\Rightarrow \;\;x = A^{-1}b\,. \]
Example 5.14 Consider the following systems of equations \[ \text{(i)}\quad \begin{aligned} 2x_1 - \phantom{2}x_2 &= 4 \\ \phantom{2}x_1 + 2x_2 &= 2 \end{aligned} \qquad \text{(ii)}\quad \begin{aligned} 2x_1 + \phantom{3}x_2 &= 4 \\ 6x_1 + 3x_2 &= 12 \end{aligned} \qquad \text{(iii)}\quad \begin{aligned} 2x_1 + \phantom{3}x_2 &= 4 \\ 6x_1 + 3x_2 &= 10 \end{aligned} \tag{5.6}\]
You can see that system (i) has a unique solution. System (ii) has infinitely many solutions (the graphs of the two equations coincide). System (iii) has no solution; the graphs of the two equations are parallel. The three systems can be written in the matrix form \(Ax=b\): \[ \text{(i)}\; \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 2 \end{bmatrix} \quad \text{(ii)}\; \begin{bmatrix} 2 & 1 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 12 \end{bmatrix} \quad \text{(iii)}\; \begin{bmatrix} 2 & 1 \\ 6 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 \\ 10 \end{bmatrix} \quad \] Since \[ \begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}^{-1} = \frac{1}{5}\begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} \] the unique solution for system (i) is \[ x = A^{-1}b = \frac{1}{5} \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 4 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}. \] For systems (ii) and (iii), we find that the coefficient matrix \(A\) does not have an inverse, since \[ \mathrm{det}\begin{bmatrix} 2 & 1 \\ 6 & 3 \end{bmatrix} = 2 \cdot 3 - 1 \cdot 6 = 0\,. \]
Notice that non-existence of the coefficient matrix inverse does not imply that there are no solutions. It could be that there are multiple solutions.
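In practice one rarely forms \(A^{-1}\) explicitly to solve a system; library routines solve \(Ax=b\) directly. A sketch (NumPy assumed) for the three systems of Example 5.14:

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [1.0,  2.0]])
b = np.array([4.0, 2.0])

# System (i): unique solution
print(np.linalg.solve(A, b))     # [2. 0.]
print(np.linalg.inv(A) @ b)      # same answer; solve() is preferred numerically

# Systems (ii) and (iii) share a singular coefficient matrix
C = np.array([[2.0, 1.0],
              [6.0, 3.0]])
print(np.linalg.det(C))          # 0 (up to rounding): no unique solution exists
```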
5.4.3 The Determinant and Cramer’s Rule
Consider now the general \(2 \times 2\) system \[ \begin{array}{ccccc} a_{11} x_{1} &+& a_{12} x_{2} &=& b_{1} \\ a_{21} x_{1} &+& a_{22} x_{2} &=& b_{2} \\ \end{array} \;\;\;\text{or}\;\;\; Ax = b \tag{5.7}\] Solving this system gives \[ x_{1} = \frac{a_{22}\,b_{1}-a_{12}\,b_{2}}{a_{11}a_{22}-a_{12}a_{21}} \;\;\;\text{and}\;\;\; x_{2} = \frac{a_{11}\,b_{2}-a_{21}\,b_{1}}{a_{11}a_{22}-a_{12}a_{21}}\,. \] Of course, this is the solution only if the (common) denominator in both expressions is not zero. The denominator is just the determinant of the matrix \(A\). Notice also that the numerators of the solutions for \(x_1\) and \(x_2\) are, respectively, the determinants of the matrices \[ A_1(b) = \begin{bmatrix} b_1 & a_{12} \\ b_{2} & a_{22} \end{bmatrix} \;\;\;\text{and}\;\;\; A_2(b) = \begin{bmatrix} a_{11} & b_{1} \\ a_{21} & b_{2} \end{bmatrix}\,. \] These are just the matrix \(A\) with one column replaced by \(b\). This is Cramer’s Rule for systems of two equations in two unknowns: for system (5.7), the solutions are \[ x_{1} = \frac{\text{det}(A_1(b))}{\text{det}(A)} = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_{2} & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}} \;\;\;\text{and}\;\;\; x_{2} = \frac{\text{det}(A_2(b))}{\text{det}(A)} = \frac{\begin{vmatrix} a_{11} & b_{1} \\ a_{21} & b_{2} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}\,. \] The idea extends to larger systems of equations with as many equations as unknowns. If you work out the solutions for the general three-equations three-unknowns system \[ \begin{array}{ccccccc} a_{11} x_{1} &+& a_{12} x_{2} &+& a_{13} x_{3} &=& b_{1} \\ a_{21} x_{1} &+& a_{22} x_{2} &+& a_{23} x_{3} &=& b_{2} \\ a_{31} x_{1} &+& a_{32} x_{2} &+& a_{33} x_{3} &=& b_{3} \end{array} \] you will find the solutions to be \[ \begin{aligned} x_{1} &= \frac {b_{1}a_{22}a_{33} + a_{12}a_{23}b_{3} + a_{13}b_{2}a_{32} - a_{13}a_{22}b_{3} - b_{1}a_{23}a_{32} - a_{12}b_{2}a_{33}} {a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}} \\[1ex] x_{2} &= \frac {a_{11}b_{2}a_{33} + b_{1}a_{23}a_{31} + a_{13}a_{21}b_{3} - a_{13}b_{2}a_{31} - a_{11}a_{23}b_{3} - b_{1}a_{21}a_{33}} {a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}} \\[1ex] x_{3} &= \frac {a_{11}a_{22}b_{3} + a_{12}b_{2}a_{31} + b_{1}a_{21}a_{32} - b_{1}a_{22}a_{31} - a_{11}b_{2}a_{32} - a_{12}a_{21}b_{3}} {a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33}} \\ \end{aligned} \]
You do not want to memorize this solution, at least not in this form. But notice two things: first, the denominator is the same for all three expressions. We define the expression in the denominator to be the determinant of the \(3 \times 3\) coefficient matrix \[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}. \] We must have \(\mathrm{det}(A) \neq 0\) in order for there to be a unique solution. Second, using this definition for the determinant, the numerators in the solutions for \(x_1\), \(x_2\) and \(x_3\) are, respectively, the determinants of the matrices \[ A_1(b) = \begin{bmatrix} b_{1} & a_{12} & a_{13} \\ b_{2} & a_{22} & a_{23} \\ b_{3} & a_{32} & a_{33} \end{bmatrix} ,\,\, A_2(b) = \begin{bmatrix} a_{11} & b_{1} & a_{13} \\ a_{21} & b_{2} & a_{23} \\ a_{31} & b_{3} & a_{33} \end{bmatrix} \,\,\text{and} \,\,\, A_3(b) = \begin{bmatrix} a_{11} & a_{12} & b_{1} \\ a_{21} & a_{22} & b_{2} \\ a_{31} & a_{32} & b_{3} \end{bmatrix}. \] This gives Cramer’s Rule for systems of three equations in three unknowns: \[ x_{1} = \dfrac{\text{det}(A_1(b))}{\text{det}(A)}\,,\; x_{2} = \dfrac{\text{det}(A_2(b))}{\text{det}(A)} \; \text{ and } \; x_{3} = \dfrac{\text{det}(A_3(b))}{\text{det}(A)}\,. \] The determinant for larger square matrices can be thought of in a similar way, as the (common) denominator in the solutions to the general \(n\)-equations in \(n\)-unknowns system \(Ax = b\). Furthermore, the solution to such a system is \[ x_{i} = \frac{\text{det}(A_i(b))}{\text{det}(A)},\,\,i=1,2,\dots,n\, \] where \(A_i(b)\) is the matrix \(A\) with its \(i\)th column replaced by \(b\). See Tay, Preve, and Baydur (2025) for details on how determinants can be computed.
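Cramer’s Rule is mainly of theoretical interest (solving via determinants is inefficient for large systems), but it is easy to code. A sketch, assuming NumPy; the function name `cramer` is ours:

```python
import numpy as np

def cramer(A, b):
    """Solve Ax = b by Cramer's Rule; assumes A is square with det(A) != 0."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                     # A_i(b): replace the i-th column of A with b
        x[i] = np.linalg.det(Ai) / d     # x_i = det(A_i(b)) / det(A)
    return x

A = np.array([[2.0, -1.0],
              [1.0,  2.0]])
b = np.array([4.0, 2.0])
print(cramer(A, b))                                        # [2. 0.]
print(np.allclose(cramer(A, b), np.linalg.solve(A, b)))    # True
```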
The following properties of determinants are useful:
If \(A\) has a row of zeros or a column of zeros, then \(\text{det}(A) = 0\).
If a single row or column of \(A\) is multiplied by some constant \(\alpha\), then its determinant is multiplied by \(\alpha\).
The determinant of a triangular matrix is the product of its diagonal elements.
\(\text{det}(A^{\mathrm{T}}) = \text{det}(A)\).
Every time we swap the rows of a matrix, its determinant changes sign. Same for columns.
Adding a multiple of one row to another row does not change the determinant. Same for columns.
If \(A\) and \(B\) are two square matrices, then \(\text{det}(AB) = \text{det}(A) \text{det}(B)\).
5.4.4 Exercises
Exercise 5.25 Find the inverse of the transpose of \(A = \begin{bmatrix} 0 & 2 & 4 \\ 3 & 1 & 2 \\ 6 & 2 & 1 \end{bmatrix}\). (Hint: see Example 5.13.)
Exercise 5.26 Show that the inverse of a diagonal matrix \(A = \text{diag}(a_{11}, \dots, a_{nn})\) is the diagonal matrix \[ A^{-1} = \text{diag}\left(\frac{1}{a_{11}}, \dots, \frac{1}{a_{nn}} \right). \]
Exercise 5.27 Suppose one row of a (square) matrix is a multiple of another row. Explain why this matrix has no inverse.
Exercise 5.28 Consider the following system of equations \[ \begin{array}{ccccccc} 4 x_{1} & & &+& x_{3} &=& 4 \\ 8 x_{1} &+& x_{2} &-& 3 x_{3} &=& 3 \\ 12 x_{1} &+& x_{2} & & &=& 1 \\ \end{array} \]
a. Express this system in the form \(Ax = b\) and solve it by finding \(A^{-1}\) and then computing \(A^{-1}b\).
b. Verify your solution in a. by solving the system using Cramer’s Rule.
Exercise 5.29 Suppose \(A\) is an \(m \times m\) matrix and \(b\) and \(c\) are \(m \times 1\) vectors. Does \(Ab = Ac\) imply that \(b=c\)? If not, give a counterexample.
5.5 Matrix Definiteness
An \(n \times n\) symmetric matrix \(A\) is said to be positive definite if \[ x^\mathrm{T} A x > 0\;\text{ for all } \; n\text{-vectors }\;x \neq 0_n\,. \tag{5.8}\] If the inequality in (5.8) is non-strict, then \(A\) is positive semidefinite. If the inequality in (5.8) is reversed, \(A\) is negative definite. If it is reversed and made non-strict, then \(A\) is called negative semidefinite. We emphasize that the conditions must hold for all non-zero vectors \(x\). Expressions of the form \(x^\mathrm{T}Ax\) where \(x\) is \(n \times 1\) and \(A\) is \(n \times n\) and symmetric are called quadratic forms.
Example 5.15 The matrix \(\begin{bmatrix} 2 & 1 \\ 1 & 2\end{bmatrix}\) is positive definite since \[ \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2\end{bmatrix} \begin{bmatrix} x_1 \\ x_2\end{bmatrix} = 2(x_1^2 + x_1x_2 + x_2^2) = 2[(x_1 + 0.5x_2)^2 + 0.75x_2^2]>0 \] as long as \(x_1\) and \(x_2\) are not both zero.
Example 5.16 The matrix \(\begin{bmatrix} 1 & 2 \\ 2 & 1\end{bmatrix}\) is indefinite (not definite) since \[ Q = \begin{bmatrix} x_1 & x_2\end{bmatrix} \begin{bmatrix} 1 & 2 \\ 2 & 1\end{bmatrix} \begin{bmatrix} x_1 \\ x_2\end{bmatrix} = x_1^2 + 4x_1x_2 + x_2^2\,. \] If \(x_1 = 1\) and \(x_2 = 1\), then \(Q > 0\). If \(x_1 = 1\) and \(x_2 = -1\), then \(Q < 0\).
We will see later that “variance-covariance matrices” are always at least positive semidefinite, and often positive definite (Section 5.7). The positive or negative definiteness of the “Hessian” of a multivariable function is also an indicator of whether a function is convex or concave, which in turn plays an important role in function optimization. Definiteness of matrices also plays an important role in matrix factorizations, dynamic systems, and many other areas where matrix algebra is used. One method for checking the definiteness of matrices uses the determinants of certain submatrices of the matrix, called principal minors. Another uses eigenvalues. See Tay, Preve, and Baydur (2025) for details. Often we are able to surmise the definiteness of a matrix from its structure, as in Exercise 5.30.
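As a numerical illustration of the eigenvalue method mentioned above (a sketch assuming NumPy): a symmetric matrix is positive definite exactly when all its eigenvalues are strictly positive.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])    # Example 5.15
B = np.array([[1.0, 2.0],
              [2.0, 1.0]])    # Example 5.16

print(np.linalg.eigvalsh(A))  # [1. 3.]  -> all positive: positive definite
print(np.linalg.eigvalsh(B))  # [-1. 3.] -> mixed signs: indefinite

# Direct evaluation of the quadratic form for B at the two points used in Example 5.16
for x in (np.array([1.0, 1.0]), np.array([1.0, -1.0])):
    print(x @ B @ x)          # 6.0 and -2.0
```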
5.5.1 Exercises
Exercise 5.30 Suppose \(X\) is \(n \times k\). Explain why the matrix \(X^\mathrm{T}X\) is positive semidefinite. Explain why it is positive definite if \(Xc \neq 0\) for all \(k\)-vectors \(c \neq 0_k\). (The next section explains the significance of the condition \(Xc \neq 0\) for all \(k\)-vectors \(c \neq 0_k\).) Hint: Consider the expression \(c^\mathrm{T}X^\mathrm{T}Xc\).
5.6 The Rank of a Matrix
A point \(x\) in \(\mathbb{R}^m\) can be thought of as a \(m\)-dimensional vector, or “\(m\)-vector”. If \(X=\{x_1, x_2, \dots, x_n\}\) is a set of \(n\) \(m\)-vectors, and if at least one of these vectors can be written as a linear combination of the others, i.e., if \[ x_i = c_1 x_1 + \dots + c_{i-1}x_{i-1} + c_{i+1} x_{i+1} + \dots + c_n x_n\,, \] then we say that the vectors are linearly dependent. Another way of saying this is that we can find \(c_1, c_2, \dots, c_n\), not all equal to zero, such that \[ c_1 x_1 + c_2 x_2 + \dots + c_n x_n = 0\,. \] If we cannot express any vector in \(X\) as a linear combination of the other vectors, then the vectors in \(X\) are linearly independent. In that case, the vectors in \(X\) will satisfy the condition \[ c_1 x_1 + c_2 x_2 + \dots + c_n x_n = 0 \quad \Rightarrow \quad c_1 = c_2 = \dots = c_n = 0\,. \] A vector space or subspace is a set of vectors such that linear combinations of vectors in the space always result in a vector in the space. Every vector space or subspace must contain the zero vector. The set of all linear combinations of the vectors in \(X\) is a vector subspace of \(\mathbb{R}^m\). The dimension of this subspace cannot exceed \(\min\{m,n\}\). Finally, recall that two vectors are orthogonal if their inner product is zero.
Consider an \(m \times n\) matrix \(A\), where possibly \(m \neq n\). We can view the columns of \(A\) as a collection of \(n\) \(m\)-vectors:
\[
\begin{aligned}
A \;
= \begin{bmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{bmatrix}
=
\left[\begin{array}{@{}c|c|c|c@{}}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn} \\
\end{array}\right].
\end{aligned}
\] Linear combinations of the column vectors of \(A\) can be written as \(Ax\) where \(x\) is some \(n\)-vector. If we consider the function \[
y = f(x) = Ax\,,\,\,x \in \mathbb{R}^n
\tag{5.9}\] mapping \(n\)-vectors into \(m\)-vectors, then the range of this function is the set of all linear combinations of the columns of \(A\), spanning a vector subspace of \(\mathbb{R}^m\) of dimension \(r\leq\min\{m,n\}\). We call this subspace the column space of \(A\) and refer to \(r\) as the column rank of \(A\).
Likewise, we can view the rows of \(A\) as a collection of \(m\) \(n\)-vectors, i.e., \[ A = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix} = \left[\begin{array}{@{}cccc@{}} a_{11} & a_{12} & \cdots & a_{1n} \\[0.5ex] \hline a_{21} & a_{22} & \cdots & a_{2n} \\[0.5ex] \hline \vdots & \vdots & \ddots & \vdots \\[0.5ex] \hline a_{m1} & a_{m2} & \cdots & a_{mn} \end{array}\right]. \] Linear combinations of the \(m\) row vectors can be written as \(y^{T}A\) or \(A^\mathrm{T}y\) where \(y\) is an \(m\)-vector. The range of the function \[ x = g(y) = A^\mathrm{T}y\,,\,\,y \in \mathbb{R}^m \tag{5.10}\] is the column space of \(A^\mathrm{T}\), which is also the row space of \(A\), since the columns of \(A^\mathrm{T}\) are the rows of \(A\). The dimension of the row space is called the row rank of \(A\).
It turns out that for any matrix \(A\), the row and column ranks of \(A\) are the same. Suppose the column rank of \(A\) is \(r\). This means we can find \(r\) linearly independent columns in \(A\). Gather these columns into a \(m \times r\) matrix \(C\). Since every column of \(A\) can be written as a linear combination of the \(r\) columns in \(C\), we can write \(A = CR\) where \(R\) is \(r \times n\), each column containing the necessary weights to generate the corresponding columns of \(A\) as a linear combination of the vectors in \(C\). However, the fact that \(A = CR\) also means that every row of \(A\) is a linear combination of the rows of \(R\), the necessary weights appearing in the corresponding rows of \(C\). Since \(R\) has \(r\) rows, the row rank of \(A\) also cannot exceed \(r\), i.e., \[ \text{row rank}(A) \leq r = \text{column rank}(A)\,. \] Applying a similar argument to \(A^\mathrm{T}\) shows that the row rank of \(A^\mathrm{T}\) must be less than or equal to the column rank of \(A^\mathrm{T}\). But since the rows of \(A^\mathrm{T}\) are the columns of \(A\), we have \[ \text{column rank}(A) \leq \text{row rank}(A)\,. \] It follows that \[ \text{column rank}(A) = \text{row rank}(A). \tag{5.11}\]
We can therefore speak unambiguously of the “rank” of a matrix \(A\), and simply write \(\text{rank}(A)\), where \(0 \leq \text{rank}(A) \leq \min\{m,n\}\). If \(\text{rank}(A) = \min\{m,n\}\), then we say that \(A\) has full rank. If this coincides with the number of columns \(n\), \(r = n \leq m\), we can also say that the matrix has full column rank. If the rank coincides with the number of rows, \(r = m \leq n\), we say that it has full row rank.
A square \(n \times n\) matrix \(A\) has an inverse if (and only if) it has full rank. The following are three further results regarding matrix rank:
- For any matrices \(A\) and \(B\) such that \(AB\) exists, we have \[ \text{rank}(AB) \leq \min\{\text{rank}(A), \text{rank}(B)\}. \]
This result holds because the columns of \(AB\) are linear combinations of the columns of \(A\), therefore \(\text{rank}(AB) \leq \text{rank}(A)\). Likewise, the rows of \(AB\) are linear combinations of the rows of \(B\), therefore \(\text{rank}(AB) \leq \text{rank}(B)\). It follows that \(\text{rank}(AB) \leq \min\{\text{rank}(A), \text{rank}(B)\}\).
- If \(A\) is a full rank \(m \times m\) matrix and \(B\) is \(m \times p\) of rank \(r\), then \(\text{rank}(AB) = r\).
- For any matrix \(A\), we have \[ \text{rank}(A^\mathrm{T}A) = \text{rank}(AA^\mathrm{T})=\text{rank}(A)\,. \] For a proof, see Tay, Preve, and Baydur (2025).
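NumPy’s `matrix_rank` makes these facts easy to check numerically (an illustrative sketch with an arbitrary rank-deficient matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # row 2 is twice row 1, so A is rank-deficient
              [1.0, 0.0, 1.0]])

print(np.linalg.matrix_rank(A))          # 2
print(np.linalg.matrix_rank(A.T @ A))    # 2: rank(A^T A) = rank(A)
print(np.linalg.matrix_rank(A @ A.T))    # 2: rank(A A^T) = rank(A)

B = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0]])               # 3 x 2 with full column rank
print(np.linalg.matrix_rank(A @ B)
      <= min(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B)))   # True
```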
5.7 Vectors and Matrices of Random Variables
Organizing large numbers of random variables using matrix algebra provides convenient formulas for manipulating their expectations, variances and covariances, and for expressing their joint pdf.
5.7.1 Expectations and Variance-Covariance Matrices
The expectation of a vector \(x\) of \(m\) random variables \(x = \begin{bmatrix} X_1 & X_2 & \dots & X_m \end{bmatrix}^\mathrm{T}\) is defined as the vector of their expectations, i.e., \[ E(x) = \begin{bmatrix} E(X_1) & E(X_2) & \dots & E(X_m) \end{bmatrix}^\mathrm{T}. \] Likewise, if \(X\) is a matrix of random variables, then
\[ X = \begin{bmatrix} X_{11} & X_{12} & \dots & X_{1n} \\ X_{21} & X_{22} & \dots & X_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ X_{m1} & X_{m2} & \dots & X_{mn} \\ \end{bmatrix} \;\Leftrightarrow \; E(X) = \begin{bmatrix} E(X_{11}) & E(X_{12}) & \dots & E(X_{1n}) \\[0.5ex] E(X_{21}) & E(X_{22}) & \dots & E(X_{2n}) \\[0.5ex] \vdots & \vdots & \ddots & \vdots \\[0.5ex] E(X_{m1}) & E(X_{m2}) & \dots & E(X_{mn}) \\ \end{bmatrix}. \]
With these definitions, we can define the variance-covariance matrix of a vector \(x\) of random variables. Let \[ \tilde{x} = x - E(x) = \begin{bmatrix} X_1-E(X_1) \\[0.5ex] X_2-E(X_2) \\[0.5ex] \vdots \\[0.5ex] X_m-E(X_m) \end{bmatrix} = \begin{bmatrix} \tilde{X}_1 \\[0.5ex] \tilde{X}_2 \\[0.5ex] \vdots \\[0.5ex] \tilde{X}_m \end{bmatrix}. \] Then \[ \begin{aligned} E(\tilde{x}\tilde{x}^\mathrm{T}) &= E((x - E(x))(x - E(x))^\mathrm{T}) \\[2ex] &= E \begin{bmatrix} \tilde{X}_1^2 & \tilde{X}_1\tilde{X}_2 & \dots & \tilde{X}_1\tilde{X}_m \\ \tilde{X}_2\tilde{X}_1 & \tilde{X}_2^2 & \dots & \tilde{X}_2\tilde{X}_m \\ \vdots & \vdots & \ddots & \vdots \\ \tilde{X}_m\tilde{X}_1 & \tilde{X}_m\tilde{X}_2 & \dots & \tilde{X}_m\tilde{X}_m \\ \end{bmatrix} \\[2ex] &= \begin{bmatrix} E(\tilde{X}_1^2) & E(\tilde{X}_1\tilde{X}_2) & \dots & E(\tilde{X}_1\tilde{X}_m) \\ E(\tilde{X}_2\tilde{X}_1) & E(\tilde{X}_2^2) & \dots & E(\tilde{X}_2\tilde{X}_m) \\ \vdots & \vdots & \ddots & \vdots \\ E(\tilde{X}_m\tilde{X}_1) & E(\tilde{X}_m\tilde{X}_2) & \dots & E(\tilde{X}_m\tilde{X}_m) \\ \end{bmatrix} \\[2ex] &= \begin{bmatrix} \mathit{Var}(X_1) & \mathit{Cov}(X_1,X_2) & \dots & \mathit{Cov}(X_1,X_m) \\ \mathit{Cov}(X_1,X_2) & \mathit{Var}(X_2) & \dots & \mathit{Cov}(X_2,X_m) \\ \vdots & \vdots & \ddots & \vdots \\ \mathit{Cov}(X_1,X_m) & \mathit{Cov}(X_2,X_m) & \dots & \mathit{Var}(X_m) \end{bmatrix}\,. \end{aligned} \tag{5.12}\] In other words, \(E((x - E(x))(x - E(x))^\mathrm{T})\) is a symmetric matrix containing the variances of all of the variables in \(x\), and their covariances. We denote the variance-covariance matrix of a vector of random variables \(x\) by \(\mathit{Var}(x)\): \[ \mathit{Var}(x) = E((x - E(x))(x - E(x))^\mathrm{T})\,. \]
Example 5.17 Let \(X_1\), \(X_2\) and \(X_3\) be random variables with \[ \begin{gathered} E(X_1)=1, E(X_2)=3, E(X_3)=5, \\[0.5ex] \mathit{Var}(X_1)=2, \mathit{Var}(X_2)=3, \mathit{Var}(X_3)=2, \;\;\text{and} \\[0.5ex] \mathit{Cov}(X_1, X_2) = 1, \mathit{Cov}(X_1, X_3) = 0, \mathit{Cov}(X_2, X_3) = 2 \end{gathered} \] and let \(x\) be the \(3 \times 1\) vector \(\begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix}^\mathrm{T}\). Then \[ E(x) = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} \;\;\text{ and } \;\; \mathit{Var}(x) = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 3 & 2 \\ 0 & 2 & 2 \end{bmatrix} . \]
Recall that if \(X\) is a (univariate) random variable, then \(E(aX + b) = aE(X) + b\), \(\mathit{Var}(aX + b) = a^2\mathit{Var}(X)\), and \(\mathit{Var}(X) = E(X^2) - E(X)^2\). The following are the matrix analogues of these results. Suppose \(x\) is an \(m \times 1\) vector of random variables, \(A=(a_{ij})_{k \times m}\) is a \(k \times m\) matrix of constants, and \(b\) is a \(k \times 1\) vector of constants. Then
- (i) \(E(Ax + b) = AE(x) + b\),
- (ii) \(\mathit{Var}(Ax + b) = A\mathit{Var}(x)A^\mathrm{T}\),
- (iii) \(\mathit{Var}(x) = E(xx^\mathrm{T}) - E(x)E(x)^\mathrm{T}\).
To show (i), we note that the \(i\)th element of the \(k \times 1\) vector \(Ax + b\) is \(\sum_{j=1}^m a_{ij}X_j + b_i\), and the expectation of this term is \[ E\left(\sum_{j=1}^m a_{ij}X_j + b_i\right) = \sum_{j=1}^m a_{ij}E(X_j) + b_i\,, \] which in turn is the \(i\)th element of the vector \(AE(x) + b\). For (ii), since \(Ax + b - E(Ax + b) = A(x-E(x)) = A\tilde{x}\), we have \[ \begin{aligned} \mathit{Var}(Ax + b) &= E((A\tilde{x})(A\tilde{x})^\mathrm{T}) = E(A\tilde{x}\tilde{x}^\mathrm{T}A^\mathrm{T}) = AE(\tilde{x}\tilde{x}^\mathrm{T})A^\mathrm{T} = A\,\mathit{Var}(x)A^\mathrm{T}\,. \end{aligned} \] You are asked to prove (iii) in Exercise 5.31.
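Results (i) and (ii) can be illustrated by simulation (a sketch assuming NumPy; the moments are those of Example 5.17, the matrix \(A\) and vector \(b\) are arbitrary choices of ours, and the normal distribution is used purely for convenience):

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, 3.0, 5.0])              # E(x) from Example 5.17
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 2.0],
                  [0.0, 2.0, 2.0]])         # Var(x) from Example 5.17

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, -1.0]])            # arbitrary 2 x 3 matrix of constants
b = np.array([1.0, 0.0])

x = rng.multivariate_normal(mu, Sigma, size=200_000)   # 200,000 draws of x
y = x @ A.T + b                                        # each row is one draw of Ax + b

print(y.mean(axis=0))            # close to A mu + b
print(A @ mu + b)
print(np.cov(y, rowvar=False))   # close to A Sigma A^T
print(A @ Sigma @ A.T)
```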
Example 5.18 Given a vector of random variables \(x\), the linear combination \(c^\mathrm{T}x\) of the random variables in \(x\) has variance \[ \mathit{Var}(c^\mathrm{T}x) = c^\mathrm{T}\mathit{Var}(x)c\,. \] Since variances cannot be negative, we have \(c^\mathrm{T}\mathit{Var}(x)c \geq 0\) for all \(c\), i.e., \(\mathit{Var}(x)\) is a positive semidefinite matrix. If there is a linear combination of the random variables in \(x\) that has zero variance, then at least one of the variables in \(x\) is actually a constant (a “degenerate random variable”), or at least one of the variables in \(x\) is a linear combination of the others. Otherwise we have \(c^\mathrm{T}\mathit{Var}(x)c > 0\) for all \(c \neq 0\), i.e., \(\mathit{Var}(x)\) is positive definite.
5.7.2 The Multivariate Normal Distribution
We presented the pdf of a bivariate normal distribution in Section 3.6. We present here the pdf of a general multivariate normal distribution and some associated results. A \(k \times 1\) vector of random variables \(x\) is said to have a multivariate normal distribution with mean \(\mu\) and positive definite variance-covariance matrix \(\Sigma\), denoted \(\mathrm{Normal}_k(\mu,\Sigma)\), if its pdf has the form \[ f(x) = (2\pi)^{-\frac{k}{2}}\text{det}(\Sigma)^{-\frac{1}{2}}\exp\left\{-\frac{1}{2}(x-\mu)^\mathrm{T}\Sigma^{-1}(x-\mu)\right\}\,. \] We list a few results below, omitting proofs:
(a) If \(\Sigma\) is diagonal, then \(X_1, X_2, \dots, X_k\) are independent random variables.
(b) If \(x \sim \mathrm{Normal}_k(\mu, \Sigma)\), then for \(A_{m \times k}\) and \(b_{m \times 1}\), \[ Ax + b \;\sim\; \mathrm{Normal}_m(A\mu+b, A\Sigma A^\mathrm{T}). \]
(c) If we partition \(x\) as \[ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \sim \mathrm{Normal}_k\left( \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix} \right) \] where \(x_1\) is \(k_1 \times 1\) and \(x_2\) is \(k_2 \times 1\), with \(k_1 + k_2 = k\), then the marginal distribution of \(x_1\) is \(\mathrm{Normal}_{k_1}(\mu_1, \Sigma_{11})\), and the conditional distribution of \(x_2\) given \(x_1\) is \[ x_2 \mid x_1 \sim \mathrm{Normal}_{k_2}(\mu_{2 \mid 1}, \Sigma_{22\mid 1}) \] where \(\mu_{2 \mid 1} = \mu_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1-\mu_1)\) and \(\Sigma_{22 \mid 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\).
(d) If \(x \sim \mathrm{Normal}_k(0,I)\) and \(A\) is a rank \(v\) symmetric matrix such that \(AA=A\), then the scalar \(x^\mathrm{T} A x\) is distributed \(\chi^2{(v)}\): \[ x^\mathrm{T} A x \;\sim\; \chi^2{(v)}\,. \] Matrices \(A\) such that \(AA=A\) are said to be idempotent.
(e) If \(x \sim \mathrm{Normal}_k(\mu,\Sigma)\), then \((x-\mu)^\mathrm{T} \Sigma^{-1} (x-\mu) \sim \chi^2{(k)}\).
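Result (e), for example, can be checked by simulation (a sketch assuming NumPy; the mean and variance-covariance matrix are those of Example 5.17):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 3
mu = np.array([1.0, 3.0, 5.0])
Sigma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 2.0],
                  [0.0, 2.0, 2.0]])

x = rng.multivariate_normal(mu, Sigma, size=200_000)
d = x - mu
q = np.einsum('ij,jk,ik->i', d, np.linalg.inv(Sigma), d)  # (x-mu)^T Sigma^{-1} (x-mu), one value per draw

# If result (e) holds, q behaves like a chi-squared(k) random variable
print(q.mean())   # close to k = 3, the mean of a chi-squared(k)
print(q.var())    # close to 2k = 6, its variance
```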
5.7.3 Exercises
Exercise 5.31 Show that \(\mathit{Var}(x) = E(xx^\mathrm{T}) - E(x)E(x)^\mathrm{T}\).
Exercise 5.32 Show that \(E(\mathrm{trace}(X)) = \mathrm{trace}(E(X))\) where \(X = (X_{ij})_{n \times n}\) is a matrix of random variables.
5.8 Differentiation of Matrix Forms
There are useful differentiation formulas available when the expression for the function to be differentiated has certain matrix forms. The following are a few particularly important examples.
Example 5.19 If \(y = x^\mathrm{T}Ax\) where \(A = (a_{jk})_{n \times n}\) is \(n \times n\) and \(x\) is \(n \times 1\), then \[ \nabla y = \dfrac{\partial y}{\partial x} = \dfrac{\partial }{\partial x}\left(x^\mathrm{T}Ax\right) = (A+A^\mathrm{T})x. \tag{5.13}\] Proof: \(y = x^\mathrm{T}Ax = \sum_{j=1}^n \sum_{k=1}^n a_{jk}x_jx_k\). The derivative \(\partial y/\partial x\) is the \(n \times 1\) vector whose \(i\)th element is \[ \frac{\partial}{\partial x_i}\left(\sum_{j=1}^n \sum_{k=1}^n a_{jk}x_jx_k\right) = \underbrace{\sum_{k=1}^n a_{ik}x_k}_{\text{when}\;j=i} + \underbrace{\sum_{j=1}^n a_{ji}x_j}_{\text{when}\;k=i}\,. \] The first sum after the equality is the inner product of the \(i\)th row of \(A\) with \(x\). The second sum is the inner product of the \(i\)th row of \(A^\mathrm{T}\) with \(x\). In other words, \(\partial y/\partial x = (A+A^\mathrm{T})x\).
It may be helpful to you to verify (5.13) by direct differentiation for a special case, say, where \(A\) is \(2 \times 2\). You are asked to do this in an exercise. Note that if \(A\) is symmetric, then (5.13) becomes \[ \nabla y = \dfrac{\partial y}{\partial x} = \dfrac{\partial}{\partial x}x^\mathrm{T}Ax = (A+A^\mathrm{T})x = 2Ax. \tag{5.14}\] This result is the matrix analogue of the univariate differentiation rule \(f(x)=ax^2 \Rightarrow f'(x)=2ax\).
Example 5.20 Let \(y = f(x) = Ax\) where \(A = (a_{ij})_{m \times n}\) is \(m \times n\) and \(x = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}^\mathrm{T}\) is \(n \times 1\). This is an example of a vector-valued function, mapping \(x \in \mathbb{R}^n\) into \(y \in \mathbb{R}^m\). We have \[ Df = \dfrac{\partial y}{\partial x^\mathrm{T}} = A\,. \tag{5.15}\] This is the matrix analogue of the univariate differentiation rule \(f(x) = ax \Rightarrow f'(x) = a\).
Proof: The product \(Ax\) is an \(m \times 1\) vector whose \(i\)th element is \(\sum_{k=1}^n a_{ik}x_k\). Therefore the \((i,j)\)th element of \(\partial y/\partial x^\mathrm{T}\) is \((\partial/\partial x_j)\sum_{k=1}^n a_{ik}x_k = a_{ij}\). This says that \(\partial y/\partial x^\mathrm{T}=A\).
Example 5.21 The previous two examples show that if \(y = x^\mathrm{T}Ax\) where \(A\) is an \(n \times n\) symmetric matrix of constants and \(x\) is an \(n \times 1\) vector of variables, then the Hessian is \[ \dfrac{d^2y}{dxdx^\mathrm{T}} = D(\nabla y) = D(2Ax) = 2A\,. \]
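A quick way to convince yourself of (5.13) and (5.15) is a finite-difference check (a sketch assuming NumPy, with a randomly generated \(A\) and \(x\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))     # a general (not necessarily symmetric) matrix
x = rng.normal(size=n)
eps = 1e-6

def f(x):
    return x @ A @ x            # the quadratic form y = x^T A x

def g(x):
    return A @ x                # the linear map y = Ax

# Central-difference gradient of f, compared with (A + A^T)x from (5.13)
grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(grad_fd, (A + A.T) @ x, atol=1e-5))   # True

# Central-difference Jacobian of g, compared with A from (5.15)
jac_fd = np.column_stack([(g(x + eps * e) - g(x - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(jac_fd, A, atol=1e-5))                # True
```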
5.8.1 Exercises
Exercise 5.33 (a) Show that if \(y = f(x) = x^\mathrm{T}A\) where \(A = (a_{ij})_{m \times n}\) is an \(m \times n\) matrix of constants and \(x^\mathrm{T} = \begin{bmatrix} x_1 & x_2 & \dots & x_m \end{bmatrix}\), then \[ \partial y / \partial x = A\,. \]
(b) If \(c\) and \(x\) are \(n \times 1\) vectors, show that \[ \dfrac{\partial}{\partial x}c^\mathrm{T}x = c\,. \]
Exercise 5.34 Let \(A=(a_{ij})_{2 \times 2}\) be a \(2 \times 2\) matrix of constants, and \(x\) be a \(2 \times 1\) vector of variables. Multiply out \(y = x^\mathrm{T}Ax\) in full, and show by direct differentiation that \[ \dfrac{\partial y}{\partial x} = (A+A^\mathrm{T})x\,. \]
A square zero matrix is therefore technically also a diagonal matrix.