6.1 Activity
Prompt
Given the matrices:
\[ \mathbf{X} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{W} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 0 & -1 \end{bmatrix} \]
And the neural network prediction function:
\[ \mathbf{y} = \max(0, \mathbf{XW} + \mathbf{c}) \]
How can we merge \(\mathbf{W}\) and \(\mathbf{c}\) into a new matrix \(\mathbf{W}_{new}\) and modify \(\mathbf{X}\) into a new matrix \(\mathbf{X}_{new}\) so that only one matrix multiplication is required, i.e.,
\[ \max(0, \mathbf{XW} + \mathbf{c})=\max(0,\mathbf{X}_{new} \mathbf{W}_{new}) \]
Solution
To merge \(\mathbf{W}\) and \(\mathbf{c}\) into a single matrix \(\mathbf{W}_{new}\) and modify \(\mathbf{X}\) into \(\mathbf{X}_{new}\), we need to rewrite the affine transformation \(\mathbf{XW} + \mathbf{c}\) as a single matrix multiplication.
Step 1: Expressing the Transformation
We have:
\[ \mathbf{y} = \max(0, \mathbf{XW} + \mathbf{c}) \]
where
\[ \mathbf{X} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{W} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 0 & -1 \end{bmatrix} \]
Computing \(\mathbf{XW}\):
\[ \mathbf{XW} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0(1) + 0(1) & 0(1) + 0(1) \\ 0(1) + 1(1) & 0(1) + 1(1) \\ 1(1) + 0(1) & 1(1) + 0(1) \\ 1(1) + 1(1) & 1(1) + 1(1) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix} \]
Adding \(\mathbf{c}\) to each row:
\[ \mathbf{XW} + \mathbf{c} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix} + \begin{bmatrix} 0 & -1 \\ 0 & -1 \\ 0 & -1 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix} \]
Applying ReLU:
\[ \mathbf{y} = \max(0, \mathbf{XW} + \mathbf{c}) = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix} \]
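The arithmetic in Step 1 can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the variable names simply mirror the symbols above, and the row-wise addition of \(\mathbf{c}\) relies on NumPy broadcasting.

```python
import numpy as np

# Inputs from the prompt.
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])

# Affine step: c is broadcast across every row of X @ W.
pre_activation = X @ W + c

# ReLU: element-wise max with zero.
y = np.maximum(0, pre_activation)

print(pre_activation)
print(y)
```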
Step 2: Reformulating as a Single Matrix Multiplication
We introduce an augmented matrix \(\mathbf{X}_{new}\) by adding a column of ones to \(\mathbf{X}\):
\[ \mathbf{X}_{new} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]
And define \(\mathbf{W}_{new}\) as:
\[ \mathbf{W}_{new}= \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix} \]
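A brief sketch of why this construction works in general, assuming \(\mathbf{1}\) denotes a column vector of ones with one entry per row of \(\mathbf{X}\) (notation introduced here only for illustration): appending \(\mathbf{1}\) to \(\mathbf{X}\) and stacking \(\mathbf{c}\) beneath \(\mathbf{W}\) gives the block product
\[ \begin{bmatrix} \mathbf{X} & \mathbf{1} \end{bmatrix} \begin{bmatrix} \mathbf{W} \\ \mathbf{c} \end{bmatrix} = \mathbf{XW} + \mathbf{1}\mathbf{c} \]
where \(\mathbf{1}\mathbf{c}\) is the matrix whose every row equals \(\mathbf{c}\), which is exactly the row-wise addition of the bias.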
Now,
\[ \mathbf{X}_{new} \ \mathbf{W}_{new} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix} \] \[ =\begin{bmatrix} 0(1) + 0(1) + 1(0) & 0(1) + 0(1) + 1(-1) \\ 0(1) + 1(1) + 1(0) & 0(1) + 1(1) + 1(-1) \\ 1(1) + 0(1) + 1(0) & 1(1) + 0(1) + 1(-1) \\ 1(1) + 1(1) + 1(0) & 1(1) + 1(1) + 1(-1) \end{bmatrix} \]
\[ =\begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix} \]
Which matches \(\mathbf{XW} + \mathbf{c}\), confirming that:
\[ \max(0, \mathbf{XW} + \mathbf{c}) = \max(0, \mathbf{X}_{new} \ \mathbf{W}_{new}) \]
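The equivalence can also be confirmed in code. This is a minimal sketch, assuming NumPy; `X_new` and `W_new` are built with `np.hstack` and `np.vstack` as one possible way to form the augmented matrices, and the names `y_two_step` and `y_merged` are illustrative.

```python
import numpy as np

X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])

# Append a column of ones to X so the last row of W_new acts as the bias.
ones_column = np.ones((X.shape[0], 1), dtype=X.dtype)
X_new = np.hstack([X, ones_column])

# Stack c beneath W as an extra row.
W_new = np.vstack([W, c])

# Original form: matrix multiply plus bias, then ReLU.
y_two_step = np.maximum(0, X @ W + c)

# Merged form: a single matrix multiply, then ReLU.
y_merged = np.maximum(0, X_new @ W_new)

print(np.array_equal(y_two_step, y_merged))
```

The same pattern applies to any batch size, since the column of ones is sized from the number of rows in `X`.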
Final Answer
\[ \mathbf{X}_{new} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad \mathbf{W}_{new} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix} \]