6.1 Activity

Prompt

Given the matrices:

\[ \mathbf{X} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{W} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 0 & -1 \end{bmatrix} \]

And the neural network prediction function:

\[ \mathbf{y} = \max(0, \mathbf{XW} + \mathbf{c}) \]

How can we merge \(\mathbf{W}\) and \(\mathbf{c}\) into a new matrix \(\mathbf{W}_{new}\), and modify \(\mathbf{X}\) into a new matrix \(\mathbf{X}_{new}\), so that only one matrix multiplication is required, i.e.,

\[ \max(0, \mathbf{XW} + \mathbf{c})=\max(0,\mathbf{X}_{new} \mathbf{W}_{new}) \]

Solution

To merge \(\mathbf{W}\) and \(\mathbf{c}\) into a single matrix \(\mathbf{W}_{new}\) and modify \(\mathbf{X}\) into \(\mathbf{X}_{new}\), we need to rewrite the affine transformation \(\mathbf{XW} + \mathbf{c}\) as a single matrix multiplication.

Step 1: Expressing the Transformation

We have:

\[ \mathbf{y} = \max(0, \mathbf{XW} + \mathbf{c}) \]

where

\[ \mathbf{X} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{W} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad \mathbf{c} = \begin{bmatrix} 0 & -1 \end{bmatrix} \]

Computing \(\mathbf{XW}\):

\[ \mathbf{XW} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \\ 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 0(1) + 0(1) & 0(1) + 0(1) \\ 0(1) + 1(1) & 0(1) + 1(1) \\ 1(1) + 0(1) & 1(1) + 0(1) \\ 1(1) + 1(1) & 1(1) + 1(1) \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix} \]

Adding \(\mathbf{c}\) to each row:

\[ \mathbf{XW} + \mathbf{c} = \begin{bmatrix} 0 & 0 \\ 1 & 1 \\ 1 & 1 \\ 2 & 2 \end{bmatrix} + \begin{bmatrix} 0 & -1 \\ 0 & -1 \\ 0 & -1 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix} \]

Applying ReLU:

\[ \mathbf{y} = \max(0, \mathbf{XW} + \mathbf{c}) = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix} \]
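The Step 1 arithmetic can be checked numerically. The following is a minimal sketch assuming NumPy is available; the variable name `pre_activation` is introduced here for illustration and does not appear in the activity.

```python
import numpy as np

# Matrices from the prompt
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])

# X @ W has shape (4, 2); broadcasting adds c to every row
pre_activation = X @ W + c

# ReLU: element-wise maximum with zero
y = np.maximum(0, pre_activation)

print(pre_activation)
print(y)
```

Here NumPy broadcasting plays the role of "adding \(\mathbf{c}\) to each row": the one-dimensional `c` is stretched across all four rows of `X @ W`.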

Step 2: Reformulating as a Single Matrix Multiplication

We introduce an augmented matrix \(\mathbf{X}_{new}\) by adding a column of ones to \(\mathbf{X}\):

\[ \mathbf{X}_{new} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \]

And define \(\mathbf{W}_{new}\) by appending \(\mathbf{c}\) as an extra row below \(\mathbf{W}\):

\[ \mathbf{W}_{new}= \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix} \]
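A short sketch of why this construction works in general, not only for these particular numbers. The symbol \(\mathbf{1}\) is notation introduced here (it is not in the prompt) for the \(4 \times 1\) column vector of ones. Treating \(\mathbf{X}_{new}\) and \(\mathbf{W}_{new}\) as block matrices:

\[ \mathbf{X}_{new} \mathbf{W}_{new} = \begin{bmatrix} \mathbf{X} & \mathbf{1} \end{bmatrix} \begin{bmatrix} \mathbf{W} \\ \mathbf{c} \end{bmatrix} = \mathbf{XW} + \mathbf{1}\mathbf{c} \]

The product \(\mathbf{1}\mathbf{c}\) is the \(4 \times 2\) matrix whose every row equals \(\mathbf{c}\), which is exactly the "add \(\mathbf{c}\) to each row" operation from Step 1.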

Now,

\[ \mathbf{X}_{new} \mathbf{W}_{new} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix} \]

\[ =\begin{bmatrix} 0(1) + 0(1) + 1(0) & 0(1) + 0(1) + 1(-1) \\ 0(1) + 1(1) + 1(0) & 0(1) + 1(1) + 1(-1) \\ 1(1) + 0(1) + 1(0) & 1(1) + 0(1) + 1(-1) \\ 1(1) + 1(1) + 1(0) & 1(1) + 1(1) + 1(-1) \end{bmatrix} \]

\[ =\begin{bmatrix} 0 & -1 \\ 1 & 0 \\ 1 & 0 \\ 2 & 1 \end{bmatrix} \]

This matches \(\mathbf{XW} + \mathbf{c}\) from Step 1, confirming that:

\[ \max(0, \mathbf{XW} + \mathbf{c}) = \max(0, \mathbf{X}_{new} \mathbf{W}_{new}) \]
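The same equality can be confirmed in code. This is a minimal sketch assuming NumPy; building the augmented matrices with `np.hstack` and `np.vstack` is one possible approach, and the names `X_new`, `W_new`, and `y_new` simply mirror the symbols used in this solution.

```python
import numpy as np

# Matrices from the prompt
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])

# Append a column of ones to X (one entry per row of X)
ones = np.ones((X.shape[0], 1), dtype=X.dtype)
X_new = np.hstack([X, ones])

# Append c as the last row of W
W_new = np.vstack([W, c])

# A single matrix multiply now reproduces XW + c
assert np.array_equal(X_new @ W_new, X @ W + c)

# Apply ReLU to the single-multiply form
y_new = np.maximum(0, X_new @ W_new)
print(y_new)
```

The column of ones multiplies the last row of `W_new`, so each output row picks up exactly one copy of `c`.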

Final Answer

\[ \mathbf{X}_{new} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix}, \quad \mathbf{W}_{new} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & -1 \end{bmatrix} \]