Breaking down the Normal Equation

Let me begin by presenting the complete derivation of the solution to the Closed Form Regression Equation from Chieh Wu's Regression Lecture. Following the derivation, I will explain each line in a detailed, step-by-step breakdown.

$$
\begin{aligned}
\phi w &= y \\
(\phi^\intercal \phi) w &= \phi^\intercal y \\
(\phi^\intercal \phi)^{-1} (\phi^\intercal \phi) w &= (\phi^\intercal \phi)^{-1} \phi^\intercal y \\
w &= (\phi^\intercal \phi)^{-1} \phi^\intercal y
\end{aligned}
$$

1) The Equation

$$ \phi w = y $$

Not much to explain in the first line! It simply states the aim of regression: given the basis functions, the x values, and the corresponding y values, we want to find the best weights (the w that, when multiplied by $\phi$, gives us y)!

This ultimately involves rearranging the equation to isolate the variable w, leading us to the weight values that will satisfy the equation.
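To make these objects concrete, here is a minimal sketch in NumPy. The data and the degree-2 polynomial basis are made up for illustration; they are not from the lecture.

```python
import numpy as np

# Made-up 1-D inputs and targets, purely for illustration.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 4.1, 8.8, 17.0])

# One possible feature map: a degree-2 polynomial basis,
# so row i of phi is [1, x_i, x_i**2].
phi = np.column_stack([np.ones_like(x), x, x**2])

print(phi.shape)  # (5, 3): 5 samples, 3 basis functions
# Goal of regression: find the weight vector w (length 3)
# such that phi @ w is as close to y as possible.
```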

Note

This tutorial will go in depth on how to solve the normal equation, but not on what the equation actually means! To understand what is meant by $\phi$ and $w$, refer to this tutorial, where I explain them in detail.

2) Multiplying by the transpose of the feature map

$$ (\phi^\intercal \phi) w = \phi^\intercal y $$

The second step multiplies both sides of the equation by $\phi^\intercal$. You may be questioning the purpose of this, since it seems redundant: why not just multiply both sides by $\phi^{-1}$? The reason is that, more often than not, the feature map is not a square matrix, so $\phi^{-1}$ simply does not exist.

<aside> 💡 Only square matrices are invertible; hence, we need to transform the matrix into a square one ($\phi^\intercal \phi$) if we wish to perform the inversion in step 3.

</aside>
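Continuing the made-up example from step 1, a quick shape check illustrates why multiplying by $\phi^\intercal$ helps: $\phi$ itself is $5 \times 3$ and has no inverse, but $\phi^\intercal \phi$ is $3 \times 3$ and can be inverted as long as it has full rank.

```python
# phi is 5 x 3, so np.linalg.inv(phi) would raise an error:
# only square matrices can be inverted.
A = phi.T @ phi   # (3, 3) -- square
b = phi.T @ y     # (3,)

# w = (phi^T phi)^{-1} phi^T y. Solving the linear system is
# numerically preferable to forming the inverse explicitly,
# but it computes the same weights as the closed-form expression.
w = np.linalg.solve(A, b)
print(w)          # best-fit weights for the polynomial basis
```

Using `np.linalg.solve` here is a design choice rather than part of the derivation: it avoids explicitly forming $(\phi^\intercal \phi)^{-1}$, which is less stable numerically, while still computing exactly the weights given by the normal equation.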