In lecture 16, around minute 27, the professor talks about solving for the least-squares error using calculus and taking partial derivatives. I can't understand why we set, for example, the derivative equal to 0 and proceed. What relation does this have to gradient descent?
salehmamdouh1984, in effect we are considering the square of the length of the error vector ||e||^2 as a function f(C,D); Prof Strang shows on the board that in his example f(C,D) = (C+D-1)^2 + (C+2D-2)^2 + (C+3D-2)^2; the intention is to choose C and D to minimise f(C,D). The way to find the minimum is to find the two partial derivatives ∂f/∂C and ∂f/∂D; we know that there is a stationary point where ∂f/∂C = ∂f/∂D = 0. Prof Strang glosses over the point, but it's not hard to show (by taking second partial derivatives) that the stationary point is a minimum. Of course the whole point is to demonstrate that calculus gets you the same answer you can get much more quickly by looking for e to be orthogonal to the columns of A ==> A'Ax = A'b. Josh.
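If it helps to see both routes side by side, here is a small sketch (my own illustration in Python, using sympy and numpy, not anything from the lecture) of Prof Strang's example: the calculus route sets the two partial derivatives to zero and solves, the projection route solves A'Ax = A'b, and both give C = 2/3, D = 1/2.

```python
import numpy as np
from sympy import symbols, diff, solve

# Prof Strang's example: fit b = C + D*t to the points (1,1), (2,2), (3,2).
C, D = symbols('C D')
f = (C + D - 1)**2 + (C + 2*D - 2)**2 + (C + 3*D - 2)**2

# Calculus route: set both partial derivatives to zero and solve the 2x2 system.
stationary = solve([diff(f, C), diff(f, D)], [C, D])
print(stationary)          # {C: 2/3, D: 1/2}

# Projection route: solve A'Ax = A'b with the same data.
A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([1, 2, 2], dtype=float)
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                   # [0.6667 0.5]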
What I am trying to understand is: what is the difference between using the projection method and the gradient descent algorithm?
salehmamdouh1984, isn't the gradient descent algorithm a numerical technique to use when you can't find a minimum any other way? I'm not sure that's called for when you can go directly to the solution, either by projection or by calculus. Josh.
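Just to make the contrast concrete, here is a rough sketch (again my own Python illustration, with a step size I picked by hand) of gradient descent on the same f(C,D). It creeps toward the same answer, but only after many iterations, whereas the normal equations give it in one solve:

```python
import numpy as np

# Gradient descent on f(C,D) = ||Ax - b||^2 for the same example data.
A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([1, 2, 2], dtype=float)

x = np.zeros(2)            # start at C = D = 0
lr = 0.01                  # step size (chosen by hand for this tiny problem)
for _ in range(20000):
    grad = 2 * A.T @ (A @ x - b)   # gradient of the squared error
    x -= lr * grad
print(x)                   # approx [0.6667 0.5], matching A'Ax = A'b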