MIT 18.06 Linear Algebra, Spring 2010
OpenStudy (anonymous):

In lecture 16, around minute 27, the professor talks about solving for the least-squares error using calculus and taking partial derivatives. I can't understand why we set, for example, the derivative to 0 and proceed. What relation does this have to gradient descent?

OpenStudy (joshdanziger23):

salehmamdouh1984, in effect we are considering the square of the length of the error vector ||e||^2 as a function f(C,D). Prof Strang shows on the board that in his example f(C,D) = (C+D-1)^2 + (C+2D-2)^2 + (C+3D-2)^2; the intention is to choose C and D to minimise f(C,D). The way to find the minimum is to find the two partial derivatives df/dC and df/dD; we know that there is a stationary point where df/dC = df/dD = 0. Prof Strang glosses over the point, but it's not hard to show (by taking second partial derivatives) that the stationary point is a minimum. Of course the whole point is to demonstrate that calculus gets you the same answer you can get much more quickly by looking for e to be orthogonal to the columns of A ==> A'Ax = A'b. Josh.
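To make the equivalence concrete, here is a small sketch (not from the lecture itself, just an illustration) that solves Strang's example numerically. Setting df/dC = df/dD = 0 produces exactly the normal equations A'Ax = A'b for A with columns [1,1,1] and [1,2,3] and b = [1,2,2]; the code builds that 2x2 system and solves it by Cramer's rule:

```python
# Strang's example: fit b = C + D*t through (1,1), (2,2), (3,2).
# Setting the partial derivatives of f(C,D) to zero gives the
# normal equations A'A x = A'b, a 2x2 linear system.

ts = [1, 2, 3]   # the t-coordinates (second column of A)
bs = [1, 2, 2]   # the measurements b

# Entries of A'A and A'b for A = [[1, t] for t in ts]
n = len(ts)
st = sum(ts)                              # sum of t
stt = sum(t * t for t in ts)              # sum of t^2
sb = sum(bs)                              # sum of b
stb = sum(t * b for t, b in zip(ts, bs))  # sum of t*b

# Solve [[n, st], [st, stt]] [C, D] = [sb, stb] by Cramer's rule
det = n * stt - st * st
C = (sb * stt - st * stb) / det
D = (n * stb - st * sb) / det
print(C, D)  # C = 2/3, D = 1/2 -- the best-fit line b = 2/3 + t/2
```

The same (C, D) falls out whether you differentiate f(C,D) or project b onto the column space of A; both routes lead to A'Ax = A'b.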

OpenStudy (anonymous):

What I am trying to understand is: what is the difference between the projection method and the gradient descent algorithm?

OpenStudy (joshdanziger23):

salehmamdouh1984, isn't the gradient descent algorithm a numerical technique to use when you can't find a minimum any other way? I'm not sure that it's called for here, when you can go directly to the solution either by projection or by calculus. Josh.
