In lecture 16, around minute 27, the professor talks about solving for the least-squares error using calculus and taking partial derivatives. I can't understand why we set, for example, the derivative equal to 0 and proceed. What relation does this have to gradient descent?
salehmamdouh1984, in effect we are considering the square of the length of the error vector ||e||^2 as a function f(C,D); Prof Strang shows on the board that in his example f(C,D) = (C+D-1)^2 + (C+2D-2)^2 + (C+3D-2)^2; the intention is to choose C and D to minimise f(C,D). The way to find the minimum is to find the two partial derivatives ∂f/∂C and ∂f/∂D; we know that there is a stationary point where ∂f/∂C = ∂f/∂D = 0. Prof Strang glosses over the point, but it's not hard to show (by taking second partial derivatives) that the stationary point is a minimum. Of course the whole point is to demonstrate that calculus gets you the same answer you can get much more quickly by looking for e to be orthogonal to the columns of A ==> A'Ax = A'b. Josh.
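If it helps to see both routes side by side, here is a small sketch (my own illustration in Python, using sympy and numpy, not anything from the lecture) of Prof Strang's example: the calculus route sets the two partial derivatives to zero and solves, the projection route solves A'Ax = A'b, and both give C = 2/3, D = 1/2.

```python
import numpy as np
from sympy import symbols, diff, solve

# Prof Strang's example: fit b = C + D*t to the points (1,1), (2,2), (3,2).
C, D = symbols('C D')
f = (C + D - 1)**2 + (C + 2*D - 2)**2 + (C + 3*D - 2)**2

# Calculus route: set both partial derivatives to zero and solve the 2x2 system.
stationary = solve([diff(f, C), diff(f, D)], [C, D])
print(stationary)          # {C: 2/3, D: 1/2}

# Projection route: solve A'Ax = A'b with the same data.
A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([1, 2, 2], dtype=float)
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)                   # [0.6667 0.5]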
What I am trying to understand is: what is the difference between using the projection method and the gradient descent algorithm?
salehmamdouh1984, isn't the gradient descent algorithm a numerical technique to use when you can't find a minimum any other way? I'm not sure that's called for when you can go directly to the solution, either by projection or by calculus. Josh.
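Just to make the contrast concrete, here is a rough sketch (again my own Python illustration, with a step size I picked by hand) of gradient descent on the same f(C,D). It creeps toward the same answer, but only after many iterations, whereas the normal equations give it in one solve:

```python
import numpy as np

# Gradient descent on f(C,D) = ||Ax - b||^2 for the same example data.
A = np.array([[1, 1], [1, 2], [1, 3]], dtype=float)
b = np.array([1, 2, 2], dtype=float)

x = np.zeros(2)            # start at C = D = 0
lr = 0.01                  # step size (chosen by hand for this tiny problem)
for _ in range(20000):
    grad = 2 * A.T @ (A @ x - b)   # gradient of the squared error
    x -= lr * grad
print(x)                   # approx [0.6667 0.5], matching A'Ax = A'b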