In the method of fitting a straight line by the principle of least squares, why is the Euclidean norm used when minimizing the error?
@imqwerty
@welshfella
You might find this a useful read: http://stats.stackexchange.com/a/151
@ganeshie8
@HolsterEmission can't figure out the answer..
Can you give some more context? What kind of least squares fitting problems have you run into so far? And how have they been presented to you? As in, were you just given the normal least squares equations or have you also seen the theory behind them? Are you approaching this from a statistical point of view or more of a linear-algebra based approach?
Fitting a straight line, say \(y = a + bx\), to a given set of data points \((x_i, y_i)\)... Then the error \(S\) is to be minimized.
It's in numerical analysis, and in statistics too..
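To make the setup concrete, here's a minimal sketch of fitting \(y = a + bx\) by least squares. Setting the partial derivatives of \(S(a,b) = \sum_i (y_i - a - bx_i)^2\) to zero gives the usual closed-form normal equations; the data points below are made up for illustration.

```python
# Fit y = a + b*x by least squares: minimize S(a, b) = sum((y_i - a - b*x_i)^2).
# Setting dS/da = 0 and dS/db = 0 yields the closed-form normal equations below.

def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

# Points lying exactly on y = 1 + 2x are recovered exactly:
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

For noisy data the fitted line won't pass through the points, but it still minimizes the sum of squared vertical distances.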
Okay, so your question boils down to why we typically choose to minimize the sum of squared differences (which uses the Euclidean norm) as opposed to the sum of absolute differences. The main reason is that it makes the error function, which you denote \(S\), easier to analyze because it behaves better: \(S\) is continuously differentiable when it's defined as a sum of squared differences. An error function defined in terms of absolute differences is also usable, but it isn't differentiable where a residual is zero, which makes the minimization harder to work with. There's a lengthy discussion of the \(\ell^2\) (Euclidean) norm vs the \(\ell^1\) (absolute difference) norm here that you might find useful: http://www.chioka.in/differences-between-the-l1-norm-and-the-l2-norm-least-absolute-deviations-and-least-squares/

Another point I recall hearing from my old stats professor was that squared differences let your analysis of the data pick up on more detail. I believe the particular example went something like this: suppose you have three data points with residuals \(\{-1, 0, 1\}\). Adding them all together yields \(0\), which would suggest that the model you constructed perfectly represents the population, but the residuals clearly aren't all zero. If you instead square the residuals and add them, you get \(2\), which may be a better estimate of the population statistic you're trying to measure.

Yet another point: the Euclidean norm gives the actual straight-line distance between points in higher dimensions, while the \(\ell^1\) norm does not (it measures distance along axis-aligned paths, the so-called taxicab distance), so the Euclidean norm is a slightly more natural choice.
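Two of the points above can be checked numerically; this is a small sketch using the residual values \(\{-1, 0, 1\}\) from the example.

```python
# 1) Residuals {-1, 0, 1}: a plain sum cancels to 0, hiding the misfit;
#    squared (l2) or absolute (l1) residuals keep it visible.
residuals = [-1, 0, 1]
print(sum(residuals))                  # 0
print(sum(r * r for r in residuals))   # 2
print(sum(abs(r) for r in residuals))  # 2

# 2) Smoothness: r**2 has a well-defined slope (2*r) everywhere,
#    while abs(r) has a kink at r = 0 -- its one-sided slopes disagree.
h = 1e-6
slope_right = (abs(0 + h) - abs(0)) / h  # approximately +1.0
slope_left = (abs(0) - abs(0 - h)) / h   # approximately -1.0
print(slope_right, slope_left)
```

The kink at zero is why least-absolute-deviations fitting is usually solved iteratively (e.g. by linear programming), whereas least squares has a closed-form solution.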