Hello, can anyone tell me something about assumptions of a simple linear model? What happens if any of them does not hold? Thanks!

Question

mathmale · Answer

A simple linear model has the following characteristics:
1.  Involves the FIRST power of x:  x^1 or just x
2.  Features a constant slope, which could be negative, zero, or positive (or undefined, in the case of a vertical line)
3.  Has a y-intercept:  a point at which the graph of this function crosses the y-axis
4.  When sketched, is a straight line
These are not really assumptions, but rather characteristics.

A model that involves any power of x other than the first power (x) is not a linear function.
And so on...

mathmale · Answer

Revision:

3.  Often, but not always, has a y-intercept:  a point at which the graph of this function crosses the y-axis.  Example:  The vertical line x=2 does not have a vertical intercept, but only a horizontal one, (2,0).

gorica · Answer

Thank you, but it is clear to me what a simple linear model is. 

What I am looking for are answers to the following questions:
1. What if the X_i's (the points of a data sample) are not known constants, but random variables?
2. What if the mean of the errors is not zero?
3. What if the variance of each error is not σ^2?
4. What if the errors are not mutually independent random variables, i.e. the covariance between some pair of errors is not zero?
5. What if the errors don't have a normal distribution?
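
For reference, these five questions line up with the usual textbook statement of the simple linear model. The symbols \beta_0, \beta_1, and \varepsilon_i are assumed notation, since the thread itself never writes the model out:

```latex
\[
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \qquad i = 1, \dots, n,
\]
\begin{align*}
&\text{(1) } x_1, \dots, x_n \text{ are known constants;} \\
&\text{(2) } \mathrm{E}[\varepsilon_i] = 0; \\
&\text{(3) } \mathrm{Var}(\varepsilon_i) = \sigma^2 \text{ for every } i; \\
&\text{(4) } \mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ for } i \neq j; \\
&\text{(5) } \varepsilon_i \sim N(0, \sigma^2).
\end{align*}
```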

kirbykirby · Answer

This is fairly lengthy to write about on Open Study, but I found a website that addresses most of those issues here: http://www.basic.northwestern.edu/statguidefiles/linreg_ass_viol.html I hope it helps!

gorica · Answer

Thanks! :)

anonymous · Answer

The link @kirbykirby points to is quite helpful. Here are a few thoughts of my own.

1. If you're using a linear model to *predict* an outcome (say, what is the expected score on a Spring math test given a score on the Fall math test and Fall math grades), you generally have fewer assumptions to meet. Basically, criterion #2 above (sum or mean of errors equaling zero) is the key, and that's as much a definition as an assumption. When the mean of errors is not zero you have a "bias" in your estimate. #3, #4, and #5 won't affect the point estimate of a predicted value. Nor should #1, as long as by "random" we mean random and measured without error. Note, though, that #3 and #4 (and I suspect #5) will affect the standard error of your prediction.

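2. If you care about the *coefficients* of your linear model, then #3, #4, and #5 all come into play. The standard errors of the coefficients, and any tests or confidence intervals built on them, depend on these distributional assumptions. The least-squares point estimates themselves stay unbiased when #3, #4, or #5 fails, but they are no longer the most efficient estimates when #3 or #4 fails.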

Basically, the assumptions are a bit relaxed if all you care about is prediction. For interpreting coefficients, the assumptions matter more.
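
The point about #3 (unequal error variances leave the point estimates alone but distort the standard errors) can be checked with a small simulation. This is only a sketch under made-up settings: the true intercept and slope, the design points 1 to 20, and the error standard deviation of 0.5 * x are all hypothetical values chosen for illustration, and `ols_fit` and `simulate` are helper names invented here.

```python
import random
import statistics

# Hypothetical "true" model, chosen only for illustration.
TRUE_INTERCEPT = 1.0
TRUE_SLOPE = 2.0
X = [float(i) for i in range(1, 21)]  # fixed, known design points


def ols_fit(x, y):
    """Return (intercept, slope, naive_se_slope) for simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # Textbook standard error; it assumes every error has the same variance.
    naive_se_slope = (rss / (n - 2) / sxx) ** 0.5
    return intercept, slope, naive_se_slope


def simulate(n_reps=5000, seed=0):
    """Refit the model on many simulated samples with heteroscedastic errors."""
    rng = random.Random(seed)
    slopes, naive_ses = [], []
    for _ in range(n_reps):
        # Error sd grows with x, which violates assumption #3.
        y = [TRUE_INTERCEPT + TRUE_SLOPE * xi + rng.gauss(0.0, 0.5 * xi)
             for xi in X]
        _, slope, se = ols_fit(X, y)
        slopes.append(slope)
        naive_ses.append(se)
    # Average estimate, actual spread of the estimates, average reported SE.
    return (statistics.mean(slopes),
            statistics.stdev(slopes),
            statistics.mean(naive_ses))


if __name__ == "__main__":
    mean_slope, empirical_sd, mean_naive_se = simulate()
    print(f"mean of slope estimates:   {mean_slope:.3f}")
    print(f"actual spread of slopes:   {empirical_sd:.3f}")
    print(f"average textbook SE:       {mean_naive_se:.3f}")
```

With these settings the average slope estimate should land close to the true slope, while the actual spread of the estimates should come out larger than the textbook standard error reports. In other words, the estimate is still centered correctly, but the usual formula is overconfident about it.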