Hello, can anyone tell me something about the assumptions of a simple linear model? What happens if any of them does not hold? Thanks!
A simple linear model has the following characteristics:
1. Involves the FIRST power of x: x^1 or just x
2. Features a constant slope, which could be negative, zero, positive, or undefined
3. Has a y-intercept: a point at which the graph of this function crosses the y-axis
4. When sketched, is a straight line
These are not really assumptions, but rather characteristics. A model that involves any power of x other than the first power (x) is not a linear function. And so on...
Revision: 3. Often, but not always, has a y-intercept: a point at which the graph of this function crosses the y-axis. Example: The vertical line x=2 does not have a vertical intercept, but only a horizontal one, (2,0).
Thank you, but it is clear to me what a simple linear model is. What I am looking for is answers to the following questions:
1. What if the Xi's (the points of a data sample) are not known constants, but random variables?
2. What if the mean of the errors is not zero?
3. What if the variance of each error is not σ^2?
4. What if the errors are not mutually independent random variables, i.e. the covariance between two errors is not zero?
5. What if the errors are not normally distributed?
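For readers following along, the five conditions being asked about can be written compactly. This is only a sketch of the usual textbook statement; the symbols Y_i, β_0, β_1 and ε_i are notation assumed here and do not appear in the thread, while X_i and σ^2 are taken from the questions above.

```latex
\begin{align*}
  &\text{Model:} && Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n \\
  &\text{(1)} && X_1, \dots, X_n \text{ are known constants} \\
  &\text{(2)} && \operatorname{E}[\varepsilon_i] = 0 \\
  &\text{(3)} && \operatorname{Var}(\varepsilon_i) = \sigma^2 \text{ for every } i \\
  &\text{(4)} && \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \text{ for } i \neq j \\
  &\text{(5)} && \varepsilon_i \sim N(0, \sigma^2)
\end{align*}
```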
This is fairly lengthy to write about on Open Study, but I found a website that addresses most of those issues here: http://www.basic.northwestern.edu/statguidefiles/linreg_ass_viol.html I hope it helps!
Thanks! :)
The link @kirbykirby points to is quite helpful. Here are a few thoughts of my own.
1. If you're using a linear model to *predict* an outcome (say, what is the expected score on a Spring math test given a score on the Fall math test and Fall math grades), you generally have fewer assumptions to meet. Basically, criterion #2 above (the sum or mean of the errors equaling zero) is the key, and that's as much a definition as an assumption. When the mean of the errors is not zero you have a "bias" in your estimate. #3, #4, and #5 won't affect the point estimate of a predicted value. Nor should #1, as long as by "random" we mean random and measured without error. Note, though, that #3 and #4 (and I suspect #5) will affect the standard error of your prediction.
2. If you care about the *coefficients* of your linear model, then #3, #4, and #5 all come into play. The standard errors of the coefficients, and the efficiency of their point estimates, depend on these distributional assumptions.
Basically, the assumptions are a bit relaxed if all you care about is prediction. For interpreting coefficients, the assumptions matter more.
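If it helps to see this numerically, here is a small simulation sketch in Python with NumPy. It is not from the linked page: the function name `simulate_slopes`, the true coefficients (intercept 1, slope 2), the sample size, and the particular form of non-constant variance (error standard deviation growing like 0.1 * x^2) are all assumptions chosen for illustration. It fits ordinary least squares many times and compares the actual spread of the slope estimates with the standard error the usual formula reports.

```python
import numpy as np

def simulate_slopes(n_sims=2000, n=100, heteroskedastic=False, seed=0):
    """Fit y = b0 + b1*x by OLS on many simulated samples.

    Returns the estimated slopes and the classical (constant-variance)
    standard errors that the usual formula reports for each fit.
    All numeric settings here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(1.0, 10.0, n)            # fixed, known design points
    X = np.column_stack([np.ones(n), x])     # design matrix with intercept
    XtX_inv = np.linalg.inv(X.T @ X)

    # Error sd is constant, or grows with x when question #3 is violated
    sd = 0.1 * x**2 if heteroskedastic else np.full(n, 2.0)

    slopes = np.empty(n_sims)
    classical_se = np.empty(n_sims)
    for s in range(n_sims):
        y = 1.0 + 2.0 * x + rng.normal(0.0, sd)   # true intercept 1, slope 2
        beta = XtX_inv @ X.T @ y                  # OLS estimate
        resid = y - X @ beta
        sigma2_hat = resid @ resid / (n - 2)      # usual variance estimate
        slopes[s] = beta[1]
        classical_se[s] = np.sqrt(sigma2_hat * XtX_inv[1, 1])
    return slopes, classical_se

if __name__ == "__main__":
    for flag in (False, True):
        slopes, se = simulate_slopes(heteroskedastic=flag)
        print(f"heteroskedastic={flag}: mean slope={slopes.mean():.3f}, "
              f"actual sd={slopes.std(ddof=1):.3f}, reported se={se.mean():.3f}")
```

In both settings the average slope should sit close to the true value of 2, which matches the point that unequal variances do not bias the point estimate. With unequal variances, though, the actual spread of the slopes and the reported standard error no longer agree, so intervals and tests built on the usual formula can mislead.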