How to evaluate the suitability and goodness of a fit?

Question

kirbykirby · Answer

Are you doing a Chi-Square Goodness of Fit test?

kirbykirby · Answer

Or is this a general question about any kind of goodness-of-fit test?

anonymous · Answer

any kind on a scatter plot

anonymous · Answer

line of best fit

kirbykirby · Answer

There are various ways to make this assessment. 
- You can look at the $R^2$ value and see if it's close to 1, which indicates a strong linear relationship (positive or negative; $R^2$ alone does not give the direction). If you are dealing with multiple linear regression, you should look at the adjusted $R^2$, which corrects for the artificial increase in $R^2$ that comes from adding more explanatory variates to the model. You can't rely too heavily on $R^2$, though, since it says nothing about whether there is collinearity or whether there are enough explanatory variables to adequately describe the response (this is quite tricky to know in general).
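To make the two quantities concrete, here is a minimal sketch in plain Python. The data are made up for illustration, and the helper names (`fit_line`, `r_squared`, `adjusted_r_squared`) are my own, not from any library. It uses the usual definitions $R^2 = 1 - SSE/SST$ and adjusted $R^2 = 1 - (1 - R^2)(n-1)/(n-p-1)$, with $p$ explanatory variables.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(y, fitted):
    """R^2 = 1 - SSE/SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p explanatory variables
    (the intercept is not counted in p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Made-up illustrative data (an assumption, not from the question).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n=len(y), p=1)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

The adjusted value is never larger than the plain $R^2$ (for $R^2 \le 1$), and the gap widens as $p$ grows relative to $n$.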

- Checking residual plots (residuals vs. each explanatory variable) is important. You should do this to make sure that the effect of the explanatory variables on the response is in fact linear. Again, it can also indicate whether explanatory variables are missing. When you make the plot, you should see no systematic pattern, just random scatter around zero. If there is a trend, you may need to add higher-order terms to the model (like $x^2$, $x^3$, etc.).

You should also check the assumption of constant variance. You can do this by plotting the residuals against the fitted values. If a pattern emerges (for example, a funnel shape where the spread grows with the fitted values), the assumption has likely been violated.
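As a rough numeric stand-in for the two residual plots, the sketch below fits a straight line to data that are actually quadratic and lists the residuals in order of the explanatory variable. It also computes a crude spread comparison between the low and high ends of the fitted values. Both helpers (`residuals_of_line`, `spread_ratio`) are hypothetical names of my own, and the spread ratio is only an informal check, not a formal test.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single explanatory variable."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals_of_line(x, y):
    """Return (fitted values, residuals) for a straight-line fit."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    return fitted, [yi - fi for yi, fi in zip(y, fitted)]


def spread_ratio(fitted, resid):
    """Mean squared residual in the upper half of the fitted values divided
    by that in the lower half. A ratio far from 1 hints at non-constant
    variance. Informal only; assumes the lower half is not all zeros."""
    pairs = sorted(zip(fitted, resid))
    half = len(pairs) // 2
    low = sum(r * r for _, r in pairs[:half]) / half
    high = sum(r * r for _, r in pairs[-half:]) / half
    return high / low


# Illustrative data: the true relationship is y = x^2, so a line is the wrong model.
x = [0, 1, 2, 3, 4, 5, 6]
y = [xi ** 2 for xi in x]

fitted, resid = residuals_of_line(x, y)
for xi, ri in zip(x, resid):
    print(f"x = {xi}: residual = {ri:+.1f}")

ratio = spread_ratio(fitted, resid)
print(f"spread ratio (upper half / lower half) = {ratio:.2f}")
```

Here the residuals are positive at both ends and negative in the middle, which is the kind of systematic curve that suggests adding an $x^2$ term. In practice you would still draw the actual plots.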

- You should also check the normality of the errors. This can be done with a QQ-plot (quantile-quantile plot): the ordered residuals are plotted against the standard normal ($N(0,1)$) order statistics. If the normality assumption holds, the plot should be approximately a straight line.
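A small sketch of how the points of a QQ-plot can be built, using only the standard library. Two assumptions of mine: the plotting positions $(i - 0.5)/n$ are used to approximate the normal order statistics, and the residual values are invented for illustration. The correlation between the two coordinates is an informal measure of how straight the plot is.

```python
from statistics import NormalDist


def qq_points(resid):
    """Pair each ordered residual with an approximate standard normal quantile.
    Uses plotting positions (i - 0.5)/n for i = 1..n (an assumed convention)."""
    n = len(resid)
    ordered = sorted(resid)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def correlation(pairs):
    """Pearson correlation between the two coordinates of the points."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5


# Invented residuals, for illustration only.
resid = [0.3, -1.2, 0.8, -0.1, 1.5, -0.6, 0.1]

points = qq_points(resid)
for theo, obs in points:
    print(f"normal quantile {theo:+.3f}   ordered residual {obs:+.3f}")

qq_corr = correlation(points)
print(f"correlation of QQ points = {qq_corr:.3f}")
```

A correlation close to 1 is consistent with a roughly straight QQ-plot, but with small samples it is easy to be fooled, so looking at the plot itself is still worthwhile.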

- Since independence of the errors is assumed in these models, you can check whether the residuals are correlated. This is harder to do by inspection, though. If the data were collected close together in time or space, and there is a reason to expect correlation (like weather on a day-by-day basis), then you should scrutinize how the data were collected. If you are computing confidence intervals for the estimates of your regression parameters, correlation can be problematic because it affects the variance (and standard deviation) of those estimates. Further analysis of this belongs to the topic of time series. (You could also look into the Durbin-Watson test for this.)
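For reference, the Durbin-Watson statistic itself is simple to compute from residuals taken in their time (or collection) order: $d = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. A minimal sketch, with made-up residual sequences and a function name of my own:

```python
def durbin_watson(resid):
    """Durbin-Watson statistic for residuals listed in collection order.
    Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation,
    values toward 0 suggest positive and toward 4 negative autocorrelation."""
    numerator = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    denominator = sum(e ** 2 for e in resid)
    return numerator / denominator


# Made-up residual sequences, for illustration only.
slowly_drifting = [1.0, 0.9, 0.7, 0.4, 0.0, -0.4, -0.7, -0.9, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_drift = durbin_watson(slowly_drifting)
dw_alt = durbin_watson(alternating)
print(f"slowly drifting residuals: d = {dw_drift:.2f}")
print(f"alternating residuals:     d = {dw_alt:.2f}")
```

Deciding whether a given $d$ is significantly different from 2 needs the Durbin-Watson bounds tables (or software), which this sketch does not attempt.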

- It's also possible that you need to transform the data, usually the response (for example, with a logarithmic transformation), to make the model fit better (you can look into Box-Cox transformations).
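To show what a Box-Cox search looks like, here is a hedged sketch for a single explanatory variable. The transformation is $y^{(\lambda)} = (y^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and $\log y$ for $\lambda = 0$, and each candidate $\lambda$ is scored by the profile log-likelihood $-\tfrac{n}{2}\log(SSE_\lambda / n) + (\lambda - 1)\sum \log y_i$. The data, the small grid of $\lambda$ values, and the function names are assumptions of mine for illustration.

```python
import math


def box_cox(y, lam):
    """Box-Cox transform of a strictly positive response."""
    if any(yi <= 0 for yi in y):
        raise ValueError("Box-Cox requires strictly positive responses")
    if lam == 0:
        return [math.log(yi) for yi in y]
    return [(yi ** lam - 1) / lam for yi in y]


def sse_of_line(x, z):
    """Sum of squared errors after fitting a least-squares line of z on x."""
    n = len(x)
    x_bar, z_bar = sum(x) / n, sum(z) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return sum((zi - intercept - slope * xi) ** 2 for xi, zi in zip(x, z))


def profile_loglik(x, y, lam):
    """Profile log-likelihood of lambda (up to an additive constant)."""
    n = len(y)
    sse = sse_of_line(x, box_cox(y, lam))
    return -n / 2 * math.log(sse / n) + (lam - 1) * sum(math.log(yi) for yi in y)


# Illustrative data: roughly exponential growth with small fixed perturbations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.05, -0.03, 0.02, -0.04, 0.03, -0.02, 0.04, -0.05]
y = [math.exp(0.5 * xi + e) for xi, e in zip(x, noise)]

lambdas = [-1.0, -0.5, 0.0, 0.5, 1.0]
best_lambda = max(lambdas, key=lambda lam: profile_loglik(x, y, lam))
print(f"best lambda on the grid: {best_lambda}")
```

Because the data were generated to be linear on the log scale, the grid search picks $\lambda = 0$, i.e. the logarithmic transformation. With real data you would use a finer grid and usually round to an interpretable value.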

- Also, a note on model building: you can do hypothesis tests on the regression parameters, where the null hypothesis is that one (or several) regression parameter(s) = 0. You can then perform an F-test: find the SSE of the model with no constraints, then fit the regression model subject to the constraints and calculate the new SSE. From these you can compute the F ratio and check whether your calculated value exceeds the critical value, which tells you whether to reject or fail to reject the null hypothesis, i.e. whether the constrained (smaller) model fits the data adequately.
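The F-test described above can be sketched for the simplest case: the full model is $y = \beta_0 + \beta_1 x$ and the constraint is $\beta_1 = 0$, so the constrained model is just the mean of $y$. The F ratio is $F = \frac{(SSE_{\text{constrained}} - SSE_{\text{full}})/q}{SSE_{\text{full}}/(n - p)}$, with $q$ constraints and $p$ parameters in the full model. The data and function names are my own for illustration, and the critical value is the standard 5% table value for 1 and 4 degrees of freedom.

```python
def sse_full_line(x, y):
    """SSE of the unconstrained model y = b0 + b1*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def sse_mean_only(y):
    """SSE of the constrained model y = b0 (slope forced to 0)."""
    y_bar = sum(y) / len(y)
    return sum((yi - y_bar) ** 2 for yi in y)


def f_ratio(sse_constrained, sse_full, q, n, p):
    """F ratio for q constraints, n observations, p parameters in the full model."""
    return ((sse_constrained - sse_full) / q) / (sse_full / (n - p))


# Made-up illustrative data (an assumption, not from the question).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

n, p, q = len(y), 2, 1
f_value = f_ratio(sse_mean_only(y), sse_full_line(x, y), q, n, p)

# Upper 5% point of the F distribution with (1, 4) degrees of freedom, from tables.
F_CRIT_1_4 = 7.71

print(f"F = {f_value:.1f}, critical value = {F_CRIT_1_4}")
print("reject H0: slope = 0" if f_value > F_CRIT_1_4 else "fail to reject H0: slope = 0")
```

The same ratio works for several constraints at once in multiple regression; only the two SSE values, $q$, and $p$ change.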