This is a large topic which requires several posts or several book chapters. During a conference in Sweden in 2010, a Swedish statistician asked me: "What is Econometrics, anyway? What tools does it use?" I said: "Among others, it uses linear regression." He said: "But linear regression is a general statistical tool, why do they say it's a part of Econometrics?" My answer was: "Yes, it's a general tool but the name Econometrics emphasizes that the motivation for its applications lies in Economics".
Both classical assumptions and their violations should be studied with this point in mind: What is the Economics and Math behind each assumption?
A6. The model is linear in parameters and correctly specified.
When the model is not linear in parameters, you can think of nonlinear alternatives. Instead of saying "correctly specified," I say "true model" whenever a "wrong model" is a possibility.
A1. What if the existence condition is violated? If the variance of the regressor is zero, the OLS estimator does not exist. The fitted line would have to be vertical, and instead you can regress $x$ on $y$. In the case of multiple regression, violation of the existence condition leads to multicollinearity, and that's where economic considerations are important.
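To see the existence problem concretely, here is a minimal numerical sketch (in Python with NumPy; the setup is my own illustration, not from the original post):

```python
import numpy as np

# A constant regressor has zero variance: all points lie on a vertical line.
x = np.full(10, 5.0)                  # every observation of x equals 5
rng = np.random.default_rng(0)
y = 1.0 + rng.normal(size=10)

# The OLS slope divides by sum((x_i - x_bar)^2); here that sum is zero,
# so the slope estimator simply does not exist.
denominator = np.sum((x - x.mean()) ** 2)
print(denominator)   # 0.0 -- the slope formula would require division by zero
```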
A2. The convenience condition is called that because it keeps the theory simple. When it is violated, that is, when the regressor is stochastic, there are ways to deal with the problem: finite-sample theory and large-sample theory.
A3. What if the errors in (1) have means different from zero? This question can be divided into two: 1) the means of the errors are the same, $Ee_i=c$ for all $i$, and 2) the means are different, $Ee_i=c_i$. Read the post about centering and see if you can come up with the answer to the first question. The means may be different because of omission of a relevant variable (can you do the math?). In the absence of data on such a variable, there is nothing you can do.
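The first sub-question can be illustrated by simulation. In this hedged sketch (my own parameter choices), all errors share the same nonzero mean $c$; the slope estimator stays unbiased while the intercept estimator absorbs $c$:

```python
import numpy as np

rng = np.random.default_rng(42)
a, b, c = 1.0, 2.0, 3.0            # true intercept, slope, and common error mean
n, n_sims = 100, 2000

x = rng.uniform(0, 10, n)          # fixed regressor, same across simulations
slopes, intercepts = [], []
for _ in range(n_sims):
    e = rng.normal(loc=c, scale=1.0, size=n)   # errors with mean c, not 0
    y = a + b * x + e
    b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    a_hat = y.mean() - b_hat * x.mean()
    slopes.append(b_hat)
    intercepts.append(a_hat)

print(np.mean(slopes))      # close to b = 2: the slope is still unbiased
print(np.mean(intercepts))  # close to a + c = 4: the intercept picks up c
```

This is why a common nonzero error mean is harmless: it merely shifts the intercept.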
Here we derived the OLS estimators of the intercept and slope in model (1), $y_i=a+bx_i+e_i$:

$$\hat{a}=\bar{y}-\hat{b}\bar{x},\qquad \hat{b}=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}.\qquad(2)$$
A1. Existence condition. Since division by zero is not allowed, for (2) to exist we require $\sum_{i=1}^n(x_i-\bar{x})^2\neq 0$. If this condition is not satisfied, then there is no variance in $x$, and all observed points are on a vertical line.
A2. Convenience condition. The regressor $x$ is deterministic. This condition is imposed to be able to apply the properties of expectation, see equation (7) in this post. The time trend and dummy variables are examples of deterministic regressors. However, most real-life regressors are stochastic. Modifying the theory to cover stochastic regressors is the subject of two posts: finite-sample theory and large-sample theory.
A3. Unbiasedness condition. $Ee_i=0$ for all $i$. This is the main assumption that makes sure that the OLS estimators are unbiased, see equation (7) in this post.
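As a quick numerical sanity check (a sketch with my own simulated data), the textbook expressions for the slope, $\sum(x_i-\bar{x})(y_i-\bar{y})/\sum(x_i-\bar{x})^2$, and the intercept, $\bar{y}-\hat{b}\bar{x}$, agree with NumPy's built-in least-squares fit:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 1.0 + 2.0 * x + rng.normal(size=50)

# Slope: sum of cross-products over sum of squared deviations of x
b_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: forces the fitted line through the point of means
a_hat = y.mean() - b_hat * x.mean()

b_np, a_np = np.polyfit(x, y, 1)   # degree-1 fit returns (slope, intercept)
agree = np.allclose([b_hat, a_hat], [b_np, a_np])
print(agree)                       # True
```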
Unbiasedness is not enough
Unbiasedness characterizes the quality of an estimator, see the intuitive explanation. Unfortunately, unbiasedness is not enough to choose the best estimator because of nonuniqueness: usually, if there is one unbiased estimator of a parameter, then there are infinitely many unbiased estimators of the same parameter. For example, we know that the sample mean unbiasedly estimates the population mean: $E\bar{X}=EX$. Since $EX_1=EX$ ($X_1$ is the first observation), we can easily construct an infinite family of unbiased estimators $\hat{\mu}=a\bar{X}+bX_1$, assuming $a+b=1$. Indeed, using linearity of expectation, $E\hat{\mu}=aE\bar{X}+bEX_1=(a+b)EX=EX$.
Variance is another measure of estimator quality: to have a lower spread of estimator values, among competing estimators we choose the one with the lowest variance. Knowing the estimator's variance allows us to find the z-score and use statistical tables.
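A quick simulation (my own illustration) confirms both points: the sample mean and the combination $0.5\bar{X}+0.5X_1$ are both unbiased for $\mu$, but the sample mean has a much smaller variance:

```python
import numpy as np

rng = np.random.default_rng(7)
mu, n, n_sims = 5.0, 50, 20000

samples = rng.normal(mu, 1.0, size=(n_sims, n))
mean_est = samples.mean(axis=1)                    # sample mean of each sample
mixed_est = 0.5 * mean_est + 0.5 * samples[:, 0]   # a*Xbar + b*X1 with a + b = 1

print(mean_est.mean(), mixed_est.mean())   # both close to mu: both unbiased
print(mean_est.var(), mixed_est.var())     # about 1/n = 0.02 vs roughly 0.26
```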
Slope estimator variance
It is not difficult to find the variance of the slope estimator using representation (6) derived here:

$$\hat{b}=b+\frac{1}{n\widehat{Var}(x)}\sum_{i=1}^n(x_i-\bar{x})e_i,$$

where $\widehat{Var}(x)=\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2$ denotes the sample variance of $x$.
Don't try to apply the definition of variance directly at this point: there will be a square of a sum, which upon squaring leads to a double sum. We need two new assumptions.
A4. Uncorrelatedness of errors. Assume that $Cov(e_i,e_j)=0$ for all $i\neq j$ (errors from different equations (1) are uncorrelated). Note that because of the unbiasedness condition, this assumption is equivalent to $Ee_ie_j=0$ for all $i\neq j$. This assumption is likely to be satisfied if we observe consumption patterns of unrelated individuals.
A5. Homoscedasticity. All errors have the same variances: $Var(e_i)=\sigma^2$ for all $i$. Again, because of the unbiasedness condition, this assumption is equivalent to $Ee_i^2=\sigma^2$ for all $i$.
Now we can derive the variance expression, using properties from this post:

$$Var(\hat{b})=Var\left(b+\frac{1}{n\widehat{Var}(x)}\sum_{i=1}^n(x_i-\bar{x})e_i\right)=Var\left(\frac{1}{n\widehat{Var}(x)}\sum_{i=1}^n(x_i-\bar{x})e_i\right)$$

(dropping a constant doesn't affect variance)

$$=\sum_{i=1}^nVar\left(\frac{x_i-\bar{x}}{n\widehat{Var}(x)}\,e_i\right)$$

(for uncorrelated variables, variance is additive)

$$=\sum_{i=1}^n\left(\frac{x_i-\bar{x}}{n\widehat{Var}(x)}\right)^2Var(e_i)=\frac{\sigma^2}{(n\widehat{Var}(x))^2}\sum_{i=1}^n(x_i-\bar{x})^2$$

(variance is homogeneous of degree 2; by homoscedasticity, all $Var(e_i)=\sigma^2$)

$$=\frac{\sigma^2\,n\widehat{Var}(x)}{(n\widehat{Var}(x))^2}=\frac{\sigma^2}{n\widehat{Var}(x)}$$

(using the notation of sample variance)
Note that canceling out the two variances in the last line is obvious. It is not so obvious for some if, instead of the short notation for variances, you use summation signs. The case of the intercept variance is left as an exercise.
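The result $Var(\hat{b})=\sigma^2/\sum_{i=1}^n(x_i-\bar{x})^2$ can be verified by Monte Carlo simulation (a sketch with my own parameter choices): under uncorrelated, homoscedastic errors, the empirical variance of the slope estimates over many samples should be close to the theoretical value:

```python
import numpy as np

rng = np.random.default_rng(3)
n, n_sims, sigma = 30, 20000, 2.0
x = rng.uniform(0, 10, n)              # fixed regressor across simulations

b_hats = []
for _ in range(n_sims):
    e = rng.normal(0, sigma, n)        # uncorrelated, homoscedastic errors
    y = 1.0 + 2.0 * x + e
    b_hats.append(np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2))

theoretical = sigma ** 2 / np.sum((x - x.mean()) ** 2)   # sigma^2 / (n * Var(x))
print(np.var(b_hats), theoretical)     # the two numbers should be close
```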
The above assumptions A1-A5 are called classical. It is necessary to remember their role in derivations, because a considerable part of Econometrics is devoted to deviations from classical assumptions. Once a certain assumption is violated, you should expect the corresponding estimator property to be invalidated. For example, if $Ee_i\neq 0$, you should expect the estimators to be biased. If any of A4-A5 is not true, the formula we have derived,

$$Var(\hat{b})=\frac{\sigma^2}{n\widehat{Var}(x)},$$

will not hold. Besides, the Gauss-Markov theorem that the OLS estimators are efficient will not hold (this will be discussed later). The pair A4-A5 can be called an efficiency condition.
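To illustrate A5's role (my own heteroscedastic design, not from the text): when error variances grow with $x$, the classical formula no longer matches the true variance of $\hat{b}$. The correct expression becomes $\sum_i a_i^2 Var(e_i)$ with $a_i=(x_i-\bar{x})/\sum_j(x_j-\bar{x})^2$:

```python
import numpy as np

rng = np.random.default_rng(9)
n, n_sims = 50, 20000
x = np.linspace(1.0, 10.0, n)        # fixed regressor
sig = 0.2 * x ** 2                   # heteroscedastic: sd of e_i grows with x_i
d = x - x.mean()

b_hats = []
for _ in range(n_sims):
    y = 1.0 + 2.0 * x + rng.normal(0.0, sig)   # Var(e_i) = sig_i^2, not constant
    b_hats.append(np.sum(d * (y - y.mean())) / np.sum(d ** 2))

empirical = np.var(b_hats)
# Classical formula with an "average" error variance -- invalid here
classical = np.mean(sig ** 2) / np.sum(d ** 2)
# Correct variance under heteroscedasticity: sum a_i^2 Var(e_i)
correct = np.sum(d ** 2 * sig ** 2) / np.sum(d ** 2) ** 2
print(empirical, classical, correct)   # empirical matches `correct`, not `classical`
```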