7
Aug 17

## Violations of classical assumptions

This is a large topic which requires several posts or several book chapters. During a conference in Sweden in 2010, a Swedish statistician asked me: "What is Econometrics, anyway? What tools does it use?" I said: "Among others, it uses linear regression." He said: "But linear regression is a general statistical tool, why do they say it's a part of Econometrics?" My answer was: "Yes, it's a general tool but the name Econometrics emphasizes that the motivation for its applications lies in Economics".

Both classical assumptions and their violations should be studied with this point in mind: What is the Economics and Math behind each assumption?

## Violations of the first three assumptions

We consider the simple regression

(1) $y_i=a+bx_i+e_i$

Make sure to review the assumptions. Their numbering and names sometimes are different from what Dougherty's book has. In particular, most of the time I omit the following assumption:

A6. The model is linear in parameters and correctly specified.

When it is not linear in parameters, you can think of nonlinear alternatives. Instead of saying "correctly specified" I say "true model" when a "wrong model" is available.

A1. What if the existence condition is violated? If variance of the regressor is zero, the OLS estimator does not exist. The fitted line is supposed to be vertical, and you can regress $x$ on $y$. Violation of the existence condition in case of multiple regression leads to multicollinearity, and that's where economic considerations are important.

A2. The convenience condition is called so because when it is violated, that is, the regressor is stochastic, there are ways to deal with this problem:  finite-sample theory and large-sample theory.

A3. What if the errors in (1) have means different from zero? This question can be divided in two: 1) the means of the errors are the same: $Ee_i=c\ne 0$ for all $i$ and 2) the means are different. Read the post about centering and see if you can come up with the answer for the first question. The means may be different because of omission of a relevant variable (can you do the math?). In the absence of data on such a variable, there is nothing you can do.

Violations of A4 and A5 will be treated later.