21
Jan 23

## Excel for mass education

### Problem statement

Covid, with its lockdowns, has posed a difficult question: how do you teach online and prevent students from cheating? How do you do that efficiently with a large number of students and without lowering teaching standards? I think the answer depends on what you teach. Using Excel made my course very attractive because many students love learning Excel functions.

### Suggested solution

Last year I taught Financial Econometrics. The topic was Portfolio Optimization using the Sharpe ratio. The idea was to give the students Excel files with individual sample sizes so that they have to do the calculations themselves. Those who obtained a file from another student and sent it to me under their own names were easily identified. I punished both the giver and the receiver of the file. Some steps of assignment preparation and report checking may be very time-consuming if you don’t automate them. In the following list, the starred steps are the ones that may take a lot of time with large groups of students.

Step 1. I download data for several stocks from Yahoo Finance and put them in one Excel file where I have the students’ list (Video 1).

Step 2. For each student I randomly choose the sample size for the chunk of data to be selected from the data I downloaded. The students are required to use the whole sample in their calculations.

Step 3*. To create individual student files with assignments, I use a Visual Basic macro. It reads a student's name and sample size, creates an Excel file, pastes the appropriate sample there and saves the file under that student’s name (Video 2).

Step 4*. In Gmail I prepare messages with individual Excel files. Gmail has an option for scheduling emails (Video 3). Outlook.com also has this feature but it requires too many clicks.

Step 5. The test is administered using MS Teams. At the beginning of the test, I give the necessary oral instructions and post the assignment description (which is common to all students). The emails are scheduled to be sent 10 minutes after the session starts. The time for the test is just enough to do the calculations in Excel. I cannot control how the students do it, nor can I see if they share screens to help each other. But I know that the task is difficult enough that one needs to be familiar with the material to accomplish it, even when one sees on the screen how somebody else is doing it.

Step 6*. Upon completion of the test, the students email me their files. The arrival times of the messages are recorded by Gmail. I have to check the files and post the grades (Video 4).

### Skills to test

Portfolio Optimization involves the following steps.

a) For each stock one has to find daily rates of return.

b) Using arbitrary initial portfolio shares, the daily rates of return on the portfolio are calculated. I require the students to use matrix multiplication for this, which makes checking their work easier.

c) The daily rates of return on the portfolio are used to find the average return, standard deviation and Sharpe ratio for the portfolio. The fact that after all these calculations the students have to obtain a single number also simplifies verification.

d) Finally, the students have to optimize the portfolio shares using the Solver add-in.

The list above is just an example. The task can be expanded to check the knowledge of other elements of matrix algebra, Econometrics and/or Finance. In one of my assignments, I required my students to run a multiple regression. The Excel add-in called Data Analysis allows one to do that easily but my students were required to do everything using the matrix expression for the OLS estimator and also to report the results using Excel string functions.
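For readers who want to cross-check a student's numbers outside Excel, steps a) through d) can be sketched in Python. This is only an illustration under assumptions of my own: the prices are made up, the risk-free rate is taken to be zero, the standard deviation is the sample one (as in Excel's STDEV.S), and a crude grid search over two shares stands in for the Solver add-in.

```python
import math

# Hypothetical closing prices for two stocks over five days (not real data).
prices = [
    [100.0, 50.0],
    [102.0, 49.0],
    [101.0, 50.5],
    [103.0, 51.0],
    [104.0, 50.0],
]

def daily_returns(prices):
    """Step a): simple daily rate of return for each stock."""
    return [
        [today / yesterday - 1.0 for today, yesterday in zip(row, prev)]
        for prev, row in zip(prices, prices[1:])
    ]

def portfolio_returns(returns, shares):
    """Step b): matrix product of the returns matrix and the vector of shares."""
    return [sum(r * w for r, w in zip(row, shares)) for row in returns]

def sharpe_ratio(port, risk_free=0.0):
    """Step c): (average return - risk-free rate) / sample standard deviation."""
    mean = sum(port) / len(port)
    var = sum((p - mean) ** 2 for p in port) / (len(port) - 1)
    return (mean - risk_free) / math.sqrt(var)

returns = daily_returns(prices)
shares = [0.5, 0.5]  # arbitrary initial shares that sum to 1
port = portfolio_returns(returns, shares)
initial_sharpe = sharpe_ratio(port)

# Step d): grid search over the share of the first stock (a stand-in for Solver).
best_sharpe, best_share = max(
    (sharpe_ratio(portfolio_returns(returns, [w / 100, 1 - w / 100])), w / 100)
    for w in range(101)
)
print(initial_sharpe, best_sharpe, best_share)
```

Because every student ends with a single number (the optimized Sharpe ratio), a script of this kind only has to compare one value per file.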

To make my job easier, I partially or completely automate time-consuming operations. Arguably, everything can be completely automated using Power Automate promoted by Microsoft. Except for the macro, my home-made solutions are simpler.

### Detailed explanations

How to make Gmail your mailto protocol handler

Video 1. Initial file

Video 2. Creating Excel individual files

Video 3. Scheduling emails

Video 4. How to quickly check students' work

Macro for creating files

    Sub CreateDataFiles()
    '
    ' Needs a sheet "block finec" with student names (column A), block sizes (column C)
    ' and the data to cut blocks from (columns E through M, matching R1C5 and C13 below).
    ' Creates one file per student, named after the student, with that student's data block.
    ' Edit whatever you need, including the range address:
    ' R1C5 is the upper left corner of the data,
    ' "R" & Size & "C13" is the lower right corner, and Size is read off column C.
    '
    ' First select the cells with block sizes and then run the macro.
    ' Keyboard Shortcut: Ctrl+i
    '
    Dim cell As Range
    Dim Size As Long
    Dim StudentName As String

    Application.ScreenUpdating = False
    For Each cell In Selection.Cells

        Size = cell.Value
        StudentName = cell.Offset(0, -2).Value   ' the name is two columns to the left (column A)

        ' Copy the first Size rows of the data
        Application.Goto Reference:="R1C5:R" & Size & "C13"
        Application.CutCopyMode = False
        Selection.Copy

        ' Assumed step: paste into a new workbook; otherwise the list workbook
        ' itself would be saved under the student's name and closed
        Workbooks.Add
        ActiveSheet.Paste

        ChDir "C:\Users\Student files"
        ActiveWorkbook.SaveAs Filename:= _
            "C:\Users\Student files\" & StudentName & ".xlsx", _
            FileFormat:=xlOpenXMLWorkbook, CreateBackup:=False
        ActiveWorkbook.Close

        ' Return to the workbook with the student list
        Workbooks("Stat 2 Spring 2022 list with emails.xlsm").Activate

    Next cell
    Application.ScreenUpdating = True
    End Sub

19
Feb 22

## Distribution of the estimator of the error variance

If you are reading the book by Dougherty: this post is about the distribution of the estimator $s^2$ defined in Chapter 3.

Consider regression

(1) $y=X\beta +e$

where the deterministic matrix $X$ is of size $n\times k,$ satisfies $\det \left( X^{T}X\right) \neq 0$ (regressors are not collinear) and the error $e$ satisfies

(2) $Ee=0,Var(e)=\sigma ^{2}I$

$\beta$ is estimated by $\hat{\beta}=(X^{T}X)^{-1}X^{T}y.$ Denote $P=X(X^{T}X)^{-1}X^{T},$ $Q=I-P.$ Using (1) we see that $\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e$ and the residual $r\equiv y-X\hat{\beta}=Qe.$ $\sigma^{2}$ is estimated by

(3) $s^{2}=\left\Vert r\right\Vert ^{2}/\left( n-k\right) =\left\Vert Qe\right\Vert ^{2}/\left( n-k\right) .$

$Q$ is a projector and has properties which are derived from those of $P$

(4) $Q^{T}=Q,$ $Q^{2}=Q.$

If $\lambda$ is an eigenvalue of $Q,$ then multiplying $Qx=\lambda x$ by $Q$ and using the fact that $x\neq 0$ we get $\lambda ^{2}=\lambda .$ Hence eigenvalues of $Q$ can be only $0$ or $1.$ The equation $tr\left( Q\right) =n-k$ tells us that the number of eigenvalues equal to 1 is $n-k$ and the remaining $k$ are zeros. Let $Q=U\Lambda U^{T}$ be the diagonal representation of $Q.$ Here $U$ is an orthogonal matrix,

(5) $U^{T}U=I,$

and $\Lambda$ is a diagonal matrix with eigenvalues of $Q$ on the main diagonal. We can assume that the first $n-k$ numbers on the diagonal of $\Lambda$ are ones and the others are zeros.

Theorem. Let $e$ be normal. 1) $s^{2}\left( n-k\right) /\sigma ^{2}$ is distributed as $\chi _{n-k}^{2}.$ 2) The estimators $\hat{\beta}$ and $s^{2}$ are independent.

Proof. 1) We have by (4)

(6) $\left\Vert Qe\right\Vert ^{2}=\left( Qe\right) ^{T}Qe=\left( Q^{T}Qe\right) ^{T}e=\left( Qe\right) ^{T}e=\left( U\Lambda U^{T}e\right) ^{T}e=\left( \Lambda U^{T}e\right) ^{T}U^{T}e.$

Denote $S=U^{T}e.$ From (2) and (5)

$ES=0,$ $Var\left( S\right) =EU^{T}ee^{T}U=\sigma ^{2}U^{T}U=\sigma ^{2}I$

and $S$ is normal as a linear transformation of a normal vector. It follows that $S=\sigma z$ where $z$ is a standard normal vector with independent standard normal coordinates $z_{1},...,z_{n}.$ Hence, (6) implies

(7) $\left\Vert Qe\right\Vert ^{2}=\sigma ^{2}\left( \Lambda z\right) ^{T}z=\sigma ^{2}\left( z_{1}^{2}+...+z_{n-k}^{2}\right) =\sigma ^{2}\chi _{n-k}^{2}.$

(3) and (7) prove the first statement.

2) First we note that the vectors $Pe,Qe$ are independent. Since they are jointly normal (both are linear transformations of the same normal vector $e$), their independence follows from

$cov(Pe,Qe)=EPee^{T}Q^{T}=\sigma ^{2}PQ=0.$

It's easy to see that $X^{T}P=X^{T}.$ This allows us to show that $\hat{\beta}$ is a function of $Pe$:

$\hat{\beta}=\beta +(X^{T}X)^{-1}X^{T}e=\beta +(X^{T}X)^{-1}X^{T}Pe.$

Independence of $Pe,Qe$ leads to independence of their functions $\hat{\beta}$ and $s^{2}.$
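Statement 1) of the theorem can be checked by simulation. The sketch below is my own illustration, not part of the proof: it uses simple regression ($k=2$) with made-up values of the intercept, slope and $\sigma$, so that the OLS formulas can be written without matrix inversion. A $\chi _{n-k}^{2}$ variable has mean $n-k$ and variance $2(n-k)$, and the simulated $s^{2}(n-k)/\sigma ^{2}$ should be close to both.

```python
import random

def s2_scaled(n, sigma, rng):
    """Draw one sample from y = a + b*x + e and return s^2 (n-k) / sigma^2 for k = 2."""
    x = [float(i) for i in range(n)]
    e = [rng.gauss(0.0, sigma) for _ in range(n)]
    y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]  # hypothetical a = 1, b = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    # Squared norm of the residual vector r = y - X beta_hat
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    return rss / sigma ** 2  # equals s^2 (n-k) / sigma^2

rng = random.Random(0)
n, k, sigma = 10, 2, 3.0
draws = [s2_scaled(n, sigma, rng) for _ in range(20000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
print(mean, var)  # compare with n - k = 8 and 2(n - k) = 16
```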

11
Nov 19

## My presentation at Kazakh National University

Today's talk: “Analysis of variance in the central limit theorem”
The talk presents results that combine methods of function theory, functional analysis and probability theory. The intuition underlying the central limit theorem will be described, and the history of the author's results and their place in modern theory will be highlighted.

10
Dec 18

## Distributions derived from normal variables

In the one-dimensional case the economical way to define normal variables is this: define a standard normal variable and then a general normal variable as its linear transformation.

In case of many dimensions, we follow the same idea. Before doing that we state without proofs two useful facts about independence of random variables (real-valued, not vectors).

Theorem 1. Suppose variables $X_1,...,X_n$ have densities $p_1(x_1),...,p_n(x_n).$ Then they are independent if and only if their joint density $p(x_1,...,x_n)$ is a product of individual densities: $p(x_1,...,x_n)=p_1(x_1)...p_n(x_n).$

Theorem 2. If variables $X,Y$ are normal, then they are independent if and only if they are uncorrelated: $cov(X,Y)=0.$

The necessity part (independence implies uncorrelatedness) is trivial.

### Normal vectors

Let $z_1,...,z_n$ be independent standard normal variables. A standard normal variable is defined by its density, so all of $z_i$ have the same density. We achieve independence, according to Theorem 1, by defining their joint density to be a product of individual densities.

Definition 1. A standard normal vector of dimension $n$ is defined by

$z=\left(\begin{array}{c}z_1\\...\\z_n\\ \end{array}\right)$

Properties. $Ez=0$ because all of $z_i$ have means zero. Further, $cov(z_i,z_j)=0$ for $i\neq j$ by Theorem 2 and variance of a standard normal is 1. Therefore, from the expression for variance of a vector we see that $Var(z)=I.$

Definition 2. For a matrix $A$ and vector $\mu$ of compatible dimensions a normal vector is defined by $X=Az+\mu.$

Properties. $EX=AEz+\mu=\mu$ and

$Var(X)=Var(Az)=E(Az)(Az)^T=AEzz^TA^T=AIA^T=AA^T$

(recall that variance of a vector is always nonnegative).
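A small numerical illustration of $Var(X)=AA^T$ may help. The matrix $A$ and the vector $c$ below are hypothetical; the sketch only checks that $AA^T$ is symmetric and that the quadratic form $c^TAA^Tc$ equals $\Vert A^Tc\Vert^2,$ which is why it cannot be negative.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical 3x2 matrix: X = A z + mu has three components, z has two.
A = [[1.0, 0.0],
     [2.0, 1.0],
     [0.0, 3.0]]

V = mat_mul(A, transpose(A))  # Var(X) = A A^T

# Symmetry of the variance matrix
is_symmetric = all(V[i][j] == V[j][i] for i in range(3) for j in range(3))

# Quadratic form c^T V c versus ||A^T c||^2 for an arbitrary c
c = [1.0, -1.0, 0.5]
Vc = [sum(V[i][j] * c[j] for j in range(3)) for i in range(3)]
quad_form = sum(ci * vi for ci, vi in zip(c, Vc))
At_c = [sum(A[i][j] * c[i] for i in range(3)) for j in range(2)]
norm_sq = sum(t * t for t in At_c)
print(V, quad_form, norm_sq)
```

Here $V$ is a $3\times 3$ matrix built from only two standard normals, so it is singular: nonnegative but not positive definite.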

### Distributions derived from normal variables

In the definitions of standard distributions (chi square, t distribution and F distribution) there is no reference to any sample data. Unlike statistics, which by definition are functions of sample data, these and other standard distributions are theoretical constructs. Statistics are developed in such a way as to have a distribution equal or asymptotically equal to one of standard distributions. This allows practitioners to use tables developed for standard distributions.

Exercise 1. Prove that $\chi_n^2/n$ converges to 1 in probability.

Proof. For a standard normal $z$ we have $Ez^2=1$ and $Var(z^2)=2$ (both properties can be verified in Mathematica). Hence, $E\chi_n^2/n=1$ and

$Var(\chi_n^2/n)=\sum_iVar(z_i^2)/n^2=2/n\rightarrow 0.$

Now the statement follows from the simple form of the law of large numbers.

Exercise 1 implies that for large $n$ the t distribution is close to a standard normal.

30
Nov 18

## Application: estimating sigma squared

Consider multiple regression

(1) $y=X\beta +e$

where

(a) the regressors are assumed deterministic, (b) the number of regressors $k$ is smaller than the number of observations $n,$ (c) the regressors are linearly independent, $\det (X^TX)\neq 0,$ and (d) the errors are homoscedastic and uncorrelated,

(2) $Var(e)=\sigma^2I.$

Usually students remember that $\beta$ should be estimated and don't pay attention to estimation of $\sigma^2.$ Partly this is because $\sigma^2$ does not appear in the regression and partly because the result on estimation of error variance is more complex than the result on the OLS estimator of $\beta .$

Definition 1. Let $\hat{\beta}=(X^TX)^{-1}X^Ty$ be the OLS estimator of $\beta$. $\hat{y}=X\hat{\beta}$ is called the fitted value and $r=y-\hat{y}$ is called the residual.

Exercise 1. Using the projectors $P=X(X^TX)^{-1}X^T$ and $Q=I-P$ show that $\hat{y}=Py$ and $r=Qe.$

Proof. The first equation is obvious. From the model we have $r=X\beta+e-P(X\beta +e).$ Since $PX\beta=X\beta,$ we have further $r=e-Pe=Qe.$

Definition 2. The OLS estimator of $\sigma^2$ is defined by $s^2=\Vert r\Vert^2/(n-k).$

Exercise 2. Prove that $s^2$ is unbiased: $Es^2=\sigma^2.$

Proof. Using projector properties we have

$\Vert r\Vert^2=(Qe)^TQe=e^TQ^TQe=e^TQe.$

Expectations of type $Ee^Te$ and $Eee^T$ would be easy to find from (2). However, we need to find $Ee^TQe$ where there is an obstructing $Q.$ See how this difficulty is overcome in the next calculation.

$E\Vert r\Vert^2=Ee^TQe$ ($e^TQe$ is a scalar, so its trace is equal to itself)

$=Etr(e^TQe)$ (applying trace-commuting)

$=Etr(Qee^T)$ (the regressors and hence $Q$ are deterministic, so we can use linearity of $E$)

$=tr(QEee^T)$ (applying (2)) $=\sigma^2tr(Q).$

$tr(P)=k$ because this is the dimension of the image of $P.$ Therefore $tr(Q)=n-k.$ Thus, $E\Vert r\Vert^2=\sigma^2(n-k)$ and $Es^2=\sigma^2.$
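The projector facts used above ($Q^2=Q,$ $tr(P)=k,$ $tr(Q)=n-k$) are easy to verify numerically. The following sketch uses a hypothetical $5\times 2$ matrix $X$ (a constant and one regressor) so that $(X^TX)^{-1}$ can be written with the explicit $2\times 2$ inverse formula.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical regressors: a column of ones and one variable.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 7.0]]
n, k = len(X), len(X[0])

Xt = transpose(X)
P = mat_mul(mat_mul(X, inverse_2x2(mat_mul(Xt, X))), Xt)  # P = X (X^T X)^{-1} X^T
Q = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]

trace_P = sum(P[i][i] for i in range(n))
trace_Q = sum(Q[i][i] for i in range(n))

# Largest entry of Q^2 - Q in absolute value; zero up to rounding if Q is idempotent
QQ = mat_mul(Q, Q)
max_gap = max(abs(QQ[i][j] - Q[i][j]) for i in range(n) for j in range(n))
print(trace_P, trace_Q, max_gap)
```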

18
Nov 18

## Application: Ordinary Least Squares estimator

### Generalized Pythagoras theorem

Exercise 1. Let $P$ be a projector and denote $Q=I-P.$ Then $\Vert x\Vert^2=\Vert Px\Vert^2+\Vert Qx\Vert^2.$

Proof. By the scalar product properties

$\Vert x\Vert^2=\Vert Px+Qx\Vert^2=\Vert Px\Vert^2+2(Px)\cdot (Qx)+\Vert Qx\Vert^2.$

$P$ is symmetric and idempotent, so

$(Px)\cdot (Qx)=(Px)\cdot[(I-P)x]=x\cdot[(P-P^2)x]=0.$

This proves the statement.

### Ordinary Least Squares (OLS) estimator derivation

Problem statement. A vector $y\in R^n$ (the dependent vector) and vectors $x^{(1)},...,x^{(k)}\in R^n$ (independent vectors or regressors) are given. The OLS estimator is defined as that vector $\beta \in R^k$ which minimizes the total sum of squares $TSS=\sum_{i=1}^n(y_i-x_i^{(1)}\beta_1-...-x_i^{(k)}\beta_k)^2,$ where $x_i^{(j)}$ is the $i$th component of $x^{(j)}.$

Denoting $X=(x^{(1)},...,x^{(k)}),$ we see that $TSS=\Vert y-X\beta\Vert^2$ and that finding the OLS estimator means approximating $y$ with vectors from the image $\text{Img}X.$ $x^{(1)},...,x^{(k)}$ should be linearly independent, otherwise the solution will not be unique.

Assumption. $x^{(1)},...,x^{(k)}$ are linearly independent. This, in particular, implies that $k\leq n.$

Exercise 2. Show that the OLS estimator is

(2) $\hat{\beta}=(X^TX)^{-1}X^Ty.$

Proof. By Exercise 1 we can use $P=X(X^TX)^{-1}X^T.$ Since $X\beta$ belongs to the image of $P,$ $P$ doesn't change it: $X\beta=PX\beta.$ Denoting also $Q=I-P$ we have

$\Vert y-X\beta\Vert^2=\Vert y-Py+Py-X\beta\Vert^2$

$=\Vert Qy+P(y-X\beta)\Vert^2$ (by Exercise 1)

$=\Vert Qy\Vert^2+\Vert P(y-X\beta)\Vert^2.$

This shows that $\Vert Qy\Vert^2$ is a lower bound for $\Vert y-X\beta\Vert^2.$ This lower bound is achieved when the second term is made zero. From

$P(y-X\beta)=Py-X\beta =X(X^TX)^{-1}X^Ty-X\beta=X[(X^TX)^{-1}X^Ty-\beta]$

we see that the second term is zero if $\beta$ satisfies (2).

Usually the above derivation is applied to the dependent vector of the form $y=X\beta+e$ where $e$ is a random vector with mean zero. But it holds without this assumption. See also simplified derivation of the OLS estimator.
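Formula (2) and the minimizing property can be checked on a tiny example. The data below are made up, and with $k=2$ the inverse of $X^TX$ is written out explicitly; the sketch computes $\hat{\beta},$ verifies that the residual is orthogonal to the regressors and confirms that moving away from $\hat{\beta}$ increases $\Vert y-X\beta\Vert^2.$

```python
# Hypothetical data: a column of ones and one regressor.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 8.0]

Xt = [list(col) for col in zip(*X)]
XtX = [[sum(u * v for u, v in zip(r, c)) for c in Xt] for r in Xt]
Xty = [sum(u * v for u, v in zip(r, y)) for r in Xt]

# beta_hat = (X^T X)^{-1} X^T y with the explicit 2x2 inverse
(a, b), (c, d) = XtX
det = a * d - b * c
beta_hat = [(d * Xty[0] - b * Xty[1]) / det, (-c * Xty[0] + a * Xty[1]) / det]

def sum_of_squares(beta):
    """||y - X beta||^2 for a candidate beta."""
    return sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )

resid = [yi - sum(xij * bj for xij, bj in zip(row, beta_hat)) for row, yi in zip(X, y)]
Xt_resid = [sum(u * v for u, v in zip(r, resid)) for r in Xt]  # should be near zero

minimum = sum_of_squares(beta_hat)
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.1, -0.05)]
perturbed = [sum_of_squares([beta_hat[0] + s0, beta_hat[1] + s1]) for s0, s1 in shifts]
print(beta_hat, minimum, perturbed)
```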

8
May 18

## Different faces of vector variance: again visualization helps

In the previous post we defined variance of a column vector $X$ with $n$ components by

$V(X)=E(X-EX)(X-EX)^T.$

In terms of elements this is the same as:

(1) $V(X)=\left(\begin{array}{cccc}V(X_1)&Cov(X_1,X_2)&...&Cov(X_1,X_n)\\Cov(X_2,X_1)&V(X_2)&...&Cov(X_2,X_n)\\...&...&...&...\\Cov(X_n,X_1)&Cov(X_n,X_2)&...&V(X_n)\end{array}\right).$

## So why is knowing the structure of this matrix so important?

Let $X_1,...,X_n$ be random variables and let $a_1,...,a_n$ be numbers. In the derivation of the variance of the slope estimator for simple regression we have to deal with the expression of type

(2) $V\left(\sum_{i=1}^na_iX_i\right).$

Question 1. How do you multiply a sum by a sum? I mean, how do you use summation signs to find the product $\left(\sum_{i=1}^na_i\right)\left(\sum_{i=1}^nb_i\right)$?

Answer 1. Whenever you have problems with summation signs, try to do without them. The product

$\left(a_1+...+a_n\right)\left(b_1+...+b_n\right)=a_1b_1+...+a_1b_n+...+a_nb_1+...+a_nb_n$

should contain ALL products $a_ib_j.$ Again, a matrix visualization will help:

$\left(\begin{array}{ccc}a_1b_1&...&a_1b_n\\...&...&...\\a_nb_1&...&a_nb_n\end{array}\right).$

The product we are looking for should contain all elements of this matrix. So the answer is

(3) $\left(\sum_{i=1}^na_i\right)\left(\sum_{i=1}^nb_i\right)=\sum_{i=1}^n\sum_{j=1}^na_ib_j.$

Formally, we can write $\sum_{i=1}^nb_i=\sum_{j=1}^nb_j$ (the sum does not depend on the index of summation; this is another point many students don't understand) and then perform the multiplication in (3).

Question 2. What is the expression for (2) in terms of covariances of components?

Answer 2. If you understand Answer 1 and know the relationship between variances and covariances, it should be clear that

(4) $V\left(\sum_{i=1}^na_iX_i\right)=Cov(\sum_{i=1}^na_iX_i,\sum_{i=1}^na_iX_i)$

$=Cov(\sum_{i=1}^na_iX_i,\sum_{j=1}^na_jX_j)=\sum_{i=1}^n\sum_{j=1}^na_ia_jCov(X_i,X_j).$

Question 3. In light of (1), separate variances from covariances in (4).

Answer 3. When $i=j,$ we have $Cov(X_i,X_j)=V(X_i),$ which are diagonal elements of (1). Otherwise, for $i\neq j$ we get off-diagonal elements of (1). So the answer is

(5) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)+\sum_{i\neq j}a_ia_jCov(X_i,X_j).$

Once again, in the first sum on the right we have only variances. In the second sum, the indices $i,j$ are assumed to run from $1$ to $n$, excluding the diagonal $i=j.$

Corollary. If $X_{i}$ are uncorrelated, then the second sum in (5) disappears:

(6) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i).$

This fact has been used (with a slightly different explanation) in the derivation of the variance of the slope estimator for simple regression.

Question 4. Note that the matrix (1) is symmetric (elements above the main diagonal equal their mirror siblings below that diagonal). This means that some terms in the second sum on the right of (5) are repeated twice. If you group equal terms in (5), what do you get?

Answer 4. The idea is to write

$a_ia_jCov(X_i,X_j)+a_ia_jCov(X_j,X_i)=2a_ia_jCov(X_i,X_j),$

that is, to join equal elements above and below the main diagonal in (1). For this, you need to figure out how to write a sum of the elements that are above the main diagonal. Make a bigger version of (1) (with more off-diagonal elements) to see that the elements that are above the main diagonal are listed in the sum $\sum_{i=1}^{n-1}\sum_{j=i+1}^n.$ This sum can also be written as $\sum_{1\leq i<j\leq n}.$ Hence, (5) is the same as

(7) $V\left(\sum_{i=1}^na_iX_i\right)=\sum_{i=1}^na_i^2V(X_i)+2\sum_{i=1}^{n-1}\sum_{j=i+1}^na_ia_jCov(X_i,X_j)$

$=\sum_{i=1}^na_i^2V(X_i)+2\sum_{1\leq i<j\leq n}a_ia_jCov(X_i,X_j).$

Unlike (6), this equation is applicable when there is autocorrelation.
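The three ways of writing the variance of a linear combination, (4), (5) and (7), can be compared on numbers. The covariance matrix and the coefficients below are hypothetical; the point is only that the full double sum, the diagonal plus off-diagonal split, and the diagonal plus twice the upper triangle give the same value.

```python
# Hypothetical covariance matrix V(X) of three variables and coefficients a_i.
V = [[4.0, 1.0, 0.5],
     [1.0, 9.0, -2.0],
     [0.5, -2.0, 1.0]]
a = [1.0, -1.0, 2.0]
n = len(a)

# (4): full double sum over all pairs (i, j)
double_sum = sum(a[i] * a[j] * V[i][j] for i in range(n) for j in range(n))

# (5): variances on the diagonal plus all off-diagonal covariances
diagonal = sum(a[i] ** 2 * V[i][i] for i in range(n))
off_diagonal = sum(a[i] * a[j] * V[i][j] for i in range(n) for j in range(n) if i != j)
split_sum = diagonal + off_diagonal

# (7): variances plus twice the terms above the main diagonal (i < j)
upper = sum(a[i] * a[j] * V[i][j] for i in range(n - 1) for j in range(i + 1, n))
grouped_sum = diagonal + 2 * upper

print(double_sum, split_sum, grouped_sum)
```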

6
Oct 17

## Significance level and power of test

In this post we discuss several interrelated concepts: null and alternative hypotheses, type I and type II errors and their probabilities. Review the definitions of a sample space and elementary events and that of a conditional probability.

## Type I and Type II errors

Regarding the true state of nature we assume two mutually exclusive possibilities: the null hypothesis (like the suspect is guilty) and alternative hypothesis (the suspect is innocent). It's up to us what to call the null and what to call the alternative. However, the statistical procedures are not symmetric: it's easier to measure the probability of rejecting the null when it is true than other involved probabilities. This is why what is desirable to prove is usually designated as the alternative.

Usually in books you can see the following table.

| State of nature / Decision taken | Fail to reject null | Reject null |
|---|---|---|
| Null is true | Correct decision | Type I error |
| Null is false | Type II error | Correct decision |

This table is not good enough because there is no link to probabilities. The next video does fill in the blanks.

Video. Significance level and power of test

## Significance level and power of test

The conclusion from the video is that

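$\frac{P(T\bigcap R)}{P(T)}=P(R|T)=P\text{(Type I error)=significance level}$

$\frac{P(F\bigcap R)}{P(F)}=P(R|F)=P\text{(Correctly rejecting false null)=Power}$

Here $T$ and $F$ are the events that the null is true and false, respectively, and $R$ is the event that the null is rejected.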
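To see the two conditional probabilities at work, here is a small numerical sketch. The joint probabilities are invented for illustration, and I read $T$ as "the null is true", $F$ as "the null is false" and $R$ as "the null is rejected", as the formulas suggest.

```python
# Hypothetical joint probabilities of (state of nature, decision); they sum to 1.
p_T_and_R = 0.04      # null true, rejected        (Type I error)
p_T_and_notR = 0.76   # null true, not rejected    (correct decision)
p_F_and_R = 0.15      # null false, rejected       (correct decision)
p_F_and_notR = 0.05   # null false, not rejected   (Type II error)

# Marginal probabilities of the states of nature
p_T = p_T_and_R + p_T_and_notR
p_F = p_F_and_R + p_F_and_notR

significance_level = p_T_and_R / p_T    # P(R|T)
power = p_F_and_R / p_F                 # P(R|F)
p_type_2 = p_F_and_notR / p_F           # P(not R|F) = 1 - power

print(significance_level, power, p_type_2)
```

Note that the significance level and the power are conditioned on different events, so they do not have to add up to anything in particular; only the power and the probability of a Type II error sum to one.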
11
Aug 17

## Violations of classical assumptions

This will be a simple post explaining the common observation that "in Economics, variability of many variables is proportional to those variables". Make sure to review the assumptions; they tend to slip from memory. We consider the simple regression

(1) $y_i=a+bx_i+e_i.$

One of the classical assumptions is

Homoscedasticity. All errors have the same variance: $Var(e_i)=\sigma^2$ for all $i$.

We discuss its opposite, which is

Heteroscedasticity. Not all errors have the same variance. It would be wrong to write it as $Var(e_i)\ne\sigma^2$ for all $i$ (which means that all errors have variance different from $\sigma^2$). You can write that not all $Var(e_i)$ are the same but it's better to use the verbal definition.

Remark about Video 1. The dashed lines can represent mean consumption. Then the fact that variation of a variable grows with its level becomes more obvious.

Video 1. Case for heteroscedasticity

Figure 1. Illustration from Dougherty: as x increases, variance of the error term increases

Homoscedasticity was used in the derivation of the OLS estimator variance; under heteroscedasticity that expression is no longer valid. There are other implications, which will be discussed later.

Companies example. The Samsung Galaxy Note 7 battery fires and explosions that caused two recalls cost the smartphone maker at least \$5 billion. There is no way a small company could have such losses.

GDP example. The error in measuring US GDP is on the order of \$200 bln, which is comparable to the Kazakhstan GDP. However, the standard deviation of the ratio error/GDP seems to be about the same across countries, if the underground economy is not too big.

Often the assumption that the standard deviation of the regression error is proportional to one of the regressors is plausible.

To see if the regression error is heteroscedastic, you can look at the graph of the residuals or use statistical tests.
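As an informal illustration (not one of the formal tests), the sketch below simulates errors whose standard deviation is proportional to the regressor, fits the simple regression (1) by OLS and compares the average squared residual for small and large $x.$ All numbers, including the proportionality factor 0.5, are assumptions made up for the example.

```python
import random

rng = random.Random(1)
x = [float(i) for i in range(1, 101)]

# Assumption: the standard deviation of e_i is proportional to x_i
e = [0.5 * xi * rng.gauss(0.0, 1.0) for xi in x]
y = [10.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]  # hypothetical a = 10, b = 2

# OLS for simple regression
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a_hat = y_bar - b_hat * x_bar
resid = [yi - a_hat - b_hat * xi for xi, yi in zip(x, y)]

# Average squared residual in the lower and upper halves of the x range
half = n // 2
low = sum(r * r for r in resid[:half]) / half
high = sum(r * r for r in resid[half:]) / (n - half)
ratio = high / low
print(low, high, ratio)
```

A ratio well above 1 is what the fan-shaped picture in Figure 1 looks like in numbers; under homoscedasticity the two averages would be of similar size.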

7
Aug 17

## Violations of classical assumptions

This is a large topic which requires several posts or several book chapters. During a conference in Sweden in 2010, a Swedish statistician asked me: "What is Econometrics, anyway? What tools does it use?" I said: "Among others, it uses linear regression." He said: "But linear regression is a general statistical tool, why do they say it's a part of Econometrics?" My answer was: "Yes, it's a general tool but the name Econometrics emphasizes that the motivation for its applications lies in Economics".

Both classical assumptions and their violations should be studied with this point in mind: What is the Economics and Math behind each assumption?

## Violations of the first three assumptions

We consider the simple regression

(1) $y_i=a+bx_i+e_i$

Make sure to review the assumptions. Their numbering and names sometimes are different from what Dougherty's book has. In particular, most of the time I omit the following assumption:

A6. The model is linear in parameters and correctly specified.

When it is not linear in parameters, you can think of nonlinear alternatives. Instead of saying "correctly specified" I say "true model" when a "wrong model" is available.

A1. What if the existence condition is violated? If variance of the regressor is zero, the OLS estimator does not exist. The fitted line is supposed to be vertical, and you can regress $x$ on $y$. Violation of the existence condition in case of multiple regression leads to multicollinearity, and that's where economic considerations are important.

A2. The convenience condition is called so because when it is violated, that is, the regressor is stochastic, there are ways to deal with this problem: finite-sample theory and large-sample theory.

A3. What if the errors in (1) have means different from zero? This question can be divided in two: 1) the means of the errors are the same: $Ee_i=c\ne 0$ for all $i$ and 2) the means are different. Read the post about centering and see if you can come up with the answer for the first question. The means may be different because of omission of a relevant variable (can you do the math?). In the absence of data on such a variable, there is nothing you can do.

Violations of A4 and A5 will be treated later.