Search Results for linearity of covariance

3 Nov 16

Properties of covariance

Wikipedia says: The magnitude of the covariance is not easy to interpret. I add: we keep the covariance around mainly for its algebraic properties. It is worth studying because it appears in two important formulas: the correlation coefficient and the slope estimator in simple regression (see the derivation, simplified derivation and proof of unbiasedness).

Definition. For two random variables X,Y their covariance is defined by

Cov(X,Y) = E(X - EX)(Y - EY)

(it's the mean value of the product of the deviations of two variables from their respective means).

Properties of covariance

Property 1. Linearity. Covariance is linear in the first argument when the second argument is fixed: for any random variables X,Y,Z and numbers a,b one has
(1) Cov(aX + bY,Z) = aCov(X,Z) + bCov(Y,Z).
Proof. We start by writing out the left side of Equation (1):
Cov(aX + bY,Z)=E[(aX + bY)-E(aX + bY)](Z-EZ)
(using linearity of means)
= E(aX + bY - aEX - bEY)(Z - EZ)
(collecting similar terms)
= E[a(X - EX) + b(Y - EY)](Z - EZ)
(distributing (Z - EZ))
= E[a(X - EX)(Z - EZ) + b(Y - EY)(Z - EZ)]
(using linearity of means)
= aE(X - EX)(Z - EZ) + bE(Y - EY)(Z - EZ)
= aCov(X,Z) + bCov(Y,Z).
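
As a quick numerical check of (1), here is a minimal numpy sketch (the simulated data and the coefficients a,b are arbitrary illustrations). Because the sample covariance is built from the same kind of formula, the identity holds exactly, not just approximately:

```python
import numpy as np

rng = np.random.default_rng(0)
X, Y, Z = rng.normal(size=1000), rng.normal(size=1000), rng.normal(size=1000)
a, b = 2.0, -3.0

def cov(s, t):
    # covariance with the 1/n convention, mirroring the definition above
    return np.mean((s - s.mean()) * (t - t.mean()))

# linearity in the first argument, Property 1
print(np.isclose(cov(a * X + b * Y, Z), a * cov(X, Z) + b * cov(Y, Z)))  # True
```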

Exercise. Covariance is also linear in the second argument when the first argument is fixed. Write out and prove this property. Note how important it is to keep track of parentheses and brackets.

Property 2. Shortcut for covariance: Cov(X,Y) = EXY - (EX)(EY).
Proof. Cov(X,Y) = E(X - EX)(Y - EY)
(multiplying out)
= E[XY - X(EY) - (EX)Y + (EX)(EY)]
(EX,EY are constants; use linearity)
=EXY-(EX)(EY)-(EX)(EY)+(EX)(EY)=EXY-(EX)(EY).

Definition. Random variables X,Y are called uncorrelated if Cov(X,Y) = 0.

Uncorrelatedness is close to independence, so the intuition is the same: one variable does not influence the other. More precisely, there is no linear statistical relationship between uncorrelated variables. Mathematically the two notions are not the same: uncorrelatedness is a more general property than independence.

Property 3. Independent variables are uncorrelated: if X,Y are independent, then Cov(X,Y) = 0.
Proof. By the shortcut for covariance and multiplicativity of means for independent variables we have Cov(X,Y) = EXY - (EX)(EY) = 0.

Property 4. Correlation with a constant. Any random variable is uncorrelated with any constant: Cov(X,c) = E(X - EX)(c - Ec) = 0, because c - Ec = 0.

Property 5. Symmetry. Covariance is a symmetric function of its arguments: Cov(X,Y)=Cov(Y,X). This is obvious.

Property 6. Relationship between covariance and variance:

Cov(X,X)=E(X-EX)(X-EX)=Var(X).
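
The shortcut (Property 2) and the link with variance (Property 6) can be checked in the same way; the sample analogues satisfy these identities exactly (a minimal sketch with arbitrary simulated data):

```python
import numpy as np

rng = np.random.default_rng(1)
X, Y = rng.exponential(size=1000), rng.normal(size=1000)

def cov(s, t):
    return np.mean((s - s.mean()) * (t - t.mean()))

print(np.isclose(cov(X, Y), np.mean(X * Y) - X.mean() * Y.mean()))  # shortcut: True
print(np.isclose(cov(X, X), np.var(X)))                             # Cov(X,X)=Var(X): True (np.var also divides by n)
```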


17 Mar 19

AP Statistics the Genghis Khan way 2

Last semester I tried to explain theory through numerical examples. The results were terrible. Even the best students didn't live up to my expectations. The midterm grades were so low that I did something I had never done before: I allowed my students to write an analysis of the midterm at home. Those who were able to verbally articulate the answers to me received a bonus that allowed them to pass the semester.

This semester I made a U-turn. I announced that in the first half of the semester we would concentrate on theory, and we followed this plan. Out of 35 students, 20 significantly improved their performance and 15 remained where they were.

Midterm exam, version 1

1. General density definition (6 points)

a. Define the density p_X of a random variable X. Draw the density of heights of adults, making simplifying assumptions if necessary. Don't forget to label the axes.

b. According to your plot, how much is the integral \int_{-\infty}^0p_X(t)dt? Explain.

c. Why can't the density be negative?

d. Why should the total area under the density curve be 1?

e. Where are basketball players on your graph? Write down the corresponding expression for probability.

f. Where are dwarfs on your graph? Write down the corresponding expression for probability.

This question is about the interval formula. In each case students have to write the equation for the probability and the corresponding integral of the density. At this level, I don't talk about the distribution function and introduce the density by the interval formula.

2. Properties of means (8 points)

a. Define a discrete random variable and its mean.

b. Define linear operations with random variables.

c. Prove linearity of means.

d. Prove additivity and homogeneity of means.

e. How much is the mean of a constant?

f. Using induction, derive the linearity of means for the case of n variables from the case of two variables (3 points).

3. Covariance properties (6 points)

a. Derive linearity of covariance in the first argument when the second is fixed.

b. How much is covariance if one of its arguments is a constant?

c. What is the link between variance and covariance? If you know one of these functions, can you find the other (there should be two answers)? (4 points)

4. Standard normal variable (6 points)

a. Define the density p_z(t) of a standard normal.

b. Why is the function p_z(t) even? Illustrate this fact on the plot.

c. Why is the function f(t)=tp_z(t) odd? Illustrate this fact on the plot.

d. Justify the equation Ez=0.

e. Why is V(z)=1?

f. Let t>0. Show on the same plot areas corresponding to the probabilities A_1=P(0<z<t), A_2=P(z>t), A_3=P(z<-t), A_4=P(-t<z<0). Write down the relationships between A_1,...,A_4.

5. General normal variable (3 points)

a. Define a general normal variable X.

b. Use this definition to find the mean and variance of X.

c. Using part b, on the same plot graph the density of the standard normal and of a general normal with parameters \sigma =2, \mu =3.

Midterm exam, version 2

1. General density definition (6 points)

a. Define the density p_X of a random variable X. Draw the density of work experience of adults, making simplifying assumptions if necessary. Don't forget to label the axes.

b. According to your plot, how much is the integral \int_{-\infty}^0p_X(t)dt? Explain.

c. Why can't the density be negative?

d. Why should the total area under the density curve be 1?

e. Where are retired people on your graph? Write down the corresponding expression for probability.

f. Where are young people (up to 25 years old) on your graph? Write down the corresponding expression for probability.

2. Variance properties (8 points)

a. Define variance of a random variable. Why is it non-negative?

b. Write down the formula for the variance of a linear combination of two variables.

c. How much is variance of a constant?

d. What is the formula for variance of a sum? What do we call homogeneity of variance?

e. What is larger: V(X+Y) or V(X-Y)? (2 points)

f. One investor has 100 shares of Apple, another has 200 shares. Which investor's portfolio has larger variability? (2 points)

3. Poisson distribution (6 points)

a. Write down the Taylor expansion and explain the idea. How are the Taylor coefficients found?

b. Use the Taylor series for the exponential function to define the Poisson distribution.

c. Find the mean of the Poisson distribution. What is the interpretation of the parameter \lambda in practice?

4. Standard normal variable (6 points)

a. Define the density p_z(t) of a standard normal.

b. Why is the function p_z(t) even? Illustrate this fact on the plot.

c. Why is the function f(t)=tp_z(t) odd? Illustrate this fact on the plot.

d. Justify the equation Ez=0.

e. Why is V(z)=1?

f. Let t>0. Show on the same plot areas corresponding to the probabilities A_1=P(0<z<t), A_2=P(z>t), A_{3}=P(z<-t), A_4=P(-t<z<0). Write down the relationships between A_{1},...,A_{4}.

5. General normal variable (3 points)

a. Define a general normal variable X.

b. Use this definition to find the mean and variance of X.

c. Using part b, on the same plot graph the density of the standard normal and of a general normal with parameters \sigma =2, \mu =3.


11 Feb 17

Gauss-Markov theorem

The Gauss-Markov theorem states that the OLS estimator is the most efficient in a certain class of estimators. Without algebra you cannot take a single step further, whether towards the precise theoretical statement or towards an application.

Why do we care about linearity?

The concept of linearity has come up many times in my posts. Here we have to start from scratch and apply it to estimators.

The slope in simple regression

(1) y_i=a+bx_i+e_i

can be estimated by

\hat{b}(y,x)=\frac{Cov_u(y,x)}{Var_u(x)}.

Note that the notation makes explicit the dependence of the estimator on x,y. Imagine that we have two sets of observations: (y_1^{(1)},x_1),...,(y_n^{(1)},x_n) and (y_1^{(2)},x_1),...,(y_n^{(2)},x_n) (the x coordinates are the same but the y coordinates are different). In addition, assume that the regressor is deterministic. The x's could be spatial units and the y's temperature measurements at these units at two different moments.

Definition. We say that \hat{b}(y,x) is linear with respect to y if for any two vectors y^{(i)}= (y_1^{(i)},...,y_n^{(i)}), i=1,2, and numbers c,d we have

\hat{b}(cy^{(1)}+dy^{(2)},x)=c\hat{b}(y^{(1)},x)+d\hat{b}(y^{(2)},x).

This definition is quite similar to that of linearity of means. Linearity of the estimator with respect to y easily follows from linearity of covariance

\hat{b}(cy^{(1)}+dy^{(2)},x)=\frac{Cov_u(cy^{(1)}+dy^{(2)},x)}{Var_u(x)}=\frac{cCov_u(y^{(1)},x)+dCov_u(y^{(2)},x)}{Var_u(x)}=c\hat{b}(y^{(1)},x)+d\hat{b}(y^{(2)},x).

In addition to knowing how to establish linearity, it's a good idea to be able to see when something is not linear. Recall that linearity implies homogeneity of degree 1. Hence, if something is not homogeneous of degree 1, it cannot be linear. The OLS estimator is not linear in x because it is homogeneous of degree -1 in x:

\hat{b}(y,cx)=\frac{Cov_u(y,cx)}{Var_u(cx)}=\frac{c}{c^2}\frac{Cov_u(y,x)}{Var_u(x)}=\frac{1}{c}\hat{b}(y,x).
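
Both facts are easy to verify numerically. The sketch below (arbitrary simulated data; cov_u and b_hat simply mimic the formulas above) confirms linearity in y and homogeneity of degree -1 in x:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y1, y2 = rng.normal(size=n), rng.normal(size=n)
c, d = 2.0, -0.5

def cov_u(s, t):
    return np.mean((s - s.mean()) * (t - t.mean()))

def b_hat(y, x):
    # OLS slope estimator Cov_u(y,x)/Var_u(x)
    return cov_u(y, x) / cov_u(x, x)

# linearity with respect to y
print(np.isclose(b_hat(c * y1 + d * y2, x), c * b_hat(y1, x) + d * b_hat(y2, x)))  # True
# homogeneity of degree -1 with respect to x
print(np.isclose(b_hat(y1, c * x), b_hat(y1, x) / c))                              # True
```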

Gauss-Markov theorem

Students don't have problems remembering the acronym BLUE: the OLS estimator is the Best Linear Unbiased Estimator. Decoding this acronym starts from the end.

  1. An estimator, by definition, is a function of sample data.
  2. Unbiasedness of OLS estimators is thoroughly discussed here.
  3. Linearity of the slope estimator with respect to y has been proved above. Linearity with respect to x is not required.
  4. Now we look at the class of all slope estimators that are linear with respect to y. As an exercise, show that the instrumental variables estimator belongs to this class.

Gauss-Markov Theorem. Under the classical assumptions, the OLS estimator of the slope has the smallest variance in the class of all slope estimators that are linear with respect to y.

In particular, the OLS estimator of the slope is more efficient than the IV estimator. The beauty of this result is that you don't need expressions of their variances (even though they can be derived).

Remark. Even the above formulation is incomplete. In fact, it is the pair (intercept estimator, slope estimator) that is efficient. Stating this precisely requires matrix algebra.

 


8 Feb 17

Instrumental variables estimator

The instrumental variables (IV) estimator is one of the most important alternatives to the OLS estimator.

Preliminaries

Review What is an OLS estimator - simplified derivation. In the case of the OLS estimator, there is also the rigorous derivation. For the IV estimator, the simplified derivation is the only one.

Review the large sample approach to OLS estimator (second approach). We need properties of probability limits and conditions sufficient for consistency of the OLS estimator.

Besides, recall our convention regarding the notation of sample versus population characteristics.

Problem statement

We want to estimate the slope in simple regression

(1) y_i=a+bx_i+e_i

assuming that x_i is stochastic. We established that if

Var(x)\neq 0 (existence condition)

and

Cov(x,e)=0 (consistency condition)

then the OLS estimator of the slope is consistent. If the last condition is violated:

Cov(x,e)\ne 0

then there is no consistency. This is called an endogeneity problem and it may occur for various reasons. One of them is omission of relevant variables: if the true model is y_i=a+bx_i+cw_i+v_i but we erroneously assume (1), then the error in (1) is e_i=cw_i+v_i. Most likely, x_i,w_i are correlated and then in (1) x_i,e_i will be correlated.

The easiest way to learn is by similarity

Suppose we have found a variable z such that

(2) Cov(z,x)\neq 0 (IV existence condition)

and

(3) Cov(z,e)=0 (IV consistency condition).

Such a variable z is called an instrument for x. Following the simplified derivation for the OLS estimator, plug (1) into Cov_u(z,y):

Cov_u(z,y)=Cov_u(z,a+bx+e) (using linearity of covariance)

=Cov_u(z,a)+bCov_u(z,x)+Cov_u(z,e) (here Cov_u(z,a)=0 because a is a constant, and we formally let Cov_u(z,e)=0)

=bCov_u(z,x).

Solving this for b and putting a hat on it, we arrive at the IV estimator:

(4) \hat{b}=\frac{Cov_u(z,y)}{Cov_u(z,x)}.

Consistency

To obtain the working representation, plug (1) in (4):

\hat{b}=\frac{Cov_u(z,a+bx+e)}{Cov_u(z,x)}

=\frac{Cov_u(z,a)+bCov_u(z,x)+Cov_u(z,e)}{Cov_u(z,x)}

=b+\frac{Cov_u(z,e)}{Cov_u(z,x)}.

Repeating what we did for the OLS estimator, we get consistency from (2) and (3):

\text{plim}\hat{b}=\text{plim}\left[b+\frac{Cov_u(z,e)}{Cov_u(z,x)}\right]=\text{plim}b+\text{plim}\frac{Cov_u(z,e)}{Cov_u(z,x)}

=b+\frac{\text{plim}Cov_u(z,e)}{\text{plim}Cov_u(z,x)}=b+\frac{Cov(z,e)}{Cov(z,x)}=b.

Now you can see why (2) and (3) have been imposed.
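
A small simulation illustrates the point (a sketch only; the data generating process, with an omitted factor v that creates Cov(x,e)\ne 0, and all numbers are made up for illustration). OLS stays biased while IV recovers the true slope:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
a, b = 1.0, 2.0

z = rng.normal(size=n)        # instrument: correlated with x, uncorrelated with e
v = rng.normal(size=n)        # omitted factor
x = 0.8 * z + 0.6 * v         # regressor, endogenous through v
e = v + rng.normal(size=n)    # error term: Cov(x,e) = 0.6 != 0, Cov(z,e) = 0
y = a + b * x + e

def cov_u(s, t):
    return np.mean((s - s.mean()) * (t - t.mean()))

b_ols = cov_u(x, y) / cov_u(x, x)
b_iv = cov_u(z, y) / cov_u(z, x)
print(b_ols, b_iv)            # OLS is around 2.6 (inconsistent), IV is close to the true b = 2
```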

Remark 1. Sometimes in addition to (2) and (3) people say that the instrument should not be perfectly correlated with the regressor. This is because if z,x are perfectly correlated, then z is a linear function of x: z=c+dx with d\ne 0, so that Cov(z,e)=dCov(x,e)\ne 0 and (3) is impossible.

Remark 2. An instrument is not the same thing as a proxy. If x cannot be measured and we replace it by a close variable z that can be measured (which then is called a proxy for x), instead of (1) we obtain y_i=a+bz_i+e_i and the slope estimator will be the usual OLS estimator, not IV.

 


2 Sep 16

Proving unbiasedness of OLS estimators - the do's and don'ts

Groundwork

Here we derived the OLS estimators. To distinguish between sample and population means, the variance and covariance in the slope estimator will be provided with the subscript u (for "uniform", see the rationale here).

(1) \hat{b}=\frac{Cov_u(x,y)}{Var_u(x)},

(2) \hat{a}=\bar{y}-\hat{b}\bar{x}.

These equations are used in conjunction with the model

(3) y_i=a+bx_i+e_i

where we remember that

(4) Ee_i=0 for all i.

Since (2) depends on (1), we have to start with unbiasedness of the slope estimator.

Using the right representation is critical

We have to show that E\hat{b}=b.

Step 1. Don't apply the expectation directly to (1). Do separate in (1) the part that is supposed to become E\hat{b}. To reveal the role of errors in (1), plug (3) in (1) and use linearity of covariance with respect to each argument when the other argument is fixed:

\hat{b}=\frac{Cov_u(x,a+bx+e)}{Var_u(x)}=\frac{Cov_u(x,a)+bCov_u(x,x)+Cov_u(x,e)}{Var_u(x)}.

Here Cov_u(x,a)=0 (a constant is uncorrelated with any variable), Cov_u(x,x)=Var_u(x) (covariance of x with itself is its variance), so

(5) \hat{b}=\frac{bVar_u(x)+Cov_u(x,e)}{Var_u(x)}=b+\frac{Cov_u(x,e)}{Var_u(x)}.

Equation (5) is the mean-plus-deviation-from-the-mean decomposition. Many students think that Cov_u(x,e)=0 because of (4). No! The covariance here does not involve the population mean.

Step 2. It pays to make one more step to develop (5). Write out the numerator in (5) using summation:

\hat{b}=b+\frac{1}{n}\sum(x_i-\bar{x})(e_i-\bar{e})/Var_u(x).

Don't write out Var_u(x)! The presence of two summations confuses many students.

Multiplying parentheses and using the fact that \sum(x_i-\bar{x})=n\bar{x}-n\bar{x}=0 we have

\hat{b}=b+\frac{1}{n}[\sum(x_i-\bar{x})e_i-\bar{e}\sum(x_i-\bar{x})]/Var_u(x)

=b+\frac{1}{n}\sum\frac{(x_i-\bar{x})}{Var_u(x)}e_i.

To simplify calculations, denote a_i=(x_i-\bar{x})/Var_u(x). Then the slope estimator becomes

(6) \hat{b}=b+\frac{1}{n}\sum a_ie_i.

This is the critical representation.

Unbiasedness of the slope estimator

Convenience condition. The regressor x is deterministic. I call it a convenience condition because it's just a matter of mathematical expedience, and later on we'll study ways to bypass it.

From (6), linearity of means and remembering that the deterministic coefficients a_i behave like constants,

(7) E\hat{b}=E[b+\frac{1}{n}\sum a_ie_i]=b+\frac{1}{n}\sum a_iEe_i=b

by (4). This proves unbiasedness.

You don't know the difference between the population and sample means until you see them working in the same formula.

Unbiasedness of the intercept estimator

As above we plug (3) in (2): \hat{a}=\overline{a+bx+e}-\hat{b}\bar{x}=a+b\bar{x}+\bar{e}-\hat{b}\bar{x}. Applying expectation:

E\hat{a}=a+b\bar{x}+E\bar{e}-E\hat{b}\bar{x}=a+b\bar{x}-b\bar{x}=a.
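
The unbiasedness of both estimators is easy to see in a Monte Carlo experiment (a minimal sketch; the deterministic regressor, the error scale and the true parameters are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50, 10_000
a, b = 1.0, 2.0
x = np.linspace(0.0, 10.0, n)            # deterministic regressor, fixed across replications

def cov_u(s, t):
    return np.mean((s - s.mean()) * (t - t.mean()))

b_hats, a_hats = [], []
for _ in range(reps):
    e = rng.normal(scale=3.0, size=n)    # errors with Ee_i = 0
    y = a + b * x + e
    b_hat = cov_u(x, y) / cov_u(x, x)
    b_hats.append(b_hat)
    a_hats.append(y.mean() - b_hat * x.mean())

print(np.mean(b_hats), np.mean(a_hats))  # averages are close to the true (b, a) = (2, 1)
```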

Conclusion

Since in (1)  there is division by Var_u(x), the condition Var_u(x)\ne 0 is the main condition for existence of OLS estimators. From the above proof we see that (4) is the main condition for unbiasedness.


13 Feb 16

What is an OLS estimator - simplified derivation

What is an OLS estimator?

Most sources give too much theory and a lengthy procedure for deriving the OLS (Ordinary Least Squares) estimator. In fact, there is a simplified derivation which can be used in other situations. For example, in the case of the Instrumental Variables estimator it is the ONLY derivation. However, simplicity comes at a cost: you need to know the properties of variances and covariances.

Tip. By all means avoid using summation signs because: 1) not everybody is good at them and 2) they clog the picture. For this reason alone it is worth using variances and covariances. For example, if X_1,...,X_n are observations, define the variance as Var(X)=\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2, where \bar{X}=\frac{1}{n}\sum_{i=1}^n X_i is the sample mean. If, in addition, we have observations Y_1,...,Y_n on another variable, then we can introduce the covariance Cov(X,Y)=\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})(Y_i-\bar{Y}). One can notice that

(1) Var(X)=Cov(X,X).

Try to rewrite this using summation signs, and you will see the gain from using shorter notation. The next property is less easy to establish. It is called linearity of covariance: if Z is a third observed variable and a,b are any numbers, then

(2) Cov(X,aY+bZ)=aCov(X,Y)+bCov(X,Z).

Now consider simple regression

(3) Y=\beta_0+\beta_1 X+u

where u is the error term and \beta_0,\beta_1 are parameters to be estimated. The well-known formula for the OLS estimator of \beta_1 is in our notation

(4) \hat{\beta}_1=\frac{Cov(X,Y)}{Var(X)}.

The question is: What is an easy way to obtain this?

Answer: Notice that there is Cov(X,Y) in the numerator and try to expand it using (3) and (2):

Cov(X,Y)=Cov(X,\beta_0+\beta_1 X+u)=Cov(X,\beta_0)+\beta_1Cov(X,X)+Cov(X,u).

The first term on the right is zero (this is shown directly), and for the second term we apply (1). In the case of the third term, we formally put Cov(X,u)=0 (remember, this is a simplified derivation and we are cheating). The result is Cov(X,Y)=\beta_1Var(X). It remains to solve this for \beta_1 and put a hat on it to obtain (4). The whole derivation takes just one paragraph!
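
To convince yourself that (4) is indeed the usual OLS slope, you can compare it with a standard least-squares routine (a minimal numpy sketch with arbitrary simulated data; np.polyfit fits the same line by least squares):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
X = rng.normal(size=n)
u = rng.normal(size=n)
Y = 3.0 + 1.5 * X + u

cov = np.mean((X - X.mean()) * (Y - Y.mean()))   # Cov(X,Y) with the 1/n convention
var = np.mean((X - X.mean()) ** 2)               # Var(X)
beta1_hat = cov / var                            # formula (4)

slope, intercept = np.polyfit(X, Y, 1)           # least-squares fit for comparison
print(beta1_hat, slope)                          # the two slopes coincide up to rounding
```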


16 Jun 21

Solution to Question 1 from UoL exam 2020

The exam was an open-book take-home online assessment with a 24-hour window. No attempt was made to prevent cheating, except a warning, which was pretty realistic. Before an exam it's a good idea to see my checklist.

Question 1. Consider the following ARMA(1,1) process:

(1) z_{t}=\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta \varepsilon _{t-1}

where \varepsilon _{t} is a zero-mean white noise process with variance \sigma ^{2}, and assume |\alpha |,|\theta |<1 and \alpha+\theta \neq 0, which together make sure z_{t} is covariance stationary.

(a) [20 marks] Calculate the conditional and unconditional means of z_{t}, that is, E_{t-1}[z_{t}] and E[z_{t}].

(b) [20 marks] Set \alpha =0. Derive the autocovariance and autocorrelation function of this process for all lags as functions of the parameters \theta and \sigma .

(c) [30 marks] Assume now \alpha \neq 0. Calculate the conditional and unconditional variances of z_{t}, that is, Var_{t-1}[z_{t}] and Var[z_{t}].

Hint: for the unconditional variance, you might want to start by deriving the unconditional covariance between the variable and the innovation term, i.e., Cov[z_{t},\varepsilon _{t}].

(d) [30 marks] Derive the autocovariance and autocorrelation for lags of 1 and 2 as functions of the parameters of the model.

Hint: use the hint of part (c).

Solution

Part (a)

Reminder: The definition of a zero-mean white noise process is

(2) E\varepsilon _{t}=0, Var(\varepsilon _{t})=E\varepsilon_{t}^{2}=\sigma ^{2} for all t and Cov(\varepsilon _{j},\varepsilon_{i})=E\varepsilon _{j}\varepsilon _{i}=0 for all i\neq j.

A variable indexed t-1 is known at moment t-1 and at all later moments and behaves like a constant for conditioning at such moments.

Moment t is future relative to t-1.  The future is unpredictable and the best guess about the future error is zero.

The recurrent relationship in (1) shows that

(3) z_{t-1}=\gamma +\alpha z_{t-2}+... does not depend on the information that arrives at time t and later.

Hence, using also linearity of conditional means,

(4) E_{t-1}z_{t}=E_{t-1}\gamma +\alpha E_{t-1}z_{t-1}+E_{t-1}\varepsilon _{t}+\theta E_{t-1}\varepsilon _{t-1}=\gamma +\alpha z_{t-1}+\theta\varepsilon _{t-1}.

The law of iterated expectations (LIE): application of E_{t-1}, based on information available at time t-1, and subsequent application of E, based on no information, gives the same result as application of E.

Ez_{t}=E[E_{t-1}z_{t}]=E\gamma +\alpha Ez_{t-1}+\theta E\varepsilon _{t-1}=\gamma +\alpha Ez_{t-1}.

Since z_{t} is covariance stationary, its means across times are the same, so Ez_{t}=\gamma +\alpha Ez_{t} and Ez_{t}=\frac{\gamma }{1-\alpha }.

Part (b)

With \alpha =0 we get z_{t}=\gamma +\varepsilon _{t}+\theta\varepsilon _{t-1} and from part (a) Ez_{t}=\gamma . Using (2), we find variance

Var(z_{t})=E(z_{t}-Ez_{t})^{2}=E(\varepsilon _{t}^{2}+2\theta \varepsilon_{t}\varepsilon _{t-1}+\theta ^{2}\varepsilon _{t-1}^{2})=(1+\theta^{2})\sigma ^{2}

and first autocovariance

(5) \gamma_{1}=Cov(z_{t},z_{t-1})=E(z_{t}-Ez_{t})(z_{t-1}-Ez_{t-1})=E(\varepsilon_{t}+\theta \varepsilon _{t-1})(\varepsilon _{t-1}+\theta \varepsilon_{t-2})=\theta E\varepsilon _{t-1}^{2}=\theta \sigma ^{2}.

Second and higher autocovariances are zero because the subscripts of epsilons don't overlap.

Autocorrelation function: \rho _{0}=\frac{Cov(z_{t},z_{t})}{\sqrt{Var(z_{t})Var(z_{t})}}=1 (this is always true),

\rho _{1}=\frac{Cov(z_{t},z_{t-1})}{\sqrt{Var(z_{t})Var(z_{t-1})}}=\frac{\theta \sigma ^{2}}{(1+\theta ^{2})\sigma ^{2}}=\frac{\theta }{1+\theta ^{2}}, \rho _{j}=0 for j>1.

This is characteristic of MA processes: their autocorrelations are zero starting from some point.
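
The formulas of part (b) are easy to check by simulation (a sketch; the parameter values are arbitrary and the sample autocorrelations are only approximations of the theoretical ones):

```python
import numpy as np

rng = np.random.default_rng(6)
T, gamma_, theta, sigma = 200_000, 1.0, 0.5, 2.0
eps = rng.normal(scale=sigma, size=T + 1)
z = gamma_ + eps[1:] + theta * eps[:-1]          # MA(1): z_t = gamma + e_t + theta*e_{t-1}

def acorr(series, lag):
    d = series - series.mean()
    return np.mean(d[:-lag] * d[lag:]) / np.mean(d * d)

print(z.var(), (1 + theta**2) * sigma**2)        # sample vs theoretical variance
print(acorr(z, 1), theta / (1 + theta**2))       # sample vs theoretical rho_1 = 0.4
print(acorr(z, 2))                               # close to zero, as predicted
```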

Part (c)

If we replace all expectations in the definition of variance by conditional expectations, we obtain the definition of conditional variance. From (1) and (4)

Var_{t-1}(z_{t})=E_{t-1}(z_{t}-E_{t-1}z_{t})^{2}=E_{t-1}\varepsilon_{t}^{2}=\sigma ^{2}.

By the law of total variance

(6) Var(z_{t})=EVar_{t-1}(z_{t})+Var(E_{t-1}z_{t})=\sigma ^{2}+Var(\gamma+\alpha z_{t-1}+\theta \varepsilon _{t-1})=

(an additive constant does not affect variance)

=\sigma ^{2}+Var(\alpha z_{t-1}+\theta \varepsilon _{t-1})=\sigma^{2}+\alpha ^{2}Var(z_{t})+2\alpha \theta Cov(z_{t-1},\varepsilon_{t-1})+\theta ^{2}Var(\varepsilon _{t-1}).

By the LIE and (3)

Cov(z_{t-1},\varepsilon _{t-1})=Cov(\gamma +\alpha z_{t-2}+\varepsilon  _{t-1}+\theta \varepsilon _{t-2},\varepsilon _{t-1})=\alpha  Cov(z_{t-2},\varepsilon _{t-1})+E\varepsilon _{t-1}^{2}+\theta  EE_{t-2}\varepsilon _{t-2}\varepsilon _{t-1}=\sigma ^{2}+\theta  E(\varepsilon _{t-2}E_{t-2}\varepsilon _{t-1}).

Here E_{t-2}\varepsilon _{t-1}=0, so

(7) Cov(z_{t-1},\varepsilon _{t-1})=\sigma ^{2}.

This equation leads to

Var(z_{t})=Var(\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta \varepsilon  _{t-1})=\alpha ^{2}Var(z_{t-1})+Var(\varepsilon _{t})+\theta  ^{2}Var(\varepsilon _{t-1})+

+2\alpha Cov(z_{t-1},\varepsilon _{t})+2\alpha \theta  Cov(z_{t-1},\varepsilon _{t-1})+2\theta Cov(\varepsilon _{t},\varepsilon  _{t-1})=\alpha ^{2}Var(z_{t})+\sigma ^{2}+\theta ^{2}\sigma ^{2}+2\alpha  \theta \sigma ^{2}

and, finally,

(8) Var(z_{t})=\frac{(1+2\alpha \theta +\theta ^{2})\sigma ^{2}}{1-\alpha  ^{2}}.

Part (d)

From (7)

(9) Cov(z_{t-1},\varepsilon _{t-2})=Cov(\gamma +\alpha z_{t-2}+\varepsilon  _{t-1}+\theta \varepsilon _{t-2},\varepsilon _{t-2})=\alpha  Cov(z_{t-2},\varepsilon _{t-2})+\theta Var(\varepsilon _{t-2})=(\alpha  +\theta )\sigma ^{2}.

It follows that

Cov(z_{t},z_{t-1})=Cov(\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta  \varepsilon _{t-1},\gamma +\alpha z_{t-2}+\varepsilon _{t-1}+\theta  \varepsilon _{t-2})=

(a constant is not correlated with anything)

=\alpha ^{2}Cov(z_{t-1},z_{t-2})+\alpha Cov(z_{t-1},\varepsilon  _{t-1})+\alpha \theta Cov(z_{t-1},\varepsilon _{t-2})+

+\alpha Cov(\varepsilon _{t},z_{t-2})+Cov(\varepsilon _{t},\varepsilon  _{t-1})+\theta Cov(\varepsilon _{t},\varepsilon _{t-2})+

+\theta \alpha Cov(\varepsilon _{t-1},z_{t-2})+\theta Var(\varepsilon  _{t-1})+\theta ^{2}Cov(\varepsilon _{t-1},\varepsilon _{t-2}).

From (7) Cov(z_{t-2},\varepsilon _{t-2})=\sigma ^{2} and from (9) Cov(z_{t-1},\varepsilon _{t-2})=(\alpha +\theta )\sigma ^{2}.

From (3) Cov(\varepsilon _{t},z_{t-2})=Cov(\varepsilon _{t-1},z_{t-2})=0.

Using also the white noise properties and stationarity of z_{t}

Cov(z_{t},z_{t-1})=Cov(z_{t-1},z_{t-2})=\gamma _{1},

we are left with

\gamma _{1}=\alpha ^{2}\gamma _{1}+\alpha \sigma  ^{2}+\alpha \theta (\alpha +\theta )\sigma ^{2}+\theta \sigma ^{2}=\alpha  ^{2}\gamma _{1}+(1+\alpha \theta )(\alpha +\theta )\sigma ^{2}.

Hence,

\gamma _{1}=\frac{(1+\alpha \theta )(\alpha +\theta )\sigma ^{2}}{1-\alpha  ^{2}}

and using (8)

\rho _{0}=1, \rho _{1}=\frac{(1+\alpha \theta )(\alpha +\theta )}{  1+2\alpha \theta +\theta ^{2}}.

The finish is close.

Cov(z_{t},z_{t-2})=Cov(\gamma +\alpha z_{t-1}+\varepsilon _{t}+\theta  \varepsilon _{t-1},\gamma +\alpha z_{t-3}+\varepsilon _{t-2}+\theta  \varepsilon _{t-3})=

=\alpha ^{2}Cov(z_{t-1},z_{t-3})+\alpha Cov(z_{t-1},\varepsilon  _{t-2})+\alpha \theta Cov(z_{t-1},\varepsilon _{t-3})+

+\alpha Cov(\varepsilon _{t},z_{t-3})+Cov(\varepsilon _{t},\varepsilon  _{t-2})+\theta Cov(\varepsilon _{t},\varepsilon _{t-3})+

+\theta \alpha Cov(\varepsilon _{t-1},z_{t-3})+\theta Cov(\varepsilon  _{t-1},\varepsilon _{t-2})+\theta ^{2}Cov(\varepsilon _{t-1},\varepsilon  _{t-3}).

This simplifies to

(10) Cov(z_{t},z_{t-2})=\alpha ^{2}Cov(z_{t-1},z_{t-3})+\alpha (\alpha  +\theta )\sigma ^{2}+\alpha \theta Cov(z_{t-1},\varepsilon _{t-3}).

By (7)

Cov(z_{t-1},\varepsilon _{t-3})=Cov(\gamma +\alpha z_{t-2}+\varepsilon _{t-1}+\theta \varepsilon _{t-2},\varepsilon _{t-3})=\alpha Cov(z_{t-2},\varepsilon _{t-3})=

=\alpha Cov(\gamma +\alpha z_{t-3}+\varepsilon _{t-2}+\theta \varepsilon _{t-3},\varepsilon _{t-3})=\alpha (\alpha \sigma ^{2}+\theta \sigma ^{2})=\alpha (\alpha +\theta )\sigma ^{2}.

Finally, using (10),

\gamma _{2}=\alpha ^{2}\gamma _{2}+\alpha (\alpha +\theta )\sigma ^{2}+\alpha ^{2}\theta (\alpha +\theta )\sigma ^{2}=\alpha ^{2}\gamma _{2}+\alpha (1+\alpha \theta )(\alpha +\theta )\sigma ^{2},

\gamma _{2}=\frac{\alpha (1+\alpha \theta )(\alpha +\theta )\sigma ^{2}}{1-\alpha ^{2}}=\alpha \gamma _{1},

\rho _{2}=\frac{\alpha (1+\alpha \theta )(\alpha +\theta )}{1+2\alpha \theta +\theta ^{2}}=\alpha \rho _{1}.
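
As a sanity check on (8) and on the autocorrelations, one can simulate the ARMA(1,1) process and compare sample moments with the theoretical values (a sketch; the parameter values and the burn-in length are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
T, burn = 300_000, 1_000
gamma_, alpha, theta, sigma = 1.0, 0.6, 0.3, 1.5
eps = rng.normal(scale=sigma, size=T + burn)
z = np.zeros(T + burn)
for t in range(1, T + burn):
    z[t] = gamma_ + alpha * z[t - 1] + eps[t] + theta * eps[t - 1]
z = z[burn:]                                     # drop the burn-in to approximate stationarity

var_th = (1 + 2 * alpha * theta + theta**2) * sigma**2 / (1 - alpha**2)
rho1_th = (1 + alpha * theta) * (alpha + theta) / (1 + 2 * alpha * theta + theta**2)
rho2_th = alpha * rho1_th

def acorr(series, lag):
    d = series - series.mean()
    return np.mean(d[:-lag] * d[lag:]) / np.mean(d * d)

print(z.var(), var_th)                           # sample vs theoretical variance
print(acorr(z, 1), rho1_th)                      # sample vs theoretical rho_1
print(acorr(z, 2), rho2_th)                      # sample vs theoretical rho_2
```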

A couple of errors have been corrected on June 22, 2021. Hope this is final.


7 Jul 18

Euclidean space geometry: scalar product, norm and distance

Learning this material has spillover effects for Stats because everything in this section has analogs for means, variances and covariances.

Scalar product

Definition 1. The scalar product of two vectors x,y\in R^n is defined by x\cdot y=\sum_{i=1}^nx_iy_i. The motivation has been provided earlier.

Remark. If matrix notation is essential and x,y are written as column vectors, we have x\cdot y=x^Ty. The first notation is better when we want to emphasize symmetry: x\cdot y=y\cdot x.

Linearity. The scalar product is linear in the first argument when the second argument is fixed: for any vectors x,y,z and numbers a,b one has

(1) (ax+by)\cdot z=a(x\cdot z)+b(y\cdot z).

Proof. (ax+by)\cdot z=\sum_{i=1}^n(ax_i+by_i)z_i=\sum_{i=1}^n(ax_iz_i+by_iz_i)

=a\sum_{i=1}^nx_iz_i+b\sum_{i=1}^ny_iz_i=ax\cdot z+by\cdot z.

Special cases. 1) Homogeneity: by setting b=0 we get (ax)\cdot z=a(x\cdot  z). 2) Additivity: by setting a=b=1 we get (x+y)\cdot z=x\cdot z+y\cdot  z.

Exercise 1. Formulate and prove the corresponding properties of the scalar product with respect to the second argument.

Definition 2. The vectors x,y are called orthogonal if x\cdot y=0.

Exercise 2. 1) The zero vector is orthogonal to any other vector. 2) If x,y are orthogonal, then any vectors proportional to them are also orthogonal. 3) The unit vectors in R^n are defined by e_i=(0,...,1,...,0) (the unit is in the ith place, all other components are zeros), i=1,...,n. Check that they are pairwise orthogonal.

Norm

Exercise 3. On the plane find the distance between a point x and the origin.

Figure 1. Pythagoras theorem

Once I introduce the notation on a graph (Figure 1), everybody easily finds the distance to be \text{dist}(0,x)=\sqrt{x_1^2+x_2^2} using the Pythagoras theorem. Equally easily, almost everybody fails to connect this simple fact with the ensuing generalizations.

Definition 3. The norm in R^n is defined by \left\Vert x\right\Vert=\sqrt{\sum_{i=1}^nx_i^2}. It is interpreted as the distance from point x to the origin and also the length of the vector x.

Exercise 4. 1) Can the norm be negative? We know that, in general, there are two square roots of a positive number: one is positive and the other is negative. The positive one is called an arithmetic square root. Here we are using the arithmetic square root.

2) Using the norm can you define the distance between points x,y\in R^n?

3) The relationship between the norm and scalar product:

(2) \left\Vert x\right\Vert =\sqrt{x\cdot x}.

True or false?

4) Later on we'll prove that \Vert x+y\Vert\leq\Vert x\Vert+\Vert y\Vert. Explain why this is called a triangle inequality. For this, you need to recall the parallelogram rule.

5) How much is \left\Vert 0\right\Vert ? If \left\Vert x\right\Vert =0, what can you say about x?

Norm of a linear combination. For any vectors x,y and numbers a,b one has

(3) \left\Vert ax+by\right\Vert^2=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.

Proof. From (2) we have

\left\Vert ax+by\right\Vert^2=\left(ax+by\right)\cdot\left(ax+by\right)     (using linearity in the first argument)

=ax\cdot\left(ax+by\right)+by\cdot\left(ax+by\right)         (using linearity in the second argument)

=a^2x\cdot x+abx\cdot y+bay\cdot x+b^2y\cdot y (applying symmetry of the scalar product and (2))

=a^2\left\Vert x\right\Vert^2+2ab(x\cdot y)+b^2\left\Vert y\right\Vert^2.

Pythagoras theorem. If x,y are orthogonal, then \left\Vert x+y\right\Vert^2=\left\Vert x\right\Vert^2+\left\Vert y\right\Vert^2.

This is immediate from (3).

Norm homogeneity. Review the definition of the absolute value and the equation |a|=\sqrt{a^2}. The norm is homogeneous of degree 1:

\left\Vert ax\right\Vert=\sqrt{(ax)\cdot (ax)}=\sqrt{{a^2x\cdot x}}=|a|\left\Vert x\right\Vert.
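
Equation (3), the Pythagoras theorem and norm homogeneity can all be verified numerically (a minimal numpy sketch; the vectors and coefficients are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(8)
x, y = rng.normal(size=4), rng.normal(size=4)
a, b = 2.0, -3.0

lhs = np.linalg.norm(a * x + b * y) ** 2
rhs = a**2 * (x @ x) + 2 * a * b * (x @ y) + b**2 * (y @ y)   # formula (3)
print(np.isclose(lhs, rhs))                                   # True

# Pythagoras theorem for a pair of orthogonal vectors
u, v = np.array([1.0, 0.0, 2.0]), np.array([0.0, 3.0, 0.0])   # u.v = 0
print(np.isclose(np.linalg.norm(u + v) ** 2,
                 np.linalg.norm(u) ** 2 + np.linalg.norm(v) ** 2))  # True

# norm homogeneity of degree 1
print(np.isclose(np.linalg.norm(a * x), abs(a) * np.linalg.norm(x)))  # True
```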


25 Oct 16

Properties of variance

All properties of variance in one place

Certainty is the mother of quiet and repose, and uncertainty the cause of variance and contentions. Edward Coke

Preliminaries: study properties of means with proofs.

Definition. Yes, uncertainty leads to variance, and we measure it by Var(X)=E(X-EX)^2. It is useful to use the name deviation from mean for X-EX and realize that E(X-EX)=0, so that the mean of the deviation from mean cannot serve as a measure of variation of X around EX.

Property 1. Variance of a linear combination. For any random variables X,Y and numbers a,b one has
(1) Var(aX + bY)=a^2Var(X)+2abCov(X,Y)+b^2Var(Y).
The term 2abCov(X,Y) in (1) is called an interaction term. See this post for the definition and properties of covariance.
Proof.
Var(aX + bY)=E[aX + bY -E(aX + bY)]^2

(using linearity of means)
=E(aX + bY-aEX -bEY)^2

(grouping by variable)
=E[a(X-EX)+b(Y-EY)]^2

(squaring out)
=E[a^2(X-EX)^2+2ab(X-EX)(Y-EY)+b^2(Y-EY)^2]

(using linearity of means and definitions of variance and covariance)
=a^2Var(X) + 2abCov(X,Y) +b^2Var(Y).
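
Formula (1) is an exact algebraic identity, so it also holds for the sample analogues computed with the same 1/n convention. A minimal numpy check (arbitrary simulated data and coefficients):

```python
import numpy as np

rng = np.random.default_rng(9)
X, Y = rng.normal(size=1000), rng.normal(size=1000)
a, b = 2.0, -1.5

def cov(s, t):
    return np.mean((s - s.mean()) * (t - t.mean()))

lhs = cov(a * X + b * Y, a * X + b * Y)                            # Var(aX + bY)
rhs = a**2 * cov(X, X) + 2 * a * b * cov(X, Y) + b**2 * cov(Y, Y)
print(np.isclose(lhs, rhs))                                        # True
```
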
Property 2. Variance of a sum. Letting in (1) a=b=1 we obtain
Var(X + Y) = Var(X) + 2Cov(X,Y)+Var(Y).

Property 3. Homogeneity of degree 2. Choose b=0 in (1) to get
Var(aX)=a^2Var(X).
Exercise. What do you think is larger: Var(X+Y) or Var(X-Y)?
Property 4. If we add a constant to a variable, its variance does not change: Var(X+c)=E[X+c-E(X+c)]^2=E(X+c-EX-c)^2=E(X-EX)^2=Var(X).
Property 5. Variance of a constant is zero: Var(c)=E(c-Ec)^2=0.

Property 6. Nonnegativity. Since the squared deviation from mean (X-EX)^2 is nonnegative, its expectation is nonnegative: E(X-EX)^2\ge 0.

Property 7. Only a constant can have variance equal to zero: If Var(X)=0, then E(X-EX)^2 =(x_1-EX)^2p_1 +...+(x_n-EX)^2p_n=0, see the definition of the expected value. Since all probabilities are positive, we conclude that x_i=EX for all i, which means that X is identically constant.

Property 8. Shortcut for variance. We have an identity E(X-EX)^2=EX^2-(EX)^2. Indeed, squaring out gives

E(X-EX)^2 =E(X^2-2XEX+(EX)^2)

(distributing expectation)

=EX^2-2E(XEX)+E(EX)^2

(expectation of a constant is constant)

=EX^2-2(EX)^2+(EX)^2=EX^2-(EX)^2.

All of the above properties apply to any random variables. The next one is an exception in the sense that it applies only to uncorrelated variables.

Property 9. If variables are uncorrelated, that is Cov(X,Y)=0, then from (1) we have Var(aX + bY)=a^2Var(X)+b^2Var(Y). In particular, letting a=b=1, we get additivity: Var(X+Y)=Var(X)+Var(Y). Recall that the expected value is always additive.

Generalizations. Var(\sum a_iX_i)=\sum a_i^2Var(X_i) and Var(\sum X_i)=\sum Var(X_i) if all X_i are uncorrelated.

Among my posts that use properties of variance, I have counted 12 so far.


13 Oct 16

Properties of means

Properties of means, covariances and variances are the bread and butter of professionals. Here we consider the bread - the means.

Properties of means: as simple as playing with tables

Definition of a random variable. When my Brazilian students asked for an intuitive definition of a random variable, I said: it is a function whose values are unpredictable. Therefore it is prohibited to work with individual values, and we are allowed to work only with their various means. For proofs we need a more technical definition: it is a table of values plus probabilities of the type shown in Table 1.

Table 1.  Random variable definition
Values of X Probabilities
x_1 p_1
... ...
x_n p_n

Note: The complete form of writing p_i is P(X = x_i).

Definition of the mean (or expected value). EX = x_1p_1 + ... + x_np_n = \sum\limits_{i = 1}^nx_ip_i. In words, this is a weighted sum of values, where the weights p_i reflect the importance of the corresponding x_i.

Note: The expected value is a function whose argument is a complex object (it is described by Table 1) and the value is simple: EX is just a number. And it is not a product of E and X! See how different means fit this definition.

Definition of a linear combination. See the financial motivation here. Suppose that X,Y are two discrete random variables with the same probability distribution p_1,...,p_n. Let a,b be real numbers. The random variable aX + bY is called a linear combination of X,Y with coefficients a,b. Its special cases are aX (X scaled by a) and X + Y (the sum of X and Y). The detailed definition is given by Table 2.

Table 2.  Linear operations definition
Values of X Values of Y Probabilities aX X + Y aX + bY
x_1 y_1 p_1 ax_1 x_1 + y_1 ax_1 + by_1
... ... ... ... ... ...
x_n y_n p_n ax_n x_n + y_n ax_n + by_n

Note: The situation when the probability distributions are different is reduced to the case when they are the same, see my book.

Property 1. Linearity of means. For any random variables X,Y and any numbers a,b one has

(1) E(aX + bY) = aEX + bEY.

Proof. This is one of those straightforward proofs where knowing the definitions and starting from the left-hand side is enough to arrive at the result. Using the definitions in Table 2, the mean of the linear combination is
E(aX + bY) = (ax_1 + by_1)p_1 + ... + (ax_n + by_n)p_n

(distributing probabilities)
= ax_1p_1 + by_1p_1 + ... + ax_np_n + by_np_n

(grouping by variables)
= (ax_1p_1 + ... + ax_np_n) + (by_1p_1 + ... + by_np_n)

(pulling out constants)
= a(x_1p_1 + ... + x_np_n) + b(y_1p_1 + ... + y_np_n) = aEX + bEY.
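
For students who like to see things computed, here is a minimal sketch of (1) with a made-up table of values and probabilities (the numbers are arbitrary illustrations of Table 2):

```python
import numpy as np

# values of X, values of Y and the common probabilities, as in Table 2
x = np.array([1.0, 2.0, 5.0])
y = np.array([0.0, 3.0, 1.0])
p = np.array([0.2, 0.5, 0.3])
a, b = 4.0, -2.0

def E(values):
    # expected value: weighted sum of values with weights p
    return np.sum(values * p)

print(np.isclose(E(a * x + b * y), a * E(x) + b * E(y)))  # True: linearity of means
```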

See applications: one, and two, and three.

Generalization to the case of a linear combination of n variables:

E(a_1X_1 + ... + a_nX_n) = a_1EX_1 + ... + a_nEX_n.

Special cases. a) Letting a = b = 1 in (1) we get E(X + Y) = EX + EY. This is called additivity. See an application. b) Letting in (1) b = 0 we get E(aX) = aEX. This property is called homogeneity of degree 1 (you can pull the constant out of the expected value sign). Ask your students to deduce linearity from homogeneity and additivity.

Property 2. Expected value of a constant. Everybody knows what a constant is. Ask your students what a constant is in terms of Table 1. The mean of a constant is that constant, because a constant doesn't change, rain or shine: Ec = cp_1 + ... + cp_n = c(p_1 + ... + p_n) = c (we have used the completeness axiom). In particular, it follows that E(EX)=EX.

Property 3. The expectation operator preserves order: if x_i\ge y_i for all i, then EX\ge EY. In particular, the mean of a nonnegative random variable is nonnegative: if x_i\ge 0 for all i, then EX\ge 0.

Indeed, using the fact that all probabilities are nonnegative, we get EX = x_1p_1 + ... + x_np_n\ge y_1p_1 + ... + y_np_n=EY.

Property 4. For independent variables, we have EXY=(EX)(EY) (multiplicativity), which has important implications on its own.

The best thing about the above properties is that, although we proved them under simplified assumptions, they are always true. We keep in mind that the expectation operator E is the device used by Mother Nature to measure the average, and most of the time she keeps hidden from us both the probabilities and the average EX.
