27
Dec 22

## Final exam in Advanced Statistics ST2133, 2022

Unlike most UoL exams, here I tried to relate the theory to practical issues.

KBTU International School of Economics

Compiled by Kairat Mynbaev

The total for this exam is 41 points. You have two hours.

Provide detailed explanations throughout. When answering, please clearly indicate question numbers. You don’t need a calculator: as long as the formula you provide is correct, the numerical value does not matter.

Question 1. (12 points)

a) (2 points) At a casino, two players are playing on slot machines. Their payoffs $X,Y$ are standard normal and independent. Find the joint density of the payoffs.

b) (4 points) Two other players watch the first two and start to argue about which will be larger: the sum $U = X + Y$ or the difference $V = X - Y$. Find the joint density of $U,V$. Are the variables $U,V$ independent? Find their marginal densities.

c) (2 points) Are $U,V$ normal? Why? What are their means and variances?

d) (2 points) Which probability is larger: $P(U > V)$ or $P\left( {U < V} \right)$?

e) (2 points) In this context interpret the conditional expectation $E\left( {U|V = v} \right)$. How much is it?

Reminder. The density of a normal variable $X \sim N\left( {\mu ,{\sigma ^2}} \right)$ is ${f_X}\left( x \right) = \frac{1}{{\sqrt {2\pi {\sigma ^2}} }}{e^{ - \frac{{{{\left( {x - \mu } \right)}^2}}}{{2{\sigma ^2}}}}}$.

Question 2. (9 points) The distribution of a call duration $X$ of one Kcell [largest mobile operator in KZ] customer is exponential: ${f_X}\left( x \right) = \lambda {e^{ - \lambda x}},\,\,x \ge 0,\,\,{f_X}\left( x \right) = 0,\,\,x < 0.$ The number $N$ of customers making calls simultaneously is distributed as Poisson: $P\left( {N = n} \right) = {e^{ - \mu }}\frac{{{\mu ^n}}}{{n!}},\,\,n = 0,1,2,...$ Thus the total call duration for all customers is ${S_N} = {X_1} + ... + {X_N}$ for $N \ge 1$. We put ${S_0} = 0$. Assume that customers make their decisions about calling independently.

a) (3 points) Find the general formula for the moment generating function of $S_N$, explaining all steps. Here ${X_1},...,{X_n}$ are independent and identically distributed and independent of $N$, but not necessarily exponential and Poisson as above.

b) (3 points) Find the moment generating functions of $X$, $N$ and ${S_N}$ for your particular distributions.

c) (3 points) Find the mean and variance of ${S_N}$. Based on the equations you obtained, can you suggest estimators of parameters $\lambda ,\mu$?

Remark. Direct observations on the exponential and Poisson distributions are not available. We have to infer their parameters by observing ${S_N}$. This explains the importance of the technique used in Question 2.
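The setting of the remark can be mimicked by simulation. The sketch below generates observations of ${S_N}$ only and computes their sample mean and variance, which are the quantities one would match with the theoretical moments from part c). The parameter values, the seed, and the function names are hypothetical choices for illustration; the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson sampler.

```python
import math
import random

def sample_poisson(mu, rng):
    """Draw one Poisson(mu) value using Knuth's multiplication method."""
    threshold = math.exp(-mu)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_total_duration(lam, mu, rng):
    """Draw one value of S_N = X_1 + ... + X_N, with S_0 = 0."""
    n = sample_poisson(mu, rng)
    return sum(rng.expovariate(lam) for _ in range(n))

def sample_moments(values):
    """Return the sample mean and the (biased) sample variance."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v

rng = random.Random(0)           # fixed seed, for reproducibility only
lam, mu = 2.0, 3.0               # hypothetical parameter values
draws = [sample_total_duration(lam, mu, rng) for _ in range(20000)]
mean_s, var_s = sample_moments(draws)
print(mean_s, var_s)
```

Only `draws` would be available in practice; `lam` and `mu` are known here solely because the data are simulated.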

Question 3. (8 points)

a) (2 points) For a non-negative random variable $X$ prove the Markov inequality $P\left( {X > c} \right) \le \frac{1}{c}EX,\,\,\,c > 0.$

b) (2 points) Prove the Chebyshev inequality $P\left( {|X - EX| > c} \right) \le \frac{1}{c^2}Var\left( X \right)$ for an arbitrary random variable $X$.

c) (4 points) We say that the sequence of random variables $\left\{ X_n \right\}$ converges in probability to a random variable $X$ if $P\left( {|{X_n} - X| > \varepsilon } \right) \to 0$ as $n \to \infty$ for any $\varepsilon > 0$.  Suppose that $E{X_n} = \mu$ for all $n$ and that $Var\left(X_n \right) \to 0$ as $n \to \infty$. Prove that then $\left\{X_n\right\}$ converges in probability to $\mu$.

Remark. Question 3 leads to the simplest example of a law of large numbers: if $\left\{ X_n \right\}$ are i.i.d. with finite variance, then their sample mean converges to their population mean in probability.
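A small simulation can make the remark tangible. The sketch below assumes i.i.d. Uniform(0, 1) variables (population mean 0.5) and estimates $P\left( {|{{\bar X}_n} - 0.5| > \varepsilon } \right)$ for several sample sizes; the distribution, the tolerance, the number of trials, and the function names are all hypothetical choices, not part of the exam.

```python
import random

def sample_mean(n, rng):
    """Mean of n i.i.d. Uniform(0, 1) draws (population mean 0.5)."""
    return sum(rng.random() for _ in range(n)) / n

def deviation_frequency(n, eps, trials, rng):
    """Estimate P(|sample mean - 0.5| > eps) by repeated simulation."""
    hits = sum(abs(sample_mean(n, rng) - 0.5) > eps for _ in range(trials))
    return hits / trials

rng = random.Random(1)            # fixed seed, for reproducibility only
eps = 0.05                        # hypothetical tolerance
frequencies = {n: deviation_frequency(n, eps, 2000, rng) for n in (10, 100, 1000)}
for n, freq in frequencies.items():
    print(n, freq)
```

The estimated frequencies should shrink as $n$ grows, which is what convergence in probability of the sample mean predicts.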

Question 4. (8 points)

a) (4 points) Define a distribution function. Give its properties, with intuitive explanations.

b) (4 points) Is a sum of two distribution functions a distribution function? Is a product of two distribution functions a distribution function?

Remark. The answer for part a) is here and the one for part b) is based on it.

Question 5. (4 points) The Rakhat factory prepares prizes for kids for the upcoming New Year event. Each prize contains one type of chocolates and one type of candies. The chocolates and candies are chosen randomly from two production lines, the total number of items is always 10 and all selections are equally likely.

a) (2 points) What proportion of prepared prizes contains three or more chocolates?

b) (2 points) 100 prizes have been sent to an orphanage. What is the probability that 50 of those prizes contain no more than two chocolates?

23
Mar 17

## Maximum likelihood: idea and life of a bulb

Maximum likelihood: idea of the method and application to life of a bulb. Sometimes I plagiarize from my book.

### Maximum likelihood idea

Figure 1. Maximum likelihood idea

The main idea of the maximum likelihood (ML) method is illustrated in Figure 1. We start with the sample, depicted with points on the horizontal axis. Then we ask which of the densities shown in the figure is most likely to have generated that sample. Of course, it's the one on the left, filled with grey.

This density takes higher values at observed points than the other two. Note also that the position of the density is regulated by its parameters. This explains the main idea: choose the parameters so as to maximize the density at the observed points.
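The comparison in the figure can be imitated numerically. In the sketch below, three normal densities with different positions are scored by the product of their values at the observed points, which is one way to make "higher values at observed points" precise. The sample, the candidate means, and the unit standard deviation are hypothetical, chosen only to resemble the picture.

```python
import math

def normal_density(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def joint_density(sample, mean, sd):
    """Product of the density values at the observed points."""
    return math.prod(normal_density(x, mean, sd) for x in sample)

sample = [0.8, 1.1, 1.3, 0.9, 1.4]       # hypothetical observations
candidate_means = [1.0, 3.0, 5.0]        # hypothetical density positions
scores = {m: joint_density(sample, m, 1.0) for m in candidate_means}
best_mean = max(scores, key=scores.get)
print(best_mean)
```

The candidate sitting over the sample gets the largest score; the ML method replaces the finite list of candidates with all admissible parameter values.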

### Algorithm

Step 1. A statistical model usually contains a random term. To describe that term, choose a density from some parametric family. Denote it $f(x|\theta)$ where $\theta$ is a parameter or a set of parameters. $f(x_i|\theta)$ is the value of the density at the $i$th observation.

Step 2. Assume that the observations are independent. Then the joint density is the product of the individual densities: $f(x_1,...,x_n|\theta)=f(x_1|\theta)...f(x_n|\theta)$. Since the observations are fixed, the joint density is a function of the parameters alone.

Definition. The joint density as a function of just parameters is called a likelihood function and denoted $L(\theta|x_1,...,x_n)=f(x_1,...,x_n|\theta)$, to reflect the fact that the parameters are the main argument. The parameters that maximize the likelihood function, if they exist, are the maximum likelihood estimators, and thus the name Maximum Likelihood (ML) method.

Step 3. Since $\log x$ is a strictly increasing function, the likelihood $L(\theta |x_1,...,x_n)$ and the log-likelihood function $\lambda(\theta)=\log L(\theta|x_1,...,x_n)$ are maximized at the same parameter values. The likelihood is often a product of many factors, in which case maximizing the log-likelihood is technically easier, because the logarithm turns the product into a sum.
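Besides simplifying the algebra, the logarithm also helps on the computer: a product of many density values can underflow to zero in floating-point arithmetic, while the sum of their logarithms stays finite. A minimal sketch, assuming a standard normal density evaluated at a hypothetical sample of 2000 points:

```python
import math

def standard_normal_density(x):
    """Density of N(0, 1) at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# Hypothetical sample: 2000 values cycling through -1.5, -1.0, ..., 1.5.
sample = [(i % 7 - 3) / 2 for i in range(2000)]

# The likelihood is a product of 2000 factors, each below 0.4.
likelihood = math.prod(standard_normal_density(x) for x in sample)

# The log-likelihood is the corresponding sum of logarithms.
log_likelihood = sum(math.log(standard_normal_density(x)) for x in sample)
print(likelihood, log_likelihood)
```

Here the product is too small to be represented as a double and prints as zero, so it carries no information about the parameters, whereas the log-likelihood remains usable.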

Comments. (1) Most of the time the likelihood function is difficult to maximize analytically. Then the maximization is done on a computer, and a numerical algorithm can give the solution only approximately. Moreover, the likelihood function may have no maximum at all or may have many local maxima; in the former case the numerical procedure does not converge and in the latter the computer gives only one solution.

(2) One should distinguish models and estimation methods. The OLS method applied to the linear model gives OLS estimators. The ML method applied to the same linear model gives ML estimators. Most linear models are dealt with using the least squares method. All exercises for maximum likelihood require some algebra, as is seen from the algorithm.
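Comment (1) can be seen in a toy example. The sketch below assumes a Cauchy location model with unit scale, chosen only because its log-likelihood has no closed-form maximizer and can have several local maxima; it is not a model discussed above. The data and the simple hill-climbing routine `hill_climb` are hypothetical, and a real application would normally rely on a library optimizer.

```python
import math

def cauchy_log_likelihood(theta, sample):
    """Log-likelihood of a Cauchy location model with unit scale."""
    return -sum(math.log(math.pi * (1 + (x - theta) ** 2)) for x in sample)

def hill_climb(objective, start, step=1.0, tol=1e-6):
    """Step to a strictly better neighbour if there is one, else halve the step."""
    theta = start
    while step > tol:
        best = max((theta, theta - step, theta + step), key=objective)
        if best == theta:
            step /= 2
        else:
            theta = best
    return theta

sample = [-5.0, -4.6, 4.8, 5.0, 5.2]     # hypothetical data in two clusters

def objective(theta):
    return cauchy_log_likelihood(theta, sample)

# Two starting points, one on each side of the data.
from_left = hill_climb(objective, start=-6.0)
from_right = hill_climb(objective, start=6.0)
print(from_left, objective(from_left))
print(from_right, objective(from_right))
```

The two runs stop at different points with different log-likelihood values, so the reported "solution" depends on where the search starts. Trying several starting values is a common safeguard.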

### Example: life of a bulb

Life of a bulb is described by the exponential distribution

$p(t)=0$, if $t\le0$, and $p(t)=\mu{e^{-\mu t}}$ if $t>0$

where $\mu$ is a positive parameter. The life of a bulb cannot be negative, so the density is zero on the left half-axis. The density takes its highest values in the right neighborhood of the origin and quickly declines afterwards. This means that the bulb is most likely to burn out right after it's produced, but if it survives the first minutes (hours, days), it will serve for a while. Most electronic products behave like this.

Exercise. Derive the ML estimator of $\mu$.

Solution. Step 1. $f(x_i|\mu)=\mu e^{-\mu x_i}$.

Step 2. Assuming independent observations, the joint density is the product of these densities: $\mu^{n}e^{-\mu x_1}...e^{-\mu x_n}$.

Step 3. The log-likelihood function is $\lambda=n\log\mu-\mu(x_1+...+x_n)$. The first order condition is $\frac{\partial\lambda}{\partial\mu}=\frac{n}{\mu}-(x_1+...+x_n)=0$. Since $x_1+...+x_n=n\bar{x}$, its solution is $\mu=\frac{1}{\bar{x}}$. To make sure that this is a maximum, we need to check the second order condition. Since $\frac{\partial^{2}\lambda }{\partial\mu^{2}}=-\frac{n}{\mu^2}$ is negative for all $\mu>0$, the log-likelihood is concave and we have indeed found the maximum.

Conclusion: $\hat{\mu}_{ML}=\frac{1}{\bar{x}}$ is the ML estimator for $\mu$.
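The result is easy to check numerically. The sketch below simulates bulb lifetimes from the exponential density, computes $1/\bar{x}$, and compares it with a crude grid maximization of the log-likelihood $\lambda=n\log\mu-\mu(x_1+...+x_n)$. The true parameter, the sample size, the seed, and the grid are hypothetical choices.

```python
import math
import random

def log_likelihood(mu, lifetimes):
    """lambda(mu) = n log(mu) - mu (x_1 + ... + x_n) for the exponential model."""
    return len(lifetimes) * math.log(mu) - mu * sum(lifetimes)

rng = random.Random(42)                           # fixed seed, for reproducibility only
true_mu = 0.5                                     # hypothetical true parameter
lifetimes = [rng.expovariate(true_mu) for _ in range(1000)]

mu_closed_form = len(lifetimes) / sum(lifetimes)  # 1 / sample mean

# Crude check: the best point on a grid with spacing 0.01
# should lie next to the closed-form estimate.
grid = [0.01 * k for k in range(1, 301)]
mu_grid = max(grid, key=lambda m: log_likelihood(m, lifetimes))
print(mu_closed_form, mu_grid)
```

The two numbers should agree up to the grid spacing, and both should be close to the value used to generate the data.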