27
Jan 21

## AP Stats and Business Stats

Its content, organization and level justify its adoption as a textbook for introductory statistics for Econometrics in most American or European universities. The book's table of contents is somewhat standard, the innovation comes in a presentation that is crisp, concise, precise and directly relevant to the Econometrics course that will follow. I think instructors and students will appreciate the absence of unnecessary verbiage that permeates many existing textbooks.

Having read Professor Mynbaev's previous books and research articles I was not surprised with his clear writing and precision. However, I was surprised with an informal and almost conversational one-on-one style of writing which should please most students. The informality belies a careful presentation where great care has been taken to present the material in a pedagogical manner.

Carlos Martins-Filho
Professor of Economics
Boulder, USA

26
May 20

2
Mar 20

## Statistical calculator

In my book I explained how one can use Excel to do statistical simulations and replace statistical tables commonly used in statistics courses. Here I go one step further by providing a free statistical calculator that replaces the following tables from the book by Newbold et al.:

Table 1 Cumulative Distribution Function, F(z), of the Standard Normal Distribution Table

Table 2 Probability Function of the Binomial Distribution

Table 5 Individual Poisson Probabilities

Table 7a Upper Critical Values of Chi-Square Distribution with $\nu$ Degrees of Freedom

Table 8 Upper Critical Values of Student’s t Distribution with $\nu$ Degrees of Freedom

Tables 9a, 9b Upper Critical Values of the F Distribution

The calculator is just a Google sheet with statistical functions, see Picture 1:

Picture 1. Calculator using Google sheet

## How to use Calculator

1. Open an account at gmail.com, if you haven't already. Open Google Drive.
3. Find the sheet on my Google drive and copy it to your Google drive (File/Make a copy). An icon of my calculator will appear in your drive. That's not the file, it's just a link to my file. To the right of it there are three dots indicating options. One of them is "Make a copy", so use that one. The copy will be in your drive. After that you can delete the link to my file. You might want to rename "Copy of Calculator" as "Calculator".
5. When you click a cell, you can enter what you need either in the formula bar at the bottom or directly in the cell. You can also see the functions I embedded in the sheet.
6. In cell A1, for example, you can enter any legitimate formula with numbers, arithmetic signs, and Google sheet functions. Be sure to start it with =,+ or - and to press the checkmark on the right of the formula bar after you finish.
7. The cells below A1 replace the tables listed above. Beside each function there is a verbal description and further to the right - a graphical illustration (which is not in Picture 1).
8. On the tab named Regression you can calculate the slope and intercept. The sample size must be 10.
9. Keep in mind that tables for continuous distributions need two functions. For example, in case of the standard normal distribution one function allows you to go from probability (area of the left tail) to the cutting value on the horizontal axis. The other function goes from the cutting value on the horizontal axis to probability.
10. Feel free to add new sheets or functions as you may need. You will have to do this on a tablet or computer.
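
For readers who prefer code to a spreadsheet, the same table lookups can be sketched in Python using only the standard library. This is an illustration under my own assumptions, not a copy of the formulas embedded in the Google sheet: the function names are hypothetical, and only the normal, binomial and Poisson tables are covered. The two normal functions show the "two directions" mentioned in step 9.

```python
from math import comb, exp, factorial
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def normal_cdf(z):
    """Cutting value z on the horizontal axis -> area of the left tail, F(z)."""
    return std_normal.cdf(z)

def normal_quantile(p):
    """Area of the left tail p -> cutting value z (the inverse direction)."""
    return std_normal.inv_cdf(p)

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean count is lam."""
    return exp(-lam) * lam**k / factorial(k)

print(round(normal_cdf(1.96), 4))         # left-tail area at z = 1.96
print(round(normal_quantile(0.975), 2))   # cutting value for area 0.975
print(round(binomial_pmf(2, 5, 0.5), 4))  # P(2 successes in 5 fair trials)
print(round(poisson_pmf(0, 2.0), 4))      # P(no events) when lam = 2
```

The chi-square, t and F tables need special functions that the standard library does not provide, so they are left out of this sketch.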
17
Mar 19

## AP Statistics the Genghis Khan way 2

Last semester I tried to explain theory through numerical examples. The results were terrible. Even the best students didn't live up to my expectations. The midterm grades were so low that I did something I had never done before: I allowed my students to write an analysis of the midterm at home. Those who were able to verbally articulate the answers to me received a bonus that allowed them to pass the semester.

This semester I made a U-turn. I announced that in the first half of the semester we would concentrate on theory, and we followed this methodology. Out of 35 students, 20 significantly improved their performance and 15 remained where they were.

### Midterm exam, version 1

#### 1. General density definition (6 points)

a. Define the density $p_X$ of a random variable $X.$ Draw the density of heights of adults, making simplifying assumptions if necessary. Don't forget to label the axes.

b. According to your plot, how much is the integral $\int_{-\infty}^0p_X(t)dt?$ Explain.

c. Why can't the density be negative?

d. Why should the total area under the density curve be 1?

e. Where are basketball players on your graph? Write down the corresponding expression for probability.

f. Where are dwarfs on your graph? Write down the corresponding expression for probability.

This question is about the interval formula. In each case students have to write the equation for the probability and the corresponding integral of the density. At this level, I don't talk about the distribution function and introduce the density by the interval formula.

#### 2. Properties of means (8 points)

a. Define a discrete random variable and its mean.

b. Define linear operations with random variables.

c. Prove linearity of means.

d. Prove additivity and homogeneity of means.

e. How much is the mean of a constant?

f. Using induction, derive the linearity of means for the case of $n$ variables from the case of two variables (3 points).

#### 3. Covariance properties (6 points)

a. Derive linearity of covariance in the first argument when the second is fixed.

b. How much is covariance if one of its arguments is a constant?

c. What is the link between variance and covariance? If you know one of these functions, can you find the other (there should be two answers)? (4 points)

#### 4. Standard normal variable (6 points)

a. Define the density $p_z(t)$ of a standard normal.

b. Why is the function $p_z(t)$ even? Illustrate this fact on the plot.

c. Why is the function $f(t)=tp_z(t)$ odd? Illustrate this fact on the plot.

d. Justify the equation $Ez=0.$

e. Why is $V(z)=1?$

f. Let $t>0.$ Show on the same plot areas corresponding to the probabilities $A_1=P(0<z<t),$ $A_2=P(z>t),$ $A_3=P(z<-t),$ $A_4=P(-t<z<0).$ Write down the relationships between $A_1,...,A_4.$

#### 5. General normal variable (3 points)

a. Define a general normal variable $X.$

b. Use this definition to find the mean and variance of $X.$

c. Using part b, on the same plot graph the density of the standard normal and of a general normal with parameters $\sigma =2,$ $\mu =3.$

### Midterm exam, version 2

#### 1. General density definition (6 points)

a. Define the density $p_X$ of a random variable $X.$ Draw the density of work experience of adults, making simplifying assumptions if necessary. Don't forget to label the axes.

b. According to your plot, how much is the integral $\int_{-\infty}^0p_X(t)dt?$ Explain.

c. Why can't the density be negative?

d. Why should the total area under the density curve be 1?

e. Where are retired people on your graph? Write down the corresponding expression for probability.

f. Where are young people (up to 25 years old) on your graph? Write down the corresponding expression for probability.

#### 2. Variance properties (8 points)

a. Define variance of a random variable. Why is it non-negative?

b. Define the formula for variance of a linear combination of two variables.

c. How much is variance of a constant?

d. What is the formula for variance of a sum? What do we call homogeneity of variance?

e. What is larger: $V(X+Y)$ or $V(X-Y)$? (2 points)

f. One investor has 100 shares of Apple, another - 200 shares. Which investor's portfolio has larger variability? (2 points)

#### 3. Poisson distribution (6 points)

a. Write down the Taylor expansion and explain the idea. How are the Taylor coefficients found?

b. Use the Taylor series for the exponential function to define the Poisson distribution.

c. Find the mean of the Poisson distribution. What is the interpretation of the parameter $\lambda$ in practice?

#### 4. Standard normal variable (6 points)

a. Define the density $p_z(t)$ of a standard normal.

b. Why is the function $p_z(t)$ even? Illustrate this fact on the plot.

c. Why is the function $f(t)=tp_z(t)$ odd? Illustrate this fact on the plot.

d. Justify the equation $Ez=0.$

e. Why is $V(z)=1?$

f. Let $t>0.$ Show on the same plot areas corresponding to the probabilities $A_1=P(0<z<t),$ $A_2=P(z>t),$ $A_3=P(z<-t),$ $A_4=P(-t<z<0).$ Write down the relationships between $A_1,...,A_4.$

#### 5. General normal variable (3 points)

a. Define a general normal variable $X.$

b. Use this definition to find the mean and variance of $X.$

c. Using part b, on the same plot graph the density of the standard normal and of a general normal with parameters $\sigma =2,$ $\mu =3.$

4
Nov 18

## Little tricks for AP Statistics

This year I am teaching AP Statistics. If things continue the way they are, about half of the class will fail. Here is my diagnosis and how I am handling the problem.

On the surface, the students lack algebra training, but I think the problem is deeper: many of them have underdeveloped cognitive abilities. Their perception is slow, their memory is limited, their analytical abilities are rudimentary, and they are not used to working at home. Limited resources require careful allocation.

### Terminology

Short and intuitive names are better than two-word professional names.

Instead of "sample space" or "probability space" say "universe". The universe is the widest possible event, and nothing exists outside it.

Instead of "elementary event" say "atom". Simplest possible events are called atoms. This corresponds to the theoretical notion of an atom in measure theory (an atom is a measurable set which has positive measure and contains no set of smaller positive measure).

Then the formulation of classical probability becomes short. Let $n$ denote the number of atoms in the universe and let $n_A$ be the number of atoms in event $A.$ If all atoms are equally likely (have equal probabilities), then $P(A)=n_A/n.$
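
A minimal sketch of this counting rule in Python, assuming a hypothetical universe whose atoms are the six faces of a fair die (the example and the function name are mine, not part of the course material):

```python
from fractions import Fraction

# Hypothetical universe: the atoms are the six faces of a fair die.
universe = {1, 2, 3, 4, 5, 6}
event_even = {atom for atom in universe if atom % 2 == 0}

def classical_probability(event, universe):
    """P(A) = n_A / n; valid only when all atoms are equally likely."""
    return Fraction(len(event & universe), len(universe))

print(classical_probability(event_even, universe))  # 1/2
```

Using `Fraction` keeps the answer exact, which matches the way the ratio $n_A/n$ is written by hand.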

The clumsy "mutually exclusive events" are better replaced by more visual "disjoint sets". Likewise, instead of "collectively exhaustive events" say "events that cover the universe".

The combination of "mutually exclusive" and "collectively exhaustive" events is beyond comprehension for many. I say: if events are disjoint and cover the universe, we call them tiles. To support this definition, play a jigsaw puzzle video onscreen (Video 1) and produce the picture from Figure 1.

Video 1. Tiles (disjoint events that cover the universe)

Figure 1. Tiles (disjoint events that cover the universe)

### The philosophy of team work

We are in the same boat. I mean the big boat. Not the class. Not the university. It's the whole country. We depend on each other. Failure of one may jeopardize the well-being of everybody else.

You work in teams. You help each other to learn. My lectures and your presentations are just the beginning of the journey of knowledge into your heads. I cannot control how it settles there. Be my teaching assistants, share your big and little discoveries with your classmates.

I don't just preach about you helping each other. I force you to work in teams. 30% of the final grade is allocated to team work. Team work means joint responsibility. You work on assignments together. I randomly select a team member for reporting. His or her grade is what each team member gets.

This kind of team work is incompatible with the Western obsession with grades privacy. If I say my grade is nobody's business, by extension I consider the level of my knowledge a private issue. This will prevent me from asking for help and admitting my errors. The situation when students hide their errors and weaknesses from others also goes against the ethics of many workplaces. In my class all grades are public knowledge.

In some situations, keeping the grade private is technically impossible. Conducting a competition without announcing the points won is impossible. If I catch a student cheating, I announce the failing grade immediately, as a warning to others.

To those of you who think team-based learning is unfair to better students I repeat: 30% of the final grade is given for team work, not for personal achievements. The other 70% is where you can shine personally.

### Breaking the wall of silence

Team work serves several purposes.

Firstly, joint responsibility helps break communication barriers. See in Video 2 my students working in teams on classroom assignments. The situation when a weaker student is too proud to ask for help and a stronger student doesn't want to offend by offering help is not acceptable. One can ask for help or offer help without losing respect for each other.

Video 2. Teams working on assignments

Secondly, it turns on resources that are otherwise idle. Explaining something to somebody is the best way to improve your own understanding. The better students master a kind of leadership that is especially valuable in a modern society. For the weaker students, feeling responsible for a team improves motivation.

Thirdly, I save time by having to grade fewer student papers.

On exams and quizzes I mercilessly punish the students for Yes/No answers without explanations. There are no half-points for half-understanding. This, in combination with the team work and the open grades policy, allows me to achieve my main objective: students are eager to talk to me about their problems.

### Set operations and probability

After studying the basics of set operations and probabilities we had a midterm exam. It revealed that about one-third of students didn't understand this material and some of that misunderstanding came from high school. During the review session I wanted to see if they were ready for a frank discussion and told them: "Those who don't understand probabilities, please raise your hands", and about one-third raised their hands. I invited two of them to work at the board.

Video 3. Translating verbal statements to sets, with accompanying probabilities

Many teachers think that the Venn diagrams explain everything about sets because they are visual. No, for some students they are not visual enough. That's why I prepared a simple teaching aid (see Video 3) and explained the task to the two students as follows:

I am shooting at the target. The target is a square with two circles on it, one red and the other blue. The target is the universe (the bullet cannot hit points outside it). The probability of a set is its area. I am going to tell you one statement after another. You write that statement in the first column of the table. In the second column write the mathematical expression for the set. In the third column write the probability of that set, together with any accompanying formulas that you can come up with. The formulas should reflect the relationships between relevant areas.

Table 1. Set operations and probabilities

| Statement | Set | Probability |
|---|---|---|
| 1. The bullet hit the universe | $S$ | $P(S)=1$ |
| 2. The bullet didn't hit the universe | $\emptyset$ | $P(\emptyset )=0$ |
| 3. The bullet hit the red circle | $A$ | $P(A)$ |
| 4. The bullet didn't hit the red circle | $\bar{A}=S\backslash A$ | $P(\bar{A})=P(S)-P(A)=1-P(A)$ |
| 5. The bullet hit both the red and blue circles | $A\cap B$ | $P(A\cap B)$ (in general, this is not equal to $P(A)P(B)$) |
| 6. The bullet hit $A$ or $B$ (or both) | $A\cup B$ | $P(A\cup B)=P(A)+P(B)-P(A\cap B)$ (additivity rule) |
| 7. The bullet hit $A$ but not $B$ | $A\backslash B$ | $P(A\backslash B)=P(A)-P(A\cap B)$ |
| 8. The bullet hit $B$ but not $A$ | $B\backslash A$ | $P(B\backslash A)=P(B)-P(A\cap B)$ |
| 9. The bullet hit either $A$ or $B$ (but not both) | $(A\backslash B)\cup(B\backslash A)$ | $P\left( (A\backslash B)\cup (B\backslash A)\right)=P(A)+P(B)-2P(A\cap B)$ |

During the process, I was illustrating everything on my teaching aid. This exercise allows the students to relate verbal statements to sets and further to their areas. The main point is that people need to see the logic, and that logic should be repeated several times through similar exercises.
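
The relationships in Table 1 can also be checked mechanically. The sketch below replaces areas with counts: it assumes a hypothetical universe of 20 equally likely atoms and two overlapping sets standing in for the red and blue circles. The specific sets are my own illustration, not the teaching aid from the video.

```python
from fractions import Fraction

S = set(range(20))      # universe: 20 equally likely atoms (hypothetical)
A = set(range(0, 8))    # stands in for the red circle
B = set(range(5, 12))   # stands in for the blue circle; overlaps A

def P(event):
    """Probability as the share of atoms, the discrete analogue of area."""
    return Fraction(len(event), len(S))

# Rows 1-2: the universe and the empty set
assert P(S) == 1 and P(set()) == 0
# Row 4: complement
assert P(S - A) == 1 - P(A)
# Row 6: additivity rule
assert P(A | B) == P(A) + P(B) - P(A & B)
# Rows 7-8: set differences
assert P(A - B) == P(A) - P(A & B)
assert P(B - A) == P(B) - P(A & B)
# Row 9: either A or B, but not both
assert P((A - B) | (B - A)) == P(A) + P(B) - 2 * P(A & B)
# Row 5: in this example P(A and B) differs from P(A)P(B)
assert P(A & B) != P(A) * P(B)

print(P(A | B), P(A & B))
```

Changing `A` and `B` to other subsets of `S` leaves every identity except the last one intact, which is exactly the point of repeating the exercise several times.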

8
Oct 17

## Reevaluating probabilities based on piece of evidence

This actually has to do with Bayes' theorem. However, in simple problems one can use a dead simple approach: just find the probabilities of all elementary events. This post builds upon the post on Significance level and power of test, including the notation. Be sure to review that post.

Here is an example from the guide for Quantitative Finance by A. Patton (University of London course code FN3142).

Activity 7.2 Consider a test that has a Type I error rate of 5%, and power of 50%.

Suppose that, before running the test, the researcher thinks that both the null and the alternative are equally likely.

1. If the test indicates a rejection of the null hypothesis, what is the probability that the null is false?

2. If the test indicates a failure to reject the null hypothesis, what is the probability that the null is true?

Denote events R = {Reject null}, A = {fAil to reject null}; T = {null is True}; F = {null is False}. Then we are given:

(1) $P(F)=0.5;\ P(T)=0.5;$

(2) $P(R|T)=\frac{P(R\cap T)}{P(T)}=0.05;\ P(R|F)=\frac{P(R\cap F)}{P(F)}=0.5;$

(1) and (2) show that we can find $P(R\cap T)$ and $P(R\cap F)$ and therefore also $P(A\cap T)$ and $P(A\cap F).$ Once we know probabilities of elementary events, we can find everything about everything.

Figure 1. Elementary events

Answering the first question: just plug probabilities in $P(F|R)=\frac{P(R\cap F)}{P(R)}=\frac{P(R\cap F)}{P(R\cap T)+P(R\cap F)}.$

Answering the second question: just plug probabilities in $P(T|A)=\frac{P(A\cap T)}{P(A)}=\frac{P(A\cap T)}{P(A\cap T)+P(A\cap F)}.$

Patton uses Bayes' theorem and the law of total probability. The solution suggested above uses only additivity of probability.
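
Here is the same calculation as a short Python sketch. The variable names are mine. The only inputs are the numbers given in Activity 7.2, and the event R is split into its two atoms, R∩T and R∩F.

```python
# Given in Activity 7.2
p_T, p_F = 0.5, 0.5    # prior probabilities of "null is True" / "null is False"
p_R_given_T = 0.05     # Type I error rate
p_R_given_F = 0.50     # power

# Probabilities of the four elementary events
p_RT = p_R_given_T * p_T   # reject and null is true
p_RF = p_R_given_F * p_F   # reject and null is false
p_AT = p_T - p_RT          # fail to reject and null is true
p_AF = p_F - p_RF          # fail to reject and null is false

# Question 1: P(F|R), using P(R) = P(R and T) + P(R and F)
p_F_given_R = p_RF / (p_RT + p_RF)
# Question 2: P(T|A), using P(A) = P(A and T) + P(A and F)
p_T_given_A = p_AT / (p_AT + p_AF)

print(round(p_F_given_R, 4), round(p_T_given_A, 4))
```

With these inputs the answers are 0.25/0.275 (about 0.91) for the first question and 0.475/0.725 (about 0.66) for the second.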

6
Oct 17

## Significance level and power of test

In this post we discuss several interrelated concepts: null and alternative hypotheses, type I and type II errors and their probabilities. Review the definitions of a sample space and elementary events and that of a conditional probability.

## Type I and Type II errors

Regarding the true state of nature we assume two mutually exclusive possibilities: the null hypothesis (like the suspect is guilty) and alternative hypothesis (the suspect is innocent). It's up to us what to call the null and what to call the alternative. However, the statistical procedures are not symmetric: it's easier to measure the probability of rejecting the null when it is true than other involved probabilities. This is why what is desirable to prove is usually designated as the alternative.

Usually in books you can see the following table.

| State of nature | Decision taken: fail to reject null | Decision taken: reject null |
|---|---|---|
| Null is true | Correct decision | Type I error |
| Null is false | Type II error | Correct decision |

This table is not good enough because there is no link to probabilities. The next video does fill in the blanks.

Video. Significance level and power of test

## Significance level and power of test

The conclusion from the video is that

$\frac{P(T\cap R)}{P(T)}=P(R|T)=P(\text{Type I error})=\text{significance level},$

$\frac{P(F\cap R)}{P(F)}=P(R|F)=P(\text{Correctly rejecting false null})=\text{Power}.$
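
To see the two conditional probabilities as numbers, here is a minimal Python sketch for one concrete test. Everything specific in it is an assumption of mine and not taken from the video: normal data with known standard deviation 1, sample size 25, a null of mean 0, an alternative of mean 0.5, and the rule "reject when the sample mean exceeds a critical value".

```python
from statistics import NormalDist

def reject_probability(true_mean, critical_value, sigma, n):
    """P(sample mean > critical_value) when observations are N(true_mean, sigma^2)."""
    sampling_dist = NormalDist(mu=true_mean, sigma=sigma / n ** 0.5)
    return 1.0 - sampling_dist.cdf(critical_value)

sigma, n = 1.0, 25   # hypothetical setup
# Choose the critical value so that the rejection probability under the null is 5%
critical_value = NormalDist().inv_cdf(0.95) * sigma / n ** 0.5

significance = reject_probability(0.0, critical_value, sigma, n)  # P(R|T)
power = reject_probability(0.5, critical_value, sigma, n)         # P(R|F)

print(round(significance, 3), round(power, 3))
```

The same function is evaluated twice: once with the mean that holds under the null, which gives the significance level, and once with the mean that holds under the alternative, which gives the power.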
26
Jul 17

## Nonlinear least squares

Here we explain the idea, illustrate the possible problems in Mathematica and, finally, show the implementation in Stata.

## Idea: minimize RSS, as in ordinary least squares

Observations come in pairs $(x_1,y_1),...,(x_n,y_n)$. In case of ordinary least squares, we approximated the y's with linear functions of the parameters, possibly nonlinear in x's. Now we use a function $f(a,b,x_i)$ which may be nonlinear in $a,b$. We still minimize RSS which takes the form $RSS=\sum r_i^2=\sum(y_i-f(a,b,x_i))^2$. Nonlinear least squares estimators are the values $a,b$ that minimize RSS. In general, it is difficult to find the formula (closed-form solution), so in practice software, such as Stata, is used for RSS minimization.
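
As a small illustration of the objective being minimized, the following Python sketch computes RSS for a hypothetical model $f(a,b,x)=ae^{bx}$, which is nonlinear in $b$. The model and the data are my assumptions, chosen only to make the definition concrete. The data are generated without noise, so RSS is zero at the true parameters and positive elsewhere.

```python
from math import exp

def f(a, b, x):
    """Hypothetical model a*exp(b*x), nonlinear in the parameter b."""
    return a * exp(b * x)

def rss(a, b, xs, ys):
    """Residual sum of squares: sum of (y_i - f(a, b, x_i))^2."""
    return sum((y - f(a, b, x)) ** 2 for x, y in zip(xs, ys))

xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [f(2.0, 0.7, x) for x in xs]   # noise-free data generated with a=2, b=0.7

print(rss(2.0, 0.7, xs, ys))        # RSS at the true parameters
print(rss(1.5, 0.9, xs, ys) > 0)    # any other pair gives a positive RSS here
```

A nonlinear least squares routine searches over `a` and `b` for the pair that makes `rss` as small as possible.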

## Simplified idea and problems in one-dimensional case

Suppose we want to minimize $f(x)$. The Newton algorithm (default in Stata) is an iterative procedure that consists of steps:

1. Select the initial value $x_0$.
2. Find the derivative (or tangent) of $f$ at $x_0$. Make a small step in the descent direction (indicated by the derivative), to obtain the next value $x_1$.
3. Repeat Step 2, using $x_1$ as the starting point, until the difference between the values of the objective function at two successive points becomes small. The last point $x_n$ will approximate the minimizing point.

Problems:

1. The minimizing point may not exist.
2. When it exists, it may not be unique. In general, there is no way to find out how many local minimums there are and which ones are global.
3. The minimizing point depends on the initial point.
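
The three steps and the third problem can be sketched in a few lines of Python. This is a plain fixed-step descent that follows the steps above literally, and it is not a reproduction of Stata's optimizer. The objective function, the step size and the tolerance are all hypothetical choices of mine. The function has two local minima, so different starting points lead to different answers.

```python
def f(x):
    """Hypothetical objective with two local minima (not from the post)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

def df(x):
    """Derivative of f."""
    return 4.0 * x * (x * x - 1.0) + 0.3

def descend(x0, step=0.01, tol=1e-12, max_iter=10_000):
    """Step 1: start at x0. Step 2: move against the derivative.
    Step 3: stop when the objective changes by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = x - step * df(x)
        if abs(f(x_next) - f(x)) < tol:
            return x_next
        x = x_next
    return x

left = descend(-2.0)    # start to the left of both minima
right = descend(2.0)    # start to the right of both minima

print(round(left, 3), round(right, 3))
print(f(left) < f(right))   # only one of the two is the global minimum
```

Starting at -2 the procedure settles near -1.04, starting at 2 it settles near 0.96, and only the first of these is the global minimum. In practice this is why it pays to try several initial values.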

See Video 1 for illustration in the one-dimensional case.

Video 1. NLS geometry

## Problems illustrated in Mathematica

Here we look at three examples of nonlinear functions, two of which are considered in Dougherty. The first one is a power function (it can be linearized by applying logs) and the second is an exponential function (it cannot be linearized). The third function gives rise to two minimums. The possibilities are illustrated in Mathematica.

Video 2. NLS illustrated in Mathematica

## Finally, implementation in Stata

Here we show how to 1) generate a random vector, 2) create a vector of initial values, and 3) program a nonlinear dependence.

Video 3. NLS implemented in Stata

10
Jul 17

## Alternatives to simple regression in Stata

In the post on running simple regression in Stata we looked at the dependence of EARNINGS on S (years of schooling). At the end I suggested thinking about possible variations of the model. Specifically, could the dependence be nonlinear? We consider two answers to this question.

## Quadratic regression

This name is used for the quadratic dependence of the dependent variable on the independent variable. For our variables the dependence is

$EARNINGS=a+bS+cS^2+u$.

Note that the dependence on S is quadratic but the right-hand side is linear in the parameters, so we still are in the realm of linear regression. Video 1 shows how to run this regression.
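
To make the "linear in the parameters" point concrete, here is a Python sketch that fits $a+bS+cS^2$ by ordinary least squares, treating $1$, $S$ and $S^2$ as three regressors and solving the normal equations. It uses only the standard library. The data are synthetic and noise-free (generated from hypothetical coefficients 5, -0.8 and 0.1), so this is not Dougherty's data set and not the output shown in the video.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution

def fit_quadratic(s_values, y_values):
    """OLS for y = a + b*S + c*S^2 via the normal equations X'X beta = X'y."""
    rows = [[1.0, s, s * s] for s in s_values]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, y_values)) for i in range(3)]
    return solve_linear_system(xtx, xty)

schooling = [8, 10, 12, 14, 16, 18, 20]                       # synthetic S values
earnings = [5.0 - 0.8 * s + 0.1 * s * s for s in schooling]   # noise-free y values

a, b, c = fit_quadratic(schooling, earnings)
print(round(a, 4), round(b, 4), round(c, 4))
```

Nothing in the fitting step is nonlinear: squaring S happens before estimation, and after that the problem is an ordinary linear regression with two regressors and a constant.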

Video 1. Running quadratic regression in Stata

## Nonparametric regression

The general way to write this model is

$y=m(x)+u.$

The beauty and power of nonparametric regression consists in the fact that we don't need to specify the functional form of dependence of $y$ on $x$. Therefore there are no parameters to interpret, there is only the fitted curve. There is also the estimated equation of the nonlinear dependence, which is too complex to consider here. I already illustrated the difference between parametric and nonparametric regression. See in Video 2 how to run nonparametric regression in Stata.

Video 2. Nonparametric dependence

6
Jul 17

## Running simple regression in Stata

Running simple regression in Stata is, well, simple. It's just a matter of a couple of clicks. Try to make it a small research project.

1. Obtain descriptive statistics for your data (Statistics > Summaries, tables, and tests > Summary and descriptive statistics > Summary statistics). Look at all that stuff you studied in introductory statistics: units of measurement, means, minimums, maximums, and correlations. Knowing the units of measurement will be important for interpreting regression results; correlations will predict signs of coefficients, etc. In your report, don't just mechanically repeat all those measures; try to find and discuss something interesting.
2. Visualize your data (Graphics > Twoway graph). On the graph you can observe outliers and discern possible nonlinearity.
3. After running regression, report the estimated equation. It is called a fitted line and in our case looks like this: Earnings = -13.93+2.45*S (use descriptive names and not abstract X,Y). To see if the coefficient of S is significant, look at its p-value, which is smaller than 0.001. This tells us that at all levels of significance larger than or equal to 0.001 the null that the coefficient of S is zero (that is, insignificant) is rejected. This follows from the definition of p-value. Nobody cares about significance of the intercept. Report also the p-value of the F statistic. It characterizes significance of all nontrivial regressors and is important in case of multiple regression. The last statistic to report is R squared.
4. Think about possible variations of the model. Could the dependence of Earnings on S be nonlinear? What other determinants of Earnings would you suggest from among the variables in Dougherty's file?
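
For reference, the numbers behind the fitted line and R squared can be computed by hand. The Python sketch below uses the textbook formulas for simple regression (slope equals sample covariance over sample variance). The data are made up for illustration and are not from Dougherty's file, so the output will not match the equation quoted in step 3.

```python
from statistics import mean

def simple_ols(x, y):
    """Return (intercept, slope) of the least squares line y = intercept + slope*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def r_squared(x, y, intercept, slope):
    """Share of the variation in y explained by the fitted line."""
    y_bar = mean(y)
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - rss / tss

# Made-up data: years of schooling and hourly earnings
schooling = [8, 10, 12, 12, 14, 16, 16, 18]
earnings = [6.0, 10.5, 14.0, 17.5, 20.0, 24.5, 27.0, 30.5]

intercept, slope = simple_ols(schooling, earnings)
print(f"Earnings = {intercept:.2f} + {slope:.2f}*S")
print(f"R squared = {r_squared(schooling, earnings, intercept, slope):.3f}")
```

The p-values that Stata reports additionally require the t and F distributions, which this sketch does not attempt to reproduce.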

Figure 1. Looking at data. For data, we use a scatterplot.

Figure 2. Running regression (Statistics > Linear models and related > Linear regression)