24
Oct 22

Marginal probabilities and densities

This is to help everybody, from those who study Basic Statistics up to Advanced Statistics ST2133.

Discrete case

Suppose in a box we have coins and banknotes of only two denominations: $1 and$5 (see Figure 1).

Figure 1. Illustration of two variables

We pull one out randomly. The division of cash by type (coin or banknote) divides the sample space (shown as a square, lower left picture) with probabilities $p_{c}$ and $p_{b}$ (they sum to one). The division by denomination ($1 or$5) divides the same sample space differently, see the lower right picture, with the probabilities to pull out $1 and$5 equal to $p_{1}$ and $p_{5}$, resp. (they also sum to one). This is summarized in the tables

 Variable 1: Cash type Prob coin $p_{c}$$p_{c}$ banknote $p_{b}$$p_{b}$
 Variable 2: Denomination Prob $1 $p_{1}$$p_{1}$$5 $p_{5}$$p_{5}$

Now we can consider joint events and probabilities (see Figure 2, where the two divisions are combined).

Figure 2. Joint probabilities

For example, if we pull out a random $item$ it can be a $coin$ and $1 and the corresponding probability is $P\left(item=coin,\ item\ value=\1\right) =p_{c1}.$ The two divisions of the sample space generate a new division into four parts. Then geometrically it is obvious that we have four identities: Adding over denominations: $p_{c1}+p_{c5}=p_{c},$ $p_{b1}+p_{b5}=p_{b},$ Adding over cash types: $p_{c1}+p_{b1}=p_{1},$ $p_{c5}+p_{b5}=p_{5}.$ Formally, here we use additivity of probability for disjoint events $P\left( A\cup B\right) =P\left( A\right) +P\left( B\right) .$ In words: we can recover own probabilities of variables 1,2 from joint probabilities. Generalization Suppose we have two discrete random variables $X,Y$ taking values $x_{1},...,x_{n}$ and $y_{1},...,y_{m},$ resp., and their own probabilities are $P\left( X=x_{i}\right) =p_{i}^{X},$ $P\left(Y=y_{j}\right) =p_{j}^{Y}.$ Denote the joint probabilities $P\left(X=x_{i},Y=y_{j}\right) =p_{ij}.$ Then we have the identities (1) $\sum_{j=1}^mp_{ij}=p_{i}^{X},$ $\sum_{i=1}^np_{ij}=p_{j}^{Y}$ ($n+m$ equations). In words: to obtain the marginal probability of one variable (say, $Y$) sum over the values of the other variable (in this case, $X$). The name marginal probabilities is used for $p_{i}^{X},p_{j}^{Y}$ because in the two-dimensional table they arise as a result of summing table entries along columns or rows and are displayed in the margins. Analogs for continuous variables with densities Suppose we have two continuous random variables $X,Y$ and their own densities are $f_{X}$ and $f_{Y}.$ Denote the joint density $f_{X,Y}$. Then replacing in (1) sums by integrals and probabilities by densities we get (2) $\int_R f_{X,Y}\left( x,y\right) dy=f_{X}\left( x\right) ,\ \int_R f_{X,Y}\left( x,y\right) dx=f_{Y}\left( y\right) .$ In words: to obtain one marginal density (say, $f_{Y}$) integrate out the other variable (in this case, $x$). 27 Jan 21 My book is gaining international recognition AP Stats and Business Stats Its content, organization and level justify its adoption as a textbook for introductory statistics for Econometrics in most American or European universities. The book's table of contents is somewhat standard, the innovation comes in a presentation that is crisp, concise, precise and directly relevant to the Econometrics course that will follow. I think instructors and students will appreciate the absence of unnecessary verbiage that permeates many existing textbooks. Having read Professor Mynbaev's previous books and research articles I was not surprised with his clear writing and precision. However, I was surprised with an informal and almost conversational one-on-one style of writing which should please most students. The informality belies a careful presentation where great care has been taken to present the material in a pedagogical manner. Carlos Martins-Filho Professor of Economics University of Colorado at Boulder Boulder, USA 18 Oct 20 People need real knowledge People need real knowledge Traffic analysis The number of visits to my website has exceeded 206,000. This number depends on what counts as a visit. An external counter, visible to everyone, writes cookies to the reader's computer and counts many visits from one reader as one. The number of individual readers has reached 23,000. The external counter does not give any more statistics. I will give all the numbers from the internal counter, which is visible only to the site owner. I have a high percentage of complex content. After reading one post, the reader finds that the answer he is looking for depends on the preliminary material. He starts digging it and then has to go deeper and deeper. Hence the number 206,000, that is, one reader visits the site on average 9 times on different days. Sometimes a visitor from one post goes to another by link on the same day. Hence another figure: 310,000 readings. I originally wrote simple things about basic statistics. Then I began to write accompanying materials for each advanced course that I taught at Kazakh-British Technical University (KBTU). The shift in the number and level of readership shows that people need deep knowledge, not bait for one-day moths. For example, my simple post on basic statistics was read 2,300 times. In comparison, the more complex post on the Cobb-Douglas function has been read 7,100 times. This function is widely used in economics to model consumer preferences (utility function) and producer capabilities (production function). In all textbooks it is taught using two-dimensional graphs, as P. Samuelson proposed 85 years ago. In fact, two-dimensional graphs are obtained by projection of a three-dimensional graph, which I show, making everything clear and obvious. The answer to one of the University of London (UoL) exam problems attracted 14,300 readers. It is so complicated that I split the answer into two parts, and there are links to additional material. On the UoL exam, students have to solve this problem in 20-30 minutes, which even I would not be able to do. Why my site is unique My site is unique in several ways. Firstly, I tell the truth about the AP Statistics books. This is a basic statistics course for those who need to interpret tables, graphs and simple statistics. If you have a head on your shoulders, and not a Google search engine, all you need to do is read a small book and look at the solutions. I praise one such book in my reviews. You don't need to attend a two-semester course and read an 800-page book. Moreover, one doesn't need 140 high-quality color photographs that have nothing to do with science and double the price of a book. Many AP Statistics consumers (that's right, consumers, not students) believe that learning should be fun. Such people are attracted by a book with anecdotes that have no relation to statistics or the life of scientists. In the West, everyone depends on each other, and therefore all the reviews are written in a superlative degree and streamlined. Thank God, I do not depend on the Western labor market, and therefore I tell the truth. Part of my criticism, including the statistics textbook selected for the program "100 Textbooks" of the Ministry of Education and Science of Kazakhstan (MES), is on Facebook. Secondly, I have the world's only online, free, complete matrix algebra tutorial with all the proofs. Free courses on Udemy, Coursera and edX are not far from AP Statistics in terms of level. Courses at MIT and Khan Academy are also simpler than mine, but have the advantage of being given in video format. The third distinctive feature is that I help UoL students. It is a huge organization spanning 17 universities and colleges in the UK and with many branches in other parts of the world. The Economics program was developed by the London School of Economics (LSE), one of the world's leading universities. The problem with LSE courses is that they are very difficult. After the exams, LSE puts out short recommendations on the Internet for solving problems like: here you need to use such and such a theory and such and such an idea. Complete solutions are not given for two reasons: they do not want to help future examinees and sometimes their problems or solutions contain errors (who does not make errors?). But they also delete short recommendations after a year. My site is the only place in the world where there are complete solutions to the most difficult problems of the last few years. It is not for nothing that the solution to one problem noted above attracted 14,000 visits. Fourthly, my site is unique in terms of the variety of material: statistics, econometrics, algebra, optimization, and finance. The average number of visits is about 100 per day. When it's time for students to take exams, it jumps to 1-2 thousand. The total amount of materials created in 5 years is equivalent to 5 textbooks. It takes from 2 hours to one day to create one post, depending on the level. After I published this analysis of the site traffic on Facebook, my colleague Nurlan Abiev decided to write posts for the site. I pay for the domain myself,$186 per year. It would be nice to make the site accessible to students and schoolchildren of Kazakhstan, but I don't have time to translate from English.

Once I was looking at the requirements of the MES for approval of electronic textbooks. They want several copies of printouts of all (!) materials and a solid payment for the examination of the site. As a result, all my efforts to create and maintain the site so far have been a personal initiative that does not have any support from the MES and its Committee on Science.

26
May 20

17
Mar 19

AP Statistics the Genghis Khan way 2

Last semester I tried to explain theory through numerical examples. The results were terrible. Even the best students didn't stand up to my expectations. The midterm grades were so low that I did something I had never done before: I allowed my students to write an analysis of the midterm at home. Those who were able to verbally articulate the answers to me received a bonus that allowed them to pass the semester.

This semester I made a U-turn. I announced that in the first half of the semester we will concentrate on theory and we followed this methodology. Out of 35 students, 20 significantly improved their performance and 15 remained where they were.

Midterm exam, version 1

1. General density definition (6 points)

a. Define the density $p_X$ of a random variable $X.$ Draw the density of heights of adults, making simplifying assumptions if necessary. Don't forget to label the axes.

b. According to your plot, how much is the integral $\int_{-\infty}^0p_X(t)dt?$ Explain.

c. Why the density cannot be negative?

d. Why the total area under the density curve should be 1?

e. Where are basketball players on your graph? Write down the corresponding expression for probability.

f. Where are dwarfs on your graph? Write down the corresponding expression for probability.

This question is about the interval formula. In each case students have to write the equation for the probability and the corresponding integral of the density. At this level, I don't talk about the distribution function and introduce the density by the interval formula.

2. Properties of means (8 points)

a. Define a discrete random variable and its mean.

b. Define linear operations with random variables.

c. Prove linearity of means.

d. Prove additivity and homogeneity of means.

e. How much is the mean of a constant?

f. Using induction, derive the linearity of means for the case of $n$ variables from the case of two variables (3 points).

3. Covariance properties (6 points)

a. Derive linearity of covariance in the first argument when the second is fixed.

b. How much is covariance if one of its arguments is a constant?

c. What is the link between variance and covariance? If you know one of these functions, can you find the other (there should be two answers)? (4 points)

4. Standard normal variable (6 points)

a. Define the density $p_z(t)$ of a standard normal.

b. Why is the function $p_z(t)$ even? Illustrate this fact on the plot.

c. Why is the function $f(t)=tp_z(t)$ odd? Illustrate this fact on the plot.

d. Justify the equation $Ez=0.$

e. Why is $V(z)=1?$

f. Let $t>0.$ Show on the same plot areas corresponding to the probabilities $A_1=P(0 $A_2=P(z>t),$ $A_3=P(z<-t),$ $A_4=P(-t Write down the relationships between $A_1,...,A_4.$

5. General normal variable (3 points)

a. Define a general normal variable $X.$

b. Use this definition to find the mean and variance of $X.$

c. Using part b, on the same plot graph the density of the standard normal and of a general normal with parameters $\sigma =2,$ $\mu =3.$

Midterm exam, version 2

1. General density definition (6 points)

a. Define the density $p_X$ of a random variable $X.$ Draw the density of work experience of adults, making simplifying assumptions if necessary. Don't forget to label the axes.

b. According to your plot, how much is the integral $\int_{-\infty}^0p_X(t)dt?$ Explain.

c. Why the density cannot be negative?

d. Why the total area under the density curve should be 1?

e. Where are retired people on your graph? Write down the corresponding expression for probability.

f. Where are young people (up to 25 years old) on your graph? Write down the corresponding expression for probability.

2. Variance properties (8 points)

a. Define variance of a random variable. Why is it non-negative?

b. Define the formula for variance of a linear combination of two variables.

c. How much is variance of a constant?

d. What is the formula for variance of a sum? What do we call homogeneity of variance?

e. What is larger: $V(X+Y)$ or $V(X-Y)$? (2 points)

f. One investor has 100 shares of Apple, another - 200 shares. Which investor's portfolio has larger variability? (2 points)

3. Poisson distribution (6 points)

a. Write down the Taylor expansion and explain the idea. How are the Taylor coefficients found?

b. Use the Taylor series for the exponential function to define the Poisson distribution.

c. Find the mean of the Poisson distribution. What is the interpretation of the parameter $\lambda$ in practice?

4. Standard normal variable (6 points)

a. Define the density $p_z(t)$ of a standard normal.

b. Why is the function $p_z(t)$ even? Illustrate this fact on the plot.

c. Why is the function $f(t)=tp_z(t)$ odd? Illustrate this fact on the plot.

d. Justify the equation $Ez=0.$

e. Why is $V(z)=1?$

f. Let $t>0.$ Show on the same plot areas corresponding to the probabilities $A_1=P(0 $A_2=P(z>t),$ $A_{3}=P(z<-t),$ $A_4=P(-t Write down the relationships between $A_{1},...,A_{4}.$

5. General normal variable (3 points)

a. Define a general normal variable $X.$

b. Use this definition to find the mean and variance of $X.$

c. Using part b, on the same plot graph the density of the standard normal and of a general normal with parameters $\sigma =2,$ $\mu =3.$

16
Mar 19

AP Statistics the Genghis Khan way 1

Recently I enjoyed reading Jack Weatherford's "Genghis Khan and the Making of the Modern World" (2004). I was reading the book with a specific question in mind: what were the main reasons of the success of the Mongols? Here you can see the list of their innovations, some of which were in fact adapted from the nations they subjugated. But what was the main driving force behind those innovations? The conclusion I came to is that Genghis Khan was a genial psychologist. He used what he knew about individual and social psychology to constantly improve the government of his empire.

I am no Genghis Khan but I try to base my teaching methods on my knowledge of student psychology.

Problems and suggested solutions

Steven Krantz in his book (How to teach mathematics : Second edition, 1998, don't remember the page) says something like this: If you want your students to do something, arrange your classes so that they do it in the class.

Problem 1. Students mechanically write down what the teacher says and writes.

Solution. I don't allow my students to write while I am explaining the material. When I explain, their task is to listen and try to understand. I invite them to ask questions and prompt me to write more explanations and comments. After they all say "We understand", I clean the board and then they write down whatever they understood and remembered.

Problem 2. Students are not used to analyze what they read or write.

Solution. After students finish their writing, I ask them to exchange notebooks and check each other's writings. It's easier for them to do this while everything is fresh in their memory. I bought and distributed red pens. When they see that something is missing or wrong, they have to write in red. Errors or omissions must stand out. Thus, right there in the class students repeat the material twice.

Problem 3. Students don't study at home.

Solution. I let my students know in advance what the next quiz will be about. Even with this knowledge, most of them don't prepare at home. Before the quiz I give them about half an hour to repeat and discuss the material (this is at least the third repetition). We start the quiz when they say they are ready.

Problem 4. Students don't understand that active repetition (writing without looking at one's notes) is much more productive than passive repetition (just reading the notes).

Solution. Each time before discussion sessions I distribute scratch paper and urge students to write, not just read or talk. About half of them follow my recommendation. Their desire to keep their notebooks neat is not their last consideration. The solution to Problem 1 also hinges upon active repetition.

Problem 5. If students work and are evaluated individually, usually there is no or little interaction between them.

Solution. My class is divided in teams (currently I have teams of two to six people). I randomly select one person from each team to write the quiz. That person's grade is the team's grade. This forces better students to coach others and weaker students to seek help.

Problem 6. Some students don't want to work in teams. They are usually either good students, who don't want to suffer because of weak team members, or weak students, who don't want their low grades to harm other team members.

Solution. The good students usually argue that it's not fair if their grade becomes lower because of somebody else's fault. My answer to them is that the meaning of fairness depends on the definition. In my grading scheme, 30 points out of 100 is allocated for team work and the rest for individual achievements. Therefore I never allow good students to work individually. I want them to be my teaching assistants and help other students. While doing so, I tell them that I may reward good students with a bonus in the end of the semester. In some cases I allow weak students to write quizzes individually but only if the team so requests. The request of the weak student doesn't matter. The weak student still has to participate in team discussions.

Problem 7. There is no accumulation of theoretical knowledge (flat learning curve).

Solution. a) Most students come from high school with little experience in algebra. I raise the level gradually and emphasize understanding. Students never see multiple choice questions in my classes. They also know that right answers without explanations will be discarded.

b) Normally, during my explanations I fill out the board. The amount of the information the students have to remember is substantial and increases over time. If you know a better way to develop one's internal vision, let me know.

c) I don't believe in learning the theory by doing applied exercises. After explaining the theory I formulate it as a series of theoretical exercises. I give the theory in large, logically consistent blocks for students to see the system. Half of exam questions are theoretical (students have to provide proofs and derivations) and the other half - applied.

d) The right motivation can be of two types: theoretical or applied, and I never substitute one for another.

Problem 8. In low-level courses you need to conduct frequent evaluations to keep your students in working shape. Multiply that by the number of students, and you get a serious teaching overload.

Solution. Once at a teaching conference in Prague my colleague from New York boasted that he grades 160 papers per week. Evaluating one paper per team saves you from that hell.

Outcome

In the beginning of the academic year I had 47 students. In the second semester 12 students dropped the course entirely or enrolled in Stats classes taught by other teachers. Based on current grades, I expect 15 more students to fail. Thus, after the first year I'll have about 20 students in my course (if they don't fail other courses). These students will master statistics at the level of my book.

8
Oct 17

Reevaluating probabilities based on piece of evidence

This actually has to do with the Bayes' theorem. However, in simple problems one can use a dead simple approach: just find probabilities of all elementary events. This post builds upon the post on Significance level and power of test, including the notation. Be sure to review that post.

Here is an example from the guide for Quantitative Finance by A. Patton (University of London course code FN3142).

Activity 7.2 Consider a test that has a Type I error rate of 5%, and power of 50%.

Suppose that, before running the test, the researcher thinks that both the null and the alternative are equally likely.

1. If the test indicates a rejection of the null hypothesis, what is the probability that the null is false?

2. If the test indicates a failure to reject the null hypothesis, what is the probability that the null is true?

Denote events R = {Reject null}, A = {fAil to reject null}; T = {null is True}; F = {null is False}. Then we are given:

(1) $P(F)=0.5;\ P(T)=0.5;$

(2) $P(R|T)=\frac{P(R\cap T)}{P(T)}=0.05;\ P(R|F)=\frac{P(R\cap F)}{P(F)}=0.5;$

(1) and (2) show that we can find $P(R\cap T)$ and $P(R\cap F)$ and therefore also $P(A\cap T)$ and $P(A\cap F).$ Once we know probabilities of elementary events, we can find everything about everything.

Figure 1. Elementary events

Answering the first question: just plug probabilities in $P(F|R)=\frac{P(R\cap F)}{P(R)}=\frac{P(R\cap F)}{P(R\cap T)+P(A\cap T)}.$

Answering the second question: just plug probabilities in $P(T|A)=\frac{P(A\cap T)}{P(A)}=\frac{P(A\cap T)}{P(A\cap T)+P(A\cap F)}.$

Patton uses the Bayes' theorem and the law of total probability. The solution suggested above uses only additivity of probability.

6
Oct 17

Significance level and power of test

In this post we discuss several interrelated concepts: null and alternative hypotheses, type I and type II errors and their probabilities. Review the definitions of a sample space and elementary events and that of a conditional probability.

Type I and Type II errors

Regarding the true state of nature we assume two mutually exclusive possibilities: the null hypothesis (like the suspect is guilty) and alternative hypothesis (the suspect is innocent). It's up to us what to call the null and what to call the alternative. However, the statistical procedures are not symmetric: it's easier to measure the probability of rejecting the null when it is true than other involved probabilities. This is why what is desirable to prove is usually designated as the alternative.

Usually in books you can see the following table.

 Decision taken Fail to reject null Reject null State of nature Null is true Correct decision Type I error Null is false Type II error Correct decision

This table is not good enough because there is no link to probabilities. The next video does fill in the blanks.

Video. Significance level and power of test

Significance level and power of test

The conclusion from the video is that

$\frac{P(T\bigcap R)}{P(T)}=P(R|T)=P\text{(Type I error)=significance level}$ $\frac{P(F\bigcap R)}{P(F)}=P(R|F)=P\text{(Correctly rejecting false null)=Power}$
4
Sep 17

Geometry related to derivatives

In a sequence of videos I explain the main ideas pursued by the fathers of Calculus - Isaac Newton and Gottfried Wilhelm Leibniz.

Derivative equals speed

Here we assume that a point moves along a straight line and try to find its speed as usual. We divide the distance traveled by the time it takes to travel it. The ratio is an average speed over a time interval. As we reduce the length of the time interval, we get a better and better approximation to the exact (instantaneous) speed at a point in time.

Video 1. Derivative is speed

Position of point as a function of time

Working with the visualization of the point movement on a straight line is inconvenient because it is difficult to correlate the point position to time. It is much better to visualize the movement on the space-time plane where the horizontal axis is for time and the vertical axis is for the point position.

Video 2. Position of point as function of time

Measuring the slope of a straight line

A little digression: how do you measure the slope of a straight line, if you know the values of the function at different points?

Video 3. Measuring the slope of a straight line

Derivative as the slope of a tangent line

This is like putting two and two together: we apply the previous definition to the slope of a secant drawn through two points on a graph. Then it remains to notice that the secant approaches the tangent line, as the second point approaches the first.

Video 4. Derivative as tangent slope

From function to its derivative

This is a very useful exercise that allows later to come up with the optimization conditions, called first order and second order conditions.

Video 5. From function to its derivative

Conclusion

Let $P(t)$ be some function and fix an initial point $t_1$. The derivative $P^\prime(t_1)$ is defined as the limit

$P^\prime(t_1)=\lim_{t_2\rightarrow t_1}\frac{P(t_2)-P(t_1)}{t_2-t_1}.$

When $P(t)$ describes the movement of a point along a straight line, the derivative gives the speed of that point. When $P(t)$ is drawn on a plane, the derivative gives the slope of the tangent line to the graph.

11
Aug 17

Violations of classical assumptions

This will be a simple post explaining the common observation that "in Economics, variability of many variables is proportional to those variables". Make sure to review the assumptions; they tend to slip from memory. We consider the simple regression

(1) $y_i=a+bx_i+e_i.$

One of classical assumptions is

Homoscedasticity. All errors have the same variances$Var(e_i)=\sigma^2$ for all $i$.

We discuss its opposite, which is

Heteroscedasticity. Not all errors have the same variance. It would be wrong to write it as $Var(e_i)\ne\sigma^2$ for all $i$ (which means that all errors have variance different from $\sigma^2$). You can write that not all $Var(e_i)$ are the same but it's better to use the verbal definition.

Remark about Video 1. The dashed lines can represent mean consumption. Then the fact that variation of a variable grows with its level becomes more obvious.

Video 1. Case for heteroscedasticity

Figure 1. Illustration from Dougherty: as x increases, variance of the error term increases

Homoscedasticity was used in the derivation of the OLS estimator variance; under heteroscedasticity that expression is no longer valid. There are other implications, which will be discussed later.

Companies example. The Samsung Galaxy Note 7 battery fires and explosions that caused two recalls cost the smartphone maker at least $5 billion. There is no way a small company could have such losses. GDP example. The error in measuring US GDP is on the order of$200 bln, which is comparable to the Kazakhstan GDP. However, the standard deviation of the ratio error/GDP seems to be about the same across countries, if the underground economy is not too big. Often the assumption that the standard deviation of the regression error is proportional to one of regressors is plausible.

To see if the regression error is heteroscedastic, you can look at the graph of the residuals or use statistical tests.