Dec 19

Review of mt3042 Optimization Guide by M. Baltovic

by Kairat Mynbaev, professor, ISE, KBTU, Almaty, Kazakhstan

This is a one-semester course in Optimization. It covers the following topics:

  • Introduction and review of relevant parts from real analysis, with emphasis on higher dimensions.
  • Weierstrass' Theorem on continuous functions on compact sets.
  • Review with added rigour of unconstrained optimization of differentiable functions.
  • Lagrange's Theorem on equality-constrained optimization.
  • The Kuhn-Tucker Theorem on inequality-constrained optimization.
  • Finite and infinite horizon dynamic programming.

The course is based mainly on the book by Sundaram, R.K. A First Course in Optimization Theory (Cambridge University Press, 1996).

The course is given to fourth-year students. I evaluate the guide on two points: how well it expands the students’ theoretical horizon and how well it prepares them for the challenges they may face while applying their knowledge in their careers, whether in industry or in academia.

  1. The exposition is overly formal. Experience shows that most students don’t understand the definition of continuity in terms of epsilon-delta. Right after giving it on p.18, the author gives an alternative definition in terms of sequences. I think it’s better to omit the former definition altogether and rephrase everything in terms of sequences. After all, the compactness notion relies on sequences.
  2. The differentiability definition 2.21 on p.19 is simply indigestible. It is in fact the Fréchet derivative applicable in the infinite-dimensional case. Who needs it here? Why not define the matrix as a matrix of partial derivatives, which students know from previous courses?
  3. The solution to Example 3.2 is overblown. A professional mathematician never thinks like that. A pro would explain the idea as follows: because of Condition 2, the function is close to zero in some neighborhood of infinity. Therefore, a maximum should be looked for in a bounded closed set. Since this set is compact, the Weierstrass theorem applies. With a proper graphical illustration, the students don’t need anything else. Remarks similar to this apply to many solutions in the guide. As a result, the students don’t see the forest for the trees.
  4. Regarding the theoretical conditions, the author refers to Sundaram without explaining why they are imposed. Explanations in Sundaram are far from intuitive (see the first of them on p.107). In all cases for n=2 it is possible to give relatively simple intuitive explanations. Specifically,
    1. For first-order conditions see
    2. For second-order conditions see
    3. For the constraint qualification condition, see
      The explanation on pp.58-60 is hard to understand because of dimensionality.
    4. For the Lagrange method see
      (necessary conditions),
      (sufficient conditions),
      (case of many constraints) and
      (multiplier interpretation).
    5. In case of the Kuhn-Tucker theorem, the most important point is that, once the binding constraints have been determined, the nonbinding ones can be omitted from the analysis
      The proof of nonnegativity of the Lagrange multiplier for binding constraints is less than one page:
  5. In solutions that rely on the Kuhn-Tucker theorem, the author suggests checking the constraint qualification condition for all possible combinations of constraints. Not only is this time-consuming, it is also misleading, given that it is often possible to determine the binding constraints and use the Lagrange method instead of the Kuhn-Tucker theorem.
  6. Minor remark: In Example 5.2, the function f(x,1-x) is obviously nonnegative.
  7. There are many optimization methods not covered in Sundaram’s book. One of them, Pontryagin’s maximum principle, is more general than the Bellman approach, see
    It may be too advanced for this course. However, convexity is covered by Sundaram and omitted by Baltovic, while it can be successfully applied to solve some problems from the guide, see
  8. Another important area that is covered neither in the book nor in the guide is numerical optimization. Numerical optimization is as important as the Maximum Likelihood method in Statistics because any software implementation of ML employs numerical optimization. People need to learn at least one method of numerical optimization to understand the error messages the software produces.
  9. Speaking of applications, I would eliminate all big exercises that have nothing to do with Economics or Finance, such as Exercise 6.2.
  10. In the solution to Exercise 6.1, the author produces a 3-page analysis but then in one line discovers that all that was for nothing because the profit is infinite. What’s the pedagogical value of such an exposition?
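
To make point 8 concrete, here is a minimal sketch of one numerical method, Newton's method, applied to a Maximum Likelihood problem. Everything in it is an illustrative assumption rather than something taken from the guide or from Sundaram: the exponential model, the function name `newton_mle_exponential`, the data and the starting values. The model is chosen because the exact answer (one over the sample mean) is known, so the iterations can be checked.

```python
def newton_mle_exponential(data, lam0, tol=1e-10, max_iter=50):
    """Newton's method for the rate of an exponential sample (illustrative).

    Log-likelihood: l(lam) = n*log(lam) - lam*sum(data), defined for lam > 0.
    """
    n = len(data)
    total = sum(data)
    lam = lam0
    for _ in range(max_iter):
        if lam <= 0:
            # The log-likelihood is undefined here; real software reports
            # a similar failure when an iterate leaves the parameter space.
            raise ValueError("iterate left the parameter space (lambda <= 0)")
        score = n / lam - total      # first derivative of the log-likelihood
        hessian = -n / lam ** 2      # second derivative (always negative)
        step = score / hessian
        lam = lam - step             # Newton update
        if abs(step) < tol:
            return lam
    raise RuntimeError("no convergence within max_iter iterations")


if __name__ == "__main__":
    sample = [0.5, 1.2, 0.3, 2.0, 1.0]           # hypothetical data
    print(newton_mle_exponential(sample, 0.5))    # reasonable starting value
    try:
        newton_mle_exponential(sample, 3.0)       # poor starting value
    except ValueError as err:
        print("failed:", err)
```

The second call shows where typical error messages come from: a poor starting value sends the first Newton step outside the region where the likelihood is defined, and the routine has to stop.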

Conclusion. This is a badly written guide for a badly designed course. I suggest removing it from our program.

Mar 17

Review of Dougherty "Introduction to Econometrics"

Review of Dougherty "Introduction to Econometrics" 4th edition, Oxford, 2011.

The 5th edition is already available but I don't have it.

What I like

The book is as comprehensive as it can be, given its target audience (undergraduate programs) and Math constraints (matrix algebra is not required). It goes through all major topics usually included in such texts and has much more: three chapters on models with time series data and panel data models.

The economic side of the subject is always in focus, and that makes the book a good reference for practitioners. The Math is kept at a manageable level, to my taste. Specifically, the derivations related to simple regression are detailed, and the reader is expected to be able to handle proofs up to two pages long. Longer derivations and proofs, which are in fact a subject of journal publications, are, naturally, omitted.

Probably, most of the material can be taught with even less algebra, because the author provides PowerPoint slides which illuminate intuition. On the other hand, many statements sound vague without derivations, and the book presents challenges for those who want to understand everything. When my students complain that my explanation is not in the book, I tell them that they have to read between the lines. Somebody on Amazon.com said that the book is impossible to understand without an instructor's help. That is occasionally true. Moreover, as a person who has taken bachelor's, master's and PhD courses and given a bachelor's course in Econometrics in the US, I think the level corresponds to the master's level in an average US university.

What I don't like

In the 3rd edition, the author used the short notation for OLS estimators that all professionals and I use. In the 4th edition he switched to that awkward notation with summation signs, which made most derivations at least a quarter longer. Probably, this change was caused by the fact that students were not familiar with properties of covariances and variances. Dougherty has a Statistics review chapter for a reason.

In general, the instructor may decide whether to keep the level low or to engage at full throttle. It's not like that at my university. The book is required reading at affiliate centers of the University of London, and we are one of them. We have an unusual practice of separating weaker students from stronger ones. I teach the stronger ones. Even in my group, half of the students struggle with summation signs. I think instead of switching from the professional notation to the longer one, it would be better to adjust the prerequisite Statistics courses. The University of London lives in its own universe anyway.

The biggest challenge is not the book itself but the UoL Econometrics exam.

Christopher Dougherty visited Almaty, and we had a good time together. He said he is not the one who composes the UoL exams. Therefore, some discrepancy between the book's content and the exam questions is inevitable. However, when exams require more theory than the book has, that's not normal. For example, the theorem I prove here is not in the book.

The exam coverage is also a problem. The book, as I said, is comprehensive and almost encyclopedic for a one-year course. Every little detail may appear on the exam. Even I, after teaching this course for many years, don't remember all the details.

The exams require a much higher level of formal thinking and proficiency in algebra than the book implies. For instance, the genuine understanding of the notion of cointegration hinges upon linear independence of infinite-dimensional vectors. When I told my American friend that we give cointegration and panel data models, he was very impressed. Thank God, in the last two years panel data models have been dropped from the curriculum. The purpose of this whole site is to help Econometrics students of the UoL.


Highly recommend: the coverage and the balance between intuition and Math can satisfy the needs of many instructors and courses.

The combination book+exam at affiliate centers of the UoL creates huge problems. If you want to use the Econometrics course as a screening device for potential Nobel laureates, that's fine. But then you have to prepare the students for the challenges of the course. Our students have two years to study Statistics. They spend three semesters of those two years at the AP Stats level (only intuition, no algebra or formal proofs), see Statistics 1 Guide by J. Abdey. And then in one semester they are expected to leap to the level required by Dougherty's book and UoL exams, see Statistics 2 Guide. The learning curve is flat for three semesters, steep in the fourth semester, and then the students have to fly beyond the clouds for one year!

I've been talking about this since 2011, but the decision makers at the UoL wouldn't listen to me. Formally, most prerequisites are covered in Statistics 1 and 2 (J. Abdey told me that he included in his guides what was requested by the Econometrics people of the LSE). But developing logic takes much longer than one semester. That's what nobody wants to understand.


Feb 17

Review of Hinders "5 steps"

Review of Hinders "5 steps, 2010-2011 Edition"

This is a review of "5 Steps to a 5 AP Statistics, 2010-2011 Edition" by Duane Hinders. The latest edition is "5 Steps to a 5 AP Statistics, 2017 Edition" by C. Andreasen, D. Hinders, and D. McDonald, which I, unfortunately, don't have.

The main part of the book has 14 chapters. These chapters plus the preface, introduction, practice exams, appendices etc. are organized in 5 units, hence the "5 steps" in the title of the book. The book is concise and explanations are as clear as they can be without derivations. See for yourself: of the 14 chapters, the first four contain mainly methodological recommendations and the real study starts from Chapter 5. There are just 385 pages and the latest edition is about the same size.

When one skips the theory and uses book formulas blindly, there is no guarantee that the result will be correct, because the book may contain errors and typos. Hinders does not avoid a common misconception about confidence intervals, but I hope that's the only lapse.

Figure 1. Cognitive ability

Unlike other books I reviewed before (Agresti and Franklin; Albert and Rossman; Newbold, Carlson and Thorne; and Bock, Velleman, De Veaux), this one has real exam questions and better uses the reader's time. If I wanted to pass the AP exam without spending too much time on preparation, I would read the "5 steps".

Perhaps the exposition wouldn't look so clear to me if I didn't have prior knowledge of Statistics, but younger readers may have an advantage I don't: a better memory. I try to illustrate this in Figure 1. I have stronger analytical skills than most 20-year-olds but their memory is better. Because of the age trade-off, overall my cognitive abilities are about the same as those of young people. Read and think about every word, and you may succeed.

Jan 17

Review of Newbold, Carlson and Thorne

Review of Newbold, Carlson and Thorne "Statistics for Business and Economics", Pearson, 7th edition, 2010

Paul Newbold, the senior author, was a British economist known for his contributions to econometrics and time series analysis. His most famous contribution was a 1974 paper co-authored with Clive Granger, Nobel laureate, which drew attention to the problem of spurious regressions; that line of work later led to the concept of cointegration.

In addition to usual AP Stats topics, the book contains chapters on nonparametric estimation, comparison of subpopulation means, forecasting with time series models and introduction to decision theory. Overall, the level is higher than AP Stats. This textbook is required for University of London Statistics courses.

What I like

There is no pointless, distracting talk so often seen in AP Stats texts. Except for a couple of definitions in the first two (elementary) chapters, the exposition is clear.

The book attempts to introduce the reader to algebra gently, through verbal statements of formulas, but quickly (in Chapter 2) switches to a mathematical notation that I find more adequate for AP Stats.

Probability is introduced where it is supposed to be, right after graphs and numerical measures have been used to describe the data.

There are two types of exercises. End-of-section exercises are accessible to AP Stats students. The majority of end-of-chapter ones are much more complex. For example, Exercise 5.96 requires an explicit application of properties of three distributions: Bernoulli, binomial and normal. I learned a lot while solving exercises with my students. On one occasion the theory given in the book was insufficient; my students found the missing part on the Internet and explained the solution to me (it was about Poisson processes).

Among the applications of regression, there is the beta measure of financial risk.

What I don't like

On the theory side the book may be satisfactory for AP Stats but not for the University of London. There are rudimentary proofs here and there; however, in general it's the same recipe approach. After two years of teaching following this approach I became convinced that it doesn't work and wrote my book.


Figure 10.5. Flow Chart for Selecting the Appropriate Hypothesis test When Comparing Two Population Means

In case of means, the breakage of logical links is seen especially well. Different means are given in different places, and the students don't see how they are related to one another.

The proof that the sample mean unbiasedly estimates the population mean is relatively advanced. Students should be gradually prepared for proofs this complex. Since there is no systematic algebraic training, most students do not understand the proof given in the Appendix to Chapter 6.
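
The argument itself is short once two properties of expectation (a constant factors out, and the expectation of a sum is the sum of expectations) are in place, which is why the preparation matters more than the proof. A sketch, assuming observations X_1, ..., X_n with a common mean mu; the notation is illustrative and may differ from the one used in the Appendix to Chapter 6:

```latex
E(\bar{X})
  = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)
  = \frac{1}{n}\sum_{i=1}^{n} E(X_i)
  = \frac{1}{n}\cdot n\mu
  = \mu
```

Note that independence of the observations is not needed for this step; it becomes important only when the variance of the sample mean is derived.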

Figures 9.10 and 10.5 for choosing the appropriate decision rules are definitely helpful but who is going to remember them without derivations?

Conclusion: thumbs up!

The "Business" part of the name of the book may be misleading. This is not a book for those who want to understand just the basics of Statistics and/or to pass AP Stats because it is a required course. For future economists, this is an excellent introduction to Statistics, with two caveats. Firstly, while the selection and variety of exercises is superb, the theoretical part does not provide an adequate foundation and is better replaced by my book. Secondly, there is too much material even for a one-year course.

This book can be used by an instructor who is not constrained by the College Board curriculum. Since in our program we have a strong Econometrics course, Chapters 12, 13, 16 should be skipped. On the other hand, almost all material from Chapters 1-11 should be given with derivations. Students who have access to Excel or the Internet can use Excel or Mathematica functions instead of statistical tables.

Jan 17

Review of Albert and Rossman

Review of Albert and Rossman "Workshop Statistics: Discovery with Data, A Bayesian Approach", Key College Publishing, 2001

Who is this book for?

In this review I concentrate on how this book is similar to and different from Agresti and Franklin. The book contains almost no formulas and in this respect is even more basic than Agresti and Franklin. The emphasis of the book is on the Bayesian approach, which is not mainstream Statistics, and this makes it stand out from the crowd. The advantages of this emphasis are described on p.11 (avoiding the notion of a sampling distribution, making the course shorter and making do with fewer recipes).

What I like

The text is business-like. Just one page (p.1) explains the difference between descriptive and inferential statistics, without even mentioning these names.

The book urges the instructor to reduce the amount of lecturing and rely more on active learning. It often prompts the student to think about ideas before providing the theoretical answer. That's what I like to do in my class. Activity 3-12 (Wrong Conclusions) pursues the same purpose.

The description of basic features of a data distribution on p.22 is concise and clear.

What I don't like

The definition of a categorical variable (p.5) does not allow one to distinguish it from a numerical one. See my explanation.

No attempt is made to improve students' algebra skills. This is what undermines the attempt to explain the Bayesian approach.

Like Agresti and Franklin, the authors make the study of regression dependent on the correlation coefficient. See "Correlation and regression are two separate entities".

The logical sequence is broken. In particular, probabilities are introduced after regression.

The normal distribution, one of the pillars of Statistics, is given too late (in Chapter 18).


As much as I like the idea of active learning, I cannot recommend the book, for the simple reason that it doesn't comply with the College Board curriculum.

Not being a Bayesian specialist, I was hoping to pick up something useful for myself. That hope was not realized. The five chapters on the Bayesian approach are nothing more than a collection of recipes accompanied by numerical examples. Even Bayes' theorem is not stated. If I were to write such a book, I would write it as a complement to a widely adopted text. This would allow me to avoid repeating the common stuff (graphical illustration of statistical data, measures of center and spread, probabilities etc.) and give more theory.

Jan 17

Review of Agresti and Franklin

Review of Agresti and Franklin "Statistics: The Art and Science of Learning from Data", 3rd edition

Who is this book for?

On the Internet you can find both positive and negative reviews. The ones that I saw on Goodreads.com and Amazon.com do not say much about the pros and cons. Here I try to be more specific.

The main limitation of the book is that it adheres to the College Board statement that "it is a one semester, introductory, non-calculus-based, college course in statistics". Hence, there are no derivations and no links between formulas. You will not find explanations of why Statistics works. As a result, there is too much emphasis on memorization. After reading the book, you will likely not have an integral view of statistical methods.

I have seen students who understand such texts well. Generally, they have an excellent memory and better-than-average imagination. But such students are better off reading more advanced books. A potential reader has to lower his/her expectations. I imagine a person who is not interested in taking a more advanced Stats course later. The motivation of that person would be: a) to understand the ways Statistics is applied and/or b) to pass AP Stats just because it is a required course. The review is written on the premise that this is the intended readership.

What I like

  1. The number and variety of exercises. This is good for an instructor who teaches large classes. Having authored several books, I can assure you that inventing many exercises is the most time-consuming part of this business.
  2. The authors have come up with good visual embellishments of graphs and tables, summarized in "A Guide to Learning From the Art in This Text" at the end of the book.
  3. The book has generous left margins. Sometimes they contain reminders about the past material. Otherwise, the reader can use them for notes.
  4. MINITAB is prohibitively expensive, but the Student Edition of MINITAB is provided on the accompanying CD.

What I don't like

  1. I counted about 140 high-resolution photos that have nothing to do with the subject matter. They hardly add to the educational value of the book but certainly add to its cost. This bad trend in introductory textbooks is fueled to a considerable extent by Pearson Education.
  2. 800+ pages, even after slashing all appendices and unnecessary illustrations, is a lot of reading for one semester. Even if you memorize all of them, during the AP test it will be difficult for you to pull out of your memory exactly the page you need to answer a particular question.
  3. In an introductory text, one has to refrain from giving too much theory. Still, I don't like some choices made by the authors. The learning curve is flat. As a way of gentle introduction to algebra, verbal descriptions of formulas are normal. But sticking to verbal descriptions until p. 589 is too much. This reminds me of a train trip in Kazakhstan. You enter the steppe through the western border and two days later you see the same endless steppe, just the train station is different.
  4. At the theoretical level, many topics are treated superficially. You can find a lot of additional information in my posts named "The pearls of AP Statistics". Here is the list of the most important additions: regression and correlation should be decoupled; the importance of sampling distributions is overstated; probability is better explained without reference to the long run; the difference between the law of large numbers and the central limit theorem should be made clear; the rate of convergence in the law of large numbers is not that fast; the law of large numbers is intuitively simple; the uniform distribution can also be made simple; to understand different charts, put them side by side; the Pareto chart is better understood as a special type of histogram; instead of using the software on the provided CD, try to simulate in Excel yourself.
  5. Using outdated Texas Instruments calculators contradicts the American Statistical Association recommendation to "Use technology for developing concepts and analyzing data".
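
Two of the remarks in point 4 (the slow rate of convergence in the law of large numbers, and the advice to simulate yourself) can be illustrated with a few lines of code. The sketch below uses Python rather than Excel, and the function name `mean_abs_error`, the Uniform(0,1) distribution, the sample sizes and the number of replications are all illustrative assumptions. It estimates the typical distance between the sample mean and the true mean 0.5 for several sample sizes.

```python
import random


def mean_abs_error(n, reps=300, seed=0):
    """Average |sample mean - 0.5| over `reps` samples of n Uniform(0,1) draws."""
    rng = random.Random(seed)   # fixed seed so that runs are reproducible
    total = 0.0
    for _ in range(reps):
        sample_mean = sum(rng.random() for _ in range(n)) / n
        total += abs(sample_mean - 0.5)
    return total / reps


if __name__ == "__main__":
    # Each sample size is four times the previous one.
    for n in (100, 400, 1600):
        print(n, round(mean_abs_error(n), 4))
```

Theory says the typical error shrinks like one over the square root of the sample size, so quadrupling the sample should only roughly halve the error. That is the sense in which the convergence is not that fast.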


If I wanted to save time and didn't intend to delve into theory, I would prefer to read a concise book that directly addresses questions given on the AP test. However, to decide for yourself, read the Preface to see how much imagination has been put into the book, and you may want to read it.