Kaleidoscopic effect

Mathematicians, there might be more to your statistical siblings than you thought

John Braun, July 22, 2020

I have always wondered at the intricate levels of specialization recognized within mathematics, while statistics is usually treated as one homogeneous entity.

 

Recently, our (UBC-Okanagan) department’s Math PhD program undertook a review of its comprehensive exam, an undertaking capably handled by the program coordinator, who sought to address some of the issues that had arisen locally in recent years. Our department is highly diverse, and the mathematicians here have shown amazing flexibility in accommodating this diversity by incorporating theoretical computer science and a generously broad definition of probability and theoretical statistics, which has allowed our microscopically small groups of computer scientists and statisticians to supervise a nice cohort of HQP (highly qualified personnel) locally. But this raises the question: what is an appropriate comprehensive exam for a Math PhD candidate? One answer, which always rattles the statisticians, and I suspect the computer scientists as well, is that the comprehensive exam should rigorously address the three “pillars” of mathematics, two of which seem to have firm support (analysis and algebra) and a third which seems to vary considerably – is it geometry? Differential equations? Combinatorics? Here, the consensus had been algebra, analysis and differential equations, before the annoying statisticians arrived and suggested that probability and statistics could be a fourth. The mathematicians accommodated this idea by allowing students to take 3 of the 4 possible topics, a reasonable compromise. With the addition of theoretical computer science, the flexibility of the mathematicians was tested more stringently. The mathematicians quite reasonably (after all, it is their program) wanted to retain 2 of the “pillars”, while arguments were made that a student might be better off with only 1 pillar, together with computer science and probability and statistics.

 

Our resourceful program coordinator was able to strike an appropriate compromise once again, but the debate leading to the compromise concerned the point of the mathematics comprehensive exam itself. What is it for? One answer (an important one, but not the only one) is that a student who passes the comprehensive exam, obtains the PhD degree and ultimately ends up as an Assistant Professor in the Mathematics Department at a leading university would be able to “teach at all levels of the mathematics undergraduate program” with the possible exception of some specialized senior courses in an area complementary to their program of research. This is a great idea, until we recognize that most PhD graduates do not end up teaching mathematics at a university, let alone a leading one. And when those mathematicians enter the industrial world, they may very well be sought after for their mathematical talents, but there is a good chance that they will also be expected to program a computer, run some simulation code or construct a predictive model from data that doesn’t come from a textbook. In other words, they may need more than that long-forgotten introductory statistics course and that other long-forgotten introductory course in C (in fact, they most likely need to quickly pick up Python and SQL and a lot of other things).

 

The thing that struck me in this friendly debate was how we tend to view the fields of math and stats themselves. Mathematicians often view statistics as a subset of mathematics, and there is certainly an argument to be made that statistics and mathematics overlap to a substantial degree – and statisticians who do not have considerable knowledge of mathematics are often at a tremendous disadvantage relative to those who do. The irony, then, is that when mathematical science is considered as the union of mathematics and statistics (and maybe some other things that don’t concern me at the moment), the subdivisions on the non-statistical side are rather fine (e.g. “algebraic combinatorics”, “combinatorial algebra”, “algebraic number theory”, and so on), while on the statistics side you will see the singleton class “statistics” or sometimes the pair “probability” and “statistics”.

 

This nomenclature seems to lend itself to the mistaken view that statistics is some monolithic entity, while the field of mathematics is much richer and the players are more highly varied. (“Who should be the next hire in our math/stat department, someone in topological ergodic theory or a statistician?”) In fact, there is considerable variation among statisticians – and just as in mathematics, a PhD statistician should be able to handle the first few years of undergraduate teaching, with specialized senior and graduate courses belonging to domain specialists, where the domains have names like “functional data analysis”, “survival analysis”, “sampling theory”, “Bayesian nonparametrics”, and so on.

 

In closing, it is clear to me that statisticians have benefitted a lot from learning mathematics – we learn and apply elements of coding theory, combinatorics, functional analysis, … – but I think mathematicians could benefit by learning more statistics. Taking a course or two in statistics is not enough – that would be like a statistician taking the calculus sequence and thinking they know math. Just as it takes time to get to the point where one can understand Zorn’s Lemma or why we need to prove existence and uniqueness of certain mathematical objects, it takes time to get to the point of fully appreciating variance propagation and uncertainty. Indeed, much of statistical practice in industry is closer to engineering than to mathematics. That is, one has a set of tools (i.e., statistical methods) that are used to solve applied problems. Statistical principles and experience in application are key to successful problem solving with data. Students in both mathematics and statistics should be actively encouraged to explore these closely allied fields. They can only benefit from the experience.