“One of the leading results on Brownian motion is that it exists.” So wrote David Freedman on page 1 of his book on the topic. I recalled this statement several years ago when I was writing in this column about proofs, and it came to mind again recently when some younger colleagues asked me to explain Dirichlet processes (DPs) to them. More precisely, they asked me to explain three-level hierarchical Dirichlet process (HDP) mixture models, something I had never seen before: models that are very easy to write down using a plate diagram, though not quite as easy to grasp.

Until now, I’ve been a bit blasé about Bayesian nonparametrics, thinking that I could probably reach the end of my career without getting far into it. But I was wrong, for big data has caught up with me. One of the most exciting things in my world these days is a little machine known as the MinION, which can produce reads of hundreds of thousands of base pairs from a single DNA molecule, lots of them. This machine can be held in your hand (“smaller than your smartphone”) and plugged into your computer: if you’re not careful, it can crash your computer by delivering more data than it can take in. The desktop version, called the PromethION, can generate a thousand times more data. Having been involved in DNA sequencing for a while, I find that handheld machines generating 10–20 giga-basepairs of DNA sequence in 48 hours have almost ceased to impress me, but desktop versions generating 12 tera-basepairs of sequence in 48 hours still get my attention. What is even more amazing to me is that the signals from which all these DNA sequence data are derived are single-molecule electrical currents measured several thousand times per second.

This is the big data that has forced my colleagues to come to terms with these models: not just HDP mixture models, but also convolutional neural networks, and both standard and long short-term memory recurrent neural networks, the statistical machinery of deep learning. I have finally been dragged into the twenty-first century.
What’s all this got to do with existence? Wonderfully, perhaps surprisingly, DPs first saw the light of day in 1973, in papers in our own Annals of Statistics written by Thomas Ferguson and colleagues. And there he had to prove their existence. He did so by using facts about Dirichlet distributions to show that Kolmogorov’s consistency conditions for a projective system were satisfied, so that a limit, the DP, must exist. That’s why I got called in: who among the present generation of statistics students knows about the existence of random processes, or projective limits? In 1973 and in the years since, alternative derivations of the DP were discovered. Some, such as the equivalent Pólya urn scheme, are indirect. Others, such as Ferguson’s use of gamma processes or Jayaram Sethuraman’s stick-breaking representation, are more direct and constructive. It turned out that some restrictions on the underlying measure spaces were necessary for Ferguson’s original existence proofs to work, restrictions which weren’t needed for some of the other constructions. So we didn’t need Kolmogorov’s theorem after all.
Why do we need existence proofs anyway? Using DPs to analyse data boils down to simple arithmetic procedures whose behaviour doesn’t appear to demand deep existence proofs. If people can use HDP mixture models effectively in practice without ever having thought about the question of existence, who are we to criticise? Louis Bachelier found Brownian motion valuable before Norbert Wiener proved that it exists. I have the impression that physicists have sometimes drawn valid inferences about the world from theory that wasn’t fully grounded until later.
For most of my career, I have started thinking about the questions and the data that cross my path in traditional terms: devising graphical displays that suggest a way ahead, using linear models and their generalizations, various forms of multivariate analysis, possibly latent variables, and at times context-specific models based on some scientific or technological background. There has always been plenty of theory in the background for me to consult if I wished. In 2013, the International Year of Statistics, I attended the London Workshop on the Future of the Statistical Sciences. I felt at home, and I liked the published report. Now, with my scientific collaborators asking questions about data from the MinION, I no longer feel at home. I need to go far beyond what I currently know, and I have become deeply conscious of the power of deep learning. Most of the theory is unfamiliar; indeed, much is unlike what I have come to think of as theory. I should have been paying closer attention, as none of these things are really new.
One of them may even become Stephen Stigler’s eighth Pillar of Statistical Wisdom.