
Arthur P. Dempster
Arthur Pentland Dempster, Professor Emeritus of Theoretical Statistics at Harvard University and a founding member of its Department of Statistics, died on January 30, 2026, at the age of 96. Over nearly seven decades of scholarship, Professor Dempster shaped the foundations of statistical inference through a series of path-breaking research programs, including a theory of upper and lower probabilities (now widely known as Dempster–Shafer theory), the EM algorithm, and a geometric reformulation of multivariate analysis. Professor Dempster’s contributions have left a permanent mark on the discipline and on adjacent fields, ranging from machine learning to artificial intelligence.
Dempster was born in Toronto in 1929, the middle of three boys. An early interest in science and mathematics was influenced by his uncle and namesake, Arthur Jeffrey Dempster, the University of Chicago physicist whose pioneering work in mass spectrometry led to the discovery of uranium-235.
Arthur P. Dempster attended the University of Toronto, where he was a Putnam Fellow in 1951 and received a B.A. in Mathematics and Physics in 1952; he received his PhD in Mathematical Statistics from Princeton University in 1956. His first sustained contact with the statistical community came at the University of Connecticut’s summer statistical seminar, organized by Geoffrey Beall and John W. Tukey and attended by Paul Meier, Jerome Cornfield, David Blackwell, and other figures who would shape the discipline. At Princeton, Dempster studied in Sam Wilks’ statistics program and participated regularly in Tukey’s afternoon seminars, where applied problems brought by visitors were systematically analyzed before an audience of graduate students. His dissertation treated multivariate problems in which the number of variables exceeds the sample size, a regime now central to high-dimensional statistics [1,2]. While at Princeton, Dempster met Elizabeth O’Neill; they married in January 1957.
Dempster spent 1956–57 as a lecturer at the University of Toronto. In 1957–58 he held an appointment in the mathematics research group at Bell Telephone Laboratories. At Bell Labs, Dempster was exposed to scientific computing and participated in an early large-scale data project to computerize personnel records for the entire Bell System. It was also at Bell Labs that Dempster developed his theoretical work on random allocation designs, published as a pair of papers in the Annals of Mathematical Statistics [3, 4].
In early 1957, Frederick Mosteller recruited Dempster to the newly established Department of Statistics at Harvard. Dempster joined as assistant professor in 1958, shortly after the other founding faculty (Mosteller, William G. Cochran, John W. Pratt, and Howard Raiffa) took up their appointments. He was promoted to associate professor in 1961 and to full professor in 1964. He served as chair of the department for a total of 13 years between 1969 and 1985, and remained on the active faculty until his transition to emeritus status in 2005.
Dempster’s first major research program continued the multivariate themes of his thesis. Throughout the 1960s he taught a graduate course in multivariate analysis with a distinctive geometric and computational orientation. Dempster developed this material into the book Elements of Continuous Multivariate Analysis [5]. His brief note in the Annals on a paradox concerning inference about a covariance matrix [6] was privately regarded by Dempster as among his most important contributions. The work anticipated by decades the difficulties of inferring high-dimensional covariance structures from small samples, a problem to which he would return in his later work on belief functions.
In the early 1960s, while immersed in R.A. Fisher’s Statistical Methods and Scientific Inference [7], Dempster began to appreciate the conceptual allure of Fisher’s fiducial inference and became convinced that it was technically deficient. His response was a sequence of papers published between 1966 and 1968, in which he introduced a new framework for probabilistic reasoning under partial information. “New methods for reasoning towards posterior distributions based on sample data” [8] and “Upper and lower probabilities induced by a multivalued mapping” [9] established the mathematical apparatus of random sets and multivalued mappings as a generalization of ordinary probability. A companion paper extended the framework to inference from finite populations [10]. The synthesis appeared as “A generalization of Bayesian inference” [11] in the Journal of the Royal Statistical Society, Series B, which made explicit the relationship between the new framework and Bayesian posteriors. A further paper that same year treated upper and lower probabilities generated by random closed intervals [12].
The mathematical apparatus of these papers was motivated initially by Fisher’s inability to extend fiducial reasoning to discrete data, beginning with the Binomial distribution. The conceptual core was an explicit recognition of a third category of probability mass beyond “for” and “against” any assertion: the probability of “don’t know.” Dempster delivered an early account of the broader subjectivist program at the 1965 International Statistical Institute meetings in Warsaw under the title “A subjectivist look at robustness” [13], a presentation that drew an appreciative letter from Bruno de Finetti. Glenn Shafer’s 1976 monograph, A Mathematical Theory of Evidence [14], reformulated Dempster’s theory of upper and lower probabilities. Dempster praised Shafer’s monograph for “freeing [Dempster’s theory] from the narrow statistical confines of random sampling and re-expressing it in terms of more general relevance as belief, support, and evidence.” Their combined contributions, now known as Dempster–Shafer theory, became one of the principal alternative formalisms for reasoning under uncertainty, widely employed in artificial intelligence and engineering contexts.
In the early 1970s Dempster began collaborating with Donald B. Rubin (then at the Educational Testing Service) on missing data problems and with Nan M. Laird (then in Harvard’s Department of Statistics) on random-effects models. Their collaboration culminated in a landmark paper, “Maximum Likelihood from Incomplete Data via the EM Algorithm” [15], published in the Journal of the Royal Statistical Society, Series B, with discussion. The paper formulated a general computational procedure, namely an “Expectation” step followed by a “Maximization” step, iterated until convergence, and demonstrated how it can be used to obtain maximum likelihood estimates in a broad class of incomplete data problems arising in exponential families, including random-effects models, censored and grouped data, and more generally models involving latent structures.
The 1977 paper has accumulated more than 70,000 citations and was listed among Nature’s most-cited papers across disciplines [16]. Its impact has been deepest in fields where latent-variable models and incomplete data are pervasive: machine learning, mixture modeling, tomographic image reconstruction, hidden Markov modeling, and quantitative genetics, to name a few.
Through the 1980s and 1990s, Dempster supervised graduate work that addressed the principal computational obstacle to applying Dempster–Shafer reasoning: the combinatorial cost of the random-set representation in high dimensions. Augustine Kong’s dissertation developed a “join tree” representation for combining belief functions locally. Kong’s representation is one of several closely parallel contributions to belief function local computation that emerged during that era. This program was in turn part of a broader convergence on local computation in graphical models that took place across statistics and machine learning research communities. Russell Almond’s dissertation extended this line of work and later appeared in book form [17].
Over the final two decades of his research life, Dempster returned to Dempster–Shafer theory as his principal scholarly focus. In “The Dempster–Shafer Calculus for Statisticians” [18], Dempster reinvigorated the research program by introducing the characteristic triple (p, q, r), corresponding to the probabilities “for,” “against,” and “don’t know” attachable to any formal assertion. Dempster argued for the necessity of an explicit “don’t-know” probability for the honest representation of inferential uncertainty. He called the resulting framework “DS-21,” a 21st-century reformulation organized around the careful construction of state spaces, the recognition that the “don’t-know” probability (r) is sensitive to the analyst’s choice of variables, and the deliberate use of “don’t-know” probability to address multiplicity and post-hoc subgroup inference. A retrospective synthesis appeared as “Statistical Inference from a Dempster–Shafer Perspective” in the COPSS volume, Past, Present, and Future of Statistical Science [19].
Methodological and computational work proceeded in parallel. With Paul T. Edlefsen and Chuanhai Liu, Dempster developed a Dempster–Shafer treatment of Poisson counting data motivated by problems in high-energy physics, specifically the Banff upper-limits challenge associated with Higgs-boson searches [20]. The prohibitive computational cost of obtaining Monte Carlo samples of the random-set objects at the heart of the theory, which Dempster had studied as a class of random convex polytopes in a 1972 Annals paper [21], stood for nearly half a century as the greatest obstacle to applied Dempster–Shafer inference. In 2021, this obstacle was overcome in collaboration with Pierre E. Jacob, Ruobin Gong, and Paul Edlefsen, in “A Gibbs Sampler for a Class of Random Convex Polytopes” [22], published in the Journal of the American Statistical Association and accompanied by discussions from Glenn Shafer and Persi Diaconis, among others.
Professor Dempster regarded the research program on the Dempster–Shafer theory of belief functions as the most important of his scientific legacies.
Over nearly five decades of active service, Dempster taught generations of Harvard undergraduates and supervised over 30 doctoral students, many of whom went on to leading positions in academic statistics and in industry. His service and dedication to the department were a sustained force in shaping Harvard’s statistical research and teaching mission. He and Elizabeth opened their beautiful home to colleagues, students, and visitors throughout his career. With funding support from former doctoral student Stephen Blyth, Harvard’s Department of Statistics established the Arthur P. Dempster Award in 2012 to recognize promising graduate-student research in theoretical and foundational statistics.
Professor Dempster was a Fellow of the IMS and ASA, and of the American Academy of Arts and Sciences. He held a Guggenheim Fellowship in 1967–68.

Members of the Dempster family
Elizabeth (née O’Neill), Arthur’s wife of nearly sixty-nine years, died on March 29, 2026, two months after his own passing. A classicist by training, Elizabeth read classics with first-class honors at Queen’s University in Kingston, Ontario, and held a master’s degree in Greek from Bryn Mawr College. Elizabeth and Arthur are survived by their children, Rebecca, Ben, and Sara; their spouses Matthew, Yuri, and Eloy; and grandchildren Rowan, Aidan, and Gavin.
The 2026 Joint Statistical Meetings in Boston, MA, will host a memorial session on Sunday, August 2, 2026, at 2 pm to celebrate Professor Dempster’s life and work. Harvard Statistics will also host an alumni reception in honor of Professor Dempster on the same day at 5 pm at the Maxwell-Dworkin building (33 Oxford St, Cambridge, MA 02138).
—
SELECT REFERENCES
[1] Dempster, A. P. (1958). A high dimensional two sample significance test. The Annals of Mathematical Statistics, 29(4), 995–1010.
[2] Dempster, A. P. (1960). A significance test for the separation of two highly multivariate small samples. Biometrics, 16(1), 41–50.
[3] Dempster, A. P. (1960). Random allocation designs I: On general classes of estimation methods. The Annals of Mathematical Statistics, 31(4), 885–905.
[4] Dempster, A. P. (1961). Random allocation designs II: Approximate theory for simple random allocation. The Annals of Mathematical Statistics, 32(2), 387–405.
[5] Dempster, A. P. (1969). Elements of Continuous Multivariate Analysis. Reading, MA: Addison-Wesley.
[6] Dempster, A. P. (1963). On a paradox concerning inference about a covariance matrix. The Annals of Mathematical Statistics, 34(4), 1414–1418.
[7] Fisher, R. A. (1956). Statistical Methods and Scientific Inference. New York: Hafner.
[8] Dempster, A. P. (1966). New methods for reasoning towards posterior distributions based on sample data. The Annals of Mathematical Statistics, 37(2), 355–374.
[9] Dempster, A. P. (1967). Upper and lower probabilities induced by a multivalued mapping. The Annals of Mathematical Statistics, 38(2), 325–339.
[10] Dempster, A. P. (1967). Upper and lower probability inferences based on a sample from a finite univariate population. Biometrika, 54(3–4), 515–528.
[11] Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B, 30(2), 205–247.
[12] Dempster, A. P. (1968). Upper and lower probabilities generated by a random closed interval. The Annals of Mathematical Statistics, 39(3), 957–966.
[13] Dempster, A. P. (1975). A subjectivist look at robustness. Bulletin of the International Statistical Institute, 46, 349–374.
[14] Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton: Princeton University Press.
[15] Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39(1), 1–38.
[16] Van Noorden, R., Maher, B., and Nuzzo, R. (2014). The top 100 papers. Nature, 514(7524), 550–553.
[17] Almond, R. G. (1995). Graphical Belief Modeling. London: Chapman & Hall.
[18] Dempster, A. P. (2008). The Dempster–Shafer calculus for statisticians. International Journal of Approximate Reasoning, 48(2), 365–377.
[19] Dempster, A. P. (2014). Statistical inference from a Dempster–Shafer perspective. In Past, Present, and Future of Statistical Science. Boca Raton: Chapman & Hall/CRC.
[20] Edlefsen, P. T., Liu, C., and Dempster, A. P. (2009). Estimating limits from Poisson counting data using Dempster–Shafer analysis. The Annals of Applied Statistics, 3(2), 764–790.
[21] Dempster, A. P. (1972). A class of random convex polytopes. The Annals of Mathematical Statistics, 43(1), 260–272.
[22] Jacob, P. E., Gong, R., Edlefsen, P. T., and Dempster, A. P. (2021). A Gibbs sampler for a class of random convex polytopes. Journal of the American Statistical Association, 116(535), 1181–1192.
—
Written by Ruobin Gong, drawing on the memorial published by Harvard’s Department of Statistics.
The author thanks Nan Laird, Glenn Shafer, Augustine Kong, Chuanhai Liu, Stephen Blyth, Paul Edlefsen, Emily Palmer, and the Dempster family for their comments and recollections.