Axel Munk is the founder and director of the Felix-Bernstein Institute for Mathematical Statistics in the Biosciences at the Georg-August University of Göttingen. He is also a Max Planck Fellow at the Max Planck Institute for Biophysical Chemistry, where he heads the research group for Statistical Inverse Problems in Biophysics. He received his doctoral degree in Mathematics from the University of Göttingen in 1994 and held positions at Ruhr-University Bochum, Technical University Dresden and Paderborn University before joining the Department of Mathematics and Computer Science at the University of Göttingen in 2002. He was elected a member of the Göttingen Academy of Sciences and Humanities in 2012. His interests range from fundamental statistical research to the development of methods and software for the analysis of experimental data in the lab sciences, in particular in biochemistry, structural biology, molecular genetics, and cell microscopy. He is a board member of the cluster of excellence “Multiscale Bioimaging: From Molecular Machines to Networks of Excitable Cells” and of the collaborative research center “Mathematics of Experiment”. Aspects of his research have been highlighted as editor’s picks, spotlights and discussion papers in several journals, and his work on nanoscale statistics has been reviewed in Research Features Magazine. Axel Munk is an elected member of the ISI and an IMS Fellow. He has served the IMS in several capacities and is currently a Council member. He will deliver this Medallion Lecture at the Joint Statistical Meetings in Seattle, August 7–12, 2021.
Optimal Transport-based Data Analysis: Inference, Algorithms, Applications
Optimal transport (OT) has a long history, originating in the 18th century with Gaspard Monge’s physical considerations of mass transportation and his studies on the optimal spatial allocation of resources. Since then it has undergone a flourishing mathematical development and has influenced and shaped various areas of mathematics, including analysis, probability, statistics and optimization. In parallel, it has also proven to be a remarkably rich and fruitful concept in other disciplines, such as economic theory, finance and, more recently, computer science, machine learning and statistical data analysis.
In its most fundamental form, optimal transport reduces to an assignment problem, a combinatorial optimization problem; in Monge’s general formulation, it is a nonconvex problem over transport maps that need not even admit a solution. In the first half of the last century, Leonid Kantorovich bypassed these difficulties by relaxing OT to a probability mass transportation problem, which laid the foundations of linear programming. Since then, the computation of OT has been a highly active field of research. Modern methods often exploit duality and the specific structure of the ground space and of the cost functional. Owing to such computational progress and to the flexibility of OT, various concepts for OT data analysis (OTDA) are beginning to find their way into novel areas of application, including tomography, cell biology and geophysics, to mention a few. Nevertheless, OTDA computation is still a bottleneck when processing millions of data points, as is common in these applications. This becomes even more critical for related but more complex tasks such as multi-marginal transport, or transport problems that are to be solved up to isometries of the ground space, which lead to Gromov–Wasserstein transport.
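To make the contrast between the assignment problem and Kantorovich’s relaxation concrete, here is a minimal pure-Python sketch. It assumes the simplest possible ground space, the real line, with cost |x - y|^p for p >= 1. The function names `assignment_cost_bruteforce` and `kantorovich_1d` are hypothetical illustrations, not part of any library or of the methods discussed in the talk. The first enumerates all bijections between two equally weighted point sets; the second solves the relaxed problem for arbitrary weights via the north-west corner rule on sorted supports (the monotone coupling), which is optimal on the line for convex costs and is one example of exploiting the structure of the ground space.

```python
import itertools
import math


def assignment_cost_bruteforce(xs, ys, p=1):
    """Assignment (Monge) problem for n vs n equally weighted points.

    Enumerates all n! bijections, so it is only usable for tiny n.
    Returns the average cost (1/n) * sum_i |x_i - y_sigma(i)|^p of the best bijection.
    """
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        cost = sum(abs(xs[i] - ys[perm[i]]) ** p for i in range(n)) / n
        best = min(best, cost)
    return best


def kantorovich_1d(xs, a, ys, b, p=1):
    """Kantorovich (relaxed) problem for discrete measures on the real line.

    xs, ys: support points; a, b: nonnegative weights with equal total mass.
    Uses the north-west corner rule on the sorted supports, i.e. the monotone
    coupling, which is optimal for the convex cost |x - y|^p with p >= 1.
    Mass at one point may be split across several targets.
    Returns (cost, plan) with plan a list of (source index, target index, mass).
    """
    src = sorted(range(len(xs)), key=lambda k: xs[k])
    tgt = sorted(range(len(ys)), key=lambda k: ys[k])
    left_a = [a[k] for k in src]  # mass still to ship from each sorted source
    left_b = [b[k] for k in tgt]  # capacity still open at each sorted target
    plan, cost = [], 0.0
    i = j = 0
    while i < len(src) and j < len(tgt):
        m = min(left_a[i], left_b[j])
        if m > 0:
            plan.append((src[i], tgt[j], m))
            cost += m * abs(xs[src[i]] - ys[tgt[j]]) ** p
        left_a[i] -= m
        left_b[j] -= m
        # the smaller of the two remainders is now exactly zero; advance past it
        if left_a[i] == 0:
            i += 1
        if left_b[j] == 0:
            j += 1
    return cost, plan
```

For three equally weighted points on each side, both functions return the same cost (about 0.833 for xs = [0, 3, 1], ys = [2, 0.5, 4], p = 1), while `kantorovich_1d([0.0], [1.0], [-1.0, 2.0], [0.25, 0.75])` splits the unit mass at 0 into parts of 0.25 and 0.75, something no assignment can do. On general ground spaces neither shortcut applies, and one would pass the Kantorovich problem to a linear programming or network flow solver.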
Hence, despite its great conceptual appeal and some computational progress, OTDA is still in its infancy. This also applies to the development of statistical methodology and theory.
In this talk we will discuss some recent developments in OTDA at the cutting edge of statistical methodology and computation. These include OT barycenters, which are summary measures of data with complex geometric structure, as well as novel ways to measure dependency. The main mathematical tools are limit laws and risk bounds for empirical OT plans and distances on finite and discrete spaces. The proofs combine sensitivity analysis from convex optimization with discrete empirical process theory. From this, we obtain methods for statistical inference, for fast simulation, and for fast randomized computation of OTDA tasks in large-scale data applications at a pre-specified computational cost. The performance of OTDA is illustrated in various computer experiments and on data from structural biology and super-resolution cell microscopy.
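Two of the objects mentioned above can be illustrated in a deliberately simplified setting. The sketch below assumes equally sized, equally weighted empirical measures on the real line and the 2-Wasserstein distance, where an OT barycenter is obtained by averaging sorted samples (that is, quantile functions). It also shows one simple way to fix the computational budget in advance: solve OT exactly between random subsamples of size `s` and average over `repeats` draws. All function names and the subsampling scheme are illustrative assumptions, not the specific estimators or randomized algorithms presented in the talk, and the limit laws and risk bounds that make such procedures rigorous are not reproduced here.

```python
import random


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equally weighted samples of equal size on the line.

    On the real line the optimal coupling matches order statistics, so sorting suffices.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    total = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (total / len(xs)) ** (1.0 / p)


def barycenter_1d(samples, weights=None):
    """2-Wasserstein barycenter of equal-size empirical measures on the line.

    The barycenter's quantile function is the weighted mean of the input quantile
    functions; for equal-size samples this means averaging the sorted samples.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    if weights is None:
        weights = [1.0 / len(samples)] * len(samples)
    ordered = [sorted(s) for s in samples]
    return [sum(w * s[i] for w, s in zip(weights, ordered)) for i in range(n)]


def subsampled_wasserstein_1d(xs, ys, s, repeats, p=2, seed=0):
    """Randomized approximation at a budget fixed by s and repeats.

    Each repeat draws s points i.i.d. (with replacement) from each sample,
    solves the small OT problem exactly, and the results are averaged.
    """
    rng = random.Random(seed)
    estimates = [
        wasserstein_1d(rng.choices(xs, k=s), rng.choices(ys, k=s), p)
        for _ in range(repeats)
    ]
    return sum(estimates) / repeats
```

For example, `barycenter_1d([[0.0, 2.0], [2.0, 4.0]])` returns `[1.0, 3.0]`, which lies at 2-Wasserstein distance 1 from both inputs, half their mutual distance of 2. The cost of `subsampled_wasserstein_1d` depends only on `s` and `repeats`, not on the full sample sizes; the price is that the averaged subsample distance need not be an unbiased estimate of the full distance.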
This talk surveys work from recent years. Collaborations with Florian Heinemann, Shayan Hundrieser, Marcel Klatt, Facundo Mémoli, Giacomo Nies, Katharina Proksch, Max Sommerfeld, Thomas Staudt, Carla Tameling, Zhengchao Wan, Christoph Weitkamp and Yoav Zemel are gratefully acknowledged.