*We introduce the Le Cam lecturer, Ruth Williams, and the Neyman lecturer, Peter Bühlmann, along with the rest of this year’s Medallion Lecturers: Anthony Davison, Anna De Masi, Svante Janson and Sonia Petrone. Anthony’s lecture will be at JSM in Vancouver (July 28–August 2), Anna’s at the Stochastic Processes and their Applications meeting in Gothenburg (June 11–15). The others will all be at the IMS Annual Meeting in Vilnius (July 2–6).*

**Le Cam Lecture preview:** Ruth Williams

**Ruth J. Williams** studied mathematics at the University of Melbourne, where she earned BSc(Hons) and MSc degrees, and then at Stanford University, where she earned her PhD. Following a postdoc at the Courant Institute in New York, she took up a position as an Assistant Professor at the University of California, San Diego (UCSD). She is currently a Distinguished Professor of Mathematics and holds the Charles Lee Powell Chair in Mathematics I at UCSD.

*Ruth Williams’ research in probability theory and its applications includes work on reflecting diffusion processes in non-smooth domains, heavy traffic limit theorems for multiclass queueing networks, and fluid and diffusion approximations for the analysis and control of more general stochastic networks, including those described by measure-valued processes. Her current research includes the study of stochastic models of complex networks, for example, those arising in Internet congestion control and systems biology.*

*Among her honors, she is an elected member of the National Academy of Sciences, an elected fellow of the American Academy of Arts and Sciences, and a fellow of AAAS, AMS, IMS and INFORMS. In 2012, she served as President of the IMS, and in 2016, jointly with Martin Reiman, she was awarded the John von Neumann Theory Prize by INFORMS. She delivered an IMS Special Invited Paper (now called a Medallion Lecture) in 1994.*

*Ruth Williams will give this Le Cam Lecture at the IMS Annual Meeting in Vilnius, Lithuania, on Monday July 2, 2018.*

**Stochastic Networks: Bottlenecks, Entrainment and Reflection**

Stochastic models of complex networks with limited resources arise in a wide variety of applications in science and engineering, e.g., in manufacturing, transportation, telecommunications, computer systems, customer service facilities, and systems biology. Bottlenecks in such networks cause congestion, leading to queueing and delay. Sharing of resources can lead to entrainment effects. Understanding the dynamic behavior of such modern stochastic networks presents challenging mathematical problems.

While there are exact steady-state analyses for certain network models under restrictive distributional assumptions, most networks cannot be analyzed exactly. Accordingly, it is natural to seek more tractable approximate models. Two types of approximations that have been used to study the stability and performance of some stochastic networks are fluid and diffusion models. There is now a substantial theory of such approximate models for queueing networks in which service to a queue is given to the job at the head of the line (HL networks). However, for queueing networks with non-HL service, such as processor sharing or random order of service, and for more general stochastic networks that do not have a conventional queueing network structure, the development of approximate models (and of rigorous scaling limit theorems to justify them) is in its early stages.

This talk will describe some recent developments and open problems in this area. A key feature will be dimension reduction, resulting from entrainment due to resource sharing. Examples will be drawn from bandwidth sharing and enzymatic processing.

For background reading, see the survey article: R. J. Williams, Stochastic Processing Networks, *Annu. Rev. Stat. Appl.* **2016**. 3: 323–45.

**Neyman Lecture preview:** Peter Bühlmann

**Peter Bühlmann** is Professor of Statistics and Mathematics at ETH Zürich. Previously (1995–97), he was a Neyman Visiting Assistant Professor at the University of California at Berkeley. His current main research interests are in causal and high-dimensional inference, computational statistics, machine learning, and applications in bioinformatics and computational biology. He is a Fellow of IMS and ASA, and he served as Co-Editor of the Annals of Statistics from 2010 to 2012. He received an honorary doctorate from the Université Catholique de Louvain in 2017, and is the recipient of the 2018 Royal Statistical Society’s Guy Medal in Silver.

*Peter will give this Neyman Lecture at the IMS Annual Meeting in Vilnius, Lithuania, on Tuesday July 3, 2018.*

**Invariance, causality and novel robustness**

**Jerzy Neyman: my starting point.** Jerzy Neyman (1923) considered agricultural field experiments where unobserved “potential yields” from a plant variety are modeled with a fully randomized assignment mechanism [5]. Historically, it appears that Neyman was the first to formalize causal inference using potential outcomes. It turned out to be an important milestone on which many developments, methods and algorithms are building, cf. [4].

**Causality: it’s about predicting an answer to a “What if I do?” question.** Loosely speaking, the main task in causality is to predict a potential outcome under a certain treatment or in a certain environment based on data where this treatment has not been observed. In many modern applications, we are faced with such prediction tasks. For example: in genomics, what would be the effect of knocking down (the activity of) a gene on the growth rate of a plant? We want to make a prediction without having data on such a gene knock-out (i.e., no data for this particular perturbation). Similar questions arise in economics, e-commerce and many other areas.

**Invariance in heterogeneous data.** Structural equation models are another framework for the same causal inference task as in the potential outcome setting. There is a key invariance assumption in this framework, which was first formalized by Trygve Haavelmo (the Norwegian economist and 1989 Nobel Laureate) in 1943 [1]. “Invariance” is the first word appearing in the lecture’s title as it is a crucial center point: we will focus on an invariance principle in the context of heterogeneous and potentially “large-scale” data. In a nutshell, Haavelmo [1] had recognized that:

causal model structures ⇒ invariance of a certain type.

One new line of thinking is the reverse relation, namely:

invariance of a certain type ⇒ causal model structures.

With access to large-scale heterogeneous data, we can simply estimate invariance from the data, which then leads to estimated causal structures; in other words, we infer causality from a special, well-defined notion of “stability” or invariance [2]. Here, heterogeneity is not a nuisance but a “real friend” since it enables the search for invariance (a.k.a. stationary structures) within heterogeneous (a.k.a. non-stationary) data.

**From invariance to novel robustness: anchor regression.** Strict invariance and causality may be too ambitious in the context of “large-scale” and poorly collected data. But when relaxing to soft invariance, we still obtain interesting robustness results for prediction. The “What if I do?” question from causality is related to robustness with respect to a class of new scenarios or perturbations which are not observed in the data. A novel methodology relying on causal regularization, called anchor regression, provides new answers [3].
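To make "causal regularization" concrete, here is a minimal sketch in Python (NumPy), assuming the penalized least-squares form of anchor regression described in the preprint [3]: minimize ‖(I − P_A)(Y − Xb)‖² + γ‖P_A(Y − Xb)‖² over b, where P_A is the projection onto the columns of the anchor (exogenous) variables A. The function name `anchor_regression`, the toy data and the chosen values of γ are illustrative assumptions, not part of the lecture abstract.

```python
import numpy as np

def anchor_regression(X, Y, A, gamma):
    """Ordinary least squares on data transformed by W = I - (1 - sqrt(gamma)) P_A.

    Since W @ W = (I - P_A) + gamma * P_A, this minimizes
    ||(I - P_A)(Y - X b)||^2 + gamma * ||P_A (Y - X b)||^2 over b.
    """
    P_A = A @ np.linalg.pinv(A)  # projection onto the column space of A
    W = np.eye(len(Y)) - (1.0 - np.sqrt(gamma)) * P_A
    coef, *_ = np.linalg.lstsq(W @ X, W @ Y, rcond=None)
    return coef

# Hypothetical toy data: the anchor A shifts X, a hidden H confounds X and Y.
rng = np.random.default_rng(0)
n = 500
A = rng.normal(size=(n, 1))
H = rng.normal(size=n)
X = (2.0 * A[:, 0] + H + rng.normal(size=n)).reshape(-1, 1)
Y = 1.0 * X[:, 0] + H + rng.normal(size=n)

b_ols = anchor_regression(X, Y, A, gamma=1.0)      # gamma = 1: plain least squares
b_robust = anchor_regression(X, Y, A, gamma=10.0)  # gamma > 1: penalize anchor-aligned residuals
```

In this formulation γ = 1 gives ordinary least squares, γ = 0 removes the anchor's linear effect before regressing, and large γ pushes the residuals to be uncorrelated with the anchors, which is the direction of instrumental-variable estimation.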

The lecture will highlight some fundamental connections between invariance, robustness and causality. We will illustrate that the novel insights and methods are useful in a variety of applications involving “large-scale” data.

**Acknowledgment.** Many of the ideas presented in the lecture come from my collaborators Nicolai Meinshausen, Jonas Peters and Dominik Rothenhäusler.

**References**

[1] Haavelmo, T. (1943). The statistical implications of a system of simultaneous equations. *Econometrica*, **11**:1–12.

[2] Peters, J., Bühlmann, P., and Meinshausen, N. (2016). Causal inference using invariant prediction: identification and confidence intervals (with discussion). *J. Royal Statistical Society, Series B*, **78**:947–1012.

[3] Rothenhäusler, D., Bühlmann, P., Meinshausen, N., and Peters, J. (2018). Anchor regression: heterogeneous data meets causality. Preprint arXiv:1801.06229.

[4] Rubin, D. and Imbens, G. (2015). *Causal Inference for Statistics, Social, and Biomedical Sciences.* Cambridge University Press.

[5] Splawa-Neyman, J. ([1923] 1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Translated and edited by D.M. Dabrowska and T.P. Speed from the Polish original, which appeared in *Roczniki Nauk Rolniczych, Tom X* (1923): 1–51 (*Annals of Agricultural Sciences*). *Statistical Science*, **5**:465–472.

**Medallion Lecture preview:** Anthony Davison

**Anthony Davison** is Professor of Statistics at the Ecole Polytechnique Fédérale de Lausanne (EPFL). Between obtaining his PhD from Imperial College London in 1984 and moving to EPFL in 1996, he worked at the University of Texas at Austin, at Imperial College London and at the University of Oxford. He has published on a variety of topics in statistical theory and methods, including small-sample likelihood inference, bootstrap methods and the statistics of extremes. He is author or co-author of several books.

*He has served the statistical profession as Editor of Biometrika, as Joint Editor of Journal of the Royal Statistical Society, series B, and in various other roles. In 2009 he was awarded a laurea honoris causa in statistical science by the University of Padova. He is a Fellow of IMS and the ASA, and in 2015 received the Royal Statistical Society’s Guy Medal in Silver.*

*Anthony will give this Medallion Lecture at the JSM in Vancouver.*

**Statistical Inference for Complex Extreme Events**

Statistics of extremes deals with the estimation of the risk of events that have low probabilities but potentially very damaging consequences, such as stock market gyrations, windstorms, flooding and heatwaves. Typically, few events relevant to the phenomenon of interest have ever been observed, and their probabilities must be estimated by extrapolation well outside any existing data, using appropriate probability models and statistical methods. Two broad approaches to the analysis of extremes are the use of block maxima, for example, annual maximum rainfall series; and the use of threshold exceedances, whereby only those observations that exceed some high threshold contribute to tail estimation. Key difficulties are that relevant events are typically scarce, so as much information as possible must be squeezed from those data that are available, and that any models based on limiting arguments are likely to be mis-specified for finite samples. There is an extensive literature on all aspects of this problem, from a wide range of perspectives.

In the scalar case, classical arguments suggest that inference for block maxima and exceedances over high thresholds should be based respectively on the generalized extreme-value and generalized Pareto distributions, and these are widely used in applications. Extensions of these arguments to more complex settings suggest that max-stable processes should provide suitable models for maxima of spatial and spatio-temporal data, and that Pareto processes are appropriate for modelling multivariate exceedances, appropriately defined. Although max-stable processes were introduced around 1980, there had been few attempts to fit them to data until recently, due both to a dearth of suitable models and to computational considerations. Such processes are generally specified through their joint distribution functions, leading to a combinatorial explosion when attempting to construct a full likelihood function, so workarounds, such as use of composite likelihoods or other low-dimensional summaries, have been proposed for both parametric and nonparametric inference. These approaches are increasingly being deployed in applications, but they are statistically inefficient, and the resulting loss of precision matters in settings where the final uncertainty is invariably too large for comfort.
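For reference, the two scalar families named above can be written as follows. The notation is the conventional one and is an assumption here rather than something taken from the abstract: location μ, scale σ > 0 (or σ_u > 0 for threshold u), shape ξ, and (a)_+ = max(a, 0).

$$
G(x) = \exp\left\{ -\left[ 1 + \xi \, \frac{x - \mu}{\sigma} \right]_+^{-1/\xi} \right\},
\qquad
H(y) = 1 - \left( 1 + \xi \, \frac{y}{\sigma_u} \right)_+^{-1/\xi}, \quad y > 0 .
$$

Here G is the generalized extreme-value distribution function for a block maximum and H is the generalized Pareto distribution function for an exceedance y = x − u over the threshold u. The case ξ = 0 is read as the limit ξ → 0, which gives the Gumbel and exponential distributions respectively.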

A further difficulty is that basing inference on maxima, which typically stem from merging several unrelated occurrences, obscures the detailed structure of individual events. Since these may show typical patterns that are important in risk assessment, attention has recently turned to inference based on multivariate exceedances, which in principle allow more informative modelling to be undertaken. Functionals that determine risks of particular interest are used to select the events most relevant to the estimation of these risks, and the tail probabilities corresponding to such risks are then estimated.

This lecture will survey recent work in the area and then show how detailed modelling for high-dimensional settings can be undertaken using Pareto processes, generalized versions of threshold exceedances, suitably-defined likelihoods and gradient scoring rules.

The work is joint with numerous others, and in particular with Raphaël de Fondeville.

**Medallion Lecture preview:** Anna De Masi

**Anna De Masi** is Professor in Probability and Statistics at the University of L’Aquila in Italy, where she has been coordinator of the PhD program “Mathematics and Models” since 2013. Her research interests cover issues such as macroscopic behavior of interacting particle systems and phase transition phenomena in equilibrium and non-equilibrium statistical mechanics. She is among the founders of the mathematical analysis of hydrodynamic limits involving stochastic evolutions of particle systems and kinetic limits [see the surveys in collaboration: “A survey of the hydrodynamical behaviour of many-particle systems,” Studies in Statistical Mechanics, Vol.11, North Holland (1984); and “Mathematical methods for hydrodynamical limits,” Lecture Notes in Mathematics 1501, Springer–Verlag (1991)]. With various collaborators, she has analyzed, using probabilistic techniques, phenomena like separation of phases, spinodal decomposition, and development and motion of interfaces. Her recent interests focus on boundary-driven systems in the presence of phase transition and their relation with free boundary problems.

*Anna’s Medallion Lecture will be at the Stochastic Processes and their Applications meeting in Gothenburg in June.*

**Backwards diffusion: how much does it cost, could it be for free?**

To explain the title, consider a thought experiment where there is a gas in a container with a wall in the middle that keeps the density on the left smaller than that on the right. If we take out the wall, the gas diffuses and in the end the density becomes uniform. However, with “astronomically” small probability or after “astronomically” long times, we may again see regions with different densities.

A system exhibiting such behaviour is the Ising model with nearest neighbour ferromagnetic interaction and Kawasaki dynamics. At high temperatures initial inhomogeneities disappear due to the diffusive behaviour of the system [1], and the appearance of inhomogeneities is only due to a large deviation. However, if we lower the temperature we see the opposite, i.e. spatially homogeneous initial states evolve into states with regions having different magnetization.

This is not only a mathematical abstraction. Consider, in fact, a binary alloy mixture consisting of atoms of Fe and Cr combined in a face-centered cubic lattice. At T=1200K the system performs normal diffusion, but at T=800K the mixture separates into regions, one predominantly Fe and the other Cr. This is widely used in chemical engineering to purify metal: loosely speaking, in the above example we slice the crystal into parts made predominantly of Fe and others predominantly of Cr. Such phenomena go under the name “uphill diffusion” (see for instance [2]); they occur in more general alloy mixtures and are widely used in industrial applications.

The purpose of my talk is to address these questions in the framework of a rigorous analysis. Something can be done but many intriguing questions remain open and I will try to focus on them, hoping that the audience will get interested and maybe involved. I will restrict to the Ising model with Kawasaki dynamics, which is a Markov process with nearest neighbour spin exchanges, so that the canonical Gibbs measure (with n.n. ferromagnetic interaction) is invariant. The evolution is in a finite region [0, *N*]^{*d*} ∩ *Z*^{*d*}, *d* = 1, 2. We add a spin flip process on the right and left boundaries which forces the average spin to have values *m*_{−} < 0 (on the left) and *m*_{+} > 0 (on the right).
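The setup just described can be sketched as a small one-dimensional simulation in Python. This is an illustration under stated assumptions, not the process analyzed in the lecture: exchanges use Metropolis acceptance for the energy H = −Σ s_j s_{j+1}, and the reservoirs are modeled by resampling each end spin with the prescribed mean. The function name `kawasaki_sweep`, the `boundary` flag and all parameter values are hypothetical.

```python
import math
import random

def kawasaki_sweep(spins, beta, m_left, m_right, rng, boundary=True):
    """One sweep of nearest-neighbour spin exchanges, then optional boundary updates.

    Returns the net number of +1 spins that moved to the right during the sweep.
    """
    n = len(spins)
    net_right = 0
    for _ in range(n - 1):
        i = rng.randrange(n - 1)  # choose the bond (i, i+1)
        if spins[i] == spins[i + 1]:
            continue  # exchanging equal spins changes nothing
        left = spins[i - 1] if i > 0 else 0
        right = spins[i + 2] if i + 2 < n else 0
        # Energy change of H = -sum_j s_j s_{j+1} when s_i and s_{i+1} are swapped.
        delta_h = (spins[i] - spins[i + 1]) * (left - right)
        if delta_h <= 0 or rng.random() < math.exp(-beta * delta_h):
            net_right += spins[i]  # +1 if a plus spin moves right, -1 if it moves left
            spins[i], spins[i + 1] = spins[i + 1], spins[i]
    if boundary:
        # Assumed reservoir rule: resample each end spin with mean m_left / m_right.
        spins[0] = 1 if rng.random() < (1 + m_left) / 2 else -1
        spins[-1] = 1 if rng.random() < (1 + m_right) / 2 else -1
    return net_right

rng = random.Random(1)
spins = [rng.choice((-1, 1)) for _ in range(50)]
# beta = 0 corresponds to infinite temperature.
current = sum(kawasaki_sweep(spins, 0.0, -0.5, 0.5, rng) for _ in range(2000))
```

Without the boundary updates the exchanges conserve the total magnetization, which is why a nonzero stationary current can only be sustained by the reservoirs at the two ends.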

This is the usual setup for the Fick law. The question is the sign of the current in the stationary state. In *d* = 1 at infinite temperature the current satisfies the Fick law (going opposite to the magnetization gradient) and it is therefore negative. In this case we prove a large deviation estimate on the probability that it is instead positive (also when *m*_{±} are slowly varying in time), see [3]. In the case *d* = 2 when the temperature is subcritical (and there is a phase transition) we have observed via computer simulations [4] that the stationary current may become positive, flowing from the reservoir with smaller magnetization to the one with larger magnetization. We have some theory to explain the phenomenon but few mathematical proofs.

Several other questions will be addressed in my talk.

**References**

[1] S.R.S. Varadhan, T. Yau. Diffusive limit of lattice gas with mixing conditions. *Asian J. Math.* 1, 1997, 623-678.

[2] R. Krishna. Uphill diffusion in multicomponent mixtures. *Chem. Soc. Rev.* 44, 2015, 2812.

[3] A. De Masi, S. Olla. Quasi-static hydrodynamic limit. *J. Stat. Phys.* 161, 2015; and Quasi-static large deviations, in preparation, 2018.

[4] M. Colangeli, C. Giardinà, C. Giberti, C. Vernia. Non-equilibrium 2D Ising model with stationary uphill diffusion. *Phys. Rev. E*, to appear 2018.

**Medallion Lecture preview:** Svante Janson

**Svante Janson** is Professor of Mathematics at Uppsala University. He obtained his PhD in Mathematics from Uppsala University in 1977, and has remained there ever since, except for short stays at other places. His thesis and early research were in harmonic analysis and functional analysis, but for a long time, his main interest has been in probability theory. In particular, he is interested in the study of random combinatorial structures such as random graphs, trees, permutations, and so on, where he generally tries to find interesting limit theorems or limit objects. He also works on Pólya urns and branching processes. He has written three books, and over three hundred mathematical papers. Svante is a member of the Royal Swedish Academy of Sciences, the Royal Society of Sciences at Uppsala, the Swedish Mathematical Society, and the Swedish Statistical Association. He has been awarded the Rudbeck medal by Uppsala University and the Celsius medal by the Royal Society of Sciences at Uppsala.

*This Medallion Lecture will be at the IMS Annual Meeting in Vilnius, Lithuania, on Thursday July 5, 2018.*

**Random trees and branching processes**

Branching processes generate random trees in a natural way, which can be varied by, e.g., conditioning or stopping the branching process. Moreover, many families of random trees that are defined in other ways turn out to be equivalent to random trees defined by branching processes in some way or another. This has thus become one of the main probabilistic tools to study random trees.

One central family of random trees is that of the conditioned Galton–Watson trees, which come from a Galton–Watson process conditioned to have a given total size. We study the asymptotic behaviour of these random trees locally close to the root, locally close to a random node, and globally after suitable rescaling.
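As an illustration of the conditioning, the following Python sketch samples a Galton–Watson tree conditioned on its total size by simple rejection. The offspring law P(k) = 2^{−(k+1)} (critical, mean 1) is an assumption chosen for convenience, and the function names are hypothetical; the tree is returned as its list of offspring counts read in depth-first order.

```python
import random

def geometric_offspring(rng):
    """Offspring law P(k) = 2^-(k+1) for k = 0, 1, 2, ...; its mean is 1."""
    k = 0
    while rng.random() < 0.5:
        k += 1
    return k

def galton_watson_counts(n, rng):
    """Grow one Galton-Watson tree; return its offspring counts, or None if its size is not n."""
    counts = []
    pending = 1  # nodes already born whose children have not been drawn yet
    while pending > 0:
        if len(counts) == n:
            return None  # the tree has more than n nodes
        k = geometric_offspring(rng)
        counts.append(k)
        pending += k - 1
    return counts if len(counts) == n else None

def conditioned_gw_tree(n, rng):
    """Rejection sampling: repeat until the tree has exactly n nodes."""
    while True:
        counts = galton_watson_counts(n, rng)
        if counts is not None:
            return counts

tree = conditioned_gw_tree(20, random.Random(3))
```

Rejection is only practical for small sizes, since the probability that a critical tree has exactly n nodes decays polynomially in n; it is used here because it makes the definition of the conditioned tree explicit.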

Other, quite different, families of random trees can be constructed by stopping a (supercritical) Crump–Mode–Jagers (CMJ) process when it reaches the desired size. We study the local and global behaviour of such trees too.

There are also open problems in this field, and these lead to open problems and conjectures about fluctuations for functionals of CMJ branching processes.

See further, e.g., [1, 2] and the references therein.

**References**

[1] Holmgren, Cecilia and Janson, Svante. Fringe trees, Crump–Mode–Jagers branching processes and *m*-ary search trees. *Probability Surveys* **14** (2017), 53–154.

[2] Janson, Svante. Simply generated trees, conditioned Galton–Watson trees, random allocations and condensation. *Probability Surveys* **9** (2012), 103–252.

**Medallion Lecture preview:** Sonia Petrone

**Sonia Petrone** is a Professor of Statistics at Bocconi University, Milan, Italy. Her research is in Bayesian statistics, covering foundational aspects as well as methods and applications. Foundational themes in her work include exchangeability and decisions under risk. Her main methodological interests are in the area of Bayesian nonparametrics, including mixtures and latent variable models, density estimation, nonparametric regression and predictive methods. She has been President of the International Society for Bayesian Analysis (ISBA) and is an elected member of the IMS Council. She has been a co-editor of Bayesian Analysis and is currently an associate editor of Statistical Science. She is a Fellow of ISBA. Sonia will be giving her Medallion Lecture at the IMS Annual Meeting in Vilnius, on Monday July 2.

**Bayesian predictive learning beyond exchangeability**

Bayesian Statistics has its foundations in the concept of probability as the rule for quantifying uncertainty, and in the consequent solution of the learning process through conditional probabilities. People usually distinguish two main learning goals: in the inferential approach, the focus of learning is on the model’s parameters; in the predictive approach, the focus of learning is on future events, given the past. Predictive learning is central in many applications, and has a foundational appeal: one should express probabilities on observable facts, (non-observable) parameters being just a link of the probabilistic chain that leads from past to future events. Beyond foundations, the focus on prediction is important for fully understanding the implications of modeling assumptions.

Bayesian predictive learning is solved through the conditional distribution *P*_{n} of *X*_{n+1} given (*X*_{1}, …, *X*_{n}). This implies that some form of probabilistic dependence is always expressed, in order to learn from experience. Random sampling is reflected in an assumption of invariance of the joint distribution of the observable sequence (*X*_{n}) under permutations of the labels, that is, exchangeability. The basic role of exchangeability in Bayesian statistics is enhanced by the representation theorem: For an infinite exchangeable sequence (*X*_{n}), the predictive (and the empirical) distributions converge to a random distribution *F*, and, conditionally on *F*, the *X*_{i} are a random sample from *F*: the inferential model *F* arises as the limit of the predictive distributions. A long-studied problem is to characterize the model through the sequence (*P*_{n}). For example, and informally, if *P*_{n} only depends on a ‘predictive sufficient’ statistic *T*_{n}(*X*_{1}, …, *X*_{n}), one obtains a parametric model *F*_{θ} where θ = lim *T*_{n}. In Bayesian nonparametric inference, predictive characterizations are an attractive alternative to directly assigning a prior distribution on an infinite-dimensional *F*. Most popular priors for Bayesian nonparametrics have a predictive characterization as stochastic processes with reinforcement, such as Pólya-like urn schemes.
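One standard example of such a reinforced predictive rule is the Pólya urn scheme of Blackwell and MacQueen, which characterizes the Dirichlet process: the next observation is a fresh draw from a base distribution with probability α/(α + n), and otherwise repeats one of the n past values chosen uniformly. The Python sketch below assumes this rule; the names `polya_urn_sequence` and `predictive_weights`, the standard normal base distribution and α = 2 are illustrative choices.

```python
import random
from collections import Counter

def polya_urn_sequence(n, alpha, base_draw, rng):
    """Generate X_1, ..., X_n from the Blackwell-MacQueen urn with concentration alpha > 0."""
    xs = []
    for i in range(n):
        # After i observations: new value with probability alpha / (alpha + i).
        if rng.random() < alpha / (alpha + i):
            xs.append(base_draw(rng))
        else:
            xs.append(rng.choice(xs))  # reinforce a previously seen value
    return xs

def predictive_weights(xs, alpha):
    """Weights of the predictive distribution of the next observation given xs.

    Returns (weights on observed values, weight on a fresh base draw).
    """
    n = len(xs)
    observed = {x: c / (alpha + n) for x, c in Counter(xs).items()}
    return observed, alpha / (alpha + n)

rng = random.Random(0)
xs = polya_urn_sequence(200, 2.0, lambda r: r.gauss(0.0, 1.0), rng)
observed, fresh = predictive_weights(xs, 2.0)
```

The sequence produced this way is exchangeable, and the weight on a fresh draw shrinks as n grows, so the predictive distribution increasingly concentrates on values already seen.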

In more complex problems, stochastic dependence structures beyond exchangeability are needed. Still, forms of symmetry, or partial exchangeability, may hold. Powerful predictive constructions have been proposed for partially and Markov-exchangeable data, and successfully applied in a wide range of fields. A challenge, nowadays, is having partially exchangeable predictive rules that remain computationally tractable for increasingly complex applications. In the information-versus-computation trade-off, possibly sub-optimal but computationally more tractable approximations of exchangeable predictive rules receive renewed interest.

In my lecture, I will present predictive constructions, beyond exchangeability, based on (interacting) stochastic processes with time-dependent or random reinforcement. These processes are not, in general, (partially) exchangeable, but may still have convergent predictive distributions and be so asymptotically. They may model evolutionary phenomena that deviate from exchangeability but reach an exchangeable steady state. Interestingly, they may also offer a ‘quasi-Bayes’ recursive predictive learning rule, that approximates an exchangeable rule and is computationally simpler. I will illustrate this potential in the basic case of nonparametric mixture models.
