Hans R. Künsch, ETH Zurich, completed his term as IMS President at the Joint Statistical Meetings in Montreal. He delivered this Presidential Address:
This year we are celebrating the anniversary of a book published 300 years ago, and on this occasion we have declared 2013 to be the International Year of Statistics. So the book gives us the opportunity to attract the attention of a larger public to statistics, but what relevance does it have for us statisticians and probabilists today?
Some might argue, very little: First, the title, Ars Conjectandi, the Art of Conjecturing, seems odd for a book about science; after all, science and the arts have very different standards. And isn’t the book’s main result just the weak law of large numbers for the binomial distribution? We can prove that theorem in much greater generality in a few lines. Such arguments, however, miss important points. With his Ars Conjectandi, Jacob Bernoulli laid an important foundation for our fields, and I want to use it tonight for the following purposes:
• to look back at how our subject evolved,
• to appreciate how far we have come,
• to better understand the conceptual difficulties with probability and statistics applied to real world problems,
• to compare how science was done in the past with how it is done nowadays,
• to speculate about possible directions of our field.
I believe that such thoughts also fit well with the efforts of IMS to highlight the rich history of probability and statistics through our Scientific Legacy project, although that project will not go back so far in the past.
Let me first recall briefly the historical context of Ars Conjectandi: The scientific approach to randomness starts in 1654 with an unpublished correspondence between Pascal and Fermat, followed in 1657 by the first book, written by Huygens. They dealt with games of chance where probabilities could be determined by symmetry arguments, without relying on observations.
In 1662 Graunt published his Observations made upon the bills of mortality, and in 1694 Halley published Life tables with seven uses. These works compute relative frequencies of survival given a certain age, ignoring the underlying random variation.
Jacob Bernoulli’s book Ars Conjectandi brought these two developments together by giving a mathematical argument for the convergence of relative frequencies to probabilities.
It therefore laid the foundation for the application of probability theory in situations outside of pure games of chance. More importantly, because Bernoulli gave specific bounds on the required number of observations for a desired precision, his results made it possible to quantify the uncertainty in the estimation of probabilities by relative frequencies. In a wider perspective, the book gives a justification for inductive reasoning: How can we derive mathematical models from data?
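To give a feel for what a bound on the required number of observations looks like, the following Python sketch may help. It is only an illustration: it uses Chebyshev's inequality, P(|X/n - p| >= eps) <= p(1 - p)/(n eps^2), which is a later and simpler argument than Bernoulli's own. The function names `chebyshev_sample_size` and `relative_frequency`, and the chosen values of the tolerance and error probability, are assumptions made for this example.

```python
import math
import random
from fractions import Fraction


def chebyshev_sample_size(eps, delta, p=None):
    """Smallest n with Var / (n * eps^2) <= delta.

    Var is p(1 - p) if p is given, otherwise the worst case 1/4.
    Exact rational arithmetic avoids rounding surprises in the ceiling.
    """
    eps, delta = Fraction(eps), Fraction(delta)
    var = Fraction(1, 4) if p is None else Fraction(p) * (1 - Fraction(p))
    return math.ceil(var / (eps ** 2 * delta))


def relative_frequency(n, p, rng):
    """Relative frequency of successes in n independent trials."""
    p = float(p)
    return sum(rng.random() < p for _ in range(n)) / n


if __name__ == "__main__":
    eps, delta = Fraction(1, 50), Fraction(1, 1000)  # illustrative values
    print("worst-case n:", chebyshev_sample_size(eps, delta))
    print("n for p = 3/5:", chebyshev_sample_size(eps, delta, Fraction(3, 5)))
    rng = random.Random(2013)  # fixed seed so the run is repeatable
    print("simulated frequency:", relative_frequency(10_000, Fraction(3, 5), rng))
```

With a tolerance of 1/50 and an allowed error probability of 1/1000, the worst-case Chebyshev bound asks for 625,000 observations, which gives some sense of why guarantees of this kind can look discouraging in practice.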
In the less well-known fourth part, Jacob Bernoulli also discusses principles for the application of probability theory in political, judicial, moral and business matters. He was convinced that a precise knowledge of probabilities was the basis for predictions, judgments and actions. He could not have imagined the many other applications of probability in the natural sciences: statistical physics, quantum mechanics and genetics were completely unknown in his time.
Before we look more closely at some of the conceptual issues, I would like briefly to describe the personality of the author and some circumstances regarding how the book was written.
The Bernoulli family was of Dutch origin and had moved to Basel in Switzerland for religious reasons. Jacob was the first of many scientists that this family produced over generations—all of them male; the first female member of the Bernoulli family that I could find on the web is Eva Bernoulli, 1873–1935, who was active in the temperance movement against alcohol. Presumably there were other female Bernoullis, and we can only wonder what they might have contributed to science if they had had the same access to education as the men.
Jacob originally studied theology at his father’s wishes, but later became a mathematician. We can guess at his relationship with his father from the motto that he adopted: “In spite of my father I am among the stars.” His younger brother Johann first studied medicine, but then also switched to mathematics. At the beginning he was taught by Jacob, and together they became leaders in the application of the new infinitesimal calculus, developed by Newton and Leibniz, to problems in geometry and physics. The two brothers solved a number of the most difficult problems of their time, but their cooperation quickly turned into a bitter rivalry which they fought out ruthlessly in their publications.
I will not go into the details of this clash, but just quote a few characterizations of the two brothers used by their biographers: “a bilious and melancholic temperament”, “mostly quarrelsome and jealous”, or “violent, abusive… and, when necessary, dishonest”. We may ask whether the way we deal with competition in research has changed in the past 300 years. Difficult characters are presumably unavoidable in science, but this is no excuse for personal attacks or polemics. We still have to continue our efforts to create an environment for science which is characterized by fair play and mutual respect.
Jacob Bernoulli did the main work on his Ars Conjectandi from 1684 to 1689, but he did not publish it during his lifetime. He died in 1705, and the book appeared only 8 years later. The delay between Jacob’s death and the publication of the book is mainly a consequence of the rivalry with his brother, but why did he not publish it during his lifetime? Historians believe that this is primarily because he was not satisfied with it: The bounds that he had are not tight and thus the required sample sizes for a precise estimation are huge. Moreover, he wanted to obtain some data to illustrate the arguments in his fourth part. Who would still do this nowadays and resist the pressure to publish quickly despite the feeling that he or she hasn’t yet fully understood the problem?
Now let me discuss some of the conceptual difficulties related to Ars Conjectandi. The first one is the meaning of probability. Bernoulli’s definition is still my preferred one: “Probability is a degree of certainty, and differs from certainty as a part from the whole”. In the weak law of large numbers, uncertainty is again expressed as a probability. Thus for the interpretation of this law, we need the concept that “Events with low probability do not occur on a single occasion.” Bernoulli formulated this idea using the term “morally impossible.” But how small should the probability of an event be so that we are “morally certain” that it will not happen? This question is still with us today in the assessment and communication of risk. For instance, during my presidency I had to decide whether IMS should take a position about the conviction of scientists because of their statements before the earthquake in L’Aquila, Italy. The issue was not whether earthquakes can be predicted, but rather whether the scientists had weighed and communicated the evidence according to the most recent scientific knowledge. [Hans recommended attending David Spiegelhalter’s public lecture given a couple of days later, From Gambling to Global Catastrophe: Metaphors and Images for Communicating Numerical Risks]
Let me next turn to the issue of inductive reasoning. Since Bernoulli there have been many attempts to understand far more complex systems than a simple urn by analyzing empirical data. In the era of Big Data, data are available almost without limitations, and expectations are high.
For instance, last year The European Physical Journal Special Topics published a “Manifesto of computational social science” which contains statements like the following:
• Information and communication technologies (ICT) produce a flood of data. These data represent traces of almost all kinds of activities of individuals enabling an entirely new scientific approach for social analysis.
• The analysis of huge data sets as obtained, say, from mobile phone calls, social networks, or commercial activities provides insight into phenomena and processes at the societal level.
• ICT can greatly enhance the possibility to uncover the laws of the society.
• The role of computational social science is a leading one in addressing the Big Problems of society, avoiding crises and threats to its stability and healthy development.
Most of us will agree with the first two statements, but I have some doubts about the other two. Of course, picking out a few sentences from a 20-page article is not fair, but I believe the authors should discuss much more extensively questions such as: What are these laws of the society, what distinguishes them from the laws of physics, how can the idea of social laws coexist with concepts of individuality and free choices, and what are the limits of predictability and controllability of phenomena and processes at the societal level?
Neuroscience is another field where hope is high that by exploiting available data we can gain insight, discover fundamental laws and find cures for diseases like Alzheimer’s and Parkinson’s.
The website of the Blue Brain Project, which was recently approved as an EU flagship project, describes its approach as follows:
Neuroscience: systematic, industrial-scale collection of experimental data, making it possible to describe all possible levels of structural and functional brain organization from the sub-cellular, through the cellular, to the micro-circuit, meso-circuit and macrocircuit levels;
Neuroinformatics: automated curation and databasing of data, use of Predictive Reverse Engineering to predict unknown data from a smaller sample of known data or from data describing other levels of brain organization;
Mathematical abstraction: definition of parameters, variables, equations, algorithms and constraints representing the structure and functionality of the brain at different levels of organization.
Again, I am a bit skeptical that systematic collection of data will allow us almost automatically to infer the functioning of the brain and to predict what happens at different levels.
Most of us are presumably also surprised that the document uses the term Predictive Reverse Engineering for tasks that we would consider the central domain of statistics. Indeed, there is some concern that statistical knowledge is not fully recognized by other branches of science. I have heard this concern repeatedly in the 30+ years of my career, the first time being when fuzzy systems and neural networks were the hot topics in science. I am therefore not too worried about the future of our fields, but I do believe that we should encourage more people to reach out towards new disciplines, to develop and study new methods to address the needs of other disciplines. We should also have more appreciation for such interdisciplinary contributions and achievements.
In the remaining time, I don’t want to discuss this point further, but I would like to describe to you briefly how the knowledge about another complex system—the weather and climate—has progressed in the past 150 years. This topic fits well into this year’s celebration of “Mathematics of Planet Earth”. My brief thoughts will also provide illustrations of the roles and interplays between data and theory, induction and deduction, stochastic and deterministic thinking.
I begin my story with Robert Fitzroy, who lived from 1805 till 1865. He is presumably unknown to you. He was the captain of HMS Beagle, the ship of Charles Darwin’s famous voyage, which gave him lots of experience with weather and storms at sea. In 1854 he was appointed as chief of a new government agency to deal with the collection of weather data at sea, with the goal of making shipping less dangerous.
Fifteen land stations were established to use the new telegraph to transmit daily reports of weather at set times. Fitzroy developed charts to allow predictions to be made from these data. The first daily weather forecasts were published in The Times in 1860. In analogy to information and communication technology today, he used the newest technological means available to collect as much information as possible in order to gain insight and to solve a practical problem.
In 1863 he published The Weather Book: A Manual of Practical Meteorology, in which he attempted to uncover the laws governing the weather from his life-long experience. The success of his efforts was, however, limited. Francis Galton, who also had an interest in meteorology, wrote a devastating review of that book: “It is a fault in a book intended to lay the foundations of a new experimental science, that it should be mainly occupied with deductions from unproven hypotheses, instead of the careful establishment of axioms by rigorous induction from observed facts.”
Despite Galton’s assertion, big breakthroughs did not happen through sounder induction from observed facts. They were achieved through an entirely different approach, initiated by the Norwegian Vilhelm Bjerknes (1862–1951). He was the first to realize that the fundamental laws of fluid dynamics and thermodynamics can be used to describe the large scale motions in the oceans and the atmosphere. In 1904, he declared that weather forecasting is possible in principle by solving deterministic partial differential equations. This was very much ahead of his time, and the first attempt to implement the idea ended in failure. Computing power was simply not sufficient.
The situation changed with the advent of computers in the 1950s. Since then, deterministic differential equation models have been at the core of all scientific weather prediction services. Moreover, basically the same models can also be used for climate studies, provided one takes into account the interactions of the atmosphere and the oceans with the biosphere and the cryosphere, the part of the earth where water is frozen.
However, as we all know, despite the huge progress in computing power, both weather and climate predictions are still uncertain. Atmospheric physicists are nowadays aware that they have to quantify this uncertainty. This uncertainty has two main sources. Firstly, in view of the large range of space and time scales of processes in the atmosphere, the problem of how to deal with phenomena at smaller scales than the numerical resolution allows will not disappear. Secondly, in 1963 Lorenz discovered the chaotic nature of the weather: sensitivity to initial conditions limits its predictability, even if we could solve the equations exactly.
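Sensitivity to initial conditions can be made concrete with a small experiment. The sketch below integrates the three-variable system from Lorenz's 1963 paper with its standard parameter values (10, 28 and 8/3) and follows two trajectories whose starting points differ by one part in a hundred million. The integrator, step size, starting point and function names are choices made for this illustration, not something taken from the address.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) equations."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def separation_history(perturbation=1e-8, dt=0.01, steps=4000):
    """Distance between two runs that start almost at the same point."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)
    history = [math.dist(a, b)]
    for _ in range(steps):
        a, b = rk4_step(a, dt), rk4_step(b, dt)
        history.append(math.dist(a, b))
    return history


if __name__ == "__main__":
    seps = separation_history()
    for i in range(0, len(seps), 500):
        print(f"t = {i * 0.01:5.1f}   separation = {seps[i]:.3e}")
```

The printed separations should grow by many orders of magnitude until they are comparable to the size of the attractor itself, after which the two runs are effectively unrelated even though both solve the same deterministic equations.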
The solutions to these difficulties are parametrizations and data assimilation. Without going into technical details, parametrizations are methods to approximate the effect of unresolved scales as functions of resolved variables. Data assimilation is the sequential adjustment of the current computed state of the atmosphere, based on observations. In the next prediction step, this current state is used as the new initial condition.
The relevant point for this talk is that for both parametrization and data assimilation, advanced methods are stochastic: Stochastic parametrization is a hot topic in the field, and ensemble methods which replace a single current state of the atmosphere by a sample in order to quantify the uncertainty of predictions are nowadays widespread. These approaches then turn weather prediction into a filtering problem in high dimensions and thus they offer interesting possibilities for our fields, in combination with physics and numerical mathematics.
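The forecast and adjustment cycle with an ensemble can be sketched in a few lines for a single scalar state. The example below uses a stochastic ensemble Kalman filter with perturbed observations, which is one common ensemble method; the address does not say which method is meant. The toy linear model, the noise levels and the function names `forecast` and `analysis` are all assumptions for illustration and bear no resemblance to an operational weather model.

```python
import random
import statistics


def forecast(ensemble, rng, a=0.9, model_sd=0.5):
    """Propagate every member with a toy linear model plus random model error."""
    return [a * x + rng.gauss(0.0, model_sd) for x in ensemble]


def analysis(ensemble, observation, obs_var, rng):
    """Stochastic EnKF update: move each member toward a perturbed observation."""
    prior_var = statistics.variance(ensemble)  # sample variance of the forecast
    gain = prior_var / (prior_var + obs_var)   # scalar Kalman gain
    obs_sd = obs_var ** 0.5
    return [
        x + gain * (observation + rng.gauss(0.0, obs_sd) - x)
        for x in ensemble
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    truth = 3.0
    ensemble = [rng.gauss(0.0, 2.0) for _ in range(200)]
    for step in range(5):
        truth = 0.9 * truth + rng.gauss(0.0, 0.5)  # hidden true state
        ensemble = forecast(ensemble, rng)         # prediction step
        y = truth + rng.gauss(0.0, 1.0)            # noisy observation
        ensemble = analysis(ensemble, y, obs_var=1.0, rng=rng)
        print(step, round(truth, 2),
              round(statistics.mean(ensemble), 2),
              round(statistics.stdev(ensemble), 2))
```

The spread of the ensemble after each adjustment serves as the estimate of the remaining uncertainty, and the adjusted ensemble is the starting point for the next prediction step.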
So the first attempts with a purely inductive approach to understand weather and climate failed. They were replaced by a purely deductive and deterministic approach. However, that approach also reached its limits. Uncertainty is unavoidable and stochastic methods have re-entered the scene. What the story will look like for social sciences or neuroscience in the future, how progress will be made, and what the contributions of our fields will be, I don’t know, but I would like to live long enough to see at least part of it. I encourage you to actively participate in these research efforts and I hope that IMS will contribute through its journals, meetings and the contacts it makes possible.