One of the final items on every IMS President’s to-do list is to deliver the Presidential Address at the IMS annual meeting. Tony Cai did this in August 2025, at the JSM in Nashville, and this article is adapted from his IMS Presidential Address—with assistance, as you might expect in the age of AI, from ChatGPT.

Tony Cai, now IMS Past President, handed over the presidential gavel via Zoom to Kavita Ramanan the day after giving this IMS Presidential Address at JSM 2025
The field of statistics, built on more than a century of foundational achievements, now stands at a pivotal moment. The rise of artificial intelligence (AI), the explosion of complex and high-dimensional data, and growing demands from science, industry, and society are reshaping not only how we conduct research but also how we define our role as statisticians. These changes are not purely technical—they challenge our identity, our methods, and our place in a rapidly evolving scientific ecosystem.
Data today comes in many forms—structured and unstructured, numerical and textual, visual and auditory. This diversity reflects the broadening scope of modern data science. Even more striking is the exponential growth of data generated by AI itself. Generative models like ChatGPT, Claude, Gemini, DeepSeek, Sora, and DALL·E have created a surge of synthetic content. According to recent projections, AI-generated data may surpass human-generated data as early as 2026, and exceed 80% of total data by 2030.
This evolution raises critical questions: What does it mean to analyze data generated by algorithms? How do we validate, trust, or interpret such data? These questions are at the heart of statistical reasoning, and our field must play a central role in answering them.
This is a moment of extraordinary opportunity, but also one of profound responsibility. As the data ecosystem evolves at unprecedented speed, we must ask: Will statistics lead in this new era of AI and data science—or be left behind? The answer depends on our willingness to adapt, collaborate, and assert the enduring importance of statistical thinking in shaping the future.
Statistics at the Heart of AI
Despite the rise of machine learning and AI, many of the core algorithms that drive these technologies are fundamentally statistical. Many of the most widely used “machine learning algorithms”, including linear regression, logistic regression, decision trees, naive Bayes classifiers, and support vector machines, are methods rooted in classical statistics. These approaches are not only powerful and practical but also grounded in ideas of estimation, uncertainty quantification, and hypothesis testing. Their enduring popularity underscores a fundamental truth: statistics is not merely relevant to AI—it is central to it.
What truly distinguishes statistics is its emphasis on inference, interpretability, reproducibility, and uncertainty—a set of principles that are becoming ever more critical as AI systems grow in complexity and scope. While modern AI often gravitates toward deep learning and black-box models, it is the statistical perspective that equips us to ask whether a model generalizes, how reliable its predictions are, and what factors drive its decisions. Concepts such as regularization, model selection, the bias–variance trade-off, and the quantification of predictive uncertainty all have their roots in statistics, and they remain vital tools in building robust, interpretable AI.
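As a small illustration of how compactly these classical ideas can be expressed, the sketch below (synthetic data and arbitrary parameter choices, added for this adaptation rather than taken from the address) fits a ridge regression in closed form and uses a simple bootstrap to attach an uncertainty statement to a prediction.

```python
# Illustrative only: ridge regression (regularization) plus a bootstrap
# assessment of predictive uncertainty, in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 200, 10, 1.0                      # sample size, dimension, ridge penalty
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]                   # a sparse "true" signal
y = X @ beta + rng.normal(scale=1.0, size=n)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

x_new = rng.normal(size=p)                    # a new covariate vector to predict at

# Refit on bootstrap resamples to quantify how variable the prediction is.
preds = []
for _ in range(500):
    idx = rng.integers(0, n, size=n)          # bootstrap resample of the rows
    preds.append(x_new @ ridge_fit(X[idx], y[idx], lam))
lo, hi = np.percentile(preds, [2.5, 97.5])

print(f"prediction: {x_new @ ridge_fit(X, y, lam):.2f}, "
      f"95% bootstrap interval: [{lo:.2f}, {hi:.2f}]")
```

The penalty `lam` plays the role of the regularizer, and the width of the bootstrap interval is a crude but honest measure of how much the prediction should be trusted.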
As data analysis becomes increasingly embedded in high-stakes domains—healthcare, criminal justice, finance, education, and scientific discovery—the need for privacy, accountability, fairness, and transparency has never been greater. Statisticians are uniquely equipped to ensure that these systems are scientifically valid, ethically designed, and rigorously evaluated. By bringing statistical rigor into AI development and deployment, our field can help ensure that the data-driven systems shaping our future are not only effective but also trustworthy and just.
AI Is Changing the Research Ecosystem
Artificial intelligence is profoundly transforming the nature of scientific discovery. A striking illustration is the 2024 Nobel Prize in Chemistry, awarded in part to computer scientist Demis Hassabis, co-founder and CEO of DeepMind. His breakthrough contributions to the development of AlphaFold—an AI system for predicting complex protein structures—have revolutionized molecular biology and opened new frontiers in biomedical research. Similarly, the 2024 Nobel Prize in Physics recognized computer scientist Geoffrey Hinton, often hailed as the “godfather of AI,” for his foundational work on artificial neural networks.
These developments mark a new era in science—one where AI is no longer a supporting tool but a driving force in discovery. Statistical thinking—with its emphasis on inference, model validation, uncertainty quantification, and reproducibility—is essential to ensure that AI-generated insights are trustworthy, interpretable, and scientifically meaningful. As AI systems become embedded in research pipelines, statisticians must lead the development of principled frameworks that assess and guide these systems. Our tools are needed not only to measure performance but to diagnose when models fail, to correct for bias, and to ensure that conclusions hold under varying conditions.
Moreover, as science becomes increasingly data-rich and algorithmically driven, the role of statistics must expand. We must advocate for rigorous standards, transparent methodologies, and ethical safeguards in AI-assisted science. Statisticians have a vital responsibility—not only to interpret data but to shape how knowledge is created, evaluated, and shared in the age of AI.
Between Theory and Practice
A persistent and growing tension exists between theoretical rigor and practical application. While theoretical development has long been a defining strength of statistics—ensuring sound inference, well-calibrated uncertainty, and model transparency—the current landscape increasingly prioritizes empirical performance, scalability, and real-world impact. This shift is evident in both academic publishing and the deployment of models in practice, where success is often measured by benchmarks and predictive accuracy alone.
Yet in this climate, statistical theory must not retreat into the background. On the contrary, it must evolve and assert its relevance by addressing the urgent and complex challenges posed by modern machine learning. These include understanding generalization in overparameterized models, quantifying uncertainty in deep learning, designing fair and robust algorithms with privacy guarantees, and characterizing the limits of learning under data and resource constraints.
We must view theory not just as an academic exercise but as a practical foundation for building models that are reliable, interpretable, and reproducible. A sound theoretical framework helps us understand when and why models fail and ensures that the insights drawn from data are meaningful, stable, and trustworthy—qualities essential in high-stakes domains such as healthcare, finance, and public policy.
Ultimately, the future of statistics depends on a vibrant and responsive theoretical foundation—one that is deeply connected to applications and grounded in real-world complexity. We need theory that engages with practice, and practice that informs theory. Bridging this gap is essential for the continued growth and relevance of our field.
New Frontiers for Statistical Research
The age of AI opens unprecedented avenues for statistical innovation. As data grows in scale, complexity, and diversity, statisticians are uniquely positioned to address the fundamental challenges that arise in modeling, inference, and decision-making. These emerging frontiers are not merely extensions of classical topics—they require rethinking core statistical principles in light of modern computational, societal, and ethical contexts. Examples of such frontiers include:
- Mixed Human–AI Data: As AI systems increasingly generate content and interact with humans, many datasets are now hybrids of human and machine outputs. For example, large language models produce responses that are further curated by users, creating complex feedback loops. Statisticians must develop principled methods to model, analyze, and validate such data while accounting for both algorithmic and behavioral biases.
- Differential Privacy and Responsible Data Sharing: The demand for privacy-preserving statistical analysis is growing across healthcare, finance, social sciences, and government data platforms. Differential privacy offers rigorous privacy guarantees, but its practical deployment—especially in federated and streaming environments—poses open questions in optimal noise calibration, privacy–utility trade-offs, and uncertainty quantification. (A minimal sketch of the basic noise-calibration idea appears after this list.)
- Federated and Decentralized Learning: Data is increasingly collected and stored across distributed networks of devices or institutions. Statisticians can lead in developing algorithms for federated learning that are communication-efficient, privacy-preserving, and robust to data heterogeneity. These advances are essential in fields like health research, where privacy regulations prevent full data sharing.
- Transfer Learning and Multi-Source Integration: Transfer learning and domain adaptation aim to combine insights across disparate data sources. These tools are crucial in real-world applications—such as medicine, ecology, and economics—where target populations differ from training data. Statisticians are well-positioned to offer theoretical guarantees for when and how transfer should occur, and how to quantify uncertainty in heterogeneous environments.
- Uncertainty Quantification in Deep Learning: Many deep learning systems lack calibrated uncertainty, limiting their trustworthiness in safety-critical domains. There is growing opportunity for statisticians to build bridges between probabilistic modeling and deep learning to develop more reliable uncertainty quantification, predictive intervals, and diagnostics. (One such statistical wrapper, split conformal prediction, is sketched after this list.)
- Robustness, Interpretability, and Fairness: Societal deployment of AI requires models that are robust to distribution shifts, adversarial inputs, and data imperfections. At the same time, transparency and interpretability are increasingly demanded by users, policymakers, and regulators. Statistical methodology—especially from causal inference, sparse modeling, and sensitivity analysis—is crucial to advancing robust and interpretable machine learning.
- Automated Discovery and Scientific Reasoning: AI is becoming not just a tool for prediction, but a collaborator in hypothesis generation, experimental design, and knowledge discovery. As these systems are deployed in science, statisticians must help design frameworks that ensure scientific rigor, reproducibility, and inferential validity.
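To accompany the differential privacy item above, the following minimal sketch (synthetic data, arbitrary parameter choices, and in no sense a production-grade implementation) shows the classical Laplace mechanism: noise calibrated to a query's sensitivity and a privacy budget epsilon, which makes the privacy–utility trade-off explicit.

```python
# Illustrative only: an epsilon-differentially-private mean via the Laplace
# mechanism. Noise scale = sensitivity / epsilon, so a smaller privacy
# budget epsilon forces more noise into the released statistic.
import numpy as np

def private_mean(x, lower, upper, epsilon, rng):
    """Release an epsilon-DP estimate of the mean of values in [lower, upper]."""
    x = np.clip(x, lower, upper)                 # enforce the assumed bounds
    sensitivity = (upper - lower) / len(x)       # max effect of changing one record
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

rng = np.random.default_rng(1)
ages = rng.integers(18, 90, size=10_000)         # hypothetical sensitive records

for eps in (0.1, 1.0, 10.0):
    est = private_mean(ages, lower=18, upper=90, epsilon=eps, rng=rng)
    print(f"epsilon={eps:>4}: private mean = {est:6.2f}   (true mean = {ages.mean():.2f})")
```

Running it for several values of epsilon shows the trade-off directly: stronger privacy (smaller epsilon) means a noisier released mean.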
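Likewise, for the uncertainty quantification item, the sketch below illustrates split conformal prediction, one standard statistical wrapper that turns any black-box point predictor into prediction intervals with finite-sample marginal coverage; the "model" here is a stand-in and all data are synthetic.

```python
# Illustrative only: split conformal prediction wraps any point predictor in
# intervals with finite-sample marginal coverage, using held-out calibration data.
import numpy as np

def conformal_interval(predict, X_cal, y_cal, X_new, alpha=0.1):
    """Return (lower, upper) prediction intervals with roughly (1 - alpha) coverage."""
    residuals = np.abs(y_cal - predict(X_cal))               # calibration scores
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration residuals.
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    preds = predict(X_new)
    return preds - q, preds + q

rng = np.random.default_rng(2)
X_cal = rng.uniform(-3, 3, size=500)
y_cal = np.sin(X_cal) + rng.normal(scale=0.3, size=500)

predict = np.sin                                             # stand-in for a fitted black-box model
lo, hi = conformal_interval(predict, X_cal, y_cal, np.array([0.0, 1.5]))
print("intervals:", list(zip(np.round(lo, 2), np.round(hi, 2))))
```

The guarantee rests only on exchangeability of the calibration and test data, regardless of how the underlying predictor was built.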
Together, these frontiers and other emerging topics redefine the role of statistics in the modern data ecosystem. Statisticians are now integral to the entire data lifecycle—from design and generation to inference and impact.
The Role of IMS and the Statistical Community
The Institute of Mathematical Statistics (IMS) is uniquely positioned to guide the global statistics community through this era of rapid transformation. With approximately 45% of its members based in North America, 25% in Asia, and 23% in Europe, IMS has both the reach and the responsibility to foster scientific excellence while building bridges across geographical, disciplinary, and generational boundaries.
In recognition of this global mandate, IMS has just launched a new International New Researchers Conference (INRC) to be held alternately in Europe and Asia. This initiative builds on the longstanding success of the New Researchers Conference (NRC) in North America, which has played a crucial role in supporting early-career statisticians, promoting research visibility, and facilitating professional networking. The expansion into Europe and Asia acknowledges the growing strength of the international statistics community and reflects IMS’s commitment to equitable access to resources, mentorship, and collaboration opportunities worldwide.
Complementing this effort, IMS has also introduced a new annual meeting series—Frontiers in Statistical Machine Learning (FSML). Strategically scheduled to take place at a local university immediately after the NRC and just before the Joint Statistical Meetings (JSM), FSML serves as a vital link between early-career development and the broader statistical and data science community. The goal is to foster meaningful interaction between statistics and machine learning, providing a vibrant forum for researchers to explore advances at the interface of theory, methodology, and real-world application.
Together, these initiatives underscore IMS’s dedication to nurturing the next generation of statistical leaders and deepening global engagement. By creating dedicated spaces for early-career researchers and encouraging interdisciplinary collaboration, IMS strengthens its international presence and brings a diverse array of voices into the heart of our professional community. With a strong foundation and enthusiastic global support, these new efforts are well-positioned to replicate—and even build upon—the success of their North American counterparts.
Beyond these initiatives, IMS has an essential role to play in shaping the future of our discipline. It can:
- Foster meaningful interdisciplinary collaboration—especially with computer science, engineering, the life sciences, and the social sciences—ensuring that statistical thinking informs AI, data science, and emerging areas of research.
- Strengthen support for early-career statisticians through mentorship programs, training workshops, travel funding, and high-visibility publication opportunities.
- Lead educational reform by encouraging curricula that incorporate algorithmic thinking, computational tools, effective communication, and applied problem-solving alongside mathematical and statistical theory.
- Amplify the role of statistics in AI and data science by highlighting impactful work in IMS journals, conferences, and outreach efforts.
These efforts are not just desirable—they are essential. As data continues to shape the future of science, policy, and industry, IMS must serve as both a standard-bearer for excellence and a catalyst for inclusion, innovation, and impact. Through thoughtful leadership and bold action, IMS can ensure that statistics remains at the heart of the scientific enterprise and continues to play a central role in addressing the most important challenges of our time.
Final Thoughts
Statistics is not being replaced by artificial intelligence—it is being called upon to strengthen it. The accelerating pace of change in data science and AI presents not only challenges, but opportunities—opportunities for statisticians to lead, to shape, and to safeguard the future of evidence-based reasoning in science, technology, and society.
We stand at a crossroads—not just as a discipline, but as stewards of how knowledge is derived, communicated, and applied in an increasingly data-driven world. The choices we make today will shape the trajectory of statistics, the integrity of science, and the role of evidence in society for generations to come.