Linjun Zhang is an Associate Professor in the Department of Statistics at Rutgers University. He obtained his PhD in Statistics from the Wharton School of the University of Pennsylvania in 2019, receiving the J. Parker Bursk Memorial Prize and the Donald S. Murray Prize for excellence in research and teaching, respectively, upon graduation. He has since received an NSF CAREER Award, the Rutgers Presidential Teaching Award in 2024, and the Warren I. Susman Award for Excellence in Teaching in 2025. His current research interests include the statistical foundations of large language models, privacy-preserving data analysis, algorithmic fairness, and deep learning theory.

We invited Linjun to write a series of articles on LLMs, AI, and implications for the statistics profession. This is the first part:


I am very grateful to Tati Howell, the IMS Bulletin editor, for the invitation to write a series of pieces on statistics in the age of large language models (LLMs). Over the next few articles, I hope to explore the practical, foundational, and ethical dimensions of this moment. In this first piece, I will focus on the practical side—how tools like ChatGPT are already entering our research, writing, and teaching workflows—drawing from my own concrete experiences.

I will begin with something I have heard from others that sounds like a provocative joke (or a bold prediction): given recent advances, from AI models solving IMO problems at a gold-medal level to automated research agents, perhaps the only job left for professors will be in-person education. It sounds far-fetched, but the point is serious: parts of our work are being automated faster than we imagined. The pressing question is which parts of the professor’s role remain irreplaceably human.

I have been experimenting with and doing research on LLMs since their earliest releases. In my own writing routine, an LLM has become something like a momentum machine. When starting a paper or a grant proposal, I will feed in a handful of bullet points, such as the motivation, methodology, and structure, and get back a scaffold. The text is never final; the phrasing is often too general and lacks detail, and the claims are occasionally exaggerated. But it is a way to leap over the blank page. (By the way, somewhere in this article, one of the paragraphs was written purely by ChatGPT. To the first person who can spot which paragraph, I owe a fine dinner.)

Programming is another area where the tool has earned its place. A graduate student and I once wrestled with a stubborn simulation bug. The LLM’s first proposed fix did not work, but its reasoning pointed us to a version mismatch we had not considered. The AI did not solve the problem outright, but it shortened the path to the solution.

Visualization tools were a surprise. I can now sketch a clumsy diagram, and an LLM tool (like the recent Nano Banana) will produce a polished and much more attractive version. It is not always perfect, but it is an excellent starting point for refinement and saves hours of tedious work.

These experiments have carried into the classroom. Students are already eager to use ChatGPT, so the question is not whether but how. In an earlier paper, “What Should Data Science Education Do with Large Language Models?” [1], my coauthors and I argued that data science education today should teach students not only to use LLMs but also to evaluate them: identifying errors, understanding biases, and integrating statistical reasoning into the process. That philosophy now shapes the way I design assignments and final projects. In the graduate course I am teaching this semester, on the statistical foundations of LLMs, the students’ final project is a nine-page essay on “Statistics + Large Language Models.” They are required to use ChatGPT or similar tools, but also to submit their entire prompt history and to reflect critically on what worked, what failed, and how statistical thinking guided their judgments. The hope is that they will learn not to take LLM outputs at face value, but to see them as raw material requiring scrutiny.

The classroom exercises have been enlightening. In one assignment, students used ChatGPT to explore a concept from class that they found confusing or exciting. In another, groups of students compared their own solutions to a problem with those from ChatGPT and with the published answers. One student identified three distinct errors in the AI’s output: neglected uncertainty, misused terminology, and oversimplified assumptions. That exercise provoked a far richer discussion than simply presenting the correct solution ever could have.

Of course, there are risks: fluent but wrong text, the temptation to over-trust, and inequities between students with different levels of access or prompting skill. I try to mitigate these by providing prompt templates, by grading more for reasoning than for polish, and by insisting on transparency. Since this is a series of articles, I look forward to reporting further on what I have learned from these classroom experiments in the December issue of the IMS Bulletin.

So far, the gains outweigh the costs: blank-page paralysis recedes, debugging gets faster, diagrams arrive sooner, and classroom discussions deepen. But if in-person education does remain the last indispensable part of the professor’s role, it will be because mentorship, critique, and the noisy process of learning together are not things that can be automated away.

In the next article, I will turn from these practical reflections to the foundational questions: why statisticians have a unique role to play in LLM research, with a focus on evaluation and alignment; and how our discipline’s perspectives on uncertainty, inference, and interpretability can shape the future of these powerful technologies.

References

[1] Tu, Xinming, James Zou, Weijie Su, and Linjun Zhang. “What Should Data Science Education Do with Large Language Models?” Harvard Data Science Review 6, no. 1 (2024).