Daniela Witten is frustrated with (some of) her students “outsourcing their brains to LLMs”:
For decades, the university classroom was as if frozen in time: blackboards became whiteboards and transparencies transformed to Beamer*, but that was about it. But early 2020 brought with it a series of transformations. First came the lightning-fast pivot to Zoom. Then followed the realization that students might not have home environments conducive to timed tests; this caused a shift towards other forms of assessment. And finally, when students returned to campus in person, there was an understanding that the turmoil that they had experienced during their formative years might impact their ability to re-acclimate to the new (or rather, the old) normal, and that it is the responsibility of faculty to ease this transition.
For the most part, I handled these changes with grace and good cheer: I care about my students and take pride in doing my job well. It was clear that many of these new teaching practices benefited my students, and so the sacrifice of my time, energy, and headspace felt worthwhile.
However, I have only so many hoots to give. So in early 2023, when faculty at my home university and beyond began sounding the alarm about the perils of large language models (LLMs) for student learning, I could not muster any additional hoots. Yes, I understood that many students would (try to) use LLMs to complete their homework assignments, and that in doing so they would cheat themselves of the learning promised by a university education. However, I just was not that worried about it, for the following reasons:
- Cheating on assignments has been a problem since the beginning of time (or at least, since the beginning of assignments). There’s nothing new here!
- Even before LLMs, students already had access to extracurricular learning resources, like StackExchange and the unsung heroes of our time (which is what I call the people who write statistics Wikipedia articles)**. Again, there’s nothing new here.
- I mostly teach MS or PhD students, or else advanced Stat undergrads. For the most part, these students want to learn and do not want to cheat. And as for the others: I can bring a horse to water but I can’t make it drink. For the literal-minded, in this metaphor the horse is my student and the water is a deep understanding of the foundations of statistical machine learning.
- And finally, my primary emotional/intellectual output is my research, and I am better at statistics research than LLMs are. If you prompt ChatGPT with the title of my current work-in-progress, it will come up with something, but not a good statistics paper. If LLMs do not pose a risk to my research, then why would they pose a risk to my teaching?
So, I blissfully rode the wave of don’t-give-a-hoot-ery for over two years: I included standard wording in my course syllabi prohibiting the use of LLMs or other digital assistance for the completion of homework assignments, and then put the issue out of my mind.
However, my feelings changed during the most recent offering of my statistical machine learning course for data scientists. I have taught this material more than a dozen times in the past 10 years, and I have it down to a science, from the topic of each lecture to the style of problem sets. (I do write new problems each year, since solution keys for my assignments emerge online after each course offering, like digital mushrooms.) This time, however, was different. There were two sets of students in my class. The first set—the “thinkers”, if you will—completed the assignments in earnest, often with painstaking, hand-written calculations or the uncertain LaTeX formatting of the newly initiated. The second set—let us not call them “non-thinkers”!—submitted approximately 60 pages of LLM-generated drivel in response to a straightforward assignment, with superfluous bullet points, the mystifying use of italics, and a tendency to use 50 words where three would do. It was not hard to tell the two sets of students apart. But it was hard to grade the assignments of the second set, since scouring pages of slop for the correct answer was beyond the scope of my teaching assistants’ job descriptions.
One particularly tragicomic anecdote involves the responses that students submitted to a question that asked them to reproduce a figure from the course textbook (Figure 10.20 of Introduction to Statistical Learning, 2nd Edition, by James et al.). This figure displays an example of the so-called ‘double descent’ phenomenon in machine learning, which can occur in the setting of highly over-parametrized models in the absence of sufficient regularization. The textbook does not provide details about how the figure was created, and so the assignment requires students to deeply understand the circumstances that cause the phenomenon. The thinkers, as advertised, thought: some of them were able to reproduce double descent, and some weren’t, but it was clear that they all learned from the process. By contrast, the students who outsourced their thinking to our machine learning overlords instead submitted variants of the figure that completely lacked the double-descent phenomenon. The LLMs had quite literally misunderstood the assignment: they perfectly replicated the font size, line type, and aspect ratio of the figure in question, but without the double descent behavior that was the entire point of the assignment.
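(An aside for readers who want to see the phenomenon for themselves: below is a minimal sketch, and emphatically not the construction behind the textbook’s figure, which the book does not disclose. It fits minimum-norm least squares to random Gaussian features and sweeps the number of features p past the number of training points n; the sample sizes, noise level, and seed are arbitrary choices made purely for illustration.)

```python
# Illustrative only: a toy double-descent curve via minimum-norm least squares
# on random Gaussian features. This is NOT the construction behind Figure 10.20
# of ISLR; all dimensions and the noise level below are arbitrary assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p_max, sigma = 20, 1000, 80, 0.5

# One wide design matrix; the model of size p uses only the first p columns,
# so small models are misspecified and large ones interpolate the training data.
X_train = rng.standard_normal((n_train, p_max))
X_test = rng.standard_normal((n_test, p_max))
beta = rng.standard_normal(p_max)
beta /= np.linalg.norm(beta)                      # unit-norm true signal
y_train = X_train @ beta + sigma * rng.standard_normal(n_train)
y_test = X_test @ beta + sigma * rng.standard_normal(n_test)

for p in range(1, p_max + 1):
    X_tr, X_te = X_train[:, :p], X_test[:, :p]
    # pinv gives ordinary least squares when p <= n_train and the
    # minimum-norm interpolating solution when p > n_train.
    beta_hat = np.linalg.pinv(X_tr) @ y_train
    test_mse = np.mean((y_test - X_te @ beta_hat) ** 2)
    print(f"p = {p:3d}   test MSE = {test_mse:8.2f}")
```

Plotting test MSE against p should show the error rising sharply near the interpolation threshold p = n_train and then descending again as p grows beyond it—the shape that distinguishes a double-descent curve from the usual U-shaped bias–variance tradeoff.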
To add insult to injury, for this particular assignment, a substantial number of students submitted literally identical PDFs: the only difference between their submissions was the name on the top of the assignment. When I raised this point with the students in question, they seemed genuinely baffled by what was, in their eyes, a total non-issue. After all, they had all prompted the LLM with the same query (the PDF of my homework assignment), and so of course their submissions were the same! I can acknowledge a certain logical consistency to their perspective: if you believe that LLMs can answer statistics problem sets correctly, and that it is okay to use LLMs to obtain those answers, then why is it better for multiple students to query ChatGPT separately rather than for a single student to query ChatGPT and share the output with their friends? After all, the latter approach would have a smaller carbon footprint!
I now realize that while I had been directionally correct that the cheating enabled by LLMs is not new, I had failed to appreciate the magnitude of the problem. A student who pores through StackExchange to find the answer to their question has put in an honest day’s work and has almost certainly (though perhaps unintentionally) learned something along the way. The student who uploads the PDF of my homework assignment to ChatGPT neatly avoids both the exertion and the learning. While there is a certain poetic justice to this (the old adage “nothing ventured, nothing gained” comes to mind), it is unfair to expect my teaching assistants to make sense of 60 pages of drivel, and it is also demoralizing for the thinkers in the class.
So the next time I teach, I will give only in-person assessments. This change certainly will not solve the long-term problem of people outsourcing their brains to LLMs, but it will motivate my students to learn and will reward them for learning. May my classroom remain frozen in time for at least a little bit longer.
—
* Devoted readers of this column, and anyone who has attended one of my talks, will know that this is tongue-in-cheek. I live and die by Keynote.
** For clarity: I do not mind if my students use StackExchange or Wikipedia for a homework assignment. They are great resources.