Xiao-Li Meng writes:
Writing the last XL-Files on “Peter Hall of Fame” reminded me of a piece that I have wanted to write since attending Chin Long Chiang’s memorial workshop on November 15, 2014. Professor Chiang was a pioneer of biostatistics long before I survived a course on survival analysis. I was therefore honored to be invited to provide a statistician’s perspective on a debate between Chiang and another pioneer of biostatistics, Marvin Zelen. The debate apparently started with Zelen (1983, Biometrics), in a piece titled “Biostatistical Science as a Discipline: A Look into the Future,” whose abstract begins: “The field of biostatistics is enjoying unparalleled developments. Never before have members of our profession been in such demand. Current applications are significantly influencing the direction of research in statistical methodology. It is not clear whether there is a discipline which can be termed ‘biostatistics,’ but we are part of the emergence of a discipline which is termed ‘biostatistical science’. It refers to the applications of statistics, probability, computing and mathematics to the life sciences, with the goal of advancing our knowledge of a subject-matter field in this area. This paper discusses the role of computing, some aspects of training, and future directions of biostatistical science.”
What strikes me most is the relevance of Zelen’s thoughts on biostatistics vs. biostatistical science to today’s discussion of statistics vs. data science. His description of biostatistical science could easily serve as one for data science, save for its restriction to the life sciences. His question regarding the disciplinary identity of biostatistics within biostatistical science parallels the current question of whether statistics will survive as a viable discipline, given the emergence of the more encompassing discipline of data science.
Zelen suggested that the term biostatistics or biometrics “refers to a collection of statistical techniques which are primarily used in applications to the biological and biomedical sciences. … However, a discipline is not a collection of techniques.” But what is a discipline?
In his discussion, Bernard Greenberg listed three criteria for being a discipline: there must be a body of knowledge; it must be transmissible via educational methods; and it must undergo constant changes as a result of research performed by persons identified as its members. For Greenberg, if biostatistics was not a discipline, additional criteria would have to be articulated. Although Zelen did not directly respond to Greenberg’s challenge, he was clear that the key difference between biostatistics and biostatistical science was that the latter places far more emphasis and training on computing and substantive scientific knowledge. Biostatistics, then, was implicitly not a viable discipline because its “body of knowledge” was not sufficiently broad.
In his commentary “What is Biostatistics?” (1985, Biometrics), Chiang defined, and defended, biostatistics as “a discipline that is concerned with the development and application of statistical theory and methods for the study of phenomena arising in the life sciences.” Chiang reasoned that biostatistics was well qualified to be a discipline after 1950 because of “the amount and quality of knowledge that has been developed and accumulated in the field,” and because, “Since then graduates with strong backgrounds in mathematical statistics and mathematics have entered the field and treated biostatistical topics with a different attitude.” For Chiang, biostatistics possessed depth; for Zelen, biostatistics lacked breadth.
Perhaps the sharpest difference between Chiang and Zelen lies in their predictions of the future. Chiang predicted that “theoretical development, not statistical software, will be the centerpiece of biostatistics” and that “the future of biostatistics lies in the direction of stochastic processes.” Chiang believed that Zelen had overemphasized the role of computing and statistical software, remarking that, “His misplacement of emphasis made him feel insecure when he realized ‘the computer will become an intelligent data analyst’ in less than 10 years. The ‘computer data analyst’ may come sooner than he thinks. But biostatistics will continue to flourish and biostatisticians will not be out of a job.”
Zelen, however, considered Chiang’s emphasis on theoretical model building to be “totally naive unless one takes a serious interest in the subject matter and the appropriate data.” Zelen went on to conclude that, “Time will tell whether computing or stochastic processes will dominate biostatistics or biostatistical science. However, one need not go too far to verify that nearly all Departments of Biostatistics are currently adding computing courses in their curricula. We have a revolution in our midst. Why should one deny it!”
No one today is denying the revolution in our midst, and nearly all Departments of Statistics are currently adding computing courses to their curricula. Zelen’s prediction is spot on beyond biostatistics, thanks to the two Vs of Big Data: volume and velocity. We need more computing, and we need to compute fast. But Chiang’s prediction captures the third V of Big Data, variety, which demands more sophisticated stochastic temporal-spatial models, network models, etc., as well as newer and deeper theory. Chiang was also correct that as long as we deepen our foundations while expanding our horizons, (bio)statistics will continue to flourish and (bio)statisticians will not be out of a job.
Marvin Zelen passed away on the day of Chiang’s memorial workshop. A sad coincidence, or the reunion of two visionary scholars, whose collective predictions capture the very essence of what we experience today and, likely, for generations to come?