We in IMS must build the foundations of this emerging field

Data science is at a crossroads. Will it become a fundamentally applied discipline, a collection of heuristics without any coherent mathematical underpinning? Or will a rigorous foundation lead to practical new tools and algorithms with provable properties? IMS members Sofia Olhede and Patrick Wolfe, co-coordinators of the IMS Data Science Group, argue that the IMS should become the preeminent professional society dedicated to building the foundations of this emerging field. Join the group or volunteer to help lead it by writing to them before the end of 2017 at datascience@imstat.org, and follow the group on Twitter at @imsdatascience.

Sofia Olhede and Patrick J. Wolfe, joint coordinators of the new IMS Data Science Group

The IMS Bulletin has reported on data science before: Bin Yu, IMS president in 2014, even entitled her presidential address “Let Us Own Data Science,” calling on probabilists and mathematical statisticians to do precisely that. Yet data science can at times seem less clearly focused on the broad mathematical and computational sciences, and more obviously connected to the application of these fields in practice. But to be rigorous and replicable, data science requires tools whose theoretical properties are well understood. Consequently, all of us in mathematical statistics and probability have an opportunity to contribute to foundational aspects of data science.

The IMS set up a data science group in 2015 and also launched its first IMS data science conference the same year, as announced by 2016 IMS president Richard Davis in his incoming agenda message. Jon Wellner highlighted the importance of data science for education in his 2017 presidential address. Foundational developments impact how and what we teach, in turn adding to what the next generation of probabilists and mathematical statisticians will learn, as recognized by the U.S. National Academies’ recent work to envision undergraduate data science education.

Whither, then, data science and the role of the IMS? Let us explore what needs to be done and how the IMS can contribute.

First, as the global conversation on data science education continues, it is more important than ever that the theoretical side of data science has a voice in the debate. The IMS is a natural group to provide such input, with a unique perspective, especially at the level of graduate education. At the same time, we should also increase the presence of the IMS in undergraduate and even secondary education, as the pipelines for future leading theoretical scholars should be built at the earliest stage possible.

Second, the prevalence of data science needs will alter and enhance the types of inference and prediction problems we study. This is a scientific question, and to contribute as a group we must self-organize. This means recognizing venues for publication, and organizing workshops and special sessions at larger meetings. We, as a group, must determine the best way to organize such initiatives.

Third, we can make a greater effort to connect the data science community globally, irrespective of geographic or national boundaries. Every day, it seems, new centers and groupings are being organized: from Big Insight in Norway, to the Data Science Institute Vancouver in Canada, to ACEMS Better in Australia, to Fudan School of Data Science in China, to the Insight Centre for Data Analytics in Ireland, along with a host of activities at leading universities around the world. These centers each bring a unique mixture of the mathematical and computational sciences, creating new scientific communities. By serving as a hub of communication, the IMS can help to make sure all these excellent initiatives are aware of each other and communicate with one another. We bring distinct expertise, augmenting and complementing sister groups such as the American Statistical Association’s Statistical Learning and Data Science Section and the Royal Statistical Society’s Data Science Section.

Fourth, data science presents policy questions relating to ethics and data governance. To ground these questions in a solid theoretical framework, one in which problems can be compared and understood formally and precisely, we must contribute to this discussion and ensure that the foundations of data science take these questions into account. Because data science has policy implications, we have a unique responsibility and an opportunity to ensure that this debate is sound. Technology is developing very rapidly, and our foundational input is needed to keep pace, as described in the UK Royal Society’s recent work on data governance and the Institute of Electrical and Electronics Engineers’ global initiative for ethically aligned design.

Last, and perhaps most crucially for our community, data science provides us with an unprecedented opportunity to connect to, and be inspired by, real problems of societal importance. Making this connection can often be challenging, especially for new researchers and those who have so far focused only on purely theoretical challenges. We see clear and compelling opportunities to build new interfaces between impactful problems in industry and government on the one hand, and the foundational underpinnings of data science on the other, and to partner with other IMS Groups such as the one for New Researchers.

If these or other data science opportunities sound interesting to you, please get in touch at datascience@imstat.org and tell us what you think! We’re looking for group members, and we also hope to identify approximately ten volunteers before the end of 2017 to serve on the executive, helping to recruit additional members and to organize meetings and sessions. We also expect in due course to form working groups to contribute to IMS committees wherever input on data science is needed, for example on the five key needs discussed above.

We are aiming explicitly for the group to span a broad range of geographies, disciplines, and career stages, and would particularly like to encourage members of traditionally underrepresented groups within the mathematical sciences to volunteer to help lead.

Join us—we can’t wait to hear from you!