Nicholas Horton (Amherst College) was a co-author of the recent National Academies consensus study report that is available for free download from https://nas.edu/envisioningds. He writes:
Recent years have seen the dramatic rise of data science, revolutionizing industry and science. The NSF-funded National Academies consensus report entitled Data Science for Undergraduates: Opportunities and Options (NASEM, 2018) noted that “as more data and ways of analyzing them become available, more aspects of the economy, society, and daily life will become dependent on data.”
Much has been written on the growth of data science and the role statistics plays within it (see for example Donoho, 2017). Historically, working in data science has required a graduate degree. However, many reports indicate a shortage of well-trained data scientists to fill new positions, with many opportunities now available to those with appropriate undergraduate training. Given the demands of the workforce, the committee, chaired by Laura Haas (University of Massachusetts Amherst) and Al Hero (University of Michigan), was charged with setting forth a vision for undergraduate data science with a focus on applications of and careers in data science.
The second chapter of the report laid out key concepts that data science professionals need to know. Building on the work of De Veaux et al. (2017), the report proposes “data acumen” as a framework for the education of future data scientists. This requires “exposure to key concepts in data science, real-world data and programs that can reinforce the limitations of tools, and ethical considerations that permeate many applications”. The committee outlined ten (overlapping) areas fundamental to developing data acumen: Mathematical foundations; Computational foundations; Statistical foundations; Data management and curation; Data description and visualization; Data modeling and assessment; Workflow and reproducibility; Communication and teamwork; Domain-specific considerations; and Ethical problem solving.
Mathematics is essential to data science, but questions remain about what type and how much mathematics is needed for bachelor’s-level graduates. The committee identified key concepts that would be important for all students, including set theory and basic logic; multivariate thinking (via functions and graphical displays); basic probability theory and randomness; matrices and basic linear algebra; networks and graph theory; and optimization.
Statistics was also seen as foundational to data science. Key concepts identified by the committee include variability, uncertainty, sampling error, and inference; multivariate thinking; non-sampling error, design, experiments, biases, confounding, and causal inference; exploratory data analysis; statistical modeling and model assessment; and simulations and experiments.
The third chapter of the report focused on how to develop courses (e.g., data science for all, introduction to data science) and programs (e.g., certificates, minors, and majors) that would provide flexible pathways for students. The fourth chapter reviewed challenges and barriers that need to be addressed in developing data science programs. The fifth chapter reiterated the key role that formative and summative assessment and faculty development play in advancing data science.
What are the implications of the report and the growth of undergraduate data science for statisticians and the IMS? De Veaux et al. (2017) noted that: “Students should understand the basic statistical concepts of data collection, data wrangling, data analysis, modeling, and inference. … Successful graduates should be able to apply statistical knowledge and computational skills to formulate problems, plan data collection campaigns or identify and gather relevant existing data, and then analyze the data to provide insights.”
More work is needed to create courses and flexible pathways that can provide sufficient mathematical and statistical background without a long succession of prerequisite courses, while also ensuring that students have strength in algorithmic thinking, data technologies, and domain knowledge.
The report notes that data science is in a formative stage of development, with robust growth likely. It recommends that academic institutions “embrace data science as a vital new field” and “provide and evolve a range of educational pathways to prepare students for an array of data science roles in the workplace” (NASEM, 2018).
More discussion is also needed about future preparation at the graduate level, to ensure that interested data science graduates at the bachelor’s level are able to matriculate into and successfully complete doctoral programs in statistics.
At a time when many (most?) institutions are pioneering data science programs, it is important for mathematical statisticians to ensure that they are part of the process of attracting students with varied backgrounds and degrees of preparation and preparing them for success in a variety of careers.
De Veaux, R., et al. (2017). Curriculum guidelines for undergraduate programs in data science, Annual Review of Statistics and Its Application, 4:15–30. https://www.annualreviews.org/doi/abs/10.1146/annurev-statistics-060116-053930.
Donoho, D. (2017). 50 Years of Data Science, Journal of Computational and Graphical Statistics, 26:4, 745–766. doi:10.1080/10618600.2017.1384734.
National Academies of Sciences, Engineering, and Medicine (2018). Data Science for Undergraduates: Opportunities and Options. Washington, DC: The National Academies Press. doi:10.17226/25104.