Nancy Ruonan Zhang is the Ge Li and Ning Zhao Professor of Statistics in The Wharton School at the University of Pennsylvania. Her research focuses primarily on the development of statistical and computational approaches for the analysis of genetic, genomic, and transcriptomic data. In the field of Genomics, she has developed methods to improve the accuracy of copy number variant and structural variant detection, methods for improved FDR control in genomic studies, and methods for analysis of single-cell RNA sequencing data. In the field of Statistics, she has developed new models and methods for change-point analysis, variable selection, and model selection. Dr. Zhang has also made contributions in the area of tumor genomics, where she has developed analysis methods to improve our understanding of intra-tumor clonal heterogeneity.

Nancy obtained her PhD in Statistics in 2005 from Stanford University. After one year of postdoctoral training at the University of California, Berkeley, she returned to the Department of Statistics at Stanford University as an Assistant Professor in 2006. She received the Sloan Fellowship in 2011, and formally moved to the University of Pennsylvania in 2012. At Penn, she is a member of the Graduate Group in Genomics and Computational Biology, and currently serves as the Vice Dean of the Wharton Doctoral Program.

Nancy’s Medallion lecture will take place at the online JSM, August 8–12, 2021.

DNA copy number profiling from bulk tissue to single cells

The completion of the human genome two decades ago gave birth to the expansive and cross-disciplinary field of Genomics, and along with it, our own community of Statistical Genomics. From microarrays to high throughput sequencing, from genome-wide association studies to the recent advances in single cell profiling, wave after wave of technological innovation has fed Statistics with new data challenges that spurred methodological and theoretical developments. In this lecture, I will focus on two specific areas of genomics, single cell sequencing and DNA copy number profiling, and describe the critical role of Statistics in their scientific development. I will start with DNA copy number profiling in bulk tissues, review the scientific background and early models, and describe how these models have adapted to the shifting sands of technological change. I will briefly survey the statistical developments that were seeded by these scientific inquiries, from change-point detection to multi-channel scan statistics to latent variable modeling. On the scientific side, I will focus on DNA copy number profiling in cancer and its role in the study of cancer cell evolution.

Despite our best computational efforts, bulk tissue sequencing can only tell us so much about how DNA copy number varies between single cancer cells within a tumor. Since cancer is, quite simply, a Darwinian evolution of cells driven by somatic mutations, it is important to detect and study these cell-to-cell DNA copy number variations. For example, the copy number heterogeneity of a given tumor has been found to be a useful prognostic marker. In the second half of my talk, I will turn to the modeling of data from single cell technologies, which have revolutionized the field of biology during the last decade. I will describe how the large, sparse data matrices from single cell experiments have inspired new models and statistical problems. I will also describe, in some detail, a specific method that we developed for allele-specific copy number estimation at the single cell level. The method, Alleloscope¹, enabled the discovery of previously hidden types of variation within tumor cell populations.

Apart from the framing of problems and the proposal of their (partial) solutions, I hope to convey through this talk some lessons that I have learned about the role of Statistics in today’s scientific process. Science, as always, is driven by technology, and today’s fast-paced turnover in technology gives us large data sets where exploratory hypothesis generation is a primary challenge, and where low-hanging fruit often renders statistical inference an afterthought. Through my own winding journey on the problem of copy number estimation in bulk and single cell sequencing data, I will reflect on common pitfalls and emerging opportunities.

1 Wu C-Y, Lau BT, Kim H, Sathe A, Grimes SM, Ji HP, Zhang NR (2021) Integrative single-cell analysis of allele-specific copy number alterations and chromatin accessibility in cancer. Nature Biotechnology, forthcoming.