Chanwoo Lee received his PhD from the University of Wisconsin–Madison in 2023, advised by Dr. Miaoyan Wang. Before joining UW–Madison, he received a BS in Mathematical Science and Statistics in 2018 from Seoul National University, Korea. He is broadly interested in statistics, machine learning, and optimization, and has worked on developing statistical tools for analyzing matrix- and tensor-valued data.

This will be one of three Lawrence D. Brown PhD Student Award winners’ talks in a special session at the 11th World Congress in Probability and Statistics in Bochum, Germany, August 12–16, 2024.

Statistical and computational rates in high rank tensor estimation

The analysis of higher-order tensors has recently drawn much attention in statistics, machine learning, and data science. Higher-order tensor datasets arise in applications including recommendation systems, social networks, neuroimaging, genomics, and longitudinal data analysis. One example is multi-tissue expression data: genome-wide expression profiles are collected from different tissues in a number of individuals, resulting in a three-way tensor of gene by individual by tissue. Another example is hypergraph networks, in which edges are allowed to connect more than two vertices. Modeling multi-way interactions through hypergraphs helps us understand complex networks in molecular systems and computer vision, and tensors naturally represent such hypergraph structures. Across these applications, tensor methods have proven effective where classical vector- or matrix-based methods fall short.
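To make the data structures concrete, here is a minimal sketch in Python with NumPy (the dimensions, variable names, and example hyperedges are hypothetical placeholders, not from the talk) of how both kinds of data map onto three-way tensors:

```python
import itertools
import numpy as np

# Multi-tissue expression data as a gene x individual x tissue tensor.
# Dimensions and values below are placeholders.
n_genes, n_individuals, n_tissues = 1000, 50, 20
rng = np.random.default_rng(0)
expression = rng.random((n_genes, n_individuals, n_tissues))

# A 3-uniform hypergraph on n vertices as a symmetric binary order-3 tensor:
# adjacency[u, v, w] = 1 iff the hyperedge {u, v, w} is present.
n = 30
adjacency = np.zeros((n, n, n))
for edge in [(0, 1, 2), (3, 4, 5)]:               # two illustrative hyperedges
    for u, v, w in itertools.permutations(edge):  # symmetrize over vertex order
        adjacency[u, v, w] = 1
```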

One of the most popular structures imposed on the tensor of interest is low-rankness. Common low rank models include CP low rank models, Tucker low rank models, and block models. Despite the popularity of the low rank assumption, it is rather restrictive to assume that the rank of the tensor remains fixed while the tensor dimension grows to infinity. In particular, the low rank assumption is sensitive to entrywise transformations and inadequate for representing special structures of tensors. In addition, low rank tensors are nowhere dense, and random matrices/tensors are almost surely of full rank. This motivates us to develop a more flexible model that can handle possibly high rank tensors.
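A standard illustration of this sensitivity (our example, not taken from the talk): let $M = xy^\top$ with strictly increasing entries $x_1 < \cdots < x_d$ and $y_1 < \cdots < y_d$, so that $\operatorname{rank}(M) = 1$. Applying the entrywise map $f(t) = e^t$ gives
\[
[f(M)]_{ij} = e^{x_i y_j},
\]
which has full rank $d$ because the kernel $e^{xy}$ is strictly totally positive. A mild entrywise transformation can thus turn a rank-one object into a full-rank one.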

In this talk, we develop a latent variable tensor model that addresses both low and high rank tensors. Our model includes, but is not limited to, most existing tensor models, such as CP models, Tucker models, generalized linear models, single index models, and simple hypergraphon models. We develop comprehensive results on both the statistical and computational limits of signal tensor estimation under the latent variable tensor model.
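Schematically, in the order-three case, a latent variable tensor model posits (our simplified notation, not necessarily the talk's)
\[
\Theta_{ijk} = f(a_i, b_j, c_k), \qquad i \in [d_1],\ j \in [d_2],\ k \in [d_3],
\]
where $a_i, b_j, c_k$ are (possibly vector-valued) latent variables, $f$ is an unknown function, and the data are observed as $Y = \Theta + \text{noise}$. Multilinear choices of $f$ recover CP- and Tucker-type models, while a general $f$ allows $\Theta$ to have high, even full, rank.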

First, we find that high-dimensional latent variable tensors are of log-rank, which provides a rigorous justification for the empirical success of low-rank methods despite the prevalence of high rank tensors in real data applications.
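Loosely paraphrased in symbols (ours; constants and precise smoothness conditions suppressed): for a $d \times d \times d$ latent variable tensor $\Theta$ with sufficiently smooth $f$, there exists a surrogate $\Theta'$ with
\[
\operatorname{rank}(\Theta') = \operatorname{polylog}(d)
\quad \text{and} \quad
\|\Theta - \Theta'\|_F = o(\|\Theta\|_F),
\]
so low-rank procedures remain accurate even when $\Theta$ itself is full rank.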

Second, we uncover a gap between statistical and computational optimality in higher-order tensor estimation. We establish the minimax optimal statistical rate for the problem. This rate, however, is not achievable by any polynomial-time algorithm under the hypergraphic planted clique (HPC) conjecture. We then show that a slower, computationally optimal rate is achievable by polynomial-time algorithms.
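In schematic form (generic notation ours; the exact rates are given in the talk), writing $\mathcal{R}_{\text{stat}}$ for the minimax risk over all estimators and $\mathcal{R}_{\text{poly}}$ for the minimax risk over polynomial-time estimators, the results establish
\[
\mathcal{R}_{\text{stat}} \;\ll\; \mathcal{R}_{\text{poly}},
\]
with the first rate attained by an estimator of possibly non-polynomial complexity and the second attained by a polynomial-time algorithm, assuming the HPC conjecture.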

Third, we propose two estimation methods with accuracy guarantees: least-squares estimation (LSE) and double-projection spectral estimation (DSE). The LSE achieves the information-theoretic lower bound, demonstrating its statistical optimality; computing it, however, may require non-polynomial time. We then propose the DSE, which combines a double-projection spectral method with the log-rank property of latent variable tensors. We show that the DSE achieves the computationally optimal bound within the subclass of polynomial-time estimators.
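For intuition about the spectral component, here is a schematic NumPy sketch of a double-projection step for an order-three tensor. It is our generic HOSVD-style illustration of the idea, not the authors' exact algorithm; the target rank is a hypothetical tuning parameter.

```python
import numpy as np

def unfold(T, mode):
    """Matricize an order-3 tensor along the given mode."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_multiply(T, P, mode):
    """Multiply tensor T by matrix P along the given mode."""
    return np.moveaxis(np.tensordot(P, np.moveaxis(T, mode, 0), axes=1), 0, mode)

def double_projection(Y, rank):
    """Two passes of spectral projection onto estimated mode-wise subspaces.

    Pass 1 estimates singular subspaces from the raw unfoldings of Y;
    pass 2 re-estimates them from the once-projected tensor and projects
    Y again. Schematic illustration only.
    """
    T = Y
    for _ in range(2):
        subspaces = []
        for mode in range(3):
            u, _, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
            subspaces.append(u[:, :rank])        # top-`rank` singular vectors
        T = Y
        for mode, U in enumerate(subspaces):
            T = mode_multiply(T, U @ U.T, mode)  # orthogonal projection
    return T

# Tiny usage example: planted rank-2 signal plus Gaussian noise.
rng = np.random.default_rng(0)
d, r = 30, 2
factors = [rng.standard_normal((d, r)) for _ in range(3)]
signal = np.einsum("ir,jr,kr->ijk", *factors)
Y = signal + 0.1 * rng.standard_normal((d, d, d))
estimate = double_projection(Y, rank=r)
print(np.linalg.norm(estimate - signal) / np.linalg.norm(signal))
```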

Numerical experiments and real data applications will be presented to demonstrate the practical merits of our methods.

This talk is based on joint work with Miaoyan Wang.


Apply for next year’s PhD Student Award

The IMS Lawrence D. Brown PhD Student Award is open for applications. The deadline is May 1, 2024. Eligible applicants compete to be one of three speakers at an invited session of the IMS Annual Meeting (the 2025 Joint Statistical Meetings in Nashville, USA, August 2–7, 2025). The award includes reimbursement of travel and meeting registration fees (up to $2,000 for each recipient).

The award was created in memory of Lawrence D. Brown (1940–2018), professor of statistics at The Wharton School, University of Pennsylvania, who was an enthusiastic and dedicated mentor to many graduate students. For application details: https://imstat.org/ims-awards/ims-lawrence-d-brown-ph-d-student-award/