Yuetian Luo is currently a postdoctoral scholar in the Data Science Institute at the University of Chicago. He received his PhD in Statistics from the University of Wisconsin–Madison in 2022, advised by Anru Zhang. He is broadly interested in methodology, computation, and theory for complex and large-scale statistical inference problems. In the past, he has worked on developing efficient algorithms for high-dimensional matrix/tensor learning problems. Many of these problems are nonconvex, and one of his focuses is understanding the statistical guarantees of the resulting algorithms. Recently, he has also become interested in distribution-free inference.

Yuetian will give this talk in the Lawrence Brown PhD Student Award session at JSM Toronto.

Tensor-on-tensor Regression: Riemannian Optimization, Over-parameterization, Computational Barriers, and Their Interplay

The analysis of tensor, or multiway array, data has emerged as a very active topic of research in statistics, applied mathematics, machine learning, and signal processing, driven by important applications such as neuroimaging analysis, latent variable models, and collaborative filtering. In this talk, we consider a general class of problems termed tensor-on-tensor regression, which aims to characterize the relationship between covariates and responses that may take the form of scalars, vectors, matrices, or higher-order tensors. This generic tensor-on-tensor regression covers many special tensor regression models in the literature, such as scalar-on-tensor regression, tensor-on-vector regression, and scalar-on-matrix regression.
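To fix ideas, here is a schematic formulation in our own notation (the speaker's setup may differ in details such as intercepts or noise assumptions): each covariate X_i is an ℓ-way tensor, each response Y_i is an m-way tensor, and the two are linked through a parameter tensor B contracted along the covariate modes.

```latex
% Schematic tensor-on-tensor regression model (illustrative notation):
%   covariates X_i \in \mathbb{R}^{p_1 \times \cdots \times p_\ell},
%   responses  Y_i \in \mathbb{R}^{q_1 \times \cdots \times q_m},
%   parameter  B   \in \mathbb{R}^{p_1 \times \cdots \times p_\ell \times q_1 \times \cdots \times q_m}.
Y_i = \langle X_i, B \rangle + E_i,
\qquad
\langle X_i, B \rangle_{j_1 \cdots j_m}
  = \sum_{k_1, \ldots, k_\ell} (X_i)_{k_1 \cdots k_\ell}\,
    B_{\,k_1 \cdots k_\ell\, j_1 \cdots j_m},
\qquad i = 1, \ldots, n.
```

Taking m = 0 (scalar responses) recovers scalar-on-tensor regression, ℓ = 1 (vector covariates) gives tensor-on-vector regression, and scalar responses with matrix covariates give scalar-on-matrix regression.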

Tensor-on-tensor regression has attracted a surge of interest because of its applications: in neuroimaging analysis, to compare MRI scans across autism spectrum disorder groups; in facial image analysis, to predict describable attributes from a facial image; and in longitudinal relational data analysis, to estimate longitudinal relational interaction effects.

In this talk, we assume the tensor responses in tensor-on-tensor regression are linked to the tensor covariates through a parameter tensor/matrix of low Tucker rank (sketched after the list of questions below), and that its intrinsic rank is not known in advance. Despite significant efforts in the literature, several key questions about tensor-on-tensor regression remain open:

1. Can we develop fast and statistically optimal solutions for the general low-rank tensor-on-tensor regression?

2. Can we solve tensor-on-tensor regression robustly without knowing the intrinsic rank of the parameter of interest?

3. Is there a statistical-computational gap in tensor-on-tensor regression? How do the tensor and matrix settings differ?

4. Is there any interplay among Riemannian optimization, over-parameterization, and statistical-computational gap?
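For concreteness, the low Tucker rank assumption referenced above can be sketched as follows (notation ours): the d-way parameter tensor factors as a small core multiplied along each mode by a tall factor matrix.

```latex
% Low Tucker rank structure (illustrative notation): core S and factor
% matrices U_1, ..., U_d; over-parameterization means running the
% algorithms with working ranks larger than the true (r_1, ..., r_d).
B = S \times_1 U_1 \times_2 U_2 \cdots \times_d U_d,
\qquad
S \in \mathbb{R}^{r_1 \times \cdots \times r_d},
\quad
U_k \in \mathbb{R}^{p_k \times r_k},
\quad
r_k \ll p_k .
```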

We aim to answer the four questions above. First, we propose Riemannian gradient descent (RGD) and Riemannian Gauss–Newton (RGN) methods and cope with the challenge of unknown rank by studying the effect of rank over-parameterization. We provide the first convergence guarantee for general tensor-on-tensor regression by showing that RGD and RGN converge linearly and quadratically, respectively, to a statistically optimal estimate in both the correctly parameterized and the over-parameterized settings. Our theory reveals an intriguing phenomenon: Riemannian optimization methods naturally adapt to over-parameterization without any modification to their implementation. This is in sharp contrast to the classic factorized gradient descent algorithm, which requires preconditioning in the over-parameterized setting.

We also establish a statistical-computational gap in scalar-on-tensor regression via a low-degree polynomial argument. Our theory demonstrates a “blessing of statistical-computational gap” phenomenon: in a wide range of tensor-on-tensor regression scenarios with tensors of order three or higher, the sample size required by computationally feasible estimators matches the sample size required under moderate rank over-parameterization, whereas no such benefit exists in the matrix settings. In other words, moderate rank over-parameterization is essentially “cost-free” in terms of sample size in tensor-on-tensor regression of order three or higher.

Finally, we provide efficient implementations of both RGD and RGN and conduct simulation studies that show the advantages of the proposed methods and corroborate our theoretical findings.
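As an illustration of the flavor of these methods, below is a minimal NumPy sketch for the scalar-on-tensor special case with order-3 covariates. It alternates a gradient step on the least-squares loss with a truncated-HOSVD projection back to low Tucker rank; the RGD studied in the talk additionally uses tangent-space projections on the fixed-rank manifold, so this is a simplified stand-in rather than the speaker's exact algorithm, and the function names and parameter values here are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` unfolding of a 3-way tensor into a matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def hosvd_truncate(T, ranks):
    """Truncate a 3-way tensor toward Tucker rank `ranks` via one pass of the
    higher-order SVD; used here as a simple retraction-style projection."""
    Us = []
    for k in range(3):
        U, _, _ = np.linalg.svd(unfold(T, k), full_matrices=False)
        Us.append(U[:, :ranks[k]])
    # Core: contract each mode with its factor; contracting axis 0 three
    # times in a row cycles the mode order back to (1, 2, 3).
    core = T
    for U in Us:
        core = np.tensordot(core, U, axes=([0], [0]))
    # Reconstruct the low-Tucker-rank approximation from core and factors.
    out = core
    for U in Us:
        out = np.tensordot(out, U, axes=([0], [1]))
    return out

def scalar_on_tensor_sketch(X, y, ranks, step=0.3, iters=100):
    """Gradient step on the least-squares loss followed by a low-rank
    retraction (NOT the full tangent-space RGD from the talk).
    X: (n, p1, p2, p3) covariate tensors; y: (n,) scalar responses."""
    n = y.shape[0]
    # Spectral-style initialization from the covariate-response correlation.
    B = hosvd_truncate(np.tensordot(y, X, axes=([0], [0])) / n, ranks)
    for _ in range(iters):
        resid = np.tensordot(X, B, axes=3) - y            # <X_i, B> - y_i
        grad = np.tensordot(resid, X, axes=([0], [0])) / n
        B = hosvd_truncate(B - step * grad, ranks)        # retraction
    return B

# Toy example with a planted Tucker-rank-(2, 2, 2) parameter tensor.
rng = np.random.default_rng(0)
p, r, n = 8, 2, 2000
B_true = hosvd_truncate(rng.normal(size=(p, p, p)), (r, r, r))
X = rng.normal(size=(n, p, p, p))
y = np.tensordot(X, B_true, axes=3) + 0.1 * rng.normal(size=n)
B_hat = scalar_on_tensor_sketch(X, y, ranks=(r, r, r))
print("relative error:", np.linalg.norm(B_hat - B_true) / np.linalg.norm(B_true))
```

Rerunning the sketch with working ranks larger than (2, 2, 2) mimics the over-parameterized regime discussed above.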

This is joint work with Anru Zhang, carried out while I was a PhD candidate in Statistics at the University of Wisconsin–Madison.