Guest columnist Dominique Picard considers the statistical legacy of wavelets.

Once upon a time (around the nineties), wavelet analysis emerged as a major tool in various disciplines, including several branches of pure and applied mathematics and statistics. The primary intent at the time was to produce and exploit the properties of a basis that is simultaneously concentrated in space and in frequency.

This program has indeed been carried out extensively, and many results have since been achieved that were not even conceivable at the outset. Since then, many of the initial objectives have changed and produced new directions for research. Moreover, and quite surprisingly, many of the recent developments in statistics find some of their roots in wavelet theory.

Among them, obviously, the many results on functional estimation, inverse problems, minimax rates of estimation, nonparametric Bayesian estimation and adaptation of procedures have certainly been highly stimulated by the growing activity around wavelets. Even the notion of sparsity, which is nowadays extensively used in high-dimensional problems, has many connections to this program.

Indeed, one major concern in functional estimation at the beginning of the nineties was to investigate how best to express the standard functional regularity assumptions, such as Lipschitz or Sobolev conditions. At this point, the Besov spaces $B_{s,p}^q$ became essential tools, since they proved to be the ideal framework in which regularity conditions translate exactly into generalized $\ell_p$ balls in the domain of wavelet coefficients. In the process, it appeared that for $p<2$, $q=p$ and specific regularities (namely $s=\frac 1 p- \frac 12$), these Besov spaces are exactly $\ell_p$ spaces, the same $\ell_p$ spaces which are now widely used to express sparsity in many different frameworks.
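
To make the correspondence concrete, recall the wavelet characterization of the Besov norm (written here in dimension one for an $L_2$-normalized wavelet basis, a standard convention rather than a formula from the column): if $\beta_{jk}=\langle f,\psi_{jk}\rangle$ denote the wavelet coefficients of $f$, then
$$\|f\|_{B_{s,p}^q} \asymp \Bigg( \sum_{j\ge 0} \Bigg[ 2^{j\left(s+\frac12-\frac1p\right)} \Big( \sum_k |\beta_{jk}|^p \Big)^{1/p} \Bigg]^q \Bigg)^{1/q},$$
so that for $s=\frac1p-\frac12$ the level-dependent weight disappears and, taking $q=p$, the right-hand side collapses to $\big(\sum_{j,k}|\beta_{jk}|^p\big)^{1/p}$, a plain $\ell_p$ norm on the whole coefficient sequence.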

Concerning the wavelet approach, many new directions of research have meanwhile emerged. Let us illustrate this with two basic aspects which have recently been extensively investigated: adapting the construction of wavelets to different problems, and producing new ways of exploiting wavelet constructions.

Adapt the wavelet construction to the problem at hand

Very early on, an increasingly important domain of activity has been to find new wavelets and extend their domain of application. Wavelet packets, symmlets, curvelets, edgelets, bandelets, …, were thereby produced to better analyse images, contours, discontinuities, surfaces and so on. Nowadays, significant energy is still devoted to producing wavelet constructions adapted to various statistical problems.

A seminal example is the case of spherical data. This domain has obviously been motivated by a number of important applications. Let us only mention here the fascinating and multiple statistical challenges posed by astrophysical data: denoising a signal, inpainting the zones where it is covered by other radiations, testing stationarity (rotation invariance) or Gaussianity of the signal, investigating the fundamental properties of the cosmic microwave background (CMB), providing cosmological maps, investigating clusters of galaxies and point sources, exploring the true nature of ultra-high-energy cosmic rays, and more. See for instance the book by Starck et al.[29] on the use of various wavelet tools in this domain.

To our knowledge, the first construction of wavelets on the sphere was introduced by Dahlke et al.[12], using a tensor product basis in which one factor is an exponential spline. Schröder and Sweldens[28] used a “lifting scheme”. Other well-known methods are obtained via a domain decomposition approach (see Dahmen and Schneider[13], Canuto et al.[6], and Cohen and Masson[9] for related techniques). Other constructions (Antoine and Vandergheynst[1]) use a group-theoretical approach. Others, such as Narcowich et al.[23,24] and Petrushev and Xu[25], use a smooth Littlewood–Paley decomposition combined with a discretization based on a cubature formula.

Generalizing this aspect, an important effort has been devoted to the construction of wavelets on manifolds or more general domains, with two especially challenging examples: the construction of wavelets on spaces of matrices and on graphs, contributing to the emerging field of signal processing on graphs and extending high-dimensional data analysis to networks and other irregular domains.

Wavelet construction starting from an operator

Although wavelets seem to be the ideal framework for studying high-frequency phenomena, their use in this domain has been intensively developed only quite recently. There are important technical reasons explaining why the use of wavelets in a high-frequency environment is considerably more challenging than in a traditional framework. Loosely speaking, high-resolution analysis fundamentally requires the construction of wavelet bases which are specifically adapted to the problem at hand. For instance, in finance, cosmology or tomography, data are either observed on very particular domains that are not necessarily well suited to traditional wavelets (such as the sphere in astrophysics), or they are observed after being blurred by a linear operator (the Radon transform in the case of tomography, differential operators in finance). More frequently, a combination of these difficulties occurs in the same problem.

On a regular manifold or in a linear inverse problem there is often an operator which is genuinely involved in the problem (the Laplacian operator in the manifold case, Radon or convolution operators in standard inverse problems). These operators usually have a basis of eigenvectors which is an important tool for the problem, but very frequently these eigenvectors are poorly concentrated in space (as is the case, for instance, for the trigonometric basis on the interval, or the spherical harmonics on the sphere).

More precisely, in many mathematical situations we have $\mathcal{Y}$, a compact metric space, $\mu$ a finite Borel measure, and the following decomposition:
$$\mathbb{L}_2(\mathcal{Y},\mu) = \bigoplus_{k=0}^\infty \mathcal{H}_k,
$$
where each $\mathcal{H}_k$ is a finite-dimensional eigenspace associated with the spectral decomposition of a natural operator on $(\mathcal{Y},\mu)$, say $\Delta$.

Let $L_k$ denote the orthogonal projection onto
$\mathcal{H}_k$, with the following kernel (if the $e^k_i$’s form an orthonormal basis of $\mathcal{H}_k$):
$$L_k(x,y) = \sum_{i=1}^{l_k} e^k_i(x) e^k_i(y).$$
In some of these situations, the ‘mother’ and ‘father’ needlet ‘basis’ can be expressed in the form:
for $\xi \in \mathcal{X}_j$,
\begin{align*}
\phi_{j,\xi}(x) &= \sqrt{\lambda_{j,\xi}} \sum_{2^{j-1} < k <2^{j+1}} \sqrt{ a\Big(\frac k{2^j}\Big)}\, L_k(x,\xi), \\
\psi_{j,\xi}(x) &= \sqrt{\lambda_{j,\xi}} \sum_{2^{j-1} < k <2^{j+1}} \sqrt{ b\Big(\frac k{2^j}\Big)}\, L_k(x,\xi),
\end{align*}
where $a$ and $b$ are regular, compactly supported smoothing functions, $\mathcal{X}_j$ is a regular network in $\mathcal{Y}$, and the coefficients $\lambda_{j,\xi}$ come from a cubature formula. Such constructions have (under appropriate conditions) the beautiful property of producing localised frames. The localisation results can be found in Narcowich et al.[23,24] and Petrushev and Xu[25], or in Coulhon et al.[11] in a completely general framework. In this direction we also mention the work of Hammond et al.[18], who construct wavelet frames defined on the vertices of an arbitrary finite weighted graph.

Constructions of this type have been used successfully in the case of high-frequency astrophysical observations for testing the stationarity or the Gaussianity of the cosmic microwave background, as in Baldi et al.[2,3], Pietrobon et al.[26], Delabrouille et al.[14] and Faÿ et al.[16]. These applications fundamentally use the concentration in space and frequency of the wavelet bases, leading to wavelet coefficients that are well estimated when the signal is present and decorrelated if the signal is stationary. The same constructions have also been used in inverse problems, where the coefficients on the eigenvector functions can easily be estimated, leading to the wavelet coefficients by a simple linear transformation (see, for example, Cohen et al.[8] for general inverse models; Kerkyacharian et al.[21] concerning deconvolution on the sphere; and Kerkyacharian et al.[19,20] for the Radon transform).
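
To make the recipe concrete in its best-known special case, take $\mathcal{Y}=\mathbb{S}^2$ and $\Delta$ the Laplace–Beltrami operator: $\mathcal{H}_k$ is then the space of spherical harmonics of degree $k$, and the addition theorem gives $L_k(x,\xi)=\frac{2k+1}{4\pi}P_k(\langle x,\xi\rangle)$, with $P_k$ the Legendre polynomial. The short sketch below evaluates a mother needlet $\psi_{j,\xi}$ on this basis; the particular window function $b$ and the cubature weight used here are placeholder assumptions, not the specific choices of the cited constructions.

```python
import numpy as np
from scipy.special import eval_legendre

def b_window(t):
    """Placeholder smooth window supported on (1/2, 2) (an illustrative choice,
    not the specific window of the needlet literature)."""
    t = np.asarray(t, dtype=float)
    inside = (t > 0.5) & (t < 2.0)
    u = np.zeros_like(t)
    s = (np.log2(t[inside]) + 1.0) / 2.0        # maps (1/2, 2) onto (0, 1)
    u[inside] = np.exp(-1.0 / (s * (1.0 - s)))  # C-infinity bump
    return u

def mother_needlet(x, xi, j, lam):
    """psi_{j,xi}(x) = sqrt(lam) * sum_k sqrt(b(k/2^j)) * (2k+1)/(4 pi) * P_k(<x, xi>).

    x, xi : unit vectors in R^3 (shape (3,))
    j     : resolution level
    lam   : cubature weight lambda_{j,xi} attached to the cubature point xi
    """
    ks = np.arange(int(2 ** (j - 1)) + 1, int(2 ** (j + 1)))   # 2^{j-1} < k < 2^{j+1}
    cos_angle = float(np.clip(np.dot(x, xi), -1.0, 1.0))
    kernel = (2 * ks + 1) / (4 * np.pi) * eval_legendre(ks, cos_angle)  # L_k(x, xi) on S^2
    return np.sqrt(lam) * np.sum(np.sqrt(b_window(ks / 2 ** j)) * kernel)

# Localisation check: evaluate the needlet as x moves away from its centre xi.
# The weight 4*pi/192 is a rough placeholder for a cubature weight.
xi = np.array([0.0, 0.0, 1.0])
for theta in (0.0, 0.2, 0.5, 1.0):
    x = np.array([np.sin(theta), 0.0, np.cos(theta)])
    print(theta, mother_needlet(x, xi, j=4, lam=4 * np.pi / 192))
```

The printed values shrink as $x$ moves away from $\xi$, illustrating the space localisation that the statistical applications above rely on.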

New wavelet methods, learning and Bayesian methods

Massive learning:
In 1997, Donoho[15] highlighted the connection between CART methods (Breiman et al.[5]) and the search for a best orthogonal basis among anisotropic Haar wavelet bases. This connection still makes sense for procedures that are more readily implementable in high dimension, such as random forests.

Moving to the frequency domain, in recent years methods based on graph Laplacians have become increasingly popular in machine learning. They have been used in unsupervised or semi-supervised learning (Belkin and Niyogi[4]; Zhou et al.[32]), spectral clustering (Spielman and Teng[27]; von Luxburg[31]) and dimensionality reduction (Belkin and Niyogi[4]). Their popularity is mainly due to the following properties of the Laplacian (a small numerical sketch follows the list):

• the eigenvectors of the Laplacian reflect the deep geometric structure of the data, which leads to a natural choice of kernel, in spectral clustering for instance;
• the Laplacian induces a genuine regularization, adapting to the geometric structure of the data, which is fundamental in various situations: denoising, semi-supervised learning, classification…;
• the Laplacian is the generator of a diffusion process whose properties are linked to clustering or label propagation in semi-supervised learning.
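
The following minimal sketch illustrates the first two points in their simplest form: an unnormalised graph Laplacian built from a Gaussian-kernel similarity graph, whose second eigenvector separates two clusters. The concrete choices (kernel bandwidth, thresholding at the median) are assumptions made for the illustration, not methods taken from the column.

```python
import numpy as np

def spectral_embedding(X, n_components=2, sigma=1.0):
    """Embed the rows of X using the bottom nontrivial eigenvectors of the
    (unnormalised) graph Laplacian of a Gaussian-kernel similarity graph."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))      # similarity (adjacency) matrix
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W                  # graph Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)            # eigenvectors carry the geometry
    return eigvecs[:, 1:n_components + 1]           # skip the constant eigenvector

# Two noisy clusters in R^2: the second Laplacian eigenvector (Fiedler vector)
# separates them, so thresholding it recovers the cluster labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)),
               rng.normal(3.0, 0.3, (20, 2))])
emb = spectral_embedding(X, n_components=1, sigma=1.0)
labels = (emb[:, 0] > np.median(emb[:, 0])).astype(int)
print(labels)
```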

One generally assumes that the data (even though observed in a very high-dimensional setting) has a hidden, low-dimensional structure: for example, the data lies on a low-dimensional manifold. The goal is to construct a mapping that parameterizes this low-dimensional structure, revealing the intrinsic geometry of the data.

A major step in this direction was achieved by the work on diffusion maps by Coifman et al.[10,22]. This is a deep theory on diffusion processes for finding meaningful geometric descriptions of data sets. It shows in particular that eigenfunctions of Markov matrices can be used to construct coordinates called diffusion maps that generate efficient representations of complex geometric structures. The associated family of diffusion distances, obtained by iterating the Markov matrix, defines multiscale geometries that prove to be useful in the context of data parametrization and dimensionality reduction. This framework especially relates the spectral properties of Markov processes to their geometric counterparts and unifies ideas arising in a variety of contexts such as machine learning, spectral graph theory and eigenmap methods.
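
Here is a compact sketch of the diffusion-map construction just described: Gaussian affinities row-normalised into a Markov matrix, whose eigenvectors, scaled by powers of their eigenvalues, give the diffusion coordinates. The kernel bandwidth and diffusion time are illustrative choices, not values from the cited papers.

```python
import numpy as np

def diffusion_map(X, n_coords=2, epsilon=1.0, t=1):
    """Diffusion-map coordinates of the rows of X: eigenvectors of the Markov
    matrix P = D^{-1} K (K a Gaussian affinity kernel, D its row sums),
    scaled by the t-th power of the corresponding eigenvalues."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq_dists / epsilon)               # Gaussian affinity kernel
    d = K.sum(axis=1)                             # degrees (row sums)
    # Work with the symmetric matrix S = D^{-1/2} K D^{-1/2}, similar to P.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]             # sort eigenvalues decreasingly
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    psi = eigvecs / np.sqrt(d)[:, None]           # right eigenvectors of P
    # Drop the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    return (eigvals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]

# Example: noisy points on a circle; the two leading diffusion coordinates
# recover the circular (one-dimensional) geometry of the data.
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([np.cos(theta), np.sin(theta)]) + rng.normal(0.0, 0.05, (200, 2))
coords = diffusion_map(X, n_coords=2, epsilon=0.5, t=2)
print(coords.shape)   # (200, 2)
```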

Bayesian methods:
Works devoted to deeply understanding the behaviour of Bayesian nonparametric methods have recently experienced considerable development, in particular after the seminal works of A. W. van der Vaart and co-authors[17,30]. Notably, the class of Gaussian processes forms an important family of nonparametric prior distributions, for which precise rates have been obtained. A vast class of priors has also been produced by placing appropriate distributions directly on the collection of wavelet coefficients.

Moreover, adaptation to diverse function classes has been obtained using, for instance, a randomized rescaling of the Gaussian processes. If the data are observed in a more intricate setting than $[0,1]^d$, a regular manifold for instance, this rescaling is made possible by introducing a notion of time decoupled from the underlying space and derived from the semigroup property of a family of operators. Another important difference brought by the geometrical nature of the problem is the underlying Gaussian process, which now originates from a harmonic analysis of the data space, with the rescaling naturally acting on the frequency domain (see Castillo et al.[7]).
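
To fix ideas on the randomized-rescaling device in the classical setting of functions on $[0,1]$ (not the manifold setting of Castillo et al.[7]), here is a minimal sketch; the squared-exponential kernel and the Gamma distribution for the rescaling are illustrative assumptions, not the exact prior analysed in the cited works.

```python
import numpy as np

def rescaled_gp_sample(grid, rng):
    """Draw one path of a randomly rescaled squared-exponential Gaussian process:
    first draw a random rescaling (inverse length-scale) A, then sample the
    process t -> W(A t) on the grid.  The Gamma draw for A is an illustrative
    choice, not the specific rescaling distribution of the cited papers."""
    A = rng.gamma(shape=2.0, scale=2.0)                       # random rescaling factor
    diffs = grid[:, None] - grid[None, :]
    cov = np.exp(-(A * diffs) ** 2)                           # covariance of t -> W(A t)
    cov += 1e-10 * np.eye(len(grid))                          # numerical jitter
    return rng.multivariate_normal(np.zeros(len(grid)), cov)  # one prior draw

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 200)
paths = [rescaled_gp_sample(grid, rng) for _ in range(3)]     # three prior draws
print(np.array(paths).shape)   # (3, 200)
```

Drawing the rescaling anew for each path is what spreads the prior mass over several smoothness levels, which is the mechanism behind the adaptation mentioned above.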

This construction of a prior is very much coupled with the general construction in Coulhon et al.[11] of wavelet bases initiated from an operator, and its application to spaces of matrices or graphs, currently investigated in probability and learning theory, seems promising. Moreover, this construction is to be compared with priors obtained via distributions on the collection of wavelet coefficients for well-suited wavelet bases.

References

[1] J.-P. Antoine and P. Vandergheynst. Wavelets on the 2-sphere: a group-theoretical approach. Appl. Comput. Harmon. Anal., 7(3):262–291, 1999.
[2] Paolo Baldi, Gérard Kerkyacharian, Domenico Marinucci, and Dominique Picard. High frequency asymptotics for wavelet-based tests for Gaussianity and isotropy on the torus. J. Multivariate Anal., 99(4):606–636, 2008.
[3] Paolo Baldi, Gérard Kerkyacharian, Domenico Marinucci, and Dominique Picard. Subsampling needlet coefficients on the sphere. Bernoulli, 15(2):438–463, 2009.
[4] Mikhail Belkin and Partha Niyogi. Towards a theoretical foundation for Laplacian-based manifold methods. J. Comput. System Sci., 74(8):1289–1308, 2008.
[5] Leo Breiman, Jerome Friedman, Richard Olshen, Charles Stone, D. Steinberg, and P. Colla. CART: Classification and regression trees. Wadsworth: Belmont, CA, 156, 1983.
[6] Claudio Canuto, Anita Tabacco, and Karsten Urban. The wavelet element method. I. Construction and analysis. Appl. Comput. Harmon. Anal., 6(1):1–52, 1999.
[7] Ismaël Castillo, Gérard Kerkyacharian, and Dominique Picard. Thomas Bayes’ walk on manifolds. Probab. Theory Related Fields, 158(3-4):665–710, 2014.
[8] A. Cohen, M. Hoffmann, and M. Reiß. Adaptive wavelet Galerkin methods for linear inverse problems. SIAM J. Numer. Anal., 42(4):1479–1501, 2004.
[9] A. Cohen and R. Masson. Wavelet adaptive method for second order elliptic problems: boundary conditions and domain decomposition. Numer. Math., 86(2):193–238, 2000.
[10] Ronald R. Coifman and Mauro Maggioni. Diffusion wavelets. Appl. Comput. Harmon. Anal., 21(1):53–94, 2006.
[11] T. Coulhon, G. Kerkyacharian, and P. Petrushev. Heat kernel generated frames in the setting of Dirichlet spaces. J. Fourier Anal. Appl., 18:995–1066, 2012.
[12] Stephan Dahlke, Wolfgang Dahmen, Ilona Weinreich, and Eberhard Schmitt. Multiresolution analysis and wavelets on $S^2$ and $S^3$. Numer. Funct. Anal. Optim., 16(1-2):19–41, 1995.
[13] Wolfgang Dahmen and Reinhold Schneider. Wavelets on manifolds. I. Construction and domain decomposition. SIAM J. Math. Anal., 31(1):184–230, 1999.
[14] J. Delabrouille, J.-F. Cardoso, M. Le Jeune, M. Betoule, G. Faÿ, F. Guilloux, et al. A full sky, low foreground, high resolution CMB map from WMAP. Astronomy and Astrophysics, 493:835–857, 2009.
[15] David L. Donoho. CART and best-ortho-basis: a connection. Ann. Statist., 25(5):1870–1911, 1997.
[16] Guilhem Faÿ, Frédéric Guilloux, Marc Betoule, J.-F. Cardoso, Jacques Delabrouille, and Maude Le Jeune. CMB power spectrum estimation using wavelets. Physical Review D, 78(8):083013, 2008.
[17] Subhashis Ghosal, Jayanta K. Ghosh, and Aad W. van der Vaart. Convergence rates of posterior distributions. Ann. Statist., 28(2):500–531, 2000.
[18] David K. Hammond, Pierre Vandergheynst, and Rémi Gribonval. Wavelets on graphs via spectral graph theory. Appl. Comput. Harmon. Anal., 30(2):129–150, 2011.
[19] Gérard Kerkyacharian, George Kyriazis, Erwan Le Pennec, Pencho Petrushev, and Dominique Picard. Inversion of noisy Radon transform by SVD based needlets. Appl. Comput. Harmon. Anal., 28(1):24–45, 2010.
[20] Gérard Kerkyacharian, Erwan Le Pennec, and Dominique Picard. Radon needlet thresholding. Bernoulli, 18(2):391–433, 2012.
[21] Gérard Kerkyacharian, Thanh Mai Pham Ngoc, and Dominique Picard. Localized deconvolution on the sphere. Ann. Statist., 39(2):1042–1068, 2011.
[22] Boaz Nadler, Stéphane Lafon, Ronald R. Coifman, and Ioannis G. Kevrekidis. Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal., 21(1):113–127, 2006.
[23] F. Narcowich, P. Petrushev, and J. Ward. Localized tight frames on spheres. SIAM J. Math. Anal., 2006.
[24] F. J. Narcowich, P. Petrushev, and J. Ward. Decomposition of Besov and Triebel–Lizorkin spaces on the sphere. J. Funct. Anal., 2006.
[25] P. Petrushev and Y. Xu. Localized polynomial frames on the interval with Jacobi weights. J. Fourier Anal. Appl., 11(5):557–575, 2005.
[26] D. Pietrobon, Paolo Baldi, Gérard Kerkyacharian, Domenico Marinucci, and Dominique Picard. Spherical needlets for CMB data analysis. Monthly Notices of the Royal Astronomical Society, 383:539–545, 2008.
[27] Daniel Spielman and Shang-Hua Teng. Spectral partitioning works: planar graphs and finite element meshes. In 37th Annual Symposium on Foundations of Computer Science (Burlington, VT, 1996), pages 96–105. IEEE Comput. Soc. Press, Los Alamitos, CA, 1996.
[28] Peter Schröder and Wim Sweldens. Spherical wavelets: efficiently representing functions on the sphere. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques, pages 161–172. ACM, 1995.
[29] Jean-Luc Starck, Fionn Murtagh, and Jalal M. Fadili. Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological Diversity. Cambridge University Press, 2010.
[30] Aad W. van der Vaart and Harry van Zanten. Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist., 36(3):1435–1463, 2008.
[31] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
[32] Dengyong Zhou, Thomas Hofmann, and Bernhard Schölkopf. Semi-supervised learning on directed graphs. In Advances in Neural Information Processing Systems, pages 1633–1640, 2005.