Student Puzzle Editor Anirban DasGupta poses another two problems, and says, “Our statistics problem this time is on universally consistent estimation of the regression coefficients for not necessarily Gaussian errors, and the probability problem is on probabilistic graph theory. The statistics problem is a practically important problem that will make you think, and the probability problem is a fun, unusual problem, but a simple one.” Send us your solution, to either or both, by September 15, 2023.

Puzzle 46.1.
Consider the usual linear model \[ Y_i = \beta_0 + \beta_1 \,x_{i,1}+\cdots + \beta_p\,x_{i,p} + \epsilon_i, \quad i = 1, 2, \cdots, n, \quad 1 \leq p < \infty. \] We assume that $\epsilon_i \stackrel{iid} {\sim} f(z)$, where \[ f(z) = c(\alpha )\, e^{-|z|^\alpha}, \quad -\infty < z < \infty , \quad 0 < \alpha < \infty ,\] and $c(\alpha )$ is the normalizing constant. Provide infinitely many explicit consistent estimators of the vector of regression coefficients, estimators that are consistent under all error densities $f$ stated above.

Puzzle 46.2.
Suppose $X, Y, Z$ are iid Poisson with mean $\lambda > 0$. Let $f(\lambda ) = P_{\lambda}\,(X, Y, Z \mbox{ are the degrees of a nonempty graph on 3 vertices})$. Find $\sup_{\lambda > 0}\, f(\lambda )$.

Solution to Puzzle 45

Thanks to Bilol Banerjee (ISI Kolkata) and Soham Bonnerjee (University of Chicago), who sent in answers, and a special well done to Bishakh Bhattacharya (ISI Kolkata) whose solution to 45.1 was “admirably detailed.” Anirban DasGupta explains:

Puzzle 45.1.
The key is to show separately that (almost surely) $\max\{X_1^2, X_2^2, \cdots , X_n^2\} \sim 2\,\log n$ and $\sum_{i=1}^n\,X_i^2 \sim n$, where, for two positive sequences $a_n$ and $b_n$, we write $a_n \sim b_n$ if $\frac{a_n}{b_n} \to 1$. That $\sum_{i=1}^n\,X_i^2 \sim n$ is just the strong law. To show that $\max\{X_1^2, X_2^2, \cdots , X_n^2\} \sim 2\,\log n$, let $F(x) = 2\,\Phi(x) -1 $ denote the CDF of $|X_1|$ and let $q_n = F^{-1}(1-\frac{1}{n}) = \Phi^{-1}(1-\frac{1}{2n}) \sim \sqrt{2\log n}$. Using the formula for $F(x)$, one shows easily that, for each $c > 1$, $\sum_{n = 1}^\infty \, \bigg [1- F(c\,q_n)\bigg ] < \infty $. This suffices to show that $\max\{|X_1|, |X_2|, \cdots , |X_n|\} \sim F^{-1}(1-\frac{1}{n}) \sim \sqrt{2\log n}$, and hence $\max\{X_1^2, X_2^2, \cdots , X_n^2\} \sim 2\,\log n$. Therefore, $\max\{X_1^2, X_2^2, \cdots , X_n^2\} /\sum_{i=1}^n\,X_i^2 \sim \frac{2\,\log n}{n}$. On the other hand, by the prime number theorem, $\pi (n) \sim \frac{n}{\log n}$. This finishes the proof of Puzzle 45.1.
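The two almost-sure limits in this argument can be checked by simulation. The following is a minimal sketch, not part of the submitted solutions; the function names, the seed and the sample size are illustrative assumptions. It draws standard normal variables, as in the solution where $\Phi$ is the CDF of $X_1$, and reports the ratios that should approach 1.

```python
import math
import random
from statistics import NormalDist


def max_and_sum_of_squares(n, seed=0):
    """Simulate n iid N(0,1) draws; return (max of X_i^2, sum of X_i^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    max_sq = 0.0
    total = 0.0
    for _ in range(n):
        sq = rng.gauss(0.0, 1.0) ** 2
        total += sq
        if sq > max_sq:
            max_sq = sq
    return max_sq, total


def simulated_ratios(n, seed=0):
    """Return (max X_i^2 / (2 log n), sum X_i^2 / n); both should be near 1."""
    max_sq, total = max_and_sum_of_squares(n, seed)
    return max_sq / (2.0 * math.log(n)), total / n


def quantile_ratio(n):
    """Return q_n^2 / (2 log n), with q_n = Phi^{-1}(1 - 1/(2n))."""
    q_n = NormalDist().inv_cdf(1.0 - 1.0 / (2.0 * n))
    return q_n ** 2 / (2.0 * math.log(n))


if __name__ == "__main__":
    n = 10 ** 5
    max_ratio, sum_ratio = simulated_ratios(n)
    print("max X_i^2 / (2 log n):", max_ratio)
    print("sum X_i^2 / n        :", sum_ratio)
    print("q_n^2 / (2 log n)    :", quantile_ratio(n))
```

The strong-law ratio settles quickly, while the ratio for the maximum converges only at a logarithmic rate, so it can still be visibly away from 1 at sample sizes that are easy to simulate.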

Puzzle 45.2.
Denoting the $t$-percentile $t_{\alpha /2, n-1}$ by $c$, the $t$-confidence interval at level $1-\alpha $ is $C_n = \bar{X}\,\pm c \frac{s}{\sqrt{n}}$. Hence, $\mu \not\in \, C_n \Leftrightarrow \mu > \bar{X}\,+c \frac{s}{\sqrt{n}} \mbox{ or } \mu < \bar{X}\,-c \frac{s}{\sqrt{n}}$. Therefore, by symmetry considerations, \[ E\,[\mbox{dist}(\mu, C_n)\,|\mu \, \not\in C_n\,] = 2\,\times \,E\,[\mu -(\bar{X}\,+c \frac{s}{\sqrt{n}})\,| \mu > \bar{X}\,+c \frac{s}{\sqrt{n}}\,] \] \[ = 2\,\times \frac{2}{\alpha }\, \int \frac{s}{\sqrt{n}}\,(t-c)\,I_{t > c}\,f_n(s,t)\,ds\,dt, \] where $t$ denotes the $t$-statistic $\frac{\sqrt{n}\,(\bar{X}-\mu)}{s}$ and $f_n(s,t)$ denotes the joint density of $(s,t)$, which can be calculated by the usual Jacobian method, starting with the joint density of $(\bar{X}, s)$. A (long) calculation then shows that \[ \int \frac{s}{\sqrt{n}}\,(t-c)\,I_{t > c}\,f_n(s,t)\,ds\,dt\] \[ = \frac{\sigma }{\sqrt{n}}\, [\phi (z_{\frac{\alpha }{2}}) - \frac{\alpha }{2}\, z_{\frac{\alpha }{2}}] + O(n^{-3/2}). \] To express this in terms of $z_{\frac{\alpha }{2}}$, one needs the fact that, for any given $\alpha$, $c = z_{\frac{\alpha }{2}} + O(\frac{1}{n})$.
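The bracketed leading term has a simple Gaussian interpretation that can be checked numerically: for a standard normal $Z$ and $z = z_{\frac{\alpha }{2}}$, one has $\int_z^\infty (t-z)\,\phi (t)\,dt = \phi (z) - \frac{\alpha }{2}\,z$, which is what the integral reduces to when $s$ is replaced by $\sigma $, $c$ by $z_{\frac{\alpha }{2}}$, and the $t$-density by the normal density. The sketch below is only a sanity check of that leading term, not the long calculation itself; the function names, the truncation point and the step count are illustrative assumptions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: pdf plays the role of phi


def leading_bracket(alpha):
    """Closed form phi(z) - (alpha/2) * z, with z the upper alpha/2 point."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.pdf(z) - (alpha / 2.0) * z


def gaussian_excess_integral(alpha, upper=12.0, steps=200_000):
    """Midpoint-rule approximation of the integral of (t - z) * phi(t) over [z, upper].

    The upper limit 12 is an assumed truncation point; the normal tail
    beyond it is negligible at this precision.
    """
    z = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    h = (upper - z) / steps
    total = 0.0
    for k in range(steps):
        t = z + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += (t - z) * STD_NORMAL.pdf(t)
    return total * h


if __name__ == "__main__":
    for alpha in (0.10, 0.05, 0.01):
        print(alpha, leading_bracket(alpha), gaussian_excess_integral(alpha))
```

For each $\alpha $ the two printed values should agree to many decimal places; multiplying either by $\frac{\sigma }{\sqrt{n}}$ gives the leading term of the expansion.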