Deadline: July 1, 2023

Student Puzzle Editor Anirban DasGupta poses another two problems, and says, “Once again, we propose one problem on statistics and one on probability. I think you will find both problems to be new to you and interesting. If you are not able to answer either problem analytically, send us meaningful computational answers, and we will look at them too.” Student IMS members: send us your solution, to either or both, to bulletin@imstat.org (with subject “Student Puzzle Corner”).  

 

Puzzle 45.1.

Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. standard normal variables. For any given $n \geq 1$, define $R_n = \frac{\max\{X_1^2, \ldots , X_n^2\}} {X_1^2+\cdots + X_n^2}$. Also, let $\pi (n)$ denote the usual prime counting function, i.e., $\pi(n)$ is the number of prime numbers $\leq n$.

Prove that $\frac{R_n}{\frac{2}{\pi(n)}}\stackrel{\mbox{a.s.}}{\to} 1$.
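
For readers who want a computational feel for the claim before attempting a proof, here is a minimal simulation sketch. It tracks $R_n \cdot \pi(n)/2$ along one simulated path. The function names `prime_count` and `ratio_path` are hypothetical, chosen only for this illustration. Since the quantities involved grow only logarithmically, one should expect rough agreement with $1$ at feasible values of $n$, not a tight match.

```python
import math
import random


def prime_count(n):
    """Return pi(n), the number of primes <= n, using a simple sieve."""
    if n < 2:
        return 0
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if sieve[p]:
            # Cross out the multiples of p, starting at p*p.
            sieve[p * p::p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(sieve)


def ratio_path(checkpoints, seed=0):
    """Simulate one path of X_1, X_2, ... and return {n: R_n * pi(n) / 2}
    for each n in checkpoints (hypothetical helper for illustration)."""
    rng = random.Random(seed)
    targets = set(checkpoints)
    running_max = 0.0
    running_sum = 0.0
    out = {}
    for n in range(1, max(checkpoints) + 1):
        x2 = rng.gauss(0.0, 1.0) ** 2
        running_max = max(running_max, x2)
        running_sum += x2
        if n in targets:
            out[n] = (running_max / running_sum) * prime_count(n) / 2
    return out


if __name__ == "__main__":
    for n, value in sorted(ratio_path([10**3, 10**4, 10**5]).items()):
        print(n, round(value, 3))
```

Using a single path (rather than fresh samples for each $n$) mirrors the almost sure statement, which concerns the behavior of one sequence as $n$ grows.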

 

Puzzle 45.2.

This interesting question was first raised by Brad Efron. How far from a t-confidence interval is the true value of a population mean when the t-interval misses the true value?

Formally, assume that we have i.i.d. normal variables with mean $\mu$ and variance $\sigma ^2$, and, for a given $n \geq 2$, let $C_n$ denote the usual $100(1-\alpha )\%$ $t$-confidence interval for $\mu $.

Derive an asymptotic approximation for $E(\mbox{dist}(\mu, C_n)\,|\mu \notin C_n)$, where $\mbox{dist}(\mu, C_n)$ stands for the (Euclidean) distance between $\mu$ and the interval $C_n$.  
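
A computational answer of the kind invited above could start from a Monte Carlo sketch like the following. It is only an illustration under stated assumptions: the function name `mean_miss_distance` is hypothetical, and the critical value is passed in by the caller (the example uses $2.262$, the $0.975$ quantile of the $t$ distribution with $9$ degrees of freedom, for $n = 10$ and $\alpha = 0.05$). The code averages $\mbox{dist}(\mu, C_n)$ over the simulated samples whose interval misses $\mu$.

```python
import math
import random
import statistics


def mean_miss_distance(n, crit, reps, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of E(dist(mu, C_n) | mu not in C_n).

    crit is the t critical value for the chosen alpha and n - 1 degrees
    of freedom; it is supplied by the caller (an assumption of this sketch).
    Returns (average miss distance, observed miss rate).
    """
    rng = random.Random(seed)
    dists = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
        half_width = crit * s / math.sqrt(n)
        # Distance from mu to the interval; positive only when mu is missed.
        gap = abs(xbar - mu) - half_width
        if gap > 0:
            dists.append(gap)
    miss_rate = len(dists) / reps
    avg = statistics.fmean(dists) if dists else float("nan")
    return avg, miss_rate


if __name__ == "__main__":
    avg, rate = mean_miss_distance(n=10, crit=2.262, reps=20000)
    print("miss rate:", rate, "mean distance given a miss:", avg)
```

Because the distance scales like $\sigma/\sqrt{n}$, multiplying the estimate by $\sqrt{n}/\sigma$ makes runs with different $n$ comparable.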

 

Solution to Puzzle 44

Well done to Soham Bonnerjee (University of Chicago) who sent a correct solution to 44.2 below. Anirban DasGupta explains:

Puzzle 44.1  

This is a special case of an extremely interesting problem known as Moser’s worm problem. In 1966, Leo Moser asked: what is the minimal area of a geometric region that can enclose any closed curve of a given length $L$ in the plane? To my knowledge, in this generality, the problem remains open. If we restrict our region to be a circle, then it is known that a circle of diameter $\frac{L}{2}$ suffices (see Harold Johnson’s 1974 article in the Proceedings of the American Mathematical Society). We can apply this sufficient condition with $L = 8$ in our problem.

 

Puzzle 44.2

Actually, the answers to the parts of this problem have nothing to do with the sequence of observations being i.i.d. from a standard Cauchy. Let $F$ be the CDF of a real-valued random variable such that $F(x)$ is a continuous function on the real line and $F(-x) = 1-F(x)$ for all real $x$. Define a function $h(x,y) = I_{x+y>0}$. We are interested in various asymptotic properties of $T_n = \frac{1}{{n \choose 2}}\,\sum_{1 \leq i < j \leq n}\,h(X_i, X_j)$.

Although we can solve this problem directly by using calculations for $m$-dependent sequences, it is best to use Hoeffding’s theory of $U$-statistics. The statistic we are interested in is a $U$-statistic of order $m = 2$. Due to the symmetry and continuity of our $F$, it follows that $T_n \stackrel {\mathcal{P}} {\longrightarrow} \frac{1}{2}$. Also, by considering the number of overlaps in two pairs of indices $(i,j), (k,l)$, one can calculate the variance of $T_n$ exactly. It then follows that $a_n\,(T_n - c) \stackrel {\mathcal{L}} {\longrightarrow} G$, with $a_n = \sqrt{n}$, $c = \frac{1}{2}$, and $G$ the normal distribution with mean $0$ and variance $\frac{1}{3}$.
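
The limiting variance $\frac{1}{3}$ agrees with Hoeffding’s formula: the first-order projection of the kernel is $h_1(x) = P(x + X_2 > 0) = F(x)$, and $F(X_1)$ is uniform on $(0,1)$ with variance $\frac{1}{12}$, so the asymptotic variance is $m^2 \cdot \frac{1}{12} = \frac{1}{3}$. As a rough numerical check, the sketch below simulates $T_n$ for standard Cauchy samples and reports the average of $T_n$ together with $n$ times its sample variance. The function names `t_stat` and `simulate` are hypothetical, introduced only for this illustration.

```python
import math
import random
import statistics


def t_stat(xs):
    """Compute T_n: the fraction of pairs i < j with x_i + x_j > 0.

    Uses a sort and a two-pointer sweep instead of the O(n^2) double loop.
    """
    xs = sorted(xs)
    n = len(xs)
    lo, hi = 0, n - 1
    count = 0
    while lo < hi:
        if xs[lo] + xs[hi] > 0:
            # Every index k in [lo, hi - 1] pairs with hi to a positive sum.
            count += hi - lo
            hi -= 1
        else:
            lo += 1
    return count / (n * (n - 1) / 2)


def simulate(n, reps, seed=0):
    """Return (mean of T_n, n * sample variance of T_n) over reps samples
    of size n from the standard Cauchy distribution."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # Inverse-CDF sampling of a standard Cauchy variable.
        xs = [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]
        values.append(t_stat(xs))
    return statistics.fmean(values), n * statistics.variance(values)


if __name__ == "__main__":
    mean, scaled_var = simulate(n=200, reps=400)
    print("mean of T_n:", mean, "n * Var(T_n):", scaled_var)
```

Replacing the Cauchy sampler with any other continuous distribution symmetric about $0$ should give similar output, in line with the remark that the answers do not depend on the Cauchy assumption.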