Deadline: July 1, 2023

Student Puzzle Editor Anirban DasGupta poses another two problems, and says, “Once again, we propose one problem on statistics and one on probability. I think you will find both problems to be new to you and interesting. If you are not able to answer either problem analytically, send us meaningful computational answers, and we will look at them too.” Student IMS members: send us your solution, to either or both, to bulletin@imstat.org (with subject “Student Puzzle Corner”).  

 

Puzzle 45.1.

Suppose $X_1, X_2, \ldots$ is a sequence of i.i.d. standard normal variables. For any given $n \geq 1$, define $R_n = \frac{\max\{X_1^2, \ldots, X_n^2\}}{X_1^2 + \cdots + X_n^2}$. Let also $\pi(n)$ denote the usual prime counting function, i.e., $\pi(n)$ denotes the number of prime numbers $\leq n$.

Prove that $\frac{R_n}{2}\,\pi(n) \xrightarrow{a.s.} 1$.
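Since computational answers are welcome, here is a minimal simulation sketch of the quantity $R_n \pi(n)/2$ at a few sample sizes. The function names `prime_count` and `scaled_ratio`, the seed, and the choice of sample sizes are illustrative assumptions, not part of the puzzle. Each $n$ uses a fresh sample rather than one growing sequence, so this only gives a rough numerical feel and is not a proof.

```python
import math
import random


def prime_count(n):
    """Return pi(n), the number of primes <= n, via a sieve of Eratosthenes."""
    if n < 2:
        return 0
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0] = is_prime[1] = 0
    for p in range(2, math.isqrt(n) + 1):
        if is_prime[p]:
            # Cross off the multiples of p, starting at p * p.
            is_prime[p * p :: p] = bytearray(len(range(p * p, n + 1, p)))
    return sum(is_prime)


def scaled_ratio(n, rng):
    """Simulate R_n * pi(n) / 2 from one sample of n standard normals."""
    largest = 0.0
    total = 0.0
    for _ in range(n):
        sq = rng.gauss(0.0, 1.0) ** 2
        total += sq
        largest = max(largest, sq)
    return (largest / total) * prime_count(n) / 2.0


if __name__ == "__main__":
    rng = random.Random(45)  # arbitrary seed, for reproducibility only
    for n in (10**3, 10**4, 10**5, 10**6):
        print(n, round(scaled_ratio(n, rng), 3))
```

Both the maximum of the squares and $\pi(n)$ are governed by logarithmic terms, so the printed values may approach 1 only slowly as $n$ grows.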

 

Puzzle 45.2.

This interesting question was first raised by Brad Efron. How far from a t-confidence interval is the true value of a population mean when the t-interval misses the true value?

Formally, assume below that we have i.i.d. normal variables with mean $\mu$ and variance $\sigma^2$, and let, for given $n \geq 2$, $C_n$ denote the usual $100(1-\alpha)\%$ t-confidence interval for $\mu$.

Derive an asymptotic approximation for $E(\mathrm{dist}(\mu, C_n) \mid \mu \notin C_n)$, where $\mathrm{dist}(\mu, C_n)$ stands for the (Euclidean) distance between $\mu$ and the interval $C_n$.
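For readers who want a numerical target before attempting the derivation, the following is a hedged Monte Carlo sketch of the conditional mean distance. It assumes $\mu = 0$ and $\sigma = 1$ (the distance scales linearly in $\sigma$), and it replaces the exact t critical value by a standard Cornish-Fisher expansion in $1/\mathrm{df}$, so it is an approximation for small $n$. The function names and default arguments are illustrative only.

```python
import math
import random
from statistics import NormalDist


def t_quantile_approx(p, df):
    """Approximate Student-t quantile via a Cornish-Fisher expansion in 1/df."""
    z = NormalDist().inv_cdf(p)
    return (z + (z**3 + z) / (4 * df)
            + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * df**2))


def mean_miss_distance(n, alpha=0.05, reps=200_000, seed=0):
    """Monte Carlo estimate of (miss rate, E[dist(mu, C_n) | mu not in C_n]).

    Assumes mu = 0 and sigma = 1; for other sigma the distance scales by sigma.
    """
    rng = random.Random(seed)
    df = n - 1
    crit = t_quantile_approx(1 - alpha / 2, df)
    misses, total_dist = 0, 0.0
    for _ in range(reps):
        # For normal data the sample mean and sample variance are independent:
        # xbar ~ N(0, 1/n) and (n - 1) s^2 ~ chi-square with n - 1 d.f.
        xbar = rng.gauss(0.0, 1.0 / math.sqrt(n))
        s = math.sqrt(rng.gammavariate(df / 2.0, 2.0) / df)
        half_width = crit * s / math.sqrt(n)
        if abs(xbar) > half_width:  # the interval misses mu = 0
            misses += 1
            total_dist += abs(xbar) - half_width
    return misses / reps, (total_dist / misses if misses else float("nan"))


if __name__ == "__main__":
    for n in (10, 30, 100, 300):
        rate, dist = mean_miss_distance(n)
        print(n, round(rate, 4), round(dist, 4), round(math.sqrt(n) * dist, 4))
```

The last printed column rescales the estimate by $\sqrt{n}$, which can be compared against whatever asymptotic approximation one derives.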

 

Solution to Puzzle 44

Well done to Soham Bonnerjee (University of Chicago) who sent a correct solution to 44.2 below. Anirban DasGupta explains:

Puzzle 44.1  

This is a special case of an extremely interesting problem known as Moser’s worm problem. In 1966, Leo Moser asked what is the minimal area of a geometric region that can enclose any closed curve of a given length $L$ in the plane? To my knowledge, in this generality, the problem remains open. If we restrict our region to be a circle, then it is known that a circle of diameter $\frac{L}{2}$ suffices (see Harold Johnson’s 1974 article in the Proceedings of the American Mathematical Society). We can apply this sufficient condition to $L = 8$ in our problem.

 

Puzzle 44.2

Actually, the answers to the parts of this problem have nothing to do with the sequence of observations being i.i.d. from a standard Cauchy. Let $F$ be the CDF of a real valued random variable such that $F(x)$ is a continuous function on the real line and $F(-x) = 1 - F(x)$ for all real $x$. Define a function $h(x,y) = I_{x+y>0}$. We are interested in various asymptotic properties of $T_n = \frac{1}{\binom{n}{2}} \sum_{1 \leq i < j \leq n} h(X_i, X_j)$.

Although we can solve this problem directly by using calculations for m-dependent sequences, it is best to use Hoeffding’s theory of U-statistics. The statistic we are interested in is a U-statistic of order $m = 2$. Due to the symmetry and continuity of our $F$, it follows that $T_n \xrightarrow{P} \frac{1}{2}$. Also, by considering the number of overlaps in two pairs of indices $(i,j), (k,l)$, one can calculate the variance of $T_n$ exactly. It then follows that $a_n (T_n - c) \xrightarrow{\mathcal{L}} G$, with $a_n = \sqrt{n}$, $c = \frac{1}{2}$, and $G$ the normal distribution with mean $0$ and variance $\frac{1}{3}$.
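The limiting variance can be checked numerically. The sketch below simulates $T_n$ for standard Cauchy samples and compares the sample variance of $\sqrt{n}(T_n - \frac{1}{2})$ with an exact finite-$n$ value. That exact value is our own working, not taken from the solution above: it uses Hoeffding's variance formula for an order-2 U-statistic with $\zeta_1 = \mathrm{Var}(F(X)) = \frac{1}{12}$ (since $F(X)$ is uniform on $(0,1)$ and $E[h(x, X_2)] = F(x)$) and $\zeta_2 = \mathrm{Var}(h(X_1, X_2)) = \frac{1}{4}$. All function names are illustrative.

```python
import math
import random


def sample_cauchy(n, rng):
    """Draw n standard Cauchy variates by inverting the CDF."""
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def pair_sign_statistic(xs):
    """T_n: fraction of pairs i < j with x_i + x_j > 0 (two-pointer count)."""
    xs = sorted(xs)
    n = len(xs)
    lo, hi, count = 0, n - 1, 0
    while lo < hi:
        if xs[lo] + xs[hi] > 0:
            count += hi - lo  # every index in [lo, hi) pairs with hi
            hi -= 1
        else:
            lo += 1
    return count / (n * (n - 1) / 2)


def exact_variance(n):
    """Var(T_n) from Hoeffding's formula, assuming zeta_1 = 1/12, zeta_2 = 1/4."""
    return 2.0 / (n * (n - 1)) * (2 * (n - 2) / 12.0 + 0.25)


def simulated_scaled_variance(n, reps, seed=0):
    """Sample variance of sqrt(n) * (T_n - 1/2) over independent replicates."""
    rng = random.Random(seed)
    vals = [math.sqrt(n) * (pair_sign_statistic(sample_cauchy(n, rng)) - 0.5)
            for _ in range(reps)]
    mean = sum(vals) / reps
    return sum((v - mean) ** 2 for v in vals) / (reps - 1)


if __name__ == "__main__":
    n = 200
    print("simulated:", simulated_scaled_variance(n, reps=2000))
    print("exact n * Var(T_n):", n * exact_variance(n))
    print("limit:", 1 / 3)
```

Swapping `sample_cauchy` for any other sampler from a continuous distribution symmetric about zero should give similar numbers, in line with the remark that the Cauchy assumption plays no role.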