The Student Puzzle Corner contains one or two problems in statistics or probability. Sometimes, solving the problems may require a literature search.
Current student members of the IMS are invited to submit solutions electronically (to bulletin@imstat.org with subject “Student Puzzle Corner”). Deadline May 1, 2014.
The names and affiliations of (up to) the first 10 student members to submit correct solutions, and the answer(s) to the problem(s), will be published in the next issue of the Bulletin.
The Editor’s decision is final.

Student Puzzle Corner 3

Let $P,Q$ be two randomly chosen points on the surface of the Earth and let $D$ be the Euclidean distance between $P$ and $Q$. Assuming that Earth is a perfect sphere of radius 3960 miles, find the exact value of $E(D)$. Notice that we are not asking for $E(D^2)$, but $E(D)$ itself.

Airplanes generally travel approximately along the geodesic distance, because to take the path corresponding to the Euclidean distance, one has to go through the interior of the Earth. It is possible to find how much larger the geodesic distance is than the Euclidean distance on average.

Solution to the previous Student Puzzle Corner
Tengyuan Liang at the Wharton School, University of Pennsylvania, sent the correct value of $\mu$ (though without sending any work).

Note that the correct value just refers to the $\mu$ that was used to generate the six data values. Of course, $\mu$ cannot be exactly estimated, but we can formulate the estimation problem.

Let $\mathcal{X} = \{9.73, 9.77, 9.57, 9.75, 8.95, 9.73\}$ denote the set of sample values. Let $f_0$ denote the normal density with mean $\mu$ and standard deviation $\frac{1}{30}$, and $f_1$ the Cauchy density with median $\mu$ and unit scale. We are not told how many or which of the sample values are from $f_0$.

Let $A \subseteq \mathcal{X}$ consist of the observations from $f_0$, and $A^c$ the observations from $f_1$; treating the six observations as distinct, there are $2^6 = 64$ such subsets of $\mathcal{X}$. We could try to maximize the likelihood function
$\prod_{x \in A}f_0(x\,|\mu )\times \prod_{x \in A^c}f_1(x\,|\mu )$
over $A$ and $\mu$. This is a formal attack.
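
The formal attack can be sketched numerically as follows. This is only an illustrative sketch, not the method used to produce the published answer: it reads the Cauchy density as having unit scale, searches $\mu$ on a grid with a step of 0.001 (an arbitrary choice), and restricts the grid to the range of the data, which loses nothing because both densities are symmetric and unimodal about $\mu$. The names `log_f0`, `log_f1`, `log_lik` and `best_fit` are made up for this example.

```python
import math
from itertools import combinations

# The six sample values, in decimal hours.
X = [9.73, 9.77, 9.57, 9.75, 8.95, 9.73]
SIGMA = 1 / 30  # standard deviation of the normal component


def log_f0(x, mu):
    """Log density of N(mu, SIGMA^2) at x."""
    return (-0.5 * math.log(2 * math.pi) - math.log(SIGMA)
            - 0.5 * ((x - mu) / SIGMA) ** 2)


def log_f1(x, mu):
    """Log density at x of a Cauchy with median mu and scale 1 (assumed)."""
    return -math.log(math.pi) - math.log1p((x - mu) ** 2)


def log_lik(normal_idx, mu):
    """Log likelihood when the observations indexed by normal_idx come from f0."""
    return sum(log_f0(x, mu) if i in normal_idx else log_f1(x, mu)
               for i, x in enumerate(X))


def best_fit(step=0.001):
    """Brute-force search over all 64 subsets A and a grid of mu values."""
    lo, hi = min(X), max(X)
    n_steps = round((hi - lo) / step)
    grid = [lo + k * step for k in range(n_steps + 1)]
    best = (-math.inf, None, None)
    for r in range(len(X) + 1):
        for idx in combinations(range(len(X)), r):
            chosen = set(idx)
            for mu in grid:
                ll = log_lik(chosen, mu)
                if ll > best[0]:
                    best = (ll, idx, mu)
    return best


ll, normal_idx, mu_hat = best_fit()
print("indices assigned to f0:", normal_idx)
print("estimated mu:", round(mu_hat, 3))
```

Under these assumptions the search should assign the four values near 9.75 to $f_0$ and leave 9.57 and 8.95 to $f_1$, with the maximizing $\mu$ close to the mean of those four values.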

An informal attack would be to treat it as a problem in simple data analysis: conclude that the clocks showing the times $9:34$ and $8:57$ (the sample values $9.57$ and $8.95$, in decimal hours) have become completely unreliable, and treat the other four as a Gaussian sample.
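
As a rough illustration of the informal attack, the sketch below drops the two suspect readings, averages the remaining four, and converts decimal hours to clock times rounded to the nearest minute. The helper `to_clock` and the hard-coded choice of which readings to drop are assumptions made for this example only.

```python
from statistics import mean

# The six sample values, in decimal hours.
X = [9.73, 9.77, 9.57, 9.75, 8.95, 9.73]


def to_clock(hours):
    """Convert decimal hours to an h:mm string, rounded to the nearest minute."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}:{total_minutes % 60:02d}"


# Readings judged unreliable by eye (an assumption of the informal approach).
outliers = {9.57, 8.95}
kept = [x for x in X if x not in outliers]

# The sample mean is the usual estimate of mu for a Gaussian sample.
mu_informal = mean(kept)

print("all readings as clock times:", [to_clock(x) for x in X])
print("estimate of mu:", round(mu_informal, 3), "hours, i.e.", to_clock(mu_informal))
```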

Both approaches lead to a value of $\mu$ close to $9:45$, so a natural guess is that $9:45$ (that is, $\mu = 9.75$ in decimal hours) was the value used in the simulation.