Institute of Mathematical Statistics | Student Puzzle Corner 49

February 15, 2024

Deadline: March 15, 2024.

Student members of IMS are invited to submit solutions to bulletin@imstat.org (subject “Student Puzzle Corner”). If correct, we’ll publish your name (and photo, if there’s space), and the answer, in the next issue. The Puzzle Editor is Anirban DasGupta. His decision is final.

Anirban DasGupta says, “We have a fun component in our problems this time. The probability problem is an interesting theoretical calculation. The statistics problem is a contest: the top three winners will get our usual recognition, of name, affiliation and (if there’s room) photo. Of course, we will also recognize those who send correct answers to the probability problem alone and do not attempt the contest problem.” Here are the two components of this month’s problem:

Puzzle 49.1
Suppose we have n i.i.d. standard normal observations, and n i.i.d. standard Cauchy observations, and assume furthermore that all 2n observations are mutually independent.

(a) Derive an expression for μ(n), the expected number of Cauchy observations that fall within the convex hull of the normal observations.

(b) Compute this expected value when n = 7, 25, 50.

Puzzle 49.2
And now our contest problem. The following dataset of 14 observations consists of seven standard normal and seven standard Cauchy observations. They were not made up. There are seven simulated standard normal and seven simulated standard Cauchy observations in the dataset. The 14 observations are reported in ascending order. You are not told which are the Cauchy observations. Identify the seven Cauchy (and by default, the seven normal) observations. You do not have to give a reason, but a reasoned answer would be more satisfactory. For each observation whose distribution you identify correctly, you will get +1 point, and for each misidentified observation, you will get −1 point. You cannot leave any of the 14 observations unidentified. Your score is the total of your 14 points. The top three answers are those with the top three scores. In the case of ties, we will treat the tied answers equally.
Here is the dataset: {−50.64, −6.41, −1.39, −0.72, −0.70, −0.16, −0.11, 0.24, 0.92, 1.01, 1.17, 1.75, 6.65, 12.42}

Solution to Puzzle 48

Deborshi Das

Congratulations to Deborshi Das [pictured right] from Indian Statistical Institute, Delhi, for a sterling example of precision, lucidity and completeness.

Anirban DasGupta explains:

Puzzle 48.1

If we denote the three vertices as $A, B, C$, then the prototypes of connected graphs are $A \sim B \sim C \not\sim A$, and $A \sim B \sim C \sim A$. So the probability that $G(3,p)$ is connected is ${3 \choose 2}\, p^2\,(1-p)+p^3 = 3\,p^2 – 2\,p^3$.

For the next part, write the total number of isolated vertices as the sum of the indicators of vertex $i$ being isolated. The expectation of each of these indicators is obviously $(1-p)^{n-1}$ and so the expected number is $n\,(1-p)^{n-1}$. These indicators are not independent. But their covariances can be easily calculated.Then a little bit of algebra gives that the variance of the number of isolated vertices is
\[ n\,(1-p)^{n-1}+ 2\,{n \choose 2}\,(1-p)^{2n-3}-n^2\,(1-p)^{2n-2}. \]

Moving on to the next part, by elementary calculus, $n\,(1-\frac{c\log n}{n})^{n-1} \sim n^{1-c}$ in the usual sense of this notation. It follows that if $0 < c <1$, then the probability that the number of isolated vertices is zero converges to zero, and so the connectedness probabilty must also converge to zero.

If $c$ is exactly equal to one, the expected value as well as the variance of the number
of isolated vertices converge to $1$. One would feel tempted to conjecture that the number of isolated vertices has a limiting Poisson distribution with mean one, without requiring any further centering or norming. This is actually true, but requires quite a bit of calculation. The most direct proof is to use the factorial moment method for proving weak convergence to Poissonity. A reference is Galambos and Simonelli (1996), pp 137.

Finally the last part follows simply from the fact that if the expectation of a sequence of non-negative random variables converges to zero, then the sequence converges to zero in probability.

Puzzle 48.2

We have various ways to choose a likelihood function to address this problem. A full likelihood would count all the edges that formed. This is a binomial, but identifying all the edges may be difficult for large $n$. One can also sacrifice exactness for simplicity of computation. For example, observe which of the realizations are connected. The connectedness probability can actually be written. Then you get another likelihood function. There are other possibilities too, evidently. Whichever likelihood function one uses, $p$ can be estimated by maximum likelihood, for instance.

The second part, (b), is an interesting question. Basically, one would like to use a graph metric and minimize the metric distance between $G(n,p)$ and $G_0$. There are very interesting ways to attack this problem. For example, one can look at the adjacency matrices of $G(n,p)$ and $G_0$ and compute a distance between them by using a suitable matrix norm. This will involve serious computation. One may use other graph distances to lessen the computation. You can get some ideas of what distances you can use in Wolfram Language Documentation.