Contributing Editor Anirban DasGupta sets this problem. Student members of the IMS are invited to submit solutions (to with subject “Student Puzzle Corner”). The deadline is September 4, 2017.

After a relaxed rendezvous with effulgent nothingness, we should now seriously get back to the problem corner. It is the turn of a problem in statistics.
We will pose a problem on deconvolution, sometimes brandished as noisy data. The basic model is that you get to observe a random variable $Y$ which has the distribution of the convolution of $X$ and $Z$, it being usually assumed that $Z$ has a completely known distribution, while the distribution of $X$ has unknown parameters, perhaps infinite dimensional, associated with it. We would want to infer about the distribution of $X$, knowing only $Y$; often, it is assumed that iid replicates of $Y$ are available. There is massive literature on deconvolution, particularly Gaussian deconvolution. Generally, the results are asymptotic in some sense. The problem we describe today was originally posed by C.R. Rao in 1952.

Here is the exact problem of this issue.

Suppose $X \sim \mbox{Bin}(n_1, 1/2)$ and $Z \sim \mbox{Bin}(n_2, p)$, $0 < p <1$ being an unknown parameter; $X$ and $Z$ are assumed to be independent. Due to (spatial) aggregation, we can only observe $Y = X+Z$.

(a) Is there always an MLE of $p$?

(b) In suitable asymptotic paradigms, are there consistent estimators of $p$ based on $Y$ alone?

(c) How does one construct a confidence interval for $p$, again, based on $Y$ alone?

(d) What can be said about minimax estimation of $p$ on the basis of $Y$, using squared error loss?

Solution to Puzzle 17

The problem asked was about the first hitting time $\tau $ of the set of prime numbers by the process $\{S_n\}, S_n$ being the sum of the first $n$ rolls in an infinite sequence of rolls of an honest die. Evidently, $P(\tau < \infty ) > 0$. The set of primes $\mathcal{P}$ is an infinite set, and in fact, with probability one, (the process) $S_n$ will hit $\mathcal{P}$ infinitely many times (see, e.g. Feller, Vol. 2, pp 360, 381). We know from classic number theory that for fixed $n$, large, $P(S_n \in \mathcal{P}) \approx \frac{1}{\log n}$, and since the series $\sum_{n = 2}^\infty \frac{1}{\log n}$ diverges, the expected number of hits is easily $+\infty $. The tailsum formula tells us

$E(\tau ) = 1 + \sum_{n = 1}^\infty P(\tau > n) = 1 + \sum_{n = 1}^\infty P(\cap _{k = 1}^n S_k \not\in \mathcal{P})$,

and a geometric approximation by terminating the infinite series at one million terms gives $E(\tau ) \approx
7.6$. The initial terms $P(\tau > k)$ can be calculated exactly. By summing $P(\tau > k)$ from $k = 0$ to $9$, we get $E(\tau ) > 2.34 > 7/3$.

It is less clear if $\tau $ has a finite variance, but it probably does. To my knowledge, the answer is not explicitly known.

Brown, L. (private communication)
Davis, B. (private communication)
Feller, W. (1971)
Pitman, J. (private communication; provided an approximate value of 7.62 for $E(\tau )$)
Rao, C.R. (1952)
Szekely, G. (private communication)