Contributing Editor Anirban DasGupta sets this problem. Student members of the IMS are invited to submit solutions (to bulletin@imstat.org with subject “Student Puzzle Corner”). The deadline is April 20, 2017.

After a long gap, we now resume the problem corner, and it is the turn of a problem on probability this time. The problem is at an interesting intersection of probability, analysis, and number theory.

Imagine that you are tossing an honest die repeatedly, and your score after the $n$th roll, say $S_n$, is the sum of the first $n$ rolls. This, of course, is an integer between $n$ and $6n$. Will $S_n$ ever be a prime number for some $n$? For infinitely many $n$? What can we say about how many rolls does it take for $S_n$ to be a prime number for the first time? Does it take just a few rolls? Is the expected waiting time finite? Can we give an approximate value for the expected waiting time? And so on.

Here is the exact problem of this issue:

Let $X_1, X_2, \cdots $ be iid discrete uniform on the set $\{1, 2, \cdots , 6\}$, and let for $n \geq 1, S_n = \sum_{i = 1}^n X_i$. Let $\mathcal{P}$ denote the set of prime numbers $\{2, 3, 5, 7, \cdots \}$, and $\tau = \inf \{n \geq 1: S_n \in \mathcal{P}\}$.

(a) Is $P(\tau < \infty ) >0$?

(b) If $P(\tau < \infty ) >0$, does it have to be $1$?

(c) Show that $E(\tau ) > \frac{7}{3}$.

(d) Is $E(\tau ) < \infty $?

(e) If $E(\tau ) < \infty $, give an approximate numerical value for it.

(f) Conjecture if the variance of $\tau $ is finite.

(g) Is $P(S_n \in \mathcal{P} \, \mbox{for
infinitely many}\, n) = 1$?

Note: Answer as many parts as you can; do not be disappointed if you cannot answer all the parts.

Solution to the previous puzzle:

Contributing Editor Anirban DasGupta writes:

The problem asked was to derive an asymptotically correct $100(1-\alpha )\%$ confidence interval for $F(\mu )$, given an iid sample $X_1, \cdots , X_n$ from a distribution with CDF $F$, finite mean $\mu $ and variance $\sigma ^2$, and a density $f(\mu )$ at $\mu $, in the sense that $F$ is differentiable at $\mu $ with a derivative $f(\mu )$. The mean and the variance are considered unknown, and no functional form of $F$ or $f$ is assumed.

The problem is not entirely simple; there is some literature on it. If we define the empirical process $G_n(t) = \sqrt{n}\,[F_n(t)-F(t)],$ where
$F_n(t) = \frac{1}{n}\,\sum_{i = 1}^n\,I_{X_i \leq t}$, then consider the decomposition $\sqrt{n}\,[F_n(\bar{X})-F(\mu )] = [G_n(\bar{X}) – G_n(\mu )] + \sqrt{n}\,[F_n(\mu ) – F(\mu )] + \sqrt{n}\,[F(\bar{X}) – F(\mu )]$.

By using the multivariate central limit theorem, the delta theorem, and the order of the oscillation of the empirical process $G_n(t)$ in small intervals, one can show that $\sqrt{n}\,[F_n(\bar{X})-F(\mu )] \stackrel {\mathcal{L}} {\Rightarrow } N(0, V(F))$, where $V(F) = F(\mu )(1-F(\mu )) + \sigma ^2\,f^2(\mu ) + 2\,f(\mu )\,E_F[(X-\mu )I_{X \leq \mu }]$. Construction of a confidence interval for $F(\mu )$ requires consistent estimation of $F(\mu )$, $\sigma ^2$, $E_F[(X-\mu )I_{X \leq \mu }]$, and $f(\mu )$. The first three are easily done. Consistent estimation of $f(\mu )$ can be done by using various standard density estimation methods, but because $\mu $ is considered unknown, the assumptions on $f$ are stronger than what one needs for pointwise consistent estimation of $f(x)$ at any known $x$. Alternatively, one can bootstrap $\sqrt{n}\,[F_n(\bar{X})-F(\mu )]$, and find bootstrap estimates of the variance or directly find quantiles of the bootstrap distribution.