Following “guest puzzler” Stanislav Volkov’s rotating wheel probability puzzle (solution below), Anirban DasGupta sets a statistics puzzle:

This is one of those quick-and-dirty methods popularized by John Tukey: it makes some intuitive sense and can be implemented very quickly. This issue’s problem is about testing the equality of two absolutely continuous distributions on the real line. You may not have seen this pocket test before. Here is the exact problem.

Based on iid picks $X_1, \dots, X_n$ from an absolutely continuous distribution $F$ and independent iid picks $Y_1, \dots, Y_n$ from a possibly different absolutely continuous distribution $G$, we propose a test statistic for testing $H_0 : F = G$; as stated above, $F$ and $G$ are distributions on the real line. Arrange the combined sample in ascending order, and suppose the overall sample maximum comes from the $X$-sample (that is, from $F$) and the overall sample minimum comes from the $Y$-sample (that is, from $G$). Count the number of $X$-values larger than the largest $Y$-value, and also count the number of $Y$-values smaller than the smallest $X$-value. The test statistic $T_n$ is the sum of these two extreme run counts. If the overall sample maximum and the overall sample minimum both come from the same sample, define $T_n$ to be zero.
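As a concrete reading of the definition, here is a minimal Python sketch that computes $T_n$ from two samples. The function name `extreme_runs_statistic` is ours (hypothetical), and the case in which the maximum comes from $G$ and the minimum from $F$ is handled by swapping the roles of the two samples; the statement only spells out one orientation, so that symmetric rule is an assumption. Ties are ignored, since they occur with probability zero for absolutely continuous distributions.

```python
from typing import Sequence


def extreme_runs_statistic(x: Sequence[float], y: Sequence[float]) -> int:
    """Sum of the two extreme run counts T_n (hypothetical helper name).

    Assumes no ties, which holds with probability one for absolutely
    continuous F and G.
    """
    max_x, min_x = max(x), min(x)
    max_y, min_y = max(y), min(y)

    if max_x > max_y and min_y < min_x:
        # Overall maximum is an X, overall minimum is a Y:
        # count X's above the largest Y plus Y's below the smallest X.
        return sum(v > max_y for v in x) + sum(v < min_x for v in y)

    if max_y > max_x and min_x < min_y:
        # Assumed symmetric case: overall maximum is a Y, minimum is an X.
        return sum(v > max_x for v in y) + sum(v < min_y for v in x)

    # Maximum and minimum come from the same sample: T_n = 0 by definition.
    return 0


# Two X's exceed max(Y) = 2.0 and two Y's fall below min(X) = 1.5.
print(extreme_runs_statistic([1.5, 5.0, 6.0], [0.0, 1.0, 2.0]))
```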

a) Give theoretical values or theoretical approximate values for the mean and the variance of $T_n$ under the null.

b) Give theoretical approximations to cut-off values for rejecting the null based on the test statistic $T_n$. This is close to asking for theoretically justified approximations to the null distribution of $T_n$.

c) Is this test distribution-free in the usual sense?

d) What would be the approximate power of this test at level $0.05$ if $F = N(1, 1)$, $G = N(0, 1)$ and $n = 100$? Be careful about the rejection region.

Solution to Student Puzzle 19

We received correct solutions to Stanislav Volkov’s puzzle from Mirza Uzair Baig from the University of Hawai’i at Mānoa, Jiashen Lu from the University of Pittsburgh, and Benjamin Stokell from the University of Cambridge. Well done!


Stanislav explains:

Observe that the required probability equals
\begin{align*}
x := \mathbb{P}(Y_\infty=0 \mid Y_0=0)
&= \sum_{k=0}^\infty \mathbb{P}(\text{the wheel rotates } 4k \text{ times})\\
&= \sum_{k=0}^\infty \sum_{j_1,\dots,j_{4k}}
p_{j_1}\cdots p_{j_{4k}}\prod_{\ell\notin \{j_1,\dots,j_{4k}\}} (1-p_\ell),
\end{align*}
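where the inner sum runs over all increasing tuples $j_1<j_2<\dots<j_{4k}$ of positive integers and $\ell$ ranges over the positive integers not in $\{j_1,\dots,j_{4k}\}$; for $k=0$ the inner sum has a single term, in which the empty product $p_{j_1}\cdots p_{j_{4k}}$ is taken to be $1$. This can be simplified by dividing through by $\prod_{j=1}^\infty (1-p_j)$, which turns each term into a product of odds: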
\begin{align*}
\frac{x}{\prod_{j=1}^\infty (1-p_j)}
=
\sum_{k=0}^\infty \sum_{j_1<\dots<j_{4k}}
\rho_{j_1}\cdots \rho_{j_{4k}}=:S,
\end{align*}
where $\rho_j=p_j/(1-p_j)$ is the $j$-th odds.
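Now we are going to use a little trick, namely the identity
\begin{align*}
\prod_{j=1}^\infty (1 + \nu \rho_j) =
\sum_{k=0}^\infty \nu^k \sum_{j_1<\dots<j_k} \rho_{j_1}\cdots\rho_{j_k},
\end{align*}
which follows by expanding the product: each term picks either $1$ or $\nu\rho_j$ from every factor.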
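The identity is the generating function of the elementary symmetric polynomials $e_k=\sum_{j_1<\dots<j_k}\rho_{j_1}\cdots\rho_{j_k}$. For a finite list of odds it can be checked numerically; the Python sketch below (all helper names are ours) computes the $e_k$ by the standard one-factor-at-a-time recursion, cross-checks them against the definition, and compares $\sum_k \nu^k e_k$ with $\prod_j(1+\nu\rho_j)$ for a complex $\nu$.

```python
from itertools import combinations
from math import prod


def elementary_symmetric(rhos):
    """Return [e_0, ..., e_m]: e_k sums rho_{j_1}...rho_{j_k} over j_1 < ... < j_k."""
    e = [1.0] + [0.0] * len(rhos)
    for r in rhos:
        # Multiply sum_k e_k t^k by (1 + r t); go from the top coefficient down
        # so that each factor contributes at most once to every e_k.
        for k in range(len(e) - 1, 0, -1):
            e[k] += r * e[k - 1]
    return e


def brute_force_elementary(rhos, k):
    """e_k computed directly from the definition, over all k-element index sets."""
    return sum(prod(c) for c in combinations(rhos, k))


def check_identity(rhos, nu):
    """Return (prod_j (1 + nu*rho_j), sum_k nu^k e_k) for a finite list rhos."""
    lhs = prod(1 + nu * r for r in rhos)
    rhs = sum(nu**k * ek for k, ek in enumerate(elementary_symmetric(rhos)))
    return lhs, rhs


print(check_identity([0.5, 0.25, 1 / 6, 0.1], 1j))
```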
Summing this product identity over $\nu=1$, $i$, $i^2=-1$ and $i^3=-i$, where $i=\sqrt{-1}$, we get
\begin{align*}
\prod_{j=1}^\infty (1 + \rho_j)
+\prod_{j=1}^\infty (1 + i \rho_j)
+\prod_{j=1}^\infty (1 - \rho_j)
+\prod_{j=1}^\infty (1 - i \rho_j)
=4S
\end{align*}
since
$$
1^k+i^k+(-1)^k+(-i)^k=\begin{cases}
4,&\text{if } k\bmod 4=0,\\
0,&\text{otherwise}.
\end{cases}
$$
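This case split is the roots-of-unity filter: averaging a polynomial over $\nu = 1, i, -1, -i$ keeps exactly the coefficients whose index is a multiple of $4$. A small Python sketch (function names are ours) makes this concrete for the power sums and for an arbitrary coefficient list.

```python
import cmath


def power_sum(k, M=4):
    """Return sum_{m=0}^{M-1} (omega^m)^k with omega = exp(2*pi*i/M)."""
    omega = cmath.exp(2j * cmath.pi / M)
    return sum(omega ** (m * k) for m in range(M))


def filter_multiples_of_four(coeffs):
    """(1/4) * sum over nu in {1, i, -1, -i} of sum_k coeffs[k] * nu**k.

    This equals the sum of coeffs[k] over indices k divisible by 4.
    """
    nus = [1, 1j, -1, -1j]
    return sum(sum(c * nu**k for k, c in enumerate(coeffs)) for nu in nus) / 4


print([round(power_sum(k).real, 12) for k in range(8)])
print(filter_multiples_of_four([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```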
Consequently,
$$
x=\frac 14
\prod_{n=1}^\infty (1-p_n)
\left[
\prod_{n=1}^\infty (1 + \rho_n)
+\prod_{n=1}^\infty (1 + i \rho_n)
+\prod_{n=1}^\infty (1 - \rho_n)
+\prod_{n=1}^\infty (1 - i \rho_n)\right].
$$
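Because the identity is algebraic, it already holds exactly when only the first $N$ steps are kept, so the formula can be sanity-checked numerically. The Python sketch below is our own check, not part of Stanislav's solution: it assumes, as in the derivation, that rotations at steps $n=1,\dots,N$ occur independently with probabilities $p_n$, and compares the formula truncated at $N$ with a direct recursion on the number of rotations modulo $4$. The choice $p_n = 1/(2n^2+1)$ is just the case mentioned next; any probabilities in $[0,1)$ would do.

```python
from math import prod


def formula_truncated(p):
    """(1/4) prod(1 - p_n) * sum over nu in {1, i, -1, -i} of prod(1 + nu*rho_n)."""
    rho = [q / (1 - q) for q in p]
    total = sum(prod(1 + nu * r for r in rho) for nu in (1, 1j, -1, -1j))
    return (prod(1 - q for q in p) * total / 4).real


def direct_mod4(p):
    """P(number of rotations among the given steps is a multiple of 4)."""
    dist = [1.0, 0.0, 0.0, 0.0]  # dist[r] = P(rotations = r mod 4); start at 0
    for q in p:
        # Either no rotation (stay at r) or one rotation (move from r-1 to r).
        dist = [(1 - q) * dist[r] + q * dist[(r - 1) % 4] for r in range(4)]
    return dist[0]


p = [1 / (2 * n * n + 1) for n in range(1, 2001)]
print(formula_truncated(p), direct_mod4(p))
```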
Finally, in the case $p_n=\frac 1{2n^2+1}$, the infinite products can be evaluated in closed form using, e.g., formulae 4.5.68–69 from “Handbook of Mathematical Functions” by Abramowitz and Stegun.
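One way to carry out that last step, under our reading of the hint: with $p_n = 1/(2n^2+1)$ we have $\rho_n = 1/(2n^2)$ and $1-p_n = 1/(1+\rho_n)$, so every product has the form $\prod_{n\ge1}(1+c/n^2)$ with $c=\nu/2$. Formula 4.5.68, $\sinh z = z\prod_{k\ge1}\bigl(1+z^2/(k^2\pi^2)\bigr)$, with $z=\pi\sqrt c$ gives $\prod_{n\ge1}(1+c/n^2)=f(c):=\sinh(\pi\sqrt c)/(\pi\sqrt c)$; the right side is even in $\sqrt c$, so the branch of the square root does not matter. Hence
$$
x=\frac{f(1/2)+f(i/2)+f(-1/2)+f(-i/2)}{4\,f(1/2)}.
$$
The Python sketch below (function names are ours) compares this expression with the products truncated at a large $N$.

```python
import cmath
from math import prod

NUS = (1, 1j, -1, -1j)


def sinh_ratio(c):
    """f(c) = sinh(pi*sqrt(c)) / (pi*sqrt(c)), which equals prod_{n>=1} (1 + c/n^2)."""
    z = cmath.pi * cmath.sqrt(c)
    return cmath.sinh(z) / z


def x_closed_form():
    """x from the sinh product formula, assuming p_n = 1/(2n^2 + 1)."""
    return (sum(sinh_ratio(nu / 2) for nu in NUS) / (4 * sinh_ratio(0.5))).real


def x_truncated(N):
    """x from the products over n = 1..N, assuming p_n = 1/(2n^2 + 1)."""
    rho = [1 / (2 * n * n) for n in range(1, N + 1)]
    # Here prod(1 - p_n) = 1 / prod(1 + rho_n).
    num = sum(prod(1 + nu * r for r in rho) for nu in NUS)
    return (num / (4 * prod(1 + r for r in rho))).real


print(x_closed_form(), x_truncated(100_000))
```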

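Note that this method generalizes easily to a wheel with any number $M\ge 2$ of positions: replace the primitive fourth root of unity $i$ by the primitive $M$-th root of unity $\omega=e^{2\pi i/M}$, sum over $\nu=1,\omega,\dots,\omega^{M-1}$, and replace the factor $\frac14$ by $\frac1M$.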
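For the general case, a hedged Python sketch (names are ours) replaces the four roots $1,i,-1,-i$ by the $M$ roots $\omega^m$, $\omega=e^{2\pi i/M}$, and the factor $\frac14$ by $\frac1M$, that is, $x_M=\frac1M\prod_n(1-p_n)\sum_{m=0}^{M-1}\prod_n(1+\omega^m\rho_n)$. It checks the result against a direct recursion on the number of rotations modulo $M$, for a finite list of probabilities and several values of $M$.

```python
import cmath
from math import prod


def formula_general(p, M):
    """(1/M) prod(1 - p_n) * sum_{m<M} prod(1 + omega^m rho_n), omega = e^{2 pi i/M}."""
    omega = cmath.exp(2j * cmath.pi / M)
    rho = [q / (1 - q) for q in p]
    total = sum(prod(1 + omega**m * r for r in rho) for m in range(M))
    return (prod(1 - q for q in p) * total / M).real


def direct_general(p, M):
    """P(number of rotations is a multiple of M), by recursion on the residue mod M."""
    dist = [1.0] + [0.0] * (M - 1)
    for q in p:
        # No rotation keeps the residue; one rotation shifts it from r-1 to r.
        dist = [(1 - q) * dist[r] + q * dist[(r - 1) % M] for r in range(M)]
    return dist[0]


p = [1 / (2 * n * n + 1) for n in range(1, 501)]
for M in (2, 3, 4, 5):
    print(M, formula_general(p, M), direct_general(p, M))
```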