Congratulations to Mirza Uzair Baig at the University of Hawai’i at Mānoa, who wrote an excellent solution to the problem.

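Note that the statistic $T_n$ may be represented as

$$
\begin{aligned}
T_n ={}& I_{\{Y_{(1)} < X_{(1)},\; Y_{(n)} < X_{(n)}\}} \left[ \sum_{i=1}^{n} I_{\{Y_i < X_{(1)}\}} + \sum_{i=1}^{n} I_{\{X_i > Y_{(n)}\}} \right] \\
&+ I_{\{X_{(1)} < Y_{(1)},\; X_{(n)} < Y_{(n)}\}} \left[ \sum_{i=1}^{n} I_{\{X_i < Y_{(1)}\}} + \sum_{i=1}^{n} I_{\{Y_i > X_{(n)}\}} \right].
\end{aligned}
$$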

Denote the empirical CDF of $X_1, \dots, X_n$ by $F_n$ and that of $Y_1, \dots, Y_n$ by $G_n$. Then the representation above yields

$$
\begin{aligned}
T_n ={}& n\, I_{\{Y_{(1)} < X_{(1)},\; Y_{(n)} < X_{(n)}\}} \left[ G_n(X_{(1)}) + 1 - F_n(Y_{(n)}) \right] \\
&+ n\, I_{\{X_{(1)} < Y_{(1)},\; X_{(n)} < Y_{(n)}\}} \left[ F_n(Y_{(1)}) + 1 - G_n(X_{(n)}) \right].
\end{aligned}
$$

Use the fact that, for given $u, v$, $nF_n(u)$ and $nG_n(v)$ are binomial random variables with success probabilities $F(u)$ and $G(v)$. Now use the iterated expectation formula, conditioning on the minima and the maxima, to get the mean, and similarly, but with a longer calculation, the variance.

It is useful to think of $T_n$ as approximately a sum of two geometrics. Suppose $W$ is a negative binomial with parameters $r = 2$, $p = \tfrac{1}{2}$, counting the failures before the second success. Then for $n$ not too small, $T_n$ would have a point mass at zero mixed with the negative binomial. That is, write down a Bernoulli variable $Z$ with parameter $\tfrac{1}{2}$, independent of $W$; then $T_n$ (in law) is approximately $Z(W+2)$. This gives a quick explanation for why the mean and the variance of $T_n$ under the null should be about 2 and 6: $W + 2$ has mean 4 and variance 4, so $E[Z(W+2)] = \tfrac{1}{2} \cdot 4 = 2$ and $\operatorname{Var}[Z(W+2)] = \tfrac{1}{2}(4 + 16) - 2^2 = 6$.

You can see a plot below of the null distribution of $T_n$ when $n = 300$; $T_n$ is distribution-free in the usual sense.
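The approximate null mean and variance can also be checked by simulation. The following is a minimal sketch, not part of the submitted solution: the function names `tukey_statistic` and `simulate_null` are hypothetical, and both samples are drawn from the uniform distribution on (0, 1), which is an assumption made for convenience and should not matter if the statistic is distribution-free under the null.

```python
import random
from statistics import mean, pvariance


def tukey_statistic(x, y):
    """Compute T_n from the indicator representation, assuming no ties."""
    x_min, x_max = min(x), max(x)
    y_min, y_max = min(y), max(y)
    if y_min < x_min and y_max < x_max:
        # Y's below every X, plus X's above every Y.
        return sum(v < x_min for v in y) + sum(v > y_max for v in x)
    if x_min < y_min and x_max < y_max:
        # X's below every Y, plus Y's above every X.
        return sum(v < y_min for v in x) + sum(v > x_max for v in y)
    # One sample contains both the overall minimum and the overall maximum.
    return 0


def simulate_null(n, reps, seed=0):
    """Draw `reps` values of T_n with both samples i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x = [rng.random() for _ in range(n)]
        y = [rng.random() for _ in range(n)]
        draws.append(tukey_statistic(x, y))
    return draws


if __name__ == "__main__":
    draws = simulate_null(n=300, reps=5000)
    print("mean:", mean(draws))
    print("variance:", pvariance(draws))
    print("P(T_n = 0):", draws.count(0) / len(draws))
```

With a few thousand replications, the printed mean and variance should land near 2 and 6, and the proportion of zeros near one half, in line with the $Z(W+2)$ heuristic.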

Under specified alternatives, the negative binomial would be replaced by a sum of two geometrics, approximately independent, but not i.i.d.

The next puzzle, number 21, is here. Can you solve it? Send us your answer by September 7.