Deadline July 1, 2020.
Contributing Editor Anirban DasGupta writes:
After our last puzzle with probabilities on hyperspheres [see solution below], it is now time to turn our thoughts again to something in statistics. This time it’s a problem in epidemiology. The problem is deliberately left incompletely formulated. A correct solution involves identifying all the parameters, then formulating and answering the question in terms of the model parameters. Be careful: the values of your parameters may be known!
In a certain state in a country, each of m families has k members. We assume m and k to be given to us. Suppose a total of X residents of the state are found to have contracted an infectious viral disease; we assume X to be observable. Suppose these X infected residents come from a total of Y different families. Thus, Y denotes the number of families in the state affected by this virus; data on Y, unfortunately, is not available.
(a) Find a closed form expression for E(Y).
(b) Hence, or otherwise, provide a statistical estimate for Y.
(c) This is more complex: write an expression for the probability mass function of Y.
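Before attempting a closed form, it can help to simulate one candidate model and look at E(Y) and the distribution of Y numerically. The sketch below assumes the X infected residents form a simple random sample, drawn without replacement, from the mk residents. That is only one possible formulation, and the puzzle deliberately leaves the modelling open. The function names `simulate_Y` and `monte_carlo_Y` are illustrative, not part of the puzzle.

```python
import random
from collections import Counter


def simulate_Y(m, k, x, rng):
    """One draw of Y under an ASSUMED model: x infected residents are a
    simple random sample (without replacement) of the m*k residents."""
    # Resident r (0 <= r < m*k) belongs to family r // k.
    infected = rng.sample(range(m * k), x)
    # Y = number of distinct families containing at least one infected resident.
    return len({r // k for r in infected})


def monte_carlo_Y(m, k, x, n_sims=100_000, seed=0):
    """Monte Carlo estimates of E(Y) and of the pmf of Y (requires x <= m*k)."""
    rng = random.Random(seed)
    draws = [simulate_Y(m, k, x, rng) for _ in range(n_sims)]
    mean = sum(draws) / n_sims
    pmf = {y: c / n_sims for y, c in sorted(Counter(draws).items())}
    return mean, pmf


if __name__ == "__main__":
    mean, pmf = monte_carlo_Y(m=50, k=4, x=30)
    print(f"estimated E(Y) = {mean:.3f}")
    for y, p in pmf.items():
        print(y, round(p, 4))
```

Under any model, Y must lie between ⌈X/k⌉ and min(X, m), which gives a quick sanity check on the simulated pmf. A model in which infections cluster within families would shift the distribution of Y toward smaller values.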
Solution to Puzzle 28
Contributing Editor Anirban DasGupta writes on the previous problem, which was about probabilities on hyperspheres:
Congratulations to Andrew Thomas, who is a PhD student in the Department of Statistics at Purdue University. Andrew sent a very detailed, rigorous and well-written solution.
Suppressing the dimension
Fortunately, one can integrate in closed form and get
As
Hence, by using Stirling’s approximation:
Next, again using the expression for the density, we get that the
which on using the Gamma duplication formula simplifies to
the
If we instead compute the expected geodesic distance between