Anirban DasGupta sets two puzzles, one on probability and one on statistics, about Erdős–Rényi graphs, among the most basic models of random graphs. Random graphs are used to model network data. Take a vertex set equal to $\{1, 2, \cdots , n\}$ where $n \geq 2$. Given two distinct vertices $u, v$ we write $u \sim v$ if there is an edge between $u$ and $v$. In the Erdős–Rényi model, the ${n \choose 2}$ variables $I_{u \sim v}$ are taken to be i.i.d. Bernoulli random variables with a common parameter $p$, and we write a realization as $G(n,p)$.

Puzzle 48.1a Show that $P(G(3,p) \,\mbox{is connected}) = 3p^2-2p^3$.

48.1b We call a vertex isolated if its degree is zero. Find a formula for the mean and the variance of the number of isolated vertices in $G(n,p)$.

48.1c Give an elementary proof that the number of isolated vertices in $G(n,p_n)$ converges in probability to zero if $p_n = c\,\frac{\log n}{n}$ with $c > 1$. Can you make a conjecture about the limiting behavior of the number of isolated vertices in $G(n,p_n)$ if $p_n = c\,\frac{\log n}{n}$ with $c = 1$?

48.1d Suppose $p = p_n = c\,\frac{\log n}{n}$ where $0 < c < 1$. What can we conclude about $\lim_{n \to \infty } P(G(n,p_n) \,\mbox{is connected})$?

 

Puzzle 48.2a Suppose you have obtained $N = 100$ independent realizations of a $G(n,p)$, where you know $n$, but you are not willing to assume a known value for $p$. Suggest a reasonable method to estimate $p$ based on your 100 realizations of $G(n,p)$. You can suggest more than one method if you wish.

48.2b Suggest a formulation for the following question: Find a best fitting $G(n,p)$ to a realized graph $G_0$ on $n$ vertices.


Solution to Puzzle 47

Congratulations once more to Bishakh Bhattacharya and Bilol Banerjee (both from ISI Kolkata), for their answers to 47.1 and 47.2, respectively. Anirban DasGupta explains:

Puzzle 47.1a We use the technique of superposition of independent homogeneous Poisson processes. Suppose an $m$-sided fair die is repeatedly rolled, and that the rolls are mutually independent. We consider $m$ independent homogeneous Poisson processes, each with an intensity equal to $\frac{1}{m}$. If a particular throw of the die results in face number $i$, the $i$th process has a jump at that time. The number of events in any given process up to time $x$ is a Poisson with mean $\frac{x}{m}$, and these are independent. Thus,

$E(W) = \int_0^\infty P(W > x)dx
= \int_0^\infty[1-(1-e^{-x/m}\, (1+x/m))^m]\,dx$

For $m = 6$, this equals $\frac{390,968,681}{16,200,000} = 24.134$.

We can calculate $E(W)$ for dice of other shapes. For a tetrahedral fair die, $E(W) = 14.189$; for an octahedral die, $E(W) = 34.885$, and for a dodecahedral die having 12 faces, $E(W) = 58.045$.

Problem 47.1b If instead we want the expected waiting time $Z$ till one of the $m$ faces occurs three times, then by using the same technique,

$E(Z) = \int_0^\infty P(Z > x)dx
= \int_0^\infty [e^{-x/m}\, (1+x/m+x^2/(2m^2))]^m\, dx$

For $m = 6$, this equals $\frac{4,084,571}{559,872} = 7.2955$.

Problem 47.2 One can use $W$ or $Z$ to write a test for fairness of the die. For instance, one can calculate the second moment of $W$ by evaluating

$2\,\int_0^\infty x\,[1-(1-e^{-x/m}\, (1+x/m))^m]\,dx$.

Hence one calculates both the mean and the variance of $W$. If the realized value of $W$ is $w$, we can compute a bound on either the two-sided or an one-sided $P$-value from the mean and the variance of $W$. Or, one may directly calculate the $P$-value. Similarly, tests can be constructed by using $Z$.