Contributing Editor Anirban DasGupta examines the sometimes unpleasant reasons behind the widely varying ratios of male and female populations around the world, and considers the role of statistics in detecting patterns:

It was observed at least 300 years ago that, for reasons that are not fully understood, the biology of human reproduction leads to a slightly uneven sex ratio at birth. The term “sex ratio” is defined here as $\frac{p}{1-p}$, where $p$ is the probability that a newborn child will be a boy. In most populations, $p$ is about $\frac{21}{41}$, which amounts to saying that 105 boys are born per 100 girls. This gives a sex ratio of 1.05. (We are looking here at male and female gender, but acknowledge that, depending upon which definition is used, between 0.1% and 1.7% of live births are intersex.)

Gender at conception is influenced by numerous factors, including but not limited to the preponderance of Y chromosomes over X among the fathers in a given population, the age of the father or mother, ethnicity, and order of birth of the child. Despite these and other scientific explanations for the natural imbalance in the sex ratio at birth, it is worth noting that in a significant number of countries of the world, the sex ratio far exceeds the 1.05 value. One would wonder, and should ask, why?

This is, in fact, a fairly old question. Nobel Laureate Amartya Sen wrote a famous article in The New York Review of Books in 1990 in which he estimated that, in Asia alone, more than 100 million women are “missing.” This means that the actual number of women in the population at large falls more than 100 million short of the expected value.

A large body of subsequent work focusing on the sociological, economic and medical aspects of the “missing women” phenomenon now exists. We can look at the UN data on the sex ratio for the period 2010–15 for a sample of 36 countries of the world and perform, for each, the Wald test for the null hypothesis $H_0: p = \frac{21}{41}$ against the one-sided alternative $H_1 : p > \frac{21}{41}$. The $z$-values are astoundingly large for a number of countries.

We will briefly touch on possible explanations for these staggering significance levels. The countries and corresponding data are listed in the table. [The $z$-values are not reported when they are less than or equal to 0. Computing the $z$-value requires the number of births, $n$, for each country; we take the UN figure for the number of births in 2011 and multiply it by 5, because our sex ratios are for a five-year window combined. The table lists $n$ in units of 5000 births.]

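| Country | Sex ratio | $n$ ($\times 5000$) | $z$-value |
|---|---|---|---|
| Australia | 1.06 | 310 | 5.9 |
| Afghanistan | 1.06 | 1410 | 12.58 |
| Bahrain | 1.04 | 23 | |
| Bangladesh | 1.05 | 3015 | |
| Bhutan | 1.04 | 15 | |
| Brazil | 1.05 | 3000 | |
| Canada | 1.05 | 390 | |
| China | 1.15 | 16400 | 411.03 |
| Costa Rica | 1.05 | 75 | |
| Cuba | 1.06 | 110 | 3.51 |
| Egypt | 1.07 | 1885 | 28.95 |
| France | 1.05 | 790 | |
| Greece | 1.06 | 115 | 3.59 |
| India | 1.1 | 27100 | 270.47 |
| Ireland | 1.06 | 70 | 2.8 |
| Italy | 1.06 | 560 | 7.93 |
| Japan | 1.06 | 1075 | 10.98 |
| Kuwait | 1.05 | 50 | |
| Mexico | 1.05 | 2200 | |
| Myanmar | 1.03 | 825 | |
| Nepal | 1.05 | 720 | |
| Norway | 1.05 | 60 | |
| Pakistan | 1.09 | 4765 | 91.17 |
| Qatar | 1.05 | 20 | |
| Republic of Korea | 1.06 | 350 | 6.27 |
| Russian Federation | 1.06 | 1690 | 13.77 |
| Saudi Arabia | 1.03 | 605 | |
| Singapore | 1.07 | 45 | 4.47 |
| Sri Lanka | 1.04 | 375 | |
| Sudan | 1.04 | 1450 | |
| Switzerland | 1.05 | 75 | |
| Thailand | 1.06 | 825 | 9.62 |
| UAE | 1.05 | 95 | |
| Uganda | 1.03 | 1545 | |
| UK | 1.05 | 760 | |
| USA | 1.05 | 4320 | |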

The five largest $z$-values are 411 (China), 270 (India), 91 (Pakistan), 29 (Egypt) and 14 (Russia); their $p$-values are immeasurably small.
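The $z$-values in the table can be checked with a few lines of Python. The sketch below is an illustration, not necessarily the computation used for this column. The function name `z_value` is hypothetical. It assumes that the standard error is evaluated at the null value $p_0 = \frac{21}{41}$, a choice that appears to reproduce the tabulated figures; evaluating the standard error at $\hat{p}$ instead would give slightly different numbers for the largest entries. The estimate $\hat{p}$ is recovered from the reported sex ratio $r$ by inverting $r = \frac{p}{1-p}$, that is, $\hat{p} = \frac{r}{1+r}$.

```python
from math import sqrt

P0 = 21 / 41  # null value of p, equivalent to a sex ratio of 1.05


def z_value(sex_ratio: float, n_births: float, p0: float = P0) -> float:
    """z statistic for H0: p = p0 against the one-sided alternative p > p0."""
    # Invert ratio = p / (1 - p) to recover the estimated probability of a boy.
    p_hat = sex_ratio / (1 + sex_ratio)
    # Assumption: standard error evaluated at the null value p0.
    se0 = sqrt(p0 * (1 - p0) / n_births)
    return (p_hat - p0) / se0


# Rows copied from the table: (sex ratio, n in units of 5000 births).
rows = {
    "China": (1.15, 16400),
    "India": (1.10, 27100),
    "Pakistan": (1.09, 4765),
    "Singapore": (1.07, 45),
}

for country, (ratio, n_units) in rows.items():
    print(country, round(z_value(ratio, n_units * 5000), 2))
```

Because the reported sex ratios are rounded to two decimals, any country listed at exactly 1.05 gives a $z$-value of essentially zero, which is why so many rows in the table have no entry.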

It is well known that the sex ratio at puberty can be seriously imbalanced due to the treatment of girls as essentially second-class citizens: in some populations, they are deliberately given lesser medical care, poor nutrition, responsibilities for hard physical work, and little or no access to education. But the sex ratios reported in this column are the ratios at birth. One must wonder why the $z$-values are so large for several countries.

Some possible explanations are:
a) gender-selective abortions following a cheaply available ultrasound;
b) for births at home, elimination of girls immediately after birth, known as witch killing;
c) incomplete or incorrect data given to the UN by some countries;
d) some other genetic reason that makes $p > \frac{21}{41}$ in these countries.

Female infanticide is a repugnant practice but sadly not a new one: Darwin elaborated on it in 1871.

I close with a few comments on the mathematical treatment of this problem. Can we diagnose female infanticide using rigorous statistical methods and suitable data? It turns out that we probably can. If $p$ is the natural probability for a male child and $\theta$ the probability that a female child will be eliminated or aborted, then we have a three-cell multinomial (boys, girls who are recorded, and girls who are eliminated or aborted), with the balls in the last cell being unobservable. The model is identifiable if $p$ is known, and unidentifiable otherwise. A rigorous likelihood theory is possible, and $\theta$ can be estimated as long as $p$ is known; one has to solve a cubic. The Wald or the score interval is computable. If we have data on the full sequence of births, e.g., BGBG (i.e., boy–girl–boy–girl), we can also devise test statistics for whether certain important patterns occur more frequently than would be normal. As an example, female infanticide is more common among lower-order births. Thus, if we see a pattern such as GBGBBB occur frequently in six-child families, we have a marker for female infanticide. Run statistics also give useful tests.
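To get a rough feel for the size of $\theta$, one can work with recorded births only. Under the model above a recorded birth is a boy with probability $\frac{p}{1-(1-p)\theta}$, so the observed sex ratio is $\frac{p}{(1-p)(1-\theta)}$, and solving for $\theta$ gives $\theta = 1 - \frac{p/(1-p)}{\text{observed ratio}}$. The sketch below implements this simplified plug-in calculation. It is not the full likelihood analysis mentioned above (which requires solving a cubic). The function name `implied_theta` is hypothetical, and it assumes $p = \frac{21}{41}$ is known.

```python
P0 = 21 / 41  # assumed known natural probability of a boy


def implied_theta(observed_ratio: float, p: float = P0) -> float:
    """Plug-in value of theta implied by an observed sex ratio at birth.

    Conditions on recorded births only: the observed ratio equals
    p / ((1 - p) * (1 - theta)), which is solved here for theta.
    """
    natural_ratio = p / (1 - p)  # 1.05 when p = 21/41
    theta = 1 - natural_ratio / observed_ratio
    # Ratios at or below the natural value carry no evidence of elimination.
    return max(theta, 0.0)


# Sex ratios taken from the table above.
for country, ratio in {"China": 1.15, "India": 1.10, "Pakistan": 1.09}.items():
    print(country, round(implied_theta(ratio), 3))
```

For a reported ratio of 1.15, for instance, this gives $\theta = 1 - \frac{1.05}{1.15} \approx 0.087$. The figure is only as reliable as the reported ratio and the assumed value of $p$, and it cannot distinguish among the possible explanations listed earlier.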

Of course, I do not offer social or political solutions to such a widespread and chronic problem. But if these apparently very large $z$-values do have something to do with female infanticide, I hope in a small way I have helped IMS members to be more aware of this abhorrent monstrosity.