This article is written by Chris Pankow (Northwestern University), Will M. Farr (University of Birmingham, UK) and Ben Farr (University of Chicago). Richard A. Davis, IMS Past President, explains: “Last February, former IMS Bulletin editor Anirban DasGupta wanted to publish an article about the role of statistics in the recent and stunning discovery of Einstein’s prediction on gravitational waves. I approached Andrew Gelman, who has a background in physics and is also the co-creator of the Bayesian program Stan that was used in some of the statistical calculations, to help. Andrew persuaded Chris Pankow, Will Farr, and Ben Farr to write this article. Thanks to Anirban for the great suggestion.”


A century ago, Einstein and others laid down the foundation of what is now known as general relativity: the theory of how mass and the curvature of spacetime interact. One prediction of this theory is the radiation of energy through gravitational waves (Einstein 1916) from accelerating massive systems such as astronomical binaries. While the effect is normally immeasurable for widely separated, non-compact systems like the Solar System, close binaries with black hole or neutron star components emit potentially detectable and distinct gravitational-wave signatures. Detection with laser interferometers, like the kilometer-scale instruments in Livingston, Louisiana, and Hanford, Washington, has been pursued because such instruments can measure the diminutive effect of passing waves: in the detections discussed here, the length of the 4 km interferometer arms varied by only ~10⁻¹⁸ m. Most searches for gravitational waves from compact binaries use models which are based on general relativity and parameterized by the physical properties of the binary (e.g., compact object masses, angular momenta, orientation, and position; usually more than 15 parameters). These models describe the effect of the wave impinging on a network of gravitational-wave detectors. To account for modeling uncertainty, or when a complete signal model is unavailable, more generic searches are employed which assume no particular signal morphology. In the latter half of 2015, the two LIGO interferometers (Abbott et al. 2016e) recorded several transient events from the mergers of binary black holes (Abbott et al. 2016b), confirming their existence and measuring their properties (Abbott et al. 2016a) using a variety of sophisticated statistical techniques. Abbott et al. (2016g) describes the basic, order-of-magnitude physics behind the radiated signal and its interpretation in the first detection.

The second detection, GW151226, and the weaker but still compelling LVT151012 have left no doubt in the minds of scientists that gravitational waves exist and match our description of Nature to high precision. The detections also provided a unique opportunity to test general relativity in its role as the prevailing theory of gravity. The vast majority of alternative theories of gravity predict gravitational waves, but with modifications according to their peculiarities. The searches and most of the parameter estimation use models which assume general relativity (GR) is the correct description of the interaction between gravity and matter. The generic transient analyses mentioned previously do not require GR to be entirely accurate, but still rely on most of its fundamental tenets. There are also a variety of explorations into systematic deviations from GR; a summary of LIGO–Virgo Collaboration work appears in Figures 7 and 8 of Abbott et al. (2016b). One of many efforts to translate the observations into constraints on various extensions of GR can be found in Yunes et al. (2016).

The community response to the discovery has been almost uniformly positive and constructive: researchers are incorporating and building on our result. There have been very few serious, peer-reviewed attempts at debunking it. The statistical arguments remain unchallenged in the literature, but some have questioned various pieces of the experiment and its interpretation. For example, Chang et al. (2016) argue that a part of our frequency response to gravitational waves was flawed. However, no such counterclaim has yet been found credible.


The field of astrostatistics is quite rich, and the LIGO–Virgo Collaboration has taken advantage of a solid foundation forged by others before us. The literature has many earlier examples of statistical modeling applied to large data sets, weak signals, possible measurement biases, and the disentangling of mixed populations: for example, exoplanet detection with Kepler data (e.g., work by Foreman-Mackey, Hogg, Rogers, and many others). Another very concrete, though not yet realized, example is the LSST (Large Synoptic Survey Telescope) project¹, slated to begin in the 2020s. LSST will deal with an event rate in the tens of thousands per night, and its false-positive problem is a complicated one that is only beginning to be addressed.

The transient searches employed by LIGO can be separated into those which assume a particular source model (e.g., searches for binary mergers; see Abbott et al. (2016b) and references therein) and more generic searches which only enforce consistency of the reconstructed signal between instruments (see Abbott et al. (2016c) and references therein). Many of the statistics underlying matched filtering were derived in the early 2000s: both types of searches use time-series filtering algorithms (Anderson et al. 2001; Allen et al. 2012). These algorithms produce lists of times and amplitudes relative to the noise (encoded in the signal-to-noise ratio, or SNR) which characterize putative signals embedded in a noisy data stream. The stochastic nature of the noise means that the SNR has a statistical distribution in which larger SNRs become increasingly improbable.
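In the idealized case of stationary white Gaussian noise, the matched-filter output can be sketched in a few lines. Everything below is illustrative (the "chirp-like" template, amplitude, and offset are invented), and real searches first whiten the data by the detector's measured noise spectrum:

```python
# Toy matched filter, assuming white Gaussian noise with unit variance (real
# searches first whiten the data by the detector's measured noise spectrum).
# The "chirp-like" template, amplitude, and offset below are all invented.
import numpy as np

rng = np.random.default_rng(0)

n = 4096
t = np.arange(n)
template = np.sin(2 * np.pi * (0.01 + 1e-6 * t) * t)  # slowly rising frequency
template /= np.sqrt(np.sum(template**2))              # unit-norm template

# Simulated data stream: noise plus a signal injected at a known offset.
offset, amplitude = 1000, 8.0
data = rng.standard_normal(2 * n)
data[offset:offset + n] += amplitude * template

# For a unit-norm template in unit-variance white noise, the sliding inner
# product <d, h> is the SNR time series; its peak marks the candidate.
snr = np.correlate(data, template, mode="valid")
peak = int(np.argmax(np.abs(snr)))
print(peak, np.abs(snr[peak]))
```

The peak of the SNR time series recovers both the arrival time and the amplitude of the buried signal; in a real search this maximization also runs over a large bank of templates covering the binary parameter space.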

The main function of statistics like the SNR and χ2 (Allen 2005) is to distinguish the event candidate from the non-astrophysical transient background distribution and establish its statistical significance. Searches must also deal with transients in the data which are induced by the instrument and its environs (Abbott et al. 2016a) — colloquially called “glitches”. This additional population imposes fatter tails on the idealized SNR distribution. In practice, there is no analytic description for the non-astrophysical environmentally-induced transient SNR distribution, and so this must be measured empirically.
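A standard way to measure that background empirically is the "time slide" method: shift one detector's trigger times by an unphysical amount, far larger than the light travel time between sites, so that any coincidences which survive must be accidental. The sketch below assumes Poisson-distributed glitch times and an invented SNR distribution; all numbers are illustrative, not LIGO values:

```python
# Sketch of the "time slide" method: slide one detector's trigger times by an
# unphysical amount so that surviving coincidences must be accidental, then
# repeat over many slides to build an empirical background distribution.
import numpy as np

rng = np.random.default_rng(1)

obs_time = 1e6       # seconds of (toy) observation
n_triggers = 20_000  # noise triggers per detector
window = 0.01        # coincidence window, seconds

t1 = np.sort(rng.uniform(0, obs_time, n_triggers))   # detector 1 times
t2 = rng.uniform(0, obs_time, n_triggers)            # detector 2 times
snr2 = 5 + rng.exponential(1.0, n_triggers)          # detector 2 SNRs

def coincident_snrs(shift):
    """Detector-2 SNRs landing within `window` of a detector-1 trigger
    after sliding detector 2 by `shift` seconds (with wrap-around)."""
    shifted = (t2 + shift) % obs_time
    idx = np.searchsorted(t1, shifted)
    left = np.abs(t1[np.clip(idx - 1, 0, n_triggers - 1)] - shifted)
    right = np.abs(t1[np.clip(idx, 0, n_triggers - 1)] - shifted)
    return snr2[np.minimum(left, right) < window]

# 100 unphysical slides -> empirical sample of accidental-coincidence SNRs.
background = np.concatenate([coincident_snrs(s) for s in range(100, 10100, 100)])

# Empirical false-alarm probability of an accidental coincidence with SNR >= 8.
fap = np.mean(background >= 8)
print(len(background), fap)
```

Each slide contributes an independent realization of the accidental background, so the effective background livetime (and hence the smallest measurable false-alarm rate) grows with the number of slides performed.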


Once an event time of interest is observed by the searches, a posterior distribution over the parameters describing the source of a signal is sampled using forward-modeling Bayesian methods that demand explicit modeling of signal and noise alike. The likelihood function is formed from the residuals left after signal subtraction from the data time series. While this is a (mostly) straightforward application of Bayes’ theorem, no analytical marginalization is available, so stochastic sampling techniques, particularly Markov chain Monte Carlo and nested sampling (Abbott et al. 2016a; Veitch et al. 2015), are employed. This likelihood, coupled with the priors on source parameters (e.g., uniform in compact object masses and spins, isotropic in orientation, uniform in volume in the local universe), provides the posterior probability density for the source parameters.
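A minimal random-walk Metropolis sampler illustrates the residual-based likelihood on a toy one-parameter model; the real analyses sample more than 15 parameters with far more sophisticated machinery (Veitch et al. 2015). The sinusoidal model, prior range, and step size below are invented for illustration:

```python
# Minimal random-walk Metropolis sketch of the residual-based likelihood:
# ln L(theta) = -0.5 * sum of squared residuals after subtracting the modeled
# signal (unit-variance noise assumed). The one-parameter sinusoid stands in
# for the 15+ physical parameters of a real binary; all numbers are invented.
import numpy as np

rng = np.random.default_rng(2)

t = np.linspace(0.0, 1.0, 512)
true_amp = 3.0
signal = lambda amp: amp * np.sin(2 * np.pi * 5 * t)
data = signal(true_amp) + rng.standard_normal(t.size)

def log_posterior(amp):
    if not (0.0 < amp < 10.0):            # uniform prior on the amplitude
        return -np.inf
    resid = data - signal(amp)            # residuals after signal subtraction
    return -0.5 * np.sum(resid**2)

chain = np.empty(5000)
amp, logp = 1.0, log_posterior(1.0)
for i in range(chain.size):
    prop = amp + 0.1 * rng.standard_normal()
    logp_prop = log_posterior(prop)
    if np.log(rng.uniform()) < logp_prop - logp:   # Metropolis accept/reject
        amp, logp = prop, logp_prop
    chain[i] = amp

posterior_mean = chain[1000:].mean()      # discard burn-in
print(posterior_mean)
```

The chain, after burn-in, is a set of samples from the posterior; credible intervals on the source parameters are read off directly from such samples.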

Another analysis has been developed to account for possible glitch behavior at the time of a signal and to mitigate uncertainties in the signal model. The non-Gaussian components of the noise (assumed uncorrelated between instruments) can be modeled simultaneously with the signal (assumed correlated across detectors). The reconstruction is obtained using a reversible-jump Markov chain Monte Carlo (RJMCMC)². This enables jumps between model spaces in addition to traditional MCMC jumps within a single model space, allowing model comparison to be done “on the fly” (Cornish & Littenberg 2015; Abbott et al. 2016c). It is a powerful tool for unparameterized signal reconstruction and glitch rejection, but its assumptions about the astrophysical signal are relaxed relative to the rigorous parameterized modeling imposed by the MCMC and nested sampling techniques mentioned above, so it is not typically used to generate posteriors over the physical properties of the binary.
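The trans-dimensional idea can be sketched with a toy RJMCMC that hops between a "noise only" model and a "noise plus constant offset" model. Drawing the birth parameter directly from its prior makes the acceptance ratio a bare likelihood ratio (equal prior odds on the two models assumed); this is a minimal sketch, not the Cornish & Littenberg algorithm:

```python
# Toy reversible-jump MCMC hopping between model 0 ("noise only") and model 1
# ("noise plus a constant offset"). Proposing the birth parameter directly
# from its prior makes the acceptance ratio a bare likelihood ratio (equal
# prior odds assumed). A minimal sketch, not the production algorithm.
import numpy as np

rng = np.random.default_rng(3)

data = 0.8 + rng.standard_normal(50)     # data with a true offset of 0.8

def log_like(mu):
    return -0.5 * np.sum((data - mu)**2)

prior_lo, prior_hi = -5.0, 5.0           # uniform prior on the offset

model, mu = 0, 0.0
in_model_1 = 0
n_steps = 20_000
for _ in range(n_steps):
    if rng.uniform() < 0.5:
        # Trans-dimensional move: birth (0 -> 1) or death (1 -> 0).
        if model == 0:
            prop = rng.uniform(prior_lo, prior_hi)   # draw from the prior
            if np.log(rng.uniform()) < log_like(prop) - log_like(0.0):
                model, mu = 1, prop
        elif np.log(rng.uniform()) < log_like(0.0) - log_like(mu):
            model, mu = 0, 0.0
    elif model == 1:
        # Ordinary within-model Metropolis update of the offset.
        prop = mu + 0.2 * rng.standard_normal()
        if prior_lo < prop < prior_hi and \
           np.log(rng.uniform()) < log_like(prop) - log_like(mu):
            mu = prop
    in_model_1 += (model == 1)

frac_model_1 = in_model_1 / n_steps      # ~ posterior probability of model 1
print(frac_model_1)
```

The fraction of steps the chain spends in each model estimates that model's posterior probability, which is how RJMCMC performs model comparison "on the fly" without computing evidences separately.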


One of the key science outputs from these observations is the density of merging binary black holes in spacetime (the merger rate), expressed as a number of mergers per cubic gigaparsec (Gpc³) per year³. To calculate this number, the Collaboration must calculate the number of detected mergers (the “numerator”, in units of counts) and the spacetime volume that it has searched (the “denominator”, in units of Gpc³ yr). There are statistical challenges to both calculations. To derive the rates and the probability of astrophysical origin (Abbott et al. 2016h), hierarchical modeling (a pedagogical reference can be found in Foreman-Mackey et al. (2014)) and mixture models have been adapted from their use in astrostatistics.

While two confident gravitational-wave detections were reported (GW150914 and GW151226), there was a third candidate (LVT151012), identified with a measured cumulative accidental background event coincidence (or false alarm) rate of (2.3 yr)⁻¹ (Abbott et al. 2016d). This candidate has a significance, or p-value, of 0.045 (Abbott et al. 2016b). To account for our uncertainty about the origin of this trigger, we fit a mixture model comparing the SNR distributions of the terrestrial and astrophysical populations to the set of triggers from our searches, including the three candidates and many more (Abbott et al. 2016b,i,h; Farr et al. 2015). From this model, we infer a posterior probability of 0.86 that LVT151012 is associated with an astrophysical source. We use the posterior on the astrophysical population in the mixture model to infer coalescence rates.
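The flavor of such a mixture model can be sketched as follows, assuming a steeply falling terrestrial SNR density and the ρ⁻⁴ astrophysical tail expected for sources distributed uniformly in volume; the rates and density shapes below are invented, not the Collaboration's measured values:

```python
# Sketch of the two-component mixture behind the "probability of astrophysical
# origin": a terrestrial population with a steeply falling SNR density and an
# astrophysical population with the rho^-4 tail expected for sources uniform
# in volume. Rates and shapes are invented, not the measured values.
import numpy as np

snr_min = 8.0            # search threshold in SNR (illustrative)

def p_background(rho):
    """Toy terrestrial SNR density: exponential fall-off above threshold."""
    return np.exp(-(rho - snr_min) / 0.5) / 0.5

def p_foreground(rho):
    """Normalized rho^-4 astrophysical SNR density above threshold."""
    return 3.0 * snr_min**3 / rho**4

# Invented Poisson mean counts attributed to each population.
lam_bg, lam_fg = 100.0, 3.0

def p_astro(rho):
    """Posterior probability that a trigger with SNR rho is astrophysical."""
    fg = lam_fg * p_foreground(rho)
    bg = lam_bg * p_background(rho)
    return fg / (fg + bg)

for rho in (8.5, 9.5, 13.0):
    print(rho, round(p_astro(rho), 3))
```

Because the terrestrial density falls much faster than ρ⁻⁴, even a modest astrophysical rate dominates the mixture at high SNR, which is why loud triggers get p_astro near one while marginal ones like LVT151012 land in between.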

Fitting a hierarchical power-law model to our three candidates, accounting for our mass measurement uncertainties and selection effects (Loredo 2004; Mandel et al. 2016), yields a posterior median and 90% credible interval of α = 2.5 (+1.5, −1.6) (Abbott et al. 2016b); not surprisingly, given three possible detections, the source population is poorly constrained. A full accounting of rates under different population assumptions can be found in Abbott et al. (2016b), but the union of the rate estimates provides a conservative 90% credible interval of 9–240 Gpc⁻³ yr⁻¹.
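In the limit of negligible measurement errors, the selection-corrected hierarchical likelihood takes the form L(α) ∝ Πᵢ p(mᵢ|α) / β(α)^N, where β(α) is the detection-weighted fraction of the population (cf. Loredo 2004; Mandel et al. 2016). A toy version, with an invented selection function favoring heavier binaries:

```python
# Sketch of a selection-corrected hierarchical fit of a power-law mass
# distribution p(m|alpha) ~ m^-alpha on [m_lo, m_hi]. Measurement errors are
# neglected and the selection function p_det is invented for illustration.
import numpy as np

rng = np.random.default_rng(5)

m_lo, m_hi = 5.0, 50.0
true_alpha = 2.5

def pop_pdf(m, alpha):
    """Normalized power-law mass density p(m|alpha) on [m_lo, m_hi]."""
    norm = (m_hi**(1 - alpha) - m_lo**(1 - alpha)) / (1 - alpha)
    return m**(-alpha) / norm

def p_det(m):
    """Invented selection function: heavier binaries are easier to detect."""
    return (m / m_hi)**1.5

# Simulate a catalog: draw masses from the population (inverse-CDF sampling),
# then keep each event with probability p_det(m); take 100 "observations".
u = rng.uniform(size=200_000)
m_all = (m_lo**(1 - true_alpha)
         + u * (m_hi**(1 - true_alpha) - m_lo**(1 - true_alpha)))**(1 / (1 - true_alpha))
obs = m_all[rng.uniform(size=m_all.size) < p_det(m_all)][:100]

# Grid posterior over alpha (flat prior): per-event population densities,
# divided by the selection-weighted normalization beta(alpha)^N.
alphas = np.linspace(1.2, 4.0, 200)
grid_m = np.linspace(m_lo, m_hi, 2000)
dm = grid_m[1] - grid_m[0]
logpost = np.array([
    np.sum(np.log(pop_pdf(obs, a)))
    - obs.size * np.log(np.sum(p_det(grid_m) * pop_pdf(grid_m, a)) * dm)
    for a in alphas
])
alpha_map = alphas[np.argmax(logpost)]
print(alpha_map)
```

Without the β(α)^N correction, the fit would be biased toward shallow slopes, since heavy (easily detected) binaries are over-represented in the observed catalog relative to the underlying population.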


The posterior predictive distribution for the rate can be used to extrapolate the expected number of future detections over the next set of observing runs (labeled O2, O3, etc.). While dependent on progress with planned improvements to the instruments (Abbott et al. 2016e,c), using a reasonable range of estimates for the expected detector sensitivities in the upcoming six-month LIGO observing run beginning in late 2016 (Abbott et al. 2016f), the probability of at least 10 more confident detections (like GW150914 and GW151226) in the next run is between 15% and 80% (see Fig. 13, Abbott et al. 2016b).
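Such an extrapolation amounts to pushing rate-posterior samples through a Poisson distribution. A sketch, with an invented lognormal rate posterior and an assumed sensitive spacetime volume for the next run:

```python
# Posterior predictive for the number of detections in a future observing run:
# draw the merger rate from its posterior, multiply by the sensitive spacetime
# volume of the run, and count how often a Poisson draw yields >= 10 events.
# The lognormal "posterior" and the volume below are invented for illustration.
import numpy as np

rng = np.random.default_rng(4)

# Toy rate posterior roughly spanning a 9-240 Gpc^-3 yr^-1 interval.
rate_samples = rng.lognormal(mean=np.log(50.0), sigma=0.9, size=100_000)

vt_next_run = 0.1        # assumed sensitive spacetime volume, Gpc^3 yr

n_future = rng.poisson(rate_samples * vt_next_run)
p_at_least_10 = np.mean(n_future >= 10)
print(p_at_least_10)
```

The spread in the answer comes from two sources that this construction keeps distinct: the posterior uncertainty on the rate itself, and the irreducible Poisson scatter in any finite observing run.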

The measurement of the binary parameters, their statistical significance, and the occurrence rate are definitive statements: our methods are well tested from previous data-taking runs, and the evidence presented is a clear affirmation of the significance of the detections. However, as O2 proceeds and more events are collected, the rates will become better measured. As a consequence, several groups inside and outside the Collaboration are working on more sophisticated hierarchical models in an attempt to incorporate more astrophysics and do better, more nuanced model selection. We seek answers to questions like: “How do heavy black holes form?” and “What are their environments and interactions?” Speaking more directly to physics, we’ve confirmed yet another (elusive!) piece of general relativity, and pushed the frontiers of precision science and large-scale experimental physics. The astronomy community has also been very excited: another potential discovery in the next few years may be a neutron-star–black-hole binary. Should this happen, there is a chance we could correlate the gravitational-wave signal with a flash of gamma rays (very high-energy light). This would provide a rich laboratory for studying the high-density matter composing neutron stars, otherwise inaccessible in terrestrial laboratories. Beyond binary merger events, there is potential for detections of supernovae, semi-monochromatic gravitational-wave emission from deformed neutron stars, and the gravitational-wave analog of the cosmic microwave background. The Collaboration’s plans for just the next year fill many pages (Collaboration 2016).

GW150914, in and of itself, did not directly affect humanity. Indeed, all gravitational-wave events pass by without notice by the population at large. However, the technological and sociological benefits of the Collaboration’s journey to the discovery of gravitational waves are immeasurable. To begin with, we have developed an instrument capable of extremely high-precision measurement, along with leading-edge data analysis algorithms and statistical modeling techniques. Furthermore, the Collaboration has provided data releases so that society can explore the fruits of the endeavor its investment made possible. About an hour of the time-series data surrounding each event has been released by the LIGO Open Science Center (LOSC), free for exploration and accompanied by basic tutorials in its use⁴. A full data release for O1 will occur, but may take several more months to assemble. In the meantime, those interested can obtain the data from the fifth and sixth science runs (2005–2010) from the LOSC site above.

Maybe less concretely, but more poignantly: we’ve allowed the world to “listen” to black holes for the first, and assuredly not the last, time.


1 Large Synoptic Survey Telescope:  

2 For a good introduction, see  

3 One parsec is 3.1×10¹⁶ m, or about three light-years. There are about 12,000 Gpc³ in the observable universe. See Hogg (1999) for a discussion of distances and times in cosmology. 





Abbott, B. P., Abbott, R., Abbott, T. D., Abernathy, M. R., Acernese, F., Ackley, K., & Adams, C. 2016a, Classical and Quantum Gravity, 33, 134001

Abbott, B. P. et al. 2016b, Phys. Rev. D, 93, 122003

——. 2016c, Phys. Rev. D, 93, 122004

Abbott, B. P. et al. 2016a, Physical Review Letters, 116, 241102, arXiv:1602.03840

——. 2016b, Physical Review X, 6, 041015, arXiv:1606.04856

——. 2016c, ArXiv e-prints, arXiv:1602.03845

——. 2016d, Phys. Rev. D, 93, 122003, arXiv:1602.03839

——. 2016e, Physical Review Letters, 116, 131103, arXiv:1602.03838

——. 2016f, Living Reviews in Relativity, 19, arXiv:1304.0670

——. 2016g, ArXiv e-prints, arXiv:1608.01940

——. 2016h, ArXiv e-prints, arXiv:1606.03939

——. 2016i, ArXiv e-prints, arXiv:1602.03842

Allen, B. 2005, Phys. Rev. D, 71, 062001

Allen, B., Anderson, W. G., Brady, P. R., Brown, D. A., & Creighton, J. D. E. 2012, Phys. Rev. D, 85, 122006

Anderson, W. G., Brady, P. R., Creighton, J. D. E., & Flanagan, E. E. 2001, Phys. Rev. D, 63, 042003

Chang, Z., Huang, C.-G., & Zhao, Z.-C. 2016, ArXiv e-prints, 1612.01615

Collaboration, 2016, The LSC-Virgo White Paper on Gravitational Wave Searches and Astrophysics (2016-2017 edition), Tech. rep.

Cornish, N. J., & Littenberg, T. B. 2015, Classical and Quantum Gravity, 32, 135012, arXiv:1410.3835

Einstein, A. 1916, Sitzungsber. Preuss. Akad. Wiss. Berlin (Math. Phys.), 1916, 688

Farr, W. M., Gair, J. R., Mandel, I., & Cutler, C. 2015, Phys. Rev. D, 91, 023005, arXiv:1302.5341

Foreman-Mackey, D., Hogg, D. W., & Morton, T. D. 2014, The Astrophysical Journal, 795, 64

Hogg, D. W. 1999, ArXiv Astrophysics e-prints, arXiv:astro-ph/9905116

Loredo, T. J. 2004, in American Institute of Physics Conference Series, Vol. 735, ed. R. Fischer, R. Preuss, & U. V. Toussaint, 195–206, arXiv:astro-ph/0409387

Mandel, I., Farr, W. M., & Gair, J. 2016, Extracting distribution parameters from multiple uncertain observations with selection biases, Tech. Rep. P1600187, LIGO,

Veitch, J. et al. 2015, Phys. Rev. D, 91, 042003, arXiv:1409.7215

Yunes, N., Yagi, K., & Pretorius, F. 2016, Phys. Rev. D, 94, 084002