A non-zero variance of Tajima's estimator for two sequences even for infinitely many unlinked loci

By Léandra King, John Wakeley, Shai Carmi

Posted 17 Aug 2016
bioRxiv DOI: 10.1101/069989 (published DOI: 10.1016/j.tpb.2017.03.002)

The population-scaled mutation rate, Θ, is informative about the effective population size and is thus widely used in population genetics. We show that for two sequences and n unlinked loci, Tajima's estimator (Θ̂), which is the average number of pairwise differences, is not consistent, and therefore its variance does not vanish even as n → ∞. The non-zero variance of Θ̂ results from a (weak) correlation between coalescence times even at unlinked loci, which, in turn, is due to the underlying fixed pedigree shared by all genealogies. We derive the correlation coefficient under a diploid, discrete-time Wright–Fisher model, and we also derive a simple, closed-form lower bound. We also obtain empirical estimates of the correlation of coalescence times under demographic models inspired by large-scale human genealogies. While the effect we describe is small (Var[Θ̂]/Θ² ≈ O(N⁻¹)), it is important to recognize this feature of statistical population genetics, which runs counter to commonly held notions about unlinked loci.
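
Why a weak correlation is enough to keep the variance from vanishing can be illustrated with the standard variance-of-a-mean identity. The sketch below makes several assumptions that are not taken from the paper. Coalescence times T_i are measured in coalescent units with Var(T_i) = 1. Every pair of loci shares the same correlation ρ. Given T_i, the number of pairwise differences K_i at locus i is Poisson(Θ·T_i), so Var(K_i) = Θ + Θ² and Cov(K_i, K_j) = Θ²ρ. Under these assumptions Var[Θ̂] = (Θ + Θ²)/n + ((n − 1)/n)·Θ²ρ, which tends to Θ²ρ rather than 0 as n → ∞. On this reading, the abstract's Var[Θ̂]/Θ² ≈ O(N⁻¹) corresponds to a correlation of that order. The Monte Carlo check uses a hypothetical toy model in which each locus copies one shared Exp(1) time with probability p. This is not the paper's diploid Wright–Fisher pedigree model. The function names `var_theta_hat` and `simulate_var_theta_hat` are illustrative only.

```python
import numpy as np


def var_theta_hat(theta: float, n: int, rho: float) -> float:
    """Variance of Tajima's estimator for two sequences at n loci.

    Assumes (illustratively) Var(T_i) = 1 in coalescent units, an equal pairwise
    correlation rho between coalescence times at different loci, and
    K_i | T_i ~ Poisson(theta * T_i) pairwise differences at locus i.
    """
    per_locus = theta + theta**2      # Var(K_i) for two sequences
    cross = theta**2 * rho            # Cov(K_i, K_j) = theta^2 * Cov(T_i, T_j)
    return per_locus / n + (n - 1) / n * cross


def simulate_var_theta_hat(theta: float, n: int, p: float,
                           reps: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Var[theta_hat] under a hypothetical toy model.

    Each locus copies a shared Exp(1) time S with probability p and otherwise
    draws its own Exp(1) time, so every T_i is Exp(1) and corr(T_i, T_j) = p**2.
    """
    rng = np.random.default_rng(seed)
    shared = rng.exponential(1.0, size=(reps, 1))    # one shared time per replicate
    own = rng.exponential(1.0, size=(reps, n))       # independent per-locus times
    copies = rng.random((reps, n)) < p               # which loci copy the shared time
    t = np.where(copies, shared, own)                # coalescence times, shape (reps, n)
    k = rng.poisson(theta * t)                       # pairwise differences per locus
    theta_hat = k.mean(axis=1)                       # Tajima's estimator: mean over loci
    return float(theta_hat.var())


if __name__ == "__main__":
    theta, n, p = 2.0, 200, 0.1
    print(var_theta_hat(theta, n, p**2))             # about 0.0698
    print(simulate_var_theta_hat(theta, n, p))       # should be close to the value above
    for big_n in (10**3, 10**6):
        # Var[theta_hat] / theta^2 approaches rho = p**2 = 0.01, not 0
        print(big_n, var_theta_hat(theta, big_n, p**2) / theta**2)
```

Setting rho to 0 recovers the familiar behaviour, in which the variance decays like 1/n. Any positive rho leaves a floor of Θ²ρ no matter how many unlinked loci are added.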

  • Downloaded 487 times
  • Download rankings, all-time:
    • Site-wide: 68,887
    • In evolutionary biology: 3,497
  • Year to date:
    • Site-wide: 98,639
  • Since beginning of last month:
    • Site-wide: 113,277
