DNA sequencing can discover not only single-base variants but also copy-number alterations (CNAs). In shotgun sequencing, CNA regions show step-wise changes in read depth relative to adjacent "normal" regions, allowing their detection by parametric statistical tests that compare the mean coverage in a suspected region against that of a baseline distribution. Traditionally, the power of such a test is considered to depend on (1) the integer magnitude of the copy-number change, (2) the overall sequencing depth, (3) the length of the CNA region, (4) the read length, and (5) the variation of coverage along the genome, which depends on many experimental factors, including whether the chosen platform is whole-genome, whole-exome, or targeted-panel sequencing. In cases involving inadvertent sample mixing or genuine somatic mosaicism, power also depends on the mixing ratio. However, a power analysis that considers the interplay of all these factors has not been systematically developed. Here we present a general analytical framework and a series of simulations that explore situations from the simplest to the increasingly multifactorial. Specifically, we expand the expression of power to include not just the known factors but also one or both of two complications: (1) the dispersion of read depth around the mean beyond the independent sampling-by-sequencing assumption, and (2) the reduced fraction of the CNA-bearing sample ("purity") as seen in studies of intratumor heterogeneity or in clinical monitoring of minimal residual disease. We describe the analytical formulas and their simplifications in special cases, and share extendable scripts for others to perform customized power analysis using study-specific parameters. As study designs vary and technologies continue to evolve, the input data and the noise characteristics will change depending on the practical situation.
We present two use cases commonly encountered in cancer research: ultra-shallow whole-genome sequencing for detecting large, chromosome-scale events, and targeted ultra-deep sequencing for surveillance of known CNAs in rare tumor clones, as needed for sensitive detection of cancer relapse or metastasis. We also present an online calculator at https://shiny.med.umich.edu/apps/hanyou/CNV_Detection_Power_Calculator/.
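To illustrate how the listed factors could combine, the following is a minimal sketch of a power calculation, not the formula derived in the paper. It assumes a one-sided z-test on the read count in the candidate region, a normal approximation to the count distribution, and a single variance-inflation factor for overdispersion. The function name `cna_power` and all of its parameters are hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def cna_power(copy_number, depth, region_len, read_len,
              purity=1.0, dispersion=1.0, alpha=0.05, ploidy=2):
    """Approximate power to detect a CNA from read counts (illustrative only).

    Assumptions (not taken from the paper):
      - read counts in the region are approximately normal,
      - variance = dispersion * mean (dispersion = 1 is the Poisson case),
      - the baseline mean is known exactly and the test is one-sided.
    """
    # Expected number of reads covering the region at baseline ploidy.
    n0 = depth * region_len / read_len
    # Expected read-depth ratio when only a fraction `purity` of the
    # sample carries the altered copy number.
    ratio = 1.0 + purity * (copy_number - ploidy) / ploidy
    n1 = n0 * ratio
    if n1 == n0:
        # No shift in the mean: rejection rate equals the type I error.
        return alpha
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    sd0 = sqrt(dispersion * n0)  # spread under the null
    sd1 = sqrt(dispersion * n1)  # spread under the alternative
    shift = abs(n1 - n0)
    if sd1 == 0.0:
        # Complete loss at full purity: the count is deterministically zero.
        return 1.0 if shift > z_crit * sd0 else 0.0
    return NormalDist().cdf((shift - z_crit * sd0) / sd1)


if __name__ == "__main__":
    # Single-copy gain over 1 Mb at 0.1x depth with 100 bp reads,
    # evaluated at several purities and two dispersion levels.
    for purity in (1.0, 0.2, 0.05):
        for dispersion in (1.0, 4.0):
            power = cna_power(3, 0.1, 1_000_000, 100,
                              purity=purity, dispersion=dispersion)
            print(f"purity={purity:<5} dispersion={dispersion:<4} power={power:.3f}")
```

Under these assumptions, power rises with depth and region length and falls as purity decreases or dispersion increases, which matches the qualitative dependence described above. For study-specific analysis, the shared scripts and the online calculator remain the authoritative tools.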