DNA sequencing can discover not only single-base variants but also copy-number alterations (CNAs). In shotgun sequencing, CNA regions show step-wise changes in read depth relative to adjacent "normal" regions, allowing their detection by parametric statistical tests that compare the mean coverage in suspected regions against a baseline distribution. Traditionally, the power of such a test depends on (1) the integer magnitude of the copy-number change, (2) the overall sequencing depth, (3) the length of the CNA region, (4) the read length and (5) the variation of coverage along the genome, which depends on many experimental factors, including whether the chosen platform is whole-genome, whole-exome, or targeted-panel sequencing. In cases involving inadvertent sample mixing or genuine somatic mosaicism, power also depends on the mixing ratio. However, an analysis of statistical power that considers the interplay of all these factors has not been systematically developed. Here we present a general analytical framework and a series of simulations that explore situations from the simplest to the increasingly multifactorial. Specifically, we expand the expression of power to include not just the known factors but also one or both of two complications: (1) the dispersion of read depth around the mean beyond the independent sampling-by-sequencing assumption, and (2) the reduced fraction of the CNA-bearing sample ("purity") as seen in studies of intratumor heterogeneity or in clinical monitoring of minimal residual disease. We describe the analytical formulas and their simplifications in special cases, and share extendable scripts for others to perform customized power analysis using study-specific parameters. As study designs vary and technologies continue to evolve, the input data and the noise characteristics will change depending on the practical situation.
We present two use cases commonly encountered in cancer research: ultra-shallow whole-genome sequencing for detecting large, chromosome-scale events, and targeted ultra-deep sequencing for surveillance of known CNAs in rare tumor clones, aimed at sensitive detection of cancer relapse or metastasis. We also present an online calculator at https://shiny.med.umich.edu/apps/hanyou/CNV_Detection_Power_Calculator/.
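To make the interplay of these factors concrete, here is a minimal sketch of a power calculation. It is not the authors' formula or their shared scripts. It assumes a one-sided z-test on the read count in a single region, a normal approximation to the count distribution, and a read-count variance equal to a constant multiple of the mean. The function name `cna_detection_power` and all of its parameters are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def cna_detection_power(copy_number, depth, region_length, read_length,
                        purity=1.0, dispersion=1.0, alpha=0.05, ploidy=2):
    """Approximate power of a one-sided z-test on the read count in one region.

    Assumptions (illustrative, not taken from the paper):
      - depth is the mean per-base coverage of the copy-neutral baseline;
      - read counts are approximately normal with variance = dispersion * mean
        (dispersion = 1.0 corresponds to Poisson sampling);
      - purity is the fraction of the sample carrying the CNA.
    """
    # Expected number of reads in the region under the copy-neutral baseline.
    lam0 = depth * region_length / read_length
    # Purity dilutes the copy-number shift toward the baseline.
    ratio = 1.0 + purity * (copy_number - ploidy) / ploidy
    lam1 = lam0 * ratio

    sd0 = sqrt(dispersion * lam0)
    sd1 = sqrt(dispersion * lam1)
    z = NormalDist().inv_cdf(1.0 - alpha)

    if lam1 > lam0:
        # Gain: reject the baseline when the count exceeds the upper threshold.
        threshold = lam0 + z * sd0
        return 1.0 - NormalDist(lam1, sd1).cdf(threshold)

    # Loss (or no change): reject when the count falls below the lower threshold.
    threshold = lam0 - z * sd0
    if sd1 == 0.0:
        # Complete loss at full purity: the expected count is exactly zero.
        return 1.0 if threshold >= 0.0 else 0.0
    return NormalDist(lam1, sd1).cdf(threshold)


if __name__ == "__main__":
    # Single-copy gain in a 100 kb region at 0.01x coverage with 100 bp reads.
    for purity in (1.0, 0.5, 0.1):
        power = cna_detection_power(3, 0.01, 100_000, 100, purity=purity)
        print(f"purity={purity:.1f}  power={power:.3f}")
```

Under these assumptions, power rises with depth and region length, because both increase the expected read count, and it falls as purity decreases or dispersion grows. When the copy number equals the baseline ploidy, the function returns the false-positive rate `alpha`, which is a convenient sanity check.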