Large-Scale Uniform Analysis of Cancer Whole Genomes in Multiple Computing Environments

By Christina K. Yung, Brian D. O’Connor, Sergei Yakneen, Junjun Zhang, Kyle Ellrott, Kortine Kleinheinz, Naoki Miyoshi, Keiran M. Raine, Romina Royo, Gordon B. Saksena, Matthias Schlesner, Solomon I. Shorser, Miguel Vazquez, Joachim Weischenfeldt, Denis Yuen, Adam P Butler, Brandi N. Davis-Dusenbery, Roland Eils, Vincent Ferretti, Robert Grossman, Olivier Harismendy, Youngwook Kim, Hidewaki Nakagawa, Steven J. Newhouse, David Torrents, Lincoln D Stein, on behalf of the PCAWG Technical Working Group, Javier Bartolomé Rodriguez, Keith A Boroevich, Rich Boyce, Angela N. Brooks, Alex Buchanan, Ivo Buchhalter, Niall J. Byrne, Andy Cafferkey, Peter J. Campbell, Zhaohong Chen, Sunghoon Cho, Wan Choi, Peter Clapham, Francisco M. De La Vega, Jonas Demeulemeester, Michelle T. Dow, Lewis J. Dursi, Juergen Eils, Claudiu Farcas, Francesco Favero, Nodirjon Fayzullaev, Paul Flicek, Nuno A. Fonseca, Josep Ll Gelpi, Gad A. Getz, Bob Gibson, Michael C. Heinold, Julian M. Hess, Oliver Hofmann, Jongwhi H. Hong, Thomas J. Hudson, Daniel Huebschmann, Barbara Hutter, Carolyn M. Hutter, Seiya Imoto, Sinisa Ivkovic, Seung-Hyup Jeon, Wei Jiao, Jongsun Jung, Rolf Kabbe, Andre Kahles, Jules Kerssemakers, Hyunghwan Kim, Hyung-Lae Kim, Jihoon Kim, Jan Korbel, Michael Koscher, Antonios Koures, Milena Kovacevic, Chris Lawerenz, Ignaty Leshchiner, Dimitri G. Livitz, George L. Mihaiescu, Sanja Mijalkovic, Ana Mijalkovic Lazic, Satoru Miyano, Hardeep K. Nahal, Mia Nastic, Jonathan Nicholson, David Ocana, Kazuhiro Ohi, Lucila Ohno-Machado, Larsson Omberg, B.F. Francis Ouellette, Nagarajan Paramasivam, Marc D. Perry, Todd D. Pihl, Manuel Prinz, Montserrat Puiggròs, Petar Radovic, Esther Rheinbay, Mara W. Rosenberg, Charles Short, Heidi J. Sofia, Jonathan Spring, Adam J Struck, Grace Tiao, Nebojsa Tijanic, Peter Van Loo, David Vicente, Jeremiah A. Wala, Zhining Wang, Johannes Werner, Ashley Williams, Youngchoon Woo, Adam J. Wright, Qian Xiang, the PCAWG Network

Posted 10 Jul 2017
bioRxiv DOI: 10.1101/161638

The Pan-Cancer Analysis of Whole Genomes (PCAWG) project of the International Cancer Genome Consortium (ICGC) aimed to categorize somatic and germline variations in both coding and non-coding regions in over 2,800 cancer patients. To provide this dataset to the research working groups for downstream analysis, the PCAWG Technical Working Group marshalled ~800 TB of sequencing data from geographically distributed locations; developed portable software for uniform alignment, variant calling, artifact filtering and variant merging; performed the analysis in a geographically and technologically disparate collection of compute environments; and disseminated high-quality validated consensus variants to the working groups. The PCAWG dataset has been mirrored to multiple repositories and can be located using the ICGC Data Portal. The PCAWG workflows are also available as Docker images through Dockstore, enabling researchers to replicate our analysis on their own data.

Download data

  • Downloaded 2,338 times
  • Download rankings, all-time:
    • Site-wide: 5,937
    • In genomics: 668
  • Year to date:
    • Site-wide: 34,099
  • Since beginning of last month:
    • Site-wide: 40,696

