Significance estimation for large scale untargeted metabolomics annotations

By Kerstin Scheubert, Franziska Hufsky, Daniel Petras, Mingxun Wang, Louis-Félix Nothias, Kai Dührkop, Nuno Bandeira, Pieter C. Dorrestein, Sebastian Böcker

Posted 17 Feb 2017
bioRxiv DOI: 10.1101/109389 (published DOI: 10.1038/s41467-017-01318-5)

The annotation of small molecules in untargeted mass spectrometry relies on matching fragment spectra to reference library spectra. While various spectrum-spectrum match scores exist, the field lacks statistical methods for estimating the false discovery rate (FDR) of these annotations. We present empirical Bayes and target-decoy based methods to estimate the FDR. Relying on these FDR estimates, we explore the effect of different spectrum-spectrum match criteria on the number and the nature of the molecules annotated. We show that the spectral matching settings need to be adjusted for each project. By adjusting the scoring parameters and thresholds, the number of annotations rose by 139% on average (ranging from -92% to +5705%) compared to a default parameter set available at GNPS. The FDR estimation methods presented will enable users to define scoring criteria for large-scale analysis of untargeted small-molecule data, a capability that has been essential to the advancement of large-scale proteomics, transcriptomics, and genomics.
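The target-decoy idea mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not describe how decoy spectra are built or how scores are computed, so the sketch assumes that each query spectrum has already been scored against a target library and against a decoy library of the same size, and that the FDR at a threshold is estimated as the number of decoy hits divided by the number of target hits. The function names `target_decoy_fdr` and `lowest_threshold_for_fdr` and the example scores are hypothetical.

```python
from typing import Optional, Sequence


def target_decoy_fdr(target_scores: Sequence[float],
                     decoy_scores: Sequence[float],
                     threshold: float) -> float:
    """Estimate the FDR at a score threshold as (#decoy hits) / (#target hits).

    Assumes the decoy library has the same size as the target library, so the
    number of decoy hits approximates the number of false target hits.
    """
    targets = sum(1 for s in target_scores if s >= threshold)
    decoys = sum(1 for s in decoy_scores if s >= threshold)
    if targets == 0:
        return 0.0  # nothing is annotated, so nothing can be a false discovery
    return min(1.0, decoys / targets)


def lowest_threshold_for_fdr(target_scores: Sequence[float],
                             decoy_scores: Sequence[float],
                             max_fdr: float) -> Optional[float]:
    """Return the lowest observed target score whose estimated FDR is <= max_fdr.

    The lowest passing threshold keeps the largest number of annotations.
    Returns None if no threshold satisfies the FDR limit.
    """
    for t in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Hypothetical match scores, for illustration only.
target = [0.9, 0.8, 0.7, 0.6, 0.5]
decoy = [0.65, 0.55, 0.3, 0.2, 0.1]
threshold = lowest_threshold_for_fdr(target, decoy, max_fdr=0.25)
```

In this sketch the score threshold is chosen per data set from the estimated FDR rather than fixed in advance, which mirrors the abstract's point that matching settings need to be adjusted for each project.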

Download data

  • Downloaded 904 times
  • Download rankings, all-time:
    • Site-wide: 12,725 out of 85,187
    • In bioinformatics: 2,045 out of 8,148
  • Year to date:
    • Site-wide: 68,540 out of 85,187
  • Since beginning of last month:
    • Site-wide: 63,634 out of 85,187
