Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 64,772 bioRxiv papers from 287,133 authors.

A most wanted list of conserved protein families with no known domains

By Stacia K. Wyman, Aram Avila-Herrera, Stephen Nayfach, Katherine S Pollard

Posted 23 Oct 2017
bioRxiv DOI: 10.1101/207985 (published DOI: 10.1371/journal.pone.0205749)

The number and proportion of genes with no known function are growing rapidly. To quantify this phenomenon and provide criteria for prioritizing genes for functional characterization, we developed a bioinformatics pipeline that identifies robustly defined protein families with no annotated domains, ranks these with respect to phylogenetic breadth, and identifies them in metagenomics data. We applied this approach to 271,965 protein families from the SFams database and discovered many with no functional annotation, including >118,000 families lacking any known protein domain. From these, we prioritized 6,668 conserved protein families with at least three sequences from organisms in at least two distinct classes. These Function Unknown Families (FUnkFams) are present in Tara Oceans Expedition and Human Microbiome Project metagenomes, with distributions associated with sampling environment. Our findings highlight the extent of functional novelty in sequence databases and establish an approach for creating a "most wanted" list of genes to characterize.
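The prioritization criteria stated in the abstract (no annotated domain, at least three sequences, at least two distinct taxonomic classes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the `Sequence` and `Family` record layouts, the function names, and the use of distinct class count as a proxy for phylogenetic breadth are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sequence:
    """One member sequence of a family (hypothetical record layout)."""
    seq_id: str
    taxonomic_class: str  # e.g. "Bacilli"


@dataclass
class Family:
    """A protein family with its members and any annotated domains."""
    family_id: str
    sequences: List[Sequence]
    domains: List[str] = field(default_factory=list)  # empty = no known domain


def distinct_classes(family: Family) -> int:
    """Count the distinct taxonomic classes represented in a family."""
    return len({s.taxonomic_class for s in family.sequences})


def is_priority_family(family: Family, min_sequences: int = 3,
                       min_classes: int = 2) -> bool:
    """Apply the criteria from the abstract: no annotated domain, at least
    `min_sequences` sequences, from at least `min_classes` distinct classes."""
    if family.domains:
        return False
    if len(family.sequences) < min_sequences:
        return False
    return distinct_classes(family) >= min_classes


def rank_by_breadth(families: List[Family]) -> List[Family]:
    """Order families broadest first. Distinct class count stands in for
    phylogenetic breadth here; that choice is an assumption of this sketch."""
    return sorted(families, key=distinct_classes, reverse=True)
```

Filtering a collection with `is_priority_family` and passing the survivors to `rank_by_breadth` gives a toy version of a "most wanted" list. The thresholds are exposed as parameters so the effect of stricter or looser criteria is easy to explore.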

Download data

  • Downloaded 696 times
  • Download rankings, all-time:
    • Site-wide: 12,864 out of 64,772
    • In genomics: 1,774 out of 4,433
  • Year to date:
    • Site-wide: 58,970 out of 64,772
  • Since beginning of last month:
    • Site-wide: 60,082 out of 64,772

