
Chromosome-scale de novo assembly and phasing of a Chinese indigenous pig genome

By Yalan Yang, Jinmin Lian, Bingkun Xie, Muya Chen, Yongchao Niu, Qiaowei Li, Yuwen Liu, Guoqiang Yi, Xinhao Fan, Yijie Tang, Jiang Li, Ivan Liachko, Shawn T Sullivan, Bradley Nelson, Erwei Zuo, Zhonglin Tang

Posted 16 Sep 2019
bioRxiv DOI: 10.1101/770958

Chinese indigenous pigs differ significantly from Western commercial pig breeds in phenotypic and genomic characteristics. Thus, building a high-quality reference genome for Chinese indigenous pigs is pivotal to exploring gene function and genome evolution and to improving genetic breeding in pigs. Here, we report an ultrahigh-quality, phased, chromosome-scale genome assembly for a male Luchuan pig, a representative Chinese domestic breed, generated by combining PacBio Sequel reads, Illumina paired-end reads, high-throughput chromatin conformation capture data and a BioNano optical map. The primary assembly is ~2.58 Gb in size, with contig and scaffold N50s of 18.03 Mb and 140.09 Mb, respectively. Comparison between the primary assembly and the alternative haplotigs reveals numerous haplotype-specific alleles, which provide a rich resource for studying allele-specific expression, epigenetic regulation, genome structure and evolution in pigs. Gene enrichment analysis indicates that the Luchuan-specific genes are predominantly enriched in Gene Ontology terms for phosphoprotein phosphatase activity, signaling receptor activity and phosphatidylinositol binding. We provide clear molecular evolutionary evidence that the divergence between Luchuan and Duroc pigs dates back to about 1.7 million years ago. Meanwhile, Luchuan exhibits fewer gene family expansion events and stronger gene family contraction than Duroc. The positively selected genes (PSGs) in the Luchuan pig are significantly enriched for protein tyrosine kinase activity, microtubule motor activity, GTPase activator activity and ubiquitin-protein transferase activity, whereas the PSGs in the Duroc pig are enriched for G-protein coupled receptor activity. Overall, our findings not only provide key benchmark data for the pig genetics community, but also open a new avenue for using porcine biomedical models to study human health and disease.
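For readers unfamiliar with the N50 statistic quoted in the abstract, the following is a minimal sketch of how it is commonly defined: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are illustrative assumptions only; they are not the authors' pipeline or data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (common definition)."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    # Walk from the longest sequence down, accumulating length.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum covers half the total is the N50.
        if running >= half_total:
            return length


# Hypothetical toy contig lengths (arbitrary units), not taken from the paper.
toy_contigs = [2, 3, 4, 5, 6, 10]
print(n50(toy_contigs))
```

Under this definition, a contig N50 of 18.03 Mb would mean that at least half of the assembled bases lie in contigs of 18.03 Mb or longer.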

Download data

  • Downloaded 917 times
  • Download rankings, all-time:
    • Site-wide: 15,495 out of 101,301
    • In genomics: 1,979 out of 6,276
  • Year to date:
    • Site-wide: 15,103 out of 101,301
  • Since beginning of last month:
    • Site-wide: 26,320 out of 101,301
