
Detection of cross-contamination and strong mitonuclear discordance in two species groups of sawfly genus Empria (Hymenoptera, Tenthredinidae)

By Marko Prous, Kyung Min Lee, Marko Mutanen

Posted 21 Jan 2019
bioRxiv DOI: 10.1101/525626 (published DOI: 10.1016/j.ympev.2019.106670)

In several sawfly taxa, strong mitonuclear discordance has been observed: nuclear genes support species assignments based on morphology, whereas the barcode region of the mitochondrial COI gene suggests different relationships. As previous studies were based on only a few nuclear genes, the causes and the degree of mitonuclear discordance remain ambiguous. Here, we obtain genome-scale ddRAD data, together with Sanger sequences of mitochondrial COI and two to three nuclear protein-coding genes, to investigate species limits and mitonuclear discordance in two closely related species groups within the sawfly genus Empria. As found previously with nuclear ITS and mitochondrial COI sequences, species are in most cases supported as monophyletic by previous and new nuclear data reported here, but not by mitochondrial COI. This mitonuclear discordance can be explained by occasional mitochondrial introgression with little or no nuclear gene flow, a pattern that might be common in haplodiploid taxa with slowly evolving mitochondrial genomes. Some species in the E. immersa group are not recovered as monophyletic with nuclear data either, but this could partly be because of unresolved taxonomy. Preliminary analyses of ddRAD data did not recover the monophyly of E. japonica within the E. longicornis group (three Sanger-sequenced nuclear genes strongly supported its monophyly), but closer examination of the data and additional Sanger sequencing suggested that both specimens were substantially cross-contaminated (possibly 10-20% of recovered loci). One possible cause is specimen identification tag jumps during sequencing library preparation of pooled specimens, which previous studies have shown can affect up to 2.5% of sequenced reads.
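Whether a tag jump silently moves reads to another specimen depends on the tagging scheme. The toy enumeration below is an illustration only, not part of the study. It assumes that each read carries one tag at each end (labelled `i5` and `i7` here, after Illumina's index naming), that a tag jump replaces exactly one of the two tags with another tag used in the same pool, and that reads whose tag pair matches no specimen are discarded during demultiplexing. The function names are hypothetical. For each scheme it reports the fraction of single-jump outcomes that land on another specimen's valid tag pair.

```python
from itertools import product


def single_jump_outcomes(pair, i5_tags, i7_tags):
    """Tag pairs a read can carry after exactly one of its two tags is
    replaced by a different tag from the same pool (a single tag jump)."""
    i5, i7 = pair
    swapped_i5 = [(t, i7) for t in i5_tags if t != i5]
    swapped_i7 = [(i5, t) for t in i7_tags if t != i7]
    return swapped_i5 + swapped_i7


def misassigned_fraction(assignments, i5_tags, i7_tags):
    """Fraction of single-jump outcomes that equal another specimen's tag pair,
    i.e. reads that would be demultiplexed to the wrong specimen instead of
    being discarded as an unknown combination."""
    valid = set(assignments)
    total = hits = 0
    for pair in assignments:
        for outcome in single_jump_outcomes(pair, i5_tags, i7_tags):
            total += 1
            if outcome in valid:
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    # Combinatorial dual tags: 2 + 2 tags reused to label 4 specimens.
    i5, i7 = ["A", "B"], ["X", "Y"]
    print(misassigned_fraction(list(product(i5, i7)), i5, i7))

    # Unique dual tags: 4 + 4 tags, each used for exactly one specimen.
    i5, i7 = ["A", "B", "C", "D"], ["W", "X", "Y", "Z"]
    print(misassigned_fraction(list(zip(i5, i7)), i5, i7))

    # Single-end tagging: one constant adapter ("-") at the other end.
    i5, i7 = ["A", "B", "C"], ["-"]
    print(misassigned_fraction([(t, "-") for t in i5], i5, i7))
```

Under these assumptions, reusing two tags per end across four specimens gives a fraction of 1.0 (every single jump yields another specimen's pair), unique dual tags give 0.0, and single-end tagging gives 1.0, which is why fewer tags than specimens leaves pooled, closely related specimens exposed.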
We provide an R script to examine patterns of identical loci among the specimens, and estimate that the cross-contamination rate is not unusually high for our ddRAD dataset as a whole (based on counting identical sequences between the immersa and longicornis groups, which are well separated from each other and probably do not hybridise). The high rate of cross-contamination for both E. japonica specimens might be explained by the small number of loci recovered from them (~1000) owing to poor sequencing results, compared with most other specimens (>10 000 in some cases). We caution against drawing unexpected biological conclusions when closely related specimens are pooled before sequencing and tagged only at one end of the molecule, or at both ends using unique combinations of a limited number of tags (fewer than the number of specimens).
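The between-group comparison described above can be sketched as follows. This is not the authors' R script but a minimal Python illustration under stated assumptions: ddRAD loci are already clustered so that one locus ID refers to homologous sequence in every specimen, and each specimen contributes at most one sequence per locus. The function, specimen, and locus names are hypothetical. A locus is flagged in a specimen when its sequence is identical to the same locus in at least one specimen of the other species group; since the two groups probably do not hybridise, such matches are treated as a rough proxy for cross-contamination.

```python
from collections import defaultdict


def between_group_identity(loci, group):
    """Per specimen, count loci whose sequence is identical to the same locus
    in at least one specimen from a different species group.

    loci:  {specimen: {locus_id: sequence}}
    group: {specimen: group_name}
    Returns {specimen: (n_identical, n_loci, fraction_identical)}.
    """
    # Map each exact (locus, sequence) variant to the groups it was seen in.
    groups_with_variant = defaultdict(set)
    for specimen, seqs in loci.items():
        for locus_id, seq in seqs.items():
            groups_with_variant[(locus_id, seq)].add(group[specimen])

    result = {}
    for specimen, seqs in loci.items():
        own = group[specimen]
        # A variant also present in any other group is a putative contaminant
        # (rough proxy only; assumes the groups do not share alleles otherwise).
        n_identical = sum(
            1 for locus_id, seq in seqs.items()
            if groups_with_variant[(locus_id, seq)] - {own}
        )
        n_loci = len(seqs)
        fraction = n_identical / n_loci if n_loci else 0.0
        result[specimen] = (n_identical, n_loci, fraction)
    return result


if __name__ == "__main__":
    # Toy data: two immersa-group specimens and one longicornis-group specimen.
    loci = {
        "imm1": {"L1": "ACGT", "L2": "GGCC", "L3": "TTAA"},
        "imm2": {"L1": "ACGA", "L2": "GGCC"},
        "lon1": {"L1": "ACGT", "L2": "GGCA", "L4": "CCCC"},
    }
    group = {"imm1": "immersa", "imm2": "immersa", "lon1": "longicornis"}
    for specimen, (k, n, frac) in between_group_identity(loci, group).items():
        print(f"{specimen}: {k}/{n} loci identical across groups ({frac:.0%})")
```

Because the fraction is divided by the number of loci recovered per specimen, the same number of contaminating loci weighs far more in a poorly sequenced specimen: 150 such loci are 15% of 1000 loci but only 1.5% of 10 000.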

Download data

  • Downloaded 342 times
  • Download rankings, all-time:
    • Site-wide: 78,735
    • In evolutionary biology: 4,286
  • Year to date:
    • Site-wide: 132,762
  • Since beginning of last month:
    • Site-wide: 125,666
