Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 45,367 bioRxiv papers from 204,202 authors.
Most tweeted bioRxiv papers, last 7 days
419 results found.
381 tweets genomics
Ryan L Collins, Harrison Brand, Konrad J. Karczewski, Xuefang Zhao, Jessica Alflödi, Amit V Khera, Laurent C Francioli, Laura D Gauthier, Harold Wang, Nicholas A Watts, Matthew Solomonson, Anne O'Donnell-Luria, Alexander Baumann, Ruchi Munshi, Chelsea Lowther, Mark Walker, Christopher Whelan, Yongquing Huang, Ted Brookings, Ted Sharpe, Matthew R Stone, Elise Valkanas, Jack Fu, Grace Tiao, Kristen M Laricchia, Christine Stevens, Namrata Gupta, Lauren Margolin, The Genome Aggregation Database (gnomAD) Productio, The gnomAD Consortium, John A Spertus, Kent D Taylor, Henry J Lin, Stephen S Rich, Wendy Post, Yii-Der Ida Chen, Jerome I Rotter, Chad Nusbaum, Anthony Philippakis, Eric Lander, Stacey Gabriel, Benjamin M Neale, Sekar Kathiresan, Mark J. Daly, Eric Banks, Daniel G. MacArthur, Michael E. Talkowski
Structural variants (SVs) rearrange the linear and three-dimensional organization of the genome, which can have profound consequences in evolution, diversity, and disease. As national biobanks, human disease association studies, and clinical genetic testing are increasingly reliant on whole-genome sequencing, population references for small variants (i.e., SNVs & indels) in protein-coding genes, such as the Genome Aggregation Database (gnomAD), have become integral for the evaluation and interpretation of genomic variation. However, no comparable large-scale reference maps for SVs exist to date. Here, we constructed a reference atlas of SVs from deep whole-genome sequencing (WGS) of 14,891 individuals across diverse global populations (54% non-European) as a component of gnomAD. We discovered a rich landscape of 498,257 unique SVs, including 5,729 multi-breakpoint complex SVs across 13 mutational subclasses, and examples of localized chromosome shattering, like chromothripsis, in the general population. The mutation rates and densities of SVs were non-uniform across chromosomes and SV classes. We discovered strong correlations between constraint against predicted loss-of-function (pLoF) SNVs and rare SVs that both disrupt and duplicate protein-coding genes, suggesting that existing per-gene metrics of pLoF SNV constraint do not simply reflect haploinsufficiency, but appear to capture a gene's general sensitivity to dosage alterations. The average genome in gnomAD-SV harbored 8,202 SVs, and approximately eight genes altered by rare SVs. When incorporating these data with pLoF SNVs, we estimate that SVs comprise at least 25% of all rare pLoF events per genome. We observed large (≥1 Mb), rare SVs in 3.1% of genomes (~1:32 individuals), and a clinically reportable pathogenic incidental finding from SVs in 0.24% of genomes (~1:417 individuals).
We also estimated the prevalence of previously reported pathogenic recurrent CNVs associated with genomic disorders, which highlighted differences in frequencies across populations and confirmed that WGS-based analyses can readily recapitulate these clinically important variants. In total, gnomAD-SV includes at least one CNV covering 57% of the genome, while the remaining 43% is significantly enriched for CNVs found in tumors and individuals with developmental disorders. However, current sample sizes remain markedly underpowered to establish estimates of SV constraint on the level of individual genes or noncoding loci. The gnomAD-SV resources have been integrated into the gnomAD browser (https://gnomad.broadinstitute.org), where users can freely explore this dataset without restrictions on reuse, which will have broad utility in population genetics, disease association, and diagnostic screening.
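The "1 in N" figures quoted in the abstract above follow directly from the percentages. As a quick check, here is a minimal sketch of that conversion; the function name `one_in_n` is hypothetical and not part of gnomAD or Rxivist.

```python
def one_in_n(percent: float) -> int:
    """Convert a prevalence given in percent to the N of '1 in N' (nearest integer)."""
    if percent <= 0:
        raise ValueError("prevalence must be positive")
    return round(100.0 / percent)


# Figures taken from the abstract: 3.1% and 0.24% of genomes.
large_rare_sv = one_in_n(3.1)        # 100 / 3.1  = 32.26 -> 32
incidental_finding = one_in_n(0.24)  # 100 / 0.24 = 416.67 -> 417
print(large_rare_sv, incidental_finding)
```

Both results agree with the rounded ratios reported in the abstract (~1:32 and ~1:417).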
115 tweets synthetic biology
To extend the frontier of genome editing and enable the radical redesign of mammalian genomes, we developed a set of dead-Cas9 base editor (dBE) variants that allow editing at tens of thousands of loci per cell by overcoming the cell death associated with DNA double-strand breaks (DSBs) and single-strand breaks (SSBs). We used a set of gRNAs targeting repetitive elements, ranging in target copy number from about 31 to 124,000 per cell. dBEs enabled survival after large-scale base editing, allowing targeted mutations at up to ~13,200 and ~2,610 loci in 293T and human induced pluripotent stem cells (hiPSCs), respectively, three orders of magnitude greater than previously recorded. These dBEs can overcome current on-target mutation and toxicity barriers that prevent cell survival after large-scale genome engineering.
72 tweets genomics
Single-cell RNA-seq (scRNA-seq) data exhibit significant cell-to-cell variation due to technical factors, including the number of molecules detected in each cell, which can confound biological heterogeneity with technical effects. To address this, we present a modeling framework for the normalization and variance stabilization of molecular count data from scRNA-seq experiments. We propose that the Pearson residuals from 'regularized negative binomial regression', where cellular sequencing depth is utilized as a covariate in a generalized linear model, successfully remove the influence of technical characteristics from downstream analyses while preserving biological heterogeneity. Importantly, we show that an unconstrained negative binomial model may overfit scRNA-seq data, and overcome this by pooling information across genes with similar abundances to obtain stable parameter estimates. Our procedure removes the need for heuristic steps including pseudocount addition or log-transformation, and improves common downstream analytical tasks such as variable gene selection, dimensional reduction, and differential expression. Our approach can be applied to any UMI-based scRNA-seq dataset and is freely available as part of the R package sctransform (https://github.com/ChristophH/sctransform), with a direct interface to our single-cell toolkit Seurat.
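To make the idea of depth-corrected Pearson residuals concrete, here is a minimal sketch in plain Python. It is not the sctransform implementation: instead of fitting a regularized regression per gene, it assumes the expected count is simply cell depth times the gene's overall fraction of counts, and it uses one fixed dispersion `theta` for all genes. The function name, the default `theta=100.0`, and the clipping at the square root of the number of cells are assumptions made for illustration.

```python
import math


def pearson_residuals(counts, theta=100.0):
    """Pearson residuals under a simplified negative binomial model.

    counts: list of cells, each a list of per-gene UMI counts.
    theta:  fixed NB dispersion shared by all genes (an assumption of this
            sketch; sctransform estimates and regularizes parameters per gene).
    """
    n_cells = len(counts)
    n_genes = len(counts[0])
    depth = [sum(cell) for cell in counts]  # total UMIs per cell
    total = sum(depth)
    # Fraction of all counts that belongs to each gene.
    gene_frac = [sum(cell[j] for cell in counts) / total for j in range(n_genes)]
    clip = math.sqrt(n_cells)  # assumed clipping bound for extreme residuals

    residuals = []
    for cell, d in zip(counts, depth):
        row = []
        for x, p in zip(cell, gene_frac):
            mu = d * p                    # expected count given depth alone
            var = mu + mu * mu / theta    # NB variance
            r = (x - mu) / math.sqrt(var) if var > 0 else 0.0
            row.append(max(-clip, min(clip, r)))
        residuals.append(row)
    return residuals


# Three cells with identical composition but different depth:
# depth is the only source of variation, so all residuals are ~0.
same_composition = pearson_residuals([[2, 1], [4, 2], [8, 4]])
print(same_composition)
```

In this toy input the cells differ only in sequencing depth, so the residuals vanish, which is the behaviour the abstract describes as removing technical effects while leaving biological differences in place.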
47 tweets scientific communication and education
The data in this report summarise the responses gathered from 365 principal investigators of academic laboratories who started their independent positions in the UK within the 6 years up to 2018. We find that too many new investigators express frustration and little optimism for the future. These data also reveal that many of these individuals lack the support required to make a successful transition to independence, and that simple measures could be put in place by both funders and universities to better support these early career researchers. We use these data to recommend both good practice and policy changes that would significantly improve conditions for those currently finding independence challenging. We find that some new investigators face significant obstacles when building momentum and hiring a research team, in particular in gaining access to PhD students. We also find some important areas, such as starting salaries, where significant gender differences persist that cannot be explained by seniority. Our data also underline the importance of support networks, within and outside the department, and the positive influence of good mentorship through this difficult career stage.
41 tweets genomics
Single cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero-inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform current practice in a downstream clustering assessment using ground-truth datasets.
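Feature selection by deviance, as mentioned in the abstract above, can be sketched with a per-gene binomial deviance against a null model in which each gene takes a constant fraction of every cell's counts. This is a minimal plain-Python illustration under that assumption, not the authors' code; the function names are hypothetical.

```python
import math


def _x_log_ratio(x, expected):
    """x * log(x / expected), using the convention 0 * log(0) = 0."""
    return x * math.log(x / expected) if x > 0 else 0.0


def binomial_deviance(counts):
    """Per-gene binomial deviance under a constant-proportion null model.

    counts: list of cells, each a list of per-gene UMI counts.
    Genes whose share of counts varies strongly across cells get high deviance.
    """
    depth = [sum(cell) for cell in counts]  # total UMIs per cell
    total = sum(depth)
    n_genes = len(counts[0])
    deviances = []
    for j in range(n_genes):
        pi = sum(cell[j] for cell in counts) / total  # null proportion of gene j
        d = 0.0
        for cell, n in zip(counts, depth):
            x = cell[j]
            d += _x_log_ratio(x, n * pi)
            d += _x_log_ratio(n - x, n * (1.0 - pi))
        deviances.append(2.0 * d)
    return deviances


# Genes 0 and 1 switch between cells; gene 2 keeps a constant share.
dev = binomial_deviance([[10, 0, 5], [0, 10, 5]])
ranked = sorted(range(len(dev)), key=lambda j: dev[j], reverse=True)
print(dev, ranked)
```

Ranking genes by this value and keeping the top ones is the selection step; the constant-share gene ends up last because the null model already explains it.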
39 tweets immunology
Animal models are an integral part of the drug development and evaluation process. However, they are unsurprisingly imperfect reflections of humans, and the extent and nature of many immunological differences are unknown. With the rise of targeted and biological therapeutics, it is increasingly important that we understand the molecular differences in immunological behavior of humans and model organisms. Thus, we profiled a large number of healthy humans, along with three of the model organisms most similar to humans: rhesus and cynomolgus macaques and African green monkeys; and the most widely used mammalian model: mice. Using cross-species, universal phenotyping and signaling panels, we measured immune cell signaling responses to an array of 15 stimuli using CyTOF mass cytometry. We found numerous instances of different cellular phenotypes and immune signaling events occurring within and between species with likely effects on evaluation of therapeutics, and detail three examples (double-positive T cell frequency and signaling; granulocyte response to Bacillus anthracis antigen; and B cell subsets). We also explore the correlation of herpes simian B virus serostatus with the immune profile. The full dataset is available online at https://flowrepository.org (accession FR-FCM-Z2ZY) and https://immuneatlas.org.
34 tweets genetics
The population of the United States is shaped by centuries of migration, isolation, growth, and admixture between populations of global origins. Here, we assemble a comprehensive view of recent population history by studying the ancestry and population structure of over 32,000 individuals in the US using genetic, ancestral birth origin, and geographic data. We identify migration routes and barriers that reflect historical demographic events. We also uncover the spatial patterns of relatedness in subpopulations through the combination of haplotype clustering, ancestral birth origin analysis, and local ancestry inference. These patterns include substantial structure and heterogeneity in Hispanics/Latinos, isolation-by-distance in African Americans, elevated levels of relatedness and homozygosity in Asian immigrants, and fine-scale structure in individuals of European descent. Furthermore, quantification of familial birthplaces recapitulates historical immigration waves at high resolution. Taken together, our results provide detailed insights into the genetic structure and demographic history of the diverse US population.
33 tweets neuroscience
We present a system for scalable and customizable recording and stimulation of neural activity. In large animals and humans, the current benchmark for high spatial and temporal resolution neural interfaces is fixed arrays of wire or silicon electrodes inserted into the parenchyma of the brain. However, probes that are large and stiff enough to penetrate the brain have been shown to cause acute and chronic damage and inflammation, which limits their longevity, stability, and yield. One approach to this problem is to separate the requirements of the insertion device, which should be as stiff as possible, from those of the implanted device, which should be as small and flexible as possible. Here, we demonstrate the feasibility and scalability of this approach with a system incorporating fine and flexible thin-film polymer probes, a fine and stiff insertion needle, and a robotic insertion machine. Together the system permits rapid and precise implantation of probes, each individually targeted to avoid observable vasculature and to attain diverse anatomical targets. As an initial demonstration of this system, we implanted arrays of electrodes in rat somatosensory cortex, recorded extracellular action potentials from them, and obtained histological images of the tissue response. This approach points the way toward a new generation of scalable, stable, and safe neural interfaces, both for the basic scientific study of brain function and for clinical applications.
30 tweets ecology
Citizen science data are valuable for addressing a wide range of ecological research questions, and there has been a rapid increase in the scope and volume of data available. However, data from large-scale citizen science projects typically present a number of challenges that can inhibit robust ecological inferences. These challenges include: species bias, spatial bias, variation in effort, and variation in observer skill. To demonstrate key challenges in analysing citizen science data, we use the example of estimating species distributions with data from eBird, a large semi-structured citizen science project. We estimate three widely applied metrics for describing species distributions: encounter rate, occupancy probability, and relative abundance. For each method, we outline approaches for data processing and modelling that are suitable for using citizen science data for estimating species distributions. Model performance improved when data processing and analytical methods addressed the challenges arising from citizen science data. The largest gains in model performance were achieved with two key processes: 1) the use of complete checklists rather than presence-only data, and 2) the use of covariates describing variation in effort and detectability for each checklist. Including these covariates accounted for heterogeneity in detectability and reporting, and resulted in substantial differences in predicted distributions. The data processing and analytical steps we outlined led to improved model performance across a range of sample sizes. When using citizen science data it is imperative to carefully consider the appropriate data processing and analytical procedures required to address the bias and variation. Here, we describe the consequences and utility of applying our suggested approach to semi-structured citizen science data to estimate species distributions.
The methods we have outlined are also likely to improve other forms of inference and will enable researchers to conduct robust analyses and harness the vast ecological knowledge that exists within citizen science data.
25 tweets genomics
The highly dynamic nature of chromosome conformation and three-dimensional (3D) genome organization leads to cell-to-cell variability in chromatin interactions within a cell population, even if the cells of the population appear to be functionally homogeneous. Hence, although Hi-C is a powerful tool for mapping 3D genome organization, this heterogeneity of chromosome higher order structure among individual cells limits the interpretive power of population based bulk Hi-C assays. Moreover, single-cell studies have the potential to enable the identification and characterization of rare cell populations or cell subtypes in a heterogeneous population. However, it may require surveying relatively large numbers of single cells to achieve statistically meaningful observations in single-cell studies. By applying combinatorial cellular indexing to chromosome conformation capture, we developed single-cell combinatorial indexed Hi-C (sci-Hi-C), a high throughput method that enables mapping chromatin interactomes in large numbers of single cells. We demonstrated the use of sci-Hi-C data to separate cells by karyotypic and cell-cycle state differences and to identify cellular variability in mammalian chromosomal conformation. Here, we provide a detailed description of method design and step-by-step working protocols for sci-Hi-C.
24 tweets evolutionary biology
The testis expresses the largest number of genes of any mammalian organ, a finding that has long puzzled molecular biologists. Analyzing our single-cell transcriptomic maps of human and mouse spermatogenesis, we provide evidence that this widespread transcription serves to maintain DNA sequence integrity in the male germline by correcting DNA damage through 'transcriptional scanning'. Supporting this model, we find that genes expressed during spermatogenesis display lower mutation rates on the transcribed strand and have low diversity in the population. Moreover, this effect is fine-tuned by the level of gene expression during spermatogenesis. The unexpressed genes, which in our model do not benefit from transcriptional scanning, diverge faster over evolutionary time-scales and are enriched for sensory and immune-defense functions. Collectively, we propose that transcriptional scanning modulates germline mutation rates in a gene-specific manner, maintaining DNA sequence integrity for the bulk of genes but allowing for fast evolution in a specific subset.
24 tweets epidemiology
The majority of studies that link antibiotic usage and resistance focus on simple associations between the resistance against a specific antibiotic and the use of that specific antibiotic. However, the relationship between antibiotic use and resistance is more complex. Here we evaluate which antibiotics, including those mainly prescribed for respiratory tract infections, are associated with increased resistance among Escherichia coli isolated from urinary samples. Monthly primary care prescribing data were obtained from National Health Service (NHS) Digital. Positive E. coli records from urine samples in English primary care (n=888,207) between April 2014 and January 2016 were obtained from the Second Generation Surveillance System. Elastic net regularization was used to evaluate associations between prescribing of different antibiotic groups and resistance against amoxicillin, cephalexin, ciprofloxacin, co-amoxiclav and nitrofurantoin at the clinical commissioning group (CCG) level. England is divided into 209 CCGs, with each NHS practice belonging to one CCG. Amoxicillin prescribing (measured in DDD/1000 inhabitants/day) was positively associated with amoxicillin (RR 1.03, 95% CI 1.01-1.04) and ciprofloxacin (RR 1.09, 95% CI 1.04-1.17) resistance. In contrast, nitrofurantoin prescribing was associated with lower levels of resistance to amoxicillin (RR 0.92, 95% CI 0.84-0.97). CCGs with higher levels of trimethoprim prescribing also had higher levels of ciprofloxacin resistance (RR 1.34, 95% CI 1.10-1.59). Amoxicillin, which is mainly (and often unnecessarily) prescribed for respiratory tract infections, is associated with increased resistance against various antibiotics among E. coli causing urinary tract infections. Our findings suggest that when predicting the potential impact of interventions on antibiotic resistances it is important to account for use of other antibiotics, including those typically used for other indications.
23 tweets cell biology
Analysis of flagellum beating in three dimensions is important for understanding how cells can undergo complex flagellum-driven motility, and the ability to use fluorescence microscopy for such three-dimensional analysis would be extremely powerful. Trypanosoma and Leishmania are unicellular parasites which undergo complex cell movements in three dimensions as they swim and would particularly benefit from such an analysis. Here, high-speed multifocal plane fluorescence microscopy, a technique in which a light path multi-splitter is used to visualise 4 focal planes simultaneously, was used to reconstruct the flagellum beating of Trypanosoma brucei and Leishmania mexicana in three dimensions. It was possible to use either an organic fluorescent stain or a genetically-encoded fluorescence fusion protein to visualise flagellum and cell movement in three dimensions at a 200 Hz frame rate. This high-speed multifocal plane fluorescence microscopy approach was used to address two open questions regarding Trypanosoma and Leishmania swimming: to quantify the planarity of the L. mexicana flagellum beat and to analyse the nature of flagellum beating during T. brucei 'tumbling'.
23 tweets bioengineering
Glucose is arguably the most important molecule in metabolism, and its mismanagement underlies diseases of vast societal import, most notably diabetes. Although glucose-related metabolism has been the subject of intense study for over a century, tools to track glucose in living organisms with high spatio-temporal resolution are lacking. We describe the engineering of a family of genetically encoded glucose sensors with high signal-to-noise ratio, fast kinetics and affinities varying over four orders of magnitude (1 μM to 10 mM). The sensors allow rigorous mechanistic characterization of glucose transporters expressed in cultured cells with high spatial and temporal resolution. Imaging of neuron/glia co-cultures revealed ~3-fold higher glucose changes in astrocytes versus neurons. In larval Drosophila central nervous system explants, imaging of intracellular neuronal glucose suggested a novel rostro-caudal transport pathway in the ventral nerve cord neuropil, with paradoxically slower uptake into the peripheral cell bodies and brain lobes. In living zebrafish, expected glucose-related physiological sequelae of insulin and epinephrine treatments were directly visualized in real time. Additionally, spontaneous muscle twitches induced glucose uptake in muscle, and sensory and pharmacological perturbations gave rise to large but enigmatic changes in the brain. These sensors will enable myriad experiments, most notably rapid, high-resolution imaging of glucose influx, efflux, and metabolism in behaving animals.
21 tweets genomics
Senjuti Saha, Akshaya Ramesh, Katrina L Kalantar, Roly Malaker, Md Hasanuzzaman, Lillian M Khan, Madeline Y Mayday, Mohammad Saiful Islam Sajib, Lucy M Li, Charles Langelier, Hafizur Rahman, Emily Dawn Crawford, Cristina M Tato, Maksuda Islam, Yun-Fang Juan, Charles de Bourcy, Boris Dimitrov, James Wang, Jennifer Tang, Jonathan Sheu, Rebecca Egger, Tiago Rodrigues De Carvalho, Michael R Wilson, Samir Saha, Joseph L. DeRisi
Background: The disease burden due to meningitis in low and middle-income countries remains significant and failure to determine an etiology impedes appropriate treatment for patients and evidence-based policy decisions for populations. Broad-range pathogen surveillance using metagenomic next-generation sequencing (mNGS) of RNA isolated from cerebrospinal fluid (CSF) provides an unbiased assessment for possible infectious etiologies. In this study, our objective was to use mNGS to identify etiologies of pediatric meningitis in Bangladesh. Methods: We conducted a retrospective case-control mNGS study on CSF from patients with known neurologic infections (n=36), idiopathic meningitis (n=25), or no infection (n=30), and on six environmental samples, collected between 2012 and 2018. Using an open-access, cloud-based bioinformatics pipeline (IDseq) and machine learning, we identified potential pathogens which were confirmed through qPCR and Sanger sequencing. These cases were followed up through phone/home visits. The CSF samples were collected from children with WHO-defined meningeal signs during prospective meningitis surveillance at the largest pediatric referral hospital in Bangladesh. Results: The 91 participants (42% female) ranged in age from 0-160 months (median: 9 months). In samples with known infectious causes of meningitis and without infections (n=66), there was 83% concordance between mNGS and conventional testing. In idiopathic cases (n=25), mNGS identified a potential etiology in 40% (n=10), including bacterial and viral pathogens. There were three instances of neuroinvasive Chikungunya virus (CHIKV). The CHIKV genomes were >99% identical to each other and to a Bangladeshi strain only previously recognized to cause systemic illness in 2017. CHIKV qPCR of all remaining stored CSF samples from children who presented with idiopathic meningitis in 2017 at the same hospital (n=472) revealed 17 additional CHIKV meningitis cases.
Orthogonal molecular confirmation of each mNGS-identified infection, case-based clinical data, and follow-up of patients substantiated the key findings. Conclusions: Using mNGS, we obtained a microbiological diagnosis for 40% of idiopathic meningitis cases and identified a previously unappreciated pediatric CHIKV meningitis outbreak. Case-control CSF mNGS surveys can complement conventional diagnostic methods to identify etiologies of meningitis and facilitate informed policy decisions.
20 tweets bioinformatics
The extensive generation of RNA sequencing (RNA-seq) data in the last decade has resulted in a myriad of specialized software for its analysis. Each software module typically targets a specific step within the analysis pipeline, making it necessary to join several of them to get a single cohesive workflow. Multiple software programs automating this procedure have been proposed, but often lack modularity, transparency or flexibility. We present ARMOR, which performs an end-to-end RNA-seq data analysis, from raw read files, via quality checks, alignment and quantification, to differential expression testing, geneset analysis and browser-based exploration of the data. ARMOR is implemented using the Snakemake workflow management system and leverages conda environments; Bioconductor objects are generated to facilitate downstream analysis, ensuring seamless integration with many R packages. The workflow is easily implemented by cloning the GitHub repository, replacing the supplied input and reference files and editing a configuration file. Although we have selected the tools currently included in ARMOR, the setup is modular and alternative tools can be easily integrated.
19 tweets ecology
Hannah Weigand, Arne J Beermann, Fedor Čiampor, Filipe O Costa, Zoltán Csabai, Sofia Duarte, Matthias F Geiger, Michał Grabowski, Frédéric Rimet, Björn Rulik, Malin Strand, Nikolaus Szucsich, Alexander M Weigand, Endre Willassen, Sofia A Wyler, Agnès Bouchez, Angel Borja, Zuzana Čiamporová-Zat'ovičová, Sónia Ferreira, KD Dijkstra, Ursula Eisendle, Jörg Freyhof, Piotr Gadawski, Wolfram Graf, Arne Haegerbaeumer, Berry B van der Hoorn, Bella Japoshvili, Lujza Keresztes, Emre Keskin, Florian Leese, Jan Macher, Tomasz Mamos, Guy Paz, Vladimir Pešić, Daniela Maric Pfannkuchen, Martin Andreas Pfannkuchen, Benjamin W Price, Buki Rinkevich, Marcos A. L. Teixeira, Gábor Várbíró, Torbjørn Ekrem
Effective identification of species using short DNA fragments (DNA barcoding and DNA metabarcoding) requires reliable sequence reference libraries of known taxa. Both taxonomically comprehensive coverage and content quality are important for sufficient accuracy. For aquatic ecosystems in Europe, reliable barcode reference libraries are particularly important if molecular identification tools are to be implemented in biomonitoring and reports in the context of the EU Water Framework Directive (WFD) and the Marine Strategy Framework Directive (MSFD). We analysed gaps in the two most important reference databases, Barcode of Life Data Systems (BOLD) and NCBI GenBank, with a focus on the taxa most frequently used in WFD and MSFD. Our analyses show that coverage varies strongly among taxonomic groups, and among geographic regions. In general, groups that were actively targeted in barcode projects (e.g. fish, true bugs, caddisflies and vascular plants) are well represented in the barcode libraries, while others have fewer records (e.g. marine molluscs, ascidians, and freshwater diatoms). We also found that species monitored in several countries often are represented by barcodes in reference libraries, while species monitored in a single country frequently lack sequence records. A large proportion of species (up to 50%) in several taxonomic groups are only represented by private data in BOLD. Our results have implications for the future strategy to fill existing gaps in barcode libraries, especially if DNA metabarcoding is to be used in the monitoring of European aquatic biota under the WFD and MSFD. For example, missing species relevant to monitoring in multiple countries should be prioritized. We also discuss why a strategy for quality control and quality assurance of barcode reference libraries is needed and recommend future steps to ensure full utilization of metabarcoding in aquatic biomonitoring.
19 tweets genetics
In C. elegans nematodes, components of liquid-like germ granules were shown to be required for transgenerational small RNA inheritance. Surprisingly, we show here that mutants with defective germ granules (pptr-1, meg-3/4, pgl-1) can nevertheless inherit potent small RNA-based silencing responses, but some of the mutants lose this ability after many generations of homozygosity. Animals mutated in pptr-1, which is required for stabilization of P granules in the early embryo, display extremely strong heritable RNAi responses, which last for tens of generations, long after the responses in wild-type animals peter out. The phenotype of mutants defective in the core germ granule proteins MEG-3 and MEG-4 depends on the genotype of the ancestors: mutants that derive from maternal lineages that had functional MEG-3 and MEG-4 proteins exhibit enhanced RNAi inheritance for multiple generations. While functional ancestral meg-3/4 alleles correct, and even potentiate, the ability of mutant descendants to inherit RNAi, defects in germ granule functions can be memorized as well: wild-type descendants that derive from lineages of mutants show impaired RNAi inheritance for many (>16) generations, although their germ granules are intact. Importantly, while P granules are maternally deposited, wild-type progeny derived from meg-3/4 male mutants also show reduced RNAi inheritance. Unlike germ granules, small RNAs are also inherited from the sperm. Moreover, we find that the transgenerational effects that depend on the ancestral germ granules require the argonaute protein HRDE-1, which carries heritable small RNAs in the germline. Indeed, small RNA sequencing reveals imbalanced levels of many endogenous small RNAs in germ granule mutants. Strikingly, we find that hrde-1;meg-3/4 triple mutants inherit RNAi, although hrde-1 was previously thought to be essential for heritable silencing.
We propose that germ granules sort and shape the RNA pool, and that small RNA inheritance memorizes this activity for multiple generations.
19 tweets genomics
Takashi Gakuhari, Shigeki Nakagome, Simon Rasumussen, Morten Allentoft, Takehiro Sato, Thorfinn Korneliussen, Blanaid Ni Chuinneagain, Hiromi Matsumae, Kae Koganebuchi, Ryan Schmidt, Souichiro Mizushima, Osamu Kondo, Nobuo Shigehara, Minoru Yoneda, Ryosuke Kimura, Hajime Ishida, Yoshiyuki Masuyama, Yasuhiro Yamada, Atsushi Tajima, Hiroki Shibata, Atsushi Toyoda, Toshiyuki Tsurumoto, Tetsuaki Wakebe, Hiromi Shitara, Tsunehiko Hanihara, Eske Willerslev, Martin Sikora, Hiroki Oota
Anatomically modern humans reached East Asia by >40,000 years ago (kya). However, key questions remain unresolved with regard to the route(s) and the number of wave(s) in the dispersal into East Eurasia. Ancient genomes at the edge of East Eurasia may shed light on the detailed picture of the peopling of East Eurasia. Here, we analyze the whole-genome sequence of a 2.5 kya individual (IK002) associated with a typical Jomon culture that started in the Japanese archipelago >16 kya. The phylogenetic analyses support multiple waves of migration, with IK002 forming a lineage basal to the rest of the ancient/present-day East Eurasians examined, likely to represent some of the earliest-wave migrants who went north toward East Asia from Southeast Asia. Furthermore, IK002 has extra genetic affinity with the indigenous Taiwan aborigines, which may support a coastal route of the Jomon-ancestry migration from Southeast Asia to the Japanese archipelago. This study highlights the power of ancient genomics with an isolated population to provide new insights into the complex history of East Eurasia.
19 tweets genomics
Massively parallel reporter assays (MPRAs) functionally screen thousands of sequences for regulatory activity in parallel. Although MPRAs have been applied to address diverse questions in gene regulation, there has been no systematic comparison of how differences in experimental design influence findings. Here, we screen a library of 2,440 sequences, representing candidate liver enhancers and controls, in HepG2 cells for regulatory activity using nine different approaches (including conventional episomal, STARR-seq, and lentiviral MPRA designs). We identify subtle but significant differences in the resulting measurements that correlate with epigenetic and sequence-level features. We also test this library in both orientations with respect to the promoter, validating en masse that enhancer activity is robustly independent of orientation. Finally, we develop and apply a novel method to assemble and functionally test libraries of the same putative enhancers as 192-mers, 354-mers, and 678-mers, and observe surprisingly large differences in functional activity. This work provides a framework for the experimental design of high-throughput reporter assays, suggesting that the extended sequence context of tested elements, and to a lesser degree the precise assay, influence MPRA results.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!