Sequence-based modeling of genome 3D architecture from kilobase to chromosome-scale

By Jian Zhou

Posted 20 May 2021
bioRxiv DOI: 10.1101/2021.05.19.444847

The structural organization of the genome plays an important role in multiple aspects of genome function. Understanding how genomic sequence influences 3D organization can help elucidate the roles of genome structure in various processes in healthy and disease states. However, the sequence determinants of genome structure across multiple spatial scales are still not well understood. To learn the complex sequence dependencies of multiscale genome architecture, here we developed a sequence-based deep learning approach, Orca, that predicts genome 3D architecture from kilobase to whole-chromosome scale, covering structures including chromatin compartments and topologically associating domains. Orca makes both intrachromosomal and interchromosomal predictions and captures the sequence dependencies of diverse types of interactions, including CTCF-mediated, enhancer-promoter, and Polycomb-mediated interactions. Orca enables the interpretation of the effects of structural variants of any size on multiscale genome organization and provides an in silico model to help study the sequence-dependent mechanistic basis of genome architecture. We show that the models accurately recapitulate the effects of experimentally studied structural variants of varying sizes (300 bp to 80 Mb) using only sequence. Furthermore, these sequence models enable in silico virtual screen assays to probe the sequence basis of genome 3D organization at different scales. At the submegabase scale, the models predicted specific transcription factor motifs underlying cell-type-specific genome interactions.
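Because the models take only sequence as input, assessing a structural variant comes down to constructing the variant (alternative) sequence and comparing predictions for it against the reference. A minimal sketch of that sequence-editing step follows, assuming 0-based, half-open coordinates on a plain string. The helper names are hypothetical and this is not Orca's code.

```python
# Hypothetical helpers for illustration only (not part of Orca's API).
# Coordinates are 0-based and half-open: the variant spans seq[start:end].

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def _check_interval(seq: str, start: int, end: int) -> None:
    """Raise if [start, end) is empty or falls outside the sequence."""
    if not 0 <= start < end <= len(seq):
        raise ValueError(f"invalid interval [{start}, {end}) for length {len(seq)}")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (case preserved)."""
    return seq.translate(_COMPLEMENT)[::-1]


def apply_deletion(seq: str, start: int, end: int) -> str:
    """Remove the segment seq[start:end]."""
    _check_interval(seq, start, end)
    return seq[:start] + seq[end:]


def apply_inversion(seq: str, start: int, end: int) -> str:
    """Replace seq[start:end] with its reverse complement."""
    _check_interval(seq, start, end)
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def apply_duplication(seq: str, start: int, end: int) -> str:
    """Insert a tandem copy of seq[start:end] directly after the original."""
    _check_interval(seq, start, end)
    return seq[:end] + seq[start:end] + seq[end:]


if __name__ == "__main__":
    ref = "AACCGGTTAC"
    print(apply_deletion(ref, 2, 5))     # AAGTTAC
    print(apply_inversion(ref, 2, 5))    # AACGGGTTAC
    print(apply_duplication(ref, 2, 5))  # AACCGCCGGTTAC
```

In a full analysis the reference and variant sequences would each be passed to a trained model and the predicted contact maps compared. Because deletions and duplications shift downstream coordinates, the two maps have to be aligned around the breakpoints before they are differenced.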
At the compartment scale, based on virtual screens of sequence activities, we propose a new model for the sequence basis of chromatin compartments: sequences at active transcription start sites are primarily responsible for establishing the expression-active compartment A, while the inactive compartment B typically requires extended stretches of AT-rich sequence (at least 6-12 kb) and can form 'passively' without depending on any particular sequence pattern. Orca thus effectively provides an 'in silico genome observatory' to predict variant effects on genome structure and probe the sequence-based mechanisms of genome organization.
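The compartment B proposal is stated in terms of a length scale. To make "extended stretches of AT-rich sequence" concrete, the sketch below bins a sequence, computes the AT fraction of each bin, and reports maximal runs of AT-rich bins that reach a minimum length. The function names, the 1 kb bin, the 0.6 AT threshold, and the 6 kb minimum are assumptions chosen for illustration. This is not the virtual screen used in the study, which probes sequence activity through the model's predictions.

```python
from typing import List, Tuple

# Illustrative scan for extended AT-rich stretches. The bin size, AT threshold
# and minimum run length are hypothetical defaults, not values from the study
# (min_length=6000 simply takes the lower end of the reported 6-12 kb range).


def at_fraction(seq: str) -> float:
    """Fraction of A/T among unambiguous bases; 0.0 if the bin has none (e.g. all N)."""
    s = seq.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    total = at + gc
    return at / total if total else 0.0


def at_rich_runs(seq: str, bin_size: int = 1000, threshold: float = 0.6,
                 min_length: int = 6000) -> List[Tuple[int, int]]:
    """Return [start, end) of maximal runs of consecutive AT-rich bins whose
    span is at least min_length bp. A trailing partial bin is ignored."""
    runs: List[Tuple[int, int]] = []
    run_start = None
    n_bins = len(seq) // bin_size
    for i in range(n_bins):
        pos = i * bin_size
        rich = at_fraction(seq[pos:pos + bin_size]) >= threshold
        if rich and run_start is None:
            run_start = pos                      # a new AT-rich run begins
        elif not rich and run_start is not None:
            if pos - run_start >= min_length:    # run just ended; keep if long enough
                runs.append((run_start, pos))
            run_start = None
    # Close a run that extends to the last full bin.
    end = n_bins * bin_size
    if run_start is not None and end - run_start >= min_length:
        runs.append((run_start, end))
    return runs


if __name__ == "__main__":
    # 4 kb GC-rich, 8 kb AT-rich, 2 kb GC-rich, 3 kb AT-rich
    seq = "GC" * 2000 + "AT" * 4000 + "GC" * 1000 + "AT" * 1500
    print(at_rich_runs(seq))  # [(4000, 12000)]
```

Only the 8 kb AT-rich block is reported; the 3 kb block falls below the assumed 6 kb minimum. Since bins are judged independently, a single GC-rich bin splits a run, so tolerating short interruptions would be one possible extension.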

Download statistics

  • Downloaded 1,454 times
  • Download rankings, all-time:
    • Site-wide: 16,820
    • In bioinformatics: 1,870
  • Year to date:
    • Site-wide: 4,151
  • Since beginning of last month:
    • Site-wide: 5,190