Multiple linear regression allows weighted burden analysis of rare coding variants in an ethnically heterogeneous population

By David Curtis

Posted 12 Jun 2020
bioRxiv DOI: 10.1101/2020.06.11.145938

Weighted burden analysis has been used in exome-sequenced case-control studies to identify genes in which there is an excess of rare and/or functional variants associated with phenotype. Implementation in a ridge regression framework allows simultaneous analysis of all variants along with relevant covariates such as population principal components. To apply the approach to a quantitative phenotype, a weighted burden score is derived for each subject and included in a linear regression analysis. The weighting scheme is adjusted to apply differential weights to rare and very rare variants, and the score is based on both the frequency and the predicted effect of each variant. When applied to an ethnically heterogeneous dataset consisting of 49,790 exome-sequenced UK Biobank subjects, with BMI as the phenotype, the method produces a highly inflated test statistic. However, this inflation is almost completely corrected by including 20 population principal components as covariates. With this correction, the top 30 genes include a few that are quite plausibly associated with the phenotype, including LYPLAL1 and NSDHL. This approach offers a way to carry out gene-based analyses of rare variants identified by exome sequencing in heterogeneous datasets without requiring that data from ethnic minority subjects be discarded. This research has been conducted using the UK Biobank Resource.

### Competing Interest Statement

The authors have declared no competing interest.
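To make the workflow in the abstract concrete, here is a minimal sketch of a per-subject weighted burden score entered into a linear regression alongside principal components. It is not the author's implementation. The abstract does not give the weighting function, so the linear `frequency_weight` scheme, the `w_max` parameter, the `functional_weight` values, and all simulated data are illustrative assumptions, and plain ordinary least squares via NumPy stands in for whatever software was actually used.

```python
import numpy as np

def frequency_weight(maf, w_max=10.0):
    # Hypothetical scheme (an assumption, not from the paper): the weight
    # falls linearly from w_max at MAF 0 to 1 at MAF 0.5, so rarer
    # variants count more.
    maf = np.asarray(maf, dtype=float)
    return w_max - (w_max - 1.0) * (maf / 0.5)

def burden_scores(genotypes, maf, functional_weight, w_max=10.0):
    # genotypes: (n_subjects, n_variants) minor-allele counts (0, 1 or 2).
    # Each variant's weight combines its frequency weight with a weight
    # for its predicted effect; a subject's score is the weighted sum.
    weights = frequency_weight(maf, w_max) * np.asarray(functional_weight, dtype=float)
    return np.asarray(genotypes, dtype=float) @ weights

def burden_regression(phenotype, scores, covariates):
    # Ordinary least squares of phenotype on intercept + score + covariates.
    # Returns the burden-score coefficient and its t statistic.
    y = np.asarray(phenotype, dtype=float)
    X = np.column_stack([np.ones(len(y)), scores, covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

# Simulated data for one gene (all values are made up for illustration).
rng = np.random.default_rng(0)
n_subjects, n_variants, n_pcs = 500, 20, 20
maf = rng.uniform(0.001, 0.01, size=n_variants)
genotypes = rng.binomial(2, maf, size=(n_subjects, n_variants))
functional_weight = rng.choice([1.0, 5.0, 10.0], size=n_variants)
pcs = rng.normal(size=(n_subjects, n_pcs))  # stand-in for population PCs

scores = burden_scores(genotypes, maf, functional_weight)
bmi = 27.0 + 0.05 * scores + pcs @ rng.normal(size=n_pcs) + rng.normal(size=n_subjects)

beta, t_stat = burden_regression(bmi, scores, pcs)
print(f"burden coefficient = {beta:.3f}, t = {t_stat:.2f}")
```

Passing the principal components as covariates is the step the abstract credits with removing most of the test-statistic inflation: the burden coefficient is then estimated conditional on ancestry rather than confounded with it.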

Download data

  • Downloaded 344 times
  • Download rankings, all-time:
    • Site-wide: 146,512
    • In genetics: 5,544
  • Year to date:
    • Site-wide: 50,351
  • Since beginning of last month:
    • Site-wide: 77,408
