Speaker:
Rui Dong (HIMIS)
Time:
May 13th, 10:30-11:30am
Location:
HIMIS 7A1
Title:
HIMIS Colloquium - Leveraging cis- and trans-variants to improve protein expression level prediction for proteome-wide association studies
Abstract:
For an audience more familiar with mathematics than molecular biology, we will first briefly introduce the key concepts. The human genome differs between individuals at millions of single-nucleotide variants, which influence the abundance (expression level) of proteins, the molecular products of genes that drive most cellular function. Variants located near the gene encoding a protein are called cis-variants, while those located elsewhere in the genome are called trans-variants; both can carry signal for predicting protein expression.
Since genetic effects are often mediated through proteins, the analysis of proteomic data can provide insights into disease etiology. However, most studies lack proteomic data. To address this problem, we developed TransCisPredict to perform proteome-wide association studies (PWAS) at biobank scale. TransCisPredict reduces computational burden through linkage-disequilibrium block selection, which facilitates incorporating cis- and trans-variants to predict protein expression, and performs protein-phenotype association analyses. To account for differences in protein regulatory architecture, four prediction methods are used for weight estimation: BayesR, Elastic Net, LASSO, and SuSiE. Five-fold cross-validation (CV) is used to select the optimal method for each protein. Weight estimation was performed using White British UK Biobank study subjects (N=42,644) with proteomic and genotype array data. Of the 2,920 available protein expression levels, 2,339 could be predicted with a CV-R^2 > 0.05 when cis- and trans-variants were used. Since most methods are limited to cis-variation, prediction was repeated for comparison using only cis-variants, yielding 466 proteins with a CV-R^2 > 0.05. A PWAS of type 2 diabetes (T2D) was performed for the 2,339 predicted protein expression levels using White British UK Biobank study subjects without proteomic data (N=364,132), followed by validation with two-sample Mendelian randomization using a method that controls for horizontal pleiotropy. Forty proteins were associated with T2D and validated. For the 466 cis-only predicted protein expression levels, three proteins were associated with T2D and validated. Incorporating both cis- and trans-variation using TransCisPredict facilitates the prediction of many more proteins than using cis-variants alone, thereby increasing the power of PWAS.
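For readers who want a concrete picture of the per-protein model selection step, the following is a minimal sketch, not the TransCisPredict implementation. It assumes CV-R^2 is computed as 1 - SSE/SST on pooled out-of-fold predictions, and it uses two ridge regressions as stand-ins for the four methods named above (BayesR, Elastic Net, LASSO, SuSiE). The function names `ridge_fit`, `cv_r2`, and `select_method` are hypothetical.

```python
import numpy as np


def ridge_fit(lam):
    """Return a fit function for ridge regression with penalty lam.

    Stand-in only: the abstract's methods are BayesR, Elastic Net,
    LASSO, and SuSiE, none of which are implemented here.
    """
    def fit(X, y):
        p = X.shape[1]
        # Closed-form ridge solution on centered data.
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
    return fit


def cv_r2(X, y, fit, n_folds=5, seed=0):
    """Five-fold CV-R^2, assumed here to be 1 - SSE/SST on out-of-fold predictions."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    pred = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Center using training-fold statistics only, to avoid leakage.
        x_mean = X[train].mean(axis=0)
        y_mean = y[train].mean()
        beta = fit(X[train] - x_mean, y[train] - y_mean)
        pred[test_idx] = (X[test_idx] - x_mean) @ beta + y_mean
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def select_method(X, y, methods, threshold=0.05):
    """Pick the method with the highest CV-R^2 for one protein.

    Returns (method name, its CV-R^2, whether it exceeds the threshold).
    """
    scores = {name: cv_r2(X, y, fit) for name, fit in methods.items()}
    best = max(scores, key=scores.get)
    return best, scores[best], scores[best] > threshold


# Simulated example: 500 subjects, 20 variants, 3 of which affect the protein.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 20))
beta_true = np.zeros(20)
beta_true[:3] = 0.5
y = X @ beta_true + rng.standard_normal(500)

methods = {"ridge_small": ridge_fit(1.0), "ridge_large": ridge_fit(1000.0)}
best, score, keep = select_method(X, y, methods)
print(best, round(score, 3), keep)
```

In this sketch, a protein is retained for the association analysis only if its best method clears the CV-R^2 > 0.05 threshold quoted in the abstract; the simulated genotypes and effect sizes are illustrative and not drawn from UK Biobank.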
Keywords:
cis- and trans-variants, biobank-scale analysis, proteome-wide association studies (PWAS), two-sample Mendelian randomization.