Deep neural network improves the estimation of polygenic risk scores for breast cancer

Adrien Badré, Li Zhang, Wellington Muchero, Justin C. Reynolds, Chongle Pan

Research output: Contribution to journal › Article › peer-review

31 Scopus citations

Abstract

Polygenic risk scores (PRS) estimate the genetic risk of an individual for a complex disease based on many genetic variants across the whole genome. In this study, we compared a series of computational models for estimation of breast cancer PRS. A deep neural network (DNN) was found to outperform alternative machine learning techniques and established statistical algorithms, including BLUP, BayesA, and LDpred. In the test cohort with 50% prevalence, the area under the receiver operating characteristic curve (AUC) was 67.4% for DNN, 64.2% for BLUP, 64.5% for BayesA, and 62.4% for LDpred. BLUP, BayesA, and LDpred all generated PRS that followed a normal distribution in the case population. However, the PRS generated by DNN in the case population followed a bimodal distribution composed of two normal distributions with distinctly different means. This suggests that DNN was able to separate the case population into a high-genetic-risk case subpopulation with an average PRS significantly higher than the control population and a normal-genetic-risk case subpopulation with an average PRS similar to the control population. This allowed DNN to achieve 18.8% recall at 90% precision in the test cohort with 50% prevalence, which can be extrapolated to 65.4% recall at 20% precision in a general population with 12% prevalence. Interpretation of the DNN model identified salient variants that were assigned insignificant p values by association studies, but were important for DNN prediction. These variants may be associated with the phenotype through nonlinear relationships.
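The dependence of precision on prevalence in the figures above can be illustrated with Bayes' rule. The sketch below is not the authors' extrapolation procedure; it assumes only that, at a fixed score threshold, recall and false positive rate do not change with prevalence. The function names are hypothetical, and the only inputs are the recall, precision, and prevalence values quoted in the abstract.

```python
def false_positive_rate(recall, precision, prevalence):
    """Solve precision = pi*r / (pi*r + (1 - pi)*f) for the false positive rate f."""
    return prevalence * recall * (1.0 - precision) / (precision * (1.0 - prevalence))


def precision_at_prevalence(recall, fpr, prevalence):
    """Precision implied by a fixed (recall, FPR) operating point at a given prevalence."""
    true_pos = prevalence * recall
    false_pos = (1.0 - prevalence) * fpr
    return true_pos / (true_pos + false_pos)


# Operating point quoted for the balanced test cohort (50% prevalence).
fpr_strict = false_positive_rate(recall=0.188, precision=0.90, prevalence=0.50)

# The same threshold applied to a population with 12% prevalence.
precision_strict_12 = precision_at_prevalence(0.188, fpr_strict, prevalence=0.12)

# False positive rate implied by the extrapolated operating point
# (65.4% recall at 20% precision, 12% prevalence).
fpr_loose = false_positive_rate(recall=0.654, precision=0.20, prevalence=0.12)

print(f"FPR at the 90%-precision threshold:     {fpr_strict:.4f}")
print(f"Same threshold, precision at 12% prev.: {precision_strict_12:.3f}")
print(f"FPR implied by the 65.4%-recall point:  {fpr_loose:.4f}")
```

Under these assumptions the two quoted figures correspond to different score thresholds: the 90%-precision threshold keeps the false positive rate low, while the 65.4%-recall point accepts a much higher one in exchange for capturing more cases.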

Original language: English
Pages (from-to): 359-369
Number of pages: 11
Journal: Journal of Human Genetics
Volume: 66
Issue number: 4
State: Published - Apr 2021

Funding

Acknowledgements: We would like to thank the OU Supercomputing Center for Education & Research (OSCER) for supercomputing technical support, the DRIVE project for the GWAS data, NIH dbGaP for data access authorization, and Dr. Xu Chao for helpful discussions. The study was funded by Dr. Pan's startup funding from the University of Oklahoma and by Oak Ridge National Laboratory (ORNL) Laboratory Directed Research and Development (LDRD) funding. Oak Ridge National Laboratory is managed by UT-Battelle, LLC for the U.S. Department of Energy under Contract Number DE-AC05-00OR22725.

Funders (with funder number where given)
U.S. Department of Energy: DE-AC05-00OR22725
Oak Ridge National Laboratory
University of Oklahoma
Oklahoma Water Resources Center, Oklahoma State University
