DOME: Directional medical embedding vectors from Electronic Health Records

Jun Wen, Hao Xue, Everett Rush, Vidul A. Panickan, Tianrun Cai, Doudou Zhou, Yuk Lam Ho, Lauren Costa, Edmon Begoli, Chuan Hong, J. Michael Gaziano, Kelly Cho, Katherine P. Liao, Junwei Lu, Tianxi Cai

Research output: Contribution to journalArticlepeer-review

Abstract

Motivation: The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning techniques have led to effective large-scale representations of EHR concepts along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training with patient-level data, limiting their abilities to expand the training with multi-institutional EHR data. On the other hand, scalable approaches that only require summary-level data do not incorporate temporal dependencies between concepts. Methods: We introduce a DirectiOnal Medical Embedding (DOME) algorithm to encode temporally directional relationships between medical concepts, using summary-level EHR data. Specifically, DOME first aggregates patient-level EHR data into an asymmetric co-occurrence matrix. Then it computes two Positive Pointwise Mutual Information (PPMI) matrices to correspondingly encode the pairwise prior and posterior dependencies between medical concepts. Following that, a joint matrix factorization is performed on the two PPMI matrices, which results in three vectors for each concept: a semantic embedding and two directional context embeddings. They collectively provide a comprehensive depiction of the temporal relationship between EHR concepts. Results: We highlight the advantages and translational potential of DOME through three sets of validation studies. First, DOME consistently improves existing direction-agnostic embedding vectors for disease risk prediction in several diseases, for example achieving a relative gain of 5.5% in the area under the receiver operating characteristic (AUROC) for lung cancer. Second, DOME excels in directional drug-disease relationship inference by successfully differentiating between drug side effects and indications, correspondingly achieving relative AUROC gain over the state-of-the-art methods by 10.8% and 6.6%. Finally, DOME effectively constructs directional knowledge graphs, which distinguish disease risk factors from comorbidities, thereby revealing disease progression trajectories. The source codes are provided at https://github.com/celehs/Directional-EHR-embedding.

Original languageEnglish
Article number104768
JournalJournal of Biomedical Informatics
Volume162
DOIs
StatePublished - Feb 2025

Keywords

  • Directional medical embedding
  • Disease risk prediction
  • Drug-disease relationship
  • Electronic Health Records

Fingerprint

Dive into the research topics of 'DOME: Directional medical embedding vectors from Electronic Health Records'. Together they form a unique fingerprint.

Cite this