TY - JOUR
T1 - DOME: Directional medical embedding vectors from Electronic Health Records
AU - Wen, Jun
AU - Xue, Hao
AU - Rush, Everett
AU - Panickan, Vidul A.
AU - Cai, Tianrun
AU - Zhou, Doudou
AU - Ho, Yuk Lam
AU - Costa, Lauren
AU - Begoli, Edmon
AU - Hong, Chuan
AU - Gaziano, J. Michael
AU - Cho, Kelly
AU - Liao, Katherine P.
AU - Lu, Junwei
AU - Cai, Tianxi
N1 - Publisher Copyright: © 2024
PY - 2025/2
Y1 - 2025/2
N2 - Motivation: The increasing availability of Electronic Health Record (EHR) systems has created enormous potential for translational research. Recent developments in representation learning have produced effective large-scale representations of EHR concepts, along with knowledge graphs that empower downstream EHR studies. However, most existing methods require training on patient-level data, which limits their ability to scale training to multi-institutional EHR data. Scalable approaches that require only summary-level data, on the other hand, do not incorporate temporal dependencies between concepts. Methods: We introduce a DirectiOnal Medical Embedding (DOME) algorithm that encodes temporally directional relationships between medical concepts using summary-level EHR data. Specifically, DOME first aggregates patient-level EHR data into an asymmetric co-occurrence matrix. It then computes two Positive Pointwise Mutual Information (PPMI) matrices that respectively encode the pairwise prior and posterior dependencies between medical concepts. A joint matrix factorization is then performed on the two PPMI matrices, yielding three vectors for each concept: a semantic embedding and two directional context embeddings. Together, these vectors provide a comprehensive depiction of the temporal relationships between EHR concepts. Results: We highlight the advantages and translational potential of DOME through three sets of validation studies. First, DOME consistently improves on existing direction-agnostic embedding vectors for disease risk prediction in several diseases, for example achieving a relative gain of 5.5% in the area under the receiver operating characteristic curve (AUROC) for lung cancer. Second, DOME excels in directional drug-disease relationship inference by successfully differentiating between drug side effects and indications, achieving relative AUROC gains over state-of-the-art methods of 10.8% and 6.6%, respectively. Finally, DOME effectively constructs directional knowledge graphs, which distinguish disease risk factors from comorbidities, thereby revealing disease progression trajectories. The source code is available at https://github.com/celehs/Directional-EHR-embedding.
KW - Directional medical embedding
KW - Disease risk prediction
KW - Drug-disease relationship
KW - Electronic Health Records
UR - http://www.scopus.com/inward/record.url?scp=85214340231&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2024.104768
DO - 10.1016/j.jbi.2024.104768
M3 - Article
AN - SCOPUS:85214340231
SN - 1532-0464
VL - 162
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
M1 - 104768
ER -