Advanced Analytics Studies Applied to US Department of Veterans Affairs' Corporate Data Warehouse (Initial Draft)

Byung Hoony Park, Jason A. Laska, Hilda B. Klasky, Aileen Boone, Ozgur Ozmen, Rajasekar Karthik, Aneel Advani, Colby A. Cox, Mark G. Pleszkoch, Edmon Begoli

Research output: Book/Report › Commissioned report

Abstract

In this report, we describe our experience applying several advanced analytics algorithms to the US Department of Veterans Affairs’ (VA’s) Corporate Data Warehouse (CDW) electronic health records datasets during FY2017-18. While various algorithms were applied to the CDW data, the main goal of this effort was to provide useful insights into the operational aspects (as opposed to the purely clinical aspects) of this specific implementation. Because few reports in the public literature cover advanced analytics applied to this data, this report provides unique insight in this regard. We focused on two machine learning (ML) applications: (1) clinical pathway inference (identifying the patterns that dominate clinical pathways, with a dataset of male Stable Ischemic Heart Disease patients as the use case) and (2) the capability of medical-concept representation learning to inform and refine cohort membership based on probable patient outcomes. In our study of clinical pathway inference, we used two main modeling techniques: (a) probabilistic topic modeling (latent Dirichlet allocation [LDA]) and (b) feature embedding. We observed that the applicability of LDA to pathway inference remains in question. LDA provides results that are intuitive and easy for humans to comprehend; however, when applied to pathway inference, it generates pathway patterns that consist of a few dominant pathway components. In addition, LDA provides no information regarding the relationships among components in the same pathway. The outputs of LDA were also found to categorize patients based on their trace data of clinical procedures. However, the results suggest that LDA is biased toward statistically dominant components, which makes it particularly hard to discriminate pathway subbranches.
To address the issue, we designed a new pathway inference methodology that integrates temporal ordering of pathway components into LDA results, as existing feature embedding tools such as word2vec are not readily applicable to our task. In addition, we developed our own feature embedding tool customized for patient traces. We conclude this study by suggesting two approaches for incorporating the embedding representation of a component into LDA, and we briefly list our lessons learned and recommendations for future work. In our study of the ability of medical-concept representation learning to inform and refine cohort membership based on probable patient outcomes, we compared medical-concept learning methods with standard one-of-K encoding to evaluate the change in effectiveness, as done in Choi, Schuetz, Stewart, and Sun (2016). We performed two primary empirical evaluations. The first was on a curated collection of 60,000 patients with no more than 1 year of medical history included. The second was on a collection of patients with no restriction on the amount of medical history included. The hypothesis across both evaluations is that representation learning is broadly useful independent of downstream processing models; thus, for each experiment, we trained three models (a logistic regression model, a two-layer neural network model, and a nearest-neighbor model) and then averaged the results. Each model was trained using fivefold cross validation. Through 12 experiments, we provided evidence that medical-representation learning improves prediction of the primary diagnosis category of a short patient history. In addition, through 12 experiments, we provided evidence that medical-representation learning fails to improve prediction of the primary diagnosis category of an arbitrarily long patient history. We conclude this study with a brief list of our lessons learned and recommendations for future work.
We conclude the report with a discussion of recent technology trends as they relate to our artificial intelligence research and, more specifically, to the ML research and approaches described above.
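The one-of-K baseline and the fivefold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. This is not the report's code: the code vocabulary, patient histories, and labels below are made-up placeholders, representing a history as the sum of its one-of-K vectors is an assumption, and a single nearest-neighbor classifier stands in for the three models whose results the authors averaged.

```python
# Toy illustration only: VOCAB, HISTORIES, and LABELS are hypothetical
# placeholders, not CDW data or real medical codes.

def one_of_k(code, vocab):
    """One-of-K vector for a single code: 1 at the code's index, 0 elsewhere."""
    return [1 if code == v else 0 for v in vocab]

def encode_history(history, vocab):
    """Assumed aggregation: sum the one-of-K vectors of all codes in a history."""
    vec = [0] * len(vocab)
    for code in history:
        for i, bit in enumerate(one_of_k(code, vocab)):
            vec[i] += bit
    return vec

def k_folds(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the closest training vector (squared Euclidean)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]

def cross_validated_accuracy(xs, ys, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_folds(len(xs), k):
        held = set(held_out)
        train_idx = [i for i in range(len(xs)) if i not in held]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        correct = sum(
            nearest_neighbor_predict(train_x, train_y, xs[i]) == ys[i]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)

VOCAB = ["A", "B", "C", "D"]  # hypothetical code vocabulary (K = 4)
HISTORIES = [
    ["A", "B"], ["C", "D"], ["A", "A", "B"], ["C"], ["B"],
    ["D", "C"], ["A"], ["D"], ["B", "A"], ["C", "C", "D"],
]
LABELS = ["x", "y", "x", "y", "x", "y", "x", "y", "x", "y"]  # toy categories

features = [encode_history(h, VOCAB) for h in HISTORIES]
accuracy = cross_validated_accuracy(features, LABELS, k=5)
print(accuracy)
```

In this framing, a learned medical-concept representation would replace `one_of_k` with a dense vector per code, leaving the cross-validation loop unchanged; that substitution is what the comparison described in the abstract evaluates.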
Original language: English
Place of Publication: United States
State: Published - 2018

Keywords

  • 97 MATHEMATICS AND COMPUTING
