TY - GEN
T1 - MMiDaS-AE
AU - Lee, Eric W.
AU - Wallace, Byron C.
AU - Galaviz, Karla I.
AU - Ho, Joyce C.
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/2/4
Y1 - 2020/2/4
AB - Systematic review (SR) is an essential process to identify, evaluate, and summarize the findings of all relevant individual studies concerning health-related questions. However, conducting an SR is labor-intensive, as identifying relevant studies is a daunting process that entails multiple researchers screening thousands of articles for relevance. In this paper, we propose MMiDaS-AE, a Multi-modal Missing Data aware Stacked Autoencoder, for semi-automating screening for SRs. We use a multi-modal view that exploits three representations: 1) documents, 2) topics, and 3) citation networks. Documents that contain similar words will be nearby in the document embedding space. Models can also exploit the relationship between documents and the associated SR MeSH terms to capture article relevancy. Finally, related works will likely share the same citations, and thus closely related articles would, intuitively, be trained to be close to each other in the embedding space. However, using all three learned representations directly as features results in an unwieldy number of parameters. Thus, motivated by recent work on multi-modal autoencoders, we adopt a multi-modal stacked autoencoder that can learn a shared representation encoding all three representations in a compressed space. However, in practice one or more of these modalities may be missing for an article (e.g., if we cannot recover citation information). Therefore, we propose to learn to impute the shared representation even when specific inputs are missing. We find this new model significantly improves performance on a dataset consisting of 15 SRs compared to existing approaches.
KW - Missing Data Imputation
KW - Multi-modal Stacked Autoencoder
KW - Systematic Review
UR - http://www.scopus.com/inward/record.url?scp=85082772371&partnerID=8YFLogxK
U2 - 10.1145/3368555.3384463
DO - 10.1145/3368555.3384463
M3 - Conference contribution
AN - SCOPUS:85082772371
T3 - ACM CHIL 2020 - Proceedings of the 2020 ACM Conference on Health, Inference, and Learning
SP - 139
EP - 150
BT - ACM CHIL 2020 - Proceedings of the 2020 ACM Conference on Health, Inference, and Learning
PB - Association for Computing Machinery, Inc
T2 - 2020 ACM Conference on Health, Inference, and Learning, CHIL 2020
Y2 - 2 April 2020 through 4 April 2020
ER -