TY - JOUR
T1 - In with the old, in with the new
T2 - machine learning for time to event biomedical research
AU - Danciu, Ioana
AU - Agasthya, Greeshma
AU - Tate, Janet P.
AU - Chandra-Shekar, Mayanka
AU - Goethert, Ian
AU - Ovchinnikova, Olga S.
AU - McMahon, Benjamin H.
AU - Justice, Amy C.
N1 - Publisher Copyright: © 2022 Published by Oxford University Press on behalf of the American Medical Informatics Association.
PY - 2022/10/1
Y1 - 2022/10/1
N2 - The predictive modeling literature for biomedical applications is dominated by biostatistical methods for survival analysis and, more recently, by some out-of-the-box machine learning approaches. In this article, we present a machine learning method appropriate for time-to-event modeling of long-term disease progression in prostate cancer. Using XGBoost adapted to long-term disease progression, we developed a predictive model for 118 788 patients with localized prostate cancer at diagnosis from the Department of Veterans Affairs (VA). Our model accounted for patient censoring. Harrell's c-index for our model, using only features available at the time of diagnosis, was 0.757 (95% confidence interval [0.756, 0.757]). Our results show that machine learning methods like XGBoost can be adapted to use accelerated failure time (AFT) with censoring to model long-term risk of disease progression. The long median survival justifies and requires censoring. Overall, we show that an existing machine learning approach can be used for AFT outcome modeling in prostate cancer and, more generally, for other chronic diseases with long observation times.
AB - The predictive modeling literature for biomedical applications is dominated by biostatistical methods for survival analysis and, more recently, by some out-of-the-box machine learning approaches. In this article, we present a machine learning method appropriate for time-to-event modeling of long-term disease progression in prostate cancer. Using XGBoost adapted to long-term disease progression, we developed a predictive model for 118 788 patients with localized prostate cancer at diagnosis from the Department of Veterans Affairs (VA). Our model accounted for patient censoring. Harrell's c-index for our model, using only features available at the time of diagnosis, was 0.757 (95% confidence interval [0.756, 0.757]). Our results show that machine learning methods like XGBoost can be adapted to use accelerated failure time (AFT) with censoring to model long-term risk of disease progression. The long median survival justifies and requires censoring. Overall, we show that an existing machine learning approach can be used for AFT outcome modeling in prostate cancer and, more generally, for other chronic diseases with long observation times.
KW - machine learning
KW - predictive modeling
KW - survival analysis
KW - xgboost
UR - http://www.scopus.com/inward/record.url?scp=85138445691&partnerID=8YFLogxK
U2 - 10.1093/jamia/ocac106
DO - 10.1093/jamia/ocac106
M3 - Article
C2 - 35920306
AN - SCOPUS:85138445691
SN - 1067-5027
VL - 29
SP - 1737
EP - 1743
JO - Journal of the American Medical Informatics Association
JF - Journal of the American Medical Informatics Association
IS - 10
ER -