Predictive Modeling of Chronic Disease Progression Using Longitudinal Electronic Health Records and Machine Learning Algorithms
Keywords:
Chronic diseases, Electronic Health Records, machine learning, disease progression, predictive modelling
Abstract
Chronic diseases are a leading cause of morbidity and mortality worldwide. The increasing availability of longitudinal electronic health records (EHRs) offers an unprecedented opportunity to model disease progression with machine learning (ML) algorithms. This study develops predictive models for chronic disease progression from temporal patient data. Using datasets that span multiple years of structured clinical encounters, we compare traditional and deep learning models in forecasting disease milestones such as hospitalization, comorbidity onset, and mortality. Results indicate that temporal models, especially recurrent neural networks (RNNs), outperform baseline methods and show promise for personalized risk stratification and proactive care planning.
License
Copyright (c) 2025 Banner Nunen (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.