Leveraging Natural Language Processing for Automated Extraction of Patient Information from Unstructured Clinical Notes in Electronic Health Records

Authors

  • Kiran Acharya Nepal

Keywords:

Natural Language Processing, Clinical Notes, EHR, Information Extraction, Medical AI, Named Entity Recognition, Clinical Ontology

Abstract

Unstructured clinical notes embedded within Electronic Health Records (EHRs) hold critical insights for patient care and decision-making. However, their narrative form makes the data hard to retrieve, integrate, and analyze in real time. In this paper, we explore the application of Natural Language Processing (NLP) techniques to the automated extraction of structured patient information from these notes. Set in the context of 2022, by which time EHR adoption and AI tooling had matured, the paper evaluates NLP pipelines that combine Named Entity Recognition (NER), clinical ontologies (UMLS, SNOMED CT), and machine learning models.

We present both a conceptual architecture and a proof-of-concept system trained on the MIMIC-III dataset. The NLP pipeline uses hybrid rule-based and deep learning components to extract diagnoses, medication events, and temporal relationships. Our evaluation shows improved precision and recall relative to earlier heuristic systems. This automation holds promise for clinical decision support, population health research, and administrative documentation.
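As a rough illustration of the rule-based half of such a pipeline, the sketch below extracts diagnoses and medication events from a short note and scores them against a hand-made gold set. It is not the system described in this paper: the lexicon, its placeholder codes (which are not real UMLS or SNOMED CT identifiers), the regular expression, the `Entity` type, and the example note are all assumptions made for illustration, and temporal relationships and the deep learning component are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical toy lexicon with placeholder codes; a real pipeline would
# look terms up in an ontology such as UMLS or SNOMED CT.
DIAGNOSIS_LEXICON = {
    "type 2 diabetes": "DX_DIABETES_T2",
    "hypertension": "DX_HYPERTENSION",
    "pneumonia": "DX_PNEUMONIA",
}

# Illustrative rule: a word followed by a numeric dose and a unit.
MEDICATION_PATTERN = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|units)\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Entity:
    kind: str   # "diagnosis" or "medication"
    text: str   # normalized surface form
    value: str  # placeholder code (diagnosis) or dose string (medication)


def extract_entities(note: str) -> set[Entity]:
    """Return diagnoses found by lexicon lookup and medications found by regex."""
    found: set[Entity] = set()
    lowered = note.lower()
    for term, code in DIAGNOSIS_LEXICON.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(Entity("diagnosis", term, code))
    for match in MEDICATION_PATTERN.finditer(note):
        dose = f"{match.group('dose')} {match.group('unit').lower()}"
        found.add(Entity("medication", match.group("drug").lower(), dose))
    return found


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Exact-match precision and recall over sets of entities."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Invented example note and gold annotations.
note = (
    "Pt with hypertension and type 2 diabetes. "
    "Started metformin 500 mg daily; continue aspirin."
)
gold = {
    Entity("diagnosis", "hypertension", "DX_HYPERTENSION"),
    Entity("diagnosis", "type 2 diabetes", "DX_DIABETES_T2"),
    Entity("medication", "metformin", "500 mg"),
    Entity("medication", "aspirin", "unspecified dose"),
}

predicted = extract_entities(note)
precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this invented note the rules find both diagnoses and the dosed medication but miss the medication mentioned without a dose, which is the kind of recall gap a learned component is typically meant to close.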

Published

2023-04-24