Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review

Abstract

  • Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs).
  • Existing predictive models mainly focus on predicting single diseases, rather than considering the complex mechanisms of patients from a holistic view.
  • Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses.

Introduction

  • In Electronic Health Records (EHRs), information regarding patient status is extensively documented. EHR data therefore provide a feasible mechanism to track patient health information and to support better, data-driven decisions.
  • Unlike data from clinical trials or other biomedical studies, secondary data extracted from EHRs are not collected to test a specific hypothesis; their primary purpose is to monitor patients.
  • As a result, EHR data have many challenging characteristics: they are uncurated (not carefully selected, organized, or presented), poor-quality (rarely subject to data quality audits), high-dimensional (thousands of distinct medical events), sparse (mostly zero values), heterogeneous (drawn from different sources), temporal (collected over time), incomplete (missing values), large-scale (a large volume of data), and multimodal (multiple data modalities).
  • Uncurated, poor-quality, high-dimensional, sparse, heterogeneous, temporal, incomplete, large-scale, multimodal.
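
To make the "high-dimensional" and "sparse" characteristics concrete, here is a minimal sketch that encodes patients as multi-hot vectors over a code vocabulary and measures the fraction of zero entries. The patients, the eight-code vocabulary, and the helper names (multi_hot, sparsity) are all hypothetical and chosen only for illustration; real EHR vocabularies contain thousands of codes, which makes the effect far stronger.

```python
# Toy illustration only: the vocabulary and patient records below are made up.
vocabulary = ["I10", "E11.9", "J45", "N18.3", "I50.9", "K21.9", "F32.9", "M54.5"]

# Each hypothetical patient is represented by the set of codes in their record.
patients = {
    "p1": {"I10", "E11.9"},
    "p2": {"J45"},
    "p3": {"I10", "I50.9", "N18.3"},
}


def multi_hot(codes, vocab):
    """Return a 0/1 vector with one position per vocabulary code."""
    return [1 if code in codes else 0 for code in vocab]


def sparsity(rows):
    """Return the fraction of entries that are zero."""
    total = sum(len(row) for row in rows)
    zeros = sum(row.count(0) for row in rows)
    return zeros / total


# One row per patient, one column per code.
matrix = [multi_hot(codes, vocabulary) for codes in patients.values()]

for name, row in zip(patients, matrix):
    print(name, row)
print(f"sparsity = {sparsity(matrix):.2f}")  # 18 of 24 entries are zero -> 0.75
```

Even in this tiny example three quarters of the entries are zero; each additional code in the vocabulary adds a column that is empty for most patients.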

Background

Patient learning data pipeline

  • Pipeline: raw data -> embedding technique -> patient representation
  • Learning strategies: unsupervised, supervised, or self-supervised.
  • Evaluation:
    • Clinical tasks: mortality, readmission, or a specific disease prediction.
    • Visualization: intuitive understanding or interpretation.
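
The pipeline above (raw data -> embedding technique -> patient representation) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the raw records are hypothetical, and a truncated SVD of the centered patient-by-code count matrix stands in for the embedding technique as a simple unsupervised choice. The review itself covers deep models, which would replace the embed function here. The names raw, embed, and Z are assumptions introduced for this example.

```python
import numpy as np

# Hypothetical raw data: each patient is a list of visits, each visit a list of codes.
raw = {
    "p1": [["I10", "E11.9"], ["I10"]],
    "p2": [["J45"], ["J45", "K21.9"]],
    "p3": [["I10", "I50.9"], ["N18.3", "I10"]],
    "p4": [["J45"]],
}

# Step 1: raw data -> patient-by-code count matrix.
vocab = sorted({code for visits in raw.values() for visit in visits for code in visit})
index = {code: j for j, code in enumerate(vocab)}
X = np.zeros((len(raw), len(vocab)))
for i, visits in enumerate(raw.values()):
    for visit in visits:
        for code in visit:
            X[i, index[code]] += 1


# Step 2: embedding technique. Stand-in: truncated SVD (unsupervised, linear).
def embed(counts, k):
    """Project centered count vectors onto their top-k singular directions."""
    centered = counts - counts.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * S[:k]


# Step 3: patient representation, one dense k-dimensional vector per patient.
Z = embed(X, k=2)
print(Z.shape)  # (4, 2)

# Informal sanity check (not a clinical evaluation): patients who share codes
# should lie closer together than patients who share none.
d_p1_p3 = np.linalg.norm(Z[0] - Z[2])  # p1 and p3 share I10
d_p1_p2 = np.linalg.norm(Z[0] - Z[1])  # p1 and p2 share no codes
print(d_p1_p3 < d_p1_p2)
```

With only two dimensions, Z can also be drawn directly as a scatter plot, which corresponds to the visualization style of evaluation; for the clinical-task style, Z would instead be passed as input features to a downstream predictor such as a mortality or readmission classifier.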

Patient representation methods