Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review

Abstract

Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs).
Existing predictive models mainly focus on predicting single diseases, rather than considering the complex mechanisms of patients from a holistic view.
Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses.

In Electronic Health Records (EHRs), information regarding patient status is extensively documented. EHR data therefore provide a feasible mechanism to track patient health information and to support better, data-driven decisions.
Unlike data from clinical trials or other biomedical studies, secondary data extracted from EHRs are not collected to test a specific hypothesis. Instead, their primary purpose is to monitor a patient.
As a result, EHR data have many challenging characteristics: they are uncurated (not carefully chosen, thoughtfully organized, or presented), poor-quality (rarely subject to data quality audits), high-dimensional (thousands of distinct medical events), sparse (many zero values), heterogeneous (drawn from different sources), temporal (collected over time), incomplete (missing values), large-scale (a large volume of data), and multimodal (multiple data modalities).
In short: uncurated, poor-quality, high-dimensional, sparse, heterogeneous, temporal, incomplete, large-scale, and multimodal.

Pipeline: raw data -> embedding technique -> patient representation
Learning strategies: unsupervised, supervised, or self-supervised.
Evaluation:
- Clinical tasks: prediction of mortality, readmission, or a specific disease.
- Visualization: intuitive understanding or interpretation.
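
The pipeline in these notes (raw data -> embedding technique -> patient representation) can be sketched on toy data. This is a minimal illustration, not the method of any reviewed paper. The patient records, the codes, and the helper names `multi_hot` and `embed` are all hypothetical. A truncated SVD is used as a simple linear, unsupervised stand-in for the deep encoders (e.g. autoencoders) that the review actually covers.

```python
import numpy as np

# Hypothetical toy records: each patient is a list of medical event codes.
records = {
    "p1": ["E11", "I10", "metformin"],
    "p2": ["E11", "metformin"],
    "p3": ["J45", "albuterol"],
    "p4": ["J45", "albuterol", "I10"],
}


def multi_hot(records):
    """Raw data step: code lists -> sparse, high-dimensional 0/1 matrix."""
    vocab = sorted({code for codes in records.values() for code in codes})
    index = {code: j for j, code in enumerate(vocab)}
    X = np.zeros((len(records), len(vocab)))
    for i, codes in enumerate(records.values()):
        for code in codes:
            X[i, index[code]] = 1.0
    return X, vocab


def embed(X, dim=2):
    """Embedding step: truncated SVD as a linear stand-in for a deep encoder."""
    Xc = X - X.mean(axis=0)  # center each feature column
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :dim] * S[:dim]  # one dense row vector per patient


X, vocab = multi_hot(records)  # raw, sparse input
Z = embed(X, dim=2)            # dense patient representation
sparsity = 1.0 - X.mean()      # fraction of zero entries in the raw matrix
```

Each row of `Z` is a dense vector for one patient. Under the evaluation scheme above, such vectors would be passed to a downstream classifier for a clinical task (mortality, readmission, or a specific disease) or plotted in two dimensions for visual inspection.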