Uncertainty-aware deep learning in healthcare: A scoping review
Tyler J. Loftus, Benjamin Shickel, Matthew M. Ruppert, Jeremy A. Balch, Tezcan Ozrazgat-Baslanti, Patrick J. Tighe, Philip A. Efron, William R. Hogan, Parisa Rashidi, Gilbert R. Upchurch Jr., Azra Bihorac
Abstract
Mistrust is a major barrier to implementing deep learning in healthcare settings. Entrustment could be earned by conveying model certainty, or the probability that a given model output is accurate, but the use of uncertainty estimation for deep learning entrustment is largely unexplored, and there is no consensus regarding optimal methods for quantifying uncertainty. Our purpose is to critically evaluate methods for quantifying uncertainty in deep learning for healthcare applications and propose a conceptual framework for specifying certainty of deep learning predictions. We searched Embase, MEDLINE, and PubMed databases for articles relevant to study objectives, complying with PRISMA guidelines, rated study quality using validated tools, and extracted data according to modified CHARMS criteria. Among 30 included studies, 24 described medical imaging applications.
Introduction
Deep learning is increasingly important in healthcare. Deep learning prediction models that leverage electronic health record data have outperformed other statistical and regression-based methods [1,2]. Computer vision models have matched or outperformed physicians for several common and essential clinical tasks, albeit in select circumstances [3,4]. These results suggest a potential role for clinical implementation of deep learning applications in healthcare.
Mistrust is a major barrier to clinical implementation of deep learning predictions [5,6]. Efforts to restore and build trust in machine learning have focused primarily on improving model explainability and interpretability.
Materials & Methods
Article inclusion is illustrated in Fig 1, a PRISMA flow diagram. We searched Embase, MEDLINE, and PubMed databases, chosen for their specificity to the healthcare domain, for articles with “deep learning” and either “confidence” or “uncertainty” in the title or abstract, and for articles with “deep learning” and “conformal prediction” in the title or abstract, identifying 37 unique articles. Two investigators independently screened all article abstracts for relevance to review objectives and excluded three articles.
Results
Included articles are summarized in Table 1. Notably, these articles rarely applied uncertainty estimation to building trust in deep learning among patients, caregivers, and clinicians. Therefore, the Results focus primarily on the content of the articles; opportunities to use uncertainty-aware deep learning to build trust, a novel application of established techniques, are considered in the Discussion.
Discussion
This review found that the uncertainty inherent in deep learning predictions is most commonly estimated for medical imaging applications using Monte Carlo dropout on convolutional neural networks. In addition, unique model architectures and uncertainty estimation methods can apply to non-pixel features, simultaneously improving predictive performance (presumably by mitigating the risk of overfitting, in the case of Monte Carlo dropout) while accurately estimating uncertainty. Unsurprisingly, for medical imaging applications, larger training datasets were associated with greater predictive performance [15,21,29–38]. We could not perform meta-analyses on predictive performance or uncertainty estimates because performance metrics and methods for quantifying uncertainty were heterogeneous. This was despite relative homogeneity in model architectures, which were primarily based on convolutional neural networks, and in the underlying approach to estimating uncertainty, which was primarily Monte Carlo dropout [14].
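To make the most common technique in the reviewed studies concrete, the following is a minimal sketch of Monte Carlo dropout. It assumes a toy one-hidden-layer network with hand-set weights instead of a trained convolutional neural network. Dropout is kept active at inference, the same input is passed through the network many times, and the mean of the sampled outputs serves as the prediction while their standard deviation serves as an uncertainty estimate. The function names, weights, dropout rate, and sample count are hypothetical illustrations, not values drawn from any reviewed study.

```python
import math
import random
import statistics


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def stochastic_forward(x, w_hidden, w_out, drop_p, rng):
    """One forward pass of a toy network (hypothetical weights) with dropout ON.

    Each hidden ReLU unit is zeroed with probability drop_p; surviving units are
    scaled by 1 / (1 - drop_p) (inverted dropout), exactly as during training.
    """
    hidden = []
    for weights in w_hidden:
        h = max(0.0, sum(w * xi for w, xi in zip(weights, x)))  # ReLU activation
        if rng.random() < drop_p:
            h = 0.0  # unit dropped for this pass
        else:
            h /= 1.0 - drop_p  # rescale surviving units
        hidden.append(h)
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return sigmoid(logit)  # probability for a binary outcome


def mc_dropout_predict(x, w_hidden, w_out, drop_p=0.5, n_samples=100, seed=0):
    """Monte Carlo dropout: repeat stochastic passes and summarize them.

    Returns (mean probability, standard deviation across passes); the standard
    deviation is used here as the uncertainty estimate.
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    probs = [stochastic_forward(x, w_hidden, w_out, drop_p, rng)
             for _ in range(n_samples)]
    return statistics.fmean(probs), statistics.pstdev(probs)


# Hypothetical weights for a 2-input, 3-hidden-unit network.
W_HIDDEN = [[0.5, -0.2], [0.3, 0.8], [-0.4, 0.1]]
W_OUT = [1.0, -0.5, 0.7]

mean, std = mc_dropout_predict([1.0, 2.0], W_HIDDEN, W_OUT, drop_p=0.5, n_samples=200)
print(f"predicted probability {mean:.3f} +/- {std:.3f}")
```

Setting `drop_p=0.0` makes every pass identical, so the standard deviation collapses to zero; this is a quick sanity check. In practice the same procedure would be applied to a trained network by leaving its dropout layers in training mode during inference.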
Conclusions
For convolutional neural network predictions on medical images, Monte Carlo dropout methods accurately estimate uncertainty. For non-medical imaging applications, limited evidence suggests that several uncertainty estimation methods can improve predictive performance and accurately estimate uncertainty. Using uncertainty estimates to gain the trust of patients and clinicians is a novel concept that warrants empirical investigation.
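Whether a method "accurately estimates uncertainty" is usually judged by comparing stated confidence with observed accuracy. One common summary is expected calibration error (ECE). Because the reviewed studies used heterogeneous metrics, ECE is offered here only as an illustrative assumption, not as the metric used in any particular study. This minimal sketch bins predictions by confidence into equal-width bins (the bin count of 10 is a hypothetical default) and averages the gap between mean confidence and accuracy in each bin, weighted by bin size.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Expected calibration error with equal-width confidence bins.

    confidences: model-stated probability that each prediction is correct (0..1).
    correct: 1 if the prediction was actually correct, else 0.
    Returns the bin-size-weighted mean |accuracy - confidence| gap;
    0.0 means stated confidence matches observed accuracy in every bin.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece


# Overconfident toy model: claims 95% confidence but is right half the time.
print(expected_calibration_error([0.95, 0.95, 0.95, 0.95], [1, 0, 1, 0]))
```

A well-calibrated model that reports 75% confidence should be correct on about three of every four such predictions, giving an ECE near zero; the overconfident example above yields a gap of about 0.45.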
Citation: Loftus TJ, Shickel B, Ruppert MM, Balch JA, Ozrazgat-Baslanti T, Tighe PJ, et al. (2022) Uncertainty-aware deep learning in healthcare: A scoping review. PLOS Digit Health 1(8): e0000085. https://doi.org/10.1371/journal.pdig.0000085
Editor: Yuan Lai, Tsinghua University, CHINA
Received: April 19, 2022; Accepted: July 9, 2022; Published: August 10, 2022
Copyright: © 2022 Loftus et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All data are in the manuscript and/or supporting information files.
Funding: T.J.L. was supported by the National Institute of General Medical Sciences of the National Institutes of Health under Award Number K23 GM140268 and by the Thomas Maren Junior Investigator Fund. T.O.B. was supported by the National Institute of Diabetes and Digestive and Kidney Diseases of the National Institutes of Health grant K01 DK120784, R01GM110240 from the National Institute of General Medical Sciences, and by UF Research AWD09459 and the Gatorade Trust, University of Florida. P.T.J. was supported by R01GM114290 from the NIGMS and R01AG121647 from the National Institute on Aging (NIA). PR was supported by National Science Foundation CAREER award 1750192, P30AG028740 and R01AG05533 from the NIA, 1R21EB027344 from the National Institute of Biomedical Imaging and Bioengineering (NIBIB), and R01GM-110240 from the NIGMS. A.B. was supported by W. Martin Smith Interdisciplinary Patient Quality and Safety Award (IPQSA), Sepsis and Critical Illness Research Center Award P50 GM-111152 from the National Institute of General Medical Sciences, R01 GM110240 from the National Institute of General Medical Sciences, and by UF Research AWD09458. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Competing interests: The authors declare no conflicts of interest.