Unifying cardiovascular modelling with deep reinforcement learning for uncertainty aware control of sepsis treatment

Thesath Nanayakkara, Gilles Clermont, Christopher James Langmead, David Swigon

Sepsis is a potentially life-threatening inflammatory response to infection or severe tissue damage. It has a highly variable clinical course, requiring constant monitoring of the patient’s state to guide the management of intravenous fluids and vasopressors, among other interventions. Despite decades of research, experts still debate the optimal treatment. Here, we combine, for the first time, distributional deep reinforcement learning with mechanistic physiological models to find personalized sepsis treatment strategies. Our method handles partial observability by introducing a novel physiology-driven recurrent autoencoder that leverages known cardiovascular physiology, and it quantifies the uncertainty of its own results. Moreover, we introduce a framework for uncertainty-aware decision support with humans in the loop. We show that our method learns physiologically explainable, robust policies that are consistent with clinical knowledge.

Background and Related Work
Reinforcement Learning is a framework for optimizing sequential decision making. In its standard form, the problem is formulated as a Markov Decision Process (MDP), a 5-tuple (S, A, r, γ, p). Here, S and A are the state and action spaces, r is a reward function, p : (S, A, S) → [0, ∞) denotes the unknown environment dynamics, which specify the distribution of the next state s′ given the state-action pair (s, a), and γ is a discount rate applied to future rewards.
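To make the 5-tuple concrete, the following is a minimal sketch of a toy finite MDP solved by tabular value iteration, which repeatedly applies the Bellman optimality update V(s) ← max_a [r(s, a) + γ Σ_{s′} p(s′ | s, a) V(s′)]. The class `FiniteMDP`, the function `value_iteration`, and the two-state "stable/unstable" example with its rewards and transition probabilities are all hypothetical and invented for illustration; they are not the sepsis model, and the paper itself uses distributional deep reinforcement learning rather than this tabular method.

```python
from dataclasses import dataclass


@dataclass
class FiniteMDP:
    """Toy finite MDP (S, A, r, gamma, p) stored in plain dictionaries."""
    states: list[str]                                    # S
    actions: list[str]                                   # A
    reward: dict[tuple[str, str], float]                 # r(s, a)
    gamma: float                                         # discount factor
    transition: dict[tuple[str, str], dict[str, float]]  # p(s' | s, a)

    def __post_init__(self):
        # Each p(. | s, a) must be a probability distribution over next states.
        for key, dist in self.transition.items():
            if abs(sum(dist.values()) - 1.0) > 1e-9:
                raise ValueError(f"p(. | {key}) does not sum to 1")


def value_iteration(mdp, tol=1e-8, max_iter=10_000):
    """Optimal values and a greedy policy via repeated Bellman optimality updates."""
    values = {s: 0.0 for s in mdp.states}

    def q(s, a, v):
        # One-step lookahead: r(s, a) + gamma * sum_{s'} p(s' | s, a) * V(s')
        return mdp.reward[(s, a)] + mdp.gamma * sum(
            prob * v[s_next] for s_next, prob in mdp.transition[(s, a)].items()
        )

    for _ in range(max_iter):
        new_values = {s: max(q(s, a, values) for a in mdp.actions) for s in mdp.states}
        delta = max(abs(new_values[s] - values[s]) for s in mdp.states)
        values = new_values
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(mdp.actions, key=lambda a: q(s, a, values)) for s in mdp.states}
    return values, policy


# Hypothetical two-state example; numbers are invented for illustration only.
toy = FiniteMDP(
    states=["stable", "unstable"],
    actions=["wait", "treat"],
    reward={
        ("stable", "wait"): 1.0, ("stable", "treat"): 0.5,
        ("unstable", "wait"): -1.0, ("unstable", "treat"): -0.5,
    },
    gamma=0.9,
    transition={
        ("stable", "wait"): {"stable": 0.9, "unstable": 0.1},
        ("stable", "treat"): {"stable": 1.0},
        ("unstable", "wait"): {"unstable": 0.8, "stable": 0.2},
        ("unstable", "treat"): {"stable": 0.7, "unstable": 0.3},
    },
)

values, policy = value_iteration(toy)
print(policy)  # {'stable': 'wait', 'unstable': 'treat'}
```

Tabular value iteration is only tractable when S and A are small and p is known; with a learned, continuous patient state and unknown dynamics, the lookup tables must be replaced by function approximators trained from data, which is the setting deep reinforcement learning addresses.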

Trajectory reconstruction using a physiology-driven autoencoder
One of the key features of our method is the physiology-driven structure of the autoencoder that represents the cardiovascular state of the patient (see Fig 1B and 1C). The decoder of this autoencoder is a set of algebraic equations that map the latent state to observable, clinically relevant physiological parameters, such as heart rate and blood pressure. Fig 2 shows selected reconstructed trajectories for one representative patient under various levels of data corruption (see Methods). As the figure illustrates, the model successfully reconstructs the observable outputs and their trends at corruption probabilities as high as 25%; only at extreme levels of corruption (50%) does the model’s accuracy degrade.
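The decoder’s actual equations are given in the paper’s Methods and are not reproduced here. As a hedged illustration of what an algebraic, physiology-based decoder can look like, the sketch below assumes a hypothetical four-component latent state (heart rate HR, stroke volume SV, systemic vascular resistance SVR, central venous pressure CVP) and applies the textbook relations CO = HR × SV and MAP ≈ CO × SVR + CVP. It also assumes that "corruption" means hiding each observed value independently with probability p, and it scores reconstruction only on the entries that remain observed. The functions `decode`, `corrupt`, and `masked_mse`, and all numeric values, are hypothetical.

```python
import numpy as np


def decode(latent):
    """Hypothetical algebraic decoder: latent cardiovascular state -> observables.

    Assumed latent layout (last axis):
      0: heart rate HR (beats/min)
      1: stroke volume SV (mL)
      2: systemic vascular resistance SVR (mmHg*min/L, i.e. Wood units)
      3: central venous pressure CVP (mmHg)
    Returns [HR, MAP] along the last axis, with MAP in mmHg.
    """
    hr, sv, svr, cvp = np.moveaxis(np.asarray(latent, dtype=float), -1, 0)
    cardiac_output = hr * sv / 1000.0                    # CO = HR x SV, in L/min
    mean_arterial_pressure = cardiac_output * svr + cvp  # MAP ~ CO x SVR + CVP
    return np.stack([hr, mean_arterial_pressure], axis=-1)


def corrupt(observations, p, rng):
    """Hide each observed value independently with probability p (NaN = missing)."""
    mask = rng.random(observations.shape) < p
    corrupted = observations.astype(float)  # astype returns a copy by default
    corrupted[mask] = np.nan
    return corrupted, mask


def masked_mse(reconstructed, corrupted):
    """Reconstruction error computed only over the entries that remain observed."""
    observed = ~np.isnan(corrupted)
    return float(np.mean((reconstructed[observed] - corrupted[observed]) ** 2))


# Two illustrative latent states (values invented, not taken from the paper).
latent = np.array([[70.0, 70.0, 17.0, 5.0],
                   [110.0, 45.0, 12.0, 3.0]])
print(decode(latent))  # approx. [[70, 88.3], [110, 62.4]] as [HR, MAP] rows

# A 100-step trajectory corrupted at p = 0.25, the level mentioned in the text.
rng = np.random.default_rng(0)
observations = decode(np.repeat(latent, 50, axis=0))
corrupted, mask = corrupt(observations, p=0.25, rng=rng)
print(masked_mse(observations, corrupted))  # 0.0 for a perfect reconstruction
```

Because the decoder in this sketch is fixed physiology rather than a learned network, each latent component keeps its units and clinical meaning, and only the encoder would need to be learned from the partially observed data; how closely this mirrors the authors’ parameterization should be checked against their Methods.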

Discussion and conclusion
We present an interdisciplinary approach that we believe takes a significant step toward improving data-driven interventions in clinical sepsis, in terms of both outcome and interpretability. Indeed, we believe that the maximum benefit of artificial intelligence applied to medicine is best realized by integrating mechanistic models of physiology (whenever possible), uncertainty quantification, and human expert knowledge into sequential decision-making frameworks.

Citation: Nanayakkara T, Clermont G, Langmead CJ, Swigon D (2022) Unifying cardiovascular modelling with deep reinforcement learning for uncertainty aware control of sepsis treatment. PLOS Digit Health 1(2): e0000012. https://doi.org/10.1371/journal.pdig.0000012

Editor: Matthew Chua Chin Heng, National University of Singapore, SINGAPORE

Received: August 14, 2021; Accepted: November 29, 2021; Published: February 17, 2022

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: This work uses the publicly available MIMIC-III database, which can be accessed at https://physionet.org/content/mimiciii/1.4/.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.