An explainable artificial intelligence approach for predicting cardiovascular outcomes using electronic health records

Sergiusz Wesołowski, Gordon Lemmon, Edgar J. Hernandez, Alex Henrie, Thomas A. Miller, Derek Weyhrauch, Michael D. Puchalski, Bruce E. Bray, Rashmee U. Shah, Vikrant G. Deshmukh, Rebecca Delaney, H. Joseph Yost, Karen Eilbeck, Martin Tristani-Firouzi, Mark Yandell

Abstract
Understanding the conditionally-dependent clinical variables that drive cardiovascular health outcomes is a major challenge for precision medicine. Here, we deploy a recently developed massively scalable comorbidity discovery method called Poisson Binomial based Comorbidity discovery (PBC), to analyze Electronic Health Records (EHRs) from the University of Utah and Primary Children’s Hospital (over 1.6 million patients and 77 million visits) for comorbid diagnoses, procedures, and medications. Using explainable Artificial Intelligence (AI) methodologies, we then tease apart the intertwined, conditionally-dependent impacts of comorbid conditions and demography upon cardiovascular health, focusing on the key areas of heart transplant, sinoatrial node dysfunction and various forms of congenital heart disease.

Introduction
The application of data-science methods to electronic health record (EHR) databases promises a new, global perspective on human health, with widespread applications for outcomes research and precision medicine initiatives. However, unmet technological challenges still exist [1–3]. One is the need for improved means for ab initio discovery of comorbid clinical variables in the context of confounding demographic variables at scale.

Results
PBC is well powered for discovery of cardiovascular comorbidities
Table 1 demonstrates the utility of the PBC [10] approach for discovery by comparing the power of PBC with that of a standard stratification approach (followed by a χ² test) to detect the well-documented comorbid relationship between atrial fibrillation (AF) and acute cerebrovascular disease (stroke) [29,30]. Table 1 provides a power analysis as a function of corpus size and number of demographic variables. The effect of stratifying the data by these variables for χ² analysis, versus adding them to the PBC calculation, can be observed as one proceeds down the table columns. Results for three different starting cohort sizes are shown. Note how stratification weakens the p-values: the smaller the stratum, the less significant the result.
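The loss of power under stratification can be illustrated with a minimal sketch. It is not the PBC method and does not reproduce Table 1. It only shows how the p-value of a Pearson χ² test on a 2×2 AF-by-stroke table weakens when the same effect is tested within a smaller stratum. The prevalences, cohort size, and function names below are hypothetical values chosen for illustration, and equally sized strata with identical effect sizes are assumed.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for the
    2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom.
    For df = 1, P(X > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

def stratum_p_value(n_total, n_strata,
                    p_af=0.05, p_stroke_af=0.10, p_stroke_no_af=0.04):
    """p-value for one stratum when a cohort of n_total patients is split
    into n_strata equal strata. All prevalences are HYPOTHETICAL, and
    expected (fractional) counts are used instead of sampled data."""
    n = n_total / n_strata
    af = n * p_af
    no_af = n - af
    table = (af * p_stroke_af, af * (1 - p_stroke_af),
             no_af * p_stroke_no_af, no_af * (1 - p_stroke_no_af))
    return p_value_df1(chi2_2x2(*table))

if __name__ == "__main__":
    # Each added binary demographic variable doubles the number of strata.
    for n_vars in range(0, 6):
        strata = 2 ** n_vars
        p = stratum_p_value(10_000, strata)
        print(f"{n_vars} demographic variables, {strata:2d} strata: p = {p:.3g}")
```

Because the χ² statistic scales linearly with the number of patients in the table, each additional stratifying variable shrinks the statistic within a stratum and raises its p-value, even though the underlying association is unchanged.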

Methods
Human subjects approval for this study was obtained following review by the University of Utah Institutional Review Board, IRB_00095807 under a waiver of consent and authorization. Patient data was not anonymized prior to the start of the study. All authors completed Human Subjects research requirements.

Discussion
The ability to model dependencies among multiple risk factors is crucial for meaningful outcomes research. Unfortunately, traditional techniques, such as logistic regression, have limited ability to capture so-called ‘conditional dependencies’ between variables, which are the heart and soul of multimorbid analyses. Although mixture and generalized linear models with mixed effects can (in principle) overcome this weakness, these techniques are limited because a new model must be designed for every question. Neural nets provide one possible alternative.
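To make the notion of a conditional dependency concrete, the following toy sketch queries a three-variable discrete network by brute-force enumeration. It is not one of the networks from this study: the variables (`a` and `b` for two risk factors, `y` for an outcome), the structure, and every probability are invented for illustration.

```python
from itertools import product

# HYPOTHETICAL toy network: a -> y <- b, with a and b independent a priori.
P_A = {1: 0.2, 0: 0.8}
P_B = {1: 0.3, 0: 0.7}
# P(y = 1 | a, b): the effect of a on y depends strongly on b.
P_Y1_GIVEN = {(0, 0): 0.02, (1, 0): 0.03, (0, 1): 0.04, (1, 1): 0.30}

def joint(a, b, y):
    """Joint probability P(a, b, y) under the toy network."""
    p_y1 = P_Y1_GIVEN[(a, b)]
    return P_A[a] * P_B[b] * (p_y1 if y == 1 else 1.0 - p_y1)

def query(target, evidence):
    """P(target | evidence) by enumerating all 8 joint states.
    Both arguments are dicts over the variable names 'a', 'b', 'y'."""
    numerator = 0.0
    denominator = 0.0
    for a, b, y in product((0, 1), repeat=3):
        world = {"a": a, "b": b, "y": y}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(a, b, y)
            denominator += p
            if all(world[k] == v for k, v in target.items()):
                numerator += p
    return numerator / denominator

if __name__ == "__main__":
    for b in (0, 1):
        with_a = query({"y": 1}, {"a": 1, "b": b})
        without_a = query({"y": 1}, {"a": 0, "b": b})
        print(f"b={b}: relative risk of y for a=1 vs a=0 is {with_a / without_a:.2f}")
    # The same joint distribution also answers a 'reverse' query.
    print(f"P(b=1 | y=1, a=1) = {query({'b': 1}, {'y': 1, 'a': 1}):.3f}")
```

In this toy example the relative risk attributable to `a` differs several-fold depending on `b`, which is the kind of interaction a single main-effects coefficient cannot express. The same joint distribution answers many different questions without a new model being fitted for each one.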

Conclusion
The analyses presented here provide a first step toward a global description of heart disease and associated comorbidities across the US Intermountain West. However, the map we seek resides not so much in the results reported here as in the products of our analyses: the PGM multimorbidity networks. As we have explained, these networks support multitudes of queries and, when used in combination, support both wide-ranging and focused explorations of a disease landscape. Given the right datasets, we have shown that the approach can provide new insights, such as the mother-child cross-generational cardiovascular multimorbidities we described.

Citation: Wesołowski S, Lemmon G, Hernandez EJ, Henrie A, Miller TA, Weyhrauch D, et al. (2022) An explainable artificial intelligence approach for predicting cardiovascular outcomes using electronic health records. PLOS Digit Health 1(1): e0000004. https://doi.org/10.1371/journal.pdig.0000004

Editor: Mecit Can Emre Simsekler, Khalifa University of Science and Technology, UNITED ARAB EMIRATES

Received: August 31, 2021; Accepted: November 17, 2021; Published: January 18, 2022

Copyright: © 2022 Wesołowski et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: We obtained medical records from the University of Utah and Primary Children’s Hospital under an IRB that waived consent (see ethics statement). We refer to this cross-institution extract as the Utah Data Resource. Because the aggregate includes exact dates and other protected patient information, the data cannot be made publicly available. Information regarding how qualified researchers might apply for data access can be found at https://irb.utah.edu/about/contact/. However, all probabilistic graphical models described in this paper are available on the web at https://pbc.genetics.utah.edu/lemmon2021/bayes/.

Funding: This research was supported by the AHA Children’s Strategically Focused Research Network grant (17SFRN33630041) (https://professional.heart.org/en/research-programs/strategically-focused-research/strategically-focused-research-networks) and the Nora Eccles Treadwell Foundation. RD’s effort was supported by the National Institutes of Health under Ruth L. Kirschstein National Research Service Award T32 HL007576 from the National Heart, Lung, and Blood Institute (https://grants.nih.gov/grants/oer.htm). GL was supported by NRSA training grant T32H757632 (https://researchtraining.nih.gov/programs/training-grants/T32). SW was supported by NRSA training grant T32DK110966-04 (https://researchtraining.nih.gov/programs/training-grants/T32). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: GL, VD, and MY own shares in Backdrop Health; there are no financial ties regarding this research.

