High resolution data modifies intensive care unit dialysis outcome predictions as compared with low resolution administrative data set

Jennifer Ziegler, Barret N. M. Rush, Eric R. Gottlieb, Leo Anthony Celi, Miguel Ángel Armengol de la Hoz

Abstract
High resolution clinical databases from electronic health records are increasingly being used in the field of health data science. Compared to traditional administrative databases and disease registries, these newer highly granular clinical datasets offer several advantages, including availability of detailed clinical information for machine learning and the ability to adjust for potential confounders in statistical models. The purpose of this study is to compare the analysis of the same clinical research question using an administrative database and an electronic health record database. The Nationwide Inpatient Sample (NIS) was used for the low-resolution model, and the eICU Collaborative Research Database (eICU) was used for the high-resolution model. A parallel cohort of patients admitted to the intensive care unit (ICU) with sepsis and requiring mechanical ventilation was extracted from each database.
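Both cohorts were built by applying the same inclusion criteria to each source. The following is a minimal sketch of that filtering step. It assumes a hypothetical, already-harmonized record type whose boolean flags (`icu_admission`, `sepsis`, `mechanical_ventilation`) stand in for whatever diagnosis codes, procedure codes, or clinical tables each database actually uses. The authors' real definitions are in their public repository and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Admission:
    # Hypothetical, simplified record; real NIS and eICU schemas differ,
    # and each flag would be derived upstream from database-specific fields.
    patient_id: str
    icu_admission: bool
    sepsis: bool
    mechanical_ventilation: bool


def select_cohort(admissions):
    """Keep ICU admissions with sepsis that required mechanical ventilation."""
    return [
        a for a in admissions
        if a.icu_admission and a.sepsis and a.mechanical_ventilation
    ]


records = [
    Admission("p1", True, True, True),
    Admission("p2", True, True, False),   # no ventilation: excluded
    Admission("p3", False, True, True),   # not an ICU admission: excluded
    Admission("p4", True, True, True),
]
cohort = select_cohort(records)
print([a.patient_id for a in cohort])  # ['p1', 'p4']
```

Keeping the criteria in a single predicate is one way to keep the two extractions parallel; only the upstream mapping from raw fields to flags would differ between databases.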

Introduction
Low-resolution databases are data sets that lack granular, detailed clinical data and often contain only pre-specified types of information, such as patient demographics, diagnoses, hospital information, and hospital admission and discharge information [1,2]. Administrative databases, which have been used for medical research since they were first created in the 1970s, are one example of a low-resolution database [3]. Over the past decades these databases have enabled the analysis of large amounts of healthcare data and have underpinned numerous practice-changing studies [4–6]. Administrative databases such as the Nationwide Inpatient Sample (NIS) provide large patient samples and include valuable, reliable information such as patient demographics, diagnostic coding of primary and secondary diagnoses, procedures performed, length of hospitalization, and discharge status (i.e., discharge, death, or transfer to another facility) [7]. These databases are easily accessible and inexpensive, and they permit the study of practices and outcomes across a broad spectrum of healthcare-related research questions.

Results
There were 139,367 patients included in the 2014 eICU sample, of which a total of 8,822 (6.3%) patients were included in the cohort (Fig 1). A total of 7,071,762 hospitalizations from the 2014 NIS sample were analyzed and 223,947 (3.2%) were included in the cohort (Fig 2). The overall mortality was 22.6% in the eICU cohort, while the NIS cohort had a mortality of 26.9%. There were 727 (8.2%) patients in the eICU cohort who required dialysis, with a mortality rate of 36.0%; the NIS cohort contained 19,149 (8.5%) patients who required dialysis with a mortality rate of 40.0%.
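The cohort and dialysis proportions follow directly from the reported counts. Here is a quick arithmetic check that uses only numbers stated above. The `pct` helper is illustrative.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Counts as reported in the Results paragraph.
eicu_total, eicu_cohort, eicu_dialysis = 139_367, 8_822, 727
nis_total, nis_cohort = 7_071_762, 223_947

print(pct(eicu_cohort, eicu_total))     # 6.3
print(pct(nis_cohort, nis_total))       # 3.2
print(pct(eicu_dialysis, eicu_cohort))  # 8.2
```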

Discussion
In this comparative analysis of comparable cohorts of mechanically ventilated patients with sepsis, the addition of high-resolution clinical variables from the eICU database allowed greater adjustment for severity of illness and significantly altered the point estimate for the association between hemodialysis use and hospital mortality. We show that the cohorts obtained from the low-resolution NIS database and the high-resolution eICU database are comparable in baseline patient and hospital demographics, dialysis use, and in-hospital mortality. The baseline low-resolution models from both cohorts showed a significant association between dialysis use and in-hospital mortality.
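Adjusting for severity of illness can move the dialysis–mortality estimate because of confounding, which is a general statistical property. The following is a minimal sketch on simulated data, not study data and not the authors' model. It makes several assumptions. A hypothetical standard-normal `severity` score raises both the probability of dialysis and the probability of death, and dialysis itself has an arbitrary true log-odds effect of 0.3. A crude logistic regression (dialysis only) is then compared with an adjusted one (dialysis plus severity), both fitted by Newton-Raphson in NumPy to avoid extra dependencies.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # predicted probabilities
        W = p * (1.0 - p)                      # IRLS weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * W[:, None])          # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
n = 50_000
TRUE_EFFECT = 0.3  # assumed log-odds effect of dialysis on death

# Hypothetical severity score drives both treatment and outcome.
severity = rng.normal(size=n)
dialysis = rng.random(n) < 1.0 / (1.0 + np.exp(-(-2.5 + 1.5 * severity)))
logit_death = -1.5 + TRUE_EFFECT * dialysis + 1.2 * severity
death = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit_death))).astype(float)

ones = np.ones(n)
crude = fit_logistic(np.column_stack([ones, dialysis]), death)
adjusted = fit_logistic(np.column_stack([ones, dialysis, severity]), death)

print(f"crude OR:    {np.exp(crude[1]):.2f}")
print(f"adjusted OR: {np.exp(adjusted[1]):.2f}")  # should land near exp(0.3), about 1.35
```

In this simulation the crude odds ratio absorbs the effect of severity and overstates the association, while the adjusted model recovers something close to the assumed effect. This is the same direction of change the high-resolution covariates made possible in the eICU analysis.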

Citation: Ziegler J, Rush BNM, Gottlieb ER, Celi LA, Armengol de la Hoz MÁ (2022) High resolution data modifies intensive care unit dialysis outcome predictions as compared with low resolution administrative data set. PLOS Digit Health 1(10): e0000124. https://doi.org/10.1371/journal.pdig.0000124

Editor: Hamish S. Fraser, Brown University, UNITED STATES

Received: October 25, 2021; Accepted: September 9, 2022; Published: October 11, 2022

Copyright: © 2022 Ziegler et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: The data for the high resolution model were extracted from the eICU Collaborative Research Database, a freely available multi-center database for critical care research. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG and Badawi O. Scientific Data (2018). DOI: http://dx.doi.org/10.1038/sdata.2018.178. Available from: https://www.nature.com/articles/sdata2018178. The data for the low resolution model were extracted from the Nationwide Inpatient Sample (NIS), a national, all-payer database: HCUP Databases. Healthcare Cost and Utilization Project (HCUP). Agency for Healthcare Research and Quality, Rockville, MD. Available from: https://www.hcup-us.ahrq.gov/nisoverview.jsp. The authors provide open access to all their data extraction, filtering, data wrangling, modeling, figures and tables, code, and queries on https://github.com/theonesp/hr_vs_lr_repos.

Funding: Research reported in this publication was supported by the National Institutes of Health grants T32DK007527 (ERG) and NIBIB R01EB017205 (LAC). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Leo Anthony Celi is the Editor-in-Chief of PLOS Digital Health.