Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts

Naeimeh Atabaki-Pasdar, Mattias Ohlsson, Ana Viñuela, Francesca Frau, Hugo Pomares-Millan, Mark Haid, Angus G. Jones, E. Louise Thomas, Robert W. Koivula, Azra Kurbasic, Pascal M. Mutie, Hugo Fitipaldi, Juan Fernandez, Paul W. Franks

 

Abstract

Background

Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important, as this can help prevent irreversible damage to the liver and, ultimately, hepatocellular carcinomas. We sought to expand etiological understanding and develop a diagnostic tool for NAFLD using machine learning.

Introduction

Non-alcoholic fatty liver disease (NAFLD) is characterized by the accumulation of fat in hepatocytes in the absence of excessive alcohol consumption. NAFLD is a spectrum of liver diseases, with its first stage, known as simple steatosis, defined as liver fat content ≥5% of total liver weight. Simple steatosis can progress to non-alcoholic steatohepatitis (NASH), fibrosis, cirrhosis, and eventually hepatocellular carcinoma. In NAFLD, triglycerides (TG) accumulate in hepatocytes, and liver insulin sensitivity is diminished, promoting hepatic gluconeogenesis, thereby raising the risk of type 2 diabetes (T2D) or exacerbating the disease pathology in those with diabetes [1–5]. Growing evidence also links an increased risk of cardiovascular events with NAFLD [6,7].

Methods

Participants (IMI DIRECT)

The primary data utilized in this study were generated within the IMI DIRECT consortium, which includes persons with diabetes (n = 795) and without diabetes (n = 2,234). All participants provided informed written consent, and the study protocol was approved by the regional research ethics committees for each clinical study center. Details of the study design and the core characteristics are provided elsewhere [15,16].

Discussion

Using data from the IMI DIRECT consortium, we developed 18 diagnostic models for early-stage NAFLD. These models were developed to reflect the different scenarios in which they might be used, spanning both clinical and research settings, with the more complex (and less accessible) models having the greatest predictive ability. The models were successfully validated in the UK Biobank where data permitted such analysis (clinical models 1 and 2). Overall, the basic clinical variables proved to be stronger predictors of fatty liver than the more complex omics data, although adding omics data yielded the most powerful model, with very good cross-validated predictive ability (ROCAUC = 0.84).
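As a reminder of what the ROCAUC figure measures, the following minimal sketch computes it as the probability that a randomly chosen case with fatty liver receives a higher model score than a randomly chosen case without it. This is an illustration only, not the modeling pipeline used in the study: the function name `roc_auc` and the toy labels and scores are hypothetical, and no cross-validation is performed here.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study):
# label 1 = fatty liver, label 0 = no fatty liver; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to a model that ranks cases no better than chance, and 1.0 to a model that ranks every case with fatty liver above every case without it.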

Acknowledgments

We thank Mattias Borell for development work, logistical support, and advice related to the web interface. We thank all the participants and study center staff in IMI DIRECT for their contribution to the study. We thank all the participants in the UK Biobank. This research was conducted using the UK Biobank resource (application ID: 18274). For the proteomic analyses, we thank the entire staff of the Human Protein Atlas, the Plasma Profiling Facility at Science for Life Laboratory, and in particular Elin Birgersson, Annika Bendes, and Eni Andersson for technical assistance. We thank C. Prehn (HMGU) for laboratory work related to the metabolomic data.

Citation: Atabaki-Pasdar N, Ohlsson M, Viñuela A, Frau F, Pomares-Millan H, Haid M, et al. (2020) Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts. PLoS Med 17(6): e1003149. https://doi.org/10.1371/journal.pmed.1003149

Academic Editor: Dominik Heider, University of Marburg, GERMANY

Received: January 16, 2020; Accepted: May 22, 2020; Published: June 19, 2020

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: Data cannot be shared publicly due to a need to maintain the confidentiality of patient data. Interested researchers may contact [email protected] to request and obtain relevant data.

Funding: The work leading to this publication has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement n°115317 (DIRECT), resources of which are composed of financial contribution from the European Union’s Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution. NAP is supported in part by Henning och Johan Throne-Holsts Foundation, Hans Werthén Foundation, an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. HPM is supported by an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. AGJ is supported by an NIHR Clinician Scientist award (17/0005624). RK is funded by the Novo Nordisk Foundation (NNF18OC0031650) as part of a postdoctoral fellowship, an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. AK, PM, HF, JF and GNG are supported by an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. TJM is funded by an NIHR clinical senior lecturer fellowship. S.Bru acknowledges support from the Novo Nordisk Foundation (grants NNF17OC0027594 and NNF14CC0001). ATH is a Wellcome Trust Senior Investigator and is also supported by the NIHR Exeter Clinical Research Facility. JMS acknowledges support from Science for Life Laboratory (Plasma Profiling Facility), Knut and Alice Wallenberg Foundation (Human Protein Atlas) and Erling-Persson Foundation (KTH Centre for Precision Medicine). MIM is supported by the following grants: Wellcome (090532, 098381, 106130, 203141, 212259); NIH (U01-DK105535). PWF is supported by an IRC award from the Swedish Foundation for Strategic Research and a European Research Council award ERC-2015-CoG - 681742_NASCENT. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: PWF is a consultant for Novo Nordisk, Lilly, and Zoe Global Ltd., and has received research grants from numerous diabetes drug companies. HR is an employee and shareholder of Sanofi. MIM: The views expressed in this article are those of the author(s) and not necessarily those of the NHS, the NIHR, or the Department of Health. MIM has served on advisory panels for Pfizer, NovoNordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and research funding from Abbvie, Astra Zeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, NovoNordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. As of June 2019, MIM is an employee of Genentech, and a holder of Roche stock. AM is a consultant for Lilly and has received research grants from several diabetes drug companies.

Abbreviations: ALT, alanine transaminase; AST, aspartate transaminase; DBP, diastolic blood pressure; EFS, ensemble feature selection; FLI, fatty liver index; HBA1c, hemoglobin A1C; HSI, hepatic steatosis index; MMTT, mixed-meal tolerance test; MRI, magnetic resonance imaging; NAFLD, non-alcoholic fatty liver disease; NAFLD-LFS, non-alcoholic fatty liver disease liver fat score; NASH, non-alcoholic steatohepatitis; OGTT, oral glucose tolerance test; QC, quality control; ROCAUC, receiver operating characteristic area under the curve; SBP, systolic blood pressure; T2D, type 2 diabetes; TG, triglycerides