Population heterogeneity in clinical cohorts affects the predictive accuracy of brain imaging
Oualid Benkarim, Casey Paquola, Bo-yong Park, Valeria Kebets, Seok-Jun Hong, Reinder Vos de Wael, Shaoshi Zhang, B. T. Thomas Yeo, Michael Eickenberg, Tian Ge, Jean-Baptiste Poline, Boris C. Bernhardt, Danilo Bzdok
Abstract
Brain imaging research enjoys increasing adoption of supervised machine learning for single-participant disease classification. Yet, the success of these algorithms likely depends on population diversity, including demographic differences and other factors that may be outside of primary scientific interest. Here, we capitalize on propensity scores as a composite confound index to quantify diversity due to major sources of population variation. We delineate the impact of population heterogeneity on the predictive accuracy and pattern stability in 2 separate clinical cohorts: the Autism Brain Imaging Data Exchange (ABIDE, n = 297) and the Healthy Brain Network (HBN, n = 551). Across various analysis scenarios, our results uncover the extent to which cross-validated prediction performances are interlocked with diversity.
Introduction
Brain scanning technology opens a noninvasive window into the structure and function of the human brain. Combined with machine learning algorithms, brain imaging research is now gaining momentum to transition from group-level contrast analyses toward single-participant prediction [1–3]. In supervised learning, the main purpose is to learn coherent patterns from brain measurements (i.e., brain signatures) that can be used to make accurate forecasts for new participants [4,5]. The prediction paradigm holds the promise of improving disease diagnosis, enhancing prognostic estimates, and ultimately paving the way to precision medicine [6,7]. Machine learning methods are now increasingly adopted for the goal of classifying various conditions, including autism spectrum disorder (ASD) [8–11], attention-deficit/hyperactivity disorder (ADHD) [12–15], anxiety (ANX) [16,17], or schizophrenia [18–22].
Methods
For ABIDE, we considered data from all acquisition sites with at least 10 participants per group and with both children and adults. After detailed quality control, only cases with acceptable T1-weighted (T1w) MRI, surface extraction, and head motion in rs-fMRI were included. Participants with ASD were diagnosed based on an in-person interview administered by board-certified mental health professionals using the gold standard diagnostics of the Autism Diagnostic Observation Schedule (ADOS) [51] and/or Autism Diagnostic Interview-Revised (ADI-R) [52,53]. Typically developing (TD) participants had no history of mental disorders.
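The site-level inclusion rule above can be sketched in a few lines. This is an illustrative sketch only, not the authors' code: the record keys (`site`, `group`, `age`), the function name `eligible_sites`, and the age of 18 years used to separate children from adults are assumptions, since the text does not state a cutoff.

```python
from collections import defaultdict

# Assumption: the text does not give the age separating children from adults;
# 18 years is used here purely for illustration.
ADULT_AGE = 18.0
MIN_PER_GROUP = 10


def eligible_sites(participants, min_per_group=MIN_PER_GROUP, adult_age=ADULT_AGE):
    """Return the sorted names of sites meeting both inclusion criteria.

    `participants` is an iterable of dicts with the hypothetical keys
    'site', 'group' ('ASD' or 'TD'), and 'age' (in years).
    """
    group_counts = defaultdict(lambda: {"ASD": 0, "TD": 0})
    has_child = defaultdict(bool)
    has_adult = defaultdict(bool)

    for p in participants:
        site = p["site"]
        group_counts[site][p["group"]] += 1
        if p["age"] >= adult_age:
            has_adult[site] = True
        else:
            has_child[site] = True

    # Keep a site only if every group reaches the minimum size and the site
    # contributes at least one child and at least one adult.
    return sorted(
        site
        for site, counts in group_counts.items()
        if min(counts.values()) >= min_per_group
        and has_child[site]
        and has_adult[site]
    )
```

The quality-control steps (T1w, surface extraction, head motion) would be applied to the participants of the retained sites and are not modeled here.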
Results
In this work, we explored the value of propensity scores as a handle to detect and monitor the role of cohort diversity in predictive modeling. With our approach, we were able to collapse multiple sources of population variation into a single dimension that faithfully captured the diversity among the participants in our cohorts. Tailoring the propensity score framework to brain imaging predictions allowed us to rigorously expose the relationship between diversity and prediction accuracy.
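To make the idea of a single diversity dimension concrete, the following minimal sketch estimates a propensity score as the probability of diagnosis given a set of covariates. It assumes a logistic regression fitted by plain gradient descent on standardized covariates. The function names, learning rate, iteration count, and choice of covariates are illustrative assumptions and are not taken from the authors' implementation, which is available at the repository listed under Data Availability.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def standardize(rows):
    """Z-score each covariate column so gradient descent is well conditioned."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def fit_propensity(covariates, diagnosis, lr=0.1, n_iter=2000):
    """Estimate P(diagnosis = 1 | covariates) with logistic regression.

    `covariates` is a list of numeric rows (e.g., hypothetical age and sex
    codes); `diagnosis` is a list of 0/1 labels. Returns one score per row.
    """
    x = standardize(covariates)
    n, d = len(x), len(x[0])
    w, b = [0.0] * d, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(x, diagnosis):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        # Batch gradient step on the mean log-loss.
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in x]
```

Each participant receives one value between 0 and 1, so participants with similar scores have similar covariate profiles with respect to diagnosis. In practice, an established library implementation of logistic regression would be preferable to this hand-written optimizer.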
Discussion
In our quest toward realizing single-patient prediction in real-world settings, MRI-based machine learning approaches seek to provide accurate and replicable biomarkers. Due to the recent rise of large-scale brain scanning collections, analytical tools are now urgently needed to account for potential distributional shifts as a consequence of the increasing diversity of participant cohorts. In multisite neuroimaging studies, the generalization power of predictive models is especially likely to be affected by several sources of population variation. Previous research has studied the impact of these sources on prediction accuracy and biomarker robustness [42,48,89]. Yet, such analyses typically focused on a single source of participant heterogeneity.
Citation: Benkarim O, Paquola C, Park B-y, Kebets V, Hong S-J, Vos de Wael R, et al. (2022) Population heterogeneity in clinical cohorts affects the predictive accuracy of brain imaging. PLoS Biol 20(4): e3001627. https://doi.org/10.1371/journal.pbio.3001627
Editor: Ben Seymour, University of Cambridge, UNITED KINGDOM
Received: October 19, 2021; Accepted: April 11, 2022; Published: April 29, 2022
Copyright: © 2022 Benkarim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The imaging and phenotypic data were provided, in part, by the Autism Brain Imaging Data Exchange initiative (ABIDE-I and II; https://fcon_1000.projects.nitrc.org/indi/abide) and the Healthy Brain Network (https://fcon_1000.projects.nitrc.org/indi/cmi_healthy_brain_network). All imaging features, phenotypic data, and code are openly available at https://github.com/OualidBenkarim/ps_diversity. The data underlying all main and supplementary figures can be found in S1 Data.xlsx.
Funding: OB was funded by a Healthy Brains for Healthy Lives (HBHL) postdoctoral fellowship and is a member of the Quebec Autism Research Training (QART) program. BCB acknowledges research support from the Natural Sciences and Engineering Research Council of Canada (NSERC Discovery-1304413), the Canadian Institutes of Health Research (CIHR FDN-154298), SickKids Foundation (NI17-039), Azrieli Center for Autism Research (ACAR-TACC), Brain Canada (Azrieli Future Leaders), and the Tier-2 Canada Research Chairs program. DB was supported by US NIH grant R01AG068563A and the Canadian Institutes of Health Research project grant 438531. DB was also supported by the Healthy Brains for Healthy Lives initiative (Canada First Research Excellence Fund), Google (Research Award, Teaching Award), and by the CIFAR Artificial Intelligence Chairs program (Canadian Institute for Advanced Research). BTTY was supported by the Singapore National Research Foundation (NRF) Fellowship (Class of 2017), the NUS Yong Loo Lin School of Medicine (NUHSRO/2020/124/TMR/LOA) and the Singapore National Medical Research Council (NMRC) LCG (OFLCG19May-0035) and STaR (STaR20nov-0003). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Abbreviations: ABIDE, Autism Brain Imaging Data Exchange; ADHD, attention-deficit/hyperactivity disorder; ADI-R, Autism Diagnostic Interview-Revised; ADOS, Autism Diagnostic Observation Schedule; ANOVA, analysis of variance; ANX, anxiety; ASD, autism spectrum disorder; AUC, area under the receiver operating characteristic curve; CBIC, CitiGroup Cornell Brain Imaging Center; CV, cross-validation; FN, false negatives; FWHM, full width at half maximum; HBN, Healthy Brain Network; i.i.d., independent and identically distributed; NYU, New York University Langone Medical Center; PITT, University of Pittsburgh, School of Medicine; rs-fMRI, resting-state functional MRI; RU, Rutgers University Brain Imaging Center; SI, Staten Island; SMD, standardized mean difference; TCD, Trinity Centre for Health Sciences, Trinity College Dublin; TD, typically developing; TN, true negatives; TP, true positives; T1w, T1-weighted; USM, University of Utah, School of Medicine.