A multiple instance learning approach for detecting COVID-19 in peripheral blood smears

Colin L. Cooke, Kanghyun Kim, Shiqi Xu, Amey Chaware, Xing Yao, Xi Yang, Jadee Neff, Patricia Pittman, Chad McCall, Carolyn Glass, Xiaoyin Sara Jiang, Roarke Horstmeyer

Abstract
A wide variety of diseases are commonly diagnosed via the visual examination of cell morphology within a peripheral blood smear. For certain diseases, such as COVID-19, the morphological impact across the multitude of blood cell types is still poorly understood. In this paper, we present a multiple instance learning-based approach to aggregate high-resolution morphological information across many blood cells and cell types to automatically diagnose disease at a per-patient level. We integrated image and diagnostic information from across 236 patients to demonstrate not only that there is a significant link between blood cell morphology and a patient’s COVID-19 infection status, but also that novel machine learning approaches offer a powerful and scalable means to analyze peripheral blood smears.

Introduction
The analysis of blood cell morphology plays a critical role in hematology to diagnose and understand various diseases [1]. A key tool for blood cell morphology assessment is the light microscope, which is often applied to examine peripheral blood smears (PBS) [2]. In a typical procedure, a physician will visually examine white and red blood cells within a PBS on a glass slide at high microscope magnification (usually 100×). The nature of visual examination at high resolution limits the observable field-of-view (FOV) to just a few white and red blood cells at a time, making analysis of multiple cells challenging and time-consuming. Digital microscopes [3] have emerged as an effective alternative to manual analysis. By automating the scanning process and presenting digitized images of PBSs to physicians on a computer, such digital microscopes are quickly becoming the predominant method of PBS analysis.

Results & Discussion
We investigated the diagnostic potential of PBS images for COVID-19 infection through a partnership with the Duke University Medical Center. Over a five-month period (April 2020 to August 2020) we collected digital PBS image data from 236 patients, 53% of whom tested positive for COVID-19 by a separately administered PCR test. We denote this group as the Standard cohort. No other patient information was collected for this cohort. In addition to the Standard cohort, we collected PBS image data from 40 additional patients admitted to the medical intensive care unit who presented with acute respiratory illness but were confirmed to be COVID-19 negative using the same PCR testing method. We denote this group as the Challenge cohort.

Methods
We collected anonymized digital PBS images from patients at the Duke University Medical Center (IRB Protocol 00105472). We preserved patient anonymity by collecting only PBS image data and COVID-19 infection status within the Standard cohort of tested patients. The patients in the Challenge cohort were selected by collecting PBS data from patients admitted to the medical intensive care unit with acute respiratory illness (from pneumonia or other acute respiratory failures) who tested negative for COVID-19. While used for cohort formation, this diagnostic information was not present during analysis.

Discussion
In this paper, we present a multiple instance learning (MIL)-based method to accurately diagnose COVID-19 at a per-patient level from high-resolution morphological information across many blood cells and cell types. Besides the final aggregated decision, the proposed attention mechanism also provides cell-type importance, which can help pathologists gain insight into which cell types are more diagnostically relevant. Moreover, by evaluating how different perturbations to our image dataset affect the diagnosis results, we also studied which morphological features are more critical to the screening, opening a window into improving the explainability of machine learning approaches.
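To make the aggregation step concrete, the following is a minimal sketch of attention-based MIL pooling, in which per-cell feature vectors from one patient are combined into a single patient-level score. It assumes the commonly used tanh attention form with a softmax over cells; the text above does not specify the exact attention formulation, so this should not be read as the authors' implementation. All names (attention_mil_pool, predict_patient, attention_by_cell_type), the random weights, the toy feature vectors, and the cell-type labels are hypothetical and chosen only for illustration.

```python
import math
import random


def attention_mil_pool(instances, v, w):
    """Pool per-cell feature vectors into one per-patient vector.

    instances: list of feature vectors, one per cell image (assumed input)
    v: hidden-layer weight matrix, shape (hidden, dim)
    w: attention weight vector, length hidden
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for x in instances:
        # Hidden activation: tanh(V x)
        hidden = [math.tanh(sum(vk * xk for vk, xk in zip(row, x))) for row in v]
        # Unnormalized attention score: w . tanh(V x)
        scores.append(sum(wk * hk for wk, hk in zip(w, hidden)))
    # Numerically stable softmax over the cells of this patient
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Patient embedding: attention-weighted average of cell features
    dim = len(instances[0])
    bag = [sum(a * x[j] for a, x in zip(weights, instances)) for j in range(dim)]
    return bag, weights


def predict_patient(instances, v, w, clf_w, clf_b):
    """Return a patient-level probability and the per-cell attention weights."""
    bag, weights = attention_mil_pool(instances, v, w)
    logit = sum(c * b for c, b in zip(clf_w, bag)) + clf_b
    return 1.0 / (1.0 + math.exp(-logit)), weights


def attention_by_cell_type(weights, cell_types):
    """Sum attention weights per cell type as a rough importance summary."""
    totals = {}
    for a, t in zip(weights, cell_types):
        totals[t] = totals.get(t, 0.0) + a
    return totals


# Toy demo with random features and untrained weights (illustration only).
rng = random.Random(0)
dim, hidden = 4, 3
cell_types = ["neutrophil", "lymphocyte", "monocyte", "lymphocyte", "neutrophil"]
cells = [[rng.gauss(0, 1) for _ in range(dim)] for _ in cell_types]
v = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(hidden)]
w = [rng.gauss(0, 1) for _ in range(hidden)]
clf_w = [rng.gauss(0, 1) for _ in range(dim)]

prob, weights = predict_patient(cells, v, w, clf_w, 0.0)
print(f"patient-level score: {prob:.3f}")
print(attention_by_cell_type(weights, cell_types))
```

Because the attention weights sum to one across a patient's cells, summing them per cell type gives a simple, normalized view of which cell types the model weighted most for that patient. In practice the per-cell features would come from a trained image encoder and the weights would be learned jointly with the classifier.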

Citation: Cooke CL, Kim K, Xu S, Chaware A, Yao X, Yang X, et al. (2022) A multiple instance learning approach for detecting COVID-19 in peripheral blood smears. PLOS Digit Health 1(8): e0000078. https://doi.org/10.1371/journal.pdig.0000078

Academic Editor: Dukyong Yoon, Yonsei University College of Medicine, KOREA, REPUBLIC OF

Received: January 24, 2022; Accepted: June 21, 2022; Published: August 19, 2022

Copyright: © 2022 Cooke et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Due to the conditions and agreements under which this data was collected, no data will be made publicly available, but can be requested at https://irb.duhs.duke.edu/ under IRB Pro00105472-KSP-2.0 - Cell morphology of COVID-19-positive blood smear images. The code used within this work has been made publicly available at: https://github.com/clvcooke/covid-blood.

Funding: This study was funded by a Duke-Coulter Translational Partnership, a fellowship from the Natural Sciences and Engineering Research Council (NSERC) of Canada, and funding from a 3M Nontenured Faculty Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: RH is the scientific director of Ramona Optics Inc., and RH and AC are co-founders of Airilabs LLC. Both companies are developing novel hardware for microscope imaging.