Reproducibility of Deep Learning in Digital Pathology Whole Slide Image Analysis

Christina Fell, Mahnaz Mohammadi, David Morrison, Ognjen Arandjelovic, Peter Caie, David Harris-Birtill

For a method to be widely adopted in medical research or clinical practice, it needs to be reproducible so that clinicians and regulators can have confidence in its use. Machine learning and deep learning have a particular set of challenges around reproducibility. Small differences in the settings or the data used for training a model can lead to large differences in the outcomes of experiments.
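The sensitivity described above can be illustrated with a toy example. The sketch below is not from the paper; it uses a hypothetical one-parameter "training" loop on synthetic noisy data to show that runs with identical seeds and settings reproduce exactly, while changing either the seed or a hyperparameter shifts the final result.

```python
import random

def train_toy_model(seed, lr=0.1, steps=100):
    """Toy training run: fit w to minimise (w*x - y)^2 on noisy data.

    The data draws depend on the seed, so two runs with different
    seeds (or a different learning rate) converge to slightly
    different weights.
    """
    rng = random.Random(seed)
    w = rng.uniform(-1.0, 1.0)             # random initialisation
    for _ in range(steps):
        x = rng.uniform(0.0, 1.0)
        y = 3.0 * x + rng.gauss(0.0, 0.1)  # noisy target, true slope 3
        grad = 2.0 * (w * x - y) * x       # gradient of squared error
        w -= lr * grad
    return w

# Same seed and settings -> bit-identical result.
print(train_toy_model(seed=0) == train_toy_model(seed=0))  # True
# Different seed -> a (slightly) different trained model.
print(train_toy_model(seed=0) == train_toy_model(seed=1))  # False
```

In real deep learning pipelines the same effect is amplified by GPU non-determinism, data ordering, and library versions, which is why reporting seeds and settings alone does not always guarantee identical outcomes.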

Digital pathology is a rapidly expanding field of medical imaging. Modern digital scanners allow tissue specimens to be captured at high resolutions (up to 160 nm per pixel), referred to as Whole Slide Images (WSIs). Once samples are available digitally, large displays can replace microscopes, collaborations between clinicians can be done remotely, and the augmentation and automation of the assessment procedure become feasible.
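A back-of-the-envelope calculation shows the scale implied by that resolution. The tissue dimensions below are hypothetical, chosen only to illustrate the gigapixel sizes typical of WSIs at the 160 nm/pixel figure quoted above.

```python
# Approximate WSI size at 160 nm per pixel.
NM_PER_PIXEL = 160
tissue_w_mm, tissue_h_mm = 20, 15  # hypothetical specimen dimensions

px_w = tissue_w_mm * 1_000_000 // NM_PER_PIXEL  # mm -> nm -> pixels
px_h = tissue_h_mm * 1_000_000 // NM_PER_PIXEL
gigapixels = px_w * px_h / 1e9
raw_gb = px_w * px_h * 3 / 1e9  # uncompressed 8-bit RGB

print(f"{px_w} x {px_h} px, {gigapixels:.1f} gigapixels, ~{raw_gb:.0f} GB raw")
```

Images of this size cannot be processed whole by current deep learning models, which is why WSI pipelines typically tile slides into patches, a step that itself introduces choices that must be reported for reproducibility.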


When reproducing a paper, some parts of a reported method may be critical to obtaining the same results and confirming the hypothesis, while others may be less important. Some reported details may have no effect on the outcome at all. To produce a checklist that supports reproducibility in digital pathology, we replicated each paper independently and identified any missing information that would affect replication.

In this section we discuss two things: first, the problems that arise when information is missing, and second, the impact this missing information has on each paper's results. In general, it should be noted that it is hard to untangle the effect of any individual missing piece of information on the results without extensive experimentation that controls for many other variables. For example, if information about both patch extraction and heatmap generation is missing, it is not always clear which one is causing differences from the reported results.

Citation: Fell C, Mohammadi M, Morrison D, Arandjelovic O, Caie P, Harris-Birtill D (2022) Reproducibility of deep learning in digital pathology whole slide image analysis. PLOS Digit Health 1(12): e0000145.

Editor: Sanjay Aneja, Yale University School of Medicine, UNITED STATES

Received: May 13, 2022; Accepted: October 13, 2022; Published: December 2, 2022.

Copyright: © 2022 Fell et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: This study was conducted on data from the Camelyon 16 and 17 grand challenges. This data is available on the grand challenges website and is made available under CC0.

Funding: This work is supported by the Industrial Centre for AI Research in digital Diagnostics (iCAIRD) which is funded by Innovate UK on behalf of UK Research and Innovation (UKRI) [project number: 104690], and in part by Chief Scientist Office, Scotland. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.