Synthetic Data in Health Care: A Narrative Review

Aldren Gonzales, Guruprabha Guruswamy, Scott R. Smith

Data are central to research, public health, and the development of health information technology (IT) systems. Nevertheless, access to most health care data is tightly controlled, which may limit innovation, development, and the efficient implementation of new research, products, services, or systems. Using synthetic data is one of many innovative ways for organizations to share datasets with a broader set of users.

Data play a significant role in advancing health care delivery, public health, research, and innovations that address barriers and improve the quality of care. Timely access to real-world data allows researchers and innovators to inform the development of new treatments, promote evidence-based policymaking, advance program evaluation, and transform outbreak responses. However, users continue to face a range of challenges in accessing original data.


We conducted a narrative review of the existing literature using PubMed and Scopus. We used the narrative review method because it enables a thematic analysis of the different use cases. The review was initially limited to peer-reviewed articles, which were identified through an abstract/title search with the following terms: synthetic AND data OR dataset AND healthcare OR health care.
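The search terms as printed do not show how AND and OR are grouped, and databases can resolve unparenthesized Boolean strings differently. The sketch below shows one plausible reading, with explicit parentheses so that the same query can be reused across databases. The grouping, the `build_query` helper, and the choice of field tag are assumptions made for illustration, not the authors' documented search strategy; `[Title/Abstract]` is the PubMed field tag for a title/abstract search.

```python
def build_query(field_tag: str = "") -> str:
    """Join the review's search terms with explicit Boolean grouping.

    Assumed grouping: terms inside a group are ORed, groups are ANDed.
    """
    groups = [
        ["synthetic"],
        ["data", "dataset"],
        ["healthcare", '"health care"'],  # quoted so the phrase is searched as a unit
    ]
    clauses = []
    for group in groups:
        joined = " OR ".join(f"{term}{field_tag}" for term in group)
        # Parenthesize only multi-term groups to keep the query readable.
        clauses.append(f"({joined})" if len(group) > 1 else joined)
    return " AND ".join(clauses)


# Example: a PubMed-style query restricted to titles and abstracts.
print(build_query("[Title/Abstract]"))
```

Recording the fully parenthesized string alongside the search date makes a literature search easier to reproduce.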


The use cases presented in this paper highlight the utility and value of synthetic data in health care. Based on the literature reviewed, synthetic data can address three challenges in health care data. The first is protecting the privacy of individuals and ensuring the confidentiality of records. Because synthetic data can consist entirely of “fake” data or mix “fake” data with real data, records are harder to re-identify.
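The distinction between data that are purely "fake" and data that mix real and "fake" values can be illustrated with a toy example. The sketch below is a deliberately naive illustration, not a method described in the review: the records, field names, and the `fully_synthetic` and `partially_synthetic` helpers are all hypothetical, and independent resampling of columns is only one simple way to generate such data.

```python
import random

# Hypothetical toy records; not taken from the review or from any real dataset.
original = [
    {"age": 34, "zip3": "021", "diagnosis": "asthma"},
    {"age": 61, "zip3": "100", "diagnosis": "diabetes"},
    {"age": 47, "zip3": "606", "diagnosis": "hypertension"},
    {"age": 29, "zip3": "941", "diagnosis": "asthma"},
]


def fully_synthetic(records, n, rng):
    """Build n new records, drawing every field independently from its column."""
    columns = {key: [r[key] for r in records] for key in records[0]}
    return [
        {key: rng.choice(values) for key, values in columns.items()}
        for _ in range(n)
    ]


def partially_synthetic(records, fields, rng):
    """Keep each record, but replace only the listed fields with resampled values."""
    columns = {key: [r[key] for r in records] for key in fields}
    return [
        {**r, **{key: rng.choice(columns[key]) for key in fields}}
        for r in records
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
full = fully_synthetic(original, n=4, rng=rng)
partial = partially_synthetic(original, fields=["age", "zip3"], rng=rng)
```

Resampling each column independently breaks the link between a person and their combination of attributes, but it also discards correlations between fields, and in a table this small a generated row can coincide with a real one by chance. Simple resampling of this kind should therefore not be read as a privacy guarantee.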

Citation: Gonzales A, Guruswamy G, Smith SR (2023) Synthetic data in health care: A narrative review. PLOS Digit Health 2(1): e0000082.

Editor: Alistair Johnson, SickKids: The Hospital for Sick Children, CANADA

Received: June 29, 2022; Accepted: December 6, 2022; Published: January 6, 2023.

Copyright: This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: All data are in the manuscript.

Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.