A practical approach for colorectal cancer diagnosis based on machine learning
Nguyen Hai Minh, Tran Quang Quy, Ngo Duc Tam, Tran Manh Tuan, Le Hoang Son
Abstract
In this paper, we present the results of applying machine learning models to build a Colorectal Cancer Diagnosis system. The methodology encompasses six key steps: collecting raw data from Electronic Medical Records (EMRs), revising feature attributes with expert input, data preprocessing, model adaptation, training machine learning models (CART, Random Forest, and XGBOOST), and evaluating the results.
Introduction
Colorectal cancer, which has the third-highest diagnosis rate, is the second most dangerous cancer in Viet Nam. It constitutes a substantial public health challenge, particularly among men. Originating from malignant cells within the rectum, a segment of the large intestine, colorectal cancer progresses through distinct stages, often asymptomatically during its early phases.
Materials
Electronic Medical Record (EMR) data are a crucial component of managing patient information and providing efficient healthcare in a hospital. Using EMR data in machine learning models to support physicians in diagnosis is therefore an important and meaningful problem. This study does not involve any human or animal participation.
Methods
The methods build on the data collected and preprocessed in Section 2 and on the analysis in Section 1 of previous studies of machine learning models, including CART, Random Forest, and XGBOOST.
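To make the tree-based models named above more concrete, the following is a minimal sketch of the split criterion commonly used by CART for classification: the Gini impurity, and a search for the threshold on one numeric feature that minimizes the weighted impurity of the two child nodes. It is an illustration only, not the authors' implementation. The function names `gini` and `best_split` and the toy `ages` and `labels` values are hypothetical and do not come from the study's dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Find the best binary split on a single numeric feature.

    Returns (threshold, weighted_gini). The threshold is None when no
    candidate split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical feature values
        threshold = (lo + hi) / 2  # midpoint between neighbouring values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        # Impurity of the two children, weighted by their sizes.
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical toy data: one numeric feature and a binary label.
ages = [34, 41, 47, 52, 58, 63, 69, 74]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
threshold, score = best_split(ages, labels)
print(threshold, score)
```

A full CART implementation applies this search recursively over all features. Random Forest and gradient-boosted ensembles such as XGBOOST build many trees on top of the same kind of split search, although they differ in how the trees are grown and combined.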
Results and Discussion
For data processing, analysis, and visualization, the following libraries were used: PANDAS, SKLEARN, XLSXWRITER, MATH, MATPLOTLIB, and PYVI. The experiments were run on an Asus laptop with an Intel Core i5-10300H processor, 8GB RAM, and the Ubuntu 20.04 operating system.
Conclusions
In this paper, a real dataset from a hospital was collected and preprocessed. In addition, a novel method is proposed: a model that combines unified academic algorithms to support colorectal cancer prediction.
Citation: Hai Minh N, Quy TQ, Tam ND, Tuan TM, Son LH (2025) A practical approach for colorectal cancer diagnosis based on machine learning. PLoS One 20(4): e0321009. https://doi.org/10.1371/journal.pone.0321009
Editor: Jie Zhang, Newcastle University, UNITED KINGDOM OF GREAT BRITAIN AND NORTHERN IRELAND
Received: October 17, 2024; Accepted: February 27, 2025; Published: April 29, 2025
Copyright: © 2025 Hai Minh et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: This research was sponsored by the Ministry of Education and Training project, “Research on Application of Machine Learning Model in Analysis Electronic Medical Records Gastrointestinal Disease”, B2022-TNA-24.
Competing interests: The authors have declared that no competing interests exist.