Enhancing Medical Diagnostics with Ensemble Machine Learning: A Comparative Study of Gradient Boosting, XGBoost, LightGBM, and Blended Models
(1) Universitas Katolik Indonesia Atma Jaya, Indonesia
(*) Corresponding Author
Abstract
This research compares the performance of several machine learning models for medical diagnostics: Gradient Boosting, AdaBoost, Support Vector Machine (SVM), Logistic Regression, XGBoost, LightGBM, and a Blended Model. The objective is to identify the most accurate and reliable model for predicting outcomes, particularly where correctly identifying positive instances is critical. Each model is evaluated systematically using cross-validation accuracy and test accuracy. Results indicate that ensemble methods such as Gradient Boosting, XGBoost, and LightGBM generally outperform simpler models. Among the individual models, LightGBM achieved the highest cross-validation accuracy at 89.10%, while the Blended Model, which combines multiple classifiers, reached a higher cross-validation accuracy of 90.19%. However, all models struggled to balance precision and recall for the positive class, which suggests the need for further optimization. The study concludes that advanced ensemble methods show promise, but that improving sensitivity to positive cases is crucial for their applicability in medical diagnostics. Future research should focus on refining these models to achieve a better balance between precision and recall so that critical cases are not overlooked.
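The precision and recall trade-off described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the Blended Model combines its classifiers, so the simple probability averaging below is an assumption, not the study's method. The labels, the three probability lists, and the function names `blend_probabilities` and `precision_recall` are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def blend_probabilities(model_probs):
    """Average the positive-class probabilities of several models, sample by sample.

    Assumption: plain averaging stands in for whatever blending the study used.
    """
    return [mean(sample) for sample in zip(*model_probs)]


def precision_recall(y_true, y_prob, threshold=0.5):
    """Compute precision and recall for the positive class at a given threshold."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 4 positive and 4 negative cases, scored by 3 models.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs_a = [0.9, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1, 0.3]
probs_b = [0.8, 0.6, 0.5, 0.4, 0.3, 0.4, 0.2, 0.1]
probs_c = [0.9, 0.8, 0.4, 0.5, 0.4, 0.2, 0.1, 0.2]

blended = blend_probabilities([probs_a, probs_b, probs_c])

# Default threshold: no false positives, but half of the positive cases are missed.
default_precision, default_recall = precision_recall(y_true, blended, threshold=0.5)

# Lower threshold: every positive case is caught at the cost of one false positive.
lowered_precision, lowered_recall = precision_recall(y_true, blended, threshold=0.35)

print(f"threshold 0.50: precision={default_precision:.2f}, recall={default_recall:.2f}")
print(f"threshold 0.35: precision={lowered_precision:.2f}, recall={lowered_recall:.2f}")
```

On this toy data, lowering the decision threshold raises recall for the positive class while reducing precision. This is one common way to trade the two metrics against each other when missed positive cases are costly; whether it suits a given diagnostic setting depends on the relative cost of false positives and false negatives.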
DOI: http://dx.doi.org/10.30645/jurasik.v9i2.842
DOI (PDF): http://dx.doi.org/10.30645/jurasik.v9i2.842.g817
JURASIK (Jurnal Riset Sistem Informasi dan Teknik Informatika)