Diabetes Prediction Using the Ensemble Learning Method with the Soft Voting Technique
Abstract
Diabetes is a chronic disease characterized by high blood glucose levels caused by the body's inability to produce or use insulin effectively. It is a serious global health problem with a significant impact, so early detection is very important. Machine learning offers a new and effective approach to this challenge. This study aims to predict diabetes more accurately using an Ensemble Learning method with Soft Voting. SMOTE is applied to balance the data set and address class imbalance. The study also compares three machine learning classifiers: LightGBM, XGBoost, and Random Forest. In testing, Random Forest achieved the highest accuracy among the single models at 97.20%, while the Soft Voting ensemble that combines the three algorithms raised accuracy to 97.74%, an improvement of 0.54 percentage points over the best single model. The ensemble approach therefore performed better than any single model in this study.
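As a rough illustration of the soft voting idea named in the abstract, the sketch below averages the class probabilities produced by three classifiers and picks the class with the highest mean. It is a minimal sketch, not the study's implementation: the function names (`average_probabilities`, `soft_vote`) are hypothetical, and the probability values are invented for illustration rather than taken from the study's LightGBM, XGBoost, or Random Forest models.

```python
from typing import List, Optional

# Shape: model_probas[model][sample][class] -> probability
Probas = List[List[List[float]]]


def average_probabilities(model_probas: Probas,
                          weights: Optional[List[float]] = None) -> List[List[float]]:
    """Return the (optionally weighted) mean class probabilities per sample."""
    if weights is None:
        weights = [1.0] * len(model_probas)  # equal say for every model
    total_weight = sum(weights)
    n_samples = len(model_probas[0])
    n_classes = len(model_probas[0][0])
    averaged = []
    for i in range(n_samples):
        row = []
        for c in range(n_classes):
            weighted_sum = sum(w * probas[i][c]
                               for w, probas in zip(weights, model_probas))
            row.append(weighted_sum / total_weight)
        averaged.append(row)
    return averaged


def soft_vote(model_probas: Probas,
              weights: Optional[List[float]] = None) -> List[int]:
    """Soft voting: predict the class with the highest averaged probability."""
    averaged = average_probabilities(model_probas, weights)
    return [max(range(len(row)), key=row.__getitem__) for row in averaged]


# Invented probabilities for two patients, as [P(no diabetes), P(diabetes)].
# The three lists stand in for three fitted classifiers (assumption).
model_probas = [
    [[0.90, 0.10], [0.30, 0.70]],  # hypothetical model A
    [[0.45, 0.55], [0.20, 0.80]],  # hypothetical model B
    [[0.45, 0.55], [0.55, 0.45]],  # hypothetical model C
]

averaged = average_probabilities(model_probas)
predictions = soft_vote(model_probas)
print(predictions)  # -> [0, 1]
```

For the first patient, two of the three hypothetical models lean slightly toward class 1, so a majority (hard) vote would return 1, but the one confident model pulls the averaged probability to 0.6 for class 0. That sensitivity to confidence is what distinguishes soft voting from hard voting. In practice, libraries such as scikit-learn offer this through `VotingClassifier(voting="soft")`; the abstract does not state which implementation the study used.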
Copyright (c) 2025 Hilmi Hanif, Danang Wahyu Utomo
This work is licensed under a Creative Commons Attribution 4.0 International License.