Comparison of Random Forest and K-Nearest Neighbors for Body Mass Index Classification Using SMOTE-ENN to Address Data Imbalance in Health Analysis

  • Naufal Yogi Aptana, Universitas Amikom Purwokerto
  • Ali Nur Ikhsan
  • Wiga Maulana Baihaqi, Informatics Study Program, Universitas Amikom Purwokerto
  • Chyntia Raras Ajeng Widiawati, Informatics Study Program, Universitas Amikom Purwokerto
Keywords: body mass index, random forest, k-nearest neighbors, SMOTE-ENN

Abstract

This study compares the Random Forest and K-Nearest Neighbors (KNN) algorithms for Body Mass Index (BMI) classification, using the SMOTE-ENN method to address data imbalance. The dataset consists of 2111 entries containing demographic and health attributes of individuals. Data imbalance is a significant challenge because it may degrade the accuracy of machine learning models. The SMOTE-ENN combination was used to improve the class distribution so that the models could better recognize patterns in minority classes. Both algorithms were evaluated on accuracy, precision, recall, and F1-score. After SMOTE-ENN was applied, Random Forest achieved 100% accuracy, outperforming KNN at 96%. These findings highlight the contribution of SMOTE-ENN to handling imbalanced data and improving classification model quality, with implications for machine learning applications in healthcare.
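The abstract names SMOTE-ENN without describing its two stages. Below is a minimal pure-Python sketch, assuming numeric feature vectors (for example, encoded demographic and health attributes) and Euclidean distance. SMOTE creates synthetic minority samples by interpolating between a sample and one of its same-class nearest neighbours until every class matches the largest class; Edited Nearest Neighbours (ENN) then removes any sample whose k nearest neighbours mostly carry a different label. The function names, the neighbour counts (`k_smote=5`, `k_enn=3`) and the majority-vote cleaning rule are illustrative assumptions, not the settings reported by the authors.

```python
"""Minimal SMOTE-ENN sketch (hypothetical helpers, not the study's code)."""
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_indices(X, i, k, candidates):
    """Indices of the k points in `candidates` closest to X[i], excluding i."""
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: euclidean(X[i], X[j]))
    return others[:k]


def smote(X, y, k=5, seed=0):
    """Oversample each smaller class up to the size of the largest class.

    Every synthetic point lies on the segment between a minority sample and
    one of its k nearest neighbours from the same class.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(x) for x in X], list(y)
    for label, n in counts.items():
        members = [i for i, lab in enumerate(y) if lab == label]
        if n >= target or len(members) < 2:
            continue  # already balanced, or no neighbour to interpolate with
        for _ in range(target - n):
            i = rng.choice(members)
            j = rng.choice(nearest_indices(X, i, k, members))
            gap = rng.random()  # uniform in [0, 1)
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def enn(X, y, k=3):
    """Edited Nearest Neighbours: drop samples outvoted by their k neighbours.

    On a tie, the tied label seen first in distance order wins.
    """
    everyone = range(len(X))
    keep = []
    for i in everyone:
        neigh = nearest_indices(X, i, k, everyone)
        if not neigh:  # a lone sample has no neighbours to vote against it
            keep.append(i)
            continue
        majority = Counter(y[j] for j in neigh).most_common(1)[0][0]
        if majority == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, k_smote=5, k_enn=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning of the combined data."""
    X_res, y_res = smote(X, y, k=k_smote, seed=seed)
    return enn(X_res, y_res, k=k_enn)
```

Example usage: `X_res, y_res = smote_enn(X, y)` returns resampled features and labels that would then be passed to the Random Forest and KNN classifiers. The brute-force neighbour search is O(n²) and intended only for illustration; the imbalanced-learn library offers `SMOTEENN` in `imblearn.combine` for real datasets, and the abstract does not state which implementation or parameters the study used. As a general practice, resampling is applied to the training split only, so that evaluation data keep their original distribution.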

Published: 2025-01-04