Optimization of the K-Nearest Neighbors Algorithm Using GridSearchCV for Diabetes Classification

  • Ainul Yaqin Universitas Dian Nuswantoro
  • Defri Kurniawan Universitas Dian Nuswantoro
  • Junta Zeniarja Universitas Dian Nuswantoro
Keywords: diabetes, k-nearest neighbors, random over-sampling, gridsearchcv optimization

Abstract

Diabetes is a chronic disease with a significant impact on global health, and its prevalence increases every year. Early detection is therefore crucial to prevent further complications and save lives. Machine learning offers a way to improve the accuracy of predicting the disease. This research develops a diabetes prediction model using the K-Nearest Neighbors (KNN) algorithm on the Pima Indians Diabetes Database. Because the dataset is class-imbalanced, Random Over-Sampling was applied to balance the class distribution. The KNN model optimized with GridSearchCV achieved 88% accuracy, 89% precision, 75% recall, and an 82% F1-score. This approach is expected to yield a more accurate and efficient model to support early detection of diabetes, and it shows the potential of machine learning to improve the effectiveness of disease diagnosis and control.
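
The workflow summarized in the abstract (balance the classes by random over-sampling, then pick KNN hyperparameters by an exhaustive, cross-validated search) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code and it does not call scikit-learn's GridSearchCV; it only mimics the idea on synthetic two-feature data. The function names, the candidate values of k, the fold count, and the choice to over-sample only the training part of each fold are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_over_sample(X, y, rng):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    target = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds, seed):
    """Cross-validated accuracy for one value of k."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        X_tr = [X[i] for i in idx if i not in held_out]
        y_tr = [y[i] for i in idx if i not in held_out]
        # Assumption: over-sample the training part only, so duplicated rows
        # never appear in the held-out fold.
        X_tr, y_tr = random_over_sample(X_tr, y_tr, rng)
        correct += sum(knn_predict(X_tr, y_tr, X[i], k) == y[i] for i in fold)
    return correct / len(X)


def grid_search_k(X, y, k_values, n_folds=5, seed=0):
    """Score every candidate k and return the best one with all scores."""
    scores = {k: cv_accuracy(X, y, k, n_folds, seed) for k in k_values}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Synthetic, imbalanced toy data (60 negatives, 30 positives); not the Pima dataset.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(60)]
X += [[data_rng.gauss(2.5, 1), data_rng.gauss(2.5, 1)] for _ in range(30)]
y = [0] * 60 + [1] * 30

X_bal, y_bal = random_over_sample(X, y, random.Random(0))
print("class counts after over-sampling:", Counter(y_bal))

best_k, scores = grid_search_k(X, y, k_values=[1, 3, 5, 7, 9])
print("best k:", best_k)
print("cross-validated accuracy per k:", scores)
```

In practice the same steps would more likely be built from library components (for example a scaler, an over-sampler, and a KNN classifier wrapped in a cross-validated grid search), with precision, recall, and F1-score reported on a held-out test set alongside accuracy.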

Published
2025-01-04