Application of Data Mining to Lung Cancer Prediction Analysis Using the Random Forest Algorithm

  • Laura Sari, Politeknik Negeri Cilacap
  • Annisa Romadloni, Politeknik Negeri Cilacap
  • Rostika Listyaningrum, Politeknik Negeri Cilacap
Keywords: data mining, random forest, prediction, naïve bayes

Abstract

Cancer is the second leading cause of death worldwide, and in Indonesia it is a disease with a high mortality rate. Most patients do not realize that they have lung cancer, so treatment sometimes begins too late. A highly accurate prediction method is therefore needed to detect lung cancer earlier. Previous research applied a data mining classification method, the Naïve Bayes algorithm, to predict lung cancer; it achieved high recall for the positive class (Yes) but low recall for the negative class (No). This study instead uses the Random Forest algorithm, which is known for its strong performance, and optimizes the modeling with k-fold cross-validation. Random Forest achieves a higher accuracy than Naïve Bayes, at 98.4%. It yields a recall of 100% for the positive class and 80% for the negative class, and an AUC of 1, meaning its predicted probabilities rank every positive case above every negative case. However, a statistical test at the 5% significance level shows that the difference between the two algorithms is not statistically significant.
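The abstract reports accuracy, per-class recall, and an AUC of 1 obtained under k-fold cross-validation. The sketch below, in standard-library Python, shows how these quantities are commonly computed and why an AUC of 1 can coexist with accuracy below 100%: AUC measures how well predicted probabilities rank positives above negatives, whereas accuracy and recall depend on a chosen decision threshold. It is not the authors' implementation. The toy labels and scores are hypothetical (not the study's data), the 0.5 threshold is an assumption, and the function names are illustrative.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """Recall of each class c = (# actual c predicted as c) / (# actual c)."""
    result = {}
    for c in sorted(set(y_true)):
        preds_for_c = [p for t, p in zip(y_true, y_pred) if t == c]
        result[c] = sum(p == c for p in preds_for_c) / len(preds_for_c)
    return result


def roc_auc(y_true, scores, positive):
    """ROC AUC via its rank interpretation: the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as 1/2), i.e. the normalized Mann-Whitney U statistic."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_kfold_indices(labels, k):
    """Yield (train_idx, test_idx) for k folds, dealing each class's samples
    round-robin so every fold keeps roughly the same class proportions.
    No shuffling is done here; real pipelines usually shuffle first."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    fold_of = {}
    for indices in by_class.values():
        for pos, i in enumerate(indices):
            fold_of[i] = pos % k
    for f in range(k):
        test = [i for i in range(len(labels)) if fold_of[i] == f]
        train = [i for i in range(len(labels)) if fold_of[i] != f]
        yield train, test


if __name__ == "__main__":
    # Hypothetical toy predictions, not data from the study.
    y_true = ["Yes", "Yes", "Yes", "No", "No"]
    p_yes = [0.95, 0.90, 0.80, 0.60, 0.10]  # predicted probability of "Yes"
    y_pred = ["Yes" if p >= 0.5 else "No" for p in p_yes]  # assumed 0.5 threshold

    print(accuracy(y_true, y_pred))          # 0.8
    print(recall_per_class(y_true, y_pred))  # {'No': 0.5, 'Yes': 1.0}
    print(roc_auc(y_true, p_yes, "Yes"))     # 1.0
```

With these hypothetical scores every Yes case is ranked above every No case (AUC = 1.0), yet the 0.5 threshold misclassifies the No case scored 0.60, so accuracy is 0.8 and No-class recall is 0.5. In k-fold cross-validation, a model is trained on each `train` index set and evaluated on the matching `test` set, and the k scores are averaged; libraries such as scikit-learn provide equivalent, shuffled splitters.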

References

J. Braithwaite, “What Is Cancer?,” The Lancet, vol. 131, no. 3383, pp. 1287–1289, 1888, doi: 10.1016/S0140-6736(02)16666-9.

H. Sung et al., “Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries,” CA Cancer J Clin, vol. 71, no. 3, pp. 209–249, 2021, doi: 10.3322/caac.21660.

“The Global Cancer Observatory,” 2020. https://gco.iarc.fr/today/data/factsheets/cancers/15-Lung-fact-sheet.pdf (accessed Jan. 13, 2023).

S. Sugiharto, R. A. Putri, S. Simanjuntak, and O. Larissa, “Kanker Paru, Faktor Resiko Dan Pencegahannya,” in Seminar Nasional Hasil Penelitian dan Pengabdian Kepada Masyarakat (SENAPENMAS), 2021.

S. R. Rahmadania, “Fakta-fakta Hari Kanker Sedunia 2022, Dirayakan Setiap Tanggal 4 Februari,” detikHealth, 2022. https://health.detik.com/berita-detikhealth/d-5925795/fakta-fakta-hari-kanker-sedunia-2022-dirayakan-setiap-tanggal-4-februari (accessed Jan. 13, 2023).

I. W. Gamadarenda and I. Waspada, “Implementasi Data Mining Untuk Deteksi Penyakit Ginjal Kronis (PGK) Menggunakan K-Nearest Neighbor (KNN) Dengan Backward Elimination,” Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK), vol. 7, no. 2, pp. 417–426, 2020, doi: 10.25126/jtiik.202071896.

I. Amal, “Klasifikasi Menggunakan Naive Bayes, Decision Tree, dan Random Forest,” 2021. https://rstudio-pubs-static.s3.amazonaws.com/717459_5136236cf5064b8d973e4d8c1b863943.html#5_Cross_validation (accessed Jan. 13, 2023).

A. I. Kusumarini, P. A. Hogantara, M. Fadhlurohman, and N. Chamidah, “Perbandingan Algoritma Random Forest, Naive Bayes, Dan Decision Tree Dengan Oversampling Untuk Klasifikasi Bakteri E.Coli,” Prosiding Seminar Nasional Mahasiswa Bidang Ilmu Komputer dan Aplikasinya, vol. 2, no. 1, pp. 792–799, 2021.

B. Bawono and R. Wasono, “Perbandingan Metode Random Forest dan Naive Bayes,” Jurnal Sains dan Sistem Informasi, vol. 3, no. 7, pp. 343–348, 2019, [Online]. Available: http://prosiding.unimus.ac.id

G. M. Momole and E. Mailoa, “Perbandingan Naïve Bayes Dan Random Forest Dalam Klasifikasi Bahasa Daerah,” vol. 9, no. 2, pp. 855–863, 2022.

R. Leonardo, J. Pratama, and Chrisnatalis, “Perbandingan Metode Random Forest Dan Naïve Bayes Dalam Prediksi Keberhasilan Klien Telemarketing,” vol. 3, pp. 455–459, 2020.

S. Amaliah and M. Nusrang, “Penerapan Metode Random Forest Untuk Klasifikasi Varian Minuman Kopi Di Kedai Kopi Konijiwa Bantaeng,” Variansi: Journal of Statistic and Its Application on Teaching and Research, vol. 4, no. 2, pp. 121–127, 2022, doi: 10.35580/variansiunm31.

D. H. Depari et al., “Perbandingan Model Decision Tree, Naive Bayes dan Random Forest untuk Prediksi Klasifikasi Penyakit Jantung,” vol. 4221, pp. 239–248, 2022.

P. Sejati et al., “Studi Komparasi Naive Bayes, K-Nearest Neighbor, Dan Random Forest Untuk Prediksi Calon Mahasiswa Yang Diterima Atau Comparative Study Of Naive Bayes, K-Nearest Neighbor, And Random Forest For The Prediction Of Prospective Students,” Jurnal Teknologi Informasi dan Ilmu Komputer (JTIIK), vol. 9, no. 7, pp. 1341–1348, 2022, doi: 10.25126/jtiik.202296737.

Ramadani and B. H. Hayadi, “Perbandingan Metode Naive Bayes Dan Random Forest Untuk Menentukan Prestasi Belajar Siswa Pada Jurusan RPL (Studi Kasus SMK Swasta Siti Banun Sigambal),” Journal Computer Science and Information Technology (JCoInT) Program Studi Teknologi Informasi, no. 2, 2022, [Online]. Available: http://jurnal.ulb.ac.id/index.php/JCoInT/index

D. Dablain, B. Krawczyk, and N. V. Chawla, “DeepSMOTE: Fusing Deep Learning and SMOTE for Imbalanced Data,” IEEE Trans Neural Netw Learn Syst, pp. 1–14, 2022, doi: 10.1109/TNNLS.2021.3136503.

V. Nugraha, “Menghadapi Imblanced Target Variable dengan SMOTE,” RPubs, 2021. https://rpubs.com/VicNP/UBL-SmoteClassif (accessed Jan. 16, 2023).

V. Rupapara, F. Rustam, H. F. Shahzad, A. Mehmood, I. Ashraf, and G. S. Choi, “Impact of SMOTE on Imbalanced Text Features for Toxic Comments Classification Using RVVC Model,” IEEE Access, vol. 9, pp. 78621–78634, 2021, doi: 10.1109/ACCESS.2021.3083638.

D. Mualfah, W. Fadila, and R. Firdaus, “Teknik SMOTE untuk Mengatasi Imbalance Data pada Deteksi Penyakit Stroke Menggunakan Algoritma Random Forest,” Jurnal CoSciTech (Computer Science and Information Technology), vol. 3, no. 2, pp. 107–113, 2022, doi: 10.37859/coscitech.v3i2.3912.

A. N. Kasanah, M. Muladi, and U. Pujianto, “Penerapan Teknik SMOTE untuk Mengatasi Imbalance Class dalam Klasifikasi Objektivitas Berita Online Menggunakan Algoritma KNN,” Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi), vol. 3, no. 2, pp. 196–201, 2019, doi: 10.29207/resti.v3i2.945.

A. Ishaq et al., “Improving the Prediction of Heart Failure Patients’ Survival Using SMOTE and Effective Data Mining Techniques,” IEEE Access, vol. 9, pp. 39707–39716, 2021, doi: 10.1109/ACCESS.2021.3064084.

S. U. Putri, E. Irawan, and F. Rizky, “Implementasi Data Mining Untuk Prediksi Penyakit Diabetes,” KESATRIA: Jurnal Penerapan Sistem Informasi (Komputer & Manajemen), vol. 2, no. 1, pp. 39–46.

D. Alita and A. Rahman, “Pendeteksian Sarkasme pada Proses Analisis Sentimen Menggunakan Random Forest Classifier,” Jurnal Komputasi, vol. 8, no. 2, pp. 50–58, 2020, doi: 10.23960/komputasi.v8i2.2615.

Y. Yuliani, “Algoritma Random Forest Untuk Prediksi Kelangsungan Hidup Pasien Gagal Jantung Menggunakan Seleksi Fitur Bestfirst,” vol. 5, no. 2, pp. 298–306, 2022.

K. Pal and B. V. Patel, “Data Classification with k-fold Cross Validation and Holdout Accuracy Estimation Methods with 5 Different Machine Learning Techniques,” in 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020, pp. 83–87. doi: 10.1109/ICCMC48092.2020.ICCMC-00016.

K. Phinzi, D. Abriha, and S. Szabó, “Classification efficacy using k-fold cross-validation and bootstrapping resampling techniques on the example of mapping complex gully systems,” Remote Sens (Basel), vol. 13, no. 15, 2021, doi: 10.3390/rs13152980.

M. Madanan, A. Venugopal, and N. C. Velayudhan, “Applying an optimal feature ranking and selection algorithm and random forest classifier algorithm along with k-fold cross validation for classification of blood cancer cells,” European Journal of Molecular and Clinical Medicine, vol. 7, no. 11, pp. 774–789, 2020, [Online]. Available: https://www.embase.com/search/results?subaction=viewrecord&id=L2010514747&from=export


Published
2023-01-29