Application of Data Mining to Lung Cancer Prediction Analysis Using the Random Forest Algorithm
Abstract
Cancer is the second leading cause of death worldwide, and in Indonesia it carries a high mortality rate. Most patients do not realize that they have lung cancer, so treatment often begins too late. A prediction method with a high degree of accuracy is therefore needed to detect lung cancer earlier. Previous research used a data mining classification method with the Naïve Bayes algorithm to predict lung cancer; it achieved high recall for the positive class (Yes class) but low recall for the negative class (No class). This study uses the Random Forest algorithm, which is known to perform well, and optimizes the modeling by applying the K-fold Cross Validation technique. The Random Forest algorithm produces a higher accuracy than the Naïve Bayes algorithm, at 98.4%. It achieves 100% recall for the positive class and 80% recall for the negative class, and its AUC value of 1 indicates that the predicted scores separate the two classes perfectly. However, a statistical test at the 5% significance level shows that the results of the two algorithms are not significantly different.
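The reported metrics can look contradictory at first, since an AUC of 1 appears alongside a negative-class recall of 80%. The sketch below shows how this can happen. It is a minimal illustration on hypothetical toy data, not the study's dataset or code: the labels, scores, and the 0.5 decision threshold are all assumptions chosen for the example. Recall depends on the threshold, while AUC depends only on how the scores rank positive cases against negative ones.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of samples whose true class is `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(1 for p in relevant if p == label) / len(relevant)


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = "Yes" (positive class), 0 = "No" (negative class).
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
# Assumed predicted probabilities of the positive class.
scores = [0.05, 0.10, 0.20, 0.30, 0.55, 0.60, 0.70, 0.80, 0.90, 0.95]

# Assumed decision threshold of 0.5; one negative (score 0.55) crosses it.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy:       ", accuracy(y_true, y_pred))
print("recall (Yes):   ", recall(y_true, y_pred, 1))
print("recall (No):    ", recall(y_true, y_pred, 0))
print("AUC:            ", auc(y_true, scores))
```

In this toy case every positive score is higher than every negative score, so the AUC is 1, yet one negative sample falls above the threshold and lowers the negative-class recall. In a K-fold setting, the same metrics would be computed on each held-out fold and then averaged.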
Copyright (c) 2023 Laura Sari, Annisa Romadloni, Rostika Listiyaningrum
This work is licensed under a Creative Commons Attribution 4.0 International License.