Improving Cervical Cancer Classification Using ADASYN and Random Forest with GridSearchCV Optimization

  • Resha Mahardhika Saputra, Universitas Dian Nuswantoro
  • Farrikh Alzami, Universitas Dian Nuswantoro
  • Yuventius Tyas Catur Pramudi, Universitas Dian Nuswantoro
  • Lalang Erawan, Universitas Dian Nuswantoro
  • Rama Aria Megantara, Universitas Dian Nuswantoro
  • Ricardus Anggi Pramunendar, Universitas Dian Nuswantoro
  • Moh. Yusuf, Universitas Islam Sultan Agung
Keywords: cervical cancer, class imbalance, random forest, ADASYN, GridSearchCV, feature importance

Abstract

Cervical cancer is a leading cause of death among women, with over 300,000 deaths recorded in 2020. This study aims to improve the accuracy of cervical cancer diagnosis classification by combining Adaptive Synthetic Sampling (ADASYN) with the Random Forest algorithm. The data were obtained from the Cervical Cancer dataset in the UCI Machine Learning Repository, which is imbalanced: 95% of instances belong to the negative class and 5% to the positive class. ADASYN was chosen for its ability to handle imbalanced data by focusing on minority-class data points that are difficult to classify. The Random Forest hyperparameters were tuned with GridSearchCV to maximize performance. Results show that this combination improved accuracy from 96.5% to 96.8% and recall from 93.7% to 94.3%. Feature importance analysis identified key risk factors, such as number of pregnancies, age at first sexual intercourse, and hormonal contraceptive use, that strongly influence the predicted diagnosis. This research demonstrates the effectiveness of combining ADASYN and Random Forest in enhancing classification performance for early cervical cancer detection.
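
The way ADASYN concentrates on hard-to-classify minority points can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, which the abstract does not describe. The function name `adasyn_sketch`, the parameters `k` and `beta`, and the toy two-feature data are assumptions made for illustration; the data is not the UCI dataset. In practice a library implementation (for example the `ADASYN` class in imbalanced-learn) would normally be used instead.

```python
import math
import random


def adasyn_sketch(X, y, minority=1, k=3, beta=1.0, seed=0):
    """Generate synthetic minority samples, ADASYN-style (illustrative only)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_count = len(y) - len(minority_idx)
    # Number of synthetic points needed to reach the requested balance level.
    total_new = int((majority_count - len(minority_idx)) * beta)

    def nearest(i, candidates):
        # Indices of the k candidates closest to sample i, excluding i itself.
        others = [j for j in candidates if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Difficulty of each minority point: the share of majority samples among
    # its k nearest neighbours in the whole dataset.
    ratios = [
        sum(y[j] != minority for j in nearest(i, range(len(y)))) / k
        for i in minority_idx
    ]
    ratio_sum = sum(ratios)
    if ratio_sum == 0 or total_new <= 0:
        return [], ratios  # nothing to generate

    synthetic = []
    for i, r in zip(minority_idx, ratios):
        # Harder points receive a larger share of the synthetic samples.
        # Rounding means the total can differ slightly from total_new.
        n_new = round(r / ratio_sum * total_new)
        neighbours = nearest(i, minority_idx)
        if not neighbours:
            continue
        for _ in range(n_new):
            j = rng.choice(neighbours)
            gap = rng.random()
            # Interpolate between the point and one of its minority neighbours.
            synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return synthetic, ratios


# Toy data: 8 majority points, 1 minority point surrounded by majority
# points, and 4 minority points in a well-separated cluster.
X = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [2.0, 0.0], [2.0, 1.0], [0.0, 2.0], [1.0, 2.0],
    [0.5, 0.5],
    [9.0, 9.0], [9.0, 10.0], [10.0, 9.0], [10.0, 10.0],
]
y = [0] * 8 + [1] * 5

synthetic, ratios = adasyn_sketch(X, y)
print("difficulty ratios:", ratios)
print("synthetic samples:", len(synthetic))
```

In this toy setup the isolated minority cluster has a difficulty ratio of zero, so every synthetic sample is generated from the single minority point that sits among majority points. The oversampled data would then be passed to the classifier for training.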

Published
2025-01-04