EARLY PREDICTION OF CERVICAL CANCER USING MACHINE LEARNING TECHNIQUES

(Received: 28-Aug.-2022, Revised: 16-Oct.-2022, Accepted: 30-Oct.-2022)

Authors Mohammad Subhi Al-Batah, Mazen Alzyoud, Raed Alazaidah, Malek Toubat, Haneen Alzoubi, Areej Olaiyat

Keywords #Cervical cancer #Classification #Feature selection #Machine learning #Medical diagnosis

Abstract According to recent studies and statistics, Cervical Cancer (CC) is one of the most common causes of death worldwide, mainly in developing countries. CC has a mortality rate of around 60% in poor developing countries, and the percentage could go even higher due to poor screening processes, lack of sensitization and several other reasons. Therefore, this paper aims to utilize the capabilities of machine-learning techniques for the early prediction of CC. Specifically, three well-known feature selection and ranking methods have been used to identify the most significant features that help in the diagnosis process. In addition, eighteen different classifiers belonging to six learning strategies have been trained and extensively evaluated on primary data consisting of five hundred images. Moreover, the problem of imbalanced class distribution, which is common in medical datasets, is investigated. The results revealed that the LWNB and RandomForest classifiers showed the best overall performance across four different evaluation metrics. Also, the LWNB and logistic classifiers were the best choices for handling imbalanced class distribution. The final conclusion is that an ensemble model consisting of several classifiers, such as LWNB, RandomForest and logistic classifiers, is the best solution for this type of problem.
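To illustrate why imbalanced class distribution calls for more than one evaluation metric, the following minimal sketch compares a trivial majority-class predictor with a hypothetical classifier on made-up labels. The abstract does not name its four metrics, so the choice of accuracy, precision, recall and F1 here is an assumption, and the data and function names are purely illustrative rather than taken from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 (assumed metric set)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced sample: 10 abnormal (1) and 90 normal (0) cases.
y_true = [1] * 10 + [0] * 90

# Baseline that always predicts the majority class.
baseline_pred = [0] * 100

# Hypothetical classifier: finds 8 of 10 abnormal cases, raises 5 false alarms.
model_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85

baseline = evaluate(y_true, baseline_pred)
model = evaluate(y_true, model_pred)

for name, scores in (("baseline", baseline), ("model", model)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up example the baseline reaches high accuracy while detecting none of the abnormal cases, which is why recall and F1 on the minority class are worth reporting alongside accuracy.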
