RISK FACTOR IDENTIFICATION FOR STROKE PROGNOSIS USING MACHINE-LEARNING ALGORITHMS


(Received: 16-May-2022, Revised: 6-Jul.-2022, Accepted: 7-Jul.-2022)
Tanvir Ahammad,
Stroke is a life-threatening condition and the second-leading cause of death worldwide. It remains a challenging 21st-century public-health problem for healthcare professionals and researchers. Proper monitoring can therefore help prevent stroke and reduce its severity. Risk-factor analysis is one of the promising approaches for identifying the presence of stroke disease. Numerous studies have focused on forecasting strokes in patients, and most report good accuracy, around 90%, on publicly available datasets. Combining several pre-processing tasks can considerably improve classifier quality, an area that needs further research. Additionally, researchers should pinpoint the major risk factors for stroke disease and use advanced classifiers to forecast the likelihood of stroke. This article presents an enhanced approach for identifying the potential risk factors and predicting the incidence of stroke on a publicly available clinical dataset. The method considers and resolves significant gaps in previous studies. It incorporates ten classification models, including advanced boosting classifiers, to detect the presence of stroke. The performance of the classifiers is analyzed on all possible subsets of attribute/feature selections with respect to five metrics to find the best-performing algorithms. The experimental results demonstrate that the proposed approach achieved the best accuracy on all feature classifications. Overall, the study’s main achievement is a higher stroke-prognosis accuracy (97% using boosting classifiers) than state-of-the-art approaches on the stroke dataset. Hence, physicians can use gradient and ensemble boosting-tree-based models, which are the most suitable for predicting patients’ strokes in the real world. Moreover, this investigation reveals that age, heart disease, glucose level, hypertension and marital status are the most significant risk factors. At the same time, the remaining attributes are also essential to obtaining the best performance.
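The exhaustive feature-subset analysis mentioned in the abstract can be illustrated with a minimal sketch. It only enumerates the non-empty subsets of a feature list and picks the one with the highest score. The feature names, the helper functions and the placeholder `toy_score` are assumptions made for illustration; they are not the authors' implementation, and in practice `toy_score` would be replaced by training a classifier on the subset and computing an evaluation metric.

```python
from itertools import combinations


def all_feature_subsets(features):
    """Yield every non-empty subset of `features` as a tuple."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def best_subset(features, score):
    """Return (subset, value) for the subset that maximises `score`.

    `score` is a caller-supplied function mapping a tuple of feature
    names to a number (hypothetical: e.g. cross-validated accuracy).
    """
    scored = ((subset, score(subset)) for subset in all_feature_subsets(features))
    return max(scored, key=lambda pair: pair[1])


# Illustrative feature names only, loosely based on the risk factors
# named in the abstract; the real dataset columns may differ.
FEATURES = ["age", "heart_disease", "glucose_level", "hypertension", "ever_married"]


def toy_score(subset):
    """Placeholder score (assumption): stands in for fitting a classifier
    on `subset` and measuring a metric. It simply rewards larger subsets."""
    return len(subset) / len(FEATURES)


subsets = list(all_feature_subsets(FEATURES))
print(len(subsets))  # 2**5 - 1 = 31 non-empty subsets
print(best_subset(FEATURES, toy_score))
```

With n features there are 2^n - 1 non-empty subsets, so the exhaustive search is only practical for a small attribute set such as the one in a clinical tabular dataset.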
