A DEEP DECISION FORESTS MODEL FOR HATE SPEECH DETECTION


(Received: 2-Nov.-2022, Revised: 4-Feb.-2023, Accepted: 14-Feb.-2023)
Detecting and controlling the propagation of hate speech over social-media platforms is a challenge. The problem is exacerbated by the extremely fast flow, readily available audience and relative permanence of information on social media. The objective of this research is to propose a model for detecting political hate speech propagated through social-media platforms in Kenya. Using Twitter textual data and Keras TensorFlow Decision Forests (TF-DF), three models were developed: Gradient Boosted Trees with Universal Sentence Encoder (USE) embeddings, Gradient Boosted Trees and Random Forest. The Gradient Boosted Trees with USE model performed best of the three, with an accuracy of 98.86%, a recall of 0.9587, a precision of 0.9831 and an AUC of 0.9984. These results suggest that the model can be used to detect hate speech on social-media platforms.
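
The four reported metrics can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the confusion-matrix counts and scores below are hypothetical values chosen for illustration, and the names `ConfusionCounts`, `accuracy`, `precision`, `recall` and `auc` are assumptions introduced here. "Positive" is assumed to mean a tweet labelled as hate speech.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (positive = hate speech)."""
    tp: int  # hate speech correctly flagged
    fp: int  # non-hate text wrongly flagged
    tn: int  # non-hate text correctly passed
    fn: int  # hate speech missed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that are correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def precision(c: ConfusionCounts) -> float:
    """Share of flagged items that really are hate speech."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of actual hate speech that was flagged."""
    return c.tp / (c.tp + c.fn)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example receives
    the higher score; tied scores count as half a pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts and scores, for illustration only.
counts = ConfusionCounts(tp=80, fp=20, tn=890, fn=10)
example_scores = [0.9, 0.4, 0.6, 0.1]
example_labels = [1, 1, 0, 0]

if __name__ == "__main__":
    print(f"accuracy:  {accuracy(counts):.4f}")
    print(f"precision: {precision(counts):.4f}")
    print(f"recall:    {recall(counts):.4f}")
    print(f"AUC:       {auc(example_scores, example_labels):.4f}")
```

The sketch omits guards against empty classes (for example, no positive examples), which a production evaluation would need. It also shows why accuracy alone can mislead on imbalanced data: with these hypothetical counts, accuracy is high even though one in five flagged items is a false alarm.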
