MULTI-DOMAIN MACHINE LEARNING APPROACH OF NAMED ENTITY RECOGNITION FOR ARABIC BOOKING CHATBOT ENGINES USING PRE-TRAINED BIDIRECTIONAL TRANSFORMERS


(Received: 12-Sep.-2023, Revised: 4-Nov.-2023, Accepted: 20-Nov.-2023)
Chatbots have recently become essential in various fields, ranging from customer service and information acquisition to entertainment. The use of chatbots reduces operational costs and human errors while providing services at any time. This work presents a Named Entity Recognition (NER) model for Arabic booking chatbots, focusing on booking tickets and appointments across multiple domains. This research paves the way for the development of chatbots that can support multiple booking domains, contributing to the advancement of the Arabic language in this field. We adopt deep machine-learning and transfer-learning approaches to solve this task. Specifically, we fine-tune the AraBERTv0.2 base model to develop the Named Entity Recognition for Booking Queries (NERB) model. Furthermore, we extend it to the Domain-Aware Named Entity Recognition for Booking Queries (DA-NERB) model by adding an input for the domain type and an embedding layer. The input to our proposed model consists of text sequences of reservation requests, while the output consists of sequences of tags representing the entities within the input sequences. For training and testing, we synthesized the Arabic Booking Chatbot-Synthetic Dataset (ABC-S Dataset), comprising 76,117 reservation samples that span seven domains and encompass 26 categories of named entities. Additionally, we collected the Arabic Booking Chatbot-Collected Dataset (ABC-C Dataset) from volunteers to evaluate our model on more varied samples. Both datasets are written in informal Arabic, specifically the Levantine dialect. The proposed model achieves accuracy scores of 100% and 96.9% on ABC-S (test set) and ABC-C, respectively. Both the datasets and the code for our model are publicly available to support research in the field of Arabic chatbots.
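
To make the input/output format described above concrete, the following is a minimal sketch and not the authors' code. It assumes a BIO tagging scheme, which the abstract does not specify. The tag names (FROM_CITY, TO_CITY, DATE), the example request, and the helper decode_entities are hypothetical illustrations and are not taken from the ABC-S or ABC-C datasets.

```python
from typing import List, Tuple


def decode_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, entity_text) pairs.

    Assumption: tags follow the BIO convention ("B-X" opens an entity of
    type X, "I-X" continues it, "O" marks a token outside any entity).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    entities: List[Tuple[str, str]] = []
    current_type = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the entity that is currently open, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open entity.
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not match the open entity,
            # closes the open entity; the token itself is not kept.
            flush()
            current_type, current_tokens = None, []
    flush()
    return entities


# Hypothetical Levantine request: "I want to book a flight ticket
# from Amman to Dubai on Thursday".
tokens = ["بدي", "احجز", "تذكرة", "طيران", "من", "عمان", "على", "دبي", "يوم", "الخميس"]
# Hypothetical tag sequence, one tag per token, as a NER model might emit.
tags = ["O", "O", "O", "O", "O", "B-FROM_CITY", "O", "B-TO_CITY", "B-DATE", "I-DATE"]

entities = decode_entities(tokens, tags)
for entity_type, text in entities:
    print(entity_type, text)
```

A booking chatbot could pass the resulting pairs to its reservation logic as slot values. In a domain-aware setting such as DA-NERB, the domain type would be supplied to the model as an additional input before the tags are predicted; that step is not modeled in this sketch.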
