
		<paper>
			<loc>https://jjcit.org/paper/272</loc>
			<title>AN ENHANCED WORD LEVEL ARABIC OCR BASED ON DUAL ENCODER TRANSFORMER ARCHITECTURE</title>
			<doi>10.5455/jjcit.71-1746709575</doi>
			<authors>Khulood Gaashan,Maram Bani Younes</authors>
			<keywords>Arabic OCR,Multi-batch size,Transformer,Dual encoder transformer,Decoder,Feature extraction,Self-attention mechanism</keywords>
			<citation>2</citation>
			<views>2327</views>
			<downloads>918</downloads>
			<received_date>16-Jun.-2025</received_date>
			<revised_date>12-Jul.-2025 and 12-Aug.-2025</revised_date>
			<accepted_date>13-Sep.-2025</accepted_date>
			<abstract>Arabic script is one of the most complex and challenging scripts. Its characters take different shapes 
depending on their position in a word, and its diacritical marks are difficult to distinguish from the dots of 
the characters. These distinctive features make the Optical Character Recognition (OCR) procedure more 
challenging and result in low recognition accuracy. Several studies in the literature have aimed to introduce 
high-accuracy Arabic OCR. However, improving word-recognition accuracy remains an open issue that depends on 
the dataset used and the recognition model developed. Moreover, diacritics have received limited attention and 
have not been sufficiently addressed: experimental tests of prior models on words with diacritics have shown 
poor accuracy, not exceeding 60%. Consequently, this work introduces a new, accurate deep-learning model for 
Arabic OCR that handles words both with and without diacritical marks. It utilizes a dual encoder transformer 
(DTrOCR), a deep-learning architecture that has demonstrated superior performance in identification and 
classification tasks. The proposed DTrOCR is trained with multiple batch sizes on a comprehensive, generated 
Arabic word-level dataset named MFSRHRD and tested on unseen datasets. Recognition accuracy reaches 98.5% for 
Arabic words without diacritics and 89.9% for words with diacritics.</abstract>
		</paper>


