
		<paper>
			<loc>https://jjcit.org/paper/78</loc>
			<title>ACCURATE AND FAST RECURRENT NEURAL NETWORK SOLUTION FOR THE AUTOMATIC DIACRITIZATION OF ARABIC TEXT</title>
			<doi>10.5455/jjcit.71-1567402817</doi>
			<authors>Gheith Abandah,Asma Abdel-Karim</authors>
			<keywords>Automatic diacritization,Arabic natural language processing,Sequence transcription,Arabic text,Recurrent neural networks,Long short-term memory,Bidirectional neural network</keywords>
			<citation>30</citation>
			<views>7155</views>
			<downloads>1661</downloads>
			<received_date>2-Sep-2019</received_date>
			<revised_date>27-Oct-2019 and 21-Nov-2019</revised_date>
			<accepted_date>16-Dec-2019</accepted_date>
			<abstract>Arabic is now mostly written without its diacritics (short vowels). Adding these diacritics reduces reading
ambiguity, among other benefits. This work aims to develop a fast and accurate machine learning solution to
diacritize Arabic text automatically. This paper uses long short-term memory (LSTM) recurrent neural networks
to diacritize Arabic text. Extensive experiments are performed to evaluate the proposed alternative design and data-encoding
options toward a fast and accurate solution. Our experiments involve investigating and handling
problems with sequence lengths, proposing and evaluating alternative encodings of the diacritized output sequences,
and tuning and evaluating neural network options, including architecture, network size and hyper-parameters.
This paper recommends a solution that can be trained quickly on a large dataset and uses four bidirectional LSTM
layers to predict the diacritics of the input sequence of Arabic letters. This solution achieves a diacritization error
rate of 2.46% on the LDC ATB3 benchmark dataset and 1.97% on the larger new Tashkeela dataset. The latter
rate is a 47% improvement over the best previously published result.</abstract>
		</paper>


