
		<paper>
			<loc>https://jjcit.org/paper/264</loc>
			<title>A HYBRID CNN-TRANSFORMER APPROACH FOR PRECISE THREE-CLASS DIABETIC RETINOPATHY CLASSIFICATION</title>
			<doi>10.5455/jjcit.71-1738691198</doi>
			<authors>Samira Ait Kaci Azzou,Djamila Boukredera,Sifeddine Baouz</authors>
			<keywords>Diabetic retinopathy,Vision transformer,Transfer learning,Artificial intelligence</keywords>
			<citation>2</citation>
			<views>3632</views>
			<downloads>800</downloads>
			<received_date>4-Feb.-2025</received_date>
			<revised_date>11-Apr.-2025</revised_date>
			<accepted_date>16-Apr.-2025</accepted_date>
			<abstract>This study evaluates the effectiveness of Vision Transformers (ViTs) and hybrid deep-learning architectures for diabetic retinopathy (DR) classification, addressing the challenge of inter-stage ambiguity in traditional systems. While convolutional neural networks (CNNs) such as ResNet50 excel at localized feature extraction in retinal images, ViTs offer superior global contextual modeling. To synergize these strengths, we propose a hybrid architecture integrating ResNet50's granular feature extraction with ViTs' global relational reasoning. Three models are designed and evaluated: (1) an auto-tuned ResNet50, (2) a hyperparameter-optimized ViT and (3) a hybrid model combining both architectures. To reduce ambiguity between neighboring stages, we simplified the traditional five-stage classification into three clinically relevant categories: no DR, early DR (mild/moderate) and advanced DR (severe/proliferative). Trained and validated on the APTOS dataset, the ResNet50 model achieves precision scores of 93.0% (no DR), 82.0% (early DR) and 86.0% (advanced DR). The standalone ViT improves on these, attaining 98.0%, 91.0% and 93.0%, respectively. The hybrid model surpasses both, achieving 98.0% average precision across all classes, with gains of +7.0% (early DR) and +5.0% (advanced DR) over the standalone ViT. The proposed hybrid model also achieves 99.5% on all metrics (accuracy, precision and recall) for binary DR identification and 98.3% for three-stage classification, outperforming conventional CNNs and other state-of-the-art methods in DR detection and classification. By significantly reducing confusion between neighboring classes, the proposed hybrid approach demonstrates its potential for accurate classification of the different stages of DR.</abstract>
		</paper>


