UNCONSTRAINED EAR RECOGNITION USING TRANSFORMERS


(Received: 3-Aug.-2021, Revised: 5-Sep.-2021, Accepted: 12-Sep.-2021)
Marwin B. Alejo
The advantages of the ear as a means of identification over other biometric modalities have opened an avenue for researchers to conduct biometric recognition studies using state-of-the-art computing methods. This paper presents a deep learning pipeline for unconstrained ear recognition using transformer neural networks: the Vision Transformer (ViT) and Data-efficient image Transformers (DeiTs). The ViT-Ear and DeiT-Ear models of this study achieved recognition accuracy comparable to or higher than that of state-of-the-art CNN-based methods and other deep learning algorithms. This study also found that the Vision Transformer and Data-efficient image Transformer models perform better than ResNets without using exhaustive data augmentation. Moreover, this study observed that the performance of ViT-Ear is close to that of other ViT-based biometric studies.
