IMPROVED DEEP LEARNING ARCHITECTURE FOR DEPTH ESTIMATION FROM SINGLE IMAGE

(Received: 28-Jun.-2020, Revised: 3-Aug.-2020 and 20-Sep.-2020, Accepted: 27-Sep.-2020)

Authors Suhaila F. A. Abuowaida, Huah Yong Chan

Keywords #Depth estimation #Single image #Deep learning #Encoder-decoder

Abstract Depth estimation from a single image has garnered attention in recent years because of its numerous benefits in medicine, robotics, video games and 3D reality applications. Recovering depth, the third dimension of a scene, comes naturally to human vision, but it remains challenging for computer vision. The difficulty stems from differences in scene geometry and texture, occlusions at scene boundaries and the inherent ambiguity caused by the minimal information that can be gathered from a single image. This paper therefore proposes a novel depth estimation architecture whose stages handle depth estimation from a single RGB image. The proposed encoder-decoder architecture builds on an improved DenseNet that extracts the feature map of an image using the skip connection technique. The paper also adopts the reverse Huber loss function, which suits the architecture and is driven by the value distributions commonly present in depth maps. Experimental results on the NYU Depth v2 dataset indicate that the proposed architecture performs better than other state-of-the-art methods while having fewer parameters and requiring less training time.
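
The abstract names the reverse Huber (berHu) loss without stating its form. The sketch below assumes the formulation commonly used in the depth estimation literature: residuals up to a threshold c are penalised linearly (L1), and larger residuals are penalised quadratically as (r^2 + c^2) / (2c). Setting c to 20% of the largest residual is likewise an assumption taken from common practice, not a value given in this abstract, and the names berhu_loss and threshold_ratio are illustrative only.

```python
def berhu_loss(pred, target, threshold_ratio=0.2):
    """Mean reverse Huber (berHu) loss over two equal-length sequences of depths.

    threshold_ratio is an assumed hyperparameter: the threshold c is set to
    this fraction of the largest absolute residual.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")

    residuals = [abs(p - t) for p, t in zip(pred, target)]
    c = threshold_ratio * max(residuals)
    if c == 0.0:
        return 0.0  # every residual is zero, so the loss is zero

    total = 0.0
    for r in residuals:
        if r <= c:
            total += r  # L1 branch for small residuals
        else:
            total += (r * r + c * c) / (2.0 * c)  # scaled L2 branch for large residuals
    return total / len(residuals)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 4.0]     # hypothetical predicted depths in metres
    ground_truth = [1.0, 2.1, 3.0]  # hypothetical ground-truth depths in metres
    print(berhu_loss(predicted, ground_truth))
```

The two branches agree at r = c, so the loss is continuous. Compared with a plain L1 loss, large residuals are weighted more heavily, while small residuals keep a linear penalty instead of the vanishing gradient an L2 loss would give them.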
