TY - GEN
T1 - Monocular Depth Estimation with Adaptive Geometric Attention
AU - Naderi, Taher
AU - Sadovnik, Amir
AU - Hayward, Jason
AU - Qi, Hairong
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Single image depth estimation is an ill-posed problem. That is, it is not mathematically possible to uniquely estimate the 3rd dimension (or depth) from a single 2D image. Hence, additional constraints need to be incorporated in order to regulate the solution space. In this paper, we explore the idea of constraining the model by taking advantage of the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene for more accurate depth estimation. We propose a general light-weight adaptive geometric attention module that uses the cross-correlation between the encoder and the decoder as a measure of this similarity. More precisely, we use the cosine similarity between the local embedded features in the encoder and the decoder at each spatial point. The proposed module along with the encoder-decoder network is trained in an end-to-end fashion and achieves superior and competitive performance in comparison with other state-of-the-art methods. In addition, adding our module to the base encoder-decoder model adds only an additional 0.03% (or 0.0003) parameters. Therefore, this module can be added to any base encoder-decoder network without changing its structure to address any task at hand.
AB - Single image depth estimation is an ill-posed problem. That is, it is not mathematically possible to uniquely estimate the 3rd dimension (or depth) from a single 2D image. Hence, additional constraints need to be incorporated in order to regulate the solution space. In this paper, we explore the idea of constraining the model by taking advantage of the similarity between the RGB image and the corresponding depth map at the geometric edges of the 3D scene for more accurate depth estimation. We propose a general light-weight adaptive geometric attention module that uses the cross-correlation between the encoder and the decoder as a measure of this similarity. More precisely, we use the cosine similarity between the local embedded features in the encoder and the decoder at each spatial point. The proposed module along with the encoder-decoder network is trained in an end-to-end fashion and achieves superior and competitive performance in comparison with other state-of-the-art methods. In addition, adding our module to the base encoder-decoder model adds only an additional 0.03% (or 0.0003) parameters. Therefore, this module can be added to any base encoder-decoder network without changing its structure to address any task at hand.
KW - 3D Computer Vision
KW - Deep Learning
KW - Autoencoders
KW - GANs
KW - Grouping and Shape
KW - Low-level and Physics-based Vision
KW - Neural Generative Models
KW - Scene Understanding
KW - Segmentation
KW - Vision for Robotics
UR - http://www.scopus.com/inward/record.url?scp=85126111839&partnerID=8YFLogxK
U2 - 10.1109/WACV51458.2022.00069
DO - 10.1109/WACV51458.2022.00069
M3 - Conference contribution
AN - SCOPUS:85126111839
T3 - Proceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
SP - 617
EP - 627
BT - Proceedings - 2022 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 22nd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2022
Y2 - 4 January 2022 through 8 January 2022
ER -