TY - GEN
T1 - Image description with a goal
T2 - 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
AU - Sadovnik, Amir
AU - Chiu, Yi I.
AU - Snavely, Noah
AU - Edelman, Shimon
AU - Chen, Tsuhan
PY - 2012
Y1 - 2012
AB - Many works in computer vision attempt to solve different tasks such as object detection, scene recognition or attribute detection, either separately or as a joint problem. In recent years, there has been a growing interest in combining the results from these different tasks in order to provide a textual description of the scene. However, when describing a scene, there are many items that can be mentioned. If we include all the objects, relationships, and attributes that exist in the image, the description would be extremely long and not convey a true understanding of the image. We present a novel approach to ranking the importance of the items to be described. Specifically, we focus on the task of discriminating one image from a group of others. We investigate the factors that contribute to the most efficient description that achieves this task. We also provide a quantitative method to measure the description quality for this specific task using data from human subjects and show that our method achieves better results than baseline methods.
UR - http://www.scopus.com/inward/record.url?scp=84866654828&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2012.6248003
DO - 10.1109/CVPR.2012.6248003
M3 - Conference contribution
AN - SCOPUS:84866654828
SN - 9781467312264
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 2791
EP - 2798
BT - 2012 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2012
Y2 - 16 June 2012 through 21 June 2012
ER -