Abstract
Visual attributes are powerful features for many computer vision applications, such as object detection and scene recognition. They also enable an application that has not been examined as rigorously: verbal communication from a computer to a human. Because many attributes are nameable, a computer can communicate these concepts through language. This is not a trivial task, however. Given a set of attributes, the choice of which subset to communicate is task dependent. Moreover, because attribute classifiers are noisy, it is important to find ways to deal with this uncertainty. We address the issue of communication by examining the task of automatically composing a description of a person in a group photo that distinguishes him from the others. We introduce an efficient, principled method for choosing which attributes to include in a short description so as to maximize the likelihood that a third party will correctly guess to which person the description refers. We compare our algorithm to computer baselines and human describers, and show the strength of our method in creating effective descriptions.
Field | Value |
---|---|
Original language | English |
Article number | 6619241 |
Pages (from-to) | 3089-3096 |
Number of pages | 8 |
Journal | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition |
State | Published - 2013 |
Externally published | Yes |
Event | 26th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2013, Portland, OR, United States; Duration: Jun 23 2013 → Jun 28 2013 |
Keywords
- Attributes
- Image Description
- Referring Expression