TY - GEN
T1 - Spoken attributes
T2 - 2013 14th IEEE International Conference on Computer Vision, ICCV 2013
AU - Sadovnik, Amir
AU - Gallagher, Andrew
AU - Parikh, Devi
AU - Chen, Tsuhan
PY - 2013
Y1 - 2013
N2 - In recent years, there has been a great deal of progress in describing objects with attributes. Attributes have proven useful for object recognition, image search, face verification, image description, and zero-shot learning. Typically, attributes are either binary or relative: they describe either the presence or absence of a descriptive characteristic, or the relative magnitude of the characteristic when comparing two exemplars. However, prior work fails to model the actual way in which humans use these attributes in descriptive statements of images. Specifically, it does not address the important interactions between the binary and relative aspects of an attribute. In this work we propose a spoken attribute classifier which models a more natural way of using an attribute in a description. For each attribute we train a classifier which captures the specific way this attribute should be used. We show that as a result of using this model, we produce descriptions about images of people that are more natural and specific than past systems.
KW - attributes
KW - relative attributes
KW - visual attributes
UR - http://www.scopus.com/inward/record.url?scp=84898815599&partnerID=8YFLogxK
U2 - 10.1109/ICCV.2013.268
DO - 10.1109/ICCV.2013.268
M3 - Conference contribution
AN - SCOPUS:84898815599
SN - 9781479928392
T3 - Proceedings of the IEEE International Conference on Computer Vision
SP - 2160
EP - 2167
BT - Proceedings - 2013 IEEE International Conference on Computer Vision, ICCV 2013
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 1 December 2013 through 8 December 2013
ER -