Talk by Zeynep Akata

You are all cordially invited to the AMLab seminar on Tuesday, May 2 at 16:00 in C3.163, where Zeynep Akata will give a talk titled “Vision and Language for Multimodal Deep Learning”. Afterwards there will be the usual drinks and snacks.

Scaling up visual category recognition to large numbers of classes remains challenging. A promising research direction is zero-shot learning, which requires no training examples of the new classes to recognize them, but instead relies on some form of auxiliary information describing those classes. Ultimately, this may make it possible to use the kind of textbook knowledge that humans employ to learn about new classes by transferring knowledge from classes they know well. We tackle the zero-shot learning problem by learning a compatibility function such that matching image-class embedding pairs are assigned a higher score than mismatching pairs; zero-shot classification proceeds by finding the label vector yielding the highest joint compatibility score. We propose and compare different class embeddings learned automatically from unlabeled text corpora and from expert-annotated attributes. Attribute annotations performed by humans are not readily available for most classes. On the other hand, humans have a natural ability to determine distinguishing properties of unknown objects. We use detailed visual descriptions collected from naive users as side information for zero-shot learning, to generate images from scratch, and to generate visual explanations that justify a classification decision.
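To make the compatibility-function idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the speaker's implementation. It assumes a bilinear score F(x, y) = xᵀ W y and a simple margin-based ranking update, and the three-dimensional image features, the attribute vectors, and the class names are all hypothetical toy values chosen for illustration.

```python
def compatibility(x, W, y):
    """Assumed bilinear score F(x, y) = x^T W y between an image
    embedding x and a class embedding y."""
    return sum(x[i] * W[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def predict(x, W, class_embeddings):
    """Return the class whose embedding scores highest with x."""
    return max(class_embeddings,
               key=lambda c: compatibility(x, W, class_embeddings[c]))


def train(samples, class_embeddings, epochs=20, lr=0.1, margin=1.0):
    """Learn W so that the matching (image, class) pair outscores every
    mismatching pair by at least `margin` (pairwise ranking updates)."""
    dim_x = len(samples[0][0])
    dim_y = len(next(iter(class_embeddings.values())))
    W = [[0.0] * dim_y for _ in range(dim_x)]
    for _ in range(epochs):
        for x, label in samples:
            y_true = class_embeddings[label]
            for other, y_other in class_embeddings.items():
                if other == label:
                    continue
                violation = (margin + compatibility(x, W, y_other)
                             - compatibility(x, W, y_true))
                if violation > 0:
                    # Gradient step on the hinge loss: raise the score of
                    # the true pair, lower the score of the wrong pair.
                    for i in range(dim_x):
                        for j in range(dim_y):
                            W[i][j] += lr * x[i] * (y_true[j] - y_other[j])
    return W


# Hypothetical attribute order: [striped, hooved, aquatic].
seen_classes = {
    "tiger": [1.0, 0.0, 0.0],
    "horse": [0.0, 1.0, 0.0],
    "dolphin": [0.0, 0.0, 1.0],
}
# Classes with no training images, described only by their attributes.
unseen_classes = {
    "zebra": [1.0, 1.0, 0.0],
    "hippo": [0.0, 1.0, 1.0],
}
# Toy image features; in practice these would come from a visual encoder.
training_samples = [
    ([1.0, 0.0, 0.0], "tiger"),
    ([0.0, 1.0, 0.0], "horse"),
    ([0.0, 0.0, 1.0], "dolphin"),
]

W = train(training_samples, seen_classes)

# Zero-shot step: score a new image against unseen class embeddings only.
print(predict([0.9, 0.8, 0.1], W, unseen_classes))
```

Because W is trained only on the seen classes, the prediction for an unseen class depends entirely on how its auxiliary description (here, an attribute vector) relates to what was learned. Swapping the attribute vectors for embeddings learned from text would leave the scoring and prediction code unchanged.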

Akata, Label Embeddings for Image Classification, TPAMI 2016
Xian, Latent Embeddings for Zero-Shot Classification, CVPR 2016
Akata, Multi-Cue Zero-Shot Learning with Strong Supervision, CVPR 2016
Xian, Zero-Shot Learning: The Good, the Bad and the Ugly, CVPR 2017
Karessli, Gaze Embeddings for Zero-Shot Learning, CVPR 2017
Reed, Learning Deep Representations of Fine-Grained Visual Descriptions, CVPR 2016
Reed, Generative Adversarial Text to Image Synthesis, ICML 2016
Reed, Learning What and Where to Draw, NIPS 2016
Hendricks, Akata, Generating Visual Explanations, ECCV 2016