Hi everyone! Happy New Year — our thrilling AMLab Seminar is back this Thursday! We have an external speaker, Yuge Shi from Oxford University, and you are all cordially invited to the AMLab Seminar on January 14th at 4:00 p.m. CET on Zoom, where Yuge will give a talk titled “Multimodal Learning with Deep Generative Models”.
Title: Multimodal Learning with Deep Generative Models
Abstract: In this talk, I will present two of my works on multimodal representation learning using deep generative models. In these works, we focus on multimodal scenarios that occur naturally in the real world and depict common concepts, such as image–caption, photo–sketch, and video–audio pairs. In the first work, we propose a mixture-of-experts posterior for the VAE to achieve balanced representation learning across modalities; by doing so, the model can leverage the commonality between modalities to learn more robust representations and achieve better generative performance. In addition, we propose four criteria (with evaluation metrics) that multimodal deep generative models should satisfy. In the second work, we design a contrastive-ELBO objective for multimodal VAEs that greatly reduces the amount of paired data needed to train such models. We show that our objective is effective with multiple state-of-the-art multimodal VAEs across different datasets, and that only 20% of the paired data is needed to achieve performance similar to a model trained with the original objective.
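To give a flavour of the mixture-of-experts posterior mentioned in the abstract, here is a minimal sketch (not the speaker's implementation — all function names and the toy Gaussian experts are illustrative assumptions): each modality contributes its own approximate posterior, and the joint posterior is an equally weighted mixture of these per-modality experts.

```python
# Minimal sketch of a mixture-of-experts (MoE) posterior, as used in
# multimodal VAEs: each modality m has its own Gaussian posterior
# q_m(z|x_m), and the joint posterior is their uniform mixture.
# All names here are illustrative, not the authors' actual code.
import math
import random

def gaussian_pdf(z, mu, sigma):
    """Density of a 1-D Gaussian N(mu, sigma^2) at z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def moe_posterior_pdf(z, experts):
    """MoE posterior density: uniform mixture over per-modality experts."""
    return sum(gaussian_pdf(z, mu, sigma) for mu, sigma in experts) / len(experts)

def sample_moe(experts, rng):
    """Sample z by picking a modality expert uniformly, then sampling from it."""
    mu, sigma = rng.choice(experts)
    return rng.gauss(mu, sigma)

# Toy per-modality posteriors, e.g. an image encoder and a caption encoder
# that have each produced (mean, std) for the shared latent z.
experts = [(0.0, 1.0), (2.0, 0.5)]
rng = random.Random(0)
z = sample_moe(experts, rng)
print("sampled z:", z, "density:", moe_posterior_pdf(z, experts))
```

Because sampling first chooses a modality and then draws from that expert, each modality's view of the shared latent space is exercised equally, which is the balancing property the first work exploits.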
To gain deeper insights into multimodal learning, feel free to join and discuss! See you there 🙂 !