Predoc III Presentation by Qian Yang
Hello everyone,
You are all cordially invited to attend Qian Yang's predoc III project presentation on August 27 at 10:00 a.m. (hybrid format).
Title: Building Reliable and Resource-Efficient Vision-Language Models
Date: Wednesday, August 27, at 10:00 a.m.
Location: Auditorium 2, Mila, 6650 (2nd floor)
Committee
| Role | Member |
|---|---|
| Chair | Aaron Courville |
| Advisor | Aishwarya Agrawal |
| Member | Chris Pal |
Abstract
Vision-Language Models (VLMs) are transformative, but their widespread deployment is hindered by two core challenges: answer reliability and resource efficiency. This report summarizes my foundational research on these issues and outlines a comprehensive thesis plan. On reliability, we introduce a task-decomposition-based method to measure VLM trustworthiness, demonstrating that traditional confidence scores are unreliable. On data efficiency, we introduce a prioritized concept learning framework for MLLM instruction tuning, which also provides a way to track the model's learning behavior, and an efficient transfer learning method that aligns unimodal models using far less cross-modal data.