Department of Computer Science and Operations Research

Predoc III presentation by Qian Yang

Hello everyone,

You are all cordially invited to attend Qian Yang's Predoc III project presentation on August 27 at 10 a.m. (hybrid mode).

Title: Building Reliable and Resource-Efficient Vision-Language Models

Date: Wednesday, August 27, at 10 a.m.

Location: Auditorium 2, Mila, 6650 (2nd floor)

 

Jury

Chair: Aaron Courville
Supervisor: Aishwarya Agrawal
Member: Chris Pal

Abstract

Vision-Language Models (VLMs) are transformative, but their widespread deployment is hindered by two core challenges: answer reliability and resource efficiency. This report summarizes my foundational research on these issues and outlines a comprehensive thesis plan. On reliability, we introduce a task-decomposition-based method to measure VLM trustworthiness, demonstrating the unreliability of traditional confidence scores. On data efficiency, we introduce a prioritized concept learning framework for MLLM instruction tuning, which provides a way to track learning behavior, and an efficient transfer learning method for aligning unimodal models using far less cross-modal data.