Département d'informatique et de recherche opérationnelle

Prédoc III Presentation - Nizar Islah

Hello everyone,


You are invited to attend Nizar Islah's Prédoc III exam on Friday, April 11, at 3:00 p.m.


Title: Reframing continual learning as a memory problem at training and test time

Date: Friday, April 11, from 3:00 p.m. to 5:00 p.m.

Location: Room 4.17.013 (4th floor), 3175 Chem. de la Côte-Sainte-Catherine, Montréal, QC H3T 1C5 (CHU Sainte-Justine)

 

Jury

President: Aishwarya Agrawal
Director: Eilif Muller
Co-Director: Irina Rish
Member: Dianbo Liu

 

Abstract

Continual learning (CL) is a core challenge in both biological and artificial intelligence, centered on how agents retain and integrate information over time without interference or forgetting. In large language models (LLMs), this challenge re-emerges in a new form: the need to process extended contexts without re-training the model. Despite architectural advances, LLMs remain limited in maintaining coherent internal representations over long sequences. Standard attention mechanisms scale quadratically with sequence length, making them impractical for real-world settings such as long documents, code bases, or conversations.
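
The quadratic-scaling point above can be made concrete: standard attention builds a score matrix with one entry per pair of tokens, so its size grows with the square of the sequence length. The sketch below is purely illustrative (the tensor sizes and function name are arbitrary and not taken from the talk).

    import torch

    # Illustrative only: single-head attention forms an (n, n) score matrix,
    # so memory and compute grow quadratically with sequence length n.
    def attention(q, k, v):
        scores = q @ k.T / (q.shape[-1] ** 0.5)   # shape (n, n)
        weights = torch.softmax(scores, dim=-1)
        return weights @ v                        # shape (n, d)

    n, d = 4096, 64                               # hypothetical context length and head dimension
    q, k, v = (torch.randn(n, d) for _ in range(3))
    print(attention(q, k, v).shape)               # torch.Size([4096, 64])
    print(f"{n * n:,} score entries")             # 16,777,216 entries for a 4k-token context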

While recent architectures, including those augmented with memory, have been proposed to improve computational efficiency, they still suffer from poor recall and forgetting over extended inputs. Here, we propose a unified framework that recasts both training-time CL and test-time memory for long-context processing as a continual memory problem. We interpret streaming long contexts as an instance of continual learning over a sequence, where memory must be updated incrementally and selectively. We hypothesize that interference in test-time memory leads to failures on long sequences, and we introduce metrics and approaches inspired by the continual learning literature aimed at addressing this problem. We further propose desiderata for scalable memory that emphasize sparse, modular, and stable update mechanisms, and demonstrate how these principles could be leveraged to allow models to integrate new knowledge while retaining what was previously learned. By reframing continual learning as a unifying lens across training and inference and drawing inspiration from biological memory systems, we address core limitations in the deployment of LLMs in real-world domains.