
Department of Computer Science and Operations Research


Thesis Defense - Hattie Zhou

Dear all,

We are happy to invite you to Hattie Zhou's PhD defense on October 16th at 9 am (hybrid mode).


Title: Toward Neural Networks that Generalize Systematically

Date: October 16th, at 9 am

Location: A05 (Mila 6666, second floor)

Link: https://umontreal.zoom.us/j/84719965626?pwd=SFNiWVFqL2haNnVtTTFJSE9TUlJkZz09

 

Jury

President: Aishwarya Agrawal
Research director: Hugo Larochelle
Member: Sarath Chandar
External examiner: Boaz Barak (TBD)

Abstract:

A defining characteristic of human intelligence is our ability to generalize systematically, i.e., to generalize to test examples that are structurally different from those seen in training. This often requires the ability to combine known components in novel ways for compositional tasks, or to learn the correct underlying problem-solving strategy or algorithm for reasoning tasks. However, systematic generalization remains a challenge for deep learning systems, which are powerful at capturing statistical regularities in a dataset but fall short when those patterns do not reflect the true data-generating structure.

In this thesis, we aim to understand the factors that affect systematic generalization in neural networks. We begin by introducing the forget-and-relearn paradigm, which unifies a number of iterative training algorithms proposed in the literature. In this process, the forgetting operation selectively removes undesirable information from the model, and the relearning stage reinforces features that are consistently useful under different conditions. We show that this method of training can significantly improve generalization on vision tasks in the low-data setting, and can improve the compositionality of emergent languages in the Lewis communication game. Next, we study the ability of Transformer-based language models to learn and execute an algorithm via in-context learning. We introduce "algorithmic prompting", a prompting strategy that unlocks symbolic reasoning capabilities in large language models (LLMs) on arithmetic tasks, and demonstrate the first instance of strong length generalization on tasks such as addition and parity using general-purpose Transformer architectures. Finally, we aim to characterize the tasks for which Transformers trained from scratch can exhibit strong length generalization. We hypothesize that solutions that are simple to represent are also more likely to be learned, and that the number of RASP-L operations in a solution can be used as a measure of its complexity for a Transformer. We show empirically that tasks whose algorithmic solutions are simple in RASP-L are more likely to exhibit strong length generalization. We conclude by discussing remedies for learning tasks or algorithms that are unnatural for a Transformer.
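To make the forget-and-relearn cycle concrete, here is a minimal toy sketch, not the thesis's actual method: the "model" is a plain linear regressor trained by SGD, the forgetting step simply re-initializes a random subset of the weights, and all names (`train`, `forget`, `forget_and_relearn`) are hypothetical illustrations of the alternating forget/relearn structure.

```python
import random

def train(weights, data, lr=0.1, steps=200):
    """Relearning phase: plain SGD on squared error for y = w . x."""
    for _ in range(steps):
        for x, y in data:
            pred = sum(w * xi for w, xi in zip(weights, x))
            err = pred - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights

def forget(weights, frac=0.5, rng=None):
    """Forgetting phase: re-initialize a random subset of the weights,
    discarding whatever information they carried."""
    rng = rng or random
    return [rng.uniform(-1.0, 1.0) if rng.random() < frac else w
            for w in weights]

def forget_and_relearn(data, dim, cycles=3, seed=0):
    """Alternate training and partial forgetting, then relearn once more.
    Weights that encode consistently useful features get re-learned each
    cycle; information held only by the forgotten weights is washed out."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    for _ in range(cycles):
        weights = train(weights, data)
        weights = forget(weights, rng=rng)
    return train(weights, data)

# Toy data generated by y = 2*x0 - x1; the final weights recover (2, -1)
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0),
        ((1.0, 1.0), 1.0), ((2.0, 1.0), 3.0)]
w = forget_and_relearn(data, dim=2)
```

In this convex toy problem the final relearning pass always recovers the target weights; the interesting behavior studied in the thesis arises in non-convex neural networks, where repeated forgetting biases training toward features that survive across cycles.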