Soutenance de thèse - Hattie Zhou
Dear all / Bonjour à tous,
We are happy to invite you to Hattie Zhou's PhD defense on October 16th at 9 am (hybrid mode).
Vous êtes cordialement invité.e.s à la soutenance de thèse de Hattie Zhou, le 16 octobre à 9h (mode hybride).
Title: Toward Neural Networks that Generalize Systematically
Date: October 16th, at 9 am
Location: A05 (Mila 6666, second floor)
Link: https://umontreal.zoom.us/j/84719965626?pwd=SFNiWVFqL2haNnVtTTFJSE9TUlJkZz09
Jury
| President / Présidente | Aishwarya Agrawal |
| Director / Directeur de recherche | Hugo Larochelle |
| Member / Membre | Sarath Chandar |
| External examiner / Examinateur externe | Boaz Barak (TBD) |
Abstract:
A defining characteristic of human intelligence is our ability to generalize systematically, i.e., to generalize to test examples that are structurally different from those seen in training. This often requires combining known components in novel ways for compositional tasks, or learning the correct underlying problem-solving strategy or algorithm for reasoning tasks. However, systematic generalization remains a challenge for deep learning systems, which are powerful at capturing statistical regularities in a dataset but fall short when these patterns do not reflect the true data-generating structure.
In this thesis, we aim to understand the factors that affect systematic generalization in neural networks. We begin by introducing the forget-and-relearn paradigm, which unifies a number of iterative training algorithms proposed in the literature. In this process, a forgetting operation selectively removes undesirable information from the model, and a relearning stage reinforces features that are consistently useful under different conditions. We show that this method of training can significantly improve generalization on vision tasks in the low-data setting and improve the compositionality of emergent languages in the Lewis communication game.

Next, we study the ability of Transformer-based language models to learn and execute an algorithm via in-context learning. We introduce "algorithmic prompting", a prompting strategy that unlocks symbolic reasoning capabilities in large language models (LLMs) on arithmetic tasks, and demonstrate the first instance of strong length generalization on tasks such as addition and parity using general-purpose Transformer architectures.

Finally, we aim to characterize the tasks for which Transformers trained from scratch can exhibit strong length generalization. We hypothesize that solutions that are simple to represent are also more likely to be learned, and that the number of RASP-L operations in a solution can serve as a measure of Transformer complexity. We show empirically that tasks with simple algorithmic solutions in RASP-L are more likely to exhibit strong length generalization. We conclude by discussing remedies for learning tasks or algorithms that are unnatural for a Transformer.