Thesis Defense - Sara Hooker
Hello everyone,
You are cordially invited to Sara Hooker's thesis defense on August 30 at 2:30 PM (hybrid mode).
Title: Beyond Top Line Metrics: Understanding the Trade-off Between Model Size and Generalization Properties
Date: August 30, 2024, from 2:30 PM to 5:30 PM EST
Location: Auditorium 1, MILA + Zoom link
Jury
Chair / Rapporteur: Agrawal, Aishwarya
Research supervisor: Courville, Aaron
Research co-supervisor: Larochelle, Hugo
Regular member: Farnadi, Golnoosh
External examiner: Frankle, Jonathan (Databricks Inc.)
Abstract
An argument in favor of scaling the size of modern algorithms is that it is a surprisingly simple recipe that has provided persuasive gains in overall performance. Ken Thompson famously said, “When in doubt, use brute force.” It is costly to deviate from the predictable gains of adding more parameters, particularly when different regimes of parameter size appear to unlock new and unexpected generalization properties. However, a key limitation of simply throwing more parameters at a task is that the relationship between weights and generalization remains poorly understood.
The works we will discuss ask: “What is gained or lost as we vary the number of parameters in a deep neural network?” This question is especially relevant in an era of scientific inquiry where the large size of networks incurs prohibitive energy costs and limits accessibility.
The key finding across the constituent works is that we pay an exorbitant amount of compute to learn rare patterns in the world around us. When we radically vary the number of parameters, we lose performance on a tiny slice of the distribution: the long tail. Most natural datasets follow a long-tail distribution, with many infrequent attributes. Hence, the findings of this thesis have widespread implications for understanding the limitations of our current optimization approaches for modeling the real world.