Valentin Thomas PhD Defense
Dear all,
We are happy to invite you to Valentin Thomas' PhD defense on Wednesday, August 29th, at 9:00 am (hybrid mode).
Title: Learning and Planning with Noise in Optimization and Reinforcement Learning
Date: August 29th, 2023, 9:00 am - 11:00 am EDT
Location: Auditorium 1 - 6650 Rue Saint-Urbain
Link: https://meet.google.com/bbg-emrw-iid
Jury
President | Bacon, Pierre-Luc
Research advisor | Bengio, Yoshua
Research co-advisor | Le Roux, Nicolas
Member | Berseth, Glen
External examiner | Thomas, Philip (Univ. Massachusetts)
Abstract
Most modern machine learning algorithms incorporate a degree of randomness in their processes, which we refer to as noise, and which can ultimately affect the model's predictions. In this thesis, we take a closer look at learning and planning in the presence of noise for reinforcement learning and optimization algorithms.
The first two articles presented in this document focus on reinforcement learning in an unknown environment, specifically how we can design algorithms that use the stochasticity of their policy and of the environment to their advantage. Our first contribution focuses on the unsupervised reinforcement learning setting. We show how an agent left alone in an unknown world without any specified goal can learn which aspects of the environment it can control independently from each other, while jointly learning a disentangled latent representation of these aspects, or factors of variation. The second contribution focuses on planning in continuous control tasks. By framing reinforcement learning as an inference problem, we borrow tools from the Sequential Monte Carlo literature to design a theoretically grounded and efficient algorithm for probabilistic planning using a learned model of the world. We show how the agent can leverage the uncertainty of the model to imagine a diverse set of solutions.
The following two contributions analyze the impact of gradient noise due to sampling in optimization algorithms. The third contribution examines the role of gradient noise in maximum likelihood estimation with stochastic gradient descent, exploring how the interplay between the structure of the gradient noise and the local curvature affects the generalization and convergence speed of the model. Our fourth contribution returns to the topic of reinforcement learning to analyze the impact of sampling noise on the policy gradient algorithm. We find that sampling noise can significantly impact the optimization dynamics and the policies discovered in on-policy reinforcement learning.