Predoc III presentation by Marco Jiralespong
Hello everyone,
You are all cordially invited to attend Marco Jiralespong's Predoc III project presentation on August 27 at 3 p.m. (hybrid mode).
Title: Improved Soft Operators for Reinforcement Learning and Scientific Discovery
Date: Wednesday, August 27 at 3 p.m.
Location: Auditorium 2, Mila, 6650 (2nd floor)
Jury
| President | Aaron Courville |
| Advisor | Gauthier Gidel |
| Member | Pierre-Luc Bacon |
Abstract
The goal of reinforcement learning (RL) is often presented as learning a policy that maximizes the expected sum of discounted rewards, leading to the ubiquity of the Bellman optimality operator. However, this standard goal is not the only possible target for RL: alternative operators exist, each corresponding to different notions of optimality. We begin by taking an operator-based perspective to present and compare different forms of RL (including regularized RL, robust RL, and generative flow networks). We then turn our attention to the scientific discovery problem of narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While RL-based approaches have been used for this task, we show that the currently used operators either lack diversity or yield suboptimal candidates, especially in large search spaces. To remedy this, we propose a novel general operator, general mellowmax, that targets peakier sampling distributions while encompassing known soft RL operators. Then, we propose a corresponding algorithm and demonstrate that it identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Finally, we outline potential future research directions for our work: algorithmic changes, improved generalization, and better connections to real scientific discovery applications.
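For background on the soft operators mentioned in the abstract: the "general mellowmax" operator proposed in the talk is not specified here, but the standard mellowmax operator of which it is presumably a generalization is well known. It is a smooth alternative to the hard max that interpolates between the mean (as the temperature parameter ω → 0) and the max (as ω → ∞). A minimal sketch, using a numerically stable log-sum-exp formulation:

```python
import math

def mellowmax(values, omega):
    """Standard mellowmax: mm_omega(x) = (1/omega) * log((1/n) * sum_i exp(omega * x_i)).

    Computed stably by factoring out the maximum value before exponentiating.
    """
    n = len(values)
    m = max(values)
    # exp(omega * (v - m)) is at most 1, avoiding overflow for large omega.
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / n) / omega

# Large omega approaches the hard max; small omega approaches the mean.
q_values = [1.0, 2.0, 3.0]
near_max = mellowmax(q_values, omega=100.0)   # close to 3.0
near_mean = mellowmax(q_values, omega=0.001)  # close to 2.0
```

Because mellowmax is a non-expansion for any fixed ω, the corresponding Bellman-style backup converges to a unique fixed point, which is one reason soft operators of this family are attractive for RL.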