Department of Computer Science and Operations Research

Predoc III presentation by Marco Jiralespong

Hello everyone,

You are all cordially invited to attend Marco Jiralespong's predoc III project presentation on August 27 at 3 p.m. (hybrid format).

Title: Improved Soft Operators for Reinforcement Learning and Scientific

Date: Wednesday, August 27 at 3 p.m.

Location: Auditorium 2, Mila, 6650 (2nd floor)

Jury

Chair: Aaron Courville
Supervisor: Gauthier Gidel
Member: Pierre-Luc Bacon

Abstract

The goal of reinforcement learning (RL) is often presented as learning a policy that maximizes the expected sum of discounted rewards, leading to the ubiquity of the Bellman optimality operator. However, this standard goal is not the only possible target for RL: alternative operators exist, each corresponding to a different notion of optimality. We begin by taking an operator-based perspective to present and compare different forms of RL (including regularized RL, robust RL and generative flow networks). We then turn our attention to the scientific discovery problem of narrowing a large combinatorial set of objects, such as proteins or molecules, to a small set of promising candidates. While RL-based approaches have been used for this task, we show that the currently used operators either lack diversity or yield suboptimal candidates, especially in large search spaces. To remedy this, we propose a novel general operator, general mellowmax, which targets peakier sampling distributions while encompassing known soft RL operators. Then, we propose a corresponding algorithm and demonstrate that it identifies higher-quality, diverse candidates in both synthetic and real-world tasks. Finally, we outline potential future research directions for our work: algorithmic changes, improved generalization and better connections to real scientific discovery applications.