Department of Computer Science and Operations Research

Predoc III Presentation - Juan Duque

Hello everyone,

You are cordially invited to the Predoc III evaluation of Juan Duque on August 29 at 1:00 p.m. (hybrid format).


Title: Policy Optimization in the Landscape of General-Sum Games

Date: August 29, 2024, from 13:00 to 16:00 EST

Location: Auditorium 2, MILA + Zoom link

 

Jury

Chair and rapporteur
Gidel, Gauthier
Research supervisor
Courville, Aaron
Regular member
Bacon, Pierre-Luc

 

Abstract

In real-world scenarios, agent interactions often take the form of general-sum games, in which each agent seeks to optimize its own utility, potentially leading to conflicts and suboptimal outcomes. Traditional decentralized reinforcement learning algorithms struggle to find equilibria that balance individual utility and social welfare. We propose two approaches to policy optimization in general-sum games. The first, Learning with Opponent Q-Learning Awareness (LOQA), is a decentralized algorithm that optimizes an agent's utility while fostering cooperation in partially competitive environments. LOQA assumes that opponents sample actions based on their Q-value function, and it achieves state-of-the-art performance on benchmarks such as the Iterated Prisoner's Dilemma and the Coin Game with lower computational demands. The second approach, Advantage Alignment, refines opponent shaping by aligning the advantages of conflicting agents to increase the likelihood of mutually beneficial actions. Advantage Alignment simplifies methods such as LOLA and LOQA, extends to continuous action domains, and demonstrates superior results across various social dilemmas, including the Negotiation Game.