Predoc III Presentation - Juan Duque
Hello everyone,
You are cordially invited to Juan Duque's Predoc III evaluation on August 29 at 13:00 (hybrid format).
Title: Policy Optimization in the Landscape of General-Sum Games
Date: August 29, 2024, from 13:00 to 16:00 EST
Location: Auditorium 2, MILA + *Zoom Link
Jury
Chair and rapporteur | Gidel, Gauthier |
Research supervisor | Courville, Aaron |
Regular member | Bacon, Pierre-Luc |
Abstract
In real-world scenarios, agent interactions often involve general-sum games, where each agent seeks to optimize its utility, potentially leading to conflicts and suboptimal outcomes. Traditional decentralized reinforcement learning algorithms struggle to find equilibria that balance individual utility and social welfare. We propose two approaches to policy optimization in general-sum games. The first, Learning with Opponent Q-Learning Awareness (LOQA), is a decentralized algorithm that optimizes utility while fostering cooperation in partially competitive environments. LOQA, which assumes opponents sample actions based on their Q-value function, achieves state-of-the-art performance in benchmarks like the Iterated Prisoner's Dilemma and Coin Game with lower computational demands. The second approach, Advantage Alignment, refines opponent shaping by aligning the advantages of conflicting agents to increase the likelihood of mutually beneficial actions. Simplifying methods like LOLA and LOQA, Advantage Alignment extends to continuous action domains and demonstrates superior results across various social dilemmas, including the Negotiation Game.