Predoc III Presentation - Juan Duque - Department of Computer Science and Operations Research

Hello everyone,

You are cordially invited to Juan Duque's Predoc III evaluation on August 29 at 13:00 (hybrid format).

Title: Policy Optimization in the Landscape of General-Sum Games

Date: August 29, 2024, from 13:00 to 16:00 (Eastern Time)

Location: Auditorium 2, MILA, and online via Zoom

Link: https://umontreal.zoom.us/j/2438793436?pwd=KzNTVWVhckZESDMyWjU4Sm1RRkd1dz09

Jury

Chair and rapporteur	Gidel, Gauthier
Research supervisor	Courville, Aaron
Regular member	Bacon, Pierre-Luc

Abstract

In real-world scenarios, interactions between agents can often be modeled as general-sum games, in which each agent seeks to maximize its own utility, potentially leading to conflicts and suboptimal outcomes. Traditional decentralized reinforcement learning algorithms struggle to find equilibria that balance individual utility and social welfare. We propose two approaches to policy optimization in general-sum games. The first, Learning with Opponent Q-Learning Awareness (LOQA), is a decentralized algorithm that optimizes each agent's utility while fostering cooperation in partially competitive environments. LOQA, which assumes that opponents sample actions based on their Q-value functions, achieves state-of-the-art performance on benchmarks such as the Iterated Prisoner's Dilemma and the Coin Game, with lower computational demands. The second approach, Advantage Alignment, refines opponent shaping by aligning the advantages of conflicting agents, increasing the likelihood of mutually beneficial actions. Advantage Alignment simplifies methods such as LOLA and LOQA, extends to continuous action domains, and demonstrates superior results across various social dilemmas, including the Negotiation Game.
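
The abstract states that LOQA assumes opponents sample actions based on their Q-value functions. The minimal Python sketch below illustrates only that opponent model, assuming a Boltzmann (softmax) form with a temperature parameter. The function name `opponent_policy`, the temperature, and the example Q-values are hypothetical and are not taken from the presentation; the sketch does not implement LOQA's learning update itself.

```python
import math

def opponent_policy(q_values, temperature=1.0):
    """Softmax distribution over an opponent's actions.

    Assumption (hypothetical): the opponent picks action a with probability
    proportional to exp(Q(s, a) / temperature). The abstract only says that
    actions are sampled "based on" the Q-value function.
    """
    scaled = [q / temperature for q in q_values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical opponent Q-values for one state of the
# Iterated Prisoner's Dilemma, ordered as [cooperate, defect].
probs = opponent_policy([1.0, 0.5], temperature=0.5)
print(probs)
```

A lower temperature makes the modeled opponent more greedy with respect to its Q-values, while a higher temperature makes it closer to uniformly random.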


Université de Montréal / Faculty of Arts and Science / Department of Computer Science and Operations Research
