Actor-Dual-Critic Dynamics for Zero-sum and Identical-Interest Stochastic Games
Abstract
A novel model-free, game-agnostic, and gradient-free learning framework for stochastic games uses a best-response-type actor-critic architecture with fast and slow critics to achieve convergence to approximate equilibria.
We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. The learning dynamics follow a best-response-type actor-critic architecture, where agents update their strategies (actors) using feedback from two distinct critics: a fast critic that intuitively responds to observed payoffs under limited information, and a slow critic that deliberatively approximates the solution to the underlying dynamic programming problem. Crucially, the learning process relies on non-equilibrium adaptation through smoothed best responses to observed payoffs. We establish convergence to (approximate) equilibria in two-agent zero-sum and multi-agent identical-interest stochastic games over an infinite horizon. This provides one of the first payoff-based and fully decentralized learning algorithms with theoretical guarantees in both settings. Empirical results further validate the robustness and effectiveness of the proposed approach across both classes of games.
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper