---
title: QMIX 3-Agent Theorem Visualization
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: cc-by-4.0
short_description: Interactive QMIX multi-agent RL theorem proof
tags:
- reinforcement-learning
- multi-agent
- qmix
- value-decomposition
- ctde
- marl
---
QMIX 3-Agent Theorem Visualization
Interactive demonstration of the QMIX theorem for multi-agent reinforcement learning.
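Core Insight
With monotonic mixing, the joint action that maximizes Q_tot is obtained by letting each agent take the argmax of its own Q-function.
This enables Centralized Training, Decentralized Execution (CTDE): Q_tot is learned centrally, but at execution time each agent needs only its own Q-function.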
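The QMIX Formula
For a joint action u = (u_1, u_2, u_3), Q_tot is a weighted sum of the three agents' Q-values plus a bias b:
$$Q_{tot}(\mathbf{u}) = w_1 Q_1(u_1) + w_2 Q_2(u_2) + w_3 Q_3(u_3) + b$$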
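Theorem: If all weights satisfy $w_i \ge 0$ (monotonic mixing), then:
$$\arg\max_{\mathbf{u}} Q_{tot}(\mathbf{u}) = \left(\arg\max_{u_1} Q_1(u_1),\ \arg\max_{u_2} Q_2(u_2),\ \arg\max_{u_3} Q_3(u_3)\right)$$
If some $w_i = 0$, agent i's action does not affect Q_tot, so the greedy tuple is one of several tied maximizers rather than the unique one.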
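The theorem follows from one line of algebra: for the greedy tuple g = (g_1, g_2, g_3) and any joint action u, Q_tot(u) - Q_tot(g) = Σ_i w_i (Q_i(u_i) - Q_i(g_i)), and every term is ≤ 0 when w_i ≥ 0. The sketch below checks this by brute force over all 2³ = 8 joint actions, assuming two actions per agent and random Q-values. The function names (`q_tot`, `global_argmax`, `greedy_joint_action`, `theorem_holds`) are hypothetical and not taken from the app's `app.py`.

```python
import itertools
import random

# Hypothetical helpers illustrating the theorem; the app's own code may differ.

def q_tot(qs, weights, bias, joint_action):
    """Linear mixer: sum_i w_i * Q_i(u_i) + b."""
    return sum(w * q[u] for w, q, u in zip(weights, qs, joint_action)) + bias

def global_argmax(qs, weights, bias):
    """Centralized search: score every joint action (k^n of them)."""
    actions = [range(len(q)) for q in qs]
    return max(itertools.product(*actions),
               key=lambda u: q_tot(qs, weights, bias, u))

def greedy_joint_action(qs):
    """Decentralized execution: each agent takes the argmax of its own Q."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in qs)

def theorem_holds(qs, weights, bias):
    """True if the greedy tuple achieves the maximum Q_tot.
    Values are compared, not tuples, so tied maximizers are not failures."""
    best = q_tot(qs, weights, bias, global_argmax(qs, weights, bias))
    greedy = q_tot(qs, weights, bias, greedy_joint_action(qs))
    return abs(best - greedy) < 1e-9

random.seed(0)
for _ in range(1000):
    qs = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]
    weights = [random.uniform(0, 2) for _ in range(3)]  # non-negative => monotonic
    assert theorem_holds(qs, weights, bias=random.uniform(-1, 1))
print("greedy tuple maximized Q_tot in all 1000 trials")
```

Comparing Q_tot values rather than action tuples keeps the check correct in the w_i = 0 case, where several joint actions tie for the maximum.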
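Features
- Interactive Q-value sliders: Adjust each agent's individual Q-values
- Mixer weight control: Test monotonicity by setting weights negative
- Real-time visualization: See Q_tot for all 8 joint actions (3 agents with 2 actions each, 2³ = 8)
- Theorem verification: Automatically checks whether the greedy joint action matches the global argmax of Q_tot
- Swap analysis: Demonstrates the monotonic improvement property (switching one agent to a higher-valued action never lowers Q_tot when its weight is non-negative)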
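One reading of the monotonic improvement property behind the swap analysis: changing a single agent's action changes Q_tot by exactly w_i·(Q_i(new) - Q_i(old)), so with w_i ≥ 0 an individually better action never lowers Q_tot. This is a sketch assuming the linear mixer and made-up Q-values; `swap_delta` is a hypothetical helper, not necessarily what the app computes.

```python
import itertools

# Hypothetical sketch of a single-agent "swap"; not the app's implementation.

def q_tot(qs, weights, bias, joint_action):
    """Linear mixer: sum_i w_i * Q_i(u_i) + b."""
    return sum(w * q[u] for w, q, u in zip(weights, qs, joint_action)) + bias

def swap_delta(qs, weights, bias, joint_action, agent, new_action):
    """Change in Q_tot when only `agent` switches to `new_action`."""
    swapped = list(joint_action)
    swapped[agent] = new_action
    return (q_tot(qs, weights, bias, tuple(swapped))
            - q_tot(qs, weights, bias, joint_action))

qs = [[0.2, 0.9], [0.5, -0.3], [1.0, 1.4]]  # Q_i(u_i) for u_i in {0, 1}
weights, bias = [0.7, 1.5, 0.1], 0.0        # all non-negative

# Try every (joint action, agent) swap: 8 joint actions x 3 agents.
for u in itertools.product(range(2), repeat=3):
    for i in range(3):
        new = 1 - u[i]
        gain = swap_delta(qs, weights, bias, u, i, new)
        individual_gain = qs[i][new] - qs[i][u[i]]
        # The joint change is the individual change scaled by w_i ...
        assert abs(gain - weights[i] * individual_gain) < 1e-12
        # ... so an individual improvement never lowers Q_tot.
        if individual_gain > 0:
            assert gain >= 0
```

Because the joint change is a non-negative multiple of the individual change, the sign of every single-agent improvement carries over to Q_tot; a negative weight flips that sign.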
Try This
- Keep all weights positive → the theorem holds.
- Set one weight negative → the theorem breaks whenever that agent's two Q-values differ, because Q_tot now favors that agent's lower-valued action.
- Observe which joint action becomes optimal.
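A small sketch of the negative-weight experiment, assuming the linear mixer and made-up Q-values for three agents with two actions each; the helper names are hypothetical. With a negative weight, the global argmax picks that agent's lowest-valued action instead of its greedy one.

```python
import itertools

# Hypothetical helpers; Q-values are illustrative, not taken from the app.

def q_tot(qs, weights, bias, joint_action):
    """Linear mixer: sum_i w_i * Q_i(u_i) + b."""
    return sum(w * q[u] for w, q, u in zip(weights, qs, joint_action)) + bias

def global_argmax(qs, weights, bias=0.0):
    """Exhaustive search over all joint actions."""
    return max(itertools.product(*(range(len(q)) for q in qs)),
               key=lambda u: q_tot(qs, weights, bias, u))

qs = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
greedy = tuple(max(range(len(q)), key=q.__getitem__) for q in qs)

for weights in ([1.0, 1.0, 1.0], [1.0, -1.0, 1.0]):
    best = global_argmax(qs, weights)
    print(weights, "greedy:", greedy, "global argmax:", best,
          "theorem holds:", best == greedy)
```

With all weights at 1.0 both the greedy tuple and the global argmax are (0, 1, 0). With the second agent's weight at -1.0 the global argmax becomes (0, 0, 0): that agent's lower-valued action now maximizes Q_tot.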
What is QMIX?
QMIX (Rashid et al., 2018) is a value decomposition method for cooperative multi-agent reinforcement learning:
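- Factored Q-values: Each agent maintains its own Q-function, conditioned on its local observations
- Monotonic Mixing: A mixing network combines the individual Qs into Q_tot; its weights are generated from the global state by hypernetworks and kept non-negative, so Q_tot is monotonic in every individual Q
- Decentralized Execution: Each agent acts greedily on its own Q-function, using only local information
- Centralized Training: Q_tot, which can condition on the global state, is trained end to end, and its loss is backpropagated through the mixer into every agent's network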
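A simplified, hypothetical sketch of a QMIX-style mixer in pure Python: hypernetworks map the global state to the mixing weights, an absolute value keeps those weights non-negative, and an ELU hidden layer makes the mixer nonlinear while keeping it monotonic in each agent's Q. Assumptions: each hypernetwork is a single random linear layer (the QMIX paper's final-bias hypernetwork has two layers), there is no training loop, and `MonotonicMixer` is an illustrative name, not an API from any QMIX codebase.

```python
import math
import random

def elu(x):
    """ELU activation: monotonically non-decreasing."""
    return x if x > 0 else math.exp(x) - 1.0

def linear(params, x):
    """y = W x + b, with W stored as a list of rows."""
    W, b = params
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def init_linear(n_in, n_out, rng):
    """Random (untrained) weights for a single linear layer."""
    return ([[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [rng.gauss(0, 0.5) for _ in range(n_out)])

class MonotonicMixer:
    """Hypothetical simplified QMIX-style mixer. State-conditioned
    hypernetworks emit the mixing weights; abs() keeps them non-negative,
    so dQ_tot/dQ_a >= 0 for every agent a."""

    def __init__(self, n_agents, state_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.n, self.h = n_agents, hidden
        self.hyper_w1 = init_linear(state_dim, n_agents * hidden, rng)
        self.hyper_b1 = init_linear(state_dim, hidden, rng)
        self.hyper_w2 = init_linear(state_dim, hidden, rng)
        self.hyper_b2 = init_linear(state_dim, 1, rng)

    def __call__(self, agent_qs, state):
        w1 = [abs(v) for v in linear(self.hyper_w1, state)]  # non-negative
        b1 = linear(self.hyper_b1, state)                    # biases unconstrained
        w2 = [abs(v) for v in linear(self.hyper_w2, state)]  # non-negative
        b2 = linear(self.hyper_b2, state)[0]
        hidden = [elu(sum(agent_qs[a] * w1[a * self.h + j] for a in range(self.n)) + b1[j])
                  for j in range(self.h)]
        return sum(w * z for w, z in zip(w2, hidden)) + b2

mixer = MonotonicMixer(n_agents=3, state_dim=4)
state = [0.1, -0.4, 0.7, 0.2]
qs = [0.3, -1.0, 0.5]
base = mixer(qs, state)
for a in range(3):
    bumped = list(qs)
    bumped[a] += 0.5
    # Raising any single agent's Q never lowers Q_tot.
    assert mixer(bumped, state) >= base
```

Only the weights pass through abs(); the biases stay unconstrained because they shift Q_tot without affecting its monotonicity in the agent Q-values.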
References
- Rashid, T. et al. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning.
- Sunehag, P. et al. (2017). Value-Decomposition Networks For Cooperative Multi-Agent Learning.
License
CC-BY-4.0
Built by Quantum Pi Forge