
metadata
title: QMIX 3-Agent Theorem Visualization
emoji: 🎮
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.2.0
app_file: app.py
pinned: false
license: cc-by-4.0
short_description: Interactive QMIX multi-agent RL theorem proof
tags:
  - reinforcement-learning
  - multi-agent
  - qmix
  - value-decomposition
  - ctde
  - marl

🎮 QMIX 3-Agent Theorem Visualization

An interactive demonstration of the QMIX monotonicity theorem for cooperative multi-agent reinforcement learning (MARL).

🎯 Core Insight

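Monotonic mixing guarantees that the joint action formed from each agent's individual argmax is also an argmax of the joint Q_tot.

Because each agent can then pick its action greedily from its own Q-function, without consulting the others, the Q_tot learned centrally can be executed in a decentralized way. This is what enables Centralized Training, Decentralized Execution (CTDE).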

📐 The QMIX Formula

Q_tot(u) = w₁·Q₁(u₁) + w₂·Q₂(u₂) + w₃·Q₃(u₃) + b

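Theorem: If all weights w_i ≥ 0 (monotonic mixing), then:

argmax_u Q_tot(u) = (argmax Q₁, argmax Q₂, argmax Q₃)

In words, the joint action built from each agent's individual argmax attains the maximum of Q_tot. If a weight is exactly 0, that agent's action no longer affects Q_tot, so the greedy tuple is one of several maximizers rather than the only one.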
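A short derivation of why non-negative weights suffice for the linear mixer above. This is a sketch for this formula only; the full QMIX mixer relies on the more general condition ∂Q_tot/∂Q_i ≥ 0.

```latex
\begin{aligned}
u_i^{*} &= \arg\max_{u_i} Q_i(u_i), \qquad i = 1, 2, 3 \\
Q_{tot}(u) &= \sum_{i=1}^{3} w_i \, Q_i(u_i) + b \\
&\le \sum_{i=1}^{3} w_i \, Q_i(u_i^{*}) + b
  \qquad \text{since } w_i \ge 0 \text{ and } Q_i(u_i) \le Q_i(u_i^{*}) \\
&= Q_{tot}(u_1^{*}, u_2^{*}, u_3^{*})
\end{aligned}
```

The bound holds for every joint action u, so the greedy tuple is a maximizer. If some w_j < 0, the inequality for term j flips, and Q_tot instead prefers argmin Q_j for that agent.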

🔧 Features

  • Interactive Q-value sliders: Adjust individual agent Q-functions
  • Mixer weight control: Test monotonicity by setting weights negative
  • Real-time visualization: See Q_tot for all 8 joint action combinations (3 agents × 2 actions each = 2³)
  • Theorem verification: Automatically checks whether the greedy joint action equals the global argmax
  • Swap analysis: Demonstrate monotonic improvement property
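The visualization and verification features can be approximated outside the app with a brute-force check. This is a minimal sketch, not the app's code. It assumes 2 actions per agent, and the Q-values, weights, bias, and function names (q_tot, greedy_joint_action, global_argmax, theorem_holds) are hypothetical.

```python
from itertools import product

# Hypothetical example values (not the app's defaults): 3 agents, 2 actions each.
Q = [
    [1.0, 3.0],  # Q_1(u_1) for u_1 in {0, 1}
    [2.5, 0.5],  # Q_2(u_2)
    [0.0, 4.0],  # Q_3(u_3)
]
w = [0.8, 1.2, 0.5]  # mixer weights; monotonic when all are >= 0
b = 0.1              # mixer bias


def q_tot(u, Q, w, b):
    """Linear mixer from the README: sum_i w_i * Q_i(u_i) + b."""
    return sum(wi * Qi[ui] for wi, Qi, ui in zip(w, Q, u)) + b


def greedy_joint_action(Q):
    """Decentralized execution: each agent takes the argmax of its own Q_i."""
    return tuple(max(range(len(Qi)), key=Qi.__getitem__) for Qi in Q)


def global_argmax(Q, w, b):
    """Centralized check: brute force over every joint action (2**3 = 8 here)."""
    joint_actions = product(*(range(len(Qi)) for Qi in Q))
    return max(joint_actions, key=lambda u: q_tot(u, Q, w, b))


def theorem_holds(Q, w, b, tol=1e-9):
    """True if the greedy joint action attains the maximal Q_tot (ties allowed)."""
    best = global_argmax(Q, w, b)
    return q_tot(greedy_joint_action(Q), Q, w, b) >= q_tot(best, Q, w, b) - tol


for u in product(range(2), repeat=3):
    print(u, round(q_tot(u, Q, w, b), 3))
print("greedy:", greedy_joint_action(Q), "global:", global_argmax(Q, w, b))
print("theorem holds:", theorem_holds(Q, w, b))
```

Comparing Q_tot values rather than action tuples keeps the check correct when a zero weight creates ties between joint actions.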

💡 Try This

  1. Keep all weights positive → Theorem holds ✅
  2. Set one weight negative → Watch the theorem break ❌
  3. Observe which joint action becomes optimal
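The experiment above can be reproduced in a few lines. This is a minimal sketch with hypothetical Q-values and weights, not the app's defaults. It sweeps agent 2's weight from positive to negative and compares the optimal joint action with the greedy one.

```python
from itertools import product

# Hypothetical Q-values: 3 agents x 2 actions (not taken from the app).
Q = [[1.0, 3.0], [2.5, 0.5], [0.0, 4.0]]
GREEDY = tuple(row.index(max(row)) for row in Q)  # each agent's own argmax


def best_joint_action(w, b=0.0):
    """Joint action with the highest Q_tot under the linear mixer."""
    return max(
        product(range(2), repeat=3),
        key=lambda u: sum(wi * Qi[ui] for wi, Qi, ui in zip(w, Q, u)) + b,
    )


# Sweep agent 2's weight from positive to negative.
for w2 in (1.0, 0.5, -0.5, -1.0):
    best = best_joint_action([1.0, w2, 1.0])
    status = "holds" if best == GREEDY else "breaks"
    print(f"w2={w2:+.1f}  optimal={best}  theorem {status}")
```

With these numbers the greedy agents always choose (1, 0, 1). Once w₂ < 0, maximizing w₂·Q₂(u₂) means picking agent 2's lowest-valued action, so the optimal joint action switches to (1, 1, 1) and the theorem breaks.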

📚 What is QMIX?

QMIX (Rashid et al., 2018) is a value decomposition method for cooperative multi-agent reinforcement learning:

  1. Factored Q-values: Each agent maintains its own Q-function
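  2. Monotonic Mixing: Q_tot is a monotonic combination of individual Qs; in the paper, a mixing network whose non-negative weights are generated by hypernetworks from the global state (the formula above is the linear special case)
  3. Decentralized Execution: Agents act greedily on local information
  4. Centralized Training: Q_tot is trained end-to-end on the team reward, and gradients backpropagate through the mixer into each agent's Q-network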
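The mixer in the paper is nonlinear, so the linear formula above is only a special case. Below is a minimal pure-Python sketch of a QMIX-style mixer under stated assumptions: untrained random hypernetwork weights, a small embedding size, and a single linear layer for the final bias where the paper uses a two-layer network. All names are hypothetical. It shows that taking abs() of the hypernetwork outputs keeps Q_tot monotonic in every Q_i, so the greedy joint action still matches a brute-force search.

```python
import math
import random
from itertools import product

random.seed(0)  # reproducible, untrained weights
N_AGENTS, STATE_DIM, EMBED = 3, 4, 8


def linear(in_dim, out_dim):
    """Random (out_dim x in_dim) matrix standing in for a trained hypernetwork layer."""
    return [[random.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def elu(x):
    return x if x > 0 else math.exp(x) - 1.0


# Hypothetical hypernetworks: global state s -> mixer parameters.
hyper_w1 = linear(STATE_DIM, N_AGENTS * EMBED)
hyper_b1 = linear(STATE_DIM, EMBED)
hyper_w2 = linear(STATE_DIM, EMBED)
hyper_b2 = linear(STATE_DIM, 1)  # simplification: the paper uses a 2-layer net here


def mix(agent_qs, state):
    """Q_tot = w2 . ELU(Q W1 + b1) + b2, with W1 and w2 made non-negative by abs()."""
    w1 = [abs(x) for x in matvec(hyper_w1, state)]  # flattened (N_AGENTS x EMBED)
    b1 = matvec(hyper_b1, state)
    w2 = [abs(x) for x in matvec(hyper_w2, state)]
    b2 = matvec(hyper_b2, state)[0]
    hidden = [
        elu(sum(agent_qs[a] * w1[a * EMBED + j] for a in range(N_AGENTS)) + b1[j])
        for j in range(EMBED)
    ]
    return sum(wj * hj for wj, hj in zip(w2, hidden)) + b2


# Hypothetical per-agent Q-tables (2 actions each) and a random global state.
state = [random.uniform(-1, 1) for _ in range(STATE_DIM)]
q_tables = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(N_AGENTS)]

greedy = tuple(row.index(max(row)) for row in q_tables)  # decentralized choice
best = max(
    product(range(2), repeat=N_AGENTS),
    key=lambda u: mix([q_tables[a][u[a]] for a in range(N_AGENTS)], state),
)  # centralized brute force over all 8 joint actions
print("greedy:", greedy, "brute force:", best)
```

ELU is increasing and every weight applied to it is non-negative, so ∂Q_tot/∂Q_i ≥ 0 for each agent; that is the general form of the monotonicity condition the linear weights w_i ≥ 0 express.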

📖 References

  • Rashid, T. et al. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
  • Sunehag, P. et al. (2017). Value-Decomposition Networks For Cooperative Multi-Agent Learning

📄 License

CC-BY-4.0


Built with 🔮 by Quantum Pi Forge • T=∞ = T=0