---
license: mit
tags:
- multi-agent-rl
- mappo
- reward-attribution
- representation-geometry
- smacv2
- tribal-village
---
# rl-workshop-2026: Final Policy Checkpoints

Final MAPPO policy checkpoints for the workshop paper *"Feedback Attribution
Determines Representation Geometry in Multi-Agent RL."* The corresponding training runs
are logged in the W&B project
[`tashapais/rl_workshop_2026`](https://wandb.ai/tashapais/rl_workshop_2026).

Each `.pt` file is a dict whose `"model"` key holds the PyTorch `state_dict` for the
`ActorCritic` class defined in the `experiments/` directory of the `tribal-village` repo.
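
A minimal loading sketch, assuming PyTorch is installed and that a checkpoint dict may carry other keys besides `"model"`. Because `ActorCritic` lives in the `tribal-village` repo, the sketch stops at the `state_dict` and lists parameter shapes; the helper names `load_policy_state` and `summarize` are hypothetical, not part of either repo.

```python
import torch


def load_policy_state(path: str) -> dict:
    """Return the ActorCritic state_dict stored under the "model" key of a checkpoint."""
    # map_location="cpu" lets GPU-trained checkpoints load on CPU-only machines.
    # weights_only=True restricts unpickling to tensors and plain containers;
    # relax it only for checkpoints from a source you trust.
    ckpt = torch.load(path, map_location="cpu", weights_only=True)
    if not isinstance(ckpt, dict) or "model" not in ckpt:
        raise KeyError(f"{path}: expected a dict with a 'model' key")
    return ckpt["model"]


def summarize(state_dict: dict) -> list:
    """List (parameter name, shape) pairs, handy for checking an architecture match."""
    return [(name, tuple(tensor.shape)) for name, tensor in state_dict.items()]
```

To restore a policy, build an `ActorCritic` with the same constructor arguments used in training and call `model.load_state_dict(load_policy_state(path))`; a shape or key mismatch there usually means the constructor arguments differ.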

## Table 1: Tribal Village (12 agents, 308 actions), 4M agent-steps

Checkpoints are stored at `tribal_village/<run>/step_4002816.pt`. Each agent's reward is
attributed as `r_i^alpha = (1-alpha) r_i + alpha * mean_j r_j`, so alpha = 0 uses purely
individual reward and alpha = 1 uses the fully shared team mean:

| Condition | alpha | Seeds |
|-----------|-------|-------|
| Individual | 0.0 | 0, 1, 2 |
| Mixed | 0.8 | 0, 1, 2 |
| Shared | 1.0 | 0, 1, 2 |
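
The attribution rule can be sketched in plain Python. This assumes per-step rewards arrive as one float per agent and that `mean_j` averages over all agents, including agent `i`; the function name `attribute_rewards` and the `CONDITIONS` mapping are hypothetical, not taken from the training code.

```python
from typing import List, Sequence


def attribute_rewards(rewards: Sequence[float], alpha: float) -> List[float]:
    """Blend each agent's reward with the team mean: (1 - alpha) * r_i + alpha * mean_j r_j."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(rewards) == 0:
        raise ValueError("need at least one agent reward")
    team_mean = sum(rewards) / len(rewards)  # assumed: mean over all agents, including i
    return [(1.0 - alpha) * r + alpha * team_mean for r in rewards]


# alpha values from the table above
CONDITIONS = {"Individual": 0.0, "Mixed": 0.8, "Shared": 1.0}
```

Under that assumption the attributed rewards always sum to the raw team total, for every alpha; alpha only controls how the total is split between agents. For example, `attribute_rewards([1.0, 0.0, 0.0, 3.0], 0.8)` gives approximately `[1.0, 0.8, 0.8, 1.4]`.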

## Table 2: SMACv2 10gen_terran (6 Terran units), 2M steps

Checkpoints are stored at `smacv2/<run>/step_2001408.pt`:

| Condition | Seeds |
|-----------|-------|
| Individual (per-agent reward) | 0, 1, 2 |
| Shared (team-averaged reward) | 0, 1, 2 |

## Caveats (see `runs.md` in the repo)

- Tribal Village runs **fail the behavior gate** at 4M steps under passive shaping: they do
  not yet beat the no-op/random baseline. Their representation geometry is
  direction-consistent (the probe metric declines from 0.75 to 0.50) but not yet grounded
  in behavior.
- SMAC `individual` agents are weak (~1.7% win rate) compared with `shared` agents (~25%).
  The SMAC `D_act` metric is contaminated by action masking and should not be quoted in the
  paper until it is recomputed in a mask-aware way.
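
One way to read the behavior gate, sketched under loud assumptions: a policy passes only if its mean evaluation score beats both the no-op and the uniform-random baseline, optionally by a margin. The score (episode return or win rate), the margin, and the name `passes_behavior_gate` are all hypothetical; `runs.md` defines the gate actually used.

```python
from statistics import mean
from typing import Sequence


def passes_behavior_gate(
    policy_scores: Sequence[float],
    noop_scores: Sequence[float],
    random_scores: Sequence[float],
    margin: float = 0.0,
) -> bool:
    """Hypothetical gate: the policy must beat the stronger of two trivial baselines."""
    # Compare against whichever baseline is stronger, so beating only one is not enough.
    best_baseline = max(mean(noop_scores), mean(random_scores))
    return mean(policy_scores) > best_baseline + margin
```

Taking the maximum of the two baselines makes the check conservative: a policy that merely outperforms doing nothing, but not random play, still fails.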