Tasha02 committed on
Commit fcfeea5 · verified · 1 Parent(s): 5dd498c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ license: mit
+ tags:
+ - multi-agent-rl
+ - mappo
+ - reward-attribution
+ - representation-geometry
+ - smacv2
+ - tribal-village
+ ---
+
+ # rl-workshop-2026: Final Policy Checkpoints
+
+ Final MAPPO policy checkpoints for the workshop paper *"Feedback Attribution
+ Determines Representation Geometry in Multi-Agent RL."* Logged alongside the W&B
+ project [`tashapais/rl_workshop_2026`](https://wandb.ai/tashapais/rl_workshop_2026).
+
+ Each `.pt` file is a dict with key `"model"`, which holds the PyTorch `state_dict`
+ for the `ActorCritic` defined in the `tribal-village` repo's `experiments/`.
+
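A minimal loading sketch, assuming PyTorch is installed and a checkpoint has already been downloaded locally. `extract_state_dict` and `load_policy_weights` are hypothetical helper names, not part of the `tribal-village` repo. Rebuilding the `ActorCritic` itself needs that repo's `experiments/` code, so the sketch stops at the raw `state_dict`.

```python
from collections.abc import Mapping


def extract_state_dict(checkpoint):
    """Return the state_dict stored under the "model" key of a checkpoint dict."""
    if not isinstance(checkpoint, Mapping) or "model" not in checkpoint:
        raise KeyError('expected a dict-like checkpoint with a "model" key')
    return checkpoint["model"]


def load_policy_weights(path):
    """Load a .pt file on CPU and return its state_dict (hypothetical helper)."""
    import torch  # imported lazily so extract_state_dict works without PyTorch

    # weights_only=True restricts unpickling to tensors and plain containers;
    # this assumes the checkpoint holds nothing beyond such objects.
    checkpoint = torch.load(path, map_location="cpu", weights_only=True)
    return extract_state_dict(checkpoint)
```

The returned mapping can then be passed to `load_state_dict` on an `ActorCritic` instance built with the same architecture that was used in training.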
+ ## Table 1: Tribal Village (12 agents, 308 actions), 4M agent-steps
+ Checkpoint path: `tribal_village/<run>/step_4002816.pt`. Reward attribution:
+ `r_i^alpha = (1-alpha) r_i + alpha * mean_j r_j`.
+
+ | Condition | alpha | seeds |
+ |-----------|-------|-------|
+ | Individual | 0.0 | 0,1,2 |
+ | Mixed | 0.8 | 0,1,2 |
+ | Shared | 1.0 | 0,1,2 |
+
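The reward attribution formula can be sketched in a few lines of plain Python. This is an illustration of the stated formula only, not the training code; the function name `mix_rewards`, the example rewards, and the restriction of `alpha` to [0, 1] are assumptions made here.

```python
def mix_rewards(rewards, alpha):
    """Blend each agent's reward with the team mean.

    Implements r_i^alpha = (1 - alpha) * r_i + alpha * mean_j r_j.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0.0 <= alpha <= 1.0:  # assumed range; the table uses 0.0, 0.8, 1.0
        raise ValueError("alpha must lie in [0, 1]")
    team_mean = sum(rewards) / len(rewards)
    return [(1.0 - alpha) * r + alpha * team_mean for r in rewards]


rewards = [1.0, 0.0, 2.0]  # made-up per-agent rewards for a single step
print(mix_rewards(rewards, 0.0))  # Individual: [1.0, 0.0, 2.0]
print(mix_rewards(rewards, 1.0))  # Shared:     [1.0, 1.0, 1.0]
print(mix_rewards(rewards, 0.5))  # halfway:    [1.0, 0.5, 1.5]
```

With `alpha = 0.0` every agent keeps its own reward, and with `alpha = 1.0` every agent receives the team mean. The team's total reward is the same for any `alpha`, so the conditions differ only in how that total is attributed across agents.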
+ ## Table 2: SMACv2 10gen_terran (6 terran units), 2M steps
+ Checkpoint path: `smacv2/<run>/step_2001408.pt`.
+
+ | Condition | seeds |
+ |-----------|-------|
+ | Individual (per-agent reward) | 0,1,2 |
+ | Shared (team-averaged) | 0,1,2 |
+
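Because the `<run>` directory names are not listed here, one way to locate the final checkpoints in a local copy is to glob for the fixed file names given in the two tables. The sketch below assumes the files were downloaded under some local `root` with the same directory layout; `find_final_checkpoints` and `FINAL_STEP` are hypothetical names.

```python
from pathlib import Path

# Final step counts taken from the path patterns in the two tables.
FINAL_STEP = {"tribal_village": 4002816, "smacv2": 2001408}


def find_final_checkpoints(root, env):
    """Return sorted paths matching <root>/<env>/<run>/step_<final>.pt."""
    filename = f"step_{FINAL_STEP[env]}.pt"
    # "*" stands in for the run directory, whose name is not assumed here.
    return sorted(Path(root, env).glob(f"*/{filename}"))
```

If each (condition, seed) pair has its own run directory, which is an assumption, a complete download would yield 9 matches for `tribal_village` and 6 for `smacv2`.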
+ ## Caveats (see repo `runs.md`)
+ - Tribal Village runs **fail the behavior gate** (no-op/random baseline) at 4M
+ steps under passive shaping; the representation geometry measured on them is
+ direction-consistent (probe declines 0.75→0.50) but not yet behavior-grounded.
+ - SMAC `individual` agents are weak (~1.7% win rate) vs. `shared` (~25%); SMAC D_act
+ is mask-contaminated and not paper-quotable without a mask-aware recompute.