erickfm committed on
Commit
49c4c05
·
verified ·
1 Parent(s): 84253e3

update card: 8-char retrain on new 13-col schema + principled transforms (2026-04-20)

  1. README.md +152 -91
README.md CHANGED
@@ -11,30 +11,49 @@ library_name: pytorch
11
 
12
  # MIMIC: Melee Imitation Model for Input Cloning
13
 
14
- Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
15
- Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
16
- ~20M-parameter transformer that takes a 256-frame window of game state and
17
- outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.
 
18
 
19
  - **Repo**: https://github.com/erickfm/MIMIC
20
- - **Base architecture**: HAL's GPTv5Controller (Eric Gu,
21
- https://github.com/ericyuegu/hal): 6-layer causal transformer,
22
- 512 d_model, 8 heads, 256-frame context, relative position encoding
23
- (Shaw et al.)
24
- - **MIMIC-specific changes**: 7-class button head (distinct TRIG class for
25
- airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
26
- alignment that fixes a subtle gamestate leak in the training targets
27
- (see research notes 2026-04-11c); fix for the digital L press bug that
28
- prevented all 7-class BC bots from wavedashing until 2026-04-13.
29
- - **Training data**: filtered from
30
- [erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
31
- (~95K Slippi replays).
32
-
33
- ## Per-character checkpoints
34
-
35
- | Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
36
- |---|---|---|---|---|---|
37
- | **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
38
 
39
  ## Repo layout
40
 
@@ -45,30 +64,33 @@ MIMIC/
45
  │ ├── model.pt # raw PyTorch checkpoint
46
  │ ├── config.json # ModelConfig (copied from ckpt["config"])
47
  │ ├── metadata.json # provenance (step, val metrics, notes)
48
- │ ├── mimic_norm.json # normalization stats
49
  │ ├── controller_combos.json # 7-class button combo spec
50
  │ ├── cat_maps.json
51
  │ ├── stick_clusters.json
52
- │ └── norm_stats.json
53
- ├── falco/ (same layout)
54
- ├── cptfalcon/ (same layout)
55
- └── luigi/ (same layout)
 
 
 
 
 
56
  ```
57
 
58
  Each character directory is self-contained: the JSONs are the exact
59
- metadata used during training, copied verbatim from the MIMIC data dir so
60
  any inference script can load them without touching the MIMIC repo.
61
 
62
  ## Usage
63
 
64
- Clone the MIMIC repo and pull this model:
65
-
66
  ```bash
67
  git clone https://github.com/erickfm/MIMIC.git
68
  cd MIMIC
69
  bash setup.sh # installs Dolphin, deps, ISO
70
 
71
- # Download all four characters
72
  python3 -c "
73
  from huggingface_hub import snapshot_download
74
  snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
@@ -79,23 +101,23 @@ Run a character against a level-9 CPU:
79
 
80
  ```bash
81
  python3 tools/play_vs_cpu.py \
82
- --checkpoint hf_checkpoints/falco/model.pt \
83
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
84
  --iso-path ./melee.iso \
85
- --data-dir hf_checkpoints/falco \
86
- --character FALCO --cpu-character FALCO --cpu-level 9 \
87
  --stage FINAL_DESTINATION
88
  ```
89
 
90
- Or play the bot over Slippi Online Direct Connect:
91
 
92
  ```bash
93
  python3 tools/play_netplay.py \
94
- --checkpoint hf_checkpoints/falco/model.pt \
95
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
96
  --iso-path ./melee.iso \
97
- --data-dir hf_checkpoints/falco \
98
- --character FALCO \
99
  --connect-code YOUR#123
100
  ```
101
 
@@ -106,16 +128,16 @@ See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/
106
  ## Architecture
107
 
108
  ```
109
- Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-frame vector
110
- │
111
- 256-frame window ──► + Relative Position Encoding ────────┘
112
- │
113
- 6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
114
- │
115
- Autoregressive Output Heads (with detach)
116
- │
117
- ┌──────────┼──────────┬───────────┐
118
- shoulder(3) c_stick(9) main_stick(37) buttons(7)
119
  ```
120
 
121
  ### 7-class button head
@@ -130,67 +152,106 @@ Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-fra
130
  | 5 | A_TRIG (shield grab) |
131
  | 6 | NONE |
132
 
133
- HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
134
- structurally cannot execute airdodge, which means HAL-lineage bots cannot
135
- wavedash. MIMIC's 7-class encoding plus a fix for `decode_and_press`
136
- (which was silently dropping the digital L press until 2026-04-13) is
137
- what enables the wavedashing you'll see in the replays.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
138
 
139
- ### Input features
 
 
 
 
 
 
 
140
 
141
- 9 numeric features per player (ego + opponent = 18 total):
142
- `percent, stock, facing, invulnerable, jumps_left, on_ground,
143
- shield_strength, position_x, position_y`
144
 
145
- Plus categorical embeddings: stage(4d), 2× character(12d), 2× action(32d).
146
- Plus controller state from the previous frame as a 56-dim one-hot
147
- (37 stick + 9 c-stick + 7 button + 3 shoulder).
148
 
149
- Total input per frame: 166 dimensions → projected to 512.
 
 
 
 
 
 
150
 
151
  ## Training
152
 
153
- - Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
154
- - LR schedule: CosineAnnealingLR, eta_min 1e-6
 
155
  - Gradient clip: 1.0
156
  - Dropout: 0.2
157
- - Sequence length: 256 frames (~4.3 seconds)
 
158
  - Mixed precision: BF16 AMP with FP32 upcast for relpos attention
159
- (prevents BF16 overflow in the manual Q@K^T + Srel computation)
160
- - Batch size: 512 (typically single-GPU on an RTX 5090)
161
- - Steps: ~32K for well-represented characters, early-stopped for Luigi
162
- - Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
163
- default rd=0 matches inference; do NOT use `--reaction-delay 1` or
164
- `--controller-offset` with v2 shards)
 
 
 
 
 
 
 
165
 
166
  ## Known limitations
167
 
168
- 1. **Character-locked**: each model only plays the character it was trained
169
- on. No matchup generalization. Training a multi-character model with a
170
- character embedding is a natural next step but not done yet.
171
- 2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
172
- predates the `--self-inputs` fix. Its val metrics are much lower than
173
- the others and it plays slightly worse.
174
- 3. **Small-dataset overfitting**: Luigi only has 1951 training games after
175
- filtering. The `_best.pt` checkpoint is early-stopped at step 5242 to
176
- avoid the val-loss climb. Plays surprisingly well for the data volume.
177
- 4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently
178
- go for off-stage edge guards or execute high-skill recovery mixups.
179
- 5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
180
- Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked:
181
- the libmelee README explicitly forbids bots on those ladders, and
182
- Slippi has not yet opened a "bot account" opt-in system.
 
183
 
184
  ## Acknowledgments
185
 
186
- - **Eric Gu** for HAL, the reference implementation MIMIC is based on.
187
- HAL's architecture, tokenization, and training pipeline are the
188
- foundation. https://github.com/ericyuegu/hal
189
- - **Vlad Firoiu and collaborators** for libmelee, the Python interface
190
- to Dolphin + Slippi. https://github.com/altf4/libmelee
 
191
  - **Project Slippi** for the Slippi Dolphin fork, replay format, and
192
  Direct Connect rollback netplay. https://slippi.gg
193
 
194
  ## License
195
 
196
- MIT; see the MIMIC repo's LICENSE file.
 
11
 
12
  # MIMIC: Melee Imitation Model for Input Cloning
13
 
14
+ Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi
15
+ replays. Eight character-specific ~20M-parameter transformers that take
16
+ a 180-frame window of game state and output controller inputs (main
17
+ stick, c-stick, shoulder, buttons) at 60 Hz. Each model plays over
18
+ Slippi Online Direct Connect through Dolphin + libmelee.
19
 
20
  - **Repo**: https://github.com/erickfm/MIMIC
21
+ - **Training data**:
22
+ [erickfm/melee-ranked-replays](https://huggingface.co/datasets/erickfm/melee-ranked-replays)
23
+ (ranked Slippi replays from the master/diamond/platinum tiers, per character).
24
+ - **Base architecture**: Shaw-relative-position causal transformer
25
+ (d_model=512, 6 layers, 8 heads, seq_len=180). Bootstrapped from
26
+ [HAL](https://github.com/ericyuegu/hal) (Eric Gu) and since diverged.
27
+ - **Defining MIMIC changes over HAL**: 7-class button head with a
28
+ distinct TRIG class for airdodge/wavedash (HAL's 5-class head can't
29
+ represent airdodge and thus can't wavedash); v2 shard alignment that
30
+ fixes a subtle post-frame-gamestate leak in the training targets
31
+ (see `research-notes-2026-04-11c`); the digital-L-press fix in
32
+ `decode_and_press` (research notes 2026-04-13) without which no
33
+ 7-class BC bot wavedashes.
34
+
35
+ ## Current checkpoints (retrained on 2026-04-20 baseline)
36
+
37
+ Retrained on the post-schema-drop (13 numeric cols), new-transforms
38
+ (`tanh_scale` / `linear_max` / `log_max` for velocity / hitlag /
39
+ hitstun) basis. See `research-notes-2026-04-20.md` in the MIMIC repo
40
+ for methodology + results analysis.
41
+
42
+ | Character | Run | Train games | Val loss | btn F1 | main F1 | Step |
43
+ |---|---|---|---|---|---|---|
44
+ | **Fox** | `fox-20260420-baseline` | 31,030 | 0.7144 | ~88% | ~58% | 32768 |
45
+ | **Falco** | `falco-20260420-baseline` | 20,882 | 0.7487 | ~88% | ~58% | 31392 |
46
+ | **Marth** | `marth-20260420-baseline` | 11,759 | 0.6664 | ~89% | ~58% | 31065 |
47
+ | **Sheik** | `sheik-20260420-baseline` | 51,751 | 0.6566 | ~90% | ~60% | 26160 |
48
+ | Captain Falcon | `cptfalcon-20260420-baseline` | 17,557 | _(training)_ | n/a | n/a | n/a |
49
+ | Luigi | `luigi-20260420-baseline` | _(queued)_ | n/a | n/a | n/a | n/a |
50
+ | Jigglypuff | `puff-20260420-baseline` | _(queued)_ | n/a | n/a | n/a | n/a |
51
+ | Ice Climbers | `ice_climbers-20260420-baseline` | _(queued)_ | n/a | n/a | n/a | n/a |
52
+
53
+ **Peach** is present on the repo (`peach-20260420-baseline`,
54
+ val 0.6322) but was trained 2026-04-19 on the pre-drop 22-col schema.
55
+ Peach will be re-trained alongside the rest in a follow-on cycle so
56
+ all chars sit on the exact same basis.
57
 
58
  ## Repo layout
59
 
 
64
  │ ├── model.pt # raw PyTorch checkpoint
65
  │ ├── config.json # ModelConfig (copied from ckpt["config"])
66
  │ ├── metadata.json # provenance (step, val metrics, notes)
67
+ │ ├── mimic_norm.json # per-feature transforms + params
68
  │ ├── controller_combos.json # 7-class button combo spec
69
  │ ├── cat_maps.json
70
  │ ├── stick_clusters.json
71
+ │ └── norm_stats.json # per-column mean/std (z-score fallback)
72
+ ├── falco/ (same layout)
73
+ ├── marth/ (same layout)
74
+ ├── sheik/ (same layout)
75
+ ├── cptfalcon/ (same layout)
76
+ ├── luigi/ (same layout)
77
+ ├── puff/ (same layout)
78
+ ├── ice_climbers/ (same layout)
79
+ └── peach/ (same layout, pre-drop schema; retrain pending)
80
  ```
81
 
82
  Each character directory is self-contained: the JSONs are the exact
83
+ metadata used during training, copied verbatim from the data dir so
84
  any inference script can load them without touching the MIMIC repo.
85
 
86
  ## Usage
87
 
 
 
88
  ```bash
89
  git clone https://github.com/erickfm/MIMIC.git
90
  cd MIMIC
91
  bash setup.sh # installs Dolphin, deps, ISO
92
 
93
+ # Download all characters
94
  python3 -c "
95
  from huggingface_hub import snapshot_download
96
  snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
 
101
 
102
  ```bash
103
  python3 tools/play_vs_cpu.py \
104
+ --checkpoint hf_checkpoints/marth/model.pt \
105
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
106
  --iso-path ./melee.iso \
107
+ --data-dir hf_checkpoints/marth \
108
+ --character MARTH --cpu-character FOX --cpu-level 9 \
109
  --stage FINAL_DESTINATION
110
  ```
111
 
112
+ Or play a bot over Slippi Online Direct Connect:
113
 
114
  ```bash
115
  python3 tools/play_netplay.py \
116
+ --checkpoint hf_checkpoints/sheik/model.pt \
117
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
118
  --iso-path ./melee.iso \
119
+ --data-dir hf_checkpoints/sheik \
120
+ --character SHEIK \
121
  --connect-code YOUR#123
122
  ```
123
 
 
128
  ## Architecture
129
 
130
  ```
131
+ Slippi frame ──► MimicFlatEncoder (Linear 184→512) ──► 512-d per-frame vector
132
+ │
133
+ 180-frame window ──► + Shaw Relative-Position attention ────┘
134
+ │
135
+ 6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads, d_ff=2048, GELU, LN)
136
+ │
137
+ Autoregressive Output Heads (with detach)
138
+ │
139
+ ┌──────────────┼───────────────┬────────────┐
140
+ shoulder(3) c_stick(9) main_stick(37) buttons(7)
141
  ```
142
 
143
  ### 7-class button head
 
152
  | 5 | A_TRIG (shield grab) |
153
  | 6 | NONE |
154
 
155
+ HAL's original 5-class head (A / B / Jump / Z / None) has no TRIG class
156
+ and structurally can't execute airdodge, which means HAL-lineage bots
157
+ can't wavedash. MIMIC's 7-class encoding plus a fix for
158
+ `decode_and_press` (which was silently dropping the digital L press
159
+ until 2026-04-13) is what enables the wavedashing in the replays.
160
+
161
+ ### Input features (per frame, per player)
162
+
163
+ Numeric (13):
164
+
165
+ pos_x, pos_y, percent, stock, jumps_left,
166
+ speed_air_x_self, speed_ground_x_self,
167
+ speed_x_attack, speed_y_attack, speed_y_self,
168
+ hitlag_left, hitstun_left,
169
+ shield_strength
170
+
171
+ Flags (5):
172
+
173
+ on_ground, off_stage, facing, invulnerable, moonwalkwarning
174
+
175
+ Per-feature normalization is defined in each character's
176
+ `mimic_norm.json`. The active transforms are:
177
 
178
+ | transform | formula | used for |
179
+ |---|---|---|
180
+ | `normalize` | `2(x-min)/(max-min) - 1` → [-1, +1] | percent, stock, jumps_left, facing, invulnerable, on_ground |
181
+ | `standardize` | `(x - mean) / std` | pos_x, pos_y |
182
+ | `invert_normalize` | `2(max-x)/(max-min) - 1` | shield_strength (so "shield broken" is +1) |
183
+ | `tanh_scale` | `tanh(x / scale)` | 5 velocities (scale=5 for self, scale=10 for attack) |
184
+ | `linear_max` | `x / max` | hitlag_left (max=20) |
185
+ | `log_max` | `log1p(clamp(x,0,max)) / log1p(max)` | hitstun_left (max=120) |
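
The formulas in the table above can be written out as plain functions. This is a minimal sketch of the math only, not the repo's implementation: the function signatures and parameter names (`lo`, `hi`, `max_value`) are assumptions, and the real parameters come from each character's `mimic_norm.json`.

```python
import math


def normalize(x, lo, hi):
    """Map [lo, hi] linearly onto [-1, +1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def invert_normalize(x, lo, hi):
    """Like normalize, but flipped so the minimum maps to +1."""
    return 2.0 * (hi - x) / (hi - lo) - 1.0


def standardize(x, mean, std):
    """Plain z-score."""
    return (x - mean) / std


def tanh_scale(x, scale):
    """Squash an unbounded value into (-1, +1)."""
    return math.tanh(x / scale)


def linear_max(x, max_value):
    """Divide by a fixed maximum (no clamping in the formula as written)."""
    return x / max_value


def log_max(x, max_value):
    """Clamp to [0, max_value], then log-compress onto [0, 1]."""
    clamped = min(max(x, 0.0), max_value)
    return math.log1p(clamped) / math.log1p(max_value)
```

With the values quoted in the table, `linear_max(10, 20)` gives 0.5 for `hitlag_left`, and `log_max` saturates at 1.0 once `hitstun_left` reaches 120.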
186
 
187
+ Plus categorical embeddings: stage(4d), 2× character(12d),
188
+ 2× action(32d). Plus the previous-frame controller state as a 56-dim
189
+ one-hot (37 stick + 9 c-stick + 7 button + 3 shoulder).
190
 
191
+ Total input per frame: **184 dimensions** → projected to 512.
 
 
192
 
193
+ Earlier builds (pre-2026-04-20) used a 22-col numeric schema that
194
+ included `invuln_left` and 8 ECB corners. Those columns turned out to
195
+ be structurally zero for our .slp parse path (libmelee never
196
+ populates them), so they were dropped from the schema. See research
197
+ notes 2026-04-20 for the audit. Checkpoints trained pre-drop
198
+ (`peach-20260420-baseline`) still load via their own pickled config
199
+ but use the 202-dim projection path.
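
The 184 and 202 figures can be reproduced from the feature counts listed in this section. The sketch below is a bookkeeping check, not code from the repo; `frame_width` is a hypothetical helper, and it assumes the numeric and flag columns are counted once per player (ego and opponent) and that the pre-drop 22-column schema carried the same five flags.

```python
def frame_width(numeric_cols, flag_cols=5, players=2):
    """Per-frame input width implied by the feature list (hypothetical helper)."""
    per_player = numeric_cols + flag_cols     # numeric + flag columns for one player
    categorical = 4 + 2 * 12 + 2 * 32         # stage + 2x character + 2x action embeddings
    controller = 37 + 9 + 7 + 3               # previous-frame controller one-hot (56)
    return players * per_player + categorical + controller


current = frame_width(13)    # 13-column schema (2026-04-20 baseline)
pre_drop = frame_width(22)   # 22-column schema (assumed to use the same 5 flags)
print(current, pre_drop)
```

Under these assumptions the two schemas differ by 18 inputs, which matches dropping 9 numeric columns for each of the two players.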
200
 
201
  ## Training
202
 
203
+ - Model preset: `mimic` (20M params)
204
+ - Optimizer: AdamW, LR 3e-4, weight decay 0.01, **no warmup**
205
+ - LR schedule: `CosineAnnealingLR` to `eta_min=1e-6`
206
  - Gradient clip: 1.0
207
  - Dropout: 0.2
208
+ - Sequence length: **180 frames** (~3 seconds)
209
+ - Batch size: 256 per-GPU × 2 RTX 5090s × grad-accum 1 = **eff-batch 512**
210
  - Mixed precision: BF16 AMP with FP32 upcast for relpos attention
211
+ (prevents BF16 overflow in the manual Q@Kᵀ + S_rel computation)
212
+ - Max samples: 16.78M (≈ 32,768 steps at eff-batch 512)
213
+ - Watchdog: patience=12 evals on val-plateau, so some chars finish early
214
+ - Reaction delay: 0. v2 shards have `target[i] = buttons[i+1]`, so
215
+ `rd=0` matches inference. Do NOT use `--reaction-delay 1` or
216
+ `--controller-offset` with v2 shards.
217
+ - `--self-inputs` is required even on v2 shards. Runs without it
218
+ drop the controller-history input entirely and land at val loss ~2.3.
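
The `target[i] = buttons[i+1]` alignment in the reaction-delay bullet can be pictured with a toy example. This is an illustrative sketch only: `make_pairs` is a hypothetical helper, and it assumes `states` and `buttons` are same-length per-frame lists, which is not how the real shards are stored.

```python
def make_pairs(states, buttons):
    """Pair the state at frame i with the button class at frame i+1."""
    pairs = []
    for i in range(len(buttons) - 1):
        pairs.append((states[i], buttons[i + 1]))
    return pairs


# Three frames of toy data: the final frame has no "next" target, so it is dropped.
pairs = make_pairs(["s0", "s1", "s2"], ["A", "B", "NONE"])
print(pairs)
```

Because the one-frame shift is already baked into the targets, adding a further offset at training time would shift them a second time, which is why the bullet warns against `--reaction-delay 1` and `--controller-offset` on v2 shards.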
219
+
220
+ Typical wall-clock per char on 2×RTX 5090: 10-15 min download/extract
221
+ + 20 min parallel `norm_stats` bootstrap + 45-120 min sharding
222
+ (depending on the character; cptfalcon and sheik take the longest) + ~50 min
223
+ training = 2-4 hours.
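
The training-budget numbers in the list above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below also writes out the closed form of a cosine-annealed learning rate; it assumes that 16.78M means exactly 2**24 samples and that the schedule spans the full 32,768 steps with one scheduler step per optimizer step, neither of which is stated in the card.

```python
import math

EFF_BATCH = 256 * 2 * 1        # per-GPU batch x GPUs x grad-accum
MAX_SAMPLES = 2 ** 24          # assumed exact value behind "16.78M"
TOTAL_STEPS = MAX_SAMPLES // EFF_BATCH
WINDOW_SECONDS = 180 / 60      # 180 frames at 60 Hz


def cosine_lr(step, total_steps=TOTAL_STEPS, base_lr=3e-4, eta_min=1e-6):
    """Closed-form cosine annealing from base_lr down to eta_min."""
    cosine = math.cos(math.pi * step / total_steps)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


print(EFF_BATCH, TOTAL_STEPS, WINDOW_SECONDS)
print(cosine_lr(0), cosine_lr(TOTAL_STEPS // 2), cosine_lr(TOTAL_STEPS))
```

With no warmup, the learning rate starts at its peak of 3e-4 on the first step and decays to `eta_min` by the last one; runs stopped early by the watchdog simply end partway down the curve.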
224
 
225
  ## Known limitations
226
 
227
+ 1. **Character-locked.** Each model only plays the character it was
228
+ trained on. No matchup generalization. Multi-character training
229
+ with a character embedding is a natural next step but not done.
230
+ 2. **Small-dataset overfitting on Luigi / Ice Climbers.** Luigi has
231
+ ~2K training games; IC around 5K. Their `_bestloss.pt` is
232
+ early-stopped, either by the patience=12 watchdog during this
233
+ cycle or by inspection in prior cycles. Play quality varies.
234
+ 3. **Edge guarding and recovery weaknesses.** Bots don't consistently
235
+ go for off-stage edge guards or execute high-skill recovery
236
+ mixups. The training data has these in it, but BC bots under-sample
237
+ long-tail strategic decisions.
238
+ 4. **No Matchmaking / Ranked.** The Discord bot only joins explicit
239
+ Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked
240
+ or Ranked: libmelee's README explicitly forbids bots on those
241
+ ladders, and Slippi has not yet opened a "bot account" opt-in
242
+ system.
243
 
244
  ## Acknowledgments
245
 
246
+ - **Eric Gu** for [HAL](https://github.com/ericyuegu/hal), the
247
+ reference implementation MIMIC is based on. HAL's architecture,
248
+ tokenization, and training pipeline are the foundation.
249
+ - **Vlad Firoiu and collaborators** for
250
+ [libmelee](https://github.com/altf4/libmelee), the Python interface
251
+ to Dolphin + Slippi.
252
  - **Project Slippi** for the Slippi Dolphin fork, replay format, and
253
  Direct Connect rollback netplay. https://slippi.gg
254
 
255
  ## License
256
 
257
+ MIT; see the MIMIC repo's `LICENSE` file.