update card: 8-char retrain on new 13-col schema + principled transforms (2026-04-20)
README.md
CHANGED
|
@@ -11,30 +11,49 @@ library_name: pytorch
|
|
| 11 |
|
| 12 |
# MIMIC: Melee Imitation Model for Input Cloning
|
| 13 |
|
| 14 |
-
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
|
|
|
| 18 |
|
| 19 |
- **Repo**: https://github.com/erickfm/MIMIC
|
| 20 |
-
- **
|
| 21 |
-
https://
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
-
|
| 30 |
-
|
| 31 |
-
(
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
|
| 39 |
## Repo layout
|
| 40 |
|
|
@@ -45,30 +64,33 @@ MIMIC/
|
|
| 45 |
│   ├── model.pt # raw PyTorch checkpoint
|
| 46 |
│   ├── config.json # ModelConfig (copied from ckpt["config"])
|
| 47 |
│   ├── metadata.json # provenance (step, val metrics, notes)
|
| 48 |
-
│   ├── mimic_norm.json #
|
| 49 |
│   ├── controller_combos.json # 7-class button combo spec
|
| 50 |
│   ├── cat_maps.json
|
| 51 |
│   ├── stick_clusters.json
|
| 52 |
-
│   └── norm_stats.json
|
| 53 |
-
├── falco/
|
| 54 |
-
βββ
|
| 55 |
-
|
| 56 |
```
|
| 57 |
|
| 58 |
Each character directory is self-contained: the JSONs are the exact
|
| 59 |
-
metadata used during training, copied verbatim from the
|
| 60 |
any inference script can load them without touching the MIMIC repo.
|
| 61 |
|
| 62 |
## Usage
|
| 63 |
|
| 64 |
-
Clone the MIMIC repo and pull this model:
|
| 65 |
-
|
| 66 |
```bash
|
| 67 |
git clone https://github.com/erickfm/MIMIC.git
|
| 68 |
cd MIMIC
|
| 69 |
bash setup.sh # installs Dolphin, deps, ISO
|
| 70 |
|
| 71 |
-
# Download all
|
| 72 |
python3 -c "
|
| 73 |
from huggingface_hub import snapshot_download
|
| 74 |
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
|
|
@@ -79,23 +101,23 @@ Run a character against a level-9 CPU:
|
|
| 79 |
|
| 80 |
```bash
|
| 81 |
python3 tools/play_vs_cpu.py \
|
| 82 |
-
--checkpoint hf_checkpoints/
|
| 83 |
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
|
| 84 |
--iso-path ./melee.iso \
|
| 85 |
-
--data-dir hf_checkpoints/
|
| 86 |
-
--character
|
| 87 |
--stage FINAL_DESTINATION
|
| 88 |
```
|
| 89 |
|
| 90 |
-
Or play
|
| 91 |
|
| 92 |
```bash
|
| 93 |
python3 tools/play_netplay.py \
|
| 94 |
-
--checkpoint hf_checkpoints/
|
| 95 |
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
|
| 96 |
--iso-path ./melee.iso \
|
| 97 |
-
--data-dir hf_checkpoints/
|
| 98 |
-
--character
|
| 99 |
--connect-code YOUR#123
|
| 100 |
```
|
| 101 |
|
|
@@ -106,16 +128,16 @@ See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/
|
|
| 106 |
## Architecture
|
| 107 |
|
| 108 |
```
|
| 109 |
-
Slippi
|
| 110 |
-
|
| 111 |
-
|
| 112 |
-
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
┌──────────┼──────────┬──────────┐
|
| 118 |
-
|
| 119 |
```
|
| 120 |
|
| 121 |
### 7-class button head
|
|
@@ -130,67 +152,106 @@ Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-fra
|
|
| 130 |
| 5 | A_TRIG (shield grab) |
|
| 131 |
| 6 | NONE |
|
| 132 |
|
| 133 |
-
HAL's original 5-class head (
|
| 134 |
-
structurally
|
| 135 |
-
wavedash. MIMIC's 7-class encoding plus a fix for
|
| 136 |
-
(which was silently dropping the digital L press
|
| 137 |
-
what enables the wavedashing
|
|
| 138 |
|
| 139 |
-
|
| 140 |
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
|
| 145 |
-
|
| 146 |
-
Plus controller state from the previous frame as a 56-dim one-hot
|
| 147 |
-
(37 stick + 9 c-stick + 7 button + 3 shoulder).
|
| 148 |
|
| 149 |
-
|
| 150 |
|
| 151 |
## Training
|
| 152 |
|
| 153 |
-
-
|
| 154 |
-
-
|
|
|
|
| 155 |
- Gradient clip: 1.0
|
| 156 |
- Dropout: 0.2
|
| 157 |
-
- Sequence length:
|
|
|
|
| 158 |
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention
|
| 159 |
-
(prevents BF16 overflow in the manual Q@K
|
| 160 |
-
-
|
| 161 |
-
-
|
| 162 |
-
- Reaction delay: 0
|
| 163 |
-
|
| 164 |
-
`--controller-offset` with v2 shards
|
| 165 |
|
| 166 |
## Known limitations
|
| 167 |
|
| 168 |
-
1. **Character-locked**
|
| 169 |
-
on. No matchup generalization.
|
| 170 |
-
character embedding is a natural next step but not done
|
| 171 |
-
2. **
|
| 172 |
-
|
| 173 |
-
|
| 174 |
-
|
| 175 |
-
|
| 176 |
-
|
| 177 |
-
|
| 178 |
-
|
| 179 |
-
|
| 180 |
-
Connect lobbies. Do NOT adapt it for Slippi Online Unranked
|
| 181 |
-
|
| 182 |
-
Slippi has not yet opened a "bot account" opt-in
|
|
|
|
| 183 |
|
| 184 |
## Acknowledgments
|
| 185 |
|
| 186 |
-
- **Eric Gu** for HAL, the
|
| 187 |
-
|
| 188 |
-
foundation.
|
| 189 |
-
- **Vlad Firoiu and collaborators** for
|
| 190 |
-
|
|
|
|
| 191 |
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
|
| 192 |
Direct Connect rollback netplay. https://slippi.gg
|
| 193 |
|
| 194 |
## License
|
| 195 |
|
| 196 |
-
MIT; see the MIMIC repo's LICENSE file.
|
|
|
|
| 11 |
|
| 12 |
# MIMIC: Melee Imitation Model for Input Cloning
|
| 13 |
|
| 14 |
+
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi
replays. They are eight character-specific ~20M-parameter transformers
that take a 180-frame window of game state and output controller inputs
(main stick, c-stick, shoulder, buttons) at 60 Hz. Each model plays over
Slippi Online Direct Connect through Dolphin + libmelee.
|
| 19 |
|
| 20 |
- **Repo**: https://github.com/erickfm/MIMIC
- **Training data**:
  [erickfm/melee-ranked-replays](https://huggingface.co/datasets/erickfm/melee-ranked-replays),
  ranked Slippi replays (master/diamond/platinum tier) per character.
- **Base architecture**: Shaw-relative-position causal transformer
  (d_model=512, 6 layers, 8 heads, seq_len=180). Bootstrapped from
  [HAL](https://github.com/ericyuegu/hal) (Eric Gu) and since diverged.
- **Defining MIMIC changes over HAL**: 7-class button head with a
  distinct TRIG class for airdodge/wavedash (HAL's 5-class head can't
  represent airdodge and thus can't wavedash); v2 shard alignment that
  fixes a subtle post-frame-gamestate leak in the training targets
  (see `research-notes-2026-04-11c`); and the digital-L-press fix in
  `decode_and_press` (research notes 2026-04-13), without which no
  7-class BC bot wavedashes.
|
| 34 |
+
|
| 35 |
+
## Current checkpoints (retrained on 2026-04-20 baseline)
|
| 36 |
+
|
| 37 |
+
Retrained on the post-drop schema (13 numeric columns) with the new
transforms (`tanh_scale` / `linear_max` / `log_max` for velocity /
hitlag / hitstun). See `research-notes-2026-04-20.md` in the MIMIC repo
for methodology and results analysis.
|
| 41 |
+
|
| 42 |
+
| Character | Run | Train games | Val loss | btn F1 | main F1 | Step |
|---|---|---|---|---|---|---|
| **Fox** | `fox-20260420-baseline` | 31,030 | 0.7144 | ~88% | ~58% | 32768 |
| **Falco** | `falco-20260420-baseline` | 20,882 | 0.7487 | ~88% | ~58% | 31392 |
| **Marth** | `marth-20260420-baseline` | 11,759 | 0.6664 | ~89% | ~58% | 31065 |
| **Sheik** | `sheik-20260420-baseline` | 51,751 | 0.6566 | ~90% | ~60% | 26160 |
| Captain Falcon | `cptfalcon-20260420-baseline` | 17,557 | _(training)_ | - | - | - |
| Luigi | `luigi-20260420-baseline` | _(queued)_ | - | - | - | - |
| Jigglypuff | `puff-20260420-baseline` | _(queued)_ | - | - | - | - |
| Ice Climbers | `ice_climbers-20260420-baseline` | _(queued)_ | - | - | - | - |
|
| 52 |
+
|
| 53 |
+
**Peach** is present in the repo (`peach-20260420-baseline`,
val 0.6322) but was trained on 2026-04-19 with the pre-drop 22-column
schema. Peach will be retrained alongside the rest in a follow-on cycle
so all characters sit on exactly the same basis.
|
| 57 |
|
| 58 |
## Repo layout
|
| 59 |
|
|
|
|
| 64 |
│   ├── model.pt                 # raw PyTorch checkpoint
│   ├── config.json              # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json            # provenance (step, val metrics, notes)
│   ├── mimic_norm.json          # per-feature transforms + params
│   ├── controller_combos.json   # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json          # per-column mean/std (z-score fallback)
├── falco/         (same layout)
├── marth/         (same layout)
├── sheik/         (same layout)
├── cptfalcon/     (same layout)
├── luigi/         (same layout)
├── puff/          (same layout)
├── ice_climbers/  (same layout)
└── peach/         (same layout, pre-drop schema; retrain pending)
|
| 80 |
```
|
| 81 |
|
| 82 |
Each character directory is self-contained: the JSONs are the exact
metadata used during training, copied verbatim from the data dir so
any inference script can load them without touching the MIMIC repo.
|
| 85 |
|
| 86 |
## Usage
|
| 87 |
|
|
|
|
|
|
|
| 88 |
```bash
|
| 89 |
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh # installs Dolphin, deps, ISO

# Download all characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
|
|
|
|
| 101 |
|
| 102 |
```bash
|
| 103 |
python3 tools/play_vs_cpu.py \
  --checkpoint hf_checkpoints/marth/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/marth \
  --character MARTH --cpu-character FOX --cpu-level 9 \
  --stage FINAL_DESTINATION
|
| 110 |
```
|
| 111 |
|
| 112 |
+
Or play a bot over Slippi Online Direct Connect:
|
| 113 |
|
| 114 |
```bash
|
| 115 |
python3 tools/play_netplay.py \
  --checkpoint hf_checkpoints/sheik/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/sheik \
  --character SHEIK \
  --connect-code YOUR#123
|
| 122 |
```
|
| 123 |
|
|
|
|
| 128 |
## Architecture
|
| 129 |
|
| 130 |
```
|
| 131 |
+
Slippi frame ──► MimicFlatEncoder (Linear 184→512) ──► 512-d per-frame vector
                                                              │
180-frame window ──► + Shaw Relative-Position attention ──────┘
                              │
6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads, d_ff=2048, GELU, LN)
                              │
          Autoregressive Output Heads (with detach)
                              │
      ┌──────────────┼───────────────┬────────────┐
 shoulder(3)     c_stick(9)    main_stick(37)  buttons(7)
|
| 141 |
```
|
| 142 |
|
| 143 |
### 7-class button head
|
|
|
|
| 152 |
| 5 | A_TRIG (shield grab) |
|
| 153 |
| 6 | NONE |
|
| 154 |
|
| 155 |
+
HAL's original 5-class head (A / B / Jump / Z / None) has no TRIG class
and structurally can't execute airdodge, which means HAL-lineage bots
can't wavedash. MIMIC's 7-class encoding plus a fix for
`decode_and_press` (which was silently dropping the digital L press
until 2026-04-13) is what enables the wavedashing in the replays.
|
| 160 |
+
|
| 161 |
+
### Input features (per frame, per player)
|
| 162 |
+
|
| 163 |
+
Numeric (13):

pos_x, pos_y, percent, stock, jumps_left,
speed_air_x_self, speed_ground_x_self,
speed_x_attack, speed_y_attack, speed_y_self,
hitlag_left, hitstun_left,
shield_strength

Flags (5):

on_ground, off_stage, facing, invulnerable, moonwalkwarning

Per-feature normalization is defined in each character's
`mimic_norm.json`. The active transforms are:
|
| 177 |
|
| 178 |
+
| transform | formula | used for |
|---|---|---|
| `normalize` | `2(x-min)/(max-min) - 1` → [-1, +1] | percent, stock, jumps_left, facing, invulnerable, on_ground |
| `standardize` | `(x - mean) / std` | pos_x, pos_y |
| `invert_normalize` | `2(max-x)/(max-min) - 1` | shield_strength (so "shield broken" is +1) |
| `tanh_scale` | `tanh(x / scale)` | 5 velocities (scale=5 for self, scale=10 for attack) |
| `linear_max` | `x / max` | hitlag_left (max=20) |
| `log_max` | `log1p(clamp(x,0,max)) / log1p(max)` | hitstun_left (max=120) |
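
The formulas in the table can be written out as plain scalar functions. This is a sketch that follows the table's formulas only: the function signatures and parameter names are assumptions, and the way the MIMIC repo reads the actual parameters from `mimic_norm.json` is not shown here.

```python
import math


def normalize(x, lo, hi):
    """2(x-min)/(max-min) - 1: maps [lo, hi] onto [-1, +1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def standardize(x, mean, std):
    """(x - mean) / std: z-score."""
    return (x - mean) / std


def invert_normalize(x, lo, hi):
    """2(max-x)/(max-min) - 1: like normalize, but x == lo maps to +1."""
    return 2.0 * (hi - x) / (hi - lo) - 1.0


def tanh_scale(x, scale):
    """tanh(x / scale): squashes unbounded velocities into (-1, +1)."""
    return math.tanh(x / scale)


def linear_max(x, max_value):
    """x / max: linear rescale by a fixed maximum."""
    return x / max_value


def log_max(x, max_value):
    """log1p(clamp(x, 0, max)) / log1p(max): compresses large counts into [0, 1]."""
    clamped = min(max(x, 0.0), max_value)
    return math.log1p(clamped) / math.log1p(max_value)
```

With the parameters quoted in the table, `linear_max(10, 20)` gives 0.5 for a hitlag of 10 frames, and `log_max` saturates at 1.0 for any hitstun of 120 frames or more.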
|
| 186 |
|
| 187 |
+
Plus categorical embeddings: stage(4d), 2× character(12d),
2× action(32d). Plus the previous-frame controller state as a 56-dim
one-hot (37 stick + 9 c-stick + 7 button + 3 shoulder).
|
| 190 |
|
| 191 |
+
Total input per frame: **184 dimensions**, projected to 512.
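
One way to check the 184 figure is to add up the pieces listed above. The breakdown below is an inference that happens to reproduce the stated totals, assuming numeric and flag features are included for both players and all embeddings are concatenated; the constant and function names are illustrative, not taken from the MIMIC code.

```python
FLAGS_PER_PLAYER = 5          # on_ground, off_stage, facing, invulnerable, moonwalkwarning
PLAYERS = 2                   # assumption: features for both players are concatenated
STAGE_EMB = 4                 # stage(4d)
CHAR_EMB = 12                 # 2x character(12d)
ACTION_EMB = 32               # 2x action(32d)
CONTROLLER_ONE_HOT = 37 + 9 + 7 + 3   # previous-frame controller state, 56 dims


def input_dim(numeric_per_player):
    """Per-frame input width for a given number of numeric columns per player."""
    per_player = numeric_per_player + FLAGS_PER_PLAYER
    categorical = STAGE_EMB + PLAYERS * CHAR_EMB + PLAYERS * ACTION_EMB
    return PLAYERS * per_player + categorical + CONTROLLER_ONE_HOT


print(input_dim(13))  # current 13-column schema
print(input_dim(22))  # earlier 22-column schema
```

Under these assumptions the 13-column schema gives 184 and the earlier 22-column schema gives 202, matching the two projection widths this card quotes.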
|
|
|
|
|
|
|
| 192 |
|
| 193 |
+
Earlier builds (pre-2026-04-20) used a 22-col numeric schema that
included `invuln_left` and 8 ECB corners. Those columns turned out to
be structurally zero for our .slp parse path (libmelee never populates
them), so they were dropped from the schema. See research notes
2026-04-20 for the audit. Checkpoints trained pre-drop
(`peach-20260420-baseline`) still load via their own pickled config
but use the 202-dim projection path.
|
| 200 |
|
| 201 |
## Training
|
| 202 |
|
| 203 |
+
- Model preset: `mimic` (20M params)
- Optimizer: AdamW, LR 3e-4, weight decay 0.01, **no warmup**
- LR schedule: `CosineAnnealingLR` to `eta_min=1e-6`
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: **180 frames** (~3 seconds)
- Batch size: 256 per-GPU × 2 RTX 5090s × grad-accum 1 = **eff-batch 512**
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention
  (prevents BF16 overflow in the manual Q@Kᵀ + S_rel computation)
- Max samples: 16.78M (≈ 32,768 steps at eff-batch 512)
- Watchdog: patience=12 evals on val-plateau, so some chars finish early
- Reaction delay: 0. v2 shards have `target[i] = buttons[i+1]`, so
  `rd=0` matches inference. Do NOT use `--reaction-delay 1` or
  `--controller-offset` with v2 shards.
- `--self-inputs` is required even on v2 shards. Runs without it
  drop the controller-history input entirely and land at val loss ~2.3.
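
The batch, step, and learning-rate numbers in the list above fit together as follows. This is a sketch, not the training script: it assumes "16.78M" means exactly 2^24 samples and that the cosine schedule spans the full step budget (`t_max` equal to the total step count), neither of which is stated in this card. The closed form is the standard cosine-annealing formula without restarts.

```python
import math

BASE_LR = 3e-4
ETA_MIN = 1e-6
PER_GPU_BATCH = 256
NUM_GPUS = 2
GRAD_ACCUM = 1
MAX_SAMPLES = 2 ** 24  # assumption: "16.78M" is exactly 16,777,216

EFF_BATCH = PER_GPU_BATCH * NUM_GPUS * GRAD_ACCUM
TOTAL_STEPS = MAX_SAMPLES // EFF_BATCH


def cosine_lr(step, t_max=TOTAL_STEPS):
    """Cosine annealing from BASE_LR at step 0 down to ETA_MIN at step t_max."""
    return ETA_MIN + 0.5 * (BASE_LR - ETA_MIN) * (1.0 + math.cos(math.pi * step / t_max))


print(EFF_BATCH, TOTAL_STEPS)
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, cosine_lr(step))
```

Because there is no warmup, the first optimizer step already runs at the full 3e-4. A run stopped early by the watchdog simply ends partway down the curve.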
|
| 219 |
+
|
| 220 |
+
Typical wall-clock per char on 2×RTX 5090: 10-15 min download/extract
+ 20 min parallel `norm_stats` bootstrap + 45-120 min sharding
(depending on char; cptfalcon and sheik are the longest) + ~50 min
training = 2-4 hours.
|
| 224 |
|
| 225 |
## Known limitations
|
| 226 |
|
| 227 |
+
1. **Character-locked.** Each model only plays the character it was
   trained on. No matchup generalization. Multi-character training
   with a character embedding is a natural next step but not done.
2. **Small-dataset overfitting on Luigi / Ice Climbers.** Luigi has
   ~2K training games; IC around 5K. Their `_bestloss.pt` is
   early-stopped, either by the patience=12 watchdog during this
   cycle or by inspection in prior cycles. Play quality varies.
3. **Edge guarding and recovery weaknesses.** Bots don't consistently
   go for off-stage edge guards or execute high-skill recovery
   mixups. The training data has these in it, but BC bots under-sample
   long-tail strategic decisions.
4. **No Matchmaking / Ranked.** The Discord bot only joins explicit
   Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked
   or Ranked: libmelee's README explicitly forbids bots on those
   ladders, and Slippi has not yet opened a "bot account" opt-in
   system.
|
| 243 |
|
| 244 |
## Acknowledgments
|
| 245 |
|
| 246 |
+
- **Eric Gu** for [HAL](https://github.com/ericyuegu/hal), the
  reference implementation MIMIC is based on. HAL's architecture,
  tokenization, and training pipeline are the foundation.
- **Vlad Firoiu and collaborators** for
  [libmelee](https://github.com/altf4/libmelee), the Python interface
  to Dolphin + Slippi.
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
  Direct Connect rollback netplay. https://slippi.gg
|
| 254 |
|
| 255 |
## License
|
| 256 |
|
| 257 |
+
MIT; see the MIMIC repo's `LICENSE` file.
|