---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---
# MIMIC: Melee Imitation Model for Input Cloning
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
~20M-parameter transformer that takes a 256-frame window of game state and
outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.
- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu,
  https://github.com/ericyuegu/hal): a 6-layer causal transformer,
  512 d_model, 8 heads, 256-frame context, relative position encoding
  (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (distinct TRIG class for
airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
alignment that fixes a subtle gamestate leak in the training targets
(see research notes 2026-04-11c); fix for the digital L press bug that
prevented all 7-class BC bots from wavedashing until 2026-04-13.
- **Training data**: filtered from
[erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
(~95K Slippi replays).
## Per-character checkpoints
| Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |
## Repo layout
```
MIMIC/
├── README.md                   # this file
├── fox/
│   ├── model.pt                # raw PyTorch checkpoint
│   ├── config.json             # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json           # provenance (step, val metrics, notes)
│   ├── mimic_norm.json         # normalization stats
│   ├── controller_combos.json  # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```
Each character directory is self-contained: the JSONs are the exact
metadata used during training, copied verbatim from the MIMIC data dir, so
any inference script can load them without touching the MIMIC repo.
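Since the sidecar files are plain JSON, loading them takes only the standard library. The helper below is an illustrative sketch (not a function from the MIMIC repo) that reads every JSON in one character directory into a dict keyed by file stem:

```python
# Hypothetical convenience loader: read every sidecar JSON in one
# character directory (e.g. hf_checkpoints/fox) into a dict keyed by
# file stem ("config", "metadata", "mimic_norm", ...).
import json
from pathlib import Path

def load_character_assets(char_dir):
    """Load all *.json sidecar files from a character directory."""
    d = Path(char_dir)
    return {p.stem: json.loads(p.read_text()) for p in d.glob("*.json")}
```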
## Usage
Clone the MIMIC repo and pull this model:
```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh # installs Dolphin, deps, ISO
# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```
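If you only need one character, `snapshot_download` also accepts an `allow_patterns` argument, so you can fetch a single directory instead of the whole repo. The pattern helper below is illustrative; `allow_patterns` itself is a real `huggingface_hub` parameter:

```python
# Fetch only one character's directory instead of the full snapshot.
# character_patterns is an illustrative helper, not part of MIMIC.
def character_patterns(character):
    """Glob pattern matching one character directory in the repo."""
    return [f"{character}/*"]

# Example call (commented out to avoid a network fetch here):
# from huggingface_hub import snapshot_download
# snapshot_download("erickfm/MIMIC", local_dir="./hf_checkpoints",
#                   allow_patterns=character_patterns("falco"))
```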
Run a character against a level-9 CPU:
```bash
python3 tools/play_vs_cpu.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO --cpu-character FALCO --cpu-level 9 \
--stage FINAL_DESTINATION
```
Or play the bot over Slippi Online Direct Connect:
```bash
python3 tools/play_netplay.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO \
--connect-code YOUR#123
```
The MIMIC repo also includes a Discord bot frontend
(`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).
## Architecture
```
Slippi Frame ──▶ HALFlatEncoder (Linear 166→512) ──▶ 512-d per-frame vector
                                │
256-frame window ──▶ + Relative Position Encoding
                                │
      6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                                │
           Autoregressive Output Heads (with detach)
                                │
        ┌───────────┼───────────┬───────────┐
  shoulder(3)  c_stick(9)  main_stick(37)  buttons(7)
```
### 7-class button head
| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |
HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
structurally cannot execute airdodge, which means HAL-lineage bots cannot
wavedash. MIMIC's 7-class encoding plus a fix for `decode_and_press`
(which was silently dropping the digital L press until 2026-04-13) is
what enables the wavedashing you'll see in the replays.
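As an illustration of why the TRIG class matters, the 7-class decode can be sketched as a class-to-button lookup. This is a hypothetical reconstruction, not the repo's actual `decode_and_press`:

```python
# Hypothetical sketch of decoding the 7-class button head into digital
# presses. Names and the choice of X over Y / L over R are illustrative.
BUTTON_CLASSES = {
    0: {"A"},
    1: {"B"},
    2: {"Z"},
    3: {"X"},       # JUMP: X or Y; one physical button suffices
    4: {"L"},       # TRIG: digital L or R -- the class that enables airdodge
    5: {"A", "L"},  # A_TRIG: shield grab
    6: set(),       # NONE
}

def decode_buttons(cls):
    """Map a predicted button class to the set of buttons to hold."""
    return BUTTON_CLASSES[cls]
```

A 5-class head has no entry for class 4, so no sequence of its outputs can produce the shield-plus-airdodge press a wavedash requires.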
### Input features
9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground,
shield_strength, position_x, position_y`
Plus categorical embeddings: stage(4d), 2× character(12d), 2× action(32d).
Plus controller state from the previous frame as a 56-dim one-hot
(37 stick + 9 c-stick + 7 button + 3 shoulder).
Total input per frame: 166 dimensions β projected to 512.
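The 166-dim total follows from summing the groups above:

```python
# Sanity check of the per-frame input width described above.
numeric = 9 * 2                    # 9 numeric features x 2 players = 18
categorical = 4 + 2 * 12 + 2 * 32  # stage + character + action embeds = 92
prev_controller = 37 + 9 + 7 + 3   # previous-frame controller one-hot = 56
total = numeric + categorical + prev_controller
print(total)  # 166
```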
## Training
- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention
(prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
  default rd=0 matches inference; do NOT use `--reaction-delay 1` or
  `--controller-offset` with v2 shards)
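The v2 alignment convention above (target[i] = buttons[i+1]) amounts to a plain next-frame shift. The sketch below is an illustrative reconstruction, not the repo's shard-writing code:

```python
# Illustrative sketch of v2-style next-frame target alignment: the model
# sees gamestate up to frame i and is trained to predict the buttons
# pressed on frame i+1, so inputs and targets are offset by one frame.
def align_v2(gamestates, buttons):
    """Pair each frame's state with the *next* frame's buttons."""
    assert len(gamestates) == len(buttons)
    inputs = gamestates[:-1]   # frames 0 .. T-2
    targets = buttons[1:]      # buttons 1 .. T-1
    return list(zip(inputs, targets))

pairs = align_v2(["s0", "s1", "s2"], ["b0", "b1", "b2"])
print(pairs)  # [('s0', 'b1'), ('s1', 'b2')]
```

Misaligning this by one frame in either direction would let the target leak the current frame's gamestate, which is the bug the v2 shards fix.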
## Known limitations
1. **Character-locked**: each model only plays the character it was trained
on. No matchup generalization. Training a multi-character model with a
character embedding is a natural next step but not done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
predates the `--self-inputs` fix. Its val metrics are much lower than
the others and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi only has 1951 training games after
filtering. The `_best.pt` checkpoint is early-stopped at step 5242 to
avoid the val-loss climb. Plays surprisingly well for the data volume.
4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently
go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
   Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked;
   the libmelee README explicitly forbids bots on those ladders, and
   Slippi has not yet opened a "bot account" opt-in system.
## Acknowledgments
- **Eric Gu** for HAL, the reference implementation MIMIC is based on.
HAL's architecture, tokenization, and training pipeline are the
foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface
to Dolphin + Slippi. https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
Direct Connect rollback netplay. https://slippi.gg
## License
MIT; see the MIMIC repo's LICENSE file.