---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---
# MIMIC: Melee Imitation Model for Input Cloning
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
~20M-parameter transformer that takes a 256-frame window of game state and
outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.
- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu,
  https://github.com/ericyuegu/hal), a 6-layer causal transformer with
  512 d_model, 8 heads, 256-frame context, and relative position encoding
  (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (distinct TRIG class for
airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
alignment that fixes a subtle gamestate leak in the training targets
(see research notes 2026-04-11c); fix for the digital L press bug that
prevented all 7-class BC bots from wavedashing until 2026-04-13.
- **Training data**: filtered from
[erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
(~95K Slippi replays).
## Per-character checkpoints
| Character | Games | Val button F1 | Val main-stick F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |
## Repo layout
```
MIMIC/
├── README.md                  # this file
├── fox/
│   ├── model.pt               # raw PyTorch checkpoint
│   ├── config.json            # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json          # provenance (step, val metrics, notes)
│   ├── mimic_norm.json        # normalization stats
│   ├── controller_combos.json # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```
Each character directory is self-contained: the JSONs are the exact
metadata used during training, copied verbatim from the MIMIC data dir, so
any inference script can load them without touching the MIMIC repo.
## Usage
Clone the MIMIC repo and pull this model:
```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh # installs Dolphin, deps, ISO
# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```
Run a character against a level-9 CPU:
```bash
python3 tools/play_vs_cpu.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO --cpu-character FALCO --cpu-level 9 \
--stage FINAL_DESTINATION
```
Or play the bot over Slippi Online Direct Connect:
```bash
python3 tools/play_netplay.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO \
--connect-code YOUR#123
```
The MIMIC repo also includes a Discord bot frontend
(`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).
## Architecture
```
Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-frame vector
                                                              │
256-frame window ──► + Relative Position Encoding ────────────┘
                              │
       6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                              │
            Autoregressive Output Heads (with detach)
                              │
        ┌─────────────┼──────────────┬───────────────┐
   shoulder(3)   c_stick(9)   main_stick(37)    buttons(7)
```
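The relative position encoding (Shaw et al.) keys a learned bias by the clipped pairwise distance between frames rather than by absolute position. A minimal sketch of the index computation, in pure Python for clarity; the clip distance of 2 in the example is illustrative, and the actual clip value used by HAL/MIMIC is not stated here:

```python
def relative_position_bucket(seq_len: int, max_dist: int):
    """Pairwise relative distances j - i, clipped to [-max_dist, max_dist],
    then shifted to [0, 2 * max_dist] so they can index an embedding table."""
    buckets = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            d = max(-max_dist, min(max_dist, j - i))  # clip the distance
            row.append(d + max_dist)                   # shift to non-negative
        buckets.append(row)
    return buckets

# Example: a 4-frame window with clip distance 2
m = relative_position_bucket(4, 2)
# m[0] == [2, 3, 4, 4] -- distances 0, 1, 2, 3 (last one clipped to 2)
```

Because every query-key pair at the same offset shares one bucket, the bias table stays small regardless of sequence length.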
### 7-class button head
| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |
HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
structurally cannot execute airdodge, which means HAL-lineage bots cannot
wavedash. MIMIC's 7-class encoding plus a fix for `decode_and_press`
(which was silently dropping the digital L press until 2026-04-13) is
what enables the wavedashing you'll see in the replays.
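The table above amounts to a single-label mapping from raw digital button state to one of seven classes. This is an illustrative sketch only, not the repo's actual `decode_and_press` logic; in particular, the priority order for simultaneous presses is an assumption:

```python
# Class ids matching the table above
A, B, Z, JUMP, TRIG, A_TRIG, NONE = range(7)

def encode_buttons(a=False, b=False, z=False, x=False, y=False,
                   l=False, r=False) -> int:
    """Map one frame of digital button state to one of the 7 classes.
    Priority for simultaneous presses is an illustrative assumption."""
    trig = l or r                # digital L or R
    if a and trig:
        return A_TRIG            # shield grab (A + digital trigger)
    if trig:
        return TRIG              # airdodge / wavedash input
    if z:
        return Z
    if x or y:
        return JUMP              # X and Y share one jump class
    if b:
        return B
    if a:
        return A
    return NONE

encode_buttons(l=True)           # TRIG -- the class HAL's 5-class head lacks
encode_buttons(a=True, r=True)   # A_TRIG (shield grab)
```

Under a 5-class scheme, both calls above would have to collapse into `NONE` or `A`, which is exactly why HAL-lineage bots cannot airdodge.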
### Input features
9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground,
shield_strength, position_x, position_y`
Plus categorical embeddings: stage (4d), 2× character (12d), 2× action (32d).
Plus controller state from the previous frame as a 56-dim one-hot
(37 stick + 9 c-stick + 7 button + 3 shoulder).
Total input per frame: 166 dimensions → projected to 512.
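The 166-dim total follows directly from the feature list above; a quick sanity check:

```python
numeric = 9 * 2                      # 9 numeric features x (ego + opponent)
categorical = 4 + 2 * 12 + 2 * 32    # stage + 2x character + 2x action embeddings
prev_controller = 37 + 9 + 7 + 3     # main stick + c-stick + buttons + shoulder one-hots
total = numeric + categorical + prev_controller
print(total)  # 166
```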
## Training
- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention
(prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
default rd=0 matches inference β€” do NOT use `--reaction-delay 1` or
`--controller-offset` with v2 shards)
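The v2 alignment in the last bullet (target[i] = buttons[i+1]) can be sketched on a toy button sequence. Because the model sees game state through frame i and is trained to predict the controls for frame i+1, a reaction delay of 0 already matches inference; the class ids below are only examples:

```python
def make_targets(buttons):
    """v2-style shard alignment: the target at frame i is the button class
    pressed on frame i + 1. The final frame has no successor, so the
    usable training sequence is one frame shorter than the input."""
    return buttons[1:]

buttons = [6, 6, 3, 4, 6]           # e.g. NONE, NONE, JUMP, TRIG, NONE
targets = make_targets(buttons)     # [6, 3, 4, 6]
# Input at frame i:           buttons[: i + 1]
# Training target at frame i: targets[i] == buttons[i + 1]
```

Adding `--reaction-delay 1` or `--controller-offset` on top of this would shift targets a second time and misalign training against inference.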
## Known limitations
1. **Character-locked**: each model only plays the character it was trained
on. No matchup generalization. Training a multi-character model with a
character embedding is a natural next step but not done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
predates the `--self-inputs` fix. Its val metrics are much lower than
the others and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi only has 1951 training games after
filtering. The `_best.pt` checkpoint is early-stopped at step 5242 to
avoid the val-loss climb. Plays surprisingly well for the data volume.
4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently
go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
   Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked:
   the libmelee README explicitly forbids bots on those ladders, and
   Slippi has not yet opened a "bot account" opt-in system.
## Acknowledgments
- **Eric Gu** for HAL, the reference implementation MIMIC is based on.
HAL's architecture, tokenization, and training pipeline are the
foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface
to Dolphin + Slippi. https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
Direct Connect rollback netplay. https://slippi.gg
## License
MIT; see the MIMIC repo's LICENSE file.