---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---

# MIMIC: Melee Imitation Model for Input Cloning

Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
~20M-parameter transformer that takes a 256-frame window of game state and
outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.

- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu,
  https://github.com/ericyuegu/hal): a 6-layer causal transformer,
  512 d_model, 8 heads, 256-frame context, relative position encoding
  (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (a distinct TRIG class for
  airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
  alignment that fixes a subtle gamestate leak in the training targets
  (see research notes 2026-04-11c); a fix for the digital L press bug that
  prevented all 7-class BC bots from wavedashing until 2026-04-13.
- **Training data**: filtered from
  [erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
  (~95K Slippi replays).

## Per-character checkpoints

| Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |

## Repo layout

```
MIMIC/
├── README.md                   # this file
├── fox/
│   ├── model.pt                # raw PyTorch checkpoint
│   ├── config.json             # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json           # provenance (step, val metrics, notes)
│   ├── mimic_norm.json         # normalization stats
│   ├── controller_combos.json  # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```

Each character directory is self-contained: the JSONs are the exact
metadata used during training, copied verbatim from the MIMIC data dir, so
any inference script can load them without touching the MIMIC repo.
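As a minimal sketch of what "self-contained" means in practice, the JSON assets can be read with nothing but the standard library. The helper name and the graceful handling of missing files are illustrative, not part of the MIMIC repo:

```python
import json
from pathlib import Path

# JSON asset names, taken from the repo layout above.
ASSET_FILES = [
    "config.json", "metadata.json", "mimic_norm.json",
    "controller_combos.json", "cat_maps.json",
    "stick_clusters.json", "norm_stats.json",
]

def load_character_assets(char_dir):
    """Read every per-character JSON asset into one dict, keyed by stem."""
    char_dir = Path(char_dir)
    assets = {}
    for name in ASSET_FILES:
        path = char_dir / name
        if path.exists():  # tolerate a partial directory
            assets[path.stem] = json.loads(path.read_text())
    return assets

# The raw checkpoint itself would then be loaded separately, e.g.
#   state = torch.load(char_dir / "model.pt", map_location="cpu")
```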

## Usage

Clone the MIMIC repo and pull this model:

```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh  # installs Dolphin, deps, ISO

# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```

Run a character against a level-9 CPU:

```bash
python3 tools/play_vs_cpu.py \
  --checkpoint hf_checkpoints/falco/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/falco \
  --character FALCO --cpu-character FALCO --cpu-level 9 \
  --stage FINAL_DESTINATION
```

Or play the bot over Slippi Online Direct Connect:

```bash
python3 tools/play_netplay.py \
  --checkpoint hf_checkpoints/falco/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/falco \
  --character FALCO \
  --connect-code YOUR#123
```

The MIMIC repo also includes a Discord bot frontend
(`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).
## Architecture

```
Slippi Frame ───► HALFlatEncoder (Linear 166→512) ───► 512-d per-frame vector
                                                               │
256-frame window ───► + Relative Position Encoding ◄───────────┘
                              │
        6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                              │
            Autoregressive Output Heads (with detach)
                              │
       ┌──────────────────────┼──────────────┬──────────────┐
  shoulder(3)           c_stick(9)    main_stick(37)    buttons(7)
```

### 7-class button head

| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |

HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
structurally cannot execute airdodge, which means HAL-lineage bots cannot
wavedash. MIMIC's 7-class encoding, plus a fix for `decode_and_press`
(which was silently dropping the digital L press until 2026-04-13), is
what enables the wavedashing you'll see in the replays.
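The class table can be written down as a small lookup. This is an illustrative re-implementation, not MIMIC's actual encoder, and the priority order when several buttons are held at once is an assumption:

```python
# Illustrative 7-class button encoder matching the table above.
A, B, Z, JUMP, TRIG, A_TRIG, NONE = range(7)

def encode_buttons(a=False, b=False, z=False, x=False, y=False,
                   l_digital=False, r_digital=False):
    """Map the physical buttons held on a frame to one of the 7 classes."""
    trig = l_digital or r_digital   # digital L or R: airdodge / wavedash
    if a and trig:
        return A_TRIG               # shield grab
    if trig:
        return TRIG
    if x or y:
        return JUMP                 # either jump button
    if z:
        return Z
    if b:
        return B
    if a:
        return A
    return NONE
```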

### Input features

9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground,
shield_strength, position_x, position_y`

Plus categorical embeddings: stage (4-d), 2× character (12-d), 2× action (32-d).
Plus the previous frame's controller state as a 56-dim one-hot
(37 stick + 9 c-stick + 7 button + 3 shoulder).

Total input per frame: 166 dimensions, projected to 512.
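The 166 falls straight out of the feature list above; assuming the stated widths, the arithmetic checks out:

```python
# Per-frame input width, from the feature breakdown above.
numeric   = 9 * 2            # 9 numeric features x (ego + opponent)
stage     = 4                # stage embedding
character = 12 * 2           # per-player character embeddings
action    = 32 * 2           # per-player action-state embeddings
prev_ctrl = 37 + 9 + 7 + 3   # previous-frame controller one-hots

total = numeric + stage + character + action + prev_ctrl
print(total)  # -> 166, the HALFlatEncoder input size
```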

## Training

- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with an FP32 upcast for relpos attention
  (prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
  default rd=0 matches inference; do NOT use `--reaction-delay 1` or
  `--controller-offset` with v2 shards)
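These hyperparameters translate to a few lines of stock PyTorch. This is a sketch under stated assumptions: the model and loss are stand-ins, `T_max` for the cosine schedule is assumed to equal the ~32K total steps, and autocast is shown on CPU so the snippet runs anywhere (it would be `device_type="cuda"` in practice):

```python
import torch

model = torch.nn.Linear(166, 512)  # stand-in for the real ~20M-param model
TOTAL_STEPS = 32_000               # assumption: T_max = "~32K" total steps

opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(
    opt, T_max=TOTAL_STEPS, eta_min=1e-6)  # no warmup

def train_step(batch_x, batch_y, loss_fn):
    opt.zero_grad(set_to_none=True)
    # BF16 autocast for the forward pass; inside the real model, the relpos
    # attention upcasts Q@K^T + Srel to FP32 to avoid BF16 overflow.
    with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
        loss = loss_fn(model(batch_x), batch_y)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # grad clip 1.0
    opt.step()
    sched.step()
    return loss.detach()
```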

## Known limitations

1. **Character-locked**: each model only plays the character it was trained
   on, with no matchup generalization. Training a multi-character model with
   a character embedding is a natural next step but has not been done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
   predates the `--self-inputs` fix. Its val metrics are much lower than
   the others' and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi has only 1,951 training games after
   filtering. The `_best.pt` checkpoint is early-stopped at step 5,242 to
   avoid the val-loss climb. It plays surprisingly well for the data volume.
4. **Edge-guarding and recovery weaknesses**: the bot doesn't consistently
   go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
   Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked:
   the libmelee README explicitly forbids bots on those ladders, and
   Slippi has not yet opened a "bot account" opt-in system.

## Acknowledgments

- **Eric Gu** for HAL, the reference implementation MIMIC is based on.
  HAL's architecture, tokenization, and training pipeline are the
  foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface
  to Dolphin + Slippi. https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
  Direct Connect rollback netplay. https://slippi.gg

## License

MIT; see the MIMIC repo's LICENSE file.
|