---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---

# MIMIC: Melee Imitation Model for Input Cloning

Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays. Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a ~20M-parameter transformer that takes a 256-frame window of game state and outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.

- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu, https://github.com/ericyuegu/hal) — 6-layer causal transformer, 512 d_model, 8 heads, 256-frame context, relative position encoding (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (a distinct TRIG class for airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard alignment that fixes a subtle gamestate leak in the training targets (see research notes 2026-04-11c); a fix for the digital L press bug that prevented all 7-class BC bots from wavedashing until 2026-04-13
- **Training data**: filtered from [erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7) (~95K Slippi replays)
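Shape-wise, each model consumes a window of per-frame feature vectors and emits one categorical distribution per controller axis. The sketch below is illustrative only — a single linear layer stands in for the real 6-layer transformer trunk, and only the head sizes, the 166-dim input, and the 256-frame context come from this card:

```python
import torch
import torch.nn as nn

# Stand-in for the transformer trunk: maps each 166-d frame to a 512-d feature.
# (The real model is a 6-layer causal transformer, not a single Linear.)
trunk = nn.Linear(166, 512)

# The four output heads named on this card, one class set each.
heads = nn.ModuleDict({
    "main_stick": nn.Linear(512, 37),  # 37 clustered stick positions
    "c_stick":    nn.Linear(512, 9),
    "shoulder":   nn.Linear(512, 3),
    "buttons":    nn.Linear(512, 7),   # 7-class head, incl. TRIG
})

frames = torch.randn(1, 256, 166)      # (batch, 256-frame window, 166 features)
features = trunk(frames)               # (1, 256, 512)

# Predict controller classes for the newest frame in the window.
logits = {name: head(features[:, -1]) for name, head in heads.items()}
for name, l in logits.items():
    print(name, tuple(l.shape))
```

At 60 Hz inference, this prediction step runs once per game frame, with the window sliding forward by one frame each time.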
## Per-character checkpoints

| Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |

## Repo layout

```
MIMIC/
├── README.md                   # this file
├── fox/
│   ├── model.pt                # raw PyTorch checkpoint
│   ├── config.json             # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json           # provenance (step, val metrics, notes)
│   ├── mimic_norm.json         # normalization stats
│   ├── controller_combos.json  # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```

Each character directory is self-contained — the JSONs are the exact metadata used during training, copied verbatim from the MIMIC data dir, so any inference script can load them without touching the MIMIC repo.

## Usage

Clone the MIMIC repo and pull this model:

```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh  # installs Dolphin, deps, ISO

# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```

Run a character against a level-9 CPU:

```bash
python3 tools/play_vs_cpu.py \
  --checkpoint hf_checkpoints/falco/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/falco \
  --character FALCO --cpu-character FALCO --cpu-level 9 \
  --stage FINAL_DESTINATION
```

Or play the bot over Slippi Online Direct Connect:

```bash
python3 tools/play_netplay.py \
  --checkpoint hf_checkpoints/falco/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/falco \
  --character FALCO \
  --connect-code YOUR#123
```

The MIMIC repo also includes a Discord bot frontend (`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).

## Architecture

```
Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-frame vector
                                                             │
256-frame window ──► + Relative Position Encoding ───────────┘
                              │
      6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                              │
            Autoregressive Output Heads (with detach)
                              │
       ┌─────────────┼──────────────┬──────────────┐
  shoulder(3)   c_stick(9)   main_stick(37)   buttons(7)
```

### 7-class button head

| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |

HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and structurally cannot execute airdodge, which means HAL-lineage bots cannot wavedash. MIMIC's 7-class encoding, plus a fix for `decode_and_press` (which was silently dropping the digital L press until 2026-04-13), is what enables the wavedashing you'll see in the replays.

### Input features

9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground, shield_strength, position_x, position_y`

Plus categorical embeddings: stage (4d), 2× character (12d), 2× action (32d). Plus the previous frame's controller state as a 56-dim one-hot (37 stick + 9 c-stick + 7 button + 3 shoulder).

Total input per frame: 166 dimensions → projected to 512.
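As a sanity check, the per-frame widths listed above do sum to the 166-dim encoder input. A small arithmetic sketch (the grouping comes directly from this section):

```python
# Per-frame input width for the HALFlatEncoder, as described above.
numeric   = 9 * 2           # 9 numeric features x (ego + opponent)
stage     = 4               # stage embedding
character = 12 * 2          # character embedding per player
action    = 32 * 2          # action-state embedding per player
prev_ctrl = 37 + 9 + 7 + 3  # previous-frame controller one-hot (56 dims)

total = numeric + stage + character + action + prev_ctrl
print(total)  # 166
```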
## Training

- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention (prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the default rd=0 matches inference — do NOT use `--reaction-delay 1` or `--controller-offset` with v2 shards)

## Known limitations

1. **Character-locked**: each model only plays the character it was trained on. No matchup generalization. Training a multi-character model with a character embedding is a natural next step but not done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that predates the `--self-inputs` fix. Its val metrics are much lower than the others and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi only has 1,951 training games after filtering. The `_best.pt` checkpoint is early-stopped at step 5,242 to avoid the val-loss climb. Plays surprisingly well for the data volume.
4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked — the libmelee README explicitly forbids bots on those ladders, and Slippi has not yet opened a "bot account" opt-in system.

## Acknowledgments

- **Eric Gu** for HAL, the reference implementation MIMIC is based on. HAL's architecture, tokenization, and training pipeline are the foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface to Dolphin + Slippi.
  https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and Direct Connect rollback netplay. https://slippi.gg

## License

MIT — see the MIMIC repo's LICENSE file.