---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---

# MIMIC: Melee Imitation Model for Input Cloning

Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays. Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a ~20M-parameter transformer that takes a 256-frame window of game state and outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.

- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu, https://github.com/ericyuegu/hal) — 6-layer causal transformer, 512 d_model, 8 heads, 256-frame context, relative position encoding (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (a distinct TRIG class for airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard alignment that fixes a subtle gamestate leak in the training targets (see research notes 2026-04-11c); a fix for the digital L press bug that prevented all 7-class BC bots from wavedashing until 2026-04-13
- **Training data**: filtered from [erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7) (~95K Slippi replays)
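Shape-wise, each model consumes a window of per-frame feature vectors and emits one categorical distribution per controller axis. The sketch below is illustrative only — a single linear layer stands in for the real 6-layer transformer trunk, and only the head sizes, the 166-dim input, and the 256-frame context come from this card:

```python
import torch
import torch.nn as nn

# Stand-in for the transformer trunk: maps each 166-d frame to a 512-d feature.
# (The real model is a 6-layer causal transformer, not a single Linear.)
trunk = nn.Linear(166, 512)

# The four output heads named on this card, one class set each.
heads = nn.ModuleDict({
    "main_stick": nn.Linear(512, 37),  # 37 clustered stick positions
    "c_stick":    nn.Linear(512, 9),
    "shoulder":   nn.Linear(512, 3),
    "buttons":    nn.Linear(512, 7),   # 7-class head, incl. TRIG
})

frames = torch.randn(1, 256, 166)      # (batch, 256-frame window, 166 features)
features = trunk(frames)               # (1, 256, 512)

# Predict controller classes for the newest frame in the window.
logits = {name: head(features[:, -1]) for name, head in heads.items()}
for name, l in logits.items():
    print(name, tuple(l.shape))
```

At 60 Hz inference, this prediction step runs once per game frame, with the window sliding forward by one frame each time.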
## Per-character checkpoints

| Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |

## Repo layout

```
MIMIC/
├── README.md                   # this file
├── fox/
│   ├── model.pt                # raw PyTorch checkpoint
│   ├── config.json             # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json           # provenance (step, val metrics, notes)
│   ├── mimic_norm.json         # normalization stats
│   ├── controller_combos.json  # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```

Each character directory is self-contained — the JSONs are the exact metadata used during training, copied verbatim from the MIMIC data dir, so any inference script can load them without touching the MIMIC repo.

## Usage

Clone the MIMIC repo and pull this model:

```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh  # installs Dolphin, deps, ISO

# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```

Run a character against a level-9 CPU:

```bash
python3 tools/play_vs_cpu.py \
  --checkpoint hf_checkpoints/falco/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/falco \
  --character FALCO --cpu-character FALCO --cpu-level 9 \
  --stage FINAL_DESTINATION
```

Or play the bot over Slippi Online Direct Connect:

```bash
python3 tools/play_netplay.py \
  --checkpoint hf_checkpoints/falco/model.pt \
  --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
  --iso-path ./melee.iso \
  --data-dir hf_checkpoints/falco \
  --character FALCO \
  --connect-code YOUR#123
```

The MIMIC repo also includes a Discord bot frontend (`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).

## Architecture

```
Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-frame vector
                                                             │
256-frame window ──► + Relative Position Encoding ───────────┘
                              │
      6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                              │
            Autoregressive Output Heads (with detach)
                              │
       ┌─────────────┼──────────────┬──────────────┐
  shoulder(3)   c_stick(9)   main_stick(37)   buttons(7)
```

### 7-class button head

| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |

HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and structurally cannot execute airdodge, which means HAL-lineage bots cannot wavedash. MIMIC's 7-class encoding, plus a fix for `decode_and_press` (which was silently dropping the digital L press until 2026-04-13), is what enables the wavedashing you'll see in the replays.

### Input features

9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground, shield_strength, position_x, position_y`

Plus categorical embeddings: stage (4d), 2× character (12d), 2× action (32d). Plus the previous frame's controller state as a 56-dim one-hot (37 stick + 9 c-stick + 7 button + 3 shoulder).

Total input per frame: 166 dimensions → projected to 512.
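As a sanity check, the per-frame widths listed above do sum to the 166-dim encoder input. A small arithmetic sketch (the grouping comes directly from this section):

```python
# Per-frame input width for the HALFlatEncoder, as described above.
numeric   = 9 * 2           # 9 numeric features x (ego + opponent)
stage     = 4               # stage embedding
character = 12 * 2          # character embedding per player
action    = 32 * 2          # action-state embedding per player
prev_ctrl = 37 + 9 + 7 + 3  # previous-frame controller one-hot (56 dims)

total = numeric + stage + character + action + prev_ctrl
print(total)  # 166
```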
## Training

- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention (prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the default rd=0 matches inference — do NOT use `--reaction-delay 1` or `--controller-offset` with v2 shards)

## Known limitations

1. **Character-locked**: each model only plays the character it was trained on. No matchup generalization. Training a multi-character model with a character embedding is a natural next step but not done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that predates the `--self-inputs` fix. Its val metrics are much lower than the others and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi only has 1,951 training games after filtering. The `_best.pt` checkpoint is early-stopped at step 5,242 to avoid the val-loss climb. Plays surprisingly well for the data volume.
4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked — the libmelee README explicitly forbids bots on those ladders, and Slippi has not yet opened a "bot account" opt-in system.

## Acknowledgments

- **Eric Gu** for HAL, the reference implementation MIMIC is based on. HAL's architecture, tokenization, and training pipeline are the foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface to Dolphin + Slippi.
  https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and Direct Connect rollback netplay. https://slippi.gg

## License

MIT — see the MIMIC repo's LICENSE file.