---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---
# MIMIC: Melee Imitation Model for Input Cloning
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
~20M-parameter transformer that takes a 256-frame window of game state and
outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.
- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu,
  https://github.com/ericyuegu/hal): a 6-layer causal transformer,
  512 d_model, 8 heads, 256-frame context, relative position encoding
  (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (distinct TRIG class for
airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
alignment that fixes a subtle gamestate leak in the training targets
(see research notes 2026-04-11c); fix for the digital L press bug that
prevented all 7-class BC bots from wavedashing until 2026-04-13.
- **Training data**: filtered from
[erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
(~95K Slippi replays).
## Per-character checkpoints
| Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |
## Repo layout
```
MIMIC/
├── README.md                   # this file
├── fox/
│   ├── model.pt                # raw PyTorch checkpoint
│   ├── config.json             # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json           # provenance (step, val metrics, notes)
│   ├── mimic_norm.json         # normalization stats
│   ├── controller_combos.json  # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```
Each character directory is self-contained: the JSONs are the exact
metadata used during training, copied verbatim from the MIMIC data dir, so
any inference script can load them without touching the MIMIC repo.
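Since the sidecar files are plain JSON, loading them takes only the standard library. The helper below is an illustrative sketch (not a function from the MIMIC repo) that reads every JSON in one character directory into a dict keyed by file stem:

```python
# Hypothetical convenience loader: read every sidecar JSON in one
# character directory (e.g. hf_checkpoints/fox) into a dict keyed by
# file stem ("config", "metadata", "mimic_norm", ...).
import json
from pathlib import Path

def load_character_assets(char_dir):
    """Load all *.json sidecar files from a character directory."""
    d = Path(char_dir)
    return {p.stem: json.loads(p.read_text()) for p in d.glob("*.json")}
```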
## Usage
Clone the MIMIC repo and pull this model:
```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh # installs Dolphin, deps, ISO
# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```
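If you only need one character, `snapshot_download` also accepts an `allow_patterns` argument, so you can fetch a single directory instead of the whole repo. The pattern helper below is illustrative; `allow_patterns` itself is a real `huggingface_hub` parameter:

```python
# Fetch only one character's directory instead of the full snapshot.
# character_patterns is an illustrative helper, not part of MIMIC.
def character_patterns(character):
    """Glob pattern matching one character directory in the repo."""
    return [f"{character}/*"]

# Example call (commented out to avoid a network fetch here):
# from huggingface_hub import snapshot_download
# snapshot_download("erickfm/MIMIC", local_dir="./hf_checkpoints",
#                   allow_patterns=character_patterns("falco"))
```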
Run a character against a level-9 CPU:
```bash
python3 tools/play_vs_cpu.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO --cpu-character FALCO --cpu-level 9 \
--stage FINAL_DESTINATION
```
Or play the bot over Slippi Online Direct Connect:
```bash
python3 tools/play_netplay.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO \
--connect-code YOUR#123
```
The MIMIC repo also includes a Discord bot frontend
(`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).
## Architecture
```
Slippi Frame ──▶ HALFlatEncoder (Linear 166→512) ──▶ 512-d per-frame vector
                                │
256-frame window ──▶ + Relative Position Encoding
                                │
      6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                                │
           Autoregressive Output Heads (with detach)
                                │
        ┌───────────┼───────────┬───────────┐
  shoulder(3)  c_stick(9)  main_stick(37)  buttons(7)
```
### 7-class button head
| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |
HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
structurally cannot execute airdodge, which means HAL-lineage bots cannot
wavedash. MIMIC's 7-class encoding plus a fix for `decode_and_press`
(which was silently dropping the digital L press until 2026-04-13) is
what enables the wavedashing you'll see in the replays.
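As an illustration of why the TRIG class matters, the 7-class decode can be sketched as a class-to-button lookup. This is a hypothetical reconstruction, not the repo's actual `decode_and_press`:

```python
# Hypothetical sketch of decoding the 7-class button head into digital
# presses. Names and the choice of X over Y / L over R are illustrative.
BUTTON_CLASSES = {
    0: {"A"},
    1: {"B"},
    2: {"Z"},
    3: {"X"},       # JUMP: X or Y; one physical button suffices
    4: {"L"},       # TRIG: digital L or R -- the class that enables airdodge
    5: {"A", "L"},  # A_TRIG: shield grab
    6: set(),       # NONE
}

def decode_buttons(cls):
    """Map a predicted button class to the set of buttons to hold."""
    return BUTTON_CLASSES[cls]
```

A 5-class head has no entry for class 4, so no sequence of its outputs can produce the shield-plus-airdodge press a wavedash requires.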
### Input features
9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground,
shield_strength, position_x, position_y`
Plus categorical embeddings: stage(4d), 2× character(12d), 2× action(32d).
Plus controller state from the previous frame as a 56-dim one-hot
(37 stick + 9 c-stick + 7 button + 3 shoulder).
Total input per frame: 166 dimensions β projected to 512.
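The 166-dim total follows from summing the groups above:

```python
# Sanity check of the per-frame input width described above.
numeric = 9 * 2                    # 9 numeric features x 2 players = 18
categorical = 4 + 2 * 12 + 2 * 32  # stage + character + action embeds = 92
prev_controller = 37 + 9 + 7 + 3   # previous-frame controller one-hot = 56
total = numeric + categorical + prev_controller
print(total)  # 166
```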
## Training
- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention
(prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
  default rd=0 matches inference; do NOT use `--reaction-delay 1` or
  `--controller-offset` with v2 shards)
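The v2 alignment convention above (target[i] = buttons[i+1]) amounts to a plain next-frame shift. The sketch below is an illustrative reconstruction, not the repo's shard-writing code:

```python
# Illustrative sketch of v2-style next-frame target alignment: the model
# sees gamestate up to frame i and is trained to predict the buttons
# pressed on frame i+1, so inputs and targets are offset by one frame.
def align_v2(gamestates, buttons):
    """Pair each frame's state with the *next* frame's buttons."""
    assert len(gamestates) == len(buttons)
    inputs = gamestates[:-1]   # frames 0 .. T-2
    targets = buttons[1:]      # buttons 1 .. T-1
    return list(zip(inputs, targets))

pairs = align_v2(["s0", "s1", "s2"], ["b0", "b1", "b2"])
print(pairs)  # [('s0', 'b1'), ('s1', 'b2')]
```

Misaligning this by one frame in either direction would let the target leak the current frame's gamestate, which is the bug the v2 shards fix.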
## Known limitations
1. **Character-locked**: each model only plays the character it was trained
on. No matchup generalization. Training a multi-character model with a
character embedding is a natural next step but not done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
predates the `--self-inputs` fix. Its val metrics are much lower than
the others and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi only has 1951 training games after
filtering. The `_best.pt` checkpoint is early-stopped at step 5242 to
avoid the val-loss climb. Plays surprisingly well for the data volume.
4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently
go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
   Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked;
   the libmelee README explicitly forbids bots on those ladders, and
   Slippi has not yet opened a "bot account" opt-in system.
## Acknowledgments
- **Eric Gu** for HAL, the reference implementation MIMIC is based on.
HAL's architecture, tokenization, and training pipeline are the
foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface
to Dolphin + Slippi. https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
Direct Connect rollback netplay. https://slippi.gg
## License
MIT; see the MIMIC repo's LICENSE file.