---
license: mit
tags:
- behavior-cloning
- imitation-learning
- super-smash-bros-melee
- reinforcement-learning
- gaming
library_name: pytorch
---
# MIMIC: Melee Imitation Model for Input Cloning
Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
~20M-parameter transformer that takes a 256-frame window of game state and
outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.
- **Repo**: https://github.com/erickfm/MIMIC
- **Base architecture**: HAL's GPTv5Controller (Eric Gu,
  https://github.com/ericyuegu/hal), a 6-layer causal transformer with
  512 d_model, 8 heads, 256-frame context, and relative position encoding
  (Shaw et al.)
- **MIMIC-specific changes**: 7-class button head (distinct TRIG class for
airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
alignment that fixes a subtle gamestate leak in the training targets
(see research notes 2026-04-11c); fix for the digital L press bug that
prevented all 7-class BC bots from wavedashing until 2026-04-13.
- **Training data**: filtered from
[erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
(~95K Slippi replays).
## Per-character checkpoints
| Character | Games | Val button F1 | Val main-stick F1 | Val loss | Step |
|---|---|---|---|---|---|
| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |
## Repo layout
```
MIMIC/
├── README.md                  # this file
├── fox/
│   ├── model.pt               # raw PyTorch checkpoint
│   ├── config.json            # ModelConfig (copied from ckpt["config"])
│   ├── metadata.json          # provenance (step, val metrics, notes)
│   ├── mimic_norm.json        # normalization stats
│   ├── controller_combos.json # 7-class button combo spec
│   ├── cat_maps.json
│   ├── stick_clusters.json
│   └── norm_stats.json
├── falco/      (same layout)
├── cptfalcon/  (same layout)
└── luigi/      (same layout)
```
Each character directory is self-contained: the JSONs are the exact
metadata used during training, copied verbatim from the MIMIC data dir, so
any inference script can load them without touching the MIMIC repo.
## Usage
Clone the MIMIC repo and pull this model:
```bash
git clone https://github.com/erickfm/MIMIC.git
cd MIMIC
bash setup.sh # installs Dolphin, deps, ISO
# Download all four characters
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
"
```
Run a character against a level-9 CPU:
```bash
python3 tools/play_vs_cpu.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO --cpu-character FALCO --cpu-level 9 \
--stage FINAL_DESTINATION
```
Or play the bot over Slippi Online Direct Connect:
```bash
python3 tools/play_netplay.py \
--checkpoint hf_checkpoints/falco/model.pt \
--dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
--iso-path ./melee.iso \
--data-dir hf_checkpoints/falco \
--character FALCO \
--connect-code YOUR#123
```
The MIMIC repo also includes a Discord bot frontend
(`tools/discord_bot.py`) that queues direct-connect matches per user.
See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/discord-bot-setup.md).
## Architecture
```
Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-frame vector
                                                              │
256-frame window ──► + Relative Position Encoding ────────────┘
                              │
       6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
                              │
            Autoregressive Output Heads (with detach)
                              │
        ┌─────────────┼──────────────┬───────────────┐
   shoulder(3)   c_stick(9)   main_stick(37)    buttons(7)
```
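The relative position encoding (Shaw et al.) keys a learned bias by the clipped pairwise distance between frames rather than by absolute position. A minimal sketch of the index computation, in pure Python for clarity; the clip distance of 2 in the example is illustrative, and the actual clip value used by HAL/MIMIC is not stated here:

```python
def relative_position_bucket(seq_len: int, max_dist: int):
    """Pairwise relative distances j - i, clipped to [-max_dist, max_dist],
    then shifted to [0, 2 * max_dist] so they can index an embedding table."""
    buckets = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            d = max(-max_dist, min(max_dist, j - i))  # clip the distance
            row.append(d + max_dist)                   # shift to non-negative
        buckets.append(row)
    return buckets

# Example: a 4-frame window with clip distance 2
m = relative_position_bucket(4, 2)
# m[0] == [2, 3, 4, 4] -- distances 0, 1, 2, 3 (last one clipped to 2)
```

Because every query-key pair at the same offset shares one bucket, the bias table stays small regardless of sequence length.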
### 7-class button head
| Class | Meaning |
|---|---|
| 0 | A |
| 1 | B |
| 2 | Z |
| 3 | JUMP (X or Y) |
| 4 | TRIG (digital L or R) |
| 5 | A_TRIG (shield grab) |
| 6 | NONE |
HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
structurally cannot execute airdodge, which means HAL-lineage bots cannot
wavedash. MIMIC's 7-class encoding plus a fix for `decode_and_press`
(which was silently dropping the digital L press until 2026-04-13) is
what enables the wavedashing you'll see in the replays.
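The table above amounts to a single-label mapping from raw digital button state to one of seven classes. This is an illustrative sketch only, not the repo's actual `decode_and_press` logic; in particular, the priority order for simultaneous presses is an assumption:

```python
# Class ids matching the table above
A, B, Z, JUMP, TRIG, A_TRIG, NONE = range(7)

def encode_buttons(a=False, b=False, z=False, x=False, y=False,
                   l=False, r=False) -> int:
    """Map one frame of digital button state to one of the 7 classes.
    Priority for simultaneous presses is an illustrative assumption."""
    trig = l or r                # digital L or R
    if a and trig:
        return A_TRIG            # shield grab (A + digital trigger)
    if trig:
        return TRIG              # airdodge / wavedash input
    if z:
        return Z
    if x or y:
        return JUMP              # X and Y share one jump class
    if b:
        return B
    if a:
        return A
    return NONE

encode_buttons(l=True)           # TRIG -- the class HAL's 5-class head lacks
encode_buttons(a=True, r=True)   # A_TRIG (shield grab)
```

Under a 5-class scheme, both calls above would have to collapse into `NONE` or `A`, which is exactly why HAL-lineage bots cannot airdodge.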
### Input features
9 numeric features per player (ego + opponent = 18 total):
`percent, stock, facing, invulnerable, jumps_left, on_ground,
shield_strength, position_x, position_y`
Plus categorical embeddings: stage (4d), 2× character (12d), 2× action (32d).
Plus controller state from the previous frame as a 56-dim one-hot
(37 stick + 9 c-stick + 7 button + 3 shoulder).
Total input per frame: 166 dimensions → projected to 512.
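The 166-dim total follows directly from the feature list above; a quick sanity check:

```python
numeric = 9 * 2                      # 9 numeric features x (ego + opponent)
categorical = 4 + 2 * 12 + 2 * 32    # stage + 2x character + 2x action embeddings
prev_controller = 37 + 9 + 7 + 3     # main stick + c-stick + buttons + shoulder one-hots
total = numeric + categorical + prev_controller
print(total)  # 166
```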
## Training
- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
- LR schedule: CosineAnnealingLR, eta_min 1e-6
- Gradient clip: 1.0
- Dropout: 0.2
- Sequence length: 256 frames (~4.3 seconds)
- Mixed precision: BF16 AMP with FP32 upcast for relpos attention
(prevents BF16 overflow in the manual Q@K^T + Srel computation)
- Batch size: 512 (typically single-GPU on an RTX 5090)
- Steps: ~32K for well-represented characters, early-stopped for Luigi
- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
default rd=0 matches inference β€” do NOT use `--reaction-delay 1` or
`--controller-offset` with v2 shards)
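The v2 alignment in the last bullet (target[i] = buttons[i+1]) can be sketched on a toy button sequence. Because the model sees game state through frame i and is trained to predict the controls for frame i+1, a reaction delay of 0 already matches inference; the class ids below are only examples:

```python
def make_targets(buttons):
    """v2-style shard alignment: the target at frame i is the button class
    pressed on frame i + 1. The final frame has no successor, so the
    usable training sequence is one frame shorter than the input."""
    return buttons[1:]

buttons = [6, 6, 3, 4, 6]           # e.g. NONE, NONE, JUMP, TRIG, NONE
targets = make_targets(buttons)     # [6, 3, 4, 6]
# Input at frame i:           buttons[: i + 1]
# Training target at frame i: targets[i] == buttons[i + 1]
```

Adding `--reaction-delay 1` or `--controller-offset` on top of this would shift targets a second time and misalign training against inference.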
## Known limitations
1. **Character-locked**: each model only plays the character it was trained
on. No matchup generalization. Training a multi-character model with a
character embedding is a natural next step but not done yet.
2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
predates the `--self-inputs` fix. Its val metrics are much lower than
the others and it plays slightly worse.
3. **Small-dataset overfitting**: Luigi only has 1951 training games after
filtering. The `_best.pt` checkpoint is early-stopped at step 5242 to
avoid the val-loss climb. Plays surprisingly well for the data volume.
4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently
go for off-stage edge guards or execute high-skill recovery mixups.
5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
   Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked:
   the libmelee README explicitly forbids bots on those ladders, and
   Slippi has not yet opened a "bot account" opt-in system.
## Acknowledgments
- **Eric Gu** for HAL, the reference implementation MIMIC is based on.
HAL's architecture, tokenization, and training pipeline are the
foundation. https://github.com/ericyuegu/hal
- **Vlad Firoiu and collaborators** for libmelee, the Python interface
to Dolphin + Slippi. https://github.com/altf4/libmelee
- **Project Slippi** for the Slippi Dolphin fork, replay format, and
Direct Connect rollback netplay. https://slippi.gg
## License
MIT; see the MIMIC repo's LICENSE file.