@@ -11,30 +11,49 @@ library_name: pytorch
 # MIMIC: Melee Imitation Model for Input Cloning
-Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi replays.
-Four character-specific models (Fox, Falco, Captain Falcon, Luigi), each a
-~20M-parameter transformer that takes a 256-frame window of game state and
-outputs controller inputs (main stick, c-stick, shoulder, buttons) at 60 Hz.
 - **Repo**: https://github.com/erickfm/MIMIC
-- **Base architecture**: HAL's GPTv5Controller (Eric Gu,
-  https://github.com/ericyuegu/hal) — 6-layer causal transformer,
-  512 d_model, 8 heads, 256-frame context, relative position encoding
-  (Shaw et al.)
-- **MIMIC-specific changes**: 7-class button head (distinct TRIG class for
-  airdodge/wavedash, which HAL's 5-class head cannot represent); v2 shard
-  alignment that fixes a subtle gamestate leak in the training targets
-  (see research notes 2026-04-11c); fix for the digital L press bug that
-  prevented all 7-class BC bots from wavedashing until 2026-04-13.
-- **Training data**: filtered from
-  [erickfm/slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
-  (~95K Slippi replays).
-## Per-character checkpoints
-| Character | Games | Val btn F1 | Val main F1 | Val loss | Step |
-|---|---|---|---|---|---|
-| **Fox** | 17,319 | 87.1% | ~55% | 0.77 | 55,692 |
 ## Repo layout
@@ -45,30 +64,33 @@ MIMIC/
 │   ├── model.pt                   # raw PyTorch checkpoint
 │   ├── config.json                # ModelConfig (copied from ckpt["config"])
 │   ├── metadata.json              # provenance (step, val metrics, notes)
-│   ├── mimic_norm.json            # normalization stats
 │   ├── controller_combos.json     # 7-class button combo spec
 │   ├── cat_maps.json
 │   ├── stick_clusters.json
-│   └── norm_stats.json
-├── falco/      (same layout)
-├── cptfalcon/  (same layout)
-└── luigi/      (same layout)
 ```
 Each character directory is self-contained — the JSONs are the exact
-metadata used during training, copied verbatim from the MIMIC data dir so
 any inference script can load them without touching the MIMIC repo.
 ## Usage
-Clone the MIMIC repo and pull this model:
 ```bash
 git clone https://github.com/erickfm/MIMIC.git
 cd MIMIC
 bash setup.sh  # installs Dolphin, deps, ISO
-# Download all four characters
 python3 -c "
 from huggingface_hub import snapshot_download
 snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
@@ -79,23 +101,23 @@ Run a character against a level-9 CPU:
 ```bash
 python3 tools/play_vs_cpu.py \
-  --checkpoint hf_checkpoints/falco/model.pt \
   --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
   --iso-path ./melee.iso \
-  --data-dir hf_checkpoints/falco \
-  --character FALCO --cpu-character FALCO --cpu-level 9 \
   --stage FINAL_DESTINATION
 ```
-Or play the bot over Slippi Online Direct Connect:
 ```bash
 python3 tools/play_netplay.py \
-  --checkpoint hf_checkpoints/falco/model.pt \
   --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
   --iso-path ./melee.iso \
-  --data-dir hf_checkpoints/falco \
-  --character FALCO \
   --connect-code YOUR#123
 ```
@@ -106,16 +128,16 @@ See [docs/discord-bot-setup.md](https://github.com/erickfm/MIMIC/blob/main/docs/
 ## Architecture
 ```
-Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-frame vector
-                                                          │
-256-frame window ──► + Relative Position Encoding ────────┘
-                         │
-                    6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads)
-                         │
-                    Autoregressive Output Heads (with detach)
-                         │
-              ┌──────────┼──────────┬───────────┐
-           shoulder(3) c_stick(9) main_stick(37) buttons(7)
 ```
 ### 7-class button head
@@ -130,67 +152,106 @@ Slippi Frame ──► HALFlatEncoder (Linear 166→512) ──► 512-d per-fra
 | 5 | A_TRIG (shield grab) |
 | 6 | NONE |
-HAL's original 5-class head (`A, B, Jump, Z, None`) has no TRIG class and
-structurally cannot execute airdodge, which means HAL-lineage bots cannot
-wavedash. MIMIC's 7-class encoding plus a fix for `decode_and_press`
-(which was silently dropping the digital L press until 2026-04-13) is
-what enables the wavedashing you'll see in the replays.
-### Input features
-9 numeric features per player (ego + opponent = 18 total):
-`percent, stock, facing, invulnerable, jumps_left, on_ground,
-shield_strength, position_x, position_y`
-Plus categorical embeddings: stage(4d), 2× character(12d), 2× action(32d).
-Plus controller state from the previous frame as a 56-dim one-hot
-(37 stick + 9 c-stick + 7 button + 3 shoulder).
-Total input per frame: 166 dimensions → projected to 512.
 ## Training
-- Optimizer: AdamW, LR 3e-4, weight decay 0.01, no warmup
-- LR schedule: CosineAnnealingLR, eta_min 1e-6
 - Gradient clip: 1.0
 - Dropout: 0.2
-- Sequence length: 256 frames (~4.3 seconds)
 - Mixed precision: BF16 AMP with FP32 upcast for relpos attention
-  (prevents BF16 overflow in the manual Q@K^T + Srel computation)
-- Batch size: 512 (typically single-GPU on an RTX 5090)
-- Steps: ~32K for well-represented characters, early-stopped for Luigi
-- Reaction delay: 0 (v2 shards have target[i] = buttons[i+1], so the
-  default rd=0 matches inference — do NOT use `--reaction-delay 1` or
-  `--controller-offset` with v2 shards)
 ## Known limitations
-1. **Character-locked**: each model only plays the character it was trained
-   on. No matchup generalization. Training a multi-character model with a
-   character embedding is a natural next step but not done yet.
-2. **Fox model is legacy**: the Fox checkpoint is from an earlier run that
-   predates the `--self-inputs` fix. Its val metrics are much lower than
-   the others and it plays slightly worse.
-3. **Small-dataset overfitting**: Luigi only has 1951 training games after
-   filtering. The `_best.pt` checkpoint is early-stopped at step 5242 to
-   avoid the val-loss climb. Plays surprisingly well for the data volume.
-4. **Edge guarding and recovery weaknesses**: the bot doesn't consistently
-   go for off-stage edge guards or execute high-skill recovery mixups.
-5. **No matchmaking / Ranked**: the Discord bot only joins explicit Direct
-   Connect lobbies. Do NOT adapt it for Slippi Online Unranked or Ranked —
-   the libmelee README explicitly forbids bots on those ladders, and
-   Slippi has not yet opened a "bot account" opt-in system.
 ## Acknowledgments
-- **Eric Gu** for HAL, the reference implementation MIMIC is based on.
-  HAL's architecture, tokenization, and training pipeline are the
-  foundation. https://github.com/ericyuegu/hal
-- **Vlad Firoiu and collaborators** for libmelee, the Python interface
-  to Dolphin + Slippi. https://github.com/altf4/libmelee
 - **Project Slippi** for the Slippi Dolphin fork, replay format, and
   Direct Connect rollback netplay. https://slippi.gg
 ## License
-MIT — see the MIMIC repo's LICENSE file.

 # MIMIC: Melee Imitation Model for Input Cloning
+Behavior-cloned Super Smash Bros. Melee bots trained on human Slippi
+replays. Eight character-specific ~20M-parameter transformers (four
+finished on the 2026-04-20 baseline so far; see the table below) each
+take a 180-frame window of game state and output controller inputs
+(main stick, c-stick, shoulder, buttons) at 60 Hz. Each model runs
+through Dolphin + libmelee, against a local CPU or over Slippi Online
+Direct Connect.
 - **Repo**: https://github.com/erickfm/MIMIC
+- **Training data**:
+  [erickfm/melee-ranked-replays](https://huggingface.co/datasets/erickfm/melee-ranked-replays)
+  — ranked Slippi replays (master/diamond/platinum tier) per character.
+- **Base architecture**: causal transformer with Shaw relative-position attention
+  (d_model=512, 6 layers, 8 heads, seq_len=180). Bootstrapped from
+  [HAL](https://github.com/ericyuegu/hal) (Eric Gu) and since diverged.
+- **Defining MIMIC changes over HAL**: 7-class button head with a
+  distinct TRIG class for airdodge/wavedash (HAL's 5-class head can't
+  represent airdodge and thus can't wavedash); v2 shard alignment that
+  fixes a subtle post-frame-gamestate leak in the training targets
+  (see `research-notes-2026-04-11c`); the digital-L-press fix in
+  `decode_and_press` (research notes 2026-04-13), without which no
+  7-class BC bot wavedashes.
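The v2 target alignment can be pictured with a small sketch. It assumes frames arrive as parallel per-frame lists of gamestates and 7-class button ids. `make_v2_targets` is a hypothetical helper, not MIMIC's sharding code. It only encodes the rule stated under Training, `target[i] = buttons[i+1]`: the label for frame i is the button class the player entered on the following frame.

```python
from typing import List, Sequence, Tuple


def make_v2_targets(gamestates: Sequence, buttons: Sequence[int]) -> List[Tuple]:
    """Pair each frame's gamestate with the NEXT frame's button class.

    Hypothetical sketch of the v2 rule target[i] = buttons[i+1]. The last
    frame has no successor, so it produces no training pair.
    """
    if len(gamestates) != len(buttons):
        raise ValueError("gamestates and buttons must have one entry per frame")
    return [(gamestates[i], buttons[i + 1]) for i in range(len(buttons) - 1)]


frames = ["s0", "s1", "s2", "s3"]
btns = [6, 6, 5, 6]  # class ids; per the button table, 5 = A_TRIG and 6 = NONE
pairs = make_v2_targets(frames, btns)
print(pairs)  # [('s0', 6), ('s1', 5), ('s2', 6)]
```

With targets already shifted this way, the default reaction delay of 0 needs no extra offset.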
+## Current checkpoints (2026-04-20 baseline)
+Retrained on the post-schema-drop basis: 13 numeric columns, with the
+new `tanh_scale` / `linear_max` / `log_max` transforms for velocity /
+hitlag / hitstun. See `research-notes-2026-04-20.md` in the MIMIC repo
+for methodology and results analysis.
+| Character | Run | Train games | Val loss | btn F1 | main F1 | Step |
+|---|---|---|---|---|---|---|
+| **Fox**          | `fox-20260420-baseline`       | 31,030 | 0.7144 | ~88% | ~58% | 32768 |
+| **Falco**        | `falco-20260420-baseline`     | 20,882 | 0.7487 | ~88% | ~58% | 31392 |
+| **Marth**        | `marth-20260420-baseline`     | 11,759 | 0.6664 | ~89% | ~58% | 31065 |
+| **Sheik**        | `sheik-20260420-baseline`     | 51,751 | 0.6566 | ~90% | ~60% | 26160 |
+| Captain Falcon   | `cptfalcon-20260420-baseline` | 17,557 | _(training)_ | — | — | — |
+| Luigi            | `luigi-20260420-baseline`     | _(queued)_ | — | — | — | — |
+| Jigglypuff       | `puff-20260420-baseline`      | _(queued)_ | — | — | — | — |
+| Ice Climbers     | `ice_climbers-20260420-baseline` | _(queued)_ | — | — | — | — |
+**Peach** is also present in the repo (`peach-20260420-baseline`,
+val loss 0.6322), but it was trained on 2026-04-19 on the pre-drop
+22-col schema. It will be retrained alongside the rest in a follow-on
+cycle so that all characters sit on exactly the same basis.
 ## Repo layout
 │   ├── model.pt                   # raw PyTorch checkpoint
 │   ├── config.json                # ModelConfig (copied from ckpt["config"])
 │   ├── metadata.json              # provenance (step, val metrics, notes)
+│   ├── mimic_norm.json            # per-feature transforms + params
 │   ├── controller_combos.json     # 7-class button combo spec
 │   ├── cat_maps.json
 │   ├── stick_clusters.json
+│   └── norm_stats.json            # per-column mean/std (z-score fallback)
+├── falco/       (same layout)
+├── marth/       (same layout)
+├── sheik/       (same layout)
+├── cptfalcon/   (same layout)
+├── luigi/       (same layout)
+├── puff/        (same layout)
+├── ice_climbers/ (same layout)
+└── peach/       (same layout, pre-drop schema — retrain pending)
 ```
 Each character directory is self-contained — the JSONs are the exact
+metadata used during training, copied verbatim from the data dir so
 any inference script can load them without touching the MIMIC repo.
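Because each directory is self-contained, you can fetch and inspect a single character on its own. The sketch below relies on the `allow_patterns` filter of `huggingface_hub.snapshot_download`. The helper names and the `CHARACTER_JSONS` constant are hypothetical. `load_checkpoint` passes `weights_only=False` only because the checkpoint pickles its `ModelConfig`, so use it only on files you trust.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable

# Sidecar JSONs listed in the repo layout above (hypothetical constant name).
CHARACTER_JSONS = (
    "config.json",
    "metadata.json",
    "mimic_norm.json",
    "controller_combos.json",
    "cat_maps.json",
    "stick_clusters.json",
    "norm_stats.json",
)


def download_character(character: str, local_dir: str = "./hf_checkpoints") -> Path:
    """Fetch only `<character>/` from the Hub repo instead of every character."""
    from huggingface_hub import snapshot_download  # lazy import: only needed here

    snapshot_download("erickfm/MIMIC", local_dir=local_dir,
                      allow_patterns=[f"{character}/*"])
    return Path(local_dir) / character


def load_character_metadata(char_dir, names: Iterable[str] = CHARACTER_JSONS) -> Dict[str, Any]:
    """Parse each sidecar JSON, keyed by file stem; a missing file raises FileNotFoundError."""
    char_dir = Path(char_dir)
    return {Path(name).stem: json.loads((char_dir / name).read_text()) for name in names}


def load_checkpoint(char_dir):
    """Load model.pt on CPU. weights_only=False unpickles the stored config, so trust the source."""
    import torch  # lazy import so the JSON helpers work without PyTorch

    return torch.load(Path(char_dir) / "model.pt", map_location="cpu", weights_only=False)
```

For example, `meta = load_character_metadata(download_character("marth"))` exposes the training-time config as `meta["config"]`.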
 ## Usage
 ```bash
 git clone https://github.com/erickfm/MIMIC.git
 cd MIMIC
 bash setup.sh  # installs Dolphin, deps, ISO
+# Download all characters
 python3 -c "
 from huggingface_hub import snapshot_download
 snapshot_download('erickfm/MIMIC', local_dir='./hf_checkpoints')
 ```bash
 python3 tools/play_vs_cpu.py \
+  --checkpoint hf_checkpoints/marth/model.pt \
   --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
   --iso-path ./melee.iso \
+  --data-dir hf_checkpoints/marth \
+  --character MARTH --cpu-character FOX --cpu-level 9 \
   --stage FINAL_DESTINATION
 ```
+Or play a bot over Slippi Online Direct Connect:
 ```bash
 python3 tools/play_netplay.py \
+  --checkpoint hf_checkpoints/sheik/model.pt \
   --dolphin-path ./emulator/squashfs-root/usr/bin/dolphin-emu \
   --iso-path ./melee.iso \
+  --data-dir hf_checkpoints/sheik \
+  --character SHEIK \
   --connect-code YOUR#123
 ```
 ## Architecture
 ```
+Slippi frame ──► MimicFlatEncoder (Linear 184→512) ──► 512-d per-frame vector
+                                                            │
+180-frame window ──► + Shaw Relative-Position attention ────┘
+                             │
+                      6× Pre-Norm Causal Transformer Blocks (512-d, 8 heads, d_ff=2048, GELU, LN)
+                             │
+                        Autoregressive Output Heads (with detach)
+                             │
+              ┌──────────────┼───────────────┬────────────┐
+          shoulder(3)    c_stick(9)     main_stick(37)  buttons(7)
 ```
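A single-head NumPy sketch can illustrate the relative-position attention in the diagram. It assumes Shaw-style key-side relative embeddings indexed by the distance `i - j`, and it computes `Q@Kᵀ + S_rel` in float32, echoing the FP32 upcast noted under Training. MIMIC's real implementation may differ, including how it materializes `S_rel`, its multi-head layout, and any value-side relative terms.

```python
import math

import numpy as np


def causal_relpos_attention(q, k, v, rel_emb):
    """Single-head causal attention with Shaw-style relative-position logits.

    q, k, v: (T, d) arrays. rel_emb: (T, d); row r is the learned key-side
    embedding for relative distance r = i - j, for r in 0..T-1. Under a
    causal mask only non-negative distances are ever attended to.
    """
    # Work in float32, mirroring the FP32 upcast used for relpos attention.
    q, k, v, rel_emb = (np.asarray(a, dtype=np.float32) for a in (q, k, v, rel_emb))
    T, d = q.shape
    pos = np.arange(T)
    dist = pos[:, None] - pos[None, :]                 # dist[i, j] = i - j
    # S_rel[i, j] = q_i . a_(i-j). Negative distances are clipped to 0 only to
    # keep the gather in range; those (future) entries are masked out below.
    rel = rel_emb[np.clip(dist, 0, T - 1)]             # (T, T, d)
    s_rel = np.einsum("id,ijd->ij", q, rel)
    logits = (q @ k.T + s_rel) / math.sqrt(d)
    logits = np.where(dist >= 0, logits, -np.inf)      # causal mask: j > i is hidden
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
T, d = 6, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
rel_emb = rng.standard_normal((T, d))
out = causal_relpos_attention(q, k, v, rel_emb)
print(out.shape)  # (6, 8)
```

Because of the mask, frame 0 can only attend to itself. Perturbing the last frame's key or value leaves every earlier output unchanged, which is the property the 256- or 180-frame causal window relies on.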
 ### 7-class button head
 | 5 | A_TRIG (shield grab) |
 | 6 | NONE |
+HAL's original 5-class head (A / B / Jump / Z / None) has no TRIG class
+and structurally can't execute airdodge, which means HAL-lineage bots
+can't wavedash. MIMIC's 7-class encoding plus a fix for
+`decode_and_press` (which was silently dropping the digital L press
+until 2026-04-13) is what enables the wavedashing in the replays.
+### Input features (per frame, per player)
+Numeric (13):
+    pos_x, pos_y, percent, stock, jumps_left,
+    speed_air_x_self, speed_ground_x_self,
+    speed_x_attack, speed_y_attack, speed_y_self,
+    hitlag_left, hitstun_left,
+    shield_strength
+Flags (5):
+    on_ground, off_stage, facing, invulnerable, moonwalkwarning
+Per-feature normalization is defined in each character's
+`mimic_norm.json`. The active transforms are:
+| transform | formula | used for |
+|---|---|---|
+| `normalize` | `2(x-min)/(max-min) - 1` → [-1, +1] | percent, stock, jumps_left, facing, invulnerable, on_ground |
+| `standardize` | `(x - mean) / std` | pos_x, pos_y |
+| `invert_normalize` | `2(max-x)/(max-min) - 1` | shield_strength (so "shield broken" is +1) |
+| `tanh_scale` | `tanh(x / scale)` | 5 velocities (scale=5 for self, scale=10 for attack) |
+| `linear_max` | `x / max` | hitlag_left (max=20) |
+| `log_max` | `log1p(clamp(x,0,max)) / log1p(max)` | hitstun_left (max=120) |
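The transforms in the table above are simple enough to write out directly. In this sketch the function names mirror the `transform` column. The parameter names and the example ranges (such as a 0-60 shield range) are illustrative assumptions; the real per-feature parameters live in each character's `mimic_norm.json`.

```python
import math


def normalize(x, lo, hi):
    """Linearly map [lo, hi] onto [-1, +1]."""
    return 2 * (x - lo) / (hi - lo) - 1


def standardize(x, mean, std):
    """z-score."""
    return (x - mean) / std


def invert_normalize(x, lo, hi):
    """Flipped normalize: x == lo maps to +1, x == hi maps to -1."""
    return 2 * (hi - x) / (hi - lo) - 1


def tanh_scale(x, scale):
    """Squash an unbounded value (e.g. a velocity) into (-1, +1)."""
    return math.tanh(x / scale)


def linear_max(x, max_value):
    """Divide by a fixed maximum (unclamped, as the table's x / max suggests)."""
    return x / max_value


def log_max(x, max_value):
    """Clamp to [0, max_value], then log-compress into [0, 1]."""
    return math.log1p(min(max(x, 0.0), max_value)) / math.log1p(max_value)


# Keyed by the names in the table's `transform` column.
TRANSFORMS = {
    "normalize": normalize,
    "standardize": standardize,
    "invert_normalize": invert_normalize,
    "tanh_scale": tanh_scale,
    "linear_max": linear_max,
    "log_max": log_max,
}

print(invert_normalize(0, 0, 60))  # 1.0  (illustrative 0-60 range: broken shield -> +1)
print(linear_max(10, 20))          # 0.5  (hitlag_left, max=20)
print(log_max(500, 120))           # 1.0  (hitstun_left clamps at max=120)
```

The log transform spends more of the [0, 1] range on short hitstun values than long ones, while `tanh_scale` keeps rare large velocities from dominating the input.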
+Plus categorical embeddings: stage(4d), 2× character(12d),
+2× action(32d). Plus the previous-frame controller state as a 56-dim
+one-hot (37 stick + 9 c-stick + 7 button + 3 shoulder).
+Total input per frame: **184 dimensions** (2 × 18 per-player numeric and flag features, 92 embedding dims, 56 controller dims) → projected to 512.
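You can check the 184-dimension total by summing the components listed above, counting the 13 numeric and 5 flag features once per player. The sketch assumes the pre-drop schema kept the same 5 flags and embeddings, which is consistent with the 202-dim figure quoted for pre-drop checkpoints. `controller_one_hot` and its field order (stick, c-stick, button, shoulder) are hypothetical.

```python
# Component sizes taken from the Input features section above.
NUMERIC, FLAGS = 13, 5                  # per player
STAGE_EMB, CHAR_EMB, ACTION_EMB = 4, 12, 32
STICK, CSTICK, BUTTON, SHOULDER = 37, 9, 7, 3


def frame_input_dim(numeric_per_player: int = NUMERIC) -> int:
    """Per-frame encoder input width for ego + opponent."""
    per_player = numeric_per_player + FLAGS
    embeddings = STAGE_EMB + 2 * CHAR_EMB + 2 * ACTION_EMB  # 1 stage, 2 chars, 2 actions
    controller = STICK + CSTICK + BUTTON + SHOULDER         # previous-frame one-hot
    return 2 * per_player + embeddings + controller


def controller_one_hot(stick: int, cstick: int, button: int, shoulder: int) -> list:
    """Hypothetical 56-dim one-hot of the previous frame's controller classes."""
    vec = [0.0] * (STICK + CSTICK + BUTTON + SHOULDER)
    offset = 0
    for idx, size in ((stick, STICK), (cstick, CSTICK), (button, BUTTON), (shoulder, SHOULDER)):
        if not 0 <= idx < size:
            raise ValueError(f"class {idx} out of range for a {size}-way head")
        vec[offset + idx] = 1.0
        offset += size
    return vec


print(frame_input_dim())    # 184 (current 13-col schema)
print(frame_input_dim(22))  # 202 (pre-drop 22-col schema)
```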
+Earlier builds (pre-2026-04-20) used a 22-col numeric schema that
+included `invuln_left` and 8 ECB corners. Those columns turned out to
+be structurally zero for our .slp parse path — libmelee never
+populates them — so they were dropped from the schema. See research
+notes 2026-04-20 for the audit. Checkpoints trained pre-drop
+(`peach-20260420-baseline`) still load via their own pickled config
+but use the 202-dim projection path.
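The schema-drop audit amounts to finding columns that are zero in every parsed frame. Here is a minimal sketch, assuming frames are available as dicts keyed by column name. The helper and the `ecb_top_y` column name are hypothetical placeholders, not MIMIC's actual schema.

```python
from typing import Dict, Iterable, List, Sequence


def structurally_zero_columns(frames: Iterable[Dict[str, float]], columns: Sequence[str]) -> List[str]:
    """Return the columns that are 0 (or absent) in every frame.

    Hypothetical audit helper. Absent keys are treated as zero, and an
    empty `frames` iterable reports every column.
    """
    seen_nonzero = set()
    for frame in frames:
        for col in columns:
            if frame.get(col, 0.0) != 0.0:
                seen_nonzero.add(col)
    return [col for col in columns if col not in seen_nonzero]


frames = [
    {"percent": 0.0, "invuln_left": 0.0, "ecb_top_y": 0.0},
    {"percent": 42.0, "invuln_left": 0.0, "ecb_top_y": 0.0},
]
print(structurally_zero_columns(frames, ["percent", "invuln_left", "ecb_top_y"]))
# ['invuln_left', 'ecb_top_y']
```

Any column this returns is only a candidate to drop. The audit described above also established *why* the columns were empty (libmelee never populates them on this parse path), which a sample-based check alone cannot show.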
 ## Training
+- Model preset: `mimic` (20M params)
+- Optimizer: AdamW, LR 3e-4, weight decay 0.01, **no warmup**
+- LR schedule: `CosineAnnealingLR` to `eta_min=1e-6`
 - Gradient clip: 1.0
 - Dropout: 0.2
+- Sequence length: **180 frames** (~3 seconds)
+- Batch size: 256 per-GPU × 2 RTX 5090s × grad-accum 1 = **eff-batch 512**
 - Mixed precision: BF16 AMP with FP32 upcast for relpos attention
+  (prevents BF16 overflow in the manual Q@Kᵀ + S_rel computation)
+- Max samples: 16.78M (≈ 32,768 steps at eff-batch 512)
+- Watchdog: stops a run after 12 consecutive evals without val-loss improvement (patience=12), so some chars finish early
+- Reaction delay: 0. v2 shards have `target[i] = buttons[i+1]`, so
+  `rd=0` matches inference — do NOT use `--reaction-delay 1` or
+  `--controller-offset` with v2 shards.
+- `--self-inputs` is required even on v2 shards. Runs without it
+  drop the controller-history input entirely and land at val loss ~2.3.
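The step budget, LR schedule, and watchdog from the list above can be sketched in plain Python. This sketch makes three assumptions:

- `T_max` equals the full 32,768-step budget.
- "16.78M" is exactly 2^24 samples.
- `PlateauWatchdog` counts consecutive evals without a new best val loss.

None of these names come from the MIMIC codebase. `cosine_lr` is the closed form that PyTorch's `CosineAnnealingLR` follows when stepped once per iteration.

```python
import math

LR_MAX, ETA_MIN = 3e-4, 1e-6
EFF_BATCH = 256 * 2 * 1                  # per-GPU batch x 2 GPUs x grad-accum 1
MAX_SAMPLES = 2 ** 24                    # assumed exact value behind "16.78M"
TOTAL_STEPS = MAX_SAMPLES // EFF_BATCH   # 32,768


def cosine_lr(step: int, total: int = TOTAL_STEPS) -> float:
    """CosineAnnealingLR closed form with no warmup (T_max assumed = total)."""
    return ETA_MIN + (LR_MAX - ETA_MIN) * (1 + math.cos(math.pi * step / total)) / 2


class PlateauWatchdog:
    """Signal a stop after `patience` consecutive evals with no new best val loss."""

    def __init__(self, patience: int = 12):
        self.patience = patience
        self.best = math.inf
        self.bad_evals = 0

    def should_stop(self, val_loss: float) -> bool:
        if val_loss < self.best:
            self.best, self.bad_evals = val_loss, 0  # new best: reset the counter
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience


print(TOTAL_STEPS)  # 32768
```

With no warmup, the first step already runs at 3e-4 and the rate decays smoothly to 1e-6. A run that trips the watchdog simply stops partway down that curve, which is consistent with the sub-32,768 step counts in the checkpoint table.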
+Typical wall-clock per char on 2×RTX 5090: 10-15 min download/extract,
+plus 20 min parallel `norm_stats` bootstrap, plus 45-120 min sharding
+(depends on the char; cptfalcon and sheik take longest), plus ~50 min
+training, for 2-4 hours total.
 ## Known limitations
+1. **Character-locked.** Each model only plays the character it was
+   trained on. No matchup generalization. Multi-character training
+   with a character embedding is a natural next step but not done.
+2. **Small-dataset overfitting on Luigi / Ice Climbers.** Luigi has
+   ~2K training games; IC around 5K. Their `_bestloss.pt` checkpoints
+   come from early-stopped runs, halted either by the patience=12
+   watchdog during this cycle or by inspection in prior cycles. Play
+   quality varies.
+3. **Edge guarding and recovery weaknesses.** Bots don't consistently
+   go for off-stage edge guards or execute high-skill recovery
+   mixups. The training data has these in it, but BC bots under-sample
+   long-tail strategic decisions.
+4. **No Matchmaking / Ranked.** The Discord bot only joins explicit
+   Direct Connect lobbies. Do NOT adapt it for Slippi Online Unranked
+   or Ranked — libmelee's README explicitly forbids bots on those
+   ladders, and Slippi has not yet opened a "bot account" opt-in
+   system.
 ## Acknowledgments
+- **Eric Gu** for [HAL](https://github.com/ericyuegu/hal), the
+  reference implementation MIMIC is based on. HAL's architecture,
+  tokenization, and training pipeline are the foundation.
+- **Vlad Firoiu and collaborators** for
+  [libmelee](https://github.com/altf4/libmelee), the Python interface
+  to Dolphin + Slippi.
 - **Project Slippi** for the Slippi Dolphin fork, replay format, and
   Direct Connect rollback netplay. https://slippi.gg
 ## License
+MIT — see the MIMIC repo's `LICENSE` file.