| | --- |
| | license: mit |
| | tags: |
| | - reinforcement-learning |
| | - minihack |
| | - diffusion |
| | - planning |
| | - behavior-cloning |
| | --- |
| | |
| | # ReMDM-MiniHack |
| |
|
| | Generative Planning Agent for MiniHack navigation using **Re-Masked Discrete Diffusion (ReMDM)**. |
| |
|
| | The agent uses Masked Discrete Diffusion to iteratively generate action sequences for dungeon navigation. |
| | Instead of predicting the next action autoregressively, the model generates entire 64-step trajectories |
| | by progressively unmasking action tokens. |
| |
|
| | ## Code |
| |
|
| | GitHub: [piotrwilam/ReMDM-MiniHack-Project](https://github.com/piotrwilam/ReMDM-MiniHack-Project) |
| |
|
| | ## Models |
| |
|
| | | Version | Model | Params | Training | Tag | |
| | |---|---|---|---|---| |
| | | v017_local_baseline | LocalDiffusionPlanner | 7M | Offline BC, 200 demos/env, 30 epochs | β | |
| | | v017_local_baseline | LocalDiffusionPlanner | 7M | Offline BC, 500 demos/env, 60 epochs | `v0.17-local-baseline-gold` (pending) | |
| |
|
| | ## Repo Structure |
| |
|
| | ``` |
| | ReMDM-MiniHack/ |
| | βββ README.md # This file |
| | βββ v017_local_baseline/ |
| | β βββ inference_weights.pth # EMA state dict (for evaluation) |
| | β βββ full_checkpoint.pth # Full training state (for resuming) |
| | β βββ config.json # Hyperparams + model args |
| | β βββ eval_results.csv # Per-environment results |
| | βββ datasets/ |
| | βββ oracle_demos_v017.pt # Oracle demonstration dataset |
| | ``` |
| |
|
| | ## Quick Start |
| |
|
| | ```python |
| | import torch |
| | from huggingface_hub import hf_hub_download |
| | |
| | # Download weights |
| | path = hf_hub_download("piotrwilam/ReMDM-MiniHack", "v017_local_baseline/inference_weights.pth") |
| | weights = torch.load(path, map_location="cpu", weights_only=False) |
| | |
| | # Load model |
| | from model import LocalDiffusionPlanner |
| | model = LocalDiffusionPlanner(action_dim=12) |
| | model.load_state_dict(weights) |
| | model.eval() |
| | ``` |
| |
|
| | ## Results: v017 Local Baseline (Offline BC, 200 demos/env, 30 epochs) |
| |
|
| | | Environment | Win% | Avg Steps | |
| | |---|---|---| |
| | | Room-Random-5x5 | 94% | 18.3 | |
| | | Room-Random-15x15 | 54% | 130.4 | |
| | | Room-Dark-5x5 | 90% | 25.5 | |
| | | Room-Ultimate-5x5 | 84% | 20.8 | |
| | | Room-Ultimate-15x15 | 30% | 72.1 | |
| | | Corridor-R2 | 42% | 132.1 | |
| | | Corridor-R3 | 0% | 200.0 | |
| | | MazeWalk-9x9 | 48% | 119.0 | |
| | | MazeWalk-15x15 | 22% | 162.3 | |
| |
|