Initial repo structure with model card

Files changed (3)

README.md +73 -0
datasets/.gitkeep +0 -0
v017_local_baseline/.gitkeep +0 -0

README.md ADDED

	@@ -0,0 +1,73 @@

+---
+license: mit
+tags:
+  - reinforcement-learning
+  - minihack
+  - diffusion
+  - planning
+  - behavior-cloning
+---
+# ReMDM-MiniHack
+A generative planning agent for MiniHack navigation based on **Re-Masked Discrete Diffusion (ReMDM)**.
+The agent uses masked discrete diffusion to generate action sequences for dungeon navigation iteratively.
+Instead of predicting the next action autoregressively, the model generates an entire 64-step trajectory
+by progressively unmasking action tokens.
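+The progressive unmasking described above can be illustrated with a toy sketch. This is not the project's implementation: `toy_denoiser`, `sample_plan`, the `MASK` id, and the confidence-ordered schedule are all assumptions made for illustration, a random guesser stands in for the trained network, and the re-masking step of ReMDM is left out.
+```python
+import math
+import random
+# Assumed constants: 64-step horizon and 12 actions as stated in this README; MASK id is hypothetical
+MASK = -1
+HORIZON = 64
+NUM_ACTIONS = 12
+def toy_denoiser(tokens, rng):
+    """Stand-in for the trained network: a random (action, confidence) guess per position."""
+    return [(rng.randrange(NUM_ACTIONS), rng.random()) for _ in tokens]
+def sample_plan(num_steps=8, seed=0):
+    """Start from a fully masked plan and fill it in over num_steps rounds."""
+    rng = random.Random(seed)
+    tokens = [MASK] * HORIZON
+    for step in range(num_steps):
+        guesses = toy_denoiser(tokens, rng)
+        masked = [i for i, t in enumerate(tokens) if t == MASK]
+        # Reveal an equal share of the remaining masked positions, most confident first
+        k = math.ceil(len(masked) / (num_steps - step))
+        most_confident = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
+        for i in most_confident:
+            tokens[i] = guesses[i][0]
+    return tokens
+plan = sample_plan()
+print(plan[:8])
+```
+Every position is filled by the final round, so the whole trajectory is available before the first action is executed, in contrast to autoregressive decoding.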
+## Code
+GitHub: [piotrwilam/ReMDM-MiniHack-Project](https://github.com/piotrwilam/ReMDM-MiniHack-Project)
+## Models
+| Version | Model | Params | Training | Tag |
+|---|---|---|---|---|
+| v017_local_baseline | LocalDiffusionPlanner | 7M | Offline BC, 200 demos/env, 30 epochs | — |
+| v017_local_baseline | LocalDiffusionPlanner | 7M | Offline BC, 500 demos/env, 60 epochs | `v0.17-local-baseline-gold` (pending) |
+## Repo Structure
+```
+ReMDM-MiniHack/
+├── README.md                              # This file
+├── v017_local_baseline/
+│   ├── inference_weights.pth              # EMA state dict (for evaluation)
+│   ├── full_checkpoint.pth                # Full training state (for resuming)
+│   ├── config.json                        # Hyperparams + model args
+│   └── eval_results.csv                   # Per-environment results
+└── datasets/
+    └── oracle_demos_v017.pt               # Oracle demonstration dataset
+```
+## Quick Start
+```python
+import torch
+from huggingface_hub import hf_hub_download
+from model import LocalDiffusionPlanner  # model.py comes from the project code, see "Code"
+# Download the EMA inference weights
+path = hf_hub_download("piotrwilam/ReMDM-MiniHack", "v017_local_baseline/inference_weights.pth")
+# A plain state dict loads with weights_only=True, which avoids unpickling arbitrary objects
+weights = torch.load(path, map_location="cpu", weights_only=True)
+model = LocalDiffusionPlanner(action_dim=12)
+model.load_state_dict(weights)
+model.eval()
+```
+## Results: v017 Local Baseline (Offline BC, 200 demos/env, 30 epochs)
+| Environment | Win% | Avg Steps |
+|---|---|---|
+| Room-Random-5x5 | 94% | 18.3 |
+| Room-Random-15x15 | 54% | 130.4 |
+| Room-Dark-5x5 | 90% | 25.5 |
+| Room-Ultimate-5x5 | 84% | 20.8 |
+| Room-Ultimate-15x15 | 30% | 72.1 |
+| Corridor-R2 | 42% | 132.1 |
+| Corridor-R3 | 0% | 200.0 |
+| MazeWalk-9x9 | 48% | 119.0 |
+| MazeWalk-15x15 | 22% | 162.3 |

datasets/.gitkeep ADDED

File without changes

v017_local_baseline/.gitkeep ADDED

File without changes