ReMDM-MiniHack

Generative Planning Agent for MiniHack navigation using Re-Masked Discrete Diffusion (ReMDM).

The agent uses Masked Discrete Diffusion to iteratively generate action sequences for dungeon navigation. Instead of predicting the next action autoregressively, the model generates entire 64-step trajectories by progressively unmasking action tokens.
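The unmasking loop described above can be illustrated with a toy sketch. This is not the project's sampler. The random `toy_denoiser`, the linear unmasking schedule, the confidence-based remasking rule, and the `remask_frac` parameter are all assumptions made for illustration. Only the 64-step horizon and the 12-action vocabulary come from this README.

```python
import random

MASK = -1        # sentinel for a masked action token (illustrative choice)
SEQ_LEN = 64     # trajectory length stated in the description
ACTION_DIM = 12  # matches action_dim=12 in the Quick Start


def toy_denoiser(tokens, rng):
    """Hypothetical stand-in for the model: one (action, confidence) pair per position."""
    return [(rng.randrange(ACTION_DIM), rng.random()) for _ in tokens]


def sample_plan(num_steps=8, remask_frac=0.1, seed=0):
    rng = random.Random(seed)
    tokens = [MASK] * SEQ_LEN   # start from a fully masked trajectory
    conf = [0.0] * SEQ_LEN      # confidence recorded when a token is committed
    for step in range(1, num_steps + 1):
        preds = toy_denoiser(tokens, rng)
        # Number of tokens that should be unmasked after this step (linear schedule).
        target = SEQ_LEN * step // num_steps
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        n_unmasked = SEQ_LEN - len(masked)
        # Commit the most confident predictions among the masked positions.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[: target - n_unmasked]:
            tokens[i], conf[i] = preds[i]
        # Remasking: on every step but the last, re-mask the least confident
        # committed tokens so that later steps can revise them.
        if step < num_steps:
            committed = [i for i, t in enumerate(tokens) if t != MASK]
            committed.sort(key=lambda i: conf[i])
            for i in committed[: int(remask_frac * len(committed))]:
                tokens[i] = MASK
    return tokens


plan = sample_plan()
```

In a real agent the denoiser would be the trained network conditioned on the current observation, and `plan` would be a list of 64 action indices to execute or replan from.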

Code

GitHub: piotrwilam/ReMDM-MiniHack-Project

Models

| Version | Model | Params | Training | Tag |
|---|---|---|---|---|
| v017_local_baseline | LocalDiffusionPlanner | 7M | Offline BC, 200 demos/env, 30 epochs | - |
| v017_local_baseline | LocalDiffusionPlanner | 7M | Offline BC, 500 demos/env, 60 epochs | v0.17-local-baseline-gold (pending) |

Repo Structure

ReMDM-MiniHack/
├── README.md                              # This file
├── v017_local_baseline/
│   ├── inference_weights.pth              # EMA state dict (for evaluation)
│   ├── full_checkpoint.pth                # Full training state (for resuming)
│   ├── config.json                        # Hyperparams + model args
│   └── eval_results.csv                   # Per-environment results
└── datasets/
    └── oracle_demos_v017.pt               # Oracle demonstration dataset

Quick Start

import torch
from huggingface_hub import hf_hub_download

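# Download the EMA inference weights from the Hugging Face Hub
path = hf_hub_download("piotrwilam/ReMDM-MiniHack", "v017_local_baseline/inference_weights.pth")
# weights_only=False unpickles arbitrary Python objects; only use it with files you trust
weights = torch.load(path, map_location="cpu", weights_only=False)

# Load the model (the `model` module is assumed to come from the GitHub repo listed under Code)
from model import LocalDiffusionPlanner
model = LocalDiffusionPlanner(action_dim=12)
model.load_state_dict(weights)
model.eval()  # inference mode: disables dropout and other training-only behavior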
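The repo also ships a `config.json` next to the weights. A minimal sketch for fetching and inspecting it follows. The helper names `load_config` and `fetch_config` are hypothetical, and the key layout of the file is not documented here, so the sketch only parses the JSON and leaves interpretation to the reader.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file and return its contents as a Python object."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fetch_config(repo_id="piotrwilam/ReMDM-MiniHack",
                 filename="v017_local_baseline/config.json"):
    """Download config.json from the Hub and parse it (requires network access)."""
    # Imported lazily so load_config works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download
    return load_config(hf_hub_download(repo_id, filename))
```

Calling `cfg = fetch_config()` and then `print(sorted(cfg))` lists the top-level keys, which is a reasonable first check before passing any of them to the model constructor.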

Results: v017 Local Baseline (Offline BC, 200 demos/env, 30 epochs)

| Environment | Win% | Avg Steps |
|---|---|---|
| Room-Random-5x5 | 94% | 18.3 |
| Room-Random-15x15 | 54% | 130.4 |
| Room-Dark-5x5 | 90% | 25.5 |
| Room-Ultimate-5x5 | 84% | 20.8 |
| Room-Ultimate-15x15 | 30% | 72.1 |
| Corridor-R2 | 42% | 132.1 |
| Corridor-R3 | 0% | 200.0 |
| MazeWalk-9x9 | 48% | 119.0 |
| MazeWalk-15x15 | 22% | 162.3 |