---
license: mit
tags:
- reinforcement-learning
- model-based-rl
- world-model
- minigrid
- fourrooms
- planning
---

# MiniDreamer

MiniDreamer is a PlaNet-style world model project for `MiniGrid-FourRooms-v0`. It learns a recurrent latent dynamics model from partial RGB observations, predicts reward and episode termination, and plans in latent space with a discrete cross-entropy method (CEM) planner.
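To make the planning idea concrete, here is a minimal NumPy sketch of discrete CEM with termination-aware scoring. It is an illustration under stated assumptions, not the code in `src/minidreamer/`: the names `discrete_cem`, `termination_aware_return`, and `toy_rollout` are hypothetical, as are the hyperparameter defaults. `rollout_fn` stands in for a latent rollout through the world model, and `done_probs[:, t]` is assumed to be the predicted probability that the episode ends at step `t`.

```python
import numpy as np


def termination_aware_return(rewards, done_probs, gamma=0.99):
    """Score rollouts; rewards and done_probs have shape (candidates, horizon)."""
    n = rewards.shape[0]
    # survive[:, t] = probability the episode is still running after step t.
    survive = np.cumprod(1.0 - done_probs, axis=1)
    # alive[:, t] = probability the episode has not ended before step t.
    alive = np.concatenate([np.ones((n, 1)), survive[:, :-1]], axis=1)
    discounts = gamma ** np.arange(rewards.shape[1])
    # Rewards predicted after a likely termination contribute little.
    return (rewards * alive * discounts).sum(axis=1)


def discrete_cem(rollout_fn, num_actions, horizon=5, candidates=256,
                 elites=32, iterations=5, smoothing=0.1, seed=0):
    """Return (first action of the plan, per-step action probabilities)."""
    rng = np.random.default_rng(seed)
    # Start from a uniform categorical distribution at every planning step.
    probs = np.full((horizon, num_actions), 1.0 / num_actions)
    for _ in range(iterations):
        # Inverse-CDF sampling of one action per (candidate, step).
        cdf = probs.cumsum(axis=1)
        u = rng.random((candidates, horizon, 1))
        actions = np.minimum((u > cdf[None]).sum(axis=2), num_actions - 1)
        rewards, done_probs = rollout_fn(actions)
        scores = termination_aware_return(rewards, done_probs)
        # Refit the distribution to the highest-scoring sequences.
        elite = actions[np.argsort(scores)[-elites:]]
        counts = np.stack([np.bincount(elite[:, t], minlength=num_actions)
                           for t in range(horizon)])
        # Mix in a uniform floor so no action probability collapses to zero.
        probs = (1.0 - smoothing) * counts / elites + smoothing / num_actions
    return int(probs[0].argmax()), probs


def toy_rollout(actions):
    """Hypothetical stand-in for the world model: action 2 always pays 1."""
    rewards = (actions == 2).astype(float)
    done_probs = np.zeros_like(rewards)
    return rewards, done_probs


best_action, probs = discrete_cem(toy_rollout, num_actions=4)
print(best_action)
```

In a real planner, `rollout_fn` would encode the current observation, unroll the RSSM for `horizon` steps per candidate, and read out the reward and done heads.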

The repository contains:

- MiniGrid RGB environment wrappers and bootstrap trajectory collection
- Episode-aware replay buffer with reproducible train/val/test splits
- CNN encoder, Gaussian RSSM, reward/done heads, optional decoder
- Discrete CEM planner with termination-aware return scoring
- PPO baseline entrypoint with a MiniGrid-compatible CNN feature extractor
- Evaluation code, configs, scripts, tests, and project documentation
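One common way to get reproducible, episode-aware splits is to hash each episode identifier with a seed, so that every transition of an episode lands in the same split regardless of collection order. The sketch below assumes that approach; the function name `assign_split`, the string episode IDs, and the 10% fractions are hypothetical, and the actual replay buffer may use a different scheme such as a seeded shuffle.

```python
import hashlib


def assign_split(episode_id, seed=0, val_frac=0.1, test_frac=0.1):
    """Map an episode ID to 'train', 'val', or 'test' deterministically."""
    # Hash the seed together with the ID so a new seed yields a new partition.
    digest = hashlib.sha256(f"{seed}:{episode_id}".encode()).digest()
    # Turn the first 8 bytes into a float in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


# Every episode receives exactly one split, independent of insertion order.
splits = {f"episode-{i}": assign_split(f"episode-{i}") for i in range(1000)}
```

Because the assignment depends only on the seed and the episode ID, adding new episodes to the buffer never moves existing ones between splits.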

A complete baseline training run has been executed, and its summary is recorded in [results.md](results.md). The frozen baseline artifacts live under `artifacts/world_model/`, which is gitignored.

## Layout

```text
configs/
docs/
notebooks/
scripts/
src/
tests/
```

Core code lives under `src/minidreamer/`, with CLI entrypoints at `src/train_world_model.py` and `src/evaluate.py`.

## Setup

Use Python 3.11 or 3.12. The project metadata is defined in [pyproject.toml](pyproject.toml).

```bash
python3.11 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
```

## Main Commands

Bootstrap replay collection:

```bash
./scripts/collect_random.sh
```

World-model pipeline:

```bash
./scripts/train_world_model.sh
```

By default, the script writes new experiments to `artifacts/world_model_experiment/`. To choose a different experiment directory without touching the frozen baseline, set `MINIDREAMER_OUTPUT_DIR`:

```bash
MINIDREAMER_OUTPUT_DIR=artifacts/world_model_restricted_actions ./scripts/train_world_model.sh
```

Resume an interrupted world-model run from a checkpoint:

```bash
python3.11 src/train_world_model.py \
  --config configs/fourrooms_world_model.yaml \
  --output-dir artifacts/world_model \
  --replay-dir artifacts/world_model/replay \
  --resume-checkpoint artifacts/world_model/checkpoints/world_model_env_steps_90021.pt
```

Planner evaluation from a checkpoint:

```bash
./scripts/eval_planner.sh /path/to/checkpoint.pt /path/to/replay
```

PPO baseline:

```bash
./scripts/train_ppo.sh
```

## Notes

- The latest completed run summary is in [results.md](results.md).
- The baseline run in `artifacts/world_model/` is intentionally frozen as the reference artifact.
- New world-model experiments should write to separate directories under `artifacts/`.
- The trainer refuses to overwrite an existing run directory unless you resume with `--resume-checkpoint` or explicitly pass `--allow-overwrite-existing-output`.
- Metrics, replay snapshots, and checkpoints are intentionally gitignored.
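
The overwrite rule in the notes above can be pictured as a small guard that runs before training starts. This is a sketch of the described behavior, not the trainer's actual code: `check_output_dir` and its parameter names are hypothetical, and only the two CLI flags named in the notes come from the project.

```python
from pathlib import Path


def check_output_dir(output_dir, resume_checkpoint=None, allow_overwrite=False):
    """Create the run directory, refusing to clobber an existing run."""
    path = Path(output_dir)
    # Treat a directory with any contents as an existing run.
    has_contents = path.is_dir() and any(path.iterdir())
    if has_contents and resume_checkpoint is None and not allow_overwrite:
        raise FileExistsError(
            f"{path} already contains a run; pass --resume-checkpoint or "
            "--allow-overwrite-existing-output to continue"
        )
    path.mkdir(parents=True, exist_ok=True)
    return path
```

Failing fast here protects the frozen baseline in `artifacts/world_model/` from an accidental rerun that uses the default arguments.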