Spaces: ViditOstwal / maze_env
ViditOstwal committed on Apr 21

Commit d8977cf · verified · 1 parent: e7cea77

Upload folder using huggingface_hub


Files changed (22)

Dockerfile +83 -0
README.md +124 -5
__init__.py +17 -0
client.py +119 -0
dataset/ice-maze-levels.json +1210 -0
dataset/validate_dataset.py +203 -0
experiment.ipynb +138 -0
models.py +148 -0
openenv.yaml +7 -0
openenv_maze_env.egg-info/PKG-INFO +9 -0
openenv_maze_env.egg-info/SOURCES.txt +19 -0
openenv_maze_env.egg-info/dependency_links.txt +1 -0
openenv_maze_env.egg-info/entry_points.txt +2 -0
openenv_maze_env.egg-info/requires.txt +5 -0
openenv_maze_env.egg-info/top_level.txt +1 -0
pyproject.toml +48 -0
server/__init__.py +11 -0
server/app.py +84 -0
server/maze_env_environment.py +291 -0
server/maze_env_helpers.py +298 -0
server/requirements.txt +6 -0
uv.lock +0 -0

Dockerfile ADDED

	@@ -0,0 +1,83 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Multi-stage build using openenv-base
+# This Dockerfile is flexible and works for both:
+# - In-repo environments (with local OpenEnv sources)
+# - Standalone environments (with openenv from PyPI/Git)
+# The build script (openenv build) handles context detection and sets appropriate build args.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Ensure git is available (required for installing dependencies from VCS)
+RUN apt-get update && \
+    apt-get install -y --no-install-recommends git && \
+    rm -rf /var/lib/apt/lists/*
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+ARG ENV_NAME=maze_env
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv is already vendored in the build context
+# For standalone builds, openenv will be installed via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install dependencies using uv sync
+# If uv.lock exists, use it; otherwise resolve on the fly
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# WEB INTERFACE
+ENV ENABLE_WEB_INTERFACE=true
+# Health check
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD curl -f http://localhost:8000/health || exit 1
+# Run the FastAPI server
+# The module path is constructed to work with the /app/env structure
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]

README.md CHANGED

@@ -1,10 +1,129 @@
 ---
-title: Maze Env
-emoji: 🏃
-colorFrom: pink
-colorTo: yellow
+title: Maze Env Environment Server
+emoji: 🎭
+colorFrom: green
+colorTo: pink
 sdk: docker
 pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# Maze Env Environment
+Ice-sliding maze environment on [OpenEnv](https://github.com/meta-pytorch/OpenEnv).
+Agents call `reset` / `step` with directions. All players slide simultaneously in the chosen direction until blocked by a wall or another player.
+Levels live in `dataset/ice-maze-levels.json`.
+## Core Rules
+- **Actions**: `LEFT`, `RIGHT`, `UP`, `DOWN`
+- **Movement**: on each step, every player slides as far as possible in that direction
+- **Win condition**: episode is solved only when every player is on an exit after a step
+Board symbols used at runtime:
+- `#` wall
+- `.` empty ice
+- `e` unoccupied exit
+- `a` player on non-exit cell
+- `b` player on exit cell
+## Quick Start (Client)
+```python
+from maze_env import MazeAction, MazeEnv
+with MazeEnv(base_url="http://localhost:8000").sync() as env:
+    reset_result = env.reset(level_index=0)
+    print(reset_result.observation.board)
+    print(reset_result.observation.system_prompt)
+    step_result = env.step(MazeAction(direction="LEFT"))
+    obs = step_result.observation
+    print(obs.board)
+    print(obs.message, obs.reward, obs.done)
+```
+## Run Locally
+Start server:
+```bash
+uv run --project . server
+```
+Or with uvicorn directly:
+```bash
+uv run uvicorn server.app:app --reload
+```
+## Docker
+Build:
+```bash
+docker build -t maze_env-env:latest -f server/Dockerfile .
+```
+Run:
+```bash
+docker run --rm -p 8000:8000 maze_env-env:latest
+```
+## Dataset Validation
+Run:
+```bash
+uv run python dataset/validate_dataset.py
+```
+The validator checks:
+- `start`/`end` consistency against `annotated_board`
+- `diameter == len(path)` when both are present
+- path replay through the actual environment:
+  - `done` must **not** become `True` before the final path move
+  - `done` must be `True` at the final path move
+## Smoke Test Environment Logic
+```bash
+uv run python server/maze_env_environment.py
+```
+This runs a direct `reset`/`step` demo without starting the API server.
+## Deployment (OpenEnv / Hugging Face)
+```bash
+openenv push
+```
+This uses `openenv.yaml` and deploys the Docker-backed environment.
+## Project Structure
+```text
+.
+├── __init__.py
+├── client.py
+├── models.py
+├── openenv.yaml
+├── pyproject.toml
+├── dataset/
+│   ├── ice-maze-levels.json
+│   └── validate_dataset.py
+└── server/
+    ├── app.py
+    ├── maze_env_environment.py
+    ├── maze_env_helpers.py
+    └── Dockerfile
+```
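
The Core Rules and Dataset Validation sections of the README describe the slide mechanic and the path-replay check. A minimal, self-contained sketch of both follows. It is not the code in `server/maze_env_environment.py`: the names `parse_board`, `slide`, and `replay` are hypothetical, and two details are assumptions, namely that exits do not stop a sliding player and that, when several players move, the one furthest along the direction of travel is resolved first. The board is the first entry of `dataset/ice-maze-levels.json`.

```python
from typing import List, Set, Tuple

Pos = Tuple[int, int]

# Single-letter moves as used in the dataset's "path" strings.
DELTAS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

# annotated_board of the first level in dataset/ice-maze-levels.json.
LEVEL_0 = [
    "##########",
    "#........#",
    "#........#",
    "#........#",
    "#......###",
    "#.......e#",
    "###....###",
    "###..a...#",
    "###......#",
    "##########",
]


def parse_board(rows: List[str]) -> Tuple[List[Pos], Set[Pos]]:
    """Return player positions and exit cells ('b' is a player on an exit)."""
    players: List[Pos] = []
    exits: Set[Pos] = set()
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch in "ab":
                players.append((r, c))
            if ch in "eb":
                exits.add((r, c))
    return players, exits


def slide(rows: List[str], players: List[Pos], move: str) -> List[Pos]:
    """Slide every player one way until a wall or another player blocks it."""
    dr, dc = DELTAS[move]
    # Assumption: resolve the player furthest along the direction first,
    # so players behind it stop against its final position.
    order = sorted(
        range(len(players)),
        key=lambda i: players[i][0] * dr + players[i][1] * dc,
        reverse=True,
    )
    new = list(players)
    for i in order:
        others = {p for j, p in enumerate(new) if j != i}
        r, c = new[i]
        # The outer border is all walls, so the look-ahead index stays in range.
        while rows[r + dr][c + dc] != "#" and (r + dr, c + dc) not in others:
            r, c = r + dr, c + dc
        new[i] = (r, c)
    return new


def replay(rows: List[str], path: str) -> List[bool]:
    """Return the solved flag (every player on an exit) after each move."""
    players, exits = parse_board(rows)
    flags: List[bool] = []
    for move in path:
        players = slide(rows, players, move)
        flags.append(all(p in exits for p in players))
    return flags


flags = replay(LEVEL_0, "ULDR")
print(flags)  # → [False, False, False, True]
```

The flags correspond to the two replay checks the validator lists: no `True` before the last move of `path`, and `True` on the last move.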

__init__.py ADDED

	@@ -0,0 +1,17 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Maze Env Environment."""
+from .client import MazeEnv
+from .models import MazeAction, MazeDirection, MazeObservation
+__all__ = [
+    "MazeAction",
+    "MazeDirection",
+    "MazeObservation",
+    "MazeEnv",
+]

client.py ADDED

	@@ -0,0 +1,119 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Ice Maze Environment Client."""
+from typing import Dict
+from openenv.core import EnvClient
+from openenv.core.client_types import StepResult
+from openenv.core.env_server.types import State
+from .models import MazeAction, MazeObservation
+class MazeEnv(EnvClient[MazeAction, MazeObservation, State]):
+    """
+    Client for the Ice Maze Environment.
+    Maintains a persistent WebSocket connection to the environment server,
+    enabling efficient multi-step interactions with lower latency.
+    Each client instance has its own dedicated environment session on the server.
+    Example (async):
+        >>> async with MazeEnv(base_url="http://localhost:8000") as env:
+        ...     obs = await env.reset(level_index=1)
+        ...     print(obs.observation.system_prompt)
+        ...
+        ...     obs = await env.step(MazeAction(direction="LEFT"))
+        ...     print(obs.observation.board)
+        ...     print(obs.observation.message)
+    Example (sync wrapper):
+        >>> with MazeEnv(base_url="http://localhost:8000").sync() as env:
+        ...     obs = env.reset(level_index=1)
+        ...     print(obs.observation.system_prompt)
+        ...     obs = env.step(MazeAction(direction="UP"))
+        ...     print(obs.observation.board)
+    Example with Docker:
+        >>> client = await MazeEnv.from_docker_image("maze_env-env:latest")
+        >>> try:
+        ...     obs = await client.reset(level_index=0)
+        ...     obs = await client.step(MazeAction(direction="RIGHT"))
+        ... finally:
+        ...     await client.close()
+    """
+    def _step_payload(self, action: MazeAction) -> Dict:
+        """
+        Convert MazeAction to JSON payload for the step WebSocket message.
+        Args:
+            action: MazeAction instance with a direction field.
+        Returns:
+            Dictionary representation suitable for JSON encoding.
+        """
+        # Send canonical wire value expected by the environment.
+        return {"direction": action.direction.value}
+    def _parse_result(self, payload: Dict) -> StepResult[MazeObservation]:
+        """
+        Parse server response into StepResult[MazeObservation].
+        The server serializes the observation via serialize_observation(), which
+        produces:
+            {
+                "observation": { <MazeObservation fields minus done/reward/metadata> },
+                "reward": float | None,
+                "done": bool,
+            }
+        Args:
+            payload: JSON response data from server.
+        Returns:
+            StepResult with a fully populated MazeObservation.
+        """
+        obs_data = payload.get("observation", {})
+        done = payload.get("done", False)
+        reward = payload.get("reward")
+        observation = MazeObservation(
+            board=obs_data.get("board", ""),
+            step_count=obs_data.get("step_count", 0),
+            max_steps=obs_data.get("max_steps", 0),
+            previous_actions=obs_data.get("previous_actions", []),
+            system_prompt=obs_data.get("system_prompt", ""),
+            agent_positions=obs_data.get("agent_positions", []),
+            exit_positions=obs_data.get("exit_positions", []),
+            num_players=obs_data.get("num_players", 1),
+            message=obs_data.get("message", ""),
+            done=done,
+            reward=reward,
+            metadata=obs_data.get("metadata", payload.get("metadata", {})),
+        )
+        return StepResult(
+            observation=observation,
+            reward=reward,
+            done=done,
+        )
+    def _parse_state(self, payload: Dict) -> State:
+        """
+        Parse server response into a State object.
+        The state endpoint returns the full State dict including extra fields
+        (current_board, agent_positions, action_history, etc.).
+        Args:
+            payload: JSON response from the state WebSocket message.
+        Returns:
+            State object with episode_id, step_count, and all extra fields.
+        """
+        return State(**payload) if payload else State()
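
`_parse_result` above reads `done` and `reward` from the top level of the payload and the remaining fields from the nested `observation` dict, falling back to defaults for missing keys. The sketch below mirrors that lookup pattern with a plain dataclass so it runs without `openenv`. `ObsSketch` and `parse_payload` are hypothetical stand-ins, not the real `MazeObservation` or the client method; only a subset of fields is shown, and the sample payload is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ObsSketch:
    """Hypothetical stand-in for MazeObservation (subset of fields)."""

    board: str = ""
    step_count: int = 0
    message: str = ""
    previous_actions: List[str] = field(default_factory=list)
    done: bool = False
    reward: Optional[float] = None


def parse_payload(payload: Dict[str, Any]) -> ObsSketch:
    """Build an observation from a server-style payload, tolerating gaps."""
    obs = payload.get("observation", {})
    return ObsSketch(
        board=obs.get("board", ""),
        step_count=obs.get("step_count", 0),
        message=obs.get("message", ""),
        previous_actions=obs.get("previous_actions", []),
        # done and reward live at the top level, not inside "observation".
        done=payload.get("done", False),
        reward=payload.get("reward"),
    )


# Illustrative payload; the values are invented for this example.
sample = {
    "observation": {"board": "###\n#a#\n###", "step_count": 1},
    "reward": 0.0,
    "done": False,
}
parsed = parse_payload(sample)
empty = parse_payload({})
print(parsed.step_count, parsed.reward, empty.reward)
```

An empty payload yields an observation made entirely of defaults, which is why a missing `reward` surfaces as `None` rather than raising.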

dataset/ice-maze-levels.json ADDED

	@@ -0,0 +1,1210 @@

+[
+  {
+    "width": 8,
+    "height": 8,
+    "players": 1,
+    "diameter": 4,
+    "threes_along_diameter": 1,
+    "open_cells": 58,
+    "states": 58,
+    "start": [
+      [
+        6,
+        4
+      ]
+    ],
+    "end": [
+      [
+        4,
+        7
+      ]
+    ],
+    "path": "ULDR",
+    "annotated_board": [
+      "##########",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#......###",
+      "#.......e#",
+      "###....###",
+      "###..a...#",
+      "###......#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:42:07-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 1,
+    "diameter": 6,
+    "threes_along_diameter": 4,
+    "open_cells": 61,
+    "states": 61,
+    "start": [
+      [
+        3,
+        6
+      ]
+    ],
+    "end": [
+      [
+        0,
+        5
+      ]
+    ],
+    "path": "LULDRU",
+    "annotated_board": [
+      "##########",
+      "#.....e..#",
+      "#........#",
+      "#......#.#",
+      "###....a.#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:48:26-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 1,
+    "diameter": 8,
+    "threes_along_diameter": 5,
+    "open_cells": 60,
+    "states": 60,
+    "start": [
+      [
+        0,
+        5
+      ]
+    ],
+    "end": [
+      [
+        1,
+        0
+      ]
+    ],
+    "path": "DLURDRUL",
+    "annotated_board": [
+      "##########",
+      "#....#a..#",
+      "#e.......#",
+      "#........#",
+      "#........#",
+      "#.....#..#",
+      "#..##....#",
+      "#........#",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:48:25-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 1,
+    "diameter": 12,
+    "threes_along_diameter": 8,
+    "open_cells": 58,
+    "states": 58,
+    "start": [
+      [
+        6,
+        6
+      ]
+    ],
+    "end": [
+      [
+        1,
+        5
+      ]
+    ],
+    "path": "LURURDRDRULU",
+    "annotated_board": [
+      "##########",
+      "#.....#..#",
+      "#.....e..#",
+      "##......##",
+      "#....#...#",
+      "#........#",
+      "#......###",
+      "#......a.#",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:42:07-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 1,
+    "diameter": 18,
+    "threes_along_diameter": 15,
+    "open_cells": 51,
+    "states": 51,
+    "start": [
+      [
+        6,
+        1
+      ]
+    ],
+    "end": [
+      [
+        0,
+        3
+      ]
+    ],
+    "path": "LURDRURDLURULURDLU",
+    "annotated_board": [
+      "##########",
+      "#..#e#...#",
+      "##......##",
+      "#..#.....#",
+      "#......###",
+      "#...#....#",
+      "#.#.....##",
+      "#.a......#",
+      "#..#...###",
+      "##########"
+    ],
+    "date": "2026-04-06T01:40:27-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 8,
+    "threes_along_diameter": 1,
+    "open_cells": 63,
+    "states": 1953,
+    "start": [
+      [
+        0,
+        1
+      ],
+      [
+        0,
+        2
+      ]
+    ],
+    "end": [
+      [
+        1,
+        7
+      ],
+      [
+        2,
+        7
+      ]
+    ],
+    "path": "DLURDLUR",
+    "annotated_board": [
+      "##########",
+      "##aa.....#",
+      "#.......e#",
+      "#.......e#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:54:31-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 8,
+    "threes_along_diameter": 1,
+    "open_cells": 62,
+    "states": 1891,
+    "start": [
+      [
+        0,
+        0
+      ],
+      [
+        1,
+        0
+      ]
+    ],
+    "end": [
+      [
+        3,
+        0
+      ],
+      [
+        7,
+        1
+      ]
+    ],
+    "path": "RDLDRULD",
+    "annotated_board": [
+      "##########",
+      "#a.......#",
+      "#a.......#",
+      "#........#",
+      "#e.......#",
+      "##.......#",
+      "##.......#",
+      "##.......#",
+      "#.e......#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:54:30-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 9,
+    "threes_along_diameter": 1,
+    "open_cells": 61,
+    "states": 1830,
+    "start": [
+      [
+        0,
+        0
+      ],
+      [
+        1,
+        0
+      ]
+    ],
+    "end": [
+      [
+        1,
+        0
+      ],
+      [
+        2,
+        0
+      ]
+    ],
+    "path": "RDLULDRUL",
+    "annotated_board": [
+      "##########",
+      "#a......##",
+      "#b.......#",
+      "#e.......#",
+      "#........#",
+      "#........#",
+      "#....#...#",
+      "#........#",
+      "##.......#",
+      "##########"
+    ],
+    "date": "2026-04-06T02:03:47-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 9,
+    "threes_along_diameter": 1,
+    "open_cells": 60,
+    "states": 1770,
+    "start": [
+      [
+        6,
+        1
+      ],
+      [
+        7,
+        0
+      ]
+    ],
+    "end": [
+      [
+        0,
+        7
+      ],
+      [
+        7,
+        7
+      ]
+    ],
+    "path": "RULDRDLUR",
+    "annotated_board": [
+      "##########",
+      "#.......e#",
+      "#........#",
+      "#........#",
+      "#...#....#",
+      "#........#",
+      "#.#...#..#",
+      "##a......#",
+      "#a......e#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:54:30-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 9,
+    "threes_along_diameter": 1,
+    "open_cells": 59,
+    "states": 1711,
+    "start": [
+      [
+        0,
+        0
+      ],
+      [
+        0,
+        1
+      ]
+    ],
+    "end": [
+      [
+        7,
+        1
+      ],
+      [
+        7,
+        5
+      ]
+    ],
+    "path": "DRURDLURD",
+    "annotated_board": [
+      "##########",
+      "#aa#######",
+      "#......#.#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#........#",
+      "#.e...e..#",
+      "##########"
+    ],
+    "date": "2026-04-06T02:05:44-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 15,
+    "threes_along_diameter": 12,
+    "open_cells": 60,
+    "states": 1770,
+    "start": [
+      [
+        6,
+        0
+      ],
+      [
+        7,
+        0
+      ]
+    ],
+    "end": [
+      [
+        1,
+        0
+      ],
+      [
+        4,
+        0
+      ]
+    ],
+    "path": "RULURULULDRULDL",
+    "annotated_board": [
+      "##########",
+      "#........#",
+      "#e.......#",
+      "#.#......#",
+      "#........#",
+      "#e.....#.#",
+      "##.......#",
+      "#a......##",
+      "#a.......#",
+      "##########"
+    ],
+    "date": "2026-04-06T02:05:47-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 20,
+    "threes_along_diameter": 12,
+    "open_cells": 59,
+    "states": 1711,
+    "start": [
+      [
+        4,
+        6
+      ],
+      [
+        4,
+        7
+      ]
+    ],
+    "end": [
+      [
+        5,
+        7
+      ],
+      [
+        7,
+        7
+      ]
+    ],
+    "path": "URULDLDLURULDRURDLDR",
+    "annotated_board": [
+      "##########",
+      "#..#.....#",
+      "#........#",
+      "#........#",
+      "#......#.#",
+      "#.....#aa#",
+      "#.......e#",
+      "##.......#",
+      "#...#...e#",
+      "##########"
+    ],
+    "date": "2026-04-06T02:05:49-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 16,
+    "threes_along_diameter": 12,
+    "open_cells": 57,
+    "states": 1596,
+    "start": [
+      [
+        1,
+        0
+      ],
+      [
+        1,
+        1
+      ]
+    ],
+    "end": [
+      [
+        0,
+        1
+      ],
+      [
+        0,
+        5
+      ]
+    ],
+    "path": "RDLDRULURURDRULU",
+    "annotated_board": [
+      "##########",
+      "#.e...e.##",
+      "#aa..#...#",
+      "#.......##",
+      "#.......##",
+      "##.......#",
+      "#..##....#",
+      "#........#",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-04-06T02:03:47-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 18,
+    "threes_along_diameter": 15,
+    "open_cells": 58,
+    "states": 1653,
+    "start": [
+      [
+        0,
+        1
+      ],
+      [
+        5,
+        0
+      ]
+    ],
+    "end": [
+      [
+        0,
+        2
+      ],
+      [
+        0,
+        3
+      ]
+    ],
+    "path": "LDRDRULURDRDLULDRU",
+    "annotated_board": [
+      "##########",
+      "#.aee....#",
+      "#........#",
+      "#.#......#",
+      "#....#...#",
+      "##......##",
+      "#a.......#",
+      "#...#....#",
+      "#.#......#",
+      "##########"
+    ],
+    "date": "2026-04-06T02:03:44-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "players": 2,
+    "diameter": 20,
+    "threes_along_diameter": 14,
+    "open_cells": 55,
+    "states": 1485,
+    "start": [
+      [
+        0,
+        5
+      ],
+      [
+        5,
+        0
+      ]
+    ],
+    "end": [
+      [
+        0,
+        5
+      ],
+      [
+        2,
+        0
+      ]
+    ],
+    "path": "URDLDRURURDLURULDLUL",
+    "annotated_board": [
+      "##########",
+      "##...#b..#",
+      "#....##..#",
+      "#e.......#",
+      "#.....#..#",
+      "#........#",
+      "#a.#....##",
+      "#.#......#",
+      "#...#....#",
+      "##########"
+    ],
+    "date": "2026-04-06T01:54:31-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 35,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        4,
+        0
+      ],
+      [
+        5,
+        0
+      ]
+    ],
+    "end": [
+      [
+        2,
+        0
+      ],
+      [
+        4,
+        6
+      ]
+    ],
+    "path": "RULULULDRDRURULULULURURDRURULDRDLUL",
+    "annotated_board": [
+      "##########",
+      "#...#....#",
+      "#.#......#",
+      "#e.....#.#",
+      "##......##",
+      "#a....#e.#",
+      "#a.......#",
+      "#......#.#",
+      "#......#.#",
+      "##########"
+    ],
+    "date": "2026-03-29T23:13:48-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 35,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        4,
+        0
+      ],
+      [
+        5,
+        1
+      ]
+    ],
+    "end": [[0, 3], [7, 0]],
+    "path": "RURULULDRURDLURURULDRURDRDLURULURDL",
+    "annotated_board": [
+      "##########",
+      "#..#e..#.#",
+      "#....#...#",
+      "#.......##",
+      "#........#",
+      "#a..#....#",
+      "##a..#..##",
+      "#........#",
+      "#e.......#",
+      "##########"
+    ],
+    "date": "2026-03-29T23:38:26-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 38,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [[0, 4], [7, 5]],
+    "end": [[0, 4], [2, 7]],
+    "path": "LULDRDLDLURDRDRDLULDRDRDLULURDRURDLDRU",
+    "annotated_board": [
+      "##########",
+      "#...#b...#",
+      "#.......##",
+      "##......e#",
+      "#......#.#",
+      "#........#",
+      "#.......##",
+      "#.....#..#",
+      "##....a#.#",
+      "##########"
+    ],
+    "date": "2026-03-30T00:03:39-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 38,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        0,
+        6
+      ],
+      [
+        4,
+        0
+      ]
+    ],
+    "end": [
+      [
+        0,
+        1
+      ],
+      [
+        4,
+        0
+      ]
+    ],
+    "path": "RULDLULULDRURDLULULDRULURDRULDRULDRULU",
+    "annotated_board": [
+      "##########",
+      "#.e#..#a.#",
+      "#......#.#",
+      "#....#...#",
+      "##.......#",
+      "#b.......#",
+      "#.....#..#",
+      "#.#.....##",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-03-30T00:07:08-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 35,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        5,
+        0
+      ],
+      [
+        6,
+        1
+      ]
+    ],
+    "end": [
+      [
+        0,
+        3
+      ],
+      [
+        4,
+        0
+      ]
+    ],
+    "path": "RURULULDRURDLURURULDRURDRDLURULULDL",
+    "annotated_board": [
+      "##########",
+      "#..#e..#.#",
+      "#....#...#",
+      "#.......##",
+      "#........#",
+      "#e.......#",
+      "#a..#....#",
+      "##a..#..##",
+      "#........#",
+      "##########"
+    ],
+    "date": "2026-03-30T00:45:01-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 36,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        0,
+        0
+      ],
+      [
+        0,
+        3
+      ]
+    ],
+    "end": [
+      [
+        0,
+        2
+      ],
+      [
+        5,
+        1
+      ]
+    ],
+    "path": "LDLDRULDLDRDLDLDLURULDLULURURDLDLDLU",
+    "annotated_board": [
+      "##########",
+      "#a.ea....#",
+      "##.......#",
+      "#....#...#",
+      "#....#...#",
+      "#.#.....##",
+      "#.e...#..#",
+      "##.......#",
+      "#..#.....#",
+      "##########"
+    ],
+    "date": "2026-03-30T00:57:40-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 35,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        4,
+        6
+      ],
+      [
+        6,
+        5
+      ]
+    ],
+    "end": [
+      [
+        0,
+        4
+      ],
+      [
+        2,
+        0
+      ]
+    ],
+    "path": "ULULDLDRULURDLULULDRULURURDLULDLDRU",
+    "annotated_board": [
+      "##########",
+      "#..#.e.#.#",
+      "##.......#",
+      "#e.......#",
+      "#.#....#.#",
+      "#.....#a.#",
+      "##.......#",
+      "#.....a..#",
+      "#......#.#",
+      "##########"
+    ],
+    "date": "2026-03-30T01:23:46-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 35,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        4,
+        6
+      ],
+      [
+        5,
+        4
+      ]
+    ],
+    "end": [
+      [
+        0,
+        2
+      ],
+      [
+        7,
+        0
+      ]
+    ],
+    "path": "LULURURDLULDRULULURDLULDLDRULURULDL",
+    "annotated_board": [
+      "##########",
+      "#.#e..#..#",
+      "#...#....#",
+      "##.......#",
+      "#........#",
+      "#....#.a.#",
+      "##..#a..##",
+      "#........#",
+      "#e.......#",
+      "##########"
+    ],
+    "date": "2026-03-30T01:35:30-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 37,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        0,
+        5
+      ],
+      [
+        7,
+        0
+      ]
+    ],
+    "end": [
+      [
+        0,
+        2
+      ],
+      [
+        1,
+        4
+      ]
+    ],
+    "path": "URULDRDLDLDRULURULULULDLURULULURDRULU",
+    "annotated_board": [
+      "##########",
+      "#..e.#a..#",
+      "###..e...#",
+      "#........#",
+      "#........#",
+      "#.....#..#",
+      "#...#...##",
+      "#......#.#",
+      "#a.#.....#",
+      "##########"
+    ],
+    "date": "2026-03-30T03:05:59-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 42,
+    "threes_along_diameter": 30,
+    "open_cells": 56,
+    "states": 1540,
+    "start": [
+      [
+        0,
+        7
+      ],
+      [
+        1,
+        0
+      ]
+    ],
+    "end": [
+      [
+        0,
+        4
+      ],
+      [
+        0,
+        5
+      ]
+    ],
+    "path": "LDRDRURULDRDRDLULDRULDRDLDLURDLULULDRDLDRU",
+    "annotated_board": [
+      "##########",
+      "##...ee.a#",
+      "#a.......#",
+      "#........#",
+      "#.......##",
+      "#.#......#",
+      "##.......#",
+      "#...#....#",
+      "##.#...#.#",
+      "##########"
+    ],
+    "date": "2026-03-30T04:40:17-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 56,
+    "threes_along_diameter": 49,
+    "open_cells": 47,
+    "states": 1081,
+    "start": [
+      [
+        1,
+        0
+      ],
+      [
+        4,
+        7
+      ]
+    ],
+    "end": [
+      [
+        1,
+        0
+      ],
+      [
+        6,
+        3
+      ]
+    ],
+    "path": "DLURDRULDLDLDRULDRULDLDLURDRDRURDLULDRDRULDLDLDRULULDLUL",
+    "annotated_board": [
+      "##########",
+      "##..#.#.##",
+      "#b#...#..#",
+      "#..#.....#",
+      "#....#..##",
+      "##.....#a#",
+      "#.....#..#",
+      "####e....#",
+      "#....#..##",
+      "##########"
+    ],
+    "date": "2026-04-02T21:34:31-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 60,
+    "threes_along_diameter": 50,
+    "open_cells": 44,
+    "states": 946,
+    "start": [
+      [
+        4,
+        0
+      ],
+      [
+        7,
+        7
+      ]
+    ],
+    "end": [
+      [
+        0,
+        4
+      ],
+      [
+        6,
+        6
+      ]
+    ],
+    "path": "LULURDRURURDRULURURULDLULDLULURURDRDLDLDLURURDRURDRDLDRURULU",
+    "annotated_board": [
+      "##########",
+      "##..#e.#.#",
+      "#..#...#.#",
+      "#....#...#",
+      "####..#..#",
+      "#a.###..##",
+      "#...#..#.#",
+      "#.#....e##",
+      "#.#...#.a#",
+      "##########"
+    ],
+    "date": "2026-04-02T21:31:33-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 69,
+    "threes_along_diameter": 50,
+    "open_cells": 46,
+    "states": 1035,
+    "start": [
+      [
+        0,
+        0
+      ],
+      [
+        0,
+        1
+      ]
+    ],
+    "end": [
+      [
+        1,
+        2
+      ],
+      [
+        4,
+        7
+      ]
+    ],
+    "path": "DRURURDLURURURDRDLDRDLDLULDLDRDLDRULDLURDRURULULURULULDRDRULDLDLURDRU",
+    "annotated_board": [
+      "##########",
+      "#aa#...###",
+      "##.e.#...#",
+      "#...#.#..#",
+      "#..#....##",
+      "###..#..e#",
+      "##....#..#",
+      "#...#..###",
+      "#..#.....#",
+      "##########"
+    ],
+    "date": "2026-04-02T21:40:15-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 61,
+    "threes_along_diameter": 50,
+    "open_cells": 51,
+    "states": 1275,
+    "start": [
+      [
+        0,
+        7
+      ],
+      [
+        1,
+        5
+      ]
+    ],
+    "end": [
+      [
+        3,
+        1
+      ],
+      [
+        6,
+        7
+      ]
+    ],
+    "path": "LDLDLDRDLURDRDRDLURDRURURULULULURDRURDRDLDLURDRURDRDLULULDLUR",
+    "annotated_board": [
+      "##########",
+      "#......#a#",
+      "##..#.a..#",
+      "##......##",
+      "#.e#.....#",
+      "#......#.#",
+      "#....#.#.#",
+      "#.#...#.e#",
+      "#...#..#.#",
+      "##########"
+    ],
+    "date": "2026-04-02T21:37:40-06:00"
+  },
+  {
+    "width": 8,
+    "height": 8,
+    "diameter": 63,
+    "threes_along_diameter": 51,
+    "open_cells": 48,
+    "states": 1128,
+    "start": [
+      [
+        2,
+        7
+      ],
+      [
+        3,
+        7
+      ]
+    ],
+    "end": [
+      [
+        3,
+        5
+      ],
+      [
+        6,
+        0
+      ]
+    ],
+    "path": "LULULURDRDRURDLDRDRURDRDRDLDLULULULULDRDRDRDLDLURULULULDRDRDRDL",
+    "annotated_board": [
+      "##########",
+      "#...######",
+      "##....####",
+      "#..#....a#",
+      "#.#..#e.a#",
+      "##......##",
+      "#.......##",
+      "#e.#..#..#",
+      "#.#....#.#",
+      "##########"
+    ],
+    "date": "2026-04-02T21:33:28-06:00"
+  }
+]
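
Each entry above stores `start` and `end` as `[row, col]` pairs in 0-based interior coordinates, one less than the index into `annotated_board` because of the wall border (the convention implied by `to_interior_zero_based` in `dataset/validate_dataset.py`). In these entries `states` also equals the number of ways to choose the players' cells from `open_cells`; that relation is an observation about the numbers in this file, not something the source documents. A sketch that checks both on the first two-player entry, using the hypothetical helper `check_entry`:

```python
import math
from typing import Any, Dict, List

# First two-player entry from the dataset (fields unused here are omitted).
ENTRY: Dict[str, Any] = {
    "players": 2,
    "diameter": 8,
    "open_cells": 63,
    "states": 1953,
    "start": [[0, 1], [0, 2]],
    "end": [[1, 7], [2, 7]],
    "path": "DLURDLUR",
    "annotated_board": [
        "##########",
        "##aa.....#",
        "#.......e#",
        "#.......e#",
        "#........#",
        "#........#",
        "#........#",
        "#........#",
        "#........#",
        "##########",
    ],
}


def check_entry(entry: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means consistent."""
    problems: List[str] = []
    board = entry["annotated_board"]
    # Interior (r, c) maps to board[r + 1][c + 1] because of the wall border.
    for r, c in entry["start"]:
        if board[r + 1][c + 1] not in "ab":
            problems.append(f"start {(r, c)} is not a player cell")
    for r, c in entry["end"]:
        if board[r + 1][c + 1] not in "eb":
            problems.append(f"end {(r, c)} is not an exit cell")
    if entry["diameter"] != len(entry["path"]):
        problems.append("diameter differs from len(path)")
    # Some later entries omit "players"; fall back to the number of starts.
    n_players = entry.get("players", len(entry["start"]))
    # Observed pattern (assumption): states == C(open_cells, players).
    if math.comb(entry["open_cells"], n_players) != entry["states"]:
        problems.append("states differs from C(open_cells, players)")
    return problems


problems = check_entry(ENTRY)
print(problems)  # → []
```

Returning a list of messages instead of raising on the first mismatch makes it easy to report every inconsistency in an entry at once.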

dataset/validate_dataset.py ADDED

	@@ -0,0 +1,203 @@

+#!/usr/bin/env python3
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Validate dataset/ice-maze-levels.json: ``start`` / ``end`` match the board, and ``diameter`` equals ``len(path)`` when both are set.
+Board glyphs: ``a`` = start (player) only, ``e`` = exit only, ``b`` = start and exit on the same cell.
+Other lowercase letters (e.g. ``c``) are treated as additional player starts only.
+"""
+from __future__ import annotations
+import json
+import sys
+from pathlib import Path
+from typing import Dict, List, Tuple
+REPO_ROOT = Path(__file__).resolve().parent.parent
+if str(REPO_ROOT) not in sys.path:
+    sys.path.insert(0, str(REPO_ROOT))
+from models import MazeAction
+from server.maze_env_environment import MazeEnvironment
+STEP_CHAR_TO_DIRECTION = {
+    "U": "UP",
+    "D": "DOWN",
+    "L": "LEFT",
+    "R": "RIGHT",
+}
+def parse_board(rows: List[str]) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
+    """Board ``(row, col)`` from ``enumerate``: ``a``/``b``/other players → starts; ``e``/``b`` → exits."""
+    players: List[Tuple[int, int]] = []
+    exits: List[Tuple[int, int]] = []
+    for r, row in enumerate(rows):
+        for c, ch in enumerate(row):
+            if ch == "b":
+                players.append((r, c))
+                exits.append((r, c))
+            elif ch == "a":
+                players.append((r, c))
+            elif ch == "e":
+                exits.append((r, c))
+            elif ch.islower() and ch.isalpha():
+                players.append((r, c))
+    return sorted(players), sorted(exits)
+def to_interior_zero_based(board_rc: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
+    """0-based interior: subtract 1 from each board (row, col) from ``enumerate``."""
+    return sorted((r - 1, c - 1) for r, c in board_rc)
+def json_coords(name: str, raw: object) -> List[Tuple[int, int]]:
+    if not isinstance(raw, list):
+        raise ValueError(f"{name} must be a list")
+    out: List[Tuple[int, int]] = []
+    for i, item in enumerate(raw):
+        if not isinstance(item, (list, tuple)) or len(item) != 2:
+            raise ValueError(f"{name}[{i}] must be [row, col]")
+        out.append((int(item[0]), int(item[1])))
+    return sorted(out)
+def validate_level(i: int, level: Dict) -> List[str]:
+    err: List[str] = []
+    p = f"Level[{i}]"
+    if not isinstance(level.get("annotated_board"), list) or not level["annotated_board"]:
+        err.append(f"{p}: need non-empty annotated_board (list of strings)")
+        return err
+    if "start" not in level:
+        err.append(f"{p}: missing start")
+    if "end" not in level:
+        err.append(f"{p}: missing end")
+    if err:
+        return err
+    rows = level["annotated_board"]
+    try:
+        board_players, board_exits = parse_board(rows)
+    except (TypeError, ValueError) as e:
+        return [f"{p}: board parse error: {e}"]
+    try:
+        want_players = json_coords("start", level["start"])
+        want_exits = json_coords("end", level["end"])
+    except ValueError as e:
+        return [f"{p}: {e}"]
+    got_players = to_interior_zero_based(board_players)
+    got_exits = to_interior_zero_based(board_exits)
+    if got_players != want_players:
+        err.append(
+            f"{p}: start mismatch — JSON (0-based interior, sorted): {want_players}; "
+            f"parsed board row/col (sorted): {board_players}; after -1 per axis: {got_players}"
+        )
+    if got_exits != want_exits:
+        err.append(
+            f"{p}: end mismatch — JSON (0-based interior, sorted): {want_exits}; "
+            f"parsed board row/col (sorted): {board_exits}; after -1 per axis: {got_exits}"
+        )
+    if "diameter" in level and "path" in level:
+        path = level["path"]
+        diam = level["diameter"]
+        if not isinstance(path, str):
+            err.append(f"{p}: path must be a string, got {type(path).__name__}")
+        elif not isinstance(diam, int) or isinstance(diam, bool):
+            err.append(f"{p}: diameter must be an int, got {type(diam).__name__}")
+        elif len(path) != diam:
+            err.append(
+                f"{p}: diameter ({diam}) != len(path) ({len(path)}); path={path!r}"
+            )
+    return err
+def validate_level_path_replay(i: int, level: Dict, env: MazeEnvironment) -> List[str]:
+    """Replay level path in MazeEnvironment and verify done only at the final step."""
+    p = f"Level[{i}]"
+    path = level.get("path")
+    if path is None:
+        return []
+    if not isinstance(path, str):
+        return [f"{p}: path must be a string, got {type(path).__name__}"]
+    errors: List[str] = []
+    obs = env.reset(level_index=i)
+    if not path:
+        if not obs.done:
+            errors.append(f"{p}: empty path but reset state is not done")
+        return errors
+    if obs.done:
+        errors.append(f"{p}: reset starts done=True but path is non-empty ({path!r})")
+        return errors
+    for step_idx, token in enumerate(path, start=1):
+        direction = STEP_CHAR_TO_DIRECTION.get(token)
+        if direction is None:
+            errors.append(
+                f"{p}: path contains invalid token {token!r} at 1-based step {step_idx}; "
+                f"use only {sorted(STEP_CHAR_TO_DIRECTION)}"
+            )
+            break
+        obs = env.step(MazeAction(direction=direction))
+        is_last = step_idx == len(path)
+        if obs.done and not is_last:
+            errors.append(
+                f"{p}: done became True too early at step {step_idx}/{len(path)}; path={path!r}"
+            )
+            break
+        if is_last and not obs.done:
+            errors.append(
+                f"{p}: done is False at final path step {step_idx}/{len(path)}; path={path!r}"
+            )
+    return errors
+def main() -> int:
+    path = Path(__file__).resolve().parent / "ice-maze-levels.json"
+    if not path.is_file():
+        print(f"error: missing {path}", file=sys.stderr)
+        return 1
+    data = json.loads(path.read_text(encoding="utf-8"))
+    if not isinstance(data, list):
+        print("error: root must be a JSON array", file=sys.stderr)
+        return 1
+    errors: List[str] = []
+    env = MazeEnvironment()
+    for i, level in enumerate(data):
+        if isinstance(level, dict):
+            errors.extend(validate_level(i, level))
+            errors.extend(validate_level_path_replay(i, level, env))
+        else:
+            errors.append(f"Level[{i}]: not an object")
+    if errors:
+        for msg in errors:
+            print(msg, file=sys.stderr)
+        return 1
+    print(f"OK: {len(data)} levels — {path}")
+    return 0
+if __name__ == "__main__":
+    raise SystemExit(main())
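
The coordinate convention checked by `validate_level` can be illustrated on one level. The following is a minimal sketch, not the validator itself: it takes the last level of `dataset/ice-maze-levels.json` as shown in this diff and confirms that board `(row, col)` positions minus 1 on each axis give the `start` / `end` values stored in the JSON. The helper name `interior_coords` is hypothetical and exists only for this illustration.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

# Board and coordinates copied from the last level in ice-maze-levels.json.
BOARD = [
    "##########",
    "#...######",
    "##....####",
    "#..#....a#",
    "#.#..#e.a#",
    "##......##",
    "#.......##",
    "#e.#..#..#",
    "#.#....#.#",
    "##########",
]
JSON_START = [[2, 7], [3, 7]]
JSON_END = [[3, 5], [6, 0]]


def interior_coords(rows: List[str], glyphs: str) -> List[Coord]:
    """Hypothetical helper: find glyphs on the board and shift to interior coords.

    Board indices come from enumerate(); the outer '#' border occupies row 0
    and column 0, so the interior origin is one cell down and one cell right.
    """
    return sorted(
        (r - 1, c - 1)
        for r, row in enumerate(rows)
        for c, ch in enumerate(row)
        if ch in glyphs
    )


starts = interior_coords(BOARD, "ab")  # 'a' = player, 'b' = player on an exit
exits = interior_coords(BOARD, "eb")   # 'e' = exit, 'b' = player on an exit

assert starts == sorted(tuple(p) for p in JSON_START)
assert exits == sorted(tuple(p) for p in JSON_END)
print(starts, exits)
```

Both sides are sorted before comparison, which mirrors the validator's choice to treat `start` and `end` as unordered sets of cells.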

experiment.ipynb ADDED Viewed

	@@ -0,0 +1,138 @@

+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "e962db90",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "([(0, 0), (1, 0)], [(3, 0), (7, 1)])"
+      ]
+     },
+     "execution_count": 1,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from validate_dataset import scan_annotated_board\n",
+    "\n",
+    "scan_annotated_board(\n",
+    "    [\n",
+    "      \"##########\",\n",
+    "      \"#a.......#\",\n",
+    "      \"#a.......#\",\n",
+    "      \"#........#\",\n",
+    "      \"#e.......#\",\n",
+    "      \"##.......#\",\n",
+    "      \"##.......#\",\n",
+    "      \"##.......#\",\n",
+    "      \"#.e......#\",\n",
+    "      \"##########\"\n",
+    "    ]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "id": "9b7f20ab",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import annotations\n",
+    "\n",
+    "import json\n",
+    "import sys\n",
+    "from pathlib import Path\n",
+    "from typing import Dict, List, Tuple\n",
+    "\n",
+    "path = Path(\"dataset/ice-maze-levels.json\")\n",
+    "\n",
+    "if not path.is_file():\n",
+    "    print(f\"error: missing {path}\", file=sys.stderr)\n",
+    "    sys.exit(1)\n",
+    "\n",
+    "data = json.loads(path.read_text(encoding=\"utf-8\"))\n",
+    "\n",
+    "if not isinstance(data, list):\n",
+    "    print(\"error: root must be a JSON array\", file=sys.stderr)\n",
+    "    sys.exit(1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "a53bb716",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "30"
+      ]
+     },
+     "execution_count": 13,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len(data)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "9953a0b9",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "42"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "len('LDRDRURULDRDRDLULDRULDRDLDLURDLULULDRDLDRU')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "28535d64",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": ".venv (3.13.11)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.13.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

models.py ADDED Viewed

	@@ -0,0 +1,148 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the Ice Maze Environment.
+The maze environment loads ice-sliding puzzle levels and exposes them
+to LLM agents via the OpenEnv protocol.
+"""
+from enum import Enum
+from typing import List
+from openenv.core.env_server.types import Action, Observation
+from pydantic import Field, field_validator
+class MazeDirection(str, Enum):
+    """Cardinal direction for a single Ice Maze step (all players move together)."""
+    LEFT = "LEFT"
+    RIGHT = "RIGHT"
+    UP = "UP"
+    DOWN = "DOWN"
+class MazeAction(Action):
+    """
+    Action for the Ice Maze environment.
+    The agent specifies a direction to slide all players simultaneously.
+    On ice, players slide until they hit a wall (#) or another player.
+    Exit cells (e) do NOT stop sliding — players slide through them.
+    """
+    direction: MazeDirection = Field(
+        ...,
+        description="Direction to move all players simultaneously: LEFT, RIGHT, UP, or DOWN.",
+    )
+    @field_validator("direction", mode="before")
+    @classmethod
+    def _coerce_direction(cls, v: object) -> object:
+        if isinstance(v, MazeDirection):
+            return v
+        if isinstance(v, str):
+            key = v.strip().upper()
+            if key in MazeDirection.__members__:
+                return MazeDirection[key]
+        return v
+class MazeObservation(Observation):
+    """
+    Observation from the Ice Maze environment.
+    Primary agent-facing fields: current board, step budget, action history,
+    and the system prompt with rules and layout context (refreshed on reset
+    and on every step).
+    Additional fields support tooling and state introspection.
+    Inherited from Observation base:
+        done (bool)         — True when all players are simultaneously on exit cells
+        reward (float|None) — Reward signal for this step
+        metadata (dict)     — Extra info: level_index, action_history (no oracle path)
+    """
+    board: str = Field(
+        default="",
+        description=(
+            "Current ASCII board rendered as a newline-separated string. "
+            "Symbols: # wall, . ice, a player on non-exit, e unoccupied exit, "
+            "b player currently on an exit."
+        ),
+    )
+    step_count: int = Field(
+        default=0,
+        description="Number of steps taken so far in this episode.",
+    )
+    max_steps: int = Field(
+        default=0,
+        description="Maximum steps allowed for this episode before a hard limit (set on reset).",
+    )
+    previous_actions: List[str] = Field(
+        default_factory=list,
+        description=(
+            "Directions applied so far in order, each value one of "
+            "LEFT, RIGHT, UP, DOWN (same vocabulary as MazeAction)."
+        ),
+    )
+    system_prompt: str = Field(
+        default="",
+        description=(
+            "Instructions for the LLM: maze rules, valid actions, symbols, layout, "
+            "step count vs max steps, and previous actions (oldest first). "
+            "Refreshed on reset() and on each step()."
+        ),
+    )
+    agent_positions: List[List[int]] = Field(
+        default_factory=list,
+        description=(
+            "Current interior coordinates of each player as [[row, col], ...]. "
+            "Interior coords are 0-indexed from the top-left non-wall cell "
+            "(i.e. board_row - 1, board_col - 1)."
+        ),
+    )
+    exit_positions: List[List[int]] = Field(
+        default_factory=list,
+        description=(
+            "Interior coordinates of all exit cells as [[row, col], ...]. "
+            "Exit cells are shared — any player can use any exit. "
+            "Fixed for the duration of the episode."
+        ),
+    )
+    num_players: int = Field(
+        default=1,
+        description="Number of players in this level.",
+    )
+    message: str = Field(
+        default="",
+        description=(
+            "Human-readable status message describing what just happened, "
+            "e.g. 'Moved LEFT. Step 3.', 'Solved! All players reached an exit in 6 steps.', "
+            "'Invalid direction. Use: LEFT, RIGHT, UP, DOWN'."
+        ),
+    )
+    def __str__(self) -> str:
+        parts = []
+        parts.append(f"done={self.done} | reward={self.reward}")
+        parts.append(f"step={self.step_count}/{self.max_steps}")
+        parts.append(f"players={self.agent_positions} exits={self.exit_positions}")
+        if self.previous_actions:
+            parts.append(f"actions={self.previous_actions}")
+        if self.message:
+            parts.append(f"message={self.message}")
+        # Full system prompt, clearly separated from the summary lines above
+        if self.system_prompt:
+            parts.append("\n=== SYSTEM PROMPT ===")
+            parts.append(self.system_prompt)
+        return "\n".join(parts)
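
The `mode="before"` validator on `MazeAction.direction` normalizes loosely formatted strings before the enum check runs. Below is a minimal standalone sketch of that normalization, assuming only the standard library; `coerce_direction` is a hypothetical free function that mirrors the body of `_coerce_direction` so it can be tried without `pydantic` or `openenv` installed.

```python
from enum import Enum


class MazeDirection(str, Enum):
    LEFT = "LEFT"
    RIGHT = "RIGHT"
    UP = "UP"
    DOWN = "DOWN"


def coerce_direction(value: object) -> object:
    """Hypothetical mirror of MazeAction._coerce_direction (no pydantic)."""
    if isinstance(value, MazeDirection):
        return value
    if isinstance(value, str):
        key = value.strip().upper()  # tolerate padding and mixed case
        if key in MazeDirection.__members__:
            return MazeDirection[key]
    # Anything else is returned unchanged; in the real model the regular
    # enum validation runs next and is expected to reject it.
    return value


print(coerce_direction("  left "))
print(coerce_direction("Down"))
print(coerce_direction("diagonal"))
```

Returning unknown values unchanged, rather than raising inside the hook, leaves error reporting to the normal field validation step.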

openenv.yaml ADDED Viewed

	@@ -0,0 +1,7 @@

+spec_version: 1
+name: maze_env
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000

openenv_maze_env.egg-info/PKG-INFO ADDED Viewed

	@@ -0,0 +1,9 @@

+Metadata-Version: 2.4
+Name: openenv-maze_env
+Version: 0.1.0
+Summary: Maze Env environment for OpenEnv
+Requires-Python: >=3.10
+Requires-Dist: openenv-core[core]>=0.2.2
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"

openenv_maze_env.egg-info/SOURCES.txt ADDED Viewed

	@@ -0,0 +1,19 @@

+README.md
+__init__.py
+client.py
+models.py
+pyproject.toml
+./__init__.py
+./client.py
+./models.py
+./validate_dataset.py
+./dataset/ice-maze-levels.json
+openenv_maze_env.egg-info/PKG-INFO
+openenv_maze_env.egg-info/SOURCES.txt
+openenv_maze_env.egg-info/dependency_links.txt
+openenv_maze_env.egg-info/entry_points.txt
+openenv_maze_env.egg-info/requires.txt
+openenv_maze_env.egg-info/top_level.txt
+server/__init__.py
+server/app.py
+server/maze_env_environment.py

openenv_maze_env.egg-info/dependency_links.txt ADDED Viewed

	@@ -0,0 +1 @@


+

openenv_maze_env.egg-info/entry_points.txt ADDED Viewed

	@@ -0,0 +1,2 @@


+[console_scripts]
+server = maze_env.server.app:main

openenv_maze_env.egg-info/requires.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+openenv-core[core]>=0.2.2
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

openenv_maze_env.egg-info/top_level.txt ADDED Viewed

	@@ -0,0 +1 @@


+maze_env

pyproject.toml ADDED Viewed

	@@ -0,0 +1,48 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-maze_env"
+version = "0.1.0"
+description = "Maze Env environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types).
+    # To install from GitHub instead, use:
+    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+    "openenv-core[core]>=0.2.2",
+    # Environment-specific dependencies
+    # Add all dependencies needed for your environment here
+    # Examples:
+    # "numpy>=1.19.0",
+    # "torch>=2.0.0",
+    # "gymnasium>=0.29.0",
+    # "openspiel>=1.0.0",
+    # "smolagents>=1.22.0,<2",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m maze_env.server.app
+server = "maze_env.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["maze_env", "maze_env.server"]
+package-dir = { "maze_env" = ".", "maze_env.server" = "server" }
+[tool.setuptools.package-data]
+maze_env = ["dataset/ice-maze-levels.json"]

server/__init__.py ADDED Viewed

	@@ -0,0 +1,11 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Maze Env environment server components."""
+from .maze_env_environment import MazeEnvironment
+__all__ = ["MazeEnvironment"]

server/app.py ADDED Viewed

	@@ -0,0 +1,84 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+FastAPI application for the Maze Env Environment.
+This module creates an HTTP server that exposes the MazeEnvironment
+over HTTP and WebSocket endpoints, compatible with EnvClient.
+Endpoints:
+    - POST /reset: Reset the environment
+    - POST /step: Execute an action
+    - GET /state: Get current environment state
+    - GET /schema: Get action/observation schemas
+    - WS /ws: WebSocket endpoint for persistent sessions
+Usage:
+    # Development (with auto-reload):
+    uvicorn server.app:app --reload --host 0.0.0.0 --port 8000
+    # Production:
+    uvicorn server.app:app --host 0.0.0.0 --port 8000 --workers 4
+    # Or run directly:
+    python -m server.app
+"""
+try:
+    from openenv.core.env_server.http_server import create_app
+except Exception as e:  # pragma: no cover
+    raise ImportError(
+        "openenv is required for the web interface. Install dependencies with:\n    uv sync"
+    ) from e
+try:
+    from ..models import MazeAction, MazeObservation
+    from .maze_env_environment import MazeEnvironment
+except ModuleNotFoundError:
+    from models import MazeAction, MazeObservation
+    from server.maze_env_environment import MazeEnvironment
+# Create the app with web interface and README integration
+app = create_app(
+    MazeEnvironment,
+    MazeAction,
+    MazeObservation,
+    env_name="maze_env",
+    max_concurrent_envs=1,  # increase this number to allow more concurrent WebSocket sessions
+)
+def main(host: str = "0.0.0.0", port: int = 8000):
+    """
+    Entry point for direct execution via uv run or python -m.
+    This function enables running the server without Docker:
+        uv run --project . server
+        uv run --project . server --port 8001
+        python -m maze_env.server.app
+    Args:
+        host: Host address to bind to (default: "0.0.0.0")
+        port: Port number to listen on (default: 8000)
+    For production deployments, consider using uvicorn directly with
+    multiple workers:
+        uvicorn maze_env.server.app:app --workers 4
+    """
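+    import argparse
+    import uvicorn
+    # Parse CLI flags inside main() so the `server` console script, which calls
+    # main() with no arguments, also honors --host and --port.
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--host", default=host)
+    parser.add_argument("--port", type=int, default=port)
+    args = parser.parse_args()
+    uvicorn.run(app, host=args.host, port=args.port)
+if __name__ == "__main__":
+    main()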

server/maze_env_environment.py ADDED Viewed

	@@ -0,0 +1,291 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Ice Maze Environment Implementation.
+Players slide on ice in a given direction until they hit a wall (#) or
+another player. All players move simultaneously. The episode is solved
+when every player is simultaneously on an exit cell (e) after a move.
+"""
+from typing import FrozenSet, List, Optional, Tuple
+from uuid import uuid4
+from openenv.core.env_server.interfaces import Environment
+from openenv.core.env_server.types import State
+try:
+    from ..models import MazeAction, MazeObservation
+    from .maze_env_helpers import (
+        apply_direction_slide,
+        build_step_feedback,
+        build_system_prompt,
+        load_ice_maze_levels,
+        parse_board_entities,
+        render_board,
+        resolve_max_steps,
+    )
+except ImportError:
+    try:
+        from models import MazeAction, MazeObservation
+        from server.maze_env_helpers import (
+            apply_direction_slide,
+            build_step_feedback,
+            build_system_prompt,
+            load_ice_maze_levels,
+            parse_board_entities,
+            render_board,
+            resolve_max_steps,
+        )
+    except ImportError:
+        import os
+        import sys
+        repo_root = os.path.dirname(os.path.dirname(__file__))
+        if repo_root not in sys.path:
+            sys.path.insert(0, repo_root)
+        from models import MazeAction, MazeObservation
+        from maze_env_helpers import (
+            apply_direction_slide,
+            build_step_feedback,
+            build_system_prompt,
+            load_ice_maze_levels,
+            parse_board_entities,
+            render_board,
+            resolve_max_steps,
+        )
+class MazeEnvironment(Environment):
+    """
+    Ice Maze environment.
+    Each episode loads one puzzle level from dataset/ice-maze-levels.json.
+    Players slide on ice until hitting a wall or another player.
+    All players move simultaneously in the same direction each turn.
+    The episode ends when every player is on an exit cell simultaneously.
+    Supports concurrent WebSocket sessions (each session gets its own instance).
+    """
+    SUPPORTS_CONCURRENT_SESSIONS: bool = True
+    def __init__(self):
+        """Initialise with empty state; call reset() to load a level."""
+        super().__init__()
+        self._levels: List[dict] = load_ice_maze_levels()
+        self._current_level_index: int = 0
+        self._reset_index: int = 0
+        self._level: dict = {}
+        self._grid: List[List[str]] = []
+        self._num_players: int = 1
+        # Interior coords (0-based inside ``#`` wall): grid row/col = interior + 1
+        self._agent_positions: List[Tuple[int, int]] = []
+        # Goal tile interiors are fixed for the episode.
+        self._exit_positions: FrozenSet[Tuple[int, int]] = frozenset()
+        self._action_history: List[str] = []
+        self._max_steps: int = 0
+        self._done: bool = False
+        self._state: State = State(episode_id=str(uuid4()), step_count=0)
+    def _interior_positions_lists(self) -> Tuple[List[List[int]], List[List[int]]]:
+        """Return agent/exit interior positions as JSON-friendly `[row, col]` lists."""
+        agents = [[r, c] for r, c in self._agent_positions]
+        exits = [[r, c] for r, c in sorted(self._exit_positions)]
+        return agents, exits
+    # ------------------------------------------------------------------
+    # Public API
+    # ------------------------------------------------------------------
+    def reset(self, level_index: Optional[int] = None, **kwargs) -> MazeObservation:
+        """
+        Reset the environment and load a puzzle level.
+        Args:
+            level_index: If given, load this level index (modulo number of levels).
+                If omitted, use the internal reset counter so successive resets
+                cycle through the dataset. The counter advances on every reset,
+                including resets with an explicit level_index.
+        Returns:
+            MazeObservation with the initial board state and full system prompt.
+        """
+        n = len(self._levels)
+        if level_index is not None:
+            idx = int(level_index) % n
+        else:
+            idx = self._reset_index % n
+        # Count every reset call, including manual level picks.
+        self._reset_index += 1
+        self._current_level_index = idx
+        self._level = self._levels[idx]
+        self._grid = [list(row) for row in self._level["annotated_board"]]
+        agent_list, exit_list = parse_board_entities(self._grid)
+        self._agent_positions = list(agent_list)
+        self._exit_positions = frozenset(exit_list)
+        self._num_players = self._level.get("players", len(self._agent_positions))
+        self._action_history = []
+        self._max_steps = resolve_max_steps(self._level, kwargs)
+        self._done = False
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        # Check degenerate case: all players already on exits
+        self._done = all(pos in self._exit_positions for pos in self._agent_positions)
+        ag, ex = self._interior_positions_lists()
+        return MazeObservation(
+            board=render_board(self._grid),
+            step_count=0,
+            max_steps=self._max_steps,
+            previous_actions=[],
+            system_prompt=self._full_system_prompt(),
+            agent_positions=ag,
+            exit_positions=ex,
+            num_players=self._num_players,
+            message="Level loaded. Find the exit!",
+            done=self._done,
+            reward=0.0,
+            # path/diameter live on self._level for offline rubrics only — not agent-facing
+            metadata={
+                "level_index": idx,
+            },
+        )
+    def step(self, action: MazeAction, **kwargs) -> MazeObservation:  # type: ignore[override]
+        """
+        Execute one environment step for a direction command.
+        If the episode is already done, return the current state unchanged.
+        Otherwise apply one directional slide move to all players and update
+        step history, solved status, reward, and message.
+        """
+        # MazeAction enforces a valid MazeDirection; value is canonical ("LEFT", etc.).
+        direction = action.direction.value
+        if self._done:
+            return self._current_obs(
+                message="Episode already complete. Call reset() to start a new episode.",
+                reward=0.0,
+            )
+        any_slide_moved = apply_direction_slide(
+            grid=self._grid,
+            direction=direction,
+            num_players=self._num_players,
+            agent_positions=self._agent_positions,
+            exit_positions=self._exit_positions,
+        )
+        self._action_history.append(direction)
+        self._state.step_count += 1
+        self._done = all(pos in self._exit_positions for pos in self._agent_positions)
+        reward, message = build_step_feedback(
+            done=self._done,
+            moved=any_slide_moved,
+            direction=direction,
+            step_count=self._state.step_count,
+        )
+        prev = list(self._action_history)
+        ag, ex = self._interior_positions_lists()
+        return MazeObservation(
+            board=render_board(self._grid),
+            step_count=self._state.step_count,
+            max_steps=self._max_steps,
+            previous_actions=prev,
+            system_prompt=self._full_system_prompt(),
+            agent_positions=ag,
+            exit_positions=ex,
+            num_players=self._num_players,
+            message=message,
+            done=self._done,
+            reward=reward,
+            metadata={
+                "level_index": self._current_level_index,
+                "action_history": prev,
+            },
+        )
+    @property
+    def state(self) -> State:
+        """
+        Return the full current state for LLM introspection.
+        Includes board, positions, action history, and level metadata.
+        """
+        ag, ex = self._interior_positions_lists()
+        return State(
+            episode_id=self._state.episode_id,
+            step_count=self._state.step_count,
+            # Extra fields (State uses extra="allow")
+            current_board=render_board(self._grid),
+            num_players=self._num_players,
+            agent_positions=ag,
+            exit_positions=ex,
+            action_history=list(self._action_history),
+            level_index=self._current_level_index,
+            done=self._done,
+        )
+    def _full_system_prompt(self) -> str:
+        """Rules + board + positions + step budget + current step count and action history."""
+        ag, ex = self._interior_positions_lists()
+        return build_system_prompt(
+            width=self._level.get("width", "?"),
+            height=self._level.get("height", "?"),
+            num_players=self._num_players,
+            board=render_board(self._grid),
+            agent_positions_interior=ag,
+            exit_positions_interior=ex,
+            max_steps=self._max_steps,
+            step_count=self._state.step_count,
+            previous_actions=list(self._action_history),
+        )
+    def _current_obs(self, message: str, reward: float) -> MazeObservation:
+        """Return an observation reflecting the current state (no movement)."""
+        prev = list(self._action_history)
+        ag, ex = self._interior_positions_lists()
+        return MazeObservation(
+            board=render_board(self._grid),
+            step_count=self._state.step_count,
+            max_steps=self._max_steps,
+            previous_actions=prev,
+            system_prompt=self._full_system_prompt(),
+            agent_positions=ag,
+            exit_positions=ex,
+            num_players=self._num_players,
+            message=message,
+            done=self._done,
+            reward=reward,
+            metadata={
+                "level_index": self._current_level_index,
+                "action_history": prev,
+            },
+        )
+# ---------------------------------------------------------------------------
+# Quick smoke-test (run directly: python server/maze_env_environment.py)
+# ---------------------------------------------------------------------------
+# if __name__ == "__main__":
+#     env = MazeEnvironment()
+#     print("=== RESET (level 23) ===")
+#     obs = env.reset(level_index=23)
+#     print(obs)
+#     print(f"done={obs.done}, reward={obs.reward}")
+#     moves = ["UP", "LEFT", "DOWN", "RIGHT"]
+#     for move in moves:
+#         print(f"\n=== STEP: {move} ===")
+#         obs = env.step(MazeAction(direction=move))
+#         print(obs)
+#         print("######################################################")

server/maze_env_helpers.py ADDED Viewed

	@@ -0,0 +1,298 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Pure helpers for the Ice Maze environment (I/O, coords, prompts, slide order)."""
+from __future__ import annotations
+import json
+import os
+from typing import Dict, FrozenSet, List, Optional, Set, Tuple
+try:
+    from ..models import MazeDirection
+except ImportError:
+    from models import MazeDirection
+# ---------------------------------------------------------------------------
+# Level dataset
+# ---------------------------------------------------------------------------
+_LEVELS_CACHE: Optional[List[dict]] = None
+def load_ice_maze_levels() -> List[dict]:
+    """Load the level JSON once and reuse it across environment instances."""
+    global _LEVELS_CACHE
+    if _LEVELS_CACHE is not None:
+        return _LEVELS_CACHE
+    levels_path = os.path.normpath(
+        os.path.join(os.path.dirname(__file__), "..", "dataset", "ice-maze-levels.json")
+    )
+    with open(levels_path, "r", encoding="utf-8") as f:
+        _LEVELS_CACHE = json.load(f)
+    return _LEVELS_CACHE
+# ---------------------------------------------------------------------------
+# Board / cell utilities
+# ---------------------------------------------------------------------------
+def parse_board_entities(
+    grid: List[List[str]],
+) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
+    """Extract player (`a`/`b`) and exit (`e`/`b`) coordinates from the board grid."""
+    agents: List[Tuple[int, int]] = []
+    exits: Set[Tuple[int, int]] = set()
+    for br, row in enumerate(grid):
+        for bc, cell in enumerate(row):
+            ir, ic = br - 1, bc - 1
+            if cell == "e":
+                exits.add((ir, ic))
+            elif cell == "b":
+                agents.append((ir, ic))
+                exits.add((ir, ic))
+            elif cell == "a":
+                agents.append((ir, ic))
+    return agents, sorted(exits)
+def render_board(grid: List[List[str]]) -> str:
+    """Convert the 2D grid into the text board sent in observations/prompts."""
+    return "\n".join("".join(row) for row in grid)
+# ---------------------------------------------------------------------------
+# Movement
+# ---------------------------------------------------------------------------
+DIRECTION_DELTAS: Dict[str, Tuple[int, int]] = {
+    MazeDirection.UP.value: (-1, 0),
+    MazeDirection.DOWN.value: (1, 0),
+    MazeDirection.LEFT.value: (0, -1),
+    MazeDirection.RIGHT.value: (0, 1),
+}
+def cell_at_interior(grid: List[List[str]], ir: int, ic: int) -> str:
+    """Read the grid character at an interior coordinate (border-offset by +1)."""
+    return grid[ir + 1][ic + 1]
+def set_cell_at_interior(grid: List[List[str]], ir: int, ic: int, ch: str) -> None:
+    """Write a grid character at an interior coordinate (border-offset by +1)."""
+    grid[ir + 1][ic + 1] = ch
+def can_move_to_cell(
+    grid: List[List[str]],
+    ir: int,
+    ic: int,
+    exit_positions: FrozenSet[Tuple[int, int]],
+) -> bool:
+    """Return whether a player may slide into this interior cell."""
+    br, bc = ir + 1, ic + 1
+    if br < 0 or bc < 0 or br >= len(grid) or bc >= len(grid[0]):
+        return False
+    ch = cell_at_interior(grid, ir, ic)
+    # Movable: empty ice or unoccupied exit.
+    if ch == "." or ch == "e":
+        return True
+    # Blocked: wall, occupied floor, occupied exit.
+    if ch == "#" or ch == "a" or ch == "b":
+        return False
+    return False  # "." was handled above, so any remaining glyph is blocked
+def glyph_agent_enters(
+    ir: int,
+    ic: int,
+    exit_positions: FrozenSet[Tuple[int, int]],
+) -> str:
+    """Return destination glyph after an agent enters a cell."""
+    if (ir, ic) in exit_positions:
+        return "b"
+    return "a"
+def glyph_after_agent_leaves(
+    ir: int,
+    ic: int,
+    exit_positions: FrozenSet[Tuple[int, int]],
+) -> str:
+    """Return source glyph after an agent leaves a cell."""
+    if (ir, ic) in exit_positions:
+        return "e"
+    return "."
+def slide_one_agent(
+    grid: List[List[str]],
+    agent_positions: List[Tuple[int, int]],
+    agent_index: int,
+    dr: int,
+    dc: int,
+    exit_positions: FrozenSet[Tuple[int, int]],
+) -> bool:
+    """Slide one agent until blocked; return True if it moved at least one cell."""
+    moved = False
+    while True:
+        ir, ic = agent_positions[agent_index]
+        nr, nc = ir + dr, ic + dc
+        if not can_move_to_cell(grid, nr, nc, exit_positions):
+            break
+        set_cell_at_interior(
+            grid,
+            ir,
+            ic,
+            glyph_after_agent_leaves(ir, ic, exit_positions),
+        )
+        set_cell_at_interior(
+            grid,
+            nr,
+            nc,
+            glyph_agent_enters(nr, nc, exit_positions),
+        )
+        agent_positions[agent_index] = (nr, nc)
+        moved = True
+    return moved
+def apply_direction_slide(
+    grid: List[List[str]],
+    direction: str,
+    num_players: int,
+    agent_positions: List[Tuple[int, int]],
+    exit_positions: FrozenSet[Tuple[int, int]],
+) -> bool:
+    """Apply one directional move to all players; return whether any player moved."""
+    dr, dc = DIRECTION_DELTAS[direction]
+    any_moved = False
+    indices = sorted_slide_player_indices(direction, num_players, agent_positions)
+    for agent_index in indices:
+        if slide_one_agent(
+            grid=grid,
+            agent_positions=agent_positions,
+            agent_index=agent_index,
+            dr=dr,
+            dc=dc,
+            exit_positions=exit_positions,
+        ):
+            any_moved = True
+    return any_moved
+def build_step_feedback(done: bool, moved: bool, direction: str, step_count: int) -> Tuple[float, str]:
+    """Return step reward and status message from current transition outcome."""
+    if done:
+        return (
+            1.0,
+            f"Solved! All players reached an exit in {step_count} step(s).",
+        )
+    if not moved:
+        return (-0.1, f"No player moved — already against a wall going {direction}.")
+    return (-0.01, f"Moved {direction}. Step {step_count}.")
+def sorted_slide_player_indices(
+    direction: str,
+    num_players: int,
+    agent_positions: List[Tuple[int, int]],
+) -> List[int]:
+    """Order player updates so simultaneous sliding resolves collisions correctly."""
+    if direction == MazeDirection.LEFT.value:
+        return sorted(range(num_players), key=lambda i: agent_positions[i][1])
+    if direction == MazeDirection.RIGHT.value:
+        return sorted(range(num_players), key=lambda i: agent_positions[i][1], reverse=True)
+    if direction == MazeDirection.UP.value:
+        return sorted(range(num_players), key=lambda i: agent_positions[i][0])
+    return sorted(range(num_players), key=lambda i: agent_positions[i][0], reverse=True)
+# ---------------------------------------------------------------------------
+# Episode parameters
+# ---------------------------------------------------------------------------
+def resolve_max_steps(level: dict, reset_kwargs: Optional[dict] = None) -> int:
+    """Choose max steps from reset args, level config, or a diameter-based default."""
+    reset_kwargs = reset_kwargs or {}
+    if "max_steps" in reset_kwargs:
+        return int(reset_kwargs["max_steps"])
+    if "max_steps" in level:
+        return int(level["max_steps"])
+    path = level.get("path") or ""
+    diam = int(level.get("diameter", len(path) if path else 1))
+    return max(1, diam * 5)
+# ---------------------------------------------------------------------------
+# LLM system prompt
+# ---------------------------------------------------------------------------
+def build_system_prompt(
+    *,
+    width: object,
+    height: object,
+    num_players: int,
+    board: str,
+    agent_positions_interior: List[List[int]],
+    exit_positions_interior: List[List[int]],
+    max_steps: int,
+    step_count: int,
+    previous_actions: List[str],
+) -> str:
+    """Build the full system prompt text describing rules and current episode state."""
+    player_line = (
+        "There is 1 player on the board."
+        if num_players == 1
+        else f"There are {num_players} players on the board."
+    )
+    move_line = (
+        "Each turn, send a direction to move the player."
+        if num_players == 1
+        else "Each turn, ALL players move SIMULTANEOUSLY in the same direction."
+    )
+    block_line = (
+        "" if num_players == 1
+        else "  - Players act as walls — they block each other's sliding.\n"
+    )
+    prev_display = ", ".join(previous_actions) if previous_actions else "(none yet)"
+    return (
+        f"You are playing an Ice Maze puzzle.\n"
+        f"\n"
+        f"BOARD ({width}×{height}):\n"
+        f"{board}\n"
+        f"\n"
+        f"SYMBOLS:\n"
+        f"  #  = Wall (impassable)\n"
+        f"  .  = Open cell (slippery ice)\n"
+        f"  a  = Player on a non-exit cell\n"
+        f"  b  = Player currently on an exit cell\n"
+        f"  e  = Exit cell (goal)\n"
+        f"\n"
+        f"RULES:\n"
+        f"  - {player_line}\n"
+        f"  - {move_line}\n"
+        f"  - On ice, each player SLIDES until they hit a wall (#) or another player (a or b).\n"
+        f"{block_line}"
+        f"  - Exit cells (e) do NOT stop sliding — players slide through or onto them.\n"
+        f"  - After all players stop: if EVERY player is on an exit cell → you win!\n"
+        f"  - Exit cells are shared — any player can use any exit.\n"
+        f"\n"
+        f"STEP BUDGET: at most {max_steps} steps for this level.\n"
+        f"\n"
+        f"VALID ACTIONS: \"LEFT\", \"RIGHT\", \"UP\", \"DOWN\"\n"
+        f"\n"
+        f"Current player position(s): {agent_positions_interior}\n"
+        f"Exit cell position(s):       {exit_positions_interior}\n"
+        f"\n"
+        f"EPISODE PROGRESS:\n"
+        f"  - Step count (moves so far): {step_count} / {max_steps}\n"
+        f"  - Previous actions (oldest → newest): {prev_display}\n"
+    )
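
Reviewer note: the sliding rule in `slide_one_agent` and the ordering in `sorted_slide_player_indices` can be illustrated with a small standalone sketch. This is not the repository's code. `slide_all` is a hypothetical helper that re-implements the same idea directly on bordered grid coordinates (rather than the interior coordinates used above), and it assumes the board is fully enclosed by `#` walls. It shows that the player nearest the destination wall is moved first, and that exit cells do not stop a slide.

```python
from typing import Dict, List, Tuple

# Hypothetical standalone sketch; it does not import the repository's helpers.
DELTAS: Dict[str, Tuple[int, int]] = {
    "UP": (-1, 0),
    "DOWN": (1, 0),
    "LEFT": (0, -1),
    "RIGHT": (0, 1),
}


def slide_all(board: List[str], direction: str) -> List[str]:
    """Slide every player in `direction` until blocked; return the new board rows.

    Assumes the board is fully enclosed by '#' so indexing never leaves the grid.
    """
    grid = [list(row) for row in board]
    cells = [(r, c, ch) for r, row in enumerate(grid) for c, ch in enumerate(row)]
    exits = frozenset((r, c) for r, c, ch in cells if ch in "eb")
    agents = [(r, c) for r, c, ch in cells if ch in "ab"]

    dr, dc = DELTAS[direction]
    # Move the player closest to the destination wall first, so that players
    # behind it can slide into the cells it vacates.
    axis = 1 if dc else 0
    agents.sort(key=lambda pos: pos[axis], reverse=(dr + dc > 0))

    for r, c in agents:
        # Empty ice and unoccupied exits are passable; walls and players block.
        while grid[r + dr][c + dc] in ".e":
            grid[r][c] = "e" if (r, c) in exits else "."
            r, c = r + dr, c + dc
            grid[r][c] = "b" if (r, c) in exits else "a"
    return ["".join(row) for row in grid]


if __name__ == "__main__":
    board = ["#######", "#a.a.e#", "#######"]
    print("\n".join(slide_all(board, "RIGHT")))
```

With this ordering, the right-hand player reaches the exit first and then acts as a wall for the player behind it, which matches the "players block each other's sliding" rule stated in the system prompt.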

server/requirements.txt ADDED

	@@ -0,0 +1,6 @@

+openenv[core]>=0.2.0
+fastapi>=0.115.0
+uvicorn>=0.24.0

uv.lock ADDED

The diff for this file is too large to render.