Space: anushaacharya/minesweeper-env

Commit d218d3a (verified), committed by anushaacharya 5 days ago
Parent: 9c16ca9
Message: Upload folder using huggingface_hub

Files changed (11):

Dockerfile +72 -0
README.md +150 -4
__init__.py +13 -0
client.py +118 -0
models.py +144 -0
openenv.yaml +6 -0
pyproject.toml +32 -0
server/__init__.py +12 -0
server/app.py +35 -0
server/build_docker.sh +48 -0
server/minesweeper_environment.py +338 -0

Dockerfile ADDED

	@@ -0,0 +1,72 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+ARG BASE_IMAGE=ghcr.io/meta-pytorch/openenv-base:latest
+FROM ${BASE_IMAGE} AS builder
+WORKDIR /app
+# Build argument to control whether we're building standalone or in-repo
+ARG BUILD_MODE=in-repo
+# Copy environment code (always at root of build context)
+COPY . /app/env
+# For in-repo builds, openenv-core is already in the pyproject.toml dependencies
+# For standalone builds, openenv-core will be installed from pip via pyproject.toml
+WORKDIR /app/env
+# Ensure uv is available (for local builds where base image lacks it)
+RUN if ! command -v uv >/dev/null 2>&1; then \
+        curl -LsSf https://astral.sh/uv/install.sh | sh && \
+        mv /root/.local/bin/uv /usr/local/bin/uv && \
+        mv /root/.local/bin/uvx /usr/local/bin/uvx; \
+    fi
+# Install git for building from git repos (build-time only)
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Install dependencies using uv sync
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-install-project --no-editable; \
+    else \
+        uv sync --no-install-project --no-editable; \
+    fi
+RUN --mount=type=cache,target=/root/.cache/uv \
+    if [ -f uv.lock ]; then \
+        uv sync --frozen --no-editable; \
+    else \
+        uv sync --no-editable; \
+    fi
+# Final runtime stage
+FROM ${BASE_IMAGE}
+WORKDIR /app
+# Copy the virtual environment from builder
+COPY --from=builder /app/env/.venv /app/.venv
+# Copy the environment code
+COPY --from=builder /app/env /app/env
+# Set PATH to use the virtual environment
+ENV PATH="/app/.venv/bin:$PATH"
+# Set PYTHONPATH so imports work correctly
+ENV PYTHONPATH="/app/env:$PYTHONPATH"
+# Health check using Python (more portable than curl/wget)
+HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1
+# Run the FastAPI server
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
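The `HEALTHCHECK` above treats the container as healthy only when `urllib.request.urlopen` on `/health` returns without raising. The same probe can be reused from a test harness or a startup script. Below is a minimal stdlib-only sketch. It assumes the server is reachable at `http://localhost:8000/health`, the URL used in the Dockerfile. `probe_health` and `wait_until_healthy` are hypothetical helper names, not part of OpenEnv.

```python
import time
import urllib.request


def probe_health(url: str = "http://localhost:8000/health", timeout: float = 3.0) -> bool:
    """Return True if GET <url> answers with a 2xx status, False otherwise.

    Mirrors the Dockerfile HEALTHCHECK: urlopen raises URLError on connection
    failures and HTTPError on 4xx/5xx responses. Both are OSError subclasses,
    as are socket timeouts, so any such error counts as unhealthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        return False


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll probe_health until it succeeds or the attempts run out."""
    for _ in range(attempts):
        if probe_health(url):
            return True
        time.sleep(delay)
    return False
```

For example, after `docker run -p 8000:8000 minesweeper-env:latest`, calling `wait_until_healthy("http://localhost:8000/health")` blocks until the server answers or about ten seconds have passed.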

README.md CHANGED

@@ -1,10 +1,156 @@
 ---
-title: Minesweeper Env
-emoji: 🔥
-colorFrom: yellow
 colorTo: indigo
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: Minesweeper Environment Server
+emoji: 💣
+colorFrom: blue
 colorTo: indigo
 sdk: docker
 pinned: false
+app_port: 8000
+base_path: /web
+tags:
+  - openenv
 ---
+# Minesweeper Environment
+A Minesweeper game environment for reinforcement learning agents. The environment consists of a grid with hidden mines where the agent must reveal all non-mine cells without triggering any mines.
+## Overview
+The agent can perform two types of actions:
+- Reveal cells to uncover numbers indicating adjacent mines
+- Place or remove flags on suspected mine locations
+The game ends when all non-mine cells are revealed (win) or a mine is revealed (loss).
+## Quick Start
+```python
+from envs.minesweeper_env import MinesweeperAction, MinesweeperEnv
+# Create environment from Docker image
+minesweeper_env = MinesweeperEnv.from_docker_image("minesweeper-env:latest")
+try:
+    # Reset the environment
+    result = minesweeper_env.reset()
+    print(f"Board size: {result.observation.board_height}x{result.observation.board_width}")
+    print(f"Number of mines: {result.observation.num_mines}")
+    # Reveal a cell
+    result = minesweeper_env.step(MinesweeperAction(row=2, col=2, action_type="reveal"))
+    print(f"Cells revealed: {result.observation.cells_revealed}")
+    print(f"Reward: {result.observation.reward}")
+    # Place a flag
+    result = minesweeper_env.step(MinesweeperAction(row=1, col=1, action_type="flag"))
+    print(f"Flags placed: {result.observation.flags_placed}")
+finally:
+    minesweeper_env.close()
+```
+## Building the Docker Image
+Build the Docker image from the project root:
+```bash
+docker build -t minesweeper-env:latest -f src/envs/minesweeper_env/server/Dockerfile .
+```
+Or use the build script:
+```bash
+cd src/envs/minesweeper_env/server
+./build_docker.sh latest
+```
+## Environment Details
+### Action
+**MinesweeperAction**: Specifies the cell and action type
+- `row` (int) - Row index (0-indexed)
+- `col` (int) - Column index (0-indexed)
+- `action_type` (str) - Either "reveal" or "flag"
+### Observation
+**MinesweeperObservation**: Current board state and game information
+- `board` (list[list]) - 2D grid showing the current state of each cell:
+  - `-1`: Unrevealed cell
+  - `0-8`: Number of adjacent mines (revealed cell)
+  - `'F'`: Flagged cell
+  - `'*'`: Mine (shown only on the mine cell that was revealed, i.e. after a loss)
+- `num_mines` (int) - Total number of mines on the board
+- `flags_placed` (int) - Number of flags currently placed
+- `cells_revealed` (int) - Number of cells that have been revealed
+- `game_status` (GameStatus) - Current game status (ONGOING, WON, or LOST)
+- `done` (bool) - Whether the game has ended
+- `reward` (float) - Reward from the last action
+- `metadata` (dict) - Additional information
+### Rewards
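+- Revealing a safe cell: +1.0 per action (also when a zero cell cascades and reveals its neighbours)
+- Placing a flag on a mine: +0.5
+- Placing a flag on a safe cell, or removing a flag: 0.0
+- Revealing a mine (game over): -10.0
+- Revealing an already revealed or flagged cell, or flagging a revealed cell: -0.05
+- Invalid action (position outside the board): -0.1
+- Any in-bounds action after the game has ended: 0.0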
+### Game Status
+- `GameStatus.ONGOING`: Game is still in progress
+- `GameStatus.WON`: All non-mine cells have been revealed
+- `GameStatus.LOST`: A mine was revealed
+## Configuration
+The default configuration is:
+- Board height: 5
+- Board width: 5
+- Number of mines: 5
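+These correspond to the `height`, `width`, and `num_mines` arguments of `MinesweeperEnvironment.__init__` and can be overridden wherever the server constructs the environment.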
+## Connecting to an Existing Server
+If you have a server already running:
+```python
+from envs.minesweeper_env import MinesweeperEnv
+# Connect to existing server
+minesweeper_env = MinesweeperEnv(base_url="http://localhost:8000")
+# Use as normal
+result = minesweeper_env.reset()
+```
+Note: When connecting to an existing server, `close()` will not stop the server.
+## Running Tests
+Run the test suite:
+```bash
+python tests/envs/test_minesweeper_env.py
+```
+## Project Structure
+```
+minesweeper_env/
+├── __init__.py                    # Module exports
+├── README.md                      # This file
+├── client.py                      # MinesweeperEnv client implementation
+├── models.py                      # Action, Observation, and State models
+├── openenv.yaml                   # Environment configuration
+├── pyproject.toml                 # Package dependencies
+└── server/
+    ├── __init__.py                # Server module exports
+    ├── minesweeper_environment.py # Core game logic
+    ├── app.py                     # FastAPI application
+    ├── Dockerfile                 # Container image definition
+    └── build_docker.sh            # Build script
+```
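To make the `board` encoding from the Observation section concrete, here is a small sketch that turns an observation's `board` into a printable grid. `render_board` is a hypothetical helper and is not part of the package. It relies only on the cell values listed above: `-1`, `0`-`8`, `'F'` and `'*'`.

```python
from typing import Any, List


def render_board(board: List[List[Any]]) -> str:
    """Render a MinesweeperObservation.board as text.

    '#' marks an unrevealed cell (-1), '.' a revealed cell with no adjacent
    mines (0), digits 1-8 are neighbour counts, 'F' is a flag, '*' a mine.
    """
    lines = []
    for row in board:
        cells = []
        for cell in row:
            if cell == -1:
                cells.append("#")
            elif cell == 0:
                cells.append(".")
            else:
                cells.append(str(cell))  # 1-8, 'F' or '*'
        lines.append(" ".join(cells))
    return "\n".join(lines)
```

In the Quick Start loop above, `print(render_board(result.observation.board))` would show the agent's view after each step.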

__init__.py ADDED

	@@ -0,0 +1,13 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Minesweeper Environment: a Minesweeper game environment served over HTTP/WebSocket."""
+from .client import MinesweeperEnv
+from .models import MinesweeperAction, MinesweeperObservation, GameStatus
+__all__ = ["MinesweeperAction", "MinesweeperObservation", "MinesweeperEnv", "GameStatus"]

client.py ADDED

	@@ -0,0 +1,118 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Minesweeper Environment Client.
+This module provides the client for connecting to a Minesweeper Environment server
+via WebSocket for persistent sessions.
+"""
+from typing import Any, Dict
+# Support both in-repo and standalone imports
+try:
+    # In-repo imports (when running from OpenEnv repository)
+    from openenv.core.client_types import StepResult
+    from openenv.core.env_server.types import State
+    from openenv.core.env_client import EnvClient
+    from .models import MinesweeperAction, MinesweeperObservation
+except ImportError:
+    # Standalone imports (when environment is standalone with openenv from pip)
+    from openenv.core.client_types import StepResult
+    from openenv.core.env_server.types import State
+    from openenv.core.env_client import EnvClient
+    from models import MinesweeperAction, MinesweeperObservation
+class MinesweeperEnv(EnvClient[MinesweeperAction, MinesweeperObservation, State]):
+    """
+    Client for the Minesweeper Environment.
+    This client maintains a persistent WebSocket connection to the environment
+    server, enabling efficient multi-step interactions with lower latency.
+    Each client instance has its own dedicated environment session on the server.
+    Example:
+        >>> # Connect to a running server
+        >>> with MinesweeperEnv(base_url="http://localhost:8000") as client:
+        ...     result = client.reset()
+        ...     print(result.observation.board)
+        ...     print(result.observation.game_status)
+        ...
+        ...     # Reveal a cell
+        ...     result = client.step(MinesweeperAction(row=0, col=0, action_type="reveal"))
+        ...     print(result.observation.board)
+        ...     print(result.reward)
+    Example with Docker:
+        >>> # Automatically start container and connect
+        >>> client = MinesweeperEnv.from_docker_image("minesweeper-env:latest")
+        >>> try:
+        ...     result = client.reset()
+        ...     result = client.step(MinesweeperAction(row=2, col=3, action_type="reveal"))
+        ... finally:
+        ...     client.close()
+    """
+    def _step_payload(self, action: MinesweeperAction) -> Dict:
+        """
+        Convert MinesweeperAction to JSON payload for step request.
+        Args:
+            action: MinesweeperAction instance
+        Returns:
+            Dictionary representation suitable for JSON encoding
+        """
+        return {
+            "row": action.row,
+            "col": action.col,
+            "action_type": action.action_type,
+        }
+    def _parse_result(self, payload: Dict) -> StepResult[MinesweeperObservation]:
+        """
+        Parse server response into StepResult[MinesweeperObservation].
+        Args:
+            payload: JSON response from server
+        Returns:
+            StepResult with MinesweeperObservation
+        """
+        obs_data = payload.get("observation", {})
+        observation = MinesweeperObservation(
+            board=obs_data.get("board", []),
+            num_mines=obs_data.get("num_mines", 0),
+            flags_placed=obs_data.get("flags_placed", 0),
+            cells_revealed=obs_data.get("cells_revealed", 0),
+            game_status=obs_data.get("game_status", "ongoing"),
+            done=payload.get("done", False),
+            reward=payload.get("reward"),
+            metadata=obs_data.get("metadata", {}),
+        )
+        return StepResult(
+            observation=observation,
+            reward=payload.get("reward"),
+            done=payload.get("done", False),
+        )
+    def _parse_state(self, payload: Dict) -> State:
+        """
+        Parse server response into State object.
+        Args:
+            payload: JSON response from /state endpoint
+        Returns:
+            State object with episode_id and step_count
+        """
+        return State(
+            episode_id=payload.get("episode_id"),
+            step_count=payload.get("step_count", 0),
+        )
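`_parse_result` reads `board`, the counters and `game_status` from the nested `observation` object, but takes `reward` and `done` from the top level of the payload. The stdlib-only sketch below mirrors that mapping and those defaults, so the wire format can be checked without a running server. The sample payload is illustrative: its shape is inferred from the `.get()` calls above, not captured from a real server. The client's default of `"ongoing"` suggests the status travels as its string value. `ParsedStep` and `parse_step_payload` are hypothetical names.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class GameStatus(Enum):
    ONGOING = "ongoing"
    WON = "won"
    LOST = "lost"


@dataclass
class ParsedStep:
    board: List[List[Any]]
    num_mines: int
    flags_placed: int
    cells_revealed: int
    game_status: GameStatus
    reward: Optional[float]
    done: bool
    metadata: Dict[str, Any] = field(default_factory=dict)


def parse_step_payload(payload: Dict[str, Any]) -> ParsedStep:
    """Apply the same field lookups and defaults as MinesweeperEnv._parse_result."""
    obs = payload.get("observation", {})
    return ParsedStep(
        board=obs.get("board", []),
        num_mines=obs.get("num_mines", 0),
        flags_placed=obs.get("flags_placed", 0),
        cells_revealed=obs.get("cells_revealed", 0),
        # GameStatus("won") looks the member up by its value.
        game_status=GameStatus(obs.get("game_status", "ongoing")),
        reward=payload.get("reward"),        # top level, not inside "observation"
        done=payload.get("done", False),     # top level as well
        metadata=obs.get("metadata", {}),
    )


raw = json.dumps({
    "observation": {"board": [[-1, 1], [0, "F"]], "num_mines": 1,
                    "flags_placed": 1, "cells_revealed": 2, "game_status": "ongoing"},
    "reward": 1.0,
    "done": False,
})
step = parse_step_payload(json.loads(raw))
```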

models.py ADDED

	@@ -0,0 +1,144 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Data models for the Minesweeper Environment.
+The minesweeper_env environment is a Minesweeper game where agents reveal cells and place flags
+to identify mines on a grid board.
+"""
+from enum import Enum
+from typing import List, Any, Set, Tuple
+from pydantic import Field, BaseModel
+# Support both in-repo and standalone imports
+try:
+    # In-repo imports (when running from OpenEnv repository)
+    from openenv.core.env_server.types import Action, Observation
+except ImportError:
+    # Standalone imports (when environment is standalone with openenv from pip)
+    from openenv.core.env_server.types import Action, Observation
+class GameStatus(Enum):
+    """Status of the Minesweeper game."""
+    ONGOING = "ongoing"
+    WON = "won"
+    LOST = "lost"
+class MinesweeperAction(Action):
+    """
+    Action for the Minesweeper environment.
+    Attributes:
+        row: Row index of the cell to act on (0-indexed).
+        col: Column index of the cell to act on (0-indexed).
+        action_type: Type of action - 'reveal' to uncover a cell, 'flag' to place/remove a flag.
+    """
+    row: int = Field(..., ge=0, description="Row index of the cell")
+    col: int = Field(..., ge=0, description="Column index of the cell")
+    action_type: str = Field(..., pattern="^(reveal|flag)$", description="Type of action: 'reveal' or 'flag'")
+class MinesweeperObservation(Observation):
+    """
+    Observation from the Minesweeper environment.
+    This represents what the agent can see - a partial view of the board with hidden mine locations (unless revealed).
+    Attributes:
+        board: 2D list representing the current state of the board. Each cell can be:
+            - -1: unrevealed
+            - 0-8: number of adjacent mines (if revealed)
+            - 'F': flagged cell
+            - '*': mine (shown only on the revealed mine cell, i.e. after a loss)
+        num_mines: Total number of mines on the board.
+        flags_placed: Number of flags currently placed by the agent.
+        cells_revealed: Number of cells that have been revealed so far.
+        game_status: Current status of the game - ongoing, won, or lost.
+    """
+    board: List[List[Any]] = Field(default_factory=list, description="2D board state")
+    num_mines: int = Field(..., ge=0, description="Total number of mines")
+    flags_placed: int = Field(..., ge=0, description="Number of flags placed")
+    cells_revealed: int = Field(..., ge=0, description="Number of cells revealed")
+    game_status: GameStatus = Field(..., description="Current game status")
+    @property
+    def board_height(self) -> int:
+        """Height of the board (number of rows)."""
+        return len(self.board)
+    @property
+    def board_width(self) -> int:
+        """Width of the board (number of columns)."""
+        return len(self.board[0]) if self.board else 0
+class MinesweeperState(BaseModel):
+    """
+    Internal state of the Minesweeper environment.
+    This represents the full internal state of the environment, including hidden information.
+    Attributes:
+        episode_id: Unique identifier for the current episode.
+        step_count: Number of steps taken in the current episode.
+        board_height: Height of the board (number of rows).
+        board_width: Width of the board (number of columns).
+        mine_locations: Set of (row, col) tuples indicating where mines are located.
+        revealed_cells: Set of (row, col) tuples indicating which cells have been revealed.
+        flags: Set of (row, col) tuples indicating where flags have been placed.
+        mine_counts: 2D list with counts of adjacent mines for each cell.
+        game_status: Current status of the game - ongoing, won, or lost.
+    """
+    episode_id: str
+    step_count: int
+    board_height: int
+    board_width: int
+    mine_locations: Set[Tuple[int, int]]
+    revealed_cells: Set[Tuple[int, int]]
+    flags: Set[Tuple[int, int]]
+    mine_counts: List[List[int]]
+    game_status: GameStatus
+    def to_observation(self) -> MinesweeperObservation:
+        """
+        Convert the full state to a partial observation for the agent.
+        Returns:
+            MinesweeperObservation representing the agent's view of the board.
+        """
+        board = []
+        for r in range(self.board_height):
+            row = []
+            for c in range(self.board_width):
+                if (r, c) in self.revealed_cells:
+                    if (r, c) in self.mine_locations:
+                        cell_value = '*'  # Revealed mine
+                    else:
+                        cell_value = self.mine_counts[r][c]  # Number of adjacent mines
+                elif (r, c) in self.flags:
+                    cell_value = 'F'  # Flagged cell
+                else:
+                    cell_value = -1  # Unrevealed cell
+                row.append(cell_value)
+            board.append(row)
+        return MinesweeperObservation(
+            board=board,
+            num_mines=len(self.mine_locations),
+            flags_placed=len(self.flags),
+            cells_revealed=len(self.revealed_cells),
+            game_status=self.game_status,
+            done=self.game_status != GameStatus.ONGOING,
+            reward=0.0,
+            metadata={
+                "episode_id": self.episode_id,
+                "step_count": self.step_count,
+            },
+        )
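`MinesweeperAction` relies on Pydantic `Field` constraints: `ge=0` on `row` and `col`, and `pattern="^(reveal|flag)$"` on `action_type`. Pydantic therefore raises a validation error when a malformed action is constructed. For actions built through the model, this makes the `else: reward = -0.1` branch in `step()` effectively unreachable. Note that `ge=0` only bounds the indices from below. An index past the board edge still passes validation and is handled by `step()` with the -0.1 penalty. The sketch below reproduces the two checks with a stdlib dataclass instead of Pydantic, purely for illustration. `action_errors` and `CheckedAction` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List


def action_errors(row: int, col: int, action_type: str) -> List[str]:
    """Return the constraint violations MinesweeperAction's Fields would report."""
    errors = []
    if row < 0:  # Field(..., ge=0)
        errors.append("row must be >= 0")
    if col < 0:  # Field(..., ge=0)
        errors.append("col must be >= 0")
    # Field(..., pattern="^(reveal|flag)$"); fullmatch avoids Python's "$"
    # also matching just before a trailing newline.
    if re.fullmatch(r"reveal|flag", action_type) is None:
        errors.append("action_type must be 'reveal' or 'flag'")
    return errors


@dataclass(frozen=True)
class CheckedAction:
    """Stdlib stand-in for MinesweeperAction: validates on construction."""
    row: int
    col: int
    action_type: str

    def __post_init__(self) -> None:
        errors = action_errors(self.row, self.col, self.action_type)
        if errors:
            raise ValueError("; ".join(errors))
```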

openenv.yaml ADDED

	@@ -0,0 +1,6 @@

+spec_version: 1
+name: minesweeper
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000

pyproject.toml ADDED

	@@ -0,0 +1,32 @@

+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-minesweeper"
+version = "0.1.0"
+description = "Minesweeper Environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    "openenv-core>=0.1.0",
+    "fastapi>=0.115.0",
+    "uvicorn>=0.24.0",
+    "pydantic>=2.0.0",
+    "requests>=2.31.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point: enables running via `uv run --project . server`
+# or `python -m minesweeper.server.app`
+server = "minesweeper.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["minesweeper", "minesweeper.server"]
+package-dir = { "minesweeper" = ".", "minesweeper.server" = "server" }
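The `[project.scripts]` entry `server = "minesweeper.server.app:main"` follows the `module:attribute` convention. The generated `server` script imports `minesweeper.server.app` (the `minesweeper` package is mapped to `.` by `package-dir`) and calls its `main()`, so that module must define a callable `main`. The sketch below resolves an entry-point string in the same way, which is a quick way to catch a reference to a missing module or attribute before installing. `resolve_entry_point` is a hypothetical helper and is demonstrated on stdlib modules only.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr' string the way console scripts do.

    Raises ModuleNotFoundError if the module is missing and AttributeError
    if the module (or a dotted parent) lacks the named attribute.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    obj: Any = importlib.import_module(module_name)
    # The attribute part may be dotted, e.g. "mod:Class.method".
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj
```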

server/__init__.py ADDED

	@@ -0,0 +1,12 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""Minesweeper environment server components."""
+from .minesweeper_environment import MinesweeperEnvironment
+__all__ = ["MinesweeperEnvironment"]

server/app.py ADDED

	@@ -0,0 +1,35 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""FastAPI application for the Minesweeper Environment."""
+# Support both in-repo and standalone imports
+try:
+    # In-repo imports (when running from OpenEnv repository)
+    from openenv.core.env_server import create_app
+except ImportError:
+    # Standalone imports (when environment is standalone with openenv from pip)
+    from openenv.core.env_server import create_app
+try:
+    from ..models import MinesweeperAction, MinesweeperObservation
+    from .minesweeper_environment import MinesweeperEnvironment
+except ImportError:
+    from models import MinesweeperAction, MinesweeperObservation
+    from server.minesweeper_environment import MinesweeperEnvironment
+# Create the FastAPI app
+# Pass the class (factory) instead of an instance for WebSocket session support
+app = create_app(
+    MinesweeperEnvironment,
+    MinesweeperAction,
+    MinesweeperObservation,
+    env_name="minesweeper_env"
+)
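+def main() -> None:
+    """Run the server; target of the `server` script declared in pyproject.toml."""
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=8000)
+if __name__ == "__main__":
+    main()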
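The comment in `app.py` says the environment *class* is passed to `create_app` rather than an instance so that WebSocket sessions are supported. The client docstring says each client gets its own dedicated environment session. The intent is presumably that the server calls the factory once per session. The sketch below illustrates that pattern with a hypothetical `SessionRegistry` and a stand-in `FakeEnvironment`. It is not OpenEnv's implementation. It also shows one way to get a non-default board: bind `height`, `width` and `num_mines` with `functools.partial`. Whether `create_app` accepts an arbitrary zero-argument callable in place of a class is an assumption to verify against the installed OpenEnv version.

```python
import functools
import uuid
from typing import Any, Callable, Dict


class FakeEnvironment:
    """Stand-in for MinesweeperEnvironment with the same constructor arguments."""

    def __init__(self, height: int = 5, width: int = 5, num_mines: int = 5):
        self.height, self.width, self.num_mines = height, width, num_mines
        self.steps = 0  # per-instance state, to show sessions are isolated


def configured_factory(env_cls: Callable[..., Any], height: int, width: int,
                       num_mines: int) -> Callable[[], Any]:
    """Bind board settings so the factory can be called with no arguments."""
    if height <= 0 or width <= 0:
        raise ValueError("board dimensions must be positive")
    # _place_mines() keeps sampling until it holds num_mines distinct cells, so
    # more mines than cells would never terminate; this sketch also keeps at
    # least one safe cell.
    if not 0 <= num_mines < height * width:
        raise ValueError("num_mines must be in [0, height * width)")
    return functools.partial(env_cls, height=height, width=width, num_mines=num_mines)


class SessionRegistry:
    """Hypothetical per-session store: one fresh environment per session."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory
        self._sessions: Dict[str, Any] = {}

    def open(self) -> str:
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = self._factory()  # new instance per session
        return session_id

    def get(self, session_id: str) -> Any:
        return self._sessions[session_id]

    def close(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)


registry = SessionRegistry(configured_factory(FakeEnvironment, height=8, width=8, num_mines=10))
a, b = registry.open(), registry.open()
registry.get(a).steps += 1  # only session a's environment changes
```

With a single shared instance, two clients would step the same board. A factory gives each session its own `_mine_positions`, `_revealed_cells` and step counter.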

server/build_docker.sh ADDED

	@@ -0,0 +1,48 @@

+#!/bin/bash
+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+# Script to build the Minesweeper environment Docker image
+# Usage: ./build_docker.sh [tag]
+set -e
+TAG="${1:-latest}"
+IMAGE_NAME="minesweeper-env:${TAG}"
+echo "🐳 Building Minesweeper Environment Docker Image"
+echo "================================================"
+echo "Image: $IMAGE_NAME"
+echo ""
+# Get script directory
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+# Navigate to OpenEnv root (4 levels up from server/)
+OPENENV_ROOT="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
+echo "📁 OpenEnv root: $OPENENV_ROOT"
+echo ""
+# Build Minesweeper environment image
+echo "⏳ Building..."
+# Test docker build directly: under `set -e`, a separate `$?` check is never reached on failure
+if docker build \
+    -f "$SCRIPT_DIR/Dockerfile" \
+    -t "$IMAGE_NAME" \
+    "$OPENENV_ROOT"; then
+    echo ""
+    echo "✅ Build successful!"
+    echo ""
+    echo "🚀 Run with:"
+    echo "  docker run -p 8000:8000 $IMAGE_NAME"
+    echo ""
+else
+    echo ""
+    echo "❌ Build failed!"
+    exit 1
+fi
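`build_docker.sh` assumes it lives at `src/envs/minesweeper_env/server/` inside an OpenEnv checkout. From there, `../../../..` is the repository root, which becomes the Docker build context, while `-f` points at a `Dockerfile` next to the script. In this Space the `Dockerfile` was added at the top level rather than under `server/`, so the script targets the in-repo layout. The sketch below reproduces the path arithmetic with `pathlib` and builds the same `docker build` argument list, which can help check paths before invoking Docker. `docker_build_argv` is a hypothetical helper, and the `/repo` path in the test is made up.

```python
from pathlib import PurePosixPath
from typing import List


def docker_build_argv(script_dir: str, tag: str = "latest") -> List[str]:
    """Build the argv that build_docker.sh passes to `docker build`."""
    server_dir = PurePosixPath(script_dir)
    # "$SCRIPT_DIR/../../../.." is four levels up, i.e. parents[3]
    # (lexically; `cd` would also resolve symlinks along the way).
    openenv_root = server_dir.parents[3]
    return [
        "docker", "build",
        "-f", str(server_dir / "Dockerfile"),
        "-t", f"minesweeper-env:{tag}",
        str(openenv_root),
    ]
```

Passing the result to `subprocess.run(argv, check=True)` raises `CalledProcessError` on a non-zero exit, which is the Python counterpart of the script's failure branch.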

server/minesweeper_environment.py ADDED

	@@ -0,0 +1,338 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+"""
+Minesweeper Environment Implementation.
+A Minesweeper game environment where agents must reveal cells and place flags
+to identify mines on a grid board without triggering any mines.
+"""
+import random
+from typing import Any, Dict, List, Optional, Set, Tuple
+from uuid import uuid4
+try:
+    from ..models import (
+        MinesweeperAction,
+        MinesweeperObservation,
+        GameStatus,
+        MinesweeperState,
+    )
+except ImportError:
+    from models import (
+        MinesweeperAction,
+        MinesweeperObservation,
+        GameStatus,
+        MinesweeperState,
+    )
+# Support both in-repo and standalone imports
+try:
+    # In-repo imports (when running from OpenEnv repository)
+    from openenv.core.env_server.interfaces import Environment
+    from openenv.core.env_server.types import State
+except ImportError:
+    # Standalone imports (when environment is standalone with openenv from pip)
+    from openenv.core.env_server.interfaces import Environment
+    from openenv.core.env_server.types import State
+class MinesweeperEnvironment(Environment):
+    """
+    Minesweeper game environment implementation for Reinforcement Learning.
+    The environment consists of a grid with hidden mines. The agent can reveal cells or place flags.
+    The goal is to reveal all non-mine cells without triggering a mine.
+    The agent must:
+    - Reveal cells to uncover numbers indicating adjacent mines.
+    - Place flags on suspected mine locations.
+    The game ends when all non-mine cells are revealed (win) or a mine is revealed (loss).
+    Observation encoding:
+        -1: unrevealed
+        0-8: number of adjacent mines (if revealed)
+        'F': flagged cell
+        '*': mine (shown only on the revealed mine cell, i.e. after a loss)
+    Example:
+        >>> env = MinesweeperEnvironment(height=5, width=5, num_mines=5)
+        >>> obs = env.reset()
+        >>> action = MinesweeperAction(row=2, col=3, action_type='reveal')
+        >>> obs = env.step(action)
+    """
+    def __init__(self, height: int = 5, width: int = 5, num_mines: int = 5):
+        """Initialize the minesweeper_env environment.
+        Args:
+            height: Height of the minesweeper board.
+            width: Width of the minesweeper board.
+            num_mines: Number of mines to place on the board.
+        """
+        self.height = height
+        self.width = width
+        self.num_mines = num_mines
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._reset_count = 0
+        # Internal game state
+        self._mine_positions: Set[Tuple[int, int]] = set()
+        self._revealed_cells: Set[Tuple[int, int]] = set()
+        self._flags_placed: Set[Tuple[int, int]] = set()
+        self._mine_counts: List[List[int]] = [[0 for _ in range(width)] for _ in range(height)]
+        self._game_status = GameStatus.ONGOING
+        # Auto-reset so the board is playable immediately
+        self.reset()
+    def reset(self) -> MinesweeperObservation:
+        """
+        Reset the environment and start a new game.
+        Returns:
+            MinesweeperObservation with initial board state
+        """
+        self._state = State(episode_id=str(uuid4()), step_count=0)
+        self._reset_count += 1
+        # Reset internal game state
+        self._revealed_cells.clear()
+        self._flags_placed.clear()
+        self._game_status = GameStatus.ONGOING
+        # Place mines randomly
+        self._place_mines()
+        # Compute mine counts for each cell
+        self._compute_mine_counts()
+        return self._create_observation(
+            done=False,
+            reward=0.0,
+        )
+    def step(self, action: MinesweeperAction) -> MinesweeperObservation:  # type: ignore[override]
+        """
+        Execute a step in the environment by performing the given action.
+        Args:
+            action: MinesweeperAction specifying row, col and action_type
+        Returns:
+            MinesweeperObservation with updated board state and reward
+        """
+        self._state.step_count += 1
+        row, col = action.row, action.col
+        # Validate action
+        if not self._is_valid_position(row, col):
+            # Out-of-bounds position (checked before the game-over test below)
+            return self._create_observation(
+                done=self._game_status != GameStatus.ONGOING,
+                reward=-0.1,
+                metadata={"error": "Invalid action"},
+            )
+        # If game already over, no further actions allowed
+        if self._game_status != GameStatus.ONGOING:
+            return self._create_observation(
+                done=True,
+                reward=0.0,
+                metadata={"info": "Game already over"},
+            )
+        reward = 0.0
+        if action.action_type == "reveal":
+            reward = self._reveal_cell(row, col)
+        elif action.action_type == "flag":
+            reward = self._toggle_flag(row, col)
+        else:
+            reward = -0.1  # Invalid action type
+        self._check_win_condition()
+        return self._create_observation(
+            done=self._game_status != GameStatus.ONGOING,
+            reward=reward,
+        )
+    def _place_mines(self) -> None:
+        """Randomly place mines on the board."""
+        self._mine_positions.clear()
+        while len(self._mine_positions) < self.num_mines:
+            r = random.randint(0, self.height - 1)
+            c = random.randint(0, self.width - 1)
+            self._mine_positions.add((r, c))
+    def _compute_mine_counts(self) -> None:
+        """Compute the number of adjacent mines for each cell."""
+        self._mine_counts = [[0 for _ in range(self.width)] for _ in range(self.height)]
+        for row in range(self.height):
+            for col in range(self.width):
+                if (row, col) not in self._mine_positions:
+                    count = self._count_adjacent_mines(row, col)
+                    self._mine_counts[row][col] = count
+    def _count_adjacent_mines(self, row: int, col: int) -> int:
+        """Count the number of mines adjacent to the given cell."""
+        count = 0
+        for dr in [-1, 0, 1]:
+            for dc in [-1, 0, 1]:
+                if dr == 0 and dc == 0:
+                    continue
+                r, c = row + dr, col + dc
+                if self._is_valid_position(r, c) and (r, c) in self._mine_positions:
+                    count += 1
+        return count
+    def _reveal_cell(self, row: int, col: int) -> float:
+        """Reveal the cell at (row, col). Returns the reward for the action."""
+        if (row, col) in self._revealed_cells or (row, col) in self._flags_placed:
+            return -0.05  # Penalty for revealing already revealed or flagged cell
+        if (row, col) in self._mine_positions:
+            self._game_status = GameStatus.LOST
+            self._revealed_cells.add((row, col))
+            return -10.0  # Penalty for hitting a mine
+        # Reveal the cell and potentially adjacent cells if count is 0
+        self._reveal_recursive(row, col)
+        return 1.0  # Reward for a safe reveal (same value however many cells cascade)
+    def _reveal_recursive(self, row: int, col: int) -> None:
+        """Recursively reveal cells with 0 adjacent mines."""
+        if not self._is_valid_position(row, col):
+            return
+        if (row, col) in self._revealed_cells or (row, col) in self._flags_placed:
+            return
+        if (row, col) in self._mine_positions:
+            return
+        self._revealed_cells.add((row, col))
+        if self._mine_counts[row][col] == 0:
+            for dr in [-1, 0, 1]:
+                for dc in [-1, 0, 1]:
+                    if dr == 0 and dc == 0:
+                        continue
+                    self._reveal_recursive(row + dr, col + dc)
+    def _toggle_flag(self, row: int, col: int) -> float:
+        """Toggle a flag on the cell at (row, col). Returns the reward for the action."""
+        if (row, col) in self._revealed_cells:
+            return -0.05  # Penalty for flagging a revealed cell
+        if (row, col) in self._flags_placed:
+            self._flags_placed.remove((row, col))
+            return 0.0  # No penalty for removing a flag
+        else:
+            self._flags_placed.add((row, col))
+            if (row, col) in self._mine_positions:
+                return 0.5  # Small reward for correctly flagging a mine
+            return 0.0  # No reward for flagging a non-mine cell
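+    def _check_win_condition(self) -> None:
+        """Check if the game has been won (all non-mine cells revealed)."""
+        # A revealed mine is added to _revealed_cells, so only check while the
+        # game is ongoing; otherwise a loss could be overwritten as a win.
+        if self._game_status != GameStatus.ONGOING:
+            return
+        total_cells = self.height * self.width
+        revealed_count = len(self._revealed_cells)
+        if revealed_count == total_cells - self.num_mines:
+            self._game_status = GameStatus.WON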
+    def _is_valid_position(self, row: int, col: int) -> bool:
+        """Check if the given (row, col) is within board bounds."""
+        return 0 <= row < self.height and 0 <= col < self.width
+    def _create_observation(
+        self,
+        done: bool,
+        reward: Optional[float] = None,
+        metadata: Optional[Dict[str, Any]] = None,
+    ) -> MinesweeperObservation:
+        """Create the current observation of the board.
+        Args:
+            done: Whether the episode is done.
+            reward: Reward obtained from the last action.
+            metadata: Additional metadata to include.
+        Returns:
+            MinesweeperObservation representing the current board state.
+        """
+        board = []
+        for r in range(self.height):
+            row = []
+            for c in range(self.width):
+                if (r, c) in self._revealed_cells:
+                    if (r, c) in self._mine_positions:
+                        row.append('*')
+                    else:
+                        row.append(self._mine_counts[r][c])
+                elif (r, c) in self._flags_placed:
+                    row.append('F')
+                else:
+                    row.append(-1)
+            board.append(row)
+        return MinesweeperObservation(
+            board=board,
+            num_mines=self.num_mines,
+            flags_placed=len(self._flags_placed),
+            cells_revealed=len(self._revealed_cells),
+            game_status=self._game_status,
+            done=done,
+            reward=reward,
+            metadata=metadata or {},
+        )
+    @property
+    def state(self) -> State:
+        """
+        Get the current environment state.
+        Returns:
+            Current State with episode_id and step_count
+        """
+        return self._state
+    def get_full_state(self) -> MinesweeperState:
+        """
+        Get the full internal state of the Minesweeper environment.
+        Returns:
+            MinesweeperState representing the full internal state
+        """
+        return MinesweeperState(
+            episode_id=self._state.episode_id or "",
+            step_count=self._state.step_count,
+            board_height=self.height,
+            board_width=self.width,
+            mine_locations=self._mine_positions,
+            revealed_cells=self._revealed_cells,
+            flags=self._flags_placed,
+            mine_counts=self._mine_counts,
+            game_status=self._game_status,
+        )
+    def get_legal_actions(self) -> List[MinesweeperAction]:
+        """
+        Get the list of legal actions available in the current state.
+        Returns:
+            List of MinesweeperAction instances representing legal actions
+        """
+        legal_actions = []
+        # If game is over, no legal actions
+        if self._game_status != GameStatus.ONGOING:
+            return legal_actions
+        for r in range(self.height):
+            for c in range(self.width):
+                if (r, c) not in self._revealed_cells and (r, c) not in self._flags_placed:
+                    legal_actions.append(MinesweeperAction(row=r, col=c, action_type="reveal"))
+                if (r, c) not in self._revealed_cells:
+                    legal_actions.append(MinesweeperAction(row=r, col=c, action_type="flag"))
+        return legal_actions
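`_reveal_recursive` cascades through zero-count cells with one Python call per cell. That is fine at the default 5x5 size, but on large, sparse boards the call depth could exceed CPython's default recursion limit of 1000. A breadth-first variant with an explicit queue visits the same cells without deep recursion. The standalone sketch below shows it, using the hypothetical functions `adjacent_counts`, `flood_reveal` and `is_won` on plain sets rather than the environment object. It follows the same rules: stop at the board edge, skip revealed, flagged and mine cells, and expand only from cells with zero adjacent mines. As a design choice, its win test counts only non-mine revealed cells, so a game that ends on a mine can never also satisfy it.

```python
from collections import deque
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def adjacent_counts(height: int, width: int, mines: Set[Cell]) -> List[List[int]]:
    """Mines in each cell's 8-neighbourhood (mine cells stay 0, as in the env)."""
    counts = [[0] * width for _ in range(height)]
    for r, c in mines:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < height and 0 <= cc < width and (rr, cc) not in mines:
                    counts[rr][cc] += 1
    return counts


def flood_reveal(start: Cell, height: int, width: int, mines: Set[Cell],
                 flags: Set[Cell], revealed: Set[Cell], counts: List[List[int]]) -> None:
    """Iterative (BFS) equivalent of _reveal_recursive; mutates `revealed`."""
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if not (0 <= r < height and 0 <= c < width):
            continue
        if (r, c) in revealed or (r, c) in flags or (r, c) in mines:
            continue
        revealed.add((r, c))
        if counts[r][c] == 0:  # only zero cells expand to their neighbours
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr or dc:
                        queue.append((r + dr, c + dc))


def is_won(height: int, width: int, mines: Set[Cell], revealed: Set[Cell]) -> bool:
    """Won when every non-mine cell is revealed; revealed mines never count."""
    return len(revealed - mines) == height * width - len(mines)


# Example: 3x3 board with one mine in the top-left corner.
mines = {(0, 0)}
counts = adjacent_counts(3, 3, mines)
revealed: Set[Cell] = set()
flood_reveal((2, 2), 3, 3, mines, set(), revealed, counts)

# A 60x60 empty board would need a recursion depth well over 1000.
big: Set[Cell] = set()
flood_reveal((0, 0), 60, 60, set(), set(), big, [[0] * 60 for _ in range(60)])
```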