Spaces:
Running
Running
Upload folder using huggingface_hub
Browse files- Dockerfile +29 -0
- README.md +179 -3
- __init__.py +18 -0
- client.py +180 -0
- examples/__init__.py +0 -0
- examples/basic_usage.py +128 -0
- examples/openenv_training.py +134 -0
- models.py +78 -0
- openenv.yaml +6 -0
- outputs/.gitkeep +0 -0
- pyproject.toml +21 -0
- server/__init__.py +5 -0
- server/app.py +151 -0
- server/chess_environment.py +326 -0
- uv.lock +0 -0
Dockerfile
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.11-slim

WORKDIR /app

# Install system dependencies (gcc for building any C extensions pulled in by pip)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    && rm -rf /var/lib/apt/lists/*

# Copy the moonfish package and rl module
COPY . /app/

# Install dependencies.
# Version specifiers are quoted: an unquoted `>=` is interpreted by the shell
# as an output redirection, so the original command installed unpinned
# packages and left stray files named `=1.10.0`, `=0.100.0`, ... in the image.
RUN pip install --no-cache-dir \
    "chess>=1.10.0" \
    "fastapi>=0.100.0" \
    "uvicorn[standard]>=0.23.0" \
    "httpx>=0.24.0" \
    "pydantic>=2.0.0"

# Install moonfish from the local package
RUN pip install --no-cache-dir -e /app

# Expose port
EXPOSE 8000

# Run the server
ENV ENABLE_WEB_INTERFACE=true
CMD ["python", "-m", "uvicorn", "moonfish.rl.server.app:app", "--host", "0.0.0.0", "--port", "8000"]
|
README.md
CHANGED
|
@@ -1,10 +1,186 @@
|
|
| 1 |
---
|
| 2 |
title: Moonfish Chess
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
colorTo: blue
|
| 6 |
sdk: docker
|
| 7 |
pinned: false
|
|
|
|
|
|
|
| 8 |
---
|
| 9 |
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
---
|
| 2 |
title: Moonfish Chess
|
| 3 |
+
emoji: ♟️
|
| 4 |
+
colorFrom: gray
|
| 5 |
colorTo: blue
|
| 6 |
sdk: docker
|
| 7 |
pinned: false
|
| 8 |
+
license: mit
|
| 9 |
+
base_path: /web
|
| 10 |
---
|
| 11 |
|
| 12 |
+
# Chess OpenEnv
|
| 13 |
+
|
| 14 |
+
A chess environment for reinforcement learning, built on [moonfish](https://github.com/luccab/moonfish) and compatible with the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.
|
| 15 |
+
|
| 16 |
+
## Features
|
| 17 |
+
|
| 18 |
+
- **Full Chess Rules**: Legal move generation, checkmate/stalemate detection, draw conditions
|
| 19 |
+
- **Position Evaluation**: PeSTO evaluation function from moonfish for reward shaping
|
| 20 |
+
- **OpenEnv Compatible**: Standard `reset()`, `step()`, `state()` interface
|
| 21 |
+
- **Configurable Rewards**: Win/loss/draw payoffs, illegal move penalties, evaluation-based rewards
|
| 22 |
+
- **HTTP API**: FastAPI server for remote training and multi-agent setups
|
| 23 |
+
- **Containerized**: Docker support for reproducible deployments
|
| 24 |
+
|
| 25 |
+
## Quick Start
|
| 26 |
+
|
| 27 |
+
### Local Usage (No Server)
|
| 28 |
+
|
| 29 |
+
```python
|
| 30 |
+
from moonfish.rl import ChessEnvironment, ChessAction
|
| 31 |
+
|
| 32 |
+
# Create environment
|
| 33 |
+
env = ChessEnvironment()
|
| 34 |
+
|
| 35 |
+
# Start a new game
|
| 36 |
+
obs = env.reset()
|
| 37 |
+
print(f"Legal moves: {obs.legal_moves}")
|
| 38 |
+
|
| 39 |
+
# Make a move
|
| 40 |
+
action = ChessAction(move="e2e4")
|
| 41 |
+
obs, reward, done = env.step(action)
|
| 42 |
+
|
| 43 |
+
print(f"FEN: {obs.fen}")
|
| 44 |
+
print(f"Reward: {reward}, Done: {done}")
|
| 45 |
+
```
|
| 46 |
+
|
| 47 |
+
### Client-Server Usage
|
| 48 |
+
|
| 49 |
+
Start the server:
|
| 50 |
+
|
| 51 |
+
```bash
# Run from the repository root so the package-relative imports in
# server/app.py (e.g. `from ..models import ...`) resolve correctly
python -m uvicorn moonfish.rl.server.app:app --host 0.0.0.0 --port 8000
```
|
| 55 |
+
|
| 56 |
+
Connect with the client:
|
| 57 |
+
|
| 58 |
+
```python
|
| 59 |
+
from moonfish.rl import ChessEnvClient, ChessAction
|
| 60 |
+
|
| 61 |
+
client = ChessEnvClient("http://localhost:8000")
|
| 62 |
+
|
| 63 |
+
obs = client.reset()
|
| 64 |
+
result = client.step(ChessAction(move="e2e4"))
|
| 65 |
+
print(f"Reward: {result.reward}")
|
| 66 |
+
|
| 67 |
+
client.close()
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
## Data Models
|
| 71 |
+
|
| 72 |
+
### ChessAction
|
| 73 |
+
```python
|
| 74 |
+
@dataclass
|
| 75 |
+
class ChessAction:
|
| 76 |
+
move: str # UCI format: "e2e4", "e7e8q" (promotion)
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
+
### ChessObservation
|
| 80 |
+
```python
|
| 81 |
+
@dataclass
|
| 82 |
+
class ChessObservation:
|
| 83 |
+
fen: str # Board state in FEN notation
|
| 84 |
+
legal_moves: List[str] # Available moves in UCI format
|
| 85 |
+
is_check: bool # Current player in check
|
| 86 |
+
done: bool # Game over
|
| 87 |
+
reward: Optional[float] # Terminal reward
|
| 88 |
+
result: Optional[str] # "1-0", "0-1", "1/2-1/2"
|
| 89 |
+
metadata: Dict[str, Any] # Evaluation, material, etc.
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
### ChessState
|
| 93 |
+
```python
|
| 94 |
+
@dataclass
|
| 95 |
+
class ChessState:
|
| 96 |
+
episode_id: str # Unique game identifier
|
| 97 |
+
step_count: int # Half-moves played
|
| 98 |
+
current_player: str # "white" or "black"
|
| 99 |
+
fen: str # Current position
|
| 100 |
+
move_history: List[str] # All moves in UCI format
|
| 101 |
+
```
|
| 102 |
+
|
| 103 |
+
## Reward Configuration
|
| 104 |
+
|
| 105 |
+
```python
|
| 106 |
+
from moonfish.rl import ChessEnvironment, RewardConfig
|
| 107 |
+
|
| 108 |
+
config = RewardConfig(
|
| 109 |
+
win=1.0, # Reward for winning
|
| 110 |
+
loss=-1.0, # Penalty for losing
|
| 111 |
+
draw=0.0, # Reward for draw
|
| 112 |
+
illegal_move=-0.1, # Penalty for illegal moves
|
| 113 |
+
use_evaluation=True, # Enable intermediate rewards
|
| 114 |
+
evaluation_scale=0.0001, # Scale for eval-based rewards
|
| 115 |
+
)
|
| 116 |
+
|
| 117 |
+
env = ChessEnvironment(reward_config=config)
|
| 118 |
+
```
|
| 119 |
+
|
| 120 |
+
## Docker
|
| 121 |
+
|
| 122 |
+
Build and run:
|
| 123 |
+
|
| 124 |
+
```bash
|
| 125 |
+
docker build -t chess-openenv .
|
| 126 |
+
docker run -p 8000:8000 chess-openenv
|
| 127 |
+
```
|
| 128 |
+
|
| 129 |
+
## Integration with RL Frameworks
|
| 130 |
+
|
| 131 |
+
### With TorchRL
|
| 132 |
+
|
| 133 |
+
```python
|
| 134 |
+
from moonfish.rl import ChessEnvironment, ChessAction
|
| 135 |
+
|
| 136 |
+
class ChessTorchRLWrapper:
|
| 137 |
+
def __init__(self):
|
| 138 |
+
self.env = ChessEnvironment()
|
| 139 |
+
|
| 140 |
+
def reset(self):
|
| 141 |
+
obs = self.env.reset()
|
| 142 |
+
return self._obs_to_tensor(obs)
|
| 143 |
+
|
| 144 |
+
def step(self, action_idx):
|
| 145 |
+
move = self._idx_to_move(action_idx)
|
| 146 |
+
obs, reward, done = self.env.step(ChessAction(move=move))
|
| 147 |
+
return self._obs_to_tensor(obs), reward, done
|
| 148 |
+
```
|
| 149 |
+
|
| 150 |
+
### With OpenEnv Training Loop
|
| 151 |
+
|
| 152 |
+
```python
|
| 153 |
+
from moonfish.rl import make_env, ChessAction
|
| 154 |
+
import random
|
| 155 |
+
|
| 156 |
+
client = make_env("http://localhost:8000")
|
| 157 |
+
|
| 158 |
+
for episode in range(100):
|
| 159 |
+
obs = client.reset()
|
| 160 |
+
episode_reward = 0
|
| 161 |
+
|
| 162 |
+
while not obs.done:
|
| 163 |
+
# Your policy here (random for demo)
|
| 164 |
+
move = random.choice(obs.legal_moves)
|
| 165 |
+
result = client.step(ChessAction(move=move))
|
| 166 |
+
obs = result.observation
|
| 167 |
+
episode_reward += result.reward
|
| 168 |
+
|
| 169 |
+
print(f"Episode {episode}: reward={episode_reward}")
|
| 170 |
+
|
| 171 |
+
client.close()
|
| 172 |
+
```
|
| 173 |
+
|
| 174 |
+
## API Endpoints
|
| 175 |
+
|
| 176 |
+
| Endpoint | Method | Description |
|
| 177 |
+
|----------|--------|-------------|
|
| 178 |
+
| `/health` | GET | Health check |
|
| 179 |
+
| `/metadata` | GET | Environment configuration |
|
| 180 |
+
| `/reset` | POST | Start new episode |
|
| 181 |
+
| `/step` | POST | Execute a move |
|
| 182 |
+
| `/state` | GET | Get episode metadata |
|
| 183 |
+
|
| 184 |
+
## License
|
| 185 |
+
|
| 186 |
+
MIT - See the moonfish repository for full license details.
|
__init__.py
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Chess OpenEnv - A chess environment for reinforcement learning."""
|
| 2 |
+
|
| 3 |
+
from .models import ChessAction, ChessObservation, ChessState, RewardConfig
|
| 4 |
+
from .client import ChessEnvClient, StepResult, make_env
|
| 5 |
+
from .server.chess_environment import ChessEnvironment
|
| 6 |
+
|
| 7 |
+
__all__ = [
|
| 8 |
+
"ChessAction",
|
| 9 |
+
"ChessObservation",
|
| 10 |
+
"ChessState",
|
| 11 |
+
"RewardConfig",
|
| 12 |
+
"ChessEnvClient",
|
| 13 |
+
"StepResult",
|
| 14 |
+
"make_env",
|
| 15 |
+
"ChessEnvironment",
|
| 16 |
+
]
|
| 17 |
+
|
| 18 |
+
__version__ = "1.0.0"
|
client.py
ADDED
|
@@ -0,0 +1,180 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Client for the Chess OpenEnv environment."""
|
| 2 |
+
|
| 3 |
+
from dataclasses import dataclass
|
| 4 |
+
from typing import Any, Dict, List, Optional
|
| 5 |
+
|
| 6 |
+
import httpx
|
| 7 |
+
|
| 8 |
+
from .models import ChessAction, ChessObservation, ChessState
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
@dataclass
class StepResult:
    """Bundle returned by ``ChessEnvClient.step()``.

    Attributes:
        observation: The board observation after the move was applied.
        reward: Scalar reward for this transition.
        done: True when the episode ended on this step.
    """
    observation: ChessObservation
    reward: float
    done: bool
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
class ChessEnvClient:
    """HTTP client for a remote Chess OpenEnv server.

    Wraps the server's REST endpoints (/reset, /step, /state, /metadata,
    /health) behind a small typed interface suitable for RL training loops.

    Example:
        client = ChessEnvClient("http://localhost:8000")
        obs = client.reset()
        print(f"Legal moves: {obs.legal_moves}")

        result = client.step(ChessAction(move="e2e4"))
        print(f"Reward: {result.reward}, Done: {result.done}")

        state = client.state()
        print(f"Move count: {state.step_count}")

        client.close()
    """

    def __init__(self, base_url: str = "http://localhost:8000", timeout: float = 30.0):
        """Bind the client to *base_url* with the given request *timeout* (seconds)."""
        self.base_url = base_url.rstrip("/")
        self._client = httpx.Client(timeout=timeout)

    def reset(
        self,
        seed: Optional[int] = None,
        episode_id: Optional[str] = None,
        fen: Optional[str] = None,
    ) -> ChessObservation:
        """Start a new episode and return the initial board observation.

        Args:
            seed: Random seed forwarded to the server (optional).
            episode_id: Unique identifier for the new episode (optional).
            fen: Starting position in FEN notation (optional).
        """
        # Only send the fields the caller actually supplied.
        candidates = {"seed": seed, "episode_id": episode_id, "fen": fen}
        payload = {key: value for key, value in candidates.items() if value is not None}

        response = self._client.post(f"{self.base_url}/reset", json=payload)
        response.raise_for_status()
        return self._parse_observation(response.json())

    def step(self, action: ChessAction) -> StepResult:
        """Execute *action* (a UCI move) and return the resulting transition.

        Returns:
            StepResult with the new observation, reward, and done flag.
        """
        response = self._client.post(
            f"{self.base_url}/step", json={"move": action.move}
        )
        response.raise_for_status()
        data = response.json()

        return StepResult(
            observation=self._parse_observation(data["observation"]),
            reward=data["reward"],
            done=data["done"],
        )

    def state(self) -> ChessState:
        """Fetch the current episode state (id, move count, side to move, FEN, history)."""
        response = self._client.get(f"{self.base_url}/state")
        response.raise_for_status()
        data = response.json()

        return ChessState(
            episode_id=data["episode_id"],
            step_count=data["step_count"],
            current_player=data["current_player"],
            fen=data["fen"],
            move_history=data.get("move_history", []),
        )

    def metadata(self) -> Dict[str, Any]:
        """Return the server's environment configuration as a plain dict."""
        response = self._client.get(f"{self.base_url}/metadata")
        response.raise_for_status()
        return response.json()

    def health(self) -> bool:
        """Return True if the server answers /health with HTTP 200, else False."""
        try:
            return self._client.get(f"{self.base_url}/health").status_code == 200
        except Exception:
            # Any transport failure (connection refused, timeout, ...) counts
            # as unhealthy rather than propagating to the caller.
            return False

    def close(self) -> None:
        """Release the underlying HTTP connection pool."""
        self._client.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.close()

    def _parse_observation(self, data: Dict[str, Any]) -> ChessObservation:
        """Build a ChessObservation from a JSON payload, tolerating absent optional keys."""
        return ChessObservation(
            fen=data["fen"],
            legal_moves=data["legal_moves"],
            is_check=data.get("is_check", False),
            done=data.get("done", False),
            reward=data.get("reward"),
            result=data.get("result"),
            metadata=data.get("metadata", {}),
        )
|
| 167 |
+
|
| 168 |
+
|
| 169 |
+
# Convenience function for quick usage
def make_env(base_url: str = "http://localhost:8000") -> ChessEnvClient:
    """Factory shorthand: return a ChessEnvClient pointed at *base_url*.

    Args:
        base_url: URL of the chess environment server.

    Returns:
        A ready-to-use ChessEnvClient.
    """
    return ChessEnvClient(base_url)
|
examples/__init__.py
ADDED
|
File without changes
|
examples/basic_usage.py
ADDED
|
@@ -0,0 +1,128 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
Basic usage example for the Chess OpenEnv environment.
|
| 3 |
+
|
| 4 |
+
This example shows how to use the chess environment both locally
|
| 5 |
+
(without a server) and via the HTTP client.
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import random
|
| 9 |
+
|
| 10 |
+
from moonfish.rl import ChessAction, ChessEnvironment, RewardConfig
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
def play_random_game():
    """Play one complete game with uniformly random legal moves."""
    print("=== Playing a random game ===\n")

    # Create environment and start a fresh episode with a fixed id.
    env = ChessEnvironment()
    obs = env.reset(episode_id="random_game_001")

    print(f"Initial position: {obs.fen}")
    print(f"Legal moves: {len(obs.legal_moves)} available")
    print()

    move_count = 0
    total_reward = 0.0

    while not obs.done:
        # Sample a legal move uniformly at random and play it.
        chosen = random.choice(obs.legal_moves)
        obs, reward, done = env.step(ChessAction(move=chosen))
        total_reward += reward
        move_count += 1

        # Print the first few moves and the final one; stay quiet in between.
        if move_count <= 5 or done:
            print(f"Move {move_count}: {chosen}")
            print(f" FEN: {obs.fen}")
            print(f" Check: {obs.is_check}, Reward: {reward}")
            if move_count == 5 and not done:
                print(" ... (continuing)")
            print()

    print(f"\nGame finished after {move_count} moves")
    print(f"Result: {obs.result}")
    print(f"Total reward: {total_reward}")

    # Inspect the final episode metadata.
    state = env.state
    print(f"Episode ID: {state.episode_id}")
    print(f"Move history: {state.move_history[:10]}...")

    env.close()
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
def play_specific_opening():
    """Walk through the first moves of the Italian Game opening."""
    print("\n=== Playing the Italian Game opening ===\n")

    env = ChessEnvironment()
    obs = env.reset()

    # 1.e4 e5 2.Nf3 Nc6 3.Bc4 in UCI notation.
    opening_moves = ["e2e4", "e7e5", "g1f3", "b8c6", "f1c4"]

    for number, uci in enumerate(opening_moves, start=1):
        obs, _reward, _done = env.step(ChessAction(move=uci))
        print(f"{number}. {uci} -> Check: {obs.is_check}")

    print(f"\nPosition after opening: {obs.fen}")
    print(f"Legal moves for Black: {len(obs.legal_moves)}")
    print(f"Material: {obs.metadata.get('material', {})}")

    env.close()
|
| 79 |
+
|
| 80 |
+
|
| 81 |
+
def demonstrate_illegal_move():
    """Show how the environment reacts to an illegal move.

    Submits a rule-breaking move (a pawn jumping three squares) and
    prints the penalty reward, the error recorded in the observation
    metadata, and the done flag (the game continues after the attempt).
    """
    print("\n=== Handling illegal moves ===\n")

    env = ChessEnvironment()
    env.reset()

    # Try an illegal move: a pawn cannot jump from e2 to e5.
    # Bound to a name so the printout below cannot drift out of sync
    # (the original used an f-string with no placeholders here).
    bad_move = "e2e5"
    obs, reward, done = env.step(ChessAction(move=bad_move))

    print(f"Attempted illegal move: {bad_move}")
    print(f"Reward: {reward}")  # Should be negative (illegal-move penalty)
    print(f"Error: {obs.metadata.get('error', 'None')}")
    print(f"Done: {done}")  # Game continues

    env.close()
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
def with_evaluation_rewards():
    """Demonstrate intermediate rewards derived from position evaluation."""
    print("\n=== Evaluation-based rewards ===\n")

    # Centipawn evaluations are large; scale them down into reward range.
    config = RewardConfig(use_evaluation=True, evaluation_scale=0.0001)

    env = ChessEnvironment(reward_config=config)
    env.reset()

    # A short line where White wins a pawn, shifting the evaluation.
    for move in ["e2e4", "d7d5", "e4d5"]:
        obs, reward, done = env.step(ChessAction(move=move))
        eval_score = obs.metadata.get("evaluation", 0)
        print(f"Move: {move}, Reward: {reward:.4f}, Eval: {eval_score:.1f}")

    env.close()
|
| 122 |
+
|
| 123 |
+
|
| 124 |
+
if __name__ == "__main__":
|
| 125 |
+
play_random_game()
|
| 126 |
+
play_specific_opening()
|
| 127 |
+
demonstrate_illegal_move()
|
| 128 |
+
with_evaluation_rewards()
|
examples/openenv_training.py
ADDED
|
@@ -0,0 +1,134 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""
|
| 2 |
+
OpenEnv Training Example
|
| 3 |
+
|
| 4 |
+
This example shows how to use the chess environment with the OpenEnv
|
| 5 |
+
client-server pattern, which is useful for:
|
| 6 |
+
- Distributed training across machines
|
| 7 |
+
- Isolated environment execution
|
| 8 |
+
- Integration with OpenEnv-compatible training frameworks
|
| 9 |
+
|
| 10 |
+
Usage:
|
| 11 |
+
# Terminal 1: Start the server
|
| 12 |
+
cd moonfish/rl
|
| 13 |
+
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
|
| 14 |
+
|
| 15 |
+
# Terminal 2: Run this training script
|
| 16 |
+
python examples/openenv_training.py
|
| 17 |
+
"""
|
| 18 |
+
|
| 19 |
+
import random
|
| 20 |
+
from moonfish.rl import ChessEnvClient, ChessAction, make_env
|
| 21 |
+
|
| 22 |
+
|
| 23 |
+
def random_policy(legal_moves: list[str]) -> str:
    """Demonstration policy: pick one of *legal_moves* uniformly at random."""
    return random.choice(legal_moves)
|
| 26 |
+
|
| 27 |
+
|
| 28 |
+
def train_with_remote_env():
    """Run a short random-policy training loop against a remote server.

    Demonstrates the OpenEnv client/server pattern, useful when:
    - the environment runs on a different machine,
    - environment isolation (sandboxing) is needed, or
    - an OpenEnv-compatible training framework is in use.
    """
    # Connect to the environment server.
    # For local testing, start the server first:
    #   python -m uvicorn moonfish.rl.server.app:app --port 8000
    client = make_env("http://localhost:8000")

    # Bail out early with instructions if the server is unreachable.
    if not client.health():
        print("Server not running. Start it with:")
        print(" python -m uvicorn moonfish.rl.server.app:app --port 8000")
        return

    print("Connected to chess environment server")
    print(f"Metadata: {client.metadata()}")
    print()

    num_episodes = 5

    for episode in range(num_episodes):
        obs = client.reset()
        episode_reward = 0.0

        print(f"Episode {episode + 1}")

        while not obs.done:
            # Pick an action with the (random) policy and send it.
            action = ChessAction(move=random_policy(obs.legal_moves))
            step_result = client.step(action)
            obs = step_result.observation
            episode_reward += step_result.reward

            # Safety limit: abandon runaway games.
            if client.state().step_count > 200:
                print(" (truncated at 200 moves)")
                break

        print(f" Moves: {client.state().step_count}, "
              f"Result: {obs.result or 'ongoing'}, "
              f"Reward: {episode_reward:.2f}")

    # Cleanup
    client.close()
    print("\nTraining complete!")
| 85 |
+
|
| 86 |
+
|
| 87 |
+
def train_with_local_env():
    """Run a short random-policy loop against an in-process environment.

    Simpler and faster than the HTTP client when everything runs on one
    machine — no server required.
    """
    from moonfish.rl import ChessEnvironment

    env = ChessEnvironment(opponent="random")

    print("Training with local environment (random opponent)")
    print()

    num_episodes = 5

    for episode in range(num_episodes):
        obs = env.reset()
        episode_reward = 0.0

        while not obs.done:
            obs, reward, done = env.step(
                ChessAction(move=random_policy(obs.legal_moves))
            )
            episode_reward += reward

            # Safety limit: abandon runaway games.
            if env.state.step_count > 200:
                break

        print(f"Episode {episode + 1}: "
              f"Moves={env.state.step_count}, "
              f"Result={obs.result or 'ongoing'}, "
              f"Reward={episode_reward:.2f}")

    env.close()
    print("\nTraining complete!")
| 121 |
+
|
| 122 |
+
|
| 123 |
+
if __name__ == "__main__":
|
| 124 |
+
import sys
|
| 125 |
+
|
| 126 |
+
if "--remote" in sys.argv:
|
| 127 |
+
print("=== Remote Environment (OpenEnv HTTP Client) ===\n")
|
| 128 |
+
train_with_remote_env()
|
| 129 |
+
else:
|
| 130 |
+
print("=== Local Environment ===\n")
|
| 131 |
+
train_with_local_env()
|
| 132 |
+
print("\nTo test with HTTP client, run:")
|
| 133 |
+
print(" 1. Start server: python -m uvicorn moonfish.rl.server.app:app --port 8000")
|
| 134 |
+
print(" 2. Run: python examples/openenv_training.py --remote")
|
models.py
ADDED
|
@@ -0,0 +1,78 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Data models for the Chess OpenEnv environment."""
|
| 2 |
+
|
| 3 |
+
from dataclasses import dataclass, field
|
| 4 |
+
from typing import Any, Dict, List, Optional, Union
|
| 5 |
+
|
| 6 |
+
|
| 7 |
+
@dataclass
class ChessAction:
    """A single chess move submitted to the environment.

    Attributes:
        move: Move in UCI notation, e.g. "e2e4", or "e7e8q" for a promotion.
    """
    move: str
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
@dataclass
class ChessObservation:
    """Observable state of the chess environment after a reset or step.

    Attributes:
        fen: Board position in FEN notation.
        legal_moves: Legal moves available to the side to move, UCI format.
        is_check: True when the current player is in check.
        done: True once the episode has ended.
        reward: Terminal reward (1.0 win / -1.0 loss / 0.0 draw); None while
            the game is still in progress.
        result: Result string when the game is over ("1-0", "0-1", "1/2-1/2").
        metadata: Extra per-position information (evaluation, material, ...).
    """
    fen: str
    legal_moves: List[str]
    is_check: bool = False
    done: bool = False
    reward: Optional[float] = None
    result: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
@dataclass
class ChessState:
    """Episode-level bookkeeping for the chess environment.

    Attributes:
        episode_id: Unique identifier of the current episode.
        step_count: Half-moves played so far in this episode.
        current_player: Side to move, "white" or "black".
        fen: Current position in FEN notation.
        move_history: Every move played so far, in UCI format.
    """
    episode_id: str
    step_count: int
    current_player: str
    fen: str
    move_history: List[str] = field(default_factory=list)
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
@dataclass
class RewardConfig:
    """Reward-shaping knobs for the chess environment.

    Attributes:
        win: Reward paid for winning the game.
        loss: Reward (penalty) for losing the game.
        draw: Reward for a drawn game.
        illegal_move: Penalty applied when an illegal move is attempted.
        use_evaluation: When True, position evaluation contributes
            intermediate (non-terminal) rewards.
        evaluation_scale: Multiplier applied to evaluation-based rewards.
    """
    win: float = 1.0
    loss: float = -1.0
    draw: float = 0.0
    illegal_move: float = -0.1
    use_evaluation: bool = False
    evaluation_scale: float = 0.001
|
openenv.yaml
ADDED
|
@@ -0,0 +1,6 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# OpenEnv environment manifest for the moonfish chess environment.
spec_version: 1
name: moonfish_chess
type: space
runtime: fastapi
# Import path of the ASGI application object, as "module:attribute".
app: server.app:app
port: 8000
|
outputs/.gitkeep
ADDED
|
File without changes
|
pyproject.toml
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
[project]
name = "moonfish-chess-env"
version = "1.0.0"
description = "Chess RL environment using moonfish engine - OpenEnv compatible"
requires-python = ">=3.10"

dependencies = [
    "chess>=1.10.0",
    "fastapi>=0.100.0",
    "uvicorn[standard]>=0.23.0",
    "httpx>=0.24.0",
    "pydantic>=2.0.0",
    "openenv>=0.1.0",
]

[project.scripts]
# NOTE(review): this entry point expects a callable `main` in server/app.py;
# the visible part of that module only defines the FastAPI `app` object —
# confirm `main` exists before relying on this script.
server = "server.app:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
|
server/__init__.py
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Chess OpenEnv server module."""
|
| 2 |
+
|
| 3 |
+
from .chess_environment import ChessEnvironment
|
| 4 |
+
|
| 5 |
+
__all__ = ["ChessEnvironment"]
|
server/app.py
ADDED
|
@@ -0,0 +1,151 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""FastAPI server for the Chess OpenEnv environment."""
|
| 2 |
+
|
| 3 |
+
from typing import Any, Dict, Optional
|
| 4 |
+
from dataclasses import asdict
|
| 5 |
+
|
| 6 |
+
from fastapi import FastAPI, HTTPException
|
| 7 |
+
from pydantic import BaseModel
|
| 8 |
+
|
| 9 |
+
from ..models import ChessAction, RewardConfig
|
| 10 |
+
from .chess_environment import ChessEnvironment
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
# Pydantic models for API requests/responses
|
| 14 |
+
class ResetRequest(BaseModel):
    """Request body for POST /reset."""

    # Optional RNG seed (currently unused by the environment).
    seed: Optional[int] = None
    # Caller-supplied episode identifier; the environment generates one if omitted.
    episode_id: Optional[str] = None
    # Optional starting position in FEN notation (standard start if omitted).
    fen: Optional[str] = None
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
class StepRequest(BaseModel):
    """Request body for POST /step."""

    # Move in UCI notation, e.g. "e2e4".
    move: str
|
| 22 |
+
|
| 23 |
+
|
| 24 |
+
class ObservationResponse(BaseModel):
    """Serialized view of a chess observation returned by /reset and /step."""

    # Current position in FEN notation.
    fen: str
    # All legal moves in the current position, in UCI notation.
    legal_moves: list[str]
    # True when the side to move is in check.
    is_check: bool = False
    # True once the episode has ended.
    done: bool = False
    # Reward attached to the observation; populated on terminal observations.
    reward: Optional[float] = None
    # PGN-style result string (e.g. "1-0", "0-1", "1/2-1/2") once the game is over.
    result: Optional[str] = None
    # Extra info (material, phase, optional evaluation, error messages, ...).
    # NOTE: a mutable {} default is safe here because pydantic copies field
    # defaults per instance, unlike plain Python function defaults.
    metadata: Dict[str, Any] = {}
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
class StepResponse(BaseModel):
    """Response body for POST /step: observation plus scalar RL signals."""

    # Resulting board observation after the move (and any opponent reply).
    observation: ObservationResponse
    # Reward for this transition.
    reward: float
    # True when the episode ended on this step.
    done: bool
|
| 38 |
+
|
| 39 |
+
|
| 40 |
+
class StateResponse(BaseModel):
    """Response body for GET /state: current episode bookkeeping."""

    # Unique identifier of the running episode.
    episode_id: str
    # Number of half-moves played so far (both sides).
    step_count: int
    # Side to move: "white" or "black".
    current_player: str
    # Current position in FEN notation.
    fen: str
    # All moves played so far, in UCI notation.
    move_history: list[str]
|
| 46 |
+
|
| 47 |
+
|
| 48 |
+
# Create FastAPI app
|
| 49 |
+
app = FastAPI(
|
| 50 |
+
title="Chess OpenEnv",
|
| 51 |
+
description="Chess environment for reinforcement learning using moonfish",
|
| 52 |
+
version="1.0.0",
|
| 53 |
+
)
|
| 54 |
+
|
| 55 |
+
# Global environment instance (for single-player mode)
|
| 56 |
+
# For multi-player, you'd want a session manager
|
| 57 |
+
_env: Optional[ChessEnvironment] = None
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
def get_env() -> ChessEnvironment:
    """Return the process-wide environment, creating it on first use."""
    global _env
    if _env is not None:
        return _env
    _env = ChessEnvironment()
    return _env
|
| 66 |
+
|
| 67 |
+
|
| 68 |
+
@app.get("/health")
|
| 69 |
+
def health():
|
| 70 |
+
"""Health check endpoint."""
|
| 71 |
+
return {"status": "ok"}
|
| 72 |
+
|
| 73 |
+
|
| 74 |
+
@app.get("/metadata")
|
| 75 |
+
def metadata():
|
| 76 |
+
"""Get environment metadata."""
|
| 77 |
+
return get_env().get_metadata()
|
| 78 |
+
|
| 79 |
+
|
| 80 |
+
@app.post("/reset", response_model=ObservationResponse)
|
| 81 |
+
def reset(request: ResetRequest):
|
| 82 |
+
"""Reset the environment and start a new episode."""
|
| 83 |
+
env = get_env()
|
| 84 |
+
obs = env.reset(
|
| 85 |
+
seed=request.seed,
|
| 86 |
+
episode_id=request.episode_id,
|
| 87 |
+
fen=request.fen,
|
| 88 |
+
)
|
| 89 |
+
return ObservationResponse(
|
| 90 |
+
fen=obs.fen,
|
| 91 |
+
legal_moves=obs.legal_moves,
|
| 92 |
+
is_check=obs.is_check,
|
| 93 |
+
done=obs.done,
|
| 94 |
+
reward=obs.reward,
|
| 95 |
+
result=obs.result,
|
| 96 |
+
metadata=obs.metadata,
|
| 97 |
+
)
|
| 98 |
+
|
| 99 |
+
|
| 100 |
+
@app.post("/step", response_model=StepResponse)
|
| 101 |
+
def step(request: StepRequest):
|
| 102 |
+
"""Execute a move and return the result."""
|
| 103 |
+
env = get_env()
|
| 104 |
+
|
| 105 |
+
try:
|
| 106 |
+
action = ChessAction(move=request.move)
|
| 107 |
+
obs, reward, done = env.step(action)
|
| 108 |
+
except RuntimeError as e:
|
| 109 |
+
raise HTTPException(status_code=400, detail=str(e))
|
| 110 |
+
|
| 111 |
+
return StepResponse(
|
| 112 |
+
observation=ObservationResponse(
|
| 113 |
+
fen=obs.fen,
|
| 114 |
+
legal_moves=obs.legal_moves,
|
| 115 |
+
is_check=obs.is_check,
|
| 116 |
+
done=obs.done,
|
| 117 |
+
reward=obs.reward,
|
| 118 |
+
result=obs.result,
|
| 119 |
+
metadata=obs.metadata,
|
| 120 |
+
),
|
| 121 |
+
reward=reward,
|
| 122 |
+
done=done,
|
| 123 |
+
)
|
| 124 |
+
|
| 125 |
+
|
| 126 |
+
@app.get("/state", response_model=StateResponse)
|
| 127 |
+
def state():
|
| 128 |
+
"""Get current episode state."""
|
| 129 |
+
env = get_env()
|
| 130 |
+
try:
|
| 131 |
+
s = env.state
|
| 132 |
+
except RuntimeError as e:
|
| 133 |
+
raise HTTPException(status_code=400, detail=str(e))
|
| 134 |
+
|
| 135 |
+
return StateResponse(
|
| 136 |
+
episode_id=s.episode_id,
|
| 137 |
+
step_count=s.step_count,
|
| 138 |
+
current_player=s.current_player,
|
| 139 |
+
fen=s.fen,
|
| 140 |
+
move_history=s.move_history,
|
| 141 |
+
)
|
| 142 |
+
|
| 143 |
+
|
| 144 |
+
def main():
    """Run the API under uvicorn, listening on all interfaces, port 8000."""
    # Imported lazily so importing this module never requires uvicorn.
    import uvicorn

    host, port = "0.0.0.0", 8000
    uvicorn.run(app, host=host, port=port)
|
| 148 |
+
|
| 149 |
+
|
| 150 |
+
if __name__ == "__main__":
|
| 151 |
+
main()
|
server/chess_environment.py
ADDED
|
@@ -0,0 +1,326 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Chess environment for OpenEnv using moonfish."""
|
| 2 |
+
|
| 3 |
+
import random
|
| 4 |
+
import uuid
|
| 5 |
+
from typing import Any, Dict, List, Optional, Tuple
|
| 6 |
+
|
| 7 |
+
import chess
|
| 8 |
+
|
| 9 |
+
from moonfish.psqt import board_evaluation, MG_PIECE_VALUES, count_pieces, get_phase
|
| 10 |
+
from moonfish.lib import search_move
|
| 11 |
+
from ..models import ChessAction, ChessObservation, ChessState, RewardConfig
|
| 12 |
+
|
| 13 |
+
|
| 14 |
+
class ChessEnvironment:
|
| 15 |
+
"""
|
| 16 |
+
Chess environment implementing the OpenEnv interface.
|
| 17 |
+
|
| 18 |
+
Uses python-chess for game logic and moonfish for position evaluation.
|
| 19 |
+
Designed for RL training where an agent plays as one color against
|
| 20 |
+
an opponent (which can be random, moonfish engine, or self-play).
|
| 21 |
+
"""
|
| 22 |
+
|
| 23 |
+
def __init__(
    self,
    reward_config: Optional[RewardConfig] = None,
    max_moves: int = 500,
    agent_color: Optional[bool] = None,  # None = alternate, True = White, False = Black
    opponent: Optional[str] = None,  # None = self-play, "moonfish" = moonfish engine, "random" = random
    opponent_depth: int = 2,  # Search depth for moonfish opponent
):
    """
    Initialize the chess environment.

    Args:
        reward_config: Configuration for reward shaping (defaults to RewardConfig()).
        max_moves: Maximum half-moves before the episode is forced to a draw
            (prevents infinite games).
        agent_color: Which color the RL agent plays (None = alternates each episode).
        opponent: Opponent type - None (self-play), "moonfish", or "random".
        opponent_depth: Search depth when using moonfish as opponent.
    """
    # NOTE: `or` would also replace a falsy config, but dataclass instances
    # are always truthy, so only None actually triggers the default here.
    self.reward_config = reward_config or RewardConfig()
    self.max_moves = max_moves
    # The configured preference; the effective color for the current episode
    # is decided in reset() and stored in self._agent_color.
    self.agent_color_setting = agent_color
    self.opponent = opponent
    self.opponent_depth = opponent_depth

    # Will be set on reset
    self._board: Optional[chess.Board] = None   # live python-chess board for the episode
    self._state: Optional[ChessState] = None    # episode bookkeeping (id, counters, history)
    self._agent_color: bool = chess.WHITE       # effective agent color for this episode
|
| 51 |
+
|
| 52 |
+
def reset(
    self,
    seed: Optional[int] = None,
    episode_id: Optional[str] = None,
    fen: Optional[str] = None,
    **kwargs
) -> ChessObservation:
    """
    Initialize a new chess game episode.

    Args:
        seed: Random seed (unused for now, chess is deterministic)
        episode_id: Unique identifier for this episode; also drives color
            alternation when agent_color was not fixed at construction
        fen: Optional starting position in FEN notation

    Returns:
        Initial observation of the board state (agent to move)
    """
    # Create new board, optionally from a custom FEN position.
    if fen:
        self._board = chess.Board(fen)
    else:
        self._board = chess.Board()

    # Determine agent color for this episode.
    if self.agent_color_setting is None:
        if episode_id:
            # Bugfix: built-in hash() on str is salted per process
            # (PYTHONHASHSEED), so `hash(episode_id) % 2` assigned colors
            # non-reproducibly across runs. crc32 is a stable checksum, so
            # the same episode_id always maps to the same color.
            import zlib

            self._agent_color = zlib.crc32(episode_id.encode("utf-8")) % 2 == 0
        else:
            self._agent_color = chess.WHITE
    else:
        self._agent_color = self.agent_color_setting

    # Initialize episode bookkeeping.
    self._state = ChessState(
        episode_id=episode_id or uuid.uuid4().hex,
        step_count=0,
        current_player="white" if self._board.turn else "black",
        fen=self._board.fen(),
        move_history=[],
    )

    # If agent plays Black and an opponent is configured, the opponent moves
    # first so the returned observation already has the agent to move.
    if self.opponent is not None and self._agent_color == chess.BLACK:
        self._make_opponent_move()

    return self._get_observation()
|
| 100 |
+
|
| 101 |
+
def step(
    self,
    action: ChessAction,
    timeout_s: Optional[float] = None,
    **kwargs
) -> Tuple[ChessObservation, float, bool]:
    """
    Execute a chess move and return the resulting state.

    Args:
        action: The move to make in UCI format (e.g., "e2e4")
        timeout_s: Unused timeout parameter

    Returns:
        Tuple of (observation, reward, done)

    Raises:
        RuntimeError: If called before reset().
    """
    if self._board is None or self._state is None:
        raise RuntimeError("Environment not initialized. Call reset() first.")

    # Parse the move (UCI text -> chess.Move).
    try:
        move = chess.Move.from_uci(action.move)
    except ValueError:
        # Malformed UCI string: penalize without advancing the game.
        return self._handle_illegal_move(f"Invalid move format: {action.move}")

    # Check legality in the current position.
    if move not in self._board.legal_moves:
        return self._handle_illegal_move(f"Illegal move: {action.move}")

    # Execute the agent's move and update bookkeeping.
    self._board.push(move)
    self._state.step_count += 1
    self._state.move_history.append(action.move)
    self._state.current_player = "white" if self._board.turn else "black"
    self._state.fen = self._board.fen()

    # Reward for the agent's move, and terminal check.
    reward, done = self._calculate_reward_and_done()

    # If the game is not over and an opponent is configured, let it reply.
    if not done and self.opponent is not None:
        self._make_opponent_move()
        opp_reward, done = self._calculate_reward_and_done()
        # Bugfix: _calculate_reward_and_done() already scores from the
        # agent's perspective (it compares the winner to self._agent_color),
        # so a terminal reward after the opponent's reply is added as-is.
        # The previous `reward += -opp_reward` inverted win/loss here — the
        # agent received +win when the opponent delivered checkmate.
        if done:
            reward += opp_reward

    observation = self._get_observation(done=done, reward=reward if done else None)

    return observation, reward, done
|
| 152 |
+
|
| 153 |
+
@property
def state(self) -> ChessState:
    """Current episode state; raises RuntimeError before the first reset()."""
    current = self._state
    if current is None:
        raise RuntimeError("Environment not initialized. Call reset() first.")
    return current
|
| 159 |
+
|
| 160 |
+
def close(self) -> None:
    """Drop per-episode resources; reset() must be called before reuse."""
    self._state = None
    self._board = None
|
| 164 |
+
|
| 165 |
+
def get_metadata(self) -> Dict[str, Any]:
    """Describe the environment: name, version, move limit and reward settings."""
    cfg = self.reward_config
    reward_info = {
        "win": cfg.win,
        "loss": cfg.loss,
        "draw": cfg.draw,
        "illegal_move": cfg.illegal_move,
        "use_evaluation": cfg.use_evaluation,
        "evaluation_scale": cfg.evaluation_scale,
    }
    return {
        "name": "chess",
        "version": "1.0.0",
        "max_moves": self.max_moves,
        "reward_config": reward_info,
    }
|
| 180 |
+
|
| 181 |
+
def _get_observation(
    self,
    done: bool = False,
    reward: Optional[float] = None,
    result: Optional[str] = None,
    error: Optional[str] = None,
) -> ChessObservation:
    """Snapshot the current board as a ChessObservation."""
    board = self._board
    assert board is not None

    info: Dict[str, Any] = {}

    # Optional moonfish evaluation of the position.
    if self.reward_config.use_evaluation:
        info["evaluation"] = board_evaluation(board)

    # Material totals, game phase, and move counters.
    # NOTE(review): phase scale (0 = opening ... endgame) per moonfish's
    # get_phase — confirm against moonfish.psqt.
    info.update(
        material=self._get_material_count(),
        phase=get_phase(board),
        fullmove_number=board.fullmove_number,
        halfmove_clock=board.halfmove_clock,
    )

    if error:
        info["error"] = error

    # Fill in the result string when the game has ended and none was given.
    outcome = result
    if done and outcome is None:
        outcome = self._get_result_string()

    return ChessObservation(
        fen=board.fen(),
        legal_moves=[m.uci() for m in board.legal_moves],
        is_check=board.is_check(),
        done=done,
        reward=reward,
        result=outcome,
        metadata=info,
    )
|
| 223 |
+
|
| 224 |
+
def _calculate_reward_and_done(self) -> Tuple[float, bool]:
    """Calculate reward (from the agent's perspective) and whether the episode is done."""
    assert self._board is not None

    # Check for game end
    if self._board.is_checkmate():
        # The side to move is checkmated, so the previous mover won
        winner = not self._board.turn
        if winner == self._agent_color:
            return self.reward_config.win, True
        else:
            return self.reward_config.loss, True

    if self._board.is_stalemate():
        return self.reward_config.draw, True

    if self._board.is_insufficient_material():
        return self.reward_config.draw, True

    # Fifty-move rule (100 half-moves without capture or pawn move).
    if self._board.is_fifty_moves():
        return self.reward_config.draw, True

    # Threefold repetition.
    if self._board.is_repetition(3):
        return self.reward_config.draw, True

    # Check move limit (step_count counts half-moves by both sides).
    if self._state and self._state.step_count >= self.max_moves:
        return self.reward_config.draw, True

    # Game continues
    reward = 0.0

    # Optional: Add evaluation-based intermediate rewards
    if self.reward_config.use_evaluation:
        eval_score = board_evaluation(self._board)
        # Normalize evaluation to agent's perspective.
        # NOTE(review): this assumes board_evaluation() scores relative to
        # the side to move — confirm against moonfish.psqt. If it is
        # White-relative instead, the sign flip should key on
        # self._agent_color rather than on board.turn.
        if self._board.turn != self._agent_color:
            eval_score = -eval_score
        reward = eval_score * self.reward_config.evaluation_scale

    return reward, False
|
| 265 |
+
|
| 266 |
+
def _handle_illegal_move(self, error_msg: str) -> Tuple[ChessObservation, float, bool]:
    """Penalize an illegal move attempt without advancing the game."""
    obs = self._get_observation(done=False, error=error_msg)
    penalty = self.reward_config.illegal_move
    return obs, penalty, False
|
| 270 |
+
|
| 271 |
+
def _get_result_string(self) -> str:
|
| 272 |
+
"""Get the game result as a string."""
|
| 273 |
+
assert self._board is not None
|
| 274 |
+
|
| 275 |
+
if self._board.is_checkmate():
|
| 276 |
+
return "1-0" if not self._board.turn else "0-1"
|
| 277 |
+
return "1/2-1/2"
|
| 278 |
+
|
| 279 |
+
def _get_material_count(self) -> Dict[str, int]:
    """Total middlegame piece values (pawns through queens) for each side."""
    assert self._board is not None

    # count_pieces yields interleaved white/black counts per piece type,
    # in the order P, N, B, R, Q: [wp, bp, wn, bn, wb, bb, wr, br, wq, bq].
    counts = count_pieces(self._board)
    piece_order = (chess.PAWN, chess.KNIGHT, chess.BISHOP, chess.ROOK, chess.QUEEN)

    white_total = 0
    black_total = 0
    for idx, piece in enumerate(piece_order):
        value = MG_PIECE_VALUES[piece]
        white_total += counts[2 * idx] * value
        black_total += counts[2 * idx + 1] * value

    return {"white": white_total, "black": black_total}
|
| 303 |
+
|
| 304 |
+
def _make_opponent_move(self) -> None:
    """Make a move for the opponent using the configured strategy."""
    assert self._board is not None
    assert self._state is not None

    if not list(self._board.legal_moves):
        return  # No legal moves (game should be over)

    if self.opponent == "moonfish":
        # Use moonfish engine to find best move.
        # NOTE(review): assumes search_move(board, depth=...) returns a
        # chess.Move for the side to move — confirm against moonfish.lib.
        move = search_move(self._board, depth=self.opponent_depth)
    elif self.opponent == "random":
        # Pick a random legal move. Uses the module-level `random` RNG,
        # which is not seeded by reset(), so replies are not reproducible.
        move = random.choice(list(self._board.legal_moves))
    else:
        return  # No opponent configured

    # Execute opponent's move, mirroring the bookkeeping done in step().
    self._board.push(move)
    self._state.step_count += 1
    self._state.move_history.append(move.uci())
    self._state.current_player = "white" if self._board.turn else "black"
    self._state.fen = self._board.fen()
|
uv.lock
ADDED
|
The diff for this file is too large to render.
See raw diff
|
|
|