---
title: Moonfish Chess
emoji: ♟️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
license: mit
base_path: /web
---

# Chess OpenEnv

A chess environment for reinforcement learning, built on [moonfish](https://github.com/luccab/moonfish) and compatible with the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.

## Features

- **Full Chess Rules**: Legal move generation, checkmate/stalemate detection, draw conditions
- **Position Evaluation**: PeSTO evaluation function from moonfish for reward shaping
- **OpenEnv Compatible**: Standard `reset()`, `step()`, `state()` interface
- **Configurable Rewards**: Win/loss/draw payoffs, illegal-move penalties, evaluation-based rewards
- **HTTP API**: FastAPI server for remote training and multi-agent setups
- **Containerized**: Docker support for reproducible deployments

## Quick Start

### Local Usage (No Server)

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment
env = ChessEnvironment()

# Start a new game
obs = env.reset()
print(f"Legal moves: {obs.legal_moves}")

# Make a move
action = ChessAction(move="e2e4")
obs, reward, done = env.step(action)
print(f"FEN: {obs.fen}")
print(f"Reward: {reward}, Done: {done}")
```

### Client-Server Usage

Start the server:

```bash
cd moonfish/rl
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```

Connect with the client:

```python
from moonfish.rl import ChessEnvClient, ChessAction

client = ChessEnvClient("http://localhost:8000")
obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(f"Reward: {result.reward}")
client.close()
```

## Data Models

### ChessAction

```python
@dataclass
class ChessAction:
    move: str  # UCI format: "e2e4", "e7e8q" (promotion)
```

### ChessObservation

```python
@dataclass
class ChessObservation:
    fen: str                  # Board state in FEN notation
    legal_moves: List[str]    # Available moves in UCI format
    is_check: bool            # Current player in check
    done: bool                # Game over
    reward: Optional[float]   # Terminal reward
    result: Optional[str]     # "1-0", "0-1", "1/2-1/2"
    metadata: Dict[str, Any]  # Evaluation, material, etc.
```

### ChessState

```python
@dataclass
class ChessState:
    episode_id: str           # Unique game identifier
    step_count: int           # Half-moves played
    current_player: str       # "white" or "black"
    fen: str                  # Current position
    move_history: List[str]   # All moves in UCI format
```

## Reward Configuration

```python
from moonfish.rl import ChessEnvironment, RewardConfig

config = RewardConfig(
    win=1.0,                  # Reward for winning
    loss=-1.0,                # Penalty for losing
    draw=0.0,                 # Reward for a draw
    illegal_move=-0.1,        # Penalty for illegal moves
    use_evaluation=True,      # Enable intermediate rewards
    evaluation_scale=0.0001,  # Scale for eval-based rewards
)

env = ChessEnvironment(reward_config=config)
```

## Docker

Build and run:

```bash
docker build -t chess-openenv .
docker run -p 8000:8000 chess-openenv
```

## Integration with RL Frameworks

### With TorchRL

```python
from moonfish.rl import ChessEnvironment, ChessAction

class ChessTorchRLWrapper:
    def __init__(self):
        self.env = ChessEnvironment()

    def reset(self):
        obs = self.env.reset()
        return self._obs_to_tensor(obs)

    def step(self, action_idx):
        move = self._idx_to_move(action_idx)
        obs, reward, done = self.env.step(ChessAction(move=move))
        return self._obs_to_tensor(obs), reward, done
```

### With OpenEnv Training Loop

```python
from moonfish.rl import make_env, ChessAction
import random

client = make_env("http://localhost:8000")

for episode in range(100):
    obs = client.reset()
    episode_reward = 0

    while not obs.done:
        # Your policy here (random for demo)
        move = random.choice(obs.legal_moves)
        result = client.step(ChessAction(move=move))
        obs = result.observation
        episode_reward += result.reward

    print(f"Episode {episode}: reward={episode_reward}")

client.close()
```

## API Endpoints

| Endpoint    | Method | Description               |
|-------------|--------|---------------------------|
| `/health`   | GET    | Health check              |
| `/metadata` | GET    | Environment configuration |
| `/reset`    | POST   | Start a new episode       |
| `/step`     | POST   | Execute a move            |
| `/state`    | GET    | Get episode metadata      |

## License

MIT. See the moonfish repository for full license details.
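The TorchRL wrapper above leaves `_obs_to_tensor` and `_idx_to_move` unimplemented. A minimal, dependency-free sketch of the two encodings they would need is below; the indexing scheme and the `move_to_idx`/`fen_to_planes` names are illustrative assumptions, not part of the moonfish API, and plain Python lists stand in for torch tensors:

```python
# Hypothetical helpers for the TorchRL wrapper sketch. Assumptions:
#  - moves are indexed over all 64*64 from/to square pairs, with promotions
#    mapped into additional offset blocks
#  - observations are flattened 12x64 binary piece planes built from the FEN

PIECES = "PNBRQKpnbrqk"  # 6 white piece types followed by 6 black

def square_index(sq: str) -> int:
    """Map a square like 'e4' to 0..63 (a1 = 0, h8 = 63)."""
    return (ord(sq[0]) - ord("a")) + 8 * (int(sq[1]) - 1)

def move_to_idx(move: str) -> int:
    """Encode a UCI move ('e2e4', 'e7e8q') as an integer index."""
    idx = square_index(move[:2]) * 64 + square_index(move[2:4])
    if len(move) == 5:  # promotion piece selects an extra offset block
        idx += 4096 * (1 + "qrbn".index(move[4]))
    return idx

def fen_to_planes(fen: str) -> list:
    """Flatten the board field of a FEN into 12 binary piece planes."""
    planes = [[0.0] * 64 for _ in PIECES]
    board = fen.split()[0]  # FEN ranks run from rank 8 down to rank 1
    for rank_idx, rank in enumerate(board.split("/")):
        file_idx = 0
        for ch in rank:
            if ch.isdigit():
                file_idx += int(ch)  # digits encode runs of empty squares
            else:
                sq = (7 - rank_idx) * 8 + file_idx
                planes[PIECES.index(ch)][sq] = 1.0
                file_idx += 1
    return [v for plane in planes for v in plane]
```

The inverse mapping (`_idx_to_move`) can simply decode the same arithmetic, or look the index up in a precomputed table built from `obs.legal_moves`; masking the policy's output to the legal-move set is usually simpler than decoding arbitrary indices.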