---
title: Moonfish Chess
emoji: ♟️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
license: mit
base_path: /web
---

# Chess OpenEnv

A chess environment for reinforcement learning, built on [moonfish](https://github.com/luccab/moonfish) and compatible with the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.

## Features

- **Full Chess Rules**: Legal move generation, checkmate/stalemate detection, draw conditions
- **Position Evaluation**: PeSTO evaluation function from moonfish for reward shaping
- **OpenEnv Compatible**: Standard `reset()`, `step()`, `state()` interface
- **Configurable Rewards**: Win/loss/draw payoffs, illegal move penalties, evaluation-based rewards
- **HTTP API**: FastAPI server for remote training and multi-agent setups
- **Containerized**: Docker support for reproducible deployments

## Quick Start

### Local Usage (No Server)

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment
env = ChessEnvironment()

# Start a new game
obs = env.reset()
print(f"Legal moves: {obs.legal_moves}")

# Make a move
action = ChessAction(move="e2e4")
obs, reward, done = env.step(action)

print(f"FEN: {obs.fen}")
print(f"Reward: {reward}, Done: {done}")
```

### Client-Server Usage

Start the server:

```bash
cd moonfish/rl
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```

Connect with the client:

```python
from moonfish.rl import ChessEnvClient, ChessAction

client = ChessEnvClient("http://localhost:8000")

obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(f"Reward: {result.reward}")

client.close()
```

## Data Models

### ChessAction
```python
@dataclass
class ChessAction:
    move: str  # UCI format: "e2e4", "e7e8q" (promotion)
```
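Since all moves are exchanged as UCI strings, it can be handy to split one into its components. The `parse_uci` helper below is not part of the package; it is a minimal sketch of how a UCI string such as `"e7e8q"` decomposes:

```python
def parse_uci(move: str):
    """Split a UCI move string into (from_square, to_square, promotion).

    Promotion is the trailing piece letter ("q", "r", "b", "n") or None.
    """
    from_sq, to_sq = move[:2], move[2:4]
    promotion = move[4] if len(move) > 4 else None
    return from_sq, to_sq, promotion
```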

### ChessObservation
```python
@dataclass
class ChessObservation:
    fen: str              # Board state in FEN notation
    legal_moves: List[str]  # Available moves in UCI format
    is_check: bool        # Current player in check
    done: bool            # Game over
    reward: Optional[float]  # Terminal reward
    result: Optional[str]    # "1-0", "0-1", "1/2-1/2"
    metadata: Dict[str, Any]  # Evaluation, material, etc.
```

### ChessState
```python
@dataclass
class ChessState:
    episode_id: str        # Unique game identifier
    step_count: int        # Half-moves played
    current_player: str    # "white" or "black"
    fen: str               # Current position
    move_history: List[str]  # All moves in UCI format
```

## Reward Configuration

```python
from moonfish.rl import ChessEnvironment, RewardConfig

config = RewardConfig(
    win=1.0,           # Reward for winning
    loss=-1.0,         # Penalty for losing
    draw=0.0,          # Reward for draw
    illegal_move=-0.1, # Penalty for illegal moves
    use_evaluation=True,  # Enable intermediate rewards
    evaluation_scale=0.0001,  # Scale for eval-based rewards
)

env = ChessEnvironment(reward_config=config)
```
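With `use_evaluation=True`, intermediate rewards are derived from the position evaluation. The exact formula used by the environment is not shown here; a common shaping scheme, consistent with an `evaluation_scale` of `0.0001` over centipawn scores, is the scaled change in evaluation between consecutive positions (an assumption, sketched below):

```python
def shaped_reward(eval_before: int, eval_after: int, scale: float = 0.0001) -> float:
    """Hypothetical intermediate reward: scaled change in evaluation.

    Evaluations are assumed to be centipawns from the mover's perspective,
    so gaining half a pawn (+50) yields a small positive reward.
    """
    return scale * (eval_after - eval_before)
```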

## Docker

Build and run:

```bash
docker build -t chess-openenv .
docker run -p 8000:8000 chess-openenv
```

## Integration with RL Frameworks

### With TorchRL

```python
from moonfish.rl import ChessEnvironment, ChessAction

class ChessTorchRLWrapper:
    def __init__(self):
        self.env = ChessEnvironment()

    def reset(self):
        obs = self.env.reset()
        return self._obs_to_tensor(obs)

    def step(self, action_idx):
        # _idx_to_move and _obs_to_tensor are user-supplied: they define the
        # action-index-to-UCI mapping and the board encoding, respectively.
        move = self._idx_to_move(action_idx)
        obs, reward, done = self.env.step(ChessAction(move=move))
        return self._obs_to_tensor(obs), reward, done
```
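The wrapper above leaves the board encoding open. One possible `_obs_to_tensor` encoding (an illustration, not part of the package) maps the FEN piece-placement field onto 12 one-hot planes, one per piece type and colour; the resulting array can be handed to `torch.from_numpy` for TorchRL:

```python
import numpy as np

# One plane per piece: white P N B R Q K, then black p n b r q k
PIECES = "PNBRQKpnbrqk"

def fen_to_planes(fen: str) -> np.ndarray:
    """Encode the piece-placement field of a FEN string as a 12x8x8 array."""
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    board = fen.split()[0]  # first FEN field: piece placement
    for rank, row in enumerate(board.split("/")):  # rank 0 = 8th rank
        file = 0
        for ch in row:
            if ch.isdigit():
                file += int(ch)  # digit = run of empty squares
            else:
                planes[PIECES.index(ch), rank, file] = 1.0
                file += 1
    return planes
```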

### With OpenEnv Training Loop

```python
from moonfish.rl import make_env, ChessAction
import random

client = make_env("http://localhost:8000")

for episode in range(100):
    obs = client.reset()
    episode_reward = 0

    while not obs.done:
        # Your policy here (random for demo)
        move = random.choice(obs.legal_moves)
        result = client.step(ChessAction(move=move))
        obs = result.observation
        episode_reward += result.reward

    print(f"Episode {episode}: reward={episode_reward}")

client.close()
```

## API Endpoints

| Endpoint | Method | Description |
|----------|--------|-------------|
| `/health` | GET | Health check |
| `/metadata` | GET | Environment configuration |
| `/reset` | POST | Start new episode |
| `/step` | POST | Execute a move |
| `/state` | GET | Get episode metadata |
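
Assuming the server accepts a JSON body mirroring the `ChessAction` dataclass (the exact payload shape may differ), the endpoints can be exercised with `curl` against a running server:

```shell
# Health check
curl http://localhost:8000/health

# Start a new episode
curl -X POST http://localhost:8000/reset

# Play a move (body shape assumed from the ChessAction dataclass)
curl -X POST http://localhost:8000/step \
     -H "Content-Type: application/json" \
     -d '{"move": "e2e4"}'
```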

## License

MIT - See the moonfish repository for full license details.