---
title: Moonfish Chess
emoji: ♟️
colorFrom: gray
colorTo: blue
sdk: docker
pinned: false
license: mit
base_path: /web
---

# Chess OpenEnv

A chess environment for reinforcement learning, built on [moonfish](https://github.com/luccab/moonfish) and compatible with the [OpenEnv](https://github.com/meta-pytorch/OpenEnv) framework.

## Features

- **Full Chess Rules**: Legal move generation, checkmate/stalemate detection, draw conditions
- **Position Evaluation**: PeSTO evaluation function from moonfish for reward shaping
- **OpenEnv Compatible**: Standard `reset()`, `step()`, `state()` interface
- **Configurable Rewards**: Win/loss/draw payoffs, illegal-move penalties, evaluation-based rewards
- **HTTP API**: FastAPI server for remote training and multi-agent setups
- **Containerized**: Docker support for reproducible deployments

## Quick Start

### Local Usage (No Server)

```python
from moonfish.rl import ChessEnvironment, ChessAction

# Create environment
env = ChessEnvironment()

# Start a new game
obs = env.reset()
print(f"Legal moves: {obs.legal_moves}")

# Make a move
action = ChessAction(move="e2e4")
obs, reward, done = env.step(action)
print(f"FEN: {obs.fen}")
print(f"Reward: {reward}, Done: {done}")
```

### Client-Server Usage

Start the server:

```bash
cd moonfish/rl
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
```

Connect with the client:

```python
from moonfish.rl import ChessEnvClient, ChessAction

client = ChessEnvClient("http://localhost:8000")
obs = client.reset()
result = client.step(ChessAction(move="e2e4"))
print(f"Reward: {result.reward}")
client.close()
```

## Data Models

### ChessAction

```python
@dataclass
class ChessAction:
    move: str  # UCI format: "e2e4", "e7e8q" (promotion)
```

### ChessObservation

```python
@dataclass
class ChessObservation:
    fen: str                  # Board state in FEN notation
    legal_moves: List[str]    # Available moves in UCI format
    is_check: bool            # Current player in check
    done: bool                # Game over
    reward: Optional[float]   # Terminal reward
    result: Optional[str]     # "1-0", "0-1", "1/2-1/2"
    metadata: Dict[str, Any]  # Evaluation, material, etc.
```

### ChessState

```python
@dataclass
class ChessState:
    episode_id: str           # Unique game identifier
    step_count: int           # Half-moves played
    current_player: str       # "white" or "black"
    fen: str                  # Current position
    move_history: List[str]   # All moves in UCI format
```

## Reward Configuration

```python
from moonfish.rl import ChessEnvironment, RewardConfig

config = RewardConfig(
    win=1.0,                  # Reward for winning
    loss=-1.0,                # Penalty for losing
    draw=0.0,                 # Reward for a draw
    illegal_move=-0.1,        # Penalty for illegal moves
    use_evaluation=True,      # Enable intermediate rewards
    evaluation_scale=0.0001,  # Scale for eval-based rewards
)

env = ChessEnvironment(reward_config=config)
```

## Docker

Build and run:

```bash
docker build -t chess-openenv .
docker run -p 8000:8000 chess-openenv
```

## Integration with RL Frameworks

### With TorchRL

```python
from moonfish.rl import ChessEnvironment, ChessAction

class ChessTorchRLWrapper:
    def __init__(self):
        self.env = ChessEnvironment()

    def reset(self):
        obs = self.env.reset()
        return self._obs_to_tensor(obs)

    def step(self, action_idx):
        move = self._idx_to_move(action_idx)
        obs, reward, done = self.env.step(ChessAction(move=move))
        return self._obs_to_tensor(obs), reward, done
```

### With OpenEnv Training Loop

```python
from moonfish.rl import make_env, ChessAction
import random

client = make_env("http://localhost:8000")

for episode in range(100):
    obs = client.reset()
    episode_reward = 0

    while not obs.done:
        # Your policy here (random for demo)
        move = random.choice(obs.legal_moves)
        result = client.step(ChessAction(move=move))
        obs = result.observation
        episode_reward += result.reward

    print(f"Episode {episode}: reward={episode_reward}")

client.close()
```

## API Endpoints

| Endpoint    | Method | Description               |
|-------------|--------|---------------------------|
| `/health`   | GET    | Health check              |
| `/metadata` | GET    | Environment configuration |
| `/reset`    | POST   | Start a new episode       |
| `/step`     | POST   | Execute a move            |
| `/state`    | GET    | Get episode metadata      |

## License

MIT. See the moonfish repository for full license details.
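The TorchRL wrapper above leaves `_obs_to_tensor` and `_idx_to_move` unimplemented. A minimal, dependency-free sketch of the two encodings they would need is below; the indexing scheme and the `move_to_idx`/`fen_to_planes` names are illustrative assumptions, not part of the moonfish API, and plain Python lists stand in for torch tensors:

```python
# Hypothetical helpers for the TorchRL wrapper sketch. Assumptions:
#  - moves are indexed over all 64*64 from/to square pairs, with promotions
#    mapped into additional offset blocks
#  - observations are flattened 12x64 binary piece planes built from the FEN

PIECES = "PNBRQKpnbrqk"  # 6 white piece types followed by 6 black

def square_index(sq: str) -> int:
    """Map a square like 'e4' to 0..63 (a1 = 0, h8 = 63)."""
    return (ord(sq[0]) - ord("a")) + 8 * (int(sq[1]) - 1)

def move_to_idx(move: str) -> int:
    """Encode a UCI move ('e2e4', 'e7e8q') as an integer index."""
    idx = square_index(move[:2]) * 64 + square_index(move[2:4])
    if len(move) == 5:  # promotion piece selects an extra offset block
        idx += 4096 * (1 + "qrbn".index(move[4]))
    return idx

def fen_to_planes(fen: str) -> list:
    """Flatten the board field of a FEN into 12 binary piece planes."""
    planes = [[0.0] * 64 for _ in PIECES]
    board = fen.split()[0]  # FEN ranks run from rank 8 down to rank 1
    for rank_idx, rank in enumerate(board.split("/")):
        file_idx = 0
        for ch in rank:
            if ch.isdigit():
                file_idx += int(ch)  # digits encode runs of empty squares
            else:
                sq = (7 - rank_idx) * 8 + file_idx
                planes[PIECES.index(ch)][sq] = 1.0
                file_idx += 1
    return [v for plane in planes for v in plane]
```

The inverse mapping (`_idx_to_move`) can simply decode the same arithmetic, or look the index up in a precomputed table built from `obs.legal_moves`; masking the policy's output to the legal-move set is usually simpler than decoding arbitrary indices.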