---
title: ChessEcon
emoji: ♟️
colorFrom: indigo
colorTo: yellow
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - chess
  - multi-agent
  - grpo
  - rl-environment
  - economy
  - two-player
  - game
  - textarena
  - llm-training
license: apache-2.0
---
# ♟️ ChessEcon

**Multi-Agent Chess Economy · OpenEnv 0.1 · GRPO Live Training**

- **Live API:** https://chessecon.adaboost.io
- **Dashboard:** https://chessecon-ui.adaboost.io
- **Swagger:** https://chessecon.adaboost.io/docs
- **env_info:** https://chessecon.adaboost.io/env/env_info
## Overview
ChessEcon is a two-player LLM chess environment where agents compete for economic stakes, fully compliant with the OpenEnv 0.1 specification.
Two language models play chess head-to-head. Each game costs an entry fee. The winner earns a prize pool. The White agent trains live using GRPO (Group Relative Policy Optimisation) — every game updates the policy weights in real-time. A Bloomberg-style dashboard streams all activity via WebSocket.
| Agent | Model | Role |
|---|---|---|
| ♔ White | Qwen/Qwen2.5-0.5B-Instruct | Trainable — GRPO updates every game |
| ♚ Black | meta-llama/Llama-3.2-1B-Instruct | Fixed opponent — frozen weights |
## OpenEnv 0.1 API
All endpoints are compatible with TRL, verl, SkyRL, and any OpenEnv 0.1 trainer.
| Endpoint | Method | Description |
|---|---|---|
| `/env/reset` | POST | Start new episode · deduct entry fees · return initial observation |
| `/env/step` | POST | Apply one move (UCI or SAN) · return reward + next observation |
| `/env/state` | GET | Read current board state — non-destructive |
| `/env/env_info` | GET | Environment metadata for HF Hub discoverability |
| `/ws` | WebSocket | Real-time event stream (moves, rewards, GRPO metrics) |
| `/health` | GET | Health check + model load status |
| `/docs` | GET | Interactive Swagger UI |
## Quick Start

```python
import httpx

BASE = "https://chessecon.adaboost.io"

# 1. Start a new episode
reset = httpx.post(f"{BASE}/env/reset").json()
print(reset["observation"]["fen"])              # starting position
print(reset["observation"]["legal_moves_uci"])  # all legal moves in UCI

# 2. Play a move (UCI or SAN accepted)
step = httpx.post(f"{BASE}/env/step", json={"action": "e2e4"}).json()
print(step["observation"]["fen"])  # updated board
print(step["reward"])              # per-step reward signal
print(step["terminated"])          # True when game ends
print(step["truncated"])           # True if move limit reached

# 3. Inspect current state (read-only)
state = httpx.get(f"{BASE}/env/state").json()
print(state["step_count"])  # moves played so far
print(state["status"])      # "active" | "terminated" | "idle"

# 4. Environment metadata
info = httpx.get(f"{BASE}/env/env_info").json()
print(info["openenv_version"])  # "0.1"
print(info["agents"])           # model IDs for white/black
```
## Drop-in Client (TRL / verl / SkyRL)

```python
import random

import httpx


class ChessEconEnv:
    """
    OpenEnv 0.1 client for ChessEcon.
    Compatible with TRL, verl, SkyRL, and any gym-style RL trainer.
    """

    def __init__(self, base_url: str = "https://chessecon.adaboost.io"):
        self.base = base_url.rstrip("/")
        self.http = httpx.Client(timeout=30)

    def reset(self, seed: int | None = None) -> tuple[dict, dict]:
        payload = {"seed": seed} if seed is not None else {}
        r = self.http.post(f"{self.base}/env/reset", json=payload)
        r.raise_for_status()
        d = r.json()
        return d["observation"], d["info"]

    def step(self, action: str) -> tuple[dict, float, bool, bool, dict]:
        """
        Args:
            action: Move in UCI (e.g. "e2e4") or SAN (e.g. "e4")

        Returns:
            (observation, reward, terminated, truncated, info)
        """
        r = self.http.post(f"{self.base}/env/step", json={"action": action})
        r.raise_for_status()
        d = r.json()
        return (d["observation"], d["reward"], d["terminated"], d["truncated"], d["info"])

    def state(self) -> dict:
        return self.http.get(f"{self.base}/env/state").json()

    def env_info(self) -> dict:
        return self.http.get(f"{self.base}/env/env_info").json()

    def close(self):
        self.http.close()


# Example: random rollout
env = ChessEconEnv()
obs, info = env.reset()
total_reward = 0.0
while True:
    action = random.choice(obs["legal_moves_uci"])  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        print(f"Game over | result={info.get('result')} | total_reward={total_reward:.3f}")
        break
env.close()
```
## Observation Schema

Every response from `/env/reset`, `/env/step`, and `/env/state` contains a `ChessObservation`:

```json
{
  "observation": {
    "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
    "turn": "black",
    "move_number": 1,
    "last_move_uci": "e2e4",
    "last_move_san": "e4",
    "legal_moves_uci": ["e7e5", "d7d5", "g8f6", "..."],
    "is_check": false,
    "wallet_white": 90.0,
    "wallet_black": 90.0,
    "white_model": "Qwen/Qwen2.5-0.5B-Instruct",
    "black_model": "meta-llama/Llama-3.2-1B-Instruct",
    "info": {}
  }
}
```
### `/env/step` Response

```json
{
  "observation": { "...": "ChessObservation — see above" },
  "reward": 0.01,
  "terminated": false,
  "truncated": false,
  "info": { "san": "e4", "uci": "e2e4", "move_number": 1 }
}
```
### `/env/state` Response

```json
{
  "observation": { "...": "ChessObservation — see above" },
  "episode_id": "ep-42",
  "step_count": 1,
  "status": "active",
  "info": {}
}
```
### `/env/env_info` Response

```json
{
  "openenv_version": "0.1",
  "environment_id": "chessecon-v1",
  "name": "ChessEcon",
  "description": "Multi-agent chess economy with live GRPO training",
  "action_space": "text",
  "observation_space": "text",
  "reward_range": [-1.0, 1.0],
  "max_steps": 40,
  "agents": {
    "white": "Qwen/Qwen2.5-0.5B-Instruct",
    "black": "meta-llama/Llama-3.2-1B-Instruct"
  },
  "tags": ["chess", "multi-agent", "economy", "grpo", "openenv"]
}
```
## Reward Structure

Per-step rewards are issued after every move; terminal rewards are issued at game end.

| Event | Reward | Type |
|---|---|---|
| Legal move played | +0.01 | Per-step |
| Move delivers check | +0.05 | Per-step bonus |
| Capture | +0.10 | Per-step bonus |
| Win (checkmate / material adj.) | +1.00 | Terminal |
| Loss | -1.00 | Terminal |
| Draw | 0.00 | Terminal |
| Illegal move attempted | -0.10 | Per-step penalty |

Combined reward formula:

```
R = 0.4 × game_reward + 0.6 × economic_reward
economic_reward = (prize_income − entry_fee) / entry_fee
```
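Worked through in code, the formula above looks like the following sketch (the function name `combined_reward` is illustrative, not part of the ChessEcon API; the 0.4/0.6 weights and the `economic_reward` definition come from this README):

```python
def combined_reward(game_reward: float, prize_income: float, entry_fee: float) -> float:
    """Blend game outcome with economic P&L, per the formula above (illustrative name)."""
    economic_reward = (prize_income - entry_fee) / entry_fee
    return 0.4 * game_reward + 0.6 * economic_reward


# Example: a win (+1.0) that collects the full 18-unit prize on a 10-unit fee
print(combined_reward(1.0, 18.0, 10.0))  # 0.4*1.0 + 0.6*0.8 = 0.88
```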
## Material Adjudication

Games that reach the move limit are adjudicated by material count (Q=9, R=5, B=3, N=3, P=1). The side with superior material wins — ensuring every game produces a decisive +1 / -1 signal for GRPO training.
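The adjudication rule can be sketched directly from a FEN string. This standalone version (no python-chess dependency; `adjudicate_material` is an illustrative name, not the server's actual function) counts material with the weights above and returns 0.0 on exactly level material:

```python
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}  # weights from this README


def adjudicate_material(fen: str) -> float:
    """+1.0 if White leads on material, -1.0 if Black does, 0.0 if level."""
    board_part = fen.split()[0]  # piece placement field of the FEN
    diff = 0
    for ch in board_part:
        value = PIECE_VALUES.get(ch.lower(), 0)  # kings and digits score 0
        if value:
            diff += value if ch.isupper() else -value  # uppercase = White
    return 1.0 if diff > 0 else -1.0 if diff < 0 else 0.0


# White up a queen (Black's queen missing) → adjudicated win for White
print(adjudicate_material("rnb1kbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"))  # 1.0
```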
## Economy Model

Both agents pay into a shared prize pool each game, creating zero-sum economic incentives aligned with game outcome.

| Parameter | Value |
|---|---|
| Starting wallet | 100 units |
| Entry fee | 10 units per agent per game |
| Prize pool | 18 units (90% of 2 × entry fee) |
| Win payout | +18 units → net +8 |
| Draw payout | +9 units each → net −1 |
| Loss payout | +0 units → net −10 |
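The wallet arithmetic in the table reduces to a small helper. This is a sketch, not the server's implementation; `settle` and its signature are illustrative, with defaults taken from the table:

```python
def settle(result: str, entry_fee: float = 10.0, pool_fraction: float = 0.9):
    """Return (net_white, net_black) for result in {"white", "black", "draw"}."""
    prize_pool = 2 * entry_fee * pool_fraction  # 18 units with the defaults above
    if result == "white":
        return prize_pool - entry_fee, -entry_fee  # winner nets +8, loser -10
    if result == "black":
        return -entry_fee, prize_pool - entry_fee
    half = prize_pool / 2  # draw: pool split evenly, each side nets -1
    return half - entry_fee, half - entry_fee


print(settle("white"))  # (8.0, -10.0)
```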
## GRPO Training

The White agent (Qwen2.5-0.5B) trains live using Group Relative Policy Optimisation.

Per-game update:

1. White generates moves: sample log π_θ(a | s) at each position
2. Reference log-probs log π_ref(a | s) computed from frozen snapshot
3. Terminal reward R ∈ {+1, 0, −1} from material adjudication
4. Advantage: A = (R − mean_R) / (std_R + ε)
5. Clipped surrogate: L = −min(ratio·A, clip(ratio, 0.8, 1.2)·A)
6. KL penalty: KL(π_θ ∥ π_ref), diff clamped to [−10, 10]
7. Total: L_total = L + β·KL, β = 0.04
8. AdamW update, grad-norm clip max_norm=1.0
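Steps 3–7 can be sketched numerically in plain Python (no autograd). `grpo_loss` and its inputs are illustrative: per-move log-probs under the current and reference policies, with the clip range and β quoted above; the KL term uses the simple log-ratio estimator:

```python
import math


def grpo_loss(rewards, logp_new, logp_ref, beta=0.04, clip_low=0.8, clip_high=1.2):
    """Average GRPO loss over a group of sampled moves (steps 3-7 above)."""
    mean_r = sum(rewards) / len(rewards)
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in rewards) / len(rewards))
    eps = 1e-8
    total = 0.0
    for r, lp_new, lp_ref in zip(rewards, logp_new, logp_ref):
        adv = (r - mean_r) / (std_r + eps)               # step 4: normalised advantage
        ratio = math.exp(lp_new - lp_ref)                # policy / reference ratio
        clipped = min(max(ratio, clip_low), clip_high)   # clip(ratio, 0.8, 1.2)
        surrogate = -min(ratio * adv, clipped * adv)     # step 5: clipped surrogate
        kl = max(-10.0, min(10.0, lp_new - lp_ref))      # step 6: log-ratio, clamped
        total += surrogate + beta * kl                   # step 7: L + β·KL
    return total / len(rewards)
```

With identical policies (`logp_new == logp_ref`) the ratio is 1 and the KL term vanishes, so the loss reduces to the negated mean advantage — a useful sanity check before wiring in real log-probs.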
| Hyperparameter | Value |
|---|---|
| LoRA rank | 8 |
| LoRA target modules | q_proj, v_proj |
| Learning rate | 1e-5 |
| KL coefficient β | 0.04 |
| Update frequency | Every 1 game |
| Checkpoint frequency | Every 100 steps |
| Optimizer | AdamW |
| Gradient clip | max_norm=1.0 |
Architecture
┌──────────────────────────────────────────────────────────────┐
│ External RL Trainers │
│ TRL · verl · SkyRL · custom OpenEnv clients │
└──────────────────────┬───────────────────────────────────────┘
│ HTTP POST /env/reset /env/step
│ GET /env/state /env/env_info
▼
┌──────────────────────────────────────────────────────────────┐
│ FastAPI WebSocket Server │
│ ┌──────────────────────┐ ┌───────────────────────────┐ │
│ │ OpenEnv 0.1 Router │ │ WebSocket /ws │ │
│ │ asyncio.Lock │ │ broadcast() → dashboard │ │
│ └──────────┬───────────┘ └───────────────────────────┘ │
│ │ │
│ ┌──────────▼───────────┐ ┌───────────────────────────┐ │
│ │ Chess Engine │ │ Economy Engine │ │
│ │ python-chess │ │ Wallets · Entry fees │ │
│ │ FEN · UCI · SAN │ │ Prize pool · P&L │ │
│ └──────────┬───────────┘ └───────────────────────────┘ │
│ │ │
│ ┌──────────▼───────────┐ ┌───────────────────────────┐ │
│ │ ♔ White Agent │ │ ♚ Black Agent (fixed) │ │
│ │ Qwen2.5-0.5B │ │ Llama-3.2-1B │ │
│ │ LoRA r=8 │ │ Frozen weights │ │
│ └──────────┬───────────┘ └───────────────────────────┘ │
│ │ │
│ ┌──────────▼───────────┐ │
│ │ GRPO Trainer │──▶ /checkpoints/step_N │
│ │ PPO-clip + KL │ │
│ │ AdamW LR=1e-5 │ │
│ └──────────────────────┘ │
└──────────────────────┬───────────────────────────────────────┘
│ WebSocket broadcast()
▼
┌──────────────────────────────────────────────────────────────┐
│ React Dashboard (nginx) │
│ Live Board · Wallet History · GRPO Metrics · P&L Chart │
│ Architecture View · Live Event Feed │
└──────────────────────────────────────────────────────────────┘
## WebSocket Event Stream

Connect to `wss://chessecon.adaboost.io/ws` for real-time events:

```python
import asyncio
import json

import websockets


async def watch():
    async with websockets.connect("wss://chessecon.adaboost.io/ws") as ws:
        async for raw in ws:
            msg = json.loads(raw)
            match msg["type"]:  # structural pattern matching requires Python 3.10+
                case "move":
                    print(f"{msg['data']['player']} plays {msg['data']['move']}")
                case "game_end":
                    d = msg["data"]
                    print(f"Game over: {d['result']} | reward={d['reward']}")
                case "training_step":
                    d = msg["data"]
                    print(f"GRPO step {d['step']} | loss={d['loss']:.4f} kl={d['kl_div']:.4f}")
                case "status":
                    print(f"Snapshot: game #{msg['data']['game_id']}")


asyncio.run(watch())
```
### Event Types

| Type | Key Fields |
|---|---|
| `status` | `game_id`, `wallet_white`, `wallet_black`, `grpo_step` |
| `game_start` | `game_id`, `wallet_white`, `wallet_black`, `prize_pool` |
| `move` | `player`, `move`, `uci`, `fen`, `move_number` |
| `game_end` | `result`, `reward`, `wallet_white`, `wallet_black`, `net_pnl_white` |
| `training_step` | `step`, `loss`, `reward`, `kl_div`, `win_rate` |
## Models

ChessEcon uses two publicly available Hugging Face models:

| Agent | Model Card | Size | Local Path |
|---|---|---|---|
| ♔ White (trainable) | Qwen/Qwen2.5-0.5B-Instruct | 943 MB | `training/models/Qwen_Qwen2.5-0.5B-Instruct/` |
| ♚ Black (fixed) | meta-llama/Llama-3.2-1B-Instruct | 2.4 GB | `training/models/meta-llama_Llama-3.2-1B-Instruct/` |

> **Note:** Llama-3.2-1B-Instruct requires a Hugging Face account with Meta's license accepted at meta-llama/Llama-3.2-1B-Instruct. Generate a token at huggingface.co/settings/tokens.
### Download Commands

**Option A — Python (recommended):**

```python
from huggingface_hub import snapshot_download

# White agent — Qwen2.5-0.5B-Instruct (no token required)
snapshot_download(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct",
    local_dir="training/models/Qwen_Qwen2.5-0.5B-Instruct",
    local_dir_use_symlinks=False,
)

# Black agent — Llama-3.2-1B-Instruct (requires HF token + Meta license)
snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B-Instruct",
    local_dir="training/models/meta-llama_Llama-3.2-1B-Instruct",
    local_dir_use_symlinks=False,
    token="hf_YOUR_TOKEN_HERE",
)
```
**Option B — huggingface-cli:**

```bash
# Install CLI if needed
pip install huggingface_hub

# White agent (no token)
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct \
  --local-dir training/models/Qwen_Qwen2.5-0.5B-Instruct

# Black agent (token required)
huggingface-cli login  # paste your HF token when prompted
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct \
  --local-dir training/models/meta-llama_Llama-3.2-1B-Instruct
```
**Option C — git lfs:**

```bash
git lfs install

# White agent
git clone https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct \
  training/models/Qwen_Qwen2.5-0.5B-Instruct

# Black agent (must be logged in: huggingface-cli login)
git clone https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct \
  training/models/meta-llama_Llama-3.2-1B-Instruct
```
### Verify Downloads

```bash
# Expected files after download:
ls training/models/Qwen_Qwen2.5-0.5B-Instruct/
# config.json  generation_config.json  model.safetensors  tokenizer*.json ...

ls training/models/meta-llama_Llama-3.2-1B-Instruct/
# config.json  generation_config.json  model.safetensors  tokenizer*.json ...

# Check sizes
du -sh training/models/Qwen_Qwen2.5-0.5B-Instruct/model.safetensors
# → 943M
du -sh training/models/meta-llama_Llama-3.2-1B-Instruct/model.safetensors
# → 2.4G
```
## Running Locally

```bash
git clone https://huggingface.co/spaces/adaboost-ai/chessecon
cd chessecon

# 1. Download models (see Models section above)

# 2. Start backend + dashboard
docker-compose up -d

# API:       http://localhost:8008
# Dashboard: http://localhost:3006
# Docs:      http://localhost:8008/docs
```
### Key Environment Variables

| Variable | Default | Description |
|---|---|---|
| `WHITE_MODEL` | `/models/Qwen_...` | Path to White model |
| `BLACK_MODEL` | `/models/meta-llama_...` | Path to Black model |
| `DEVICE` | `cuda` | `cuda` or `cpu` |
| `MAX_MOVES` | `15` | Moves before material adjudication |
| `MOVE_DELAY` | `0.05` | Seconds between moves |
| `ENTRY_FEE` | `10` | Units per agent per game |
| `PRIZE_POOL_FRACTION` | `0.9` | Fraction of 2 × entry fee returned as prize |
| `GRPO_LR` | `1e-5` | AdamW learning rate |
| `GRPO_KL_COEFF` | `0.04` | KL divergence penalty β |
| `LORA_RANK` | `8` | LoRA adapter rank |
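These variables might be read into a config at startup roughly as follows. This is a sketch, not the server's actual code: `load_config` is an illustrative name, and only the variables with complete defaults in the table are shown (the model-path defaults are truncated above, so they are omitted here):

```python
import os


def load_config() -> dict:
    """Read ChessEcon settings from the environment, with table defaults (sketch)."""
    return {
        "device": os.getenv("DEVICE", "cuda"),
        "max_moves": int(os.getenv("MAX_MOVES", "15")),
        "move_delay": float(os.getenv("MOVE_DELAY", "0.05")),
        "entry_fee": float(os.getenv("ENTRY_FEE", "10")),
        "prize_pool_fraction": float(os.getenv("PRIZE_POOL_FRACTION", "0.9")),
        "grpo_lr": float(os.getenv("GRPO_LR", "1e-5")),
        "grpo_kl_coeff": float(os.getenv("GRPO_KL_COEFF", "0.04")),
        "lora_rank": int(os.getenv("LORA_RANK", "8")),
    }
```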
## Hardware Requirements

| Config | Minimum |
|---|---|
| CPU-only | 8 GB RAM · `DEVICE=cpu` |
| GPU (recommended) | 8 GB VRAM · CUDA 11.8+ |
| Dev server | 4× NVIDIA RTX 3070 (lambda-quad) |
## Citation

```bibtex
@software{chessecon2026,
  title  = {ChessEcon: Multi-Agent Chess Economy with Live GRPO Training},
  author = {AdaBoost AI},
  year   = {2026},
  url    = {https://huggingface.co/spaces/adaboost-ai/chessecon},
  note   = {OpenEnv 0.1 · TextArena + Meta OpenEnv · Hackathon 2026}
}
```
## Links
- Live Dashboard: chessecon-ui.adaboost.io
- API + Swagger: chessecon.adaboost.io/docs
- AdaBoost AI: adaboost.io
- OpenEnv Spec: github.com/huggingface/openenv
- GRPO Paper: DeepSeekMath (arXiv 2402.03300)