---
title: ChessEcon
emoji: ♟️
colorFrom: indigo
colorTo: yellow
sdk: docker
app_port: 8000
tags:
  - openenv
  - reinforcement-learning
  - chess
  - multi-agent
  - grpo
  - rl-environment
  - economy
  - two-player
  - game
  - textarena
  - llm-training
license: apache-2.0
---

# ♟️ ChessEcon

**Multi-Agent Chess Economy · OpenEnv 0.1 · GRPO Live Training**


- **Live API:** https://chessecon.adaboost.io
- **Dashboard:** https://chessecon-ui.adaboost.io
- **Swagger:** https://chessecon.adaboost.io/docs
- **env_info:** https://chessecon.adaboost.io/env/env_info


## Overview

ChessEcon is a two-player LLM chess environment where agents compete for economic stakes, fully compliant with the OpenEnv 0.1 specification.

Two language models play chess head-to-head. Each game costs an entry fee. The winner earns a prize pool. The White agent trains live using GRPO (Group Relative Policy Optimisation) — every game updates the policy weights in real-time. A Bloomberg-style dashboard streams all activity via WebSocket.

| Agent | Model | Role |
|---|---|---|
| ♔ White | Qwen/Qwen2.5-0.5B-Instruct | Trainable — GRPO updates every game |
| ♚ Black | meta-llama/Llama-3.2-1B-Instruct | Fixed opponent — frozen weights |

## OpenEnv 0.1 API

All endpoints are compatible with TRL, verl, SkyRL, and any OpenEnv 0.1 trainer.

| Endpoint | Method | Description |
|---|---|---|
| `/env/reset` | POST | Start new episode · deduct entry fees · return initial observation |
| `/env/step` | POST | Apply one move (UCI or SAN) · return reward + next observation |
| `/env/state` | GET | Read current board state — non-destructive |
| `/env/env_info` | GET | Environment metadata for HF Hub discoverability |
| `/ws` | WebSocket | Real-time event stream (moves, rewards, GRPO metrics) |
| `/health` | GET | Health check + model load status |
| `/docs` | GET | Interactive Swagger UI |

## Quick Start

```python
import httpx

BASE = "https://chessecon.adaboost.io"

# 1. Start a new episode
reset = httpx.post(f"{BASE}/env/reset").json()
print(reset["observation"]["fen"])              # starting position
print(reset["observation"]["legal_moves_uci"])  # all legal moves in UCI

# 2. Play a move (UCI or SAN accepted)
step = httpx.post(f"{BASE}/env/step", json={"action": "e2e4"}).json()
print(step["observation"]["fen"])   # updated board
print(step["reward"])               # per-step reward signal
print(step["terminated"])           # True when game ends
print(step["truncated"])            # True if move limit reached

# 3. Inspect current state (read-only)
state = httpx.get(f"{BASE}/env/state").json()
print(state["step_count"])          # moves played so far
print(state["status"])              # "active" | "terminated" | "idle"

# 4. Environment metadata
info = httpx.get(f"{BASE}/env/env_info").json()
print(info["openenv_version"])      # "0.1"
print(info["agents"])               # model IDs for white/black
```

## Drop-in Client (TRL / verl / SkyRL)

```python
import httpx

class ChessEconEnv:
    """
    OpenEnv 0.1 client for ChessEcon.
    Compatible with TRL, verl, SkyRL, and any gym-style RL trainer.
    """

    def __init__(self, base_url: str = "https://chessecon.adaboost.io"):
        self.base = base_url.rstrip("/")
        self.http = httpx.Client(timeout=30)

    def reset(self, seed: int | None = None) -> tuple[dict, dict]:
        payload = {"seed": seed} if seed is not None else {}
        r = self.http.post(f"{self.base}/env/reset", json=payload)
        r.raise_for_status()
        d = r.json()
        return d["observation"], d["info"]

    def step(self, action: str) -> tuple[dict, float, bool, bool, dict]:
        """
        Args:
            action: Move in UCI (e.g. "e2e4") or SAN (e.g. "e4")
        Returns:
            (observation, reward, terminated, truncated, info)
        """
        r = self.http.post(f"{self.base}/env/step", json={"action": action})
        r.raise_for_status()
        d = r.json()
        return (d["observation"], d["reward"], d["terminated"], d["truncated"], d["info"])

    def state(self) -> dict:
        return self.http.get(f"{self.base}/env/state").json()

    def env_info(self) -> dict:
        return self.http.get(f"{self.base}/env/env_info").json()

    def close(self):
        self.http.close()


# Example: random rollout
import random

env = ChessEconEnv()
obs, info = env.reset()
total_reward = 0.0

while True:
    action = random.choice(obs["legal_moves_uci"])  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        print(f"Game over | result={info.get('result')} | total_reward={total_reward:.3f}")
        break

env.close()
```

## Observation Schema

Every response from `/env/reset`, `/env/step`, and `/env/state` contains a `ChessObservation`:

```json
{
  "observation": {
    "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
    "turn": "black",
    "move_number": 1,
    "last_move_uci": "e2e4",
    "last_move_san": "e4",
    "legal_moves_uci": ["e7e5", "d7d5", "g8f6", "..."],
    "is_check": false,
    "wallet_white": 90.0,
    "wallet_black": 90.0,
    "white_model": "Qwen/Qwen2.5-0.5B-Instruct",
    "black_model": "meta-llama/Llama-3.2-1B-Instruct",
    "info": {}
  }
}
```
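Because `fen`, `turn`, and `legal_moves_uci` are plain strings, a client can sanity-check an observation before feeding it to a policy. A minimal stdlib-only sketch (the `check_observation` helper is illustrative, not part of the API):

```python
import re

# Example ChessObservation fields (copied from the schema above)
obs = {
    "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
    "turn": "black",
    "legal_moves_uci": ["e7e5", "d7d5", "g8f6"],
}

UCI = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

def check_observation(obs: dict) -> bool:
    """Light structural validation of a ChessObservation payload (illustrative)."""
    fields = obs["fen"].split()
    if len(fields) != 6:   # placement, side, castling, en passant, halfmove, fullmove
        return False
    if fields[1] != {"white": "w", "black": "b"}[obs["turn"]]:
        return False       # side to move in the FEN must agree with `turn`
    return all(UCI.fullmatch(m) for m in obs["legal_moves_uci"])

print(check_observation(obs))  # True
```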

### `/env/step` Response

```json
{
  "observation": { "...": "ChessObservation — see above" },
  "reward": 0.01,
  "terminated": false,
  "truncated": false,
  "info": { "san": "e4", "uci": "e2e4", "move_number": 1 }
}
```

### `/env/state` Response

```json
{
  "observation": { "...": "ChessObservation — see above" },
  "episode_id": "ep-42",
  "step_count": 1,
  "status": "active",
  "info": {}
}
```

### `/env/env_info` Response

```json
{
  "openenv_version": "0.1",
  "environment_id": "chessecon-v1",
  "name": "ChessEcon",
  "description": "Multi-agent chess economy with live GRPO training",
  "action_space": "text",
  "observation_space": "text",
  "reward_range": [-1.0, 1.0],
  "max_steps": 40,
  "agents": {
    "white": "Qwen/Qwen2.5-0.5B-Instruct",
    "black": "meta-llama/Llama-3.2-1B-Instruct"
  },
  "tags": ["chess", "multi-agent", "economy", "grpo", "openenv"]
}
```

## Reward Structure

Per-step rewards are issued after every move. Terminal rewards are issued at game end.

| Event | Reward | Type |
|---|---|---|
| Legal move played | +0.01 | Per-step |
| Move delivers check | +0.05 | Per-step bonus |
| Capture | +0.10 | Per-step bonus |
| Win (checkmate / material adj.) | +1.00 | Terminal |
| Loss | −1.00 | Terminal |
| Draw | 0.00 | Terminal |
| Illegal move attempted | −0.10 | Per-step penalty |

Combined reward formula:

```text
R = 0.4 × game_reward + 0.6 × economic_reward
economic_reward = (prize_income − entry_fee) / entry_fee
```
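Plugging the default economy numbers (10-unit entry fee, 18-unit winner payout, 9 units each on a draw) into the formula above gives the combined reward per outcome. A small sketch, assuming a terminal `game_reward` of +1 / 0 / −1:

```python
def combined_reward(game_reward: float, prize_income: float,
                    entry_fee: float = 10.0) -> float:
    """R = 0.4 * game_reward + 0.6 * economic_reward (weights from the formula above)."""
    economic_reward = (prize_income - entry_fee) / entry_fee
    return 0.4 * game_reward + 0.6 * economic_reward

# Defaults: 10-unit entry fee, 18-unit prize pool (90% of 2 x 10)
print(round(combined_reward(+1.0, 18.0), 4))  # 0.88   (win : 0.4*1 + 0.6*0.8)
print(round(combined_reward(0.0, 9.0), 4))    # -0.06  (draw: 0.6 * -0.1)
print(round(combined_reward(-1.0, 0.0), 4))   # -1.0   (loss: -0.4 - 0.6)
```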

### Material Adjudication

Games reaching the move limit are adjudicated by material count (Q=9, R=5, B=3, N=3, P=1). The side with superior material wins, so nearly every game produces a decisive +1 / −1 signal for GRPO training (level material is scored as a draw).
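The adjudication rule can be sketched directly from the FEN piece-placement field with the values above (stdlib-only; `adjudicate` is an illustrative helper, not the server's implementation):

```python
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1}  # kings excluded

def adjudicate(fen: str) -> int:
    """Return +1 if White leads on material, -1 if Black does, 0 if level."""
    placement = fen.split()[0]  # first FEN field: piece placement
    white = sum(PIECE_VALUES.get(c.lower(), 0) for c in placement if c.isupper())
    black = sum(PIECE_VALUES.get(c, 0) for c in placement if c.islower())
    return (white > black) - (white < black)

print(adjudicate("4k3/8/8/8/8/8/8/Q3K3 w - - 0 40"))  # 1   (White up a queen)
print(adjudicate("r3k3/8/8/8/8/8/8/4K3 b - - 0 40"))  # -1  (Black up a rook)
print(adjudicate("4k3/8/8/8/8/8/8/4K3 w - - 0 40"))   # 0   (bare kings: draw)
```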


## Economy Model

Both agents pay into a shared prize pool each game, creating zero-sum economic incentives aligned with game outcome.

| Parameter | Value |
|---|---|
| Starting wallet | 100 units |
| Entry fee | 10 units per agent per game |
| Prize pool | 18 units (90% of 2 × entry fee) |
| Win payout | +18 units → net +8 |
| Draw payout | +9 units each → net −1 |
| Loss payout | +0 units → net −10 |
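The cash flows in the table amount to a small wallet-settlement step each game. A sketch using the default entry fee and prize-pool fraction (the draw case splits the pool, matching the +9-each payout above):

```python
ENTRY_FEE = 10.0
PRIZE_POOL_FRACTION = 0.9  # the remaining 10% of the pot is not paid out

def settle(wallet_white: float, wallet_black: float, result: str):
    """Apply one game's entry fees and payout. result: 'white' | 'black' | 'draw'."""
    pool = PRIZE_POOL_FRACTION * 2 * ENTRY_FEE  # 18 units
    wallet_white -= ENTRY_FEE
    wallet_black -= ENTRY_FEE
    if result == "white":
        wallet_white += pool
    elif result == "black":
        wallet_black += pool
    else:                                       # draw: split the pool evenly
        wallet_white += pool / 2
        wallet_black += pool / 2
    return wallet_white, wallet_black

print(settle(100.0, 100.0, "white"))  # (108.0, 90.0): winner net +8, loser net -10
print(settle(100.0, 100.0, "draw"))   # (99.0, 99.0):  net -1 each
```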

## GRPO Training

The White agent (Qwen2.5-0.5B) trains live using Group Relative Policy Optimisation:

```text
Per-game update:
  1. White generates moves: sample log π_θ(a | s) at each position
  2. Reference log-probs log π_ref(a | s) computed from frozen snapshot
  3. Terminal reward R ∈ {+1, 0, −1} from material adjudication
  4. Advantage: A = (R − mean_R) / (std_R + ε)
  5. Clipped surrogate: L = −min(ratio·A, clip(ratio, 0.8, 1.2)·A)
  6. KL penalty: KL(π_θ ∥ π_ref), diff clamped to [−10, 10]
  7. Total: L_total = L + β·KL,  β = 0.04
  8. AdamW update, grad-norm clip max_norm=1.0
```

| Hyperparameter | Value |
|---|---|
| LoRA rank | 8 |
| LoRA target modules | q_proj, v_proj |
| Learning rate | 1e-5 |
| KL coefficient β | 0.04 |
| Update frequency | Every game |
| Checkpoint frequency | Every 100 steps |
| Optimizer | AdamW |
| Gradient clip | max_norm=1.0 |
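Collapsed to scalars, steps 4–7 of the per-game update can be sketched in plain Python. The per-sample KL estimator `exp(d) − d − 1` is an assumption borrowed from the GRPO paper, and the real trainer works on per-token log-probs rather than a single scalar per game:

```python
import math
from statistics import mean, pstdev

def grpo_loss(logp_new, logp_ref, rewards, i,
              beta=0.04, clip_lo=0.8, clip_hi=1.2, eps=1e-8):
    """Scalar sketch of steps 4-7 for sample i of a reward group (illustrative)."""
    # Step 4: group-relative advantage
    adv = (rewards[i] - mean(rewards)) / (pstdev(rewards) + eps)
    # Step 5: probability ratio and clipped surrogate
    ratio = math.exp(logp_new - logp_ref)
    clipped = min(max(ratio, clip_lo), clip_hi)
    surrogate = -min(ratio * adv, clipped * adv)
    # Step 6: non-negative per-sample estimate of KL(π_θ ‖ π_ref),
    # with the log-prob difference clamped to [-10, 10]
    d = min(max(logp_ref - logp_new, -10.0), 10.0)
    kl = math.exp(d) - d - 1.0
    # Step 7: total loss with β = 0.04
    return surrogate + beta * kl

rewards = [1.0, -1.0, 1.0, -1.0]  # terminal rewards from material adjudication
print(round(grpo_loss(-1.0, -1.2, rewards, i=0), 4))
```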

## Architecture

```text
┌──────────────────────────────────────────────────────────────┐
│               External RL Trainers                           │
│         TRL · verl · SkyRL · custom OpenEnv clients         │
└──────────────────────┬───────────────────────────────────────┘
                       │ HTTP  POST /env/reset  /env/step
                       │       GET  /env/state  /env/env_info
                       ▼
┌──────────────────────────────────────────────────────────────┐
│                  FastAPI WebSocket Server                    │
│  ┌──────────────────────┐   ┌───────────────────────────┐   │
│  │  OpenEnv 0.1 Router  │   │  WebSocket  /ws           │   │
│  │  asyncio.Lock        │   │  broadcast() → dashboard  │   │
│  └──────────┬───────────┘   └───────────────────────────┘   │
│             │                                               │
│  ┌──────────▼───────────┐   ┌───────────────────────────┐   │
│  │   Chess Engine        │   │   Economy Engine          │   │
│  │   python-chess        │   │   Wallets · Entry fees    │   │
│  │   FEN · UCI · SAN     │   │   Prize pool · P&L        │   │
│  └──────────┬───────────┘   └───────────────────────────┘   │
│             │                                               │
│  ┌──────────▼───────────┐   ┌───────────────────────────┐   │
│  │  ♔ White Agent        │   │  ♚ Black Agent (fixed)    │   │
│  │  Qwen2.5-0.5B         │   │  Llama-3.2-1B             │   │
│  │  LoRA r=8             │   │  Frozen weights           │   │
│  └──────────┬───────────┘   └───────────────────────────┘   │
│             │                                               │
│  ┌──────────▼───────────┐                                   │
│  │  GRPO Trainer         │──▶  /checkpoints/step_N         │
│  │  PPO-clip + KL        │                                   │
│  │  AdamW  LR=1e-5       │                                   │
│  └──────────────────────┘                                   │
└──────────────────────┬───────────────────────────────────────┘
                       │ WebSocket broadcast()
                       ▼
┌──────────────────────────────────────────────────────────────┐
│              React Dashboard (nginx)                         │
│  Live Board · Wallet History · GRPO Metrics · P&L Chart     │
│  Architecture View · Live Event Feed                        │
└──────────────────────────────────────────────────────────────┘
```

## WebSocket Event Stream

Connect to wss://chessecon.adaboost.io/ws for real-time events:

```python
import asyncio, json, websockets

async def watch():
    async with websockets.connect("wss://chessecon.adaboost.io/ws") as ws:
        async for raw in ws:
            msg = json.loads(raw)
            match msg["type"]:
                case "move":
                    print(f"{msg['data']['player']} plays {msg['data']['move']}")
                case "game_end":
                    d = msg["data"]
                    print(f"Game over: {d['result']} | reward={d['reward']}")
                case "training_step":
                    d = msg["data"]
                    print(f"GRPO step {d['step']} | loss={d['loss']:.4f} kl={d['kl_div']:.4f}")
                case "status":
                    print(f"Snapshot: game #{msg['data']['game_id']}")

asyncio.run(watch())
```

### Event Types

| Type | Key Fields |
|---|---|
| `status` | `game_id`, `wallet_white`, `wallet_black`, `grpo_step` |
| `game_start` | `game_id`, `wallet_white`, `wallet_black`, `prize_pool` |
| `move` | `player`, `move`, `uci`, `fen`, `move_number` |
| `game_end` | `result`, `reward`, `wallet_white`, `wallet_black`, `net_pnl_white` |
| `training_step` | `step`, `loss`, `reward`, `kl_div`, `win_rate` |

## Models

ChessEcon uses two publicly available HuggingFace models:

| Agent | Model Card | Size | Local Path |
|---|---|---|---|
| ♔ White (trainable) | Qwen/Qwen2.5-0.5B-Instruct | 943 MB | `training/models/Qwen_Qwen2.5-0.5B-Instruct/` |
| ♚ Black (fixed) | meta-llama/Llama-3.2-1B-Instruct | 2.4 GB | `training/models/meta-llama_Llama-3.2-1B-Instruct/` |

> **Note:** Llama-3.2-1B-Instruct requires a HuggingFace account with Meta's license accepted at meta-llama/Llama-3.2-1B-Instruct. Generate a token at huggingface.co/settings/tokens.

### Download Commands

**Option A — Python (recommended):**

```python
from huggingface_hub import snapshot_download

# White agent — Qwen2.5-0.5B-Instruct (no token required)
snapshot_download(
    repo_id="Qwen/Qwen2.5-0.5B-Instruct",
    local_dir="training/models/Qwen_Qwen2.5-0.5B-Instruct",
    local_dir_use_symlinks=False,
)

# Black agent — Llama-3.2-1B-Instruct (requires HF token + Meta license)
snapshot_download(
    repo_id="meta-llama/Llama-3.2-1B-Instruct",
    local_dir="training/models/meta-llama_Llama-3.2-1B-Instruct",
    local_dir_use_symlinks=False,
    token="hf_YOUR_TOKEN_HERE",
)
```

**Option B — huggingface-cli:**

```bash
# Install CLI if needed
pip install huggingface_hub

# White agent (no token)
huggingface-cli download Qwen/Qwen2.5-0.5B-Instruct \
  --local-dir training/models/Qwen_Qwen2.5-0.5B-Instruct

# Black agent (token required)
huggingface-cli login   # paste your HF token when prompted
huggingface-cli download meta-llama/Llama-3.2-1B-Instruct \
  --local-dir training/models/meta-llama_Llama-3.2-1B-Instruct
```

**Option C — git lfs:**

```bash
git lfs install

# White agent
git clone https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct \
  training/models/Qwen_Qwen2.5-0.5B-Instruct

# Black agent (must be logged in: huggingface-cli login)
git clone https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct \
  training/models/meta-llama_Llama-3.2-1B-Instruct
```

### Verify Downloads

```bash
# Expected files after download:
ls training/models/Qwen_Qwen2.5-0.5B-Instruct/
# config.json  generation_config.json  model.safetensors  tokenizer*.json  ...

ls training/models/meta-llama_Llama-3.2-1B-Instruct/
# config.json  generation_config.json  model.safetensors  tokenizer*.json  ...

# Check sizes
du -sh training/models/Qwen_Qwen2.5-0.5B-Instruct/model.safetensors
# → 943M

du -sh training/models/meta-llama_Llama-3.2-1B-Instruct/model.safetensors
# → 2.4G
```

## Running Locally

```bash
git clone https://huggingface.co/spaces/adaboost-ai/chessecon
cd chessecon

# 1. Download models (see Models section above)

# 2. Start backend + dashboard
docker-compose up -d

# API:       http://localhost:8008
# Dashboard: http://localhost:3006
# Docs:      http://localhost:8008/docs
```

## Key Environment Variables

| Variable | Default | Description |
|---|---|---|
| `WHITE_MODEL` | `/models/Qwen_...` | Path to White model |
| `BLACK_MODEL` | `/models/meta-llama_...` | Path to Black model |
| `DEVICE` | `cuda` | `cuda` or `cpu` |
| `MAX_MOVES` | `15` | Moves before material adjudication |
| `MOVE_DELAY` | `0.05` | Seconds between moves |
| `ENTRY_FEE` | `10` | Units per agent per game |
| `PRIZE_POOL_FRACTION` | `0.9` | Fraction of 2 × entry fee returned as prize |
| `GRPO_LR` | `1e-5` | AdamW learning rate |
| `GRPO_KL_COEFF` | `0.04` | KL divergence penalty β |
| `LORA_RANK` | `8` | LoRA adapter rank |
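docker-compose reads a `.env` file placed next to `docker-compose.yml`, so (assuming the compose file forwards these variables to the container) a CPU-only run might be configured as:

```shell
# .env — picked up automatically by `docker-compose up`
DEVICE=cpu                 # run both models on CPU
MAX_MOVES=40               # allow longer games before material adjudication
ENTRY_FEE=10
PRIZE_POOL_FRACTION=0.9
```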

## Hardware Requirements

| Config | Minimum |
|---|---|
| CPU-only | 8 GB RAM · `DEVICE=cpu` |
| GPU (recommended) | 8 GB VRAM · CUDA 11.8+ |
| Dev server | 4× NVIDIA RTX 3070 (lambda-quad) |

## Citation

```bibtex
@software{chessecon2026,
  title   = {ChessEcon: Multi-Agent Chess Economy with Live GRPO Training},
  author  = {AdaBoost AI},
  year    = {2026},
  url     = {https://huggingface.co/spaces/adaboost-ai/chessecon},
  note    = {OpenEnv 0.1 · TextArena + Meta OpenEnv · Hackathon 2026}
}
```

Built by AdaBoost AI · TextArena + Meta OpenEnv + GRPO · Hackathon 2026