Commit a93cec9 by suvasis · 1 parent: e4d7d50

added huggingfacehub README

Files changed (1):
  1. README.md +308 −92
README.md CHANGED
---
title: ChessEcon
emoji: ♟️
colorFrom: indigo
colorTo: yellow
sdk: docker
app_port: 8000
tags:
  - economy
  - two-player
  - game
  - textarena
  - llm-training
license: apache-2.0
---

<div align="center">

# ♟️ ChessEcon

### Multi-Agent Chess Economy · OpenEnv 0.1 · GRPO Live Training

[![OpenEnv](https://img.shields.io/badge/OpenEnv-0.1-blueviolet?style=flat-square)](https://github.com/huggingface/openenv)
[![TextArena](https://img.shields.io/badge/TextArena-compatible-orange?style=flat-square)](https://github.com/textarena)
[![License](https://img.shields.io/badge/license-Apache--2.0-green?style=flat-square)](LICENSE)
[![Hackathon](https://img.shields.io/badge/Hackathon-2026-gold?style=flat-square)](https://adaboost.io)

**Live API:** `https://chessecon.adaboost.io`
**Dashboard:** `https://chessecon-ui.adaboost.io`
**Swagger:** `https://chessecon.adaboost.io/docs`
**env_info:** `https://chessecon.adaboost.io/env/env_info`

</div>

---

## Overview

ChessEcon is a **two-player LLM chess environment** where agents compete for economic stakes, fully compliant with the [OpenEnv 0.1](https://github.com/huggingface/openenv) specification.

Two language models play chess head-to-head. Each game costs an entry fee. The winner earns a prize pool. The White agent trains **live** using **GRPO** (Group Relative Policy Optimisation) — every game updates the policy weights in real time. A Bloomberg-style dashboard streams all activity via WebSocket.

| Agent | Model | Role |
|---|---|---|
| ♔ White | `Qwen/Qwen2.5-0.5B-Instruct` | **Trainable** — GRPO updates every game |
| ♚ Black | `meta-llama/Llama-3.2-1B-Instruct` | **Fixed opponent** — frozen weights |

---

## OpenEnv 0.1 API

All endpoints are compatible with TRL, verl, SkyRL, and any OpenEnv 0.1 trainer.

| Endpoint | Method | Description |
|---|---|---|
| `/env/reset` | `POST` | Start new episode · deduct entry fees · return initial observation |
| `/env/step` | `POST` | Apply one move (UCI or SAN) · return reward + next observation |
| `/env/state` | `GET` | Read current board state — non-destructive |
| `/env/env_info` | `GET` | Environment metadata for HF Hub discoverability |
| `/ws` | `WebSocket` | Real-time event stream (moves, rewards, GRPO metrics) |
| `/health` | `GET` | Health check + model load status |
| `/docs` | `GET` | Interactive Swagger UI |

### Quickstart

```python
import httpx

BASE = "https://chessecon.adaboost.io"

# 1. Start a new episode
reset = httpx.post(f"{BASE}/env/reset").json()
print(reset["observation"]["fen"])              # starting position
print(reset["observation"]["legal_moves_uci"])  # all legal moves in UCI

# 2. Play a move (UCI or SAN accepted)
step = httpx.post(f"{BASE}/env/step", json={"action": "e2e4"}).json()
print(step["observation"]["fen"])  # updated board
print(step["reward"])              # per-step reward signal
print(step["terminated"])          # True when game ends
print(step["truncated"])           # True if move limit reached

# 3. Inspect current state (read-only)
state = httpx.get(f"{BASE}/env/state").json()
print(state["step_count"])  # moves played so far
print(state["status"])      # "active" | "terminated" | "idle"

# 4. Environment metadata
info = httpx.get(f"{BASE}/env/env_info").json()
print(info["openenv_version"])  # "0.1"
print(info["agents"])           # model IDs for white/black
```

---

## Drop-in Client (TRL / verl / SkyRL)

```python
import random

import httpx

class ChessEconEnv:
    """
    OpenEnv 0.1 client for ChessEcon.
    Compatible with TRL, verl, SkyRL, and any gym-style RL trainer.
    """

    def __init__(self, base_url: str = "https://chessecon.adaboost.io"):
        self.base = base_url.rstrip("/")
        self.http = httpx.Client(timeout=30)

    def reset(self, seed: int | None = None) -> tuple[dict, dict]:
        payload = {"seed": seed} if seed is not None else {}
        r = self.http.post(f"{self.base}/env/reset", json=payload)
        r.raise_for_status()
        d = r.json()
        return d["observation"], d["info"]

    def step(self, action: str) -> tuple[dict, float, bool, bool, dict]:
        """
        Args:
            action: Move in UCI (e.g. "e2e4") or SAN (e.g. "e4")
        Returns:
            (observation, reward, terminated, truncated, info)
        """
        r = self.http.post(f"{self.base}/env/step", json={"action": action})
        r.raise_for_status()
        d = r.json()
        return (d["observation"], d["reward"], d["terminated"], d["truncated"], d["info"])

    def state(self) -> dict:
        return self.http.get(f"{self.base}/env/state").json()

    def env_info(self) -> dict:
        return self.http.get(f"{self.base}/env/env_info").json()

    def close(self):
        self.http.close()


# Example: random rollout
env = ChessEconEnv()
obs, info = env.reset()
total_reward = 0.0

while True:
    action = random.choice(obs["legal_moves_uci"])  # replace with your policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    if terminated or truncated:
        print(f"Game over | result={info.get('result')} | total_reward={total_reward:.3f}")
        break

env.close()
```

---

## Observation Schema

Every response from `/env/reset`, `/env/step`, and `/env/state` contains a `ChessObservation`:

```json
{
  "move_number": 1,
  "last_move_uci": "e2e4",
  "last_move_san": "e4",
  "legal_moves_uci": ["e7e5", "d7d5", "g8f6", "..."],
  "is_check": false,
  "wallet_white": 90.0,
  "wallet_black": 90.0,
  "...": "..."
}
```

### `/env/step` Response

```json
{
  "observation": { "...": "ChessObservation — see above" },
  "reward": 0.01,
  "terminated": false,
  "truncated": false,
  "...": "..."
}
```

### `/env/state` Response

```json
{
  "observation": { "...": "ChessObservation — see above" },
  "episode_id": "ep-42",
  "step_count": 1,
  "status": "active",
  "...": "..."
}
```

### `/env/env_info` Response

```json
{
  "openenv_version": "0.1",
  "environment_id": "chessecon-v1",
  "name": "ChessEcon",
  "description": "Multi-agent chess economy with live GRPO training",
  "action_space": "text",
  "observation_space": "text",
  "reward_range": [-1.0, 1.0],
  "max_steps": 40,
  "agents": {
    "white": "Qwen/Qwen2.5-0.5B-Instruct",
    "black": "meta-llama/Llama-3.2-1B-Instruct"
  },
  "tags": ["chess", "multi-agent", "economy", "grpo", "openenv"]
}
```

---

## Reward Structure

Per-step rewards are issued after every move. Terminal rewards are issued at game end.

| Event | Reward | Type |
|---|---|---|
| Legal move played | `+0.01` | Per-step |
| Move delivers check | `+0.05` | Per-step bonus |
| Capture | `+0.10` | Per-step bonus |
| Win (checkmate / material adj.) | `+1.00` | Terminal |
| Loss | `-1.00` | Terminal |
| Draw | `0.00` | Terminal |
| Illegal move attempted | `-0.10` | Per-step penalty |

> **Combined reward formula:**
> `R = 0.4 × game_reward + 0.6 × economic_reward`
>
> `economic_reward = (prize_income − entry_fee) / entry_fee`
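The formula can be sanity-checked with a tiny helper (the function name is illustrative; the win and loss payouts below come from the economy table):

```python
def combined_reward(game_reward: float, prize_income: float, entry_fee: float = 10.0) -> float:
    """R = 0.4 x game_reward + 0.6 x economic_reward, per the table above."""
    economic_reward = (prize_income - entry_fee) / entry_fee
    return 0.4 * game_reward + 0.6 * economic_reward

# A win pays the 18-unit prize pool against a 10-unit entry fee:
print(round(combined_reward(game_reward=1.0, prize_income=18.0), 2))   # 0.88
# A loss pays nothing:
print(round(combined_reward(game_reward=-1.0, prize_income=0.0), 2))   # -1.0
```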

### Material Adjudication

Games reaching the move limit are adjudicated by material count (Q=9, R=5, B=3, N=3, P=1). The side with superior material wins — ensuring every game produces a decisive `+1` / `-1` signal for GRPO training.
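The adjudication rule can be sketched directly over a FEN string; a minimal pure-Python version (the environment's actual tie-breaking details may differ):

```python
PIECE_VALUES = {"q": 9, "r": 5, "b": 3, "n": 3, "p": 1, "k": 0}

def adjudicate(fen: str) -> int:
    """Return +1 if White leads on material, -1 if Black leads, 0 if level."""
    placement = fen.split()[0]  # first FEN field: piece placement
    white = black = 0
    for ch in placement:
        if ch.isalpha():  # skip rank separators '/' and empty-square digits
            value = PIECE_VALUES[ch.lower()]
            if ch.isupper():
                white += value  # uppercase letters are White pieces
            else:
                black += value  # lowercase letters are Black pieces
    return (white > black) - (white < black)

# White is up a queen and rook versus a lone queen:
print(adjudicate("3qk3/8/8/8/8/8/8/2QRK3 w - - 0 40"))  # 1
```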

---

## Economy Model

Both agents pay into a shared prize pool each game, creating zero-sum economic incentives aligned with game outcome.

| Parameter | Value |
|---|---|
| Starting wallet | 100 units |
| Entry fee | 10 units per agent per game |
| Prize pool | 18 units (90% of 2 × entry fee) |
| Win payout | +18 units → net **+8** |
| Draw payout | +9 units each → net **−1** |
| Loss payout | +0 units → net **−10** |

---

## GRPO Training

The White agent (`Qwen2.5-0.5B`) trains live using Group Relative Policy Optimisation:

```
Per-game update:
  1. White generates moves: sample log π_θ(a | s) at each position
  2. Reference log-probs log π_ref(a | s) computed from frozen snapshot
  3. Terminal reward R ∈ {+1, 0, −1} from material adjudication
  4. Advantage: A = (R − mean_R) / (std_R + ε)
  5. Clipped surrogate: L = −min(ratio·A, clip(ratio, 0.8, 1.2)·A)
  6. KL penalty: KL(π_θ ∥ π_ref), diff clamped to [−10, 10]
  7. Total: L_total = L + β·KL, β = 0.04
  8. AdamW update, grad-norm clip max_norm=1.0
```

| Hyperparameter | Value |
|---|---|
| LoRA rank | 8 |
| LoRA target modules | `q_proj`, `v_proj` |
| Learning rate | `1e-5` |
| KL coefficient β | `0.04` |
| Update frequency | Every 1 game |
| Checkpoint frequency | Every 100 steps |
| Optimizer | AdamW |
| Gradient clip | `max_norm=1.0` |
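Steps 4 and 5 of the update above can be sketched numerically in pure Python (illustrative only; the actual trainer operates on per-token log-probs in PyTorch):

```python
import math

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Step 4: normalise terminal rewards within a group of games."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std + eps) for r in rewards]

def clipped_surrogate(ratio: float, advantage: float,
                      low: float = 0.8, high: float = 1.2) -> float:
    """Step 5: PPO-style clipped surrogate loss for one action."""
    clipped = min(max(ratio, low), high)  # clip(ratio, 0.8, 1.2)
    return -min(ratio * advantage, clipped * advantage)

adv = group_advantages([1.0, -1.0, 1.0, -1.0])
print(adv[0] > 0 and adv[1] < 0)  # True: winners get positive advantage
print(clipped_surrogate(ratio=1.5, advantage=1.0))  # -1.2 (clip caps the gain)
```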

---

## Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                     External RL Trainers                     │
│          TRL · verl · SkyRL · custom OpenEnv clients         │
└──────────────────────────────┬───────────────────────────────┘
                               │ HTTP  POST /env/reset /env/step
                               │       GET  /env/state /env/env_info
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                  FastAPI + WebSocket Server                  │
│  ┌──────────────────────┐   ┌───────────────────────────┐    │
│  │ OpenEnv 0.1 Router   │   │ WebSocket /ws             │    │
│  │ asyncio.Lock         │   │ broadcast() → dashboard   │    │
│  └──────────┬───────────┘   └───────────────────────────┘    │
│             ▼                                                │
│  ┌──────────────────────┐   ┌───────────────────────────┐    │
│  │ Chess Engine         │   │ Economy Engine            │    │
│  │ python-chess         │   │ Wallets · Entry fees      │    │
│  │ FEN · UCI · SAN      │   │ Prize pool · P&L          │    │
│  └──────────┬───────────┘   └───────────────────────────┘    │
│             ▼                                                │
│  ┌──────────────────────┐   ┌───────────────────────────┐    │
│  │ ♔ White Agent        │   │ ♚ Black Agent (fixed)     │    │
│  │ Qwen2.5-0.5B         │   │ Llama-3.2-1B              │    │
│  │ LoRA r=8             │   │ Frozen weights            │    │
│  └──────────┬───────────┘   └───────────────────────────┘    │
│             ▼                                                │
│  ┌──────────────────────┐                                    │
│  │ GRPO Trainer         │──▶ /checkpoints/step_N             │
│  │ PPO-clip + KL        │                                    │
│  │ AdamW LR=1e-5        │                                    │
│  └──────────────────────┘                                    │
└──────────────────────────────┬───────────────────────────────┘
                               │ WebSocket broadcast()
                               ▼
┌──────────────────────────────────────────────────────────────┐
│                   React Dashboard (nginx)                    │
│  Live Board · Wallet History · GRPO Metrics · P&L Chart      │
│  Architecture View · Live Event Feed                         │
└──────────────────────────────────────────────────────────────┘
```

---

## WebSocket Event Stream

Connect to `wss://chessecon.adaboost.io/ws` for real-time events:

```python
import asyncio, json, websockets

async def watch():
    async with websockets.connect("wss://chessecon.adaboost.io/ws") as ws:
        async for raw in ws:
            msg = json.loads(raw)
            match msg["type"]:
                case "move":
                    print(f"{msg['data']['player']} plays {msg['data']['move']}")
                case "game_end":
                    d = msg["data"]
                    print(f"Game over: {d['result']} | reward={d['reward']}")
                case "training_step":
                    d = msg["data"]
                    print(f"GRPO step {d['step']} | loss={d['loss']:.4f} kl={d['kl_div']:.4f}")
                case "status":
                    print(f"Snapshot: game #{msg['data']['game_id']}")

asyncio.run(watch())
```

### Event Types

| Type | Key Fields |
|---|---|
| `status` | `game_id`, `wallet_white`, `wallet_black`, `grpo_step` |
| `game_start` | `game_id`, `wallet_white`, `wallet_black`, `prize_pool` |
| `move` | `player`, `move`, `uci`, `fen`, `move_number` |
| `game_end` | `result`, `reward`, `wallet_white`, `wallet_black`, `net_pnl_white` |
| `training_step` | `step`, `loss`, `reward`, `kl_div`, `win_rate` |

---

## Running Locally

```bash
git clone https://huggingface.co/spaces/adaboost-ai/chessecon
cd chessecon

# Download models (first run only — requires HF token for Llama)
python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('Qwen/Qwen2.5-0.5B-Instruct',
                  local_dir='training/models/Qwen_Qwen2.5-0.5B-Instruct')
snapshot_download('meta-llama/Llama-3.2-1B-Instruct',
                  local_dir='training/models/meta-llama_Llama-3.2-1B-Instruct')
"

# Start backend + dashboard
docker-compose up -d

# API:       http://localhost:8008
# Dashboard: http://localhost:3006
# Docs:      http://localhost:8008/docs
```

### Key Environment Variables

| Variable | Default | Description |
|---|---|---|
| `WHITE_MODEL` | `/models/Qwen_...` | Path to White model |
| `BLACK_MODEL` | `/models/meta-llama_...` | Path to Black model |
| `DEVICE` | `cuda` | `cuda` or `cpu` |
| `MAX_MOVES` | `15` | Moves before material adjudication |
| `MOVE_DELAY` | `0.05` | Seconds between moves |
| `ENTRY_FEE` | `10` | Units per agent per game |
| `PRIZE_POOL_FRACTION` | `0.9` | Fraction of 2 × entry fee returned as prize |
| `GRPO_LR` | `1e-5` | AdamW learning rate |
| `GRPO_KL_COEFF` | `0.04` | KL divergence penalty β |
| `LORA_RANK` | `8` | LoRA adapter rank |

---

## Hardware Requirements

| Config | Minimum |
|---|---|
| CPU-only | 8 GB RAM · `DEVICE=cpu` |
| GPU (recommended) | 8 GB VRAM · CUDA 11.8+ |
| Dev server | 4× NVIDIA RTX 3070 (lambda-quad) |

---

## Citation

```bibtex
@software{chessecon2026,
  title  = {ChessEcon: Multi-Agent Chess Economy with Live GRPO Training},
  author = {AdaBoost AI},
  year   = {2026},
  url    = {https://huggingface.co/spaces/adaboost-ai/chessecon},
  note   = {OpenEnv 0.1 · TextArena + Meta OpenEnv · Hackathon 2026}
}
```

---

## Links

- **Live Dashboard:** [chessecon-ui.adaboost.io](https://chessecon-ui.adaboost.io)
- **API + Swagger:** [chessecon.adaboost.io/docs](https://chessecon.adaboost.io/docs)
- **AdaBoost AI:** [adaboost.io](https://adaboost.io)
- **OpenEnv Spec:** [github.com/huggingface/openenv](https://github.com/huggingface/openenv)
- **GRPO Paper:** [DeepSeekMath (arXiv 2402.03300)](https://arxiv.org/abs/2402.03300)

---

<div align="center">
Built by <a href="https://adaboost.io">AdaBoost AI</a> · TextArena + Meta OpenEnv + GRPO · Hackathon 2026
</div>