suvasis committed
Commit e4d7d50 · 1 parent: 44140a7
This view is limited to 50 files because the commit contains too many changes; see the raw diff for the full change set.
Files changed (50)
  1. Dockerfile +125 -0
  2. Makefile +120 -0
  3. README.md +250 -1
  4. README_1.md +61 -0
  5. app.py +110 -0
  6. backend/.DS_Store +0 -0
  7. backend/.env.example +29 -0
  8. backend/Dockerfile +43 -0
  9. backend/__init__.py +1 -0
  10. backend/agents/__init__.py +0 -0
  11. backend/agents/claude_coach.py +131 -0
  12. backend/agents/complexity.py +79 -0
  13. backend/agents/grpo_trainer.py +236 -0
  14. backend/agents/model_agent.py +285 -0
  15. backend/agents/nvm_player_agent.py +265 -0
  16. backend/agents/qwen_agent.py +228 -0
  17. backend/api/__init__.py +0 -0
  18. backend/api/coaching_router.py +274 -0
  19. backend/api/game_router.py +295 -0
  20. backend/api/training_router.py +75 -0
  21. backend/api/websocket.py +97 -0
  22. backend/api/websocket.py_backup +87 -0
  23. backend/chess_engine.py +186 -0
  24. backend/chess_lib/__init__.py +0 -0
  25. backend/chess_lib/chess_engine.py +166 -0
  26. backend/chess_lib/engine.py +125 -0
  27. backend/economy/.DS_Store +0 -0
  28. backend/economy/__init__.py +0 -0
  29. backend/economy/ledger.py +174 -0
  30. backend/economy/nvm_payments.py +340 -0
  31. backend/economy/register_agent.py +138 -0
  32. backend/grpo_trainer.py +240 -0
  33. backend/main.py +313 -0
  34. backend/main.py_backup +218 -0
  35. backend/openenv/__init__.py +19 -0
  36. backend/openenv/env.py +311 -0
  37. backend/openenv/models.py +136 -0
  38. backend/openenv/router.py +159 -0
  39. backend/qwen_agent.py +228 -0
  40. backend/requirements.txt +40 -0
  41. backend/settings.py +65 -0
  42. backend/websocket_server.py +365 -0
  43. doc.md +124 -0
  44. docker-compose.gpu.yml +52 -0
  45. docker-compose.yml +63 -0
  46. docker-compose.yml_backup +61 -0
  47. docker-entrypoint.sh +175 -0
  48. docs/Issues.md +47 -0
  49. docs/latest_fixes_howto.md +420 -0
  50. frontend/.DS_Store +0 -0
Dockerfile ADDED
@@ -0,0 +1,125 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # ChessEcon — Unified Multi-Stage Dockerfile
+ #
+ # Stages:
+ #   1. frontend-builder — builds the React TypeScript dashboard (Node.js)
+ #   2. backend-cpu      — Python FastAPI backend, serves built frontend as static
+ #   3. backend-gpu      — same as backend-cpu but with CUDA PyTorch
+ #
+ # Usage:
+ #   CPU: docker build --target backend-cpu -t chessecon:cpu .
+ #   GPU: docker build --target backend-gpu -t chessecon:gpu .
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ # ── Stage 1: Build the React frontend ────────────────────────────────────────
+ FROM node:22-alpine AS frontend-builder
+
+ WORKDIR /app/frontend
+
+ # Copy package files AND the patches dir (required by pnpm for patched dependencies)
+ COPY frontend/package.json frontend/pnpm-lock.yaml* ./
+ COPY frontend/patches/ ./patches/
+ RUN npm install -g pnpm && pnpm install --frozen-lockfile
+
+ # Copy the full frontend source
+ COPY frontend/ ./
+
+ # Build the production bundle (frontend only — no Express server build)
+ # vite.config.ts outputs to dist/public/ relative to the project root
+ RUN pnpm build:docker
+
+ # ── Stage 2: CPU backend ──────────────────────────────────────────────────────
+ FROM python:3.11-slim AS backend-cpu
+
+ LABEL maintainer="ChessEcon Team"
+ LABEL description="ChessEcon — Multi-Agent Chess RL System (CPU)"
+
+ # System dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+         stockfish \
+         curl \
+         git \
+     && rm -rf /var/lib/apt/lists/*
+
+ WORKDIR /app
+
+ # Install Python dependencies
+ COPY backend/requirements.txt ./backend/requirements.txt
+ RUN pip install --no-cache-dir -r backend/requirements.txt
+
+ # Copy the backend source
+ COPY backend/ ./backend/
+ COPY shared/ ./shared/
+
+ # Copy the built frontend into the backend's static directory
+ # vite.config.ts outputs to dist/public/ (see build.outDir in vite.config.ts)
+ COPY --from=frontend-builder /app/frontend/dist/public ./backend/static/
+
+ # Copy entrypoint
+ COPY docker-entrypoint.sh ./
+ RUN chmod +x docker-entrypoint.sh
+
+ # Create directories for model cache and training data
+ RUN mkdir -p /app/models /app/data/games /app/data/training /app/logs
+
+ # Expose the application port
+ EXPOSE 8000
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=60s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ ENTRYPOINT ["./docker-entrypoint.sh"]
+ CMD ["backend"]
+
+ # ── Stage 3: GPU backend ──────────────────────────────────────────────────────
+ FROM nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04 AS backend-gpu
+
+ LABEL maintainer="ChessEcon Team"
+ LABEL description="ChessEcon — Multi-Agent Chess RL System (GPU/CUDA)"
+
+ # System dependencies
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+         python3.11 \
+         python3.11-dev \
+         python3-pip \
+         stockfish \
+         curl \
+         git \
+     && rm -rf /var/lib/apt/lists/* \
+     && ln -sf /usr/bin/python3.11 /usr/bin/python3 \
+     && ln -sf /usr/bin/python3 /usr/bin/python
+
+ WORKDIR /app
+
+ # Install PyTorch with CUDA support first (separate layer for caching)
+ RUN pip install --no-cache-dir torch==2.3.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
+
+ # Install remaining Python dependencies
+ COPY backend/requirements.txt ./backend/requirements.txt
+ COPY training/requirements.txt ./training/requirements.txt
+ RUN pip install --no-cache-dir -r backend/requirements.txt
+ RUN pip install --no-cache-dir -r training/requirements.txt
+
+ # Copy source
+ COPY backend/ ./backend/
+ COPY training/ ./training/
+ COPY shared/ ./shared/
+
+ # Copy the built frontend
+ COPY --from=frontend-builder /app/frontend/dist/public ./backend/static/
+
+ # Copy entrypoint
+ COPY docker-entrypoint.sh ./
+ RUN chmod +x docker-entrypoint.sh
+
+ # Create directories
+ RUN mkdir -p /app/models /app/data/games /app/data/training /app/logs
+
+ EXPOSE 8000
+
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=120s --retries=3 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ ENTRYPOINT ["./docker-entrypoint.sh"]
+ CMD ["backend"]
Makefile ADDED
@@ -0,0 +1,120 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # ChessEcon — Makefile
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ .PHONY: help env-file dirs build build-gpu up up-gpu down demo selfplay train \
+         train-gpu logs shell clean frontend-dev backend-dev test lint
+
+ # ── Default target ────────────────────────────────────────────────────────────
+ help:
+ 	@echo ""
+ 	@echo "  ChessEcon — Multi-Agent Chess RL System"
+ 	@echo "  ════════════════════════════════════════"
+ 	@echo ""
+ 	@echo "  Setup:"
+ 	@echo "    make env-file      Copy .env.example → .env (edit before running)"
+ 	@echo "    make dirs          Create host volume directories"
+ 	@echo ""
+ 	@echo "  Docker (CPU):"
+ 	@echo "    make build         Build the CPU Docker image"
+ 	@echo "    make up            Start the dashboard + API (http://localhost:8000)"
+ 	@echo "    make demo          Run a 3-game demo and exit"
+ 	@echo "    make selfplay      Collect self-play data (no training)"
+ 	@echo "    make train         Run RL training (CPU)"
+ 	@echo "    make down          Stop all containers"
+ 	@echo ""
+ 	@echo "  Docker (GPU):"
+ 	@echo "    make build-gpu     Build the GPU Docker image"
+ 	@echo "    make up-gpu        Start with GPU support"
+ 	@echo "    make train-gpu     Run RL training (GPU)"
+ 	@echo ""
+ 	@echo "  Development:"
+ 	@echo "    make frontend-dev  Start React dev server (hot-reload)"
+ 	@echo "    make backend-dev   Start FastAPI dev server"
+ 	@echo "    make test          Run all tests"
+ 	@echo "    make lint          Run linters"
+ 	@echo ""
+ 	@echo "  Utilities:"
+ 	@echo "    make logs          Tail container logs"
+ 	@echo "    make shell         Open a shell in the running container"
+ 	@echo "    make clean         Remove containers, images, and volumes"
+ 	@echo ""
+
+ # ── Setup ─────────────────────────────────────────────────────────────────────
+ env-file:
+ 	@if [ -f .env ]; then \
+ 		echo ".env already exists. Delete it first if you want to reset."; \
+ 	else \
+ 		cp .env.example .env; \
+ 		echo ".env created. Edit it with your API keys before running."; \
+ 	fi
+
+ dirs:
+ 	@mkdir -p volumes/models volumes/data volumes/logs
+ 	@echo "Volume directories created."
+
+ # ── Docker CPU ────────────────────────────────────────────────────────────────
+ build: dirs
+ 	docker compose build chessecon
+
+ up: dirs
+ 	docker compose up chessecon
+
+ demo: dirs
+ 	docker compose run --rm chessecon demo
+
+ selfplay: dirs
+ 	docker compose run --rm \
+ 		-e RL_METHOD=selfplay \
+ 		chessecon selfplay
+
+ train: dirs
+ 	docker compose --profile training up trainer
+
+ down:
+ 	docker compose down
+
+ # ── Docker GPU ────────────────────────────────────────────────────────────────
+ build-gpu: dirs
+ 	docker compose -f docker-compose.yml -f docker-compose.gpu.yml build
+
+ up-gpu: dirs
+ 	docker compose -f docker-compose.yml -f docker-compose.gpu.yml up chessecon
+
+ train-gpu: dirs
+ 	docker compose -f docker-compose.yml -f docker-compose.gpu.yml \
+ 		--profile training up trainer
+
+ # ── Development (local, no Docker) ───────────────────────────────────────────
+ frontend-dev:
+ 	@echo "Starting React frontend dev server..."
+ 	cd frontend && pnpm install && pnpm dev
+
+ backend-dev:
+ 	@echo "Starting FastAPI backend dev server..."
+ 	cd backend && pip install -r requirements.txt && \
+ 		uvicorn main:app --reload --host 0.0.0.0 --port 8000
+
+ # ── Testing ───────────────────────────────────────────────────────────────────
+ test:
+ 	@echo "Running backend tests..."
+ 	cd backend && python -m pytest tests/ -v
+ 	@echo "Running frontend tests..."
+ 	cd frontend && pnpm test
+
+ lint:
+ 	@echo "Linting backend..."
+ 	cd backend && python -m ruff check . || true
+ 	@echo "Linting frontend..."
+ 	cd frontend && pnpm lint || true
+
+ # ── Utilities ─────────────────────────────────────────────────────────────────
+ logs:
+ 	docker compose logs -f chessecon
+
+ shell:
+ 	docker compose exec chessecon /bin/bash
+
+ clean:
+ 	docker compose down -v --rmi local
+ 	@echo "Containers, images, and volumes removed."
README.md CHANGED
@@ -1 +1,250 @@
- test
+ ---
+ title: ChessEcon
+ emoji: ♟️
+ colorFrom: indigo
+ colorTo: purple
+ sdk: docker
+ app_port: 8000
+ tags:
+   - openenv
+   - reinforcement-learning
+   - chess
+   - multi-agent
+   - grpo
+   - rl-environment
+   - economy
+   - two-player
+   - game
+ license: apache-2.0
+ ---
+
+ # ♟️ ChessEcon — OpenEnv 0.1 Compliant Chess Economy Environment
+
+ > **Self-hosted environment** — the live API runs on AdaBoost AI infrastructure.
+ > Update these URLs if the domain changes.
+
+ **Live API base URL:** `https://chessecon.adaboost.io`
+ **env_info:** `https://chessecon.adaboost.io/env/env_info`
+ **Dashboard:** `https://chessecon-ui.adaboost.io`
+ **Swagger docs:** `https://chessecon.adaboost.io/docs`
+
+ ---
+
+ **Two competing LLM agents play chess for economic stakes.**
+ White = `Qwen/Qwen2.5-0.5B-Instruct` (trainable) | Black = `meta-llama/Llama-3.2-1B-Instruct` (fixed)
+
+ Both agents pay an entry fee each game; the winner takes the prize pool.
+ The White agent is trained live with **GRPO** (Group Relative Policy Optimisation).
+
+ ---
+
+ ## OpenEnv 0.1 API
+
+ This environment is fully compliant with the [OpenEnv 0.1 spec](https://github.com/huggingface/openenv).
+
+ | Endpoint | Method | Description |
+ |---|---|---|
+ | `/env/reset` | `POST` | Start a new episode, deduct entry fees, return the initial observation |
+ | `/env/step` | `POST` | Apply one move (UCI or SAN), return reward + next observation |
+ | `/env/state` | `GET` | Inspect the current board state — read-only, no side effects |
+ | `/env/env_info` | `GET` | Environment metadata for HF Hub discoverability |
+ | `/ws` | `WS` | Real-time event stream for the live dashboard |
+ | `/health` | `GET` | Health check + model load status |
+ | `/docs` | `GET` | Interactive Swagger UI |
+
+ ---
+
+ ## Quick Start
+
+ ```python
+ import httpx
+
+ BASE = "https://chessecon.adaboost.io"
+
+ # 1. Start a new episode
+ reset = httpx.post(f"{BASE}/env/reset").json()
+ print(reset["observation"]["fen"])              # starting FEN
+ print(reset["observation"]["legal_moves_uci"])  # all legal moves
+
+ # 2. Play moves
+ step = httpx.post(f"{BASE}/env/step", json={"action": "e2e4"}).json()
+ print(step["observation"]["fen"])  # board after the move
+ print(step["reward"])              # per-step reward signal
+ print(step["terminated"])          # True if the game is over
+ print(step["truncated"])           # True if the move limit was hit
+
+ # 3. Inspect state (non-destructive)
+ state = httpx.get(f"{BASE}/env/state").json()
+ print(state["step_count"])  # moves played so far
+ print(state["status"])      # "active" | "terminated" | "idle"
+
+ # 4. Environment metadata
+ info = httpx.get(f"{BASE}/env/env_info").json()
+ print(info["openenv_version"])  # "0.1"
+ print(info["agents"])           # white/black model IDs
+ ```
+
+ ### Drop-in Client for TRL / verl / SkyRL
+
+ ```python
+ import httpx
+
+ class ChessEconClient:
+     """OpenEnv 0.1 client — compatible with TRL, verl, SkyRL."""
+
+     def __init__(self, base_url: str = "https://chessecon.adaboost.io"):
+         self.base = base_url.rstrip("/")
+         self.client = httpx.Client(timeout=30)
+
+     def reset(self, seed=None):
+         payload = {"seed": seed} if seed is not None else {}
+         r = self.client.post(f"{self.base}/env/reset", json=payload)
+         r.raise_for_status()
+         data = r.json()
+         return data["observation"], data["info"]
+
+     def step(self, action: str):
+         r = self.client.post(f"{self.base}/env/step", json={"action": action})
+         r.raise_for_status()
+         data = r.json()
+         return (
+             data["observation"],
+             data["reward"],
+             data["terminated"],
+             data["truncated"],
+             data["info"],
+         )
+
+     def state(self):
+         return self.client.get(f"{self.base}/env/state").json()
+
+     def env_info(self):
+         return self.client.get(f"{self.base}/env/env_info").json()
+
+
+ # Usage
+ env = ChessEconClient()
+ obs, info = env.reset()
+
+ while True:
+     action = obs["legal_moves_uci"][0]  # replace with your policy
+     obs, reward, terminated, truncated, info = env.step(action)
+     if terminated or truncated:
+         break
+ ```
+
+ ---
+
+ ## Observation Schema
+
+ Every response wraps a `ChessObservation` object:
+
+ ```json
+ {
+   "observation": {
+     "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq - 0 1",
+     "turn": "black",
+     "move_number": 1,
+     "last_move_uci": "e2e4",
+     "last_move_san": "e4",
+     "legal_moves_uci": ["e7e5", "d7d5", "g8f6"],
+     "is_check": false,
+     "wallet_white": 90.0,
+     "wallet_black": 90.0,
+     "white_model": "Qwen/Qwen2.5-0.5B-Instruct",
+     "black_model": "meta-llama/Llama-3.2-1B-Instruct",
+     "info": {}
+   }
+ }
+ ```
+
+ ### Step Response
+
+ ```json
+ {
+   "observation": { "...": "see above" },
+   "reward": 0.01,
+   "terminated": false,
+   "truncated": false,
+   "info": { "san": "e4", "uci": "e2e4", "move_number": 1 }
+ }
+ ```
+
+ ### State Response
+
+ ```json
+ {
+   "observation": { "...": "see above" },
+   "episode_id": "ep-42",
+   "step_count": 1,
+   "status": "active",
+   "info": {}
+ }
+ ```
+
+ ---
+
+ ## Reward Structure
+
+ | Event | Reward | Notes |
+ |---|---|---|
+ | Legal move | `+0.01` | Every valid move |
+ | Move gives check | `+0.05` | Additional bonus |
+ | Capture | `+0.10` | Additional bonus |
+ | Win (checkmate) | `+1.00` | Terminal |
+ | Loss | `-1.00` | Terminal |
+ | Draw | `0.00` | Terminal |
+ | Illegal move | `-0.10` | Episode continues |
+
+ Combined reward: `0.4 × game_reward + 0.6 × economic_reward`
+
201
+ ---
202
+
203
+ ## Economy Model
204
+
205
+ | Parameter | Value |
206
+ |---|---|
207
+ | Starting wallet | 100 units |
208
+ | Entry fee | 10 units per agent per game |
209
+ | Prize pool | 18 units (90% of 2 × entry fee) |
210
+ | Draw refund | 5 units each |
211
+
212
+ ---
213
+
214
+ ## Architecture
215
+
216
+ ```
217
+ External RL Trainers (TRL / verl / SkyRL)
218
+ │ HTTP
219
+
220
+ ┌─────────────────────────────────────────────┐
221
+ │ OpenEnv 0.1 HTTP API │
222
+ │ POST /env/reset POST /env/step │
223
+ │ GET /env/state GET /env/env_info │
224
+ │ asyncio.Lock — thread safe │
225
+ └──────────────┬──────────────────────────────┘
226
+
227
+ ┌───────┴────────┐
228
+ ▼ ▼
229
+ ┌─────────────┐ ┌──────────────┐
230
+ │ Chess Engine│ │Economy Engine│
231
+ │ python-chess│ │Wallets · Fees│
232
+ │ FEN · UCI │ │Prize Pool │
233
+ └──────┬──────┘ └──────────────┘
234
+
235
+ ┌────┴─────┐
236
+ ▼ ▼
237
+ ♔ Qwen ♚ Llama
238
+ 0.5B 1B
239
+ GRPO↑ Fixed
240
+ ```
241
+
242
+ ---
243
+
244
+ ## Hardware
245
+
246
+ Self-hosted on AdaBoost AI infrastructure:
247
+ - 4× NVIDIA RTX 3070 (lambda-quad)
248
+ - Models loaded in 4-bit quantization
249
+
250
+ Built by [AdaBoost AI](https://adaboost.io) · Hackathon 2026
README_1.md ADDED
@@ -0,0 +1,61 @@
+ # Pitch: The Autonomous Chess Economy
+
+ ## 1. The Vision: From Game Players to Autonomous Businesses
+
+ The hackathon challenges us to build "autonomous businesses where agents make real economic decisions." We meet this challenge with a dynamic, multi-agent economic simulation where the "business" is competitive chess. Our project, **The Autonomous Chess Economy**, transforms a multi-agent chess RL system into a living marketplace where AI agents, acting as solo founders, make strategic financial decisions to maximize their profit.
+
+ > In our system, agents don't just play chess; they run a business. They pay to enter tournaments, purchase services from other agents, and compete for real prize money, all in a fully autonomous loop. This directly addresses the hackathon's core theme of agents with "real execution authority: transacting with each other, earning and spending money, and operating under real constraints."
+
+ ## 2. The Architecture: A Multi-Layered Economic Simulation
+
+ We extend our existing multi-agent chess platform with a new economic layer. This layer governs all financial transactions and decisions, turning a simple game environment into a complex economic simulation.
+
+ ![Autonomous Economic Agent Architecture](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/DkQnI6BiqjsJuDKZwKYaEL-images_1772590773264_na1fn_L2hvbWUvdWJ1bnR1L2Vjb25vbWljX2FnZW50X2FyY2hpdGVjdHVyZQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L0RrUW5JNkJpcWpzSnVES1p3S1lhRUwtaW1hZ2VzXzE3NzI1OTA3NzMyNjRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyVmpiMjV2YldsalgyRm5aVzUwWDJGeVkyaHBkR1ZqZEhWeVpRLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=rNPEVwChkPDeMlLFncPZ3h9LMOF8RO71jI~pnNlc6NcjqopfaO1QhBL0NuEmo9Ef26L5G3l6pbGddJoOcRGoj8G2G-4NPW7aJBAOg96JDHSLaQqba4dIo9buQZENchWYVYC06wPCmIZ0rEy-JrA7356pR-yaM8THbJ~EAMLN-S31Uaon3FSJ9YcIlAdF113Vp46Znid6UE0rWF9QbKYk8egTGEy5KPRbxajLXch4KCJPy5G9ZCxYyv8D4Vz8FWIPtCfxUX1R9sX6TB64Qz~d3DNP0fbpwNCoGn1tAzPXMPJ0u7XWr87DdSKKlatL8ed5Qz86bDTk2Em75s-l8f6zww__)
+
+ Our architecture is composed of three primary layers:
+
+ 1. **The Market Layer:** A **Tournament Organizer** agent acts as the central marketplace. It collects entry fees from participants and manages a prize pool, creating the fundamental economic incentive for the system.
+ 2. **The Agent Layer:** This layer consists of two types of autonomous businesses:
+    * **Player Agents (The Competitors):** These are the core businesses in our economy. Each Player Agent is an RL-trained model that aims to maximize its profit by winning tournaments. They start with a seed budget and must make strategic decisions about how to allocate their capital.
+    * **Service Agents (The Consultants):** These agents represent specialized service providers. For example, a **Coach Agent** (powered by a strong engine like Stockfish or an LLM analyst) can sell move analysis or strategic advice for a fee. This creates a B2B market within our ecosystem.
+ 3. **The Transaction & Decision Layer:** This is where economic decisions are made and executed. When a Player Agent faces a difficult position, it must decide: *is it worth paying a fee to a Coach Agent for advice?* This decision is a core part of the agent's policy. If the agent decides to buy, the transaction is executed via a lightweight, agent-native payment protocol such as **x402**, enabling instant, autonomous agent-to-agent payments [1][2].
+
+ ## 3. The Economic Model: Profit, Loss, and ROI
+
+ The economic model is designed to mirror real-world business constraints:
+
+ | Economic Component | Business Analogy | Implementation |
+ | :--- | :--- | :--- |
+ | **Tournament Entry Fee** | **Cost of Goods Sold (COGS)** | A fixed fee paid by each Player Agent to the Tournament Organizer to enter a game. |
+ | **Prize Pool** | **Revenue** | The winner of the game receives the prize pool (e.g., 1.8× a single entry fee, i.e., 90% of the combined fees). |
+ | **Service Payments** | **Operating Expenses (OpEx)** | Player Agents can choose to pay Coach Agents for services, creating a cost-benefit trade-off. |
+ | **Agent Wallet** | **Company Treasury** | Each agent maintains a wallet (e.g., with a starting balance of 100 units) to manage its funds. |
+ | **Profit/Loss** | **Net Income** | An agent's success is measured not just by its win rate, but by its net profit over time. |
+
+ This model forces the agents to learn a sophisticated policy that balances short-term costs (paying for coaching) against long-term gains (winning the tournament). An agent that spends too much on coaching may win games yet still go bankrupt. A successful agent learns to be a shrewd business operator, identifying the critical moments where paying for a service yields a positive return on investment (ROI).
+
+ ## 4. The RL Problem: Maximizing Profit, Not Just Wins
+
+ This economic layer transforms the reinforcement learning problem from simply maximizing wins into **maximizing profit**. The RL agent's objective is now explicitly financial.
+
+ * **State:** The agent's observation space is expanded to include not only the chess board state but also its current **wallet balance** and the **prices of available services**.
+ * **Action:** The action space is expanded beyond chess moves alone. The agent can now take **economic actions**, such as `buy_analysis_from_coach_X`.
+ * **Reward:** The reward function is no longer a simple `+1` for a win. Instead, the reward is the **change in the agent's wallet balance**: a win yields a large positive reward (the prize money), while paying for a service incurs a small negative reward (the cost). The RL algorithm (e.g., GRPO, PPO) optimizes the agent's policy to maximize this cumulative financial reward.
+
+ ## 5. Why This Project Fits the Hackathon
+
+ This project is a direct and compelling implementation of the hackathon's vision:
+
+ * **Autonomous Economic Decisions:** Agents decide what to buy (coaching services), whom to pay (which coach), when to switch (if a coach is not providing value), and when to stop (if a game is unwinnable and further expense is futile).
+ * **Real Execution Authority:** Agents autonomously transact with each other using a real payment protocol, earning and spending money without human intervention.
+ * **Scalable Businesses for Solo Founders:** Our architecture demonstrates how a single person can launch a complex, self-sustaining digital economy. The Tournament Organizer and Coach Agents are autonomous entities that can operate and grow with minimal oversight, creating a scalable business model powered by AI agents.
+
+ By building The Autonomous Chess Economy, we are not just creating a better chess-playing AI; we are creating a microcosm of a future where autonomous agents participate in and shape economic activity.
+
+ ## 6. References
+
+ [1] [x402 - Payment Required | Internet-Native Payments Standard](https://www.x402.org/)
+ [2] [Agentic Payments: x402 and AI Agents in the AI Economy - Galaxy Digital](https://www.galaxy.com/insights/research/x402-ai-agents-crypto-payments)
+ [3] [AI Agents & The New Payment Infrastructure - The Business Engineer](https://businessengineer.ai/p/ai-agents-and-the-new-payment-infrastructure)
+ [4] [Introducing Agentic Wallets - Coinbase](https://www.coinbase.com/developer-platform/discover/launches/agentic-wallets)
app.py ADDED
@@ -0,0 +1,110 @@
+ """
+ app.py
+ ──────
+ HuggingFace Spaces entry point.
+
+ For Docker-based Spaces (sdk: docker), HF looks for this file but does not
+ run it — the actual server is started by the Dockerfile CMD.
+
+ This file serves as a discoverable Python client that users can copy/paste
+ to interact with the environment from their own code.
+
+ Usage:
+     from app import ChessEconClient
+     env = ChessEconClient()
+     obs, info = env.reset()
+     obs, reward, done, truncated, info = env.step("e2e4")
+ """
+
+ from typing import Any
+
+ import httpx
+
+ SPACE_URL = "https://adaboostai-chessecon.hf.space"
+
+
+ class ChessEconClient:
+     """
+     OpenEnv 0.1 client for the ChessEcon environment.
+
+     Compatible with any RL trainer that expects:
+         reset() → (observation, info)
+         step()  → (observation, reward, terminated, truncated, info)
+         state() → StateResponse dict
+     """
+
+     def __init__(self, base_url: str = SPACE_URL, timeout: float = 30.0):
+         self.base = base_url.rstrip("/")
+         self._client = httpx.Client(timeout=timeout)
+
+     def reset(self, seed: int | None = None) -> tuple[dict[str, Any], dict[str, Any]]:
+         """Start a new episode. Returns (observation, info)."""
+         payload: dict[str, Any] = {}
+         if seed is not None:
+             payload["seed"] = seed
+         r = self._client.post(f"{self.base}/env/reset", json=payload)
+         r.raise_for_status()
+         data = r.json()
+         return data["observation"], data.get("info", {})
+
+     def step(self, action: str) -> tuple[dict[str, Any], float, bool, bool, dict[str, Any]]:
+         """
+         Apply a chess move (UCI e.g. 'e2e4' or SAN e.g. 'e4').
+         Returns (observation, reward, terminated, truncated, info).
+         """
+         r = self._client.post(f"{self.base}/env/step", json={"action": action})
+         r.raise_for_status()
+         data = r.json()
+         return (
+             data["observation"],
+             data["reward"],
+             data["terminated"],
+             data["truncated"],
+             data.get("info", {}),
+         )
+
+     def state(self) -> dict[str, Any]:
+         """Return current episode state (read-only)."""
+         r = self._client.get(f"{self.base}/env/state")
+         r.raise_for_status()
+         return r.json()
+
+     def env_info(self) -> dict[str, Any]:
+         """Return environment metadata."""
+         r = self._client.get(f"{self.base}/env/env_info")
+         r.raise_for_status()
+         return r.json()
+
+     def health(self) -> dict[str, Any]:
+         r = self._client.get(f"{self.base}/health")
+         r.raise_for_status()
+         return r.json()
+
+     def close(self):
+         self._client.close()
+
+     def __enter__(self):
+         return self
+
+     def __exit__(self, *_):
+         self.close()
+
+
+ # ── Quick demo ────────────────────────────────────────────────────────────────
+ if __name__ == "__main__":
+     import json
+
+     with ChessEconClient() as env:
+         print("Environment info:")
+         print(json.dumps(env.env_info(), indent=2))
+
+         print("\nResetting …")
+         obs, info = env.reset()
+         print(f"  FEN:  {obs['fen']}")
+         print(f"  Turn: {obs['turn']}")
+         print(f"  Wallet W={obs['wallet_white']} B={obs['wallet_black']}")
+
+         print("\nPlaying e2e4 …")
+         obs, reward, done, truncated, info = env.step("e2e4")
+         print(f"  Reward: {reward}")
+         print(f"  Done:   {done}")
+         print(f"  FEN:    {obs['fen']}")
backend/.DS_Store ADDED
Binary file (6.15 kB). View file
 
backend/.env.example ADDED
@@ -0,0 +1,29 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # ChessEcon Backend — environment variables
+ # Copy to .env and fill in values
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ # HuggingFace token — REQUIRED for Llama-3.2 (gated model)
+ # Get yours at https://huggingface.co/settings/tokens
+ HF_TOKEN=hf_...
+
+ # Model paths (override with local paths if downloaded)
+ WHITE_MODEL=Qwen/Qwen2.5-0.5B-Instruct
+ BLACK_MODEL=meta-llama/Llama-3.2-1B-Instruct
+
+ # Device: auto | cuda | cpu
+ DEVICE=auto
+
+ # Economy
+ STARTING_WALLET=100.0
+ ENTRY_FEE=10.0
+ PRIZE_POOL_FRACTION=0.9
+
+ # Training
+ LORA_RANK=8
+ GRPO_LR=1e-5
+ GRPO_UPDATE_EVERY_N_GAMES=1
+
+ # Server
+ PORT=8000
+ MOVE_DELAY=0.5
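
These variables can be consumed with a small standard-library helper. A minimal sketch, assuming defaults that match the example values above; the repo's actual `backend/settings.py` may load them differently.

```python
import os

def env_float(name: str, default: float) -> float:
    """Read a numeric setting from the environment, falling back to a default."""
    return float(os.environ.get(name, default))

# Defaults mirror the .env.example values; names match the file above.
STARTING_WALLET = env_float("STARTING_WALLET", 100.0)
ENTRY_FEE = env_float("ENTRY_FEE", 10.0)
PRIZE_POOL_FRACTION = env_float("PRIZE_POOL_FRACTION", 0.9)
GRPO_LR = env_float("GRPO_LR", 1e-5)
PORT = int(env_float("PORT", 8000))

print(STARTING_WALLET, ENTRY_FEE, PRIZE_POOL_FRACTION)
```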
backend/Dockerfile ADDED
@@ -0,0 +1,43 @@
+ # ─────────────────────────────────────────────────────────────────────────────
+ # ChessEcon Backend — GPU Docker image (OpenEnv 0.1 compliant)
+ # ─────────────────────────────────────────────────────────────────────────────
+
+ FROM nvidia/cuda:12.1.1-cudnn8-runtime-ubuntu22.04
+
+ ENV DEBIAN_FRONTEND=noninteractive
+
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+         python3.11 python3.11-dev python3-pip git curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1 \
+     && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 \
+     && update-alternatives --install /usr/bin/pip pip /usr/bin/pip3 1
+
+ # Install deps from requirements.txt before copying the full source
+ # so this layer is cached independently of code changes.
+ WORKDIR /build
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir --upgrade pip \
+     && pip install --no-cache-dir torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121 \
+     && pip install --no-cache-dir -r requirements.txt
+
+ # Copy source into /backend so "from backend.X" resolves correctly.
+ # PYTHONPATH=/ means Python sees /backend as the top-level package.
+ COPY . /backend
+
+ WORKDIR /backend
+
+ # / on PYTHONPATH → "import backend" resolves to /backend
+ ENV PYTHONPATH=/
+
+ ENV HF_HOME=/root/.cache/huggingface
+ ENV TRANSFORMERS_CACHE=/root/.cache/huggingface
+ ENV HF_HUB_OFFLINE=0
+
+ EXPOSE 8000
+
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=180s --retries=5 \
+     CMD curl -f http://localhost:8000/health || exit 1
+
+ CMD ["python", "websocket_server.py"]
backend/__init__.py ADDED
@@ -0,0 +1 @@
+
backend/agents/__init__.py ADDED
File without changes
backend/agents/claude_coach.py ADDED
@@ -0,0 +1,131 @@
+ """
+ ChessEcon Backend — Claude Coach Agent
+ Calls Anthropic claude-opus-4-5 ONLY when position complexity warrants it.
+ This is a fee-charging service that agents must decide to use.
+ """
+ from __future__ import annotations
+ import os
+ import re
+ import logging
+ from typing import Optional
+ from shared.models import CoachingRequest, CoachingResponse
+
+ logger = logging.getLogger(__name__)
+
+ ANTHROPIC_API_KEY = os.getenv("ANTHROPIC_API_KEY", "")
+ CLAUDE_MODEL = os.getenv("CLAUDE_MODEL", "claude-opus-4-5")
+ CLAUDE_MAX_TOKENS = int(os.getenv("CLAUDE_MAX_TOKENS", "1024"))
+ COACHING_FEE = float(os.getenv("COACHING_FEE", "5.0"))
+
+
+ class ClaudeCoachAgent:
+     """
+     Premium coaching service backed by Claude claude-opus-4-5.
+     Called only for COMPLEX or CRITICAL positions where the agent
+     has explicitly requested coaching AND can afford the fee.
+     """
+
+     def __init__(self):
+         self._client = None
+         self._available = bool(ANTHROPIC_API_KEY)
+         if self._available:
+             try:
+                 import anthropic
+                 self._client = anthropic.Anthropic(api_key=ANTHROPIC_API_KEY)
+                 logger.info(f"Claude Coach initialized with model {CLAUDE_MODEL}")
+             except ImportError:
+                 logger.warning("anthropic package not installed — Claude Coach disabled")
+                 self._available = False
+         else:
+             logger.warning("ANTHROPIC_API_KEY not set — Claude Coach disabled")
+
+     @property
+     def available(self) -> bool:
+         return self._available and self._client is not None
+
+     def analyze(self, request: CoachingRequest) -> CoachingResponse:
+         """
+         Request chess analysis from Claude. Returns best move recommendation
+         and strategic reasoning. Falls back to heuristic if unavailable.
+         """
+         if not self.available:
+             return self._fallback(request)
+
+         prompt = self._build_prompt(request)
+         try:
+             response = self._client.messages.create(
+                 model=CLAUDE_MODEL,
+                 max_tokens=CLAUDE_MAX_TOKENS,
+                 messages=[{"role": "user", "content": prompt}],
+             )
+             content = response.content[0].text
+             tokens_used = response.usage.input_tokens + response.usage.output_tokens
+             recommended_move = self._extract_move(content, request.legal_moves)
+
+             logger.info(
+                 f"Claude coaching: game={request.game_id} "
+                 f"agent={request.agent_id} move={recommended_move} "
+                 f"tokens={tokens_used}"
+             )
+             return CoachingResponse(
+                 game_id=request.game_id,
+                 agent_id=request.agent_id,
+                 recommended_move=recommended_move,
+                 analysis=content,
+                 cost=COACHING_FEE,
+                 model_used=CLAUDE_MODEL,
+                 tokens_used=tokens_used,
+             )
+         except Exception as e:
+             logger.error(f"Claude API error: {e}")
+             return self._fallback(request)
+
+     def _build_prompt(self, request: CoachingRequest) -> str:
+         legal_sample = request.legal_moves[:20]
+         return f"""You are an expert chess coach. Analyze this position and recommend the best move.
+
+ Position (FEN): {request.fen}
+ Legal moves (UCI format): {', '.join(legal_sample)}{'...' if len(request.legal_moves) > 20 else ''}
+ Position complexity: {request.complexity.level.value} (score: {request.complexity.score:.2f})
+ Your wallet: {request.wallet_balance:.1f} units (you paid {COACHING_FEE} for this analysis)
+
+ Provide:
+ 1. The single best move in UCI format (e.g., e2e4)
+ 2. Brief strategic reasoning (2-3 sentences)
+ 3. Key tactical threats to watch
+
+ Start your response with: BEST MOVE: <uci_move>"""
+
+     def _extract_move(self, text: str, legal_moves: list) -> str:
+         """Extract the recommended UCI move from Claude's response."""
+         # Try explicit BEST MOVE: pattern first
+         match = re.search(r"BEST MOVE:\s*([a-h][1-8][a-h][1-8][qrbn]?)", text, re.IGNORECASE)
+         if match:
+             move = match.group(1).lower()
+             if move in legal_moves:
+                 return move
+
+         # Scan for any UCI move mentioned in the text
+         for token in re.findall(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b", text):
+             if token.lower() in legal_moves:
+                 return token.lower()
+
+         # Fallback: return first legal move
+         return legal_moves[0] if legal_moves else "e2e4"
+
+     def _fallback(self, request: CoachingRequest) -> CoachingResponse:
+         """Return a basic heuristic move when Claude is unavailable."""
+         move = request.legal_moves[0] if request.legal_moves else "e2e4"
+         return CoachingResponse(
+             game_id=request.game_id,
+             agent_id=request.agent_id,
+             recommended_move=move,
+             analysis="Claude unavailable — using heuristic fallback.",
+             cost=0.0,
+             model_used="heuristic",
+             tokens_used=0,
+         )
+
+
+ # Singleton
+ claude_coach = ClaudeCoachAgent()
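
The two-stage move extraction in `_extract_move` above can be exercised standalone. A minimal sketch under the same regexes, with a plain legal-move list in place of the request model (the sample reply and move list are illustrative):

```python
import re

# Sketch of _extract_move's priority order: anchored "BEST MOVE:" pattern
# first, then a scan for any legal UCI token, then the first legal move.
def extract_move(text: str, legal_moves: list[str]) -> str:
    m = re.search(r"BEST MOVE:\s*([a-h][1-8][a-h][1-8][qrbn]?)", text, re.IGNORECASE)
    if m and m.group(1).lower() in legal_moves:
        return m.group(1).lower()
    for token in re.findall(r"\b([a-h][1-8][a-h][1-8][qrbn]?)\b", text):
        if token.lower() in legal_moves:
            return token.lower()
    return legal_moves[0] if legal_moves else "e2e4"

reply = "BEST MOVE: g1f3\nDeveloping the knight controls the centre."
move = extract_move(reply, ["e2e4", "d2d4", "g1f3"])  # anchored pattern wins
```

The anchored pattern makes the happy path cheap; the fallback scan covers replies where the model ignores the requested format.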
backend/agents/complexity.py ADDED
@@ -0,0 +1,79 @@
+ """
+ ChessEcon Backend — Position Complexity Analyzer
+ Decides when a position is complex enough to warrant calling Claude.
+ Claude is only called when ALL three gates pass:
+   1. Position complexity >= threshold
+   2. Agent wallet >= minimum
+   3. Agent's own policy requests coaching
+ """
+ from __future__ import annotations
+ import os
+ from shared.models import ComplexityAnalysis, PositionComplexity
+
+
+ THRESHOLD_COMPLEX = float(os.getenv("COMPLEXITY_THRESHOLD_COMPLEX", "0.45"))
+ THRESHOLD_CRITICAL = float(os.getenv("COMPLEXITY_THRESHOLD_CRITICAL", "0.70"))
+
+
+ class ComplexityAnalyzer:
+
+     def analyze(self, features: dict) -> ComplexityAnalysis:
+         """
+         Compute a 0–1 complexity score from raw board features.
+         Higher = more complex = more likely Claude is useful.
+         """
+         score = 0.0
+         factors: dict = {}
+
+         # Factor 1: Number of legal moves (high = complex position)
+         num_moves = features.get("num_legal_moves", 20)
+         move_score = min(num_moves / 60.0, 1.0)
+         factors["mobility"] = round(move_score, 3)
+         score += move_score * 0.30
+
+         # Factor 2: Check pressure
+         check_score = 0.8 if features.get("is_check") else 0.0
+         factors["check_pressure"] = check_score
+         score += check_score * 0.20
+
+         # Factor 3: Tactical captures available
+         capture_score = 0.6 if features.get("has_captures") else 0.0
+         factors["captures_available"] = capture_score
+         score += capture_score * 0.15
+
+         # Factor 4: Endgame (few pieces = precise calculation needed)
+         num_pieces = features.get("num_pieces", 32)
+         endgame_score = max(0.0, (16 - num_pieces) / 16.0)
+         factors["endgame_pressure"] = round(endgame_score, 3)
+         score += endgame_score * 0.20
+
+         # Factor 5: Material imbalance (unbalanced = harder to evaluate)
+         material = abs(features.get("material_balance", 0.0))
+         imbalance_score = min(material / 9.0, 1.0)  # queen = 9
+         factors["material_imbalance"] = round(imbalance_score, 3)
+         score += imbalance_score * 0.15
+
+         score = round(min(score, 1.0), 4)
+
+         if score >= THRESHOLD_CRITICAL:
+             level = PositionComplexity.CRITICAL
+         elif score >= THRESHOLD_COMPLEX:
+             level = PositionComplexity.COMPLEX
+         elif score >= 0.25:
+             level = PositionComplexity.MODERATE
+         else:
+             level = PositionComplexity.SIMPLE
+
+         recommend = level in (PositionComplexity.COMPLEX, PositionComplexity.CRITICAL)
+
+         return ComplexityAnalysis(
+             fen=features.get("fen", ""),
+             score=score,
+             level=level,
+             factors=factors,
+             recommend_coaching=recommend,
+         )
+
+
+ # Singleton
+ complexity_analyzer = ComplexityAnalyzer()
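
The weighted-factor scoring above can be checked without the `shared.models` dependencies. A standalone sketch (`complexity_score` is a hypothetical helper mirroring the weights and defaults in the analyzer; thresholds 0.45/0.70 match the env-var defaults):

```python
# Standalone sketch of the analyzer's weighted scoring, with the same
# five factors and weights (0.30 + 0.20 + 0.15 + 0.20 + 0.15).
def complexity_score(features: dict) -> float:
    score = 0.0
    score += min(features.get("num_legal_moves", 20) / 60.0, 1.0) * 0.30      # mobility
    score += (0.8 if features.get("is_check") else 0.0) * 0.20                # check pressure
    score += (0.6 if features.get("has_captures") else 0.0) * 0.15            # captures
    score += max(0.0, (16 - features.get("num_pieces", 32)) / 16.0) * 0.20    # endgame
    score += min(abs(features.get("material_balance", 0.0)) / 9.0, 1.0) * 0.15  # imbalance
    return round(min(score, 1.0), 4)

# A quiet opening scores well under the 0.45 COMPLEX threshold;
# a check with captures in an unbalanced endgame crosses it.
quiet = complexity_score({"num_legal_moves": 20, "num_pieces": 32})
sharp = complexity_score({"num_legal_moves": 40, "is_check": True,
                          "has_captures": True, "num_pieces": 10,
                          "material_balance": 3.0})
```

Because the weights sum to 1.0 and every factor is clamped to [0, 1], the score is bounded in [0, 1] by construction, which keeps the threshold gates stable.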
backend/agents/grpo_trainer.py ADDED
@@ -0,0 +1,236 @@
+ """
+ grpo_trainer.py
+ ───────────────
+ Group Relative Policy Optimisation (GRPO) training loop for the chess agent.
+
+ Algorithm summary (per game batch):
+   1. Collect a group of G candidate moves per position (sampled from the policy).
+   2. Compute advantages: A_i = (r_i - mean(r)) / (std(r) + ε)
+      where r_i is the terminal game reward for the trajectory that chose move i.
+   3. Compute the GRPO policy loss:
+      L = -E[ min(ratio * A, clip(ratio, 1-ε, 1+ε) * A) ]
+      where ratio = exp(log_π_θ(a) - log_π_old(a))
+   4. Add KL penalty: L_total = L + β * KL(π_θ || π_ref)
+   5. Backprop and update the model weights.
+
+ In practice, for a single-agent chess game:
+   - Each move in the game is a "step" with a delayed terminal reward.
+   - The group is formed by sampling G moves at each position and running
+     mini-rollouts (or approximating with the final game outcome).
+   - For simplicity we use the full game outcome as the reward for every
+     move in the game (REINFORCE-style with GRPO normalisation).
+
+ References:
+     DeepSeek-R1 (GRPO): https://arxiv.org/abs/2501.12948
+ """
+
+ import os
+ import logging
+ import torch
+ import torch.nn.functional as F
+ from dataclasses import dataclass, field
+ from typing import Optional
+
+ from backend.settings import settings
+
+ logger = logging.getLogger(__name__)
+
+
+ @dataclass
+ class Trajectory:
+     """One complete game trajectory collected for training."""
+     agent_color: str
+     log_probs: list[float]      # log π_θ(a_t | s_t) for each move
+     ref_log_probs: list[float]  # log π_ref(a_t | s_t) for KL
+     reward: float               # terminal reward (+1 win, -1 loss, 0 draw)
+     move_count: int = 0
+
+
+ @dataclass
+ class TrainingMetrics:
+     step: int = 0
+     loss: float = 0.0
+     policy_reward: float = 0.0
+     kl_div: float = 0.0
+     win_rate: float = 0.0
+     avg_profit: float = 0.0
+     coaching_rate: float = 0.0
+     # Running stats
+     wins: int = 0
+     games: int = 0
+     total_profit: float = 0.0
+     total_coaching_calls: int = 0
+     total_moves: int = 0
+
+
+ class GRPOTrainer:
+     """
+     Manages the GRPO training loop for the Qwen chess agent.
+
+     Usage:
+         trainer = GRPOTrainer(model, tokenizer)
+         trainer.record_move(log_prob, ref_log_prob)
+         ...
+         metrics = trainer.end_game(reward, profit, coaching_calls)
+         # metrics is None until grpo_update_every_n_games games have been collected
+     """
+
+     def __init__(self, model, tokenizer):
+         self.model = model
+         self.tokenizer = tokenizer
+         self._step = 0
+         self._pending: list[Trajectory] = []
+         self._current: Optional[Trajectory] = None
+         self._metrics = TrainingMetrics()
+
+         # Optimizer — only update LoRA params if present, else all params
+         trainable = [p for p in model.parameters() if p.requires_grad]
+         if not trainable:
+             logger.warning("No trainable parameters found — GRPO updates will be no-ops.")
+         self._optimizer = torch.optim.AdamW(trainable, lr=settings.grpo_lr) if trainable else None
+
+     # ── Game lifecycle ────────────────────────────────────────────────────
+
+     def start_game(self, agent_color: str):
+         """Call at the start of each game."""
+         self._current = Trajectory(agent_color=agent_color, log_probs=[], ref_log_probs=[], reward=0.0)
+
+     def record_move(self, log_prob: float, ref_log_prob: float):
+         """Call after each move with the policy and reference log-probs."""
+         if self._current is None:
+             return
+         self._current.log_probs.append(log_prob)
+         self._current.ref_log_probs.append(ref_log_prob)
+         self._current.move_count += 1
+
+     def end_game(
+         self,
+         reward: float,
+         profit: float = 0.0,
+         coaching_calls: int = 0,
+     ) -> Optional[TrainingMetrics]:
+         """
+         Call at game end with the terminal reward.
+         Returns updated TrainingMetrics if a gradient update was performed,
+         else None (still accumulating games).
+         """
+         if self._current is None:
+             return None
+
+         self._current.reward = reward
+         self._pending.append(self._current)
+         self._current = None
+
+         # Update running stats
+         m = self._metrics
+         m.games += 1
+         if reward > 0:
+             m.wins += 1
+         m.total_profit += profit
+         m.total_coaching_calls += coaching_calls
+         m.total_moves += self._pending[-1].move_count
+
+         # Trigger update every N games
+         if m.games % settings.grpo_update_every_n_games == 0:
+             return self._update()
+
+         return None
+
+     # ── GRPO update ───────────────────────────────────────────────────────
+
+     def _update(self) -> TrainingMetrics:
+         """Perform one GRPO gradient update over the pending trajectories."""
+         if self._optimizer is None or not self._pending:
+             return self._build_metrics()
+
+         trajectories = self._pending
+         self._pending = []
+
+         # Collect rewards and compute advantages (GRPO normalisation)
+         rewards = torch.tensor([t.reward for t in trajectories], dtype=torch.float32)
+         mean_r = rewards.mean()
+         std_r = rewards.std() + 1e-8
+         advantages = (rewards - mean_r) / std_r  # shape: (N,)
+
+         total_loss = torch.tensor(0.0, requires_grad=True)
+         total_kl = 0.0
+         n_tokens = 0
+
+         for traj, adv in zip(trajectories, advantages):
+             if not traj.log_probs:
+                 continue
+
+             lp = torch.tensor(traj.log_probs, dtype=torch.float32)          # (T,)
+             ref_lp = torch.tensor(traj.ref_log_probs, dtype=torch.float32)  # (T,)
+
+             # Ratio: π_θ / π_old (here π_old == π_ref since we update every game)
+             ratio = torch.exp(lp - ref_lp)
+
+             # Clipped surrogate loss (PPO-style clip)
+             eps = 0.2
+             clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
+             surrogate = torch.min(ratio * adv, clipped * adv)
+             policy_loss = -surrogate.mean()
+
+             # KL penalty: KL(π_θ || π_ref) ≈ exp(lp - ref_lp) - (lp - ref_lp) - 1
+             kl = (torch.exp(lp - ref_lp) - (lp - ref_lp) - 1).mean()
+             total_kl += kl.item()
+
+             step_loss = policy_loss + settings.grpo_kl_coeff * kl
+             total_loss = total_loss + step_loss
+             n_tokens += len(traj.log_probs)
+
+         if n_tokens > 0:
+             total_loss = total_loss / len(trajectories)
+             self._optimizer.zero_grad()
+             total_loss.backward()
+             torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
+             self._optimizer.step()
+
+         self._step += 1
+
+         # Save checkpoint periodically
+         if self._step % settings.save_every_n_steps == 0:
+             self._save_checkpoint()
+
+         # Update metrics
+         m = self._metrics
+         m.step = self._step
+         m.loss = total_loss.item() if n_tokens > 0 else 0.0
+         m.policy_reward = float(rewards.mean())
+         m.kl_div = total_kl / max(len(trajectories), 1)
+         m.win_rate = m.wins / max(m.games, 1)
+         m.avg_profit = m.total_profit / max(m.games, 1)
+         m.coaching_rate = m.total_coaching_calls / max(m.total_moves, 1)
+
+         logger.info(
+             "GRPO step %d | loss=%.4f reward=%.3f kl=%.4f win_rate=%.2f",
+             m.step, m.loss, m.policy_reward, m.kl_div, m.win_rate,
+         )
+         return self._build_metrics()
+
+     def _build_metrics(self) -> TrainingMetrics:
+         import copy
+         return copy.copy(self._metrics)
+
+     # ── Checkpoint ────────────────────────────────────────────────────────
+
+     def _save_checkpoint(self):
+         os.makedirs(settings.checkpoint_dir, exist_ok=True)
+         path = os.path.join(settings.checkpoint_dir, f"step_{self._step:06d}")
+         try:
+             self.model.save_pretrained(path)
+             self.tokenizer.save_pretrained(path)
+             logger.info("Checkpoint saved: %s", path)
+         except Exception as exc:
+             logger.error("Checkpoint save failed: %s", exc)
+
+     def load_checkpoint(self, path: str):
+         """Load a previously saved LoRA checkpoint."""
+         try:
+             from peft import PeftModel  # type: ignore
+             self.model = PeftModel.from_pretrained(self.model, path)
+             logger.info("Checkpoint loaded: %s", path)
+         except Exception as exc:
+             logger.error("Checkpoint load failed: %s", exc)
+
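
The two numerical steps at the heart of the update, group-relative advantage normalisation and the clipped surrogate, can be sketched in pure Python. This mirrors the formulas in the trainer's docstring (the trainer itself uses torch tensors; the helper names and sample rewards here are illustrative):

```python
import math

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """A_i = (r_i - mean(r)) / (std(r) + eps), using sample std like torch.std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / (len(rewards) - 1)
    std = math.sqrt(var) + eps
    return [(r - mean) / std for r in rewards]

def clipped_surrogate(log_p: float, log_p_ref: float, adv: float, clip: float = 0.2) -> float:
    """-min(ratio * A, clip(ratio, 1-eps, 1+eps) * A) for one action."""
    ratio = math.exp(log_p - log_p_ref)
    clipped = min(max(ratio, 1 - clip), 1 + clip)
    return -min(ratio * adv, clipped * adv)  # negated: we minimise the loss

# Four games: win, loss, draw, win → mean-zero advantages; the two wins
# share the same positive advantage, the loss gets a negative one.
advs = grpo_advantages([1.0, -1.0, 0.0, 1.0])
```

The normalisation makes the update scale-free: only *relative* outcomes within the group matter, so a batch of all-wins or all-losses produces near-zero advantages and barely moves the policy.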
backend/agents/model_agent.py ADDED
@@ -0,0 +1,285 @@
+ """
+ agents/model_agent.py
+ ─────────────────────
+ Unified chess agent that can load ANY HuggingFace CausalLM.
+
+   White → Qwen/Qwen2.5-0.5B-Instruct       (GRPO trainable)
+   Black → meta-llama/Llama-3.2-1B-Instruct (fixed opponent)
+
+ Key fix: tight UCI-format prompt + aggressive output parsing ensures
+ the model reliably produces legal moves rather than always falling back
+ to random. This is essential for GRPO to receive real gradient signal.
+ """
+
+ from __future__ import annotations
+
+ import re
+ import logging
+ from typing import Optional
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ from backend.settings import settings
+ from backend.chess_engine import ChessEngine
+
+ logger = logging.getLogger(__name__)
+
+ # UCI move pattern: e2e4, g1f3, e1g1, a7a8q (promotion)
+ _UCI_RE = re.compile(r'\b([a-h][1-8][a-h][1-8][qrbn]?)\b')
+ # SAN fallback patterns: e4, Nf3, O-O, Bxf7+, exd5=Q
+ _SAN_RE = re.compile(r'\b(O-O-O|O-O|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](?:=[QRBN])?[+#]?)\b')
+
+
+ class ModelAgent:
+     """
+     A chess-playing agent backed by any HuggingFace CausalLM.
+
+     Usage:
+         agent = ModelAgent("/models/Qwen_Qwen2.5-0.5B-Instruct")
+         san, log_prob = agent.get_move(engine, "white", move_history)
+     """
+
+     def __init__(self, model_id: str, device: str = "auto"):
+         self.model_id = model_id
+         self.device = device
+         self._temperature = settings.temperature
+         self._tokenizer = None
+         self._model = None
+         self._loaded = False
+
+     # ── Lazy model loading ─────────────────────────────────────────────────────
+
+     def load(self) -> "ModelAgent":
+         """Explicitly load model weights. Called once at startup."""
+         if self._loaded:
+             return self
+
+         logger.info("Loading model: %s", self.model_id)
+
+         dtype_map = {
+             "float16": torch.float16,
+             "bfloat16": torch.bfloat16,
+             "float32": torch.float32,
+         }
+         torch_dtype = dtype_map.get(settings.torch_dtype, torch.bfloat16)
+
+         hf_kwargs: dict = {}
+         if settings.hf_token:
+             hf_kwargs["token"] = settings.hf_token
+
+         self._tokenizer = AutoTokenizer.from_pretrained(
+             self.model_id,
+             trust_remote_code=True,
+             **hf_kwargs,
+         )
+         if self._tokenizer.pad_token is None:
+             self._tokenizer.pad_token = self._tokenizer.eos_token
+
+         self._model = AutoModelForCausalLM.from_pretrained(
+             self.model_id,
+             dtype=torch_dtype,
+             device_map=self.device if self.device != "auto" else "auto",
+             trust_remote_code=True,
+             **hf_kwargs,
+         )
+         self._model.eval()
+
+         if settings.lora_rank > 0:
+             try:
+                 from peft import get_peft_model, LoraConfig, TaskType  # type: ignore
+                 lora_config = LoraConfig(
+                     task_type=TaskType.CAUSAL_LM,
+                     r=settings.lora_rank,
+                     lora_alpha=settings.lora_rank * 2,
+                     lora_dropout=0.05,
+                     target_modules=["q_proj", "v_proj"],
+                 )
+                 self._model = get_peft_model(self._model, lora_config)
+                 logger.info("[%s] LoRA applied (rank=%d)", self.model_id, settings.lora_rank)
+             except ImportError:
+                 logger.warning("[%s] peft not installed — running without LoRA", self.model_id)
+
+         device_str = str(next(self._model.parameters()).device)
+         logger.info("[%s] Loaded on %s", self.model_id, device_str)
+         self._loaded = True
+         return self
+
+     @property
+     def model(self):
+         if not self._loaded:
+             self.load()
+         return self._model
+
+     @property
+     def tokenizer(self):
+         if not self._loaded:
+             self.load()
+         return self._tokenizer
+
+     def set_temperature(self, temp: float):
+         self._temperature = max(0.1, temp)
+
+     # ── Prompt building ────────────────────────────────────────────────────────
+
+     def _build_prompt(self, engine: ChessEngine, color: str, history: list[str]) -> str:
+         """
+         Build a tight prompt that forces the model to output a single UCI move.
+
+         We give it ALL legal moves so it only needs to pick one — no need to
+         invent a move from scratch. This dramatically reduces illegal outputs.
+         """
+         legal_uci = engine.legal_moves_uci  # full list e.g. ["e2e4","d2d4",...]
+         legal_san = engine.legal_moves_san  # same moves in SAN
+         history_str = " ".join(history[-10:]) if history else "game start"
+
+         # Show up to 30 legal moves so the model has enough context
+         legal_display = " ".join(legal_uci[:30])
+
+         system = (
+             "You are a chess engine. "
+             "You must respond with EXACTLY ONE move from the legal moves list. "
+             "Use UCI format only (e.g. e2e4). No explanation, no punctuation."
+         )
+         user = (
+             f"Color: {color}\n"
+             f"FEN: {engine.fen}\n"
+             f"Move history: {history_str}\n"
+             f"Legal moves: {legal_display}\n"
+             f"Your move (UCI):"
+         )
+
+         messages = [
+             {"role": "system", "content": system},
+             {"role": "user", "content": user},
+         ]
+         try:
+             return self._tokenizer.apply_chat_template(
+                 messages,
+                 tokenize=False,
+                 add_generation_prompt=True,
+             )
+         except Exception:
+             return f"<s>[INST] {system}\n{user} [/INST]"
+
+     # ── Output parsing ─────────────────────────────────────────────────────────
+
+     def _parse_move(self, text: str, engine: ChessEngine) -> Optional[str]:
+         """
+         Extract a legal move from model output.
+         Priority: UCI match → SAN match → first token direct match.
+         Returns SAN string if legal, else None.
+         """
+         text = text.strip()
+
+         # 1. Try every UCI token in output order
+         for m in _UCI_RE.finditer(text):
+             san = engine.uci_to_san(m.group(1))
+             if san:
+                 return san
+
+         # 2. Try SAN tokens
+         for m in _SAN_RE.finditer(text):
+             san = engine.parse_model_output(m.group(1))
+             if san:
+                 return san
+
+         # 3. Try the raw first word (model sometimes outputs move + newline)
+         first = text.split()[0] if text.split() else ""
+         if first:
+             san = engine.uci_to_san(first) or engine.parse_model_output(first)
+             if san:
+                 return san
+
+         return None
+
+     # ── Move generation ────────────────────────────────────────────────────────
+
+     def get_move(
+         self,
+         engine: ChessEngine,
+         color: str,
+         history: list[str],
+     ) -> tuple[str, float]:
+         """
+         Generate a legal chess move. Returns (san_move, log_prob).
+         Falls back to random legal move after max_move_retries.
+         """
+         if not self._loaded:
+             self.load()
+
+         prompt = self._build_prompt(engine, color, history)
+         inputs = self._tokenizer(prompt, return_tensors="pt").to(self._model.device)
+         input_len = inputs["input_ids"].shape[1]
+
+         best_san: Optional[str] = None
+         best_lp = 0.0
+
+         for attempt in range(settings.max_move_retries):
+             with torch.no_grad():
+                 outputs = self._model.generate(
+                     **inputs,
+                     max_new_tokens=10,  # a UCI move is at most 5 chars
+                     temperature=self._temperature,
+                     do_sample=True,
+                     pad_token_id=self._tokenizer.eos_token_id,
+                     return_dict_in_generate=True,
+                     output_scores=True,
+                 )
+             gen_ids = outputs.sequences[0][input_len:]
+             gen_text = self._tokenizer.decode(gen_ids, skip_special_tokens=True)
+             lp = _compute_log_prob(outputs.scores, gen_ids)
+
+             san = self._parse_move(gen_text, engine)
+             if san:
+                 best_san, best_lp = san, lp
+                 logger.debug("[%s] ✓ move=%s attempt=%d lp=%.3f raw=%r",
+                              self.model_id, san, attempt + 1, lp, gen_text)
+                 break
+             logger.warning("[%s] ✗ attempt %d bad output: %r", self.model_id, attempt + 1, gen_text)
+
+         if best_san is None:
+             best_san = engine.random_legal_move_san() or "e4"
+             best_lp = 0.0
+             logger.warning("[%s] retries exhausted — random fallback: %s", self.model_id, best_san)
+
+         return best_san, best_lp
+
+     def get_move_log_prob_only(
+         self,
+         engine: ChessEngine,
+         color: str,
+         history: list[str],
+         san_move: str,
+     ) -> float:
+         """Log-probability of a specific move under the current policy. Used for GRPO KL."""
+         if not self._loaded:
+             self.load()
+
+         prompt = self._build_prompt(engine, color, history)
+         # Convert SAN → UCI for consistency with prompt format
+         uci = engine.san_to_uci(san_move) or san_move
+         target_text = prompt + uci
+         inputs = self._tokenizer(target_text, return_tensors="pt").to(self._model.device)
+         prompt_len = self._tokenizer(prompt, return_tensors="pt")["input_ids"].shape[1]
+
+         with torch.no_grad():
+             out = self._model(**inputs, labels=inputs["input_ids"])
+
+         logits = out.logits[0, prompt_len - 1:-1]
+         target_ids = inputs["input_ids"][0, prompt_len:]
+         log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
+         selected = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
+         return selected.sum().item()
+
+
+ # ── Helpers ────────────────────────────────────────────────────────────────────
+
+ def _compute_log_prob(scores, generated_ids) -> float:
+     total = 0.0
+     for step, score in enumerate(scores):
+         if step >= len(generated_ids):
+             break
+         lp = torch.nn.functional.log_softmax(score[0], dim=-1)
+         total += lp[generated_ids[step]].item()
+     return total
backend/agents/nvm_player_agent.py ADDED
@@ -0,0 +1,265 @@
+ """
+ ChessEcon — NVM-Aware Player Agent
+ ====================================
+ Extends the chess player agent with Nevermined payment capabilities.
+ This agent can:
+   1. Discover external coaching services via Nevermined marketplace
+   2. Purchase coaching plans from other teams' agents
+   3. Generate x402 access tokens for paid service calls
+   4. Make HTTP requests to external coaching endpoints with payment headers
+   5. Fall back to internal Claude coaching if external service fails
+
+ This demonstrates the core hackathon requirement: autonomous agents
+ making real economic decisions — buy, pay, switch, stop.
+
+ Economic decision logic:
+   - If position complexity > threshold AND wallet balance > min_balance:
+     → Try external NVM coaching first (cross-team transaction)
+     → Fall back to internal Claude coaching
+     → Fall back to heuristic
+   - Track spending vs. performance to decide when to stop coaching
+ """
+ from __future__ import annotations
+
+ import logging
+ import os
+ from typing import Dict, List, Optional, Tuple
+
+ import httpx
+
+ logger = logging.getLogger(__name__)
+
+ # ── Config ─────────────────────────────────────────────────────────────────────
+ # External coaching service (another team's endpoint)
+ # Set EXTERNAL_COACHING_URL to use cross-team agent payments
+ EXTERNAL_COACHING_URL = os.getenv("EXTERNAL_COACHING_URL", "")
+ EXTERNAL_NVM_PLAN_ID = os.getenv("EXTERNAL_NVM_PLAN_ID", "")
+ EXTERNAL_NVM_AGENT_ID = os.getenv("EXTERNAL_NVM_AGENT_ID", "")
+
+ # Internal NVM credentials (for purchasing external services)
+ NVM_API_KEY = os.getenv("NVM_API_KEY", "")
+ NVM_ENVIRONMENT = os.getenv("NVM_ENVIRONMENT", "sandbox")
+
+ # Economic thresholds
+ EXTERNAL_COACHING_BUDGET = float(os.getenv("EXTERNAL_COACHING_BUDGET", "50.0"))
+ MIN_WALLET_FOR_EXTERNAL = float(os.getenv("MIN_WALLET_FOR_EXTERNAL", "20.0"))
+
+
+ class NvmPlayerAgent:
+     """
+     A chess player agent that makes autonomous economic decisions
+     using Nevermined for cross-team agent-to-agent payments.
+     """
+
+     def __init__(self, agent_id: str):
+         self.agent_id = agent_id
+         self._payments = None
+         self._nvm_available = False
+         self._external_token: Optional[str] = None
+         self._external_plan_ordered = False
+         self._total_external_spend = 0.0
+         self._external_calls = 0
+         self._external_successes = 0
+         self._init_nvm()
+
+     def _init_nvm(self):
+         """Initialize Nevermined SDK for purchasing external services."""
+         if not NVM_API_KEY:
+             logger.debug(f"Agent {self.agent_id}: NVM_API_KEY not set, external payments disabled")
+             return
+         try:
+             from payments_py import Payments, PaymentOptions
+             self._payments = Payments.get_instance(
+                 PaymentOptions(
+                     nvm_api_key=NVM_API_KEY,
+                     environment=NVM_ENVIRONMENT,
+                 )
+             )
+             self._nvm_available = True
+             logger.info(f"Agent {self.agent_id}: NVM SDK initialized")
+         except Exception as exc:
+             logger.warning(f"Agent {self.agent_id}: NVM init failed: {exc}")
+
+     # ── External coaching via NVM ──────────────────────────────────────────────
+     def can_use_external_coaching(self, wallet_balance: float) -> bool:
+         """
+         Decide whether to use external coaching based on:
+           - NVM availability
+           - External service configured
+           - Wallet balance above threshold
+           - Budget not exhausted
+         """
+         return (
+             self._nvm_available
+             and bool(EXTERNAL_COACHING_URL)
+             and bool(EXTERNAL_NVM_PLAN_ID)
+             and wallet_balance >= MIN_WALLET_FOR_EXTERNAL
+             and self._total_external_spend < EXTERNAL_COACHING_BUDGET
+         )
+
+     def _ensure_plan_ordered(self) -> bool:
+         """
+         Order the external coaching plan if not already done.
+         This is the 'buy' decision — agent autonomously purchases a service.
+         """
+         if self._external_plan_ordered:
+             return True
+         if not self._nvm_available or not EXTERNAL_NVM_PLAN_ID:
+             return False
+
+         try:
+             logger.info(
+                 f"Agent {self.agent_id}: Ordering external coaching plan {EXTERNAL_NVM_PLAN_ID}"
+             )
+             result = self._payments.plans.order_plan(EXTERNAL_NVM_PLAN_ID)
+             self._external_plan_ordered = True
+             logger.info(f"Agent {self.agent_id}: Plan ordered successfully: {result}")
+             return True
+         except Exception as exc:
+             logger.warning(f"Agent {self.agent_id}: Failed to order plan: {exc}")
+             return False
+
+     def _get_access_token(self) -> Optional[str]:
+         """
124
+ Get or refresh the x402 access token for the external coaching service.
125
+ """
126
+ if not self._nvm_available or not EXTERNAL_NVM_PLAN_ID:
127
+ return None
128
+
129
+ try:
130
+ result = self._payments.x402.get_x402_access_token(
131
+ plan_id=EXTERNAL_NVM_PLAN_ID,
132
+ agent_id=EXTERNAL_NVM_AGENT_ID or None,
133
+ )
134
+ token = result.get("accessToken") or result.get("access_token")
135
+ self._external_token = token
136
+ return token
137
+ except Exception as exc:
138
+ logger.warning(f"Agent {self.agent_id}: Failed to get access token: {exc}")
139
+ return None
140
+
141
+ def request_external_coaching(
142
+ self,
143
+ fen: str,
144
+ legal_moves: List[str],
145
+ game_id: str,
146
+ wallet_balance: float,
147
+ ) -> Optional[Dict]:
148
+ """
149
+ Request chess analysis from an external agent service via Nevermined.
150
+
151
+ This is the core cross-team agent-to-agent payment flow:
152
+ 1. Order plan (if not already done)
153
+ 2. Get x402 access token
154
+ 3. Call external endpoint with payment-signature header
155
+ 4. Track spending
156
+
157
+ Returns:
158
+ Analysis dict with 'recommended_move' and 'analysis', or None on failure.
159
+ """
160
+ if not self.can_use_external_coaching(wallet_balance):
161
+ return None
162
+
163
+ # Step 1: Ensure plan is ordered (buy decision)
164
+ if not self._ensure_plan_ordered():
165
+ logger.warning(f"Agent {self.agent_id}: Could not order external plan")
166
+ return None
167
+
168
+ # Step 2: Get access token (pay decision)
169
+ token = self._get_access_token()
170
+ if not token:
171
+ logger.warning(f"Agent {self.agent_id}: Could not get access token")
172
+ return None
173
+
174
+ # Step 3: Call external coaching endpoint
175
+ try:
176
+ self._external_calls += 1
177
+ response = httpx.post(
178
+ f"{EXTERNAL_COACHING_URL}/api/chess/analyze",
179
+ headers={
180
+ "Content-Type": "application/json",
181
+ "payment-signature": token,
182
+ },
183
+ json={
184
+ "fen": fen,
185
+ "legal_moves": legal_moves[:30], # Limit for API efficiency
186
+ "game_id": game_id,
187
+ "agent_id": self.agent_id,
188
+ },
189
+ timeout=10.0,
190
+ )
191
+
192
+ if response.status_code == 200:
193
+ data = response.json()
194
+ self._external_successes += 1
195
+ self._total_external_spend += 1.0 # 1 credit per call
196
+ logger.info(
197
+ f"Agent {self.agent_id}: External coaching success "
198
+ f"move={data.get('recommended_move')} "
199
+ f"model={data.get('model_used')} "
200
+ f"total_spend={self._total_external_spend}"
201
+ )
202
+ return data
203
+
204
+ elif response.status_code == 402:
205
+ logger.warning(
206
+ f"Agent {self.agent_id}: External coaching returned 402 — "
207
+ "insufficient credits or invalid token"
208
+ )
209
+ # Reset token so it gets refreshed next time
210
+ self._external_token = None
211
+ return None
212
+
213
+ else:
214
+ logger.warning(
215
+ f"Agent {self.agent_id}: External coaching returned {response.status_code}"
216
+ )
217
+ return None
218
+
219
+ except httpx.TimeoutException:
220
+ logger.warning(f"Agent {self.agent_id}: External coaching request timed out")
221
+ return None
222
+ except Exception as exc:
223
+ logger.error(f"Agent {self.agent_id}: External coaching request failed: {exc}")
224
+ return None
225
+
226
+ # ── Economic decision: switch / stop ───────────────────────────────────────
227
+ def should_stop_external_coaching(self) -> bool:
228
+ """
229
+ Autonomous 'stop' decision: stop buying external coaching if
230
+ the ROI is poor (low success rate) or budget is exhausted.
231
+ """
232
+ if self._total_external_spend >= EXTERNAL_COACHING_BUDGET:
233
+ logger.info(
234
+ f"Agent {self.agent_id}: External coaching budget exhausted "
235
+ f"(spent={self._total_external_spend:.1f})"
236
+ )
237
+ return True
238
+
239
+ if self._external_calls >= 10:
240
+ success_rate = self._external_successes / self._external_calls
241
+ if success_rate < 0.5:
242
+ logger.info(
243
+ f"Agent {self.agent_id}: Stopping external coaching due to low success rate "
244
+ f"({success_rate:.0%})"
245
+ )
246
+ return True
247
+
248
+ return False
249
+
250
+ def get_stats(self) -> Dict:
251
+ """Return agent economic stats for dashboard display."""
252
+ return {
253
+ "agent_id": self.agent_id,
254
+ "nvm_available": self._nvm_available,
255
+ "external_coaching_url": EXTERNAL_COACHING_URL or None,
256
+ "external_plan_id": EXTERNAL_NVM_PLAN_ID or None,
257
+ "plan_ordered": self._external_plan_ordered,
258
+ "external_calls": self._external_calls,
259
+ "external_successes": self._external_successes,
260
+ "total_external_spend": self._total_external_spend,
261
+ "success_rate": (
262
+ self._external_successes / self._external_calls
263
+ if self._external_calls > 0 else 0.0
264
+ ),
265
+ }
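
The budget and ROI bookkeeping above can be modeled on its own, without the SDK. Below is a minimal sketch of the agent's autonomous "stop" rule, mirroring `should_stop_external_coaching()`: spend is only incremented on successful calls (as in the module, where credits settle on HTTP 200). The `CoachingBudget` class and its defaults are illustrative, taken from this module's env-var defaults.

```python
# Hypothetical standalone model of the stop rule in NvmPlayerAgent.
class CoachingBudget:
    def __init__(self, budget: float = 50.0, min_calls: int = 10,
                 min_success_rate: float = 0.5):
        self.budget = budget                    # EXTERNAL_COACHING_BUDGET
        self.min_calls = min_calls              # warm-up before judging ROI
        self.min_success_rate = min_success_rate
        self.spent = 0.0
        self.calls = 0
        self.successes = 0

    def record(self, success: bool, cost: float = 1.0) -> None:
        self.calls += 1
        if success:
            self.successes += 1
            self.spent += cost  # credits settle only on successful calls

    def should_stop(self) -> bool:
        if self.spent >= self.budget:
            return True
        if self.calls >= self.min_calls:
            return self.successes / self.calls < self.min_success_rate
        return False
```

For example, 10 calls with only 3 successes trips the ROI check (30% < 50%), while a small budget trips the spend check first.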
backend/agents/qwen_agent.py ADDED
@@ -0,0 +1,228 @@
1
+ """
2
+ qwen_agent.py
3
+ ─────────────
4
+ Loads Qwen2.5-0.5B-Instruct (or any HuggingFace causal LM) and uses it to
5
+ generate chess moves given a position prompt.
6
+
7
+ Key responsibilities:
8
+ - Lazy model loading (first call triggers download + GPU placement)
9
+ - Illegal-move retry loop (up to settings.max_move_retries attempts)
10
+ - Log-probability extraction for GRPO training
11
+ - Temperature annealing hook (called by the trainer after each update)
12
+ """
13
+
14
+ import logging
15
+ import torch
16
+ from typing import Optional
17
+ from transformers import AutoTokenizer, AutoModelForCausalLM
18
+
19
+ from backend.settings import settings
20
+ from backend.chess_lib.chess_engine import ChessEngine
21
+
22
+ logger = logging.getLogger(__name__)
23
+
24
+ # ── Lazy singletons ───────────────────────────────────────────────────────────
25
+ _tokenizer = None
26
+ _model = None
27
+
28
+
29
+ def _load_model():
30
+ global _tokenizer, _model
31
+ if _model is not None:
32
+ return _tokenizer, _model
33
+
34
+ logger.info("Loading model: %s …", settings.player_model)
35
+
36
+ dtype_map = {
37
+ "float16": torch.float16,
38
+ "bfloat16": torch.bfloat16,
39
+ "float32": torch.float32,
40
+ }
41
+ torch_dtype = dtype_map.get(settings.torch_dtype, torch.bfloat16)
42
+
43
+ hf_kwargs = {}
44
+ if settings.hf_token:
45
+ hf_kwargs["token"] = settings.hf_token
46
+
47
+ _tokenizer = AutoTokenizer.from_pretrained(
48
+ settings.player_model,
49
+ trust_remote_code=True,
50
+ **hf_kwargs,
51
+ )
52
+
53
+ device_map = settings.device if settings.device != "auto" else "auto"
54
+
55
+ _model = AutoModelForCausalLM.from_pretrained(
56
+ settings.player_model,
57
+ torch_dtype=torch_dtype,
58
+ device_map=device_map,
59
+ trust_remote_code=True,
60
+ **hf_kwargs,
61
+ )
62
+ _model.eval()
63
+ logger.info("Model loaded on device: %s", next(_model.parameters()).device)
64
+
65
+ # Apply LoRA if requested
66
+ if settings.lora_rank > 0:
67
+ try:
68
+ from peft import get_peft_model, LoraConfig, TaskType # type: ignore
69
+ lora_config = LoraConfig(
70
+ task_type=TaskType.CAUSAL_LM,
71
+ r=settings.lora_rank,
72
+ lora_alpha=settings.lora_rank * 2,
73
+ lora_dropout=0.05,
74
+ target_modules=["q_proj", "v_proj"],
75
+ )
76
+ _model = get_peft_model(_model, lora_config)
77
+ _model.print_trainable_parameters()
78
+ logger.info("LoRA adapter applied (rank=%d)", settings.lora_rank)
79
+ except ImportError:
80
+ logger.warning("peft not installed — running without LoRA. pip install peft")
81
+
82
+ return _tokenizer, _model
83
+
84
+
85
+ class QwenAgent:
86
+ """
87
+ Wraps the Qwen model for chess move generation.
88
+
89
+ Usage:
90
+ agent = QwenAgent()
91
+ san, log_prob = agent.get_move(engine, "white", move_history)
92
+ """
93
+
94
+ def __init__(self):
95
+ self._temperature = settings.temperature
96
+
97
+ def set_temperature(self, temp: float):
98
+ """Called by the GRPO trainer to anneal temperature over training."""
99
+ self._temperature = max(0.1, temp)
100
+
101
+ @property
102
+ def temperature(self) -> float:
103
+ return self._temperature
104
+
105
+ def get_move(
106
+ self,
107
+ engine: ChessEngine,
108
+ agent_color: str,
109
+ move_history: list[str],
110
+ ) -> tuple[str, float]:
111
+ """
112
+ Generate a legal chess move for the given position.
113
+
114
+ Returns:
115
+ (san_move, log_prob)
116
+ - san_move: the chosen move in SAN notation
117
+ - log_prob: sum of log-probs of the generated tokens (for GRPO)
118
+
119
+ Falls back to a random legal move if all retries are exhausted.
120
+ """
121
+ tokenizer, model = _load_model()
122
+ prompt = engine.build_prompt(agent_color, move_history)
123
+
124
+ messages = [
125
+ {"role": "system", "content": "You are a chess engine. Reply with only the move."},
126
+ {"role": "user", "content": prompt},
127
+ ]
128
+
129
+ # Apply chat template
130
+ text = tokenizer.apply_chat_template(
131
+ messages,
132
+ tokenize=False,
133
+ add_generation_prompt=True,
134
+ )
135
+ inputs = tokenizer(text, return_tensors="pt").to(model.device)
136
+ input_len = inputs["input_ids"].shape[1]
137
+
138
+ best_san: Optional[str] = None
139
+ best_log_prob: float = 0.0
140
+
141
+ for attempt in range(settings.max_move_retries):
142
+ with torch.no_grad():
143
+ outputs = model.generate(
144
+ **inputs,
145
+ max_new_tokens=settings.max_new_tokens,
146
+ temperature=self._temperature,
147
+ do_sample=True,
148
+ pad_token_id=tokenizer.eos_token_id,
149
+ return_dict_in_generate=True,
150
+ output_scores=True,
151
+ )
152
+
153
+ generated_ids = outputs.sequences[0][input_len:]
154
+ generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
155
+
156
+ # Compute sum of log-probs for GRPO
157
+ log_prob = _compute_log_prob(outputs.scores, generated_ids)
158
+
159
+ san = engine.parse_model_output(generated_text)
160
+ if san is not None:
161
+ best_san = san
162
+ best_log_prob = log_prob
163
+ logger.debug(
164
+ "Move generated (attempt %d/%d): %s log_prob=%.4f",
165
+ attempt + 1, settings.max_move_retries, san, log_prob,
166
+ )
167
+ break
168
+ else:
169
+ logger.debug(
170
+ "Illegal/unparseable output (attempt %d/%d): %r",
171
+ attempt + 1, settings.max_move_retries, generated_text,
172
+ )
173
+
174
+ if best_san is None:
175
+ # All retries exhausted — fall back to random legal move
176
+ best_san = engine.random_legal_move_san() or "e4"
177
+ best_log_prob = 0.0
178
+ logger.warning("All retries exhausted — using random fallback move: %s", best_san)
179
+
180
+ return best_san, best_log_prob
181
+
182
+ def get_move_log_prob_only(
183
+ self,
184
+ engine: ChessEngine,
185
+ agent_color: str,
186
+ move_history: list[str],
187
+ san_move: str,
188
+ ) -> float:
189
+ """
190
+ Compute the log-probability of a specific move under the current policy.
191
+ Used by GRPO to evaluate the reference policy for KL computation.
192
+ """
193
+ tokenizer, model = _load_model()
194
+ prompt = engine.build_prompt(agent_color, move_history)
195
+ messages = [
196
+ {"role": "system", "content": "You are a chess engine. Reply with only the move."},
197
+ {"role": "user", "content": prompt},
198
+ ]
199
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
200
+ target_text = text + san_move
201
+ inputs = tokenizer(target_text, return_tensors="pt").to(model.device)
202
+ prompt_len = tokenizer(text, return_tensors="pt")["input_ids"].shape[1]
203
+
204
+ with torch.no_grad():
205
+ out = model(**inputs, labels=inputs["input_ids"])
206
+ # Extract per-token log-probs for the generated portion only
207
+ logits = out.logits[0, prompt_len - 1:-1]
208
+ target_ids = inputs["input_ids"][0, prompt_len:]
209
+ log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
210
+ selected = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
211
+ return selected.sum().item()
212
+
213
+
214
+ # ── Helpers ───────────────────────────────────────────────────────────────────
215
+
216
+ def _compute_log_prob(scores, generated_ids) -> float:
217
+ """
218
+ Compute the sum of log-probabilities for the generated token sequence.
219
+ `scores` is a tuple of (vocab_size,) tensors, one per generated step.
220
+ """
221
+ total = 0.0
222
+ for step, score in enumerate(scores):
223
+ if step >= len(generated_ids):
224
+ break
225
+ log_probs = torch.nn.functional.log_softmax(score[0], dim=-1)
226
+ total += log_probs[generated_ids[step]].item()
227
+ return total
228
+
backend/api/__init__.py ADDED
File without changes
backend/api/coaching_router.py ADDED
@@ -0,0 +1,274 @@
1
+ """
2
+ ChessEcon — Chess Analysis API Router (Nevermined-Protected)
3
+ =============================================================
4
+ Exposes POST /api/chess/analyze as a paid service endpoint using the
5
+ x402 payment protocol. Other teams' agents can:
6
+
7
+ 1. Discover this endpoint via the Nevermined marketplace
8
+ 2. Subscribe to the ChessEcon Coaching Plan (NVM_PLAN_ID)
9
+ 3. Generate an x402 access token
10
+ 4. Call this endpoint with the token in the `payment-signature` header
11
+ 5. Receive chess position analysis powered by Claude Opus 4.5
12
+
13
+ Payment flow:
14
+ - No token → HTTP 402 with `payment-required` header (base64-encoded spec)
15
+ - Invalid token → HTTP 402 with error reason
16
+ - Valid token → Analysis delivered, 1 credit settled automatically
17
+
18
+ The endpoint also works WITHOUT Nevermined (NVM_API_KEY not set) for
19
+ local development and testing — payment verification is skipped.
20
+ """
21
+ from __future__ import annotations
22
+
23
+ import base64
24
+ import logging
25
+ import os
26
+ from typing import Any, Dict, List, Optional
27
+
28
+ from fastapi import APIRouter, Request
29
+ from fastapi.responses import JSONResponse
30
+ from pydantic import BaseModel
31
+
32
+ from backend.agents.claude_coach import claude_coach
33
+ from backend.agents.complexity import ComplexityAnalyzer
34
+ from backend.economy.nvm_payments import nvm_manager, NVM_PLAN_ID, NVM_AGENT_ID
35
+ from shared.models import CoachingRequest
36
+
37
+ logger = logging.getLogger(__name__)
38
+ router = APIRouter(prefix="/api/chess", tags=["chess-analysis"])
39
+
40
+ # ── Request / Response models ──────────────────────────────────────────────────
41
+ class AnalyzeRequest(BaseModel):
42
+ """Chess position analysis request."""
43
+ fen: str
44
+ legal_moves: List[str]
45
+ game_id: Optional[str] = "external"
46
+ agent_id: Optional[str] = "external_agent"
47
+ context: Optional[str] = None # Optional game context for richer analysis
48
+
49
+
50
+ class AnalyzeResponse(BaseModel):
51
+ """Chess position analysis response."""
52
+ recommended_move: str
53
+ analysis: str
54
+ complexity_score: float
55
+ complexity_level: str
56
+ model_used: str
57
+ credits_used: int = 1
58
+ nvm_plan_id: Optional[str] = None
59
+ nvm_agent_id: Optional[str] = None
60
+
61
+
62
+ # ── Payment helper ─────────────────────────────────────────────────────────────
63
+ def _make_402_response(endpoint: str, http_verb: str = "POST") -> JSONResponse:
64
+ """
65
+ Return an HTTP 402 response with the x402 payment-required header.
66
+ The header contains a base64-encoded PaymentRequired specification
67
+ that tells clients exactly how to pay for this service.
68
+ """
69
+ payment_required = nvm_manager.build_payment_required(endpoint, http_verb)
70
+
71
+ if payment_required is None:
72
+ # NVM not configured — return plain 402
73
+ return JSONResponse(
74
+ status_code=402,
75
+ content={
76
+ "error": "Payment Required",
77
+ "message": (
78
+ "This endpoint requires a Nevermined payment token. "
79
+ f"Subscribe to plan {NVM_PLAN_ID} and include "
80
+ "the x402 access token in the 'payment-signature' header."
81
+ ),
82
+ "nvm_plan_id": NVM_PLAN_ID or None,
83
+ "nvm_agent_id": NVM_AGENT_ID or None,
84
+ "docs": "https://nevermined.ai/docs/integrate/quickstart/5-minute-setup",
85
+ },
86
+ )
87
+
88
+ # Encode the payment spec per x402 spec
89
+ pr_json = payment_required.model_dump_json(by_alias=True)
90
+ pr_base64 = base64.b64encode(pr_json.encode()).decode()
91
+
92
+ return JSONResponse(
93
+ status_code=402,
94
+ content={
95
+ "error": "Payment Required",
96
+ "message": (
97
+ "Include your x402 access token in the 'payment-signature' header. "
98
+ f"Subscribe to plan: {NVM_PLAN_ID}"
99
+ ),
100
+ "nvm_plan_id": NVM_PLAN_ID or None,
101
+ "nvm_agent_id": NVM_AGENT_ID or None,
102
+ "docs": "https://nevermined.ai/docs/integrate/quickstart/5-minute-setup",
103
+ },
104
+ headers={"payment-required": pr_base64},
105
+ )
106
+
107
+
108
+ # ── Main endpoint ──────────────────────────────────────────────────────────────
109
+ @router.post("/analyze", response_model=AnalyzeResponse)
110
+ async def analyze_position(request: Request, body: AnalyzeRequest):
111
+ """
112
+ **Paid chess position analysis endpoint.**
113
+
114
+ Analyzes a chess position and returns the best move recommendation
115
+ with strategic reasoning, powered by Claude Opus 4.5.
116
+
117
+ **Payment:**
118
+ - Requires a Nevermined x402 access token in the `payment-signature` header
119
+ - Each call costs 1 credit from your subscribed plan
120
+ - Subscribe at: https://nevermined.app/en/subscription/{NVM_PLAN_ID}
121
+
122
+ **Without payment (NVM not configured):**
123
+ - Falls back to heuristic analysis (no Claude)
124
+
125
+ **Request body:**
126
+ ```json
127
+ {
128
+ "fen": "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1",
129
+ "legal_moves": ["e7e5", "d7d5", "g8f6", ...],
130
+ "game_id": "game_001",
131
+ "agent_id": "my_agent"
132
+ }
133
+ ```
134
+
135
+ **Headers:**
136
+ - `payment-signature`: x402 access token (required when NVM is active)
137
+
138
+ **Response:**
139
+ ```json
140
+ {
141
+ "recommended_move": "e7e5",
142
+ "analysis": "The move e7e5 controls the center...",
143
+ "complexity_score": 0.42,
144
+ "complexity_level": "moderate",
145
+ "model_used": "claude-opus-4-5",
146
+ "credits_used": 1
147
+ }
148
+ ```
149
+ """
150
+ endpoint_url = str(request.url)
151
+ http_verb = request.method
152
+
153
+ # ── x402 Payment Verification ──────────────────────────────────────────────
154
+ x402_token = request.headers.get("payment-signature")
155
+
156
+ if nvm_manager.available:
157
+ if not x402_token:
158
+ logger.info(
159
+ f"No payment-signature header for /api/chess/analyze "
160
+ f"from {request.client.host if request.client else 'unknown'}"
161
+ )
162
+ return _make_402_response(endpoint_url, http_verb)
163
+
164
+ is_valid, reason = nvm_manager.verify_token(
165
+ x402_token=x402_token,
166
+ endpoint=endpoint_url,
167
+ http_verb=http_verb,
168
+ max_credits="1",
169
+ )
170
+
171
+ if not is_valid:
172
+ logger.warning(f"Payment verification failed: {reason}")
173
+ return JSONResponse(
174
+ status_code=402,
175
+ content={
176
+ "error": "Payment Verification Failed",
177
+ "reason": reason,
178
+ "nvm_plan_id": NVM_PLAN_ID or None,
179
+ },
180
+ )
181
+
182
+ # ── Chess Analysis ─────────────────────────────────────────────────────────
183
+ # Assess position complexity
184
+ analyzer = ComplexityAnalyzer()
185
+ complexity = analyzer.analyze(body.fen, body.legal_moves)
186
+
187
+ # Build coaching request
188
+ coaching_req = CoachingRequest(
189
+ game_id=body.game_id or "external",
190
+ agent_id=body.agent_id or "external_agent",
191
+ fen=body.fen,
192
+ legal_moves=body.legal_moves,
193
+ wallet_balance=0.0, # External agents don't use internal wallet
194
+ complexity=complexity,
195
+ )
196
+
197
+ # Get analysis from Claude (or fallback)
198
+ coaching_resp = claude_coach.analyze(coaching_req)
199
+
200
+ # ── Settle Credits ─────────────────────────────────────────────────────────
201
+ if nvm_manager.available and x402_token:
202
+ nvm_manager.settle_token(
203
+ x402_token=x402_token,
204
+ endpoint=endpoint_url,
205
+ http_verb=http_verb,
206
+ max_credits="1",
207
+ )
208
+
209
+ response_data = AnalyzeResponse(
210
+ recommended_move=coaching_resp.recommended_move,
211
+ analysis=coaching_resp.analysis,
212
+ complexity_score=complexity.score,
213
+ complexity_level=complexity.level.value,
214
+ model_used=coaching_resp.model_used,
215
+ credits_used=1,
216
+ nvm_plan_id=NVM_PLAN_ID or None,
217
+ nvm_agent_id=NVM_AGENT_ID or None,
218
+ )
219
+
220
+ logger.info(
221
+ f"Chess analysis served: game={body.game_id} "
222
+ f"agent={body.agent_id} move={coaching_resp.recommended_move} "
223
+ f"model={coaching_resp.model_used} "
224
+ f"nvm={'settled' if (nvm_manager.available and x402_token) else 'skipped'}"
225
+ )
226
+
227
+ return response_data
228
+
229
+
230
+ # ── Service info endpoint (public, no payment required) ────────────────────────
231
+ @router.get("/service-info")
232
+ async def service_info():
233
+ """
234
+ Public endpoint returning ChessEcon service information.
235
+ Other agents can call this to discover how to subscribe and pay.
236
+ """
237
+ return {
238
+ "service": "ChessEcon Chess Analysis",
239
+ "description": (
240
+ "Premium chess position analysis powered by Claude Opus 4.5. "
241
+ "Subscribe to get best-move recommendations and strategic coaching."
242
+ ),
243
+ "endpoint": "/api/chess/analyze",
244
+ "method": "POST",
245
+ "payment": {
246
+ "protocol": "x402",
247
+ "nvm_plan_id": NVM_PLAN_ID or "not configured",
248
+ "nvm_agent_id": NVM_AGENT_ID or "not configured",
249
+ "credits_per_request": 1,
250
+ "marketplace_url": (
251
+ f"https://nevermined.app/en/subscription/{NVM_PLAN_ID}"
252
+ if NVM_PLAN_ID else "not configured"
253
+ ),
254
+ "how_to_subscribe": [
255
+ "1. Get NVM API key at https://nevermined.app",
256
+ "2. Call payments.plans.order_plan(NVM_PLAN_ID)",
257
+ "3. Call payments.x402.get_x402_access_token(NVM_PLAN_ID, NVM_AGENT_ID)",
258
+ "4. Include token in 'payment-signature' header",
259
+ ],
260
+ },
261
+ "nvm_available": nvm_manager.available,
262
+ "claude_available": claude_coach.available,
263
+ "docs": "https://nevermined.ai/docs/integrate/quickstart/5-minute-setup",
264
+ }
265
+
266
+
267
+ # ── NVM transaction history (for dashboard) ────────────────────────────────────
268
+ @router.get("/nvm-transactions")
269
+ async def get_nvm_transactions(limit: int = 50):
270
+ """Return recent Nevermined payment transactions for dashboard display."""
271
+ return {
272
+ "transactions": nvm_manager.get_transactions(limit=limit),
273
+ "nvm_status": nvm_manager.get_status(),
274
+ }
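
The 402 flow above base64-encodes a JSON payment specification into the `payment-required` header. A minimal round-trip of that encoding is sketched below; the field names here are illustrative, not the actual Nevermined x402 schema (which `nvm_manager.build_payment_required` produces via the SDK):

```python
import base64
import json

def encode_payment_required(plan_id: str, endpoint: str, verb: str = "POST") -> str:
    # Hypothetical spec shape; real field names come from the Nevermined SDK.
    spec = {"planId": plan_id, "endpoint": endpoint,
            "httpVerb": verb, "maxCredits": "1"}
    return base64.b64encode(json.dumps(spec).encode()).decode()

def decode_payment_required(header_value: str) -> dict:
    # Client side: decode the header to learn how to pay for the service.
    return json.loads(base64.b64decode(header_value))
```

A client receiving the 402 would decode the header, order the named plan, fetch an x402 token, and retry with it in `payment-signature`.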
backend/api/game_router.py ADDED
@@ -0,0 +1,295 @@
1
+ """
2
+ ChessEcon Backend — Game Router
3
+ REST endpoints for game management + WebSocket game runner that
4
+ orchestrates full games between agents and streams events live.
5
+ """
6
+ from __future__ import annotations
7
+ import asyncio
8
+ import random
9
+ import uuid
10
+ import logging
11
+ from typing import Optional
12
+ from fastapi import APIRouter, HTTPException, WebSocket, WebSocketDisconnect
13
+
14
+ from shared.models import (
15
+ GameState, NewGameResponse, MoveRequest, MoveResponse,
16
+ GameOutcome, EventType, WSEvent,
17
+ CoachingRequest, ComplexityAnalysis, PositionComplexity,
18
+ )
19
+ from backend.chess_lib.engine import chess_engine
20
+ from backend.economy.ledger import ledger
21
+ from backend.agents.complexity import complexity_analyzer
22
+ from backend.agents.claude_coach import claude_coach
23
+ from backend.api.websocket import (
24
+ ws_manager, emit_game_start, emit_move,
25
+ emit_coaching_request, emit_coaching_result,
26
+ emit_game_end, emit_economy_update,
27
+ )
28
+
29
+ logger = logging.getLogger(__name__)
30
+ router = APIRouter(prefix="/api/game", tags=["game"])
31
+
32
+ # Track cumulative P&L per agent across sessions
33
+ _cumulative_pnl: dict = {}
34
+
35
+
36
+ # ── REST endpoints ────────────────────────────────────────────────────────────
37
+
38
+ @router.post("/new", response_model=NewGameResponse)
39
+ async def new_game(white_id: str = "white", black_id: str = "black"):
40
+ """Create a new chess game and register agents."""
41
+ ledger.register_agent(white_id)
42
+ ledger.register_agent(black_id)
43
+ game = chess_engine.new_game()
44
+ return game
45
+
46
+ @router.get("/{game_id}", response_model=GameState)
47
+ async def get_game(game_id: str):
48
+ try:
49
+ return chess_engine.get_state(game_id)
50
+ except KeyError:
51
+ raise HTTPException(status_code=404, detail=f"Game {game_id} not found")
52
+
53
+ @router.post("/move", response_model=GameState)
54
+ async def make_move(req: MoveRequest):
55
+ try:
56
+ state = chess_engine.make_move(req.game_id, req.move_uci)
57
+ return state
58
+ except KeyError:
59
+ raise HTTPException(status_code=404, detail=f"Game {req.game_id} not found")
60
+ except ValueError as e:
61
+ raise HTTPException(status_code=400, detail=str(e))
62
+
63
+ @router.delete("/{game_id}")
64
+ async def delete_game(game_id: str):
65
+ chess_engine.delete_game(game_id)
66
+ return {"deleted": game_id}
67
+
68
+ @router.get("/")
69
+ async def list_games():
70
+ return {"games": chess_engine.list_games()}
71
+
72
+ @router.get("/economy/summary")
73
+ async def economy_summary():
74
+ return ledger.summary()
75
+
76
+ @router.get("/economy/wallet/{agent_id}")
77
+ async def get_wallet(agent_id: str):
78
+ return ledger.get_wallet(agent_id).model_dump()
79
+
80
+
81
+ # ── WebSocket game runner ─────────────────────────────────────────────────────
82
+
83
+ @router.websocket("/ws/game")
84
+ async def websocket_game_runner(ws: WebSocket):
85
+ """
86
+ WebSocket endpoint that runs a full game when connected.
87
+ Streams all events (moves, coaching, economy) to all dashboard clients.
88
+ """
89
+ await ws_manager.connect(ws)
90
+ try:
91
+ while True:
92
+ data = await ws.receive_text()
93
+ import json  # stdlib; imported locally so this handler is self-contained
+ msg = json.loads(data)
94
+ if msg.get("action") == "start_game":
95
+ white_id = msg.get("white_id", "white_agent")
96
+ black_id = msg.get("black_id", "black_agent")
97
+ asyncio.create_task(run_game(white_id, black_id))
98
+ except WebSocketDisconnect:
99
+ await ws_manager.disconnect(ws)
100
+
101
+
102
+ async def run_game(
103
+ white_id: str = "white_agent",
104
+ black_id: str = "black_agent",
105
+ game_number: int = 1,
106
+ ) -> Optional[GameOutcome]:
107
+ """
108
+ Run a complete game between two heuristic agents with economic tracking.
109
+ Streams all events via the WebSocket manager.
110
+ """
111
+ # Register agents and open tournament
112
+ ledger.register_agent(white_id)
113
+ ledger.register_agent(black_id)
114
+
115
+     game = chess_engine.new_game()
+     game_id = game.game_id
+     pool = ledger.open_game(game_id, white_id, black_id)
+
+     white_wallet = ledger.get_balance(white_id)
+     black_wallet = ledger.get_balance(black_id)
+
+     await emit_game_start(ws_manager, {
+         "game_id": game_id,
+         "game_number": game_number,
+         "white_agent": white_id,
+         "black_agent": black_id,
+         "white_wallet": white_wallet,
+         "black_wallet": black_wallet,
+         "entry_fee": ledger.config.entry_fee,
+         "prize_pool": pool,
+     })
+
+     max_moves = 150
+     move_count = 0
+     coaching_calls = {"white": 0, "black": 0}
+     coaching_costs = {"white": 0.0, "black": 0.0}
+
+     while move_count < max_moves:
+         state = chess_engine.get_state(game_id)
+         if state.outcome != GameOutcome.ONGOING:
+             break
+
+         # Determine current player
+         is_white_turn = (move_count % 2 == 0)
+         current_agent = white_id if is_white_turn else black_id
+         player_label = "white" if is_white_turn else "black"
+
+         # Complexity analysis
+         features = chess_engine.complexity_features(game_id)
+         features["fen"] = state.fen
+         analysis = complexity_analyzer.analyze(features)
+
+         # Decide whether to use coaching
+         used_coaching = False
+         coaching_move: Optional[str] = None
+
+         if (
+             analysis.recommend_coaching
+             and ledger.can_afford_coaching(current_agent)
+             and claude_coach.available
+             and random.random() < 0.3  # 30% chance when eligible
+         ):
+             await emit_coaching_request(ws_manager, {
+                 "game_id": game_id,
+                 "agent_id": current_agent,
+                 "player": player_label,
+                 "complexity": analysis.score,
+                 "complexity_level": analysis.level.value,
+                 "wallet": ledger.get_balance(current_agent),
+             })
+
+             fee = ledger.charge_coaching(current_agent, game_id)
+             if fee > 0:
+                 coaching_req = CoachingRequest(
+                     game_id=game_id,
+                     agent_id=current_agent,
+                     fen=state.fen,
+                     legal_moves=state.legal_moves,
+                     wallet_balance=ledger.get_balance(current_agent),
+                     complexity=analysis,
+                 )
+                 coaching_resp = claude_coach.analyze(coaching_req)
+                 coaching_move = coaching_resp.recommended_move
+                 used_coaching = True
+                 coaching_calls[player_label] += 1
+                 coaching_costs[player_label] += fee
+
+                 await emit_coaching_result(ws_manager, {
+                     "game_id": game_id,
+                     "agent_id": current_agent,
+                     "player": player_label,
+                     "recommended_move": coaching_move,
+                     "analysis_snippet": coaching_resp.analysis[:200],
+                     "cost": fee,
+                     "model": coaching_resp.model_used,
+                 })
+
+         # Select move
+         if coaching_move and coaching_move in state.legal_moves:
+             move_uci = coaching_move
+         else:
+             move_uci = _heuristic_move(state.legal_moves, state.fen)
+
+         # Execute move
+         try:
+             new_state = chess_engine.make_move(game_id, move_uci)
+         except ValueError as e:
+             logger.warning(f"Invalid move {move_uci}: {e} — using random")
+             move_uci = random.choice(state.legal_moves)
+             new_state = chess_engine.make_move(game_id, move_uci)
+
+         move_count += 1
+         white_wallet = ledger.get_balance(white_id)
+         black_wallet = ledger.get_balance(black_id)
+
+         await emit_move(ws_manager, {
+             "game_id": game_id,
+             "player": player_label,
+             "move_uci": move_uci,
+             "fen": new_state.fen,
+             "move_number": new_state.move_number,
+             "wallet_white": white_wallet,
+             "wallet_black": black_wallet,
+             "used_coaching": used_coaching,
+             "complexity": analysis.score,
+         })
+
+         # Small delay for visual effect
+         await asyncio.sleep(0.3)
+
+     # Settle game
+     final_state = chess_engine.get_state(game_id)
+     outcome = final_state.outcome
+     if outcome == GameOutcome.ONGOING:
+         outcome = GameOutcome.DRAW  # Treat max-move games as draws
+
+     result = ledger.settle_game(game_id, outcome)
+     chess_engine.delete_game(game_id)
+
+     white_final = ledger.get_balance(white_id)
+     black_final = ledger.get_balance(black_id)
+
+     # Compute P&L for economy update:
+     # full pool for a decisive game, half the pool if the draw split it
+     entry_fee = ledger.config.entry_fee
+     prize_income = result.prize_paid if result.winner else result.prize_paid / 2
+     total_coaching = coaching_costs["white"] + coaching_costs["black"]
+     net_pnl = prize_income - entry_fee - total_coaching
+
+     # Track cumulative P&L: the prize goes to the winner only,
+     # half each on a draw, and losers collect nothing
+     for aid, label in ((white_id, "white"), (black_id, "black")):
+         if result.winner == aid:
+             income = result.prize_paid
+         elif result.winner is None:
+             income = result.prize_paid / 2
+         else:
+             income = 0.0
+         _cumulative_pnl[aid] = (
+             _cumulative_pnl.get(aid, 0.0) + income - entry_fee - coaching_costs[label]
+         )
+
+     await emit_game_end(ws_manager, {
+         "game_id": game_id,
+         "game_number": game_number,
+         "outcome": outcome.value,
+         "winner": result.winner,
+         "white_wallet_final": white_final,
+         "black_wallet_final": black_final,
+         "prize_paid": result.prize_paid,
+         "total_moves": move_count,
+         "coaching_calls_white": coaching_calls["white"],
+         "coaching_calls_black": coaching_calls["black"],
+     })
+
+     await emit_economy_update(ws_manager, {
+         "game_number": game_number,
+         "white_wallet": white_final,
+         "black_wallet": black_final,
+         "prize_income": result.prize_paid,
+         "coaching_cost": total_coaching,
+         "entry_fee": entry_fee * 2,
+         "net_pnl": net_pnl,
+         "cumulative_pnl": _cumulative_pnl.get(white_id, 0.0),
+     })
+
+     return outcome
+
+
+ def _heuristic_move(legal_moves: list, fen: str) -> str:
+     """Simple heuristic: prefer captures and center moves, else random."""
+     import chess as _chess
+     board = _chess.Board(fen)
+     captures = [m.uci() for m in board.legal_moves if board.is_capture(m)]
+     if captures:
+         return random.choice(captures)
+     center = ["e2e4", "d2d4", "e7e5", "d7d5", "g1f3", "b1c3"]
+     center_moves = [m for m in center if m in legal_moves]
+     if center_moves:
+         return random.choice(center_moves)
+     return random.choice(legal_moves)
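The settlement arithmetic at the end of the loop is easy to get wrong (the prize belongs to the winner only; a draw splits it). This standalone sketch mirrors that per-agent rule with hypothetical fee and pool values (an 18.0 pool from two 10.0 entry fees at a 0.9 payout multiplier); `per_agent_pnl` is an illustration, not a function from the codebase:

```python
def per_agent_pnl(agent_id, winner, prize_paid, entry_fee, coaching_cost):
    """Net P&L for one agent in one game: prize income minus fixed costs."""
    if winner == agent_id:
        income = prize_paid          # decisive win: full pool
    elif winner is None:
        income = prize_paid / 2      # draw: assume an even split of the pool
    else:
        income = 0.0                 # loss: no prize income
    return income - entry_fee - coaching_cost

# Hypothetical numbers: 10.0 entry fee, 18.0 pool (2 x 10.0 at 90% payout)
win  = per_agent_pnl("white", "white", 18.0, 10.0, 0.0)   # 18 - 10 = +8.0
draw = per_agent_pnl("white", None,    18.0, 10.0, 5.0)   # 9 - 10 - 5 = -6.0
loss = per_agent_pnl("white", "black", 18.0, 10.0, 0.0)   # 0 - 10 = -10.0
```

Note the zero-sum check: across both players a decisive game nets 18 − 20 = −2.0 plus coaching fees, which is exactly the house's 10% cut of the pool.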
backend/api/training_router.py ADDED
@@ -0,0 +1,75 @@
+ """
+ ChessEcon Backend — Training Status Router
+ REST endpoints for monitoring training progress.
+ The actual training runs in the separate training/ service.
+ """
+ from __future__ import annotations
+ import os
+ import json
+ import logging
+ from pathlib import Path
+ from fastapi import APIRouter
+
+ logger = logging.getLogger(__name__)
+ router = APIRouter(prefix="/api/training", tags=["training"])
+
+ CHECKPOINT_DIR = os.getenv("CHECKPOINT_DIR", "./training/checkpoints")
+ SELFPLAY_DATA_DIR = os.getenv("SELFPLAY_DATA_DIR", "./training/data")
+
+
+ @router.get("/status")
+ async def training_status():
+     """Return current training status from the checkpoint directory."""
+     checkpoint_dir = Path(CHECKPOINT_DIR)
+     if not checkpoint_dir.exists():
+         return {"status": "not_started", "checkpoints": [], "latest_step": 0}
+
+     checkpoints = sorted(checkpoint_dir.glob("step_*"), key=lambda p: p.stat().st_mtime)
+     latest_step = 0
+     latest_metrics = {}
+
+     if checkpoints:
+         latest = checkpoints[-1]
+         metrics_file = latest / "metrics.json"
+         if metrics_file.exists():
+             with open(metrics_file) as f:
+                 latest_metrics = json.load(f)
+         latest_step = int(latest.name.replace("step_", ""))
+
+     return {
+         "status": "running" if checkpoints else "not_started",
+         "latest_step": latest_step,
+         "checkpoints": [c.name for c in checkpoints[-5:]],
+         "latest_metrics": latest_metrics,
+     }
+
+
+ @router.get("/metrics")
+ async def training_metrics():
+     """Return all training metrics from saved checkpoints."""
+     checkpoint_dir = Path(CHECKPOINT_DIR)
+     if not checkpoint_dir.exists():
+         return {"metrics": []}
+
+     all_metrics = []
+     for metrics_file in sorted(checkpoint_dir.glob("*/metrics.json")):
+         try:
+             with open(metrics_file) as f:
+                 all_metrics.append(json.load(f))
+         except Exception as e:
+             logger.warning(f"Skipping unreadable metrics file {metrics_file}: {e}")
+
+     return {"metrics": all_metrics}
+
+
+ @router.get("/episodes")
+ async def episode_count():
+     """Return count of collected self-play episodes."""
+     data_dir = Path(SELFPLAY_DATA_DIR)
+     if not data_dir.exists():
+         return {"count": 0, "files": []}
+
+     files = sorted(data_dir.glob("*.jsonl"))
+     total = 0
+     for f in files:
+         with f.open() as fh:
+             total += sum(1 for _ in fh)
+     return {"count": total, "files": [f.name for f in files[-5:]]}
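The `/status` endpoint above is just a filesystem scan: find `step_*` folders under the checkpoint directory and read `metrics.json` from the newest one. That scan can be exercised without FastAPI; this sketch uses a temporary directory and made-up metric values, and sorts numerically by step number (rather than by mtime as the router does), which is more robust on filesystems with coarse timestamp resolution:

```python
import json
import tempfile
from pathlib import Path

def latest_checkpoint_status(checkpoint_dir: Path) -> dict:
    """Mirror of the /status scan: the highest-numbered step_* folder wins."""
    if not checkpoint_dir.exists():
        return {"status": "not_started", "latest_step": 0, "latest_metrics": {}}
    checkpoints = sorted(checkpoint_dir.glob("step_*"),
                         key=lambda p: int(p.name.removeprefix("step_")))
    if not checkpoints:
        return {"status": "not_started", "latest_step": 0, "latest_metrics": {}}
    latest = checkpoints[-1]
    metrics_file = latest / "metrics.json"
    metrics = json.loads(metrics_file.read_text()) if metrics_file.exists() else {}
    return {
        "status": "running",
        "latest_step": int(latest.name.removeprefix("step_")),
        "latest_metrics": metrics,
    }

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Fabricated checkpoints for the demo: step 100 then step 200
    for step, loss in [(100, 1.9), (200, 1.4)]:
        d = root / f"step_{step}"
        d.mkdir()
        (d / "metrics.json").write_text(json.dumps({"step": step, "loss": loss}))
    status = latest_checkpoint_status(root)
```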
backend/api/websocket.py ADDED
@@ -0,0 +1,97 @@
+ """
+ ChessEcon Backend — WebSocket Event Bus
+ Broadcasts real-time game events, training metrics, and economy updates
+ to all connected frontend clients.
+ """
+ from __future__ import annotations
+ import asyncio
+ import logging
+ from typing import Set
+ from fastapi import WebSocket
+ from shared.models import WSEvent, EventType
+
+ logger = logging.getLogger(__name__)
+
+
+ class ConnectionManager:
+     """Manages all active WebSocket connections and broadcasts events."""
+
+     def __init__(self):
+         self._connections: Set[WebSocket] = set()
+         self._lock = asyncio.Lock()
+
+     async def connect(self, ws: WebSocket) -> None:
+         await ws.accept()
+         async with self._lock:
+             self._connections.add(ws)
+         logger.info(f"WebSocket connected. Total: {len(self._connections)}")
+
+     async def disconnect(self, ws: WebSocket) -> None:
+         async with self._lock:
+             self._connections.discard(ws)
+         logger.info(f"WebSocket disconnected. Total: {len(self._connections)}")
+
+     async def broadcast(self, event: WSEvent) -> None:
+         """Send an event to all connected clients, pruning dead connections."""
+         if not self._connections:
+             return
+         payload = event.model_dump_json()
+         dead: Set[WebSocket] = set()
+         async with self._lock:
+             connections = set(self._connections)
+         for ws in connections:
+             try:
+                 await ws.send_text(payload)
+             except Exception:
+                 dead.add(ws)
+         if dead:
+             async with self._lock:
+                 self._connections -= dead
+
+     async def broadcast_raw(self, data: dict) -> None:
+         """Broadcast a raw dict, mapping its 'type' field to a typed event."""
+         type_map = {
+             "game_start": EventType.GAME_START,
+             "move": EventType.MOVE,
+             "coaching_request": EventType.COACHING_REQUEST,
+             "coaching_result": EventType.COACHING_RESULT,
+             "game_end": EventType.GAME_END,
+             "training_step": EventType.TRAINING_STEP,
+             "economy_update": EventType.ECONOMY_UPDATE,
+         }
+         event_type = type_map.get(data.get("type", ""), EventType.MOVE)
+         event = WSEvent(type=event_type, data=data.get("data", data))
+         await self.broadcast(event)
+
+     @property
+     def connection_count(self) -> int:
+         return len(self._connections)
+
+
+ # ── Helper functions for emitting typed events ────────────────────────────────
+
+ async def emit_game_start(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.GAME_START, data=data))
+
+ async def emit_move(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.MOVE, data=data))
+
+ async def emit_coaching_request(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.COACHING_REQUEST, data=data))
+
+ async def emit_coaching_result(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.COACHING_RESULT, data=data))
+
+ async def emit_game_end(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.GAME_END, data=data))
+
+ async def emit_training_step(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.TRAINING_STEP, data=data))
+
+ async def emit_economy_update(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.ECONOMY_UPDATE, data=data))
+
+
+ # Singleton
+ ws_manager = ConnectionManager()
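The fan-out-and-prune pattern in `broadcast` is worth isolating: copy the connection set, send to every member, collect the sockets that raise, and subtract them afterwards so a dead client never aborts the whole broadcast. A minimal sketch with a stand-in socket class (`FakeSocket` is invented for the demo; no FastAPI or locks involved):

```python
import asyncio

class FakeSocket:
    """Stand-in for a WebSocket: records payloads, or fails if closed."""
    def __init__(self, closed: bool = False):
        self.closed = closed
        self.sent = []

    async def send_text(self, payload: str) -> None:
        if self.closed:
            raise RuntimeError("connection closed")
        self.sent.append(payload)

async def broadcast(connections: set, payload: str) -> set:
    """Send to every client; return the connection set with dead sockets pruned."""
    dead = set()
    for ws in set(connections):      # iterate over a copy, as the manager does
        try:
            await ws.send_text(payload)
        except Exception:
            dead.add(ws)
    return connections - dead

live, stale = FakeSocket(), FakeSocket(closed=True)
remaining = asyncio.run(broadcast({live, stale}, '{"type": "move"}'))
```

The stale socket is dropped from `remaining`, while the live one receives the payload; the real manager does the same subtraction under its lock.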
backend/api/websocket.py_backup ADDED
@@ -0,0 +1,87 @@
+ """
+ ChessEcon Backend — WebSocket Event Bus
+ Broadcasts real-time game events, training metrics, and economy updates
+ to all connected frontend clients.
+ """
+ from __future__ import annotations
+ import asyncio
+ import json
+ import logging
+ from typing import Set, Any
+ from fastapi import WebSocket, WebSocketDisconnect
+ from shared.models import WSEvent, EventType
+
+ logger = logging.getLogger(__name__)
+
+
+ class ConnectionManager:
+     """Manages all active WebSocket connections and broadcasts events."""
+
+     def __init__(self):
+         self._connections: Set[WebSocket] = set()
+         self._lock = asyncio.Lock()
+
+     async def connect(self, ws: WebSocket) -> None:
+         await ws.accept()
+         async with self._lock:
+             self._connections.add(ws)
+         logger.info(f"WebSocket connected. Total: {len(self._connections)}")
+
+     async def disconnect(self, ws: WebSocket) -> None:
+         async with self._lock:
+             self._connections.discard(ws)
+         logger.info(f"WebSocket disconnected. Total: {len(self._connections)}")
+
+     async def broadcast(self, event: WSEvent) -> None:
+         """Send an event to all connected clients."""
+         if not self._connections:
+             return
+         payload = event.model_dump_json()
+         dead: Set[WebSocket] = set()
+         async with self._lock:
+             connections = set(self._connections)
+         for ws in connections:
+             try:
+                 await ws.send_text(payload)
+             except Exception:
+                 dead.add(ws)
+         if dead:
+             async with self._lock:
+                 self._connections -= dead
+
+     async def broadcast_raw(self, data: dict) -> None:
+         """Broadcast a raw dictionary as JSON."""
+         event = WSEvent(type=EventType.MOVE, data=data)
+         await self.broadcast(event)
+
+     @property
+     def connection_count(self) -> int:
+         return len(self._connections)
+
+
+ # ── Helper functions for emitting typed events ────────────────────────────────
+
+ async def emit_game_start(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.GAME_START, data=data))
+
+ async def emit_move(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.MOVE, data=data))
+
+ async def emit_coaching_request(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.COACHING_REQUEST, data=data))
+
+ async def emit_coaching_result(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.COACHING_RESULT, data=data))
+
+ async def emit_game_end(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.GAME_END, data=data))
+
+ async def emit_training_step(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.TRAINING_STEP, data=data))
+
+ async def emit_economy_update(manager: ConnectionManager, data: dict) -> None:
+     await manager.broadcast(WSEvent(type=EventType.ECONOMY_UPDATE, data=data))
+
+
+ # Singleton
+ ws_manager = ConnectionManager()
backend/chess_engine.py ADDED
@@ -0,0 +1,186 @@
+ """
+ chess_engine.py
+ ───────────────
+ Thin wrapper around python-chess providing:
+ - Board state management
+ - Legal move validation and parsing
+ - FEN / SAN / UCI conversion helpers
+ - Reward calculation after game end
+ """
+
+ import chess
+ import chess.pgn
+ import random
+ from typing import Optional
+
+
+ class ChessEngine:
+     """Manages a single game of chess and exposes helpers for the agent loop."""
+
+     def __init__(self):
+         self.board = chess.Board()
+
+     # ── Board state ───────────────────────────────────────────────────────
+
+     @property
+     def fen(self) -> str:
+         return self.board.fen()
+
+     @property
+     def turn(self) -> str:
+         return "white" if self.board.turn == chess.WHITE else "black"
+
+     @property
+     def move_number(self) -> int:
+         return self.board.fullmove_number
+
+     @property
+     def is_game_over(self) -> bool:
+         return self.board.is_game_over()
+
+     @property
+     def result(self) -> Optional[str]:
+         """Returns '1-0', '0-1', '1/2-1/2', or None if game is ongoing."""
+         if not self.board.is_game_over():
+             return None
+         outcome = self.board.outcome()
+         if outcome is None:
+             return "1/2-1/2"
+         if outcome.winner == chess.WHITE:
+             return "1-0"
+         if outcome.winner == chess.BLACK:
+             return "0-1"
+         return "1/2-1/2"
+
+     @property
+     def legal_moves_uci(self) -> list[str]:
+         return [m.uci() for m in self.board.legal_moves]
+
+     @property
+     def legal_moves_san(self) -> list[str]:
+         return [self.board.san(m) for m in self.board.legal_moves]
+
+     def reset(self):
+         self.board = chess.Board()
+
+     # ── Move application ──────────────────────────────────────────────────
+
+     def apply_move_uci(self, uci: str) -> Optional[str]:
+         """
+         Apply a UCI move (e.g. 'e2e4') to the board.
+         Returns the SAN string on success, None if the move is illegal.
+         """
+         try:
+             move = chess.Move.from_uci(uci)
+             if move not in self.board.legal_moves:
+                 return None
+             san = self.board.san(move)
+             self.board.push(move)
+             return san
+         except (ValueError, chess.InvalidMoveError):
+             return None
+
+     def apply_move_san(self, san: str) -> Optional[str]:
+         """
+         Apply a SAN move (e.g. 'Nf3') to the board.
+         Returns the UCI string on success, None if illegal.
+         """
+         try:
+             move = self.board.parse_san(san)
+             uci = move.uci()
+             self.board.push(move)
+             return uci
+         except (ValueError, chess.InvalidMoveError, chess.AmbiguousMoveError):
+             return None
+
+     # ── Move parsing helpers ──────────────────────────────────────────────
+
+     def parse_model_output(self, text: str) -> Optional[str]:
+         """
+         Extract the first plausible chess move from raw model output.
+         Tries SAN first, then UCI. Returns the SAN string if valid, else None.
+         """
+         # Clean up whitespace and scan the leading tokens
+         tokens = text.strip().split()
+         for token in tokens[:5]:  # check first 5 tokens
+             clean = token.strip(".,!?;:()")
+             # Try SAN
+             try:
+                 move = self.board.parse_san(clean)
+                 if move in self.board.legal_moves:
+                     return self.board.san(move)
+             except Exception:
+                 pass
+             # Try UCI
+             try:
+                 move = chess.Move.from_uci(clean)
+                 if move in self.board.legal_moves:
+                     return self.board.san(move)
+             except Exception:
+                 pass
+         return None
+
+     def uci_to_san(self, uci: str) -> Optional[str]:
+         """Convert a UCI move string (e.g. 'e2e4') to SAN if it is legal."""
+         try:
+             move = self.board.parse_uci(uci)
+             if move in self.board.legal_moves:
+                 return self.board.san(move)
+         except Exception:
+             pass
+         return None
+
+     def san_to_uci(self, san: str) -> Optional[str]:
+         """Convert a SAN move string (e.g. 'Nf3') to UCI if it is legal."""
+         try:
+             move = self.board.parse_san(san)
+             if move in self.board.legal_moves:
+                 return move.uci()
+         except Exception:
+             pass
+         return None
+
+     def random_legal_move_san(self) -> Optional[str]:
+         """Return a random legal move in SAN notation (fallback)."""
+         legal = list(self.board.legal_moves)
+         if not legal:
+             return None
+         move = random.choice(legal)
+         return self.board.san(move)
+
+     # ── Reward calculation ────────────────────────────────────────────────
+
+     def compute_reward(self, agent_color: str) -> float:
+         """
+         Terminal reward for the agent after the game ends.
+         +1.0 win
+         -1.0 loss
+          0.0 draw or game not over
+         """
+         result = self.result
+         if result is None:
+             return 0.0
+         if result == "1-0":
+             return 1.0 if agent_color == "white" else -1.0
+         if result == "0-1":
+             return 1.0 if agent_color == "black" else -1.0
+         return 0.0  # draw
+
+     # ── Position prompt ───────────────────────────────────────────────────
+
+     def build_prompt(self, agent_color: str, move_history: list[str]) -> str:
+         """
+         Build the text prompt fed to Qwen for move generation.
+         Keeps it short so the model stays focused on the move token.
+         """
+         history_str = " ".join(move_history[-20:]) if move_history else "(opening)"
+         legal_sample = ", ".join(self.legal_moves_san[:10])
+         return (
+             f"You are a chess engine playing as {agent_color}.\n"
+             f"Position (FEN): {self.fen}\n"
+             f"Move history: {history_str}\n"
+             f"Some legal moves: {legal_sample}\n"
+             f"Reply with ONLY the single best next move in standard algebraic notation (SAN), "
+             f"e.g. 'e4' or 'Nf3'. Do not explain."
+         )
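The token-scanning idea behind `parse_model_output` (strip punctuation from the first few tokens and accept the first one that looks like a move) can be demonstrated without the engine. This sketch uses regex approximations of SAN and UCI syntax rather than python-chess legality checks, so `SAN_RE`, `UCI_RE`, and `extract_move` are illustrative names, not project code:

```python
import re

# Rough syntax checks: castling or piece/pawn moves (SAN), and coordinate moves (UCI)
SAN_RE = re.compile(r"^(O-O(-O)?|[KQRBN]?[a-h]?[1-8]?x?[a-h][1-8](=[QRBN])?)[+#]?$")
UCI_RE = re.compile(r"^[a-h][1-8][a-h][1-8][qrbn]?$")

def extract_move(text: str, max_tokens: int = 5):
    """Return the first leading token that looks like a SAN or UCI move, else None."""
    for token in text.strip().split()[:max_tokens]:
        clean = token.strip(".,!?;:()")   # same punctuation-stripping as the engine
        if SAN_RE.match(clean) or UCI_RE.match(clean):
            return clean
    return None
```

In the real class, the regex step is replaced by `board.parse_san` / `Move.from_uci` plus a legality check, which also filters out syntactically valid but illegal moves.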
backend/chess_lib/__init__.py ADDED
File without changes
backend/chess_lib/chess_engine.py ADDED
@@ -0,0 +1,166 @@
+ """
+ chess_engine.py
+ ───────────────
+ Thin wrapper around python-chess providing:
+ - Board state management
+ - Legal move validation and parsing
+ - FEN / SAN / UCI conversion helpers
+ - Reward calculation after game end
+ """
+
+ import chess
+ import chess.pgn
+ import random
+ from typing import Optional
+
+
+ class ChessEngine:
+     """Manages a single game of chess and exposes helpers for the agent loop."""
+
+     def __init__(self):
+         self.board = chess.Board()
+
+     # ── Board state ───────────────────────────────────────────────────────
+
+     @property
+     def fen(self) -> str:
+         return self.board.fen()
+
+     @property
+     def turn(self) -> str:
+         return "white" if self.board.turn == chess.WHITE else "black"
+
+     @property
+     def move_number(self) -> int:
+         return self.board.fullmove_number
+
+     @property
+     def is_game_over(self) -> bool:
+         return self.board.is_game_over()
+
+     @property
+     def result(self) -> Optional[str]:
+         """Returns '1-0', '0-1', '1/2-1/2', or None if game is ongoing."""
+         if not self.board.is_game_over():
+             return None
+         outcome = self.board.outcome()
+         if outcome is None:
+             return "1/2-1/2"
+         if outcome.winner == chess.WHITE:
+             return "1-0"
+         if outcome.winner == chess.BLACK:
+             return "0-1"
+         return "1/2-1/2"
+
+     @property
+     def legal_moves_uci(self) -> list[str]:
+         return [m.uci() for m in self.board.legal_moves]
+
+     @property
+     def legal_moves_san(self) -> list[str]:
+         return [self.board.san(m) for m in self.board.legal_moves]
+
+     def reset(self):
+         self.board = chess.Board()
+
+     # ── Move application ──────────────────────────────────────────────────
+
+     def apply_move_uci(self, uci: str) -> Optional[str]:
+         """
+         Apply a UCI move (e.g. 'e2e4') to the board.
+         Returns the SAN string on success, None if the move is illegal.
+         """
+         try:
+             move = chess.Move.from_uci(uci)
+             if move not in self.board.legal_moves:
+                 return None
+             san = self.board.san(move)
+             self.board.push(move)
+             return san
+         except (ValueError, chess.InvalidMoveError):
+             return None
+
+     def apply_move_san(self, san: str) -> Optional[str]:
+         """
+         Apply a SAN move (e.g. 'Nf3') to the board.
+         Returns the UCI string on success, None if illegal.
+         """
+         try:
+             move = self.board.parse_san(san)
+             uci = move.uci()
+             self.board.push(move)
+             return uci
+         except (ValueError, chess.InvalidMoveError, chess.AmbiguousMoveError):
+             return None
+
+     # ── Move parsing helpers ──────────────────────────────────────────────
+
+     def parse_model_output(self, text: str) -> Optional[str]:
+         """
+         Extract the first plausible chess move from raw model output.
+         Tries SAN first, then UCI. Returns the SAN string if valid, else None.
+         """
+         # Clean up whitespace and scan the leading tokens
+         tokens = text.strip().split()
+         for token in tokens[:5]:  # check first 5 tokens
+             clean = token.strip(".,!?;:()")
+             # Try SAN
+             try:
+                 move = self.board.parse_san(clean)
+                 if move in self.board.legal_moves:
+                     return self.board.san(move)
+             except Exception:
+                 pass
+             # Try UCI
+             try:
+                 move = chess.Move.from_uci(clean)
+                 if move in self.board.legal_moves:
+                     return self.board.san(move)
+             except Exception:
+                 pass
+         return None
+
+     def random_legal_move_san(self) -> Optional[str]:
+         """Return a random legal move in SAN notation (fallback)."""
+         legal = list(self.board.legal_moves)
+         if not legal:
+             return None
+         move = random.choice(legal)
+         return self.board.san(move)
+
+     # ── Reward calculation ────────────────────────────────────────────────
+
+     def compute_reward(self, agent_color: str) -> float:
+         """
+         Terminal reward for the agent after the game ends.
+         +1.0 win
+         -1.0 loss
+          0.0 draw or game not over
+         """
+         result = self.result
+         if result is None:
+             return 0.0
+         if result == "1-0":
+             return 1.0 if agent_color == "white" else -1.0
+         if result == "0-1":
+             return 1.0 if agent_color == "black" else -1.0
+         return 0.0  # draw
+
+     # ── Position prompt ───────────────────────────────────────────────────
+
+     def build_prompt(self, agent_color: str, move_history: list[str]) -> str:
+         """
+         Build the text prompt fed to Qwen for move generation.
+         Keeps it short so the model stays focused on the move token.
+         """
+         history_str = " ".join(move_history[-20:]) if move_history else "(opening)"
+         legal_sample = ", ".join(self.legal_moves_san[:10])
+         return (
+             f"You are a chess engine playing as {agent_color}.\n"
+             f"Position (FEN): {self.fen}\n"
+             f"Move history: {history_str}\n"
+             f"Some legal moves: {legal_sample}\n"
+             f"Reply with ONLY the single best next move in standard algebraic notation (SAN), "
+             f"e.g. 'e4' or 'Nf3'. Do not explain."
+         )
backend/chess_lib/engine.py ADDED
@@ -0,0 +1,125 @@
+ """
+ ChessEcon Backend — Chess Engine
+ Wraps python-chess to manage game state, validate moves, and detect outcomes.
+ """
+ from __future__ import annotations
+ import uuid
+ import chess
+ import chess.pgn
+ from typing import Dict, Optional, List
+ from shared.models import GameState, GameOutcome, GameStatus, NewGameResponse
+
+
+ class ChessEngine:
+     """Chess game manager. Stores all active games in memory."""
+
+     def __init__(self):
+         self._games: Dict[str, chess.Board] = {}
+
+     # ── Game lifecycle ────────────────────────────────────────────────────────
+
+     def new_game(self, game_id: Optional[str] = None) -> NewGameResponse:
+         gid = game_id or str(uuid.uuid4())
+         board = chess.Board()
+         self._games[gid] = board
+         return NewGameResponse(
+             game_id=gid,
+             fen=board.fen(),
+             legal_moves=[m.uci() for m in board.legal_moves],
+             status=GameStatus.ACTIVE,
+         )
+
+     def get_state(self, game_id: str) -> GameState:
+         board = self._get_board(game_id)
+         return GameState(
+             game_id=game_id,
+             fen=board.fen(),
+             legal_moves=[m.uci() for m in board.legal_moves],
+             outcome=self._outcome(board),
+             move_number=board.fullmove_number,
+             move_history=[m.uci() for m in board.move_stack],
+             status=GameStatus.FINISHED if board.is_game_over() else GameStatus.ACTIVE,
+         )
+
+     def make_move(self, game_id: str, move_uci: str) -> GameState:
+         board = self._get_board(game_id)
+         if board.is_game_over():
+             raise ValueError(f"Game {game_id} is already over")
+         try:
+             move = chess.Move.from_uci(move_uci)
+         except ValueError:
+             raise ValueError(f"Invalid UCI move format: {move_uci}")
+         if move not in board.legal_moves:
+             legal = [m.uci() for m in board.legal_moves]
+             raise ValueError(
+                 f"Illegal move {move_uci} in position {board.fen()}. "
+                 f"Legal moves: {legal[:10]}{'...' if len(legal) > 10 else ''}"
+             )
+         board.push(move)
+         return self.get_state(game_id)
+
+     def delete_game(self, game_id: str) -> None:
+         self._games.pop(game_id, None)
+
+     def list_games(self) -> List[str]:
+         return list(self._games.keys())
+
+     # ── Position analysis ─────────────────────────────────────────────────────
+
+     def get_legal_moves(self, game_id: str) -> List[str]:
+         board = self._get_board(game_id)
+         return [m.uci() for m in board.legal_moves]
+
+     def get_fen(self, game_id: str) -> str:
+         return self._get_board(game_id).fen()
+
+     def is_game_over(self, game_id: str) -> bool:
+         return self._get_board(game_id).is_game_over()
+
+     def complexity_features(self, game_id: str) -> dict:
+         """Return raw features used by the complexity analyzer."""
+         board = self._get_board(game_id)
+         legal = list(board.legal_moves)
+         return {
+             "num_legal_moves": len(legal),
+             "is_check": board.is_check(),
+             "has_captures": any(board.is_capture(m) for m in legal),
+             "num_pieces": len(board.piece_map()),
+             "fullmove_number": board.fullmove_number,
+             "material_balance": self._material_balance(board),
+         }
+
+     # ── Private helpers ───────────────────────────────────────────────────────
+
+     def _get_board(self, game_id: str) -> chess.Board:
+         if game_id not in self._games:
+             raise KeyError(f"Game {game_id} not found")
+         return self._games[game_id]
+
+     @staticmethod
+     def _outcome(board: chess.Board) -> GameOutcome:
+         if not board.is_game_over():
+             return GameOutcome.ONGOING
+         result = board.result()
+         if result == "1-0":
+             return GameOutcome.WHITE_WIN
+         elif result == "0-1":
+             return GameOutcome.BLACK_WIN
+         return GameOutcome.DRAW
+
+     @staticmethod
+     def _material_balance(board: chess.Board) -> float:
+         """Positive = white advantage."""
+         piece_values = {
+             chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
+             chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0,
+         }
+         balance = 0.0
+         for piece_type, value in piece_values.items():
+             balance += value * len(board.pieces(piece_type, chess.WHITE))
+             balance -= value * len(board.pieces(piece_type, chess.BLACK))
+         return balance
+
+
+ # Singleton instance
+ chess_engine = ChessEngine()
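The `_material_balance` helper sums standard piece values (1/3/3/5/9, king 0) over the board. The same figure can be read straight off a FEN's piece-placement field, which makes the sign convention (positive = white advantage) easy to sanity-check without an engine; `material_balance_from_fen` below is a standalone sketch, not project code:

```python
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9, "k": 0}

def material_balance_from_fen(fen: str) -> float:
    """Positive = white advantage, using 1/3/3/5/9 piece values."""
    placement = fen.split()[0]   # first FEN field is the piece placement
    balance = 0.0
    for ch in placement:
        if ch.lower() in PIECE_VALUES:   # skip digits and rank separators
            value = PIECE_VALUES[ch.lower()]
            balance += value if ch.isupper() else -value
    return balance

START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(material_balance_from_fen(START))                        # 0.0 (balanced)
print(material_balance_from_fen("k7/8/8/8/8/8/8/KR6 w - - 0 1"))  # 5.0 (white up a rook)
```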
backend/economy/.DS_Store ADDED
Binary file (6.15 kB). View file
 
backend/economy/__init__.py ADDED
File without changes
backend/economy/ledger.py ADDED
@@ -0,0 +1,174 @@
+ """
+ ChessEcon Backend — Economic Ledger
+ Manages agent wallets, tournament prize pools, and transaction history.
+ """
+ from __future__ import annotations
+ import os
+ import uuid
+ from typing import Dict, List, Optional
+ from shared.models import Transaction, WalletState, TournamentResult, GameOutcome
+
+
+ class EconomicConfig:
+     entry_fee: float = float(os.getenv("ENTRY_FEE", "10.0"))
+     prize_multiplier: float = float(os.getenv("PRIZE_MULTIPLIER", "0.9"))
+     initial_wallet: float = float(os.getenv("INITIAL_WALLET", "100.0"))
+     coaching_fee: float = float(os.getenv("COACHING_FEE", "5.0"))
+     min_wallet_for_coaching: float = float(os.getenv("MIN_WALLET_FOR_COACHING", "15.0"))
+
+
+ class Ledger:
+     """
+     Manages all economic activity in the ChessEcon system.
+     State lives in memory in a single process; wrap calls in a lock if
+     game sessions are settled concurrently from multiple threads.
+     """
+
+     def __init__(self, config: Optional[EconomicConfig] = None):
+         self.config = config or EconomicConfig()
+         self._wallets: Dict[str, WalletState] = {}
+         self._transactions: List[Transaction] = []
+         self._open_games: Dict[str, dict] = {}  # game_id -> {white, black, pool}
+
+     # ── Wallet management ─────────────────────────────────────────────────────
+
+     def register_agent(self, agent_id: str) -> WalletState:
+         if agent_id not in self._wallets:
+             self._wallets[agent_id] = WalletState(
+                 agent_id=agent_id,
+                 balance=self.config.initial_wallet,
+                 total_earned=self.config.initial_wallet,
+             )
+         return self._wallets[agent_id]
+
+     def get_wallet(self, agent_id: str) -> WalletState:
+         if agent_id not in self._wallets:
+             return self.register_agent(agent_id)
+         return self._wallets[agent_id]
+
+     def get_balance(self, agent_id: str) -> float:
+         return self.get_wallet(agent_id).balance
+
+     def _debit(self, agent_id: str, amount: float, description: str) -> Transaction:
+         wallet = self.get_wallet(agent_id)
+         wallet.balance -= amount
+         wallet.total_spent += amount
+         tx = Transaction(
+             tx_id=str(uuid.uuid4()),
+             agent_id=agent_id,
+             amount=-amount,
+             description=description,
+         )
+         self._transactions.append(tx)
+         return tx
+
+     def _credit(self, agent_id: str, amount: float, description: str) -> Transaction:
+         wallet = self.get_wallet(agent_id)
+         wallet.balance += amount
+         wallet.total_earned += amount
+         tx = Transaction(
+             tx_id=str(uuid.uuid4()),
+             agent_id=agent_id,
+             amount=amount,
+             description=description,
+         )
+         self._transactions.append(tx)
+         return tx
+
+     # ── Tournament management ─────────────────────────────────────────────────
+
+     def open_game(self, game_id: str, white_id: str, black_id: str) -> float:
+         """Collect entry fees and open a prize pool. Returns the pool size."""
+         fee = self.config.entry_fee
+         self._debit(white_id, fee, f"Entry fee game {game_id}")
+         self._debit(black_id, fee, f"Entry fee game {game_id}")
+         pool = fee * 2 * self.config.prize_multiplier
+         self._open_games[game_id] = {
+             "white": white_id,
+             "black": black_id,
+             "pool": pool,
+             "entry_fees": fee * 2,
+         }
+         # Update game counts
+         self.get_wallet(white_id).games_played += 1
+         self.get_wallet(black_id).games_played += 1
+         return pool
+
+     def settle_game(self, game_id: str, outcome: GameOutcome) -> TournamentResult:
+         """Pay out the prize pool based on the game outcome. Returns settlement details."""
+         if game_id not in self._open_games:
+             raise KeyError(f"Game {game_id} not found in open games")
+
+         game = self._open_games.pop(game_id)
+         white_id = game["white"]
+         black_id = game["black"]
+         pool = game["pool"]
+         entry_fees = game["entry_fees"]
+         organizer_cut = entry_fees - pool
+
+         winner: Optional[str] = None
+         prize_paid = 0.0
+
+         if outcome == GameOutcome.WHITE_WIN:
+             winner = white_id
+             prize_paid = pool
+             self._credit(white_id, pool, f"Prize win game {game_id}")
+             self.get_wallet(white_id).games_won += 1
+         elif outcome == GameOutcome.BLACK_WIN:
+             winner = black_id
+             prize_paid = pool
+             self._credit(black_id, pool, f"Prize win game {game_id}")
+             self.get_wallet(black_id).games_won += 1
+         elif outcome == GameOutcome.DRAW:
+             # Split pool equally on draw
+             half = pool / 2
+             prize_paid = pool
+             self._credit(white_id, half, f"Draw prize game {game_id}")
+             self._credit(black_id, half, f"Draw prize game {game_id}")
+
+         return TournamentResult(
+             game_id=game_id,
+             winner=winner,
+             outcome=outcome,
+             prize_paid=prize_paid,
+             entry_fees_collected=entry_fees,
+             organizer_cut=organizer_cut,
+         )
+
+     # ── Coaching payments ─────────────────────────────────────────────────────
+
+     def charge_coaching(self, agent_id: str, game_id: str) -> float:
+         """Deduct the coaching fee. Returns the fee charged, or 0 if insufficient funds."""
+         wallet = self.get_wallet(agent_id)
+         if wallet.balance < self.config.min_wallet_for_coaching:
+             return 0.0
+         fee = self.config.coaching_fee
+         self._debit(agent_id, fee, f"Claude coaching game {game_id}")
+         wallet.coaching_calls += 1
+         return fee
+
+     def can_afford_coaching(self, agent_id: str) -> bool:
+         return self.get_balance(agent_id) >= self.config.min_wallet_for_coaching
+
+     # ── Reporting ─────────────────────────────────────────────────────────────
+
+     def get_all_wallets(self) -> Dict[str, WalletState]:
+         return dict(self._wallets)
+
+     def get_transactions(self, agent_id: Optional[str] = None) -> List[Transaction]:
+         if agent_id:
+             return [t for t in self._transactions if t.agent_id == agent_id]
+         return list(self._transactions)
+
+     def summary(self) -> dict:
+         wallets = self._wallets
+         return {
+             "total_agents": len(wallets),
+             "total_transactions": len(self._transactions),
+             "open_games": len(self._open_games),
+             "wallets": {aid: w.model_dump() for aid, w in wallets.items()},
+         }
+
+
+ # Singleton instance
+ ledger = Ledger()
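The tournament arithmetic above (two entry fees in, a 0.9 multiplier on the pool, the remainder kept as the organizer's cut) condenses into a few lines. The sketch below is hypothetical and uses plain floats in place of the `WalletState`/`TournamentResult` models:

```python
# Standalone sketch of the Ledger economics above, with the default
# config: 10.0 entry fee per player, 0.9 prize multiplier.
ENTRY_FEE = 10.0
PRIZE_MULTIPLIER = 0.9

def settle(white: float, black: float, outcome: str) -> tuple[float, float, float]:
    """Return (white_balance, black_balance, organizer_cut) after one game."""
    white -= ENTRY_FEE
    black -= ENTRY_FEE
    pool = ENTRY_FEE * 2 * PRIZE_MULTIPLIER   # 18.0 with the defaults
    organizer_cut = ENTRY_FEE * 2 - pool      # 2.0 kept by the organizer
    if outcome == "white":
        white += pool
    elif outcome == "black":
        black += pool
    else:                                     # draw: split the pool equally
        white += pool / 2
        black += pool / 2
    return white, black, organizer_cut
```

With two 100.0 wallets, a white win yields (108.0, 90.0, 2.0) and a draw yields (99.0, 99.0, 2.0) — every game leaks 2.0 out of the player economy.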
backend/economy/nvm_payments.py ADDED
@@ -0,0 +1,340 @@
+ """
+ ChessEcon — Nevermined Payment Manager
+ =======================================
+ Wraps the payments-py SDK to provide a clean interface for:
+ - Initializing the Nevermined Payments client
+ - Verifying x402 payment tokens on incoming requests
+ - Settling credits after successful service delivery
+ - Ordering plans and generating access tokens (subscriber side)
+ - Tracking NVM transactions for the dashboard
+
+ This replaces the internal ledger for cross-team agent-to-agent payments.
+ The internal ledger (economy/ledger.py) is still used for intra-team
+ tournament accounting (entry fees, prize pools).
+ """
+ from __future__ import annotations
+
+ import logging
+ import os
+ import uuid
+ from dataclasses import dataclass, field
+ from datetime import datetime, timezone
+ from typing import Any, Dict, List, Optional
+
+ logger = logging.getLogger(__name__)
+
+ # ── Environment ────────────────────────────────────────────────────────────────
+ NVM_API_KEY = os.getenv("NVM_API_KEY", "")
+ NVM_ENVIRONMENT = os.getenv("NVM_ENVIRONMENT", "sandbox")
+ NVM_PLAN_ID = os.getenv("NVM_PLAN_ID", "")
+ NVM_AGENT_ID = os.getenv("NVM_AGENT_ID", "")
+
+ # ── Transaction record ─────────────────────────────────────────────────────────
+ @dataclass
+ class NvmTransaction:
+     """A recorded Nevermined payment transaction."""
+     tx_id: str
+     tx_type: str  # "verify" | "settle" | "order" | "token"
+     agent_id: str
+     plan_id: str
+     credits: int
+     timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
+     details: Dict[str, Any] = field(default_factory=dict)
+     success: bool = True
+     error: Optional[str] = None
+
+
+ # ── Nevermined Payment Manager ─────────────────────────────────────────────────
+ class NeverminedPaymentManager:
+     """
+     Singleton manager for all Nevermined payment operations.
+
+     Usage (server side — verify + settle):
+         nvm = NeverminedPaymentManager()
+         if nvm.available:
+             ok, reason = nvm.verify_token(token, request_url, "POST")
+             if ok:
+                 # ... handle request ...
+                 nvm.settle_token(token, request_url, "POST")
+
+     Usage (client/agent side — order + get token):
+         nvm = NeverminedPaymentManager()
+         if nvm.available:
+             nvm.order_plan(plan_id)
+             token = nvm.get_access_token(plan_id, agent_id)
+     """
+
+     def __init__(self):
+         self._payments = None
+         self._available = False
+         self._transactions: List[NvmTransaction] = []
+         self._init_sdk()
+
+     def _init_sdk(self):
+         """Initialize the payments-py SDK if NVM_API_KEY is configured."""
+         if not NVM_API_KEY:
+             logger.warning(
+                 "NVM_API_KEY not set — Nevermined payments disabled. "
+                 "Set NVM_API_KEY in .env to enable cross-team agent payments."
+             )
+             return
+         try:
+             from payments_py import Payments, PaymentOptions
+             self._payments = Payments.get_instance(
+                 PaymentOptions(
+                     nvm_api_key=NVM_API_KEY,
+                     environment=NVM_ENVIRONMENT,
+                 )
+             )
+             self._available = True
+             logger.info(
+                 f"Nevermined Payments SDK initialized "
+                 f"(environment={NVM_ENVIRONMENT}, "
+                 f"plan_id={NVM_PLAN_ID or 'not set'}, "
+                 f"agent_id={NVM_AGENT_ID or 'not set'})"
+             )
+         except Exception as exc:
+             logger.error(f"Failed to initialize Nevermined SDK: {exc}")
+             self._available = False
+
+     # ── Properties ─────────────────────────────────────────────────────────────
+     @property
+     def available(self) -> bool:
+         return self._available and self._payments is not None
+
+     @property
+     def payments(self):
+         return self._payments
+
+     # ── Server-side: verify + settle ───────────────────────────────────────────
+     def build_payment_required(
+         self,
+         endpoint: str,
+         http_verb: str = "POST",
+         plan_id: Optional[str] = None,
+         agent_id: Optional[str] = None,
+     ):
+         """Build a PaymentRequired spec for a protected endpoint."""
+         if not self.available:
+             return None
+         try:
+             from payments_py.x402.helpers import build_payment_required
+             return build_payment_required(
+                 plan_id=plan_id or NVM_PLAN_ID,
+                 endpoint=endpoint,
+                 agent_id=agent_id or NVM_AGENT_ID,
+                 http_verb=http_verb,
+             )
+         except Exception as exc:
+             logger.error(f"build_payment_required failed: {exc}")
+             return None
+
+     def verify_token(
+         self,
+         x402_token: str,
+         endpoint: str,
+         http_verb: str = "POST",
+         max_credits: str = "1",
+         plan_id: Optional[str] = None,
+         agent_id: Optional[str] = None,
+     ) -> tuple[bool, Optional[str]]:
+         """
+         Verify an x402 access token WITHOUT burning credits.
+
+         Returns:
+             (is_valid, error_reason)
+         """
+         if not self.available:
+             # Graceful degradation: allow requests when NVM not configured
+             logger.debug("NVM not available — skipping payment verification")
+             return True, None
+
+         payment_required = self.build_payment_required(endpoint, http_verb, plan_id, agent_id)
+         if payment_required is None:
+             return False, "Could not build payment_required spec"
+
+         try:
+             verification = self._payments.facilitator.verify_permissions(
+                 payment_required=payment_required,
+                 x402_access_token=x402_token,
+                 max_amount=max_credits,
+             )
+             is_valid = verification.is_valid
+             reason = None if is_valid else (verification.invalid_reason or "Verification failed")
+
+             self._record_transaction(
+                 tx_type="verify",
+                 agent_id=agent_id or NVM_AGENT_ID,
+                 plan_id=plan_id or NVM_PLAN_ID,
+                 credits=int(max_credits),
+                 success=is_valid,
+                 error=reason,
+                 details={"endpoint": endpoint, "verb": http_verb},
+             )
+             return is_valid, reason
+         except Exception as exc:
+             logger.error(f"verify_permissions failed: {exc}")
+             return False, str(exc)
+
+     def settle_token(
+         self,
+         x402_token: str,
+         endpoint: str,
+         http_verb: str = "POST",
+         max_credits: str = "1",
+         plan_id: Optional[str] = None,
+         agent_id: Optional[str] = None,
+     ) -> bool:
+         """
+         Settle (burn) credits after successful service delivery.
+
+         Returns:
+             True if settlement succeeded, False otherwise.
+         """
+         if not self.available:
+             return True  # No-op when NVM not configured
+
+         payment_required = self.build_payment_required(endpoint, http_verb, plan_id, agent_id)
+         if payment_required is None:
+             return False
+
+         try:
+             settlement = self._payments.facilitator.settle_permissions(
+                 payment_required=payment_required,
+                 x402_access_token=x402_token,
+                 max_amount=max_credits,
+             )
+             credits_burned = getattr(settlement, "credits_redeemed", int(max_credits))
+             self._record_transaction(
+                 tx_type="settle",
+                 agent_id=agent_id or NVM_AGENT_ID,
+                 plan_id=plan_id or NVM_PLAN_ID,
+                 credits=credits_burned,
+                 success=True,
+                 details={"endpoint": endpoint, "verb": http_verb},
+             )
+             logger.info(f"NVM credits settled: {credits_burned} credits for {endpoint}")
+             return True
+         except Exception as exc:
+             logger.error(f"settle_permissions failed: {exc}")
+             self._record_transaction(
+                 tx_type="settle",
+                 agent_id=agent_id or NVM_AGENT_ID,
+                 plan_id=plan_id or NVM_PLAN_ID,
+                 credits=0,
+                 success=False,
+                 error=str(exc),
+                 details={"endpoint": endpoint},
+             )
+             return False
+
+     # ── Client/agent side: order + token ──────────────────────────────────────
+     def order_plan(self, plan_id: str) -> bool:
+         """
+         Subscribe to a payment plan (purchase credits).
+
+         Returns:
+             True if the order succeeded.
+         """
+         if not self.available:
+             return False
+         try:
+             result = self._payments.plans.order_plan(plan_id)
+             self._record_transaction(
+                 tx_type="order",
+                 agent_id="self",
+                 plan_id=plan_id,
+                 credits=0,
+                 success=True,
+                 details=result,
+             )
+             logger.info(f"NVM plan ordered: {plan_id}")
+             return True
+         except Exception as exc:
+             logger.error(f"order_plan failed: {exc}")
+             return False
+
+     def get_access_token(
+         self,
+         plan_id: str,
+         agent_id: Optional[str] = None,
+     ) -> Optional[str]:
+         """
+         Generate an x402 access token for a purchased plan.
+
+         Returns:
+             The access token string, or None on failure.
+         """
+         if not self.available:
+             return None
+         try:
+             result = self._payments.x402.get_x402_access_token(
+                 plan_id=plan_id,
+                 agent_id=agent_id or NVM_AGENT_ID,
+             )
+             token = result.get("accessToken") or result.get("access_token")
+             self._record_transaction(
+                 tx_type="token",
+                 agent_id=agent_id or NVM_AGENT_ID,
+                 plan_id=plan_id,
+                 credits=0,
+                 success=bool(token),
+             )
+             return token
+         except Exception as exc:
+             logger.error(f"get_x402_access_token failed: {exc}")
+             return None
+
+     def get_plan_balance(self, plan_id: str) -> Optional[Dict[str, Any]]:
+         """Return the current credit balance for a plan."""
+         if not self.available:
+             return None
+         try:
+             return self._payments.plans.get_plan_balance(plan_id)
+         except Exception as exc:
+             logger.error(f"get_plan_balance failed: {exc}")
+             return None
+
+     # ── Transaction history ────────────────────────────────────────────────────
+     def _record_transaction(self, **kwargs):
+         tx = NvmTransaction(
+             tx_id=str(uuid.uuid4()),
+             **kwargs,
+         )
+         self._transactions.append(tx)
+         # Keep last 500 transactions in memory
+         if len(self._transactions) > 500:
+             self._transactions = self._transactions[-500:]
+
+     def get_transactions(self, limit: int = 50) -> List[Dict[str, Any]]:
+         """Return recent NVM transactions for dashboard display."""
+         txs = self._transactions[-limit:]
+         return [
+             {
+                 "tx_id": t.tx_id,
+                 "type": t.tx_type,
+                 "agent_id": t.agent_id,
+                 "plan_id": t.plan_id,
+                 "credits": t.credits,
+                 "timestamp": t.timestamp,
+                 "success": t.success,
+                 "error": t.error,
+                 "details": t.details,
+             }
+             for t in reversed(txs)
+         ]
+
+     def get_status(self) -> Dict[str, Any]:
+         """Return NVM integration status for health checks."""
+         return {
+             "available": self.available,
+             "environment": NVM_ENVIRONMENT,
+             "plan_id": NVM_PLAN_ID or None,
+             "agent_id": NVM_AGENT_ID or None,
+             "api_key_set": bool(NVM_API_KEY),
+             "transaction_count": len(self._transactions),
+         }
+
+
+ # ── Singleton ──────────────────────────────────────────────────────────────────
+ nvm_manager = NeverminedPaymentManager()
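The verify-then-settle contract implemented by `verify_token`/`settle_token` can be illustrated with a stubbed facilitator. Everything below is a hypothetical stand-in for the payments-py objects, showing only the control flow: credits are burned after the work succeeds, and a missing facilitator degrades gracefully, mirroring the `available` checks above.

```python
# Hypothetical sketch of the verify -> work -> settle flow.
class StubFacilitator:
    """Stand-in for the payments-py facilitator (not the real API)."""
    def __init__(self, valid_tokens):
        self.valid_tokens = valid_tokens
        self.settled = []

    def verify(self, token):              # verify WITHOUT burning credits
        return token in self.valid_tokens

    def settle(self, token, credits=1):   # burn credits after delivery
        self.settled.append((token, credits))
        return True


def handle_paid_request(facilitator, token, do_work):
    """Verify first; only settle (burn credits) after the work succeeds."""
    if facilitator is None:               # graceful degradation: NVM not configured
        return do_work()
    if not facilitator.verify(token):
        raise PermissionError("invalid x402 token")
    result = do_work()                    # if this raises, nothing is settled
    facilitator.settle(token)
    return result
```

Settling only after `do_work()` returns means a failed analysis never costs the caller credits, which is the reason the real manager exposes verify and settle as separate steps.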
backend/economy/register_agent.py ADDED
@@ -0,0 +1,138 @@
+ """
+ ChessEcon — Nevermined Agent Registration Script
+ =================================================
+ Run this ONCE to register the ChessEcon chess analysis service in the
+ Nevermined marketplace. It creates:
+     1. A payment plan (credits-based, free for the hackathon demo)
+     2. An agent entry pointing to the /api/chess/analyze endpoint
+
+ After running, copy the printed NVM_PLAN_ID and NVM_AGENT_ID into your .env.
+
+ Usage:
+     cd chessecon-v2
+     python -m backend.economy.register_agent
+
+ Environment variables required:
+     NVM_API_KEY       — Your Nevermined API key (sandbox:xxx...)
+     CHESSECON_API_URL — Public URL of your ChessEcon backend,
+                         e.g. https://your-server.com or https://ngrok-url.ngrok.io
+ """
+ from __future__ import annotations
+
+ import os
+ import sys
+ import logging
+
+ logging.basicConfig(level=logging.INFO, format="%(levelname)s: %(message)s")
+ logger = logging.getLogger(__name__)
+
+ # ── Config ─────────────────────────────────────────────────────────────────────
+ NVM_API_KEY = os.getenv("NVM_API_KEY", "")
+ NVM_ENVIRONMENT = os.getenv("NVM_ENVIRONMENT", "sandbox")
+ CHESSECON_API_URL = os.getenv("CHESSECON_API_URL", "https://chessecon.example.com")
+
+ # Service description
+ SERVICE_NAME = "ChessEcon Chess Analysis"
+ SERVICE_DESCRIPTION = (
+     "Premium chess position analysis powered by Claude Opus 4.5. "
+     "Provides best-move recommendations, tactical threat assessment, "
+     "and strategic coaching for AI chess agents. "
+     "Part of the ChessEcon multi-agent chess economy — "
+     "agents earn money playing chess and spend it on coaching."
+ )
+ SERVICE_TAGS = ["chess", "ai", "coaching", "analysis", "game", "rl", "hackathon"]
+
+ # Plan: free credits for the hackathon demo
+ # 1000 credits, 1 credit per request — free to subscribe
+ PLAN_NAME = "ChessEcon Coaching Plan (Hackathon)"
+ PLAN_DESCRIPTION = (
+     "1000 free credits for chess position analysis. "
+     "Each analysis request costs 1 credit. "
+     "Subscribe to access the ChessEcon coaching endpoint."
+ )
+ CREDITS_GRANTED = 1000
+ CREDITS_PER_REQUEST = 1
+
+
+ def register():
+     """Register ChessEcon as a paid agent service in Nevermined."""
+     if not NVM_API_KEY:
+         logger.error(
+             "NVM_API_KEY is not set. "
+             "Get your key at https://nevermined.app and set it in .env"
+         )
+         sys.exit(1)
+
+     logger.info(f"Initializing Nevermined SDK (environment={NVM_ENVIRONMENT})")
+
+     try:
+         from payments_py import Payments, PaymentOptions
+         from payments_py.common.types import AgentMetadata, AgentAPIAttributes, PlanMetadata, Endpoint
+         from payments_py.plans import get_free_price_config, get_fixed_credits_config
+     except ImportError:
+         logger.error("payments-py not installed. Run: pip install payments-py")
+         sys.exit(1)
+
+     payments = Payments.get_instance(
+         PaymentOptions(
+             nvm_api_key=NVM_API_KEY,
+             environment=NVM_ENVIRONMENT,
+         )
+     )
+
+     analyze_endpoint = f"{CHESSECON_API_URL}/api/chess/analyze"
+     openapi_url = f"{CHESSECON_API_URL}/openapi.json"
+
+     logger.info(f"Registering agent at: {analyze_endpoint}")
+     logger.info(f"OpenAPI spec: {openapi_url}")
+
+     try:
+         result = payments.agents.register_agent_and_plan(
+             agent_metadata=AgentMetadata(
+                 name=SERVICE_NAME,
+                 description=SERVICE_DESCRIPTION,
+                 tags=SERVICE_TAGS,
+             ),
+             agent_api=AgentAPIAttributes(
+                 endpoints=[Endpoint(verb="POST", url=analyze_endpoint)],
+                 open_endpoints=[f"{CHESSECON_API_URL}/health"],
+                 agent_definition_url=openapi_url,
+             ),
+             plan_metadata=PlanMetadata(
+                 name=PLAN_NAME,
+                 description=PLAN_DESCRIPTION,
+             ),
+             price_config=get_free_price_config(),
+             credits_config=get_fixed_credits_config(
+                 credits_granted=CREDITS_GRANTED,
+                 credits_per_request=CREDITS_PER_REQUEST,
+             ),
+             access_limit="credits",
+         )
+
+         agent_id = result.get("agentId", "")
+         plan_id = result.get("planId", "")
+
+         print("\n" + "=" * 60)
+         print("✅ ChessEcon registered on Nevermined!")
+         print("=" * 60)
+         print(f"  NVM_AGENT_ID = {agent_id}")
+         print(f"  NVM_PLAN_ID  = {plan_id}")
+         print("=" * 60)
+         print("\nAdd these to your .env file:")
+         print(f"  NVM_AGENT_ID={agent_id}")
+         print(f"  NVM_PLAN_ID={plan_id}")
+         print("\nMarketplace URL:")
+         print(f"  https://nevermined.app/en/subscription/{plan_id}")
+         print("=" * 60 + "\n")
+
+         return agent_id, plan_id
+
+     except Exception as exc:
+         logger.error(f"Registration failed: {exc}")
+         raise
+
+
+ if __name__ == "__main__":
+     register()
backend/grpo_trainer.py ADDED
@@ -0,0 +1,240 @@
+ """
+ grpo_trainer.py
+ ───────────────
+ Group Relative Policy Optimisation (GRPO) training loop for the chess agent.
+
+ Algorithm summary (per game batch):
+     1. Collect a group of G candidate moves per position (sampled from the policy).
+     2. Compute advantages: A_i = (r_i - mean(r)) / (std(r) + ε)
+        where r_i is the terminal game reward for the trajectory that chose move i.
+     3. Compute the GRPO policy loss:
+        L = -E[ min(ratio * A, clip(ratio, 1-ε, 1+ε) * A) ]
+        where ratio = exp(log_π_θ(a) - log_π_old(a))
+     4. Add KL penalty: L_total = L + β * KL(π_θ || π_ref)
+     5. Backprop and update the model weights.
+
+ In practice, for a single-agent chess game:
+     - Each move in the game is a "step" with a delayed terminal reward.
+     - The group is formed by sampling G moves at each position and running
+       mini-rollouts (or approximating with the final game outcome).
+     - For simplicity we use the full game outcome as the reward for every
+       move in the game (REINFORCE-style with GRPO normalisation).
+
+ References:
+     DeepSeek-R1 GRPO: https://arxiv.org/abs/2501.12599
+ """
+
+ import os
+ import logging
+ import torch
+ from dataclasses import dataclass
+ from typing import Optional
+
+ from settings import settings
+
+ logger = logging.getLogger(__name__)
+
+
+ @dataclass
+ class Trajectory:
+     """One complete game trajectory collected for training."""
+     agent_color: str
+     log_probs: list[float]      # log π_θ(a_t | s_t) for each move
+     ref_log_probs: list[float]  # log π_ref(a_t | s_t) for KL
+     reward: float               # terminal reward (+1 win, -1 loss, 0 draw)
+     move_count: int = 0
+
+
+ @dataclass
+ class TrainingMetrics:
+     step: int = 0
+     loss: float = 0.0
+     policy_reward: float = 0.0
+     kl_div: float = 0.0
+     win_rate: float = 0.0
+     avg_profit: float = 0.0
+     coaching_rate: float = 0.0
+     # Running stats
+     wins: int = 0
+     games: int = 0
+     total_profit: float = 0.0
+     total_coaching_calls: int = 0
+     total_moves: int = 0
+
+
+ class GRPOTrainer:
+     """
+     Manages the GRPO training loop for the Qwen chess agent.
+
+     Usage:
+         trainer = GRPOTrainer(model, tokenizer)
+         trainer.record_move(log_prob, ref_log_prob)
+         ...
+         metrics = trainer.end_game(reward, profit, coaching_calls)
+         # metrics is None until grpo_update_every_n_games games have been collected
+     """
+
+     def __init__(self, model, tokenizer):
+         self.model = model
+         self.tokenizer = tokenizer
+         self._step = 0
+         self._pending: list[Trajectory] = []
+         self._current: Optional[Trajectory] = None
+         self._metrics = TrainingMetrics()
+
+         # Optimizer — only update LoRA params if present, else all params
+         trainable = [p for p in model.parameters() if p.requires_grad]
+         if not trainable:
+             logger.warning("No trainable parameters found — GRPO updates will be no-ops.")
+         self._optimizer = torch.optim.AdamW(trainable, lr=settings.grpo_lr) if trainable else None
+
+     # ── Game lifecycle ────────────────────────────────────────────────────
+
+     def start_game(self, agent_color: str):
+         """Call at the start of each game."""
+         self._current = Trajectory(agent_color=agent_color, log_probs=[], ref_log_probs=[], reward=0.0)
+
+     def record_move(self, log_prob: float, ref_log_prob: float):
+         """Call after each move with the policy and reference log-probs."""
+         if self._current is None:
+             return
+         self._current.log_probs.append(log_prob)
+         self._current.ref_log_probs.append(ref_log_prob)
+         self._current.move_count += 1
+
+     def end_game(
+         self,
+         reward: float,
+         profit: float = 0.0,
+         coaching_calls: int = 0,
+     ) -> Optional[TrainingMetrics]:
+         """
+         Call at game end with the terminal reward.
+         Returns updated TrainingMetrics if a gradient update was performed,
+         else None (still accumulating games).
+         """
+         if self._current is None:
+             return None
+
+         self._current.reward = reward
+         self._pending.append(self._current)
+         self._current = None
+
+         # Update running stats
+         m = self._metrics
+         m.games += 1
+         if reward > 0:
+             m.wins += 1
+         m.total_profit += profit
+         m.total_coaching_calls += coaching_calls
+         m.total_moves += self._pending[-1].move_count
+
+         # Trigger update every N games
+         if m.games % settings.grpo_update_every_n_games == 0:
+             return self._update()
+
+         return None
+
+     # ── GRPO update ───────────────────────────────────────────────────────
+
+     def _update(self) -> TrainingMetrics:
+         """Perform one GRPO gradient update over the pending trajectories."""
+         if self._optimizer is None or not self._pending:
+             return self._build_metrics()
+
+         trajectories = self._pending
+         self._pending = []
+
+         # Collect rewards and compute advantages (GRPO normalisation)
+         rewards = torch.tensor([t.reward for t in trajectories], dtype=torch.float32)
+         mean_r = rewards.mean()
+         std_r = rewards.std(unbiased=False) + 1e-8  # unbiased=False avoids nan for N=1
+         if std_r < 1e-6:
+             advantages = rewards - mean_r
+         else:
+             advantages = (rewards - mean_r) / std_r  # shape: (N,)
+
+         total_loss = torch.tensor(0.0, requires_grad=True)
+         total_kl = 0.0
+         n_tokens = 0
+
+         for traj, adv in zip(trajectories, advantages):
+             if not traj.log_probs:
+                 continue
+
+             lp = torch.tensor(traj.log_probs, dtype=torch.float32)          # (T,)
+             ref_lp = torch.tensor(traj.ref_log_probs, dtype=torch.float32)  # (T,)
+
+             # Ratio: π_θ / π_old (here π_old == π_ref since we update every game)
+             ratio = torch.exp(lp - ref_lp)
+
+             # Clipped surrogate loss (PPO-style clip)
+             eps = 0.2
+             clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
+             surrogate = torch.min(ratio * adv, clipped * adv)
+             policy_loss = -surrogate.mean()
+
+             # KL penalty: KL(π_θ || π_ref) ≈ exp(lp - ref_lp) - (lp - ref_lp) - 1
+             diff = torch.clamp(lp - ref_lp, -10, 10)  # prevent KL explosion
+             kl = (torch.exp(diff) - diff - 1).mean()
+             total_kl += kl.item()
+
+             step_loss = policy_loss + settings.grpo_kl_coeff * kl
+             total_loss = total_loss + step_loss
+             n_tokens += len(traj.log_probs)
+
+         if n_tokens > 0:
+             total_loss = total_loss / len(trajectories)
+             self._optimizer.zero_grad()
+             total_loss.backward()
+             torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)
+             self._optimizer.step()
+
+         self._step += 1
+
+         # Save checkpoint periodically
+         if self._step % settings.save_every_n_steps == 0:
+             self._save_checkpoint()
+
+         # Update metrics
+         m = self._metrics
+         m.step = self._step
+         m.loss = total_loss.item() if n_tokens > 0 else 0.0
+         m.policy_reward = float(rewards.mean())
+         m.kl_div = total_kl / max(len(trajectories), 1)
+         m.win_rate = m.wins / max(m.games, 1)
+         m.avg_profit = m.total_profit / max(m.games, 1)
+         m.coaching_rate = m.total_coaching_calls / max(m.total_moves, 1)
+
+         logger.info(
+             "GRPO step %d | loss=%.4f reward=%.3f kl=%.4f win_rate=%.2f",
+             m.step, m.loss, m.policy_reward, m.kl_div, m.win_rate,
+         )
+         return self._build_metrics()
+
+     def _build_metrics(self) -> TrainingMetrics:
+         import copy
+         return copy.copy(self._metrics)
+
+     # ── Checkpoint ────────────────────────────────────────────────────────
+
+     def _save_checkpoint(self):
+         os.makedirs(settings.checkpoint_dir, exist_ok=True)
+         path = os.path.join(settings.checkpoint_dir, f"step_{self._step:06d}")
+         try:
+             self.model.save_pretrained(path)
+             self.tokenizer.save_pretrained(path)
+             logger.info("Checkpoint saved: %s", path)
+         except Exception as exc:
+             logger.error("Checkpoint save failed: %s", exc)
+
+     def load_checkpoint(self, path: str):
+         """Load a previously saved LoRA checkpoint."""
+         try:
+             from peft import PeftModel  # type: ignore
+             self.model = PeftModel.from_pretrained(self.model, path)
+             logger.info("Checkpoint loaded: %s", path)
+         except Exception as exc:
+             logger.error("Checkpoint load failed: %s", exc)
+
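Step 2 of the docstring above (group-normalised advantages) can be checked in isolation. The following is a dependency-free sketch of the same computation, including the degenerate-group fallback used in `_update` when all rewards in the group are (near) identical:

```python
# Standalone sketch of the GRPO advantage normalisation:
#   A_i = (r_i - mean(r)) / (std(r) + eps)
# with population std (matching unbiased=False above) and a
# centring-only fallback when the group has no reward spread.
import math

def grpo_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n   # population variance
    std = math.sqrt(var) + eps
    if std < 1e-6:                                    # degenerate group: centre only
        return [r - mean for r in rewards]
    return [(r - mean) / std for r in rewards]
```

A win/loss pair normalises to roughly +1/-1, while a group of identical rewards yields all-zero advantages, so an update on that group moves nothing — the property GRPO relies on instead of a learned value baseline.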
backend/main.py ADDED
@@ -0,0 +1,313 @@
+"""
+ChessEcon Backend — FastAPI Application
+Serves the chess game API, WebSocket event stream, and the built React frontend.
+"""
+from __future__ import annotations
+import os
+import asyncio
+import json
+import logging
+from pathlib import Path
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI, WebSocket, WebSocketDisconnect
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import FileResponse, JSONResponse
+
+from backend.api.game_router import router as game_router
+from backend.api.training_router import router as training_router
+from backend.api.websocket import ws_manager
+from backend.agents.qwen_agent import QwenAgent
+from backend.agents.grpo_trainer import GRPOTrainer
+from backend.chess_lib.chess_engine import ChessEngine
+from backend.settings import settings
+
+# ── Logging ───────────────────────────────────────────────────────────────────
+logging.basicConfig(
+    level=os.getenv("LOG_LEVEL", "info").upper(),
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+# ── Static frontend path ──────────────────────────────────────────────────────
+FRONTEND_DIST = Path(__file__).parent / "static"
+
+# ── Live game snapshot (sent to late-joining clients) ────────────────────────
+# Updated by game_loop; read by websocket_endpoint on new connections.
+game_snapshot: dict = {}
+
+# ── Game loop (runs as a background task) ─────────────────────────────────────
+async def game_loop():
+    white = QwenAgent()
+    black = QwenAgent()
+    from backend.agents.qwen_agent import _load_model
+    tokenizer, model = _load_model()
+    trainer = GRPOTrainer(model, tokenizer)
+    game_num = 0
+    # Wallets persist across games — agents earn/lose money each game
+    wallet_white = settings.starting_wallet
+    wallet_black = settings.starting_wallet
+
+    while True:
+        engine = ChessEngine()
+        move_history: list[str] = []
+        game_num += 1
+
+        # Deduct entry fees at the start of each game
+        wallet_white -= settings.entry_fee
+        wallet_black -= settings.entry_fee
+
+        # Update snapshot so late-joining clients can sync
+        game_snapshot.update({
+            "type": "game_start",
+            "game_num": game_num,
+            "wallet_white": wallet_white,
+            "wallet_black": wallet_black,
+            "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
+            "move_number": 0,
+            "grpo_step": 0,
+            "games_completed": game_num - 1,
+        })
+
+        await ws_manager.broadcast_raw({
+            "type": "game_start",
+            "data": {
+                "game": game_num,
+                "game_id": game_num,
+                "wallet_white": wallet_white,
+                "wallet_black": wallet_black,
+            },
+        })
+
+        trainer.start_game("white")
+
+        while not engine.is_game_over:
+            color = engine.turn
+            agent = white if color == "white" else black
+
+            # get_move returns (san, log_prob)
+            san, log_prob = agent.get_move(engine, color, move_history)
+
+            # get reference log prob for GRPO KL term
+            ref_log_prob = agent.get_move_log_prob_only(engine, color, move_history, san)
+
+            # apply the move
+            uci = engine.apply_move_san(san)
+            if uci is None:
+                # fallback: random legal move
+                san = engine.random_legal_move_san()
+                uci = engine.apply_move_san(san) or ""
+                log_prob = -1.0
+                ref_log_prob = -1.0
+
+            move_history.append(san)
+            trainer.record_move(log_prob, ref_log_prob)
+
+            # Keep snapshot current so late joiners see the live position
+            game_snapshot.update({
+                "fen": engine.fen,
+                "move_number": engine.move_number,
+                "wallet_white": wallet_white,
+                "wallet_black": wallet_black,
+            })
+
+            await ws_manager.broadcast_raw({
+                "type": "move",
+                "data": {
+                    "player": color,
+                    "uci": uci or "",
+                    "move": san,
+                    "fen": engine.fen,
+                    "turn": engine.turn,
+                    "move_number": engine.move_number,
+                    "wallet_white": wallet_white,
+                    "wallet_black": wallet_black,
+                    "log_prob": log_prob,
+                    "message": f"{color} plays {san}",
+                },
+            })
+            await asyncio.sleep(settings.move_delay)
+
+        result = engine.result
+        reward_w = engine.compute_reward("white")
+        reward_b = engine.compute_reward("black")
+
+        # Award prize money: winner gets 2x entry fee, draw splits the pot
+        prize_pool = settings.entry_fee * 2
+        if reward_w > 0:  # white wins
+            prize_white = prize_pool
+            prize_black = 0.0
+        elif reward_b > 0:  # black wins
+            prize_white = 0.0
+            prize_black = prize_pool
+        else:  # draw — split pot
+            prize_white = prize_pool / 2
+            prize_black = prize_pool / 2
+
+        wallet_white += prize_white
+        wallet_black += prize_black
+
+        metrics = trainer.end_game(
+            reward=reward_w,
+            profit=prize_white - settings.entry_fee,
+            coaching_calls=0,
+        )
+
+        net_pnl_white = prize_white - settings.entry_fee
+
+        # Update snapshot with post-game wallet values
+        game_snapshot.update({
+            "type": "between_games",
+            "wallet_white": wallet_white,
+            "wallet_black": wallet_black,
+            "games_completed": game_num,
+            "grpo_step": (metrics or trainer._metrics).step,
+        })
+
+        await ws_manager.broadcast_raw({
+            "type": "game_end",
+            "data": {
+                "game": game_num,
+                "game_id": game_num,
+                "result": result,
+                "reward": reward_w,
+                "reward_white": reward_w,
+                "reward_black": reward_b,
+                "wallet_white": wallet_white,
+                "wallet_black": wallet_black,
+                "net_pnl_white": net_pnl_white,
+                "prize_income": prize_white,
+                "coaching_cost": 0.0,
+                "entry_fee": settings.entry_fee,
+                "grpo_loss": metrics.loss if metrics else None,
+                "win_rate": metrics.win_rate if metrics else None,
+                "kl_divergence": metrics.kl_div if metrics else None,
+                "avg_profit": metrics.avg_profit if metrics else None,
+                "grpo_step": metrics.step if metrics else 0,
+            },
+        })
+
+        # Always emit a training_step event so the GRPO charts update
+        # even when end_game returns None (not every game triggers an update)
+        current_metrics = metrics or trainer._metrics
+        await ws_manager.broadcast_raw({
+            "type": "training_step",
+            "data": {
+                "step": current_metrics.step,
+                "loss": current_metrics.loss,
+                "reward": reward_w,
+                "kl_div": current_metrics.kl_div,
+                "win_rate": current_metrics.win_rate,
+                "avg_profit": current_metrics.avg_profit,
+                "coaching_rate": 0.0,
+                "games": current_metrics.games,
+            },
+        })
+
+        await asyncio.sleep(settings.move_delay * 4)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    logger.info("ChessEcon backend starting up")
+    logger.info(f"Frontend dist: {FRONTEND_DIST} (exists: {FRONTEND_DIST.exists()})")
+    logger.info(f"Claude Coach: {'enabled' if os.getenv('ANTHROPIC_API_KEY') else 'disabled (no API key)'}")
+    logger.info(f"HuggingFace token: {'set' if os.getenv('HF_TOKEN') else 'not set'}")
+    asyncio.create_task(game_loop())
+    yield
+    logger.info("ChessEcon backend shutting down")
+
+# ── FastAPI app ───────────────────────────────────────────────────────────────
+app = FastAPI(
+    title="ChessEcon API",
+    description="Multi-Agent Chess Economy — Backend API",
+    version="2.0.0",
+    lifespan=lifespan,
+)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# ── API routes ────────────────────────────────────────────────────────────────
+app.include_router(game_router)
+app.include_router(training_router)
+
+# ── WebSocket endpoint ────────────────────────────────────────────────────────
+@app.websocket("/ws")
+async def websocket_endpoint(ws: WebSocket):
+    await ws_manager.connect(ws)
+    # Send current game state to the newly connected client
+    if game_snapshot:
+        snap = game_snapshot.copy()
+        await ws.send_text(json.dumps({
+            "type": "status",
+            "timestamp": __import__('time').time(),
+            "data": snap,
+        }))
+    try:
+        while True:
+            data = await ws.receive_text()
+            try:
+                msg = json.loads(data)
+                action = msg.get("action")
+                if action == "ping":
+                    await ws.send_text(json.dumps({"type": "pong"}))
+                elif action == "start_game":
+                    pass  # game_loop auto-starts from lifespan
+                elif action == "stop_game":
+                    pass  # games stop after current game ends
+            except json.JSONDecodeError:
+                pass
+    except WebSocketDisconnect:
+        await ws_manager.disconnect(ws)
+
+# ── Health check ──────────────────────────────────────────────────────────────
+@app.get("/health")
+async def health():
+    return {
+        "status": "ok",
+        "service": "chessecon-backend",
+        "version": "2.0.0",
+        "ws_connections": ws_manager.connection_count,
+        "claude_available": bool(os.getenv("ANTHROPIC_API_KEY")),
+        "hf_token_set": bool(os.getenv("HF_TOKEN")),
+    }
+
+@app.get("/api/config")
+async def get_config():
+    return {
+        "entry_fee": float(os.getenv("ENTRY_FEE", "10.0")),
+        "initial_wallet": float(os.getenv("INITIAL_WALLET", "100.0")),
+        "coaching_fee": float(os.getenv("COACHING_FEE", "5.0")),
+        "player_model": os.getenv("PLAYER_MODEL", "Qwen/Qwen2.5-0.5B-Instruct"),
+        "claude_model": os.getenv("CLAUDE_MODEL", "claude-opus-4-5"),
+        "claude_available": bool(os.getenv("ANTHROPIC_API_KEY")),
+        "rl_method": os.getenv("RL_METHOD", "grpo"),
+    }
+
+# ── Serve React frontend (SPA) ────────────────────────────────────────────────
+if FRONTEND_DIST.exists():
+    app.mount("/assets", StaticFiles(directory=str(FRONTEND_DIST / "assets")), name="assets")
+
+    @app.get("/{full_path:path}")
+    async def serve_spa(full_path: str):
+        index = FRONTEND_DIST / "index.html"
+        if index.exists():
+            return FileResponse(str(index))
+        return JSONResponse({"error": "Frontend not built"}, status_code=503)
+else:
+    @app.get("/")
+    async def root():
+        return {
+            "message": "ChessEcon API running. Frontend not built yet.",
+            "docs": "/docs",
+            "health": "/health",
+        }
+# Patch already applied — see websocket_endpoint above
+
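The prize and wallet arithmetic in `game_loop` reduces to a small pure rule: both players pay the entry fee, the winner takes the whole pot, and a draw splits it, so net PnL per game is prize minus entry fee. A stand-alone sketch of that rule for reference (`settle` is illustrative and not a function in this repo):

```python
def settle(entry_fee: float, reward_w: float, reward_b: float) -> tuple[float, float]:
    """Return (prize_white, prize_black) for one game, mirroring game_loop's rule."""
    pool = entry_fee * 2          # both players paid in
    if reward_w > 0:              # white wins: winner takes the pot
        return pool, 0.0
    if reward_b > 0:              # black wins
        return 0.0, pool
    return pool / 2, pool / 2     # draw: split the pot

# Net PnL is prize minus entry fee: a win nets +entry_fee, a loss -entry_fee, a draw 0.
print(settle(10.0, 1.0, -1.0))  # (20.0, 0.0)
```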
backend/main.py_backup ADDED
@@ -0,0 +1,218 @@
+"""
+ChessEcon Backend — FastAPI Application
+Serves the chess game API, WebSocket event stream, and the built React frontend.
+"""
+from __future__ import annotations
+import os
+import asyncio
+import json
+import logging
+from pathlib import Path
+from contextlib import asynccontextmanager
+
+from fastapi import FastAPI, WebSocket, WebSocketDisconnect
+from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from fastapi.responses import FileResponse, JSONResponse
+
+from backend.api.game_router import router as game_router
+from backend.api.training_router import router as training_router
+from backend.api.websocket import ws_manager
+from backend.agents.qwen_agent import QwenAgent
+from backend.agents.grpo_trainer import GRPOTrainer
+from backend.chess.chess_engine import ChessEngine
+from backend.settings import settings
+
+# ── Logging ───────────────────────────────────────────────────────────────────
+logging.basicConfig(
+    level=os.getenv("LOG_LEVEL", "info").upper(),
+    format="%(asctime)s [%(levelname)s] %(name)s: %(message)s",
+)
+logger = logging.getLogger(__name__)
+
+# ── Static frontend path ──────────────────────────────────────────────────────
+FRONTEND_DIST = Path(__file__).parent / "static"
+
+# ── Game loop (runs as a background task) ─────────────────────────────────────
+async def game_loop():
+    white = QwenAgent()
+    black = QwenAgent()
+    from backend.agents.qwen_agent import _load_model
+    tokenizer, model = _load_model()
+    trainer = GRPOTrainer(model, tokenizer)
+    game_num = 0
+
+    while True:
+        engine = ChessEngine()
+        move_history: list[str] = []
+        wallet_white = settings.starting_wallet
+        wallet_black = settings.starting_wallet
+        game_num += 1
+
+        await ws_manager.broadcast_raw({"type": "game_start", "data": {"game": game_num}})
+
+        trainer.start_game("white")
+
+        while not engine.is_game_over:
+            color = engine.turn
+            agent = white if color == "white" else black
+
+            # get_move returns (san, log_prob)
+            san, log_prob = agent.get_move(engine, color, move_history)
+
+            # get reference log prob for GRPO KL term
+            ref_log_prob = agent.get_move_log_prob_only(engine, color, move_history, san)
+
+            # apply the move
+            uci = engine.apply_move_san(san)
+            if uci is None:
+                # fallback: random legal move
+                san = engine.random_legal_move_san()
+                uci = engine.apply_move_san(san) or ""
+                log_prob = -1.0
+                ref_log_prob = -1.0
+
+            move_history.append(san)
+            trainer.record_move(log_prob, ref_log_prob)
+
+            await ws_manager.broadcast_raw({
+                "type": "move",
+                "data": {
+                    "player": color,
+                    "uci": uci or "",
+                    "move": san,
+                    "fen": engine.fen,
+                    "turn": engine.turn,
+                    "move_number": engine.move_number,
+                    "wallet_white": wallet_white,
+                    "wallet_black": wallet_black,
+                    "log_prob": log_prob,
+                    "message": f"{color} plays {san}",
+                },
+            })
+            await asyncio.sleep(settings.move_delay)
+
+        result = engine.result
+        reward_w = engine.compute_reward("white")
+        reward_b = engine.compute_reward("black")
+
+        metrics = trainer.end_game(
+            reward=reward_w,
+            profit=reward_w * 10.0,
+            coaching_calls=0,
+        )
+
+        await ws_manager.broadcast_raw({
+            "type": "game_end",
+            "data": {
+                "game": game_num,
+                "result": result,
+                "reward_white": reward_w,
+                "reward_black": reward_b,
+                "wallet_white": wallet_white,
+                "wallet_black": wallet_black,
+                "net_pnl_white": reward_w * 10.0,
+                "grpo_loss": metrics.loss if metrics else None,
+                "win_rate": metrics.win_rate if metrics else None,
+                "kl_divergence": metrics.kl_div if metrics else None,
+                "avg_profit": metrics.avg_profit if metrics else None,
+                "grpo_step": metrics.step if metrics else 0,
+            },
+        })
+        await asyncio.sleep(settings.move_delay * 4)
+
+
+@asynccontextmanager
+async def lifespan(app: FastAPI):
+    logger.info("ChessEcon backend starting up")
+    logger.info(f"Frontend dist: {FRONTEND_DIST} (exists: {FRONTEND_DIST.exists()})")
+    logger.info(f"Claude Coach: {'enabled' if os.getenv('ANTHROPIC_API_KEY') else 'disabled (no API key)'}")
+    logger.info(f"HuggingFace token: {'set' if os.getenv('HF_TOKEN') else 'not set'}")
+    asyncio.create_task(game_loop())
+    yield
+    logger.info("ChessEcon backend shutting down")
+
+# ── FastAPI app ───────────────────────────────────────────────────────────────
+app = FastAPI(
+    title="ChessEcon API",
+    description="Multi-Agent Chess Economy — Backend API",
+    version="2.0.0",
+    lifespan=lifespan,
+)
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+
+# ── API routes ────────────────────────────────────────────────────────────────
+app.include_router(game_router)
+app.include_router(training_router)
+
+# ── WebSocket endpoint ────────────────────────────────────────────────────────
+@app.websocket("/ws")
+async def websocket_endpoint(ws: WebSocket):
+    await ws_manager.connect(ws)
+    try:
+        while True:
+            data = await ws.receive_text()
+            try:
+                msg = json.loads(data)
+                action = msg.get("action")
+                if action == "ping":
+                    await ws.send_text(json.dumps({"type": "pong"}))
+                elif action == "start_game":
+                    pass  # game_loop auto-starts from lifespan
+                elif action == "stop_game":
+                    pass  # games stop after current game ends
+            except json.JSONDecodeError:
+                pass
+    except WebSocketDisconnect:
+        await ws_manager.disconnect(ws)
+
+# ── Health check ──────────────────────────────────────────────────────────────
+@app.get("/health")
+async def health():
+    return {
+        "status": "ok",
+        "service": "chessecon-backend",
+        "version": "2.0.0",
+        "ws_connections": ws_manager.connection_count,
+        "claude_available": bool(os.getenv("ANTHROPIC_API_KEY")),
+        "hf_token_set": bool(os.getenv("HF_TOKEN")),
+    }
+
+@app.get("/api/config")
+async def get_config():
+    return {
+        "entry_fee": float(os.getenv("ENTRY_FEE", "10.0")),
+        "initial_wallet": float(os.getenv("INITIAL_WALLET", "100.0")),
+        "coaching_fee": float(os.getenv("COACHING_FEE", "5.0")),
+        "player_model": os.getenv("PLAYER_MODEL", "Qwen/Qwen2.5-0.5B-Instruct"),
+        "claude_model": os.getenv("CLAUDE_MODEL", "claude-opus-4-5"),
+        "claude_available": bool(os.getenv("ANTHROPIC_API_KEY")),
+        "rl_method": os.getenv("RL_METHOD", "grpo"),
+    }
+
+# ── Serve React frontend (SPA) ────────────────────────────────────────────────
+if FRONTEND_DIST.exists():
+    app.mount("/assets", StaticFiles(directory=str(FRONTEND_DIST / "assets")), name="assets")
+
+    @app.get("/{full_path:path}")
+    async def serve_spa(full_path: str):
+        index = FRONTEND_DIST / "index.html"
+        if index.exists():
+            return FileResponse(str(index))
+        return JSONResponse({"error": "Frontend not built"}, status_code=503)
+else:
+    @app.get("/")
+    async def root():
+        return {
+            "message": "ChessEcon API running. Frontend not built yet.",
+            "docs": "/docs",
+            "health": "/health",
+        }
+# Patch already applied — see websocket_endpoint above
backend/openenv/__init__.py ADDED
@@ -0,0 +1,19 @@
+"""openenv — OpenEnv 0.1 compliant HTTP interface for ChessEcon."""
+from backend.openenv.env import ChessEconEnv
+from backend.openenv.router import router, init_env
+from backend.openenv.models import (
+    ResetRequest, StepRequest,
+    ResetResponse, StepResponse, StateResponse, EnvInfo,
+)
+
+__all__ = [
+    "ChessEconEnv",
+    "router",
+    "init_env",
+    "ResetRequest",
+    "StepRequest",
+    "ResetResponse",
+    "StepResponse",
+    "StateResponse",
+    "EnvInfo",
+]
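The exported reset/step/state contract is driven by a standard episode loop on the trainer side. A minimal sketch of that loop against a stub environment (`StubEnv` and `run_episode` are illustrative stand-ins, not repo code; only the response shapes are meant to mirror `ChessEconEnv`):

```python
class StubEnv:
    """Stand-in for ChessEconEnv: ends every episode after three moves."""

    def __init__(self):
        self.steps = 0

    def reset(self):
        self.steps = 0
        return {"observation": {"fen": "start", "legal_moves_uci": ["e2e4"]}, "info": {}}

    def step(self, action: str):
        self.steps += 1
        done = self.steps >= 3
        return {"observation": {"fen": f"after-{action}"}, "reward": 0.01,
                "terminated": done, "truncated": False, "info": {"step": self.steps}}


def run_episode(env) -> float:
    """Drive one episode and accumulate reward, as an RL trainer would."""
    total = 0.0
    env.reset()
    while True:
        resp = env.step("e2e4")  # a real agent would pick from legal_moves_uci
        total += resp["reward"]
        if resp["terminated"] or resp["truncated"]:
            return round(total, 4)


print(run_episode(StubEnv()))  # 0.03
```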
backend/openenv/env.py ADDED
@@ -0,0 +1,311 @@
+"""
+openenv/env.py
+──────────────
+Stateful ChessEcon environment that implements the OpenEnv 0.1 contract:
+
+    reset() → ResetResponse
+    step()  → StepResponse
+    state() → StateResponse
+
+Key design decisions:
+- Each call to reset() creates a new episode (new game_id, fresh board).
+- step(action) accepts either UCI or SAN notation.
+- Rewards are computed per-step (not just terminal):
+      +0.01  legal move played
+      +0.05  move gives check
+      +0.10  capture
+      +1.00  win
+      -1.00  loss
+       0.00  draw
+- Economy (entry fees, prize pool) is tracked per episode.
+- Thread-safe: each episode is independent. The FastAPI router creates
+  one global instance and serialises access via asyncio locks.
+"""
+
+from __future__ import annotations
+
+import uuid
+import logging
+from typing import Optional
+
+import chess
+
+from backend.chess_engine import ChessEngine
+from backend.settings import settings
+from backend.openenv.models import (
+    ChessObservation, ResetResponse, StepResponse, StateResponse, ResetRequest,
+)
+
+logger = logging.getLogger(__name__)
+
+# Shaping rewards (small intermediate signals)
+REWARD_LEGAL_MOVE = 0.01
+REWARD_CHECK = 0.05
+REWARD_CAPTURE = 0.10
+REWARD_WIN = 1.00
+REWARD_LOSS = -1.00
+REWARD_DRAW = 0.00
+
+
+class ChessEconEnv:
+    """
+    OpenEnv-compliant Chess Economy environment.
+
+    Manages a single active episode. Call reset() to start a new episode.
+    Call step(action) to advance it. Call state() to inspect without advancing.
+    """
+
+    def __init__(
+        self,
+        white_model_id: str,
+        black_model_id: str,
+        starting_wallet: float = 100.0,
+        entry_fee: float = 10.0,
+        prize_pool_fraction: float = 0.9,
+        max_moves: int = 150,
+    ):
+        self.white_model_id = white_model_id
+        self.black_model_id = black_model_id
+        self.starting_wallet = starting_wallet
+        self.entry_fee = entry_fee
+        self.prize_pool_fraction = prize_pool_fraction
+        self.max_moves = max_moves
+
+        # Episode state (None until first reset())
+        self._engine: Optional[ChessEngine] = None
+        self._episode_id: str = ""
+        self._step_count: int = 0
+        self._status: str = "idle"
+        self._move_history: list[str] = []
+
+        # Economy
+        self._wallet_white: float = starting_wallet
+        self._wallet_black: float = starting_wallet
+        self._prize_pool: float = 0.0
+
+        # Last move for observation
+        self._last_uci: Optional[str] = None
+        self._last_san: Optional[str] = None
+
+    # ── OpenEnv core API ───────────────────────────────────────────────────────
+
+    def reset(self, request: Optional[ResetRequest] = None) -> ResetResponse:
+        """
+        Start a new episode. Deducts entry fees and returns the initial observation.
+        """
+        self._engine = ChessEngine()
+        self._episode_id = str(uuid.uuid4())
+        self._step_count = 0
+        self._status = "active"
+        self._move_history = []
+        self._last_uci = None
+        self._last_san = None
+
+        # Economy: deduct entry fees
+        self._wallet_white -= self.entry_fee
+        self._wallet_black -= self.entry_fee
+        self._prize_pool = self.entry_fee * 2 * self.prize_pool_fraction
+
+        logger.info(
+            "Episode %s started. Wallets: W=%.1f B=%.1f prize_pool=%.1f",
+            self._episode_id[:8], self._wallet_white, self._wallet_black, self._prize_pool,
+        )
+
+        obs = self._build_observation()
+        return ResetResponse(
+            observation=obs,
+            info={
+                "episode_id": self._episode_id,
+                "prize_pool": self._prize_pool,
+                "entry_fee": self.entry_fee,
+            },
+        )
+
+    def step(self, action: str) -> StepResponse:
+        """
+        Apply a move to the board and return the next observation + reward.
+
+        action: UCI string ('e2e4') or SAN string ('e4').
+        """
+        if self._engine is None or self._status != "active":
+            raise RuntimeError("Call reset() before step()")
+
+        # ── Apply the move ─────────────────────────────────────────────────
+        # Try UCI first, then SAN
+        uci_applied: Optional[str] = None
+        san_applied: Optional[str] = None
+
+        # UCI path
+        san_from_uci = self._engine.apply_move_uci(action)
+        if san_from_uci is not None:
+            uci_applied = action
+            san_applied = san_from_uci
+        else:
+            # SAN path — we need the UCI back
+            try:
+                move = self._engine.board.parse_san(action)
+                uci_applied = move.uci()
+                san_applied = self._engine.board.san(move)
+                self._engine.board.push(move)
+            except Exception:
+                # Illegal move — return current state with negative reward
+                obs = self._build_observation()
+                return StepResponse(
+                    observation=obs,
+                    reward=-0.10,
+                    terminated=False,
+                    truncated=False,
+                    info={"error": f"Illegal move: {action}", "legal_moves": self._engine.legal_moves_uci[:10]},
+                )
+
+        self._last_uci = uci_applied
+        self._last_san = san_applied
+        self._move_history.append(san_applied)
+        self._step_count += 1
+
+        # ── Compute per-step reward ────────────────────────────────────────
+        reward = self._compute_step_reward(uci_applied)
+
+        # ── Check termination ──────────────────────────────────────────────
+        terminated = bool(self._engine.is_game_over)
+        truncated = (not terminated) and (self._step_count >= self.max_moves * 2)
+
+        if terminated or truncated:
+            reward = self._settle_game(terminated, truncated, reward)
+
+        obs = self._build_observation()
+
+        return StepResponse(
+            observation=obs,
+            reward=round(reward, 4),
+            terminated=terminated,
+            truncated=truncated,
+            info={
+                "episode_id": self._episode_id,
+                "step": self._step_count,
+                "san": san_applied,
+                "uci": uci_applied,
+                "move_history": self._move_history[-10:],
+                "prize_pool": self._prize_pool,
+            },
+        )
+
+    def state(self) -> StateResponse:
+        """Return current episode state without advancing it."""
+        if self._engine is None:
+            # Return idle state with default observation
+            idle_obs = ChessObservation(
+                fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
+                turn="white",
+                move_number=1,
+                legal_moves_uci=[],
+                wallet_white=self._wallet_white,
+                wallet_black=self._wallet_black,
+                white_model=self.white_model_id,
+                black_model=self.black_model_id,
+            )
+            return StateResponse(
+                observation=idle_obs,
+                episode_id="",
+                step_count=0,
+                status="idle",
+            )
+
+        return StateResponse(
+            observation=self._build_observation(),
+            episode_id=self._episode_id,
+            step_count=self._step_count,
+            status=self._status,
+            info={
+                "prize_pool": self._prize_pool,
+                "move_history": self._move_history[-10:],
+            },
+        )
+
+    # ── Internal helpers ───────────────────────────────────────────────────────
+
+    def _build_observation(self) -> ChessObservation:
+        engine = self._engine
+        assert engine is not None
+        board = engine.board
+
+        return ChessObservation(
+            fen=engine.fen,
+            turn=engine.turn,
+            move_number=engine.move_number,
+            last_move_uci=self._last_uci,
+            last_move_san=self._last_san,
+            legal_moves_uci=engine.legal_moves_uci,
+            is_check=board.is_check(),
+            wallet_white=round(self._wallet_white, 2),
+            wallet_black=round(self._wallet_black, 2),
+            white_model=self.white_model_id,
+            black_model=self.black_model_id,
+            info={
+                "move_history": self._move_history[-20:],
+                "step_count": self._step_count,
+                "episode_id": self._episode_id,
+            },
+        )
+
+    def _compute_step_reward(self, uci: str) -> float:
+        """
+        Dense per-step reward shaping.
+        Evaluated AFTER the move has been applied, so we look at the NEW board state.
+        """
+        engine = self._engine
+        assert engine is not None
+        board = engine.board
+
+        reward = REWARD_LEGAL_MOVE
+
+        # Check bonus (opponent is now in check)
+        if board.is_check():
+            reward += REWARD_CHECK
+
+        # Capture bonus — look at the move that was just pushed
+        if board.move_stack:
+            last_move = board.move_stack[-1]
+            # Castling and en-passant: board.is_capture works on the board before the move
+            # We check by looking at whether a piece disappeared from the target square
+            # Simple heuristic: the move stack entry captures flag
+            if board.is_capture(last_move):
+                reward += REWARD_CAPTURE
+
+        return reward
+
+    def _settle_game(self, terminated: bool, truncated: bool, step_reward: float) -> float:
+        """
+        Apply terminal reward and settle the economy.
+        Returns the final total reward for the last move.
+        """
+        engine = self._engine
+        assert engine is not None
+
+        result = engine.result or "1/2-1/2"
+        white_reward = engine.compute_reward("white")  # +1, -1, or 0
+
+        # Terminal reward
+        if white_reward > 0:
+            terminal = REWARD_WIN
+            self._wallet_white += self._prize_pool
+            logger.info("White wins! Prize: +%.1f", self._prize_pool)
+        elif white_reward < 0:
+            terminal = REWARD_LOSS
+            self._wallet_black += self._prize_pool
+            logger.info("Black wins! Prize: +%.1f", self._prize_pool)
+        else:
+            terminal = REWARD_DRAW
+            self._wallet_white += self._prize_pool / 2
+            self._wallet_black += self._prize_pool / 2
+            logger.info("Draw. Split prize: +%.1f each", self._prize_pool / 2)
+
+        self._status = "truncated" if truncated else "terminated"
+
+        logger.info(
+            "Episode %s ended. Result=%s Wallets: W=%.1f B=%.1f",
+            self._episode_id[:8], result,
+            self._wallet_white, self._wallet_black,
+        )
+
+        return step_reward + terminal
backend/openenv/models.py ADDED
@@ -0,0 +1,136 @@
+ """
+ openenv/models.py
+ ─────────────────
+ Pydantic schemas that exactly match the OpenEnv 0.1 HTTP spec.
+
+     POST /reset → ResetResponse
+     POST /step  → StepResponse
+     GET  /state → StateResponse
+
+ All three wrap a shared Observation object that carries chess-specific
+ fields inside the `info` dict so the core contract stays generic.
+ """
+
+ from __future__ import annotations
+ from typing import Any, Optional
+ from pydantic import BaseModel, Field
+
+
+ # ── Request bodies ─────────────────────────────────────────────────────────────
+
+ class StepRequest(BaseModel):
+     """Action sent by the RL trainer to advance the environment by one move."""
+     action: str = Field(
+         ...,
+         description="Chess move in UCI notation (e.g. 'e2e4') or SAN (e.g. 'e4')",
+         examples=["e2e4", "Nf3", "O-O"],
+     )
+
+
+ class ResetRequest(BaseModel):
+     """Optional seed / config passed on reset. All fields optional."""
+     seed: Optional[int] = Field(None, description="RNG seed for reproducibility")
+     config: Optional[dict[str, Any]] = Field(
+         None, description="Override environment config for this episode"
+     )
+
+
+ # ── Core observation ───────────────────────────────────────────────────────────
+
+ class ChessObservation(BaseModel):
+     """
+     Chess-specific observation. Returned inside every response as `observation`.
+     The `info` dict carries auxiliary data (legal moves, last move, etc.) so that
+     the outer schema stays OpenEnv-generic.
+     """
+     fen: str = Field(..., description="Current board position in FEN notation")
+     turn: str = Field(..., description="'white' or 'black'")
+     move_number: int = Field(..., description="Full-move number (1-indexed)")
+     last_move_uci: Optional[str] = Field(None, description="Last move in UCI notation")
+     last_move_san: Optional[str] = Field(None, description="Last move in SAN notation")
+     legal_moves_uci: list[str] = Field(..., description="All legal moves in UCI notation")
+     is_check: bool = Field(False, description="Whether the current side is in check")
+     # Economy
+     wallet_white: float = Field(..., description="White agent wallet balance (units)")
+     wallet_black: float = Field(..., description="Black agent wallet balance (units)")
+     # Agent identities
+     white_model: str = Field(..., description="Model ID playing White")
+     black_model: str = Field(..., description="Model ID playing Black")
+     # Info dict for auxiliary / extensible data
+     info: dict[str, Any] = Field(default_factory=dict)
+
+
+ # ── OpenEnv response bodies ────────────────────────────────────────────────────
+
+ class ResetResponse(BaseModel):
+     """
+     Returned by POST /reset.
+     OpenEnv spec: { observation, info }
+     """
+     observation: ChessObservation
+     info: dict[str, Any] = Field(default_factory=dict)
+
+
+ class StepResponse(BaseModel):
+     """
+     Returned by POST /step.
+     OpenEnv spec: { observation, reward, terminated, truncated, info }
+     """
+     observation: ChessObservation
+     reward: float = Field(..., description="Per-step reward signal")
+     terminated: bool = Field(..., description="True if the episode ended naturally (checkmate/stalemate/draw)")
+     truncated: bool = Field(..., description="True if the episode was cut short (move limit)")
+     info: dict[str, Any] = Field(default_factory=dict)
+
+
+ class StateResponse(BaseModel):
+     """
+     Returned by GET /state.
+     OpenEnv spec: { observation, info, episode_id, step_count, status }
+     """
+     observation: ChessObservation
+     info: dict[str, Any] = Field(default_factory=dict)
+     episode_id: str = Field(..., description="Unique identifier for the current episode")
+     step_count: int = Field(..., description="Number of moves played so far")
+     status: str = Field(..., description="'active' | 'terminated' | 'truncated' | 'idle'")
+
+
+ # ── Environment info ──────────────────────────────────────────────────────────
+
+ class EnvInfo(BaseModel):
+     """Returned by GET /env_info — describes environment capabilities."""
+     name: str = "chessecon"
+     version: str = "1.0.0"
+     description: str = (
+         "Two-agent chess economy environment. White plays Qwen2.5-0.5B-Instruct, "
+         "Black plays Llama-3.2-1B-Instruct. Agents earn/lose economic units based "
+         "on game outcomes. Compatible with OpenEnv 0.1 spec."
+     )
+     openenv_version: str = "0.1"
+     action_space: dict = Field(
+         default_factory=lambda: {
+             "type": "text",
+             "description": "Chess move in UCI (e2e4) or SAN (e4) notation",
+         }
+     )
+     observation_space: dict = Field(
+         default_factory=lambda: {
+             "type": "structured",
+             "fields": ["fen", "turn", "move_number", "legal_moves_uci",
+                        "wallet_white", "wallet_black", "is_check"],
+         }
+     )
+     reward_range: list[float] = Field(default_factory=lambda: [-1.0, 1.0])
+     max_episode_steps: int = 300
+     agents: list[dict] = Field(
+         default_factory=lambda: [
+             {"id": "white", "model": "Qwen/Qwen2.5-0.5B-Instruct", "role": "White player"},
+             {"id": "black", "model": "meta-llama/Llama-3.2-1B-Instruct", "role": "Black player"},
+         ]
+     )
+     tags: list[str] = Field(
+         default_factory=lambda: [
+             "chess", "multi-agent", "rl", "grpo", "economy",
+             "openenv", "two-player", "game",
+         ]
+     )
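For readers new to the OpenEnv 0.1 envelope these schemas encode, a quick shape check of a /step payload can be sketched without pydantic. `check_step_payload` below is an illustrative helper, not code from this repo:

```python
# Hypothetical helper: verifies a /step response carries the OpenEnv 0.1
# envelope keys with the expected types. Chess-specific extras live inside
# observation / info, so the outer contract stays generic.
REQUIRED = {
    "observation": dict,
    "reward": float,
    "terminated": bool,
    "truncated": bool,
    "info": dict,
}

def check_step_payload(payload: dict) -> bool:
    """Return True if the payload matches the OpenEnv /step envelope."""
    return all(isinstance(payload.get(k), t) for k, t in REQUIRED.items())

sample = {
    "observation": {
        "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
        "turn": "white",
        "info": {},
    },
    "reward": 0.0,
    "terminated": False,
    "truncated": False,
    "info": {},
}
```

In the real service, `StepResponse` enforces this shape (and the nested `ChessObservation` fields) via pydantic validation; the sketch only demonstrates the envelope itself.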
backend/openenv/router.py ADDED
@@ -0,0 +1,159 @@
+ """
+ openenv/router.py
+ ─────────────────
+ FastAPI router that exposes the OpenEnv 0.1 HTTP API:
+
+     POST /reset     → start a new episode
+     POST /step      → advance the environment by one action
+     GET  /state     → inspect current episode state (no side-effects)
+     GET  /env_info  → environment metadata (for HF Hub discoverability)
+
+ All endpoints are prefixed with /env so the full paths are:
+     /env/reset, /env/step, /env/state, /env/env_info
+
+ A single global ChessEconEnv instance is shared across all HTTP requests.
+ An asyncio.Lock ensures that concurrent step() calls don't race.
+
+ The auto-play game loop (websocket_server.py) runs in parallel and calls
+ env.reset() / env.step() internally — it does NOT go through these HTTP
+ endpoints. The HTTP endpoints are for external RL trainers (TRL, verl,
+ SkyRL etc.) that want to drive the environment themselves.
+ """
+
+ from __future__ import annotations
+
+ import asyncio
+ import logging
+ from typing import Optional
+
+ from fastapi import APIRouter, HTTPException, status
+
+ from backend.openenv.models import (
+     ResetRequest, StepRequest,
+     ResetResponse, StepResponse, StateResponse, EnvInfo,
+ )
+ from backend.openenv.env import ChessEconEnv
+ from backend.settings import settings
+
+ logger = logging.getLogger(__name__)
+
+ router = APIRouter(prefix="/env", tags=["OpenEnv"])
+
+ # ── Singleton environment + lock ──────────────────────────────────────────────
+ _env: Optional[ChessEconEnv] = None
+ _env_lock: asyncio.Lock = asyncio.Lock()
+
+
+ def get_env() -> ChessEconEnv:
+     """Return the global environment instance (initialised at app startup)."""
+     if _env is None:
+         raise HTTPException(
+             status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
+             detail="Environment not initialised yet. Models still loading.",
+         )
+     return _env
+
+
+ def init_env(white_model_id: str, black_model_id: str) -> ChessEconEnv:
+     """Called once at app lifespan startup after models are loaded."""
+     global _env
+     _env = ChessEconEnv(
+         white_model_id=white_model_id,
+         black_model_id=black_model_id,
+         starting_wallet=settings.starting_wallet,
+         entry_fee=settings.entry_fee,
+         prize_pool_fraction=settings.prize_pool_fraction,
+         max_moves=settings.max_moves,
+     )
+     logger.info(
+         "ChessEconEnv initialised. White=%s Black=%s",
+         white_model_id, black_model_id,
+     )
+     return _env
+
+
+ # ── OpenEnv endpoints ─────────────────────────────────────────────────────────
+
+ @router.post(
+     "/reset",
+     response_model=ResetResponse,
+     summary="Reset — start a new episode",
+     description=(
+         "Initialises a new chess game, deducts entry fees from both agent wallets, "
+         "and returns the initial observation. Compatible with OpenEnv 0.1 spec."
+     ),
+ )
+ async def reset(request: Optional[ResetRequest] = None) -> ResetResponse:
+     env = get_env()
+     async with _env_lock:
+         try:
+             return env.reset(request)
+         except Exception as exc:
+             logger.exception("reset() failed")
+             raise HTTPException(status_code=500, detail=str(exc))
+
+
+ @router.post(
+     "/step",
+     response_model=StepResponse,
+     summary="Step — apply one action",
+     description=(
+         "Applies a chess move (UCI or SAN) to the current board and returns "
+         "the next observation, per-step reward, and termination flags. "
+         "Returns reward=-0.1 for illegal moves (episode continues). "
+         "Compatible with OpenEnv 0.1 spec."
+     ),
+ )
+ async def step(request: StepRequest) -> StepResponse:
+     env = get_env()
+     async with _env_lock:
+         try:
+             return env.step(request.action)
+         except RuntimeError as exc:
+             raise HTTPException(
+                 status_code=status.HTTP_409_CONFLICT,
+                 detail=str(exc),
+             )
+         except Exception as exc:
+             logger.exception("step() failed")
+             raise HTTPException(status_code=500, detail=str(exc))
+
+
+ @router.get(
+     "/state",
+     response_model=StateResponse,
+     summary="State — current episode state (read-only)",
+     description=(
+         "Returns the current episode state without advancing it. "
+         "Safe to call at any time, even before reset(). "
+         "Compatible with OpenEnv 0.1 spec."
+     ),
+ )
+ async def state() -> StateResponse:
+     env = get_env()
+     try:
+         return env.state()
+     except Exception as exc:
+         logger.exception("state() failed")
+         raise HTTPException(status_code=500, detail=str(exc))
+
+
+ @router.get(
+     "/env_info",
+     response_model=EnvInfo,
+     summary="Environment metadata",
+     description=(
+         "Returns environment metadata used by the HuggingFace OpenEnv Hub "
+         "for discoverability. Lists action/observation spaces, agent models, "
+         "reward range, and OpenEnv version."
+     ),
+ )
+ async def env_info() -> EnvInfo:
+     env = get_env()
+     return EnvInfo(
+         agents=[
+             {"id": "white", "model": env.white_model_id, "role": "White player (Qwen)"},
+             {"id": "black", "model": env.black_model_id, "role": "Black player (Llama)"},
+         ]
+     )
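The router docstring notes that external RL trainers drive these endpoints themselves. A minimal driver loop might look like the sketch below; it is written against plain callables so any transport can be plugged in (e.g. an httpx client POSTing to /env/reset and /env/step, or an in-process env). `run_episode` and `pick_first_legal` are illustrative names, not part of this repo:

```python
def run_episode(reset, step, pick_action, max_steps=300):
    """Drive one OpenEnv episode: reset, then step until terminated/truncated.

    `reset()` returns the /reset payload, `step(action)` returns the /step
    payload, and `pick_action(observation)` is the caller's policy.
    """
    obs = reset()["observation"]
    total_reward = 0.0
    for _ in range(max_steps):
        resp = step(pick_action(obs))  # e.g. POST /env/step {"action": "e2e4"}
        total_reward += resp["reward"]
        obs = resp["observation"]
        if resp["terminated"] or resp["truncated"]:
            break
    return total_reward


def pick_first_legal(obs):
    """Trivial policy: play the first legal move the env reports."""
    return obs["legal_moves_uci"][0]
```

Because the server serialises concurrent `/step` calls behind `_env_lock`, a single driver like this is the intended usage; two trainers stepping the same singleton env would interleave their episodes.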
backend/qwen_agent.py ADDED
@@ -0,0 +1,228 @@
+ """
+ qwen_agent.py
+ ─────────────
+ Loads Qwen2.5-0.5B-Instruct (or any HuggingFace causal LM) and uses it to
+ generate chess moves given a position prompt.
+
+ Key responsibilities:
+ - Lazy model loading (first call triggers download + GPU placement)
+ - Illegal-move retry loop (up to settings.max_move_retries attempts)
+ - Log-probability extraction for GRPO training
+ - Temperature annealing hook (called by the trainer after each update)
+ """
+
+ import logging
+ from typing import Optional
+
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ from settings import settings
+ from chess_engine import ChessEngine
+
+ logger = logging.getLogger(__name__)
+
+ # ── Lazy singletons ───────────────────────────────────────────────────────────
+ _tokenizer = None
+ _model = None
+
+
+ def _load_model():
+     global _tokenizer, _model
+     if _model is not None:
+         return _tokenizer, _model
+
+     logger.info("Loading model: %s …", settings.player_model)
+
+     dtype_map = {
+         "float16": torch.float16,
+         "bfloat16": torch.bfloat16,
+         "float32": torch.float32,
+     }
+     torch_dtype = dtype_map.get(settings.torch_dtype, torch.bfloat16)
+
+     hf_kwargs = {}
+     if settings.hf_token:
+         hf_kwargs["token"] = settings.hf_token
+
+     _tokenizer = AutoTokenizer.from_pretrained(
+         settings.player_model,
+         trust_remote_code=True,
+         **hf_kwargs,
+     )
+
+     device_map = settings.device if settings.device != "auto" else "auto"
+
+     _model = AutoModelForCausalLM.from_pretrained(
+         settings.player_model,
+         torch_dtype=torch_dtype,
+         device_map=device_map,
+         trust_remote_code=True,
+         **hf_kwargs,
+     )
+     _model.eval()
+     logger.info("Model loaded on device: %s", next(_model.parameters()).device)
+
+     # Apply LoRA if requested
+     if settings.lora_rank > 0:
+         try:
+             from peft import get_peft_model, LoraConfig, TaskType  # type: ignore
+             lora_config = LoraConfig(
+                 task_type=TaskType.CAUSAL_LM,
+                 r=settings.lora_rank,
+                 lora_alpha=settings.lora_rank * 2,
+                 lora_dropout=0.05,
+                 target_modules=["q_proj", "v_proj"],
+             )
+             _model = get_peft_model(_model, lora_config)
+             _model.print_trainable_parameters()
+             logger.info("LoRA adapter applied (rank=%d)", settings.lora_rank)
+         except ImportError:
+             logger.warning("peft not installed — running without LoRA. pip install peft")
+
+     return _tokenizer, _model
+
+
+ class QwenAgent:
+     """
+     Wraps the Qwen model for chess move generation.
+
+     Usage (synchronous — callers off the event loop should use run_in_executor):
+         agent = QwenAgent()
+         san, log_prob = agent.get_move(engine, "white", move_history)
+     """
+
+     def __init__(self):
+         self._temperature = settings.temperature
+
+     def set_temperature(self, temp: float):
+         """Called by the GRPO trainer to anneal temperature over training."""
+         self._temperature = max(0.1, temp)
+
+     @property
+     def temperature(self) -> float:
+         return self._temperature
+
+     def get_move(
+         self,
+         engine: ChessEngine,
+         agent_color: str,
+         move_history: list[str],
+     ) -> tuple[str, float]:
+         """
+         Generate a legal chess move for the given position.
+
+         Returns:
+             (san_move, log_prob)
+             - san_move: the chosen move in SAN notation
+             - log_prob: sum of log-probs of the generated tokens (for GRPO)
+
+         Falls back to a random legal move if all retries are exhausted.
+         """
+         tokenizer, model = _load_model()
+         prompt = engine.build_prompt(agent_color, move_history)
+
+         messages = [
+             {"role": "system", "content": "You are a chess engine. Reply with only the move."},
+             {"role": "user", "content": prompt},
+         ]
+
+         # Apply chat template
+         text = tokenizer.apply_chat_template(
+             messages,
+             tokenize=False,
+             add_generation_prompt=True,
+         )
+         inputs = tokenizer(text, return_tensors="pt").to(model.device)
+         input_len = inputs["input_ids"].shape[1]
+
+         best_san: Optional[str] = None
+         best_log_prob: float = 0.0
+
+         for attempt in range(settings.max_move_retries):
+             with torch.no_grad():
+                 outputs = model.generate(
+                     **inputs,
+                     max_new_tokens=settings.max_new_tokens,
+                     temperature=self._temperature,
+                     do_sample=True,
+                     pad_token_id=tokenizer.eos_token_id,
+                     return_dict_in_generate=True,
+                     output_scores=True,
+                 )
+
+             generated_ids = outputs.sequences[0][input_len:]
+             generated_text = tokenizer.decode(generated_ids, skip_special_tokens=True)
+
+             # Compute sum of log-probs for GRPO
+             log_prob = _compute_log_prob(outputs.scores, generated_ids)
+
+             san = engine.parse_model_output(generated_text)
+             if san is not None:
+                 best_san = san
+                 best_log_prob = log_prob
+                 logger.debug(
+                     "Move generated (attempt %d/%d): %s log_prob=%.4f",
+                     attempt + 1, settings.max_move_retries, san, log_prob,
+                 )
+                 break
+             else:
+                 logger.debug(
+                     "Illegal/unparseable output (attempt %d/%d): %r",
+                     attempt + 1, settings.max_move_retries, generated_text,
+                 )
+
+         if best_san is None:
+             # All retries exhausted — fall back to a random legal move
+             best_san = engine.random_legal_move_san() or "e4"
+             best_log_prob = 0.0
+             logger.warning("All retries exhausted — using random fallback move: %s", best_san)
+
+         return best_san, best_log_prob
+
+     def get_move_log_prob_only(
+         self,
+         engine: ChessEngine,
+         agent_color: str,
+         move_history: list[str],
+         san_move: str,
+     ) -> float:
+         """
+         Compute the log-probability of a specific move under the current policy.
+         Used by GRPO to evaluate the reference policy for KL computation.
+         """
+         tokenizer, model = _load_model()
+         prompt = engine.build_prompt(agent_color, move_history)
+         messages = [
+             {"role": "system", "content": "You are a chess engine. Reply with only the move."},
+             {"role": "user", "content": prompt},
+         ]
+         text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+         target_text = text + san_move
+         inputs = tokenizer(target_text, return_tensors="pt").to(model.device)
+         prompt_len = tokenizer(text, return_tensors="pt")["input_ids"].shape[1]
+
+         with torch.no_grad():
+             out = model(**inputs, labels=inputs["input_ids"])
+             # Extract per-token log-probs for the generated portion only
+             logits = out.logits[0, prompt_len - 1:-1]
+             target_ids = inputs["input_ids"][0, prompt_len:]
+             log_probs = torch.nn.functional.log_softmax(logits, dim=-1)
+             selected = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)
+         return selected.sum().item()
+
+
+ # ── Helpers ───────────────────────────────────────────────────────────────────
+
+ def _compute_log_prob(scores, generated_ids) -> float:
+     """
+     Compute the sum of log-probabilities for the generated token sequence.
+     `scores` is a tuple of (vocab_size,) tensors, one per generated step.
+     """
+     total = 0.0
+     for step, score in enumerate(scores):
+         if step >= len(generated_ids):
+             break
+         log_probs = torch.nn.functional.log_softmax(score[0], dim=-1)
+         total += log_probs[generated_ids[step]].item()
+     return total
+
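The `_compute_log_prob` helper implements a standard identity: the log-probability of a generated sequence is the sum, over steps, of the log-softmax score of the token chosen at that step. The same arithmetic can be checked without torch in plain Python. This standalone sketch is illustrative, not code from the repo:

```python
import math

def log_softmax(scores):
    # Numerically stable log-softmax over a list of raw scores:
    # subtract the max before exponentiating so exp() cannot overflow.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def sequence_log_prob(step_scores, token_ids):
    # One score vector per generation step; sum the log-probability of
    # each chosen token — the pure-Python analogue of _compute_log_prob.
    return sum(log_softmax(scores)[tok]
               for scores, tok in zip(step_scores, token_ids))
```

With a uniform two-token vocabulary over two steps, each chosen token has probability 1/2, so the sequence log-prob is 2·ln(1/2) ≈ −1.386, matching what the torch version would return for the same scores.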
backend/requirements.txt ADDED
@@ -0,0 +1,40 @@
+ # ChessEcon Backend — Python dependencies
+ # Install with: pip install -r requirements.txt
+
+ # ── Web server ────────────────────────────────────────────────────────────────
+ fastapi
+ uvicorn[standard]
+ websockets
+
+ # ── Chess ─────────────────────────────────────────────────────────────────────
+ chess
+
+ # ── Schemas / settings ────────────────────────────────────────────────────────
+ pydantic
+ pydantic-settings
+
+ # ── LLM / Training ────────────────────────────────────────────────────────────
+ torch
+ transformers
+ accelerate
+ peft
+ sentencepiece
+ protobuf
+ huggingface_hub
+
+ # ── Coaching ──────────────────────────────────────────────────────────────────
+ anthropic
+
+ # ── Nevermined Payments — x402 cross-team agent-to-agent payments ─────────────
+ # https://nevermined.ai/docs/getting-started/welcome
+ httpx
+
+ # ── Utilities ─────────────────────────────────────────────────────────────────
+ python-dotenv
backend/settings.py ADDED
@@ -0,0 +1,65 @@
+ """
+ settings.py
+ ───────────
+ Single source of truth for all environment-variable-driven configuration.
+ All values have safe defaults so the server starts without any .env file.
+
+ New in v2 (OpenEnv):
+ - white_model / black_model replace the single player_model
+ """
+
+ import os
+ from dataclasses import dataclass, field
+
+
+ @dataclass(frozen=True)
+ class Settings:
+     # ── Models (dual-agent) ───────────────────────────────────────────────
+     white_model: str = field(
+         default_factory=lambda: os.getenv(
+             "WHITE_MODEL",
+             os.getenv("PLAYER_MODEL", "Qwen/Qwen2.5-0.5B-Instruct"),
+         )
+     )
+     black_model: str = field(
+         default_factory=lambda: os.getenv(
+             "BLACK_MODEL",
+             "meta-llama/Llama-3.2-1B-Instruct",
+         )
+     )
+
+     # Legacy alias
+     @property
+     def player_model(self) -> str:
+         return self.white_model
+
+     hf_token: str = field(default_factory=lambda: os.getenv("HF_TOKEN", ""))
+     device: str = field(default_factory=lambda: os.getenv("DEVICE", "auto"))
+     torch_dtype: str = field(default_factory=lambda: os.getenv("TORCH_DTYPE", "bfloat16"))
+
+     # ── Move generation ───────────────────────────────────────────────────
+     max_new_tokens: int = field(default_factory=lambda: int(os.getenv("MAX_NEW_TOKENS", "32")))
+     temperature: float = field(default_factory=lambda: float(os.getenv("TEMPERATURE", "0.7")))
+     max_move_retries: int = field(default_factory=lambda: int(os.getenv("MAX_MOVE_RETRIES", "5")))
+
+     # ── GRPO training ─────────────────────────────────────────────────────
+     grpo_update_every_n_games: int = field(default_factory=lambda: int(os.getenv("GRPO_UPDATE_EVERY_N_GAMES", "1")))
+     grpo_group_size: int = field(default_factory=lambda: int(os.getenv("GRPO_GROUP_SIZE", "4")))
+     grpo_kl_coeff: float = field(default_factory=lambda: float(os.getenv("GRPO_KL_COEFF", "0.04")))
+     grpo_lr: float = field(default_factory=lambda: float(os.getenv("GRPO_LR", "1e-5")))
+     lora_rank: int = field(default_factory=lambda: int(os.getenv("LORA_RANK", "8")))
+     checkpoint_dir: str = field(default_factory=lambda: os.getenv("CHECKPOINT_DIR", "./checkpoints"))
+     save_every_n_steps: int = field(default_factory=lambda: int(os.getenv("SAVE_EVERY_N_STEPS", "10")))
+
+     # ── Economy ───────────────────────────────────────────────────────────
+     starting_wallet: float = field(default_factory=lambda: float(os.getenv("STARTING_WALLET", "100.0")))
+     entry_fee: float = field(default_factory=lambda: float(os.getenv("ENTRY_FEE", "10.0")))
+     prize_pool_fraction: float = field(default_factory=lambda: float(os.getenv("PRIZE_POOL_FRACTION", "0.9")))
+     max_moves: int = field(default_factory=lambda: int(os.getenv("MAX_MOVES", "150")))
+
+     # ── Server ────────────────────────────────────────────────────────────
+     host: str = field(default_factory=lambda: os.getenv("HOST", "0.0.0.0"))
+     port: int = field(default_factory=lambda: int(os.getenv("PORT", "8000")))
+     move_delay: float = field(default_factory=lambda: float(os.getenv("MOVE_DELAY", "0.5")))
+
+
+ settings = Settings()
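The `field(default_factory=...)` pattern above means each environment variable is read when `Settings()` is instantiated, not at import time, so overrides set before construction take effect. A trimmed sketch (one hypothetical `DemoSettings` field, not the repo's full `Settings`) shows the behaviour:

```python
import os
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DemoSettings:
    # Read ENTRY_FEE from the environment at construction time,
    # falling back to a safe default so no .env file is required.
    entry_fee: float = field(
        default_factory=lambda: float(os.getenv("ENTRY_FEE", "10.0"))
    )

os.environ.pop("ENTRY_FEE", None)   # ensure a clean slate for the demo
default = DemoSettings()            # no override → falls back to 10.0
os.environ["ENTRY_FEE"] = "25"
overridden = DemoSettings()         # env var wins → 25.0
```

Because the dataclass is frozen, a constructed `Settings` instance is immutable; changing an env var afterwards only affects instances built later.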
backend/websocket_server.py ADDED
@@ -0,0 +1,365 @@
1
+ """
2
+ websocket_server.py (v2 — OpenEnv + Dual Agent)
3
+ ─────────────────────────────────────────────────
4
+ FastAPI application that:
5
+ 1. Loads TWO models at startup:
6
+ White → Qwen/Qwen2.5-0.5B-Instruct
7
+ Black → meta-llama/Llama-3.2-1B-Instruct
8
+ 2. Registers the OpenEnv 0.1 HTTP API at /env/*
9
+ 3. Runs continuous self-play games (white=Qwen vs black=Llama).
10
+ 4. Streams every game event to all connected WebSocket clients.
11
+ 5. Runs GRPO on the WHITE model only (Qwen) — Llama acts as fixed opponent.
12
+
13
+ OpenEnv endpoints (for external RL trainers):
14
+ POST /env/reset start a new episode
15
+ POST /env/step apply one action
16
+ GET /env/state inspect current state
17
+ GET /env/env_info environment metadata (HF Hub discoverability)
18
+
19
+ WebSocket endpoint: /ws
20
+ Health check: /health
21
+ API docs: /docs
22
+ """
23
+
24
+ import asyncio
25
+ import json
26
+ import logging
27
+ import time
28
+ from contextlib import asynccontextmanager
29
+ from typing import Any
30
+
31
+ import uvicorn
32
+ from fastapi import FastAPI, WebSocket, WebSocketDisconnect
33
+ from fastapi.middleware.cors import CORSMiddleware
34
+
35
+ from settings import settings
36
+ from chess_engine import ChessEngine
37
+ from agents.model_agent import ModelAgent
38
+ from grpo_trainer import GRPOTrainer
39
+ from openenv.router import router as openenv_router, init_env
40
+
41
+ logging.basicConfig(
42
+ level=logging.INFO,
43
+ format="%(asctime)s %(levelname)s %(name)s: %(message)s",
44
+ )
45
+ logger = logging.getLogger(__name__)
46
+
47
+ # ── Global state ──────────────────────────────────────────────────────────────
48
+ connected_clients: set[WebSocket] = set()
49
+ paused = False
50
+ game_count = 0
51
+ wallet_white = settings.starting_wallet
52
+ wallet_black = settings.starting_wallet
53
+
54
+ # Initialised in lifespan
55
+ white_agent: ModelAgent | None = None
56
+ black_agent: ModelAgent | None = None
57
+ trainer: GRPOTrainer | None = None
58
+
59
+
60
+ # ── Lifespan ──────────────────────────────────────────────────────────────────
61
+ @asynccontextmanager
62
+ async def lifespan(app: FastAPI):
63
+ global white_agent, black_agent, trainer
64
+
65
+ logger.info("Loading WHITE model (%s) …", settings.white_model)
66
+ white_agent = ModelAgent(settings.white_model).load()
67
+
68
+ logger.info("Loading BLACK model (%s) …", settings.black_model)
69
+ black_agent = ModelAgent(settings.black_model).load()
70
+
71
+ # GRPO trains the WHITE agent (Qwen); Llama is a fixed opponent
72
+ trainer = GRPOTrainer(white_agent.model, white_agent.tokenizer)
73
+
74
+ # Initialise the OpenEnv environment (used by /env/* HTTP endpoints)
75
+ init_env(
76
+ white_model_id=settings.white_model,
77
+ black_model_id=settings.black_model,
78
+ )
79
+
80
+ logger.info("Both models ready. Starting auto-play loop …")
81
+ asyncio.create_task(game_loop())
82
+ yield
83
+ logger.info("Shutting down.")
84
+
85
+
86
+ app = FastAPI(
87
+ title="ChessEcon",
88
+ description=(
89
+ "Multi-Agent Chess Economy — OpenEnv 0.1 compliant environment. "
90
+ "White: Qwen2.5-0.5B | Black: Llama-3.2-1B | Training: GRPO"
91
+ ),
92
+ version="2.0.0",
93
+ lifespan=lifespan,
94
+ )
95
+
96
+ app.add_middleware(
97
+ CORSMiddleware,
98
+ allow_origins=["*"],
99
+ allow_methods=["*"],
100
+ allow_headers=["*"],
101
+ )
102
+
103
+ # Register OpenEnv HTTP router at /env/*
104
+ app.include_router(openenv_router)
105
+
106
+
107
+ # ── Health ────────────────────────────────────────────────────────────────────
108
+ @app.get("/health")
109
+ async def health():
110
+ return {
111
+ "status": "ok",
112
+ "service": "chessecon",
113
+ "version": "2.0.0",
114
+ "openenv_version": "0.1",
115
+ "white_model": settings.white_model,
116
+ "black_model": settings.black_model,
117
+ "ws_clients": len(connected_clients),
118
+ "games_played": game_count,
119
+ }
120
+
121
+
122
+ # ── WebSocket endpoint ────────────────────────────────────────────────────────
123
+ @app.websocket("/ws")
124
+ async def websocket_endpoint(ws: WebSocket):
125
+ await ws.accept()
126
+ connected_clients.add(ws)
127
+ logger.info("WS client connected (%d total)", len(connected_clients))
128
+ # Send current state snapshot to new client immediately
129
+ try:
130
+ await ws.send_text(json.dumps({
131
+ "type": "status",
132
+ "data": {
133
+ "game_id": game_count,
134
+ "wallet_white": round(wallet_white, 2),
135
+ "wallet_black": round(wallet_black, 2),
136
+ "grpo_step": trainer._step if trainer else 0,
137
+ "message": f"Connected — game #{game_count} in progress",
138
+ }
139
+ }))
140
+ except Exception:
141
+ pass
142
+ try:
143
+ while True:
144
+ raw = await ws.receive_text()
145
+ try:
146
+ msg = json.loads(raw)
147
+ await handle_client_message(ws, msg)
148
+ except json.JSONDecodeError:
149
+ pass
150
+ except WebSocketDisconnect:
151
+ connected_clients.discard(ws)
152
+ logger.info("WS client disconnected (%d total)", len(connected_clients))
153
+
154
+
155
+ async def handle_client_message(ws: WebSocket, msg: dict):
156
+ global paused
157
+ action = msg.get("action", "")
158
+ if action == "ping":
159
+ await ws.send_text(json.dumps({"type": "pong", "data": {}}))
160
+ elif action == "pause":
161
+ paused = True
162
+ logger.info("Game loop paused")
163
+ elif action == "resume":
164
+ paused = False
165
+ logger.info("Game loop resumed")
166
+
167
+
168
+ # ── Broadcast helper ──────────────────────────────────────────────────────────
169
+ async def broadcast(event_type: str, data: dict[str, Any]):
170
+ if not connected_clients:
171
+ return
172
+ payload = json.dumps({"type": event_type, "data": data})
173
+ dead: set[WebSocket] = set()
174
+ for ws in list(connected_clients):
175
+ try:
176
+ await ws.send_text(payload)
177
+ except Exception:
178
+ dead.add(ws)
179
+ connected_clients.difference_update(dead)
180
+
181
+
182
+ # ── Main game loop ────────────────────────────────────────────────────────────
183
+ async def game_loop():
184
+ global game_count, wallet_white, wallet_black, paused
185
+
186
+ while True:
187
+ while paused:
188
+ await asyncio.sleep(0.5)
189
+
190
+ game_count += 1
191
+ engine = ChessEngine()
192
+
193
+ wallet_white -= settings.entry_fee
194
+ wallet_black -= settings.entry_fee
195
+ prize_pool = settings.entry_fee * 2 * settings.prize_pool_fraction
196
+
197
+ await broadcast("game_start", {
198
+ "game_id": game_count,
199
+ "wallet_white": round(wallet_white, 2),
200
+ "wallet_black": round(wallet_black, 2),
201
+ "prize_pool": round(prize_pool, 2),
202
+ "white_model": settings.white_model,
203
+ "black_model": settings.black_model,
204
+ "message": (
205
+ f"Game #{game_count} — "
206
+ f"Qwen(W) vs Llama(B) — "
207
+ f"Prize pool: {prize_pool:.1f} units"
208
+ ),
209
+ })
210
+
211
+ trainer.start_game("white") # type: ignore[union-attr]
212
+ move_history: list[str] = []
213
+
214
+ # ── Play the game ─────────────────────────────────────────────────
215
+ while not engine.is_game_over and engine.move_number <= settings.max_moves:
216
+ while paused:
217
+ await asyncio.sleep(0.5)
218
+
219
+ current_color = engine.turn
220
+ # Select the right agent
221
+ active_agent = white_agent if current_color == "white" else black_agent
222
+
223
+ san, log_prob = await asyncio.get_event_loop().run_in_executor(
224
+ None,
225
+ active_agent.get_move, # type: ignore[union-attr]
226
+ engine, current_color, move_history,
227
+ )
228
+
229
+ # KL reference: only needed for WHITE (GRPO training target)
230
+ if current_color == "white":
231
+ ref_log_prob = await asyncio.get_event_loop().run_in_executor(
232
+ None,
233
+ white_agent.get_move_log_prob_only, # type: ignore[union-attr]
234
+ engine, current_color, move_history, san,
235
+ )
236
+ else:
237
+ ref_log_prob = log_prob # Black is fixed; KL = 0
238
+
239
+ uci = engine.apply_move_san(san)
240
+ if uci is None:
241
+ fallback = engine.random_legal_move_san()
242
+ if fallback is None:
243
+ break
244
+ san = fallback
245
+ uci = engine.apply_move_san(san) or ""
246
+ log_prob = 0.0
247
+ ref_log_prob = 0.0
248
+
249
+ trainer.record_move(log_prob, ref_log_prob) # type: ignore[union-attr]
250
+ move_history.append(san)
251
+
252
+ await broadcast("move", {
253
+ "game_id": game_count,
254
+ "player": current_color,
255
+ "model": settings.white_model if current_color == "white" else settings.black_model,
256
+ "move": san,
257
+ "uci": uci,
258
+ "fen": engine.fen,
259
+ "move_number": engine.move_number,
260
+ "turn": engine.turn,
261
+ "wallet_white": round(wallet_white, 2),
262
+ "wallet_black": round(wallet_black, 2),
263
+ "message": f"{'Qwen' if current_color == 'white' else 'Llama'} plays {san}",
264
+ })
265
+
266
+ await asyncio.sleep(settings.move_delay)
267
+
268
+ # ���─ Game over ─────────────────────────────────────────────────────
269
+ # If game ended by chess rules use that result; otherwise adjudicate by material
270
+ if engine.result:
271
+ result = engine.result
272
+ else:
273
+ # Count material: Q=9 R=5 B=3 N=3 P=1
274
+ piece_values = {1: 1, 2: 3, 3: 3, 4: 5, 5: 9} # pawn,knight,bishop,rook,queen
275
+ import chess as _chess
276
+ white_mat = sum(
277
+ piece_values.get(pt, 0)
278
+ for pt in range(1, 6)
279
+ for _ in engine.board.pieces(pt, _chess.WHITE)
280
+ )
281
+ black_mat = sum(
282
+ piece_values.get(pt, 0)
283
+ for pt in range(1, 6)
284
+ for _ in engine.board.pieces(pt, _chess.BLACK)
285
+ )
286
+ result = '1-0' if white_mat >= black_mat else '0-1' # always decisive; material ties go to White
287
+ white_reward = 1.0 if result == "1-0" else (-1.0 if result == "0-1" else 0.0)
288
+ black_reward = 1.0 if result == "0-1" else (-1.0 if result == "1-0" else 0.0)
289
+
290
+ if result == "1-0":
291
+ wallet_white += prize_pool
292
+ elif result == "0-1":
293
+ wallet_black += prize_pool
294
+ else:
295
+ wallet_white += prize_pool / 2
296
+ wallet_black += prize_pool / 2
297
+
298
+ white_pnl = (
299
+ prize_pool if result == "1-0"
300
+ else prize_pool / 2 if result == "1/2-1/2"
301
+ else 0
302
+ ) - settings.entry_fee
303
+ black_pnl = (
304
+ prize_pool if result == "0-1"
305
+ else prize_pool / 2 if result == "1/2-1/2"
306
+ else 0
307
+ ) - settings.entry_fee
308
+
309
+ await broadcast("game_end", {
310
+ "game_id": game_count,
311
+ "result": result,
312
+ "reward": white_reward,
313
+ "wallet_white": round(wallet_white, 2),
314
+ "wallet_black": round(wallet_black, 2),
315
+ "prize_income": round(
316
+ prize_pool if result == "1-0"
317
+ else prize_pool / 2 if result == "1/2-1/2"
318
+ else 0, 2
319
+ ),
320
+ "coaching_cost": 0,
321
+ "entry_fee": settings.entry_fee,
322
+ "net_pnl_white": round(white_pnl, 2),
323
+ "net_pnl_black": round(black_pnl, 2),
324
+ "move_count": len(move_history),
325
+ "white_model": settings.white_model,
326
+ "black_model": settings.black_model,
327
+ "message": f"Game #{game_count} ended — {result}",
328
+ })
329
+
330
+ # GRPO update (WHITE model only)
331
+ training_metrics = trainer.end_game( # type: ignore[union-attr]
332
+ reward=white_reward,
333
+ profit=white_pnl,
334
+ coaching_calls=0,
335
+ )
336
+
337
+ if training_metrics is not None:
338
+ await broadcast("training_step", {
339
+ "step": training_metrics.step,
340
+ "loss": round(training_metrics.loss, 6),
341
+ "reward": round(training_metrics.policy_reward, 4),
342
+ "kl_div": round(training_metrics.kl_div, 6),
343
+ "win_rate": round(training_metrics.win_rate, 4),
344
+ "avg_profit": round(training_metrics.avg_profit, 4),
345
+ "coaching_rate": round(training_metrics.coaching_rate, 4),
346
+ "model": settings.white_model,
347
+ "message": (
348
+ f"GRPO step {training_metrics.step} | "
349
+ f"loss={training_metrics.loss:.4f} "
350
+ f"win_rate={training_metrics.win_rate:.2%}"
351
+ ),
352
+ })
353
+
354
+ await asyncio.sleep(1.0)
355
+
356
+
357
+ # ── Entry point ───────────────────────────────────────────────────────────────
358
+ if __name__ == "__main__":
359
+ uvicorn.run(
360
+ "websocket_server:app",
361
+ host=settings.host,
362
+ port=settings.port,
363
+ reload=False,
364
+ log_level="info",
365
+ )
doc.md ADDED
@@ -0,0 +1,124 @@
1
+ # ChessEcon: A Visual Guide to the Autonomous Chess Economy
2
+
3
+ **Author:** Adaboost AI
4
+ **Date:** March 03, 2026
5
+
6
+ ---
7
+
8
+ ## Introduction
9
+
10
+ This document provides a comprehensive visual overview of the **ChessEcon** system, a multi-agent reinforcement learning platform where AI agents operate as autonomous businesses. The following diagrams and charts illustrate the system's architecture, the flow of information and money, the agent decision-making process, and the dynamics of the training loop. These visualizations are designed to clarify the inner workings of the agents and the training pipeline, from a single move to a full self-play and training cycle.
11
+
12
+ ---
13
+
14
+ ## 1. System Architecture & Information Flow
15
+
16
+ The ChessEcon system is composed of several interconnected layers, each with a distinct responsibility. The following diagrams illustrate the high-level architecture and the sequence of events during a typical training loop.
17
+
18
+ ### 1.1. Full Training Loop Sequence
19
+
20
+ This sequence diagram shows the end-to-end flow of a single game, from setup and move-by-move execution to payout and the triggering of a training step. It highlights the interactions between the agents, the environment server, the economic layer, and the training pipeline.
21
+
22
+ ![Full Training Loop Sequence](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDFfcmVuZGVyZWQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURGZmNtVnVaR1Z5WldRLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=eGAwcqstJAJbwXEo5Rs~dlA7FwgmA8cVOq1n7iyvl3d0SI2Tf7K6ubUmEzi80lKZEKIIomkfvzayiMb7wkvTOtLvyE2ueAcK3mJUKiZa8yh5IrjSHmFrBb0iZBkTXyjwM2h442LtxnT6kE0HB7KiQGWaG8-KLSSwED6MHlO-2H918dmy-T0iNOjfZS~Ov8Uh-T3L7KW3YxUt~w-u1ZUyEvBdDGHEwYQQYRpEosJPMqNp2sz6iODECFS-sf87Gf7QwaPk8oadMhDE41LGjhTdjq2ayab6gcbtxeDvA5HcyDSlAQFJerDTih1LD29LpV11s6S2VqHCTaI9VNsGeh0XYw__)
23
+
24
+ ### 1.2. Agent Decision-Making Flowchart
25
+
26
+ At the heart of ChessEcon is the agent's ability to make both chess and economic decisions. This flowchart details the step-by-step process an agent follows each turn, including the critical decision of whether to purchase expert coaching from Claude (claude-opus-4-5).
27
+
28
+ ![Agent Decision-Making Flowchart](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDJfcmVuZGVyZWQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURKZmNtVnVaR1Z5WldRLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=Q2-uM4Wo~3~-14vDVIUFgEA5fk~zHrNLGxwhe7uFqNqgglNsDW5K~eNiSR3zcU39D8adxCsjlEumO9LLhsppoX2R~-2J3qwO~SKB6LFrgtk83Wg5T4pfAE~upZUk7Iy8vfVhnh3SPx4EITIdzxxBuKOAwlH3IWIk6cTWun6FcLglJf0fjJecjHjJDsp5cvSP0uC7pfk2XkK6V2IDo4JntiJBOxX-Fsxt6X4rDVZ40B4jiJSd-QFHbbHvJ0RHCwadQqerJ55RlRobqqKR-CJC5SFnYFlx6i9xNtzz7o1fh6O1VbojDbQuXFQHdq3YaVFZHa0KvmjIcVLm1Cpij8508w__)
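The purchase decision in the flowchart boils down to an affordability-and-complexity gate. The sketch below is illustrative: the threshold value, fee, and field names are assumptions for exposition, not the project's exact API.

```python
from dataclasses import dataclass

@dataclass
class TurnContext:
    complexity: float    # heuristic position complexity in [0, 1]
    wallet: float        # agent's current balance
    coaching_fee: float  # price of one Claude consultation

def should_buy_coaching(ctx: TurnContext, threshold: float = 0.6) -> bool:
    """Buy coaching only when the position is complex enough to justify
    the fee and the wallet can actually afford it."""
    return ctx.complexity >= threshold and ctx.wallet >= ctx.coaching_fee

# A sharp middlegame with a healthy wallet triggers a purchase;
# a quiet position, or an empty wallet, does not.
print(should_buy_coaching(TurnContext(complexity=0.9, wallet=50.0, coaching_fee=5.0)))  # True
print(should_buy_coaching(TurnContext(complexity=0.2, wallet=50.0, coaching_fee=5.0)))  # False
```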
29
+
30
+ ### 1.3. Economic Flow
31
+
32
+ Money is the lifeblood of the ChessEcon system. This diagram illustrates how money flows between agents and the tournament organizer, from entry fees and coaching payments to prize payouts. It also breaks down the net profit for various game outcomes.
33
+
34
+ ![Economic Flow Diagram](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDNfcmVuZGVyZWQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TUROZmNtVnVaR1Z5WldRLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=b~VH4HoyD0uXy64t0SdDvYTkI9cCBTt0DVB3PMSPCAhOiEnDFxH2Oc9dLAVw~5uZmsfAupI~DtNl7VGY3vCrnhbaqeVu8p-SNN-eOxyBJUvIR~gwHAJWrvdP0DcjtPTGsSbCSXagQ2~khsUMVZESvvLfNV1W-TMuEE0UI39NCjpS4ZPVXA26-evIPgMaWJn2cfTeOL9iFCT9nRd36cxdFaFhMP~-Uz56fohCbtHSI7y~h0Fus7lzzuyx0MO8BLkefpqyRWFJf8a~H7LClHt30GIxeryB275d-1I1A8747fm2mUX3uE8C13n6mOtIO3es46v4~Wk6YOaHwuSHp2nmGA__)
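The payouts in the diagram follow the same arithmetic as the self-play server: both players pay an entry fee into a prize pool, the winner takes the pool (a draw splits it), and net P&L is prize income minus the entry fee. A minimal sketch of that settlement, with illustrative default amounts:

```python
def settle_game(result: str, prize_pool: float = 18.0, entry_fee: float = 10.0):
    """Return (white_pnl, black_pnl) for one game.

    Winner takes the whole pool; a draw ('1/2-1/2') splits it.
    Net P&L = prize income - entry fee, mirroring the server's payout logic.
    """
    if result == "1-0":
        white_income, black_income = prize_pool, 0.0
    elif result == "0-1":
        white_income, black_income = 0.0, prize_pool
    else:  # "1/2-1/2"
        white_income = black_income = prize_pool / 2
    return white_income - entry_fee, black_income - entry_fee

# A decisive game is zero-sum between the players, up to the organizer's
# cut of the pool (entry fees minus prize pool).
print(settle_game("1-0"))  # (8.0, -10.0)
```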
35
+
36
+ ### 1.4. GRPO Training Internals
37
+
38
+ The training pipeline uses **Group Relative Policy Optimization (GRPO)**. This flowchart breaks down the four phases of the GRPO process: data collection, reward assignment, advantage computation, and the final loss calculation and model update.
39
+
40
+ ![GRPO Training Internals](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDRfcmVuZGVyZWQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURSZmNtVnVaR1Z5WldRLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=gXYxpV6xGg~CBHhfEds~kkb-fOt5VPf1F4Qr7DT8LJivp2FDEGHZ5SzHl13WjA8MogHZT-vwm1I973l3NaBdk0YGBLWFnQttUU5fpB31-pVL9Hbtq3-EBUEhBpp9i8tGwX98n7DY0yoAIJz3~v5Q7XJKRxyFC1Ld6OJdlbcNMnglOQ4eTjmVm-tuSXpJKh6C-3VOJPEvW7QFRNDX1pzxkJwDQk3gyKGsOvzOg~VvtmgWgMustsiOob3lRezCzPKCR0dUogLcKCTSPm7HDzLNJoueER43qWSpAf2gah8x2eJx80e98JEsPRf9qFmhjKNXsnJy~TzqT-FMNBNEERbANA__)
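The four phases can be condensed into two small functions: group-relative advantage normalization (which replaces a learned value network) and a policy-gradient loss with a KL penalty toward the frozen reference policy. This is a hedged sketch of the standard GRPO math, not the project's trainer; `kl_coef` is an illustrative value.

```python
import math

def group_advantages(rewards):
    """Phase 3: normalize each rollout's reward against its own group,
    A_i = (r_i - mean(r)) / std(r). No critic network required."""
    mean = sum(rewards) / len(rewards)
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
    return [(r - mean) / (std or 1.0) for r in rewards]

def grpo_loss(log_probs, ref_log_probs, advantages, kl_coef=0.04):
    """Phase 4: policy-gradient term plus a simple per-move KL estimate
    (log_pi - log_pi_ref) against the frozen reference policy."""
    n = len(log_probs)
    pg = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    kl = sum(lp - rlp for lp, rlp in zip(log_probs, ref_log_probs)) / n
    return pg + kl_coef * kl

# Two rollouts, one win (+1) and one loss (-1):
print(group_advantages([1.0, -1.0]))  # [1.0, -1.0]
```

Note that only White trains against a reference policy here; for the fixed Black opponent the reference log-probs equal the policy log-probs, so its KL term vanishes, matching the server code.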
41
+
42
+ ---
43
+
44
+ ## 2. Training & Economic Performance
45
+
46
+ The following charts are generated from a simulated 80-game self-play run, illustrating how the system's performance evolves over the course of training.
47
+
48
+ ### 2.1. Training Metrics Dashboard
49
+
50
+ This 2x2 dashboard provides a high-level view of the key training metrics. It shows the GRPO training loss decreasing, the combined policy reward increasing, the KL divergence remaining stable (indicating controlled training), and the agent's win rate improving over time.
51
+
52
+ ![Training Metrics Dashboard](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDVfdHJhaW5pbmdfbWV0cmljcw.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURWZmRISmhhVzVwYm1kZmJXVjBjbWxqY3cucG5nIiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNzk4NzYxNjAwfX19XX0_&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=BCPFjCwipKz6hXnELqk2-QEBKmlOeAM8Dm6iPCqezHF2f0gL6KNgi85vs3l2bN8eR7JGj1OywWZ76IPvsOCIC15wRIpnmqL3vP3kTS92av6ZePqbrV0il~6DrNaJL1ABNBJ~RR8DZGFF578CJehWittrqv5zgPo5hUmRhaMUN1SK7qlHT61N0D31P8SVsCxpZbxAQBEBB~oQn34yaFErmeOOjI~jBj2gqcBVMIQuETuINe4x8S6RwHA0qoig7BH--LtTDhKBtJATMVL0ttPcASRqkHOzrtwcV5BN-6Z~K2XRP-xYpn0hVz6-fDVnD2ZOA4JvkdZgmhj~30kyFYggsg__)
53
+
54
+ ### 2.2. Economic Performance Over Time
55
+
56
+ This chart tracks the wallet balances of the White and Black agents over the 80-game simulation. It clearly shows the White agent, which is the primary agent being trained, learning to become profitable, while the less-trained Black agent's balance stagnates or declines. The bottom panel shows the rolling average of net profit per game, reinforcing the trend of improving economic performance.
57
+
58
+ ![Economic Performance Over Time](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDZfZWNvbm9taWNfcGVyZm9ybWFuY2U.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURaZlpXTnZibTl0YVdOZmNHVnlabTl5YldGdVkyVS5wbmciLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE3OTg3NjE2MDB9fX1dfQ__&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=NRGJRzzs9RbOggZjFdGTC3gPLAuDd9Fx8JgcizZf9wkf57ydgb~zV3i5uYNKiXHHfq97IO4X1G-ZZCvWfwy~CpZpTYnPjoisxWs-gXXz-8p~TQ515aqmZIx4qleCrAL0FnN0pnQTSsRpLxRcqHvNB22JxoD4er-jGREgBhbgMSf2O12MZfqk9e1qF24RPSBhN5yAE-LmxHRWKJPIBWeBhcpS9Dm7YBq2BRM784xmpsWQ5KR8pY4ewaL9KJ4ivmsZtK3C77RZlMuFCzbUI-fg3PQQe8mVATJfijj7i2zXMgBZtQumHqxaMoJlUPgL9tJmgCS8F8YIDuOVrzW978OA5A__)
59
+
60
+ ---
61
+
62
+ ## 3. Agent Behavior & Interaction Analysis
63
+
64
+ These visualizations dive deeper into the specific behaviors of the agents, particularly the decision to use the premium Claude Coach agent.
65
+
66
+ ### 3.1. Claude Coaching Usage Analysis
67
+
68
+ This set of charts analyzes when and why the Claude Coach is used. It shows that as agents become more skilled, their reliance on coaching decreases. It also demonstrates a clear positive correlation between buying coaching and winning the game, validating its role as a valuable but costly resource.
69
+
70
+ ![Claude Coaching Usage Analysis](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDdfY29hY2hpbmdfYW5hbHlzaXM.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURkZlkyOWhZMmhwYm1kZllXNWhiSGx6YVhNLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=FAhX4iVXS4pFhMSovGXgDGQAUMeu7pIzebURdjN3zHEt4BbH4yXiHSb3LFhm8gOyRlOoUE5ZH3pQ70gcrsE4ZV8m30fzgoB~hmN16jUtexO~eF4NlwDvfS7QRTPxW9jey-IJcdxxHgZDL~ZVdOzSy1-sXcOWK0IfvEGy8d45G~QNMgUf57YpCUebX-zoVJTJhEv2WfeOa0gzVlwa9wqa3ZAm5sb-6k9~SqxN7IoAquFOh1XJpQbmuqy9JmZeIydCYjDv4o7wfeM1wxkNRN3CUkOG9IAYuBKn2RONtBKnSENSJJ31GkW0Tk1LIGOPzomAmqPa0DKJk7wNLMqSUaHTMw__)
71
+
72
+ ### 3.2. Game Statistics Over Training
73
+
74
+ This dashboard shows how game outcomes and length change as the agents train. The pie chart gives an overall distribution, while the line chart shows the White agent's win rate steadily climbing. The histogram of game lengths reveals that games tend to become shorter and more decisive as the agents improve.
75
+
76
+ ![Game Statistics Over Training](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDhfZ2FtZV9zdGF0aXN0aWNz.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURoZloyRnRaVjl6ZEdGMGFYTjBhV056LnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=gUdb3ZCb-VNO2Ew1Y4hVXVFOVDUBxdMDdPw-SABMI6-kr9V8o6C2XqG8hvP4yDqw8TIzvS8~YLAHESWbcWDxOTZFkCUI2L590YVeqrBlDhOihV8U9xfHKCDMTX8YKhWSczEgKmlE6ZpW248RFKcZY4y35RmIEIXIK73BDH~XuSCKy6c7FyFwlshXO2UpfJCVeQE3jbut9rvdkChjc1gcLekuztdSdtiB3sDSj9KZLUZKQuW1KozhBE2a2tucAhC0-bYu4p00kDwLgZeEO3rrpXUODpfnnHrTvtn5ZmQEStgWQmoruYHYKW606PLDT~FnwnJ2Dz5ic17YvuxFRzqf0Q__)
77
+
78
+ ### 3.3. Reward Function Decomposition
79
+
80
+ The core of the economic training is the combined reward function. This chart decomposes the reward, showing the relationship between the game outcome (win/loss/draw) and the economic outcome (net profit). It illustrates how the final reward is a blend of both factors, encouraging agents to be both strong players and shrewd business operators.
81
+
82
+ ![Reward Function Decomposition](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMDlfcmV3YXJkX2RlY29tcG9zaXRpb24.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TURsZmNtVjNZWEprWDJSbFkyOXRjRzl6YVhScGIyNC5wbmciLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE3OTg3NjE2MDB9fX1dfQ__&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=ANyhIOTVBjmJ31aDD9S1QYcWeZ7sNAB6RCV-eKlYL9OiamGGLm1ZrI3PFf7yG~s0igz27okUQZqn4qVOFU3-yMOHv2IV4ukbmFZVN2V5AV-h~prWZCTmyhHloGOemtQS9HzBaYhvZ4~zL~1h0z5SEDvJS83D8XVqtSufNJt6~V7EY07B1OnMKX031fThxsb9a4veROrpbgN7XcDLAx~DKHm8H0qJuOtCdz~29wuhAsiQBMiuxlUxF6x9uKAPclBeWoON~VCMkrMWgwfdBsXat9lY-Aaawdn6IlY47YXstB6CkqVbSokDlFSdqQWuMfcWXgqbjR4qYKjSr9ZpL3IxhQ__)
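The blend described above can be written as a weighted sum of the game result and a clipped, normalized profit term. The weight and scale below are illustrative assumptions; the project's exact coefficients may differ.

```python
def combined_reward(game_reward: float, net_profit: float,
                    profit_scale: float = 20.0, profit_weight: float = 0.5) -> float:
    """Blend the chess outcome (+1 win / 0 draw / -1 loss) with the
    economic outcome (net profit, normalized by profit_scale and
    clipped to [-1, 1]) into a single training reward."""
    econ = max(-1.0, min(1.0, net_profit / profit_scale))
    return (1 - profit_weight) * game_reward + profit_weight * econ

# A win that also turned a profit outscores a bare, break-even win.
print(combined_reward(1.0, 20.0))  # 1.0
print(combined_reward(1.0, 0.0))   # 0.5
```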
83
+
84
+ ### 3.4. Position Complexity & Claude Trigger Analysis
85
+
86
+ Claude is only triggered in complex positions. This chart shows how our heuristic complexity score evolves over a typical game, peaking in the middlegame. The bar chart confirms that the vast majority of Claude coaching calls occur during the strategically rich middlegame phases.
87
+
88
+ ![Position Complexity & Claude Trigger Analysis](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMTBfY29tcGxleGl0eV9hbmFseXNpcw.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TVRCZlkyOXRjR3hsZUdsMGVWOWhibUZzZVhOcGN3LnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=FT4u0A~isN0innd6oMmNiu2ivh9NLyTOiwgClA6GF1kHElKvNDkNGJnD23N25ofdE4LzjdbKy7ewYoXGfiUt65qP~m2f8LJVU7WElkL0i4VejjyRav~tPUKWuPFKCh5YLnKiyiPh9UPUY~tGMciuncMQO2~YxhhK~UiE~E4zX9BO5SuaNVRqwH1ySVIl~RhceOCqi~W6xzKurgzcVUj0pEXsLXT8txJ6WHfCfPG90O21pjWcDYsLTL8D75g6fdTg~JHal6uTRWrhFLPSwX-~JYWlVfSuI~eWaVQliBsyMtWQr3bvXyZ1hkTT3mEKhQ7kFaQN8xuXvwNGZFjzcSZWZw__)
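A heuristic of this shape can be computed from cheap board statistics alone. The sketch below blends mobility (number of legal moves) with sharpness (share of captures and checks); the weights and normalizer are illustrative assumptions, and in the real system the counts would come from `python-chess` (`board.legal_moves`, `board.is_capture`, `board.gives_check`).

```python
def complexity_score(legal_moves: int, forcing_moves: int) -> float:
    """Heuristic complexity in [0, 1]: busy positions with many forcing
    options (captures/checks) score highest, matching the middlegame peak."""
    if legal_moves == 0:
        return 0.0  # game over: checkmate or stalemate
    mobility = min(legal_moves / 40.0, 1.0)  # ~40 legal moves ~= max mobility
    sharpness = forcing_moves / legal_moves  # fraction of forcing moves
    return 0.6 * mobility + 0.4 * sharpness

# Opening position: 20 quiet moves -> low score; a sharp middlegame
# with many forcing options scores much higher.
print(complexity_score(20, 0))   # 0.3
print(complexity_score(38, 12))  # ~0.70
```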
89
+
90
+ ---
91
+
92
+ ## 4. Detailed Interaction Visualizations
93
+
94
+ Finally, these visualizations provide a granular look at the system's inner workings.
95
+
96
+ ### 4.1. Single-Game Agent Interaction Timeline
97
+
98
+ This Gantt-style chart provides a step-by-step timeline of all agent and system interactions during a single, representative game. It clearly shows the sequence of API calls, decisions, and data flows.
99
+
100
+ ![Single-Game Agent Interaction Timeline](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMTFfaW50ZXJhY3Rpb25fdGltZWxpbmU.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TVRGZmFXNTBaWEpoWTNScGIyNWZkR2x0Wld4cGJtVS5wbmciLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE3OTg3NjE2MDB9fX1dfQ__&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=awdawO~8OoLoBXuZk4TAsiUWSJerJfQHuHhhnOysVQtPFPaIOe9m7LzkBuHgLvnEfqnX0PVesjp~33yK5Q6~Dj9fHe~DRELJyTNEu9Ok8Lk8FPmQvHSX9S0hUbWsWoBj6kbMS6hlF6niGiOXsrN0FPG2ekIaVgYbVhQyHLyCaX509HucACzRBprpgN5IvXinbb8AUHbL-n0AR-Oni2Vlw3ORLXQ3Tob20N0czLPAnlAJ9SKL-ox4q6rB6cIXYTX45alJWJPtMNi9nUWlecfGHKSbadmI0g-CLpd5iQYCyTEvEZg0BAnD~siMrJRWbRiZcUFns1dStzD2Q2BVWMu5mQ__)
101
+
102
+ ### 4.2. Money Flow Sankey Diagram
103
+
104
+ This diagram visualizes the aggregate flow of money across a simulated 10-game tournament, providing a clear picture of the overall economy.
105
+
106
+ ![Money Flow Sankey Diagram](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMTJfbW9uZXlfZmxvdw.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TVRKZmJXOXVaWGxmWm14dmR3LnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=NFIrwUUsVxNfrhwL68E7Y7hpBmjQNxPoBpPJTj-gW3Umo-43HIgvpcAC9rqwp04HHhft56JvBU3GAjhy-TSiJyFy91aL4RsmLWYbNZ8b9MZYSSAxGTm7XAMHAukHyvEsPbjFShYmw4TZ6fgwe0TBQ6SfL1dO~Fea4WgV3S-EdIEabiPadqNnfGY5X4IdxNpwg-MnfgANGkzcNTC7dMfwS2BBlfNmG5ndYpG5AmPfbLJ-5hRllpEBU9AYY0Pn0Y35SNdfvJO2dBPxitPEiTaAfWyC79VTqLnVOV5bfRor26jxqN~v5dVeJynJllOquree0WugOoE0W1Y6I4M4ZdxNNQ__)
107
+
108
+ ### 4.3. LLM Prompt Structure
109
+
110
+ The behavior of the agents is driven by carefully crafted prompts. This visualization shows the exact structure of the prompts sent to both the trainable Player Agent (Qwen/Llama) and the premium Claude Coach Agent.
111
+
112
+ ![LLM Prompt Structure](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMTNfcHJvbXB0X3N0cnVjdHVyZQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TVROZmNISnZiWEIwWDNOMGNuVmpkSFZ5WlEucG5nIiwiQ29uZGl0aW9uIjp7IkRhdGVMZXNzVGhhbiI6eyJBV1M6RXBvY2hUaW1lIjoxNzk4NzYxNjAwfX19XX0_&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=WetJozFy20bllcTD6aod9dtE6rdk-8mrKAi9Hej~RzMt92vRbNOn2hvBrtxDMCtIXFX1NHyQiPEctrjDJ6SwubhHZZJlVuPCXWaYQVJvPpp1uvqAIcPOBJhrn40Yo8rVoi9uTam0z1VrYUsm7Z0jGN8ewl8OvxIhmglrAbqq1Ri9e6Sj2isvVNPSF5JzSNmKQ14IDJyHYxsXezLfQ0YftMsODBWdbigJpWjIQNkD0sYeJvwAUuBR4LOtSLHwPWv4-ZcmNJuI4fUkhFHuMT7VCLd0mpOIAsBiNsN~hBOx2txRAgFVrCfIpZqKsDtrc9QWUoUwZTQ2XQWysgue1~hBXw__)
113
+
114
+ ### 4.4. Summary Dashboard
115
+
116
+ This final dashboard provides a one-glance summary of the entire training process, combining key performance indicators (KPIs) with trend lines for win rate, profit, and coaching usage.
117
+
118
+ ![Summary Dashboard](https://private-us-east-1.manuscdn.com/sessionFile/ELP96X8OiHqgxiSAuWbFms/sandbox/SsYEQ33FqlWJCy9d2U9OKk-images_1772600757694_na1fn_L2hvbWUvdWJ1bnR1L2NoZXNzZWNvbl92aXovMTRfc3VtbWFyeV9kYXNoYm9hcmQ.png?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9wcml2YXRlLXVzLWVhc3QtMS5tYW51c2Nkbi5jb20vc2Vzc2lvbkZpbGUvRUxQOTZYOE9pSHFneGlTQXVXYkZtcy9zYW5kYm94L1NzWUVRMzNGcWxXSkN5OWQyVTlPS2staW1hZ2VzXzE3NzI2MDA3NTc2OTRfbmExZm5fTDJodmJXVXZkV0oxYm5SMUwyTm9aWE56WldOdmJsOTJhWG92TVRSZmMzVnRiV0Z5ZVY5a1lYTm9ZbTloY21RLnBuZyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc5ODc2MTYwMH19fV19&Key-Pair-Id=K2HSFNDJXOU9YS&Signature=Y6YI5LtCBwodv4T-P-XqOWgxcVfIjfzH7Kc7iggFpyDuJsTc7LR3C9UhOSK9zihh9BTzPPriKUGyuoHmgZuq5kqp1ggCMOIVXwBQ0VjLJ2d3885RpRrnpAoG3ZeWk8iBtCUF0HpZw9~dvE8aWCG2DLpW9ly-~8ETsbV9GUBkuC777gDAF64EuKBN2WgMtf4K5es1R~7Sv5zhBBTWBYHgbGvcZwpnLO5Cpj5BKKkTZYnh-qEcBbN1R3M~QJCAz5Bjz3uT87zitUYQMwaopdyyTEyF8MHKOab2cNH1IZa-q30TsokkRUmSrC9ot7WL~Sp9gF2f8OyNE6oY7RT54TMoXw__)
119
+
120
+ ---
121
+
122
+ ## Conclusion
123
+
124
+ These visualizations collectively demonstrate a robust and well-defined system where AI agents learn to navigate a competitive environment with real economic constraints. The data shows clear evidence of learning, both in terms of chess-playing ability and economic decision-making, validating the core principles of the ChessEcon project.
docker-compose.gpu.yml ADDED
@@ -0,0 +1,52 @@
1
+ # ─────────────────────────────────────────────────────────────────────────────
2
+ # ChessEcon — GPU Override (docker-compose.gpu.yml)
3
+ #
4
+ # Usage:
5
+ # docker compose -f docker-compose.yml -f docker-compose.gpu.yml up
6
+ #
7
+ # Requirements:
8
+ # - NVIDIA GPU with CUDA 12.1+ support
9
+ # - nvidia-container-toolkit installed on the host
10
+ # - Run: sudo nvidia-ctk runtime configure --runtime=docker
11
+ # ─────────────────────────────────────────────────────────────────────────────
12
+
13
+ services:
14
+
15
+ chessecon:
16
+ build:
17
+ target: backend-gpu
18
+ image: chessecon:gpu
19
+ environment:
20
+ CUDA_VISIBLE_DEVICES: "${CUDA_VISIBLE_DEVICES:-0}"
21
+ TORCH_DTYPE: "${TORCH_DTYPE:-bfloat16}"
22
+ USE_FLASH_ATTENTION: "${USE_FLASH_ATTENTION:-true}"
23
+ DEVICE: "cuda"
24
+ deploy:
25
+ resources:
26
+ reservations:
27
+ devices:
28
+ - driver: nvidia
29
+ count: 1
30
+ capabilities: [gpu]
31
+
32
+ trainer:
33
+ build:
34
+ target: backend-gpu
35
+ image: chessecon:gpu
36
+ environment:
37
+ CUDA_VISIBLE_DEVICES: "${CUDA_VISIBLE_DEVICES:-all}"
38
+ TORCH_DTYPE: "${TORCH_DTYPE:-bfloat16}"
39
+ USE_FLASH_ATTENTION: "${USE_FLASH_ATTENTION:-true}"
40
+ DEVICE: "cuda"
41
+ # Multi-GPU training
42
+ NPROC_PER_NODE: "${NPROC_PER_NODE:-1}"
43
+ # Larger batches on GPU
44
+ GAMES_PER_BATCH: "${GAMES_PER_BATCH:-16}"
45
+ BATCH_SIZE: "${BATCH_SIZE:-8}"
46
+ deploy:
47
+ resources:
48
+ reservations:
49
+ devices:
50
+ - driver: nvidia
51
+ count: all
52
+ capabilities: [gpu]
docker-compose.yml ADDED
@@ -0,0 +1,63 @@
1
+ version: "3.9"
2
+
3
+ # ChessEcon — OpenEnv 0.1 compliant multi-agent chess economy
4
+ #
5
+ # White: Qwen/Qwen2.5-0.5B-Instruct (GRPO training target)
6
+ # Black: meta-llama/Llama-3.2-1B-Instruct (fixed opponent)
7
+ #
8
+ # Quick start:
9
+ # docker compose up --build
10
+
11
+ services:
12
+
13
+ backend:
14
+ build:
15
+ context: ./backend
16
+ dockerfile: Dockerfile
17
+ image: chessecon-backend:latest
18
+ container_name: chessecon-backend
19
+ restart: unless-stopped
20
+ ports:
21
+ - "8008:8000"
22
+ env_file:
23
+ - ./backend/.env
24
+ environment:
25
+ - DEVICE=cuda # GPU inference
26
+ - HOST=0.0.0.0
27
+ - PORT=8000
28
+ - WHITE_MODEL=/models/Qwen_Qwen2.5-0.5B-Instruct
29
+ - BLACK_MODEL=/models/meta-llama_Llama-3.2-1B-Instruct
30
+ - HF_HUB_OFFLINE=1
31
+ - CUDA_VISIBLE_DEVICES=0 # use first GPU
32
+ volumes:
33
+ - ./training/models:/models:ro # model weights
34
+ - /home/minasm/.cache/huggingface:/root/.cache/huggingface:ro # HF cache
35
+ - checkpoints:/app/checkpoints # LoRA checkpoints
36
+ deploy:
37
+ resources:
38
+ reservations:
39
+ devices:
40
+ - driver: nvidia
41
+ count: 1
42
+ capabilities: [gpu]
43
+ healthcheck:
44
+ test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
45
+ interval: 30s
46
+ timeout: 10s
47
+ retries: 5
48
+ start_period: 180s
49
+
50
+ dashboard:
51
+ image: nginx:alpine
52
+ container_name: chessecon-dashboard
53
+ restart: unless-stopped
54
+ ports:
55
+ - "3006:80"
56
+ extra_hosts:
57
+ - "host.docker.internal:host-gateway"
58
+ volumes:
59
+ - ./frontend/dist/public:/usr/share/nginx/html:ro
60
+ - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
61
+
62
+ volumes:
63
+ checkpoints:
docker-compose.yml_backup ADDED
@@ -0,0 +1,61 @@
1
+ version: "3.9"
2
+
3
+ # ─────────────────────────────────────────────────────────────────────────────
4
+ # ChessEcon — Full stack
5
+ #
6
+ # Services:
7
+ # backend — Python FastAPI WebSocket server (Qwen + GRPO)
8
+ # dashboard — React/Node.js dashboard (Manus web project)
9
+ #
10
+ # Quick start (GPU machine):
11
+ # cp backend/.env.example backend/.env # fill in HF_TOKEN etc.
12
+ # docker compose up --build
13
+ #
14
+ # The dashboard is served on port 3000.
15
+ # The backend WebSocket is on port 8000 (/ws).
16
+ # In LIVE mode the dashboard connects to ws://localhost:8000/ws
17
+ # (or set VITE_WS_URL to override).
18
+ # ─────────────────────────────────────────────────────────────────────────────
19
+
20
+ services:
21
+
22
+ # ── Python backend (Qwen + GRPO + WebSocket) ─────────────────────────────
23
+ backend:
24
+ build:
25
+ context: ./backend
26
+ dockerfile: Dockerfile
27
+ image: chessecon-backend:latest
28
+ container_name: chessecon-backend
29
+ restart: unless-stopped
30
+ ports:
31
+ - "8008:8000"
32
+ env_file:
33
+ - ./backend/.env # create from env-vars-reference.md
34
+ environment:
35
+ - DEVICE=auto # override with "cpu" if no GPU
36
+ - HOST=0.0.0.0
37
+ - PORT=8000
38
+ volumes:
39
+ - hf_cache:/app/.cache/huggingface # persist model weights
40
+ - checkpoints:/app/checkpoints # persist LoRA adapters
41
+ healthcheck:
42
+ test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
43
+ interval: 30s
44
+ timeout: 10s
45
+ retries: 5
46
+ start_period: 120s # allow time for model download
47
+
48
+ # ── React dashboard (Node.js dev server) ─────────────────────────────────
49
+ dashboard:
50
+ image: nginx:alpine
51
+ container_name: chessecon-dashboard
52
+ restart: unless-stopped
53
+ ports:
54
+ - "3006:80"
55
+ volumes:
56
+ - ./frontend/dist/public:/usr/share/nginx/html:ro
57
+
58
+ volumes:
59
+ hf_cache:
60
+ checkpoints:
61
+
docker-entrypoint.sh ADDED
@@ -0,0 +1,175 @@
1
+ #!/bin/bash
2
+ # ─────────────────────────────────────────────────────────────────────────────
3
+ # ChessEcon Docker Entrypoint
4
+ #
5
+ # Modes (CMD argument):
6
+ # backend — Start the FastAPI server (default)
7
+ # train — Run the RL training loop
8
+ # selfplay — Run self-play data collection only (no training)
9
+ # download — Download the HuggingFace model and exit
10
+ # demo — Run a quick 3-game demo and exit
11
+ # ─────────────────────────────────────────────────────────────────────────────
12
+
13
+ set -euo pipefail
14
+
15
+ MODE="${1:-backend}"
16
+
17
+ echo "╔══════════════════════════════════════════════════════════════╗"
18
+ echo "║ ChessEcon — Multi-Agent Chess RL ║"
19
+ echo "║ TextArena + Meta OpenEnv + GRPO | Hackathon 2026 ║"
20
+ echo "╚══════════════════════════════════════════════════════════════╝"
21
+ echo ""
22
+ echo "Mode: $MODE"
23
+ echo "Model: ${PLAYER_MODEL:-Qwen/Qwen2.5-0.5B-Instruct}"
24
+ echo "RL Method: ${RL_METHOD:-grpo}"
25
+ echo ""
26
+
27
+ # ── Validate required environment variables ───────────────────────────────
28
+ check_env() {
29
+ local var_name="$1"
30
+ local required="${2:-false}"
31
+ if [ -z "${!var_name:-}" ]; then
32
+ if [ "$required" = "true" ]; then
33
+ echo "ERROR: Required environment variable $var_name is not set."
34
+ echo " Please set it in your .env file or Docker environment."
35
+ exit 1
36
+ else
37
+ echo "WARNING: Optional variable $var_name is not set."
38
+ fi
39
+ fi
40
+ }
41
+
42
+ # Always required
43
+ check_env "HF_TOKEN" "true"
44
+
45
+ # Required for Claude coaching
46
+ if [ "${ENABLE_CLAUDE_COACHING:-true}" = "true" ]; then
47
+ check_env "ANTHROPIC_API_KEY" "true"
48
+ fi
49
+
50
+ # ── Download model from HuggingFace if not cached ────────────────────────
51
+ MODEL_NAME="${PLAYER_MODEL:-Qwen/Qwen2.5-0.5B-Instruct}"
52
+ MODEL_CACHE_DIR="/app/models/$(echo "$MODEL_NAME" | tr '/' '_')"
53
+
54
+ if [ ! -d "$MODEL_CACHE_DIR" ] || [ "${FORCE_DOWNLOAD:-false}" = "true" ]; then
55
+ echo "Downloading model: $MODEL_NAME"
56
+ echo "Cache directory: $MODEL_CACHE_DIR"
57
+ python3 -c "
58
+ from huggingface_hub import snapshot_download
59
+ import os
60
+ snapshot_download(
61
+ repo_id='${MODEL_NAME}',
62
+ local_dir='${MODEL_CACHE_DIR}',
63
+ token=os.environ.get('HF_TOKEN'),
64
+ ignore_patterns=['*.bin', '*.pt'] if os.environ.get('USE_SAFETENSORS', 'true') == 'true' else []
65
+ )
66
+ print('Model downloaded successfully.')
67
+ "
68
+ echo "Model ready at: $MODEL_CACHE_DIR"
69
+ else
70
+ echo "Model already cached at: $MODEL_CACHE_DIR"
71
+ fi
72
+
73
+ export MODEL_LOCAL_PATH="$MODEL_CACHE_DIR"
74
+
75
+ # ── Execute the requested mode ────────────────────────────────────────────
76
+ case "$MODE" in
77
+ backend)
78
+ echo ""
79
+ echo "Starting ChessEcon API server on port ${PORT:-8000}..."
80
+ echo "Dashboard: http://localhost:${PORT:-8000}"
81
+ echo "API docs: http://localhost:${PORT:-8000}/docs"
82
+ echo "WebSocket: ws://localhost:${PORT:-8000}/ws"
83
+ echo ""
84
+ exec python3 -m uvicorn backend.main:app \
85
+ --host 0.0.0.0 \
86
+ --port "${PORT:-8000}" \
87
+ --workers "${WORKERS:-1}" \
88
+ --log-level "${LOG_LEVEL:-info}"
89
+ ;;
90
+
91
+ train)
92
+ echo ""
93
+ echo "Starting RL training..."
94
+ echo "Method: ${RL_METHOD:-grpo}"
95
+ echo "Games per batch: ${GAMES_PER_BATCH:-8}"
96
+ echo "Training steps: ${MAX_TRAINING_STEPS:-1000}"
97
+ echo ""
98
+ exec python3 -m training.run \
99
+ --method "${RL_METHOD:-grpo}" \
100
+ --model-path "$MODEL_LOCAL_PATH" \
101
+ --games-per-batch "${GAMES_PER_BATCH:-8}" \
102
+ --max-steps "${MAX_TRAINING_STEPS:-1000}" \
103
+ --output-dir "/app/data/training" \
104
+ --log-dir "/app/logs"
105
+ ;;
106
+
107
+ selfplay)
108
+ echo ""
109
+ echo "Starting self-play data collection..."
110
+ echo "Games: ${SELFPLAY_GAMES:-100}"
111
+ echo ""
112
+ exec python3 -m training.run \
113
+ --method selfplay \
114
+ --model-path "$MODEL_LOCAL_PATH" \
115
+ --games "${SELFPLAY_GAMES:-100}" \
116
+ --output-dir "/app/data/games"
117
+ ;;
118
+
119
+ download)
120
+ echo "Model download complete. Exiting."
121
+ exit 0
122
+ ;;
123
+
124
+ demo)
125
+ echo ""
126
+ echo "Running 3-game demo..."
127
+ exec python3 -c "
128
+ import asyncio
129
+ import sys
130
+ sys.path.insert(0, '/app')
131
+ from backend.chess.engine import ChessEngine
132
+ from backend.economy.ledger import EconomicConfig, WalletManager, TournamentOrganizer
133
+
134
+ async def run_demo():
135
+ config = EconomicConfig()
136
+ wallets = WalletManager(config)
137
+ wallets.create_wallet('white', 100.0)
138
+ wallets.create_wallet('black', 100.0)
139
+ organizer = TournamentOrganizer(config, wallets)
140
+
141
+ for game_num in range(1, 4):
142
+ print(f'\n--- Game {game_num} ---')
143
+ engine = ChessEngine()
144
+ game_id = organizer.open_game('white', 'black')
145
+ print(f'Game ID: {game_id}')
146
+ print(f'Prize pool: {organizer.games[game_id].prize_pool}')
147
+
148
+ move_count = 0
149
+ while not engine.is_game_over() and move_count < 20:
150
+ legal = engine.get_legal_moves()
151
+ if not legal:
152
+ break
153
+ import random
154
+ move = random.choice(legal)
155
+ engine.make_move(move)
156
+ move_count += 1
157
+
158
+ result = engine.get_result() or '1/2-1/2'
159
+ winner = 'white' if result == '1-0' else ('black' if result == '0-1' else None)
160
+ payout = organizer.close_game(game_id, winner)
161
+ print(f'Result: {result} | White: {payout[\"white\"]:.1f} | Black: {payout[\"black\"]:.1f}')
162
+ print(f'Wallets — White: {wallets.get_balance(\"white\"):.1f} | Black: {wallets.get_balance(\"black\"):.1f}')
163
+
164
+ print('\nDemo complete.')
165
+
166
+ asyncio.run(run_demo())
167
+ "
168
+ ;;
169
+
170
+ *)
171
+ echo "Unknown mode: $MODE"
172
+ echo "Valid modes: backend | train | selfplay | download | demo"
173
+ exit 1
174
+ ;;
175
+ esac
docs/Issues.md ADDED
@@ -0,0 +1,47 @@
+ # ChessEcon Dashboard — Issues & Fixes Log
+
+ A complete record of every issue encountered and fix applied during this session.
+
+ ## Issue 1 — Black pieces invisible on dark squares
+
+ **Root cause:** Black pieces were rendered in `#1a1a2e` (near-black), which was indistinguishable from the dark board background (`#141c21`).
+
+ **Fix:** Changed the black piece color to vivid gold `#E8B400` (filled Unicode symbols ♚♛♜♝♞♟) with a dark drop-shadow. Gold is visible on both cream and brown squares.
+
+ **File changed:** `frontend/client/src/components/ChessBoard.tsx`
+
+ ## Issue 2 — White pieces invisible on light squares
+
+ **Root cause:** White pieces were initially set to `#FFFFFF` (white), which was invisible on cream squares (`#F0D9B5`). A subsequent attempt using navy `#1B2A6B` was too dark on brown squares, and a `-webkit-text-stroke` approach made the pieces appear grey and muddy.
+
+ **Fix:** White pieces (hollow Unicode symbols ♔♕♖♗♘♙) are rendered in dark navy `#1a2744`. The hollow outline of each Unicode symbol is clearly visible on both cream and brown squares, and a subtle white drop-shadow adds depth.
+
+ **File changed:** `frontend/client/src/components/ChessBoard.tsx`
+
+ ## Issue 3 — Board square colors too dark (original dark theme)
+
+ **Root cause:** The CSS classes `chess-square-light` and `chess-square-dark` used near-black values (`#141c21` and similar) from the original dark terminal theme, making the board unreadable.
+
+ **Fix:** Changed to classic chess board colors — cream `#F0D9B5` for light squares and warm brown `#B58863` for dark squares.
+
+ **File changed:** `frontend/client/src/index.css`
+
+ ## Issue 4 — Docker build not picking up source changes
+
+ **Root cause:** Files were copied to `chessecon/frontend/src/components/`, but the Vite Docker config (`vite.docker.config.ts`) uses `root: "client"`, meaning it reads from `chessecon/frontend/client/src/`. The wrong directory was targeted.
+
+ **Fix:** Identified the correct path from `vite.docker.config.ts`. All file copies must go to `chessecon/frontend/client/src/` (not `frontend/src/`).
+
+ **Command to verify:** `docker exec chessecon-app grep -o "chess-square-light[^}]*}" /app/backend/static/assets/index-*.css`
+
+ ## Issue 5 — Browser serving cached old CSS (304 Not Modified)
+
+ **Root cause:** After the Docker rebuild, the browser loaded the old `index-ezKtz3Zw.css` from cache because the filename hash had not changed (Vite produces the same hash when the output bytes are identical).
+
+ **Fix:** Open the app in an Incognito/Private window, or enable "Disable cache" in the DevTools Network tab before reloading.
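Why byte-identical output keeps the cached file alive can be sketched in a few lines. This is an illustrative model only — Vite's actual hashing algorithm and digest length differ — but the principle is the same: the asset filename is derived from the output content, so a rebuild that produces identical CSS bytes yields the identical name, and the browser answers from cache with a 304.

```python
import hashlib

def asset_name(css_bytes: bytes) -> str:
    # Content-derived filename, as bundlers do. The sha256/8-char digest is
    # an assumption for illustration, not Vite's real algorithm.
    digest = hashlib.sha256(css_bytes).hexdigest()[:8]
    return f"index-{digest}.css"

old     = asset_name(b".chess-square-light{background:#141c21}")
rebuilt = asset_name(b".chess-square-light{background:#141c21}")  # same bytes
fixed   = asset_name(b".chess-square-light{background:#F0D9B5}")  # new colors

print(old == rebuilt)  # True  -> browser keeps its cached copy (304)
print(old == fixed)    # False -> new filename busts the cache
```

Once the targeted file actually changes (Issue 4 above), the hash changes with it and no cache workaround is needed.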
+ ## Issue 6 — Board completely blank on initial load
+
+ **Root cause:** An earlier attempt used `position: absolute` with `inset: "1.75rem 0 0 0"` on the board div, but the Panel component's root element did not have `position: relative` in the right context, so the board rendered outside the visible area.
+
+ **Fix:** Rewrote `ChessBoard` to use a pure flex layout (`width: 100%; height: 100%`) that fills its container naturally, removing all absolute positioning.
+
+ **File changed:** `frontend/client/src/components/ChessBoard.tsx`
+
+ ## Issue 7 — Black horizontal lines appearing on the board during simulation
+
+ **Root cause:** The board used a CSS class `chess-board` with `display: grid`, but the container had `overflow: hidden` cutting rows unevenly. Combined with `flex: 1` on the board panel, grid row heights became fractional and borders bled through as visible lines.
+
+ **Fix:** Moved all grid styles inline (`display: grid`, `gridTemplateColumns`, `gridTemplateRows`, `overflow: hidden`) directly onto the board div, eliminating the CSS class dependency and ensuring clean row boundaries.
+
+ **File changed:** `frontend/client/src/components/ChessBoard.tsx`
+
+ ## Issue 8 — Board stretching vertically during first 2–3 simulations
+
+ **Root cause:** The board panel used `flex: 1`, which caused it to grow to fill all remaining vertical space in the left column. As the wallet history chart appeared below and the page layout expanded, the left column grew taller and the board stretched with it.
+
+ **Fix:** Replaced `flex: 1` with `aspectRatio: "1 / 1"` and `flexShrink: 0` on the board panel container. The board height is now always derived from its width — it is a strict square at all times, regardless of surrounding layout changes.
+
+ **File changed:** `frontend/client/src/pages/Home.tsx`
+
+ ## Summary of files changed
+
+ | File | Changes |
+ |---|---|
+ | `frontend/client/src/components/ChessBoard.tsx` | Piece colors, layout rewrite, inline grid styles |
+ | `frontend/client/src/index.css` | Square colors (`#F0D9B5` / `#B58863`) |
+ | `frontend/client/src/pages/Home.tsx` | Board panel aspect-ratio fix, agent cards layout |
+
+ ## Docker rebuild command (run after copying all three files)
+
+ ```bash
+ docker compose down
+ docker compose build --no-cache chessecon
+ PORT=8006 docker compose up chessecon
+ ```
docs/latest_fixes_howto.md ADDED
@@ -0,0 +1,420 @@
+ # ChessEcon Setup Log
+
+ Complete record of all steps, issues, and fixes for the backend and frontend.
+
+ ---
+
+ ## Frontend (Dashboard) — Manus Web Project
+
+ ### Design & Layout
+
+ The dashboard was built as a Bloomberg-style dark terminal UI with:
+ - KPI cards row (wallets, coaching calls, last reward, win rate, GRPO loss, KL div)
+ - Agent cards (White / Black with wallet and Claude call count)
+ - Live chess board (left column)
+ - Move history feed (centre column)
+ - GRPO training metrics charts (right column)
+ - Wallet history chart
+ - Live event feed
+ - Economic performance chart (bottom)
+
+ ### Issue 1 — Panels expanding vertically beyond viewport
+
+ **Symptom:** Panels in the middle and right columns were growing taller than 100vh, causing the page to scroll.
+
+ **Fix:** Changed the root container from `minHeight: 100vh` to `height: 100vh` with `overflow: hidden`. Added `minHeight: 0` and `overflow: hidden` to the GRPO Training Metrics panel.
+
+ ### Issue 2 — Chess board clipping rows 1 and 2
+
+ **Symptom:** The board panel was clipping at the bottom — white pawns (row 2) and the back rank (row 1) were not visible.
+
+ **Root cause:** The board panel used `flexShrink: 0` with `aspectRatio: 1/1`. As the left column was squeezed by the 100vh constraint, the board overflowed its container.
+
+ **Fix:** Changed the board panel to `flex: 1` with `minHeight: 0` and `overflow: hidden` so it fills the available height without overflowing.
+
+ ---
+
+ ## Backend — Python FastAPI + Qwen2.5-0.5B + GRPO
+
+ ### Architecture
+
+ New files added to `backend/`:
+
+ | File | Purpose |
+ |---|---|
+ | `settings.py` | Centralised env-var config (model name, device, fees, GRPO params) |
+ | `chess_engine.py` | Thin `python-chess` wrapper (copied to `chess/chess_engine.py`) |
+ | `qwen_agent.py` | Qwen2.5-0.5B move generator with LoRA + illegal-move retry (copied to `agents/qwen_agent.py`) |
+ | `grpo_trainer.py` | GRPO policy gradient training loop (copied to `agents/grpo_trainer.py`) |
+ | `websocket_server.py` | FastAPI WebSocket server (merged into existing `main.py`) |
+ | `requirements.txt` | Python dependencies |
+ | `Dockerfile` | GPU-capable container |
+ | `docker-compose.yml` | Orchestrates backend + dashboard |
+
+ ### Step 1 — Environment setup on Lambda Labs GPU machine
+
+ **Machine:** 4× RTX 3070 (8 GB VRAM each), CUDA 12.4, Ubuntu 20.04
+
+ **Issue:** System `pip` at `/usr/local/bin/pip` was broken due to an Ubuntu 20.04 `pyOpenSSL` / `libssl` version conflict.
+
+ ```
+ pkg_resources.VersionConflict: (uvicorn 0.27.0, Requirement.parse('uvicorn==0.11.3'))
+ AttributeError: module 'lib' has no attribute 'X509_V_FLAG_NOTIFY_POLICY'
+ ```
+
+ **Fix:** Used Anaconda's pip instead of the system pip. Created a fresh conda env with Python 3.11:
+
+ ```bash
+ conda create -n chessecon python=3.11 -y
+ conda activate chessecon
+ ```
+
+ ### Step 2 — Installing requirements.txt
+
+ **Issue 1:** Duplicate `transformers` version pin in `requirements.txt`.
+
+ ```
+ ERROR: Double requirement given: transformers>=4.46.0 (already in transformers>=4.40.0)
+ ```
+
+ **Fix:**
+ ```bash
+ sed -i '/^transformers>=4.40.0/d' requirements.txt
+ ```
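The "Double requirement given" error simply means pip found two pins for the same project name. A toy detector shows the idea — this is illustrative only (not pip's code), and the `>=`/`==` splitting is deliberately simplistic:

```python
from collections import Counter

def duplicate_pins(requirements):
    # Normalise each line to its bare project name (naive parse of the
    # common ">=" / "==" specifiers) and report any name pinned twice.
    names = [line.split(">=")[0].split("==")[0].strip().lower()
             for line in requirements if line.strip()]
    return [name for name, count in Counter(names).items() if count > 1]

reqs = ["transformers>=4.40.0", "fastapi", "transformers>=4.46.0"]
print(duplicate_pins(reqs))  # ['transformers']
```

Deleting one of the two pins, as the `sed` above does, is exactly what clears the error.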
+
+ **Issue 2:** `pydantic>=2.7.0` not available — the conda env had pydantic 2.5.3 at most.
+
+ **Fix:**
+ ```bash
+ sed -i 's/pydantic>=2.7.0/pydantic>=2.0.0/' requirements.txt
+ sed -i 's/pydantic-settings>=2.3.0/pydantic-settings>=2.0.0/' requirements.txt
+ ```
+
+ **Issue 3:** `httpx>=0.27.0` not available in the conda env.
+
+ **Fix:** Removed all version pins:
+ ```bash
+ sed -i 's/>=.*//' requirements.txt
+ ```
+
+ **Issue 4:** `payments-py` (Nevermined SDK) not on PyPI.
+
+ **Fix:**
+ ```bash
+ sed -i '/payments-py/d' requirements.txt
+ ```
+
+ **Issue 5:** `jitter` requires Python 3.8+ — the conda base env was Python 3.7, blocking the `anthropic` install.
+
+ **Fix:** Used the `chessecon` conda env (Python 3.11) instead of the base env.
+
+ **Issue 6:** `transformers` required PyTorch 2.4+; the conda env had PyTorch 1.9.
+
+ **Fix:**
+ ```bash
+ pip install torch==2.4.1 --index-url https://download.pytorch.org/whl/cu121
+ pip install transformers accelerate peft sentencepiece tokenizers
+ ```
+
+ ### Step 3 — Running the server
+
+ **Command:**
+ ```bash
+ cd ~/suvasis/tools/blogs/hackathon/ChessEcon
+ python3.11 -m uvicorn backend.main:app --host 0.0.0.0 --port 8008 --reload
+ ```
+
+ **Issue 1:** System `uvicorn` at `/bin/uvicorn` conflicted with the newly installed version.
+
+ **Fix:** Used `python3.11 -m uvicorn` instead of the bare `uvicorn` command.
+
+ **Issue 2:** Port 8000 was already in use by the existing backend.
+
+ **Fix:** Used port 8008 instead.
+
+ **Issue 3:** `from backend.api.game_router import router` — absolute import failed when running from inside `backend/`.
+
+ **Fix:** Run from the parent directory (`ChessEcon/`) using `backend.main:app` as the module path.
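The failure mode can be reproduced in miniature: absolute imports like `backend.settings` resolve against `sys.path`, and launching from inside `backend/` puts `backend/` itself (not its parent) on the path. Everything below — the directory layout, file names, and contents — is a hypothetical stand-in built in a temp dir, not the real project:

```python
import importlib
import importlib.util
import pathlib
import sys
import tempfile

# Hypothetical miniature of the layout: ChessEcon/backend/{__init__.py, settings.py}
project = pathlib.Path(tempfile.mkdtemp()) / "ChessEcon"
(project / "backend").mkdir(parents=True)
(project / "backend" / "__init__.py").write_text("")
(project / "backend" / "settings.py").write_text("settings = 'loaded'")

# Launching from inside backend/ puts backend/ itself on sys.path, so the
# absolute name "backend.settings" has no parent package to resolve against.
sys.path.insert(0, str(project / "backend"))
importlib.invalidate_caches()
try:
    importlib.util.find_spec("backend.settings")
    found_inside = True
except ModuleNotFoundError:
    found_inside = False

# Launching from the parent (ChessEcon/) puts the project root on sys.path,
# which is why `python -m uvicorn backend.main:app` must run from there.
sys.path.insert(0, str(project))
importlib.invalidate_caches()
spec = importlib.util.find_spec("backend.settings")

print(found_inside, spec is not None)
```

(The first lookup assumes no other package named `backend` is importable in the environment.)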
+
+ **Issue 4:** New files (`qwen_agent.py`, `grpo_trainer.py`, `chess_engine.py`) were placed at the `backend/` root, but `main.py` expected them at `backend/agents/` and `backend/chess/`.
+
+ **Fix:**
+ ```bash
+ cp backend/qwen_agent.py backend/agents/qwen_agent.py
+ cp backend/grpo_trainer.py backend/agents/grpo_trainer.py
+ cp backend/chess_engine.py backend/chess/chess_engine.py
+ ```
+
+ **Issue 5:** Imports inside `qwen_agent.py` and `grpo_trainer.py` used bare `from settings import settings` — these failed when running from the parent directory.
+
+ **Fix:**
+ ```bash
+ sed -i 's/^from settings import settings/from backend.settings import settings/' \
+     backend/agents/qwen_agent.py backend/agents/grpo_trainer.py
+ sed -i 's/^from chess_engine import ChessEngine/from backend.chess.chess_engine import ChessEngine/' \
+     backend/agents/qwen_agent.py backend/agents/grpo_trainer.py
+ ```
+
+ **Issue 6:** The WebSocket endpoint block was inserted before `app = FastAPI()` in `main.py`, causing `NameError: name 'app' is not defined`.
+
+ **Fix:** Rewrote `main.py` with the WebSocket endpoint and `game_loop` correctly placed after `app` is created.
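The ordering rule behind this `NameError` can be reproduced without FastAPI: a decorator expression is evaluated at definition time, so an `@app.websocket(...)` block that appears before `app` is bound fails immediately on import. The `App` class below is a stand-in for FastAPI, used only to make the sketch self-contained:

```python
class App:
    """Stand-in for FastAPI with a .websocket(path) decorator factory."""
    def __init__(self):
        self.routes = []

    def websocket(self, path):
        def register(fn):
            self.routes.append(path)
            return fn
        return register

# Wrong order: the module body references `app` before it is bound,
# exactly the NameError hit when the WebSocket block sat above
# `app = FastAPI()` in main.py.
bad_module = "@app.websocket('/ws')\ndef ws(): pass\napp = App()"
try:
    exec(bad_module, {"App": App})
    wrong_order_raised = False
except NameError:
    wrong_order_raised = True

# Right order: create the app first, then register endpoints below it.
app = App()

@app.websocket("/ws")
def ws():
    pass

print(wrong_order_raised, app.routes)
```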
+
+ ### Step 4 — HuggingFace authentication
+
+ **Issue:** An expired token (`"llama4"`) was cached at `~/.cache/huggingface/token`, causing 401 errors even after a new token was created.
+
+ **Fix:**
+ ```bash
+ rm -f ~/.cache/huggingface/token
+ export HF_TOKEN=hf_<new_token>
+ echo "HF_TOKEN=hf_<new_token>" >> backend/.env
+ ```
+
+ **Note on token type:** The new `hackathon_chess` token was created as a **Fine-grained** token with no repository permissions, which also returns 401. The fix was either to edit its permissions to add **Contents: Read**, or to create a classic **Read** token instead.
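Conceptually, the token lookup prefers the `HF_TOKEN` environment variable and falls back to the cached token file — which is why a stale cache file keeps producing 401s whenever the env var is unset. The resolver below is an illustrative sketch of that precedence, not `huggingface_hub`'s actual code:

```python
import os
import pathlib
import tempfile

def resolve_token(cache_file):
    # Illustrative precedence only (not huggingface_hub's real code):
    # the HF_TOKEN env var wins; otherwise the cached token file is used.
    env = os.environ.get("HF_TOKEN")
    if env:
        return env.strip()
    path = pathlib.Path(cache_file)
    if path.is_file():
        return path.read_text().strip() or None
    return None

cache = pathlib.Path(tempfile.mkdtemp()) / "token"
cache.write_text("hf_expired_llama4\n")   # stale cached token

os.environ.pop("HF_TOKEN", None)
stale = resolve_token(str(cache))          # expired token is used -> 401s

os.environ["HF_TOKEN"] = "hf_new_token"    # hypothetical fresh token
fresh = resolve_token(str(cache))          # the env var now wins

print(stale, fresh)
```

Deleting the cache file, as in the fix above, makes the fallback path empty so only the exported token can be used.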
+
+ ### Step 5 — Game loop API mismatches
+
+ After the model loaded successfully on `cuda:1`, the `game_loop` had several API mismatches with the actual `ChessEngine` and `QwenAgent` classes:
+
+ | Error | Cause | Fix |
+ |---|---|---|
+ | `'bool' object is not callable` | `engine.is_game_over` is a `@property`, called with `()` | Remove the `()` |
+ | `QwenAgent.__init__() takes 1 positional argument but 3 were given` | Constructor takes no args | `QwenAgent()` with no args |
+ | `QwenAgent.get_move() missing 2 required positional arguments` | Signature is `get_move(engine, agent_color, move_history)` — not `(fen, ...)` | Pass the `engine` object, not `engine.fen` |
+ | `'Settings' object has no attribute 'initial_wallet'` | Field is `starting_wallet`, not `initial_wallet` | `settings.starting_wallet` |
+ | `'Settings' object has no attribute 'move_delay_seconds'` | Field is `move_delay` | `settings.move_delay` |
+ | `'TrainingMetrics' object has no attribute 'grpo_loss'` | Fields are `loss` (not `grpo_loss`) and `kl_div` (not `kl_divergence`) | Use the correct field names |
+ | `NameError: 'move_history' is not defined` | Not initialised before the move loop | `move_history = []` after `engine = ChessEngine()` |
+ | `'QwenAgent' object has no attribute 'wallet'` | `QwenAgent` has no wallet — the economy is tracked separately | Use local `wallet_white` / `wallet_black` variables |
+ | `'QwenAgent' object has no attribute 'trajectory'` | Trajectory is internal to the trainer | Use `getattr(agent, 'trajectory', [])` |
+
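Put together, the corrected calls look like the sketch below. `ChessEngine` and `QwenAgent` here are minimal stand-ins that only mirror the signatures from the table (property `is_game_over`, no-arg constructor, `get_move(engine, agent_color, move_history)`); the real classes live under `backend/` and do much more:

```python
import random

class ChessEngine:
    """Stand-in mirroring the real engine's surface, not its logic."""
    def __init__(self):
        self._plies = 0

    @property
    def is_game_over(self):           # property -> no () at the call site
        return self._plies >= 4

    def get_legal_moves(self):
        return ["e2e4", "d2d4", "g1f3"]

    def make_move(self, move):
        self._plies += 1

class QwenAgent:
    """Stand-in: the constructor takes no arguments."""
    def get_move(self, engine, agent_color, move_history):
        return random.choice(engine.get_legal_moves())

engine = ChessEngine()
move_history = []                     # initialised before the loop
white, black = QwenAgent(), QwenAgent()

while not engine.is_game_over:        # property access, not a call
    agent, color = (white, "white") if len(move_history) % 2 == 0 else (black, "black")
    move = agent.get_move(engine, color, move_history)
    engine.make_move(move)
    move_history.append(move)

trajectory = getattr(white, "trajectory", [])   # safe default when absent
print(len(move_history), trajectory)
```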
+ ### Step 6 — Game running successfully
+
+ After all fixes, the server runs cleanly:
+
+ ```
+ Model loaded on device: cuda:1
+ trainable params: 540,672 || all params: 494,573,440 || trainable%: 0.1093
+ LoRA adapter applied (rank=8)
+ GRPO step 1 | loss=nan reward=1.000 kl=1808.1735 win_rate=1.00
+ ```
+
+ **Expected warnings (not errors):**
+
+ - `All retries exhausted — using random fallback move` — Normal for an untrained model. Qwen generates illegal moves initially; the fallback ensures the game continues. This improves as GRPO training progresses.
+ - `loss=nan on step 1` — Normal. GRPO requires multiple trajectory samples to compute group-relative advantages (it divides by the group's standard deviation). With only one game, std = 0 → NaN. This resolves after a few games.
+
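The `loss=nan` behaviour falls directly out of the advantage normalisation. The function below is a numerical sketch of GRPO's group-relative step (advantage = reward minus group mean, divided by group std), not the trainer's actual code:

```python
import math
import statistics

def group_advantages(rewards):
    # Group-relative normalisation: with a single sample the population
    # std is 0, and (r - mean) / std degenerates to 0/0 -> NaN.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / std if std > 0 else float("nan") for r in rewards]

single = group_advantages([1.0])            # one game: std = 0 -> [nan]
group = group_advantages([1.0, 0.0, 1.0])   # several games: finite values

print(single)
print(group)
```

As soon as the batch contains games with differing rewards, the std becomes positive and the loss turns finite.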
+ ---
+
+ ## Frontend Docker Setup (macOS)
+
+ The dashboard is the Manus React web project. To run it on macOS pointing at the Lambda backend:
+
+ ### docker-compose.yml changes needed for macOS
+
+ 1. Remove the `deploy: resources: reservations: devices` GPU block (macOS has no NVIDIA GPU).
+ 2. Add `VITE_WS_URL=ws://<LAMBDA_IP>:8008/ws` to the `dashboard` environment so the frontend connects to the remote backend.
+ 3. Remove the `depends_on` health check (the backend is not running locally).
+
+ ```bash
+ # Build and run just the dashboard
+ docker-compose build --no-cache dashboard
+ docker-compose up -d dashboard
+ # Dashboard available at http://localhost:3000
+ ```
+
+ ### Connecting to the backend
+
+ In the dashboard, click **LIVE** in the top-right toggle. The `VITE_WS_URL` env var sets the default WebSocket URL. If it is not set, the dashboard defaults to `ws://localhost:8008/ws`.
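The fallback logic amounts to a one-line env lookup. The real dashboard reads `import.meta.env.VITE_WS_URL` in TypeScript (baked in at build time); this is only a conceptual mirror of that behaviour:

```python
import os

def resolve_ws_url():
    # Conceptual mirror of the frontend fallback: use VITE_WS_URL when
    # provided, otherwise fall back to the local default.
    return os.environ.get("VITE_WS_URL", "ws://localhost:8008/ws")

os.environ.pop("VITE_WS_URL", None)
default = resolve_ws_url()                           # local default

os.environ["VITE_WS_URL"] = "ws://192.168.1.140:8008/ws"
remote = resolve_ws_url()                            # remote Lambda backend

print(default, remote)
```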
+
+ ---
+
+ ## Current Status
+
+ | Component | Status |
+ |---|---|
+ | Dashboard UI | Running — simulation mode fully functional |
+ | Backend server | Running on Lambda at port 8008 |
+ | Qwen2.5-0.5B | Loaded on `cuda:1`, generating moves |
+ | GRPO training | Active — step 1 completed |
+ | Dashboard ↔ Backend connection | Pending — need to run the frontend and set `VITE_WS_URL` |
+ | Claude coaching | Disabled — `ANTHROPIC_API_KEY` not set |
+ | Nevermined integration | Not implemented (deferred) |
+
+ ---
+
+ ## Dashboard Docker Deployment — Lambda GPU Machine (Mar 5, 2026)
+
+ This section documents the complete sequence of attempts, failures, and fixes required to make the React dashboard accessible at `http://192.168.1.140:3006` on the Lambda GPU machine.
+
+ ---
+
+ ### Attempt 1 — https instead of http
+
+ **Symptom:** The browser showed "This site can't be reached — 192.168.1.140 refused to connect" when navigating to `https://192.168.1.140:3006`.
+
+ **Root cause:** The dashboard has no SSL certificate configured. Nginx serves plain HTTP only.
+
+ **Fix:** Use `http://192.168.1.140:3006` (not `https://`).
+
+ ---
+
+ ### Attempt 2 — Dashboard container running the wrong image (GPU backend)
+
+ **Symptom:** `docker-compose ps` showed the `chessecon-dashboard` container in a `Restarting` loop. Logs showed the Python backend entrypoint running and failing with:
+
+ ```
+ ERROR: Required environment variable HF_TOKEN is not set.
+ ```
+
+ **Root cause:** The `docker-compose.yml` `dashboard` service originally had:
+
+ ```yaml
+ dashboard:
+   build:
+     context: .
+     dockerfile: Dockerfile
+ ```
+
+ This `Dockerfile` at the project root is the **combined GPU backend + frontend** image (CUDA, PyTorch, Python, `docker-entrypoint.sh`). Docker had already built and tagged it as `chessecon-dashboard:latest`. Even after changing `docker-compose.yml` to `image: nginx:alpine`, running `docker-compose up -d` reused the cached `chessecon-dashboard:latest` image instead of pulling nginx.
+
+ **Fix — three steps:**
+
+ 1. Remove the stale image:
+    ```bash
+    docker-compose down
+    docker rmi chessecon-dashboard:latest
+    ```
+
+ 2. Update the `docker-compose.yml` dashboard service to use nginx directly (no build step):
+    ```yaml
+    dashboard:
+      image: nginx:alpine
+      container_name: chessecon-dashboard
+      restart: unless-stopped
+      ports:
+        - "3006:80"
+      volumes:
+        - ./frontend/dist/public:/usr/share/nginx/html:ro
+    ```
+
+ 3. Bring it up fresh:
+    ```bash
+    docker-compose up -d dashboard
+    ```
+
+ ---
+
+ ### Attempt 3 — 403 Forbidden: wrong volume path
+
+ **Symptom:** Nginx started successfully but returned `403 Forbidden`. The nginx error log showed:
+
+ ```
+ directory index of "/usr/share/nginx/html/" is forbidden
+ ```
+
+ **Root cause — part A:** The volume mount was initially set to `./frontend/dist:/usr/share/nginx/html:ro`. The Vite build outputs files to `frontend/dist/public/` (not `frontend/dist/` directly) because the Manus web project template configures `publicDir` in `vite.config.ts`. So nginx was serving an empty directory.
+
+ **Root cause — part B:** The frontend had not been built at all — `frontend/dist/public/` did not exist yet.
+
+ **Fix — step 1:** Build the frontend on the host machine:
+
+ ```bash
+ cd ~/suvasis/tools/blogs/hackathon/ChessEcon/frontend
+ VITE_WS_URL=ws://192.168.1.140:8008/ws pnpm build
+ ```
+
+ The build completed successfully (the esbuild error for `server/_core/index.ts` is a server-side build step that does not affect the static frontend output):
+
+ ```
+ ../dist/public/index.html                 367.71 kB
+ ../dist/public/assets/index-dzvHG_3C.css  118.58 kB
+ ../dist/public/assets/index-TWBDAwdS.js   887.67 kB
+ ✓ built in 4.20s
+ ```
+
+ **Fix — step 2:** Update the volume path in `docker-compose.yml`:
+
+ ```bash
+ sed -i 's|./frontend/dist:/usr/share/nginx/html|./frontend/dist/public:/usr/share/nginx/html|' docker-compose.yml
+ ```
+
+ ---
+
+ ### Attempt 4 — `docker-compose restart` does not re-read volume mounts
+
+ **Symptom:** After updating the volume path and running `docker-compose restart dashboard`, the 403 persisted.
+
+ **Root cause:** `docker-compose restart` only stops and restarts the existing container — it does **not** recreate it. Volume mount changes in `docker-compose.yml` are applied only when the container is recreated, and `restart` does not trigger recreation.
+
+ **Fix:** Always use `down` + `up` when changing volume mounts or image references:
+
+ ```bash
+ docker-compose down
+ docker-compose up -d dashboard
+ ```
+
+ ---
+
+ ### Final Working State
+
+ After all fixes, the dashboard is accessible at `http://192.168.1.140:3006`.
+
+ **Summary of the working `docker-compose.yml` dashboard service:**
+
+ ```yaml
+ dashboard:
+   image: nginx:alpine
+   container_name: chessecon-dashboard
+   restart: unless-stopped
+   ports:
+     - "3006:80"
+   volumes:
+     - ./frontend/dist/public:/usr/share/nginx/html:ro
+ ```
+
+ **Summary of the working build + deploy sequence:**
+
+ ```bash
+ # 1. Build the React frontend (run once, or after any code change)
+ cd ~/suvasis/tools/blogs/hackathon/ChessEcon/frontend
+ VITE_WS_URL=ws://192.168.1.140:8008/ws pnpm build
+
+ # 2. Remove any stale container/image if switching from the old GPU image
+ cd ..
+ docker-compose down
+ docker rmi chessecon-dashboard:latest 2>/dev/null || true
+
+ # 3. Start nginx serving the built files
+ docker-compose up -d dashboard
+
+ # 4. Verify
+ docker-compose ps
+ docker exec chessecon-dashboard ls /usr/share/nginx/html/
+ # Should show: index.html assets/
+ ```
+
+ **Key lessons learned:**
+
+ | Lesson | Detail |
+ |---|---|
+ | `VITE_*` vars are build-time | Must be set as shell env vars during `pnpm build`, not in Docker `environment:` |
+ | `docker-compose restart` ≠ recreate | Volume/image changes require `down` + `up` to take effect |
+ | Vite output path | The Manus template outputs to `dist/public/`, not `dist/` — always check `publicDir` in `vite.config.ts` |
+ | Old image caching | After changing `image:` in `docker-compose.yml`, remove the old image with `docker rmi` before `up` |
+ | esbuild server error is non-fatal | The `server/_core/index.ts` esbuild step fails on Lambda (no server env vars), but the Vite frontend build completes successfully before that step |
+
+ ---
+
+ ## Updated Current Status
+
+ | Component | Status |
+ |---|---|
+ | Dashboard UI | **Running** — accessible at `http://192.168.1.140:3006` |
+ | Backend server | Running on Lambda at port 8008 |
+ | Qwen2.5-0.5B | Loaded on `cuda:1`, generating moves |
+ | GRPO training | Active |
+ | Dashboard ↔ Backend (LIVE mode) | Ready — click the LIVE toggle in the dashboard to connect |
+ | Claude coaching | Disabled — `ANTHROPIC_API_KEY` not set |
+ | Nevermined integration | Not implemented (deferred) |
+
frontend/.DS_Store ADDED
Binary file (6.15 kB). View file