Shannon's Gambit (legacyaravind/shannons-gambit)

A self-improving chess intelligence. This repo holds the served network, the checkpoint ladder (ladder.json) and the Inference Endpoint handler (handler.py). The full system lives at github.com/aravinds-kannappan/Chess-Gambit-RL.

What the system is

A multi-agent engine: each position is routed to the method that owns it.

Agent          Where it plays                  Method
MDP            Solved endgames (KRvK, KQvK)    Exact Bellman value iteration (optimal)
PPO            Low-material regime             On-policy actor-critic RL
Reward (DQN)   Low-material regime             Off-policy, potential-based shaping
Neural         Opening / middlegame            This network: AlphaZero-lite self-play + behavioural cloning
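The learning methods in the table reduce to short textbook update rules. The sketch below is a toy, not the repo's code: a negamax form of Bellman value iteration on a tiny hypothetical game graph, potential-based reward shaping, and the per-sample PPO clipped surrogate. State names, the discount `gamma` and the clip `eps` are illustrative assumptions.

```python
# Toy sketches of the RL ingredients named in the table. Nothing here is taken
# from the repo; state names, rewards and constants are illustrative.


def negamax_value_iteration(successors, terminal, gamma=0.99, tol=1e-12):
    """Bellman value iteration for a two-player zero-sum game in negamax form.

    successors: dict non-terminal state -> list of next states
    terminal:   dict terminal state -> value for the side to move
                (-1.0 = side to move is mated, 0.0 = draw)
    Values are from the side to move's perspective, so
    V(s) = max over moves of -gamma * V(s').
    """
    values = {s: 0.0 for s in successors}
    values.update(terminal)
    while True:
        new = dict(values)
        for s, nxt in successors.items():
            new[s] = max(-gamma * values[t] for t in nxt)
        delta = max(abs(new[s] - values[s]) for s in successors)
        values = new
        if delta < tol:  # gamma < 1 makes this a contraction, so it converges
            return values


def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r * A, clip(r, 1-eps, 1+eps) * A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


# Hypothetical graph: from A the side to move can mate at once.
values = negamax_value_iteration(
    {"A": ["mate", "B"], "B": ["A"], "C": ["stalemate", "B"]},
    {"mate": -1.0, "stalemate": 0.0},
)
```

With `gamma < 1` the winning side prefers the shortest mate and the losing side the longest delay, which is one common way to make value iteration play distance-to-mate optimally; the MDP agent may instead track depth-to-mate explicitly.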

A phase router (agents/router.py) dispatches each move to the right agent. The network in this repo is the general full-board player and the bootstrap for self-play; it also serves the policy/value/WDL/rating predictions.
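One way such a routing rule could look, as a sketch only: the function below counts non-king material from the FEN, sends exact KRvK/KQvK positions to the MDP agent, low-material positions to a configurable RL agent, and everything else to the network. `route`, the agent labels and the threshold of 13 pawns of material are hypothetical, not taken from agents/router.py.

```python
from typing import List, Tuple

# Hypothetical values and threshold; agents/router.py may decide differently.
PIECE_VALUES = {"p": 1, "n": 3, "b": 3, "r": 5, "q": 9}
SOLVED_ENDGAMES = ({"r", ""}, {"q", ""})  # KRvK and KQvK, for either colour


def non_king_pieces(fen: str) -> Tuple[List[str], List[str]]:
    """Return sorted lowercase piece letters (kings excluded) for White and Black."""
    board = fen.split()[0]  # piece-placement field only
    white = sorted(c.lower() for c in board if c.isupper() and c != "K")
    black = sorted(c for c in board if c.islower() and c != "k")
    return white, black


def route(fen: str, low_material_threshold: int = 13,
          low_material_agent: str = "ppo") -> str:
    white, black = non_king_pieces(fen)
    # Exactly one rook or one queen against a bare king -> solved table.
    if {"".join(white), "".join(black)} in SOLVED_ENDGAMES:
        return "mdp"
    total = sum(PIECE_VALUES[p] for p in white + black)
    if total <= low_material_threshold:
        return low_material_agent  # e.g. "ppo" or "dqn"
    return "neural"
```

For example, `route("8/8/8/4k3/8/8/8/R3K3 w - - 0 1")` goes to the MDP agent, while the starting position goes to the network.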

Stockfish is the benchmark, never a player

The agents never call Stockfish to choose a move. A separate backend evaluator (eval/benchmark.py) uses Stockfish only as a calibrated yardstick. It throttles Stockfish to known Elo bands (UCI_LimitStrength + UCI_Elo, falling back to Skill Level below the minimum UCI_Elo), plays each agent through a gauntlet against those bands, and fits a calibrated Elo by Bradley-Terry maximum likelihood. It also reports centipawn loss and top-1 move agreement. The resulting rating is the strength each agent plays at, and it climbs as the agent learns.
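The Elo fit can be sketched as a one-parameter maximum-likelihood problem. Assuming each Stockfish band's UCI_Elo is a fixed anchor and treating a draw as half a win (a common simplification of Bradley-Terry), the agent's rating is the value at which its expected gauntlet score equals its actual score. `fit_elo` and its search bracket are hypothetical names, not the API of eval/benchmark.py.

```python
import math
from typing import List, Tuple


def expected_score(r: float, r_opp: float) -> float:
    """Elo-scale Bradley-Terry (logistic) expected score of r against r_opp."""
    return 1.0 / (1.0 + 10.0 ** ((r_opp - r) / 400.0))


def fit_elo(results: List[Tuple[float, int, float]], lo: float = -1000.0,
            hi: float = 4000.0, tol: float = 1e-6) -> float:
    """MLE rating of one agent against opponents with fixed (anchored) Elo.

    results: (opponent_elo, games, score) per band, score = wins + 0.5 * draws.
    Setting the log-likelihood derivative to zero gives
    sum(games * E(r, opp)) == sum(score); the left side increases with r,
    so bisection finds the unique root.
    """
    games = sum(n for _, n, _ in results)
    score = sum(s for _, _, s in results)
    if not 0 < score < games:
        raise ValueError("no finite MLE for a 0% or 100% score")

    def excess(r: float) -> float:
        return sum(n * expected_score(r, opp) for opp, n, _ in results) - score

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid  # rating too low: expected score below actual score
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `fit_elo([(1320, 20, 15.5), (1600, 20, 9.0), (1900, 20, 3.5)])` returns one rating for an agent that scored 15.5/20, 9/20 and 3.5/20 against three throttled bands. A perfect or zero score has no finite maximum, which is why it raises.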

The network

Multi-head residual network trained on real Lichess games. Heads: policy (next move), value + win/draw/loss (outcome), and player rating (Elo).
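Turning the raw head outputs into readable numbers takes a softmax and a de-standardisation. The sketch assumes the WDL logits are ordered (win, draw, loss) and that the rating head was standardised with a dataset mean and standard deviation; `decode_heads` and the defaults `rating_mean=1500` and `rating_std=350` are placeholders, not the values used in training.

```python
import math
from typing import Dict, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode_heads(wdl_logits: Sequence[float], value: float, rating_z: float,
                 rating_mean: float = 1500.0,  # hypothetical dataset mean
                 rating_std: float = 350.0,    # hypothetical dataset std
                 ) -> Dict[str, float]:
    win, draw, loss = softmax(wdl_logits)  # assumed (win, draw, loss) order
    return {
        "win": win,
        "draw": draw,
        "loss": loss,
        # Expected result with win=+1, draw=0, loss=-1: same scale as value.
        "wdl_expectation": win - loss,
        "value": value,
        "rating_elo": rating_mean + rating_std * rating_z,
    }
```

Because `win - loss` lives on the same [-1, 1] scale as the value head, comparing the two is a cheap consistency check on a position.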

Final supervised training metrics

{
  "loss_policy": 0.2169,
  "loss_value": 0.0305,
  "loss_wdl": 0.0295,
  "loss_rating": 0.0312,
  "policy_acc": 0.966,
  "wdl_acc": 0.9903,
  "rating_mae_elo": 21.1,
  "epoch": 15
}

Input / output

  • Input: 18x8x8 board planes (see shannons_gambit/data/encode.py).
  • Output: policy logits over 4672 moves, scalar value in [-1, 1], WDL logits, standardised rating.
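4672 equals 8 × 8 × 73, the move layout used by AlphaZero: for each from-square, 56 queen-line moves (8 directions × 7 distances), 8 knight moves and 9 underpromotions. The sketch below assumes this repo follows that layout; the plane ordering and the name `move_index` are illustrative and may not match shannons_gambit/data/encode.py.

```python
from typing import Optional

# Hypothetical AlphaZero-style 8x8x73 layout; the plane order is one plausible
# choice and may differ from the repo's encoder.
QUEEN_DIRS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
KNIGHT_DELTAS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
UNDERPROMOTIONS = ["n", "b", "r"]


def move_index(from_sq: int, to_sq: int, promotion: Optional[str] = None) -> int:
    """Map a move to 0..4671. Squares are 0..63 with a1=0 and h8=63, seen from
    the side to move (the board is assumed to be flipped for Black)."""
    df = to_sq % 8 - from_sq % 8    # file delta
    dr = to_sq // 8 - from_sq // 8  # rank delta
    if promotion in UNDERPROMOTIONS:
        if dr != 1 or abs(df) > 1:
            raise ValueError("underpromotion must be a one-rank pawn move")
        # Planes 64..72: 3 pieces x (capture left, push, capture right).
        plane = 64 + 3 * UNDERPROMOTIONS.index(promotion) + (df + 1)
    elif (df, dr) in KNIGHT_DELTAS:
        plane = 56 + KNIGHT_DELTAS.index((df, dr))  # planes 56..63
    else:
        dist = max(abs(df), abs(dr))
        if dist == 0 or (df and dr and abs(df) != abs(dr)):
            raise ValueError("not a queen-line, knight or underpromotion move")
        # Planes 0..55: 8 directions x distances 1..7 (queen promotions too).
        plane = 7 * QUEEN_DIRS.index((df // dist, dr // dist)) + (dist - 1)
    return 73 * from_sq + plane
```

Under this layout e2e4 (squares 12 to 28) maps to index 877 and Ng1-f3 (6 to 21) to 501; most of the 4672 slots are illegal in any given position, so in practice the logits are masked to legal moves before the softmax.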

How it is served

  • HF Space (Docker + FastAPI): trains continuously by self-play and serves /move, /predict, /watch-move, /ladder and /calibrate (Elo assessed against Stockfish). Each new generation is pushed back to this repo as a new version, so the ladder survives Space restarts.
  • Inference Endpoint: handler.py in this repo loads model.pt and, given a FEN, returns the best move, WDL, value and rating.
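
import requests

# Query the Inference Endpoint with a FEN; replace the URL and token with your own.
resp = requests.post(
    "https://<your-endpoint>.endpoints.huggingface.cloud",
    headers={"Authorization": "Bearer <HF_TOKEN>"},
    json={"inputs": {"fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"}},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())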
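Hugging Face custom Inference Endpoints expect handler.py to define an `EndpointHandler` class whose `__init__(self, path="")` loads the model and whose `__call__(self, data)` answers each request. The skeleton below shows only the request/response plumbing: `predict` is a hypothetical placeholder for loading model.pt and running the network, and the response keys are assumptions rather than the actual output of this repo's handler.

```python
from typing import Any, Dict


class EndpointHandler:
    def __init__(self, path: str = ""):
        # The real handler would load model.pt from `path` here (e.g. with torch).
        self.path = path

    def predict(self, fen: str) -> Dict[str, Any]:
        # Hypothetical placeholder for the network forward pass + move choice.
        raise NotImplementedError

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        inputs = data.get("inputs", data)
        fen = inputs.get("fen") if isinstance(inputs, dict) else inputs
        if not isinstance(fen, str) or len(fen.split()) != 6:
            return {"error": "expected a six-field FEN in inputs.fen"}
        result = self.predict(fen)
        # Assumed response shape: best move, WDL, value and rating.
        return {
            "best_move": result["best_move"],
            "wdl": result["wdl"],
            "value": result["value"],
            "rating": result["rating"],
        }
```

Keeping FEN validation in `__call__` lets malformed requests return a readable error before any model work is done.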

Honest limitations

  • The MDP, PPO and reward agents are endgame specialists (validated against the exactly-solved table); the network carries the opening and middlegame.
  • On free CPU hardware, self-play is slow and the Elo ladder climbs over hours; short GPU bursts speed it up. An Elo figure is meaningful only once it is anchored by the Stockfish benchmark.

License: Apache-2.0.
