ChessMate Net — pretrained policy/value networks
Browser-deployable chess policy + value networks for ChessMate. Each version is a ResNet (224 filters × 10 residual blocks, ~10M params) trained by supervised distillation from a strong teacher, with an AlphaZero-style PUCT MCTS at play time.
- Input: single-position `8×8×22` planes (pieces ×12, castling, side-to-move, en-passant, halfmove, attack maps), side-to-move oriented (board rotated for Black).
- Outputs: `policy`, 4096 logits over `from*64 + to` (the parity-locked move encoding; promotion folds onto the from→to index, with no underpromotion dimension); `value`, a scalar `tanh` in `[-1, 1]` from the side-to-move POV.
- Training data: victorqueiroz/chessmate-positions.
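The `from*64 + to` policy index can be illustrated with a minimal Python sketch. The 0..63 square numbering and the helper names are assumptions made for illustration; the authoritative mapping, including the board rotation for Black, is ChessMate's own contract.

```python
# Illustrative sketch of the from*64 + to policy index.
# Assumption: squares are numbered 0..63. The real square order and the
# rotation applied for Black are defined by ChessMate, not by this snippet.

NUM_SQUARES = 64
POLICY_SIZE = NUM_SQUARES * NUM_SQUARES  # 4096 logits


def policy_index(from_sq: int, to_sq: int) -> int:
    """Map a (from, to) square pair to its slot in the 4096-wide policy."""
    if not (0 <= from_sq < NUM_SQUARES and 0 <= to_sq < NUM_SQUARES):
        raise ValueError("square indices must be in 0..63")
    return from_sq * NUM_SQUARES + to_sq


def decode_policy_index(index: int) -> tuple[int, int]:
    """Invert policy_index: recover the (from, to) pair from a policy slot."""
    if not 0 <= index < POLICY_SIZE:
        raise ValueError("policy index must be in 0..4095")
    return divmod(index, NUM_SQUARES)
```

Because promotions fold onto the same from→to slot, decoding an index cannot by itself say which piece a pawn promotes to; that choice has to be made outside the policy vector.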
Versions
| Ver | Teacher | Train positions | Held-out policy top-1 | Absolute Elo (vs Stockfish UCI_Elo ladder) |
|---|---|---|---|---|
| v5 | lc0 (t1-256×10, full policy + WDL) | 701,984¹ | 0.485² | ~1595 (95% CI [1486, 1704]) |
| v6-wdl ⚗️ | lc0 (same as v5) + WDL value head | 701,984 | 0.477 | ~1563 (95% CI [1471, 1655]) — strength-neutral, not shipped |
| v4 | Stockfish d18 multipv-8 | 701,984 | 0.392 | ~1292 (95% CI [1230, 1353]) |
| v3 | Stockfish (earlier recipe) | 416,619 | 0.321 | — (pre-anchor) |
¹ v5 = v4's exact positions re-labeled with lc0 (a clean teacher A/B: only the teacher changed).

² v5/v4 top-1 are each measured against their own teacher's labels and are not directly comparable across teachers; the Elo anchor is the fair cross-version metric.
The ~+300 Elo from v4→v5 comes from the teacher upgrade alone (Stockfish → lc0), at identical positions and scale — the Tier-1 distillation result.
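For a sense of scale, the standard logistic Elo model converts a rating gap into an expected score. The sketch below applies that textbook formula to the v4 and v5 point estimates from the table. It is not the project's Elo-gate code, and it ignores the confidence intervals.

```python
def expected_score(elo_diff: float) -> float:
    """Expected score of the stronger side under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** (-elo_diff / 400.0))


# Point estimates copied from the Versions table above.
v4_elo, v5_elo = 1292, 1595
gap = v5_elo - v4_elo

# Print the rating gap and the head-to-head score the model predicts for v5.
print(gap, round(expected_score(gap), 3))
```

Under this model a gap of about 300 Elo corresponds to the stronger net scoring roughly 85% of the available points in direct play.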
Confirmation (held-out scorecard, v5 vs v4)
Beyond the game-play Elo, v5 and v4 were scored on the same 70,198-position held-out set (seed 42, same pipeline, vs Stockfish labels) — a deterministic same-methodology check:
| metric | v4 | v5 |
|---|---|---|
| policy top-1 (= agrees with Stockfish's best move) | 0.392 | 0.527 |
| mean regret (cp) | 118 | 72 |
| blunder rate (>300cp) | 0.130 | 0.077 |
| value MAE | 0.213 | 0.323 |
| value ECE | 0.076 | 0.273 |
v5's policy is decisively stronger — and teacher-agnostically so (lower regret/blunder vs Stockfish's own cp; it even matches Stockfish's best move more often than the Stockfish-distilled v4). That's what drives MCTS strength and confirms the Elo gain.
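The policy rows of the scorecard can be reproduced in spirit with a few lines of Python. This is a sketch under assumptions: regret is taken to be the centipawn gap between Stockfish's best move and the move the network ranked first, a blunder is a regret above 300 cp, and the `Position` record with its field names is hypothetical rather than the project's evaluation code.

```python
from dataclasses import dataclass


@dataclass
class Position:
    best_move: str    # Stockfish's best move for the position
    chosen_move: str  # move the network's policy ranked first
    best_cp: int      # Stockfish eval after best_move, in centipawns
    chosen_cp: int    # Stockfish eval after chosen_move, in centipawns


def policy_scorecard(positions, blunder_cp=300):
    """Compute top-1 agreement, mean regret and blunder rate over a set."""
    if not positions:
        raise ValueError("need at least one position")
    n = len(positions)
    # Regret is clamped at zero so evaluation noise cannot make it negative.
    regrets = [max(0, p.best_cp - p.chosen_cp) for p in positions]
    return {
        "top1": sum(p.best_move == p.chosen_move for p in positions) / n,
        "mean_regret_cp": sum(regrets) / n,
        "blunder_rate": sum(r > blunder_cp for r in regrets) / n,
    }
```

Measuring regret in Stockfish centipawns is what makes the comparison teacher-agnostic: a net can disagree with the reference best move and still lose very little evaluation.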
Honest caveat (resolved): v5's value is worse-calibrated against Stockfish targets (MAE/ECE up). This is a teacher-scale artifact, not a strength regression: v5's value is trained to lc0's WDL scale rather than Stockfish's `tanh(cp/400)`, and the Elo gate, which integrates policy + value, still shows ~+300. v5's Elo gate used an 18-opening fallback suite, so the game-play number is directional; the held-out scorecard above is the same-methodology confirmation.

A WDL value head does NOT fix this (tested in v6-wdl, see below): the caveat is about the target definition, not the head type.
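The Stockfish-side value target named here, `tanh(cp/400)`, squashes a centipawn evaluation into [-1, 1]. A minimal sketch of that mapping follows; the function name is illustrative. lc0's WDL-derived targets follow a different curve, which is the scale mismatch behind the MAE/ECE numbers.

```python
import math


def stockfish_value_target(cp: float) -> float:
    """Map a centipawn score (side-to-move POV) to a value in [-1, 1]."""
    return math.tanh(cp / 400.0)


# Show how quickly the target saturates as the advantage grows.
for cp in (0, 100, 400, 1000):
    print(cp, round(stockfish_value_target(cp), 3))
```

A network trained toward a different target scale can rank positions just as well and still score poorly on MAE and ECE against this curve, since both metrics compare absolute values.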
v6-wdl — experimental WDL value head (⚗️ not shipped)
v6-wdl/ archives a head-type A/B: v5's exact recipe (lc0 teacher, v4's 701,984
positions) but with a 3-way WDL softmax value head (W/D/L, lc0's native output)
instead of the scalar tanh. The question was whether WDL fixes v5's value-ECE
caveat above.
Verdict: strength-neutral, and the caveat is a teacher-scale artifact — not the head type.
- Elo: 1563 (95% CI [1471, 1655]) vs v5's 1595 [1486, 1704] — indistinguishable.
- vs its own lc0 teacher, the WDL value is well-calibrated: ECE 0.040 (a faithful win-probability, q = P(win) − P(loss)).
- but on the identical Stockfish held-out it does not beat v5: ECE 0.299 vs 0.273, policy top-1 0.510 vs 0.527 — marginally worse.
So switching the value-head type changes the representation, not the lc0-vs-Stockfish scale mismatch. v5 (tanh) remains the shipped net. The WDL head is retained for future self-play (genuine W/D/L probabilities for draw modeling / search uncertainty). Full analysis: ChessMate VQ-498.
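To make the head-type difference concrete, here is a minimal sketch of how a 3-way WDL output collapses to the scalar `q = P(win) − P(loss)` quoted above. The logit order (win, draw, loss) and the function names are assumptions for illustration.

```python
import math


def wdl_probs(logits):
    """Numerically stable softmax over (win, draw, loss) logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return tuple(e / total for e in exps)


def wdl_to_q(logits):
    """Collapse W/D/L probabilities to a scalar in [-1, 1]."""
    p_win, _p_draw, p_loss = wdl_probs(logits)
    return p_win - p_loss
```

The scalar discards the draw mass: a dead-drawn position and a sharp position with equal win and loss chances both give `q = 0`, which is why the WDL head is kept for draw modeling and search uncertainty.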
Files
v5/keras_model.keras v5/tfjs/{model.json,weights.bin} v5/metrics.json
v4/keras_model.keras v4/tfjs/{model.json,weights.bin} v4/metrics.json
v3/keras_model.keras v3/metrics.json (archival; keras only)
v6-wdl/keras_model.keras v6-wdl/tfjs/{model.json,weights.bin} v6-wdl/metrics.json (⚗️ experimental, not shipped)
Usage
Browser (TF.js LayersModel — what ChessMate ships):
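import * as tf from '@tensorflow/tfjs';
// Load the v5 LayersModel straight from the Hub.
const model = await tf.loadLayersModel('https://huggingface.co/victorqueiroz/chessmate-net/resolve/main/v5/tfjs/model.json');
// A zero tensor stands in for one encoded position; the outputs are [policy, value].
const [policy, value] = model.predict(tf.zeros([1, 8, 8, 22]));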
Python (Keras, legacy tf-keras 2.16):
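# Opt in to legacy Keras 2 (tf-keras) before TensorFlow is imported.
import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"
import tensorflow as tf
from huggingface_hub import hf_hub_download
# Download the v5 checkpoint from the Hub and load it for inference only.
model_path = hf_hub_download("victorqueiroz/chessmate-net", "v5/keras_model.keras")
m = tf.keras.models.load_model(model_path, compile=False)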
Encoding the 8×8×22 input and decoding the 4096 policy must match ChessMate's
parity-verified contract (pretrain/hf/eval_match.py / chess-core
calculatePolicyIndex) — see the repo.
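Once the input encoding and index mapping match that contract, a common way to use the 4096 logits is to keep only the legal moves and renormalize them into search priors. The sketch below shows that masking step in plain Python. The 0..63 square numbering, the function names, and the softmax over legal moves are assumptions, not a copy of ChessMate's decoder.

```python
import math


def policy_index(from_sq, to_sq):
    # Assumed square numbering 0..63; see the repo for the real contract.
    return from_sq * 64 + to_sq


def legal_move_priors(policy_logits, legal_moves):
    """Softmax the policy logits over legal (from, to) pairs only."""
    if len(policy_logits) != 4096:
        raise ValueError("expected 4096 policy logits")
    # Promotions share one from->to slot, so duplicate pairs are merged.
    moves = sorted(set(legal_moves))
    if not moves:
        return {}
    logits = [policy_logits[policy_index(f, t)] for f, t in moves]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {move: e / total for move, e in zip(moves, exps)}
```

Masking before the softmax means probability mass the network placed on illegal slots is redistributed across the legal moves instead of being lost.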
License & attribution
Weights: Apache-2.0. Distilled from Stockfish (GPL engine — its output
labels do not encumber the student) and Leela Chess Zero (lc0). Trained on
positions derived from
mateuszgrzyb/lichess-stockfish-normalized
(CC-BY-4.0) and Lichess data (CC0).