ChessMate Net — pretrained policy/value networks

Browser-deployable chess policy + value networks for ChessMate. Each version is a ResNet (224 filters × 10 residual blocks, ~10M params) trained by supervised distillation from a strong teacher, with an AlphaZero-style PUCT MCTS at play time.
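The card says play-time search is an AlphaZero-style PUCT MCTS but does not give its constants. The sketch below shows the standard PUCT child-selection rule, score = Q(s,a) + c_puct · P(s,a) · √N(s) / (1 + N(s,a)), with the policy head supplying the priors P(s,a). The `Child` / `select_move` names and the `c_puct = 1.5` default are hypothetical illustration choices, not ChessMate's actual search code or tuning.

```python
import math
from dataclasses import dataclass


@dataclass
class Child:
    prior: float            # P(s, a): policy-head prior for this move
    visits: int = 0         # N(s, a)
    value_sum: float = 0.0  # sum of backed-up values, from the parent's side-to-move POV

    @property
    def q(self) -> float:
        # Mean action value Q(s, a); unvisited children default to 0.0 (an assumed choice)
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(child: Child, parent_visits: int, c_puct: float) -> float:
    # AlphaZero-style PUCT: Q(s,a) + c_puct * P(s,a) * sqrt(N(s)) / (1 + N(s,a))
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return child.q + u


def select_move(children: dict[str, Child], c_puct: float = 1.5) -> str:
    # N(s) = total child visits, floored at 1 so priors still rank moves on the first visit
    parent_visits = max(1, sum(c.visits for c in children.values()))
    return max(children, key=lambda mv: puct_score(children[mv], parent_visits, c_puct))
```

Because the value head is side-to-move POV, a value backed up from a child position has to be negated once per ply before it is added to the parent's `value_sum`.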

Input: single-position 8×8×22 planes (pieces ×12, castling, side-to-move, en-passant, halfmove, attack maps), side-to-move oriented (board rotated for Black).
Outputs: policy — 4096 logits over from*64 + to (the parity-locked move encoding; promotion folds onto the from→to index, no underpromotion dim); value — scalar tanh in [-1, 1], side-to-move POV.
Training data: victorqueiroz/chessmate-positions.
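As a rough illustration of the from*64 + to policy layout and the side-to-move orientation, here is a minimal sketch. It assumes a1 = 0 … h8 = 63 square numbering and reads "board rotated for Black" as a 180° rotation (sq → 63 − sq); both are assumptions, and all helper names are hypothetical. The authoritative contract is chess-core `calculatePolicyIndex`.

```python
def square(file: int, rank: int) -> int:
    # Assumed numbering (hypothetical): a1 = 0, b1 = 1, ..., h1 = 7, a2 = 8, ..., h8 = 63
    return rank * 8 + file


def orient(sq: int, black_to_move: bool) -> int:
    # "Board rotated for Black", assumed here to mean a 180-degree rotation: sq -> 63 - sq
    return 63 - sq if black_to_move else sq


def policy_index(from_sq: int, to_sq: int, black_to_move: bool) -> int:
    # from*64 + to in the side-to-move frame; a promotion reuses its plain from->to index
    return orient(from_sq, black_to_move) * 64 + orient(to_sq, black_to_move)


def decode_index(index: int, black_to_move: bool) -> tuple[int, int]:
    # Inverse of policy_index: orient() is its own inverse, so applying it again undoes it
    f, t = divmod(index, 64)
    return orient(f, black_to_move), orient(t, black_to_move)


def white_pov(value: float, black_to_move: bool) -> float:
    # The value head is side-to-move POV; negate it when Black is to move
    return -value if black_to_move else value
```

Under this assumption, White's e2-e4 is index 12·64 + 28 = 796, and Black's e7-e5 encodes like White's d2-d4 (index 731). A vertical mirror (sq ^ 56) would instead map it onto e2-e4, so the exact transform must be checked against chess-core.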

Versions

Ver	Teacher	Train positions	Held-out policy top-1	Absolute Elo (vs Stockfish UCI_Elo ladder)
v5	lc0 (t1-256×10, full policy + WDL)	701,984¹	0.485²	~1595 (95% CI [1486, 1704])
v6-wdl ⚗️	lc0 (same as v5) + WDL value head	701,984	0.477	~1563 (95% CI [1471, 1655]) — strength-neutral, not shipped
v4	Stockfish d18 multipv-8	701,984	0.392	~1292 (95% CI [1230, 1353])
v3	Stockfish (earlier recipe)	416,619	0.321	— (pre-anchor)

¹ v5 = v4's exact positions re-labeled with lc0 (a clean teacher A/B — only the teacher changed). ² v5/v4 top-1 are each measured against their own teacher's labels and are not directly comparable across teachers; the Elo anchor is the fair cross-version metric.

The ~+300 Elo gain from v4 to v5 (1292 → 1595) comes from the teacher upgrade alone (Stockfish → lc0), at identical positions and scale, and the two 95% CIs do not overlap. This is the Tier-1 distillation result.
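For intuition only: under the standard logistic Elo model, a rating gap ΔR implies an expected score of 1 / (1 + 10^(−ΔR/400)). Plugging in the table's point estimates gives roughly 85% for v5 against v4. Both ratings are anchored to the Stockfish UCI_Elo ladder rather than measured head-to-head, so treat this as an implied figure, not a measured match result.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    # Standard logistic Elo model: E_A = 1 / (1 + 10^((R_B - R_A) / 400))
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Point estimates from the Versions table (v5 vs v4)
print(round(expected_score(1595, 1292), 3))  # 0.851
```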

Confirmation (held-out scorecard, v5 vs v4)

Beyond the game-play Elo, v5 and v4 were scored on the same 70,198-position held-out set (seed 42, same pipeline, vs Stockfish labels) — a deterministic same-methodology check:

metric	v4	v5
policy top-1 (= agrees with Stockfish's best move)	0.392	0.527
mean regret (cp)	118	72
blunder rate (>300cp)	0.130	0.077
value MAE	0.213	0.323
value ECE	0.076	0.273

v5's policy is decisively stronger, and the gain holds even when the scoring is done by Stockfish rather than lc0: against Stockfish's own centipawn evaluations, v5 has lower regret and a lower blunder rate, and it even matches Stockfish's best move more often than the Stockfish-distilled v4. Policy quality is what drives MCTS strength, so this confirms the Elo gain.
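The scorecard rows can be read as follows. This is a minimal sketch, not the pipeline's code: `scorecard` and its row layout are hypothetical. Regret is taken as cp(best) − cp(played) and a blunder as regret > 300 cp, matching the table labels. The card does not define its ECE for a [-1, 1] value, so the equal-width binning below is an assumption.

```python
from statistics import mean


def scorecard(rows, n_bins=10):
    """rows: (model_move, best_move, cp_by_move, v_pred, v_target) tuples, one per position.

    cp_by_move maps each candidate move to Stockfish's centipawn eval (side-to-move POV);
    v_pred and v_target are values in [-1, 1].
    """
    top1 = mean(1.0 if mv == best else 0.0 for mv, best, _, _, _ in rows)
    # Regret: centipawns lost by playing the model's move instead of Stockfish's best move
    regrets = [cps[best] - cps[mv] for mv, best, cps, _, _ in rows]
    blunder_rate = mean(1.0 if r > 300 else 0.0 for r in regrets)
    value_mae = mean(abs(vp - vt) for *_, vp, vt in rows)

    # ECE (assumed definition): bin by predicted value, then take |mean pred - mean target|
    # per bin, weighted by the fraction of positions that fall in that bin.
    bins = [[] for _ in range(n_bins)]
    for *_, vp, vt in rows:
        k = min(int((vp + 1) / 2 * n_bins), n_bins - 1)
        bins[k].append((vp, vt))
    value_ece = sum(
        len(b) / len(rows) * abs(mean(p for p, _ in b) - mean(t for _, t in b))
        for b in bins if b
    )
    return {"top1": top1, "mean_regret_cp": mean(regrets), "blunder_rate": blunder_rate,
            "value_mae": value_mae, "value_ece": value_ece}
```

MAE penalizes every per-position error. ECE only penalizes systematic offsets within a bin, so errors that cancel inside a bin do not count toward it.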

Honest caveat (resolved): v5's value is worse-calibrated against Stockfish targets (higher MAE and ECE). This is a teacher-scale artifact, not a strength regression: v5's value is trained to lc0's WDL scale rather than Stockfish's tanh(cp/400), and the Elo gate, which exercises policy and value together, still shows ~+300. v5's Elo gate used an 18-opening fallback suite, so the game-play number is directional; the held-out scorecard above is the same-methodology confirmation.
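To make the scale point concrete, here is the Stockfish-side target mapping named above, tanh(cp/400), and its inverse. With this mapping, +400 cp lands at about 0.76. A value trained on lc0's targets can be perfectly sensible and still sit off this curve, and the MAE/ECE rows measure exactly that offset. The function names are illustrative only.

```python
import math


def stockfish_value(cp: float) -> float:
    # Stockfish-scale value target named in this card: tanh(cp / 400), side-to-move POV
    return math.tanh(cp / 400.0)


def cp_from_value(v: float) -> float:
    # Inverse mapping; clamp away from +/-1 so atanh stays finite
    v = max(-0.999999, min(0.999999, v))
    return 400.0 * math.atanh(v)
```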

A WDL value head does NOT fix this (tested in v6-wdl, see below): the caveat is about the target definition, not the head type.

v6-wdl — experimental WDL value head (⚗️ not shipped)

v6-wdl/ archives a head-type A/B: v5's exact recipe (lc0 teacher, v4's 701,984 positions) but with a 3-way WDL softmax value head (W/D/L, lc0's native output) instead of the scalar tanh. The question was whether WDL fixes v5's value-ECE caveat above.

Verdict: strength-neutral, and the value-calibration caveat is a teacher-scale artifact, not a consequence of the head type.

Elo: 1563 (95% CI [1471, 1655]) vs v5's 1595 [1486, 1704]; the intervals overlap heavily, so the two are indistinguishable.
Against its own lc0 teacher, the WDL value is well-calibrated: ECE 0.040 (its scalar q = P(win) − P(loss) is a faithful expected score on the same [-1, 1] scale as v5's tanh).
But on the identical Stockfish held-out set it does not beat v5: ECE 0.299 vs 0.273 and policy top-1 0.510 vs 0.527, both marginally worse.
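A WDL head outputs three logits that a softmax turns into P(win), P(draw), and P(loss), which then collapse to the scalar q = P(win) − P(loss) for comparison with a tanh head. The minimal sketch below assumes (W, D, L) logit order, which this card does not confirm. It also shows what q discards: very different distributions can share the same q.

```python
import math


def wdl_from_logits(logits: tuple[float, float, float]) -> tuple[float, float, float]:
    # Numerically stable softmax over (win, draw, loss) logits; the W/D/L order is assumed
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    w, d, l = (e / total for e in exps)
    return w, d, l


def q_from_wdl(w: float, d: float, l: float) -> float:
    # Expected score on the same [-1, 1] scale as a tanh value head
    return w - l
```

For example, (0.5, 0, 0.5) and (0, 1, 0) both give q = 0. That draw information is what the card keeps the WDL head around for.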

So switching the value-head type changes the representation, not the lc0-vs-Stockfish scale mismatch. v5 (tanh) remains the shipped net. The WDL head is retained for future self-play (genuine W/D/L probabilities for draw modeling / search uncertainty). Full analysis: ChessMate VQ-498.

Files

v5/keras_model.keras      v5/tfjs/{model.json,weights.bin}   v5/metrics.json
v4/keras_model.keras      v4/tfjs/{model.json,weights.bin}   v4/metrics.json
v3/keras_model.keras                                         v3/metrics.json   (archival; keras only)
v6-wdl/keras_model.keras  v6-wdl/tfjs/{model.json,weights.bin}  v6-wdl/metrics.json   (⚗️ experimental, not shipped)

Usage

Browser (TF.js LayersModel — what ChessMate ships):

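import * as tf from '@tensorflow/tfjs';
// Top-level await: run this inside an ES module (or wrap it in an async function).
const model = await tf.loadLayersModel('https://huggingface.co/victorqueiroz/chessmate-net/resolve/main/v5/tfjs/model.json');
// Two-output LayersModel: predict() returns [policy (4096 logits), value (tanh, side-to-move POV)].
const [policy, value] = model.predict(tf.zeros([1, 8, 8, 22]));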

Python (Keras, legacy tf-keras 2.16):

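import os
os.environ["TF_USE_LEGACY_KERAS"] = "1"  # must be set before TensorFlow is imported
import numpy as np
import tensorflow as tf
from huggingface_hub import hf_hub_download

path = hf_hub_download("victorqueiroz/chessmate-net", "v5/keras_model.keras")
m = tf.keras.models.load_model(path, compile=False)
policy, value = m.predict(np.zeros((1, 8, 8, 22), dtype=np.float32))  # 4096 logits, tanh value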

Encoding the 8×8×22 input and decoding the 4096 policy must match ChessMate's parity-verified contract (pretrain/hf/eval_match.py / chess-core calculatePolicyIndex) — see the repo.
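Once the 4096 logits are available, a typical way to turn them into move priors is to keep only the legal moves and apply a softmax over those entries. The sketch below assumes the (from, to) squares are already oriented to the side-to-move frame per the contract above. `legal_move_priors` is a hypothetical helper, not chess-core's API. Because promotions fold onto the from→to index with no underpromotion dimension, all promotion pieces for one from→to pair share a single entry, and choosing the piece is left to the caller.

```python
import math


def legal_move_priors(logits, legal_moves):
    """logits: 4096 policy logits for one position.
    legal_moves: non-empty iterable of (from_sq, to_sq), already in the side-to-move frame.
    Returns {(from_sq, to_sq): prior}, normalized over the legal moves only."""
    # Duplicate (from, to) pairs, e.g. the four promotion pieces, collapse to one key
    idx = {mv: mv[0] * 64 + mv[1] for mv in legal_moves}  # from*64 + to
    m = max(logits[i] for i in idx.values())  # subtract the max for numerical stability
    exps = {mv: math.exp(logits[i] - m) for mv, i in idx.items()}
    total = sum(exps.values())
    return {mv: e / total for mv, e in exps.items()}
```

Masking before the softmax means probability mass the network puts on illegal from→to pairs is simply ignored, not redistributed in proportion.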

License & attribution

Weights: Apache-2.0. Distilled from Stockfish (GPL engine — its output labels do not encumber the student) and Leela Chess Zero (lc0). Trained on positions derived from mateuszgrzyb/lichess-stockfish-normalized (CC-BY-4.0) and Lichess data (CC0).
