Alfredvc/chess-autocomplete-v1

This repository contains one chess-autocomplete model variant staged for inference.

Variant

Repository: Alfredvc/chess-autocomplete-v1
Architecture: ChessTransformer
Dimensions: 768 hidden, 12 heads, 12 blocks
Block loops: 1
Maximum half moves: 600
Input representation: Discrete
Norm / MLP: layernorm / swiglu
Native input tokenizer: RealizableMoveTokenizer with 4171 ids
Native output tokenizer: RealizableMoveTokenizer with 4135 ids
Metadata: Metadata tokens are part of the input token stream.

Interface

This is a metadata-token model. Inputs must begin with the metadata prefix:

[time_control_token, white_elo_token, black_elo_token, GAME_START, ...moves]

Use TIME_CONTROL_MISSING_WORD and RATING_MISSING_WORD for any field whose metadata is not available. When the time control is known, it is encoded as one of four labels (bullet, blitz, rapid, or classical), each with its own token (see dataset_varlen.get_time_control_token).
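The sketch below assembles that prefix, falling back to the placeholder words for missing fields. It is a minimal illustration under stated assumptions. The integer constants are hypothetical stand-ins for the real values in chess_autocomplete.protocol. The functions encode_time_control and encode_elo are hypothetical encoders supplied by the caller. For time controls, the card points to dataset_varlen.get_time_control_token; the rating encoder is not documented here.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical placeholder values: use the real constants from chess_autocomplete.protocol.
TIME_CONTROL_MISSING_WORD = 1
RATING_MISSING_WORD = 2
GAME_START = 3


def build_input_words(
    moves: Sequence[int],
    encode_time_control: Callable[[str], int],
    encode_elo: Callable[[int], int],
    time_control: Optional[str] = None,
    white_elo: Optional[int] = None,
    black_elo: Optional[int] = None,
) -> List[int]:
    """Return [time_control, white_elo, black_elo, GAME_START, *moves] as raw words."""
    # Each metadata slot is always present; unknown values use the *_MISSING_WORD tokens.
    tc = TIME_CONTROL_MISSING_WORD if time_control is None else encode_time_control(time_control)
    white = RATING_MISSING_WORD if white_elo is None else encode_elo(white_elo)
    black = RATING_MISSING_WORD if black_elo is None else encode_elo(black_elo)
    return [tc, white, black, GAME_START, *moves]
```

With no metadata and no moves, the result is exactly the four-word prefix used in the PyTorch and ONNX examples.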

The native PyTorch model returns logits over the output tokenizer vocabulary (4135 ids). The ONNX artifacts wrap that model and return bin_logits over raw 16-bit move words (65536 ids). These are different output interfaces.
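To compare the two interfaces side by side, one option is to scatter the native logits into a 65536-wide array indexed by raw word. This sketch makes three assumptions:

- You have a table token_to_word that gives the unique raw 16-bit word for each output token id. This table is hypothetical here; in the package, that mapping lives inside the output tokenizer.
- Filling words that have no native token with -inf is this sketch's own choice, not a documented property of bin_logits.
- Consequently, entries outside the native vocabulary need not match the ONNX output.

```python
import numpy as np

NUM_BIN_WORDS = 1 << 16  # raw 16-bit move words: 65536 ids


def native_to_bin_logits(native_logits: np.ndarray, token_to_word: np.ndarray) -> np.ndarray:
    """Scatter [..., vocab] native logits into [..., 65536] raw-word logits.

    token_to_word[i] is the raw word for native output token i (hypothetical
    lookup table; values are assumed unique). Words with no native token get -inf.
    """
    if native_logits.shape[-1] != token_to_word.shape[0]:
        raise ValueError("token_to_word must have one entry per native token id")
    out = np.full(native_logits.shape[:-1] + (NUM_BIN_WORDS,), -np.inf, dtype=np.float32)
    out[..., token_to_word] = native_logits  # place each logit at its raw-word index
    return out
```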

PyTorch

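import torch

from chess_autocomplete import protocol
from chess_autocomplete.huggingface import load_model_repo

# Load from the current directory, assumed to contain this repository's files.
loaded = load_model_repo(".")

# Metadata prefix with every field missing, then GAME_START and no moves yet.
raw_input = torch.tensor(
    [[
        protocol.TIME_CONTROL_MISSING_WORD,
        protocol.RATING_MISSING_WORD,
        protocol.RATING_MISSING_WORD,
        protocol.GAME_START,
    ]],
    dtype=torch.long,
)
# Map raw 16-bit words to native input-tokenizer ids before calling the model.
input_ids = loaded.input_tokenizer.batch_encode(raw_input)
with torch.no_grad():  # inference only, no autograd graph
    logits, _ = loaded.model(input_ids)  # logits over the 4135-id output vocabulary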

The PyTorch weights are stored in model.safetensors and loaded strictly into chess_autocomplete.models.ChessTransformer.
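Strict loading means the checkpoint's tensor names must match the model's parameter names exactly. Nothing may be missing and nothing may be extra, which is the key check that PyTorch's Module.load_state_dict performs with strict=True. The plain-Python sketch below reproduces only that name comparison so the failure mode is easy to see. The helper names are hypothetical, and the sketch does not check shapes or dtypes.

```python
from typing import Iterable, List, Tuple


def strict_key_diff(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) keys, in the spirit of load_state_dict(strict=True)."""
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)     # parameters the file does not provide
    unexpected = sorted(ckpt_set - model_set)  # tensors the model has no slot for
    return missing, unexpected


def assert_strict_match(model_keys: Iterable[str], checkpoint_keys: Iterable[str]) -> None:
    """Raise if the two key sets differ in either direction."""
    missing, unexpected = strict_key_diff(model_keys, checkpoint_keys)
    if missing or unexpected:
        raise KeyError(f"missing={missing} unexpected={unexpected}")
```

With torch and safetensors installed, the two inputs would be model.state_dict().keys() and safetensors.torch.load_file("model.safetensors").keys().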

ONNX Runtime

import numpy as np
import onnxruntime as ort

from chess_autocomplete import protocol

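# FP32 artifact on CPU; the BF16 and INT8 files expose the same interface.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# bin_moves holds raw 16-bit words, shape [batch, time]; no tokenizer step is needed.
bin_moves = np.asarray(
    [[
        protocol.TIME_CONTROL_MISSING_WORD,
        protocol.RATING_MISSING_WORD,
        protocol.RATING_MISSING_WORD,
        protocol.GAME_START,
    ]],
    dtype=np.int32,
)
# bin_logits has shape [batch, 65536]: next-move logits indexed by raw move word.
bin_logits = session.run(["bin_logits"], {"bin_moves": bin_moves})[0]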

Three ONNX files are published:

model.onnx: FP32 compatibility artifact.
model-bf16-fp32compute.onnx: BF16-storage / FP32-compute artifact. Floating weights are stored as BF16 (halving weight size) and cast back to FP32 before every op, so the graph computes and outputs in FP32. It therefore runs even on runtimes that lack BF16 operator support, including onnxruntime-web (WebGPU/WASM) in the browser.
model-int8-blk128.onnx: weight-only block-wise INT8 artifact (smallest). Linear weights are stored as 8-bit block-wise MatMulNBits (block size 128) and dequantized to FP32 at compute time; the embedding and output head stay FP32 and activations are never quantized. This is the recommended browser download — it is WebGPU-native (the MatMulNBits op runs on the onnxruntime-web WebGPU EP) and the smallest of the three, with no measurable strength loss (see Performance).
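The BF16-storage / FP32-compute idea can be illustrated in NumPy. BF16 keeps the top 16 bits of an FP32 value: the sign, the 8-bit exponent, and 7 mantissa bits. Storing a weight therefore halves its bytes, and widening it back to FP32 is just a 16-bit shift. This sketch shows the number format with round-to-nearest-even. It is not the exporter used for this artifact, and it does not handle NaN inputs.

```python
import numpy as np


def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    """Round FP32 values to BF16 (round-to-nearest-even) and return the raw uint16 bits."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # Add 0x7FFF plus the lowest kept bit so exact ties round to an even mantissa.
    rounding_bias = ((bits >> 16) & 1) + np.uint32(0x7FFF)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)


def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    """Widen stored BF16 bits back to FP32 for compute (exact, no further rounding)."""
    return (b.astype(np.uint32) << 16).view(np.float32)
```

Each stored weight shrinks from 4 bytes to 2. Widening is exact, so the only error is the one-time rounding at export, which is at most 2^-8 of each value's magnitude.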

All three ONNX artifacts use the bin_logits_v1 interface: bin_moves input with shape [batch, time] and bin_logits output with shape [batch, 65536]. Before publishing, the BF16 and INT8 artifacts are structurally checked and then loaded with the ONNX Runtime CPU execution provider as a compatibility smoke test.
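A small pre-flight check for the bin_logits_v1 interface can catch shape and range mistakes before calling session.run. The sketch assumes every row in a batch has the same length, because this card does not document a padding convention. The helper names are hypothetical.

```python
import numpy as np

NUM_BIN_WORDS = 65536  # bin_logits width: one logit per raw 16-bit move word


def make_bin_moves(games):
    """Stack equal-length raw-word sequences into an int32 [batch, time] array."""
    if not games or not games[0]:
        raise ValueError("need at least one non-empty sequence")
    if len({len(g) for g in games}) != 1:
        raise ValueError("all sequences in a batch must have the same length (no padding assumed)")
    arr = np.asarray(games, dtype=np.int64)
    if arr.min() < 0 or arr.max() >= NUM_BIN_WORDS:
        raise ValueError("move words must fit in 16 bits")
    return arr.astype(np.int32)


def check_bin_logits(bin_logits: np.ndarray, batch: int) -> None:
    """Verify the output is [batch, 65536]: one next-move distribution per row."""
    if bin_logits.shape != (batch, NUM_BIN_WORDS):
        raise ValueError(f"unexpected bin_logits shape {bin_logits.shape}")
```

Because bin_logits has no time axis, bin_logits[i] is already the next-move distribution for row i. This is why the ONNX decoding example later in this card indexes bin_logits[0] rather than bin_logits[0, -1].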

Performance

Held-out human-move matching on the ALLIE / Maia-3 Table 1 benchmark: top-1 move match and legal-move NLL over the 2022-blitz test set, which is excluded from training. Each ONNX artifact is scored through the exact model it ships, so these are the numbers you get at inference; the INT8 row uses the dequantized MatMulNBits weights, bit-faithful to the artifact. Δ Top-1 is in percentage points relative to the FP32 artifact.

Artifact	Precision	Size (MB)	Top-1 move match %	Δ Top-1 (pp)	NLL (legal)	Perplexity
model.onnx	fp32	368	56.0739	0	1.33426	3.7972
model-bf16-fp32compute.onnx	bf16	231	56.0734	-0.0006	1.33429	3.7973
model-int8-blk128.onnx	int8	117	56.0671	-0.0068	1.33433	3.7975

Published references (matched)	Top-1 move match %
MAIA-3-79M	57.1
MAIA-3-23M	56.6
MAIA-3-5M	55.4
ALLIE-ADAPTIVE-SEARCH	55.9
ALLIE-POLICY	55.7
MAIA-2	52.0
MAIA*	51.6
GPT-3.5	53.7
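The Perplexity column is the exponential of the per-move legal NLL (in nats), so the two columns can be cross-checked. Δ Top-1 is each row's top-1 minus the FP32 row's top-1. The sketch below recomputes both from the four-decimal values in the Performance table above.

Recomputing Δ from rounded values can differ from the reported Δ by up to 0.0001 pp. For example, the BF16 row gives -0.0005 here versus the reported -0.0006.

```python
import math

# Values copied from the Performance table: (top-1 %, NLL, perplexity).
rows = {
    "model.onnx": (56.0739, 1.33426, 3.7972),
    "model-bf16-fp32compute.onnx": (56.0734, 1.33429, 3.7973),
    "model-int8-blk128.onnx": (56.0671, 1.33433, 3.7975),
}
reported_delta = {
    "model.onnx": 0.0,
    "model-bf16-fp32compute.onnx": -0.0006,
    "model-int8-blk128.onnx": -0.0068,
}

fp32_top1 = rows["model.onnx"][0]
for name, (top1, nll, ppl) in rows.items():
    # Perplexity = exp(NLL); matches the table to its four printed decimals.
    assert math.isclose(math.exp(nll), ppl, abs_tol=1e-4), name
    # Delta recomputed from rounded inputs may be off by one unit in the last place.
    delta = top1 - fp32_top1
    assert abs(delta - reported_delta[name]) <= 1e-4 + 1e-9, name
    print(f"{name}: exp(NLL)={math.exp(nll):.4f}  delta_top1={delta:+.4f} pp")
```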

On this benchmark, the block-wise INT8 artifact is effectively decision-equivalent to FP32 (top-1 move match within 0.007 pp) while being the smallest download. Quantization is weight-only, so activations stay in FP32, which avoids the accuracy collapse of dynamic (activation) INT8.
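Weight-only block-wise quantization can be sketched in three steps:

1. Split each weight matrix along its input dimension into blocks of 128.
2. Store the 8-bit integers plus one FP32 scale per block and output column.
3. Dequantize to FP32 before the matmul, so activations never leave FP32.

This is a generic symmetric scheme for illustration only. ONNX Runtime's MatMulNBits has its own packed layout and optional zero points, so the exact rounding in the shipped artifact may differ.

```python
import numpy as np


def quantize_blockwise_int8(w: np.ndarray, block: int = 128):
    """Symmetric per-block INT8 quantization of a [K, N] weight along K."""
    k, n = w.shape
    if k % block:
        raise ValueError("K must be a multiple of the block size in this sketch")
    blocks = w.reshape(k // block, block, n)
    # One scale per (block, output column): map the block's max |w| to 127.
    scale = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale).astype(np.float32)
    q = np.clip(np.rint(blocks / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize_blockwise_int8(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Recover FP32 weights [K, N]; this is what the matmul actually consumes."""
    return (q.astype(np.float32) * scale).reshape(-1, q.shape[-1])


def weight_only_int8_matmul(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Activations x stay FP32; only the weights were stored as INT8."""
    return x.astype(np.float32) @ dequantize_blockwise_int8(q, scale)
```

Storage is 1 byte per weight plus one 4-byte scale per 128 weights, roughly a 3.9x reduction for the quantized layers. In the shipped artifact, the embedding and output head stay FP32.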

Converting Logits To Moves

The model predicts move tokens, not SAN strings. An unconstrained argmax over the full vocabulary can return an illegal move, so do not use one. Instead, score the legal moves in the current board position and choose from that legal set.
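Restricting the choice to legal moves amounts to a masked softmax: keep the logits at the legal indices, normalize over that subset, and pick or score from it. The sketch below works on any 1-D logit vector with a list of legal indices: native token ids for PyTorch, or raw words for ONNX. The helper names are hypothetical. The sketch reads "NLL (legal)" in the Performance table as the negative log of this renormalized probability for the move actually played. That reading is an assumption, not something this card spells out.

```python
import numpy as np


def legal_log_probs(logits: np.ndarray, legal_ids) -> np.ndarray:
    """Log-softmax over the legal subset only; illegal entries are ignored."""
    sub = np.asarray(logits, dtype=np.float64)[np.asarray(legal_ids)]
    sub = sub - sub.max()  # stabilize before exponentiating
    return sub - np.log(np.exp(sub).sum())


def pick_legal(logits, legal_ids):
    """Return the legal id with the highest logit (argmax within the legal set)."""
    return legal_ids[int(np.argmax(legal_log_probs(logits, legal_ids)))]


def legal_nll(logits, legal_ids, played_id) -> float:
    """Negative log-probability of the played move under the legal-only distribution."""
    lp = legal_log_probs(logits, legal_ids)
    return float(-lp[list(legal_ids).index(played_id)])
```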

For PyTorch, logits are over the native output tokenizer vocabulary:

import chess

from chess_autocomplete.chess_utils import Board

board = Board()
# Apply any moves already played (the same moves that are in the model input):
# board.push(chess.Move.from_uci("e2e4"))

next_logits = logits[0, -1]  # last position of the first batch row
legal = []
for move in board.board.legal_moves:
    raw_bin_word = board.encode(move)  # raw 16-bit move word
    token_id = loaded.output_tokenizer.encode(raw_bin_word)  # native output id
    legal.append((float(next_logits[token_id]), move))

# Argmax restricted to the legal set.
score, best_move = max(legal, key=lambda item: item[0])
print(best_move.uci())

For ONNX bin_logits_v1, logits are already indexed by raw 16-bit move word:

import chess

from chess_autocomplete.chess_utils import Board

board = Board()
# Apply any moves already played (the same moves that are in bin_moves):
# board.push(chess.Move.from_uci("e2e4"))

next_logits = bin_logits[0]  # bin_logits is [batch, 65536]; no time axis to index
legal = []
for move in board.board.legal_moves:
    raw_bin_word = board.encode(move)
    legal.append((float(next_logits[raw_bin_word]), move))  # index by raw word directly

# Argmax restricted to the legal set.
score, best_move = max(legal, key=lambda item: item[0])
print(best_move.uci())

Call board.push(best_move) after selecting a move so the next prediction is decoded against the updated legal move set.
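Continuing a game therefore means repeating four steps: predict, choose a legal move, push it, and extend the input sequence. The sketch below shows that loop with the model and board hidden behind caller-supplied functions, and all names in it are hypothetical. predict_next would wrap either the PyTorch or the ONNX example above. Appending the chosen move's raw word to the input assumes that the "...moves" part of the prefix layout holds raw move words.

```python
from typing import Callable, List, Sequence, Tuple


def autocomplete(
    words: List[int],
    n_moves: int,
    predict_next: Callable[[Sequence[int]], Sequence[float]],
    legal_moves: Callable[[], List[Tuple[int, object]]],
    push: Callable[[object], None],
) -> List[object]:
    """Greedily extend a game by up to n_moves legal moves.

    words        raw input words so far (metadata prefix, GAME_START, moves)
    predict_next returns next-move logits indexed by raw word
    legal_moves  returns (raw_word, move) pairs for the current position
    push         applies a move to the caller's board
    """
    words = list(words)  # do not mutate the caller's list
    chosen = []
    for _ in range(n_moves):
        candidates = legal_moves()
        if not candidates:  # checkmate or stalemate: nothing left to predict
            break
        logits = predict_next(words)
        word, move = max(candidates, key=lambda wm: logits[wm[0]])
        push(move)          # keep the board in sync with the sequence
        words.append(word)  # and feed the chosen move back as input
        chosen.append(move)
    return chosen
```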

Validation

Artifact	Validation	Status	Backend	Precision	Sample shape
model.safetensors	write	pass	safetensors.torch.save_file
model.safetensors	strict_load	pass	safetensors.torch.load_file
model.onnx	export	pass	torch.onnx	fp32	[2, 2]
model.onnx	runtime	pass	onnxruntime.CPUExecutionProvider	fp32	[2, 2]
model-bf16-fp32compute.onnx	export	pass	torch.onnx	bf16	[2, 2]
model-bf16-fp32compute.onnx	onnx_checker_initializer_dtype_and_runtime	pass	onnx.checker+onnxruntime.CPUExecutionProvider	bf16	[2, 2]
model-int8-blk128.onnx	quantize	pass	onnxruntime.MatMulNBitsQuantizer	int8
model-int8-blk128.onnx	onnx_checker_matmulnbits_and_runtime	pass	onnx.checker+onnxruntime.CPUExecutionProvider	int8	[2, 2]
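The runtime rows confirm that each artifact loads and runs on a small [2, 2] input. For a stronger check of your own, one option is to run two artifacts on the same bin_moves and compare their bin_logits. The comparison can be numeric and by top-1 agreement within the legal set. The helper below is hypothetical and is not the project's validation code. Its inputs would come from session.run(["bin_logits"], {"bin_moves": bin_moves})[0] on each session.

```python
import numpy as np


def compare_bin_logits(ref: np.ndarray, test: np.ndarray, legal_ids_per_row):
    """Return (max_abs_diff, top1_agreement) between two [batch, 65536] outputs.

    Top-1 is taken within each row's legal ids, matching how moves are decoded.
    """
    if ref.shape != test.shape:
        raise ValueError(f"shape mismatch {ref.shape} vs {test.shape}")
    max_abs_diff = float(np.max(np.abs(ref.astype(np.float64) - test.astype(np.float64))))
    agree = 0
    for row, legal in enumerate(legal_ids_per_row):
        legal = np.asarray(legal)
        ref_pick = legal[np.argmax(ref[row, legal])]
        test_pick = legal[np.argmax(test[row, legal])]
        agree += int(ref_pick == test_pick)
    return max_abs_diff, agree / len(legal_ids_per_row)
```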

Known Limitations

This model is trained for chess move autocomplete and is not a general chess engine. It does not include Transformers AutoModel or trust_remote_code support. Metadata-aware variants encode metadata as input tokens; no separate metadata tensor path is supported. All three ONNX artifacts compute in FP32. model-bf16-fp32compute.onnx differs from model.onnx only in storing its weights as BF16, so it needs no BF16 operator support at runtime. model-int8-blk128.onnx stores its linear weights as block-wise INT8 and dequantizes them to FP32 at compute time. Use model.onnx if you specifically want FP32 weights on disk.

Model size (Safetensors): 91.4M params
Tensor type: F32