	---
	license: mit
	tags:
	- chess
	- transformer
	- policy-value
	datasets:
	- avewright/chess-positions-lichess-sf
	---

	# ChessTransformer200M

	A 204M-parameter chess-native transformer with policy and value heads, trained on Stockfish-labeled positions.

	## Architecture
	- Encoder: FusedBoardEncoder (256d) — learned piece-color + square + context embeddings
	- Backbone: 16-layer Transformer (1024d, 16 heads, FFN 4096, GELU, norm_first)
	- Policy Head: SpatialPolicyHead (from×to square features, 512d)
	- Value Head: WDL (win/draw/loss) classification
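
To make the two heads' outputs concrete, here is a minimal sketch of how a from×to policy grid and WDL logits are commonly interpreted. The flat index `from * 64 + to`, the a1=0 square numbering, and the function names are assumptions for illustration, not details confirmed by this model's code. Promotion moves would need extra handling that is not covered here.

```python
import math


def square_index(name: str) -> int:
    """Map a square name such as 'e2' to 0..63, with a1=0 and h8=63."""
    file = ord(name[0]) - ord("a")
    rank = int(name[1]) - 1
    return rank * 8 + file


def move_index(uci: str) -> int:
    """Hypothetical flat index of a from->to pair in a 64 x 64 policy grid."""
    return square_index(uci[:2]) * 64 + square_index(uci[2:4])


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_score(wdl_logits):
    """Collapse (win, draw, loss) logits into an expected score in [0, 1]."""
    win, draw, _loss = softmax(wdl_logits)
    return win + 0.5 * draw


print(move_index("e2e4"))  # 796, i.e. 12 * 64 + 28
print(expected_score([0.0, 0.0, 0.0]))
```

A single expected score is convenient for comparing positions, while the full win/draw/loss distribution keeps the information about how drawish a position is.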

	## Training
	- Dataset: avewright/chess-positions-lichess-sf (10.2M of the 48M available positions seen)
	- Steps: 10,000 optimizer steps at an effective batch size of 1024
	- Final policy loss: ~2.5 (estimated from the loss curve)
	- Top-1 accuracy: 18.4% agreement with Stockfish's best move on 5K evaluation positions
	- GPU: NVIDIA A40 46GB, FP16 + torch.compile
	- Training time: ~6 hours to step 10,000
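
The training figures above can be cross-checked with a little arithmetic. This sketch uses only the numbers listed in this section; the 6-hour duration is approximate, so the throughput is a rough estimate.

```python
steps = 10_000
effective_batch = 1024
dataset_size = 48_000_000
hours = 6  # approximate wall-clock time to step 10,000

positions_seen = steps * effective_batch
epoch_fraction = positions_seen / dataset_size
steps_per_epoch = dataset_size / effective_batch
throughput = positions_seen / (hours * 3600)

print(f"positions seen:        {positions_seen:,}")       # 10,240,000
print(f"fraction of one epoch: {epoch_fraction:.1%}")     # 21.3%
print(f"steps per full epoch:  {steps_per_epoch:,.0f}")   # 46,875
print(f"throughput:            {throughput:.0f} positions/s")  # 474
```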

	## Usage

	```python
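	import chess
	import torch

	# `play` is the helper module that provides these functions; it must be
	# importable, e.g. as play.py in the working directory.
	from play import ChessTransformer200M, load_model, encode_board, get_model_move

	device = torch.device("cpu")
	model = load_model("best_model.pt", device)  # weights file listed under Files

	board = chess.Board()  # standard starting position
	move, info = get_model_move(model, board, device)
	print(f"Best move: {move.uci()}, Top 5: {info['top_moves']}")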
	```
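
The internals of `get_model_move` are not shown here. As a rough illustration of how a top-5 list like `info['top_moves']` can be produced, the sketch below restricts raw policy scores to legal moves and renormalises them. The function name, the dictionary input, and the toy scores are hypothetical and may differ from what `play.py` actually does.

```python
import math


def top_moves(move_logits: dict, legal: list, k: int = 5):
    """Return up to k (move, probability) pairs, renormalised over legal moves only."""
    scores = {m: move_logits[m] for m in legal if m in move_logits}
    if not scores:
        return []
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {m: math.exp(s - peak) for m, s in scores.items()}
    total = sum(exps.values())
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)
    return [(m, e / total) for m, e in ranked[:k]]


# Toy scores (made up); "a1a8" is illegal in the start position and is ignored.
logits = {"d2d4": 2.0, "e2e4": 1.0, "g1f3": 0.5, "a1a8": 9.0}
legal_moves = ["d2d4", "e2e4", "g1f3"]
for move, prob in top_moves(logits, legal_moves):
    print(f"{move}: {prob:.3f}")
```

Masking before the softmax matters: a high score on an illegal move would otherwise absorb probability mass that belongs to playable moves.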

	## Files
	- `best_model.pt` — Model weights only (816 MB)
	- `training_log.json` — Loss curve data
	- `config.json` — Architecture config
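
The 816 MB file size is consistent with 204M parameters stored as 32-bit floats. The sketch below checks that and also estimates how much of the total the 16-layer backbone accounts for. It assumes a standard PyTorch-style encoder layer (biased Q/K/V/output projections, two linear layers, two LayerNorms); the actual layer layout is not confirmed by this card.

```python
def encoder_layer_params(d_model: int, d_ff: int) -> int:
    """Parameter count of one standard Transformer encoder layer (assumed layout)."""
    attention = 4 * d_model * d_model + 4 * d_model     # Q, K, V, output projections + biases
    feed_forward = 2 * d_model * d_ff + d_ff + d_model  # two linear layers + biases
    layer_norms = 2 * 2 * d_model                       # weight and bias for two LayerNorms
    return attention + feed_forward + layer_norms


backbone = 16 * encoder_layer_params(1024, 4096)
total_reported = 204_000_000             # rounded figure from this card
fp32_megabytes = total_reported * 4 / 1e6

print(f"backbone parameters: {backbone:,}")         # 201,539,584
print(f"FP32 size: {fp32_megabytes:.0f} MB")        # 816 MB
```

Under these assumptions the backbone holds roughly 201.5M parameters, leaving about 2.5M for the board encoder and the two heads.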

	## Known Issues
	- Training hit FP16 NaNs at around step 13,800; the best checkpoint is from step 10,000.
	- The model has seen only ~21% of one epoch of the 48M-position dataset.
	- As White it opens with 1.d4. It plays reasonable chess but is still early in training.
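
Falling back to an earlier checkpoint after a NaN can be expressed as a small helper. This is a minimal sketch: the function names and the checkpoint steps in the example are hypothetical, since the card does not state how often checkpoints were saved.

```python
import math


def is_finite_loss(loss: float) -> bool:
    """True when the loss is a usable number (not NaN or infinity)."""
    return math.isfinite(loss)


def latest_safe_checkpoint(checkpoint_steps, first_bad_step):
    """Largest checkpoint step strictly before the first non-finite loss, or None."""
    safe = [s for s in checkpoint_steps if s < first_bad_step]
    return max(safe) if safe else None


# Illustrative values only: the checkpoint schedule below is assumed.
saved_steps = [5_000, 10_000, 15_000]
print(is_finite_loss(float("nan")))                  # False
print(latest_safe_checkpoint(saved_steps, 13_800))   # 10000
```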