chessgpt-medium

ChessGPT model trained for mechanistic interpretability research.

Model Details

  • Model Size: ~85M parameters (approximate, estimated from the layer count and hidden size below)
  • Architecture: GPT-style transformer
  • Vocabulary: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
  • Context Length: 256
  • Layers: 12
  • Hidden Size: 768
  • Attention Heads: 12
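
The 4,211-token vocabulary breaks down as 4,208 UCI move strings plus 3 special tokens. One enumeration that yields exactly 4,208 moves is every from≠to square pair (64 × 63 = 4,032) plus the promotion variants with q/r/b/n suffixes (176). This is a plausible reconstruction of the tokenizer's scheme, not code taken from the model repository:

```python
# Sketch of a 4,208-move UCI vocabulary (assumed scheme, not the model's
# actual tokenizer code): all from!=to square pairs, plus promotion moves.
FILES = "abcdefgh"
RANKS = "12345678"
squares = [f + r for r in RANKS for f in FILES]  # a1 .. h8, 64 squares

# All ordered pairs of distinct squares: 64 * 63 = 4032 plain moves.
# Castling is covered (e.g. "e1g1"), as are en passant captures.
moves = [a + b for a in squares for b in squares if a != b]

# Promotion moves: a pawn on rank 7 (white) or rank 2 (black) pushes or
# captures diagonally onto the last rank, promoting to q, r, b, or n.
promotions = []
for piece in "qrbn":
    for i, f in enumerate(FILES):
        for df in (-1, 0, 1):  # capture-left, push, capture-right
            if 0 <= i + df < 8:
                t = FILES[i + df]
                promotions.append(f + "7" + t + "8" + piece)  # white
                promotions.append(f + "2" + t + "1" + piece)  # black

vocab = moves + promotions
print(len(vocab))  # 4208; adding 3 special tokens gives the 4,211 total
```

The promotion count works out to 4 pieces × 22 file transitions × 2 colors = 176, which together with the 4,032 square pairs matches the 4,208 figure above.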

Training Configuration

  • Dataset: Lichess/standard-chess-games
  • Min Elo: 1800
  • Min Moves: 10
  • Batch Size: 32
  • Learning Rate: 3e-4
  • Epochs: 10
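
The Min Elo and Min Moves rows describe a filter over the Lichess games. A minimal sketch of that filter, assuming both players must meet the Elo floor; the field names (`white_elo`, `black_elo`, `moves`) are illustrative, not the dataset's actual schema:

```python
# Hedged sketch of the game filter implied by the training configuration.
# Field names are assumptions, not the Lichess dataset's real column names.
MIN_ELO = 1800
MIN_MOVES = 10

def keep_game(game: dict) -> bool:
    """Keep games where both players clear the Elo floor and the game
    is long enough to give useful next-move training signal."""
    return (
        game["white_elo"] >= MIN_ELO
        and game["black_elo"] >= MIN_ELO
        and len(game["moves"]) >= MIN_MOVES
    )

games = [
    {"white_elo": 2100, "black_elo": 1950, "moves": ["e2e4", "e7e5"] * 12},
    {"white_elo": 1500, "black_elo": 2000, "moves": ["d2d4"] * 40},  # Elo too low
    {"white_elo": 1900, "black_elo": 1850, "moves": ["g1f3"] * 4},   # too short
]
print([keep_game(g) for g in games])  # [True, False, False]
```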

Metrics

  • loss: 1.1781
  • accuracy: 0.7051
  • perplexity: 3.2484
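
The loss and perplexity figures are mutually consistent: perplexity is the exponential of the cross-entropy loss. A quick check:

```python
import math

# Sanity check on the reported metrics: perplexity = exp(cross-entropy loss).
loss = 1.1781
perplexity = math.exp(loss)
print(round(perplexity, 2))  # 3.25, matching the reported 3.2484 up to rounding
```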

Usage

from src.model import ChessGPT, load_config_from_yaml
from src.training.hf_utils import load_model_from_hub

# Load model from Hub
model = load_model_from_hub("taj-gillin/chessgpt")

# Or load from config
config = load_config_from_yaml("configs/model/medium.yaml")
model = ChessGPT(config)
# ... load weights ...
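
Once weights are loaded, the model's next-move logits can be turned into a move by softmax sampling or a greedy argmax. The sketch below illustrates only that decoding step; the hand-written `logits` dict stands in for a real forward pass, and the function names are illustrative, not part of the ChessGPT API:

```python
import math
import random

def sample_move(logits: dict, temperature: float = 1.0) -> str:
    """Softmax over per-move logits, then sample one UCI move string."""
    moves = list(logits)
    scaled = [logits[m] / temperature for m in moves]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    return random.choices(moves, weights=weights, k=1)[0]

def greedy_move(logits: dict) -> str:
    """Deterministic variant: pick the highest-logit move."""
    return max(logits, key=logits.get)

# Stand-in logits for three candidate first moves (not real model output).
logits = {"e2e4": 2.1, "d2d4": 1.8, "g1f3": 0.3}
print(greedy_move(logits))  # e2e4
```

In practice the sampled move should also be checked for legality in the current position, since nothing constrains the vocabulary to legal moves.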

Research

This model is part of a mechanistic interpretability research project on chess-playing transformers. The goal is to understand which internal representations and algorithms the model learns.

Citation

If you use this model in your research, please cite:

@misc{chessgpt2024,
  title={ChessGPT: Mechanistic Interpretability of Chess Transformers},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/taj-gillin/chessgpt}
}