chessgpt-medium

ChessGPT model trained for mechanistic interpretability research.

Model Details

  • Model Size: ~85M parameters (approximate, estimated from the layer count and hidden size below)
  • Architecture: GPT-style transformer
  • Vocabulary: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
  • Context Length: 256
  • Layers: 12
  • Hidden Size: 768
  • Attention Heads: 12
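
The 4,211-token vocabulary breaks down as 4,208 UCI move strings plus 3 special tokens. One enumeration that yields exactly 4,208 moves is every from≠to square pair (64 × 63 = 4,032) plus the promotion variants with q/r/b/n suffixes (176). This is a plausible reconstruction of the tokenizer's scheme, not code taken from the model repository:

```python
# Sketch of a 4,208-move UCI vocabulary (assumed scheme, not the model's
# actual tokenizer code): all from!=to square pairs, plus promotion moves.
FILES = "abcdefgh"
RANKS = "12345678"
squares = [f + r for r in RANKS for f in FILES]  # a1 .. h8, 64 squares

# All ordered pairs of distinct squares: 64 * 63 = 4032 plain moves.
# Castling is covered (e.g. "e1g1"), as are en passant captures.
moves = [a + b for a in squares for b in squares if a != b]

# Promotion moves: a pawn on rank 7 (white) or rank 2 (black) pushes or
# captures diagonally onto the last rank, promoting to q, r, b, or n.
promotions = []
for piece in "qrbn":
    for i, f in enumerate(FILES):
        for df in (-1, 0, 1):  # capture-left, push, capture-right
            if 0 <= i + df < 8:
                t = FILES[i + df]
                promotions.append(f + "7" + t + "8" + piece)  # white
                promotions.append(f + "2" + t + "1" + piece)  # black

vocab = moves + promotions
print(len(vocab))  # 4208; adding 3 special tokens gives the 4,211 total
```

The promotion count works out to 4 pieces × 22 file transitions × 2 colors = 176, which together with the 4,032 square pairs matches the 4,208 figure above.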

Training Configuration

  • Dataset: Lichess/standard-chess-games
  • Min Elo: 1800
  • Min Moves: 10
  • Batch Size: 32
  • Learning Rate: 3e-4
  • Epochs: 10
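
The Min Elo and Min Moves rows describe a filter over the Lichess games. A minimal sketch of that filter, assuming both players must meet the Elo floor; the field names (`white_elo`, `black_elo`, `moves`) are illustrative, not the dataset's actual schema:

```python
# Hedged sketch of the game filter implied by the training configuration.
# Field names are assumptions, not the Lichess dataset's real column names.
MIN_ELO = 1800
MIN_MOVES = 10

def keep_game(game: dict) -> bool:
    """Keep games where both players clear the Elo floor and the game
    is long enough to give useful next-move training signal."""
    return (
        game["white_elo"] >= MIN_ELO
        and game["black_elo"] >= MIN_ELO
        and len(game["moves"]) >= MIN_MOVES
    )

games = [
    {"white_elo": 2100, "black_elo": 1950, "moves": ["e2e4", "e7e5"] * 12},
    {"white_elo": 1500, "black_elo": 2000, "moves": ["d2d4"] * 40},  # Elo too low
    {"white_elo": 1900, "black_elo": 1850, "moves": ["g1f3"] * 4},   # too short
]
print([keep_game(g) for g in games])  # [True, False, False]
```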

Metrics

  • loss: 1.1781
  • accuracy: 0.7051
  • perplexity: 3.2484
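
The loss and perplexity figures are mutually consistent: perplexity is the exponential of the cross-entropy loss. A quick check:

```python
import math

# Sanity check on the reported metrics: perplexity = exp(cross-entropy loss).
loss = 1.1781
perplexity = math.exp(loss)
print(round(perplexity, 2))  # 3.25, matching the reported 3.2484 up to rounding
```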

Usage

from src.model import ChessGPT, load_config_from_yaml
from src.training.hf_utils import load_model_from_hub

# Load model from Hub
model = load_model_from_hub("taj-gillin/chessgpt")

# Or load from config
config = load_config_from_yaml("configs/model/medium.yaml")
model = ChessGPT(config)
# ... load weights ...
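
Once weights are loaded, the model's next-move logits can be turned into a move by softmax sampling or a greedy argmax. The sketch below illustrates only that decoding step; the hand-written `logits` dict stands in for a real forward pass, and the function names are illustrative, not part of the ChessGPT API:

```python
import math
import random

def sample_move(logits: dict, temperature: float = 1.0) -> str:
    """Softmax over per-move logits, then sample one UCI move string."""
    moves = list(logits)
    scaled = [logits[m] / temperature for m in moves]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    return random.choices(moves, weights=weights, k=1)[0]

def greedy_move(logits: dict) -> str:
    """Deterministic variant: pick the highest-logit move."""
    return max(logits, key=logits.get)

# Stand-in logits for three candidate first moves (not real model output).
logits = {"e2e4": 2.1, "d2d4": 1.8, "g1f3": 0.3}
print(greedy_move(logits))  # e2e4
```

In practice the sampled move should also be checked for legality in the current position, since nothing constrains the vocabulary to legal moves.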

Research

This model is part of a mechanistic interpretability research project on chess-playing transformers. The goal is to understand which internal representations and algorithms the model learns.

Citation

If you use this model in your research, please cite:

@misc{chessgpt2024,
  title={ChessGPT: Mechanistic Interpretability of Chess Transformers},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/taj-gillin/chessgpt}
}