# chessgpt-medium

A ChessGPT model trained for mechanistic interpretability research.
## Model Details
- Model Size: ~88M parameters (approximate, estimated from the configuration below)
- Architecture: GPT-style transformer
- Vocabulary: 4,211 tokens (4,208 UCI chess moves + 3 special tokens)
- Context Length: 256 tokens
- Layers: 12
- Hidden Size: 768
- Attention Heads: 12
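Two of the numbers above can be reproduced with a short calculation. The move enumeration below (every ordered pair of distinct squares, plus promotion suffixes) and the GPT-2-style block layout are assumptions rather than this repo's actual code, but both happen to match the listed counts:

```python
# 1) UCI move vocabulary: every from->to pair of distinct squares...
files = "abcdefgh"
moves = set()
for a in range(64):
    for b in range(64):
        if a != b:
            moves.add(f"{files[a % 8]}{a // 8 + 1}{files[b % 8]}{b // 8 + 1}")
# ...plus promotions: pawn steps between the last two ranks, at most one file apart.
for r1, r2 in [(7, 8), (2, 1)]:
    for f1 in range(8):
        for f2 in range(8):
            if abs(f1 - f2) <= 1:
                for piece in "qrbn":
                    moves.add(f"{files[f1]}{r1}{files[f2]}{r2}{piece}")
print(len(moves) + 3)  # 4208 moves + 3 special tokens = 4211

# 2) Parameter estimate for the configuration above (GPT-2-style blocks;
#    a separate, untied LM head would add roughly vocab * d more).
vocab, ctx, d, layers, d_ff = 4211, 256, 768, 12, 4 * 768
embed = vocab * d + ctx * d               # token + position embeddings
attn = 4 * d * d + 4 * d                  # q/k/v/out projections + biases
mlp = 2 * d * d_ff + d_ff + d             # up/down projections + biases
norms = 2 * 2 * d                         # two LayerNorms per block
total = embed + layers * (attn + mlp + norms) + 2 * d  # + final LayerNorm
print(f"~{total / 1e6:.0f}M parameters")  # ~88M parameters
```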
## Training Configuration
- Dataset: Lichess/standard-chess-games
- Min Elo: 1800
- Min Moves: 10
- Batch Size: 32
- Learning Rate: 3e-4
- Epochs: 10
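The Min Elo and Min Moves settings above imply a game-level filter on the dataset. A minimal sketch of what that filtering could look like (the field names, function, and toy records are illustrative assumptions, not the dataset's actual schema):

```python
MIN_ELO, MIN_MOVES = 1800, 10  # thresholds from the training configuration

def keep_game(game: dict) -> bool:
    """Keep games where both players are rated >= MIN_ELO
    and the game has at least MIN_MOVES moves."""
    return (
        min(game["white_elo"], game["black_elo"]) >= MIN_ELO
        and len(game["moves"]) >= MIN_MOVES
    )

# Toy records: kept, dropped (low Elo), dropped (too short).
games = [
    {"white_elo": 2100, "black_elo": 1950, "moves": ["e2e4"] * 40},
    {"white_elo": 1700, "black_elo": 2200, "moves": ["e2e4"] * 40},
    {"white_elo": 1900, "black_elo": 1900, "moves": ["e2e4"] * 5},
]
print([keep_game(g) for g in games])  # [True, False, False]
```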
## Metrics

- Loss: 1.1781
- Accuracy: 0.7051
- Perplexity: 3.2484
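The reported metrics are internally consistent: perplexity is the exponential of the per-token cross-entropy loss, and the two values above agree up to rounding of the reported loss:

```python
import math

loss = 1.1781
perplexity = math.exp(loss)
print(round(perplexity, 3))  # 3.248, matching the reported 3.2484 up to rounding
```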
## Usage

```python
from src.model import ChessGPT, load_config_from_yaml
from src.training.hf_utils import load_model_from_hub

# Load model from Hub
model = load_model_from_hub("taj-gillin/chessgpt")

# Or build from a config and load weights separately
config = load_config_from_yaml("configs/model/medium.yaml")
model = ChessGPT(config)
# ... load weights ...
```
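Once a model is loaded, greedy next-move selection might look like the following sketch. The callable interface, the `stoi`/`itos` mappings, and the dummy model are illustrative assumptions, not this repo's actual API:

```python
# Hedged sketch: assumes the model exposes a callable mapping a list of
# token ids to per-position logits (shape: seq x vocab).
def next_move(logits_fn, moves, stoi, itos):
    ids = [stoi[m] for m in moves]
    last_logits = logits_fn(ids)[-1]  # logits at the final position
    best = max(range(len(last_logits)), key=last_logits.__getitem__)
    return itos[best]

# Toy demonstration with a 3-move vocabulary and a constant "model".
stoi = {"e2e4": 0, "e7e5": 1, "g1f3": 2}
itos = {v: k for k, v in stoi.items()}
dummy = lambda ids: [[0.1, 0.2, 0.7]] * len(ids)  # always prefers g1f3
print(next_move(dummy, ["e2e4", "e7e5"], stoi, itos))  # g1f3
```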
## Research
This model is part of mechanistic interpretability research on chess-playing transformers. The goal is to understand what internal representations and algorithms the model learns.
## Citation

If you use this model in your research, please cite:

```bibtex
@misc{chessgpt2024,
  title={ChessGPT: Mechanistic Interpretability of Chess Transformers},
  author={Your Name},
  year={2024},
  url={https://huggingface.co/taj-gillin/chessgpt}
}
```