| --- |
| license: apache-2.0 |
| library_name: transformers |
| tags: |
| - chess |
| - self-play |
| pipeline_tag: text-generation |
| base_model: |
| - FlameF0X/ChessSLM |
| --- |
| |
| # ChessSLM-RL |
|
|
| **ChessSLM-RL** is an improved version of **ChessSLM**, a small language model that plays chess through natural-language move generation. It is fine-tuned with reinforcement learning (RL) so that it hallucinates less and plays somewhat more deliberately. |
| Despite having only **30M parameters**, it is capable of competing with and occasionally outperforming larger language models in chess-playing tasks. |
|
|
| The model starts from the pre-trained ChessSLM checkpoint and is fine-tuned with RL and Stockfish so that it produces more legal moves and attempts fewer illegal ones. |
|
|
| Play against ChessSLM [here](https://flamef0x.github.io/other/chess). |
|
|
| --- |
|
|
| ## Overview |
|
|
| - **Architecture:** GPT-2 |
| - **Parameters:** ~30M |
| - **Training data:** Self-Play |
| - **Task:** Autoregressive chess move generation |
|
|
| --- |
|
|
| ## Capabilities |
|
|
| ChessSLM can play chess by generating moves sequentially in Standard Algebraic Notation (SAN). |
| It has been evaluated in matches against several language models, including: |
|
|
| - Claude |
| - Gemini |
| - Qwen |
| - GPT-2 |
| - GPT-Neo |
| - Pythia |
| - LLaMA |
| - Mistral |
| - other small chess-oriented models |
|
|
| The model achieves an **Elo rating of approximately {TBD}**, averaging **around {TBD} Elo** against other language models despite its small size. |
|
|
| --- |
|
|
| ## Benchmark Results |
|
|
| | Model | Elo Rating | |
| |------|------------| |
| | EleutherAI/pythia-70m-deduped | 1113 | |
| | nlpguy/amdchess-v9 | 1094 | |
| | nlpguy/smolchess-v2 | 1093 | |
| | mlabonne/chesspythia-70m | 1088 | |
| | **FlameF0X/ChessSLM** | **1087** | |
| | DedeProGames/mini-chennus | 1083 | |
| | distilbert/distilgpt2 | 1061 | |
| | Locutusque/TinyMistral-248M-v2.5 | 1061 | |
| | facebook/opt-125m | 1057 | |
| | mlabonne/grandpythia-200k-70m | 1050 | |
| | DedeProGames/Chesser-248K-Mini | 1048 | |
| | bharathrajcl/chess_llama_68m | 1046 | |
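
To give a feel for what the rating gaps in this table mean, here is a minimal sketch that converts two ratings into an expected score. It assumes the standard Elo expected-score formula; the card does not say how the ratings were computed, so treat this as an illustration rather than the evaluation procedure. The function name `expected_score` is made up for this example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5) of player A against player B.

    Assumption: the standard Elo formula with a 400-point scale.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the table above.
chessslm = 1087
pythia_70m = 1113

score = expected_score(chessslm, pythia_70m)
print(f"Expected score of ChessSLM vs pythia-70m-deduped: {score:.2f}")  # about 0.46
```

Under that assumption, a 26-point gap is close to an even match, which suggests that the models near the top of the table are nearly indistinguishable in strength.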
|
|
| --- |
|
|
| ## Limitations |
|
|
| Like many language-model-based chess systems, ChessSLM has several limitations: |
|
|
| - **Illegal move hallucinations:** The model may occasionally generate moves that violate chess rules. |
| - **No board-state verification:** Moves are generated purely from learned patterns rather than a validated game state. |
| - **Limited strategic depth:** While competitive at lower Elo levels, it cannot match dedicated chess engines. |
|
|
| These limitations are common for **pure language-model chess agents** that do not use external rule engines. |
|
|
| --- |
|
|
| ## Future Improvements |
|
|
| Potential improvements include: |
|
|
| - Adding **move legality filtering** |
| - Integrating **board-state validation** |
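
As a rough illustration of how move legality filtering could work, the sketch below walks through several sampled candidate moves and keeps the first one that a rules check accepts. Everything here is hypothetical: `first_legal_move`, the `samples` list, and the hard-coded `OPENING_MOVES` set are stand-ins. A real setup would ask a chess rules library whether a move is legal in the current position rather than use a fixed set.

```python
from typing import Callable, Iterable, Optional


def first_legal_move(
    candidates: Iterable[str],
    is_legal: Callable[[str], bool],
) -> Optional[str]:
    """Return the first candidate SAN move accepted by `is_legal`, or None."""
    for san in candidates:
        san = san.strip()
        if san and is_legal(san):
            return san
    return None


# Stand-in rules check: the 20 legal first moves for White, in SAN.
# Assumption for illustration only; a rules library would supply this
# for whatever position is currently on the board.
OPENING_MOVES = {f"{file}{rank}" for file in "abcdefgh" for rank in "34"}
OPENING_MOVES |= {"Na3", "Nc3", "Nf3", "Nh3"}

# Hypothetical model outputs, most likely first. "Ke2" and "Qh5" are
# illegal in the starting position because the pieces are blocked.
samples = ["Ke2", "Qh5", "e4", "Nf3"]

chosen = first_legal_move(samples, OPENING_MOVES.__contains__)
print(chosen)  # e4
```

Passing the legality check in as a callable keeps the filter independent of any particular chess library, so board-state validation can be added later without changing the sampling loop.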
|
|
| --- |
|
|
| ## Summary |
|
|
| ChessSLM shows that **very small language models can achieve meaningful chess performance** when trained on domain-specific data. |
| It serves as a lightweight baseline for exploring **LLM-based chess agents** and **specialized small language models (SLMs)**. |