kaupane/ChessFormer-SL

@@ -1,169 +1,10 @@
 ---
-license: mit
 tags:
-- chess
-- transformer
-- reinforcement-learning
-- game-playing
-library_name: pytorch
 ---
-# ChessFormer-SL
-ChessFormer-SL is a transformer-based chess model trained via supervised learning on Stockfish evaluations. This model explores training chess engines without Monte Carlo Tree Search (MCTS), using only neural networks.
-## Model Description
-- **Model type**: Transformer for chess position evaluation and move prediction
-- **Language(s)**: Chess (FEN notation)
-- **License**: MIT
-- **Parameters**: 100.7M
-## Architecture
-ChessFormer uses a custom transformer architecture optimized for chess:
-- **Blocks**: 20 transformer layers
-- **Hidden size**: 640
-- **Attention heads**: 8
-- **Intermediate size**: 1728
-- **Features**: RMSNorm, SwiGLU activation, custom FEN tokenizer
-### Input Format
-The model processes FEN strings and repetition counts, tokenizing them into 75-token sequences representing:
-- 64 board square tokens (pieces + positional embeddings)
-- 9 metadata tokens (turn, castling, en passant, clocks, repetitions)
-- 2 special tokens (action, value)
-### Output Format
-- **Policy head**: Logits over 1,969 structurally valid chess moves
-- **Value head**: Position evaluation from current player's perspective
-## Training Details
-### Training Data
-- **Dataset**: `kaupane/lichess-2023-01-stockfish-annotated` (depth18 split)
-- **Size**: 56M positions with Stockfish evaluations
-- **Validation**: depth27 split
-### Training Procedure
-- **Method**: Supervised learning on Stockfish move recommendations and evaluations
-- **Objective**: Cross-entropy loss (moves) + MSE loss (values) + invalid move penalty
-- **Hardware**: RTX 4060Ti 16GB
-- **Duration**: ~2 weeks
-- **Checkpoints**: 20 total, this model is the final checkpoint
-### Training Metrics
-- **Action Loss**: /
-- **Value Loss**: /
-- **Invalid Loss**: /
-## Performance
-### Capabilities
-- ✅ Reasonable opening and endgame play
-- ✅ Fast inference without search
-- ✅ Better than next-token prediction chess models
-- ✅ Can defeat Stockfish occasionally with search enhancement
-### Limitations
-- ❌ Frequent tactical blunders in midgame
-- ❌ Estimated ELO ~1500 (informal assessment)
-- ❌ Struggles with complex tactical combinations
-- ❌ Tends to give away pieces ("free captures")
-## Usage
-### Installation
-```bash
-pip install torch transformers huggingface_hub chess
-# Download model.py from this repository
-```
-### Basic Usage
-```python
-import torch
-from model import ChessFormerModel
-# Load model
-model = ChessFormerModel.from_pretrained("kaupane/ChessFormer-SL")
-model.eval()
-# Analyze position
-fens = ["rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"]
-repetitions = torch.tensor([1])
-with torch.no_grad():
-    move_logits, position_value = model(fens, repetitions)
-# Picking a best move requires masking move_logits to the legal moves first (not shown here)
-print(f"Position value: {position_value.item():.3f}")
-```
-### With Chess Engine Interface
-```python
-from engine import Engine, ChessformerConfig
-import chess
-# Create engine (reuses the `model` loaded in Basic Usage)
-config = ChessformerConfig(
-    chessformer=model,
-    temperature=0.5,
-    depth=2  # Enable search enhancement
-)
-engine = Engine(type="chessformer", chessformer_config=config)
-# Play move
-board = chess.Board()
-move_uci, value = engine.move(board)
-print(f"Suggested move: {move_uci}, Value: {value:.3f}")
-```
-## Limitations and Bias
-### Technical Limitations
-- **Tactical weakness**: Prone to hanging pieces and missing simple tactics
-- **Computational inefficiency**: FEN tokenization creates a training bottleneck; preprocessing the entire dataset before training should be beneficial
-### Potential Biases
-- Trained exclusively on Stockfish evaluations, may inherit engine biases
-- May not generalize to unconventional openings or endgames
-### Known Issues
-- Piece embeddings have consistently lower norms than positional embeddings
-- The model sometimes assigns a small amount of probability (~3%) to invalid moves despite the training penalty
-- Performance degrades without search enhancement
-## Ethical Considerations
-This model is intended for:
-- ✅ Educational purposes and chess learning
-- ✅ Research into neural chess architectures
-- ✅ Developing chess training tools
-Not recommended for:
-- ❌ Competitive chess tournaments
-- ❌ Production chess engines without extensive testing
-- ❌ Applications requiring reliable tactical calculation
-## Additional Information
-- **Repository**: [GitHub link](https://github.com/Mtrya/chess-transformer)
-- **Demo**: [HuggingFace Space Demo](https://huggingface.co/spaces/kaupane/Chessformer_Demo)
-- **Related**: [ChessFormer-RL](https://huggingface.co/kaupane/ChessFormer-RL) (RL training experiment)

 ---
 tags:
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Code: [More Information Needed]
+- Paper: [More Information Needed]
+- Docs: [More Information Needed]
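Note on the removed README: its "Input Format" section describes a 75-token sequence (64 board squares, 9 metadata tokens, 2 special tokens). The sketch below only illustrates how those counts could add up from a FEN string. It is not the repository's tokenizer: the function name `fen_to_tokens`, the token strings, the split of the 9 metadata fields (turn, four castling flags, en passant, two clocks, repetitions), and the position of the special tokens are all assumptions.

```python
# Hypothetical sketch: split a FEN string into 64 square symbols and 9 metadata
# fields, then append 2 special tokens, for 75 tokens in total.
# The metadata split and the token ordering are assumptions, not the real tokenizer.

def fen_to_tokens(fen: str, repetitions: int = 1) -> list[str]:
    placement, turn, castling, en_passant, halfmove, fullmove = fen.split()

    # Expand digits into runs of empty squares so every rank has 8 symbols.
    squares = []
    for rank in placement.split("/"):
        for ch in rank:
            squares.extend(["."] * int(ch) if ch.isdigit() else [ch])
    if len(squares) != 64:
        raise ValueError(f"expected 64 squares, got {len(squares)}")

    metadata = [
        f"turn:{turn}",
        # One flag per castling right: K, Q, k, q (1 if still available).
        *(f"castle:{flag}:{int(flag in castling)}" for flag in "KQkq"),
        f"ep:{en_passant}",
        f"halfmove:{halfmove}",
        f"fullmove:{fullmove}",
        f"rep:{repetitions}",
    ]
    special = ["<action>", "<value>"]  # placeholder names for the two special tokens
    return squares + metadata + special


tokens = fen_to_tokens("rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1")
print(len(tokens))  # 64 + 9 + 2 = 75
```

The FEN used here is the same position as in the removed "Basic Usage" snippet.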

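The removed README also states that the policy head emits logits over structurally valid moves, that move selection needs extra processing for legality, and that roughly 3% of probability can land on invalid moves. One common way to handle this, sketched here under assumptions, is to restrict the softmax to the legal moves before sampling. The function `sample_legal_move`, the tiny four-move vocabulary, and the logits are hypothetical; the real model's 1,969-entry move ordering is not shown in this diff.

```python
import math
import random


def sample_legal_move(logits, vocab, legal_moves, temperature=0.5, rng=random):
    """Sample a legal move from policy logits, ignoring every illegal entry."""
    legal = set(legal_moves)
    # Keep only the (move, scaled logit) pairs for moves legal in this position.
    candidates = [(m, l / temperature) for m, l in zip(vocab, logits) if m in legal]
    if not candidates:
        raise ValueError("no legal move found in the move vocabulary")
    # Numerically stable softmax over the legal subset only.
    top = max(l for _, l in candidates)
    weights = [math.exp(l - top) for _, l in candidates]
    total = sum(weights)
    probs = {m: w / total for (m, _), w in zip(candidates, weights)}
    move = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    return move, probs


# Toy data (assumed): "e7e4" is structurally valid but illegal for Black here.
vocab = ["e7e5", "c7c5", "e7e4", "g8f6"]
logits = [2.0, 1.0, 3.0, 0.5]
legal = ["e7e5", "c7c5", "g8f6"]

move, probs = sample_legal_move(logits, vocab, legal)
print(move, probs)
```

With python-chess, the legal list could come from `[m.uci() for m in board.legal_moves]`. The default `temperature=0.5` mirrors the value in the removed engine example; lower values concentrate probability on the highest-scoring legal move.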
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:07655925064152222a4958baf270adfbe1e31b7eb1baa35e03db75bed1238714
 size 402931432

 version https://git-lfs.github.com/spec/v1
+oid sha256:fb0cb41c159c82f1c8cf5fb9747bce35c639940dc4f78a999d7060d451450e48
 size 402931432