---
language: en
license: mit
library_name: transformers
tags:
- chess
- mamba
- moe
- kiyengine
- ssm
- mixture-of-experts
pipeline_tag: reinforcement-learning
---
# β™ŸοΈ KiyEngine V3 (Mamba-MoE)
> **"Where Linear Recurrence meets Sparse Intuition."**
**KiyEngine V3** is a high-performance chess evaluation model utilizing a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.
---
## πŸš€ Highlights
- **Architecture:** Mamba-SSM core for linear-time sequence modeling
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 Gated Routing
- **Training:** converged to a final loss of **5.46**
- **Target Performance:** Designed to bridge the gap between neural intuition and traditional brute-force search
---
## 🧠 Model Architecture
Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with **Sparse MoE** layers whose experts can specialize in different phases of the game (Opening, Middlegame, Endgame).
### Hyperparameters
| Parameter | Value | Description |
|:----------|:------|:------------|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (Total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (Squares Γ— Pieces) |
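Since `vocab_size` is 768 (64 squares Γ— 12 piece types, 6 per color), a natural reading is that each (piece, square) pair maps to one token id. The sketch below shows one plausible layout; the actual mapping is defined in `modeling_kiyengine.py` and may differ:

```python
# A minimal sketch of one plausible square-piece tokenization, assuming
# vocab_size = 768 = 64 squares Γ— 12 piece types. The exact index layout
# used by KiyEngine V3 may differ; see modeling_kiyengine.py.
PIECE_TYPES = ["P", "N", "B", "R", "Q", "K",   # white
               "p", "n", "b", "r", "q", "k"]   # black

def token_id(piece: str, square: int) -> int:
    """Map a (piece, square) pair to a token id in [0, 768)."""
    return PIECE_TYPES.index(piece) * 64 + square

# Example: a white knight on g1 (square index 6, with a1 = 0)
print(token_id("N", 6))  # -> 70
```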
---
## πŸ’» Usage
You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:
```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()
print("βœ… KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run the forward pass
    output = model(dummy_input)

# Reading the fields of KiyEngineOutput
print("πŸŽ‰ Success!")
print(f"1. Policy logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> scores over 768 possible move indices
print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to +1 (win)
print(f"3. Last hidden state: {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
```
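The policy head returns raw logits. To turn them into a move distribution, apply a softmax; a real engine would first mask out indices that do not correspond to legal moves. A minimal follow-up, reusing `output` from the snippet above:

```python
import torch

# Convert raw policy logits into a probability distribution over move indices.
# In practice, mask illegal move indices before the softmax.
probs = torch.softmax(output.policy_logits, dim=-1)   # shape: [1, 768]
top_probs, top_indices = probs.topk(5, dim=-1)

for p, idx in zip(top_probs[0], top_indices[0]):
    print(f"move index {idx.item():3d} -> p = {p.item():.4f}")

# The scalar value head summarizes the position for the side to move.
print(f"position value: {output.value.item():+.3f}  (-1 = loss, +1 = win)")
```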
---
## πŸ“ˆ Training Progress
The model was trained on **1.5M+ high-quality Lichess games**. The loss curve converged smoothly, which we attribute to stable MoE routing.
- **Initial Loss:** 7.78
- **Final Loss:** 5.46 (Epoch 10)
- **Optimizer:** AdamW with OneCycleLR
- **Training Time:** ~5 hours on a single Tesla P100
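For reference, a schematic of that optimizer/scheduler setup is below. The exact learning rate, weight decay, and step count were not recorded here, so those numbers are placeholders:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

# Illustrative only: the card states AdamW + OneCycleLR, but the learning
# rate, weight decay, and step count below are guesses, not the real values.
model = torch.nn.Linear(384, 768)            # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = OneCycleLR(optimizer, max_lr=3e-4, total_steps=1_000)

for step in range(1_000):                    # stand-in training loop
    x, target = torch.randn(32, 384), torch.randint(0, 768, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                         # OneCycleLR steps once per batch
```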
---
## πŸ“‚ Repository Structure
```
KiyEngine-V3-Mamba-MoE/
β”œβ”€β”€ model.safetensors # Optimized weights (272MB)
β”œβ”€β”€ config.json # Model configuration
β”œβ”€β”€ configuration_kiyengine.py # Custom config class
β”œβ”€β”€ modeling_kiyengine.py # Core PyTorch implementation
└── README.md # This file
```
---
## 🎯 Performance
| Metric | Value |
|:-------|:------|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M *(calculated from architecture)* |
| Inference Speed | TBD |
| Target ELO | TBD |
---
## πŸ› οΈ Roadmap
- [x] Train V3 Mamba-MoE weights
- [x] Push to Hugging Face Hub
- [ ] Implement native Rust inference via `candle-core`
- [ ] Integrate with UCI protocol for GUI play (Arena, CuteChess)
- [ ] Benchmark against Stockfish and Leela Chess Zero
- [ ] Add ONNX export for deployment
- [ ] Create interactive demo on Hugging Face Spaces
---
## πŸ”¬ Technical Details
### Why Mamba?
Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** allows the model to process entire games efficiently while maintaining long-range dependencies.
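Concretely, a (non-selective) SSM computes `h_t = A h_{t-1} + B x_t` and `y_t = C h_t`, one state update per token. The toy scan below illustrates the O(L) cost; real Mamba additionally makes the parameters input-dependent ("selective") and uses a hardware-efficient parallel scan:

```python
import torch

def ssm_scan(x, A, B, C):
    """Toy linear SSM: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.

    x: [seq_len, d_in]. One fixed-cost update per step -> O(seq_len) total,
    versus O(seq_len^2) pairwise interactions in self-attention.
    """
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # a single linear pass over the sequence
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return torch.stack(ys)

seq_len, d_in, d_state = 64, 8, 16
y = ssm_scan(torch.randn(seq_len, d_in),
             torch.randn(d_state, d_state) * 0.1,   # A (small, for stability)
             torch.randn(d_state, d_in),            # B
             torch.randn(4, d_state))               # C
print(y.shape)  # torch.Size([64, 4])
```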
### Why MoE?
Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The **Mixture-of-Experts** architecture allows the model to:
- Specialize experts for different game phases
- Route positions to the most relevant expert
- Maintain parameter efficiency while increasing model capacity (see the routing sketch below)
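A minimal sketch of top-2 gated routing with the hyperparameters above (`d_model` 384, 8 experts, expansion 2). This is illustrative only, not the code in `modeling_kiyengine.py`; for clarity it applies every expert densely and masks afterwards, whereas a real implementation dispatches each token only to its selected experts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Minimal top-2 gated MoE layer (illustrative sketch)."""

    def __init__(self, d_model=384, n_experts=8, expansion=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model * expansion),
                          nn.GELU(),
                          nn.Linear(d_model * expansion, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: [batch, seq, d_model]
        scores = self.gate(x)                  # [batch, seq, n_experts]
        top_w, top_idx = scores.topk(2, dim=-1)
        top_w = F.softmax(top_w, dim=-1)       # renormalize over the chosen 2
        out = torch.zeros_like(x)
        # Dense compute for clarity; production code routes tokens instead.
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = (top_idx[..., k] == e).unsqueeze(-1)
                out = out + mask * top_w[..., k:k+1] * expert(x)
        return out

moe = Top2MoE()
print(moe(torch.randn(1, 64, 384)).shape)  # torch.Size([1, 64, 384])
```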
---
## πŸ“Š Dataset
- **Source:** Lichess Database
- **Games:** 1.5M+ high-quality games
---
## 🀝 Citation
If you use KiyEngine V3 in your research or projects, please cite:
```bibtex
@misc{kiyengine-v3-2026,
  author       = {Kiy-K},
  title        = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```
---
## πŸ“ License
This model is released under the **MIT License**. See the [LICENSE](LICENSE) file for details.
---
## πŸ‘€ Author
**Kiy-K**
*"Building the next generation of neural chess engines."*
- πŸ€— Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- πŸ“§ Contact: khoitruong071510@gmail.com
---
## πŸ™ Acknowledgments
- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao
- **Dataset:** Lichess Open Database
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community
---
## ⚠️ Limitations
- The model is currently a **neural network component** and requires integration with a search algorithm (e.g., MCTS, alpha-beta) for full chess engine functionality (see the sketch after this list)
- Performance may vary across different game phases
- Requires further validation against established benchmarks
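As an illustration of the first point, a one-ply greedy search over the value head could look like the sketch below. `encode_position` is hypothetical, since the board-to-token encoding lives in `modeling_kiyengine.py`; `python-chess` supplies the move generation:

```python
import chess   # pip install python-chess
import torch

def encode_position(board: chess.Board) -> torch.Tensor:
    """Hypothetical: map a board to a [1, seq_len] tensor of token ids.
    The real encoding is defined in modeling_kiyengine.py."""
    raise NotImplementedError

def pick_move(model, board: chess.Board) -> chess.Move:
    """One-ply greedy search: play the move whose resulting position
    the value head scores worst for the opponent (best for us)."""
    best_move, best_score = None, -float("inf")
    for move in board.legal_moves:
        board.push(move)
        with torch.no_grad():
            # Value is from the side to move's perspective, so negate it.
            score = -model(encode_position(board)).value.item()
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```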
---
<div align="center">
**Star this repo if you find it useful! ⭐**
*Made with β™ŸοΈ and πŸ€– by Kiy-K*
</div>