---
language: en
license: mit
library_name: transformers
tags:
- chess
- mamba
- moe
- kiyengine
- ssm
- mixture-of-experts
pipeline_tag: reinforcement-learning
---
# ♟️ KiyEngine V3 (Mamba-MoE)
> **"Where Linear Recurrence meets Sparse Intuition."**
**KiyEngine V3** is a high-performance chess evaluation model utilizing a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.
---
## 🚀 Highlights
- **Architecture:** Mamba-SSM core for linear-time sequence modeling
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 Gated Routing
- **Training Result:** final converged loss of **5.46**
- **Target Performance:** Designed to bridge the gap between neural intuition and traditional brute-force search
---
## 🧠 Model Architecture
Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with a **Sparse MoE** layer to specialize in different phases of the game (Opening, Middlegame, Endgame).
### Hyperparameters
| Parameter | Value | Description |
|:----------|:------|:------------|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (Total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (Squares × Pieces) |
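
The 768-entry vocabulary is consistent with 64 squares × 12 piece types (6 pieces × 2 colors). As a sketch only — the exact encoding used by KiyEngine V3 is not documented here, so `token_id` below is a hypothetical illustration of one such mapping:

```python
# Hypothetical token encoding for vocab_size = 768: one id per
# (square, piece) pair, with 64 squares and 12 piece types
# (6 pieces x 2 colors). The real mapping may differ.
N_SQUARES = 64
N_PIECE_TYPES = 12  # {P, N, B, R, Q, K} x {white, black}

def token_id(square: int, piece: int) -> int:
    """Map a (square, piece) pair into the [0, 768) token range."""
    assert 0 <= square < N_SQUARES and 0 <= piece < N_PIECE_TYPES
    return square * N_PIECE_TYPES + piece
```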
---
## 💻 Usage
You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:
```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()
print("✅ KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run the model
    output = model(dummy_input)

# Read the fields of the returned KiyEngineOutput
print("🎉 Success!")
print(f"1. Policy Logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> predicted probabilities over 768 candidate moves
print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to 1 (win)
print(f"3. Last Hidden State (internal representation): {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
```
---
## 📈 Training Progress
The model was trained on **1.5M+ high-quality Lichess games**. The loss converged smoothly from 7.78 to 5.46 over 10 epochs, with MoE routing remaining stable throughout training.
- **Initial Loss:** 7.78
- **Final Loss:** 5.46 (Epoch 10)
- **Optimizer:** AdamW with OneCycleLR
- **Training Time:** ~5 hours on a single Tesla P100 GPU
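
As a hedged sketch, the stated AdamW + OneCycleLR pairing might be set up as below; the learning rate, step count, and stand-in model are illustrative assumptions, not the actual training values:

```python
import torch

# Illustrative AdamW + OneCycleLR setup; hyperparameters and the
# placeholder model below are assumptions, not the values used to
# train KiyEngine V3.
model = torch.nn.Linear(384, 768)  # stand-in for the real network
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=1e-3, total_steps=1000)

for _ in range(3):  # one optimizer step per batch, then one scheduler step
    opt.zero_grad()
    loss = model(torch.randn(8, 384)).pow(2).mean()
    loss.backward()
    opt.step()
    sched.step()
```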
---
## 📂 Repository Structure
```
KiyEngine-V3-Mamba-MoE/
├── model.safetensors # Optimized weights (272MB)
├── config.json # Model configuration
├── configuration_kiyengine.py # Custom config class
├── modeling_kiyengine.py # Core PyTorch implementation
└── README.md # This file
```
---
## 🎯 Performance
| Metric | Value |
|:-------|:------|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M *(calculated from architecture)* |
| Inference Speed | TBD |
| Target ELO | TBD |
---
## 🛠️ Roadmap
- [x] Train V3 Mamba-MoE weights
- [x] Push to Hugging Face Hub
- [ ] Implement native Rust inference via `candle-core`
- [ ] Integrate with UCI protocol for GUI play (Arena, CuteChess)
- [ ] Benchmark against Stockfish and Leela Chess Zero
- [ ] Add ONNX export for deployment
- [ ] Create interactive demo on Hugging Face Spaces
---
## 🔬 Technical Details
### Why Mamba?
Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** allows the model to process entire games efficiently while maintaining long-range dependencies.
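
The linear-time recurrence can be illustrated with a toy scalar state-space scan; real Mamba uses learned, input-dependent (selective) parameters and a hardware-aware parallel scan, so this is only a conceptual sketch:

```python
# Toy scalar state-space recurrence illustrating the O(T) scan at the
# heart of SSMs; real Mamba uses learned, input-dependent (selective)
# parameters and a hardware-aware parallel scan.
def ssm_scan(xs, A=0.9, B=0.5, C=1.0):
    """h_t = A*h_{t-1} + B*x_t ;  y_t = C*h_t  -- one linear pass."""
    h, ys = 0.0, []
    for x in xs:  # each step costs O(1), so a whole game costs O(T)
        h = A * h + B * x
        ys.append(C * h)
    return ys

# An impulse at t=0 decays geometrically: the state "remembers" it
# across arbitrarily long sequences without attention.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
```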
### Why MoE?
Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The **Mixture-of-Experts** architecture allows the model to:
- Specialize experts for different game phases
- Route positions to the most relevant expert
- Maintain parameter efficiency while increasing model capacity
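
A minimal sketch of top-2 gated routing matching the `n_experts=8`, `top_k=2` configuration above; the expert outputs here are placeholder scalars rather than real expert MLPs:

```python
import math

# Toy top-2 gated routing over 8 experts (matching n_experts=8,
# top_k=2); expert outputs are placeholder scalars, not real MLPs.
N_EXPERTS, TOP_K = 8, 2

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top2_route(gate_logits, expert_outputs):
    """Mix the two highest-scoring experts, with their gate
    probabilities renormalized to sum to 1."""
    probs = softmax(gate_logits)
    top = sorted(range(N_EXPERTS), key=lambda i: probs[i], reverse=True)[:TOP_K]
    z = sum(probs[i] for i in top)
    return sum(probs[i] / z * expert_outputs[i] for i in top)

# Experts 0 and 4 dominate the gate, so only they contribute; the
# other six experts cost nothing at inference time.
mixed = top2_route([5, 0, 0, 0, 4, 0, 0, 0], [float(i) for i in range(8)])
```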
---
## 📊 Dataset
- **Source:** Lichess Database
- **Games:** 1.5M+ high-quality games
---
## 🤝 Citation
If you use KiyEngine V3 in your research or projects, please cite:
```bibtex
@misc{kiyengine-v3-2026,
  author       = {Kiy-K},
  title        = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```
---
## 📝 License
This model is released under the **MIT License**. See the [LICENSE](LICENSE) file for details.
---
## 👤 Author
**Kiy-K**
*"Building the next generation of neural chess engines."*
- 🤗 Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- 📧 Contact: khoitruong071510@gmail.com
---
## 🙏 Acknowledgments
- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao
- **Dataset:** Lichess Open Database
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community
---
## ⚠️ Limitations
- Model is currently a **neural network component** and requires integration with a search algorithm (e.g., MCTS, Alpha-Beta) for full chess engine functionality
- Performance may vary across different game phases
- Requires further validation against established benchmarks
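
To illustrate the first limitation, a toy negamax wrapper shows how the value head could drive a shallow search; `evaluate`, `legal_moves`, and `apply` are hypothetical stand-ins for a real board interface and the model's value output, not part of this repository:

```python
# Hypothetical glue between a value head and a shallow negamax search;
# `evaluate`, `legal_moves`, and `apply` are stand-ins for a real
# board interface and the model's value output in [-1, 1].
def negamax(pos, depth, evaluate, legal_moves, apply):
    moves = legal_moves(pos)
    if depth == 0 or not moves:
        return evaluate(pos)  # leaf: ask the value head
    # Negamax: the best score for us is the worst score for the opponent.
    return max(-negamax(apply(pos, m), depth - 1, evaluate, legal_moves, apply)
               for m in moves)

# Toy "game" on integers, standing in for chess positions.
score = negamax(
    0, 2,
    evaluate=lambda p: p / 10,                      # fake value head
    legal_moves=lambda p: [1, 2] if p < 3 else [],  # fake move generator
    apply=lambda p, m: p + m,                       # fake make-move
)
```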
---
<div align="center">
**Star this repo if you find it useful! ⭐**
*Made with ♟️ and 🤖 by Kiy-K*
</div> |