|
|
--- |
|
|
language: en |
|
|
license: mit |
|
|
library_name: transformers |
|
|
tags: |
|
|
- chess |
|
|
- mamba |
|
|
- moe |
|
|
- kiyengine |
|
|
- ssm |
|
|
- mixture-of-experts |
|
|
pipeline_tag: reinforcement-learning |
|
|
--- |
|
|
|
|
|
# KiyEngine V3 (Mamba-MoE)
|
|
|
|
|
> **"Where Linear Recurrence meets Sparse Intuition."** |
|
|
|
|
|
**KiyEngine V3** is a high-performance chess evaluation model utilizing a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play. |
|
|
|
|
|
--- |
|
|
|
|
|
## Highlights
|
|
|
|
|
- **Architecture:** Mamba-SSM core for linear-time sequence modeling |
|
|
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 Gated Routing (see the sketch after this list)
|
|
- **Training Achievement:** Final converged loss of **5.46**
|
|
- **Target Performance:** Designed to bridge the gap between neural intuition and traditional brute-force search |
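
For readers new to sparse MoE layers, the sketch below shows how Top-2 gated routing works in general. It is a minimal, illustrative example and **not** the actual routing code from `modeling_kiyengine.py`; class and variable names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2Router(nn.Module):
    """Illustrative Top-2 gated MoE layer: each token is processed by its 2 best experts."""

    def __init__(self, d_model: int, n_experts: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)           # router: one score per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        scores = self.gate(x)                                # (batch, seq_len, n_experts)
        weights, idx = scores.topk(2, dim=-1)                # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)                 # normalise the two gate values
        out = torch.zeros_like(x)
        for k in range(2):                                   # slot 0 = best expert, slot 1 = runner-up
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# d_ff = d_model * expansion_factor (384 * 2), matching the hyperparameter table below.
router = Top2Router(d_model=384, n_experts=8, d_ff=768)
y = router(torch.randn(1, 64, 384))                          # output shape: (1, 64, 384)
```

Because only 2 of the 8 experts run for any given token, per-token compute stays close to that of a single dense MLP while total capacity grows with the number of experts.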
|
|
|
|
|
--- |
|
|
|
|
|
## Model Architecture
|
|
|
|
|
Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with a **Sparse MoE** layer to specialize in different phases of the game (Opening, Middlegame, Endgame). |
|
|
|
|
|
### Hyperparameters |
|
|
|
|
|
| Parameter | Value | Description | |
|
|
|:----------|:------|:------------| |
|
|
| `d_model` | 384 | Hidden dimension size | |
|
|
| `n_layers` | 4 | Number of Mamba-MoE blocks | |
|
|
| `n_experts` | 8 | Experts per layer (Total: 32) | |
|
|
| `top_k` | 2 | Experts activated per token | |
|
|
| `d_state` | 16 | SSM state dimension | |
|
|
| `d_conv` | 4 | Convolution kernel size | |
|
|
| `expansion_factor` | 2 | MLP expansion ratio | |
|
|
| `vocab_size` | 768 | Input representation (64 squares × 12 piece types) |
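
For orientation, the table above maps onto a `PretrainedConfig` subclass roughly like the sketch below. This is an illustrative approximation only; the authoritative definition (including the exact field names) is `configuration_kiyengine.py` in this repository.

```python
from transformers import PretrainedConfig

class KiyEngineConfigSketch(PretrainedConfig):
    """Illustrative config mirroring the hyperparameter table; not the shipped class."""

    model_type = "kiyengine"  # placeholder value for this sketch

    def __init__(
        self,
        d_model: int = 384,
        n_layers: int = 4,
        n_experts: int = 8,
        top_k: int = 2,
        d_state: int = 16,
        d_conv: int = 4,
        expansion_factor: int = 2,
        vocab_size: int = 768,
        **kwargs,
    ):
        self.d_model = d_model
        self.n_layers = n_layers
        self.n_experts = n_experts
        self.top_k = top_k
        self.d_state = d_state
        self.d_conv = d_conv
        self.expansion_factor = expansion_factor
        self.vocab_size = vocab_size
        super().__init__(**kwargs)
```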
|
|
|
|
|
--- |
|
|
|
|
|
## Usage
|
|
|
|
|
You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library: |
|
|
|
|
|
```python |
|
|
from transformers import AutoConfig, AutoModel |
|
|
import torch |
|
|
|
|
|
# Load the model |
|
|
repo_id = "Kiy-K/KiyEngine-V3" |
|
|
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True) |
|
|
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True) |
|
|
|
|
|
# Set to evaluation mode
model.eval()

print("KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run a single forward pass
    output = model(dummy_input)

# Reading the outputs from KiyEngineOutput
print("Success!")

print(f"1. Policy logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> a score for each of the 768 possible moves

print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to 1 (win)

print(f"3. Last hidden state: {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
|
|
``` |
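
Continuing from the example above, the two heads can be post-processed into a move distribution and a scalar evaluation. Note that the mapping from each of the 768 policy indices to a concrete chess move is defined by the engine's move encoding and is not documented in this card, so the indices are printed as raw integers here.

```python
# Continues the usage example above (reuses `torch` and `output`).
probs = torch.softmax(output.policy_logits, dim=-1)   # (1, 768) probability over candidate moves
top_probs, top_idx = probs.topk(5, dim=-1)            # five highest-scoring policy indices

print("Top-5 policy indices:", top_idx.squeeze(0).tolist())
print("Top-5 probabilities:", [round(p, 4) for p in top_probs.squeeze(0).tolist()])

evaluation = output.value.item()                      # scalar in [-1 (loss), 1 (win)]
print(f"Position evaluation: {evaluation:+.3f}")
```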
|
|
|
|
|
--- |
|
|
|
|
|
## Training Progress
|
|
|
|
|
The model was trained on **1.5M+ high-quality Lichess games**. The loss curve converged smoothly, aided by stable MoE routing.
|
|
|
|
|
- **Initial Loss:** 7.78 |
|
|
- **Final Loss:** 5.46 (Epoch 10) |
|
|
- **Optimizer:** AdamW with OneCycleLR (sketched below)
|
|
- **Training Time:** ~5 hours on a Tesla P100 GPU
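
For reference, here is a minimal sketch of the optimizer/scheduler pairing named above (AdamW stepped by OneCycleLR). The learning rate, weight decay, and step counts are placeholders rather than the settings used to train this checkpoint, and the stand-in module should be replaced by the loaded KiyEngine model.

```python
import torch
import torch.nn as nn

# Stand-in module; replace with the KiyEngine model loaded in the Usage section.
model = nn.Linear(384, 768)

# Placeholder values -- not the actual training settings for this checkpoint.
epochs, steps_per_epoch, max_lr = 10, 100, 1e-3

optimizer = torch.optim.AdamW(model.parameters(), lr=max_lr, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=steps_per_epoch
)

for step in range(epochs * steps_per_epoch):
    optimizer.zero_grad()
    # loss = policy_loss + value_loss   # computed on a batch of encoded positions
    # loss.backward()
    optimizer.step()
    scheduler.step()                    # OneCycleLR advances once per batch
```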
|
|
|
|
|
### Loss Curve |
|
|
*(Consider adding a training curve image here)* |
|
|
|
|
|
--- |
|
|
|
|
|
## Repository Structure
|
|
|
|
|
``` |
|
|
KiyEngine-V3-Mamba-MoE/
├── model.safetensors            # Optimized weights (272 MB)
├── config.json                  # Model configuration
├── configuration_kiyengine.py   # Custom config class
├── modeling_kiyengine.py        # Core PyTorch implementation
└── README.md                    # This file
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Performance
|
|
|
|
|
| Metric | Value | |
|
|
|:-------|:------| |
|
|
| Final Training Loss | 5.46 | |
|
|
| Model Size | 272 MB | |
|
|
| Parameters | 68.06M *(calculated from the architecture; see the snippet below)* |
|
|
| Inference Speed | TBD | |
|
|
| Target ELO | TBD | |
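
The parameter count above can be reproduced directly from the loaded checkpoint (assuming `model` has been loaded as in the Usage section):

```python
n_params = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {n_params / 1e6:.2f}M")
```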
|
|
|
|
|
--- |
|
|
|
|
|
## Roadmap
|
|
|
|
|
- [x] Train V3 Mamba-MoE weights |
|
|
- [x] Push to Hugging Face Hub |
|
|
- [ ] Implement native Rust inference via `candle-core` |
|
|
- [ ] Integrate with UCI protocol for GUI play (Arena, CuteChess) |
|
|
- [ ] Benchmark against Stockfish and Leela Chess Zero |
|
|
- [ ] Add ONNX export for deployment |
|
|
- [ ] Create interactive demo on Hugging Face Spaces |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Details
|
|
|
|
|
### Why Mamba? |
|
|
|
|
|
Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** allows the model to process entire games efficiently while maintaining long-range dependencies. |
|
|
|
|
|
### Why MoE? |
|
|
|
|
|
Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The **Mixture-of-Experts** architecture allows the model to: |
|
|
- Specialize experts for different game phases |
|
|
- Route positions to the most relevant expert |
|
|
- Maintain parameter efficiency while increasing model capacity |
|
|
|
|
|
--- |
|
|
|
|
|
## Dataset
|
|
|
|
|
- **Source:** Lichess Database |
|
|
- **Games:** 1.5M+ high-quality games |
|
|
|
|
|
--- |
|
|
|
|
|
## Citation
|
|
|
|
|
If you use KiyEngine V3 in your research or projects, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{kiyengine-v3-2026, |
|
|
author = {Kiy-K}, |
|
|
title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model}, |
|
|
year = {2026}, |
|
|
publisher = {Hugging Face}, |
|
|
howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}} |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## License
|
|
|
|
|
This model is released under the **MIT License**. See the [LICENSE](LICENSE) file for details. |
|
|
|
|
|
--- |
|
|
|
|
|
## Author
|
|
|
|
|
**Kiy-K** |
|
|
*"Building the next generation of neural chess engines."* |
|
|
|
|
|
- Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- Contact: khoitruong071510@gmail.com
|
|
|
|
|
--- |
|
|
|
|
|
## Acknowledgments
|
|
|
|
|
- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao |
|
|
- **Dataset:** Lichess Open Database |
|
|
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community |
|
|
|
|
|
--- |
|
|
|
|
|
## Limitations
|
|
|
|
|
- Model is currently a **neural network component** and requires integration with a search algorithm (e.g., MCTS, Alpha-Beta) for full chess engine functionality |
|
|
- Performance may vary across different game phases |
|
|
- Requires further validation against established benchmarks |
|
|
|
|
|
--- |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
**Star this repo if you find it useful!**
|
|
|
|
|
*Made by Kiy-K*
|
|
|
|
|
</div> |