---
language: en
license: mit
library_name: transformers
tags:
- chess
- mamba
- moe
- kiyengine
- ssm
- mixture-of-experts
pipeline_tag: reinforcement-learning
---

# ♟️ KiyEngine V3 (Mamba-MoE)

> **"Where Linear Recurrence meets Sparse Intuition."**

**KiyEngine V3** is a high-performance chess evaluation model built on a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding at the inference speed required for elite-level Blitz play.

---

## 🚀 Highlights

- **Architecture:** Mamba-SSM core for linear-time sequence modeling
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 gated routing
- **Training Achievement:** final converged loss of **5.46**
- **Target Performance:** designed to bridge the gap between neural intuition and traditional brute-force search

---

## 🧠 Model Architecture

Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with a **Sparse MoE** layer that specializes in different phases of the game (opening, middlegame, endgame).
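The Top-2 gated routing can be sketched as a minimal PyTorch layer. This is an illustrative reconstruction using the documented hyperparameters (`d_model=384`, 8 experts, top-2); the class names `SimpleExpert` and `Top2MoE` are invented here and are not the repository's actual `modeling_kiyengine.py` implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleExpert(nn.Module):
    """A small feed-forward expert (expansion factor 2, per the hyperparameter table)."""
    def __init__(self, d_model: int, expansion: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model * expansion),
            nn.SiLU(),
            nn.Linear(d_model * expansion, d_model),
        )

    def forward(self, x):
        return self.net(x)

class Top2MoE(nn.Module):
    """Sparse MoE layer: each token is routed to its top-2 experts."""
    def __init__(self, d_model: int = 384, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(SimpleExpert(d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, seq, d_model)
        logits = self.gate(x)                    # (batch, seq, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e          # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = Top2MoE()
y = moe(torch.randn(1, 64, 384))
print(y.shape)  # torch.Size([1, 64, 384])
```

Because only 2 of 8 experts fire per token, per-token compute stays close to a dense MLP of the same width while total capacity is 4× larger.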
### Hyperparameters

| Parameter | Value | Description |
|:----------|:------|:------------|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (squares × pieces) |

---

## 💻 Usage

You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:

```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()
print("✅ KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run the model
    output = model(dummy_input)

# Read the fields of the returned KiyEngineOutput
print("🎉 Success!")
print(f"1. Policy logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> probabilities over 768 candidate moves
print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to 1 (win)
print(f"3. Last hidden state: {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
```

---

## 📈 Training Progress

The model was trained on **1.5M+ high-quality Lichess games**. The loss curve showed stable convergence, helped by the stability of the MoE routing.

- **Initial Loss:** 7.78
- **Final Loss:** 5.46 (epoch 10)
- **Optimizer:** AdamW with OneCycleLR
- **Training Time:** ~5 hours on a Tesla P100
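As a rough illustration of the optimizer setup above (AdamW with a OneCycleLR schedule), here is a minimal training-loop sketch. The learning rate, step counts, and the stand-in model are placeholders, not the actual training configuration.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(384, 768)       # stand-in for the real network
steps_per_epoch, epochs = 100, 10       # placeholder values

optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = OneCycleLR(optimizer, max_lr=1e-3,
                       steps_per_epoch=steps_per_epoch, epochs=epochs)

for step in range(steps_per_epoch * epochs):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 384)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR steps once per batch, not per epoch

print(f"final lr: {scheduler.get_last_lr()[0]:.2e}")
```

OneCycleLR warms the learning rate up to `max_lr` and then anneals it to near zero over the full run, which pairs well with the roughly 10-epoch schedule reported above.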
### Loss Curve

*(Consider adding a training curve image here)*

---

## 📂 Repository Structure

```
KiyEngine-V3-Mamba-MoE/
├── model.safetensors           # Optimized weights (272 MB)
├── config.json                 # Model configuration
├── configuration_kiyengine.py  # Custom config class
├── modeling_kiyengine.py       # Core PyTorch implementation
└── README.md                   # This file
```

---

## 🎯 Performance

| Metric | Value |
|:-------|:------|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M *(calculated from architecture)* |
| Inference Speed | TBD |
| Target ELO | TBD |

---

## 🛠️ Roadmap

- [x] Train V3 Mamba-MoE weights
- [x] Push to Hugging Face Hub
- [ ] Implement native Rust inference via `candle-core`
- [ ] Integrate with the UCI protocol for GUI play (Arena, CuteChess)
- [ ] Benchmark against Stockfish and Leela Chess Zero
- [ ] Add ONNX export for deployment
- [ ] Create an interactive demo on Hugging Face Spaces

---

## 🔬 Technical Details

### Why Mamba?

Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** lets the model process entire games efficiently while still capturing long-range dependencies.

### Why MoE?

Chess has distinct phases (opening, middlegame, endgame) that call for different strategic thinking. The **Mixture-of-Experts** architecture allows the model to:

- Specialize experts for different game phases
- Route positions to the most relevant experts
- Increase model capacity while keeping per-token compute low

---

## 📊 Dataset

- **Source:** Lichess Database
- **Games:** 1.5M+ high-quality games

---

## 🤝 Citation

If you use KiyEngine V3 in your research or projects, please cite:

```bibtex
@misc{kiyengine-v3-2026,
  author = {Kiy-K},
  title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```

---

## 📝 License

This model is released under the **MIT License**.
See the [LICENSE](LICENSE) file for details.

---

## 👤 Author

**Kiy-K**
*"Building the next generation of neural chess engines."*

- 🤗 Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- 📧 Contact: khoitruong071510@gmail.com

---

## 🙏 Acknowledgments

- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao
- **Dataset:** Lichess Open Database
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community

---

## ⚠️ Limitations

- The model is a **neural network component** and requires integration with a search algorithm (e.g., MCTS, alpha-beta) for full chess engine functionality
- Performance may vary across different game phases
- Requires further validation against established benchmarks

---
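To make the first limitation concrete: the value head is only an evaluation function, and a full engine wraps it in a search loop. Below is a minimal, game-agnostic negamax alpha-beta sketch with a pluggable evaluator; the toy `Position` class and the dummy evaluator are illustrative assumptions standing in for a real board type and the model's value head, not KiyEngine code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Position:
    """Toy stand-in for a chess position; a real engine would use a board type."""
    score: float = 0.0                  # evaluation from the side to move's view
    children: List["Position"] = field(default_factory=list)

    def legal_moves(self):
        return self.children

def alphabeta(pos, depth, alpha, beta, evaluate):
    """Plain negamax alpha-beta; `evaluate` plays the role of the value head."""
    moves = pos.legal_moves()
    if depth == 0 or not moves:
        return evaluate(pos)
    best = -float("inf")
    for child in moves:
        best = max(best, -alphabeta(child, depth - 1, -beta, -alpha, evaluate))
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # beta cutoff: opponent will avoid this line
    return best

# Dummy evaluator in [-1, 1], mirroring the range of the model's value head.
evaluate = lambda pos: pos.score

root = Position(children=[Position(score=-0.3), Position(score=0.5)])
result = alphabeta(root, 1, -float("inf"), float("inf"), evaluate)
print(result)  # 0.3 — the move leading to the opponent's -0.3 position
```

In a real integration, `evaluate` would encode the position into tokens, call the model, and return `output.value`; the search supplies the tactical lookahead that the network alone lacks.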