---
language: en
license: mit
library_name: transformers
tags:
- chess
- mamba
- moe
- kiyengine
- ssm
- mixture-of-experts
pipeline_tag: reinforcement-learning
---

# ♟️ KiyEngine V3 (Mamba-MoE)

> **"Where Linear Recurrence meets Sparse Intuition."**

**KiyEngine V3** is a high-performance chess evaluation model utilizing a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.

---

## 🚀 Highlights

- **Architecture:** Mamba-SSM core for linear-time sequence modeling
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 Gated Routing
- **Training:** converged to a final loss of **5.46**
- **Target Performance:** Designed to bridge the gap between neural intuition and traditional brute-force search

---

## 🧠 Model Architecture

Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with a **Sparse MoE** layer to specialize in different phases of the game (Opening, Middlegame, Endgame).

### Hyperparameters

| Parameter | Value | Description |
|:----------|:------|:------------|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (Total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (Squares × Pieces) |
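
The `vocab_size` of 768 follows directly from 64 squares × 12 piece types. The exact encoding used by KiyEngine V3 is not documented here; the sketch below only illustrates how a (square, piece) pair could map into that 768-token space.

```python
# Hypothetical sketch of the 768-token vocabulary (64 squares x 12 piece types).
# This is NOT the engine's actual encoding, just an illustration of the arithmetic.

PIECES = ["P", "N", "B", "R", "Q", "K", "p", "n", "b", "r", "q", "k"]  # 12 piece types

def token_id(square: int, piece: str) -> int:
    """Map a (square, piece) pair to a token id in [0, 768)."""
    assert 0 <= square < 64
    return square * len(PIECES) + PIECES.index(piece)

print(token_id(0, "P"))   # white pawn on the first square  -> 0
print(token_id(63, "k"))  # black king on the last square   -> 767
```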

---

## 💻 Usage

You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:

```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()

print("✅ KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run a forward pass
    output = model(dummy_input)

    # Read the standard fields from KiyEngineOutput
    print("🎉 Success!")
    print(f"1. Policy Logits (move prediction): {output.policy_logits.shape}")
    # Expected: torch.Size([1, 768]) -> scores over 768 possible moves

    print(f"2. Value (position evaluation):     {output.value.shape}")
    # Expected: torch.Size([1, 1])   -> score from -1 (loss) to 1 (win)

    print(f"3. Last Hidden State (internal):    {output.last_hidden_state.shape}")
    # Expected: torch.Size([1, 64, 384])
```

---

## 📈 Training Progress

The model was trained on **1.5M+ high-quality Lichess games**. The loss decreased steadily from 7.78 to 5.46 over 10 epochs, aided by stable MoE routing.

- **Initial Loss:** 7.78
- **Final Loss:** 5.46 (Epoch 10)
- **Optimizer:** AdamW with OneCycleLR
- **Training Time:** ~5 hours on a Tesla P100
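
The optimizer setup named above (AdamW with OneCycleLR) can be sketched as follows. The learning rate, weight decay, and steps per epoch are placeholders, not the values actually used to train KiyEngine V3; only the optimizer/scheduler pairing and the 10-epoch count come from this card.

```python
import torch

# Sketch of AdamW + OneCycleLR, as named in the training details.
# lr, weight_decay, and steps_per_epoch are illustrative placeholders.
model = torch.nn.Linear(384, 384)  # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=10, steps_per_epoch=1000
)

# Typical inner-loop step order: optimizer first, then the scheduler.
optimizer.step()
scheduler.step()
```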

### Loss Curve
*(Consider adding a training curve image here)*

---

## 📂 Repository Structure

```
KiyEngine-V3-Mamba-MoE/
├── model.safetensors              # Optimized weights (272MB)
├── config.json                    # Model configuration
├── configuration_kiyengine.py     # Custom config class
├── modeling_kiyengine.py          # Core PyTorch implementation
└── README.md                      # This file
```

---

## 🎯 Performance

| Metric | Value |
|:-------|:------|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M  *(calculated from architecture)* |
| Inference Speed | TBD |
| Target ELO | TBD |

---

## 🛠️ Roadmap

- [x] Train V3 Mamba-MoE weights
- [x] Push to Hugging Face Hub
- [ ] Implement native Rust inference via `candle-core`
- [ ] Integrate with UCI protocol for GUI play (Arena, CuteChess)
- [ ] Benchmark against Stockfish and Leela Chess Zero
- [ ] Add ONNX export for deployment
- [ ] Create interactive demo on Hugging Face Spaces

---

## 🔬 Technical Details

### Why Mamba?

Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** allows the model to process entire games efficiently while maintaining long-range dependencies.

### Why MoE?

Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The **Mixture-of-Experts** architecture allows the model to:
- Specialize experts for different game phases
- Route positions to the most relevant expert
- Maintain parameter efficiency while increasing model capacity
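
The routing idea above can be sketched as a minimal top-2 gated MoE layer. Dimensions mirror the hyperparameter table (`d_model=384`, 8 experts, `top_k=2`, expansion 2), but this is an illustration of the technique, not the KiyEngine V3 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Minimal sketch of top-2 gated expert routing (not the engine's code)."""

    def __init__(self, d_model=384, n_experts=8, top_k=2, expansion=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, n_experts)          # router
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_model * expansion),
                nn.SiLU(),
                nn.Linear(d_model * expansion, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (batch, seq, d_model)
        logits = self.gate(x)                              # (batch, seq, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)     # pick the 2 best experts
        weights = F.softmax(weights, dim=-1)               # renormalize over the 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e                    # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = Top2MoE()
y = moe(torch.randn(1, 64, 384))
print(y.shape)  # torch.Size([1, 64, 384])
```

Only 2 of the 8 experts run per token, so capacity grows with expert count while per-token compute stays roughly constant.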

---

## 📊 Dataset

- **Source:** Lichess Database
- **Games:** 1.5M+ high-quality games

---

## 🤝 Citation

If you use KiyEngine V3 in your research or projects, please cite:

```bibtex
@misc{kiyengine-v3-2026,
  author = {Kiy-K},
  title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```

---

## 📝 License

This model is released under the **MIT License**. See the [LICENSE](LICENSE) file for details.

---

## 👤 Author

**Kiy-K**  
*"Building the next generation of neural chess engines."*

- 🤗 Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- 📧 Contact: khoitruong071510@gmail.com

---

## 🙏 Acknowledgments

- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao
- **Dataset:** Lichess Open Database
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community

---

## ⚠️ Limitations

- Model is currently a **neural network component** and requires integration with a search algorithm (e.g., MCTS, Alpha-Beta) for full chess engine functionality
- Performance may vary across different game phases
- Requires further validation against established benchmarks
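
As a hypothetical sketch of the integration point mentioned above, a search layer would consume the policy and value heads. The helper below assumes a `model` loaded as in the Usage section and a `legal_ids` list (not provided by this repo) mapping legal moves to indices in [0, 768); a real engine would wrap this in MCTS or alpha-beta search rather than picking greedily.

```python
import torch

def pick_move(model, tokens, legal_ids):
    """Greedy 0-ply move choice from the policy head (illustrative only)."""
    with torch.no_grad():
        out = model(tokens)                        # KiyEngineOutput
    logits = out.policy_logits.squeeze(0)          # (768,)
    mask = torch.full_like(logits, float("-inf"))
    mask[legal_ids] = 0.0                          # keep only legal moves
    probs = torch.softmax(logits + mask, dim=-1)
    return int(probs.argmax())                     # index of the chosen move
```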

---

<div align="center">

**Star this repo if you find it useful! ⭐**

*Made with ♟️ and 🤖 by Kiy-K*

</div>