---
language: en
license: mit
library_name: transformers
tags:
- chess
- mamba
- moe
- kiyengine
- ssm
- mixture-of-experts
pipeline_tag: reinforcement-learning
---

# ♟️ KiyEngine V3 (Mamba-MoE)

> **"Where Linear Recurrence meets Sparse Intuition."**

**KiyEngine V3** is a high-performance chess evaluation model built on a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding at the inference speed required for elite-level Blitz play.

---

## 🚀 Highlights

- **Architecture:** Mamba-SSM core for linear-time sequence modeling
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 gated routing
- **Training Achievement:** final converged loss of **5.46**
- **Target Performance:** designed to bridge the gap between neural intuition and traditional brute-force search

---

## 🧠 Model Architecture

Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with a **Sparse MoE** layer that specializes in different phases of the game (opening, middlegame, endgame).
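The Top-2 gated routing can be sketched as a minimal PyTorch layer. This is an illustrative reconstruction using the documented hyperparameters (`d_model=384`, 8 experts, top-2); the class names `SimpleExpert` and `Top2MoE` are invented here and are not the repository's actual `modeling_kiyengine.py` implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleExpert(nn.Module):
    """A small feed-forward expert (expansion factor 2, per the hyperparameter table)."""
    def __init__(self, d_model: int, expansion: int = 2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_model * expansion),
            nn.SiLU(),
            nn.Linear(d_model * expansion, d_model),
        )

    def forward(self, x):
        return self.net(x)

class Top2MoE(nn.Module):
    """Sparse MoE layer: each token is routed to its top-2 experts."""
    def __init__(self, d_model: int = 384, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(SimpleExpert(d_model) for _ in range(n_experts))
        self.gate = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                        # x: (batch, seq, d_model)
        logits = self.gate(x)                    # (batch, seq, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[..., k] == e          # tokens routed to expert e at rank k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

moe = Top2MoE()
y = moe(torch.randn(1, 64, 384))
print(y.shape)  # torch.Size([1, 64, 384])
```

Because only 2 of 8 experts fire per token, per-token compute stays close to a dense MLP of the same width while total capacity is 4× larger.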
### Hyperparameters

| Parameter | Value | Description |
|:----------|:------|:------------|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (squares × pieces) |

---

## 💻 Usage

You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:

```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()
print("✅ KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run the model
    output = model(dummy_input)

# Read the fields of the returned KiyEngineOutput
print("🎉 Success!")
print(f"1. Policy logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> probabilities over 768 candidate moves
print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to 1 (win)
print(f"3. Last hidden state: {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
```

---

## 📈 Training Progress

The model was trained on **1.5M+ high-quality Lichess games**. The loss curve showed stable convergence, helped by the stability of the MoE routing.

- **Initial Loss:** 7.78
- **Final Loss:** 5.46 (epoch 10)
- **Optimizer:** AdamW with OneCycleLR
- **Training Time:** ~5 hours on a Tesla P100
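As a rough illustration of the optimizer setup above (AdamW with a OneCycleLR schedule), here is a minimal training-loop sketch. The learning rate, step counts, and the stand-in model are placeholders, not the actual training configuration.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(384, 768)       # stand-in for the real network
steps_per_epoch, epochs = 100, 10       # placeholder values

optimizer = AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
scheduler = OneCycleLR(optimizer, max_lr=1e-3,
                       steps_per_epoch=steps_per_epoch, epochs=epochs)

for step in range(steps_per_epoch * epochs):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 384)).pow(2).mean()  # dummy loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR steps once per batch, not per epoch

print(f"final lr: {scheduler.get_last_lr()[0]:.2e}")
```

OneCycleLR warms the learning rate up to `max_lr` and then anneals it to near zero over the full run, which pairs well with the roughly 10-epoch schedule reported above.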
### Loss Curve

*(Consider adding a training curve image here)*

---

## 📂 Repository Structure

```
KiyEngine-V3-Mamba-MoE/
├── model.safetensors           # Optimized weights (272 MB)
├── config.json                 # Model configuration
├── configuration_kiyengine.py  # Custom config class
├── modeling_kiyengine.py       # Core PyTorch implementation
└── README.md                   # This file
```

---

## 🎯 Performance

| Metric | Value |
|:-------|:------|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M *(calculated from architecture)* |
| Inference Speed | TBD |
| Target ELO | TBD |

---

## 🛠️ Roadmap

- [x] Train V3 Mamba-MoE weights
- [x] Push to Hugging Face Hub
- [ ] Implement native Rust inference via `candle-core`
- [ ] Integrate with the UCI protocol for GUI play (Arena, CuteChess)
- [ ] Benchmark against Stockfish and Leela Chess Zero
- [ ] Add ONNX export for deployment
- [ ] Create an interactive demo on Hugging Face Spaces

---

## 🔬 Technical Details

### Why Mamba?

Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** lets the model process entire games efficiently while still capturing long-range dependencies.

### Why MoE?

Chess has distinct phases (opening, middlegame, endgame) that call for different strategic thinking. The **Mixture-of-Experts** architecture allows the model to:

- Specialize experts for different game phases
- Route positions to the most relevant experts
- Increase model capacity while keeping per-token compute low

---

## 📊 Dataset

- **Source:** Lichess Database
- **Games:** 1.5M+ high-quality games

---

## 🤝 Citation

If you use KiyEngine V3 in your research or projects, please cite:

```bibtex
@misc{kiyengine-v3-2026,
  author = {Kiy-K},
  title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```

---

## 📝 License

This model is released under the **MIT License**.
See the [LICENSE](LICENSE) file for details.

---

## 👤 Author

**Kiy-K**
*"Building the next generation of neural chess engines."*

- 🤗 Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- 📧 Contact: khoitruong071510@gmail.com

---

## 🙏 Acknowledgments

- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao
- **Dataset:** Lichess Open Database
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community

---

## ⚠️ Limitations

- The model is a **neural network component** and requires integration with a search algorithm (e.g., MCTS, alpha-beta) for full chess engine functionality
- Performance may vary across different game phases
- Requires further validation against established benchmarks

---
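To make the first limitation concrete: the value head is only an evaluation function, and a full engine wraps it in a search loop. Below is a minimal, game-agnostic negamax alpha-beta sketch with a pluggable evaluator; the toy `Position` class and the dummy evaluator are illustrative assumptions standing in for a real board type and the model's value head, not KiyEngine code.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Position:
    """Toy stand-in for a chess position; a real engine would use a board type."""
    score: float = 0.0                  # evaluation from the side to move's view
    children: List["Position"] = field(default_factory=list)

    def legal_moves(self):
        return self.children

def alphabeta(pos, depth, alpha, beta, evaluate):
    """Plain negamax alpha-beta; `evaluate` plays the role of the value head."""
    moves = pos.legal_moves()
    if depth == 0 or not moves:
        return evaluate(pos)
    best = -float("inf")
    for child in moves:
        best = max(best, -alphabeta(child, depth - 1, -beta, -alpha, evaluate))
        alpha = max(alpha, best)
        if alpha >= beta:
            break  # beta cutoff: opponent will avoid this line
    return best

# Dummy evaluator in [-1, 1], mirroring the range of the model's value head.
evaluate = lambda pos: pos.score

root = Position(children=[Position(score=-0.3), Position(score=0.5)])
result = alphabeta(root, 1, -float("inf"), float("inf"), evaluate)
print(result)  # 0.3 — the move leading to the opponent's -0.3 position
```

In a real integration, `evaluate` would encode the position into tokens, call the model, and return `output.value`; the search supplies the tactical lookahead that the network alone lacks.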