---
language: en
license: mit
library_name: transformers
tags:
- chess
- mamba
- moe
- kiyengine
- ssm
- mixture-of-experts
pipeline_tag: reinforcement-learning
---
# β™ŸοΈ KiyEngine V3 (Mamba-MoE)
> **"Where Linear Recurrence meets Sparse Intuition."**
**KiyEngine V3** is a high-performance chess evaluation model utilizing a hybrid **Mamba State-Space Model (SSM)** and **Sparse Mixture-of-Experts (MoE)** architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.
---
## πŸš€ Highlights
- **Architecture:** Mamba-SSM core for linear-time sequence modeling
- **MoE Strategy:** 32 total experts (8 per layer) with Top-2 Gated Routing
- **Training:** converged to a final loss of **5.46**
- **Target Performance:** Designed to bridge the gap between neural intuition and traditional brute-force search
---
## 🧠 Model Architecture
Unlike traditional Transformers, KiyEngine V3 uses **Mamba** blocks to handle long-range game dependencies efficiently, coupled with **Sparse MoE** layers whose experts can specialize in different phases of the game (Opening, Middlegame, Endgame).
### Hyperparameters
| Parameter | Value | Description |
|:----------|:------|:------------|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (Total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (Squares Γ— Pieces) |
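Since `vocab_size` is 768 (64 squares Γ— 12 piece types, 6 per color), a natural reading is that each (piece, square) pair maps to one token id. The sketch below shows one plausible layout; the actual mapping is defined in `modeling_kiyengine.py` and may differ:

```python
# A minimal sketch of one plausible square-piece tokenization, assuming
# vocab_size = 768 = 64 squares Γ— 12 piece types. The exact index layout
# used by KiyEngine V3 may differ; see modeling_kiyengine.py.
PIECE_TYPES = ["P", "N", "B", "R", "Q", "K",   # white
               "p", "n", "b", "r", "q", "k"]   # black

def token_id(piece: str, square: int) -> int:
    """Map a (piece, square) pair to a token id in [0, 768)."""
    return PIECE_TYPES.index(piece) * 64 + square

# Example: a white knight on g1 (square index 6, with a1 = 0)
print(token_id("N", 6))  # -> 70
```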
---
## πŸ’» Usage
You can load and test the "brain" of KiyEngine V3 directly via the `transformers` library:
```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()
print("βœ… KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run the forward pass
    output = model(dummy_input)

# Reading the fields of KiyEngineOutput
print("πŸŽ‰ Success!")
print(f"1. Policy logits (move prediction): {output.policy_logits.shape}")
# Expected: torch.Size([1, 768]) -> scores over 768 possible move indices
print(f"2. Value (position evaluation): {output.value.shape}")
# Expected: torch.Size([1, 1]) -> score from -1 (loss) to +1 (win)
print(f"3. Last hidden state: {output.last_hidden_state.shape}")
# Expected: torch.Size([1, 64, 384])
```
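The policy head returns raw logits. To turn them into a move distribution, apply a softmax; a real engine would first mask out indices that do not correspond to legal moves. A minimal follow-up, reusing `output` from the snippet above:

```python
import torch

# Convert raw policy logits into a probability distribution over move indices.
# In practice, mask illegal move indices before the softmax.
probs = torch.softmax(output.policy_logits, dim=-1)   # shape: [1, 768]
top_probs, top_indices = probs.topk(5, dim=-1)

for p, idx in zip(top_probs[0], top_indices[0]):
    print(f"move index {idx.item():3d} -> p = {p.item():.4f}")

# The scalar value head summarizes the position for the side to move.
print(f"position value: {output.value.item():+.3f}  (-1 = loss, +1 = win)")
```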
---
## πŸ“ˆ Training Progress
The model was trained on **1.5M+ high-quality Lichess games**. The loss curve converged smoothly, which we attribute to stable MoE routing.
- **Initial Loss:** 7.78
- **Final Loss:** 5.46 (Epoch 10)
- **Optimizer:** AdamW with OneCycleLR
- **Training Time:** ~5 hours on a single Tesla P100
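For reference, a schematic of that optimizer/scheduler setup is below. The exact learning rate, weight decay, and step count were not recorded here, so those numbers are placeholders:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR

# Illustrative only: the card states AdamW + OneCycleLR, but the learning
# rate, weight decay, and step count below are guesses, not the real values.
model = torch.nn.Linear(384, 768)            # stand-in for the real network
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = OneCycleLR(optimizer, max_lr=3e-4, total_steps=1_000)

for step in range(1_000):                    # stand-in training loop
    x, target = torch.randn(32, 384), torch.randint(0, 768, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                         # OneCycleLR steps once per batch
```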
---
## πŸ“‚ Repository Structure
```
KiyEngine-V3-Mamba-MoE/
β”œβ”€β”€ model.safetensors # Optimized weights (272MB)
β”œβ”€β”€ config.json # Model configuration
β”œβ”€β”€ configuration_kiyengine.py # Custom config class
β”œβ”€β”€ modeling_kiyengine.py # Core PyTorch implementation
└── README.md # This file
```
---
## 🎯 Performance
| Metric | Value |
|:-------|:------|
| Final Training Loss | 5.46 |
| Model Size | 272 MB |
| Parameters | 68.06M *(calculated from architecture)* |
| Inference Speed | TBD |
| Target ELO | TBD |
---
## πŸ› οΈ Roadmap
- [x] Train V3 Mamba-MoE weights
- [x] Push to Hugging Face Hub
- [ ] Implement native Rust inference via `candle-core`
- [ ] Integrate with UCI protocol for GUI play (Arena, CuteChess)
- [ ] Benchmark against Stockfish and Leela Chess Zero
- [ ] Add ONNX export for deployment
- [ ] Create interactive demo on Hugging Face Spaces
---
## πŸ”¬ Technical Details
### Why Mamba?
Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. **Mamba's linear-time recurrence** allows the model to process entire games efficiently while maintaining long-range dependencies.
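Concretely, a (non-selective) SSM computes `h_t = A h_{t-1} + B x_t` and `y_t = C h_t`, one state update per token. The toy scan below illustrates the O(L) cost; real Mamba additionally makes the parameters input-dependent ("selective") and uses a hardware-efficient parallel scan:

```python
import torch

def ssm_scan(x, A, B, C):
    """Toy linear SSM: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.

    x: [seq_len, d_in]. One fixed-cost update per step -> O(seq_len) total,
    versus O(seq_len^2) pairwise interactions in self-attention.
    """
    h = torch.zeros(A.shape[0])
    ys = []
    for x_t in x:                 # a single linear pass over the sequence
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return torch.stack(ys)

seq_len, d_in, d_state = 64, 8, 16
y = ssm_scan(torch.randn(seq_len, d_in),
             torch.randn(d_state, d_state) * 0.1,   # A (small, for stability)
             torch.randn(d_state, d_in),            # B
             torch.randn(4, d_state))               # C
print(y.shape)  # torch.Size([64, 4])
```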
### Why MoE?
Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The **Mixture-of-Experts** architecture allows the model to:
- Specialize experts for different game phases
- Route positions to the most relevant expert
- Maintain parameter efficiency while increasing model capacity (see the routing sketch below)
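A minimal sketch of top-2 gated routing with the hyperparameters above (`d_model` 384, 8 experts, expansion 2). This is illustrative only, not the code in `modeling_kiyengine.py`; for clarity it applies every expert densely and masks afterwards, whereas a real implementation dispatches each token only to its selected experts:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Minimal top-2 gated MoE layer (illustrative sketch)."""

    def __init__(self, d_model=384, n_experts=8, expansion=2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_model * expansion),
                          nn.GELU(),
                          nn.Linear(d_model * expansion, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                      # x: [batch, seq, d_model]
        scores = self.gate(x)                  # [batch, seq, n_experts]
        top_w, top_idx = scores.topk(2, dim=-1)
        top_w = F.softmax(top_w, dim=-1)       # renormalize over the chosen 2
        out = torch.zeros_like(x)
        # Dense compute for clarity; production code routes tokens instead.
        for k in range(2):
            for e, expert in enumerate(self.experts):
                mask = (top_idx[..., k] == e).unsqueeze(-1)
                out = out + mask * top_w[..., k:k+1] * expert(x)
        return out

moe = Top2MoE()
print(moe(torch.randn(1, 64, 384)).shape)  # torch.Size([1, 64, 384])
```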
---
## πŸ“Š Dataset
- **Source:** Lichess Database
- **Games:** 1.5M+ high-quality games
---
## 🀝 Citation
If you use KiyEngine V3 in your research or projects, please cite:
```bibtex
@misc{kiyengine-v3-2026,
  author       = {Kiy-K},
  title        = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```
---
## πŸ“ License
This model is released under the **MIT License**. See the [LICENSE](LICENSE) file for details.
---
## πŸ‘€ Author
**Kiy-K**
*"Building the next generation of neural chess engines."*
- πŸ€— Hugging Face: [@Kiy-K](https://huggingface.co/Kiy-K)
- πŸ“§ Contact: khoitruong071510@gmail.com
---
## πŸ™ Acknowledgments
- **Mamba:** Based on the [Mamba architecture](https://arxiv.org/abs/2312.00752) by Gu & Dao
- **Dataset:** Lichess Open Database
- **Inspiration:** Stockfish, Leela Chess Zero, and the broader chess AI community
---
## ⚠️ Limitations
- The model is currently a **neural network component** and requires integration with a search algorithm (e.g., MCTS, alpha-beta) for full chess engine functionality (see the sketch after this list)
- Performance may vary across different game phases
- Requires further validation against established benchmarks
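As an illustration of the first point, a one-ply greedy search over the value head could look like the sketch below. `encode_position` is hypothetical, since the board-to-token encoding lives in `modeling_kiyengine.py`; `python-chess` supplies the move generation:

```python
import chess   # pip install python-chess
import torch

def encode_position(board: chess.Board) -> torch.Tensor:
    """Hypothetical: map a board to a [1, seq_len] tensor of token ids.
    The real encoding is defined in modeling_kiyengine.py."""
    raise NotImplementedError

def pick_move(model, board: chess.Board) -> chess.Move:
    """One-ply greedy search: play the move whose resulting position
    the value head scores worst for the opponent (best for us)."""
    best_move, best_score = None, -float("inf")
    for move in board.legal_moves:
        board.push(move)
        with torch.no_grad():
            # Value is from the side to move's perspective, so negate it.
            score = -model(encode_position(board)).value.item()
        board.pop()
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```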
---
<div align="center">
**Star this repo if you find it useful! ⭐**
*Made with β™ŸοΈ and πŸ€– by Kiy-K*
</div>