---
language: en
license: mit
library_name: transformers
tags:
  - chess
  - mamba
  - moe
  - kiyengine
  - ssm
  - mixture-of-experts
pipeline_tag: reinforcement-learning
---

♟️ KiyEngine V3 (Mamba-MoE)

"Where Linear Recurrence meets Sparse Intuition."

KiyEngine V3 is a high-performance chess evaluation model utilizing a hybrid Mamba State-Space Model (SSM) and Sparse Mixture-of-Experts (MoE) architecture. It is designed to provide deep positional understanding with the inference speed required for elite-level Blitz play.


🚀 Highlights

  • Architecture: Mamba-SSM core for linear-time sequence modeling
  • MoE Strategy: 32 total experts (8 per layer) with Top-2 Gated Routing
  • Training: converged to a final loss of 5.46 (from an initial 7.78)
  • Target Performance: Designed to bridge the gap between neural intuition and traditional brute-force search

🧠 Model Architecture

Unlike traditional Transformers, KiyEngine V3 uses Mamba blocks to handle long-range game dependencies efficiently, coupled with a Sparse MoE layer to specialize in different phases of the game (Opening, Middlegame, Endgame).

Hyperparameters

| Parameter | Value | Description |
|---|---|---|
| `d_model` | 384 | Hidden dimension size |
| `n_layers` | 4 | Number of Mamba-MoE blocks |
| `n_experts` | 8 | Experts per layer (total: 32) |
| `top_k` | 2 | Experts activated per token |
| `d_state` | 16 | SSM state dimension |
| `d_conv` | 4 | Convolution kernel size |
| `expansion_factor` | 2 | MLP expansion ratio |
| `vocab_size` | 768 | Input representation (64 squares × 12 pieces) |
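For reference, the table above can be mirrored in a small config object. This is a hypothetical stand-in for illustration only; the actual config class ships as `configuration_kiyengine.py` in this repo.

```python
from dataclasses import dataclass

# Hypothetical mirror of the hyperparameter table; the real config
# class is defined in configuration_kiyengine.py on the Hub.
@dataclass
class KiyEngineV3Config:
    d_model: int = 384
    n_layers: int = 4
    n_experts: int = 8        # per layer (4 layers x 8 = 32 total)
    top_k: int = 2
    d_state: int = 16
    d_conv: int = 4
    expansion_factor: int = 2
    vocab_size: int = 768     # 64 squares x 12 piece types

cfg = KiyEngineV3Config()
print(cfg.n_layers * cfg.n_experts)  # → 32 experts in total
```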

💻 Usage

You can load and test the "brain" of KiyEngine V3 directly via the transformers library:

```python
from transformers import AutoConfig, AutoModel
import torch

# Load the model
repo_id = "Kiy-K/KiyEngine-V3"
config = AutoConfig.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

# Set to evaluation mode
model.eval()

print("✅ KiyEngine V3 ready for inference.")

# Create a dummy input (batch=1, seq_len=64)
dummy_input = torch.randint(0, 768, (1, 64))

with torch.no_grad():
    # Run the model
    output = model(dummy_input)

    # Read the standard fields from KiyEngineOutput
    print("🎉 Success!")
    print(f"1. Policy logits (move prediction): {output.policy_logits.shape}")
    # Expected: torch.Size([1, 768]) -> scores over 768 possible moves

    print(f"2. Value (position evaluation):     {output.value.shape}")
    # Expected: torch.Size([1, 1])   -> score from -1 (loss) to 1 (win)

    print(f"3. Last hidden state (features):    {output.last_hidden_state.shape}")
    # Expected: torch.Size([1, 64, 384])
```
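The `vocab_size` of 768 suggests a square-times-piece encoding (64 squares × 12 piece types). The exact scheme is defined by the repo's training pipeline, so the mapping below is only an illustrative assumption of how such a vocabulary could be filled:

```python
# Hypothetical square-x-piece tokenization consistent with vocab_size = 768.
# This is NOT the repo's actual encoding, only an illustration.
PIECES = ["P", "N", "B", "R", "Q", "K", "p", "n", "b", "r", "q", "k"]

def token_id(square: int, piece: str) -> int:
    """Map (square 0-63, piece letter) to a token in [0, 768)."""
    return square * len(PIECES) + PIECES.index(piece)

# White king on e1 (square index 4):
print(token_id(4, "K"))  # → 4 * 12 + 5 = 53
```

Under this scheme the highest token is `token_id(63, "k") == 767`, exactly filling the 768-entry vocabulary.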

📈 Training Progress

The model was trained on 1.5M+ high-quality Lichess games. The loss curve converged smoothly, which we attribute to stable MoE routing.

  • Initial Loss: 7.78
  • Final Loss: 5.46 (Epoch 10)
  • Optimizer: AdamW with OneCycleLR
  • Training Time: ~5 hours on Tesla P100.
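A minimal sketch of the recipe above (AdamW with OneCycleLR). The learning rate, model, and step counts here are illustrative placeholders, not the values used in training:

```python
import torch

# Stand-in network; the real model is the Mamba-MoE stack.
model = torch.nn.Linear(384, 768)

# Reported recipe: AdamW optimizer with a OneCycleLR schedule.
# lr / epochs / steps_per_epoch below are illustrative only.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=10, steps_per_epoch=100
)

for _ in range(100):  # one illustrative epoch
    optimizer.zero_grad()
    loss = model(torch.randn(8, 384)).pow(2).mean()  # dummy objective
    loss.backward()
    optimizer.step()
    scheduler.step()  # OneCycleLR steps per batch, not per epoch
```

Note that OneCycleLR is stepped once per batch, so `epochs * steps_per_epoch` must match the total number of optimizer steps.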

Loss Curve

(Consider adding a training curve image here)


📂 Repository Structure

```
KiyEngine-V3-Mamba-MoE/
├── model.safetensors              # Optimized weights (272MB)
├── config.json                    # Model configuration
├── configuration_kiyengine.py     # Custom config class
├── modeling_kiyengine.py          # Core PyTorch implementation
└── README.md                      # This file
```

🎯 Performance

| Metric | Value |
|---|---|
| Final training loss | 5.46 |
| Model size | 272 MB |
| Parameters | 68.06M (calculated from architecture) |
| Inference speed | TBD |
| Target ELO | TBD |

🛠️ Roadmap

  • Train V3 Mamba-MoE weights
  • Push to Hugging Face Hub
  • Implement native Rust inference via candle-core
  • Integrate with UCI protocol for GUI play (Arena, CuteChess)
  • Benchmark against Stockfish and Leela Chess Zero
  • Add ONNX export for deployment
  • Create interactive demo on Hugging Face Spaces

🔬 Technical Details

Why Mamba?

Traditional Transformers have quadratic complexity in sequence length, making them inefficient for long chess games. Mamba's linear-time recurrence allows the model to process entire games efficiently while maintaining long-range dependencies.
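The recurrence can be sketched in a few lines: each token triggers one constant-time state update, so the total cost is O(L) in sequence length. This is a scalar toy version; real Mamba uses learned, input-dependent, vector-valued parameters and a hardware-aware parallel scan.

```python
# Toy linear recurrence at the heart of an SSM:
#   h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t
# One O(1) update per token -> O(L) total, vs attention's O(L^2).
def ssm_scan(xs, a=0.9, b=0.5, c=1.0):
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x      # carry state forward one step
        ys.append(c * h)       # read out the current state
    return ys

# Impulse response decays geometrically with |a| < 1,
# giving the model a fading memory of earlier moves.
print(ssm_scan([1.0, 0.0, 0.0]))
```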

Why MoE?

Chess has distinct phases (opening, middlegame, endgame) that require different strategic thinking. The Mixture-of-Experts architecture allows the model to:

  • Specialize experts for different game phases
  • Route positions to the most relevant expert
  • Maintain parameter efficiency while increasing model capacity
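A minimal sketch of Top-2 gated routing under these assumptions; the actual router lives in `modeling_kiyengine.py` and may differ in detail:

```python
import torch

# Sketch of Top-2 gated routing: each token's hidden state is sent
# to its 2 highest-scoring experts, and their outputs are mixed by
# the renormalized gate weights.
def top2_route(x, gate, experts):
    probs = gate(x).softmax(-1)                        # (tokens, n_experts)
    weights, idx = probs.topk(2, dim=-1)               # keep best 2 per token
    weights = weights / weights.sum(-1, keepdim=True)  # renormalize the pair
    out = torch.zeros_like(x)
    for k in range(2):
        for e, expert in enumerate(experts):
            mask = idx[:, k] == e                      # tokens routed to e
            if mask.any():
                out[mask] += weights[mask, k:k+1] * expert(x[mask])
    return out

experts = [torch.nn.Linear(384, 384) for _ in range(8)]
gate = torch.nn.Linear(384, 8)
y = top2_route(torch.randn(5, 384), gate, experts)
print(y.shape)  # torch.Size([5, 384])
```

Only 2 of the 8 experts run per token, so compute stays close to a dense layer of one quarter the parameter count while capacity grows with the expert pool.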

📊 Dataset

  • Source: Lichess Database
  • Games: 1.5M+ high-quality games

🤝 Citation

If you use KiyEngine V3 in your research or projects, please cite:

```bibtex
@misc{kiyengine-v3-2026,
  author = {Kiy-K},
  title = {KiyEngine V3: Mamba-MoE Chess Evaluation Model},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/Kiy-K/KiyEngine-V3}}
}
```

📝 License

This model is released under the MIT License. See the LICENSE file for details.


👤 Author

Kiy-K
"Building the next generation of neural chess engines."


🙏 Acknowledgments

  • Mamba: Based on the Mamba architecture by Gu & Dao
  • Dataset: Lichess Open Database
  • Inspiration: Stockfish, Leela Chess Zero, and the broader chess AI community

⚠️ Limitations

  • Model is currently a neural network component and requires integration with a search algorithm (e.g., MCTS, Alpha-Beta) for full chess engine functionality
  • Performance may vary across different game phases
  • Requires further validation against established benchmarks
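As a sketch of that integration, the value head can drive even a trivial 1-ply negamax loop. Here `legal_moves`, `apply`, and `evaluate` are placeholders for real move generation and model inference, not APIs from this repo:

```python
# Illustrative 1-ply negamax using a value head as the leaf evaluator.
# `legal_moves(position)` yields candidate moves, `apply` plays one,
# and `evaluate` stands in for running KiyEngine V3 on the encoding.
def pick_move(position, legal_moves, apply, evaluate):
    best_move, best_score = None, float("-inf")
    for move in legal_moves(position):
        # Negamax convention: the child's value is from the
        # opponent's perspective, so negate it.
        score = -evaluate(apply(position, move))
        if score > best_score:
            best_move, best_score = move, score
    return best_move
```

A full engine would replace this with a deeper search (MCTS or alpha-beta) guided by the policy logits as move priors.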

Star this repo if you find it useful! ⭐

Made with ♟️ and 🤖 by Kiy-K