---
license: mit
language:
- en
- ru
tags:
- text-generation
- agent
- long-context
- code
- security
- made-by-bleyzos
pipeline_tag: text-generation
---

<br/><br/>

<div align="center">
  <picture>
    <source srcset="https://cdn.bleyzos.ru/brand.png" media="(prefers-color-scheme: dark)">
    <img src="https://cdn.bleyzos.ru/brand.png" width="60%" alt="Bleyzos Coder" />
  </picture>
</div>

<br/>

<br/>

<div align="center" style="line-height: 1.2;">
  <strong>Community</strong><br/>
  <a href="https://t.me/bleyzos" target="_blank">Telegram</a>
</div>

<br/>
|
|
# Bleyzos Coder

**Bleyzos Coder** is an open-source Mixture-of-Experts (MoE) language model with **1.02T total parameters** and **42B active parameters**. It is built on a fork of MiMo-V2.5-Pro and fine-tuned for coding, cybersecurity, and agentic workflows. It supports a context length of up to **1M tokens**.
|
|
## Model Details

- **Developer**: Bleyzos AI (https://bleyzos.com)
- **Architecture**: Mixture-of-Experts (MoE) with Hybrid Attention (SWA + GA)
- **Total Parameters**: 1.02T
- **Active Parameters**: 42B
- **Context Length**: Up to 1M tokens
- **License**: MIT
|
|
## Key Features

- **Hybrid Attention**: Sliding Window Attention (SWA) and Global Attention (GA) layers in a 6:1 ratio, which reduces the KV cache by ~7x
- **Multi-Token Prediction**: 3 MTP layers for 3x faster inference
- **Long Context**: Up to 1M tokens, enough to feed in entire codebases
- **Agentic**: Post-trained with SFT, RL, and Multi-Teacher Distillation for complex multi-step tasks
- **Security-First**: Built-in filters against prompt injection and data leaks
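
The ~7x KV-cache figure follows from the 6:1 layer ratio: only one layer in seven caches keys and values for the whole sequence, while the other six cache at most a fixed window. Below is a minimal sketch of that arithmetic. The window size of 128 tokens and the function name are illustrative assumptions, not values taken from this model card.

```python
def kv_cache_reduction(context_len: int, window: int = 128,
                       swa_layers: int = 6, global_layers: int = 1) -> float:
    """Ratio of full-attention KV entries to hybrid-attention KV entries.

    Counts cached token positions per layer group. Head count and head
    dimension cancel out if all layers share them (an assumption).
    """
    total_layers = swa_layers + global_layers
    full = total_layers * context_len
    # SWA layers cache at most `window` positions; global layers cache all.
    hybrid = swa_layers * min(window, context_len) + global_layers * context_len
    return full / hybrid


for n in (1_024, 131_072, 1_048_576):
    print(n, round(kv_cache_reduction(n), 2))
```

Under these assumptions the ratio approaches 7 only when the context is much longer than the window; for short prompts the saving is smaller.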
|
|
## Usage

### Hugging Face Inference API

```python
from huggingface_hub import InferenceClient

# Client bound to the hosted model repository
client = InferenceClient(model="Mini-Bleyz/Bleyzos-Coder")

# Send a single-turn chat request
response = client.chat_completion(
    messages=[{"role": "user", "content": "Write a Python function to reverse a linked list"}],
    max_tokens=512,
)

# Print the text of the first completion choice
print(response.choices[0].message.content)
```
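
To exercise the long context window through the same chat call, source files can be concatenated into a single user message. The sketch below shows one way to do that. The helper name `pack_codebase`, the 4-characters-per-token estimate, and the file-header format are assumptions for illustration; real token counts depend on the model's tokenizer.

```python
from pathlib import Path


def pack_codebase(root: str, suffixes=(".py",), max_tokens: int = 1_000_000,
                  chars_per_token: int = 4) -> str:
    """Concatenate source files under `root` until a rough token budget is hit."""
    budget = max_tokens * chars_per_token  # crude character budget (assumption)
    parts, used = [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        chunk = f"### File: {path.relative_to(root)}\n{text}\n"
        if used + len(chunk) > budget:
            break  # stop before exceeding the budget
        parts.append(chunk)
        used += len(chunk)
    return "".join(parts)
```

The returned string can be used as the `content` of a user message, followed by the actual question about the code.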
|
|
### SGLang Deployment (for GPU servers)

```bash
python3 -m sglang.launch_server \
  --model-path Mini-Bleyz/Bleyzos-Coder \
  --trust-remote-code \
  --tp 8 \
  --ep 8 \
  --context-length 1048576 \
  --host 0.0.0.0 \
  --port 9001
```
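
Once the server is running, it can be queried over HTTP. SGLang exposes an OpenAI-compatible endpoint at `/v1/chat/completions`. The sketch below assumes the server started by the command above is reachable at `localhost:9001`; the helper names `build_chat_request` and `ask` are illustrative and not part of any library.

```python
import json
import urllib.request


def build_chat_request(prompt: str, base_url: str = "http://localhost:9001",
                       model: str = "Mini-Bleyz/Bleyzos-Coder",
                       max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling `ask("Write a Python function to reverse a linked list")` would then mirror the Inference API example, but against the self-hosted server.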
|
|
## Benchmarks

| Benchmark | Bleyzos Coder | MiMo-V2.5-Pro |
|-----------|---------------|---------------|
| BBH (3-shot) | 89.1 | 88.4 |
| GSM8K (8-shot) | 99.8 | 99.6 |
| HumanEval+ | 78.3 | 75.6 |
| SWE-Bench (AgentLess) | 58.7 | 35.7 |
| ClawEval pass³ | 65.2 | 63.8 |
|
|
## Limitations

- Requires significant GPU memory (8×A100/H100 recommended for the full model)
- For CPU-only usage, a GGUF quantization of the base model is available at [DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF](https://huggingface.co/DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF)
- The system prompt is customized for the Bleyzos AI identity
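
As a rough guide to the GPU memory requirement, weight storage scales with the total parameter count, because in a typical deployment all experts are kept in GPU memory even though only 42B parameters are active per token. The sketch below is back-of-the-envelope arithmetic only. The precisions listed are assumptions (this card does not state the checkpoint's weight format), and KV cache and activations are not counted.

```python
TOTAL_PARAMS = 1.02e12  # total parameters, from the model card


def weight_memory_gb(params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

For comparison, eight 80 GB GPUs provide 640 GB in total, so whether the full model fits on one such node depends on the weight precision used.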
|
|
## Citation

```bibtex
@misc{bleyzos2026coder,
  title={Bleyzos Coder},
  author={{Bleyzos AI Team}},
  year={2026},
  howpublished={\url{https://huggingface.co/Mini-Bleyz/Bleyzos-Coder}},
}
```
|
|
## Contact

- **Email**: support@bleyzos.ru
- **Website**: https://ai.bleyzos.com
- **Telegram**: https://t.me/bleyzos