Duplicate from Mini-Bleyz/Bleyzos-Coder

5ca774a 23 days ago

6.1 kB

	```markdown
	---
	license: mit
	language:
	- en
	- ru
	tags:
	- text-generation
	- agent
	- long-context
	- code
	- security
	- made-by-bleyzos
	---

	<br/><br/>

	<div align="center">
	<picture>
	<source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
	<img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Bleyzos Coder" />
	</picture>
	</div>

	<br/>

	<div align="center" style="line-height: 1;">
	\|
	<a href="https://github.com/BleyzosAI" target="_blank">🐙 GitHub</a>
	\|
	<a href="https://bleyzos.com/blog" target="_blank">📰 Blog</a>
	\|
	<a href="https://bleyzos.com/studio" target="_blank">🎨 Bleyzos AI Studio</a>
	\|
	<a href="https://discord.gg/bleyzos" target="_blank">🗨️ Discord</a>
	\|
	</div>

	<br/>

	<div align="center" style="line-height: 1.2;">
	<strong>Community</strong><br/>
	<a href="https://t.me/bleyzos" target="_blank">Telegram</a>
	\|
	<a href="https://discord.gg/bleyzos" target="_blank">Discord</a>
	\|
	<a href="https://github.com/BleyzosAI" target="_blank">GitHub</a>
	</div>

	<br/>

	# Bleyzos Coder

	Bleyzos Coder is an open-source Mixture-of-Experts (MoE) language model with 1.02T total parameters and 42B active parameters. Built on a fork of MiMo-V2.5-Pro, fine-tuned for coding, cybersecurity, and agentic workflows. Up to 1M tokens context length.

	## 1. Introduction

	Bleyzos Coder is our most capable model to date, designed for the most demanding agentic, complex software engineering, and cybersecurity tasks. It sustains complex trajectories spanning thousands of tool calls with strong instruction following and coherence over a 1M-token context window. Key features include:

	- Hybrid Attention Architecture: Interleaves Sliding Window Attention (SWA) and Global Attention (GA) with a 6:1 ratio and 128 sliding window. This reduces KV-cache storage by nearly 7x while maintaining long-context performance via learnable attention sink bias.
	- Multi-Token Prediction (MTP): Equipped with three lightweight MTP modules using dense FFNs. This triples output speed during inference and will be good to accelerate rollout in RL training.
	- Efficient Pre-Training: Trained on 27T tokens using FP8 mixed precision and native 32k seq length. The context window supports up to 1M tokens.
	- Agentic Capabilities: Post-training utilizes SFT, large-scale agentic RL and Multi-Teacher On-Policy Distillation (MOPD), achieving superior performance on the most demanding agentic, complex software engineering, and long-horizon tasks.
	- Built-in Security: Filters against prompt injection, data leaks, and malicious code generation. Designed to protect, not harm.

	## 2. Model Downloads

	\| Model \| Total Params \| Active Params \| Context Length \| Precision \| Download \|
	\| :--- \| :---: \| :---: \| :---: \| :---: \| :---: \|
	\| Bleyzos Coder Pro \| 1.02T \| 42B \| 1M \| FP8 (E4M3) Mixed \| [🤗 HuggingFace](https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro) \|

	## 3. Evaluation Results

	### Base Model Evaluation

	\| Category \| Benchmark \| Setting \| Bleyzos Coder \| MiMo-V2.5-Pro \| DeepSeek-V4-Pro \| DeepSeek-V4-Flash \| Kimi-K2 Base \|
	\| :--- \| :--- \| :---: \| :---: \| :---: \| :---: \| :---: \| :---: \|
	\| Params \| #Activated / #Total \| - \| 42B / 1.02T \| 42B / 1.02T \| 49B / 1.6T \| 13B / 284B \| 32B / 1.04T \|
	\| General \| BBH \| 3-shot \| 89.1 \| 88.4 \| 87.5 \| 86.9 \| 88.7 \|
	\| \| MMLU \| 5-shot \| 89.4 \| 89.4 \| 90.1 \| 88.7 \| 87.8 \|
	\| \| MMLU-Redux \| 5-shot \| 92.8 \| 92.8 \| 90.8 \| 89.4 \| 90.2 \|
	\| \| MMLU-Pro \| 5-shot \| 68.5 \| 68.5 \| 73.5 \| 68.3 \| 69.2 \|
	\| \| DROP \| 3-shot \| 86.3 \| 86.3 \| 88.7 \| 88.6 \| 83.6 \|
	\| Math \| GSM8K \| 8-shot \| 99.8 \| 99.6 \| 92.6 \| 90.8 \| 92.1 \|
	\| \| MATH \| 4-shot \| 86.2 \| 86.2 \| 64.5 \| 57.4 \| 70.2 \|
	\| Code \| HumanEval+ \| 1-shot \| 78.3 \| 75.6 \| - \| - \| 84.8 \|
	\| \| SWE-Bench (AgentLess) \| 3-shot \| 58.7 \| 35.7 \| - \| - \| 28.2 \|
	\| Agents \| ClawEval pass³ \| - \| 65.2 \| 63.8 \| 59.8 \| - \| - \|

	## 4. Model Architecture & Training Process

	Bleyzos Coder addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA). Unlike traditional speculative decoding, our MTP module is natively integrated for training and inference.

	### Model Summary

	\| Component \| Bleyzos Coder Pro \|
	\| :--- \| :---: \|
	\| Total Parameters \| 1.02T \|
	\| Activated Parameters \| 42B \|
	\| Hidden Size \| 6144 \|
	\| Num Layers \| 70 (1 dense + 69 MoE) \|
	\| Full Attention Layers \| 10 \|
	\| SWA Layers \| 60 \|
	\| Num Attention Heads \| 128 \|
	\| Num KV Heads \| 8 (GQA) \|
	\| Routed Experts \| 384 \|
	\| Experts per Token \| 8 \|
	\| Max Context Length \| 1M \|
	\| MTP Layers \| 3 \|

	### Training Process

	Post-training follows a three-stage paradigm: Supervised Fine-Tuning (SFT) for foundational instruction-following, Domain-Specialized Training for cybersecurity and code, and Multi-Teacher On-Policy Distillation (MOPD) to integrate all capabilities into a single model.

	## 5. Deployment

	### SGLang Deployment

	For the best performance, use SGLang with the following configuration:

	```bash
	SGLANG_ENABLE_SPEC_V2=1
	python3 -m sglang.launch_server \
	--model-path BleyzosAI/Bleyzos-Coder-Pro \
	--trust-remote-code \
	--dp-size 2 \
	--ep-size 16 \
	--tp-size 16 \
	--quantization fp8 \
	--context-length 1048576 \
	--speculative-algorithm EAGLE \
	--host 0.0.0.0 \
	--port 9001 \
	--tool-call-parser bleyzos \
	--watchdog-timeout 3600
	```

	For local deployment, set `temperature=1.0`, `top_p=0.95`.

	## Citation

	```bibtex
	@misc{bleyzos2026coder,
	title={Bleyzos Coder},
	author={{Bleyzos AI Team}},
	year={2026},
	howpublished={\url{https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro}},
	}
	```

	## Contact

	For questions or feedback, reach us at [coder@bleyzos.com](mailto:coder@bleyzos.com) or join our community:

	- [Telegram](https://t.me/bleyzos)
	- [Discord](https://discord.gg/bleyzos)
	- [GitHub](https://github.com/BleyzosAI)
	```