Update README.md

README.md

````diff
@@ -1,4 +1,3 @@
-```markdown
 ---
 license: mit
 language:
````

````diff
@@ -11,14 +10,15 @@ tags:
 - code
 - security
 - made-by-bleyzos
 ---

 <br/><br/>

 <div align="center">
 <picture>
-<source srcset="https://
-<img src="https://
 </picture>
 </div>

````

````diff
@@ -51,91 +51,70 @@ tags:

 # Bleyzos Coder

-Bleyzos Coder is an open-source Mixture-of-Experts (MoE) language model with 1.02T total parameters and 42B active parameters. Built on a fork of MiMo-V2.5-Pro, fine-tuned for coding, cybersecurity, and agentic workflows.

-##

-

-
-- **Multi-Token Prediction (MTP)**: Equipped with three lightweight MTP modules using dense FFNs. This triples output speed during inference and will be good to accelerate rollout in RL training.
-- **Efficient Pre-Training**: Trained on 27T tokens using FP8 mixed precision and native 32k seq length. The context window supports up to 1M tokens.
-- **Agentic Capabilities**: Post-training utilizes SFT, large-scale agentic RL and Multi-Teacher On-Policy Distillation (MOPD), achieving superior performance on the most demanding agentic, complex software engineering, and long-horizon tasks.
-- **Built-in Security**: Filters against prompt injection, data leaks, and malicious code generation. Designed to protect, not harm.

-

-
-| :--- | :---: | :---: | :---: | :---: | :---: |
-| **Bleyzos Coder Pro** | 1.02T | 42B | 1M | FP8 (E4M3) Mixed | [🤗 HuggingFace](https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro) |

-##

-

-
-| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
-| **Params** | #Activated / #Total | - | 42B / 1.02T | 42B / 1.02T | 49B / 1.6T | 13B / 284B | 32B / 1.04T |
-| **General** | BBH | 3-shot | 89.1 | 88.4 | 87.5 | 86.9 | 88.7 |
-| | MMLU | 5-shot | 89.4 | 89.4 | 90.1 | 88.7 | 87.8 |
-| | MMLU-Redux | 5-shot | 92.8 | 92.8 | 90.8 | 89.4 | 90.2 |
-| | MMLU-Pro | 5-shot | 68.5 | 68.5 | 73.5 | 68.3 | 69.2 |
-| | DROP | 3-shot | 86.3 | 86.3 | 88.7 | 88.6 | 83.6 |
-| **Math** | GSM8K | 8-shot | 99.8 | 99.6 | 92.6 | 90.8 | 92.1 |
-| | MATH | 4-shot | 86.2 | 86.2 | 64.5 | 57.4 | 70.2 |
-| **Code** | HumanEval+ | 1-shot | 78.3 | 75.6 | - | - | 84.8 |
-| | SWE-Bench (AgentLess) | 3-shot | 58.7 | 35.7 | - | - | 28.2 |
-| **Agents** | ClawEval pass³ | - | 65.2 | 63.8 | 59.8 | - | - |

-

-
-
-### Model Summary
-
-| Component | Bleyzos Coder Pro |
-| :--- | :---: |
-| **Total Parameters** | 1.02T |
-| **Activated Parameters** | 42B |
-| **Hidden Size** | 6144 |
-| **Num Layers** | 70 (1 dense + 69 MoE) |
-| **Full Attention Layers** | 10 |
-| **SWA Layers** | 60 |
-| **Num Attention Heads** | 128 |
-| **Num KV Heads** | 8 (GQA) |
-| **Routed Experts** | 384 |
-| **Experts per Token** | 8 |
-| **Max Context Length** | 1M |
-| **MTP Layers** | 3 |
-
-### Training Process
-
-Post-training follows a three-stage paradigm: Supervised Fine-Tuning (SFT) for foundational instruction-following, Domain-Specialized Training for cybersecurity and code, and Multi-Teacher On-Policy Distillation (MOPD) to integrate all capabilities into a single model.
-
-## 5. Deployment
-
-### SGLang Deployment

-

 ```bash
-SGLANG_ENABLE_SPEC_V2=1
 python3 -m sglang.launch_server \
---model-path
 --trust-remote-code \
---
---ep
---tp-size 16 \
---quantization fp8 \
 --context-length 1048576 \
---speculative-algorithm EAGLE \
 --host 0.0.0.0 \
---port 9001
---tool-call-parser bleyzos \
---watchdog-timeout 3600
 ```

-

 ## Citation

````

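The removed **Multi-Token Prediction (MTP)** bullet says three MTP modules triple output speed, and the removed launch command enables `--speculative-algorithm EAGLE`. One way to sanity-check a speedup claim of that kind is the usual speculative-decoding estimate: if k draft tokens are proposed per step and each is accepted independently with probability p (a simplifying assumption; real acceptance is correlated and prompt-dependent), one verification step emits (1 - p^(k+1)) / (1 - p) tokens on average. The sketch below assumes each MTP module drafts one token (k = 3), ignores the cost of producing the drafts, and uses hypothetical acceptance rates.

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per target-model verification step when k draft
    tokens are proposed and each is accepted independently with probability p.

    Acceptance stops at the first rejected draft, and the target model always
    contributes one token of its own (the correction or a bonus token), so the
    expectation is 1 + p + p^2 + ... + p^k = (1 - p**(k + 1)) / (1 - p).
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if p == 1.0:
        # Every draft is accepted; the geometric-series formula divides by zero.
        return float(k + 1)
    return (1.0 - p ** (k + 1)) / (1.0 - p)


if __name__ == "__main__":
    k = 3  # assumption: each of the three MTP modules drafts one token
    for p in (0.5, 0.7, 0.8, 0.9):  # hypothetical per-token acceptance rates
        print(f"p={p}: {expected_tokens_per_step(p, k):.2f} tokens/step")
```

With k = 3 the ceiling is 4 tokens per step; under this model, reaching about 3 tokens per step needs p of roughly 0.8 (2.95 at p = 0.8), before any drafting overhead is subtracted.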
````diff
@@ -144,15 +123,13 @@ For local deployment, set `temperature=1.0`, `top_p=0.95`.
 title={Bleyzos Coder},
 author={{Bleyzos AI Team}},
 year={2026},
-howpublished={\url{https://huggingface.co/
 }
 ```

 ## Contact

-
-
--
--
-- [GitHub](https://github.com/BleyzosAI)
-```
````

````diff
@@ -1,4 +1,3 @@
 ---
 license: mit
 language:
````

````diff
@@ -11,14 +10,15 @@ tags:
 - code
 - security
 - made-by-bleyzos
+pipeline_tag: text-generation
 ---

 <br/><br/>

 <div align="center">
 <picture>
+<source srcset="https://cdn.bleyzos.ru/brand.png" media="(prefers-color-scheme: dark)">
+<img src="https://cdn.bleyzos.ru/brand.png" width="60%" alt="Bleyzos Coder" />
 </picture>
 </div>

````

````diff
@@ -51,91 +51,70 @@ tags:

 # Bleyzos Coder

+**Bleyzos Coder** is an open-source Mixture-of-Experts (MoE) language model with **1.02T total parameters** and **42B active parameters**. Built on a fork of MiMo-V2.5-Pro, fine-tuned for coding, cybersecurity, and agentic workflows. Supports up to **1M tokens context length**.

+## Model Details

+- **Developer**: Bleyzos AI (https://bleyzos.com)
+- **Architecture**: Mixture-of-Experts (MoE) with Hybrid Attention (SWA + GA)
+- **Total Parameters**: 1.02T
+- **Active Parameters**: 42B
+- **Context Length**: Up to 1M tokens
+- **License**: MIT

+## Key Features

+- **Hybrid Attention**: Sliding Window Attention + Global Attention (6:1 ratio), reduces KV-cache by ~7x
+- **Multi-Token Prediction**: 3 MTP layers for 3x faster inference
+- **Long Context**: Up to 1M tokens — feed entire codebases
+- **Agentic**: Post-trained with SFT + RL + Multi-Teacher Distillation for complex multi-step tasks
+- **Security-First**: Built-in filters against prompt injection and data leaks

+## Usage

+### Hugging Face Inference API

+```python
+from huggingface_hub import InferenceClient

+client = InferenceClient(model="Mini-Bleyz/Bleyzos-Coder")

+response = client.chat_completion(
+messages=[{"role": "user", "content": "Write a Python function to reverse a linked list"}],
+max_tokens=512
+)

+print(response["choices"][0]["message"]["content"])
+```
````

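The added **Hybrid Attention** bullet states a 6:1 sliding-window-to-global ratio that cuts the KV cache by about 7x, and the removed Model Summary lists 60 SWA layers and 10 full-attention layers. A minimal sketch of that arithmetic, assuming every layer caches the same number of bytes per token (same KV-head count and head size), that a global layer caches the whole context, and that an SWA layer caches at most its window. The README does not state the window size, so the 4096 used here is a hypothetical value.

```python
def kv_cache_reduction(context_len: int, window: int, n_full: int, n_swa: int) -> float:
    """Ratio of KV-cache size for an all-global-attention stack to that of a
    hybrid stack, assuming identical per-token K/V size in every layer.

    A global (full) attention layer caches every token in the context; a
    sliding-window layer caches at most `window` tokens.
    """
    n_total = n_full + n_swa
    all_global = n_total * context_len
    hybrid = n_full * context_len + n_swa * min(window, context_len)
    return all_global / hybrid


if __name__ == "__main__":
    # Layer counts from the Model Summary table; the window is an assumption.
    n_full, n_swa = 10, 60
    window = 4096  # hypothetical sliding-window size, not stated in the README
    for ctx in (32_768, 131_072, 1_048_576):
        ratio = kv_cache_reduction(ctx, window, n_full, n_swa)
        print(f"context={ctx:>9,}: ~{ratio:.2f}x smaller KV cache")
```

The ratio is bounded by n_total / n_full = 70 / 10 = 7 and only approaches it when the context is much longer than the window: with a 4096-token window it is exactly 4x at 32,768 tokens and about 6.84x at 1,048,576 tokens.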
````diff

+### SGLang Deployment (for GPU servers)

 ```bash
 python3 -m sglang.launch_server \
+--model-path Mini-Bleyz/Bleyzos-Coder \
 --trust-remote-code \
+--tp 8 \
+--ep 8 \
 --context-length 1048576 \
 --host 0.0.0.0 \
+--port 9001
 ```

+## Benchmarks
+
+| Benchmark | Bleyzos Coder | MiMo-V2.5-Pro |
+|-----------|---------------|---------------|
+| BBH (3-shot) | 89.1 | 88.4 |
+| GSM8K (8-shot) | 99.8 | 99.6 |
+| HumanEval+ | 78.3 | 75.6 |
+| SWE-Bench (AgentLess) | 58.7 | 35.7 |
+| ClawEval pass³ | 65.2 | 63.8 |
+
+## Limitations
+
+- Requires significant GPU memory (8×A100/H100 recommended for full model)
+- GGUF quantized version available at [DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF](https://huggingface.co/DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF) for CPU-only usage
+- System prompt customized for Bleyzos AI identity

 ## Citation

````

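The added Limitations section recommends 8×A100/H100, and the added SGLang command shards the model with `--tp 8`. A back-of-envelope check of the weight footprint alone, assuming 1 byte per parameter for FP8 (the removed model table lists FP8 (E4M3) precision) and 2 bytes for BF16, an even split across GPUs, and nothing counted for the KV cache, activations, or framework overhead:

```python
def weight_gb_per_gpu(total_params: float, bytes_per_param: float, num_gpus: int) -> float:
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes).

    Assumes the weights are split evenly across GPUs and ignores the KV cache,
    activations, and runtime overhead.
    """
    return total_params * bytes_per_param / num_gpus / 1e9


if __name__ == "__main__":
    total_params = 1.02e12  # 1.02T total parameters, from the README
    for label, nbytes in (("FP8", 1.0), ("BF16", 2.0)):
        for gpus in (8, 16):
            gb = weight_gb_per_gpu(total_params, nbytes, gpus)
            print(f"{label} on {gpus} GPUs: ~{gb:.1f} GB of weights per GPU")
```

Under these assumptions the FP8 weights alone come to about 127.5 GB per GPU with 8-way sharding, above the 80 GB of an 80 GB A100 or H100, versus about 63.75 GB with the 16-way `--tp-size 16` in the removed command.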
````diff
@@ -144,15 +123,13 @@ For local deployment, set `temperature=1.0`, `top_p=0.95`.
 title={Bleyzos Coder},
 author={{Bleyzos AI Team}},
 year={2026},
+howpublished={\url{https://huggingface.co/Mini-Bleyz/Bleyzos-Coder}},
 }
 ```

 ## Contact

+- **Email**: coder@bleyzos.com
+- **Website**: https://bleyzos.com
+- **Telegram**: https://t.me/bleyzos
+- **GitHub**: https://github.com/BleyzosAI
````
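
The header of the last hunk carries the recommendation `temperature=1.0`, `top_p=0.95` for local deployment, and the added SGLang command serves on port 9001. SGLang's server exposes an OpenAI-compatible `/v1/chat/completions` endpoint, so a minimal standard-library client could look like the sketch below. It assumes the server runs on `localhost` and that the served model name matches `--model-path` (`Mini-Bleyz/Bleyzos-Coder`); `build_chat_request` and `send` are illustrative helpers, and `max_tokens=512` is borrowed from the Inference API example.

```python
import json
import urllib.request


def build_chat_request(prompt: str, host: str = "localhost", port: int = 9001,
                       model: str = "Mini-Bleyz/Bleyzos-Coder") -> urllib.request.Request:
    """Build (but do not send) a POST to an OpenAI-compatible chat endpoint,
    using the sampling settings recommended for local deployment."""
    payload = {
        "model": model,  # assumption: served model name matches --model-path
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 1.0,
        "top_p": 0.95,
        "max_tokens": 512,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 600.0) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    req = build_chat_request("Write a Python function to reverse a linked list")
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment once the SGLang server is running
```

Building the request separately from sending it keeps the payload inspectable offline; `send` only succeeds once a server is actually listening on the given host and port.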