Mini-Bleyz committed on
Commit 7cb282f · verified · 1 Parent(s): b069d8a

Update README.md

Files changed (1)
  1. README.md +53 -76
README.md CHANGED
@@ -1,4 +1,3 @@
1
- ```markdown
2
  ---
3
  license: mit
4
  language:
@@ -11,14 +10,15 @@ tags:
11
  - code
12
  - security
13
  - made-by-bleyzos
 
14
  ---
15
 
16
  <br/><br/>
17
 
18
  <div align="center">
19
  <picture>
20
- <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
21
- <img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Bleyzos Coder" />
22
  </picture>
23
  </div>
24
 
@@ -51,91 +51,70 @@ tags:
51
 
52
  # Bleyzos Coder
53
 
54
- Bleyzos Coder is an open-source Mixture-of-Experts (MoE) language model with 1.02T total parameters and 42B active parameters. Built on a fork of MiMo-V2.5-Pro, fine-tuned for coding, cybersecurity, and agentic workflows. Up to 1M tokens context length.
55
 
56
- ## 1. Introduction
57
 
58
- Bleyzos Coder is our most capable model to date, designed for the most demanding agentic, complex software engineering, and cybersecurity tasks. It sustains complex trajectories spanning thousands of tool calls with strong instruction following and coherence over a 1M-token context window. Key features include:
59
 
60
- - **Hybrid Attention Architecture**: Interleaves Sliding Window Attention (SWA) and Global Attention (GA) with a 6:1 ratio and 128 sliding window. This reduces KV-cache storage by nearly 7x while maintaining long-context performance via learnable attention sink bias.
61
- - **Multi-Token Prediction (MTP)**: Equipped with three lightweight MTP modules using dense FFNs. This triples output speed during inference and will be good to accelerate rollout in RL training.
62
- - **Efficient Pre-Training**: Trained on 27T tokens using FP8 mixed precision and native 32k seq length. The context window supports up to 1M tokens.
63
- - **Agentic Capabilities**: Post-training utilizes SFT, large-scale agentic RL and Multi-Teacher On-Policy Distillation (MOPD), achieving superior performance on the most demanding agentic, complex software engineering, and long-horizon tasks.
64
- - **Built-in Security**: Filters against prompt injection, data leaks, and malicious code generation. Designed to protect, not harm.
65
 
66
- ## 2. Model Downloads
67
 
68
- | Model | Total Params | Active Params | Context Length | Precision | Download |
69
- | :--- | :---: | :---: | :---: | :---: | :---: |
70
- | **Bleyzos Coder Pro** | 1.02T | 42B | 1M | FP8 (E4M3) Mixed | [🤗 HuggingFace](https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro) |
71
 
72
- ## 3. Evaluation Results
73
 
74
- ### Base Model Evaluation
 
75
 
76
- | Category | Benchmark | Setting | Bleyzos Coder | MiMo-V2.5-Pro | DeepSeek-V4-Pro | DeepSeek-V4-Flash | Kimi-K2 Base |
77
- | :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
78
- | **Params** | #Activated / #Total | - | 42B / 1.02T | 42B / 1.02T | 49B / 1.6T | 13B / 284B | 32B / 1.04T |
79
- | **General** | BBH | 3-shot | 89.1 | 88.4 | 87.5 | 86.9 | 88.7 |
80
- | | MMLU | 5-shot | 89.4 | 89.4 | 90.1 | 88.7 | 87.8 |
81
- | | MMLU-Redux | 5-shot | 92.8 | 92.8 | 90.8 | 89.4 | 90.2 |
82
- | | MMLU-Pro | 5-shot | 68.5 | 68.5 | 73.5 | 68.3 | 69.2 |
83
- | | DROP | 3-shot | 86.3 | 86.3 | 88.7 | 88.6 | 83.6 |
84
- | **Math** | GSM8K | 8-shot | 99.8 | 99.6 | 92.6 | 90.8 | 92.1 |
85
- | | MATH | 4-shot | 86.2 | 86.2 | 64.5 | 57.4 | 70.2 |
86
- | **Code** | HumanEval+ | 1-shot | 78.3 | 75.6 | - | - | 84.8 |
87
- | | SWE-Bench (AgentLess) | 3-shot | 58.7 | 35.7 | - | - | 28.2 |
88
- | **Agents** | ClawEval pass³ | - | 65.2 | 63.8 | 59.8 | - | - |
89
 
90
- ## 4. Model Architecture & Training Process
 
 
 
91
 
92
- Bleyzos Coder addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA). Unlike traditional speculative decoding, our MTP module is natively integrated for training and inference.
93
-
94
- ### Model Summary
95
-
96
- | Component | Bleyzos Coder Pro |
97
- | :--- | :---: |
98
- | **Total Parameters** | 1.02T |
99
- | **Activated Parameters** | 42B |
100
- | **Hidden Size** | 6144 |
101
- | **Num Layers** | 70 (1 dense + 69 MoE) |
102
- | **Full Attention Layers** | 10 |
103
- | **SWA Layers** | 60 |
104
- | **Num Attention Heads** | 128 |
105
- | **Num KV Heads** | 8 (GQA) |
106
- | **Routed Experts** | 384 |
107
- | **Experts per Token** | 8 |
108
- | **Max Context Length** | 1M |
109
- | **MTP Layers** | 3 |
110
-
111
- ### Training Process
112
-
113
- Post-training follows a three-stage paradigm: Supervised Fine-Tuning (SFT) for foundational instruction-following, Domain-Specialized Training for cybersecurity and code, and Multi-Teacher On-Policy Distillation (MOPD) to integrate all capabilities into a single model.
114
-
115
- ## 5. Deployment
116
-
117
- ### SGLang Deployment
118
 
119
- For the best performance, use SGLang with the following configuration:
120
 
121
  ```bash
122
- SGLANG_ENABLE_SPEC_V2=1
123
  python3 -m sglang.launch_server \
124
- --model-path BleyzosAI/Bleyzos-Coder-Pro \
125
  --trust-remote-code \
126
- --dp-size 2 \
127
- --ep-size 16 \
128
- --tp-size 16 \
129
- --quantization fp8 \
130
  --context-length 1048576 \
131
- --speculative-algorithm EAGLE \
132
  --host 0.0.0.0 \
133
- --port 9001 \
134
- --tool-call-parser bleyzos \
135
- --watchdog-timeout 3600
136
  ```
137
 
138
- For local deployment, set `temperature=1.0`, `top_p=0.95`.
139
 
140
  ## Citation
141
 
@@ -144,15 +123,13 @@ For local deployment, set `temperature=1.0`, `top_p=0.95`.
144
  title={Bleyzos Coder},
145
  author={{Bleyzos AI Team}},
146
  year={2026},
147
- howpublished={\url{https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro}},
148
  }
149
  ```
150
 
151
  ## Contact
152
 
153
- For questions or feedback, reach us at [coder@bleyzos.com](mailto:coder@bleyzos.com) or join our community:
154
-
155
- - [Telegram](https://t.me/bleyzos)
156
- - [Discord](https://discord.gg/bleyzos)
157
- - [GitHub](https://github.com/BleyzosAI)
158
- ```
 
 
1
  ---
2
  license: mit
3
  language:
 
10
  - code
11
  - security
12
  - made-by-bleyzos
13
+ pipeline_tag: text-generation
14
  ---
15
 
16
  <br/><br/>
17
 
18
  <div align="center">
19
  <picture>
20
+ <source srcset="https://cdn.bleyzos.ru/brand.png" media="(prefers-color-scheme: dark)">
21
+ <img src="https://cdn.bleyzos.ru/brand.png" width="60%" alt="Bleyzos Coder" />
22
  </picture>
23
  </div>
24
 
 
51
 
52
  # Bleyzos Coder
53
 
54
+ **Bleyzos Coder** is an open-source Mixture-of-Experts (MoE) language model with **1.02T total parameters** and **42B active parameters**. Built on a fork of MiMo-V2.5-Pro, fine-tuned for coding, cybersecurity, and agentic workflows. Supports up to **1M tokens context length**.
55
 
56
+ ## Model Details
57
 
58
+ - **Developer**: Bleyzos AI (https://bleyzos.com)
59
+ - **Architecture**: Mixture-of-Experts (MoE) with Hybrid Attention (SWA + GA)
60
+ - **Total Parameters**: 1.02T
61
+ - **Active Parameters**: 42B
62
+ - **Context Length**: Up to 1M tokens
63
+ - **License**: MIT
64
 
65
+ ## Key Features
66
 
67
+ - **Hybrid Attention**: Sliding Window Attention + Global Attention (6:1 ratio), reduces KV-cache by ~7x
68
+ - **Multi-Token Prediction**: 3 MTP layers for 3x faster inference
69
+ - **Long Context**: Up to 1M tokens — feed entire codebases
70
+ - **Agentic**: Post-trained with SFT + RL + Multi-Teacher Distillation for complex multi-step tasks
71
+ - **Security-First**: Built-in filters against prompt injection and data leaks
72
 
73
+ ## Usage
 
 
74
 
75
+ ### Hugging Face Inference API
76
 
77
+ ```python
78
+ from huggingface_hub import InferenceClient
79
 
80
+ client = InferenceClient(model="Mini-Bleyz/Bleyzos-Coder")
81
 
82
+ response = client.chat_completion(
83
+ messages=[{"role": "user", "content": "Write a Python function to reverse a linked list"}],
84
+ max_tokens=512
85
+ )
86
 
87
+ print(response["choices"][0]["message"]["content"])
88
+ ```
89
 
90
+ ### SGLang Deployment (for GPU servers)
91
 
92
  ```bash
 
93
  python3 -m sglang.launch_server \
94
+ --model-path Mini-Bleyz/Bleyzos-Coder \
95
  --trust-remote-code \
96
+ --tp 8 \
97
+ --ep 8 \
 
 
98
  --context-length 1048576 \
 
99
  --host 0.0.0.0 \
100
+ --port 9001
 
 
101
  ```
102
 
103
+ ## Benchmarks
104
+
105
+ | Benchmark | Bleyzos Coder | MiMo-V2.5-Pro |
106
+ |-----------|---------------|---------------|
107
+ | BBH (3-shot) | 89.1 | 88.4 |
108
+ | GSM8K (8-shot) | 99.8 | 99.6 |
109
+ | HumanEval+ | 78.3 | 75.6 |
110
+ | SWE-Bench (AgentLess) | 58.7 | 35.7 |
111
+ | ClawEval pass³ | 65.2 | 63.8 |
112
+
113
+ ## Limitations
114
+
115
+ - Requires significant GPU memory (8×A100/H100 recommended for full model)
116
+ - GGUF quantized version available at [DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF](https://huggingface.co/DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF) for CPU-only usage
117
+ - System prompt customized for Bleyzos AI identity
118
 
119
  ## Citation
120
 
 
123
  title={Bleyzos Coder},
124
  author={{Bleyzos AI Team}},
125
  year={2026},
126
+ howpublished={\url{https://huggingface.co/Mini-Bleyz/Bleyzos-Coder}},
127
  }
128
  ```
129
 
130
  ## Contact
131
 
132
+ - **Email**: coder@bleyzos.com
133
+ - **Website**: https://bleyzos.com
134
+ - **Telegram**: https://t.me/bleyzos
135
+ - **GitHub**: https://github.com/BleyzosAI
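
The "reduces KV-cache by ~7x" claim in the Key Features list can be sanity-checked with a little arithmetic. The sketch below is not part of the commit. It uses the layer counts from the Model Summary table on the removed side of this diff (10 full-attention layers, 60 SWA layers, window of 128) and assumes that every layer has the same KV head count and head dimension, so those factors cancel, and that attention-sink entries are negligible. The function name `kv_cache_ratio` is hypothetical.

```python
def kv_cache_ratio(context_len, full_layers=10, swa_layers=60, window=128):
    """Ratio of an all-global-attention KV cache to the hybrid SWA + GA cache.

    Sizes are counted in cached token slots per layer. Per-token bytes
    (KV heads x head dim x dtype size) are assumed identical across layers
    and therefore cancel out of the ratio.
    """
    # Baseline: every layer keeps keys/values for the whole context.
    baseline = (full_layers + swa_layers) * context_len
    # Hybrid: global layers keep the whole context, SWA layers keep
    # at most `window` tokens.
    hybrid = full_layers * context_len + swa_layers * min(window, context_len)
    return baseline / hybrid


if __name__ == "__main__":
    for n in (128, 32_768, 1_048_576):
        print(f"context={n:>9}: {kv_cache_ratio(n):.2f}x smaller KV cache")
```

Under these assumptions the ratio is 1 when the context fits inside the window and approaches 70/10 = 7 as the context grows, which is consistent with the "nearly 7x" wording in the removed README text.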