Update README.md

Files changed (1)

README.md +53 -76

@@ -1,4 +1,3 @@
-```markdown
 ---
 license: mit
 language:
@@ -11,14 +10,15 @@ tags:
 - code
 - security
 - made-by-bleyzos
 ---
 <br/><br/>
 <div align="center">
   <picture>
-    <source srcset="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo_darkmode.png?raw=true" media="(prefers-color-scheme: dark)">
-    <img src="https://github.com/XiaomiMiMo/MiMo/raw/main/figures/Xiaomi_MiMo.png?raw=true" width="60%" alt="Bleyzos Coder" />
   </picture>
 </div>
@@ -51,91 +51,70 @@ tags:
 # Bleyzos Coder
-Bleyzos Coder is an open-source Mixture-of-Experts (MoE) language model with 1.02T total parameters and 42B active parameters. It is built on a fork of MiMo-V2.5-Pro and fine-tuned for coding, cybersecurity, and agentic workflows, with support for a context length of up to 1M tokens.
-## 1. Introduction
-Bleyzos Coder is our most capable model to date, designed for the most demanding agentic, complex software engineering, and cybersecurity tasks. It sustains complex trajectories spanning thousands of tool calls with strong instruction following and coherence over a 1M-token context window. Key features include:
-- **Hybrid Attention Architecture**: Interleaves Sliding Window Attention (SWA) and Global Attention (GA) at a 6:1 ratio with a sliding window of 128 tokens. This reduces KV-cache storage by nearly 7x while maintaining long-context performance via a learnable attention sink bias.
-- **Multi-Token Prediction (MTP)**: Equipped with three lightweight MTP modules using dense FFNs. This triples output speed during inference and can also accelerate rollouts in RL training.
-- **Efficient Pre-Training**: Trained on 27T tokens using FP8 mixed precision and a native sequence length of 32k. The context window supports up to 1M tokens.
-- **Agentic Capabilities**: Post-training utilizes SFT, large-scale agentic RL and Multi-Teacher On-Policy Distillation (MOPD), achieving superior performance on the most demanding agentic, complex software engineering, and long-horizon tasks.
-- **Built-in Security**: Filters against prompt injection, data leaks, and malicious code generation. Designed to protect, not harm.
-## 2. Model Downloads
-| Model | Total Params | Active Params | Context Length | Precision | Download |
-| :--- | :---: | :---: | :---: | :---: | :---: |
-| **Bleyzos Coder Pro** | 1.02T | 42B | 1M | FP8 (E4M3) Mixed | [🤗 HuggingFace](https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro) |
-## 3. Evaluation Results
-### Base Model Evaluation
-| Category | Benchmark | Setting | Bleyzos Coder | MiMo-V2.5-Pro | DeepSeek-V4-Pro | DeepSeek-V4-Flash | Kimi-K2 Base |
-| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: |
-| **Params** | #Activated / #Total | - | 42B / 1.02T | 42B / 1.02T | 49B / 1.6T | 13B / 284B | 32B / 1.04T |
-| **General** | BBH | 3-shot | 89.1 | 88.4 | 87.5 | 86.9 | 88.7 |
-| | MMLU | 5-shot | 89.4 | 89.4 | 90.1 | 88.7 | 87.8 |
-| | MMLU-Redux | 5-shot | 92.8 | 92.8 | 90.8 | 89.4 | 90.2 |
-| | MMLU-Pro | 5-shot | 68.5 | 68.5 | 73.5 | 68.3 | 69.2 |
-| | DROP | 3-shot | 86.3 | 86.3 | 88.7 | 88.6 | 83.6 |
-| **Math** | GSM8K | 8-shot | 99.8 | 99.6 | 92.6 | 90.8 | 92.1 |
-| | MATH | 4-shot | 86.2 | 86.2 | 64.5 | 57.4 | 70.2 |
-| **Code** | HumanEval+ | 1-shot | 78.3 | 75.6 | - | - | 84.8 |
-| | SWE-Bench (AgentLess) | 3-shot | 58.7 | 35.7 | - | - | 28.2 |
-| **Agents** | ClawEval pass³ | - | 65.2 | 63.8 | 59.8 | - | - |
-## 4. Model Architecture & Training Process
-Bleyzos Coder addresses the quadratic complexity of long contexts by interleaving Local Sliding Window Attention (SWA) and Global Attention (GA). Unlike traditional speculative decoding, our MTP module is natively integrated for training and inference.
-### Model Summary
-| Component | Bleyzos Coder Pro |
-| :--- | :---: |
-| **Total Parameters** | 1.02T |
-| **Activated Parameters** | 42B |
-| **Hidden Size** | 6144 |
-| **Num Layers** | 70 (1 dense + 69 MoE) |
-| **Full Attention Layers** | 10 |
-| **SWA Layers** | 60 |
-| **Num Attention Heads** | 128 |
-| **Num KV Heads** | 8 (GQA) |
-| **Routed Experts** | 384 |
-| **Experts per Token** | 8 |
-| **Max Context Length** | 1M |
-| **MTP Layers** | 3 |
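
A quick sanity check of the "nearly 7x" KV-cache figure, using only numbers stated in this README: 10 full-attention layers, 60 SWA layers, and a sliding window of 128 tokens. The sketch below counts cached token positions per layer and assumes every layer stores the same number of bytes per position. It ignores the attention sink bias and any implementation overhead, so treat it as back-of-the-envelope arithmetic rather than a measurement. The function names are illustrative.

```python
# Layer counts and window size come from the README text; treating each
# layer's per-position KV size as identical is an assumption of this sketch.
FULL_ATTENTION_LAYERS = 10
SWA_LAYERS = 60
SLIDING_WINDOW = 128


def cached_positions_hybrid(context_len: int) -> int:
    """Token positions cached across all layers with the SWA/GA mix."""
    # An SWA layer never keeps more than the last SLIDING_WINDOW positions.
    swa_positions = min(context_len, SLIDING_WINDOW)
    return FULL_ATTENTION_LAYERS * context_len + SWA_LAYERS * swa_positions


def cached_positions_full(context_len: int) -> int:
    """Token positions cached if every layer used global attention."""
    return (FULL_ATTENTION_LAYERS + SWA_LAYERS) * context_len


def kv_cache_reduction(context_len: int) -> float:
    """Ratio of the all-global cache size to the hybrid cache size."""
    return cached_positions_full(context_len) / cached_positions_hybrid(context_len)


if __name__ == "__main__":
    for n in (4_096, 32_768, 1_048_576):
        print(f"{n:>9} tokens: {kv_cache_reduction(n):.2f}x")
```

The ratio is 1.0 for contexts no longer than the window and approaches 70/10 = 7 as the context grows, which is consistent with the "nearly 7x" wording for long inputs.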
-### Training Process
-Post-training follows a three-stage paradigm: Supervised Fine-Tuning (SFT) for foundational instruction-following, Domain-Specialized Training for cybersecurity and code, and Multi-Teacher On-Policy Distillation (MOPD) to integrate all capabilities into a single model.
-## 5. Deployment
-### SGLang Deployment
-For the best performance, use SGLang with the following configuration:
 ```bash
-SGLANG_ENABLE_SPEC_V2=1
 python3 -m sglang.launch_server \
-    --model-path BleyzosAI/Bleyzos-Coder-Pro \
     --trust-remote-code \
-    --dp-size 2 \
-    --ep-size 16 \
-    --tp-size 16 \
-    --quantization fp8 \
     --context-length 1048576 \
-    --speculative-algorithm EAGLE \
     --host 0.0.0.0 \
-    --port 9001 \
-    --tool-call-parser bleyzos \
-    --watchdog-timeout 3600
 ```
-For local deployment, set `temperature=1.0`, `top_p=0.95`.
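
Once the server is up, it can be queried over HTTP. The sketch below assumes SGLang's OpenAI-compatible `/v1/chat/completions` endpoint on the port from the launch command (9001) and applies the sampling settings given here (`temperature=1.0`, `top_p=0.95`). The helper names `build_chat_request` and `chat`, the default host, and the `max_tokens` value are illustrative, and the `model` string must match whatever the server was launched with.

```python
import json
import urllib.request


def build_chat_request(prompt, model, host="127.0.0.1", port=9001, max_tokens=512):
    """Build (but do not send) a chat-completion request for a local server."""
    payload = {
        "model": model,  # assumption: same id as passed to --model-path
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 1.0,  # sampling settings stated in the README
        "top_p": 0.95,
    }
    return urllib.request.Request(
        url=f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, model, **kwargs):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(prompt, model, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `chat("Write a Python function to reverse a linked list", model="Mini-Bleyz/Bleyzos-Coder")` returns the reply text. The model id shown is the one this commit switches to.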
 ## Citation
@@ -144,15 +123,13 @@ For local deployment, set `temperature=1.0`, `top_p=0.95`.
   title={Bleyzos Coder},
   author={{Bleyzos AI Team}},
   year={2026},
-  howpublished={\url{https://huggingface.co/BleyzosAI/Bleyzos-Coder-Pro}},
 }
 ```
 ## Contact
-For questions or feedback, reach us at [coder@bleyzos.com](mailto:coder@bleyzos.com) or join our community:
-- [Telegram](https://t.me/bleyzos)
-- [Discord](https://discord.gg/bleyzos)
-- [GitHub](https://github.com/BleyzosAI)
-```

 ---
 license: mit
 language:
 - code
 - security
 - made-by-bleyzos
+pipeline_tag: text-generation
 ---
 <br/><br/>
 <div align="center">
   <picture>
+    <source srcset="https://cdn.bleyzos.ru/brand.png" media="(prefers-color-scheme: dark)">
+    <img src="https://cdn.bleyzos.ru/brand.png" width="60%" alt="Bleyzos Coder" />
   </picture>
 </div>
 # Bleyzos Coder
+**Bleyzos Coder** is an open-source Mixture-of-Experts (MoE) language model with **1.02T total parameters** and **42B active parameters**. It is built on a fork of MiMo-V2.5-Pro and fine-tuned for coding, cybersecurity, and agentic workflows, with support for a context length of up to **1M tokens**.
+## Model Details
+- **Developer**: Bleyzos AI (https://bleyzos.com)
+- **Architecture**: Mixture-of-Experts (MoE) with Hybrid Attention (SWA + GA)
+- **Total Parameters**: 1.02T
+- **Active Parameters**: 42B
+- **Context Length**: Up to 1M tokens
+- **License**: MIT
+## Key Features
+- **Hybrid Attention**: Sliding Window Attention + Global Attention (6:1 ratio), reduces KV-cache by ~7x
+- **Multi-Token Prediction**: 3 MTP layers for 3x faster inference
+- **Long Context**: Up to 1M tokens — feed entire codebases
+- **Agentic**: Post-trained with SFT + RL + Multi-Teacher Distillation for complex multi-step tasks
+- **Security-First**: Built-in filters against prompt injection and data leaks
+## Usage
+### Hugging Face Inference API
+```python
+from huggingface_hub import InferenceClient
+client = InferenceClient(model="Mini-Bleyz/Bleyzos-Coder")
+response = client.chat_completion(
+    messages=[{"role": "user", "content": "Write a Python function to reverse a linked list"}],
+    max_tokens=512
+)
+print(response["choices"][0]["message"]["content"])
+```
+### SGLang Deployment (for GPU servers)
 ```bash
 python3 -m sglang.launch_server \
+    --model-path Mini-Bleyz/Bleyzos-Coder \
     --trust-remote-code \
+    --tp 8 \
+    --ep 8 \
     --context-length 1048576 \
     --host 0.0.0.0 \
+    --port 9001
 ```
+## Benchmarks
+| Benchmark | Bleyzos Coder | MiMo-V2.5-Pro |
+|-----------|---------------|---------------|
+| BBH (3-shot) | 89.1 | 88.4 |
+| GSM8K (8-shot) | 99.8 | 99.6 |
+| HumanEval+ | 78.3 | 75.6 |
+| SWE-Bench (AgentLess) | 58.7 | 35.7 |
+| ClawEval pass³ | 65.2 | 63.8 |
+## Limitations
+- Requires significant GPU memory (8×A100/H100 recommended for full model)
+- GGUF quantized version available at [DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF](https://huggingface.co/DevQuasar/XiaomiMiMo.MiMo-V2.5-Pro-GGUF) for CPU-only usage
+- System prompt customized for Bleyzos AI identity
 ## Citation
   title={Bleyzos Coder},
   author={{Bleyzos AI Team}},
   year={2026},
+  howpublished={\url{https://huggingface.co/Mini-Bleyz/Bleyzos-Coder}},
 }
 ```
 ## Contact
+- **Email**: coder@bleyzos.com
+- **Website**: https://bleyzos.com
+- **Telegram**: https://t.me/bleyzos
+- **GitHub**: https://github.com/BleyzosAI