+---
+license: mit
+language:
+- en
+- zh
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- zen
+- code
+- moe
+- glm
+- coding
+- programming
+- software-engineering
+base_model: zai-org/GLM-4.7-Flash
+model-index:
+- name: zen-coder-flash
+  results:
+  - task:
+      type: text-generation
+      name: Code Generation
+    dataset:
+      name: SWE-bench Verified
+      type: swe-bench
+    metrics:
+    - type: accuracy
+      value: 59.2
+      name: SWE-bench Verified
+  - task:
+      type: text-generation
+      name: Mathematical Reasoning
+    dataset:
+      name: AIME 2025
+      type: aime
+    metrics:
+    - type: accuracy
+      value: 91.6
+      name: AIME 2025
+---
+# Zen Coder Flash ⚡
+<div align="center">
+  <img src="https://zenlm.org/logo.png" alt="Zen AI" width="200"/>
+  **The Flagship Zen Coder Model**
+  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
+  [![HuggingFace](https://img.shields.io/badge/🤗-zenlm%2Fzen--coder--flash-blue)](https://huggingface.co/zenlm/zen-coder-flash)
+</div>
+## Overview
+**Zen Coder Flash** is the flagship code-focused model in the Zen AI family. Built on GLM-4.7-Flash's Mixture-of-Experts (MoE) architecture, it activates only about 3B of its 31B parameters per token, pairing strong coding benchmark results with low per-token compute.
+| Attribute | Value |
+|-----------|-------|
+| **Parameters** | 31B total / 3B active (MoE) |
+| **Context Length** | 131,072 tokens |
+| **Base Model** | [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) |
+| **License** | MIT |
+| **Languages** | 100+ programming languages |
+## Why Zen Coder Flash?
+- **59.2% on SWE-bench Verified** vs 22% for Qwen3-30B: about **2.7x** the score on real-world coding tasks
+- **Efficient MoE**: 31B params but only 3B active per token
+- **131K context**: Fit large, multi-file codebases in a single prompt (131,072 tokens)
+- **Native tool calling**: Built-in function execution support
+- **Reasoning mode**: Extended chain-of-thought for complex problems
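As a rough illustration of what the 31B-total / 3B-active split means for deployment, the sketch below estimates weight storage at bfloat16 (2 bytes per parameter, the dtype used in the Transformers Quick Start). The helper name `weight_gb` is hypothetical, the parameter counts are the rounded figures from this card, and the estimate covers weights only: KV cache and activations are not included.

```python
# Back-of-envelope weight footprint using the card's rounded parameter counts.
TOTAL_PARAMS = 31e9    # every expert must be resident in memory
ACTIVE_PARAMS = 3e9    # parameters actually used for each token
BYTES_PER_PARAM = 2    # bfloat16


def weight_gb(n_params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


resident_gb = weight_gb(TOTAL_PARAMS)
per_token_gb = weight_gb(ACTIVE_PARAMS)
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"resident weights: {resident_gb:.0f} GB")
print(f"weights used per token: {per_token_gb:.0f} GB")
print(f"active fraction: {active_fraction:.1%}")
```

The takeaway is that memory has to be sized for all 31B parameters, while per-token compute scales with the roughly 3B that are active.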
+## Performance
+| Benchmark | Score | Δ vs Qwen3-30B (percentage points) |
+|-----------|-------|--------------|
+| SWE-bench Verified | **59.2%** | +37.2 (2.7x) |
+| AIME 2025 | **91.6%** | +6.6 |
+| GPQA | **75.2%** | +1.8 |
+| τ²-Bench | **79.5%** | +30.5 |
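The deltas in the table read most naturally as percentage-point differences rather than relative improvements. Under that assumption, the short sketch below recovers the implied Qwen3-30B baseline for each row and the resulting score ratio. The baselines are derived here by subtraction and are not separately reported in this card; `implied_baseline` is a hypothetical helper name.

```python
# (score, delta vs Qwen3-30B) pairs copied from the table above.
results = {
    "SWE-bench Verified": (59.2, 37.2),
    "AIME 2025": (91.6, 6.6),
    "GPQA": (75.2, 1.8),
    "τ²-Bench": (79.5, 30.5),
}


def implied_baseline(score: float, delta: float) -> float:
    """Baseline score implied by treating `delta` as percentage points."""
    return round(score - delta, 1)


for name, (score, delta) in results.items():
    base = implied_baseline(score, delta)
    print(f"{name}: implied baseline {base}, ratio {score / base:.2f}x")
```

For SWE-bench Verified this gives an implied baseline of 22.0, which matches the 22% figure quoted earlier in the card.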
+## Zen Coder Family
+| Tier | Model | Parameters | Active | Use Case |
+|------|-------|------------|--------|----------|
+| Small | [zen-coder-4b](https://huggingface.co/zenlm/zen-coder) | 4B | 4B | Edge/mobile |
+| **Flagship** | **zen-coder-flash** | **31B MoE** | **3B** | **Balanced** |
+| Max | [zen-max](https://huggingface.co/zenlm/zen-max) | 671B MoE | 14B | Frontier |
+## Quick Start
+### Transformers
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+model_id = "zenlm/zen-coder-flash"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForCausalLM.from_pretrained(
+    model_id,
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+messages = [{"role": "user", "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes"}]
+inputs = tokenizer.apply_chat_template(
+    messages,
+    tokenize=True,
+    add_generation_prompt=True,
+    return_dict=True,
+    return_tensors="pt",
+).to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
+response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
+print(response)
+```
+### vLLM (Recommended for Production)
+```bash
+vllm serve zenlm/zen-coder-flash \
+    --tensor-parallel-size 4 \
+    --speculative-config.method mtp \
+    --speculative-config.num_speculative_tokens 1 \
+    --tool-call-parser glm47 \
+    --reasoning-parser glm45 \
+    --enable-auto-tool-choice
+```
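Once the server is running, it can be queried through vLLM's OpenAI-compatible HTTP API. The sketch below uses only the Python standard library and assumes the server is reachable at `http://localhost:8000` (vLLM's default port). The helper names `build_chat_request` and `chat` are illustrative, not part of any library.

```python
import json
import urllib.request

# Assumption: the `vllm serve` command above is running locally on the default port.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(prompt: str, max_tokens: int = 512) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": "zenlm/zen-coder-flash",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server up, `chat("Write a Go function that reverses a linked list")` returns the reply as a string. Any OpenAI-compatible client library pointed at the same base URL should work equally well.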
+### SGLang
+```bash
+python -m sglang.launch_server \
+    --model-path zenlm/zen-coder-flash \
+    --tp-size 4 \
+    --tool-call-parser glm47 \
+    --reasoning-parser glm45 \
+    --speculative-algorithm EAGLE \
+    --speculative-num-steps 3
+```
+### MLX (Apple Silicon)
+```python
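+from mlx_lm import load, generate
+model, tokenizer = load("zenlm/zen-coder-flash")
+# Wrap the request in the model's chat template so it sees the expected turn markers
+messages = [{"role": "user", "content": "Write a Rust function for binary search"}]
+prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
+response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
+print(response)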
+```
+## Capabilities
+### Code Generation
+- 100+ programming languages
+- Framework-aware completions
+- Test generation
+- Documentation generation
+### Debugging & Analysis
+- Bug detection and fixes
+- Code review
+- Performance optimization
+- Security analysis
+### Software Engineering
+- Architecture design
+- API design
+- Refactoring suggestions
+- Migration assistance
+### Tool Calling
+```python
+# Native function calling support
+tools = [
+    {
+        "type": "function",
+        "function": {
+            "name": "run_tests",
+            "description": "Run test suite",
+            "parameters": {"type": "object", "properties": {}}
+        }
+    }
+]
+```
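The schema above only declares the tool. Running it is the caller's job. The sketch below shows one way to dispatch a returned call to a local function and build the follow-up `tool` message. It assumes tool calls arrive in the OpenAI-style shape (`function.name` plus a JSON-encoded `function.arguments` string), which is what OpenAI-compatible servers typically emit. The `run_tests` body, `TOOL_REGISTRY`, and `dispatch_tool_call` are hypothetical placeholders.

```python
import json


def run_tests() -> dict:
    """Hypothetical stand-in for the `run_tests` tool declared above."""
    return {"passed": 0, "failed": 0}


# Map tool names from the schema to local callables.
TOOL_REGISTRY = {"run_tests": run_tests}


def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute one OpenAI-style tool call and build the follow-up `tool` message."""
    fn = tool_call["function"]
    name = fn["name"]
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    # Arguments arrive as a JSON-encoded string; an empty string means no arguments.
    args = json.loads(fn.get("arguments") or "{}")
    result = TOOL_REGISTRY[name](**args)
    return {
        "role": "tool",
        "tool_call_id": tool_call.get("id"),
        "content": json.dumps(result),
    }


example_call = {"id": "call_0", "function": {"name": "run_tests", "arguments": "{}"}}
print(dispatch_tool_call(example_call))
```

The returned message is appended to the conversation and sent back to the model so it can continue with the tool result in context.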
+## Identity
+I am **Zen Coder Flash**, the flagship code-focused model in the Zen AI family. I combine GLM-4.7-Flash's MoE architecture with Zen's philosophy of clarity and efficiency. With 31 billion parameters (only 3B active per token) and a 131K-token context, I deliver frontier coding capability that's practical to deploy.
+## Training
+Zen Coder Flash is produced by identity fine-tuning GLM-4.7-Flash with LoRA, using MLX on Apple Silicon. The training emphasizes:
+- Zen identity and persona
+- Code-focused instruction following
+- Tool calling capabilities
+- Extended reasoning patterns
+## Citation
+```bibtex
+@misc{zen-coder-flash-2025,
+  title={Zen Coder Flash: Efficient Frontier Code Generation},
+  author={Hanzo AI},
+  year={2025},
+  url={https://huggingface.co/zenlm/zen-coder-flash}
+}
+```
+## Links
+- **Website**: [zenlm.org](https://zenlm.org)
+- **GitHub**: [zenlm/zen](https://github.com/zenlm/zen)
+- **Base Model**: [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash)
+- **Organization**: [Hanzo AI](https://hanzo.ai)
+## License
+MIT License, inherited from the GLM-4.7-Flash base model.
+---
+*Zen AI: Clarity Through Intelligence*