zeekay committed bc0f9de (verified) · 1 parent: f41e0d4

Upload README.md with huggingface_hub

Files changed (1): README.md (new file, +228 -0)

---
license: mit
language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
tags:
- zen
- code
- moe
- glm
- coding
- programming
- software-engineering
base_model: zai-org/GLM-4.7-Flash
model-index:
- name: zen-coder-flash
  results:
  - task:
      type: text-generation
      name: Code Generation
    dataset:
      name: SWE-bench Verified
      type: swe-bench
    metrics:
    - type: accuracy
      value: 59.2
      name: SWE-bench Verified
  - task:
      type: text-generation
      name: Mathematical Reasoning
    dataset:
      name: AIME 2025
      type: aime
    metrics:
    - type: accuracy
      value: 91.6
      name: AIME 2025
---

# Zen Coder Flash ⚡

<div align="center">
<img src="https://zenlm.org/logo.png" alt="Zen AI" width="200"/>

**The Flagship Zen Coder Model**

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![HuggingFace](https://img.shields.io/badge/🤗-zenlm%2Fzen--coder--flash-blue)](https://huggingface.co/zenlm/zen-coder-flash)
</div>

## Overview

**Zen Coder Flash** is the flagship code-focused model in the Zen AI family. It is built on GLM-4.7-Flash, a Mixture-of-Experts (MoE) model that activates only about 3B of its 31B parameters per token, so per-token compute scales with the active parameters while all 31B must still be held in memory.

| Attribute | Value |
|-----------|-------|
| **Parameters** | 31B total / 3B active per token (MoE) |
| **Context Length** | 131,072 tokens |
| **Base Model** | [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) |
| **License** | MIT |
| **Languages** | 100+ programming languages |

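The practical effect of the 31B-total / 3B-active split can be estimated with back-of-envelope arithmetic. The sketch below assumes the common rule of thumb of roughly 2 FLOPs per active parameter per generated token and 2 bytes per parameter for bf16 weights; it ignores the KV cache, activations, and attention cost over long contexts, so treat the results as rough lower bounds, not measurements.

```python
# Back-of-envelope MoE estimates for a 31B-total / 3B-active model.
# Assumptions (rules of thumb, not measurements):
#   - a forward pass costs ~2 FLOPs per *active* parameter per token
#   - bf16 weights take 2 bytes per parameter, and *all* experts stay resident
TOTAL_PARAMS = 31e9
ACTIVE_PARAMS = 3e9
BYTES_PER_PARAM_BF16 = 2


def moe_estimates(total: float, active: float, bytes_per_param: int) -> dict:
    """Return rough per-token compute and weight-memory figures."""
    return {
        "active_fraction": active / total,
        "gflops_per_token_moe": 2 * active / 1e9,
        "gflops_per_token_dense_equiv": 2 * total / 1e9,
        "weight_memory_gb": total * bytes_per_param / 1e9,
    }


if __name__ == "__main__":
    est = moe_estimates(TOTAL_PARAMS, ACTIVE_PARAMS, BYTES_PER_PARAM_BF16)
    print(f"Active fraction:        {est['active_fraction']:.1%}")
    print(f"Compute per token:      ~{est['gflops_per_token_moe']:.0f} GFLOPs")
    print(f"Dense 31B equivalent:   ~{est['gflops_per_token_dense_equiv']:.0f} GFLOPs")
    print(f"bf16 weights in memory: ~{est['weight_memory_gb']:.0f} GB")
```

Under these assumptions, per-token compute is about a tenth of a dense 31B model (~6 vs. ~62 GFLOPs), but the weight footprint is not reduced: roughly 62 GB in bf16 before the KV cache, which is why serving usually shards the weights across GPUs with tensor parallelism.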
## Why Zen Coder Flash?

- **59.2% on SWE-bench Verified** vs. 22% for Qwen3-30B: about **2.7x** the resolve rate on real-world repository issues
- **Efficient MoE**: 31B parameters in total, but only about 3B active per token
- **131K context**: fits large multi-file slices of a codebase in a single prompt
- **Native tool calling**: emits structured function calls for tools you define; your code executes them and returns the results
- **Reasoning mode**: extended chain-of-thought for complex problems

## Performance

| Benchmark | Score | Δ vs. Qwen3-30B (percentage points) |
|-----------|-------|-------------------------------------|
| SWE-bench Verified | **59.2%** | +37.2 (2.7x) |
| AIME 2025 | **91.6%** | +6.6 |
| GPQA | **75.2%** | +1.8 |
| τ²-Bench | **79.5%** | +30.5 |

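The last column reports percentage-point differences, which are easy to confuse with relative improvements. The sketch below recovers the implied Qwen3-30B baseline for each row (score minus delta) and the ratio behind the "2.7x" figure; its only inputs are the numbers in the table.

```python
# Recover implied Qwen3-30B baselines and relative gains from the Performance table.
# Inputs are copied from the table: (score in %, delta in percentage points).
TABLE = {
    "SWE-bench Verified": (59.2, 37.2),
    "AIME 2025": (91.6, 6.6),
    "GPQA": (75.2, 1.8),
    "τ²-Bench": (79.5, 30.5),
}


def compare(score: float, delta_pts: float) -> tuple:
    """Return (baseline %, ratio score/baseline, relative gain in %)."""
    baseline = round(score - delta_pts, 1)  # round away float noise, e.g. 22.000000000000004
    ratio = score / baseline
    return baseline, ratio, (ratio - 1) * 100


if __name__ == "__main__":
    for name, (score, delta) in TABLE.items():
        baseline, ratio, rel = compare(score, delta)
        print(f"{name:<20} {score:5.1f}% vs {baseline:5.1f}%  "
              f"(+{delta:.1f} pts, {ratio:.2f}x, +{rel:.0f}% relative)")
```

For example, SWE-bench Verified's +37.2 points over a 22.0% baseline is a ratio of about 2.7x (roughly +169% relative), whereas GPQA's +1.8 points corresponds to only about a 2% relative gain.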
## Zen Coder Family

| Tier | Model | Parameters | Active | Use Case |
|------|-------|------------|--------|----------|
| Small | [zen-coder-4b](https://huggingface.co/zenlm/zen-coder) | 4B | 4B | Edge/mobile |
| **Flagship** | **zen-coder-flash** | **31B MoE** | **3B** | **Balanced** |
| Max | [zen-max](https://huggingface.co/zenlm/zen-max) | 671B MoE | 14B | Frontier |

## Quick Start

### Transformers

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "zenlm/zen-coder-flash"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load weights in bfloat16 and let Accelerate place them across the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes",
    }
]

# Render the chat template and tokenize in one step;
# return_dict=True returns input_ids together with attention_mask
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

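When reasoning mode is active, the decoded text may contain the model's chain-of-thought before the final answer. The helper below is a sketch that assumes the trace ends with a `</think>` tag, as in GLM-4.5-family chat templates; check this model's `chat_template` to confirm, and note that `skip_special_tokens=True` may strip the tags themselves if the tokenizer registers them as special tokens. The sample string is made up.

```python
def split_reasoning(text: str, open_tag: str = "<think>", close_tag: str = "</think>") -> tuple:
    """Split decoded output into (reasoning, answer).

    Assumes the reasoning trace ends with `close_tag`. The opening tag may be
    missing if the chat template already emitted it as part of the prompt.
    Returns ("", text) when no closing tag is found.
    """
    head, sep, tail = text.partition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop everything up to and including the opening tag, if it is present
    reasoning = head.split(open_tag, 1)[-1]
    return reasoning.strip(), tail.strip()


if __name__ == "__main__":
    sample = "<think>Use a boolean sieve up to n.</think>\ndef primes(n): ..."
    reasoning, answer = split_reasoning(sample)
    print("reasoning:", reasoning)
    print("answer:", answer)
```

Serving through vLLM or SGLang with `--reasoning-parser glm45` should make this step unnecessary, since the server returns the reasoning separately from the answer.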
### vLLM (Recommended for Production)

```bash
vllm serve zenlm/zen-coder-flash \
  --tensor-parallel-size 4 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice
```

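`vllm serve` exposes an OpenAI-compatible HTTP API, by default at `http://localhost:8000/v1`. The sketch below sends a chat request using only the Python standard library; the base URL, sampling settings, and prompt are assumptions to adjust for your deployment. With `--reasoning-parser` enabled, the reasoning trace comes back in a separate message field whose name has varied across vLLM versions, so the code checks both `reasoning_content` and `reasoning`.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Assumed endpoint: vLLM's default OpenAI-compatible server address.
BASE_URL = "http://localhost:8000/v1"
MODEL = "zenlm/zen-coder-flash"


def build_chat_request(prompt: str, max_tokens: int = 1024, temperature: float = 0.7) -> dict:
    """Build an OpenAI-style /chat/completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt: str, base_url: str = BASE_URL) -> Tuple[Optional[str], str]:
    """POST the request and return (reasoning, answer) from the first choice."""
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        message = json.load(resp)["choices"][0]["message"]
    # The parsed reasoning field name differs between vLLM versions.
    reasoning = message.get("reasoning_content") or message.get("reasoning")
    return reasoning, message.get("content") or ""


if __name__ == "__main__":
    try:
        _, answer = chat("Write a Go function that reverses a slice in place.")
        print(answer)
    except OSError as exc:  # urllib.error.URLError subclasses OSError
        print(f"Could not reach the server at {BASE_URL}: {exc}")
```

Any OpenAI-compatible client library would work equally well; the standard library is used here only to keep the example dependency-free.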
### SGLang

```bash
python -m sglang.launch_server \
  --model-path zenlm/zen-coder-flash \
  --tp-size 4 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3
```

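### MLX (Apple Silicon)

```python
from mlx_lm import load, generate

model, tokenizer = load("zenlm/zen-coder-flash")

# Wrap the request in the chat template so the instruction-tuned model sees its expected format
messages = [{"role": "user", "content": "Write a Rust function for binary search"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, max_tokens=256)
print(response)
```
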
## Capabilities

### Code Generation
- 100+ programming languages
- Framework-aware completions
- Test generation
- Documentation generation

### Debugging & Analysis
- Bug detection and fixes
- Code review
- Performance optimization
- Security analysis

### Software Engineering
- Architecture design
- API design
- Refactoring suggestions
- Migration assistance

### Tool Calling

```python
# Tool schemas use the OpenAI-style function format: pass this list as `tools=`
# to `tokenizer.apply_chat_template` or in an OpenAI-compatible server request
tools = [
    {
        "type": "function",
        "function": {
            "name": "run_tests",
            "description": "Run test suite",
            "parameters": {"type": "object", "properties": {}},
        },
    }
]
```

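The model only proposes calls; your code runs them. In an OpenAI-compatible response (for example from the vLLM command above with `--enable-auto-tool-choice`), proposed calls arrive in `choices[0].message.tool_calls`, each with a function `name` and a JSON-encoded `arguments` string. The sketch below dispatches them to local Python functions; `run_tests` is a hypothetical stub matching the schema above, and the sample message is hand-written, not real model output.

```python
import json
from typing import Any, Callable, Dict, List


def run_tests() -> str:
    """Hypothetical stub standing in for a real `run_tests` implementation."""
    return "all tests passed (stub)"


# Map tool names from the schema to local callables.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"run_tests": run_tests}


def execute_tool_calls(message: dict) -> List[dict]:
    """Run each proposed tool call and build `role: "tool"` messages to send back."""
    results = []
    for call in message.get("tool_calls") or []:
        name = call["function"]["name"]
        fn = TOOL_REGISTRY.get(name)
        args = json.loads(call["function"].get("arguments") or "{}")
        output = fn(**args) if fn else f"error: unknown tool {name!r}"
        results.append({"role": "tool", "tool_call_id": call["id"], "content": str(output)})
    return results


if __name__ == "__main__":
    # Hand-written assistant message containing one tool call.
    sample_message = {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {"id": "call_0", "type": "function",
             "function": {"name": "run_tests", "arguments": "{}"}},
        ],
    }
    for msg in execute_tool_calls(sample_message):
        print(msg)
```

To continue the conversation, append the assistant message and the returned tool messages to `messages` and send the request again so the model can use the results.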
## Identity

I am **Zen Coder Flash**, the flagship code-focused model in the Zen AI family. I combine GLM-4.7-Flash's Mixture-of-Experts architecture with Zen's philosophy of clarity and efficiency. With 31 billion parameters (only about 3B active per token) and a 131K-token context window, I deliver strong coding capability that is practical to deploy.

## Training

Zen Coder Flash is built by identity fine-tuning GLM-4.7-Flash with LoRA (low-rank adaptation) using MLX on Apple Silicon. The training emphasizes:

- Zen identity and persona
- Code-focused instruction following
- Tool calling capabilities
- Extended reasoning patterns

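The training data and hyperparameters are not published here. As a rough illustration only: mlx-lm's LoRA trainer reads a data directory containing `train.jsonl` and `valid.jsonl`, and one of the formats it accepts is chat-style records with a `messages` list. The sketch below writes a few hypothetical identity and coding examples in that shape; the conversations, the split rule, and the `data/` path are made up and are not the authors' dataset.

```python
import json
from pathlib import Path

# Hypothetical examples; the actual Zen training data is not published.
EXAMPLES = [
    {"messages": [
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant", "content": "I am Zen Coder Flash, the flagship code-focused model in the Zen AI family."},
    ]},
    {"messages": [
        {"role": "user", "content": "Reverse a string in Python."},
        {"role": "assistant", "content": "Use slicing: `s[::-1]`."},
    ]},
]


def write_split(records: list, out_dir: Path, valid_every: int = 2) -> tuple:
    """Write chat records as JSON Lines, sending every `valid_every`-th record to valid.jsonl."""
    out_dir.mkdir(parents=True, exist_ok=True)
    train, valid = [], []
    for i, rec in enumerate(records):
        (valid if (i + 1) % valid_every == 0 else train).append(rec)
    for name, rows in (("train.jsonl", train), ("valid.jsonl", valid)):
        with open(out_dir / name, "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return len(train), len(valid)


if __name__ == "__main__":
    print(write_split(EXAMPLES, Path("data")))
```

With files like these in place, a run would be launched with the `mlx_lm.lora` command (using options such as `--model`, `--train`, and `--data data/`); check `mlx_lm.lora --help` for what your installed version supports.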
## Citation

```bibtex
@misc{zen-coder-flash-2025,
  title={Zen Coder Flash: Efficient Frontier Code Generation},
  author={{Hanzo AI}},
  year={2025},
  url={https://huggingface.co/zenlm/zen-coder-flash}
}
```

214
+
215
+ ## Links
216
+
217
+ - **Website**: [zenlm.org](https://zenlm.org)
218
+ - **GitHub**: [zenlm/zen](https://github.com/zenlm/zen)
219
+ - **Base Model**: [GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash)
220
+ - **Organization**: [Hanzo AI](https://hanzo.ai)
221
+
## License

MIT License, inherited from the GLM-4.7-Flash base model.

---

*Zen AI: Clarity Through Intelligence*