zeekay committed
Commit 2a46e3e · verified · Parent: d97598d

Update model card: add zen/zenlm tags, fix branding

Files changed (1):
  1. README.md +35 -198

README.md CHANGED
@@ -1,227 +1,64 @@
  ---
- license: mit
- language:
- - en
- - zh
- library_name: transformers
- pipeline_tag: text-generation
  tags:
- - zen
- - code
- - moe
- - coding
- - programming
- - software-engineering
- base_model: zenlm/zen-coder-flash
- model-index:
- - name: zen-coder-flash
-   results:
-   - task:
-       type: text-generation
-       name: Code Generation
-     dataset:
-       name: SWE-bench Verified
-       type: swe-bench
-     metrics:
-     - type: accuracy
-       value: 59.2
-       name: SWE-bench Verified
-   - task:
-       type: text-generation
-       name: Mathematical Reasoning
-     dataset:
-       name: AIME 2025
-       type: aime
-     metrics:
-     - type: accuracy
-       value: 91.6
-       name: AIME 2025
  ---

- # Zen Coder Flash
-
- <div align="center">
- <img src="https://zenlm.org/logo.png" alt="Zen AI" width="200"/>
-
- **The Flagship Zen Coder Model**

- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![HuggingFace](https://img.shields.io/badge/🤗-zenlm%2Fzen--coder--flash-blue)](https://huggingface.co/zenlm/zen-coder-flash)
- </div>

  ## Overview

- **Zen Coder Flash** is the flagship code-focused model in the Zen AI family. Built on a cutting-edge Mixture of Experts architecture, it delivers frontier coding performance with practical efficiency.
-
- | Attribute | Value |
- |-----------|-------|
- | **Parameters** | 31B total / 3B active (MoE) |
- | **Context Length** | 131,072 tokens |
- | **Architecture** | Mixture of Experts (MoE) |
- | **License** | MIT |
- | **Languages** | 100+ programming languages |
-
- ## Why Zen Coder Flash?
-
- - **59.2% on SWE-bench Verified**: nearly **3x better** than comparable models at real coding tasks
- - **Efficient MoE**: 31B parameters, but only 3B active per token
- - **131K context**: handle entire codebases in a single prompt
- - **Native tool calling**: built-in function execution support
- - **Reasoning mode**: extended chain-of-thought for complex problems

- ## Performance
-
- | Benchmark | Score | Improvement |
- |-----------|-------|-------------|
- | SWE-bench Verified | **59.2%** | +37.2% (2.7x) |
- | AIME 2025 | **91.6%** | +6.6% |
- | GPQA | **75.2%** | +1.8% |
- | τ²-Bench | **79.5%** | +30.5% |
-
- ## Zen Coder Family
-
- | Tier | Model | Parameters | Active | Use Case |
- |------|-------|------------|--------|----------|
- | Small | [zen-coder-4b](https://huggingface.co/zenlm/zen-coder) | 4B | 4B | Edge/mobile |
- | **Flagship** | **zen-coder-flash** | **31B MoE** | **3B** | **Balanced** |
- | Max | [zen-max](https://huggingface.co/zenlm/zen-max) | 671B MoE | 14B | Frontier |

  ## Quick Start

- ### Transformers
-
  ```python
- import torch
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "zenlm/zen-coder-flash"
-
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForCausalLM.from_pretrained(
-     model_id,
-     torch_dtype=torch.bfloat16,
-     device_map="auto",
- )
-
- messages = [{"role": "user", "content": "Write a Python function to find all prime numbers up to n using the Sieve of Eratosthenes"}]
-
- inputs = tokenizer.apply_chat_template(
-     messages,
-     tokenize=True,
-     add_generation_prompt=True,
-     return_dict=True,
-     return_tensors="pt",
- ).to(model.device)
-
- outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)
- response = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
- print(response)
- ```

- ### vLLM (Recommended for Production)
-
- ```bash
- vllm serve zenlm/zen-coder-flash \
-     --tensor-parallel-size 4 \
-     --speculative-config.method mtp \
-     --speculative-config.num_speculative_tokens 1 \
-     --tool-call-parser zen-coder \
-     --reasoning-parser zen-coder \
-     --enable-auto-tool-choice
  ```

- ### SGLang

  ```bash
- python -m sglang.launch_server \
-     --model-path zenlm/zen-coder-flash \
-     --tp-size 4 \
-     --tool-call-parser zen-coder \
-     --reasoning-parser zen-coder \
-     --speculative-algorithm EAGLE \
-     --speculative-num-steps 3
- ```
-
- ### MLX (Apple Silicon)
-
- ```python
- from mlx_lm import load, generate
-
- model, tokenizer = load("zenlm/zen-coder-flash")
- response = generate(model, tokenizer, prompt="Write a Rust function for binary search", max_tokens=256)
- print(response)
- ```
-
- ## Capabilities
-
- ### Code Generation
- - 100+ programming languages
- - Framework-aware completions
- - Test generation
- - Documentation generation
-
- ### Debugging & Analysis
- - Bug detection and fixes
- - Code review
- - Performance optimization
- - Security analysis
-
- ### Software Engineering
- - Architecture design
- - API design
- - Refactoring suggestions
- - Migration assistance
-
- ### Tool Calling
- ```python
- # Native function calling support
- tools = [
-     {
-         "type": "function",
-         "function": {
-             "name": "run_tests",
-             "description": "Run test suite",
-             "parameters": {"type": "object", "properties": {}}
-         }
-     }
- ]
- ```
-
- ## Identity
-
- I am **Zen Coder Flash**, the flagship code-focused model in the Zen AI family. I combine a cutting-edge MoE architecture with Zen's philosophy of clarity and efficiency. With 31 billion parameters (only 3B active per token) and 131K context, I deliver frontier coding capability that's practical to deploy.
-
- ## Training
-
- Zen Coder Flash is built through identity fine-tuning using MLX LoRA on Apple Silicon. The training emphasizes:
-
- - Zen identity and persona
- - Code-focused instruction following
- - Tool calling capabilities
- - Extended reasoning patterns
-
- ## Citation
-
- ```bibtex
- @misc{zen-coder-flash-2025,
-   title={Zen Coder Flash: Efficient Frontier Code Generation},
-   author={Hanzo AI},
-   year={2025},
-   url={https://huggingface.co/zenlm/zen-coder-flash}
- }
  ```

- ## Links

- - **Website**: [zenlm.org](https://zenlm.org)
- - **GitHub**: [zenlm/zen](https://github.com/zenlm/zen)
- - **Organization**: [Hanzo AI](https://hanzo.ai)

  ## License

- MIT License
-
- ---
-
- *Zen AI: Clarity Through Intelligence*

  ---
+ language: en
+ license: apache-2.0
  tags:
+ - text-generation
+ - zen
+ - zenlm
+ - hanzo
+ - code
+ - coding
+ - fast
+ pipeline_tag: text-generation
+ library_name: transformers
  ---

+ # Zen Coder Flash

+ Ultra-fast, compact code-generation model optimized for real-time completions.

  ## Overview

+ Built on the **Zen MoDE (Mixture of Distilled Experts)** architecture, with 4B parameters and a 64K context window.

+ Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).

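The card does not spell out how MoDE routes tokens, but the core idea behind any mixture-of-experts design is that a router picks a few experts per token and mixes their outputs. A rough, stdlib-only sketch of top-k routing (illustrative only, not Zen's implementation):

```python
# Illustrative top-k expert routing, the basic mechanism behind
# mixture-of-experts designs. Hypothetical sketch, not the model's code.
import math

def route(router_logits, k=2):
    """Pick the top-k experts for a token and softmax-normalize their weights."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    exps = [math.exp(router_logits[i]) for i in top]
    total = sum(exps)
    # Only these k experts run for this token; weights sum to 1.
    return [(i, e / total) for i, e in zip(top, exps)]

experts = route([0.1, 2.0, -1.0, 1.3], k=2)
print([i for i, _ in experts])  # -> [1, 3]
```

This is why a model can have many total parameters while only a small subset is active per token: the other experts are simply never evaluated.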
  ## Quick Start

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_id = "zenlm/zen-coder-flash"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

+ messages = [{"role": "user", "content": "Hello!"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
  ```

+ ## API Access

  ```bash
+ curl https://api.hanzo.ai/v1/chat/completions \
+     -H "Authorization: Bearer $HANZO_API_KEY" \
+     -H "Content-Type: application/json" \
+     -d '{"model": "zen-coder-flash", "messages": [{"role": "user", "content": "Hello"}]}'
  ```

+ Get your API key at [console.hanzo.ai](https://console.hanzo.ai) — $5 free credit on signup.
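The same request can be built from Python with only the standard library. A minimal sketch, assuming the endpoint is OpenAI-compatible as the curl payload above suggests (URL, model name, and the `HANZO_API_KEY` variable are taken from that example):

```python
# Build the chat-completions request shown in the curl example above.
# Assumes an OpenAI-compatible API; sending is left commented out.
import json
import os
import urllib.request

def build_request(prompt, model="zen-coder-flash",
                  url="https://api.hanzo.ai/v1/chat/completions"):
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('HANZO_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )

req = build_request("Hello")
print(json.loads(req.data)["model"])  # -> zen-coder-flash

# Uncomment to actually send (requires a valid HANZO_API_KEY):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```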

+ ## Model Details

+ | Attribute | Value |
+ |-----------|-------|
+ | Parameters | 4B |
+ | Architecture | Zen MoDE |
+ | Context | 64K tokens |
+ | License | Apache 2.0 |

  ## License
+ Apache 2.0