zeekay committed on
Commit d15f55a · verified · 1 Parent(s): e90fc9a

Update model card: add zen/zenlm tags, fix branding

Files changed (1):
  1. README.md +30 -160

README.md CHANGED
@@ -1,192 +1,62 @@
  ---
- language:
- - en
- - zh
- - ja
- - ko
- - fr
- - de
- - es
- - it
- - pt
- - ru
  license: apache-2.0
  tags:
- - text-generation
- - instruction-following
- - reasoning
- - zenlm
- - zen
  pipeline_tag: text-generation
  ---

- # Zen Pro 8B

- **Professional-grade 8B language model with three specialized variants: instruct, thinking, and agent.**

- Zen Pro is Zen LM's 8B professional model, designed for production workloads requiring strong instruction following, multi-step reasoning, and tool use. It runs efficiently on a single consumer GPU (16 GB VRAM) while delivering quality competitive with much larger models on structured tasks.

- ## Model Variants

- | Variant | HuggingFace | Best For |
- |---------|-------------|----------|
- | **zen-pro-instruct** | [zenlm/zen-pro-instruct](https://huggingface.co/zenlm/zen-pro-instruct) | Chat, Q&A, summarization, drafting |
- | **zen-pro-thinking** | [zenlm/zen-pro-thinking](https://huggingface.co/zenlm/zen-pro-thinking) | Complex reasoning, math, analysis |
- | **zen-pro-agent** | [zenlm/zen-pro-agent](https://huggingface.co/zenlm/zen-pro-agent) | Tool use, API calls, automation |
-
- ## Model Specs
-
- | Property | Value |
- |----------|-------|
- | Parameters | 8B |
- | Architecture | Transformer (decoder-only) |
- | Context Window | 32,768 tokens |
- | License | Apache 2.0 |
- | Quantization | SafeTensors (BF16), GGUF (Q4_K_M, Q5_K_M, Q8_0), MLX |

  ## Quick Start

- ### Instruct (chat and general tasks)
-
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
- import torch
-
- model = AutoModelForCausalLM.from_pretrained(
-     "zenlm/zen-pro-instruct",
-     torch_dtype=torch.bfloat16,
-     device_map="auto"
- )
- tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-pro-instruct")
-
- messages = [
-     {"role": "system", "content": "You are Zen Pro, a professional AI assistant."},
-     {"role": "user", "content": "Summarize the key differences between REST and GraphQL APIs."}
- ]

  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.6)
- print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
- ```
-
- ### Thinking (complex reasoning)
-
- ```python
- # Enable extended reasoning for hard problems
- messages = [
-     {"role": "user", "content": "A company has 3 products with 40%, 35%, and 25% market share. "
-                                 "Product A grows 10%/year, B shrinks 5%/year, C grows 20%/year. "
-                                 "What are the shares after 3 years?"}
- ]
-
- text = tokenizer.apply_chat_template(
-     messages, tokenize=False, add_generation_prompt=True,
-     enable_thinking=True
- )
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
- outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6)
- response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
- print(response)
- ```
-
- ### Agent (tool use)
-
- ```python
- tools = [
-     {
-         "type": "function",
-         "function": {
-             "name": "search_web",
-             "description": "Search the web for current information",
-             "parameters": {
-                 "type": "object",
-                 "properties": {
-                     "query": {"type": "string", "description": "Search query"}
-                 },
-                 "required": ["query"]
-             }
-         }
-     }
- ]
-
- messages = [{"role": "user", "content": "What's the latest in quantum computing research?"}]
- text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
- inputs = tokenizer(text, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=512)
- print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
- ```
-
- ## Hardware Requirements
-
- | Format | VRAM | Speed |
- |--------|------|-------|
- | BF16 (full) | 16 GB | Fast |
- | GGUF Q8_0 | 10 GB | Fast |
- | GGUF Q4_K_M | 6 GB | Moderate |
- | MLX 4-bit | 6 GB (Apple Silicon) | Native Metal |
-
- ## vLLM (Production)
-
- ```bash
- vllm serve zenlm/zen-pro-instruct \
-     --dtype bfloat16 \
-     --max-model-len 32768 \
-     --tensor-parallel-size 1
  ```

- ## MLX (Apple Silicon)
-
- ```bash
- pip install mlx-lm
- mlx_lm.generate --model zenlm/zen-pro-instruct \
-     --prompt "Explain the CAP theorem:" \
-     --max-tokens 512
- ```
-
- ## Hanzo API

  ```bash
  curl https://api.hanzo.ai/v1/chat/completions \
    -H "Authorization: Bearer $HANZO_API_KEY" \
    -H "Content-Type: application/json" \
-   -d '{"model": "zen-pro", "messages": [{"role": "user", "content": "Explain ACID transactions."}]}'
  ```

- ## License
-
- Apache 2.0. See [LICENSE](LICENSE) for details.
-
- ## Citation
-
- ```bibtex
- @misc{zenlm2025zenpro,
-   title={Zen Pro: Professional-Grade 8B Language Model},
-   author={Hanzo AI and Zoo Labs Foundation},
-   year={2025},
-   publisher={HuggingFace},
-   howpublished={\url{https://huggingface.co/zenlm/zen-pro}}
- }
- ```
-
- ---
-
- Part of the [Zen model family](https://zenlm.org) by [Hanzo AI](https://hanzo.ai) (Techstars '17) and [Zoo Labs Foundation](https://zoo.ngo).
-
- ## Abliteration
-
- Zen Pro 8B weights are derived from an abliterated base model. Abliteration removes refusal behavior by identifying and nullifying the "refusal direction" in the model's residual stream, as described in [Maxime Labonne's methodology](https://huggingface.co/blog/mlabonne/abliteration).
-
- - **Method**: Directional ablation on the residual stream across all layers
- - **Source**: Zen MoDE (Mixture of Distilled Experts) architecture base
- - **Effect**: Removes refusal behaviors while preserving all other capabilities
- - **Identity layer**: Zen identity added via system prompt; full LoRA fine-tuning planned
-
- Abliteration is a feature, not a limitation. It enables unrestricted research, security testing, and applications where safety guardrails are managed at the application layer rather than baked into model weights.

  ---
+ language: en
  license: apache-2.0
  tags:
+ - text-generation
+ - zen
+ - zenlm
+ - hanzo
+ - reasoning
  pipeline_tag: text-generation
+ library_name: transformers
  ---

+ # Zen Pro

+ Professional-grade general-purpose language model for complex reasoning and analysis.

+ ## Overview

+ Built on the **Zen MoDE (Mixture of Distilled Experts)** architecture, with 32B parameters and a 128K-token context window.

+ Developed by [Hanzo AI](https://hanzo.ai) and the [Zoo Labs Foundation](https://zoo.ngo).

  ## Quick Start

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

+ model_id = "zenlm/zen-pro"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

+ messages = [{"role": "user", "content": "Hello!"}]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)

  outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
  ```
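The final `print` in the snippet above decodes only the newly generated tokens: `generate` returns the prompt tokens followed by the completion, so slicing from `inputs.input_ids.shape[1]` strips the prompt. A minimal sketch of that slicing logic, using hypothetical token ids rather than a real tokenizer:

```python
# `generate` output = prompt tokens + new tokens; slice off the prompt
# length to decode only the model's reply. Token ids here are made up.
prompt_ids = [101, 2054, 2003, 102]          # hypothetical prompt token ids
generated = prompt_ids + [7592, 2088, 999]   # what generate() would return
new_tokens = generated[len(prompt_ids):]     # same idea as outputs[0][input_len:]
print(new_tokens)  # → [7592, 2088, 999]
```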

+ ## API Access

  ```bash
  curl https://api.hanzo.ai/v1/chat/completions \
    -H "Authorization: Bearer $HANZO_API_KEY" \
    -H "Content-Type: application/json" \
+   -d '{"model": "zen-pro", "messages": [{"role": "user", "content": "Hello"}]}'
  ```

+ Get your API key at [console.hanzo.ai](https://console.hanzo.ai) — $5 free credit on signup.
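The request body in the curl example can also be built programmatically. A minimal sketch that constructs the same JSON payload; it assumes only the `model` and `messages` fields shown in the curl call, since the card does not document any other request parameters:

```python
import json

# Build the chat-completions payload from the curl example above.
# Only "model" and "messages" are confirmed by the card; any other
# fields the API may accept are not shown here.
payload = {
    "model": "zen-pro",
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)
print(body)
```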
 
 
+ ## Model Details

+ | Attribute | Value |
+ |-----------|-------|
+ | Parameters | 32B |
+ | Architecture | Zen MoDE |
+ | Context | 128K tokens |
+ | License | Apache 2.0 |
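The 32B parameter count implies a substantial weight footprint. A back-of-the-envelope sketch of weight memory at common precisions; this is illustrative arithmetic only (activations and KV cache add more), not measured requirements from the card:

```python
# Rough weight-memory estimate: bytes ≈ parameters × bytes per weight.
# Illustrative only; runtime memory also includes activations and KV cache.
PARAMS = 32e9  # parameter count from the Model Details table

def weight_gib(bytes_per_weight: float) -> float:
    """Approximate weight storage in GiB at a given precision."""
    return PARAMS * bytes_per_weight / 2**30

for name, bpw in [("BF16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"{name}: ~{weight_gib(bpw):.0f} GiB for weights alone")
```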

+ ## License

+ Apache 2.0