zeekay committed · Commit e90fc9a · verified · 1 parent: 283eea0

Update README: add abliteration methodology and Zen identity

Files changed (1): README.md (+192 −1)
@@ -1 +1,192 @@
- test
---
language:
- en
- zh
- ja
- ko
- fr
- de
- es
- it
- pt
- ru
license: apache-2.0
tags:
- text-generation
- instruction-following
- reasoning
- zenlm
- zen
pipeline_tag: text-generation
---

# Zen Pro 8B

**Professional-grade 8B language model with three specialized variants: instruct, thinking, and agent.**

Zen Pro is Zen LM's 8B professional model, designed for production workloads requiring strong instruction following, multi-step reasoning, and tool use. It runs efficiently on a single consumer GPU (16 GB VRAM) while delivering quality competitive with much larger models on structured tasks.

## Model Variants

| Variant | HuggingFace | Best For |
|---------|-------------|----------|
| **zen-pro-instruct** | [zenlm/zen-pro-instruct](https://huggingface.co/zenlm/zen-pro-instruct) | Chat, Q&A, summarization, drafting |
| **zen-pro-thinking** | [zenlm/zen-pro-thinking](https://huggingface.co/zenlm/zen-pro-thinking) | Complex reasoning, math, analysis |
| **zen-pro-agent** | [zenlm/zen-pro-agent](https://huggingface.co/zenlm/zen-pro-agent) | Tool use, API calls, automation |

## Model Specs

| Property | Value |
|----------|-------|
| Parameters | 8B |
| Architecture | Transformer (decoder-only) |
| Context Window | 32,768 tokens |
| License | Apache 2.0 |
| Formats | SafeTensors (BF16), GGUF (Q4_K_M, Q5_K_M, Q8_0), MLX |

## Quick Start

### Instruct (chat and general tasks)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-pro-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-pro-instruct")

messages = [
    {"role": "system", "content": "You are Zen Pro, a professional AI assistant."},
    {"role": "user", "content": "Summarize the key differences between REST and GraphQL APIs."}
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.6)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

### Thinking (complex reasoning)

```python
# Load the thinking variant for extended reasoning on hard problems
model = AutoModelForCausalLM.from_pretrained(
    "zenlm/zen-pro-thinking",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("zenlm/zen-pro-thinking")

messages = [
    {"role": "user", "content": "A company has 3 products with 40%, 35%, and 25% market share. "
                                "Product A grows 10%/year, B shrinks 5%/year, C grows 20%/year. "
                                "What are the shares after 3 years?"}
]

text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=True  # emit the reasoning trace before the final answer
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, temperature=0.6)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

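The expected answer for this prompt can be checked with plain arithmetic, which is useful for validating the model's reasoning trace:

```python
# Verify the market-share word problem by direct computation
shares = {"A": 40.0, "B": 35.0, "C": 25.0}
growth = {"A": 1.10, "B": 0.95, "C": 1.20}

# Apply three years of compound growth, then renormalize to 100%
raw = {k: shares[k] * growth[k] ** 3 for k in shares}
total = sum(raw.values())
final = {k: 100 * v / total for k, v in raw.items()}

print({k: round(v, 1) for k, v in final.items()})
# → {'A': 42.1, 'B': 23.7, 'C': 34.2}
```
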
### Agent (tool use)

```python
# Load "zenlm/zen-pro-agent" the same way as in the Instruct example above
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the latest in quantum computing research?"}]
text = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

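When the agent variant decides to call a tool, many chat templates emit the call as JSON wrapped in `<tool_call>...</tool_call>` tags. That tag convention is an assumption here — check this model's chat template for the exact format — but the parsing step looks roughly like:

```python
import json
import re

def extract_tool_calls(text):
    """Parse <tool_call>{...}</tool_call> blocks from model output.

    NOTE: the tag format is an assumption; verify it against the
    model's chat template before relying on it.
    """
    pattern = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
    return [json.loads(m) for m in pattern.findall(text)]

# Illustrative model output only
output = '<tool_call>{"name": "search_web", "arguments": {"query": "quantum computing 2025"}}</tool_call>'
calls = extract_tool_calls(output)
print(calls[0]["name"])  # search_web
```

Each parsed call can then be dispatched to the matching function and its result appended as a `tool` message for the next generation turn.
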
## Hardware Requirements

| Format | VRAM | Speed |
|--------|------|-------|
| BF16 (full) | 16 GB | Fast |
| GGUF Q8_0 | 10 GB | Fast |
| GGUF Q4_K_M | 6 GB | Moderate |
| MLX 4-bit | 6 GB (Apple Silicon) | Native Metal |

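The VRAM figures above follow roughly from bits per parameter. A back-of-envelope check for the weights alone (real usage adds KV cache, activations, and runtime overhead, which is why the table's numbers are higher):

```python
PARAMS = 8e9  # 8B parameters

def weight_gb(bits_per_param):
    # Weight memory only; KV cache and activations come on top
    return PARAMS * bits_per_param / 8 / 1e9

print(weight_gb(16))    # BF16: 16.0 GB
print(weight_gb(8.5))   # Q8_0 stores ~8.5 bits/weight: 8.5 GB
print(weight_gb(4.85))  # Q4_K_M averages ~4.85 bits/weight: ~4.85 GB
```
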
## vLLM (Production)

```bash
vllm serve zenlm/zen-pro-instruct \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --tensor-parallel-size 1
```

## MLX (Apple Silicon)

```bash
pip install mlx-lm
mlx_lm.generate --model zenlm/zen-pro-instruct \
  --prompt "Explain the CAP theorem:" \
  --max-tokens 512
```

## Hanzo API

```bash
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer $HANZO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "zen-pro", "messages": [{"role": "user", "content": "Explain ACID transactions."}]}'
```

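The same request can be built from Python; a minimal sketch of the OpenAI-style request body used by the curl command above (sending it, e.g. with `requests.post`, is left out so nothing here depends on network access):

```python
import json

def chat_payload(model, content):
    # Mirrors the curl example's /v1/chat/completions body
    return {"model": model, "messages": [{"role": "user", "content": content}]}

payload = chat_payload("zen-pro", "Explain ACID transactions.")
print(json.dumps(payload))
# POST this to https://api.hanzo.ai/v1/chat/completions with an
# "Authorization: Bearer $HANZO_API_KEY" header
```
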
## License

Apache 2.0. See [LICENSE](LICENSE) for details.

## Citation

```bibtex
@misc{zenlm2025zenpro,
  title={Zen Pro: Professional-Grade 8B Language Model},
  author={Hanzo AI and Zoo Labs Foundation},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/zenlm/zen-pro}}
}
```

---

Part of the [Zen model family](https://zenlm.org) by [Hanzo AI](https://hanzo.ai) (Techstars '17) and [Zoo Labs Foundation](https://zoo.ngo).

## Abliteration

Zen Pro 8B weights are derived from an abliterated base model. Abliteration removes refusal behavior by identifying and nullifying the "refusal direction" in the model's residual stream, as described in [Maxime Labonne's methodology](https://huggingface.co/blog/mlabonne/abliteration).

- **Method**: directional ablation on the residual stream across all layers
- **Source**: Zen MoDE (Mixture of Distilled Experts) architecture base
- **Effect**: removes refusal behaviors while largely preserving other capabilities
- **Identity layer**: Zen identity added via system prompt; full LoRA fine-tuning planned

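The ablation step itself is simple linear algebra. A toy sketch with a small random weight matrix and a made-up stand-in for the refusal direction (in the real pipeline, the direction is estimated from activation differences between harmful and harmless prompts at a chosen layer):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: d_model=8; r is a placeholder for the estimated refusal direction
W = rng.standard_normal((8, 8))   # a weight matrix that writes to the residual stream
r = rng.standard_normal(8)
r_hat = r / np.linalg.norm(r)     # unit "refusal direction"

# Directional ablation: subtract the component of W's output along r_hat
W_ablated = W - np.outer(r_hat, r_hat) @ W

# The ablated matrix can no longer write anything into the refusal direction
x = rng.standard_normal(8)
print(abs(r_hat @ (W_ablated @ x)) < 1e-9)  # True
```

Applying this projection to every residual-stream-writing matrix, at every layer, is what removes the refusal behavior while leaving the other directions of the weight space untouched.
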
Abliteration is a feature, not a limitation. It enables unrestricted research, security testing, and applications where safety guardrails are managed at the application layer rather than baked into the model weights.