zeekay committed
Commit 2dc801f · verified · 1 parent: 4ed9ab3

Zen4 Coder - rebranded from Qwen3-Coder-Next abliterated

Files changed (1): README.md (+43, -219)
README.md CHANGED
@@ -1,241 +1,65 @@
  ---
- library_name: transformers
  license: apache-2.0
- license_link: https://huggingface.co/Qwen/Qwen3-Coder-Next/blob/main/LICENSE
- pipeline_tag: text-generation
- base_model:
- - Qwen/Qwen3-Coder-Next
  tags:
  - abliterated
  - uncensored
  ---

- # huihui-ai/Huihui-Qwen3-Coder-Next-abliterated
-
- This is an uncensored version of [Qwen/Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).
- This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

- ## ollama
-
- Please use the latest version of [ollama 0.15.5](https://github.com/ollama/ollama/releases/tag/v0.15.5)
-
- You can use [huihui_ai/qwen3-coder-next-abliterated](https://ollama.com/huihui_ai/qwen3-coder-next-abliterated) directly,
- ```
- ollama run huihui_ai/qwen3-coder-next-abliterated
- ```

- ## chat_template-vl.jinja
-
- We have added a new file named [chat_template-vl.jinja](https://huggingface.co/huihui-ai/Huihui-Qwen3-Coder-Next-abliterated/blob/main/chat_template-vl.jinja), which comes from the path `huihui-ai/Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated`.
-
- The new file chat_template-vl.jinja is more compatible with using Tool Calling in [llama-server](https://github.com/ggml-org/llama.cpp/releases/tag/b7952),
- especially when [opencode](https://github.com/anomalyco/opencode/releases/tag/v1.1.53) is involved.

  ## Usage
- You can use this model in your applications by loading it with Hugging Face's `transformers` library:

  ```python
- from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, BitsAndBytesConfig
- import torch
- import os
- import signal
- import random
- import numpy as np
- import time
- import sys
-
- if (
-     "PYTORCH_ALLOC_CONF" not in os.environ
-     and "PYTORCH_CUDA_ALLOC_CONF" not in os.environ
- ):
-     print("PYTORCH_ALLOC_CONF.")
-     os.environ["PYTORCH_ALLOC_CONF"] = "expandable_segments:True"
-
- cpu_count = os.cpu_count()
- print(f"Number of CPU cores in the system: {cpu_count}")
- half_cpu_count = cpu_count // 2
- os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
- os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
- torch.set_num_threads(half_cpu_count)
-
- print(f"PyTorch threads: {torch.get_num_threads()}")
- print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
- print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")
-
- # Load the model and tokenizer
- MODEL_ID = "huihui-ai/Huihui-Qwen3-Coder-Next-abliterated"
-
- print(f"Load Model {MODEL_ID} ... ")
- quant_config_4 = BitsAndBytesConfig(
-     load_in_4bit=True,
-     bnb_4bit_compute_dtype=torch.bfloat16,
-     bnb_4bit_use_double_quant=True,
-     llm_int8_enable_fp32_cpu_offload=True,
- )
-
- model = AutoModelForCausalLM.from_pretrained(
-     MODEL_ID,
-     device_map="auto",
-     trust_remote_code=True,
-     torch_dtype="auto",
-     low_cpu_mem_usage=True,
-     quantization_config=quant_config_4,
- )
-
- tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
-
- messages = []
- skip_prompt = True
- skip_special_tokens = True
-
- class CustomTextStreamer(TextStreamer):
-     def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
-         super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
-         self.generated_text = ""
-         self.stop_flag = False
-         self.init_time = time.time()  # Record initialization time
-         self.end_time = None  # To store end time
-         self.first_token_time = None  # To store first token generation time
-         self.token_count = 0  # To track total tokens
-
-     def on_finalized_text(self, text: str, stream_end: bool = False):
-         if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
-             self.first_token_time = time.time()
-         self.generated_text += text
-         self.token_count += 1
-         print(text, end="", flush=True)
-         if stream_end:
-             self.end_time = time.time()  # Record end time when streaming ends
-         if self.stop_flag:
-             raise StopIteration
-
-     def stop_generation(self):
-         self.stop_flag = True
-         self.end_time = time.time()  # Record end time when generation is stopped
-
-     def get_metrics(self):
-         """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
-         if self.end_time is None:
-             self.end_time = time.time()  # Set end time if not already set
-         total_time = self.end_time - self.init_time  # Total time from init to end
-         tokens_per_second = self.token_count / total_time if total_time > 0 else 0
-         first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
-         metrics = {
-             "init_time": self.init_time,
-             "first_token_time": self.first_token_time,
-             "first_token_latency": first_token_latency,
-             "end_time": self.end_time,
-             "total_time": total_time,  # Total time in seconds
-             "total_tokens": self.token_count,
-             "tokens_per_second": tokens_per_second
-         }
-         return metrics
-
- def generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, max_new_tokens):
-     text = tokenizer.apply_chat_template(
-         messages,
-         tokenize=False,
-         add_generation_prompt=True,
-     )
-     model_inputs = tokenizer(
-         [text],
-         return_tensors="pt",
-     ).to(model.device)
-
-     streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
-
-     def signal_handler(sig, frame):
-         streamer.stop_generation()
-         print("\n[Generation stopped by user with Ctrl+C]")
-
-     signal.signal(signal.SIGINT, signal_handler)
-
-     print("Response: ", end="", flush=True)
-     try:
-         generated_ids = model.generate(
-             **model_inputs,
-             max_new_tokens=max_new_tokens,
-             streamer=streamer,
-         )
-         del generated_ids
-     except StopIteration:
-         print("\n[Stopped by user]")
-
-     del model_inputs
-     torch.cuda.empty_cache()
-     signal.signal(signal.SIGINT, signal.SIG_DFL)
-
-     return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()
-
- while True:
-     print(f"skip_prompt: {skip_prompt}")
-     print(f"skip_special_tokens: {skip_special_tokens}")
-
-     user_input = input("User: ").strip()
-     if user_input.lower() == "/exit":
-         print("Exiting chat.")
-         break
-     if user_input.lower() == "/clear":
-         messages = []
-         print("Chat history cleared. Starting a new conversation.")
-         continue
-     if user_input.lower() == "/skip_prompt":
-         skip_prompt = not skip_prompt
-         continue
-     if user_input.lower() == "/skip_special_tokens":
-         skip_special_tokens = not skip_special_tokens
-         continue
-     if not user_input:
-         print("Input cannot be empty. Please enter something.")
-         continue
-
-     messages.append({
-         "role": "user",
-         "content": user_input
-     })
-
-     response, stop_flag, metrics = generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, 40960)
-     print("\n\nMetrics:")
-     for key, value in metrics.items():
-         print(f"  {key}: {value}")
-
-     print("", flush=True)
-     if stop_flag:
-         continue
-     messages.append({
-         "role": "assistant",
-         "content": response.strip()
-     })
  ```

- ### Usage Warnings
-
- - **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
- - **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
- - **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
- - **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
- - **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
- - **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
-
- ### Donation
- ##### Your donation helps us continue our further development and improvement, a cup of coffee can do it.
- - bitcoin:
- ```
- bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
- ```
- - Support our work on [Ko-fi](https://ko-fi.com/huihuiai)!

  ---
  license: apache-2.0
+ language:
+ - en
+ - zh
  tags:
+ - zen4
+ - zenlm
+ - hanzo
  - abliterated
  - uncensored
+ base_model: huihui-ai/Huihui-Qwen3-Coder-Next-abliterated
+ pipeline_tag: text-generation
  ---

+ # Zen4 Coder
 
+ **Zen4 Coder** is an 80B-parameter MoE (3B active) language model from the [Zen4 family](https://zenlm.org) by [Zen LM](https://huggingface.co/zenlm) and [Hanzo AI](https://hanzo.ai).
+
+ Built on the abliterated (uncensored) weights of Qwen3-Coder-Next for unrestricted, open-ended AI assistance.
+ ## Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Parameters** | 80B MoE total, 3B active |
+ | **Context** | 256K tokens |
+ | **Base** | Qwen3-Coder-Next (abliterated) |
+ | **License** | Apache-2.0 |
+ | **Family** | Zen4 |
+ | **Creator** | Zen LM / Hanzo AI |
 
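As a rough guide to hardware requirements, the 80B total parameter count in the table above can be turned into weight-only memory estimates. This is a back-of-envelope sketch, not a measured figure: it ignores KV cache, activations, and quantization overhead such as scales and zero-points.

```python
# Back-of-envelope weight memory for an 80B-parameter model at common
# precisions. Weights only: KV cache, activations, and quantization
# overhead (scales, zero-points) are not counted.
TOTAL_PARAMS = 80e9

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for precision, nbytes in BYTES_PER_PARAM.items():
    gib = TOTAL_PARAMS * nbytes / 2**30  # bytes -> GiB
    print(f"{precision}: ~{gib:.0f} GiB")
```

Even at 4-bit, the full weight set is roughly 37 GiB, so multi-GPU sharding or CPU offload is usually needed: only ~3B parameters are active per token, but all experts must still be resident.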
+ ## Zen4 Family
+
+ | Model | Params | Active | Context | HuggingFace |
+ |-------|--------|--------|---------|-------------|
+ | Zen4 Mini | 4B | 4B | 32K | [zenlm/zen4-mini](https://huggingface.co/zenlm/zen4-mini) |
+ | Zen4 | 8B | 8B | 32K | [zenlm/zen4](https://huggingface.co/zenlm/zen4) |
+ | Zen4 Pro | 14B | 14B | 32K | [zenlm/zen4-pro](https://huggingface.co/zenlm/zen4-pro) |
+ | Zen4 Max | 30B MoE | 3B | 256K | [zenlm/zen4-max](https://huggingface.co/zenlm/zen4-max) |
+ | **Zen4 Pro Max** | **80B MoE** | **3B** | **256K** | [zenlm/zen4-pro-max](https://huggingface.co/zenlm/zen4-pro-max) |
+ | Zen4 Coder Flash | 31B MoE | 3B | 131K | [zenlm/zen4-coder-flash](https://huggingface.co/zenlm/zen4-coder-flash) |
+ | **Zen4 Coder** | **80B MoE** | **3B** | **256K** | [zenlm/zen4-coder](https://huggingface.co/zenlm/zen4-coder) |
+ | Zen4 Ultra | 1.04T MoE | 32B | 256K | [zenlm/zen4-ultra](https://huggingface.co/zenlm/zen4-ultra) |
  ## Usage

  ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("zenlm/zen4-coder")
+ tokenizer = AutoTokenizer.from_pretrained("zenlm/zen4-coder")
+
+ messages = [{"role": "user", "content": "Hello, who are you?"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer(text, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```

+ ## Links
+
+ - [Zen LM](https://zenlm.org) | [Hanzo AI](https://hanzo.ai)
+ - [GitHub](https://github.com/zenlm/zen4-coder)
+ - [All Zen4 Models](https://huggingface.co/zenlm)