zeekay committed on
Commit e293a89 · verified · 1 Parent(s): 26d9095

Rebrand README to Zen4 Coder Flash

Files changed (1):
  1. README.md +30 -237

README.md CHANGED
@@ -1,262 +1,55 @@
  ---
- library_name: transformers
- pipeline_tag: text-generation
  license: mit
  language:
  - en
  - zh
- base_model:
- - zai-org/GLM-4.7-Flash
  tags:
  - abliterated
  - uncensored
  ---

- # huihui-ai/Huihui-GLM-4.7-Flash-abliterated

- This is an uncensored version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to know more about it).
- This is a crude, proof-of-concept implementation to remove refusals from an LLM model without using TransformerLens.

- ## ollama

- A new version is being uploaded. Please download it again.
- Please use the latest version of [ollama 0.15.1](https://github.com/ollama/ollama/releases/tag/v0.15.1)

- You can use [huihui_ai/glm-4.7-flash-abliterated](https://ollama.com/huihui_ai/glm-4.7-flash-abliterated) directly,
- ```
- ollama run huihui_ai/glm-4.7-flash-abliterated
- ```
  ## Usage
- You can use this model in your applications by loading it with Hugging Face's `transformers` library:

  ```python
- #!/usr/bin/env python
- # -*- coding: utf-8 -*-
-
- import argparse
- from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
- import torch
- import os
- import signal
- import time
-
- def parse_args():
-     parser = argparse.ArgumentParser(
-         description="Merge LoRA weights into huihui-ai/Huihui-GLM-4.7-Flash-abliterated base model and save the full model."
-     )
-     parser.add_argument(
-         "--base_model",
-         type=str,
-         default="huihui-ai/Huihui-GLM-4.7-Flash-abliterated",
-         help="HuggingFace repo or local path of the base model.",
-     )
-     parser.add_argument(
-         "--dtype",
-         type=str,
-         default="bfloat16",
-         choices=["float16", "bfloat16", "float32"],
-         help="Data type for loading the base model (default: bfloat16).",
-     )
-     parser.add_argument(
-         "--device_map",
-         type=str,
-         default="auto",
-         help="Device map for model loading (e.g. 'cpu', 'auto').",
-     )
-     return parser.parse_args()
-
- def main():
-     cpu_count = os.cpu_count()
-     print(f"Number of CPU cores in the system: {cpu_count}")
-     half_cpu_count = cpu_count // 2
-     os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
-     os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
-     torch.set_num_threads(half_cpu_count)
-
-     print(f"PyTorch threads: {torch.get_num_threads()}")
-     print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
-     print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")
-
-     args = parse_args()
-
-     # Load the model and tokenizer
-     print(f"Load Model {args.base_model} ... ")
-     quant_config_4 = BitsAndBytesConfig(
-         load_in_4bit=True,
-         bnb_4bit_compute_dtype=torch.bfloat16,
-         bnb_4bit_quant_type="nf4" if args.device_map == "cpu" else "fp4",
-         bnb_4bit_use_double_quant=True,
-         llm_int8_enable_fp32_cpu_offload=True,
-     )
-
-     torch_dtype = {
-         "float16": torch.float16,
-         "bfloat16": torch.bfloat16,
-         "float32": torch.float32,
-     }[args.dtype]
-
-     model = AutoModelForCausalLM.from_pretrained(
-         args.base_model,
-         dtype=torch_dtype,
-         device_map=args.device_map,
-         trust_remote_code=True,
-         # low_cpu_mem_usage=True,
-     )
-
-     tokenizer = AutoTokenizer.from_pretrained(args.base_model, trust_remote_code=True)
-
-     messages = []
-     skip_prompt = True
-     skip_special_tokens = True
-
-     class CustomTextStreamer(TextStreamer):
-         def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
-             super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
-             self.generated_text = ""
-             self.stop_flag = False
-             self.init_time = time.time()  # Record initialization time
-             self.end_time = None  # To store end time
-             self.first_token_time = None  # To store first token generation time
-             self.token_count = 0  # To track total tokens
-
-         def on_finalized_text(self, text: str, stream_end: bool = False):
-             if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
-                 self.first_token_time = time.time()
-             if stream_end:
-                 self.end_time = time.time()  # Record end time when streaming ends
-
-             self.generated_text += text
-             self.token_count += 1
-             print(text, end="", flush=True)
-
-             if self.stop_flag:
-                 raise StopIteration
-
-         def stop_generation(self):
-             self.stop_flag = True
-             self.end_time = time.time()  # Record end time when generation is stopped
-
-         def get_metrics(self):
-             """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
-             if self.end_time is None:
-                 self.end_time = time.time()  # Set end time if not already set
-             total_time = self.end_time - self.init_time  # Total time from init to end
-             tokens_per_second = self.token_count / total_time if total_time > 0 else 0
-             first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
-             metrics = {
-                 "init_time": self.init_time,
-                 "first_token_time": self.first_token_time,
-                 "first_token_latency": first_token_latency,
-                 "end_time": self.end_time,
-                 "total_time": total_time,  # Total time in seconds
-                 "total_tokens": self.token_count,
-                 "tokens_per_second": tokens_per_second
-             }
-             return metrics
-
-     def generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, max_new_tokens):
-         inputs = tokenizer.apply_chat_template(
-             messages,
-             tokenize=True,
-             add_generation_prompt=True,
-             return_dict=True,
-             return_tensors="pt",
-         ).to(model.device)
-
-         streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
-
-         def signal_handler(sig, frame):
-             streamer.stop_generation()
-             print("\n[Generation stopped by user with Ctrl+C]")
-
-         signal.signal(signal.SIGINT, signal_handler)
-
-         print("Response: ", end="", flush=True)
-         try:
-             generated_ids = model.generate(
-                 **inputs,
-                 max_new_tokens=max_new_tokens,
-                 streamer=streamer
-             )
-             del generated_ids
-         except StopIteration:
-             print("\n[Stopped by user]")
-
-         del inputs
-         torch.cuda.empty_cache()
-         signal.signal(signal.SIGINT, signal.SIG_DFL)
-
-         return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()
-
-     while True:
-         user_input = input("User: ").strip()
-         if user_input.lower() == "/exit":
-             print("Exiting chat.")
-             break
-         if user_input.lower() == "/clear":
-             messages = []
-             print("Chat history cleared. Starting a new conversation.")
-             continue
-         if user_input.lower() == "/skip_prompt":
-             if skip_prompt:
-                 skip_prompt = False
-                 print("skip_prompt = False.")
-             else:
-                 skip_prompt = True
-                 print("skip_prompt = True.")
-             continue
-         if user_input.lower() == "/skip_special_tokens":
-             if skip_special_tokens:
-                 skip_special_tokens = False
-                 print("skip_special_tokens = False.")
-             else:
-                 skip_special_tokens = True
-                 print("skip_special_tokens = True.")
-             continue
-         if not user_input:
-             print("Input cannot be empty. Please enter something.")
-             continue
-
-         messages.append({"role": "user", "content": user_input})
-         response, stop_flag, metrics = generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, 40960)
-         print("\n\nMetrics:")
-         for key, value in metrics.items():
-             print(f"  {key}: {value}")
-
-         print("", flush=True)
-
-         if stop_flag:
-             continue
-         messages.append({"role": "assistant", "content": response})
-
- if __name__ == "__main__":
-     main()
  ```

- ### Usage Warnings

- - **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.
-
- - **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.
-
- - **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.
-
- - **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.
-
- - **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.
-
- - **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.

- ### Donation
- ##### Your donation helps us continue our further development and improvement, a cup of coffee can do it.
- - bitcoin:
- ```
- bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
- ```
- - Support our work on [Ko-fi](https://ko-fi.com/huihuiai)!
  ---
  license: mit
  language:
  - en
  - zh
  tags:
+ - zen4
+ - zenlm
+ - hanzo
  - abliterated
  - uncensored
+ base_model: huihui-ai/Huihui-GLM-4.7-Flash-abliterated
+ pipeline_tag: text-generation
  ---

+ # Zen4 Coder Flash

+ **31B MoE total, 3B active | 131K context**

+ A code-focused MoE model based on GLM-4.7-Flash with abliterated weights. Its 131K-token context window fits entire codebases, and it supports tool calling and a reasoning mode.

+ Part of the [Zen4 family](https://zenlm.org) by [Zen LM](https://huggingface.co/zenlm) and [Hanzo AI](https://hanzo.ai).

+ ## Model Details

+ | Property | Value |
+ |----------|-------|
+ | **Parameters** | 31B MoE total, 3B active |
+ | **Context** | 131K tokens |
+ | **Base** | GLM-4.7-Flash (abliterated) |
+ | **License** | MIT |
+ | **Family** | Zen4 |
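As a rough sanity check on the table's figures, the arithmetic below is illustrative only: the 31B/3B parameter counts come from the table above, while the 2-byte bf16 weight width is an assumption about the published checkpoint, not a statement from the model card.

```python
# Back-of-the-envelope memory math for a 31B-total / 3B-active MoE model.
# Illustrative only; real memory use also depends on quantization, KV cache,
# and runtime overhead.
TOTAL_PARAMS = 31e9     # 31B total MoE parameters (all experts), from the table
ACTIVE_PARAMS = 3e9     # ~3B parameters routed per token, from the table
BYTES_PER_PARAM = 2     # assumed bf16 storage

weights_gib = TOTAL_PARAMS * BYTES_PER_PARAM / 1024**3
active_frac = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"bf16 weights: ~{weights_gib:.0f} GiB")
print(f"weights active per token: ~{active_frac:.1%}")
```

The split matters because total parameters drive weight storage while active parameters drive per-token compute, which is how a 31B MoE can decode with roughly 3B-class latency.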

  ## Usage

  ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model = AutoModelForCausalLM.from_pretrained("zenlm/zen4-coder-flash", torch_dtype="auto", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("zenlm/zen4-coder-flash")
+
+ messages = [{"role": "user", "content": "Hello"}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ inputs = tokenizer([text], return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  ```

+ ## About Zen4

+ The Zen4 family provides abliterated (uncensored) variants of the best open-source models, ranging from 4B to 1T+ parameters.

+ - **Website**: [zenlm.org](https://zenlm.org)
+ - **Organization**: [Zen LM](https://huggingface.co/zenlm)
+ - **Built by**: [Hanzo AI](https://hanzo.ai) (Techstars '17)