zeekay committed
Commit de431ce · verified · 1 Parent(s): 4d54c05

Upload README.md with huggingface_hub

Files changed (1):
1. README.md +240 -41

README.md CHANGED
@@ -1,63 +1,262 @@
---
license: mit
language:
- en
- zh
tags:
- zen4
- zenlm
- hanzo
- abliterated
- uncensored
- code
pipeline_tag: text-generation
---

# Zen4 Coder Flash

**Zen4 Coder Flash** is a fast code-specialized language model from the [Zen4 family](https://zenlm.org) by [Zen LM](https://huggingface.co/zenlm) and [Hanzo AI](https://hanzo.ai).

Abliterated (uncensored) weights optimized for speed and inline code completions.

**API**: Via `api.hanzo.ai`, zen4-coder-flash serves 30B MoE (3B active, 262K context).

## Model Details

| Property | Value |
|----------|-------|
| **Local Weights** | 31B MoE (3B active) |
| **API Parameters** | 30B MoE (3B active) |
| **Context** | 262K tokens |
| **License** | MIT |
| **Family** | Zen4 |
| **Creator** | Zen LM / Hanzo AI |

## Zen4 Family

### Local Weights (HuggingFace)

| Tier | Model | Params | Active | Context |
|------|-------|--------|--------|---------|
| Edge | [Zen4 Mini](https://huggingface.co/zenlm/zen4-mini) | 4B | 4B | 32K |
| Standard | [Zen4](https://huggingface.co/zenlm/zen4) | 8B | 8B | 32K |
| Professional | [Zen4 Pro](https://huggingface.co/zenlm/zen4-pro) | 14B | 14B | 32K |
| Code (Fast) | [Zen4 Coder Flash](https://huggingface.co/zenlm/zen4-coder-flash) | 31B MoE | 3B | 131K |
| Code | [Zen4 Coder](https://huggingface.co/zenlm/zen4-coder) | 80B MoE | 3B | 256K |
| Cloud | [Zen4 Ultra](https://huggingface.co/zenlm/zen4-ultra) | 1.04T MoE | 32B | 256K |

### API Models (api.hanzo.ai)

| Model | Params | Active | Context | Tier |
|-------|--------|--------|---------|------|
| zen4 | 744B MoE | 40B | 202K | ultra max |
| zen4-ultra | 744B MoE + CoT | 40B | 202K | ultra max |
| zen4-max | 1.04T MoE | 32B | 256K | ultra max |
| zen4-pro | 80B MoE | 3B | 131K | ultra |
| zen4-coder | 480B MoE | 35B | 262K | ultra |
| zen4-coder-pro | 480B Dense BF16 | 480B | 262K | ultra max |

## Links

- [Zen LM](https://zenlm.org) | [Hanzo AI](https://hanzo.ai)
- [API](https://api.hanzo.ai/v1)
- [All Zen Models](https://huggingface.co/zenlm)
---
library_name: transformers
pipeline_tag: text-generation
license: mit
language:
- en
- zh
base_model:
- zai-org/GLM-4.7-Flash
tags:
- abliterated
- uncensored
---

# huihui-ai/Huihui-GLM-4.7-Flash-abliterated

This is an uncensored version of [zai-org/GLM-4.7-Flash](https://huggingface.co/zai-org/GLM-4.7-Flash) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) for details).
This is a crude, proof-of-concept implementation that removes refusals from an LLM without using TransformerLens.
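
Conceptually, abliteration estimates a "refusal direction" in the residual stream — the normalized mean difference between activations on refusal-inducing and harmless prompts — and projects that direction out. The sketch below illustrates only the projection step, on made-up random vectors (the linked repo operates on the model's real activations and weights; the array shapes and names here are illustrative assumptions):

```python
import numpy as np

# Toy stand-ins for residual-stream activations collected on two prompt sets.
harmful_acts = np.random.randn(100, 64)   # prompts that tend to trigger refusals
harmless_acts = np.random.randn(100, 64)  # benign prompts

# The "refusal direction" is the normalized mean activation difference.
refusal_dir = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
refusal_dir /= np.linalg.norm(refusal_dir)

def ablate(hidden: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Remove the component of each row of `hidden` along `direction`
    (orthogonal projection onto the direction's complement)."""
    return hidden - np.outer(hidden @ direction, direction)

ablated = ablate(harmful_acts, refusal_dir)
# After ablation the activations have (numerically) zero component
# along the refusal direction.
print(np.abs(ablated @ refusal_dir).max())
```

In the actual procedure the same projection is baked into the model by orthogonalizing the relevant weight matrices against the direction, so no runtime hook is needed.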

## ollama

A new version is being uploaded; please download it again once it is available.
Please use the latest version of [ollama](https://github.com/ollama/ollama/releases/tag/v0.15.1) (0.15.1 or later).

You can use [huihui_ai/glm-4.7-flash-abliterated](https://ollama.com/huihui_ai/glm-4.7-flash-abliterated) directly:
```
ollama run huihui_ai/glm-4.7-flash-abliterated
```
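
Once pulled, the model can also be queried programmatically through ollama's local REST API (served on `localhost:11434` by default). A minimal sketch — the request body follows ollama's `/api/chat` schema, and the model tag is the one published above; the network call is left commented out since it requires a running server:

```python
import json
from urllib import request

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body expected by ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

body = build_chat_request("huihui_ai/glm-4.7-flash-abliterated", "Hello!")

# Uncomment to send the request against a locally running ollama server:
# req = request.Request(
#     "http://localhost:11434/api/chat",
#     data=json.dumps(body).encode(),
#     headers={"Content-Type": "application/json"},
# )
# print(json.loads(request.urlopen(req).read())["message"]["content"])
print(json.dumps(body))
```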

## Usage

You can use this model in your applications by loading it with Hugging Face's `transformers` library:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import argparse
import os
import signal
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer


def parse_args():
    parser = argparse.ArgumentParser(
        description="Chat with huihui-ai/Huihui-GLM-4.7-Flash-abliterated using streaming generation."
    )
    parser.add_argument(
        "--base_model",
        type=str,
        default="huihui-ai/Huihui-GLM-4.7-Flash-abliterated",
        help="HuggingFace repo or local path of the base model.",
    )
    parser.add_argument(
        "--dtype",
        type=str,
        default="bfloat16",
        choices=["float16", "bfloat16", "float32"],
        help="Data type for loading the base model (default: bfloat16).",
    )
    parser.add_argument(
        "--device_map",
        type=str,
        default="auto",
        help="Device map for model loading (e.g. 'cpu', 'auto').",
    )
    return parser.parse_args()


def main():
    cpu_count = os.cpu_count()
    print(f"Number of CPU cores in the system: {cpu_count}")
    half_cpu_count = cpu_count // 2
    os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
    os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
    torch.set_num_threads(half_cpu_count)

    print(f"PyTorch threads: {torch.get_num_threads()}")
    print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
    print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

    args = parse_args()

    # Load the model and tokenizer
    print(f"Load Model {args.base_model} ... ")
    # Optional 4-bit quantization config; pass `quantization_config=quant_config_4`
    # to `from_pretrained` below to enable it.
    quant_config_4 = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_quant_type="nf4" if args.device_map == "cpu" else "fp4",
        bnb_4bit_use_double_quant=True,
        llm_int8_enable_fp32_cpu_offload=True,
    )

    torch_dtype = {
        "float16": torch.float16,
        "bfloat16": torch.bfloat16,
        "float32": torch.float32,
    }[args.dtype]

    model = AutoModelForCausalLM.from_pretrained(
        args.base_model,
        dtype=torch_dtype,
        device_map=args.device_map,
        trust_remote_code=True,
        # low_cpu_mem_usage=True,
    )

    tokenizer = AutoTokenizer.from_pretrained(args.base_model, trust_remote_code=True)

    messages = []
    skip_prompt = True
    skip_special_tokens = True

    class CustomTextStreamer(TextStreamer):
        def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
            super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
            self.generated_text = ""
            self.stop_flag = False
            self.init_time = time.time()  # Record initialization time
            self.end_time = None          # To store end time
            self.first_token_time = None  # To store first token generation time
            self.token_count = 0          # Counts finalized text chunks (approximate token count)

        def on_finalized_text(self, text: str, stream_end: bool = False):
            if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
                self.first_token_time = time.time()
            if stream_end:
                self.end_time = time.time()  # Record end time when streaming ends

            self.generated_text += text
            self.token_count += 1
            print(text, end="", flush=True)

            if self.stop_flag:
                raise StopIteration

        def stop_generation(self):
            self.stop_flag = True
            self.end_time = time.time()  # Record end time when generation is stopped

        def get_metrics(self):
            """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
            if self.end_time is None:
                self.end_time = time.time()  # Set end time if not already set
            total_time = self.end_time - self.init_time  # Total time from init to end
            tokens_per_second = self.token_count / total_time if total_time > 0 else 0
            first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
            metrics = {
                "init_time": self.init_time,
                "first_token_time": self.first_token_time,
                "first_token_latency": first_token_latency,
                "end_time": self.end_time,
                "total_time": total_time,  # Total time in seconds
                "total_tokens": self.token_count,
                "tokens_per_second": tokens_per_second,
            }
            return metrics

    def generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, max_new_tokens):
        inputs = tokenizer.apply_chat_template(
            messages,
            tokenize=True,
            add_generation_prompt=True,
            return_dict=True,
            return_tensors="pt",
        ).to(model.device)

        streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

        def signal_handler(sig, frame):
            streamer.stop_generation()
            print("\n[Generation stopped by user with Ctrl+C]")

        signal.signal(signal.SIGINT, signal_handler)

        print("Response: ", end="", flush=True)
        try:
            generated_ids = model.generate(
                **inputs,
                max_new_tokens=max_new_tokens,
                streamer=streamer,
            )
            del generated_ids
        except StopIteration:
            print("\n[Stopped by user]")

        del inputs
        torch.cuda.empty_cache()
        signal.signal(signal.SIGINT, signal.SIG_DFL)

        return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

    while True:
        user_input = input("User: ").strip()
        if user_input.lower() == "/exit":
            print("Exiting chat.")
            break
        if user_input.lower() == "/clear":
            messages = []
            print("Chat history cleared. Starting a new conversation.")
            continue
        if user_input.lower() == "/skip_prompt":
            skip_prompt = not skip_prompt
            print(f"skip_prompt = {skip_prompt}.")
            continue
        if user_input.lower() == "/skip_special_tokens":
            skip_special_tokens = not skip_special_tokens
            print(f"skip_special_tokens = {skip_special_tokens}.")
            continue
        if not user_input:
            print("Input cannot be empty. Please enter something.")
            continue

        messages.append({"role": "user", "content": user_input})
        response, stop_flag, metrics = generate_stream(model, tokenizer, messages, skip_prompt, skip_special_tokens, 40960)
        print("\n\nMetrics:")
        for key, value in metrics.items():
            print(f"  {key}: {value}")

        print("", flush=True)

        if stop_flag:
            continue
        messages.append({"role": "assistant", "content": response})


if __name__ == "__main__":
    main()
```
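
The `CustomTextStreamer` above derives its throughput report from wall-clock timestamps. The arithmetic in `get_metrics` reduces to a few lines, isolated here with illustrative input values (note that the streamer counts finalized text chunks, which only approximates true token counts):

```python
def compute_metrics(init_time: float, first_token_time: float,
                    end_time: float, token_count: int) -> dict:
    """Reproduce the timing arithmetic used by CustomTextStreamer.get_metrics."""
    total_time = end_time - init_time
    return {
        "first_token_latency": first_token_time - init_time,
        "total_time": total_time,
        "tokens_per_second": token_count / total_time if total_time > 0 else 0,
    }

# 30 chunks streamed over 2.0 s, first chunk after 0.5 s:
m = compute_metrics(init_time=0.0, first_token_time=0.5, end_time=2.0, token_count=30)
print(m)  # first_token_latency=0.5, total_time=2.0, tokens_per_second=15.0
```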

### Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

- **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

- **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real-time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content.

- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
 
255
+ ### Donation
256
+ ##### Your donation helps us continue our further development and improvement, a cup of coffee can do it.
257
+ - bitcoin:
258
+ ```
259
+ bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
260
+ ```
261
+ - Support our work on [Ko-fi](https://ko-fi.com/huihuiai)!
262