zeekay committed
Commit 283eea0 · verified · 1 Parent(s): 2fa0ff8

Update README: add abliteration methodology and Zen identity

Files changed (1): README.md +1 -302
README.md CHANGED
@@ -1,302 +1 @@
---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3-8B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-8B
tags:
- chat
- abliterated
- uncensored
---

# huihui-ai/Huihui-Qwen3-8B-abliterated-v2

This is an uncensored version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) created with abliteration (see [remove-refusals-with-transformers](https://github.com/Sumandora/remove-refusals-with-transformers) to learn more about it).
It is a crude, proof-of-concept implementation for removing refusals from an LLM without using TransformerLens.

Ablation was performed with a new, faster method that yields better results.

**Important Note:** This version is an improvement over the previous release, [huihui-ai/Qwen3-8B-abliterated](https://huggingface.co/huihui-ai/Qwen3-8B-abliterated). The Ollama version has also been updated.

Layer 0 was changed to eliminate the garbled-output problem.
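
For intuition, abliteration works roughly as follows: collect hidden states for a set of harmful and a set of harmless prompts, estimate a "refusal direction" from the difference of their means, and project that direction out of weights that write to the residual stream. Below is a minimal, hypothetical sketch; the prompt sets, layer choice, and variable names are illustrative, not the exact procedure used for this model:

```python
import torch

# Hypothetical inputs: hidden states captured at one layer for two prompt
# sets, each of shape (num_prompts, hidden_size).
def refusal_direction(harmful_hidden: torch.Tensor, harmless_hidden: torch.Tensor) -> torch.Tensor:
    # The refusal direction is the normalized difference of means.
    diff = harmful_hidden.mean(dim=0) - harmless_hidden.mean(dim=0)
    return diff / diff.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Remove the refusal component from a weight matrix that writes to
    # the residual stream: W <- W - r r^T W
    r = direction.to(weight.dtype)
    return weight - torch.outer(r, r) @ weight
```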

## ollama

You can use [huihui_ai/qwen3-abliterated:8b-v2](https://ollama.com/huihui_ai/qwen3-abliterated:8b-v2) directly. Toggle thinking with `/set think` and `/set nothink`:
```
ollama run huihui_ai/qwen3-abliterated:8b-v2
```
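
If you prefer to call the model programmatically instead of through the REPL, here is a minimal sketch using Ollama's local REST API (it assumes an Ollama server running on the default port 11434; the prompt is arbitrary):

```python
import requests

# Query the model through a locally running Ollama server.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "huihui_ai/qwen3-abliterated:8b-v2",
        "prompt": "Give me a short introduction to large language models.",
        "stream": False,
    },
)
print(resp.json()["response"])
```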

## Usage

You can use this model in your applications by loading it with Hugging Face's `transformers` library:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
import torch
import os
import signal
import random
import numpy as np
import time

cpu_count = os.cpu_count()
print(f"Number of CPU cores in the system: {cpu_count}")
half_cpu_count = max(1, cpu_count // 2)
os.environ["MKL_NUM_THREADS"] = str(half_cpu_count)
os.environ["OMP_NUM_THREADS"] = str(half_cpu_count)
torch.set_num_threads(half_cpu_count)

print(f"PyTorch threads: {torch.get_num_threads()}")
print(f"MKL threads: {os.getenv('MKL_NUM_THREADS')}")
print(f"OMP threads: {os.getenv('OMP_NUM_THREADS')}")

# Load the model and tokenizer
NEW_MODEL_ID = "huihui-ai/Huihui-Qwen3-8B-abliterated-v2"
print(f"Load Model {NEW_MODEL_ID} ... ")
quant_config_4 = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

model = AutoModelForCausalLM.from_pretrained(
    NEW_MODEL_ID,
    device_map="auto",
    trust_remote_code=True,
    #quantization_config=quant_config_4,  # uncomment to load in 4-bit
    torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL_ID, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
    tokenizer.pad_token_id = tokenizer.eos_token_id

messages = []
nothink = False
same_seed = False
skip_prompt = True
skip_special_tokens = True
do_sample = True

def set_random_seed(seed=None):
    """Set random seed for reproducibility. If seed is None, use int(time.time())."""
    if seed is None:
        seed = int(time.time())  # Convert float to int
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # If using CUDA
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    return seed  # Return seed for logging if needed

class CustomTextStreamer(TextStreamer):
    def __init__(self, tokenizer, skip_prompt=True, skip_special_tokens=True):
        super().__init__(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)
        self.generated_text = ""
        self.stop_flag = False
        self.init_time = time.time()  # Record initialization time
        self.end_time = None  # To store end time
        self.first_token_time = None  # To store first token generation time
        self.token_count = 0  # To track total tokens

    def on_finalized_text(self, text: str, stream_end: bool = False):
        if self.first_token_time is None and text.strip():  # Set first token time on first non-empty text
            self.first_token_time = time.time()
        self.generated_text += text
        # Count tokens in the generated text
        tokens = self.tokenizer.encode(text, add_special_tokens=False)
        self.token_count += len(tokens)
        print(text, end="", flush=True)
        if stream_end:
            self.end_time = time.time()  # Record end time when streaming ends
        if self.stop_flag:
            raise StopIteration

    def stop_generation(self):
        self.stop_flag = True
        self.end_time = time.time()  # Record end time when generation is stopped

    def get_metrics(self):
        """Returns initialization time, first token time, first token latency, end time, total time, total tokens, and tokens per second."""
        if self.end_time is None:
            self.end_time = time.time()  # Set end time if not already set
        total_time = self.end_time - self.init_time  # Total time from init to end
        tokens_per_second = self.token_count / total_time if total_time > 0 else 0
        first_token_latency = (self.first_token_time - self.init_time) if self.first_token_time is not None else None
        metrics = {
            "init_time": self.init_time,
            "first_token_time": self.first_token_time,
            "first_token_latency": first_token_latency,
            "end_time": self.end_time,
            "total_time": total_time,  # Total time in seconds
            "total_tokens": self.token_count,
            "tokens_per_second": tokens_per_second
        }
        return metrics

def generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, max_new_tokens):
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        enable_thinking=not nothink,
        add_generation_prompt=True,
        return_tensors="pt"
    )
    attention_mask = torch.ones_like(input_ids, dtype=torch.long)
    tokens = input_ids.to(model.device)
    attention_mask = attention_mask.to(model.device)

    streamer = CustomTextStreamer(tokenizer, skip_prompt=skip_prompt, skip_special_tokens=skip_special_tokens)

    def signal_handler(sig, frame):
        streamer.stop_generation()
        print("\n[Generation stopped by user with Ctrl+C]")

    signal.signal(signal.SIGINT, signal_handler)

    if do_sample:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_new_tokens": max_new_tokens,
            "temperature": 0.6,
            "top_k": 20,
            "top_p": 0.95,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }
    else:
        generate_kwargs = {
            "do_sample": do_sample,
            "max_new_tokens": max_new_tokens,
            "repetition_penalty": 1.2,
            "no_repeat_ngram_size": 2
        }

    print("Response: ", end="", flush=True)
    try:
        generated_ids = model.generate(
            tokens,
            attention_mask=attention_mask,
            #use_cache=False,
            pad_token_id=tokenizer.pad_token_id,
            streamer=streamer,
            **generate_kwargs
        )
        del generated_ids
    except StopIteration:
        print("\n[Stopped by user]")

    del input_ids, attention_mask
    torch.cuda.empty_cache()
    signal.signal(signal.SIGINT, signal.SIG_DFL)

    return streamer.generated_text, streamer.stop_flag, streamer.get_metrics()

init_seed = set_random_seed()

while True:
    if same_seed:
        set_random_seed(init_seed)
    else:
        init_seed = set_random_seed()

    print(f"\nnothink: {nothink}")
    print(f"skip_prompt: {skip_prompt}")
    print(f"skip_special_tokens: {skip_special_tokens}")
    print(f"do_sample: {do_sample}")
    print(f"same_seed: {same_seed}, {init_seed}\n")

    user_input = input("User: ").strip()
    if user_input.lower() == "/exit":
        print("Exiting chat.")
        break
    if user_input.lower() == "/clear":
        messages = []
        print("Chat history cleared. Starting a new conversation.")
        continue
    if user_input.lower() == "/nothink":
        nothink = not nothink
        continue
    if user_input.lower() == "/skip_prompt":
        skip_prompt = not skip_prompt
        continue
    if user_input.lower() == "/skip_special_tokens":
        skip_special_tokens = not skip_special_tokens
        continue
    if user_input.lower().startswith("/same_seed"):
        parts = user_input.split()
        if len(parts) == 1:  # /same_seed (no number)
            same_seed = not same_seed  # Toggle switch
        elif len(parts) == 2:  # /same_seed <number>
            try:
                init_seed = int(parts[1])  # Extract and convert number to int
                same_seed = True
            except ValueError:
                print("Error: Please provide a valid integer after /same_seed")
        continue
    if user_input.lower() == "/do_sample":
        do_sample = not do_sample
        continue
    if not user_input:
        print("Input cannot be empty. Please enter something.")
        continue

    messages.append({"role": "user", "content": user_input})
    response, stop_flag, metrics = generate_stream(model, tokenizer, messages, nothink, skip_prompt, skip_special_tokens, do_sample, 40960)
    print("\n\nMetrics:")
    for key, value in metrics.items():
        print(f" {key}: {value}")

    print("", flush=True)
    if stop_flag:
        continue
    messages.append({"role": "assistant", "content": response})
```
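
For a quick test without the interactive loop above, here is a minimal sketch that reuses the `model` and `tokenizer` already loaded; the prompt and `max_new_tokens` value are arbitrary:

```python
# Minimal, non-interactive generation, reusing `model` and `tokenizer` from above.
messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    enable_thinking=True,  # set to False to disable thinking mode
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True,
                            temperature=0.6, top_k=20, top_p=0.95)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```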

### Usage Warnings

- **Risk of Sensitive or Controversial Outputs**: This model’s safety filtering has been significantly reduced, potentially generating sensitive, controversial, or inappropriate content. Users should exercise caution and rigorously review generated outputs.

- **Not Suitable for All Audiences**: Due to limited content filtering, the model’s outputs may be inappropriate for public settings, underage users, or applications requiring high security.

- **Legal and Ethical Responsibilities**: Users must ensure their usage complies with local laws and ethical standards. Generated content may carry legal or ethical risks, and users are solely responsible for any consequences.

- **Research and Experimental Use**: It is recommended to use this model for research, testing, or controlled environments, avoiding direct use in production or public-facing commercial applications.

- **Monitoring and Review Recommendations**: Users are strongly advised to monitor model outputs in real time and conduct manual reviews when necessary to prevent the dissemination of inappropriate content (a rough sketch of such a gate follows this list).

- **No Default Safety Guarantees**: Unlike standard models, this model has not undergone rigorous safety optimization. huihui.ai bears no responsibility for any consequences arising from its use.
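
As a rough illustration of the monitoring recommendation above, a purely hypothetical gate that holds flagged outputs for manual review (the blocklist and routing are placeholders, not a vetted safety mechanism):

```python
# Purely hypothetical output gate: hold outputs containing blocklisted
# terms for manual review instead of displaying them directly.
BLOCKLIST = ("example-banned-term",)  # placeholder terms

def needs_review(text: str) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

response = "model output here"  # e.g. the text returned by generate_stream(...)
if needs_review(response):
    print("[held for manual review]")
else:
    print(response)
```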

### Donation

If you like it, please click "like" and follow us for more updates.
You can follow [x.com/support_huihui](https://x.com/support_huihui) to get the latest model information from huihui.ai.

##### Your donation helps us continue development and improvement; even a cup of coffee helps.
- bitcoin(BTC):
```
bc1qqnkhuchxw0zqjh2ku3lu4hq45hc6gy84uk70ge
```
 
+ test