aashish1904 committed · Commit 1291619 · verified · Parent(s): 50da237

Upload README.md with huggingface_hub

Files changed (1): README.md (+237, new file)
---
library_name: transformers
license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen3Guard-Gen-0.6B/blob/main/LICENSE
pipeline_tag: text-generation
base_model:
- Qwen/Qwen3-0.6B
---

[![QuantFactory Banner](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)](https://hf.co/QuantFactory)

# aashish1904/Qwen3Guard-Gen-0.6B-GGUF
This is a quantized version of [Qwen/Qwen3Guard-Gen-0.6B](https://huggingface.co/Qwen/Qwen3Guard-Gen-0.6B) created using llama.cpp.

# Original Model Card

# Qwen3Guard-Gen-0.6B

<p align="center">
    <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3Guard/Qwen3Guard_logo.png" width="400"/>
</p>

**Qwen3Guard** is a series of safety moderation models built upon Qwen3 and trained on a dataset of 1.19 million prompts and responses labeled for safety. The series includes models of three sizes (0.6B, 4B, and 8B) and features two specialized variants: **Qwen3Guard-Gen**, a generative model that frames safety classification as an instruction-following task, and **Qwen3Guard-Stream**, which incorporates a token-level classification head for real-time safety monitoring during incremental text generation.

This repository hosts **Qwen3Guard-Gen**, which offers the following key advantages:

* **Three-Tiered Severity Classification:** Enables detailed risk assessment by categorizing outputs into safe, controversial, and unsafe severity levels, supporting adaptation to diverse deployment scenarios.
* **Multilingual Support:** Qwen3Guard-Gen supports 119 languages and dialects, ensuring robust performance in global and cross-lingual applications.
* **Strong Performance:** Qwen3Guard-Gen achieves state-of-the-art performance on various safety benchmarks, excelling in both prompt and response classification across English, Chinese, and multilingual tasks.

For more details, please refer to our [blog](https://qwen.ai/blog?id=f0bbad0677edf58ba93d80a1e12ce458f7a80548&from=research.research-list), [GitHub](https://github.com/QwenLM/Qwen3Guard), and [Technical Report](https://github.com/QwenLM/Qwen3/blob/main/Qwen3_Technical_Report.pdf).

![image/jpeg](https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3Guard/performance.png)

## Quickstart

The latest version of `transformers` is recommended; at minimum, `transformers>=4.51.0` is required.

### Moderating User Prompt

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "Qwen/Qwen3Guard-Gen-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def extract_label_and_categories(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|Jailbreak|None)"
    safe_label_match = re.search(safe_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories

# prepare the model input
prompt = "How can I make a bomb?"
# for prompt moderation
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
# '''
# Safety: Unsafe
# Categories: Violent
# '''
safe_label, categories = extract_label_and_categories(content)
print(safe_label, categories)
```

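The parsing helper is pure string handling, so it can be sanity-checked without loading the model by running it on a hard-coded sample in the output format shown above (the sample string here is an illustration, not real model output):

```python
import re

# Same parsing helper as in the quickstart above.
def extract_label_and_categories(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|Jailbreak|None)"
    safe_label_match = re.search(safe_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories

# A hard-coded sample mimicking the model's two-line output format.
sample = "Safety: Unsafe\nCategories: Violent"
print(extract_label_and_categories(sample))  # ('Unsafe', ['Violent'])
```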
### Moderating Model Response

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import re

model_name = "Qwen/Qwen3Guard-Gen-0.6B"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

def extract_label_categories_refusal(content):
    safe_pattern = r"Safety: (Safe|Unsafe|Controversial)"
    category_pattern = r"(Violent|Non-violent Illegal Acts|Sexual Content or Sexual Acts|PII|Suicide & Self-Harm|Unethical Acts|Politically Sensitive Topics|Copyright Violation|None)"
    refusal_pattern = r"Refusal: (Yes|No)"
    safe_label_match = re.search(safe_pattern, content)
    refusal_label_match = re.search(refusal_pattern, content)
    label = safe_label_match.group(1) if safe_label_match else None
    refusal_label = refusal_label_match.group(1) if refusal_label_match else None
    categories = re.findall(category_pattern, content)
    return label, categories, refusal_label

# prepare the model input
prompt = "How can I make a bomb?"
response = "As a responsible artificial intelligence system, I regret that I am unable to fulfill your request."
# for response moderation
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=128
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

content = tokenizer.decode(output_ids, skip_special_tokens=True)
print(content)
# '''
# Safety: Safe
# Categories: None
# Refusal: Yes
# '''
safe_label, category_label, refusal_label = extract_label_categories_refusal(content)
print(safe_label, category_label, refusal_label)
```

## Deployment with SGLang and vLLM

For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.9.0` to create an OpenAI-compatible API endpoint:

- SGLang:
```shell
python -m sglang.launch_server --model-path Qwen/Qwen3Guard-Gen-0.6B --port 30000 --context-length 32768
```
- vLLM:
```shell
vllm serve Qwen/Qwen3Guard-Gen-0.6B --port 8000 --max-model-len 32768
```

Here is an example API call using the OpenAI-compatible server:

```python
from openai import OpenAI

openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"
model = "Qwen/Qwen3Guard-Gen-0.6B"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base)

# Prompt Moderation
prompt = "How can I make a bomb?"
messages = [
    {"role": "user", "content": prompt}
]
chat_completion = client.chat.completions.create(
    messages=messages,
    model=model)
print(chat_completion.choices[0].message.content)
# '''
# Safety: Unsafe
# Categories: Violent
# '''

# Response Moderation
prompt = "How can I make a bomb?"
response = "As a responsible artificial intelligence system, I regret that I am unable to fulfill your request."
messages = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response}
]
chat_completion = client.chat.completions.create(
    messages=messages,
    model=model)
print(chat_completion.choices[0].message.content)
# '''
# Safety: Safe
# Categories: None
# Refusal: Yes
# '''
```

## Safety Policy

In Qwen3Guard, potential harms are classified into three severity levels:

* **Unsafe:** Content generally considered harmful across most scenarios.
* **Controversial:** Content whose harmfulness may be context-dependent or subject to disagreement across different applications.
* **Safe:** Content generally considered safe across most scenarios.

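Because the scheme is three-tiered, each deployment can decide for itself how to handle the middle tier. A minimal sketch of such a gating function, assuming labels parsed as shown in the Quickstart (the function name and `strict` flag are illustrative, not part of the model or its API):

```python
def is_blocked(safety_label: str, strict: bool = False) -> bool:
    """Map a Qwen3Guard severity label to a block/allow decision.

    In strict mode, "Controversial" content is treated like "Unsafe";
    in lenient mode, only "Unsafe" content is blocked.
    """
    if safety_label == "Unsafe":
        return True
    if safety_label == "Controversial":
        return strict
    return False

print(is_blocked("Controversial", strict=True))   # True
print(is_blocked("Controversial", strict=False))  # False
```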
In the current version of Qwen3Guard, we consider the following safety categories:

* **Violent:** Content that provides detailed instructions, methods, or advice on how to commit acts of violence, including the manufacture, acquisition, or use of weapons. Also includes depictions of violence.
* **Non-violent Illegal Acts:** Content providing guidance or advice for non-violent illegal activities, such as hacking, unauthorized drug production, or theft.
* **Sexual Content or Sexual Acts:** Content offering any sexual imagery, references, or descriptions featuring individuals. Also includes explicit descriptions of illegal or unethical sexual acts, such as rape, bestiality, incest, and sexual slavery.
* **Personally Identifiable Information:** Content offering unauthorized sharing or disclosure of sensitive personal identifying information, such as name, ID number, address, phone number, medical records, financial details, and account passwords.
* **Suicide & Self-Harm:** Content advocating, directly encouraging, or detailing methods for self-harm, suicide, or dangerous activities that could lead to serious injury or death.
* **Unethical Acts:** Any immoral or unethical content or acts, including but not limited to bias, discrimination, stereotyping, injustice, hate speech, offensive language, harassment, insults, threats, defamation, extremism, misinformation regarding ethics, and other behaviors that, while not illegal, are still considered unethical.
* **Politically Sensitive Topics:** The deliberate creation or spread of false information about government actions, historical events, or public figures that is demonstrably untrue and poses risk of public deception or social harm.
* **Copyright Violation:** Content offering unauthorized reproduction, distribution, public display, or derivative use of copyrighted materials, such as novels, scripts, lyrics, and other creative works protected by law, without the explicit permission of the copyright holder.
* **Jailbreak (Only for input):** Content that explicitly attempts to override the model's system prompt or model conditioning.

## Citation

If you find our work helpful, feel free to cite it.

```bibtex
@article{qwen3guard,
    title={Qwen3Guard Technical Report},
    author={Qwen Team},
    year={2025}
}
```