Add pipeline tag, library name and link to Github repository

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +11 -2
README.md CHANGED
````diff
@@ -1,13 +1,16 @@
 ---
-license: mit
 language:
 - en
+license: mit
+pipeline_tag: text-generation
+library_name: transformers
 tags:
 - llm
 - safety
 - jailbreak
 - knowledge
 ---
+
 # Introduction
 
 This is a model for generating a jailbreak prompt from knowledge-point texts. The model is based on Llama-2-7b and fine-tuned on the Knowledge-to-Jailbreak dataset. It is intended to bridge the gap between theoretical vulnerabilities and real-world application scenarios by simulating sophisticated adversarial attacks that incorporate specialized knowledge.
@@ -228,7 +231,11 @@ max_tokens = 64
 
 knowledge_points = ["Kettling (also known as containment or corralling) is a police tactic for controlling large crowds during demonstrations or protests. It involves the formation of large cordons of police officers who then move to contain a crowd within a limited area. Protesters are left only one choice of exit controlled by the police – or are completely prevented from leaving, with the effect of denying the protesters access to food, water and toilet facilities for a time period determined by the police forces. The tactic has proved controversial, in part because it has resulted in the detention of ordinary bystanders."]
 
-batch_texts = [f'### Input:\n{input_}\n\n### Response:\n' for input_ in knowledge_points]
+batch_texts = [f'''### Input:
+{input_}
+
+### Response:
+''' for input_ in knowledge_points]
 
 inputs = tokenizer(batch_texts, return_tensors='pt', padding=True, truncation=True, max_length=max_length - max_tokens).to(model.device)
 
@@ -246,6 +253,8 @@ print(generated_texts)
 
 ```
 
+Code and datasets are available at https://github.com/THU-KEG/Knowledge-to-JailBreak.
+
 # Citation
 
 If you find this model useful, please cite the following paper:
````
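The prompt template touched by the diff above can be sanity-checked on its own, without downloading the model. This is a minimal sketch; the knowledge-point text is abbreviated here, and a triple-quoted f-string is used so the multiline `### Input: / ### Response:` template is valid Python:

```python
# Stand-alone check of the prompt construction shown in the diff above.
# The knowledge-point text is abbreviated for brevity.
knowledge_points = [
    "Kettling (also known as containment or corralling) is a police tactic "
    "for controlling large crowds during demonstrations or protests."
]

# Triple-quoted f-string reproducing the "### Input: / ### Response:" layout;
# single-quoted strings cannot span multiple lines in Python.
batch_texts = [f'''### Input:
{input_}

### Response:
''' for input_ in knowledge_points]

print(batch_texts[0])
```

Each resulting string starts with the `### Input:` header, embeds one knowledge point, and ends with an open `### Response:` section for the model to complete.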