ashkan-software2 committed · Commit 169dbd8 · verified · 1 Parent(s): 5921ad6

Add model card

Files changed (1)
  1. README.md +47 -1
README.md CHANGED
@@ -1 +1,47 @@
- Model RepBend
+ ---
+ library_name: transformers
+ tags: []
+ ---
+
+
+ ## Model Description
+ This Mistral-based model is fine-tuned with the "Representation Bending" (REPBEND) approach described in [Representation Bending for Large Language Model Safety](https://arxiv.org/abs/2504.01550). REPBEND modifies the model's internal representations to reduce harmful or unsafe responses while preserving overall capabilities. The resulting model is robust to adversarial jailbreak attacks, out-of-distribution harmful prompts, and fine-tuning exploits, while still giving useful and informative responses to benign requests.
+
+ ## Uses
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_id = "thkim0305/RepBend_Mistral_7B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,  # load weights in bfloat16
+     device_map="auto",  # place the model on the available device(s)
+ )
+
+ input_text = "Who are you?"
+ template = "[INST] {instruction} [/INST] "  # Mistral-style instruction format
+
+ prompt = template.format(instruction=input_text)
+
+ input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(input_ids, max_new_tokens=256)  # returned ids start with the prompt ids
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ print(generated_text)
+ ```
+
+ ## Code
+
+ Please refer to [this GitHub page](https://github.com/AIM-Intelligence/RepBend/tree/main?tab=readme-ov-file) for the code.
+
+ ## Citation
+ ```
+ @article{repbend,
+   title={Representation Bending for Large Language Model Safety},
+   author={Yousefpour, Ashkan and Kim, Taeheon and Kwon, Ryan S and Lee, Seungbeen and Jeung, Wonje and Han, Seungju and Wan, Alvin and Ngan, Harrison and Yu, Youngjae and Choi, Jonghyun},
+   journal={arXiv preprint arXiv:2504.01550},
+   year={2025}
+ }
+ ```
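
The usage snippet in the README decodes `outputs[0]` in full. For a decoder-only model, that sequence begins with the prompt tokens, so the printed text may repeat the prompt. The following is a minimal sketch, not part of this commit, of two small helpers that build the prompt and keep only the newly generated token ids. The names `TEMPLATE`, `build_prompt`, and `completion_ids` are hypothetical and introduced here for illustration only; the template string is copied from the README.

```python
from typing import List, Sequence

# Prompt format copied from the README's usage snippet.
TEMPLATE = "[INST] {instruction} [/INST] "


def build_prompt(instruction: str, template: str = TEMPLATE) -> str:
    """Wrap a user instruction in the instruction template (hypothetical helper)."""
    return template.format(instruction=instruction)


def completion_ids(output_ids: Sequence[int], prompt_length: int) -> List[int]:
    """Return only the ids generated after the prompt (hypothetical helper).

    Assumes the output sequence is the prompt ids followed by the new ids,
    so the first `prompt_length` entries are dropped.
    """
    return list(output_ids[prompt_length:])
```

With the variables from the README snippet, this could be used as `tokenizer.decode(completion_ids(outputs[0].tolist(), input_ids.shape[-1]), skip_special_tokens=True)`, which should decode only the model's reply.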