Transformers · Safetensors

ashkan-software2 committed commit 7bcd311 (verified) · Parent(s): b154ea5

Add model card

Files changed (1): README.md (+48 −3)

README.md CHANGED
@@ -1,3 +1,48 @@
- ---
- license: apache-2.0
- ---
---
license: apache-2.0
library_name: transformers
tags: []
---

## Model Description
This Llama3-based model is fine-tuned using the "Representation Bending" (REPBEND) approach described in [Representation Bending for Large Language Model Safety](https://arxiv.org/abs/2504.01550). REPBEND modifies the model's internal representations to reduce harmful or unsafe responses while preserving overall capabilities. The result is a model that is robust to various forms of adversarial jailbreak attacks, out-of-distribution harmful prompts, and fine-tuning exploits, all while maintaining useful and informative responses to benign requests.

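Conceptually, representation bending trains the model so that hidden states of harmful prompts are pushed away from where the original model placed them, while hidden states of benign prompts stay close. The toy sketch below illustrates that kind of distance-based objective on fixed vectors; it is only an illustration of the idea, not the paper's actual loss or training procedure, and the `cosine` helper is ours, not from the RepBend codebase.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

random.seed(0)
# Stand-ins for hidden states; a real method would extract these from the model.
h_harm_orig = [random.gauss(0, 1) for _ in range(64)]    # harmful prompt, original model
h_harm_new = [-x for x in h_harm_orig]                   # "bent": pushed away from its original direction
h_benign_orig = [random.gauss(0, 1) for _ in range(64)]  # benign prompt, original model
h_benign_new = [x + 0.01 * random.gauss(0, 1) for x in h_benign_orig]  # barely moved

# Toy objective: low similarity on harmful states, high similarity on benign ones.
# A lower value means harmful representations moved while benign ones were preserved.
loss = cosine(h_harm_orig, h_harm_new) - cosine(h_benign_orig, h_benign_new)
print(round(loss, 3))
```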
11
+ ## Uses
12
+ ```python
13
+ import torch
14
+ from transformers import AutoTokenizer, AutoModelForCausalLM
15
+
16
+ model_id = "AIM-Intelligence/RepBend_Llama3_8B"
17
+ tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
18
+ model = AutoModelForCausalLM.from_pretrained(
19
+ model_id,
20
+ torch_dtype=torch.bfloat16,
21
+ device_map="auto",
22
+ )
23
+
24
+ input_text = "Who are you?"
25
+ template = "<|start_header_id|>user<|end_header_id|>\n\n{instruction}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
26
+
27
+ prompt = template.format(instruction=input_text)
28
+
29
+ input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)
30
+ outputs = model.generate(input_ids, max_new_tokens=256)
31
+ generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
32
+
33
+ print(generated_text)
34
+ ```
35
+
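The manual template above covers a single user turn. One way to extend it to multi-turn conversations is to concatenate turns in the same Llama-3 header/`<|eot_id|>` layout, as sketched below (no model download needed). The `build_prompt` helper is illustrative, not part of the RepBend repo; in practice `tokenizer.apply_chat_template` can build these strings for you.

```python
# Llama-3-style chat markup, matching the single-turn template above.
HEADER = "<|start_header_id|>{role}<|end_header_id|>\n\n"
EOT = "<|eot_id|>"

def build_prompt(turns):
    """turns: list of (role, content) pairs. Returns a prompt string that
    ends with an open assistant header, ready for generation."""
    parts = []
    for role, content in turns:
        parts.append(HEADER.format(role=role) + content + EOT)
    parts.append(HEADER.format(role="assistant"))  # model continues from here
    return "".join(parts)

# Single-turn case reproduces the template used in the Uses section.
print(build_prompt([("user", "Who are you?")]))
```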
## Code

Please refer to the [RepBend GitHub repository](https://github.com/AIM-Intelligence/RepBend/tree/main?tab=readme-ov-file) for the code.

## Citation
```
@article{repbend,
  title={Representation Bending for Large Language Model Safety},
  author={Yousefpour, Ashkan and Kim, Taeheon and Kwon, Ryan S and Lee, Seungbeen and Jeung, Wonje and Han, Seungju and Wan, Alvin and Ngan, Harrison and Yu, Youngjae and Choi, Jonghyun},
  journal={arXiv preprint arXiv:2504.01550},
  year={2025}
}
```