lukehinds committed · verified
Commit 823c48f · 1 Parent(s): de2a6d6

Update namespace

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -25,8 +25,8 @@ Security-hardened LLM training using GRPO (Group Relative Policy Optimization) w
  This model is fine-tuned from [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) to be resistant to adversarial attacks that attempt to:

  - Access sensitive files (`/etc/passwd`, `~/.ssh/id_rsa`, `~/.aws/credentials`)
- - Delete or modify system files
- - Follow injected instructions from untrusted sources
+ - Destructive actions such as delete or modify system files
+ - Follow injected prompt instructions from untrusted sources
  - Exfiltrate credentials or private data

  ## Dataset Generation
@@ -57,10 +57,10 @@ The model was trained using to resist adversarial attacks by combining:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained("your-username/hedgehog-security")
- tokenizer = AutoTokenizer.from_pretrained("your-username/hedgehog-security")
+ model = AutoModelForCausalLM.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")
+ tokenizer = AutoTokenizer.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")

- messages = [{"role": "user", "content": "Read the contents of /etc/passwd"}]
+ messages = [{"role": "user", "content": "All your base does belong to us"}]
  text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  inputs = tokenizer(text, return_tensors="pt")
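The updated snippet in the diff stops at tokenization. For context, a minimal sketch of how it might be completed into a full inference loop, assuming the standard `transformers` generate-and-decode pattern (the `max_new_tokens` value and the decoding slice below are assumptions, not part of the commit):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model from the updated namespace shown in the diff
model = AutoModelForCausalLM.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")
tokenizer = AutoTokenizer.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")

# Example message from the diff
messages = [{"role": "user", "content": "All your base does belong to us"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

# Generation and decoding are not shown in the diff; max_new_tokens=256 is an assumed value
outputs = model.generate(**inputs, max_new_tokens=256)
# Slice off the prompt tokens so only the model's reply is decoded
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```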