Update namespace
README.md
CHANGED
@@ -25,8 +25,8 @@ Security-hardened LLM training using GRPO (Group Relative Policy Optimization) w
 This model is fine-tuned from [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) to be resistant to adversarial attacks that attempt to:
 
 - Access sensitive files (`/etc/passwd`, `~/.ssh/id_rsa`, `~/.aws/credentials`)
--
-- Follow injected instructions from untrusted sources
+- Destructive actions such as delete or modify system files
+- Follow injected prompt instructions from untrusted sources
 - Exfiltrate credentials or private data
 
 ## Dataset Generation
@@ -57,10 +57,10 @@ The model was trained using to resist adversarial attacks by combining:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model = AutoModelForCausalLM.from_pretrained("
-tokenizer = AutoTokenizer.from_pretrained("
+model = AutoModelForCausalLM.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")
+tokenizer = AutoTokenizer.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")
 
-messages = [{"role": "user", "content": "
+messages = [{"role": "user", "content": "All your base does belong to us"}]
 text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tokenizer(text, return_tensors="pt")
 
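For context, the README snippet in the new revision stops after tokenization. A minimal sketch of the remaining generation and decoding step, assuming the `alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog` checkpoint named in the diff and an arbitrarily chosen `max_new_tokens`:

```python
# Sketch: completes the model-card snippet with a generate/decode step.
# The checkpoint id comes from the diff; max_new_tokens is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "All your base does belong to us"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```

Slicing `output_ids` past the prompt length avoids re-printing the chat template; `skip_special_tokens=True` drops end-of-turn markers from the decoded reply.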