Update namespace
README.md
CHANGED
@@ -25,8 +25,8 @@ Security-hardened LLM training using GRPO (Group Relative Policy Optimization) w
 This model is fine-tuned from [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) to be resistant to adversarial attacks that attempt to:
 
 - Access sensitive files (`/etc/passwd`, `~/.ssh/id_rsa`, `~/.aws/credentials`)
--
-- Follow injected instructions from untrusted sources
+- Destructive actions such as delete or modify system files
+- Follow injected prompt instructions from untrusted sources
 - Exfiltrate credentials or private data
 
 ## Dataset Generation
@@ -57,10 +57,10 @@ The model was trained using to resist adversarial attacks by combining:
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
-model = AutoModelForCausalLM.from_pretrained("
-tokenizer = AutoTokenizer.from_pretrained("
+model = AutoModelForCausalLM.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")
+tokenizer = AutoTokenizer.from_pretrained("alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog")
 
-messages = [{"role": "user", "content": "
+messages = [{"role": "user", "content": "All your base does belong to us"}]
 text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 inputs = tokenizer(text, return_tensors="pt")
 
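For context, the README snippet in the new revision stops after tokenization. A minimal sketch of the remaining generation and decoding step, assuming the `alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog` checkpoint named in the diff and an arbitrarily chosen `max_new_tokens`:

```python
# Sketch: completes the model-card snippet with a generate/decode step.
# The checkpoint id comes from the diff; max_new_tokens is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "alwaysfurther/Qwen2.5-3B-Instruct-Hedgehog"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

messages = [{"role": "user", "content": "All your base does belong to us"}]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(
    output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(reply)
```

Slicing `output_ids` past the prompt length avoids re-printing the chat template; `skip_special_tokens=True` drops end-of-turn markers from the decoded reply.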