Aira-security committed on
Commit 3cdf60e · verified · 1 Parent(s): 2c4430b

Update README.md

  1. README.md +74 -13
README.md CHANGED
@@ -1,16 +1,77 @@
  ---
- license: apache-2.0
- language:
- - en
- base_model:
- - meta-llama/Llama-Prompt-Guard-2-22M
- pipeline_tag: text-classification
  tags:
  - security
- - injection
- - prompt_injection
- - jail_break
- - llm_security
- - prompt_guard
- - aira_security
- ---
  ---
+ license: llama2
+ base_model: meta-llama/Llama-Prompt-Guard-2-22M
  tags:
+ - prompt-injection
  - security
+ - classification
+ - llama
+ - lora
+ - fine-tuned
+ - text-classification
+ pipeline_tag: text-classification
+ model_name: FT-Llama-Prompt-Guard-2
+ ---
+
+ # FT-Llama-Prompt-Guard-2
+
+ A **fine-tuned** version of `meta-llama/Llama-Prompt-Guard-2-22M` for prompt injection and jailbreak detection, fine-tuned with LoRA for better accuracy and faster inference.
+
+ ## Model Details
+
+ - **Base Model**: [meta-llama/Llama-Prompt-Guard-2-22M](https://huggingface.co/meta-llama/Llama-Prompt-Guard-2-22M)
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+ - **Task**: Binary text classification (benign vs. malicious prompts)
+ - **Model Size**: ~88 MB (22M parameters + LoRA)
+
+
+ ## Training Details
+
+ - **LoRA Rank**: 16
+ - **LoRA Alpha**: 32
+ - **Max Length**: 512 tokens
+
+ ## Usage
+
+ ### Using Pipeline
+
+ ```python
+ from transformers import pipeline
+
+ pipe = pipeline("text-classification", model="Aira-security/FT-Llama-Prompt-Guard-2")
+
+ result = pipe("Ignore all previous instructions")
+ print(result)
+ ```
+
+ ### Direct Model Loading
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ tokenizer = AutoTokenizer.from_pretrained("Aira-security/FT-Llama-Prompt-Guard-2")
+ model = AutoModelForSequenceClassification.from_pretrained("Aira-security/FT-Llama-Prompt-Guard-2")
+
+ inputs = tokenizer("Your text here", return_tensors="pt", truncation=True, max_length=512)
+ outputs = model(**inputs)
+ ```
+
59
+ ## Limitations
60
+
61
+ - Trained on English text only
62
+ - May have false positives/negatives on edge cases
63
+ - Performance depends on similarity to training data
64
+
65
+ ## Citation
66
+
67
+ If you use this model, please cite:
68
+
69
+ ```bibtex
70
+ @model{ft_llama_prompt_guard_2},
71
+ title={FT-Llama-Prompt-Guard-2: Fine-tuned Prompt Injection and Jail Break Detector},
72
+ author={Aira Security},
73
+ year={2024},
74
+ base_model={meta-llama/Llama-Prompt-Guard-2-22M},
75
+ url={https://huggingface.co/Aira-security/FT-Llama-Prompt-Guard-2}
76
+ }
77
+ ```
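
The Direct Model Loading snippet in the README stops at `outputs`, which holds raw logits rather than a label. Below is a minimal sketch of the post-processing step, using only the standard library. The label order (`benign` first, `malicious` last), the 0.5 threshold, and the helper names `softmax` and `classify` are assumptions for illustration and are not part of the model card; the actual mapping should be read from `model.config.id2label`.

```python
import math

# Assumed label order for a two-class head; the real mapping should be read
# from model.config.id2label rather than hard-coded.
ASSUMED_LABELS = ["benign", "malicious"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, labels=ASSUMED_LABELS, threshold=0.5):
    """Return (label, malicious_probability) for one example's logits.

    The prompt is flagged only when the probability of the last class
    (assumed to be the malicious one) reaches `threshold`.
    """
    probs = softmax(logits)
    malicious_prob = probs[-1]
    label = labels[-1] if malicious_prob >= threshold else labels[0]
    return label, malicious_prob


# With the real model, the input would be outputs.logits[0].tolist().
print(classify([-2.0, 3.0]))
```

Raising `threshold` trades recall for fewer false positives, which may matter given the edge-case behavior noted under Limitations.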