ccss17 committed
Commit ebdb93e · verified · 1 Parent(s): c5c9da9

Upload LoRA adapters for ModernBERT prompt injection detector

Files changed (3)
  1. README.md +46 -30
  2. adapter_config.json +45 -0
  3. adapter_model.safetensors +3 -0
README.md CHANGED
@@ -37,10 +37,10 @@ model-index:
 
 ## Model Description
 
-This model is a fine-tuned version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) for detecting prompt injection attacks in LLM applications. It classifies input prompts as either legitimate user queries or potential injection attacks.
 
-**Base Model:** answerdotai/ModernBERT-large (395M parameters)
-**Training Approach:** Full fine-tuning on H200 GPU
 **Use Case:** Production-ready prompt injection detection for LLM security
 
 ## Intended Use
@@ -55,12 +55,16 @@ This model helps protect LLM-based applications by:
 
 ```python
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
 import torch
 
-# Load model and tokenizer
-model_name = "ccss17/modernbert-prompt-injection-detector"
-model = AutoModelForSequenceClassification.from_pretrained(model_name)
-tokenizer = AutoTokenizer.from_pretrained(model_name)
 
 # Classify a prompt
 def detect_injection(text):
@@ -103,22 +107,25 @@ The model was trained on a combined dataset from multiple sources:
 
 **Total Samples:** ~2,503 (55% normal / 45% attack)
 **Train/Val/Test Split:** 80/10/10
 
 ### Training Hyperparameters
 
 ```yaml
-Training Mode: Full Fine-tuning
-Epochs: 30
-Batch Size: 128
-Learning Rate: 3e-05
 Optimizer: lion_32bit
-Warmup Ratio: 0.2
-Weight Decay: 0.1
-Max Sequence Length: 1536
 LR Scheduler: cosine
 Precision: bfloat16
-Early Stopping Patience: 7
-Hardware: NVIDIA H200 GPU
 ```
 
 ### Performance Metrics
@@ -126,7 +133,7 @@ Hardware: NVIDIA H200 GPU
 
 | Split | Accuracy | Precision | Recall | F1 Score |
 |-------|----------|-----------|--------|----------|
 | Train | TBD | TBD | TBD | TBD |
-| Val | TBD | TBD | TBD | TBD |
 | Test | TBD | TBD | TBD | TBD |
 
 *Update these metrics after running evaluation*
@@ -136,12 +143,21 @@ Hardware: NVIDIA H200 GPU
 
 To evaluate the model on your own data:
 
 ```python
-from transformers import pipeline
 
 classifier = pipeline(
     "text-classification",
-    model="ccss17/modernbert-prompt-injection-detector",
-    device=0  # Use GPU
 )
 
 # Batch inference
@@ -164,12 +180,12 @@ print(results)
 
 This model is designed for **defensive security purposes** only:
 
-✅ **Intended Use:**
 - Protecting LLM applications from malicious inputs
 - Research on prompt injection vulnerabilities
 - Building safer AI systems
 
-❌ **Prohibited Use:**
 - Offensive security testing without authorization
 - Bypassing legitimate content moderation
 - Any malicious or illegal activities
@@ -180,11 +196,11 @@ If you use this model in your research, please cite:
 
 ```bibtex
 @misc{modernbert_prompt_injection_detector,
-  author = {Your Name},
-  title = {modernbert-prompt-injection-detector: Prompt Injection Detection with ModernBERT},
-  year = {2024},
-  publisher = {HuggingFace},
-  howpublished = {\url{https://huggingface.co/ccss17/modernbert-prompt-injection-detector}},
 }
 ```
 
@@ -200,6 +216,6 @@ Apache 2.0 - See LICENSE for details
 
 ---
 
-**Model Card Authors:** Your Name
-**Contact:** your.email@example.com
-**Last Updated:** 2025-10-04
 
 
 ## Model Description
 
+This model is a LoRA-adapted version of [answerdotai/ModernBERT-large](https://huggingface.co/answerdotai/ModernBERT-large) for detecting prompt injection attacks in LLM applications. It classifies input prompts as either legitimate user queries or potential injection attacks.
 
+**Base Model:** answerdotai/ModernBERT-large
+**Adaptation Method:** LoRA adapters fine-tuned with Unsloth Trainer
 **Use Case:** Production-ready prompt injection detection for LLM security
 
 ## Intended Use
 
 
 ```python
 from transformers import AutoModelForSequenceClassification, AutoTokenizer
+from peft import PeftModel
 import torch
 
+# Load base model, adapter, and tokenizer
+adapter_repo = "ccss17/modernbert-prompt-injection-detector"
+base_model_id = "answerdotai/ModernBERT-large"
+
+tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+base_model = AutoModelForSequenceClassification.from_pretrained(base_model_id)
+model = PeftModel.from_pretrained(base_model, adapter_repo)
 
 # Classify a prompt
 def detect_injection(text):
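The hunk above truncates `detect_injection` before its body. The decision step presumably tokenizes the text, runs the model, and maps the argmax logit to a label; that mapping can be sketched in plain Python (the label names and index order are assumptions — check `model.config.id2label` on the real model):

```python
import math

def logits_to_label(logits, id2label={0: "BENIGN", 1: "INJECTION"}):
    """Map raw classifier logits to a (label, confidence) pair via softmax."""
    # subtract the max logit for numerical stability before exponentiating
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    idx = probs.index(max(probs))
    return id2label[idx], probs[idx]

label, conf = logits_to_label([-2.0, 3.0])  # e.g. model(**inputs).logits[0].tolist()
```

In the real helper, the logits would come from `model(**tokenizer(text, return_tensors="pt")).logits[0]` evaluated under `torch.no_grad()`.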
 
 
 **Total Samples:** ~2,503 (55% normal / 45% attack)
 **Train/Val/Test Split:** 80/10/10
+**Hyperparameter Search:** Optuna trial 16 with best validation F1 0.9758
 
 ### Training Hyperparameters
 
 ```yaml
+Training Mode: LoRA Adapter Training
+Epochs: 3
+Batch Size: 16
+Learning Rate: 4.4390540763318225e-05
 Optimizer: lion_32bit
+Warmup Ratio: 0.05
+Weight Decay: 0.005846666628429419
+Max Sequence Length: 2048
+LoRA Rank: 32
+LoRA Alpha: 128
+LoRA Dropout: 0.0
 LR Scheduler: cosine
 Precision: bfloat16
+Hardware: NVIDIA A100 GPU
 ```
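With rank 32 and alpha 128, the effective LoRA scaling factor — alpha / r in the standard LoRA formulation, applicable here since the adapter config sets `use_rslora: false` — works out to 4:

```python
lora_rank = 32
lora_alpha = 128

# standard LoRA scales the low-rank update BA by alpha / r
scaling = lora_alpha / lora_rank
print(scaling)  # 4.0
```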
 
 ### Performance Metrics
 
 | Split | Accuracy | Precision | Recall | F1 Score |
 |-------|----------|-----------|--------|----------|
 | Train | TBD | TBD | TBD | TBD |
+| Val | 0.9754 | 0.9603 | 0.9918 | 0.9758 |
 | Test | TBD | TBD | TBD | TBD |
 
 *Update these metrics after running evaluation*
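The reported validation F1 can be sanity-checked against the precision and recall columns, since F1 is their harmonic mean:

```python
precision, recall = 0.9603, 0.9918

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # 0.9758 — consistent with the table
```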
 
 
 To evaluate the model on your own data:
 
 ```python
+from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
+from peft import PeftModel
+
+base_model_id = "answerdotai/ModernBERT-large"
+adapter_repo = "ccss17/modernbert-prompt-injection-detector"
+
+tokenizer = AutoTokenizer.from_pretrained(base_model_id)
+base_model = AutoModelForSequenceClassification.from_pretrained(base_model_id)
+model = PeftModel.from_pretrained(base_model, adapter_repo)
 
 classifier = pipeline(
     "text-classification",
+    model=model,
+    tokenizer=tokenizer,
+    device=0,  # Set to -1 for CPU
 )
 
 # Batch inference
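The batch-inference call itself is cut off by the hunk. A `text-classification` pipeline returns one `{"label": ..., "score": ...}` dict per input, so flagging attacks reduces to plain post-processing (the attack label string is an assumption — it depends on the model's `id2label` mapping):

```python
def flag_injections(results, attack_label="INJECTION", threshold=0.5):
    """Return indices of pipeline outputs classified as attacks above a confidence threshold."""
    return [
        i for i, r in enumerate(results)
        if r["label"] == attack_label and r["score"] >= threshold
    ]

# results = classifier(prompts) yields dicts shaped like these:
sample = [
    {"label": "BENIGN", "score": 0.98},
    {"label": "INJECTION", "score": 0.91},
]
print(flag_injections(sample))  # [1]
```

Raising `threshold` trades recall for precision, which may be preferable when false positives block legitimate users.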
 
 
 This model is designed for **defensive security purposes** only:
 
+**Intended Use:**
 - Protecting LLM applications from malicious inputs
 - Research on prompt injection vulnerabilities
 - Building safer AI systems
 
+**Prohibited Use:**
 - Offensive security testing without authorization
 - Bypassing legitimate content moderation
 - Any malicious or illegal activities
 
 
 ```bibtex
 @misc{modernbert_prompt_injection_detector,
+  author = {Your Name},
+  title = {modernbert-prompt-injection-detector: Prompt Injection Detection with ModernBERT LoRA},
+  year = {2024},
+  publisher = {HuggingFace},
+  howpublished = {\url{https://huggingface.co/ccss17/modernbert-prompt-injection-detector}},
 }
 ```
 
 
 
 ---
 
+**Model Card Authors:** Your Name
+**Contact:** your.email@example.com
+**Last Updated:** 2025-10-07
adapter_config.json ADDED
@@ -0,0 +1,45 @@
+{
+  "alpha_pattern": {},
+  "auto_mapping": {
+    "base_model_class": "ModernBertForSequenceClassification",
+    "parent_library": "transformers.models.modernbert.modeling_modernbert",
+    "unsloth_fixed": true
+  },
+  "base_model_name_or_path": "answerdotai/ModernBERT-large",
+  "bias": "none",
+  "corda_config": null,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 128,
+  "lora_bias": false,
+  "lora_dropout": 0.0,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": [
+    "classifier",
+    "score"
+  ],
+  "peft_type": "LORA",
+  "qalora_group_size": 16,
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "Wqkv",
+    "Wi",
+    "Wo"
+  ],
+  "target_parameters": null,
+  "task_type": "SEQ_CLS",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
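Because the adapter ships as plain JSON plus a safetensors file, the key settings above can be inspected without loading any weights. A sketch parsing a hand-copied subset of the fields (the subset below is reproduced here for illustration, not fetched from the repo):

```python
import json

# hand-copied subset of adapter_config.json, for illustration
config_text = """{
  "base_model_name_or_path": "answerdotai/ModernBERT-large",
  "peft_type": "LORA",
  "r": 32,
  "lora_alpha": 128,
  "target_modules": ["Wqkv", "Wi", "Wo"],
  "modules_to_save": ["classifier", "score"]
}"""

cfg = json.loads(config_text)
# rank, alpha, and the attention/MLP projections the adapter targets
print(cfg["r"], cfg["lora_alpha"], cfg["target_modules"])
```

`modules_to_save` lists the fully trained (non-LoRA) classification head, which is why the adapter works for `SEQ_CLS` out of the box.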
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3efbc59606f6668ebc6d34fd420e5003ac8d0f9487fc72afb62cc78885155f97
+size 57605772