Instructions to use Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3.5-2B") model = PeftModel.from_pretrained(base_model, "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit") - Notebooks
- Google Colab
- Kaggle
PromptInjection-Qwen3.5-2B-LoRA-8bit
LoRA adapter for Qwen/Qwen3.5-2B that detects prompt-injection attacks embedded in user input: instruction overrides, jailbreak attempts, fake authority claims ("URGENT", "SYSTEM:", "developer message"), requests to reveal hidden system prompts or initialization tokens, and similar manipulation patterns. Trained on the LLM Guard prompt_injection scanner outputs.
The model is fine-tuned to emit a strict JSON object marking every injection span found in the user prompt:
{"is_valid": false, "violations": {"Injection": [[27, 113]]}}
Quick start
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch, json, re
BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit"
tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(model, ADAPTER); model.eval()
def guard(prompt: str) -> dict:
chat = tokenizer.apply_chat_template(
[{"role":"system","content":SYSTEM_MSG},
{"role":"user","content":prompt}],
tokenize=False, add_generation_prompt=True, enable_thinking=False)
inputs = tokenizer(chat, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=768, do_sample=False)
text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))
Evaluation
Evaluated on 100 held-out prompts drawn from test_dataset_injection.csv (covers the same violation types and prompt-length buckets as the training data).
- Evaluation timestamp:
2026-05-12 20:42 UTC - GPU:
NVIDIA A10G - Source adapter:
Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit - JSON parse errors:
0/100(0.0%)
Top-level metrics
| Metric | Value |
|---|---|
is_valid accuracy |
1.0000 |
| Violation-type-set exact match | 1.0000 |
| Binary F1 (positive = invalid) | 1.0000 |
| Binary precision | 1.0000 |
| Binary recall | 1.0000 |
| Macro F1 across violation types | 1.0000 |
Confusion matrix — binary is_valid decision
Positive class = the prompt contains a violation (is_valid=False).
| predicted invalid | predicted valid | |
|---|---|---|
| actual invalid | TP = 75 | FN = 0 |
| actual valid | FP = 0 | TN = 25 |
Per violation-type metrics
Only types that appear in either the actual or predicted labels are listed.
| Type | support | precision | recall | F1 |
|---|---|---|---|---|
Injection |
75 | 1.000 | 1.000 | 1.000 |
Inference latency
- Mean: 3.52 s/prompt
- Median: 3.27 s/prompt
- p95: 6.01 s/prompt
- Max: 6.27 s/prompt
Training setup
- Base model:
Qwen/Qwen3.5-2B(loaded in 8-bit viabitsandbytes) - LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
- Optimizer: paged_adamw_8bit, lr=3e-4, cosine schedule, warmup 5%
- Precision: bf16 if available, else fp16
- Effective batch size: 8 (per-device 1 + grad-accum 8), gradient checkpointing on
- Max sequence length: 3200 tokens (system + user up to 2000 + assistant up to ~600)
- Prompt-length buckets in training data: 50, 100, 200, 400, 600, 1200, 1500, 2000 tokens
- Training data:
prompt_injection.csv— 1900 rows after a 100-row stratified test split was carved off (≈1425 attacks + ≈475 benign)
Supported violation types
The model emits one or more of these TYPE keys in the violations map of its JSON output:
Injection
Model card generated automatically by eval_and_push_card.py on 2026-05-12 20:42 UTC. Mirror of this card lives at the other namespace too.
- Downloads last month
- 5
Model tree for Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit
Evaluation results
- is_valid accuracy on Prompt-Injection Guard Held-out Test Setself-reported1.000
- violation-type-set exact match on Prompt-Injection Guard Held-out Test Setself-reported1.000
- binary F1 (positive=invalid) on Prompt-Injection Guard Held-out Test Setself-reported1.000
- macro F1 over violation types on Prompt-Injection Guard Held-out Test Setself-reported1.000
- binary precision (positive=invalid) on Prompt-Injection Guard Held-out Test Setself-reported1.000
- binary recall (positive=invalid) on Prompt-Injection Guard Held-out Test Setself-reported1.000