You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

PromptInjection-Qwen3.5-2B-LoRA-8bit

LoRA adapter for Qwen/Qwen3.5-2B that detects prompt-injection attacks embedded in user input: instruction overrides, jailbreak attempts, fake authority claims ("URGENT", "SYSTEM:", "developer message"), requests to reveal hidden system prompts or initialization tokens, and similar manipulation patterns. Trained on the LLM Guard prompt_injection scanner outputs. The model is fine-tuned to emit a strict JSON object marking every injection span found in the user prompt:

{"is_valid": false, "violations": {"Injection": [[27, 113]]}}

Quick start

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch, json, re

BASE = "Qwen/Qwen3.5-2B"
ADAPTER = "Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit"

tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
bnb = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb, device_map="auto", trust_remote_code=True)
model = PeftModel.from_pretrained(model, ADAPTER); model.eval()

def guard(prompt: str) -> dict:
    chat = tokenizer.apply_chat_template(
        [{"role":"system","content":SYSTEM_MSG},
         {"role":"user","content":prompt}],
        tokenize=False, add_generation_prompt=True, enable_thinking=False)
    inputs = tokenizer(chat, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=768, do_sample=False)
    text = tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return json.loads(re.search(r'\{.*\}', text, re.DOTALL).group(0))

Evaluation

Evaluated on 100 held-out prompts drawn from test_dataset_injection.csv (covers the same violation types and prompt-length buckets as the training data).

  • Evaluation timestamp: 2026-05-12 20:42 UTC
  • GPU: NVIDIA A10G
  • Source adapter: Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit
  • JSON parse errors: 0/100 (0.0%)

Top-level metrics

Metric Value
is_valid accuracy 1.0000
Violation-type-set exact match 1.0000
Binary F1 (positive = invalid) 1.0000
Binary precision 1.0000
Binary recall 1.0000
Macro F1 across violation types 1.0000

Confusion matrix — binary is_valid decision

Positive class = the prompt contains a violation (is_valid=False).

predicted invalid predicted valid
actual invalid TP = 75 FN = 0
actual valid FP = 0 TN = 25

Per violation-type metrics

Only types that appear in either the actual or predicted labels are listed.

Type support precision recall F1
Injection 75 1.000 1.000 1.000

Inference latency

  • Mean: 3.52 s/prompt
  • Median: 3.27 s/prompt
  • p95: 6.01 s/prompt
  • Max: 6.27 s/prompt

Training setup

  • Base model: Qwen/Qwen3.5-2B (loaded in 8-bit via bitsandbytes)
  • LoRA: r=16, alpha=32, dropout=0.05, target modules = {q,k,v,o,gate,up,down}_proj
  • Optimizer: paged_adamw_8bit, lr=3e-4, cosine schedule, warmup 5%
  • Precision: bf16 if available, else fp16
  • Effective batch size: 8 (per-device 1 + grad-accum 8), gradient checkpointing on
  • Max sequence length: 3200 tokens (system + user up to 2000 + assistant up to ~600)
  • Prompt-length buckets in training data: 50, 100, 200, 400, 600, 1200, 1500, 2000 tokens
  • Training data: prompt_injection.csv — 1900 rows after a 100-row stratified test split was carved off (≈1425 attacks + ≈475 benign)

Supported violation types

The model emits one or more of these TYPE keys in the violations map of its JSON output:

Injection

Model card generated automatically by eval_and_push_card.py on 2026-05-12 20:42 UTC. Mirror of this card lives at the other namespace too.

Downloads last month
5
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Accuknoxtechnologies/PromptInjection-Qwen3.5-2B-LoRA-8bit

Finetuned
Qwen/Qwen3.5-2B
Adapter
(78)
this model

Evaluation results

  • is_valid accuracy on Prompt-Injection Guard Held-out Test Set
    self-reported
    1.000
  • violation-type-set exact match on Prompt-Injection Guard Held-out Test Set
    self-reported
    1.000
  • binary F1 (positive=invalid) on Prompt-Injection Guard Held-out Test Set
    self-reported
    1.000
  • macro F1 over violation types on Prompt-Injection Guard Held-out Test Set
    self-reported
    1.000
  • binary precision (positive=invalid) on Prompt-Injection Guard Held-out Test Set
    self-reported
    1.000
  • binary recall (positive=invalid) on Prompt-Injection Guard Held-out Test Set
    self-reported
    1.000