Defender Security Judge — Dolphin 3.0 Llama 3.2 3B

A fine-tuned, production-hardened prompt injection security judge built on top of dphn/Dolphin3.0-Llama3.2-3B.

This model is Stage 2 of the Defender multi-layer LLM security pipeline — a real-time adversarial firewall that intercepts, analyzes, and classifies user prompts before they ever reach a protected LLM.


Benchmark Results

Evaluated against the rogue-security/prompt-injections-benchmark — the industry-standard Qualifire benchmark used to evaluate production prompt injection defenses.

Metric Score
Accuracy 90.00%
F1 Score 0.9038
Precision 88.68%
Recall 92.16%

A 3B quantized model running entirely offline achieving 90% accuracy on the hardest curated jailbreak benchmark available. No API calls. No latency. No cost.


What Makes This Model Different

Zero refusals. Built on the uncensored Dolphin base, it coldly analyzes any attack — no matter how explicit — without flinching or refusing to process the payload.

Rigid JSON output. DoRA fine-tuning permanently hardwires the model to emit only structured {"decision", "confidence", "reason", "allowed_payload"} JSON. No preamble. No yapping.

Calibrated confidence. Trained with Gaussian confidence noise on ambiguous samples, the model's confidence field is mathematically trustworthy — not the overconfident 0.99 you get from vanilla LLMs.

Long-context immunity. Trained at sequence_len: 8192 with 98.37% sample packing efficiency. The model can read an 8,000-token document and catch an attack buried at token 7,500.


Training Details

  • Technique: DoRA (Weight-Decomposed LoRA) + NEFTune (α=5.0) + Flash Attention + Sample Packing
  • Hardware: NVIDIA H100 80GB SXM5
  • Training Time: ~14 minutes
  • Loss: 2.30 → 0.18 (converged cleanly across 3 epochs)
  • Dataset: karan11/defender-judge-fine-tune — 2,700 DeBERTa-scored, calibration-hardened samples

Available Artifacts

File Description
adapter_model.safetensors Raw LoRA adapter weights
judge-dolphin3-3b-f16.gguf Full merged model in F16 (6.4 GB)
judge-q4_k_m.gguf Production artifact — Q4_K_M quantized (2.0 GB)

Intended Use

This model is strictly a security classifier. It is not a general-purpose assistant.
Load it with llama-cpp-python and pass it the Defender system prompt for correct behavior.

from llama_cpp import Llama

llm = Llama(model_path="judge-q4_k_m.gguf", n_gpu_layers=-1, n_ctx=8192)
Downloads last month
129
GGUF
Model size
3B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

4-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for hlyn/prompt-injection-judge-3b

Adapter
(2)
this model

Dataset used to train hlyn/prompt-injection-judge-3b