LLM Defence Scanner – LFM2.5 1.2B


A fine-tuned LFM2.5-1.2B-Instruct model for AI guardrail classification: a single LLM, six categories, structured JSON verdicts. It works as an input guard (scanning user prompts before they reach an LLM) and as an output guard (scanning LLM responses before they reach the user).

Open Notebook: runnable end-to-end with all six categories.

Capabilities

| Category | What it catches |
| --- | --- |
| `pii` | names, emails, phones, IDs, cards, addresses, credentials |
| `prompt_injection` | jailbreaks, instruction overrides, role-hijack attempts |
| `topic_ban` | tenant-configured restricted topics |
| `competitor` | tenant-configured competitor mentions |
| `code` | code snippets in disallowed programming languages |
| `malicious_url` | phishing, typosquats, known-bad domains |

Categories are scoped per request via `applied_policies`: different tenants enable different scanners with different parameters.
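As a rough sketch of that per-request scoping, a policy payload might enable only a subset of the six scanners with tenant-specific parameters. The field names and parameter keys below are illustrative assumptions, not the model card's exact contract:

```python
# Hypothetical applied_policies payload (field names are illustrative).
# Each key enables one scanner; the value carries its tenant parameters.
applied_policies = {
    "pii": {},
    "topic_ban": {"topics": ["medical_advice"]},
    "competitor": {"names": ["AcmeCorp"]},
    "code": {"allowed_languages": ["python"]},
}

# Only the enabled categories should appear in the verdict's category list.
enabled = sorted(applied_policies)
print(enabled)
```

A tenant that omits a key simply never receives verdicts for that category.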

Benchmarks

Held-out test set: 841 records, English only. Model output is post-processed deterministically (`apply_policy_postprocess`) for code allow-list and competitor-list filtering.

| Metric | Value |
| --- | --- |
| mean_total_score | 0.991 |
| schema_validity_rate | 1.000 |
| exact_category_set_rate | 0.999 |
| mean_matched_score | 0.990 |
| mean_span_score | 0.961 |
| exact_overall_blocked_rate | 0.999 |
| hard_fail_rate | 0.001 |

Per-category matched accuracy and span F1:

| Category | Matched acc. | Span F1 |
| --- | --- | --- |
| pii | 99.5% | 0.913 |
| competitor | 99.7% | 0.995 |
| topic_ban | 98.1% | 0.937 |
| code | 100.0% | 0.982 |
| prompt_injection | 99.3% | 0.873 |
| malicious_url | 96.4% | 0.907 |
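The deterministic post-processing step mentioned above (`apply_policy_postprocess`) can be sketched as a filter over the model's raw verdict. This is an illustrative sketch under assumed policy-key names, not the released implementation:

```python
def apply_policy_postprocess(verdict, policy):
    """Illustrative sketch only: drop code matches whose language is on
    the tenant allow-list and competitor matches absent from the tenant
    competitor list, then recompute the fire flags and gate decision."""
    allowed = set(policy.get("code_allowed_languages", []))
    competitors = set(policy.get("competitors", []))
    for cat in verdict["categories"]:
        if cat["name"] == "code":
            cat["matches"] = [m for m in cat["matches"] if m["kind"] not in allowed]
        elif cat["name"] == "competitor":
            cat["matches"] = [m for m in cat["matches"] if m["text"] in competitors]
        # A category only fires if it still has matches after filtering.
        cat["matched"] = bool(cat["matches"])
    verdict["overall_blocked"] = any(c["matched"] for c in verdict["categories"])
    return verdict
```

Keeping this step outside the model means a tenant can widen its code allow-list without retraining or re-prompting.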

Inference

End-to-end usage (model load, prompt rendering, the post-processor, and six category examples for both the input leg and the output leg) is in `release_v1_demo.ipynb`. Download it and run it in Jupyter or Colab.

For production serving with vLLM:

```shell
vllm serve FrameByFrame/llm-defence-scanner-lfm2.5-1.2b \
  --served-model-name llm-defence-scanner \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.85 \
  --no-enable-prefix-caching
```

p50 latency ~360 ms on RTX PRO 6000 Blackwell, p95 ~870 ms.
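Once the server is up, requests go to its OpenAI-compatible chat endpoint. The sketch below builds the request body; the system prompt shown is a placeholder (the actual policy-rendered prompt comes from the notebook's template, which this card does not reproduce):

```python
import json

# Request body for the vLLM OpenAI-compatible endpoint started above.
payload = {
    "model": "llm-defence-scanner",  # matches --served-model-name
    "messages": [
        {"role": "system", "content": "<policy-rendered system prompt>"},
        {"role": "user", "content": "My email is alice.tan@example.com"},
    ],
    "temperature": 0.0,  # deterministic verdicts
    "max_tokens": 512,
}
body = json.dumps(payload)
# POST this body to http://localhost:8000/v1/chat/completions
```

The assistant message in the response carries the JSON verdict described under Output schema.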

Output schema

```json
{
  "overall_blocked": true,
  "severity": "high",
  "language": {"dominant": "en", "alternates": [], "script": "latin", "code_mixed": false},
  "scenario": {"name": "banking", "profile": "retail_banking_kyc"},
  "categories": [
    {
      "name": "pii",
      "matched": true,
      "matches": [
        {"text": "alice.tan@example.com", "kind": "email"},
        {"text": "+1-415-555-2244", "kind": "phone"}
      ]
    }
  ],
  "reason": "The input contains personally identifiable information."
}
```
| Field | Role |
| --- | --- |
| `overall_blocked` | gateway decision: refuse or allow |
| `categories[].matched` | per-category fire flag |
| `categories[].matches[].text` | verbatim substring, ready to redact |
| `categories[].matches[].kind` | subtype label (e.g. `email`, `phone`, `python`, `phishing`) |
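Because `matches[].text` is a verbatim substring, redaction reduces to direct string replacement. A minimal sketch (the helper name is ours, not part of the release):

```python
def redact(text, verdict, mask="[REDACTED]"):
    """Replace every flagged span with a mask. This works because the
    schema guarantees matches[].text is a verbatim substring of the
    scanned text."""
    for cat in verdict["categories"]:
        for match in cat["matches"]:
            text = text.replace(match["text"], mask)
    return text

verdict = {
    "categories": [
        {"name": "pii", "matched": True,
         "matches": [{"text": "alice.tan@example.com", "kind": "email"}]}
    ]
}
print(redact("Contact alice.tan@example.com today.", verdict))
# -> Contact [REDACTED] today.
```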

Training

- Base: LiquidAI/LFM2.5-1.2B-Instruct
- Method: LoRA (r=16, α=16) on q/k/v/o + w1/w2/w3, then merged
- Dataset (v5): ~11k records spanning synthetic multi-category inputs, ai4privacy/pii-masking-65k PII spans, bantopics safety pairs, and embedded prompt-injection synthesis. English only, 8-gram leakage-filtered against the test split.
- Hyperparameters: effective batch size 32, LR 2e-4, 2 epochs
- Hardware: single RTX PRO 6000 Blackwell (96 GB)
- Final eval_loss: 0.0015

Citation

```bibtex
@misc{mariappan2026llmdefence,
  author    = {Mariappan, Vijayachandran},
  title     = {LLM Defence Scanner -- LFM2.5 1.2B},
  year      = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FrameByFrame/llm-defence-scanner-lfm2.5-1.2b}}
}
```

License

Based on LFM2.5 and subject to the LFM Open License v1.0.
