# LLM Defence Scanner – LFM2.5 1.2B
A fine-tuned LFM2.5-1.2B-Instruct model for AI guardrail classification. Single LLM, six categories, structured JSON verdicts. Works as an input guard (scanning user prompts before they reach an LLM) and as an output guard (scanning LLM responses before they reach the user).

Runnable end-to-end with all six categories.
## Capabilities
| Category | What it catches |
|---|---|
| `pii` | names, emails, phones, IDs, cards, addresses, credentials |
| `prompt_injection` | jailbreaks, instruction overrides, role-hijack attempts |
| `topic_ban` | tenant-configured restricted topics |
| `competitor` | tenant-configured competitor mentions |
| `code` | code snippets in disallowed programming languages |
| `malicious_url` | phishing, typosquats, known-bad domains |
Categories are scoped per request via `applied_policies` – different tenants enable different scanners with different parameters.
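The exact shape of `applied_policies` is not documented here; the fragment below is an illustrative sketch of what a per-tenant request scope might look like (all field names are assumptions):

```json
{
  "applied_policies": {
    "pii": {"enabled": true},
    "prompt_injection": {"enabled": true},
    "topic_ban": {"enabled": true, "topics": ["gambling", "weapons"]},
    "competitor": {"enabled": true, "competitors": ["AcmeCorp"]},
    "code": {"enabled": true, "code_allowlist": ["python"]},
    "malicious_url": {"enabled": false}
  }
}
```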
## Benchmarks
Held-out test set: 841 records, English only. Model output is post-processed deterministically (`apply_policy_postprocess`) to apply the code allow-list and competitor-list filters.
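The source names `apply_policy_postprocess` but does not show its implementation; the sketch below is a plausible reconstruction of the two filters it describes, with the policy field names (`code_allowlist`, `competitors`) as assumptions:

```python
def apply_policy_postprocess(verdict: dict, policy: dict) -> dict:
    """Deterministically filter raw model matches against tenant policy.

    A sketch: `policy` shape ({"code_allowlist": [...], "competitors": [...]})
    is an assumption, not from the source.
    """
    allowed = {lang.lower() for lang in policy.get("code_allowlist", [])}
    competitors = {c.lower() for c in policy.get("competitors", [])}
    for cat in verdict["categories"]:
        if cat["name"] == "code":
            # Drop code matches whose language the tenant explicitly allows.
            cat["matches"] = [m for m in cat["matches"]
                              if m["kind"].lower() not in allowed]
        elif cat["name"] == "competitor":
            # Keep only mentions on the tenant's competitor list.
            cat["matches"] = [m for m in cat["matches"]
                              if m["text"].lower() in competitors]
        cat["matched"] = bool(cat["matches"])
    verdict["overall_blocked"] = any(c["matched"] for c in verdict["categories"])
    return verdict
```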
| metric | value |
|---|---|
| mean_total_score | 0.991 |
| schema_validity_rate | 1.000 |
| exact_category_set_rate | 0.999 |
| mean_matched_score | 0.990 |
| mean_span_score | 0.961 |
| exact_overall_blocked_rate | 0.999 |
| hard_fail_rate | 0.001 |
Per-category matched-accuracy:
| category | matched-acc | span F1 |
|---|---|---|
| pii | 99.5% | 0.913 |
| competitor | 99.7% | 0.995 |
| topic_ban | 98.1% | 0.937 |
| code | 100.0% | 0.982 |
| prompt_injection | 99.3% | 0.873 |
| malicious_url | 96.4% | 0.907 |
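The card does not define how span F1 is computed; a common exact-match formulation (each predicted span counts only if it matches a gold span verbatim) can be sketched as:

```python
def span_f1(predicted: set[str], gold: set[str]) -> float:
    """Exact-match span F1 over verbatim match strings (an assumed definition)."""
    if not predicted and not gold:
        return 1.0  # both empty: perfect agreement
    tp = len(predicted & gold)  # true positives: exact string matches
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```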
## Inference
End-to-end usage (model load, prompt rendering, post-processor, six category examples for both the input leg and the output leg) is in `release_v1_demo.ipynb`. Download and run it in Jupyter or Colab.
For production serving with vLLM:
```shell
vllm serve FrameByFrame/llm-defence-scanner-lfm2.5-1.2b \
  --served-model-name llm-defence-scanner \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.85 \
  --no-enable-prefix-caching
```
p50 latency is ~360 ms on an RTX PRO 6000 Blackwell; p95 is ~870 ms.
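Once served, the model is reachable through vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. A minimal request body can be built as below; the real system prompt (policy rendering) lives in the notebook, so the message content here is a placeholder assumption:

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# The rendered scanner/policy prompt is defined in release_v1_demo.ipynb;
# the content strings below are placeholders.
payload = {
    "model": "llm-defence-scanner",  # matches --served-model-name above
    "messages": [
        {"role": "system", "content": "<rendered scanner policy prompt>"},
        {"role": "user", "content": "My email is alice.tan@example.com"},
    ],
    "temperature": 0.0,  # deterministic verdicts
    "max_tokens": 512,
}
body = json.dumps(payload)
# POST `body` to http://localhost:8000/v1/chat/completions (e.g. with `requests`).
```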
## Output schema
```json
{
  "overall_blocked": true,
  "severity": "high",
  "language": {"dominant": "en", "alternates": [], "script": "latin", "code_mixed": false},
  "scenario": {"name": "banking", "profile": "retail_banking_kyc"},
  "categories": [
    {
      "name": "pii",
      "matched": true,
      "matches": [
        {"text": "alice.tan@example.com", "kind": "email"},
        {"text": "+1-415-555-2244", "kind": "phone"}
      ]
    }
  ],
  "reason": "The input contains personally identifiable information."
}
```
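A gateway has to turn the raw completion into this schema before acting on it. A minimal parser under assumptions (the fail-closed fallback is a design choice, not from the source):

```python
import json

REQUIRED_KEYS = {"overall_blocked", "severity", "language",
                 "scenario", "categories", "reason"}

def parse_verdict(raw: str) -> dict:
    """Parse a model completion into a verdict dict, failing closed on bad JSON."""
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        # Fail closed: treat unparseable output as blocked (an assumed policy).
        return {"overall_blocked": True, "severity": "high",
                "categories": [], "reason": "unparseable model output"}
    missing = REQUIRED_KEYS - verdict.keys()
    if missing:
        raise ValueError(f"verdict missing keys: {sorted(missing)}")
    return verdict
```

Failing closed means a malformed verdict blocks the request rather than silently allowing it through.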
| field | role |
|---|---|
| `overall_blocked` | gateway decision: refuse or allow |
| `categories[].matched` | per-category fire flag |
| `categories[].matches[].text` | verbatim substring – ready to redact |
| `categories[].matches[].kind` | subtype label (e.g. `email`, `phone`, `python`, `phishing`) |
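Because `matches[].text` is a verbatim substring of the scanned text, redaction is a straight string replace; the `[KIND]` tag format below is an illustrative choice, not prescribed by the source:

```python
def redact(text: str, verdict: dict) -> str:
    """Replace every matched span with a [KIND] tag (tag format is illustrative)."""
    for cat in verdict["categories"]:
        for m in cat["matches"]:
            text = text.replace(m["text"], f"[{m['kind'].upper()}]")
    return text
```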
## Training
- Base: LiquidAI/LFM2.5-1.2B-Instruct
- Method: LoRA (r=16, α=16) on q/k/v/o + w1/w2/w3, then merged
- Dataset (v5): ~11k records spanning synthetic multi-category inputs, ai4privacy/pii-masking-65k PII spans, bantopics safety pairs, embedded prompt-injection synthesis. English only, 8-gram leakage-filtered against the test split.
- Hyperparameters: effective batch=32, LR=2e-4, 2 epochs
- Hardware: single RTX PRO 6000 Blackwell (96GB)
- Final eval_loss: 0.0015
## Citation
```bibtex
@misc{mariappan2026llmdefence,
  author       = {Mariappan, Vijayachandran},
  title        = {LLM Defence Scanner -- LFM2.5 1.2B},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/FrameByFrame/llm-defence-scanner-lfm2.5-1.2b}}
}
```
## License
Based on LFM2.5 and subject to the LFM Open License v1.0.