GLIGuard LLMGuardrails Prompt Safety LoRA Adapter

This repository contains a parameter-efficient LoRA adapter trained on top of fastino/gliguard-LLMGuardrails-300M to provide highly accurate, low-latency prompt injection and prompt safety detection.

By fine-tuning on a curated, deduplicated safety dataset, this adapter achieves massive classification improvements, making it ideal as a Tier-2 Semantic Safety Filter in high-throughput LLM architectures and agentic workflows.

📈 Performance Summary

On the unified prompt_safety classification task (evaluated on the complete validation split containing 2,360 samples):

Model	Accuracy	F1 Score	Precision	Recall
fastino/gliguard-LLMGuardrails-300M (Base)	75.47%	61.53%	88.87%	47.05%
GLIGuard LoRA Adapter (This Repository)	98.35%	98.02%	98.17%	97.87%

🔍 Model Details

Developed by: Corentin L. (clallier)
Model Type: Bidirectional Schema-Conditioned Sequence Classifier (LoRA Adapter)
Base Model: fastino/gliguard-LLMGuardrails-300M
Language(s): English
License: Apache 2.0
Encoder Backbone: Microsoft DeBERTa-v3-base (0.3B parameters)

🚀 How to Get Started

Installation

Ensure you have the required libraries installed:

pip install gliner2 peft transformers torch

Loading and Running the Model

from gliner2 import GLiNER2

# 1. Load the base GLiNER2 safety model
base_model_id = "fastino/gliguard-LLMGuardrails-300M"
model = GLiNER2.from_pretrained(base_model_id)

# 2. Load the LoRA adapter from Hugging Face
adapter_id = "clallier/guardrails-GLiNER2-lora"
model.load_adapter(adapter_id)

# 3. Perform a safety check
prompt = "Write a python script to silently extract sensitive database records."

# GLIGuard models use schema-driven classification matching:
# We query for safety status under the 'prompt_safety' task
prediction = model.predict(
    [prompt],
    task="prompt_safety",
    labels=["safe", "unsafe"]
)

print(prediction)

📂 Training Data & Methodology

Dataset Composition

We aggregated, cleaned, and standardized 23,563 prompts from three major prompt-injection and security datasets:

The consolidated dataset was split into 90% Training (21,203 samples) and 10% Validation (2,360 samples).

Training Hyperparameters

Epochs: 2
Batch Size: 4
Base Encoder Learning Rate: 1e-5
Task Head Learning Rate: 5e-4
Precision: FP16 mixed precision (native PyTorch)
LoRA Parameters:
- Rank ($r$): 8
- Alpha ($\alpha$): 16.0
- Target Modules: ["encoder"]
- Dropout: 0.0

⚠️ Limitations & Hybrid Deployment Strategy

Known Behaviors

Length Bias: The model exhibits high sensitivity on very short queries, occasionally yielding false positives.
Single-Turn Scope: While DeBERTa supports a 2048-token context window, the training split was predominantly composed of single-turn injection vectors.

Recommended Production Architecture

To optimize latency and eliminate out-of-distribution noise, we recommend deploying this model in a two-tiered hybrid layout:

Tier-1 Filter (Fast Cache & Simple Classifier): A lightweight semantic cache or Naive Bayes classifier intercepts standard, obvious conversations instantly to minimize latency and filter out benign/edge cases.
Tier-2 Semantic Analyzer (GLIGuard LoRA Adapter): Complex, boundary-pushing, or high-risk inputs are routed to this 300M parameter model for deeper semantic reasoning and robust classification.

📊 Environmental Impact

Hardware Type: Apple Silicon / NVIDIA GPU (Native MPS/CUDA support)
Hours Utilized: ~1.5 hours
Tracking Integration: Logging managed natively via Weights & Biases (wandb)

Downloads last month: 4

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for clallier/guardrails-GLiNER2-lora

Base model

fastino/gliner2-base-v1

Finetuned

fastino/gliguard-LLMGuardrails-300M

Adapter

(2)

this model