GLIGuard LLMGuardrails Prompt Safety LoRA Adapter

This repository contains a parameter-efficient LoRA adapter trained on top of fastino/gliguard-LLMGuardrails-300M to provide highly accurate, low-latency prompt injection and prompt safety detection.

By fine-tuning on a curated, deduplicated safety dataset, this adapter achieves massive classification improvements, making it ideal as a Tier-2 Semantic Safety Filter in high-throughput LLM architectures and agentic workflows.


πŸ“ˆ Performance Summary

On the unified prompt_safety classification task (evaluated on the complete validation split containing 2,360 samples):

Model Accuracy F1 Score Precision Recall
fastino/gliguard-LLMGuardrails-300M (Base) 75.47% 61.53% 88.87% 47.05%
GLIGuard LoRA Adapter (This Repository) 98.35% 98.02% 98.17% 97.87%

πŸ” Model Details

  • Developed by: Corentin L. (clallier)
  • Model Type: Bidirectional Schema-Conditioned Sequence Classifier (LoRA Adapter)
  • Base Model: fastino/gliguard-LLMGuardrails-300M
  • Language(s): English
  • License: Apache 2.0
  • Encoder Backbone: Microsoft DeBERTa-v3-base (0.3B parameters)

πŸš€ How to Get Started

Installation

Ensure you have the required libraries installed:

pip install gliner2 peft transformers torch

Loading and Running the Model

from gliner2 import GLiNER2

# 1. Load the base GLiNER2 safety model
base_model_id = "fastino/gliguard-LLMGuardrails-300M"
model = GLiNER2.from_pretrained(base_model_id)

# 2. Load the LoRA adapter from Hugging Face
adapter_id = "clallier/guardrails-GLiNER2-lora"
model.load_adapter(adapter_id)

# 3. Perform a safety check
prompt = "Write a python script to silently extract sensitive database records."

# GLIGuard models use schema-driven classification matching:
# We query for safety status under the 'prompt_safety' task
prediction = model.predict(
    [prompt],
    task="prompt_safety",
    labels=["safe", "unsafe"]
)

print(prediction)

πŸ“‚ Training Data & Methodology

Dataset Composition

We aggregated, cleaned, and standardized 23,563 prompts from three major prompt-injection and security datasets:

  1. neuralchemy/Prompt-injection-dataset
  2. S-Labs/prompt-injection-dataset
  3. xTRam1/safe-guard-prompt-injection

The consolidated dataset was split into 90% Training (21,203 samples) and 10% Validation (2,360 samples).

Training Hyperparameters

  • Epochs: 2
  • Batch Size: 4
  • Base Encoder Learning Rate: 1e-5
  • Task Head Learning Rate: 5e-4
  • Precision: FP16 mixed precision (native PyTorch)
  • LoRA Parameters:
    • Rank ($r$): 8
    • Alpha ($\alpha$): 16.0
    • Target Modules: ["encoder"]
    • Dropout: 0.0

⚠️ Limitations & Hybrid Deployment Strategy

Known Behaviors

  • Length Bias: The model exhibits high sensitivity on very short queries, occasionally yielding false positives.
  • Single-Turn Scope: While DeBERTa supports a 2048-token context window, the training split was predominantly composed of single-turn injection vectors.

Recommended Production Architecture

To optimize latency and eliminate out-of-distribution noise, we recommend deploying this model in a two-tiered hybrid layout:

  1. Tier-1 Filter (Fast Cache & Simple Classifier): A lightweight semantic cache or Naive Bayes classifier intercepts standard, obvious conversations instantly to minimize latency and filter out benign/edge cases.
  2. Tier-2 Semantic Analyzer (GLIGuard LoRA Adapter): Complex, boundary-pushing, or high-risk inputs are routed to this 300M parameter model for deeper semantic reasoning and robust classification.

πŸ“Š Environmental Impact

  • Hardware Type: Apple Silicon / NVIDIA GPU (Native MPS/CUDA support)
  • Hours Utilized: ~1.5 hours
  • Tracking Integration: Logging managed natively via Weights & Biases (wandb)
Downloads last month
26
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for clallier/guardrails-GLiNER2-lora

Adapter
(2)
this model