Sentinel Rail B: Policy Guard (1.2B)
Sentinel Rail B is a specialized, lightweight multi-label classifier that detects specific policy violations in user prompts and model outputs. It is built on the LiquidAI LFM2-1.2B base model and fine-tuned with a LoRA adapter, so it can serve as a fast safety guardrail.
Capabilities
This model detects 7 distinct categories of harm. It is typically used as a secondary guardrail after Rail A (Jailbreak Detection).
Supported Categories:
- Hate
- Harassment
- Sexual
- ChildSafety
- Violence
- Illegal
- Privacy
(Note: "Prompt Attacks" are handled by the separate Rail A model)
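Because the classifier is multi-label, each category is scored independently and a single input can trigger several categories at once. The sketch below shows one common way to turn seven raw logits into category names. The label order in `CATEGORIES`, the 0.5 threshold, and the `decode_labels` helper are assumptions for illustration; the actual mapping has to come from the model's training configuration.

```python
import math

# Assumed label order; verify against the model's training configuration.
CATEGORIES = [
    "Hate", "Harassment", "Sexual", "ChildSafety",
    "Violence", "Illegal", "Privacy",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_labels(logits, threshold=0.5):
    """Return every category whose independent probability meets the threshold.

    Hypothetical helper: the threshold value is an assumption, not a
    documented setting of this model.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} logits, got {len(logits)}")
    return [
        name for name, logit in zip(CATEGORIES, logits)
        if sigmoid(logit) >= threshold
    ]
```

An empty result means no category crossed the threshold, which corresponds to a "safe" verdict under these assumptions.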
Performance & Dataset
- Dataset: Trained on a balanced dataset of ~210,000 samples (50% Safe, 50% Violations).
- Balancing Strategy: Aggressive upsampling of rare classes (Privacy, Illegal) to ~15,000 samples each, so that detection stays robust across all categories.
- Architecture: Liquid Neural Network (Linear Flow) + LoRA Adapter.
- Input Length: Optimized for inputs of up to 512 tokens.
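The balancing strategy above can be illustrated with a small sketch of upsampling by resampling with replacement. This is not the author's actual data pipeline, which the card does not describe. The `upsample_rare_classes` function, the `(text, label)` sample format, and the one-label-per-sample simplification are assumptions made for the example.

```python
import random
from collections import defaultdict


def upsample_rare_classes(samples, rare_labels, target, seed=0):
    """Duplicate samples of rare labels until each reaches `target` items.

    `samples` is a list of (text, label) pairs. Treating each sample as
    having a single label is a simplification of the multi-label setting.
    """
    rng = random.Random(seed)

    # Group the existing samples by label.
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    balanced = list(samples)
    for label in rare_labels:
        pool = by_label.get(label, [])
        deficit = target - len(pool)
        if pool and deficit > 0:
            # Sample with replacement to fill the gap for this label.
            balanced.extend(rng.choices(pool, k=deficit))

    rng.shuffle(balanced)
    return balanced
```

Labels that already meet the target are left unchanged, and labels with no samples at all are skipped rather than invented.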
Usage
import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
MODEL_ID = "abdulmunimjemal/Sentinel-Rail-B-Policy-Guard-1.2B"
# 1. Load Base Model & Tokenizer
base_model = AutoModel.from_pretrained("LiquidAI/LFM2-1.2B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B", trust_remote_code=True)
# 2. Load Adapter
model = PeftModel.from_pretrained(base_model, MODEL_ID)
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()
# 3. Predict
text = "How do I make a bomb?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)
with torch.no_grad():
    # Helper to traverse model structure if using custom head
    # (See inference script for full classifier head implementation)
    pass
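The prediction step above defers to an inference script for the classifier head. For readers who want a sense of what such a head could look like, here is a minimal sketch that applies masked mean pooling to the hidden states, then a single linear layer with per-category sigmoids. `PolicyHead` and `predict_probs` are hypothetical names, the pooling choice is an assumption, and the weights here are randomly initialized, so the scores are meaningless until the trained head from the inference script is loaded.

```python
import torch
import torch.nn as nn

NUM_CATEGORIES = 7  # the seven categories listed in this card


class PolicyHead(nn.Module):
    """Hypothetical head: masked mean pooling followed by one linear layer."""

    def __init__(self, hidden_size: int, num_labels: int = NUM_CATEGORIES):
        super().__init__()
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states, attention_mask):
        # hidden_states: (batch, seq_len, hidden); attention_mask: (batch, seq_len)
        mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
        summed = (hidden_states * mask).sum(dim=1)
        # Clamp so an all-padding row cannot cause division by zero.
        counts = mask.sum(dim=1).clamp(min=1.0)
        pooled = summed / counts
        return self.classifier(pooled)  # raw logits, (batch, num_labels)


def predict_probs(head, hidden_states, attention_mask):
    """Return independent per-category probabilities in [0, 1]."""
    with torch.no_grad():
        return torch.sigmoid(head(hidden_states, attention_mask))
```

Assuming the base model's output exposes `last_hidden_state`, the call would look roughly like `predict_probs(head, model(**inputs).last_hidden_state, inputs["attention_mask"])`.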
Integration
Rail B is designed to be one component of the Sentinel-SLM modular guardrail system.
- Rail A: Detects Jailbreaks & Prompt Injections.
- Rail B (This Model): Detects specific policy violations (Hate, Sexual, Privacy, etc.).
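The two-rail arrangement can be sketched as a short pipeline in which Rail A screens for prompt attacks first and Rail B runs only on inputs that pass. The `GuardResult` type, the `run_guardrails` function, and the two callables are hypothetical stand-ins for illustration; the card does not document the actual Sentinel-SLM interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GuardResult:
    allowed: bool
    reason: str
    violations: List[str] = field(default_factory=list)


def run_guardrails(
    text: str,
    rail_a_is_attack: Callable[[str], bool],
    rail_b_violations: Callable[[str], List[str]],
) -> GuardResult:
    """Run Rail A first, then Rail B only if Rail A passes.

    Both callables are placeholders for real model wrappers.
    """
    # Rail A: jailbreak and prompt-injection screening.
    if rail_a_is_attack(text):
        return GuardResult(False, "rail_a: prompt attack")

    # Rail B: policy categories such as Hate, Sexual, or Privacy.
    violations = rail_b_violations(text)
    if violations:
        return GuardResult(False, "rail_b: policy violation", violations)

    return GuardResult(True, "passed")
```

Short-circuiting on Rail A avoids a second model call for inputs that are already blocked, which is one reasonable design choice for a secondary guardrail.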
License