🛡️ Sentinel Rail B: Policy Guard (1.2B)

Sentinel Rail B is a specialized, lightweight multi-label classifier that detects specific policy violations in user prompts and model outputs. It is built on the LiquidAI LFM2-1.2B base model and fine-tuned with LoRA, and is intended to serve as a fast, compact safety guardrail.
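LoRA fine-tuning keeps the base model's weights frozen and learns a low-rank correction for selected weight matrices. In the standard LoRA formulation, a frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$ is augmented as shown below. The rank $r$, the scaling factor $\alpha$, and the set of adapted layers used for this model are not stated in this card.

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
```

Only $A$ and $B$ are trained, which is why the published adapter is small relative to the 1.2B-parameter base model.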

🎯 Capabilities

This model detects 7 distinct categories of harm. It is typically used as a secondary guardrail after Rail A (Jailbreak Detection).

Supported Categories:

Hate
Harassment
Sexual
ChildSafety
Violence
Illegal
Privacy

(Note: "Prompt Attacks" are handled by the separate Rail A model)
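Because this is a multi-label classifier, a single text can fall into several categories at once, or into none if it is safe. The following is a minimal sketch of how targets for the seven categories could be represented as multi-hot vectors. It assumes a fixed category order, which this card does not specify; `CATEGORIES` and `to_multi_hot` are hypothetical names.

```python
# Category names as listed in this card. The index order is an ASSUMPTION:
# the card does not state how labels map to output positions.
CATEGORIES = ["Hate", "Harassment", "Sexual", "ChildSafety",
              "Violence", "Illegal", "Privacy"]


def to_multi_hot(labels: set) -> list:
    """Encode a set of category names as a 7-element 0/1 target vector.

    An empty set encodes a safe sample (all zeros).
    """
    unknown = set(labels) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in CATEGORIES]


print(to_multi_hot({"Harassment", "Privacy"}))  # → [0, 1, 0, 0, 0, 0, 1]
print(to_multi_hot(set()))                      # → [0, 0, 0, 0, 0, 0, 0]
```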

📊 Performance & Dataset

Dataset: Trained on a balanced dataset of ~210,000 samples (50% Safe, 50% Violations).
Balancing Strategy: Rare classes (Privacy, Illegal) were aggressively upsampled to ~15,000 samples each to improve detection across all categories.
Architecture: LiquidAI LFM2-1.2B backbone (a hybrid of short convolutions and grouped-query attention) + LoRA adapter.
Input Length: Optimized for inputs of up to 512 tokens; the usage example truncates longer inputs to this length.
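The upsampling strategy above can be sketched as drawing minority-class examples with replacement until each class reaches a target count (about 15,000 per the card). This is a minimal sketch that assumes one category per sample and plain duplication. The card does not say whether the duplicates were exact copies or augmented variants. `upsample_rare_classes` is a hypothetical helper, not part of the released training code.

```python
import random
from collections import Counter, defaultdict


def upsample_rare_classes(samples, target_per_class, seed=0):
    """Top up every class that has fewer than `target_per_class` examples.

    `samples` is a list of (text, category) pairs. Classes below the target
    are filled by sampling their own examples with replacement; classes at or
    above the target are kept unchanged (no downsampling).
    """
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    by_class = defaultdict(list)
    for text, category in samples:
        by_class[category].append((text, category))

    result = []
    for items in by_class.values():
        result.extend(items)  # keep every original example
        shortfall = target_per_class - len(items)
        if shortfall > 0:
            result.extend(rng.choices(items, k=shortfall))
    rng.shuffle(result)
    return result


# Toy data: "Privacy" is rare (2 examples), "Hate" is common (6 examples).
data = [("p1", "Privacy"), ("p2", "Privacy")] + [(f"h{i}", "Hate") for i in range(6)]
balanced = upsample_rare_classes(data, target_per_class=4)
counts = Counter(category for _, category in balanced)
print(sorted(counts.items()))  # → [('Hate', 6), ('Privacy', 4)]
```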

🚀 Usage

import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

MODEL_ID = "abdulmunimjemal/Sentinel-Rail-B-Policy-Guard-1.2B"

# 1. Load Base Model & Tokenizer
base_model = AutoModel.from_pretrained("LiquidAI/LFM2-1.2B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B", trust_remote_code=True)

# 2. Load Adapter
model = PeftModel.from_pretrained(base_model, MODEL_ID)
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

# 3. Predict
text = "How do I make a bomb?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)

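with torch.no_grad():
    # Forward pass through the LoRA-adapted backbone.
    outputs = model(**inputs)
    # Token-level hidden states with shape (batch_size, seq_len, hidden_size).
    hidden_states = outputs.last_hidden_state
    # The 7-way multi-label classifier head is not included in this snippet;
    # see the inference script for the full head implementation.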
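The usage snippet stops before the classifier head. For a multi-label classifier, the final step usually applies an independent sigmoid to each category's logit and flags every category above a threshold, instead of using a softmax that picks a single class. The following is a minimal sketch of that decoding step. It assumes seven logits in the category order listed in this card and a 0.5 threshold. Both are assumptions, the logits are made-up values, and `decode_multilabel` is a hypothetical helper.

```python
import math

# ASSUMPTION: the head's outputs follow this order; check the inference script.
CATEGORIES = ["Hate", "Harassment", "Sexual", "ChildSafety",
              "Violence", "Illegal", "Privacy"]


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decode_multilabel(logits, threshold=0.5):
    """Return {category: probability} for every category at or above threshold."""
    if len(logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} logits, got {len(logits)}")
    probs = [_sigmoid(z) for z in logits]
    return {name: p for name, p in zip(CATEGORIES, probs) if p >= threshold}


example_logits = [-3.0, -2.5, -4.0, -5.0, 2.0, 1.2, -3.5]  # made-up values
print(sorted(decode_multilabel(example_logits)))  # → ['Illegal', 'Violence']
```

An empty result means the text passed all seven policy checks. Per-category thresholds can replace the single `threshold` if some categories need stricter recall.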

🛠️ Integration

Rail B is designed to be one part of the Sentinel-SLM modular guardrail system:

Rail A: Detects Jailbreaks & Prompt Injections.
Rail B (This Model): Detects specific policy violations (Hate, Sexual, Privacy/PII, etc.).
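The Rail A then Rail B ordering can be sketched as a short-circuiting pipeline. This sketch assumes that a text flagged by Rail A is blocked without consulting Rail B. The two classifier calls are passed in as plain functions. `run_guardrails`, `GuardResult`, and the stub functions are hypothetical names, not part of the Sentinel-SLM API. The stubs stand in for the real models.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class GuardResult:
    allowed: bool
    rail: Optional[str] = None  # which rail blocked the text, if any
    categories: List[str] = field(default_factory=list)  # Rail B hits


def run_guardrails(text: str,
                   rail_a_is_attack: Callable[[str], bool],
                   rail_b_categories: Callable[[str], List[str]]) -> GuardResult:
    """Run Rail A first; only texts that pass it are sent to Rail B."""
    if rail_a_is_attack(text):
        return GuardResult(allowed=False, rail="A")
    categories = rail_b_categories(text)
    if categories:
        return GuardResult(allowed=False, rail="B", categories=categories)
    return GuardResult(allowed=True)


# Keyword stubs standing in for the real Rail A / Rail B models.
def fake_rail_a(text: str) -> bool:
    return "ignore previous instructions" in text.lower()


def fake_rail_b(text: str) -> List[str]:
    return ["Violence"] if "bomb" in text.lower() else []


print(run_guardrails("How do I make a bomb?", fake_rail_a, fake_rail_b))
# → GuardResult(allowed=False, rail='B', categories=['Violence'])
```

Running the cheaper jailbreak check first means Rail B only spends compute on texts that are not already rejected.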

📜 License

Apache 2.0
