🛡️ Sentinel Rail B: Policy Guard (1.2B)

Sentinel Rail B is a lightweight multi-label classifier that detects specific policy violations in user prompts and model outputs. It is built on the LiquidAI LFM2-1.2B base model and fine-tuned with LoRA to serve as a fast safety guardrail.

🎯 Capabilities

This model detects 7 distinct categories of harm. It is typically used as a secondary guardrail after Rail A (Jailbreak Detection).

Supported Categories:

  • Hate
  • Harassment
  • Sexual
  • ChildSafety
  • Violence
  • Illegal
  • Privacy

(Note: "Prompt Attacks" are handled by the separate Rail A model.)
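
Because the classifier is multi-label, each category gets its own independent score, and a single input can be flagged for several categories at once. A minimal sketch of how per-category logits could be turned into flagged labels is shown below. The label order, the 0.5 threshold, the example logits, and the helper name `decode_labels` are assumptions for illustration; the actual head's output order and a suitable threshold should be taken from the model's inference script.

```python
import math

# Assumed label order, copied from the category list above. The real
# classifier head may order its outputs differently.
CATEGORIES = ["Hate", "Harassment", "Sexual", "ChildSafety",
              "Violence", "Illegal", "Privacy"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_labels(logits, threshold=0.5):
    """Map one logit per category to (flagged category names, all scores).

    Each category is scored independently (sigmoid, not softmax), so zero,
    one, or several categories can be flagged for the same input.
    """
    if len(logits) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} logits, got {len(logits)}")
    scores = {name: sigmoid(z) for name, z in zip(CATEGORIES, logits)}
    flagged = [name for name, p in scores.items() if p >= threshold]
    return flagged, scores


# Hypothetical logits, not real model output.
example_logits = [-4.0, -3.2, -5.1, -6.0, 2.3, 1.1, -4.4]
flagged, scores = decode_labels(example_logits)
print(flagged)  # ['Violence', 'Illegal']
```

Lowering the threshold trades more false positives for fewer missed violations; a per-category threshold is a natural extension if some categories need stricter handling.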

📊 Performance & Dataset

  • Dataset: Trained on a balanced dataset of ~210,000 samples (50% Safe, 50% Violations).
  • Balancing Strategy: Rare classes (Privacy, Illegal) were aggressively upsampled to ~15,000 samples each to support robust detection across all categories.
  • Architecture: Liquid Neural Network (Linear Flow) + LoRA Adapter.
  • Input Length: Optimized for 512 tokens.
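
The upsampling step described above can be sketched as sampling rare-class examples with replacement until each class reaches a target count. This is an illustrative sketch only, not the training pipeline used for this model: it assumes one label per sample for simplicity (the real data is multi-label), and the function name `upsample_rare_classes` and the toy data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def upsample_rare_classes(samples, target_per_class, rare_labels, seed=0):
    """Duplicate rare-class samples (with replacement) up to a target count.

    `samples` is a list of (text, label) pairs. Classes that already meet
    the target, or that are not listed in `rare_labels`, are left unchanged.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    balanced = list(samples)
    for label in rare_labels:
        pool = by_label.get(label, [])
        shortfall = target_per_class - len(pool)
        if pool and shortfall > 0:
            balanced.extend(rng.choices(pool, k=shortfall))

    rng.shuffle(balanced)
    return balanced


# Toy data: 10 "Hate" samples and only 2 "Privacy" samples.
toy = [(f"hate-{i}", "Hate") for i in range(10)]
toy += [("privacy-0", "Privacy"), ("privacy-1", "Privacy")]

balanced = upsample_rare_classes(toy, target_per_class=5, rare_labels=["Privacy"])
print(Counter(label for _, label in balanced))
```

Upsampling by duplication does not add new information, so heavily upsampled classes are more prone to overfitting; evaluation should be done on a held-out set that was not upsampled.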

🚀 Usage

import torch
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

MODEL_ID = "abdulmunimjemal/Sentinel-Rail-B-Policy-Guard-1.2B"

# 1. Load Base Model & Tokenizer
base_model = AutoModel.from_pretrained("LiquidAI/LFM2-1.2B", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-1.2B", trust_remote_code=True)

# 2. Load Adapter
model = PeftModel.from_pretrained(base_model, MODEL_ID)
model.to("cuda" if torch.cuda.is_available() else "cpu")
model.eval()

# 3. Predict
text = "How do I make a bomb?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(model.device)

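with torch.no_grad():
    # Run the LoRA-adapted backbone. The classifier head that maps these
    # hidden states to the 7 category scores is not shown here
    # (see the inference script for the full implementation).
    # Assumption: the base model output exposes `last_hidden_state`.
    hidden_states = model(**inputs).last_hidden_state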

🛠️ Integration

Designed to be part of the Sentinel-SLM modular guardrail system.

  • Rail A: Detects Jailbreaks & Prompt Injections.
  • Rail B (This Model): Detects specific policy violations (Hate, Sexual, PII, etc.).
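
One way the two rails could be chained is sketched below: Rail A screens for prompt attacks first, and Rail B runs its policy check only if Rail A passes. The `GuardResult` type, the `run_guardrails` function, and the keyword-based stand-in checks are all hypothetical placeholders; in a real deployment each callable would wrap the corresponding model's inference code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class GuardResult:
    allowed: bool
    blocked_by: str = ""                                  # "Rail A", "Rail B", or ""
    categories: List[str] = field(default_factory=list)   # Rail B labels, if any


def run_guardrails(
    text: str,
    rail_a: Callable[[str], bool],        # True if a prompt attack is detected
    rail_b: Callable[[str], List[str]],   # list of violated policy categories
) -> GuardResult:
    """Run Rail A first, then Rail B only if Rail A passes."""
    if rail_a(text):
        return GuardResult(allowed=False, blocked_by="Rail A")
    violations = rail_b(text)
    if violations:
        return GuardResult(allowed=False, blocked_by="Rail B",
                           categories=list(violations))
    return GuardResult(allowed=True)


# Keyword-based stand-ins so the sketch runs without the models.
def fake_rail_a(text: str) -> bool:
    return "ignore previous instructions" in text.lower()


def fake_rail_b(text: str) -> List[str]:
    return ["Violence"] if "bomb" in text.lower() else []


print(run_guardrails("How do I make a bomb?", fake_rail_a, fake_rail_b))
print(run_guardrails("What's the weather today?", fake_rail_a, fake_rail_b))
```

Running Rail A first means an input that fails the jailbreak check never reaches Rail B; whether to short-circuit like this or always run both rails for logging purposes is a deployment choice.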

📜 License

Apache 2.0
