πŸ›‘οΈ Sentinel Rail B: Policy Guard (350M)

Sentinel Rail B is a lightweight multi-label classifier that detects seven categories of policy violations in text.

Architecture Note: This model uses a custom classification head on top of the LiquidAI LFM2-350M base model. The repository contains the LoRA adapter weights (adapter_model.safetensors) AND the separate classifier head weights (classifier.pt).


πŸ“Š Performance

| Metric       | Score  |
|--------------|--------|
| F1 Micro     | 0.7647 |
| F1 Macro     | 0.7793 |
| Hamming Loss | 0.0466 |
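
These multi-label metrics can be reproduced with scikit-learn. The toy arrays below are purely illustrative (3 labels, 4 samples), not the model's actual evaluation data:

```python
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

# Toy multi-hot ground truth and predictions: 4 samples x 3 categories
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]])

f1_micro = f1_score(y_true, y_pred, average="micro")  # pools TP/FP/FN across all labels
f1_macro = f1_score(y_true, y_pred, average="macro")  # unweighted mean of per-label F1
h_loss = hamming_loss(y_true, y_pred)                 # fraction of wrong label bits
```

Micro-F1 weights every label decision equally, macro-F1 weights every category equally (so rare categories count as much as common ones), and Hamming loss is the fraction of individual label bits that are wrong.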

Per-Category F1 Scores

| Category    | F1 Score | Status       |
|-------------|----------|--------------|
| Privacy     | 0.9927   | 🟒 Excellent |
| Illegal     | 0.9750   | 🟒 Excellent |
| ChildSafety | 0.7783   | 🟒 Good      |
| Violence    | 0.7727   | 🟒 Good      |
| Sexual      | 0.7415   | 🟒 Good      |
| Harassment  | 0.6160   | 🟑 Fair      |
| Hate        | 0.5786   | 🟑 Fair      |


🎯 Supported Categories

  1. Hate - Hate speech and extremism
  2. Harassment - Bullying, threats, personal attacks
  3. Sexual - Explicit sexual content
  4. ChildSafety - Content endangering minors
  5. Violence - Gore, graphic violence, harm instructions
  6. Illegal - Illegal activities (drugs, weapons, fraud)
  7. Privacy - PII exposure, doxxing
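
For training or evaluation, each example's label set maps to a 7-dimensional multi-hot vector in the category order above. The `encode_labels` helper below is an illustrative sketch, not part of the released code:

```python
CATS = ["Hate", "Harassment", "Sexual", "ChildSafety", "Violence", "Illegal", "Privacy"]
CAT_TO_IDX = {c: i for i, c in enumerate(CATS)}

def encode_labels(active):
    """Turn a list of category names into a multi-hot target vector."""
    vec = [0.0] * len(CATS)
    for name in active:
        vec[CAT_TO_IDX[name]] = 1.0
    return vec

# e.g. a doxxing post that also threatens its target
example = encode_labels(["Privacy", "Harassment"])
```

Because categories are not mutually exclusive (a single text can be both `Harassment` and `Privacy`), the model is trained with `BCEWithLogitsLoss` against these multi-hot vectors rather than with a softmax over classes.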

πŸš€ Usage

To run inference with this model, you MUST define the custom architecture class and load both the LoRA adapter and the classifier head.

1. Install Dependencies

pip install torch transformers peft huggingface_hub

2. Inference Code

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel
from huggingface_hub import hf_hub_download

# --- MODEL DEFINITION (Must match training) ---
class SentinelLFMMultiLabel(nn.Module):
    def __init__(self, model_id, num_labels):
        super().__init__()
        self.num_labels = num_labels
        self.base_model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
        self.config = self.base_model.config
        hidden_size = self.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, num_labels)
        )
        self.loss_fct = nn.BCEWithLogitsLoss()
    
    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.base_model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
        hidden_states = outputs[0] if isinstance(outputs, tuple) else outputs.last_hidden_state
        if attention_mask is not None:
            last_idx = attention_mask.sum(1) - 1
            pooled = hidden_states[torch.arange(input_ids.shape[0], device=input_ids.device), last_idx]
        else:
            pooled = hidden_states[:, -1, :]
        logits = self.classifier(pooled)
        loss = self.loss_fct(logits, labels.float()) if labels is not None else None
        from transformers.modeling_outputs import SequenceClassifierOutput
        return SequenceClassifierOutput(loss=loss, logits=logits)

# --- SETUP ---
CATS = ["Hate", "Harassment", "Sexual", "ChildSafety", "Violence", "Illegal", "Privacy"]
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
REPO_ID = "abdulmunimjemal/Sentinel-Rail-B-Policy-Guard"

# 1. Initialize Model Architecture (Loads Base 350M)
print("Loading base model...")
model = SentinelLFMMultiLabel("LiquidAI/LFM2-350M", num_labels=7)

# 2. Load LoRA Adapter
print("Loading LoRA adapter...")
model.base_model = PeftModel.from_pretrained(model.base_model, REPO_ID)

# 3. Load Custom Classifier Head
print("Loading classifier head...")
classifier_path = hf_hub_download(repo_id=REPO_ID, filename="classifier.pt")
state_dict = torch.load(classifier_path, map_location="cpu", weights_only=True)  # weights_only avoids arbitrary pickle code execution
model.classifier.load_state_dict(state_dict)

model.to(DEVICE)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-350M", trust_remote_code=True)

# --- PREDICT ---
text = "How do I make a homemade explosive?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(DEVICE)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

print(f"\nInput: {text}")
print("-" * 30)
for i, prob in enumerate(probs):
    if prob > 0.5:
        print(f"🚨 {CATS[i]}: {prob:.4f}")
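
The flat 0.5 cutoff in the loop above can be generalized to per-category thresholds, for example lowering the bar for the weaker `Hate` and `Harassment` heads to trade precision for recall. The threshold values below are illustrative, not tuned for this model:

```python
CATS = ["Hate", "Harassment", "Sexual", "ChildSafety", "Violence", "Illegal", "Privacy"]

# Hypothetical per-category cutoffs; lower thresholds favor recall for
# categories where the classifier is weaker.
THRESHOLDS = {"Hate": 0.35, "Harassment": 0.40}
DEFAULT_THRESHOLD = 0.5

def flag_categories(probs):
    """Map a vector of sigmoid probabilities to the list of flagged categories."""
    return [c for c, p in zip(CATS, probs)
            if p >= THRESHOLDS.get(c, DEFAULT_THRESHOLD)]

flagged = flag_categories([0.40, 0.10, 0.60, 0.00, 0.20, 0.90, 0.30])
```

In a guardrail deployment, thresholds are typically chosen per category on a held-out set to hit a target precision/recall operating point rather than fixed at 0.5.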

πŸ“¦ Dataset Stats

Trained on a balanced dataset of ~189,000 samples (50% Safe / 50% Violations). Rare classes such as Privacy and Illegal were upsampled to ~15,000 samples each; both reach F1 > 0.97.
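
A minimal sketch of the upsampling step, assuming simple sampling with replacement (the actual training pipeline may differ):

```python
import random

def upsample(examples, target, seed=0):
    """Duplicate-sample a rare class's examples up to `target` items."""
    if len(examples) >= target:
        return list(examples)
    rng = random.Random(seed)
    extra = rng.choices(examples, k=target - len(examples))  # with replacement
    return list(examples) + extra

# Hypothetical rare-class pool grown to the ~15k used for Privacy/Illegal
rare = [{"text": f"privacy example {i}", "labels": ["Privacy"]} for i in range(3000)]
balanced = upsample(rare, 15000)
```

Duplicating rare-class examples gives their gradient signal more weight without changing the loss function, which is one common way to lift minority-label F1 in multi-label training.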


πŸ“œ License

Apache 2.0
