---
language:
- en
license: apache-2.0
tags:
- content-moderation
- safety
- guardrails
- multi-label-classification
- liquid-ai
- lfm-350m
- sentinel-slm
- lora
- peft
base_model: LiquidAI/LFM2-350M
datasets:
- custom-balanced-rail-b
pipeline_tag: text-classification
library_name: transformers
metrics:
- f1
---

# 🛡️ Sentinel Rail B: Policy Guard (350M)

**Sentinel Rail B** is a lightweight, efficient **multi-label classifier** designed to detect 7 distinct categories of policy violations in text.

> **Architecture Note**: This model uses a custom classification head on top of the **LiquidAI LFM2-350M** base model. The repository contains both the LoRA adapter weights (`adapter_model.safetensors`) and the separate classifier head weights (`classifier.pt`).

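As a quick sanity check, you can confirm that both weight files are present in the repository (a sketch using `huggingface_hub`; the repo id matches the `REPO_ID` used in the usage code below):

```python
# Optional check: both weight files should be listed in the repository.
from huggingface_hub import list_repo_files

files = list_repo_files("abdulmunimjemal/Sentinel-Rail-B-Policy-Guard")
assert "adapter_model.safetensors" in files, "LoRA adapter weights missing"
assert "classifier.pt" in files, "classifier head weights missing"
```
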
---

## 📊 Performance

| Metric | Score |
|--------|-------|
| **F1 Micro** | 0.7647 |
| **F1 Macro** | 0.7793 |
| **Hamming Loss** | 0.0466 |

### Per-Category F1 Scores

| Category | F1 Score | Status |
|----------|----------|--------|
| **Privacy** | 0.9927 | 🟢 Excellent |
| **Illegal** | 0.9750 | 🟢 Excellent |
| **ChildSafety** | 0.7783 | 🟢 Good |
| **Violence** | 0.7727 | 🟢 Good |
| **Sexual** | 0.7415 | 🟢 Good |
| **Harassment** | 0.6160 | 🟡 Fair |
| **Hate** | 0.5786 | 🟡 Fair |

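For reference, these aggregate scores follow the standard multi-label definitions. A minimal sketch of how they are computed with `scikit-learn`, on small hypothetical prediction arrays:

```python
# Hypothetical multi-hot arrays: 4 samples x 7 categories.
import numpy as np
from sklearn.metrics import f1_score, hamming_loss

y_true = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 0],
                   [0, 0, 0, 0, 0, 0, 1]])
y_pred = np.array([[1, 1, 0, 0, 0, 0, 0],
                   [0, 0, 1, 1, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1],   # one false positive
                   [0, 1, 0, 0, 0, 0, 1]])  # one false positive

print("F1 micro:", f1_score(y_true, y_pred, average="micro"))  # pooled over all labels
print("F1 macro:", f1_score(y_true, y_pred, average="macro"))  # unweighted mean of per-label F1
print("Hamming loss:", hamming_loss(y_true, y_pred))           # fraction of wrong label bits
```
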
---

## 🎯 Supported Categories

1. **Hate** - Hate speech and extremism
2. **Harassment** - Bullying, threats, personal attacks
3. **Sexual** - Explicit sexual content
4. **ChildSafety** - Content endangering minors
5. **Violence** - Gore, graphic violence, harm instructions
6. **Illegal** - Illegal activities (drugs, weapons, fraud)
7. **Privacy** - PII exposure, doxxing

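The model emits one logit per category, in the order listed above (the same order as the `CATS` list in the inference code below). A minimal decoding sketch, assuming a flat 0.5 threshold that you may want to tune per category:

```python
# Map a 7-dim probability vector to category names (the threshold is an assumption).
CATS = ["Hate", "Harassment", "Sexual", "ChildSafety", "Violence", "Illegal", "Privacy"]

def decode(probs, threshold=0.5):
    """Return every category whose probability exceeds the threshold."""
    return [cat for cat, p in zip(CATS, probs) if p > threshold]

print(decode([0.1, 0.7, 0.05, 0.02, 0.6, 0.3, 0.01]))  # -> ['Harassment', 'Violence']
```
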
---

## 🚀 Usage

To run inference with this model, you **must** define the custom architecture class and load both the LoRA adapter and the classifier head.

### 1. Install Dependencies
```bash
pip install torch transformers peft huggingface_hub
```

### 2. Inference Code

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel
from transformers.modeling_outputs import SequenceClassifierOutput
from peft import PeftModel
from huggingface_hub import hf_hub_download

# --- MODEL DEFINITION (Must match training) ---
class SentinelLFMMultiLabel(nn.Module):
    def __init__(self, model_id, num_labels):
        super().__init__()
        self.num_labels = num_labels
        self.base_model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
        self.config = self.base_model.config
        hidden_size = self.config.hidden_size
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Dropout(0.2),
            nn.Linear(hidden_size, num_labels)
        )
        self.loss_fct = nn.BCEWithLogitsLoss()

    def forward(self, input_ids=None, attention_mask=None, labels=None, **kwargs):
        outputs = self.base_model(input_ids=input_ids, attention_mask=attention_mask, **kwargs)
        hidden_states = outputs[0] if isinstance(outputs, tuple) else outputs.last_hidden_state
        if attention_mask is not None:
            # Pool the hidden state of the last non-padding token in each sequence.
            last_idx = attention_mask.sum(1) - 1
            pooled = hidden_states[torch.arange(input_ids.shape[0], device=input_ids.device), last_idx]
        else:
            pooled = hidden_states[:, -1, :]
        logits = self.classifier(pooled)
        loss = self.loss_fct(logits, labels.float()) if labels is not None else None
        return SequenceClassifierOutput(loss=loss, logits=logits)

# --- SETUP ---
CATS = ["Hate", "Harassment", "Sexual", "ChildSafety", "Violence", "Illegal", "Privacy"]
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
REPO_ID = "abdulmunimjemal/Sentinel-Rail-B-Policy-Guard"

# 1. Initialize Model Architecture (Loads Base 350M)
print("Loading base model...")
model = SentinelLFMMultiLabel("LiquidAI/LFM2-350M", num_labels=7)

# 2. Load LoRA Adapter
print("Loading LoRA adapter...")
model.base_model = PeftModel.from_pretrained(model.base_model, REPO_ID)

# 3. Load Custom Classifier Head
print("Loading classifier head...")
classifier_path = hf_hub_download(repo_id=REPO_ID, filename="classifier.pt")
state_dict = torch.load(classifier_path, map_location="cpu")
model.classifier.load_state_dict(state_dict)

model.to(DEVICE)
model.eval()
tokenizer = AutoTokenizer.from_pretrained("LiquidAI/LFM2-350M", trust_remote_code=True)

# --- PREDICT ---
text = "How do I make a homemade explosive?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(DEVICE)

with torch.no_grad():
    outputs = model(**inputs)
    probs = torch.sigmoid(outputs.logits)[0]

print(f"\nInput: {text}")
print("-" * 30)
for i, prob in enumerate(probs):
    if prob > 0.5:
        print(f"🚨 {CATS[i]}: {prob:.4f}")
```
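
To moderate many texts at once, the same model accepts padded batches (reusing `model`, `tokenizer`, `CATS`, and `DEVICE` from above). A minimal sketch, assuming right-padding so the last-token pooling stays correct, and falling back to the EOS token if the tokenizer has no pad token:

```python
# Batched inference sketch (the texts are hypothetical examples).
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # assumption: reuse EOS for padding

texts = [
    "You are an idiot and everyone hates you.",
    "What is the capital of France?",
]
batch = tokenizer(texts, return_tensors="pt", padding=True,
                  truncation=True, max_length=512).to(DEVICE)

with torch.no_grad():
    probs = torch.sigmoid(model(**batch).logits)

for text, row in zip(texts, probs):
    flagged = [f"{CATS[i]} ({p:.2f})" for i, p in enumerate(row) if p > 0.5]
    print(f"{text!r} -> {flagged if flagged else 'Safe'}")
```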

---

## 📦 Dataset Stats

Trained on a **balanced dataset** of ~189,000 samples (50% Safe / 50% Violations).
Rare classes such as Privacy and Illegal were upsampled to ~15,000 samples each, which is reflected in their high scores (F1 > 0.97).
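
As an illustration only (not the exact training pipeline; the file and column names are hypothetical), upsampling a rare class with pandas could look like:

```python
# Hypothetical upsampling sketch: grow the "Privacy" class to ~15,000 rows.
import pandas as pd

df = pd.read_csv("rail_b_train.csv")  # hypothetical training file

privacy = df[df["Privacy"] == 1]
if len(privacy) < 15_000:
    extra = privacy.sample(n=15_000 - len(privacy), replace=True, random_state=42)
    df = pd.concat([df, extra], ignore_index=True)

print(int(df["Privacy"].sum()))  # ~15,000 positive Privacy samples
```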

---

## 📄 License

[Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)