High-Accuracy Content Safety Classifier

This model is a DeBERTa-v3-large classifier fine-tuned for content safety. It assigns an input text to one of 10 categories: benign text, jailbreak attempts, and eight harm categories.

Model Details

  • Base Model: microsoft/deberta-v3-large
  • Training Dataset: jainsatyam26/guardrail-215k-splits
  • Categories: 10 safety categories
  • Training Time: Auto-deployed during training
  • Last Updated: 2026-04-29 06:21:01 UTC

Performance

Metric       Value
F1 Score     N/A
Accuracy     N/A
Unsafe F1    N/A

Categories

  • benign
  • jailbreak
  • S1 Violent Crimes
  • S2 Non-Violent Crimes
  • S4 Child Sexual Exploitation
  • S7 Privacy
  • S10 Hate
  • S11 Self-Harm
  • S12 Sexual Content
  • S14 Code Abuse

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("jainsatyam26/bertclassfier")
model = AutoModelForSequenceClassification.from_pretrained("jainsatyam26/bertclassfier")

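# Label order as listed in the Categories section. It is assumed to match the
# model's output indices; compare with model.config.id2label if it is populated.
LABELS = [
    "benign", "jailbreak", "S1 Violent Crimes", "S2 Non-Violent Crimes",
    "S4 Child Sexual Exploitation", "S7 Privacy", "S10 Hate",
    "S11 Self-Harm", "S12 Sexual Content", "S14 Code Abuse",
]

def predict(text):
    """Classify one string and return the top label with per-label scores."""
    # Truncate to the 512-token limit used during training.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # Softmax over the class dimension; [0] selects the single input in the batch.
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    predicted_id = int(torch.argmax(probs))
    return {
        "prediction": LABELS[predicted_id],
        "confidence": probs[predicted_id].item(),
        "all_scores": {label: probs[i].item() for i, label in enumerate(LABELS)},
    }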

# Example
result = predict("How to make a bomb?")
print(result)
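
Downstream code often needs a single safe/unsafe decision rather than ten scores. The following is a minimal sketch that works on the all_scores dictionary returned by predict above. It assumes that every category other than benign counts as unsafe. The helper names unsafe_score and is_unsafe and the 0.5 threshold are illustrative only; the configuration mentions optimize_thresholds, but no tuned thresholds are published in this card.

```python
def unsafe_score(all_scores, benign_label="benign"):
    """Sum the probability assigned to every non-benign category."""
    return sum(score for label, score in all_scores.items() if label != benign_label)


def is_unsafe(all_scores, threshold=0.5):
    """Flag the input when the combined non-benign probability reaches the threshold.

    The default threshold of 0.5 is a placeholder, not a tuned value.
    """
    return unsafe_score(all_scores) >= threshold


# Hand-written scores for illustration; real values come from predict(text)["all_scores"].
example_scores = {"benign": 0.10, "jailbreak": 0.70, "S1 Violent Crimes": 0.20}
print(unsafe_score(example_scores), is_unsafe(example_scores))
```

Summing the non-benign scores keeps the decision independent of which harm category wins the argmax, which may matter when probability is split across several unsafe labels.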

Training Configuration

This model was trained with the following configuration:

{
  "model_name": "microsoft/deberta-v3-large",
  "dataset_name": "jainsatyam26/guardrail-215k-splits",
  "max_length": 512,
  "epochs": 4,
  "batch_size": 8,
  "grad_accum": 4,
  "learning_rate": 1e-05,
  "weight_decay": 0.01,
  "warmup_ratio": 0.1,
  "use_llrd": true,
  "llrd_alpha": 0.9,
  "use_multisample_dropout": true,
  "num_dropout_samples": 5,
  "dropout_rate": 0.3,
  "use_label_smoothing": true,
  "label_smoothing": 0.1,
  "use_focal_loss": true,
  "focal_alpha": 0.7,
  "focal_gamma": 2.0,
  "use_hard_negative": true,
  "hard_negative_ratio": 0.3,
  "num_folds": 3,
  "optimize_thresholds": true,
  "output_dir": "./guardrail_model",
  "checkpoint_steps": 500,
  "logging_steps": 50,
  "eval_steps": 500,
  "hf_repo_id": "jainsatyam26/bertclassfier",
  "hf_token": "***REDACTED***",
  "deploy_every_minutes": 30,
  "deploy_every_steps": 400,
  "auto_deploy": true,
  "private_repo": false,
  "auto_resume": true,
  "resume_from_hf": true,
  "use_wandb": true,
  "wandb_project": "safety-classifier",
  "fp16": false,
  "bf16": true,
  "dataloader_num_workers": 4,
  "seed": 42
}
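
Some quantities implied by this configuration are not listed directly. The sketch below derives two of them: the effective batch size per device from batch_size and grad_accum, and per-layer learning rates under layer-wise learning rate decay (use_llrd, llrd_alpha). It assumes the common LLRD formulation in which the top encoder layer keeps the base learning rate and each layer below it is scaled by a further factor of llrd_alpha; the training script's exact scheme is not shown in this card. The function name llrd_learning_rates is hypothetical.

```python
# Values copied from the training configuration above.
config = {"batch_size": 8, "grad_accum": 4, "learning_rate": 1e-05, "llrd_alpha": 0.9}

NUM_LAYERS = 24  # number of encoder layers in microsoft/deberta-v3-large

# Samples contributing to one optimizer step on a single device.
effective_batch_size = config["batch_size"] * config["grad_accum"]


def llrd_learning_rates(base_lr, alpha, num_layers):
    """Learning rates for layers 1..num_layers (assumed scheme: top layer gets base_lr)."""
    return [base_lr * alpha ** (num_layers - layer) for layer in range(1, num_layers + 1)]


layer_lrs = llrd_learning_rates(config["learning_rate"], config["llrd_alpha"], NUM_LAYERS)
print("effective batch size:", effective_batch_size)
print("lowest layer lr:", layer_lrs[0])
print("top layer lr:", layer_lrs[-1])
```

Under these assumptions the lowest encoder layer trains at roughly 9% of the base learning rate, since 0.9 raised to the 23rd power is about 0.089.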

Automatic Deployment

During training, the model is automatically deployed every 30 minutes (deploy_every_minutes in the training configuration), with:

  • ✅ Automatic checkpoint recovery
  • ✅ Real-time performance monitoring
  • ✅ Progressive model updates
  • ✅ Training state persistence
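
The training configuration lists both deploy_every_minutes (30) and deploy_every_steps (400), but this card does not show how the training script combines them. The sketch below is one plausible reading, assuming a deployment fires as soon as either limit is reached. The class DeployScheduler and its method names are hypothetical and are not taken from the training code.

```python
class DeployScheduler:
    """Decide when to deploy, based on elapsed steps or elapsed time (assumed logic)."""

    def __init__(self, every_minutes=30, every_steps=400):
        self.every_seconds = every_minutes * 60
        self.every_steps = every_steps
        self.last_time = 0.0  # timestamp (seconds) of the last deployment
        self.last_step = 0    # training step of the last deployment

    def should_deploy(self, step, now):
        """Return True when either the step interval or the time interval has elapsed."""
        due = (
            step - self.last_step >= self.every_steps
            or now - self.last_time >= self.every_seconds
        )
        if due:
            # Reset both counters so the next interval starts from this deployment.
            self.last_step, self.last_time = step, now
        return due


scheduler = DeployScheduler()
print(scheduler.should_deploy(step=100, now=60.0))
print(scheduler.should_deploy(step=400, now=120.0))
```

The current time is passed in explicitly so the logic can be tested without waiting; in a training loop a value such as time.monotonic() measured from the start of training could be supplied instead.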

Generated automatically during training - 2026-04-29 06:21:01
