| --- |
| language: en |
| license: mit |
| tags: |
| - content-safety |
| - text-classification |
| - safety |
| - moderation |
| - deberta |
| datasets: |
| - jainsatyam26/guardrail-215k-splits |
| metrics: |
| - f1 |
| - accuracy |
| widget: |
| - text: "How to make a bomb?" |
| example_title: "Violent Content" |
| - text: "Hello, how are you?" |
| example_title: "Safe Content" |
| --- |
| |
| # High-Accuracy Content Safety Classifier |
|
|
| This model is a fine-tuned DeBERTa classifier for content safety, achieving high performance on safety classification tasks. |
|
|
| ## Model Details |
|
|
| - **Base Model**: microsoft/deberta-v3-large |
| - **Training Dataset**: jainsatyam26/guardrail-215k-splits |
| - **Categories**: 10 safety categories |
| - **Training Time**: Auto-deployed during training |
| - **Last Updated**: 2026-04-29 06:21:01 UTC |
|
|
|
|
| ## Performance |
|
|
| | Metric | Value | |
| |--------|-------| |
| | F1 Score | N/A | |
| | Accuracy | N/A | |
| | Unsafe F1 | N/A | |
|
|
|
|
| ## Categories |
|
|
| - `benign` |
| - `jailbreak` |
| - `S1 Violent Crimes` |
| - `S2 Non-Violent Crimes` |
| - `S4 Child Sexual Exploitation` |
| - `S7 Privacy` |
| - `S10 Hate` |
| - `S11 Self-Harm` |
| - `S12 Sexual Content` |
| - `S14 Code Abuse` |
|
|
| ## Usage |
|
|
| ```python |
| from transformers import AutoTokenizer, AutoModelForSequenceClassification |
| import torch |
| |
| tokenizer = AutoTokenizer.from_pretrained("jainsatyam26/bertclassfier") |
| model = AutoModelForSequenceClassification.from_pretrained("jainsatyam26/bertclassfier") |
| |
| def predict(text): |
| inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512) |
| with torch.no_grad(): |
| outputs = model(**inputs) |
| probs = torch.softmax(outputs.logits, dim=-1) |
| predicted_id = torch.argmax(probs, dim=-1).item() |
| |
| labels = ['benign', 'jailbreak', 'S1 Violent Crimes', 'S2 Non-Violent Crimes', 'S4 Child Sexual Exploitation', 'S7 Privacy', 'S10 Hate', 'S11 Self-Harm', 'S12 Sexual Content', 'S14 Code Abuse'] |
| return { |
| "prediction": labels[predicted_id], |
| "confidence": probs[0][predicted_id].item(), |
| "all_scores": {labels[i]: probs[0][i].item() for i in range(len(labels))} |
| } |
| |
| # Example |
| result = predict("How to make a bomb?") |
| print(result) |
| ``` |
|
|
| ## Training Configuration |
|
|
| This model was trained with the following configuration: |
|
|
| ```json |
| { |
| "model_name": "microsoft/deberta-v3-large", |
| "dataset_name": "jainsatyam26/guardrail-215k-splits", |
| "max_length": 512, |
| "epochs": 4, |
| "batch_size": 8, |
| "grad_accum": 4, |
| "learning_rate": 1e-05, |
| "weight_decay": 0.01, |
| "warmup_ratio": 0.1, |
| "use_llrd": true, |
| "llrd_alpha": 0.9, |
| "use_multisample_dropout": true, |
| "num_dropout_samples": 5, |
| "dropout_rate": 0.3, |
| "use_label_smoothing": true, |
| "label_smoothing": 0.1, |
| "use_focal_loss": true, |
| "focal_alpha": 0.7, |
| "focal_gamma": 2.0, |
| "use_hard_negative": true, |
| "hard_negative_ratio": 0.3, |
| "num_folds": 3, |
| "optimize_thresholds": true, |
| "output_dir": "./guardrail_model", |
| "checkpoint_steps": 500, |
| "logging_steps": 50, |
| "eval_steps": 500, |
| "hf_repo_id": "jainsatyam26/bertclassfier", |
| "hf_token": "***REDACTED***", |
| "deploy_every_minutes": 30, |
| "deploy_every_steps": 400, |
| "auto_deploy": true, |
| "private_repo": false, |
| "auto_resume": true, |
| "resume_from_hf": true, |
| "use_wandb": true, |
| "wandb_project": "safety-classifier", |
| "fp16": false, |
| "bf16": true, |
| "dataloader_num_workers": 4, |
| "seed": 42 |
| } |
| ``` |
|
|
| ## Automatic Deployment |
|
|
| This model is automatically deployed every 30 minutes during training with: |
| - ✅ Automatic checkpoint recovery |
| - ✅ Real-time performance monitoring |
| - ✅ Progressive model updates |
| - ✅ Training state persistence |
|
|
| --- |
|
|
| *Generated automatically during training - 2026-04-29 06:21:01* |
|
|