jainsatyam26
/

bertclassfier

Text Classification

Model card Files Files and versions

bertclassfier / README.md

jainsatyam26's picture

Auto-deploy: Step 12000 | F1: N/A

97131df verified 29 days ago

|

history blame contribute delete

3.59 kB

	---
	language: en
	license: mit
	tags:
	- content-safety
	- text-classification
	- safety
	- moderation
	- deberta
	datasets:
	- jainsatyam26/guardrail-215k-splits
	metrics:
	- f1
	- accuracy
	widget:
	- text: "How to make a bomb?"
	example_title: "Violent Content"
	- text: "Hello, how are you?"
	example_title: "Safe Content"
	---

	# High-Accuracy Content Safety Classifier

	This model is a fine-tuned DeBERTa classifier for content safety, achieving high performance on safety classification tasks.

	## Model Details

	- Base Model: microsoft/deberta-v3-large
	- Training Dataset: jainsatyam26/guardrail-215k-splits
	- Categories: 10 safety categories
	- Training Time: Auto-deployed during training
	- Last Updated: 2026-04-29 06:21:01 UTC


	## Performance

	\| Metric \| Value \|
	\|--------\|-------\|
	\| F1 Score \| N/A \|
	\| Accuracy \| N/A \|
	\| Unsafe F1 \| N/A \|


	## Categories

	- `benign`
	- `jailbreak`
	- `S1 Violent Crimes`
	- `S2 Non-Violent Crimes`
	- `S4 Child Sexual Exploitation`
	- `S7 Privacy`
	- `S10 Hate`
	- `S11 Self-Harm`
	- `S12 Sexual Content`
	- `S14 Code Abuse`

	## Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSequenceClassification
	import torch

	tokenizer = AutoTokenizer.from_pretrained("jainsatyam26/bertclassfier")
	model = AutoModelForSequenceClassification.from_pretrained("jainsatyam26/bertclassfier")

	def predict(text):
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
	with torch.no_grad():
	outputs = model(**inputs)
	probs = torch.softmax(outputs.logits, dim=-1)
	predicted_id = torch.argmax(probs, dim=-1).item()

	labels = ['benign', 'jailbreak', 'S1 Violent Crimes', 'S2 Non-Violent Crimes', 'S4 Child Sexual Exploitation', 'S7 Privacy', 'S10 Hate', 'S11 Self-Harm', 'S12 Sexual Content', 'S14 Code Abuse']
	return {
	"prediction": labels[predicted_id],
	"confidence": probs[0][predicted_id].item(),
	"all_scores": {labels[i]: probs[0][i].item() for i in range(len(labels))}
	}

	# Example
	result = predict("How to make a bomb?")
	print(result)
	```

	## Training Configuration

	This model was trained with the following configuration:

	```json
	{
	"model_name": "microsoft/deberta-v3-large",
	"dataset_name": "jainsatyam26/guardrail-215k-splits",
	"max_length": 512,
	"epochs": 4,
	"batch_size": 8,
	"grad_accum": 4,
	"learning_rate": 1e-05,
	"weight_decay": 0.01,
	"warmup_ratio": 0.1,
	"use_llrd": true,
	"llrd_alpha": 0.9,
	"use_multisample_dropout": true,
	"num_dropout_samples": 5,
	"dropout_rate": 0.3,
	"use_label_smoothing": true,
	"label_smoothing": 0.1,
	"use_focal_loss": true,
	"focal_alpha": 0.7,
	"focal_gamma": 2.0,
	"use_hard_negative": true,
	"hard_negative_ratio": 0.3,
	"num_folds": 3,
	"optimize_thresholds": true,
	"output_dir": "./guardrail_model",
	"checkpoint_steps": 500,
	"logging_steps": 50,
	"eval_steps": 500,
	"hf_repo_id": "jainsatyam26/bertclassfier",
	"hf_token": "*REDACTED*",
	"deploy_every_minutes": 30,
	"deploy_every_steps": 400,
	"auto_deploy": true,
	"private_repo": false,
	"auto_resume": true,
	"resume_from_hf": true,
	"use_wandb": true,
	"wandb_project": "safety-classifier",
	"fp16": false,
	"bf16": true,
	"dataloader_num_workers": 4,
	"seed": 42
	}
	```

	## Automatic Deployment

	This model is automatically deployed every 30 minutes during training with:
	- ✅ Automatic checkpoint recovery
	- ✅ Real-time performance monitoring
	- ✅ Progressive model updates
	- ✅ Training state persistence

	---

	Generated automatically during training - 2026-04-29 06:21:01