Toxicity Classifier (RoBERTa)
This model is a fine-tuned version of roberta-base trained to classify text into two categories: Safe and Toxic (Hate Speech). It is optimized for analyzing internet text, comments, and short social media posts.
Intended Use
This model is intended to help moderate user-generated content automatically by flagging potentially harmful text on digital platforms.
- Input: Raw English text (comments, tweets, reviews).
- Output: A binary classification label (Toxic or Safe / Non-Toxic) with a confidence score.
Training Data
The model was fine-tuned on the hate subset of the tweet_eval dataset, which contains tweets labeled as hateful or non-hateful.
Performance Metrics
The model was evaluated offline on a held-out evaluation set. The final metrics on that set are:
- Accuracy: 0.7970
- F1 Score: 0.7955
- Precision: 0.7954
- Recall: 0.8017
- Evaluation Loss: 0.9114
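The card does not state how precision, recall, and F1 were averaged across the two classes. As one possibility, the sketch below computes accuracy and macro-averaged metrics in plain Python. Macro averaging, the function name `macro_metrics`, and the toy labels are all assumptions for illustration, not the author's evaluation code.

```python
from typing import Dict, List, Sequence


def macro_metrics(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    labels: Sequence[int] = (0, 1),
) -> Dict[str, float]:
    """Compute accuracy plus macro-averaged precision, recall, and F1."""
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        # Per-class counts, treating `label` as the positive class.
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels (0 = Safe / Non-Toxic, 1 = Toxic); not taken from the evaluation set.
toy_true = [0, 0, 0, 0, 1, 1]
toy_pred = [0, 0, 0, 1, 1, 1]
toy_scores = macro_metrics(toy_true, toy_pred)
```

Under macro averaging, F1 is the mean of the per-class F1 scores, so it need not equal the harmonic mean of the reported precision and recall.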
Training Constraints & Hyperparameters
The model was trained under the following conditions:
- Base Architecture: roberta-base
- Maximum Sequence Length: 128
- Learning Rate: 1e-05
- Batch Size: 64
- Precision: Mixed Precision (fp16)
- Early Stopping: patience=3
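For readers who want to reproduce a similar setup, the values above can be collected as keyword arguments for the transformers `TrainingArguments` class, the `EarlyStoppingCallback`, and a tokenizer call. This is a sketch only: the original training script is not shown, the output directory is hypothetical, and treating the batch size as per-device is an assumption.

```python
# Keyword arguments mirroring the listed hyperparameters.
# Key names follow transformers.TrainingArguments parameters.
training_kwargs = {
    "output_dir": "roberta-toxic-out",   # hypothetical path, not from the card
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 64,   # assumption: 64 is the per-device batch size
    "fp16": True,                        # mixed precision; requires a supported GPU
}

# Passed as EarlyStoppingCallback(**early_stopping_kwargs).
early_stopping_kwargs = {"early_stopping_patience": 3}

# Passed to the tokenizer call so inputs are cut to the listed maximum length.
tokenizer_kwargs = {"truncation": True, "max_length": 128}
```

Early stopping in the `Trainer` also needs periodic evaluation, `load_best_model_at_end=True`, and a `metric_for_best_model`; the card does not say which values were used, so they are left out here.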
Usage
You can use this model directly with the Hugging Face transformers library pipeline:
from transformers import pipeline
# Load the toxicity classifier
classifier = pipeline("text-classification", model="your-username/roberta-toxic-classifier-en")
text = "I completely disagree with your point of view."
result = classifier(text)
print(result)
# Output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
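The pipeline returns a list with one dict per input, each holding a `label` and a `score`. A minimal sketch of turning that output into a moderation decision follows. It assumes the toxic label string is exactly `Toxic` (check the model's `id2label` config), and the 0.8 threshold and helper names are hypothetical, not part of this model card.

```python
from typing import Dict, List

# Assumption: the positive label string; confirm it against the model's id2label mapping.
TOXIC_LABEL = "Toxic"


def should_flag(prediction: Dict[str, object], threshold: float = 0.8) -> bool:
    """Return True when the top label is toxic and its score meets the threshold."""
    is_toxic = prediction["label"] == TOXIC_LABEL
    return is_toxic and float(prediction["score"]) >= threshold


def flag_batch(predictions: List[Dict[str, object]], threshold: float = 0.8) -> List[bool]:
    """Apply should_flag to each pipeline result, preserving input order."""
    return [should_flag(p, threshold) for p in predictions]


# Hand-written stand-ins for pipeline output; these are not real model scores.
sample = [
    {"label": "Safe / Non-Toxic", "score": 0.98},
    {"label": "Toxic", "score": 0.91},
    {"label": "Toxic", "score": 0.55},
]
decisions = flag_batch(sample)
```

A higher threshold flags fewer borderline posts at the cost of missing some toxic ones, so the value is best tuned on data from the target platform.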