Toxicity Classifier (RoBERTa)

This model is a fine-tuned version of roberta-base trained to classify text into two categories: Safe and Toxic (Hate Speech). It is optimized for analyzing internet text, comments, and short social media posts.

Intended Use

The model is intended for automated moderation of user-generated content on digital platforms, flagging potentially harmful text.

  • Input: Raw English text (comments, tweets, reviews).
  • Output: A binary classification label (Toxic or Safe / Non-Toxic) with a confidence score.
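
The label and confidence score can be folded into a single toxicity probability, which is convenient when a platform wants to apply its own cut-off. The following is a minimal sketch, not part of the model itself. It assumes each prediction is a dict with 'label' and 'score' keys, that the two class scores sum to 1 (softmax over two classes), and that the toxic label string is exactly "Toxic". The helper names and the 0.5 threshold are illustrative.

```python
TOXIC_LABEL = "Toxic"  # assumed label string; verify against the model's id2label mapping


def toxic_probability(prediction: dict) -> float:
    """Return P(toxic) from a single {'label': ..., 'score': ...} prediction."""
    score = float(prediction["score"])
    # With two classes under softmax, the other class has probability 1 - score.
    return score if prediction["label"] == TOXIC_LABEL else 1.0 - score


def is_toxic(prediction: dict, threshold: float = 0.5) -> bool:
    """Flag a prediction when its toxic probability reaches the threshold."""
    return toxic_probability(prediction) >= threshold
```

Raising the threshold trades recall for precision, which may suit platforms where false flags are costly.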

Training Data

The model was fine-tuned on the hate subset of the tweet_eval dataset, which contains tweets labeled for hate speech.

Performance Metrics

The model was evaluated offline. The final metrics on the evaluation set are:

  • Accuracy: 0.7970
  • F1 Score: 0.7955
  • Precision: 0.7954
  • Recall: 0.8017
  • Evaluation Loss: 0.9114
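
For reference, accuracy, precision, recall, and F1 all derive from the confusion matrix of predicted versus true labels. The sketch below computes per-class values and their macro average in plain Python. The averaging mode (Toxic class only, macro, or weighted) is an assumption here, and the counts are hypothetical numbers chosen for illustration, not this model's results.

```python
from typing import Dict, Tuple


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for one class from its confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def binary_report(tp: int, fp: int, fn: int, tn: int) -> Dict[str, object]:
    """Accuracy plus per-class and macro-averaged precision/recall/F1.

    Counts are given from the Toxic class's point of view.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    toxic = prf(tp, fp, fn)
    # For the Safe class the roles flip: its true positives are tn,
    # its false positives are fn, and its false negatives are fp.
    safe = prf(tn, fn, fp)
    macro = tuple((a + b) / 2 for a, b in zip(toxic, safe))
    return {"accuracy": accuracy, "toxic": toxic, "safe": safe, "macro": macro}


# Hypothetical counts for illustration only.
report = binary_report(tp=90, fp=30, fn=10, tn=70)
print(report)
```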

Training Constraints & Hyperparameters

The model was trained under the following conditions:

  • Base Architecture: roberta-base
  • Maximum Sequence Length: 128
  • Learning Rate: 1e-05
  • Batch Size: 64
  • Precision: Mixed Precision (fp16)
  • Early Stopping: patience=3
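
Early stopping with patience=3 means training halts once the monitored metric has failed to improve for three consecutive evaluations. The following is a framework-independent sketch of that rule, not the training code used for this model. It assumes the monitored metric is evaluation loss (lower is better); the function name and the loss values are hypothetical.

```python
from typing import List, Tuple


def early_stopping(eval_losses: List[float], patience: int = 3) -> Tuple[int, int]:
    """Return (stop_index, best_index) for a sequence of evaluation losses.

    stop_index is the evaluation at which training ends; best_index is the
    evaluation with the lowest loss seen up to that point.
    """
    best_loss = float("inf")
    best_index = -1
    bad_evals = 0  # consecutive evaluations without improvement
    for i, loss in enumerate(eval_losses):
        if loss < best_loss:
            best_loss, best_index, bad_evals = loss, i, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                return i, best_index
    # Patience was never exhausted: training ran through every evaluation.
    return len(eval_losses) - 1, best_index


# Hypothetical loss curve: the best value is at index 1, followed by three
# evaluations without improvement, so training stops at index 4.
print(early_stopping([0.95, 0.90, 0.92, 0.91, 0.93, 0.80]))
```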

Usage

You can use this model directly with the Hugging Face transformers library pipeline:

from transformers import pipeline

# Load the toxicity classifier
classifier = pipeline("text-classification", model="your-username/roberta-toxic-classifier-en")

text = "I completely disagree with your point of view."
result = classifier(text)

print(result)
# Output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
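
For moderation workloads, texts are usually scored in batches, and inputs longer than the 128-token training limit need truncating. The following is a sketch under stated assumptions: `flag_toxic` is a hypothetical helper, the toxic label string is assumed to be "Toxic", and the classifier is expected to accept `truncation`, `max_length`, and `batch_size` keyword arguments at call time, as the transformers text-classification pipeline does. Passing the classifier in as an argument keeps the helper usable with any callable of the same shape.

```python
from typing import Callable, List

TOXIC_LABEL = "Toxic"  # assumed label string; verify against the model's id2label mapping
MAX_LENGTH = 128       # maximum sequence length listed in the training settings


def flag_toxic(classifier: Callable, texts: List[str], batch_size: int = 32) -> List[str]:
    """Return the subset of texts that the classifier labels as toxic."""
    predictions = classifier(
        texts,
        truncation=True,        # cut inputs that exceed max_length
        max_length=MAX_LENGTH,
        batch_size=batch_size,
    )
    return [
        text
        for text, prediction in zip(texts, predictions)
        if prediction["label"] == TOXIC_LABEL
    ]
```

With the pipeline object created in the usage example, this would be called as flag_toxic(classifier, comments), where comments is a list of strings.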
