Toxicity Classifier (RoBERTa)

This model is a fine-tuned version of roberta-base that classifies text into two categories: Safe and Toxic (Hate Speech). It targets short internet text such as comments and social media posts.

Intended Use

This model is intended to moderate user-generated content automatically by flagging potentially harmful text, helping digital platforms keep their text spaces safe.

Input: Raw English text (comments, tweets, reviews).
Output: A binary classification label (Toxic or Safe / Non-Toxic) with a confidence score.
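
How a label and its score become a moderation decision is left to the platform. A minimal sketch, assuming predictions arrive as dictionaries with 'label' and 'score' keys; the label string "Toxic", the 0.8 threshold, and the helper name should_flag are illustrative assumptions, not values taken from this model card:

```python
# Hypothetical helper: decide whether to flag a text from a pipeline-style
# prediction such as {'label': 'Toxic', 'score': 0.93}.
TOXIC_LABEL = "Toxic"   # assumed label string; check the model's config (id2label)
FLAG_THRESHOLD = 0.8    # illustrative value; tune it on your own validation data


def should_flag(prediction, threshold=FLAG_THRESHOLD):
    """Return True when the predicted label is toxic and its score meets the threshold."""
    return prediction["label"] == TOXIC_LABEL and prediction["score"] >= threshold


# Made-up predictions, for illustration only
predictions = [
    {"label": "Toxic", "score": 0.93},
    {"label": "Toxic", "score": 0.61},
    {"label": "Safe / Non-Toxic", "score": 0.98},
]
flags = [should_flag(p) for p in predictions]
print(flags)  # [True, False, False]
```

A higher threshold flags fewer texts and misses more toxic ones; a lower threshold does the opposite.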

Training Data

The model was fine-tuned on the hate subset of the tweet_eval dataset, which contains tweets labeled as hateful or not hateful.

Performance Metrics

The model was evaluated offline. The final metrics on the evaluation set are:

Accuracy: 0.7970
F1 Score: 0.7955
Precision: 0.7954
Recall: 0.8017
Evaluation Loss: 0.9114
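
The card does not say how precision, recall, and F1 were averaged. The reported F1 (0.7955) is not the harmonic mean of the reported precision and recall (about 0.7985), which suggests the figures are averaged over classes rather than computed for the toxic class alone. A minimal sketch of macro-averaged metrics for a binary task, using made-up confusion-matrix counts rather than this model's actual results:

```python
def macro_metrics(tp, fp, fn, tn):
    """Accuracy plus macro-averaged precision, recall, and F1 for a binary task.

    Counts are given relative to the positive (toxic) class.
    """
    def prf(tp_c, fp_c, fn_c):
        precision = tp_c / (tp_c + fp_c) if tp_c + fp_c else 0.0
        recall = tp_c / (tp_c + fn_c) if tp_c + fn_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        return precision, recall, f1

    positive = prf(tp, fp, fn)
    # For the negative class the roles swap: TP=tn, FP=fn, FN=fp
    negative = prf(tn, fn, fp)
    precision, recall, f1 = ((p + n) / 2 for p, n in zip(positive, negative))
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only
metrics = macro_metrics(tp=45, fp=15, fn=5, tn=35)
print({name: round(value, 4) for name, value in metrics.items()})
# With these counts the macro F1 falls below both macro precision and macro recall.
```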

Training Constraints & Hyperparameters

The model was trained under the following conditions:

Base Architecture: roberta-base
Maximum Sequence Length: 128
Learning Rate: 1e-05
Batch Size: 64
Precision: Mixed Precision (fp16)
Early Stopping: patience=3
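
Early stopping with patience=3 halts training once the monitored metric has failed to improve for three consecutive evaluations. The sketch below illustrates that rule in plain Python. It assumes evaluation loss is the monitored metric (the card does not say which one was used), and the loss values are made up:

```python
def early_stopping_index(eval_losses, patience=3):
    """Return the index of the evaluation at which training stops, or None.

    Training stops after `patience` consecutive evaluations without a new best loss.
    """
    best = float("inf")
    evals_without_improvement = 0
    for index, loss in enumerate(eval_losses):
        if loss < best:
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return index
    return None


# Hypothetical evaluation losses, one per evaluation round
losses = [0.70, 0.62, 0.58, 0.59, 0.60, 0.61, 0.57]
stop_at = early_stopping_index(losses, patience=3)
print(stop_at)  # 5: three evaluations in a row failed to beat 0.58
```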

Usage

You can use this model directly with the Hugging Face transformers library pipeline:

from transformers import pipeline

# Load the toxicity classifier
classifier = pipeline("text-classification", model="ENTUM-AI/roberta-toxic-classifier-en")

text = "I completely disagree with your point of view."
result = classifier(text)

print(result)
# Output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
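
To classify many texts at once, the same pipeline object can be called on a list. Because the model was trained with a maximum sequence length of 128, it may also make sense to truncate inputs to that length at inference time. A minimal sketch, assuming the text-classification pipeline forwards truncation and max_length to its tokenizer; classify_batch is a hypothetical helper and the batch size of 32 is arbitrary:

```python
def classify_batch(classifier, texts, max_length=128, batch_size=32):
    """Classify a list of texts, truncating each to max_length tokens.

    `classifier` is a transformers text-classification pipeline, or any
    callable that follows the same calling convention.
    """
    return classifier(
        list(texts),
        truncation=True,         # cut inputs longer than max_length
        max_length=max_length,   # 128 matches the training setup listed above
        batch_size=batch_size,   # arbitrary; adjust to available memory
    )
```

With the classifier created above, classify_batch(classifier, ["first comment", "second comment"]) returns one result dictionary per input text.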
