---
language:
- en
license: apache-2.0
tags:
- text-classification
- roberta
- toxic-comments
- moderation
datasets:
- tweet_eval
metrics:
- accuracy
- f1
- precision
- recall
---

# Toxicity Classifier (RoBERTa)

This model is a fine-tuned version of `roberta-base` that classifies text into two categories: **Safe** and **Toxic** (hate speech). It is optimized for short-form internet text such as comments and social media posts.

## Intended Use

The model is intended to automatically moderate user-generated content, flag potentially harmful text, and help keep digital platforms safe.

- **Input:** Raw English text (comments, tweets, reviews).
- **Output:** A binary classification label (`Toxic` or `Safe / Non-Toxic`) with a confidence score.

## Training Data

The model was fine-tuned on the Hate subset of the `tweet_eval` benchmark, which contains tweets annotated for hate speech. A loading sketch appears under *Reproduction Sketches* below.

## Performance Metrics

The model was evaluated offline on a held-out evaluation set (see the metric sketch under *Reproduction Sketches* below). The final metrics are:

- **Accuracy:** `0.7970`
- **F1 Score:** `0.7955`
- **Precision:** `0.7954`
- **Recall:** `0.8017`
- **Evaluation Loss:** `0.9114`

## Training Constraints & Hyperparameters

The model was trained under the following conditions (a hedged training sketch also appears under *Reproduction Sketches*):

- **Base Architecture:** `roberta-base`
- **Maximum Sequence Length:** 128
- **Learning Rate:** 1e-05
- **Batch Size:** 64
- **Precision:** Mixed precision (fp16)
- **Stopping Criterion:** Early stopping (patience = 3)

## Usage

You can use this model directly with the Hugging Face `transformers` pipeline:

```python
from transformers import pipeline

# Load the toxicity classifier
classifier = pipeline("text-classification", model="your-username/roberta-toxic-classifier-en")

text = "I completely disagree with your point of view."
result = classifier(text)

print(result)
# Output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
```
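
## Reproduction Sketches

The following sketches are non-authoritative; they show one plausible way to reproduce the steps this card describes, with any detail not stated above labeled as an assumption.

The Training Data section references the Hate subset of `tweet_eval`. A minimal sketch of loading it with the `datasets` library; the Hub ID and label convention reflect the public benchmark, not anything specific to this model:

```python
from datasets import load_dataset

# Load the Hate subset of tweet_eval; splits are "train", "validation", "test".
dataset = load_dataset("tweet_eval", "hate")

print(dataset["train"][0])
# Example record: {'text': '...', 'label': 0}  (0 = non-hate, 1 = hate)
```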
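
The training script itself is not published with this card. The sketch below plugs the hyperparameters listed above into the standard `Trainer` API; the evaluation/saving strategy, epoch count, and best-model metric are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

dataset = load_dataset("tweet_eval", "hate")

# Truncate/pad to the maximum sequence length listed above (128 tokens).
def tokenize_fn(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize_fn, batched=True)

training_args = TrainingArguments(
    output_dir="roberta-toxic-classifier-en",
    learning_rate=1e-5,                 # from the card
    per_device_train_batch_size=64,     # from the card
    per_device_eval_batch_size=64,
    fp16=True,                          # mixed precision, from the card
    num_train_epochs=10,                # assumption; early stopping ends training sooner
    eval_strategy="epoch",              # assumption ("evaluation_strategy" in older transformers)
    save_strategy="epoch",
    load_best_model_at_end=True,        # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience=3, from the card
)
trainer.train()
```

Early stopping requires periodic evaluation, which is why the sketch pairs `eval_strategy="epoch"` with `load_best_model_at_end=True` even though neither is stated on the card.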
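
Likewise, the card does not state how the reported F1, precision, and recall were averaged. A small sketch, assuming macro averaging over the two classes, of how such metrics could be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(preds, labels):
    """Compute the four reported metrics from integer predictions and labels."""
    return {
        "accuracy": accuracy_score(labels, preds),
        # Averaging mode is an assumption; the card does not specify it.
        "f1": f1_score(labels, preds, average="macro"),
        "precision": precision_score(labels, preds, average="macro"),
        "recall": recall_score(labels, preds, average="macro"),
    }

# Example with dummy predictions:
print(compute_metrics(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])))
```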