---
language:
- en
license: apache-2.0
tags:
- text-classification
- roberta
- toxic-comments
- moderation
datasets:
- tweet_eval
metrics:
- accuracy
- f1
- precision
- recall
---

# Toxicity Classifier (RoBERTa)

This model is a fine-tuned version of `roberta-base` that classifies English text into two categories: **Safe** and **Toxic** (hate speech). It targets short internet text such as comments and social media posts.

## Intended Use

This model is intended for automated moderation of user-generated content: it flags potentially harmful text so that a platform can review or filter it.

- **Input:** Raw English text (comments, tweets, reviews).
- **Output:** A binary classification label (`Toxic` or `Safe / Non-Toxic`) with a confidence score.

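As a minimal sketch of how the label and confidence score might feed a moderation step, the snippet below flags only texts labeled `Toxic` above a cutoff. The helper name `flag_for_review`, the `0.8` threshold, and the sample results are all illustrative assumptions, not part of the model; the input mirrors the list-of-dicts shape shown in the Usage section.

```python
from typing import Dict, List

# Assumed label string, taken from the label names listed in this card.
TOXIC_LABEL = "Toxic"


def flag_for_review(results: List[Dict], threshold: float = 0.8) -> List[bool]:
    """Return True for each result labeled toxic with a score at or above threshold."""
    return [
        r["label"] == TOXIC_LABEL and r["score"] >= threshold
        for r in results
    ]


# Hand-written stand-ins for classifier output, not real model predictions.
sample_results = [
    {"label": "Safe / Non-Toxic", "score": 0.98},
    {"label": "Toxic", "score": 0.91},
    {"label": "Toxic", "score": 0.55},
]
flags = flag_for_review(sample_results, threshold=0.8)
print(flags)  # [False, True, False]
```

A higher threshold trades recall for fewer false flags; the right value depends on the platform and should be tuned on its own data.
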
## Training Data

The model was fine-tuned on the `hate` subset of the `tweet_eval` dataset, which contains tweets labeled as hateful or not hateful. The `Toxic` label therefore corresponds to hate speech rather than to toxicity in general.

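For readers who want to inspect the data, the sketch below loads that subset with the Hugging Face `datasets` library. The function name `load_hate_subset` is hypothetical, and the mapping from the dataset's integer labels to this card's label names is an assumption; the card does not state how labels were mapped during training.

```python
# Assumed mapping from tweet_eval "hate" label ids to this card's label names.
LABEL_NAMES = {0: "Safe / Non-Toxic", 1: "Toxic"}


def load_hate_subset():
    """Download the tweet_eval hate subset (requires the `datasets` package and network access)."""
    from datasets import load_dataset

    # Returns a DatasetDict with "train", "validation" and "test" splits;
    # each example has a "text" string and an integer "label".
    return load_dataset("tweet_eval", "hate")


def describe(example: dict) -> str:
    """Format one example as 'label name: text' for quick inspection."""
    return f"{LABEL_NAMES[example['label']]}: {example['text']}"
```
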
## Performance Metrics

The model was evaluated offline. The final metrics on the evaluation set are:

- **Accuracy:** `0.7970`
- **F1 Score:** `0.7955`
- **Precision:** `0.7954`
- **Recall:** `0.8017`
- **Evaluation Loss:** `0.9114`

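As a reminder of how these four classification metrics are defined, here is a small pure-Python sketch for the binary case with `Toxic` as the positive class. The card does not state which averaging scheme (binary, macro, or weighted) produced the numbers above, so this is one possible definition rather than the exact evaluation code; the function name `binary_metrics` and the toy labels are made up for illustration.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only (1 = Toxic, 0 = Safe); not the model's data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```
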
## Training Constraints & Hyperparameters

The model was trained under the following conditions:

- **Base Architecture:** `roberta-base`
- **Maximum Sequence Length:** 128 tokens
- **Learning Rate:** 1e-05
- **Batch Size:** 64
- **Numeric Precision:** Mixed precision (fp16)
- **Early Stopping:** patience=3

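The settings above could map onto the `transformers` `Trainer` API roughly as follows. This is a sketch under stated assumptions, not the original training script: the card does not say whether 64 is a per-device or total batch size, which metric early stopping monitored, or how often evaluation ran, so those values (and the output path) are placeholders. The evaluation-strategy argument is named `eval_strategy` in recent releases and `evaluation_strategy` in older ones, so the sketch checks which one is available.

```python
import inspect

# Values taken from the hyperparameter list in this card.
MAX_SEQ_LENGTH = 128
EARLY_STOPPING_PATIENCE = 3

training_kwargs = {
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 64,  # assumption: 64 is the per-device size
    "fp16": True,  # mixed precision; needs a CUDA-capable GPU
    # Assumptions required for early stopping, not stated in the card:
    "save_strategy": "epoch",
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,
    "output_dir": "./roberta-toxic-checkpoints",  # hypothetical path
}


def tokenize(tokenizer, texts):
    """Tokenize with the card's maximum sequence length, truncating longer inputs."""
    return tokenizer(texts, truncation=True, max_length=MAX_SEQ_LENGTH)


def build_training_setup():
    """Create TrainingArguments and an early-stopping callback (needs transformers and torch)."""
    from transformers import EarlyStoppingCallback, TrainingArguments

    # Pick whichever evaluation-strategy argument name this version accepts.
    params = inspect.signature(TrainingArguments).parameters
    eval_key = "eval_strategy" if "eval_strategy" in params else "evaluation_strategy"

    args = TrainingArguments(**training_kwargs, **{eval_key: "epoch"})
    callback = EarlyStoppingCallback(early_stopping_patience=EARLY_STOPPING_PATIENCE)
    return args, callback
```

The returned `args` and `callback` would then be passed to `Trainer(args=args, callbacks=[callback], ...)` together with the model and tokenized datasets.
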
## Usage

You can use this model directly with the `pipeline` API from the Hugging Face `transformers` library:

```python
from transformers import pipeline

# Load the toxicity classifier
classifier = pipeline("text-classification", model="your-username/roberta-toxic-classifier-en")

text = "I completely disagree with your point of view."
result = classifier(text)

print(result)
# Output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
```