---
language:
- en
license: apache-2.0
tags:
- text-classification
- roberta
- toxic-comments
- moderation
datasets:
- tweet_eval
metrics:
- accuracy
- f1
- precision
- recall
---
# Toxicity Classifier (RoBERTa)
This model is a fine-tuned version of `roberta-base` that classifies English text into two categories: **Safe** and **Toxic** (hate speech). It targets short internet text such as comments and social media posts.
## Intended Use
This model is intended for automated moderation of user-generated content: flagging potentially harmful text so that platforms can review or filter it.
- **Input:** Raw English text (comments, tweets, reviews).
- **Output:** A binary classification label (`Toxic` or `Safe / Non-Toxic`) with a confidence score.
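As a minimal sketch of how the label and confidence score could drive a moderation decision, the helper below flags a prediction only when it is labeled `Toxic` with a score at or above a cutoff. The function name `should_flag` and the `0.8` threshold are illustrative assumptions, not part of the model; a real deployment would tune the threshold on its own data.

```python
# Hypothetical helper: decide whether one prediction should be flagged.
# The label strings match those listed in this card; the default
# threshold of 0.8 is an assumption chosen only for illustration.
TOXIC_LABEL = "Toxic"


def should_flag(prediction, threshold=0.8):
    """Return True if the prediction is Toxic with enough confidence.

    `prediction` is a dict with "label" and "score" keys, the shape
    of one element returned by a text-classification pipeline.
    """
    return prediction["label"] == TOXIC_LABEL and prediction["score"] >= threshold


if __name__ == "__main__":
    example = {"label": "Toxic", "score": 0.93}
    print(should_flag(example))
```

Raising the threshold trades recall for precision: fewer safe comments are flagged, but more toxic ones slip through.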
## Training Data
The model was fine-tuned on the `hate` subset of the `tweet_eval` dataset, which contains tweets labeled for hate speech.
## Performance Metrics
The model was evaluated offline on a held-out evaluation set. The final metrics are:
- **Accuracy:** `0.7970`
- **F1 Score:** `0.7955`
- **Precision:** `0.7954`
- **Recall:** `0.8017`
- **Evaluation Loss:** `0.9114`
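The card does not state how precision, recall, and F1 were averaged. The reported F1 (`0.7955`) is slightly below the harmonic mean of the reported precision and recall (about `0.7985`), which would be consistent with per-class averaging, but that is an inference. The sketch below therefore assumes macro averaging over the two classes and shows how such figures could be computed from true and predicted labels; `macro_metrics` is a hypothetical helper, not the evaluation code used for this model.

```python
def macro_metrics(y_true, y_pred, labels=(0, 1)):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption: each metric is computed per class
    and then averaged with equal weight per class.
    """
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / len(labels),
        "recall": sum(recalls) / len(labels),
        "f1": sum(f1s) / len(labels),
    }


if __name__ == "__main__":
    # Toy labels (0 = safe, 1 = toxic), not data from the evaluation set.
    print(macro_metrics([0, 0, 1, 1], [0, 1, 1, 1]))
```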
## Training Constraints & Hyperparameters
The model was trained under the following conditions:
- **Base Architecture:** `roberta-base`
- **Maximum Sequence Length:** 128
- **Learning Rate:** 1e-05
- **Batch Size:** 64
- **Precision:** Mixed Precision (fp16)
- **Stopping Criterion:** Early Stopping (patience=3)
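To illustrate what `patience=3` means, the sketch below collects the listed settings in a plain dictionary and implements a simple patience rule: training stops once the monitored metric has failed to improve for three consecutive evaluations. The monitored metric (evaluation loss) and the names `HYPERPARAMS` and `early_stop_index` are assumptions for illustration; in `transformers`, comparable behavior is typically obtained with `EarlyStoppingCallback(early_stopping_patience=3)`.

```python
# Settings as listed in this card, gathered in one place.
HYPERPARAMS = {
    "base_model": "roberta-base",
    "max_length": 128,
    "learning_rate": 1e-5,
    "batch_size": 64,
    "fp16": True,
    "early_stopping_patience": 3,
}


def early_stop_index(eval_losses, patience=3):
    """Return the index of the evaluation at which training would stop.

    Stops when the loss has not improved on the best value seen so far
    for `patience` consecutive evaluations. Returns None if that never
    happens. Monitoring evaluation loss is an assumption.
    """
    best = float("inf")
    stale = 0  # consecutive evaluations without improvement
    for i, loss in enumerate(eval_losses):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None


if __name__ == "__main__":
    # Made-up loss curve: best value at index 1, then three stale evaluations.
    losses = [1.00, 0.90, 0.95, 0.92, 0.91]
    print(early_stop_index(losses, HYPERPARAMS["early_stopping_patience"]))
```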
## Usage
You can use this model directly through the `pipeline` API of the Hugging Face `transformers` library:
```python
from transformers import pipeline
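# Load the toxicity classifier.
# "your-username/roberta-toxic-classifier-en" is a placeholder; replace it
# with the model's actual repository ID on the Hugging Face Hub.
classifier = pipeline("text-classification", model="your-username/roberta-toxic-classifier-en")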
text = "I completely disagree with your point of view."
result = classifier(text)
print(result)
# Example output (exact score will vary): [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
```
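For moderating several comments at once, a small wrapper like the one below may be convenient. It is a sketch under two assumptions: that the pipeline forwards `truncation` and `max_length` to the tokenizer (the text-classification pipeline generally does), and that truncating to 128 tokens, the training sequence length, is appropriate at inference time. `classify_batch` is a hypothetical helper name.

```python
def classify_batch(classifier, texts, max_length=128):
    """Classify a list of texts and pair each text with its prediction.

    `classifier` is any callable with the text-classification pipeline
    interface: it takes a list of strings and returns one dict with
    "label" and "score" per input.
    """
    texts = list(texts)
    # Truncate long inputs to the sequence length used during training.
    predictions = classifier(texts, truncation=True, max_length=max_length)
    return [
        {"text": text, "label": pred["label"], "score": pred["score"]}
        for text, pred in zip(texts, predictions)
    ]
```

With the `classifier` loaded in the snippet above, `classify_batch(classifier, ["first comment", "second comment"])` returns one dictionary per input text.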