---
language:
- en
license: apache-2.0
tags:
- text-classification
- roberta
- toxic-comments
- moderation
datasets:
- tweet_eval
metrics:
- accuracy
- f1
- precision
- recall
---

# Toxicity Classifier (RoBERTa)

This model is a fine-tuned version of `roberta-base` that classifies text into two categories: **Safe** and **Toxic** (hate speech). It is optimized for short-form internet text such as comments and social media posts.

## Intended Use

The model is intended to automatically moderate user-generated content, flag potentially harmful text, and help keep digital platforms safe.

- **Input:** Raw English text (comments, tweets, reviews).
- **Output:** A binary classification label (`Toxic` or `Safe / Non-Toxic`) with a confidence score.

## Training Data

The model was fine-tuned on the Hate subset of the `tweet_eval` benchmark, which contains tweets annotated for hate speech. A loading sketch appears under *Reproduction Sketches* below.

## Performance Metrics

The model was evaluated offline on a held-out evaluation set (see the metric sketch under *Reproduction Sketches* below). The final metrics are:

- **Accuracy:** `0.7970`
- **F1 Score:** `0.7955`
- **Precision:** `0.7954`
- **Recall:** `0.8017`
- **Evaluation Loss:** `0.9114`

## Training Constraints & Hyperparameters

The model was trained under the following conditions (a hedged training sketch also appears under *Reproduction Sketches*):

- **Base Architecture:** `roberta-base`
- **Maximum Sequence Length:** 128
- **Learning Rate:** 1e-05
- **Batch Size:** 64
- **Precision:** Mixed precision (fp16)
- **Stopping Criterion:** Early stopping (patience = 3)

## Usage

You can use this model directly with the Hugging Face `transformers` pipeline:

```python
from transformers import pipeline

# Load the toxicity classifier
classifier = pipeline("text-classification", model="your-username/roberta-toxic-classifier-en")

text = "I completely disagree with your point of view."
result = classifier(text)

print(result)
# Output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
```
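
## Reproduction Sketches

The following sketches are non-authoritative; they show one plausible way to reproduce the steps this card describes, with any detail not stated above labeled as an assumption.

The Training Data section references the Hate subset of `tweet_eval`. A minimal sketch of loading it with the `datasets` library; the Hub ID and label convention reflect the public benchmark, not anything specific to this model:

```python
from datasets import load_dataset

# Load the Hate subset of tweet_eval; splits are "train", "validation", "test".
dataset = load_dataset("tweet_eval", "hate")

print(dataset["train"][0])
# Example record: {'text': '...', 'label': 0}  (0 = non-hate, 1 = hate)
```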
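
The training script itself is not published with this card. The sketch below plugs the hyperparameters listed above into the standard `Trainer` API; the evaluation/saving strategy, epoch count, and best-model metric are illustrative assumptions.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)

dataset = load_dataset("tweet_eval", "hate")

# Truncate/pad to the maximum sequence length listed above (128 tokens).
def tokenize_fn(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize_fn, batched=True)

training_args = TrainingArguments(
    output_dir="roberta-toxic-classifier-en",
    learning_rate=1e-5,                 # from the card
    per_device_train_batch_size=64,     # from the card
    per_device_eval_batch_size=64,
    fp16=True,                          # mixed precision, from the card
    num_train_epochs=10,                # assumption; early stopping ends training sooner
    eval_strategy="epoch",              # assumption ("evaluation_strategy" in older transformers)
    save_strategy="epoch",
    load_best_model_at_end=True,        # required for EarlyStoppingCallback
    metric_for_best_model="eval_loss",  # assumption
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience=3, from the card
)
trainer.train()
```

Early stopping requires periodic evaluation, which is why the sketch pairs `eval_strategy="epoch"` with `load_best_model_at_end=True` even though neither is stated on the card.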
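
Likewise, the card does not state how the reported F1, precision, and recall were averaged. A small sketch, assuming macro averaging over the two classes, of how such metrics could be computed with scikit-learn:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(preds, labels):
    """Compute the four reported metrics from integer predictions and labels."""
    return {
        "accuracy": accuracy_score(labels, preds),
        # Averaging mode is an assumption; the card does not specify it.
        "f1": f1_score(labels, preds, average="macro"),
        "precision": precision_score(labels, preds, average="macro"),
        "recall": recall_score(labels, preds, average="macro"),
    }

# Example with dummy predictions:
print(compute_metrics(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])))
```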