ENTUM-AI/roberta-toxic-classifier-en


	---
	language:
	- en
	license: apache-2.0
	tags:
	- text-classification
	- roberta
	- toxic-comments
	- moderation
	datasets:
	- tweet_eval
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	---

	# Toxicity Classifier (RoBERTa)

	This model is a fine-tuned version of `roberta-base` that classifies English text into two categories: `Safe / Non-Toxic` and `Toxic` (hate speech). It targets short internet text such as comments and social media posts.

	## Intended Use

	The model is intended for automated moderation of user-generated content: flagging potentially harmful text so that digital platforms can review or filter it.

	- Input: Raw English text (comments, tweets, reviews).
	- Output: A binary classification label (`Toxic` or `Safe / Non-Toxic`) with a confidence score.
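For moderation, the label and confidence score are typically combined with a threshold so that only confident `Toxic` predictions are flagged. The following is a minimal sketch, not part of the model itself: the function names and the `0.8` threshold are illustrative assumptions, and the prediction dictionaries follow the `label` / `score` shape shown in the Usage section.

```python
from typing import Any, Dict, List

# Label string as documented in this card.
TOXIC_LABEL = "Toxic"


def needs_review(prediction: Dict[str, Any], threshold: float = 0.8) -> bool:
    """Return True when a prediction is Toxic with score >= threshold.

    The default threshold is an arbitrary illustration; tune it on your own data.
    """
    return prediction["label"] == TOXIC_LABEL and prediction["score"] >= threshold


def flag_comments(
    comments: List[str],
    predictions: List[Dict[str, Any]],
    threshold: float = 0.8,
) -> List[str]:
    """Return the comments whose paired prediction should be sent to review."""
    return [c for c, p in zip(comments, predictions) if needs_review(p, threshold)]
```

A lower threshold flags more content (higher recall, more false positives); a higher one flags less.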

	## Training Data

	The model was fine-tuned on the `hate` subset of the `tweet_eval` dataset, which contains tweets labeled as hate speech or not.

	## Performance Metrics

	The model was evaluated offline on a held-out evaluation set. The final metrics on that set are:

	- Accuracy: `0.7970`
	- F1 Score: `0.7955`
	- Precision: `0.7954`
	- Recall: `0.8017`
	- Evaluation Loss: `0.9114`
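For readers unfamiliar with these metrics, the sketch below shows how accuracy, precision, recall, and F1 are derived from confusion-matrix counts for a single positive class. The card does not state whether its figures refer to the `Toxic` class or to an average over both classes, so this illustrates the definitions only; the function name and any counts passed to it are hypothetical and are not the model's evaluation data.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for the positive class.

    tp/fp/fn/tn are true-positive, false-positive, false-negative and
    true-negative counts. Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```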

	## Training Constraints & Hyperparameters

	The model was trained under the following conditions:
	- Base Architecture: `roberta-base`
	- Maximum Sequence Length: 128
	- Learning Rate: 1e-05
	- Batch Size: 64
	- Precision: Mixed Precision (fp16)
	- Early Stopping: patience=3
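The early-stopping rule with a patience of 3 can be sketched as follows. This is an illustration of the general technique, not the author's training code: the card does not say which metric was monitored or how often evaluation ran, so the sketch assumes evaluation loss (lower is better) and one evaluation per step in the list. If training used the `transformers` `Trainer`, which the card does not confirm, the corresponding mechanism would be `EarlyStoppingCallback(early_stopping_patience=3)`.

```python
from typing import List, Optional


def early_stopping_index(eval_losses: List[float], patience: int = 3) -> Optional[int]:
    """Return the index of the evaluation at which training would stop.

    Training stops once the loss has failed to improve on the best value seen
    so far for `patience` consecutive evaluations. Returns None if that never
    happens within the given history.
    """
    best = float("inf")
    evals_without_improvement = 0
    for i, loss in enumerate(eval_losses):
        if loss < best:
            # New best value: remember it and reset the counter.
            best = loss
            evals_without_improvement = 0
        else:
            evals_without_improvement += 1
            if evals_without_improvement >= patience:
                return i
    return None
```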

	## Usage

	Load the model with the `pipeline` API from the Hugging Face `transformers` library:

	```python
	from transformers import pipeline

	# Load the toxicity classifier
	classifier = pipeline("text-classification", model="ENTUM-AI/roberta-toxic-classifier-en")

	text = "I completely disagree with your point of view."
	result = classifier(text)

	print(result)
	# Example output: [{'label': 'Safe / Non-Toxic', 'score': 0.98...}]
	```
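Because the model was trained with a maximum sequence length of 128, longer inputs are worth truncating explicitly at inference time. The sketch below classifies a batch of texts and passes `truncation` and `max_length` through the pipeline call, which forwards them to the tokenizer. The helper names `classify_batch` and `load_classifier` are illustrative assumptions, and the model ID is left as a parameter.

```python
from typing import Any, Callable, Dict, List

# Matches the maximum sequence length listed under the training hyperparameters.
MAX_LENGTH = 128


def classify_batch(
    classifier: Callable[..., List[Dict[str, Any]]],
    texts: List[str],
) -> List[Dict[str, Any]]:
    """Classify several texts in one call, truncating each to MAX_LENGTH tokens."""
    return classifier(texts, truncation=True, max_length=MAX_LENGTH)


def load_classifier(model_id: str):
    """Build a text-classification pipeline for the given Hub model ID."""
    # Imported lazily so the helper above can be used without transformers installed.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)
```

With a loaded pipeline, `classify_batch(classifier, ["first comment", "second comment"])` returns one `label` / `score` dictionary per input text.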