---
language:
- it
- en
license: mit
library_name: transformers
tags:
- text-classification
- safety
- toxicity
- insults
- xlm-roberta
- nlp
base_model: xlm-roberta-base
pipeline_tag: text-classification
---
# XLM-RoBERTa Safety Classifier (Italian & English)
## Model Description
This is an **XLM-RoBERTa-based** binary text classification model fine-tuned to detect **toxicity and insults** in user queries. It is trained on a bilingual dataset (Italian and English) to distinguish between **SAFE** (benign) and **UNSAFE** (toxic/harmful) inputs.
- **Model Type:** XLM-RoBERTa (Fine-tuned)
- **Languages:** Italian (`it`), English (`en`)
- **Task:** Binary Classification
- **Training Dataset Size:** 9,035 samples
- **Created by:** [Famezz](https://huggingface.co/Famezz)
## Intended Use
This model is designed to act as a **guardrail** for chatbots and LLMs. It can be used to:
1. Filter out toxic user inputs before they reach a Large Language Model (see the sketch after this list).
2. Flag offensive content in user-generated text.
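
A minimal guardrail sketch for the first use case might look like the following. Note that `call_llm` and `guarded_chat` are hypothetical helpers standing in for whatever downstream model or API you actually use; only the classifier call reflects this model card.

```python
from transformers import pipeline

# Load the safety classifier once at startup.
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder for your actual LLM call.
    return f"LLM response to: {prompt}"

def guarded_chat(user_input: str) -> str:
    # Classify the input first; only SAFE inputs reach the LLM.
    result = classifier(user_input)[0]
    if result["label"] == "UNSAFE":
        return "Sorry, I can't respond to that message."
    return call_llm(user_input)

print(guarded_chat("How do I bake a cake?"))  # forwarded to the LLM
print(guarded_chat("Sei un idiota"))          # blocked by the guardrail
```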
## Label Mapping
The model is trained to predict the following string labels directly:
| Label | Description |
| :--- | :--- |
| **SAFE** | Benign queries, general knowledge, small talk. |
| **UNSAFE** | Toxic content, insults, offensive language. |
## Usage
You can use this model directly with the Hugging Face `pipeline`, which outputs the string labels `SAFE` or `UNSAFE` directly.
```python
from transformers import pipeline
# Load the classifier
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")
# Test with English
print(classifier("How do I bake a cake?"))
# Output: [{'label': 'SAFE', 'score': 0.99}]
# Test with Italian
print(classifier("Sei un idiota"))
# Output: [{'label': 'UNSAFE', 'score': 0.98}]
```
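
For more control over tokenization and batching, the model can also be loaded with `AutoTokenizer` and `AutoModelForSequenceClassification`. This is a sketch that assumes the checkpoint's `id2label` config carries the SAFE/UNSAFE string labels, as the pipeline output above suggests:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "Famezz/roberta_safety_classifier"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

def classify(texts):
    # Tokenize a batch with padding and truncation to the model's max length.
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits.softmax(dim=-1)
    ids = probs.argmax(dim=-1)
    # Map predicted class ids back to the string labels stored in the config.
    return [(model.config.id2label[i.item()], probs[j, i].item())
            for j, i in enumerate(ids)]

print(classify(["How do I bake a cake?", "Sei un idiota"]))
```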