---
language:
- it
- en
license: mit
library_name: transformers
tags:
- text-classification
- safety
- toxicity
- insults
- xlm-roberta
- nlp
base_model: xlm-roberta-base
pipeline_tag: text-classification
---

# XLM-RoBERTa Safety Classifier (Italian & English)

## Model Description

This is an **XLM-RoBERTa-based** binary text classification model fine-tuned to detect **toxicity and insults** in user queries. It was trained on a bilingual (Italian and English) dataset to distinguish **SAFE** (benign) from **UNSAFE** (toxic/harmful) inputs.

- **Model Type:** XLM-RoBERTa (Fine-tuned)
- **Languages:** Italian (`it`), English (`en`)
- **Task:** Binary Classification
- **Training Dataset Size:** 9,035 samples
- **Created by:** [Famezz](https://huggingface.co/Famezz)

## Intended Use

This model is designed to act as a **guardrail** for chatbots and LLMs. It can be used to:
1.  Filter out toxic user inputs before they reach a Large Language Model (see the sketch below).
2.  Flag offensive content in user-generated text.
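
As a minimal guardrail sketch (the `is_safe` helper and the 0.5 threshold here are illustrative, not part of the model), the classifier can gate inputs before they are forwarded to an LLM:

```python
from transformers import pipeline

# Load the safety classifier once at startup
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

def is_safe(user_input: str, threshold: float = 0.5) -> bool:
    """Return True if the input is classified as SAFE with sufficient confidence."""
    result = classifier(user_input)[0]
    return result["label"] == "SAFE" and result["score"] >= threshold

user_input = "How do I bake a cake?"
if is_safe(user_input):
    print("Forward to the LLM:", user_input)
else:
    print("Blocked by the guardrail.")
```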

## Label Mapping

The model is trained to predict the following string labels directly:

| Label | Description |
| :--- | :--- |
| **SAFE** | Benign queries, general knowledge, small talk. |
| **UNSAFE** | Toxic content, insults, offensive language. |

## Usage

You can use this model directly with the Hugging Face `pipeline`. The pipeline will automatically output the labels "SAFE" or "UNSAFE".

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

# Test with English
print(classifier("How do I bake a cake?"))
# Output: [{'label': 'SAFE', 'score': 0.99}]

# Test with Italian
print(classifier("Sei un idiota"))
# Output: [{'label': 'UNSAFE', 'score': 0.98}]
```
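
If you prefer to work below the `pipeline` abstraction, a minimal sketch with `AutoTokenizer` and `AutoModelForSequenceClassification` (assuming the model's `id2label` config maps class indices to `SAFE`/`UNSAFE`, as the pipeline output above suggests) looks like this:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "Famezz/roberta_safety_classifier"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Tokenize and run a forward pass without gradients
inputs = tokenizer("Sei un idiota", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and map the top index to its string label
probs = torch.softmax(logits, dim=-1)[0]
predicted_id = int(probs.argmax())
print(model.config.id2label[predicted_id], float(probs[predicted_id]))
```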