---
language:
- it
- en
license: mit
library_name: transformers
tags:
- text-classification
- safety
- toxicity
- insults
- xlm-roberta
- nlp
base_model: xlm-roberta-base
pipeline_tag: text-classification
---

# XLM-RoBERTa Safety Classifier (Italian & English)

## Model Description

This is an **XLM-RoBERTa-based** binary text classification model fine-tuned to detect **toxicity and insults** in user queries. It is trained on a bilingual dataset (Italian and English) to distinguish between **SAFE** (benign) and **UNSAFE** (toxic/harmful) inputs.

- **Model Type:** XLM-RoBERTa (Fine-tuned)
- **Languages:** Italian (`it`), English (`en`)
- **Task:** Binary Classification
- **Training Dataset Size:** 9,035 samples
- **Created by:** [Famezz](https://huggingface.co/Famezz)

## Intended Use

This model is designed to act as a **guardrail** for chatbots and LLMs. It can be used to:

1. Filter out toxic user inputs before they reach a Large Language Model.
2. Flag offensive content in user-generated text.

## Label Mapping

The model is trained to predict the following string labels directly:

| Label | Description |
| :--- | :--- |
| **SAFE** | Benign queries, general knowledge, small talk. |
| **UNSAFE** | Toxic content, insults, offensive language. |

## Usage

You can use this model directly with the Hugging Face `pipeline`, which outputs the labels "SAFE" or "UNSAFE" directly.

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

# Test with English
print(classifier("How do I bake a cake?"))
# Output: [{'label': 'SAFE', 'score': 0.99}]

# Test with Italian
print(classifier("Sei un idiota"))
# Output: [{'label': 'UNSAFE', 'score': 0.98}]
```
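
For the guardrail use case described under **Intended Use**, the classifier can gate user input before it reaches an LLM. The sketch below is illustrative: the `is_safe` helper, the `generate_reply` placeholder, and the 0.5 confidence threshold are assumptions made for this example and are not part of the model itself; only the model ID and the SAFE/UNSAFE labels come from the sections above.

```python
from transformers import pipeline

# Load the safety classifier once and reuse it for every request
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

def is_safe(text: str, threshold: float = 0.5) -> bool:
    """Return True when the classifier labels `text` as SAFE with at least
    `threshold` confidence. Raising the threshold makes the guardrail stricter."""
    result = classifier(text)[0]  # e.g. {'label': 'SAFE', 'score': 0.99}
    return result["label"] == "SAFE" and result["score"] >= threshold

def handle_user_message(message: str) -> str:
    if is_safe(message):
        # generate_reply is a hypothetical call to your downstream LLM
        return generate_reply(message)
    return "Sorry, I can't respond to that message."
```

Depending on your application, you may prefer to log rejected inputs or route them to human review rather than returning a fixed refusal message.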