Famezz committed
Commit 13a1f45 · verified · 1 Parent(s): bca77e2

Update README.md

Files changed (1): README.md (+62 −3)

---
language:
- it
- en
license: mit
library_name: transformers
tags:
- text-classification
- safety
- toxicity
- insults
- xlm-roberta
- nlp
base_model: xlm-roberta-base
pipeline_tag: text-classification
---

# XLM-RoBERTa Safety Classifier (Italian & English)

## Model Description

This is an **XLM-RoBERTa-based** binary text classification model fine-tuned to detect **toxicity and insults** in user queries. It is trained on a bilingual dataset (Italian and English) to distinguish between **SAFE** (benign) and **UNSAFE** (toxic/harmful) inputs.

- **Model Type:** XLM-RoBERTa (fine-tuned)
- **Languages:** Italian (`it`), English (`en`)
- **Task:** Binary classification
- **Training Dataset Size:** 9,035 samples
- **Created by:** [Famezz](https://huggingface.co/Famezz)

## Intended Use

This model is designed to act as a **guardrail** for chatbots and LLMs. It can be used to:
1. Filter out toxic user inputs before they reach a Large Language Model (a minimal sketch follows this list).
2. Flag offensive content in user-generated text.
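
The sketch below illustrates the first use case. It is only a sketch, not part of the released model: `guarded_reply` and `llm_respond` are hypothetical names, `llm_respond` stands in for whatever downstream LLM you call, and the `0.5` threshold is an assumed default you should tune on your own traffic.

```python
from transformers import pipeline

# Load the safety classifier once at startup.
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

def llm_respond(prompt: str) -> str:
    # Hypothetical stand-in for the downstream LLM call; replace with your client.
    return f"(LLM answer to: {prompt})"

def guarded_reply(user_input: str, threshold: float = 0.5) -> str:
    # Classify first; only forward inputs the classifier considers safe.
    result = classifier(user_input)[0]  # e.g. {'label': 'UNSAFE', 'score': 0.98}
    if result["label"] == "UNSAFE" and result["score"] >= threshold:
        return "Sorry, I can't help with that."
    return llm_respond(user_input)

print(guarded_reply("How do I bake a cake?"))  # forwarded to the LLM
print(guarded_reply("Sei un idiota"))          # "You are an idiot" -> blocked
```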

## Label Mapping

The model is trained to predict the following string labels directly:

| Label | Description |
| :--- | :--- |
| **SAFE** | Benign queries, general knowledge, small talk. |
| **UNSAFE** | Toxic content, insults, offensive language. |
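
Since the pipeline returns these strings, the mapping is stored in the model config and can be inspected directly. The integer ids shown in the comments are an assumption about how the config was saved; check the actual output.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Famezz/roberta_safety_classifier")
print(config.id2label)  # assumed: {0: 'SAFE', 1: 'UNSAFE'}
print(config.label2id)  # assumed: {'SAFE': 0, 'UNSAFE': 1}
```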

## Usage

You can use this model directly with the Hugging Face `pipeline`, which automatically outputs the labels "SAFE" or "UNSAFE".

```python
from transformers import pipeline

# Load the classifier
classifier = pipeline("text-classification", model="Famezz/roberta_safety_classifier")

# Test with English
print(classifier("How do I bake a cake?"))
# Output: [{'label': 'SAFE', 'score': 0.99}]

# Test with Italian ("You are an idiot")
print(classifier("Sei un idiota"))
# Output: [{'label': 'UNSAFE', 'score': 0.98}]
```
+ # Output: [{'label': 'UNSAFE', 'score': 0.98}]