IMSyPP/hate_speech_multilingual

Text Classification

Bojan committed on Aug 30, 2024

Commit c46d81c (verified) · 1 Parent(s): ac70c3c

Update README.md

Files changed (1)

README.md +53 -1

@@ -15,4 +15,56 @@ tags:
 - xlm-roberta
 - Youtube
 - Twitter
----

+---
+# Multilingual Hate Speech Classifier for Social Media Content
+A multilingual hate speech classification model based on [XLM-R (100 languages)](https://huggingface.co/FacebookAI/xlm-roberta-large) and fine-tuned on English, Italian and Slovenian data. A paper describing the model is forthcoming.
+**Training data**
+* 103k English YouTube comments
+* 119k Italian YouTube comments
+* 50k Slovenian Twitter comments
+**Evaluation data**
+* 20k English YouTube comments
+* 21k Italian YouTube comments
+* 10k Slovenian Twitter comments
+**Fine-tuning hyperparameters**
+num_train_epochs=3,
+train_batch_size=8,
+learning_rate=6e-6
+**Evaluation Results**
+Model-annotator agreement (the model's accuracy against the annotators' labels) compared with inter-annotator agreement, both in percent (0: no agreement; 100: perfect agreement):
+| Language | Model-annotator Agreement | Inter-annotator Agreement |
+|-----------|---------------------------|---------------------------|
+| English | 79.97 | 82.91 |
+| Italian | 82.00 | 81.79 |
+| Slovenian | 78.84 | 79.43 |
+Class-specific model F1-scores:
+| Language | Appropriate | Inappropriate | Offensive | Violent |
+|-----------|-------------|---------------|-----------|---------|
+| English | 86.10 | 39.16 | 68.24 | 27.82 |
+| Italian | 89.77 | 58.45 | 60.42 | 44.97 |
+| Slovenian | 84.30 | 45.22 | 69.69 | 24.79 |
+**Usage**
+from transformers import AutoModelForSequenceClassification, TextClassificationPipeline, AutoTokenizer, AutoConfig
+MODEL = "IMSyPP/hate_speech_multilingual"  # this repository's model id
+tokenizer = AutoTokenizer.from_pretrained(MODEL)
+config = AutoConfig.from_pretrained(MODEL)
+model = AutoModelForSequenceClassification.from_pretrained(MODEL)
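+# device=0 selects the first CUDA GPU (use device=-1 for CPU); function_to_apply="none" returns raw logits
+pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer, return_all_scores=True,
+task='text-classification', device=0, function_to_apply="none")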
+pipe([
+"Thank you for using our model",
+"Grazie per aver utilizzato il nostro modello",
+"Hvala za uporabo našega modela"
+])
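
Because the usage snippet passes `function_to_apply="none"`, the pipeline returns raw logits rather than probabilities. The following is a minimal sketch, not part of the committed README, of how those scores might be turned into class probabilities with a softmax. It assumes the output for one text is a list of `{"label", "score"}` dicts; the label strings and the logit values below are made up for illustration, and the labels actually stored in the model config may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_probabilities(scores):
    """Map raw logits for one text to a {label: probability} dict.

    `scores` is assumed to be a list of {"label": str, "score": float}
    dicts, where "score" holds an unnormalized logit.
    """
    probs = softmax([item["score"] for item in scores])
    return {item["label"]: p for item, p in zip(scores, probs)}


def top_label(scores):
    """Return the label with the highest probability."""
    probs = to_probabilities(scores)
    return max(probs, key=probs.get)


# Hypothetical output for a single text. The label names follow the four
# classes in the F1 table; the logits are invented, not model output.
example = [
    {"label": "Appropriate", "score": 2.0},
    {"label": "Inappropriate", "score": 0.5},
    {"label": "Offensive", "score": -1.0},
    {"label": "Violent", "score": -2.5},
]

if __name__ == "__main__":
    for label, prob in to_probabilities(example).items():
        print(f"{label}: {prob:.3f}")
    print("predicted:", top_label(example))
```

Softmax does not change which class ranks highest, so it is only needed when calibrated-looking scores or thresholds are wanted; for a plain prediction, taking the label with the largest logit gives the same result.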