+---
+language:
+- id
+- ace
+- ban
+- bjn
+- bug
+- jav
+- mad
+- min
+- sun
+- bbc
+- eng
+library_name: transformers
+pipeline_tag: text-classification
+tags:
+- text-classification
+- hate-speech-detection
+- indonesian
+- multilingual
+- social-media
+- natural-language-processing
+- xlm-roberta
+license: apache-2.0
+metrics:
+- accuracy
+- f1
+base_model:
+- FacebookAI/xlm-roberta-base
+---
+# Hate Speech Detection for Social Media Text
+**Multilingual Indonesian & English — XLM-RoBERTa**
+This repository provides a fine-tuned **XLM-RoBERTa** model for **hate speech detection** in social media text.
+The model distinguishes **hate speech** from **non-hate speech** across **Indonesian**, **regional Indonesian languages**, and **English**, particularly in noisy, informal online conversations.
+---
+## 🚀 Highlights
+- Binary classification: **Hate Speech / Non-Hate**
+- Multilingual support (Indonesian + English)
+- Robust on informal and user-generated content
+- Ready to use with Hugging Face `pipeline`
+- Suitable for content moderation and safety systems
+---
+## 🌍 Supported Languages
+- 🇮🇩 Indonesian (Bahasa Indonesia)
+- Malay (Bahasa Melayu)
+- Indonesian regional languages (Acehnese, Banjarese, Buginese, Javanese, Madurese, Minangkabau, Sundanese, etc.)
+- 🇬🇧 English
+---
+## 📊 Model Performance
+> Performance metrics are reported on a held-out validation set.
+| Metric          | Score  |
+|-----------------|--------|
+| Accuracy        | 0.9180 |
+| F1 (Macro)      | 0.9179 |
+| F1 (Weighted)   | 0.9200 |
+| Training Loss   | 0.9200 |
+| Validation Loss | 0.9200 |
+*(Exact scores may vary depending on evaluation split and threshold.)*
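The macro and weighted F1 rows in the table differ only in how the per-class F1 scores are averaged: macro F1 gives each class equal weight, while weighted F1 weights each class by its number of true examples. The sketch below illustrates the difference in plain Python. The labels are made-up toy data, not the model's validation set, and the helper names `f1_per_class` and `summarize` are illustrative only.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def summarize(y_true, y_pred):
    """Return accuracy, macro F1, and support-weighted F1."""
    classes = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    scores = {c: f1_per_class(y_true, y_pred, c) for c in classes}
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    macro = sum(scores.values()) / len(classes)
    weighted = sum(scores[c] * support[c] for c in classes) / len(y_true)
    return {"accuracy": accuracy, "f1_macro": macro, "f1_weighted": weighted}


# Toy labels for illustration only: 0 = NON_HATE, 1 = HATE_SPEECH.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
print(summarize(y_true, y_pred))
```

On these toy labels, accuracy and weighted F1 both come out near 0.80 while macro F1 is slightly lower, because the smaller class has the lower per-class F1.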
+---
+## ⚙️ Usage
+### Installation
+```bash
+pip install transformers torch
+```
+### Single Prediction
+```python
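+from transformers import pipeline
+
+# Load the fine-tuned classifier from the Hugging Face Hub
+classifier = pipeline(
+    "text-classification",
+    model="nahiar/hatespeech-xlmr-v4"
+)
+
+# Classify one informal Indonesian sentence
+result = classifier("Dasar bodoh, otak udang!")
+print(result)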
+```
+**Output**
+```text
+[{'label': 'LABEL_1', 'score': 0.9821}]
+```
+### Label Mapping
+```text
+LABEL_0 → NON_HATE
+LABEL_1 → HATE_SPEECH
+```
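Because the pipeline returns the raw `LABEL_0` / `LABEL_1` identifiers, downstream code usually wants the readable names from the mapping above. A minimal sketch of that translation follows. The `LABEL_NAMES` dictionary and the `readable` helper are illustrative names, not part of the model or of Transformers, and the sample prediction is copied from the output shown earlier so that no model download is needed.

```python
# Mapping taken from the label table in this model card.
LABEL_NAMES = {"LABEL_0": "NON_HATE", "LABEL_1": "HATE_SPEECH"}


def readable(prediction: dict) -> dict:
    """Return a copy of one pipeline prediction with a human-readable label."""
    return {
        "label": LABEL_NAMES[prediction["label"]],
        "score": prediction["score"],
    }


# Prediction in the pipeline's output format, copied from the example output.
sample = [{"label": "LABEL_1", "score": 0.9821}]
print([readable(p) for p in sample])
# [{'label': 'HATE_SPEECH', 'score': 0.9821}]
```

The same helper can be applied to the list that `classifier(...)` returns for a single string or for a batch.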
+---
+## 📦 Batch Inference
+```python
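+# Reuses the `classifier` pipeline created in the Single Prediction example
+texts = [
+    "Kamu itu memang tidak berguna",
+    "Saya tidak setuju dengan pendapat kamu",
+    "Dasar kaum ini selalu bikin rusuh"
+]
+# Passing a list returns one prediction per input, in the same order
+results = classifier(texts)
+for text, result in zip(texts, results):
+    print(f"{result['label']} ({result['score']:.4f}) → {text}")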
+```
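For moderation workflows, batch results are often filtered by a decision threshold rather than by the top label alone. The sketch below shows one way this could be done. It assumes a two-class softmax head, so that the two class scores sum to 1. The `0.8` threshold and the helper names `hate_probability` and `flag_for_review` are hypothetical choices for illustration, and the predictions are hand-written rather than real model output.

```python
def hate_probability(prediction: dict) -> float:
    """Convert a top-label prediction into P(HATE_SPEECH).

    Assumes a two-class softmax head, so the two class scores sum to 1.
    """
    score = prediction["score"]
    return score if prediction["label"] == "LABEL_1" else 1.0 - score


def flag_for_review(predictions, threshold=0.8):
    """Return indices whose hate probability meets the (hypothetical) threshold."""
    return [
        i for i, p in enumerate(predictions)
        if hate_probability(p) >= threshold
    ]


# Hand-written predictions in the pipeline's output format, for illustration only.
sample_results = [
    {"label": "LABEL_1", "score": 0.95},
    {"label": "LABEL_0", "score": 0.90},
    {"label": "LABEL_1", "score": 0.60},
]
print(flag_for_review(sample_results))
# [0]
```

A higher threshold flags fewer items with more confidence. The right value depends on the cost of missed hate speech versus false alarms in a given deployment.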
+---
+## 🏗️ Training Configuration
+| Parameter         | Value            |
+| ----------------- | ---------------- |
+| Base Model        | xlm-roberta-base |
+| Training Strategy | Fine-tuning      |
+| Epochs            | Multiple         |
+| Learning Rate     | 2e-5             |
+| Batch Size        | 16               |
+| Training Date     | 2025-12-03       |
+---
+## 🎯 Intended Use
+* Hate speech detection and moderation
+* Content safety and compliance systems
+* Pre-filtering for sentiment or topic analysis
+* Social media monitoring dashboards
+---
+## ⚠️ Limitations
+* Binary classification only (Hate / Non-Hate)
+* Does not identify hate targets or categories
+* Context-dependent sarcasm may be misclassified
+* Not suitable for legal judgment without human review
+---
+## 📜 License
+This model is released under the **Apache License 2.0**, which permits both research and commercial use.
+---
+## 📚 Citation
+```bibtex
+@misc{djunaedi2025hatespeech,
+  author    = {Raihan Hidayatulloh Djunaedi},
+  title     = {Hate Speech Detection for Social Media Text},
+  year      = {2025},
+  publisher = {Hugging Face},
+  url       = {https://huggingface.co/nahiar/hatespeech-xlmr-v4}
+}
+```
+---
+## 🙌 Acknowledgements
+* Hugging Face Transformers
+* Facebook AI Research — XLM-RoBERTa