	---
	language:
	- id
	- en
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- text-classification
	- sentiment-analysis
	- indonesian
	- multilingual
	- xlm-roberta
	- social-media
	license: apache-2.0
	metrics:
	- accuracy
	- f1
	base_model:
	- FacebookAI/xlm-roberta-base
	---

	# Sentiment Analysis for Social Media Text
	Multilingual Indonesian & English | XLM-RoBERTa

	This model is a fine-tuned XLM-RoBERTa-Base that classifies the sentiment of social media text as Positive, Neutral, or Negative.
	It supports Indonesian and English, making it suitable for multi-platform moderation use cases such as Twitter/X, Instagram, TikTok, Facebook, and online forums.

	---

	## ✨ Key Features

	- ✅ Positive, Neutral, and Negative sentiment classification
	- 🌏 Multilingual support (Indonesian & English)
	- 🧠 Based on XLM-RoBERTa (multilingual transformer)
	- ⚡ Ready-to-use with Hugging Face `pipeline`
	- 📊 Roughly 85% accuracy and macro F1 on held-out validation data

	---

	## 🌍 Supported Languages

	- 🇮🇩 Bahasa Indonesia
	- 🇬🇧 English

	---

	## 🧪 Model Performance

	| Metric          | Score  |
	|-----------------|--------|
	| Accuracy        | 0.8527 |
	| F1 (Macro)      | 0.8525 |
	| F1 (Weighted)   | 0.8525 |
	| Precision       | 0.8500 |
	| Recall          | 0.8500 |
	| Training Loss   | 0.2759 |
	| Validation Loss | 0.4368 |

	> Evaluated on held-out validation data with a balanced sentiment distribution.
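
To make the difference between the macro and weighted F1 rows concrete, here is a minimal pure-Python sketch of how these metrics are commonly computed. The toy labels are invented for illustration and are not the model's evaluation data, and `f1_report` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

def f1_report(y_true, y_pred):
    """Return accuracy, macro F1, and weighted F1 for two label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    f1_per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1_per_class[label] = 2 * tp / denom if denom else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    # Macro: plain average over classes. Weighted: average weighted by support.
    macro_f1 = sum(f1_per_class.values()) / len(labels)
    weighted_f1 = sum(f1_per_class[l] * support[l] for l in labels) / len(y_true)
    return {"accuracy": accuracy, "f1_macro": macro_f1, "f1_weighted": weighted_f1}

# Toy, perfectly balanced example (two items per class); not real model output.
y_true = ["POSITIVE", "POSITIVE", "NEUTRAL", "NEUTRAL", "NEGATIVE", "NEGATIVE"]
y_pred = ["POSITIVE", "NEUTRAL", "NEUTRAL", "NEUTRAL", "NEGATIVE", "POSITIVE"]
report = f1_report(y_true, y_pred)
print(report)
```

When every class has the same number of true examples, the weighted average reduces to the macro average, which is consistent with the two F1 rows in the table being identical.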

	---

	## 🚀 Quick Start

	### Installation
	```bash
	pip install transformers torch
	```

	### Single Prediction

	```python
	from transformers import pipeline

	classifier = pipeline(
	    task="text-classification",
	    model="nahiar/sentiment-analysis-v2",
	)

	result = classifier("PASTI DIJAMIN WDP 100%")
	print(result)
	```

	Output

	```python
	[{'label': 'LABEL_1', 'score': 0.9876}]
	```

	### Label Mapping

	```text
	LABEL_0 → NEUTRAL
	LABEL_1 → POSITIVE
	LABEL_2 → NEGATIVE
	```
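
The mapping can be applied in code so that downstream consumers see sentiment names instead of raw ids. This is a minimal sketch: `LABEL_NAMES` and `readable` are hypothetical names, the English spellings of the sentiment names are a display choice made here, and the input mirrors the output format shown in the Single Prediction example.

```python
# Mapping copied from the Label Mapping section; English names are an assumption.
LABEL_NAMES = {
    "LABEL_0": "NEUTRAL",
    "LABEL_1": "POSITIVE",
    "LABEL_2": "NEGATIVE",
}

def readable(predictions):
    """Replace raw LABEL_n ids with sentiment names, keeping the scores."""
    return [
        # Unknown ids are passed through unchanged rather than raising.
        {"label": LABEL_NAMES.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]

raw = [{"label": "LABEL_1", "score": 0.9876}]
print(readable(raw))
```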

	---

	## 📦 Batch Inference Example

	```python
	# Reuses the `classifier` pipeline created in the Single Prediction example.
	texts = [
	    (
	        "साइबर हमले के बाद JLR का बड़ा बयान - जानें कंपनी ने क्या कहा | Tata Motors के शेयर पर दिखेगा असर?\n\n"
	        "#TataMotors #JLR #CyberAttack\n\n"
	        "https://t.co/6WlGS77UUp"
	    ),
	    (
	        "Kita sudah Ready skrg ini bagi yang memerlukan jasa pemulihan akun & Hapus All akun\n\n"
	        "Lacak lokasi / sadap wa / Hack Akun / Revengeporn - korban pemerasan vcs / terror\n\n"
	        "TIKTOK,GMAIL,TWITER,TELEGRAM,\n"
	        "FACEBOOK,INSTAGRAM\n"
	        "#revengeporn #zonauangᅠᅠᅠ\n"
	        "☎️ https://t.co/K0AbW08qnU https://t.co/4IpWNA7a0z"
	    ),
	    (
	        "💥Slot Gacor Hari ini Rute303\n"
	        "💥Jaminan Jackpot Maxwin malam ini\n\n"
	        "LINK SLOT GACOR HARI INI : https://t.co/QvxjCAnt8o\n\n"
	        "Tags:\n"
	        "Jumbo #timsekop Jumat gratis ongkir Like Crazy PSIM https://t.co/ukuRdlvgGA"
	    ),
	]

	# Passing a list returns one result dict per input text, in the same order.
	results = classifier(texts)

	for text, result in zip(texts, results):
	    print(f"{text} -> {result['label']} ({result['score']:.4f})")
	```
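
Once a batch has been classified, a common next step is to summarize the label distribution. The sketch below assumes results in the list-of-dicts shape shown in the Single Prediction output. The `sample_results` values are made up so the snippet runs without downloading the model, and `summarize` is a hypothetical helper.

```python
from collections import Counter

def summarize(results):
    """Count predictions per label and compute each label's share of the batch."""
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    return {label: {"count": n, "share": n / total} for label, n in counts.items()}

# Made-up results in the pipeline's output format (not real model output).
sample_results = [
    {"label": "LABEL_0", "score": 0.91},
    {"label": "LABEL_2", "score": 0.88},
    {"label": "LABEL_2", "score": 0.67},
    {"label": "LABEL_1", "score": 0.95},
]
summary = summarize(sample_results)
print(summary)
```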

	---

	## 🏗️ Training Configuration

	| Parameter          | Value            |
	| ------------------ | ---------------- |
	| Base Model         | xlm-roberta-base |
	| Training Samples   | 19,200           |
	| Validation Samples | 4,800            |
	| Epochs             | 3                |
	| Learning Rate      | 1e-5             |
	| Batch Size         | 16               |
	| Training Date      | 2026-02-05       |
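
As a quick sanity check on these numbers, the following sketch derives the implied optimizer step count and the validation share. It assumes a single device with no gradient accumulation, which the card does not state, so the step counts are illustrative only.

```python
import math

# Values taken from the Training Configuration table.
train_samples = 19_200
val_samples = 4_800
epochs = 3
batch_size = 16

# Assumption: one device, no gradient accumulation (not stated in the card).
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Share of the combined data that was held out for validation.
val_fraction = val_samples / (train_samples + val_samples)

print(steps_per_epoch, total_steps, val_fraction)
```

Under those assumptions the run takes 1,200 steps per epoch and 3,600 steps in total, with a 80/20 train/validation split.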

	---

	## 🎯 Intended Use Cases

	* Social media sentiment analysis
	* Comment & post filtering
	* Content quality control

	---

	## ⚠️ Limitations

	* Three-class classification only (Positive, Negative, Neutral)
	* Not optimized for formal text outside social media
	* Performance may degrade on very short or ambiguous messages
	* The model may reflect biases present in its training data
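
One possible way to handle short or ambiguous messages is to route low-confidence predictions to manual review. The threshold of 0.6 and the helper name `needs_review` are assumptions for illustration; the card does not recommend a cutoff, and the second prediction below is made up.

```python
def needs_review(prediction, threshold=0.6):
    """Return True when the top score falls below the chosen threshold."""
    return prediction["score"] < threshold

predictions = [
    {"label": "LABEL_1", "score": 0.9876},  # format from the Single Prediction output
    {"label": "LABEL_0", "score": 0.41},    # made-up low-confidence example
]

# Keep only the predictions that should be checked by a person.
flagged = [p for p in predictions if needs_review(p)]
print(flagged)
```

A suitable threshold depends on the application and is best chosen by inspecting scores on your own labeled data.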

	---

	## 📜 License

	Released under the Apache 2.0 License.
	Free for commercial and research use.

	---

	## 📚 Citation

	If you use this model in your work, please cite:

	```bibtex
	@misc{djunaedi2026sentiment,
	  author    = {AI/ML Engineer ADS Digital Partner},
	  title     = {Sentiment Analysis for Social Media Text},
	  year      = {2026},
	  publisher = {Hugging Face},
	  url       = {https://huggingface.co/nahiar/sentiment-analysis-v2}
	}
	```

	---

	## 🙌 Acknowledgements

	* Hugging Face Transformers
	* Facebook AI Research — XLM-RoBERTa