
language:
  - id
  - eng
library_name: transformers
pipeline_tag: text-classification
tags:
  - text-classification
  - sentiment-analysis
  - indonesian
  - multilingual
  - xlm-roberta
  - social-media
license: apache-2.0
metrics:
  - accuracy
  - f1
base_model:
  - FacebookAI/xlm-roberta-base

Sentiment Analysis for Social Media Text

Multilingual Indonesian & English | XLM-RoBERTa

This model is a fine-tuned XLM-RoBERTa-Base that classifies the sentiment of social media text as Positive, Neutral, or Negative.
It supports Indonesian and English, making it suitable for multi-platform moderation use cases such as Twitter/X, Instagram, TikTok, Facebook, and online forums.

✨ Key Features

✅ Sentiment classification: Positive, Neutral, and Negative
🌏 Multilingual support (Indonesian & English)
🧠 Based on XLM-RoBERTa (multilingual transformer)
⚡ Ready-to-use with Hugging Face pipeline
📊 Trained on noisy social media text (0.8527 accuracy, 0.8525 macro F1 on the validation set)

🌍 Supported Languages

🇮🇩 Bahasa Indonesia
🇬🇧 English

🧪 Model Performance

Metric	Score
Accuracy	0.8527
F1 (Macro)	0.8525
F1 (Weighted)	0.8525
Precision	0.8500
Recall	0.8500
Training Loss	0.2759
Validation Loss	0.4368

Evaluated on the held-out validation set (4,800 samples) with a balanced sentiment distribution.
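To make the difference between the two F1 averages concrete, the sketch below computes accuracy, macro F1, and weighted F1 in plain Python. It is an illustrative implementation of the standard definitions, not the evaluation script used for this card; the helper name `classification_metrics` and the toy labels are hypothetical. Macro F1 averages the per-class F1 scores equally, while weighted F1 weights each class by its number of true examples, so on a perfectly balanced set the two coincide, which is consistent with the near-identical macro and weighted scores in the table.

```python
from collections import Counter

def classification_metrics(y_true, y_pred):
    """Accuracy plus macro- and weighted-averaged F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

    f1_per_label = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_per_label[label] = (
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )

    # Macro: unweighted mean over classes; weighted: mean weighted by class support
    macro_f1 = sum(f1_per_label.values()) / len(labels)
    weighted_f1 = sum(f1_per_label[label] * support[label] for label in labels) / len(y_true)
    return {"accuracy": accuracy, "f1_macro": macro_f1, "f1_weighted": weighted_f1}

# Hypothetical, perfectly balanced toy labels (two examples per class)
y_true = ["NEUTRAL", "POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE", "NEUTRAL"]
y_pred = ["NEUTRAL", "POSITIVE", "NEGATIVE", "NEUTRAL", "NEGATIVE", "NEUTRAL"]

m = classification_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in m.items()})
# {'accuracy': 0.8333, 'f1_macro': 0.8222, 'f1_weighted': 0.8222}
```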

🚀 Quick Start

Installation

```bash
pip install transformers torch
```

Single Prediction

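```python
from transformers import pipeline

# Load the fine-tuned checkpoint from the Hugging Face Hub
classifier = pipeline(
    task="text-classification",
    model="nahiar/sentiment-analysis-v2"
)

# Indonesian input: "DEFINITELY GUARANTEED WDP 100%"
result = classifier("PASTI DIJAMIN WDP 100%")
print(result)
```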

Output

```
[{'label': 'LABEL_1', 'score': 0.9876}]
```

Label Mapping

LABEL_0 → NEUTRAL
LABEL_1 → POSITIVE
LABEL_2 → NEGATIVE
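Because the checkpoint returns generic `LABEL_n` names, a small post-processing step can translate them using the label mapping above, with all three names written in English. The `ID2LABEL` dict and `to_readable` helper below are hypothetical names for illustration; they assume the pipeline output shape shown in the Output example (a list of `{'label', 'score'}` dicts) and leave any unknown label unchanged.

```python
# Mapping from the "Label Mapping" list in this card
ID2LABEL = {
    "LABEL_0": "NEUTRAL",
    "LABEL_1": "POSITIVE",
    "LABEL_2": "NEGATIVE",
}

def to_readable(predictions):
    """Replace generic LABEL_n names in pipeline output with sentiment names."""
    return [
        {**pred, "label": ID2LABEL.get(pred["label"], pred["label"])}
        for pred in predictions
    ]

print(to_readable([{"label": "LABEL_1", "score": 0.9876}]))
# [{'label': 'POSITIVE', 'score': 0.9876}]
```

An alternative, if you maintain your own copy of the checkpoint, is to set `id2label` in its `config.json` so the pipeline emits the readable names directly.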

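📦 Batch Inference Example

```python
from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="nahiar/sentiment-analysis-v2"
)

# Raw social media posts kept verbatim (hashtags, URLs, line breaks).
# Note: the first post is Hindi, which is outside the two supported languages.
texts = [
    "साइबर हमले के बाद JLR का बड़ा बयान - जानें कंपनी ने क्या कहा | Tata Motors के शेयर पर दिखेगा असर?\n\n"
    "#TataMotors #JLR #CyberAttack \n\n"
    "https://t.co/6WlGS77UUp",
    "Kita sudah Ready skrg ini bagi yang memerlukan jasa pemulihan akun &amp; Hapus All akun \n\n"
    " Lacak lokasi / sadap wa / Hack Akun / Revengeporn - korban pemerasan vcs / terror\n\n"
    "TIKTOK,GMAIL,TWITER,TELEGRAM,\n"
    "FACEBOOK,INSTAGRAM \n"
    "#revengeporn #zonauangᅠᅠᅠ \n"
    " ☎️ https://t.co/K0AbW08qnU https://t.co/4IpWNA7a0z",
    "💥Slot Gacor Hari ini Rute303\n"
    "💥Jaminan Jackpot Maxwin malam ini\n\n"
    "LINK SLOT GACOR HARI INI : https://t.co/QvxjCAnt8o\n\n"
    "Tags:\n"
    "Jumbo #timsekop Jumat gratis ongkir Like Crazy PSIM https://t.co/ukuRdlvgGA",
]

# Passing a list returns one {'label', 'score'} dict per input text
results = classifier(texts)

for text, result in zip(texts, results):
    # repr() keeps each multi-line post on a single output line
    print(f"{text!r} -> {result['label']} ({result['score']:.4f})")
```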

🏗️ Training Configuration

Parameter	Value
Base Model	xlm-roberta-base
Training Samples	19,200
Validation Samples	4,800
Epochs	3
Learning Rate	1e-5
Batch Size	16
Training Date	2026-02-05
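The table implies a few derived quantities: an 80/20 train/validation split and the number of optimizer steps. The arithmetic below assumes training on a single device with a per-step batch of 16 and no gradient accumulation, which the card does not state, so treat the step counts as an estimate.

```python
import math

# Values from the training configuration table above
train_samples = 19_200
val_samples = 4_800
epochs = 3
batch_size = 16

# Assumption: single device, no gradient accumulation, final partial batch kept
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
train_fraction = train_samples / (train_samples + val_samples)

print(steps_per_epoch, total_steps, train_fraction)
# 1200 3600 0.8
```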

🎯 Intended Use Cases

Social media sentiment analysis
Comment & post filtering
Content quality control
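For the social media sentiment analysis use case, a common step is summarizing how a batch of posts splits across the three labels. The sketch below uses a hypothetical helper, `sentiment_share`, and made-up pipeline-style predictions rather than real model output; it maps labels with the card's label mapping, counts them, and reports each label's share of the batch.

```python
from collections import Counter

# Same mapping as the "Label Mapping" list in this card
ID2LABEL = {"LABEL_0": "NEUTRAL", "LABEL_1": "POSITIVE", "LABEL_2": "NEGATIVE"}

def sentiment_share(predictions):
    """Return the fraction of posts assigned to each sentiment label."""
    counts = Counter(ID2LABEL.get(p["label"], p["label"]) for p in predictions)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical pipeline output for four posts (not real model predictions)
predictions = [
    {"label": "LABEL_1", "score": 0.95},
    {"label": "LABEL_2", "score": 0.88},
    {"label": "LABEL_1", "score": 0.91},
    {"label": "LABEL_0", "score": 0.77},
]
print(sentiment_share(predictions))
# {'POSITIVE': 0.5, 'NEGATIVE': 0.25, 'NEUTRAL': 0.25}
```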

⚠️ Limitations

Three-class classification only (Positive, Neutral, Negative)
Not optimized for formal text outside social media
Performance may degrade on very short or ambiguous messages
The model may still reflect biases present in its training data
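One way to handle short or ambiguous messages is to route low-confidence predictions to manual review instead of acting on them automatically. The sketch below assumes a hypothetical threshold of 0.6 on the pipeline's `score` field; the card does not recommend a value, so calibrate it on your own labeled data. The helper name `split_by_confidence` and the predictions shown are illustrative only.

```python
# Hypothetical cut-off; the card does not recommend one, so calibrate it on labeled data
CONFIDENCE_THRESHOLD = 0.6

def split_by_confidence(texts, predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split (text, label, score) tuples into confident and needs-review lists."""
    confident, needs_review = [], []
    for text, pred in zip(texts, predictions):
        # Route each prediction by comparing its top-class score to the threshold
        bucket = confident if pred["score"] >= threshold else needs_review
        bucket.append((text, pred["label"], pred["score"]))
    return confident, needs_review

texts = [
    "mantap!",                     # Indonesian: "great!"
    "hmm ok",                      # short, ambiguous
    "Pelayanannya lambat sekali",  # Indonesian: "the service is very slow"
]
# Made-up pipeline-style output, not real predictions from this model
predictions = [
    {"label": "LABEL_1", "score": 0.93},
    {"label": "LABEL_0", "score": 0.48},
    {"label": "LABEL_2", "score": 0.89},
]

confident, needs_review = split_by_confidence(texts, predictions)
print(needs_review)  # [('hmm ok', 'LABEL_0', 0.48)]
```

Assuming the pipeline's default softmax scoring, the top score over three classes is always at least 1/3, so thresholds at or below that value never route anything to review.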

📜 License

Released under the Apache 2.0 License. Free for commercial and research use.

📚 Citation

If you use this model in your work, please cite:

```bibtex
@misc{djunaedi2026sentiment,
  author    = {AI/ML Engineer ADS Digital Partner},
  title     = {Sentiment Analysis for Social Media Text},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/nahiar/sentiment-analysis-v2}
}
```

🙌 Acknowledgements

Hugging Face Transformers
Facebook AI Research — XLM-RoBERTa