---
language:
- id
- eng
library_name: transformers
pipeline_tag: text-classification
tags:
- text-classification
- sentiment-analysis
- indonesian
- multilingual
- xlm-roberta
- social-media
license: apache-2.0
metrics:
- accuracy
- f1
base_model:
- FacebookAI/xlm-roberta-base
---

# Sentiment Analysis for Social Media Text
**Multilingual Indonesian & English | XLM-RoBERTa**

This model is a fine-tuned **XLM-RoBERTa-Base** that classifies the sentiment of social media text as **Positive, Neutral, or Negative**.
It supports **Indonesian** and **English**, making it suitable for multi-platform moderation use cases on Twitter/X, Instagram, TikTok, Facebook, and online forums.

---

## ✨ Key Features

- ✅ Positive, Neutral, and Negative sentiment classification
- 🌏 Multilingual support (Indonesian & English)
- 🧠 Based on **XLM-RoBERTa**, a multilingual transformer
- ⚡ Ready to use with the Hugging Face `pipeline` API
- 📊 About 85% accuracy and macro F1 on held-out validation data

---

## 🌍 Supported Languages

- 🇮🇩 Bahasa Indonesia
- 🇬🇧 English

---

## 🧪 Model Performance

| Metric          | Score  |
|-----------------|--------|
| Accuracy        | 0.8527 |
| F1 (Macro)      | 0.8525 |
| F1 (Weighted)   | 0.8525 |
| Precision       | 0.8500 |
| Recall          | 0.8500 |
| Training Loss   | 0.2759 |
| Validation Loss | 0.4368 |

> Evaluated on held-out validation data with balanced sentiment distribution.

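
The macro and weighted F1 scores reported for this model are identical, which is what a balanced class distribution produces: when every class has the same support, the weighted average reduces to the plain mean. The sketch below illustrates this with plain Python. The function name `f1_scores` and the toy labels are made up for illustration and are not the evaluation code used for this model.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, weighted_f1) for two equal-length label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per class
    per_class = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[label] = 2 * tp / denom if denom else 0.0
    # Macro: unweighted mean over classes.
    macro = sum(per_class.values()) / len(labels)
    # Weighted: mean over classes, weighted by class support.
    weighted = sum(per_class[l] * support[l] for l in labels) / len(y_true)
    return macro, weighted


# Toy data (illustrative only): two true examples per class, so it is balanced.
y_true = ["NEUTRAL", "NEUTRAL", "POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]
y_pred = ["NEUTRAL", "POSITIVE", "POSITIVE", "POSITIVE", "NEGATIVE", "NEUTRAL"]

macro_f1, weighted_f1 = f1_scores(y_true, y_pred)
print(f"macro={macro_f1:.4f} weighted={weighted_f1:.4f}")
```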
---

## 🚀 Quick Start

### Installation
```bash
pip install transformers torch
```

### Single Prediction

```python
from transformers import pipeline

classifier = pipeline(
    task="text-classification",
    model="nahiar/sentiment-analysis-v2"
)

result = classifier("PASTI DIJAMIN WDP 100%")
print(result)
```

**Output**

```python
[{'label': 'LABEL_1', 'score': 0.9876}]
```

### Label Mapping

```text
LABEL_0 → NEUTRAL
LABEL_1 → POSITIVE
LABEL_2 → NEGATIVE
```

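
Since the pipeline returns generic `LABEL_n` identifiers, a small lookup can make the output easier to read. The following is a minimal sketch using English class names; the dictionary `ID2NAME` and the helper `with_readable_label` are hypothetical names introduced here, not part of the model or of `transformers`.

```python
# Mapping taken from the label table in this card (English class names).
ID2NAME = {
    "LABEL_0": "NEUTRAL",
    "LABEL_1": "POSITIVE",
    "LABEL_2": "NEGATIVE",
}


def with_readable_label(prediction):
    """Return a copy of one pipeline prediction with a readable label.

    Unknown labels are passed through unchanged.
    """
    label = prediction["label"]
    return {**prediction, "label": ID2NAME.get(label, label)}


# Same shape as one element of the pipeline output shown in Quick Start.
sample = {"label": "LABEL_1", "score": 0.9876}
readable = with_readable_label(sample)
print(readable)
```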
---

## 📦 Batch Inference Example

```python
from transformers import pipeline

# Same classifier as in the Quick Start section.
classifier = pipeline(
    task="text-classification",
    model="nahiar/sentiment-analysis-v2"
)

# Triple-quoted strings preserve the line breaks of the original posts.
texts = [
    """साइबर हमले के बाद JLR का बड़ा बयान - जानें कंपनी ने क्या कहा | Tata Motors के शेयर पर दिखेगा असर?

#TataMotors #JLR #CyberAttack

https://t.co/6WlGS77UUp""",
    """Kita sudah Ready skrg ini bagi yang memerlukan jasa pemulihan akun & Hapus All akun

Lacak lokasi / sadap wa / Hack Akun / Revengeporn - korban pemerasan vcs / terror

TIKTOK,GMAIL,TWITER,TELEGRAM,
FACEBOOK,INSTAGRAM
#revengeporn #zonauangᅠᅠᅠ
☎️ https://t.co/K0AbW08qnU https://t.co/4IpWNA7a0z""",
    """💥Slot Gacor Hari ini Rute303
💥Jaminan Jackpot Maxwin malam ini

LINK SLOT GACOR HARI INI : https://t.co/QvxjCAnt8o

Tags:
Jumbo #timsekop Jumat gratis ongkir Like Crazy PSIM https://t.co/ukuRdlvgGA""",
]

# Passing a list returns one prediction dict per input text.
results = classifier(texts)

for text, result in zip(texts, results):
    # Print only the first line of each post to keep the output readable.
    first_line = text.splitlines()[0]
    print(f"{first_line} -> {result['label']} ({result['score']:.4f})")
```

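
When classifying a batch, the share of each label is often more useful than the individual predictions. The sketch below tallies labels from a list shaped like the pipeline output. The `results` list is a hand-written stand-in with made-up scores, and `label_distribution` is a hypothetical helper name.

```python
from collections import Counter


def label_distribution(results):
    """Return the fraction of predictions per label (empty dict if no results)."""
    counts = Counter(r["label"] for r in results)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Hand-written stand-in for `classifier(texts)`; scores are illustrative only.
results = [
    {"label": "LABEL_2", "score": 0.91},
    {"label": "LABEL_2", "score": 0.88},
    {"label": "LABEL_0", "score": 0.67},
    {"label": "LABEL_1", "score": 0.95},
]

shares = label_distribution(results)
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```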
---

## 🏗️ Training Configuration

| Parameter          | Value            |
| ------------------ | ---------------- |
| Base Model         | xlm-roberta-base |
| Training Samples   | 19,200           |
| Validation Samples | 4,800            |
| Epochs             | 3                |
| Learning Rate      | 1e-5             |
| Batch Size         | 16               |
| Training Date      | 2026-02-05       |

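
The figures in the table imply a rough training budget, which can be worked out as follows. This is a sketch under stated assumptions: it treats the batch size of 16 as the effective global batch (single device, no gradient accumulation) and keeps the last partial batch. The `TrainingConfig` class is hypothetical and is not the script used to train this model.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    # Values copied from the Training Configuration table.
    train_samples: int = 19_200
    val_samples: int = 4_800
    epochs: int = 3
    batch_size: int = 16
    learning_rate: float = 1e-5

    @property
    def steps_per_epoch(self) -> int:
        # Assumes one optimizer step per batch (no gradient accumulation).
        return math.ceil(self.train_samples / self.batch_size)

    @property
    def total_steps(self) -> int:
        return self.steps_per_epoch * self.epochs

    @property
    def val_fraction(self) -> float:
        # Share of all labeled samples held out for validation.
        return self.val_samples / (self.train_samples + self.val_samples)


config = TrainingConfig()
print(f"steps per epoch: {config.steps_per_epoch}")
print(f"total steps:     {config.total_steps}")
print(f"validation share: {config.val_fraction:.0%}")
```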
---

## 🎯 Intended Use Cases

* Social media sentiment analysis
* Comment & post filtering
* Content quality control

---

## ⚠️ Limitations

* Three-class classification only (Positive, Negative, Neutral)
* Not optimized for formal text outside social media
* Performance may degrade on very short or ambiguous messages
* Predictions may reflect biases present in the training data

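
One common way to handle short or ambiguous messages is to route low-confidence predictions to manual review instead of trusting the top label. The sketch below shows the idea on data shaped like the pipeline output. The cutoff `REVIEW_THRESHOLD` is a hypothetical value, not a recommendation from the model author, and should be tuned on your own validation data.

```python
REVIEW_THRESHOLD = 0.60  # hypothetical cutoff; tune on your own validation data


def needs_review(prediction, threshold=REVIEW_THRESHOLD):
    """Return True when the top-label score falls below the threshold."""
    return prediction["score"] < threshold


# Hand-written predictions in the pipeline's output shape; scores are made up.
predictions = [
    {"label": "LABEL_1", "score": 0.9876},
    {"label": "LABEL_0", "score": 0.41},
]

flagged = [p for p in predictions if needs_review(p)]
print(f"{len(flagged)} of {len(predictions)} predictions need manual review")
```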
---

## 📜 License

Released under the **Apache 2.0 License**.
Free for commercial and research use.

---

## 📚 Citation

If you use this model in your work, please cite:

```bibtex
@misc{djunaedi2026sentiment,
  author = {AI/ML Engineer ADS Digital Partner},
  title = {Sentiment Analysis for Social Media Text},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/nahiar/sentiment-analysis-v2}
}
```

---

## 🙌 Acknowledgements

* Hugging Face Transformers
* Facebook AI Research (XLM-RoBERTa)