🌍 Multilabel Emotion Analysis for Social Media

Indonesian 🇮🇩 | IndoBERT Base P1

A fine-tuned indobert-base-p1 model for Emotion Analysis on noisy social media text.

This model is optimized for informal content commonly found on:

  • Twitter / X
  • Instagram
  • TikTok
  • Facebook
  • Online forums

It is trained primarily on Bahasa Indonesia; English input is accepted, but performance on English text is limited. This makes it best suited for Indonesian-language moderation systems, social listening, and content intelligence pipelines.


🔍 Model Overview

  • Architecture: indobenchmark/indobert-base-p1
  • Task: Text Classification (Emotion Analysis)
  • Languages: Indonesian (primary), English (limited)
  • Domain: Informal & Social Media Text
  • Training Date: 2026-02-26

🏷️ Supported Emotion Labels

This model detects the following emotion types:

Label     Emotion (EN/ID)
LABEL_0   Anger / Marah
LABEL_1   Anticipation / Antisipasi
LABEL_2   Disgust / Jijik
LABEL_3   Fear / Takut
LABEL_4   Joy / Senang
LABEL_5   Sadness / Sedih
LABEL_6   Surprise / Terkejut
LABEL_7   Trust / Percaya
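The checkpoint emits raw `LABEL_0` … `LABEL_7` identifiers. A minimal lookup can translate them into readable emotion names (the dictionary below mirrors the table above; the helper name `readable` is illustrative, not part of the model API):

```python
# Hypothetical helper: map raw LABEL_k outputs to readable emotion names.
ID2EMOTION = {
    "LABEL_0": "anger",         # Marah
    "LABEL_1": "anticipation",  # Antisipasi
    "LABEL_2": "disgust",       # Jijik
    "LABEL_3": "fear",          # Takut
    "LABEL_4": "joy",           # Senang
    "LABEL_5": "sadness",       # Sedih
    "LABEL_6": "surprise",      # Terkejut
    "LABEL_7": "trust",         # Percaya
}

def readable(label_id: str) -> str:
    """Translate a raw model label (e.g. 'LABEL_4') to its emotion name."""
    return ID2EMOTION[label_id]

print(readable("LABEL_4"))  # → joy
```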

📊 Model Performance

Evaluated on a held-out validation set:

Metric            Score
F1 Score          0.9726
Precision         0.9902
Recall            0.9557
Training Loss     0.0824
Validation Loss   0.0637
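Because the model is configured with `problem_type="multi_label_classification"`, each label gets an independent sigmoid probability, and more than one emotion can fire per text. Instead of taking the top-k labels (as the usage examples below do), you can apply a probability threshold. A dependency-free sketch, where the 0.5 threshold is an assumption you should tune on validation data:

```python
import math

LABELS = ["anger", "anticipation", "disgust", "fear",
          "joy", "sadness", "surprise", "trust"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability clears the threshold."""
    return [lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Made-up logits: strong 'sadness' (sigmoid(3.0) ≈ 0.95) and mild 'fear'
# (sigmoid(0.4) ≈ 0.60); everything else stays below 0.5.
logits = [-2.0, -1.0, -3.0, 0.4, -2.5, 3.0, -1.5, -0.5]
print(decode_multilabel(logits))  # → ['fear', 'sadness']
```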

🏗️ Training Configuration

Parameter         Value
Base Model        indobert-base-p1
Training Samples  96,831
Epochs            3
Learning Rate     2e-5
Batch Size        16 (train), 32 (eval)
Optimizer         AdamW
Framework         Hugging Face Transformers
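With `problem_type="multi_label_classification"`, Transformers trains the head with `BCEWithLogitsLoss`: binary cross-entropy applied per label independently, then averaged. A dependency-free illustration of that computation (the numbers are made up for demonstration):

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent per-label logits,
    matching torch.nn.BCEWithLogitsLoss with default 'mean' reduction."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# One example with only 'joy' active (index 4) out of 8 labels.
logits  = [-2.0, -2.0, -2.0, -2.0, 2.0, -2.0, -2.0, -2.0]
targets = [0, 0, 0, 0, 1, 0, 0, 0]
print(round(bce_with_logits(logits, targets), 4))  # → 0.1269
```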

🚀 Usage

Preprocessing Configuration

import re

def clean_text(text):
    """Normalize noisy social media text before tokenization."""
    if not isinstance(text, str):
        return text
    text = text.replace("#", "<hashtag>")                          # mask hashtag markers
    text = re.sub(r"https?://\S+|www\.\S+", "<link>", text)        # mask URLs
    text = re.sub(r"\b[\w\.-]+@[\w\.-]+\.\w+\b", "<email>", text)  # mask emails (before @-mentions)
    text = re.sub(r"@\w+", "<user>", text)                         # mask @-mentions
    text = text.replace('"', "").replace("'", "")                  # drop quote characters
    text = text.replace("\n", " ")                                 # real newlines -> spaces
    text = text.replace("\\n", " ")                                # literal "\n" sequences -> spaces
    text = re.sub(r"\s+", " ", text).strip()                       # collapse whitespace
    return text

Quick Inference (Single Text)

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

MODEL_NAME = "iqbalpurba26/dev-emot-indobert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    num_labels=8,
    problem_type="multi_label_classification"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    config=config
)

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # disable dropout for deterministic inference

def predict_multilabel(texts, max_length=128, top_k=1):
    if isinstance(texts, str):
        texts = [texts]

    encodings = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt"
    )

    encodings = {k: v.to(device) for k, v in encodings.items()}

    with torch.no_grad():
        outputs = model(**encodings)
        logits = outputs.logits

    probs = torch.sigmoid(logits).cpu().numpy()

    results = []

    for text, prob in zip(texts, probs):

        top_indices = np.argsort(prob)[-top_k:][::-1]
        top_labels = [label_cols[i] for i in top_indices]

        results.append({
            "text": text,
            "top_emotions": top_labels,
            "probabilities": {
                label_cols[i]: float(prob[i]) for i in top_indices
            }
        })

    return results

text = """ 
#SobatBahari, Kepala #BPPSDMKP, I Nyoman Radiarta beserta Eselon 1 Kementerian Kelautan dan Perikanan RI lainnya, dampingi Menteri Sakti Wahyu Trenggono dalam pertemuan dengan Babcock International Group di Gedung Mina Bahari I, KKP, Jakarta, (12/2). Pertemuan ini membahas peluang kerja sama strategis untuk memperkuat sektor kelautan dan perikanan Indonesia, termasuk dukungan teknologi, peningkatan kapasitas, dan pengembangan SDM KP. Menteri Trenggono menegaskan kemitraan lintas negara adalah kunci percepatan modernisasi sektor Kelautan dan Perikanan. Dialog berlangsung hangat dan interaktif, menjadi fondasi awal penjajakan kerja sama lanjutan. Dari perspektif BPPSDM KP, kolaborasi ini membuka ruang penguatan kompetensi #SDMKP agar makin adaptif, profesional, dan siap menjawab tantangan global. #2026KKPGrowStronger #KKPGOID #SDMUnggul
"""
text = clean_text(text)
print(predict_multilabel(text))

Quick Inference (Batched)

import torch
import numpy as np
from math import ceil
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

MODEL_NAME = "iqbalpurba26/dev-emot-indobert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    num_labels=8,
    problem_type="multi_label_classification"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    config=config
)

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # disable dropout for deterministic inference

def predict_from_list(texts, batch_size=16, max_length=512, top_k=1):
    if isinstance(texts, str):
        texts = [texts]

    results = []
    total_batches = ceil(len(texts) / batch_size)

    for i in range(total_batches):

        batch_texts = texts[i*batch_size:(i+1)*batch_size]

        encodings = tokenizer(
            batch_texts,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt"
        )

        encodings = {k: v.to(device) for k, v in encodings.items()}

        with torch.no_grad():
            outputs = model(**encodings)
            logits = outputs.logits

        probs = torch.sigmoid(logits).cpu().numpy()

        for text, prob in zip(batch_texts, probs):

            top_indices = np.argsort(prob)[-top_k:][::-1]

            for idx in top_indices:
                results.append({
                    "text": text,
                    "label": label_cols[idx],
                    "score": float(prob[idx])
                })
    return results

texts = [
  "Ribuan keluarga berduka setelah banjir besar merenggut puluhan korban jiwa.",
  "Para pekerja yang terkena PHK massal hanya bisa pasrah melihat masa depan mereka terancam.",
  "Tangis pecah di lokasi kebakaran yang menghanguskan rumah-rumah warga miskin.",
  "Ekonomi yang terpuruk membuat banyak usaha kecil gulung tikar dan meninggalkan kisah pilu.",
  "Orang tua itu terlihat hancur setelah kehilangan satu-satunya sumber penghasilan keluarganya.",
  "Warga geram dan mengecam keras kebijakan yang dinilai tidak berpihak pada rakyat kecil.",
  "Publik marah besar atas dugaan korupsi yang merugikan negara hingga triliunan rupiah.",
  "Kebijakan itu dianggap gagal total dan memicu protes di berbagai daerah.",
  "Aktivis mengutuk keras tindakan aparat yang dinilai berlebihan.",
  "Banyak pihak menilai keputusan tersebut memalukan dan tidak bertanggung jawab.",
  "Masyarakat muak dengan praktik suap yang dinilai menjijikkan dan tidak bermoral.",
  "Skandal itu dianggap sangat memuakkan dan mencoreng nama baik institusi.",
  "Publik merasa jijik melihat pejabat yang tertangkap tangan melakukan pungli.",
  "Perilaku manipulatif tersebut dinilai busuk dan merusak kepercayaan publik.",
  "Banyak orang eneg dengan drama politik yang penuh kepentingan pribadi.",
  "Warga ketakutan setelah terjadi ledakan besar di pusat kota.",
  "Ancaman resesi global membuat pelaku usaha khawatir akan masa depan bisnis mereka.",
  "Isu keamanan siber yang bocor membuat masyarakat cemas data pribadinya disalahgunakan.",
  "Lonjakan kasus penyakit misterius memicu kepanikan di beberapa wilayah.",
  "Investor waswas melihat nilai tukar yang terus melemah dalam beberapa hari terakhir."
]

predict_from_list(texts, batch_size=16)
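`predict_from_list` returns one flat record per (text, label) pair, so with `top_k > 1` the same text appears multiple times. If you prefer one entry per input text, a small regrouping helper can collapse the records (the helper name and output shape here are illustrative, not part of the model's API):

```python
from collections import OrderedDict

def group_by_text(flat_results):
    """Collapse flat {'text', 'label', 'score'} records into one
    {'text', 'emotions': {label: score}} entry per input text."""
    grouped = OrderedDict()  # preserve first-seen order of texts
    for rec in flat_results:
        grouped.setdefault(rec["text"], {})[rec["label"]] = rec["score"]
    return [{"text": t, "emotions": e} for t, e in grouped.items()]

# Example on the kind of records the batch helper emits with top_k=2:
flat = [
    {"text": "a", "label": "sadness", "score": 0.91},
    {"text": "a", "label": "fear",    "score": 0.55},
    {"text": "b", "label": "anger",   "score": 0.88},
]
print(group_by_text(flat))
# → [{'text': 'a', 'emotions': {'sadness': 0.91, 'fear': 0.55}},
#    {'text': 'b', 'emotions': {'anger': 0.88}}]
```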

🎯 Intended Use Cases

  • Social media emotion analysis
  • Comment & post filtering
  • Content moderation assistance
  • Political monitoring
  • Brand & organization tracking
  • Multilingual content intelligence systems

⚠️ Limitations

  • Supports only the defined emotion label set: ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'trust']
  • Not optimized for:
    • Formal academic/legal documents
    • Extremely short or ambiguous messages
    • Heavy slang or sarcastic expressions
  • Performance may degrade on highly code-mixed sentences
  • The model may inherit bias from training data

⚖️ Ethical Considerations

This model may reflect demographic, geopolitical, or cultural biases present in the training dataset.

It is not intended to replace human judgment in high-risk or sensitive decision-making systems.

Human-in-the-loop review is strongly recommended for moderation or governance-related deployments.


🖥️ Hardware Recommendations

  • Recommended: GPU (≥ 8GB VRAM) for optimal performance
  • CPU inference supported but slower
  • Compatible with FP16 mixed precision for faster inference

📜 License

Released under the Apache 2.0 License.
Free for commercial and research use.

📚 Citation

@misc{purba2026multilabelemotionanalysis,
  author    = {M. Iqbal Purba},
  title     = {Multilabel Emotion Analysis for Social Media},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/iqbalpurba26/dev-emot-indobert}
}

🙌 Acknowledgements

  • Hugging Face Transformers
  • IndoBenchmark - indobert-base-p1
  • Open-source NLP community
  • Contributors and dataset annotators