🌍 Multilabel Emotion Analysis for Social Media

Indonesian 🇮🇩 | IndoBERT Base P1

A fine-tuned indobert-base-p1 model for Emotion Analysis on noisy social media text.

This model is optimized for informal content commonly found on:

  • Twitter / X
  • Instagram
  • TikTok
  • Facebook
  • Online forums

It is trained primarily on Bahasa Indonesia; English input is accepted, but performance on English text is limited. This makes it best suited for Indonesian-language moderation systems, social listening, and content intelligence pipelines.


🔍 Model Overview

  • Architecture: indobenchmark/indobert-base-p1
  • Task: Text Classification (Emotion Analysis)
  • Languages: Indonesian (primary), English (limited)
  • Domain: Informal & Social Media Text
  • Training Date: 2026-02-26

🏷️ Supported Emotion Labels

This model detects the following emotion types:

Label     Emotion (EN/ID)
LABEL_0   Anger / Marah
LABEL_1   Anticipation / Antisipasi
LABEL_2   Disgust / Jijik
LABEL_3   Fear / Takut
LABEL_4   Joy / Senang
LABEL_5   Sadness / Sedih
LABEL_6   Surprise / Terkejut
LABEL_7   Trust / Percaya
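The checkpoint emits raw `LABEL_0` … `LABEL_7` identifiers. A minimal lookup can translate them into readable emotion names (the dictionary below mirrors the table above; the helper name `readable` is illustrative, not part of the model API):

```python
# Hypothetical helper: map raw LABEL_k outputs to readable emotion names.
ID2EMOTION = {
    "LABEL_0": "anger",         # Marah
    "LABEL_1": "anticipation",  # Antisipasi
    "LABEL_2": "disgust",       # Jijik
    "LABEL_3": "fear",          # Takut
    "LABEL_4": "joy",           # Senang
    "LABEL_5": "sadness",       # Sedih
    "LABEL_6": "surprise",      # Terkejut
    "LABEL_7": "trust",         # Percaya
}

def readable(label_id: str) -> str:
    """Translate a raw model label (e.g. 'LABEL_4') to its emotion name."""
    return ID2EMOTION[label_id]

print(readable("LABEL_4"))  # → joy
```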

📊 Model Performance

Evaluated on a held-out validation set:

Metric            Score
F1 Score          0.9726
Precision         0.9902
Recall            0.9557
Training Loss     0.0824
Validation Loss   0.0637
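Because the model is configured with `problem_type="multi_label_classification"`, each label gets an independent sigmoid probability, and more than one emotion can fire per text. Instead of taking the top-k labels (as the usage examples below do), you can apply a probability threshold. A dependency-free sketch, where the 0.5 threshold is an assumption you should tune on validation data:

```python
import math

LABELS = ["anger", "anticipation", "disgust", "fear",
          "joy", "sadness", "surprise", "trust"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def decode_multilabel(logits, threshold=0.5):
    """Return every label whose independent sigmoid probability clears the threshold."""
    return [lab for lab, z in zip(LABELS, logits) if sigmoid(z) >= threshold]

# Made-up logits: strong 'sadness' (sigmoid(3.0) ≈ 0.95) and mild 'fear'
# (sigmoid(0.4) ≈ 0.60); everything else stays below 0.5.
logits = [-2.0, -1.0, -3.0, 0.4, -2.5, 3.0, -1.5, -0.5]
print(decode_multilabel(logits))  # → ['fear', 'sadness']
```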

🏗️ Training Configuration

Parameter         Value
Base Model        indobert-base-p1
Training Samples  96,831
Epochs            3
Learning Rate     2e-5
Batch Size        16 (train), 32 (eval)
Optimizer         AdamW
Framework         Hugging Face Transformers
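With `problem_type="multi_label_classification"`, Transformers trains the head with `BCEWithLogitsLoss`: binary cross-entropy applied per label independently, then averaged. A dependency-free illustration of that computation (the numbers are made up for demonstration):

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent per-label logits,
    matching torch.nn.BCEWithLogitsLoss with default 'mean' reduction."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(logits)

# One example with only 'joy' active (index 4) out of 8 labels.
logits  = [-2.0, -2.0, -2.0, -2.0, 2.0, -2.0, -2.0, -2.0]
targets = [0, 0, 0, 0, 1, 0, 0, 0]
print(round(bce_with_logits(logits, targets), 4))  # → 0.1269
```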

🚀 Usage

Preprocessing Configuration

import re

def clean_text(text):
    """Normalize noisy social media text before tokenization."""
    if not isinstance(text, str):
        return text
    text = text.replace("#", "<hashtag>")                          # mask hashtag markers
    text = re.sub(r"https?://\S+|www\.\S+", "<link>", text)        # mask URLs
    text = re.sub(r"\b[\w\.-]+@[\w\.-]+\.\w+\b", "<email>", text)  # mask emails (before @-mentions)
    text = re.sub(r"@\w+", "<user>", text)                         # mask @-mentions
    text = text.replace('"', "").replace("'", "")                  # drop quote characters
    text = text.replace("\n", " ")                                 # real newlines -> spaces
    text = text.replace("\\n", " ")                                # literal "\n" sequences -> spaces
    text = re.sub(r"\s+", " ", text).strip()                       # collapse whitespace
    return text

Quick Inference (Single Text)

import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

MODEL_NAME = "iqbalpurba26/dev-emot-indobert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    num_labels=8,
    problem_type="multi_label_classification"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    config=config
)

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # disable dropout for deterministic inference

def predict_multilabel(texts, max_length=128, top_k=1):
    if isinstance(texts, str):
        texts = [texts]

    encodings = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt"
    )

    encodings = {k: v.to(device) for k, v in encodings.items()}

    with torch.no_grad():
        outputs = model(**encodings)
        logits = outputs.logits

    probs = torch.sigmoid(logits).cpu().numpy()

    results = []

    for text, prob in zip(texts, probs):

        top_indices = np.argsort(prob)[-top_k:][::-1]
        top_labels = [label_cols[i] for i in top_indices]

        results.append({
            "text": text,
            "top_emotions": top_labels,
            "probabilities": {
                label_cols[i]: float(prob[i]) for i in top_indices
            }
        })

    return results

text = """ 
#SobatBahari, Kepala #BPPSDMKP, I Nyoman Radiarta beserta Eselon 1 Kementerian Kelautan dan Perikanan RI lainnya, dampingi Menteri Sakti Wahyu Trenggono dalam pertemuan dengan Babcock International Group di Gedung Mina Bahari I, KKP, Jakarta, (12/2). Pertemuan ini membahas peluang kerja sama strategis untuk memperkuat sektor kelautan dan perikanan Indonesia, termasuk dukungan teknologi, peningkatan kapasitas, dan pengembangan SDM KP. Menteri Trenggono menegaskan kemitraan lintas negara adalah kunci percepatan modernisasi sektor Kelautan dan Perikanan. Dialog berlangsung hangat dan interaktif, menjadi fondasi awal penjajakan kerja sama lanjutan. Dari perspektif BPPSDM KP, kolaborasi ini membuka ruang penguatan kompetensi #SDMKP agar makin adaptif, profesional, dan siap menjawab tantangan global. #2026KKPGrowStronger #KKPGOID #SDMUnggul
"""
text = clean_text(text)
print(predict_multilabel(text))

Quick Inference (Batched)

import torch
import numpy as np
from math import ceil
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

MODEL_NAME = "iqbalpurba26/dev-emot-indobert"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    num_labels=8,
    problem_type="multi_label_classification"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    config=config
)

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()  # disable dropout for deterministic inference

def predict_from_list(texts, batch_size=16, max_length=512, top_k=1):
    if isinstance(texts, str):
        texts = [texts]

    results = []
    total_batches = ceil(len(texts) / batch_size)

    for i in range(total_batches):

        batch_texts = texts[i*batch_size:(i+1)*batch_size]

        encodings = tokenizer(
            batch_texts,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt"
        )

        encodings = {k: v.to(device) for k, v in encodings.items()}

        with torch.no_grad():
            outputs = model(**encodings)
            logits = outputs.logits

        probs = torch.sigmoid(logits).cpu().numpy()

        for text, prob in zip(batch_texts, probs):

            top_indices = np.argsort(prob)[-top_k:][::-1]

            for idx in top_indices:
                results.append({
                    "text": text,
                    "label": label_cols[idx],
                    "score": float(prob[idx])
                })
    return results

texts = [
  "Ribuan keluarga berduka setelah banjir besar merenggut puluhan korban jiwa.",
  "Para pekerja yang terkena PHK massal hanya bisa pasrah melihat masa depan mereka terancam.",
  "Tangis pecah di lokasi kebakaran yang menghanguskan rumah-rumah warga miskin.",
  "Ekonomi yang terpuruk membuat banyak usaha kecil gulung tikar dan meninggalkan kisah pilu.",
  "Orang tua itu terlihat hancur setelah kehilangan satu-satunya sumber penghasilan keluarganya.",
  "Warga geram dan mengecam keras kebijakan yang dinilai tidak berpihak pada rakyat kecil.",
  "Publik marah besar atas dugaan korupsi yang merugikan negara hingga triliunan rupiah.",
  "Kebijakan itu dianggap gagal total dan memicu protes di berbagai daerah.",
  "Aktivis mengutuk keras tindakan aparat yang dinilai berlebihan.",
  "Banyak pihak menilai keputusan tersebut memalukan dan tidak bertanggung jawab.",
  "Masyarakat muak dengan praktik suap yang dinilai menjijikkan dan tidak bermoral.",
  "Skandal itu dianggap sangat memuakkan dan mencoreng nama baik institusi.",
  "Publik merasa jijik melihat pejabat yang tertangkap tangan melakukan pungli.",
  "Perilaku manipulatif tersebut dinilai busuk dan merusak kepercayaan publik.",
  "Banyak orang eneg dengan drama politik yang penuh kepentingan pribadi.",
  "Warga ketakutan setelah terjadi ledakan besar di pusat kota.",
  "Ancaman resesi global membuat pelaku usaha khawatir akan masa depan bisnis mereka.",
  "Isu keamanan siber yang bocor membuat masyarakat cemas data pribadinya disalahgunakan.",
  "Lonjakan kasus penyakit misterius memicu kepanikan di beberapa wilayah.",
  "Investor waswas melihat nilai tukar yang terus melemah dalam beberapa hari terakhir."
]

predict_from_list(texts, batch_size=16)
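`predict_from_list` returns one flat record per (text, label) pair, so with `top_k > 1` the same text appears multiple times. If you prefer one entry per input text, a small regrouping helper can collapse the records (the helper name and output shape here are illustrative, not part of the model's API):

```python
from collections import OrderedDict

def group_by_text(flat_results):
    """Collapse flat {'text', 'label', 'score'} records into one
    {'text', 'emotions': {label: score}} entry per input text."""
    grouped = OrderedDict()  # preserve first-seen order of texts
    for rec in flat_results:
        grouped.setdefault(rec["text"], {})[rec["label"]] = rec["score"]
    return [{"text": t, "emotions": e} for t, e in grouped.items()]

# Example on the kind of records the batch helper emits with top_k=2:
flat = [
    {"text": "a", "label": "sadness", "score": 0.91},
    {"text": "a", "label": "fear",    "score": 0.55},
    {"text": "b", "label": "anger",   "score": 0.88},
]
print(group_by_text(flat))
# → [{'text': 'a', 'emotions': {'sadness': 0.91, 'fear': 0.55}},
#    {'text': 'b', 'emotions': {'anger': 0.88}}]
```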

🎯 Intended Use Cases

  • Social media emotion analysis
  • Comment & post filtering
  • Content moderation assistance
  • Political monitoring
  • Brand & organization tracking
  • Multilingual content intelligence systems

⚠️ Limitations

  • Supports only the defined emotion label set: ['anger', 'anticipation', 'disgust', 'fear', 'joy', 'sadness', 'surprise', 'trust']
  • Not optimized for:
    • Formal academic/legal documents
    • Extremely short or ambiguous messages
    • Heavy slang or sarcastic expressions
  • Performance may degrade on highly code-mixed sentences
  • The model may inherit bias from training data

⚖️ Ethical Considerations

This model may reflect demographic, geopolitical, or cultural biases present in the training dataset.

It is not intended to replace human judgment in high-risk or sensitive decision-making systems.

Human-in-the-loop review is strongly recommended for moderation or governance-related deployments.


🖥️ Hardware Recommendations

  • Recommended: GPU (≥ 8GB VRAM) for optimal performance
  • CPU inference supported but slower
  • Compatible with FP16 mixed precision for faster inference

📜 License

Released under the Apache 2.0 License.
Free for commercial and research use.

📚 Citation

@misc{purba2026multilabelemotionanalysis,
  author    = {M. Iqbal Purba},
  title     = {Multilabel Emotion Analysis for Social Media},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/iqbalpurba26/dev-emot-indobert}
}

🙌 Acknowledgements

  • Hugging Face Transformers
  • IndoBenchmark - indobert-base-p1
  • Open-source NLP community
  • Contributors and dataset annotators