🌍 Multilabel Emotion Analysis for Social Media
Indonesian 🇮🇩 | IndoBERT Base P1
A fine-tuned indobert-base-p1 model for Emotion Analysis on noisy social media text.
This model is optimized for multilingual informal content commonly found on:
- Twitter / X
- TikTok
- Online forums
It is trained primarily on Bahasa Indonesia and handles the informal, often code-mixed register common on these platforms, making it suitable for moderation systems, social listening, and content intelligence pipelines. Performance on purely English text is limited.
🔍 Model Overview
- Architecture: indobenchmark/indobert-base-p1
- Task: Multilabel Text Classification (Emotion Analysis)
- Languages: Indonesian (primary); mixed English content supported with reduced accuracy
- Domain: Informal & social media text
- Training Date: 2026-02-26
🏷️ Supported Emotion Labels
This model detects the following emotion types:
| Label | Description |
|---|---|
| LABEL_0 | Anger/Marah |
| LABEL_1 | Anticipation/Antisipasi |
| LABEL_2 | Disgust/Jijik |
| LABEL_3 | Fear/Takut |
| LABEL_4 | Joy/Senang |
| LABEL_5 | Sadness/Sedih |
| LABEL_6 | Surprise/Terkejut |
| LABEL_7 | Trust/Percaya |
📊 Model Performance
Evaluated on a held-out validation set:
| Metric | Score |
|---|---|
| F1 Score | 0.9726 |
| Precision | 0.9902 |
| Recall | 0.9557 |
| Training Loss | 0.0824 |
| Validation Loss | 0.0637 |
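As a quick consistency check, the reported F1 agrees with the harmonic mean of the reported precision and recall:

```python
# F1 is the harmonic mean of precision and recall; the reported
# 0.9726 is consistent with the precision/recall pair above.
precision = 0.9902
recall = 0.9557
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))  # → 0.9726
```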
🏗️ Training Configuration
| Parameter | Value |
|---|---|
| Base Model | indobert-base-p1 |
| Training Samples | 96,831 |
| Epochs | 3 |
| Learning Rate | 2e-5 |
| Batch Size | 16 (train), 32 (eval) |
| Optimizer | AdamW |
| Framework | Hugging Face Transformers |
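The table above can be expressed as a hyperparameter dictionary; the key names below follow Hugging Face `TrainingArguments` conventions, but this is a reconstruction from the table, not the author's actual training script:

```python
# Hypothetical reconstruction of the training configuration table.
# Key names mirror Hugging Face TrainingArguments conventions.
training_config = {
    "num_train_epochs": 3,
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 32,
    "optim": "adamw_torch",  # AdamW optimizer
}
print(training_config["learning_rate"])  # → 2e-05
```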
🚀 Usage
Preprocessing Configuration
```python
import re

def clean_text(text):
    # Replace social-media artifacts (hashtags, links, emails, mentions)
    # with placeholder tokens and normalize whitespace before tokenization.
    if not isinstance(text, str):
        return text
    text = text.replace("#", "<hashtag>")
    text = re.sub(r"https?://\S+|www\.\S+", "<link>", text)
    text = re.sub(r"\b[\w\.-]+@[\w\.-]+\.\w+\b", "<email>", text)
    text = re.sub(r"@\w+", "<user>", text)
    text = text.replace('"', "").replace("'", "")
    text = text.replace("\n", " ")
    text = text.replace("\\n", " ")
    text = re.sub(r"\s+", " ", text).strip()
    return text
```
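As a quick sanity check, here is the cleaner applied to a short made-up post (the function is restated so the snippet runs standalone; the input text is illustrative, not from the training data):

```python
import re

def clean_text(text):
    # Same substitutions as the preprocessing function above.
    if not isinstance(text, str):
        return text
    text = text.replace("#", "<hashtag>")
    text = re.sub(r"https?://\S+|www\.\S+", "<link>", text)
    text = re.sub(r"\b[\w\.-]+@[\w\.-]+\.\w+\b", "<email>", text)
    text = re.sub(r"@\w+", "<user>", text)
    text = text.replace('"', "").replace("'", "").replace("\n", " ").replace("\\n", " ")
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Cek @kkpgoid di https://kkp.go.id #SobatBahari"))
# → "Cek <user> di <link> <hashtag>SobatBahari"
```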
Quick Inference (Single Text)
```python
import torch
import numpy as np
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

MODEL_NAME = "iqbalpurba26/dev-emot-indobert"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    num_labels=8,
    problem_type="multi_label_classification"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    config=config
)

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_multilabel(texts, max_length=128, top_k=1):
    """Return the top_k emotions and their probabilities for each text."""
    if isinstance(texts, str):
        texts = [texts]
    encodings = tokenizer(
        texts,
        padding=True,
        truncation=True,
        max_length=max_length,
        return_tensors="pt"
    )
    encodings = {k: v.to(device) for k, v in encodings.items()}
    with torch.no_grad():
        outputs = model(**encodings)
    # Sigmoid, not softmax: each label is an independent probability.
    probs = torch.sigmoid(outputs.logits).cpu().numpy()
    results = []
    for text, prob in zip(texts, probs):
        # Indices of the top_k highest-probability labels, highest first.
        top_indices = np.argsort(prob)[-top_k:][::-1]
        results.append({
            "text": text,
            "top_emotions": [label_cols[i] for i in top_indices],
            "probabilities": {
                label_cols[i]: float(prob[i]) for i in top_indices
            }
        })
    return results

text = """
#SobatBahari, Kepala #BPPSDMKP, I Nyoman Radiarta beserta Eselon 1 Kementerian Kelautan dan Perikanan RI lainnya, dampingi Menteri Sakti Wahyu Trenggono dalam pertemuan dengan Babcock International Group di Gedung Mina Bahari I, KKP, Jakarta, (12/2). Pertemuan ini membahas peluang kerja sama strategis untuk memperkuat sektor kelautan dan perikanan Indonesia, termasuk dukungan teknologi, peningkatan kapasitas, dan pengembangan SDM KP. Menteri Trenggono menegaskan kemitraan lintas negara adalah kunci percepatan modernisasi sektor Kelautan dan Perikanan. Dialog berlangsung hangat dan interaktif, menjadi fondasi awal penjajakan kerja sama lanjutan. Dari perspektif BPPSDM KP, kolaborasi ini membuka ruang penguatan kompetensi #SDMKP agar makin adaptif, profesional, dan siap menjawab tantangan global. #2026KKPGrowStronger #KKPGOID #SDMUnggul
"""
text = clean_text(text)
print(predict_multilabel(text))
```
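Note that `top_k` always returns exactly k labels, even when the model is uncertain. For genuinely multilabel output, you can instead threshold the sigmoid probabilities, so a text can carry zero, one, or several emotions. A minimal sketch, using a hypothetical probability vector rather than real model output:

```python
import numpy as np

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

def select_labels(probs, threshold=0.5):
    # Keep every label whose independent sigmoid probability
    # clears the threshold (may be empty or contain several labels).
    probs = np.asarray(probs)
    return [label_cols[i] for i in np.where(probs >= threshold)[0]]

# Hypothetical probabilities for illustration only.
probs = [0.02, 0.81, 0.01, 0.03, 0.64, 0.02, 0.05, 0.72]
print(select_labels(probs))  # → ['anticipation', 'joy', 'trust']
```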
Batch Inference (Multiple Texts)
```python
import torch
import numpy as np
from math import ceil
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

MODEL_NAME = "iqbalpurba26/dev-emot-indobert"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
config = AutoConfig.from_pretrained(
    MODEL_NAME,
    num_labels=8,
    problem_type="multi_label_classification"
)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    config=config
)

label_cols = ['anger', 'anticipation', 'disgust', 'fear',
              'joy', 'sadness', 'surprise', 'trust']

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def predict_from_list(texts, batch_size=16, max_length=512, top_k=1):
    """Score a list of texts in mini-batches; returns one flat record
    per (text, label) pair for the top_k labels of each text."""
    if isinstance(texts, str):
        texts = [texts]
    results = []
    total_batches = ceil(len(texts) / batch_size)
    for i in range(total_batches):
        batch_texts = texts[i*batch_size:(i+1)*batch_size]
        encodings = tokenizer(
            batch_texts,
            padding=True,
            truncation=True,
            max_length=max_length,
            return_tensors="pt"
        )
        encodings = {k: v.to(device) for k, v in encodings.items()}
        with torch.no_grad():
            outputs = model(**encodings)
        probs = torch.sigmoid(outputs.logits).cpu().numpy()
        for text, prob in zip(batch_texts, probs):
            # Flatten the top_k labels into one record each.
            top_indices = np.argsort(prob)[-top_k:][::-1]
            for idx in top_indices:
                results.append({
                    "text": text,
                    "label": label_cols[idx],
                    "score": float(prob[idx])
                })
    return results

texts = [
    "Ribuan keluarga berduka setelah banjir besar merenggut puluhan korban jiwa.",
    "Para pekerja yang terkena PHK massal hanya bisa pasrah melihat masa depan mereka terancam.",
    "Tangis pecah di lokasi kebakaran yang menghanguskan rumah-rumah warga miskin.",
    "Ekonomi yang terpuruk membuat banyak usaha kecil gulung tikar dan meninggalkan kisah pilu.",
    "Orang tua itu terlihat hancur setelah kehilangan satu-satunya sumber penghasilan keluarganya.",
    "Warga geram dan mengecam keras kebijakan yang dinilai tidak berpihak pada rakyat kecil.",
    "Publik marah besar atas dugaan korupsi yang merugikan negara hingga triliunan rupiah.",
    "Kebijakan itu dianggap gagal total dan memicu protes di berbagai daerah.",
    "Aktivis mengutuk keras tindakan aparat yang dinilai berlebihan.",
    "Banyak pihak menilai keputusan tersebut memalukan dan tidak bertanggung jawab.",
    "Masyarakat muak dengan praktik suap yang dinilai menjijikkan dan tidak bermoral.",
    "Skandal itu dianggap sangat memuakkan dan mencoreng nama baik institusi.",
    "Publik merasa jijik melihat pejabat yang tertangkap tangan melakukan pungli.",
    "Perilaku manipulatif tersebut dinilai busuk dan merusak kepercayaan publik.",
    "Banyak orang eneg dengan drama politik yang penuh kepentingan pribadi.",
    "Warga ketakutan setelah terjadi ledakan besar di pusat kota.",
    "Ancaman resesi global membuat pelaku usaha khawatir akan masa depan bisnis mereka.",
    "Isu keamanan siber yang bocor membuat masyarakat cemas data pribadinya disalahgunakan.",
    "Lonjakan kasus penyakit misterius memicu kepanikan di beberapa wilayah.",
    "Investor waswas melihat nilai tukar yang terus melemah dalam beberapa hari terakhir."
]

print(predict_from_list(texts, batch_size=16))
```
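For social-listening use, the flat records returned by `predict_from_list` are easy to aggregate into per-emotion counts. A minimal sketch, using hypothetical prediction records rather than real model output:

```python
from collections import Counter

# Hypothetical records in the shape produced by predict_from_list.
records = [
    {"text": "...", "label": "sadness", "score": 0.91},
    {"text": "...", "label": "anger", "score": 0.88},
    {"text": "...", "label": "sadness", "score": 0.79},
]

# Count how often each emotion appears across the batch.
counts = Counter(r["label"] for r in records)
print(counts.most_common())  # → [('sadness', 2), ('anger', 1)]
```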
🎯 Intended Use Cases
- Social media emotion monitoring
- Comment & post filtering
- Content moderation assistance
- Political monitoring
- Brand & organization tracking
- Multilingual content intelligence systems
⚠️ Limitations
- Supports only the eight emotion labels listed above: `anger`, `anticipation`, `disgust`, `fear`, `joy`, `sadness`, `surprise`, `trust`
- Not optimized for:
  - Formal academic/legal documents
  - Extremely short or ambiguous messages
  - Heavy slang or sarcastic expressions
- Performance may degrade on highly code-mixed sentences
- The model may inherit bias from training data
⚖️ Ethical Considerations
This model may reflect demographic, geopolitical, or cultural biases present in the training dataset.
It is not intended to replace human judgment in high-risk or sensitive decision-making systems.
Human-in-the-loop review is strongly recommended for moderation or governance-related deployments.
🖥️ Hardware Recommendations
- Recommended: GPU (≥ 8GB VRAM) for optimal performance
- CPU inference supported but slower
- Compatible with FP16 mixed precision for faster inference
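A minimal FP16 sketch (an illustration of the general pattern, not a snippet from the author's deployment): cast the weights to half precision on GPU, and fall back to FP32 on CPU, where FP16 matmuls are often unsupported or slow.

```python
import torch

# Small stand-in module; in practice this would be the loaded model.
layer = torch.nn.Linear(8, 8)

if torch.cuda.is_available():
    # Halve weight memory and speed up inference on supported GPUs.
    layer = layer.half().cuda()
    x = torch.randn(1, 8, dtype=torch.float16, device="cuda")
else:
    # CPU fallback: stay in FP32.
    x = torch.randn(1, 8)

with torch.no_grad():
    out = layer(x)
print(out.shape)
```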
📜 License
Released under the Apache 2.0 License.
Free for commercial and research use.
📚 Citation
```bibtex
@misc{purba2026multilabelemotionanalysis,
  author    = {M. Iqbal Purba},
  title     = {Multilabel Emotion Analysis for Social Media},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/iqbalpurba26/dev-emot-indobert}
}
```
🙌 Acknowledgements
- Hugging Face Transformers
- IndoBenchmark (indobert-base-p1)
- Open-source NLP community
- Contributors and dataset annotators