🌍 Bangla Sentiment & Sarcasm Dual-Head Model

Joint sentiment classification & sarcasm detection for imbalanced Bangla social media text

| 📊 Task | 🌐 Language | 🏗️ Architecture | ⚡ Training Paradigm |
|---|---|---|---|
| Sentiment Analysis (4-class) + Sarcasm Detection (2-class) | Bengali (bn) | Dual-head BanglaBERT (csebuetnlp/banglabert_small) | Multi-task Learning, Dynamic Focal Loss, Class-Aware Threshold Calibration |

📁 Training Code & Scripts: GitHub

🤗 Model Weights & Inference: Hugging Face

📖 Paper: Zenodo


📦 Repository Contents

| File | Description |
|---|---|
| model.pth | Trained dual-head BanglaBERT weights |
| sent_thresholds.npy | Calibrated decision thresholds for sentiment (4 classes) |
| sarc_thresholds.npy | Calibrated decision thresholds for sarcasm (2 classes) |
| tokenizer/ | Standard BanglaBERT tokenizer files (vocab.txt, tokenizer_config.json, etc.) |

♻️ Reproducibility

  • ✅ Fixed random seed (42) for all experiments
  • ✅ 5-fold stratified cross-validation with bootstrap CIs
  • ✅ All thresholds tuned on validation folds only (no test leakage)
  • ✅ Code, data, and model weights publicly available
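
As a rough illustration of the seeded, stratified splitting mentioned in this list, here is a dependency-free sketch. It is an assumption about how such splits can be built, not the code from the training repository (which may well use scikit-learn's `StratifiedKFold`). The helper name `stratified_folds` and the round-robin assignment are hypothetical.

```python
import random
from collections import defaultdict

def stratified_folds(labels, n_splits=5, seed=42):
    """Return n_splits lists of indices, each with roughly the same class mix.

    Hypothetical helper: indices are shuffled per class with a fixed seed,
    then dealt round-robin into the folds.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    folds = [[] for _ in range(n_splits)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for pos, idx in enumerate(idxs):
            folds[pos % n_splits].append(idx)
    return folds

# Toy label list: 50 majority-class items and 10 minority-class items.
toy_labels = [0] * 50 + [1] * 10
folds = stratified_folds(toy_labels)
print([len(f) for f in folds])
# [12, 12, 12, 12, 12]
```

In each round, one fold serves as validation data (where thresholds would be tuned) and the remaining four as training data.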

📖 Model Details

This model implements a calibrated multitask framework for joint sentiment and sarcasm detection in low-resource Bangla social media text. It addresses severe class imbalance and pragmatic ambiguity through:

  • 🔹 Dual-head architecture: Shared BanglaBERT encoder → 256-dim projection layer → independent sentiment & sarcasm classification heads
  • 🔹 Dynamic loss scheduling: Fold-adaptive inverse-frequency α scaling + linear γ decay (2.5 → 0.8) for epoch-aware hard-example mining
  • 🔹 Post-hoc threshold calibration: Per-class decision boundaries optimized on validation folds to prevent majority-class bias
  • 🔹 Data augmentation: BanglaT5 paraphrasing applied offline to enrich minority classes
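
The post-hoc calibration step can be illustrated with a small, self-contained sketch. It assumes a one-vs-rest grid search that picks, for each class, the threshold maximizing that class's F1 on validation probabilities. The search grid, the objective, and the names `f1_at_threshold` and `calibrate_thresholds` are illustrative assumptions, not the procedure from the training repository.

```python
def f1_at_threshold(probs, labels, cls, threshold):
    """One-vs-rest F1 for class `cls` when it is predicted whenever prob >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = p[cls] >= threshold
        actual = (y == cls)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def calibrate_thresholds(probs, labels, num_classes, grid=None):
    """Pick one threshold per class by grid search on validation data only."""
    if grid is None:
        grid = [i / 100 for i in range(5, 96, 5)]  # 0.05, 0.10, ..., 0.95 (assumed grid)
    return [
        max(grid, key=lambda t: f1_at_threshold(probs, labels, cls, t))
        for cls in range(num_classes)
    ]

# Toy validation fold: class 0 is the majority class, class 1 the minority class.
val_probs = [[0.72, 0.28], [0.61, 0.39], [0.66, 0.34], [0.57, 0.43], [0.20, 0.80]]
val_labels = [0, 0, 0, 1, 1]
print(calibrate_thresholds(val_probs, val_labels, num_classes=2))
# [0.6, 0.4]
```

In this toy case the minority class ends up with a threshold below 0.5, which is the intended effect: a comment with P(minority) = 0.43 is no longer absorbed by the majority class.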

The model was trained on 6,507 manually annotated cricket fan comments from Bangladesh’s 2023 ICC World Cup campaign, spanning Facebook and YouTube discourse.


🛠️ How to Use

Installation

pip install transformers torch numpy huggingface_hub

Inference Example (with calibrated thresholds)

import torch
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
from model_architecture import DualHeadModel

REPO_ID = "ahs95/sentiment-sarcasm-detection-BanglaBERT"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = DualHeadModel(num_sentiment_classes=4, num_sarcasm_classes=2)

model_path = hf_hub_download(repo_id=REPO_ID, filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location="cpu", weights_only=True))
model.eval()

# Load calibrated thresholds
sent_thresholds = np.load(hf_hub_download(repo_id=REPO_ID, filename="sent_thresholds.npy"))
sarc_thresholds = np.load(hf_hub_download(repo_id=REPO_ID, filename="sarc_thresholds.npy"))

sentiment_labels = ["Positive", "Neutral", "Negative", "Mixed"]
sarcasm_labels = ["Sarcastic", "Non-Sarcastic"]  # Index 0 = Sarcastic

def predict(text, max_len=512):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_len, padding="max_length")
    
    with torch.no_grad():
        # DualHeadModel returns tuple: (sent_logits, sarc_logits)
        sent_logits, sarc_logits = model(inputs["input_ids"], inputs["attention_mask"])
    
    sent_probs = torch.softmax(sent_logits.squeeze(0), dim=-1)
    sarc_prob = torch.sigmoid(sarc_logits.squeeze(0))[0]  # P(Sarcastic)
    
    # Apply calibrated thresholds
    sent_pred = "Neutral"  # fallback
    for i, prob in enumerate(sent_probs):
        if prob >= sent_thresholds[i]:
            sent_pred = sentiment_labels[i]
            break
    
    sarc_pred = sarcasm_labels[0] if sarc_prob >= sarc_thresholds[0] else sarcasm_labels[1]
    
    return {
        "sentiment": sent_pred,
        "sarcasm": sarc_pred,
        "confidence": {
            "sentiment": sent_probs.tolist(),
            "sarcasm": [sarc_prob.item(), 1 - sarc_prob.item()]
        }
    }

# Test
result = predict("বাংলাদেশ জিতবে ২০৫০ বিশ্বকাপ, তখন আমি আর বেঁচে থাকব না।")
print(result)
# Expected: {'sentiment': 'Negative', 'sarcasm': 'Sarcastic', 'confidence': {...}}

📦 Note: The DualHeadModel class definition is available in the training repository. Copy model_architecture.py to your local environment before running the inference example.


📊 Evaluation Results

Evaluation followed a 5-fold stratified cross-validation protocol. Macro- and weighted-averaged metrics are aggregated across folds, and 95% confidence intervals were computed from 2,000 bootstrap resamples.

🎯 Sentiment Analysis (4-class)

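| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Positive | 0.64 | 0.68 | 0.66 | 1,407 |
| Neutral | 0.57 | 0.62 | 0.59 | 355 |
| Negative | 0.91 | 0.86 | 0.88 | 4,206 |
| Mixed | 0.53 | 0.65 | 0.58 | 539 |
| Macro F1 | | | 0.68 | |
| Weighted F1 | | | 0.79 (95% CI: 0.784–0.804) | |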
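
As a sanity check on how the two summary rows relate to the per-class rows, the sketch below recomputes macro and weighted F1 from the rounded sentiment values in the table. The reported figures are aggregated across folds, so this pooled recomputation is only an approximation; the function names are illustrative.

```python
# Per-class (F1, support) pairs copied from the sentiment results table.
sentiment = {
    "Positive": (0.66, 1407),
    "Neutral": (0.59, 355),
    "Negative": (0.88, 4206),
    "Mixed": (0.58, 539),
}

def macro_f1(scores):
    """Unweighted mean of per-class F1: every class counts equally."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)

def weighted_f1(scores):
    """Support-weighted mean of per-class F1: large classes dominate."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total

print(round(macro_f1(sentiment), 2), round(weighted_f1(sentiment), 2))
# 0.68 0.79
```

The gap between the two numbers reflects the imbalance: the Negative class holds 4,206 of the 6,507 comments and pulls the weighted score up, while macro F1 gives Neutral and Mixed the same weight as Negative.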

😏 Sarcasm Detection (2-class)

| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Sarcastic | 0.60 | 0.64 | 0.62 | 2,261 |
| Non-Sarcastic | 0.80 | 0.78 | 0.79 | 4,246 |
| Macro F1 | | | 0.70 | |
| Weighted F1 | | | 0.73 (95% CI: 0.718–0.740) | |
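
The confidence intervals quoted for the weighted F1 scores come from bootstrap resampling. The sketch below shows one common variant, a percentile bootstrap over paired (true, predicted) labels. Whether the authors used this exact variant, or how predictions were pooled across folds, is not stated here, so treat `weighted_f1` and `bootstrap_ci` as hypothetical helpers.

```python
import random

def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for c in sorted(set(y_true)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * (tp + fn) / total  # weight by class support
    return score

def bootstrap_ci(y_true, y_pred, n_resamples=2000, level=0.95, seed=42):
    """Percentile bootstrap interval for weighted F1 (assumed variant)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stats.append(weighted_f1([y_true[i] for i in idx],
                                 [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_resamples)]
    hi = stats[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

# Toy predictions (0 = Sarcastic, 1 = Non-Sarcastic); not real model output.
y_true = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
print(weighted_f1(y_true, y_pred), bootstrap_ci(y_true, y_pred))
```

With only a dozen toy examples the interval is very wide; on several thousand comments it narrows to a band of the size shown in the tables.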

📉 Ablation Baseline: Vanilla cross-entropy yields a weighted F1 of 0.69 (sentiment) and 0.61 (sarcasm), with complete minority-class collapse (Neutral and Mixed F1: 0.00).


🧪 Training Details

| Parameter | Value |
|---|---|
| Base Encoder | csebuetnlp/banglabert_small |
| Optimizer | 8-bit AdamW (bitsandbytes) |
| Learning Rate | 2e-5 (Cosine Annealing) |
| Batch Size | 16 (Gradient Accumulation ×2 → eff. 32) |
| Max Epochs | 5 (Early Stopping patience=2 on composite F1) |
| Loss Function | Dynamic Focal Loss: α ∈ [0.15, 0.45], γ: 2.5 → 0.8 |
| Augmentation | BanglaT5 paraphrasing (offline, minority-focused) |
| Hardware | T4 GPU (VRAM-optimized via 8-bit quantization) |
| Reproducibility | Fixed seed 42, 5-fold stratified splits |
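
To make the loss-function row concrete, here is a minimal sketch of a focal loss with a linearly decaying γ and inverse-frequency α values rescaled into [0.15, 0.45]. The min-max rescaling, the per-epoch (rather than per-step) decay, and the function names are assumptions made for illustration; the training repository holds the actual implementation.

```python
import math

def gamma_at_epoch(epoch, max_epochs=5, gamma_start=2.5, gamma_end=0.8):
    """Linearly interpolate gamma from gamma_start (epoch 0) to gamma_end (last epoch)."""
    if max_epochs <= 1:
        return gamma_start
    frac = epoch / (max_epochs - 1)
    return gamma_start + frac * (gamma_end - gamma_start)

def alpha_from_counts(counts, alpha_min=0.15, alpha_max=0.45):
    """Map inverse class frequencies into [alpha_min, alpha_max] (assumed min-max scaling)."""
    inv = [1.0 / c for c in counts]
    lo, hi = min(inv), max(inv)
    if hi == lo:  # perfectly balanced classes: use the midpoint
        return [(alpha_min + alpha_max) / 2] * len(counts)
    return [alpha_min + (v - lo) / (hi - lo) * (alpha_max - alpha_min) for v in inv]

def focal_loss(p_true, alpha, gamma):
    """Focal loss for one example: -alpha * (1 - p)^gamma * log(p)."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

print([round(gamma_at_epoch(e), 3) for e in range(5)])
# [2.5, 2.075, 1.65, 1.225, 0.8]

# Illustrative counts in the order Positive, Neutral, Negative, Mixed, taken from
# the overall support column; a fold-adaptive variant would use each training
# fold's own counts instead.
print([round(a, 3) for a in alpha_from_counts([1407, 355, 4206, 539])])
# [0.205, 0.45, 0.15, 0.338]
```

A high γ early in training suppresses the loss on easy, confidently classified examples so that hard examples dominate; as γ decays, the loss moves closer to α-weighted cross-entropy.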

⚠️ Limitations & Bias

  1. Domain Specificity: Trained exclusively on cricket fan discourse. Performance may degrade on political, e-commerce, or formal Bangla text without domain adaptation.
  2. Pragmatic Reasoning: Struggles with lexical-pragmatic inversion, culturally embedded metaphors (e.g., মীরজাফর), and negation-driven intensification (লজ্জা নেই).
  3. Representation Bottleneck: Relies on [CLS] pooling, which compresses dual-polarity utterances and obscures long-range pragmatic dependencies.
  4. No Multimodal/Code-Mix Support: Emojis, memes, and Bangla-English code-switching are not explicitly modeled. Future work will integrate adapter-based multimodal extensions.
  5. Threshold Calibration: Post-hoc procedure; not differentiable. Embedding cost-sensitive objectives directly into training may yield further gains.

🔍 Error Analysis: 50.1% of misclassifications are sarcasm-related, primarily due to hyperbolic non-sarcastic comments sharing pragmatic features with irony.


📚 Citation

If you use this model or dataset in your research, please cite:

@article{banglasentimentsarcasm,
  title={Sentiment and Sarcasm Detection in Bangla: A Calibrated Multitask Framework for Imbalanced Cricket Discourse},
  author={Arshadul Hoque and Nasrin Sultana and Risul Islam Rasel},
  year={2026},
  publisher={Zenodo},
  doi={10.5281/zenodo.20307593}
}

🤝 Contact

  • 📧 ahsbd95@gmail.com
