🌍 Bangla Sentiment & Sarcasm Dual-Head Model
Joint sentiment classification & sarcasm detection for imbalanced Bangla social media text
| 📊 Task | 🌐 Language | 🏗️ Architecture | ⚡ Training Paradigm |
|---|---|---|---|
| Sentiment Analysis (4-class) + Sarcasm Detection (2-class) | Bengali (bn) | Dual-head BanglaBERT (`csebuetnlp/banglabert_small`) | Multi-task Learning, Dynamic Focal Loss, Class-Aware Threshold Calibration |
📁 Training Code & Scripts: GitHub
🤗 Model Weights & Inference: Hugging Face
📖 Paper: Zenodo
📦 Repository Contents
| File | Description |
|---|---|
| `model.pth` | Trained dual-head BanglaBERT weights |
| `sent_thresholds.npy` | Calibrated decision thresholds for sentiment (4 classes) |
| `sarc_thresholds.npy` | Calibrated decision thresholds for sarcasm (2 classes) |
| `tokenizer/` | Standard BanglaBERT tokenizer files (`vocab.txt`, `tokenizer_config.json`, etc.) |
♻️ Reproducibility
- ✅ Fixed random seed (`42`) for all experiments
- ✅ 5-fold stratified cross-validation with bootstrap CIs
- ✅ All thresholds tuned on validation folds only (no test leakage)
- ✅ Code, data, and model weights publicly available
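The splitting code itself is not shown in this card, so the following is only a minimal sketch of what a seeded, stratified 5-fold split looks like. The function name `stratified_kfold` is hypothetical, and the sketch stratifies on the sentiment label alone; how the original pipeline stratifies across both tasks is not stated here.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=5, seed=42):
    """Return (train_idx, val_idx) pairs whose class mix mirrors the full set."""
    rng = random.Random(seed)  # fixed seed, so the folds are reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets a near-equal share.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    splits = []
    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        splits.append((train_idx, val_idx))
    return splits

# Toy label list built from the sentiment class counts reported in this card.
labels = ["Positive"] * 1407 + ["Neutral"] * 355 + ["Negative"] * 4206 + ["Mixed"] * 539
splits = stratified_kfold(labels)
for fold_id, (train_idx, val_idx) in enumerate(splits):
    neutral_in_val = sum(labels[i] == "Neutral" for i in val_idx)
    print(fold_id, len(train_idx), len(val_idx), neutral_in_val)
```

Because thresholds are tuned on validation folds only, each `val_idx` here would be the only data allowed to influence the thresholds for that fold.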
📖 Model Details
This model implements a calibrated multitask framework for joint sentiment and sarcasm detection in low-resource Bangla social media text. It addresses severe class imbalance and pragmatic ambiguity through:
- 🔹 Dual-head architecture: Shared BanglaBERT encoder → 256-dim projection layer → independent sentiment & sarcasm classification heads
- 🔹 Dynamic loss scheduling: Fold-adaptive inverse-frequency `α` scaling + linear `γ` decay (2.5 → 0.8) for epoch-aware hard-example mining
- 🔹 Post-hoc threshold calibration: Per-class decision boundaries optimized on validation folds to prevent majority-class bias
- 🔹 Data augmentation: BanglaT5 paraphrasing applied offline to enrich minority classes
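The dynamic loss scheduling described above can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: the helper names are hypothetical, the min-max rescaling of inverse frequencies onto [0.15, 0.45] is an assumed way of producing `α` in that range, and the class counts are the full-dataset supports rather than per-fold training counts.

```python
import math

def gamma_at(epoch, num_epochs=5, start=2.5, end=0.8):
    """Linearly decay the focusing parameter from `start` to `end`."""
    if num_epochs == 1:
        return start
    return start + (end - start) * epoch / (num_epochs - 1)

def inverse_frequency_alpha(class_counts, low=0.15, high=0.45):
    """Map inverse class frequencies onto [low, high] (assumed min-max rescaling)."""
    inv = [1.0 / c for c in class_counts]
    lo, hi = min(inv), max(inv)
    if hi == lo:
        return [(low + high) / 2] * len(inv)
    return [low + (high - low) * (v - lo) / (hi - lo) for v in inv]

def focal_loss(p_true, alpha, gamma):
    """Focal loss for one example, given the predicted probability of its true class."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

# Assumption: full-dataset supports stand in for one fold's training counts.
counts = {"Positive": 1407, "Neutral": 355, "Negative": 4206, "Mixed": 539}
alphas = dict(zip(counts, inverse_frequency_alpha(list(counts.values()))))

for epoch in range(5):
    gamma = gamma_at(epoch)
    easy = focal_loss(0.9, alphas["Negative"], gamma)  # confident majority-class example
    hard = focal_loss(0.2, alphas["Neutral"], gamma)   # uncertain minority-class example
    print(f"epoch={epoch} gamma={gamma:.3f} easy={easy:.5f} hard={hard:.5f}")
```

With a high `γ` in early epochs, easy examples contribute almost nothing relative to hard ones; as `γ` decays, the loss moves closer to class-weighted cross-entropy.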
The model was trained on 6,507 manually annotated cricket fan comments from Bangladesh’s 2023 ICC World Cup campaign, spanning Facebook and YouTube discourse.
🛠️ How to Use
Installation
pip install transformers torch numpy huggingface_hub
Inference Example (with calibrated thresholds)
import torch
import numpy as np
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer
from model_architecture import DualHeadModel
REPO_ID = "ahs95/sentiment-sarcasm-detection-BanglaBERT"
# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = DualHeadModel(num_sentiment_classes=4, num_sarcasm_classes=2)
model_path = hf_hub_download(repo_id=REPO_ID, filename="model.pth")
model.load_state_dict(torch.load(model_path, map_location="cpu", weights_only=True))
model.eval()
# Load calibrated thresholds
sent_thresholds = np.load(hf_hub_download(repo_id=REPO_ID, filename="sent_thresholds.npy"))
sarc_thresholds = np.load(hf_hub_download(repo_id=REPO_ID, filename="sarc_thresholds.npy"))
sentiment_labels = ["Positive", "Neutral", "Negative", "Mixed"]
sarcasm_labels = ["Sarcastic", "Non-Sarcastic"] # Index 0 = Sarcastic
def predict(text, max_len=512):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_len, padding="max_length")
    with torch.no_grad():
        # DualHeadModel returns tuple: (sent_logits, sarc_logits)
        sent_logits, sarc_logits = model(inputs["input_ids"], inputs["attention_mask"])
    sent_probs = torch.softmax(sent_logits.squeeze(0), dim=-1)
    sarc_prob = torch.sigmoid(sarc_logits.squeeze(0))[0]  # P(Sarcastic)
    # Apply calibrated thresholds
    sent_pred = "Neutral"  # fallback
    for i, prob in enumerate(sent_probs):
        if prob >= sent_thresholds[i]:
            sent_pred = sentiment_labels[i]
            break
    sarc_pred = sarcasm_labels[0] if sarc_prob >= sarc_thresholds[0] else sarcasm_labels[1]
    return {
        "sentiment": sent_pred,
        "sarcasm": sarc_pred,
        "confidence": {
            "sentiment": sent_probs.tolist(),
            "sarcasm": [sarc_prob.item(), 1 - sarc_prob.item()]
        }
    }
# Test
result = predict("বাংলাদেশ জিতবে ২০৫০ বিশ্বকাপ, তখন আমি আর বেঁচে থাকব না।")
print(result)
# Expected: {'sentiment': 'Negative', 'sarcasm': 'Sarcastic', 'confidence': {...}}
📦 Note: The `DualHeadModel` class definition is available in the training repository. Copy `model_architecture.py` to your local environment before running the inference example.
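The sentiment decision rule in the inference example is order-dependent: it returns the first class, in label order, whose probability clears its threshold, and falls back to "Neutral" if none does. The sketch below isolates that rule so its behaviour is easy to check. The helper name `pick_sentiment`, the probabilities, and the thresholds are hypothetical and are not the shipped values in `sent_thresholds.npy`.

```python
def pick_sentiment(probs, thresholds, labels, fallback="Neutral"):
    """Return the first label (in index order) whose probability clears its threshold."""
    for label, prob, threshold in zip(labels, probs, thresholds):
        if prob >= threshold:
            return label
    return fallback

labels = ["Positive", "Neutral", "Negative", "Mixed"]
thresholds = [0.40, 0.30, 0.50, 0.25]  # hypothetical values for illustration only

print(pick_sentiment([0.10, 0.05, 0.70, 0.15], thresholds, labels))  # Negative
print(pick_sentiment([0.35, 0.10, 0.30, 0.25], thresholds, labels))  # Mixed
print(pick_sentiment([0.41, 0.04, 0.55, 0.00], thresholds, labels))  # Positive
print(pick_sentiment([0.30, 0.20, 0.40, 0.10], thresholds, labels))  # Neutral (fallback)
```

The third call shows the consequence of the ordering: "Positive" wins at 0.41 even though "Negative" has the higher probability, because it is checked first.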
📊 Evaluation Results
Evaluation followed a 5-fold stratified cross-validation protocol. Macro- and weighted-averaged metrics are aggregated across folds. Confidence intervals were computed via 2,000 bootstrap resamples.
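For readers unfamiliar with bootstrap confidence intervals, here is a minimal percentile-bootstrap sketch for weighted F1. It uses a small invented set of labels and predictions; the function names are hypothetical, and whether the reported intervals use the percentile method or pooled out-of-fold predictions is an assumption, not something this card states.

```python
import random

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores."""
    total = len(y_true)
    score = 0.0
    for cls in set(y_true):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        score += f1 * (tp + fn) / total  # tp + fn is the class support
    return score

def bootstrap_ci(y_true, y_pred, n_resamples=2000, level=0.95, seed=42):
    """Percentile bootstrap interval: resample examples with replacement, rescore."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(weighted_f1([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    lower = scores[round((1 - level) / 2 * (n_resamples - 1))]
    upper = scores[round((1 + level) / 2 * (n_resamples - 1))]
    return lower, upper

# Invented toy data: 100 examples with a handful of deliberate errors.
y_true = ["neg"] * 60 + ["pos"] * 25 + ["mix"] * 15
y_pred = ["neg"] * 52 + ["pos"] * 8 + ["pos"] * 20 + ["neg"] * 5 + ["mix"] * 9 + ["neg"] * 6

point = weighted_f1(y_true, y_pred)
lower, upper = bootstrap_ci(y_true, y_pred)
print(f"weighted F1 = {point:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

The interval on this toy set is far wider than the ones reported below, which is expected: 100 examples carry much less information than 6,507.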
🎯 Sentiment Analysis (4-class)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Positive | 0.64 | 0.68 | 0.66 | 1,407 |
| Neutral | 0.57 | 0.62 | 0.59 | 355 |
| Negative | 0.91 | 0.86 | 0.88 | 4,206 |
| Mixed | 0.53 | 0.65 | 0.58 | 539 |
| Macro F1 | | | 0.68 | |
| Weighted F1 | | | 0.79 (95% CI: 0.784–0.804) | |
😏 Sarcasm Detection (2-class)
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Sarcastic | 0.60 | 0.64 | 0.62 | 2,261 |
| Non-Sarcastic | 0.80 | 0.78 | 0.79 | 4,246 |
| Macro F1 | | | 0.70 | |
| Weighted F1 | | | 0.73 (95% CI: 0.718–0.740) | |
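As a sanity check, the macro and weighted F1 values can be approximately recomputed from the per-class rows of the two tables. This is only arithmetic on the rounded figures shown above, so small differences from the reported averages are expected; the helper name is hypothetical.

```python
def macro_and_weighted(f1_scores, supports):
    """Macro F1 is the plain mean; weighted F1 weights each class by its support."""
    macro = sum(f1_scores) / len(f1_scores)
    weighted = sum(f * s for f, s in zip(f1_scores, supports)) / sum(supports)
    return macro, weighted

# Per-class F1 and support, copied from the tables above.
sent_macro, sent_weighted = macro_and_weighted(
    [0.66, 0.59, 0.88, 0.58], [1407, 355, 4206, 539]
)
sarc_macro, sarc_weighted = macro_and_weighted([0.62, 0.79], [2261, 4246])

print(f"sentiment: macro={sent_macro:.4f} weighted={sent_weighted:.4f}")
print(f"sarcasm:   macro={sarc_macro:.4f} weighted={sarc_weighted:.4f}")
```

The recomputed values are roughly 0.678 and 0.792 for sentiment and 0.705 and 0.731 for sarcasm, which agree with the reported 0.68/0.79 and 0.70/0.73 up to rounding of the per-class inputs.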
📉 Ablation Baseline: Vanilla cross-entropy yields a weighted F1 of 0.69 (sentiment) and 0.61 (sarcasm), with complete minority-class collapse (Neutral/Mixed F1: 0.00).
🧪 Training Details
| Parameter | Value |
|---|---|
| Base Encoder | csebuetnlp/banglabert_small |
| Optimizer | 8-bit AdamW (bitsandbytes) |
| Learning Rate | 2e-5 (Cosine Annealing) |
| Batch Size | 16 (Gradient Accumulation ×2 → eff. 32) |
| Max Epochs | 5 (Early Stopping patience=2 on composite F1) |
| Loss Function | Dynamic Focal Loss: α ∈ [0.15, 0.45], γ: 2.5 → 0.8 |
| Augmentation | BanglaT5 paraphrasing (offline, minority-focused) |
| Hardware | T4 GPU (VRAM-optimized via 8-bit quantization) |
| Reproducibility | Fixed seed 42, 5-fold stratified splits |
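To make the learning-rate and batch-size rows concrete, the sketch below computes a cosine-annealed learning rate and the number of optimizer updates under gradient accumulation. Several details are assumptions not stated in the table: annealing per optimizer step down to zero, no warmup, and a hypothetical training-fold size of 5,206 examples (about four fifths of 6,507, before augmentation).

```python
import math

def cosine_lr(step, total_steps, base_lr=2e-5, min_lr=0.0):
    """Cosine-annealed learning rate at a given optimizer step."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

def optimizer_steps(num_examples, batch_size=16, accumulation=2, epochs=5):
    """Optimizer updates when gradients are accumulated over several micro-batches."""
    micro_batches = math.ceil(num_examples / batch_size)
    return math.ceil(micro_batches / accumulation) * epochs

train_size = 5206  # hypothetical fold size, see the assumptions above
total = optimizer_steps(train_size)
print("optimizer steps:", total)
for step in (0, total // 2, total):
    print(step, f"{cosine_lr(step, total):.2e}")
```

With a micro-batch of 16 and accumulation of 2, each optimizer update sees 32 examples, which is the effective batch size listed in the table.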
⚠️ Limitations & Bias
- Domain Specificity: Trained exclusively on cricket fan discourse. Performance may degrade on political, e-commerce, or formal Bangla text without domain adaptation.
- Pragmatic Reasoning: Struggles with lexical-pragmatic inversion, culturally embedded metaphors (e.g., `মীরজাফর`), and negation-driven intensification (`লজ্জা নেই`).
- Representation Bottleneck: Relies on `[CLS]` pooling, which compresses dual-polarity utterances and obscures long-range pragmatic dependencies.
- No Multimodal/Code-Mix Support: Emojis, memes, and Bangla-English code-switching are not explicitly modeled. Future work will integrate adapter-based multimodal extensions.
- Threshold Calibration: Post-hoc procedure; not differentiable. Embedding cost-sensitive objectives directly into training may yield further gains.
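The threshold-calibration limitation is easier to see with a small sketch. A post-hoc search of this kind scans a discrete grid and scores each candidate with a counting metric, so it provides no gradient for the encoder to learn from. The objective (one-vs-rest F1), the grid, the helper names, and the validation probabilities below are all assumptions for illustration, not the procedure used to produce the shipped `.npy` files.

```python
def f1_at_threshold(probs, is_positive, threshold):
    """One-vs-rest F1 when predicting the class wherever prob >= threshold."""
    tp = sum(p >= threshold and y for p, y in zip(probs, is_positive))
    fp = sum(p >= threshold and not y for p, y in zip(probs, is_positive))
    fn = sum(p < threshold and y for p, y in zip(probs, is_positive))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(probs, is_positive, grid=None):
    """Pick the grid value with the highest validation F1 (first one wins on ties)."""
    grid = grid or [i / 100 for i in range(5, 96, 5)]
    return max(grid, key=lambda t: f1_at_threshold(probs, is_positive, t))

# Invented validation scores for a minority class that the model under-predicts.
probs = [0.62, 0.41, 0.33, 0.28, 0.45, 0.30, 0.22, 0.15, 0.10, 0.05]
is_positive = [True, True, True, True, False, False, False, False, False, False]

threshold = best_threshold(probs, is_positive)
print(threshold)                                       # 0.25
print(f1_at_threshold(probs, is_positive, 0.5))        # 0.4
print(f1_at_threshold(probs, is_positive, threshold))  # 0.8
```

On this toy data, lowering the threshold from 0.5 to 0.25 doubles the F1 for the minority class, which is the effect per-class calibration is meant to achieve.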
🔍 Error Analysis: 50.1% of misclassifications are sarcasm-related, primarily due to hyperbolic non-sarcastic comments sharing pragmatic features with irony.
📚 Citation
If you use this model or dataset in your research, please cite:
@article{banglasentimentsarcasm,
title={Sentiment and Sarcasm Detection in Bangla: A Calibrated Multitask Framework for Imbalanced Cricket Discourse},
author={Arshadul Hoque and Nasrin Sultana and Risul Islam Rasel},
year={2026},
publisher={Zenodo},
doi={10.5281/zenodo.20307593}
}
🤝 Contact
- 📧 ahsbd95@gmail.com