---
license: apache-2.0
datasets:
- SayedShaun/sentigold
language:
- bn
metrics:
- accuracy
- f1
base_model:
- csebuetnlp/banglabert
pipeline_tag: text-classification
tags:
- sentiment-analysis
- bengali
- bangla
- multiclass-classification
---

# BanglaBERT Fine-tuned for Bangla Sentiment Analysis

## Model Description

This model is a fine-tuned version of [`csebuetnlp/banglabert`](https://huggingface.co/csebuetnlp/banglabert) on the SentiGOLD dataset for 5-class sentiment analysis in Bengali. It classifies text into one of five sentiment classes:

1. 😠 Very Negative (SN)
2. 😞 Negative (WN)
3. 😐 Neutral (N)
4. 😊 Positive (WP)
5. 😍 Very Positive (SP)

**Key Features:**
- Builds on BanglaBERT, a state-of-the-art pretrained Bangla language model
- Handles both formal and informal Bengali text
- Optimized for social media posts, reviews, and customer feedback
- Requires text normalization with the [Bangla Normalizer](https://github.com/csebuetnlp/normalizer) before inference

## Intended Uses & Limitations

### Primary Use
- Sentiment analysis of Bengali text
- Social media monitoring
- Customer feedback analysis
- Product review classification

### Limitations
- Performance may degrade on code-mixed (Bengali-English) text
- May struggle with sarcasm and highly contextual expressions
- Best suited to short and medium-length texts (up to 512 tokens)

## Training Data

The model was fine-tuned on [**SentiGOLD**](https://arxiv.org/pdf/2306.06147), the largest gold-standard Bangla sentiment analysis dataset:

| Feature | Value |
| --- | --- |
| Total Samples | 70,000 |
| Domains Covered | 30+ |
| Source Diversity | Social media, news, blogs, reviews |
| Class Distribution | Balanced across 5 classes |
| Annotation Quality | Fleiss' kappa = 0.88 |

## Training Procedure

### Hyperparameters

| Parameter | Value |
| --- | --- |
| Learning Rate | 2e-5 → 1.05e-6 |
| Batch Size | 48 |
| Epochs | 5 |
| Optimizer | AdamW |
| Scheduler | ReduceLROnPlateau |
| Weight Decay | 0.01 |
| Gradient Accumulation | 4 steps |
| Warmup Ratio | 5% |

With a per-device batch size of 48 and gradient accumulation over 4 steps, the effective batch size is 192.

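The ReduceLROnPlateau schedule above can be illustrated with a minimal sketch of its logic: the learning rate is cut only when validation loss stops improving for a number of epochs. The `factor`, `patience`, and `min_lr` values below are illustrative assumptions, not the values used in training.

```python
# Illustrative sketch (not the training script): reduce-on-plateau lowers the
# learning rate when validation loss stops improving for `patience` epochs.
def reduce_lr_on_plateau(losses, lr=2e-5, factor=0.5, patience=2, min_lr=1e-6):
    """Return the learning rate after stepping through `losses` epoch by epoch."""
    best = float("inf")
    bad_epochs = 0
    for loss in losses:
        if loss < best:
            best = loss          # improvement: reset the patience counter
            bad_epochs = 0
        else:
            bad_epochs += 1      # no improvement this epoch
            if bad_epochs > patience:
                lr = max(lr * factor, min_lr)  # cut the learning rate
                bad_epochs = 0
    return lr

# Validation loss plateaus after epoch 2, so the rate is halved once: 2e-5 -> 1e-5.
print(reduce_lr_on_plateau([0.9, 0.7, 0.7, 0.71, 0.72]))
```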
### Techniques

* Class-weighted loss to handle class imbalance
* Early stopping (patience = 3)
* Mixed-precision (FP16) training
* Gradient checkpointing
* Text normalization using the Bangla Normalizer

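The class-weighted loss can be sketched as follows. This assumes a standard inverse-frequency weighting scheme, which is not necessarily the exact formula used for this model; in a PyTorch setup the resulting weights would typically be passed to `torch.nn.CrossEntropyLoss(weight=...)`.

```python
# Assumed inverse-frequency class weighting: rare classes get larger weights
# so the loss does not favour frequent classes.
from collections import Counter

def class_weights(labels, num_classes=5):
    counts = Counter(labels)
    total = len(labels)
    # weight_c = total / (num_classes * count_c)
    return [total / (num_classes * counts[c]) for c in range(num_classes)]

# Toy label distribution (0 = Very Negative ... 4 = Very Positive)
labels = [0] * 10 + [1] * 20 + [2] * 40 + [3] * 20 + [4] * 10
print(class_weights(labels))  # [2.0, 1.0, 0.5, 1.0, 2.0]
```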
## Evaluation Results

### Validation Performance

| Epoch | F1 (Macro) | Accuracy | Very Neg F1 | Neg F1 | Neu F1 | Pos F1 | Very Pos F1 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.6334 | 0.6331 | 0.6789 | 0.5834 | 0.6407 | 0.5635 | 0.7004 |
| 5 | 0.6537 | 0.6551 | 0.7081 | 0.6157 | 0.6421 | 0.5789 | 0.7236 |

### Final Test Performance

| Metric | Score |
| --- | --- |
| Macro F1 | 0.6660 |
| Accuracy | 0.6671 |

## How to Use

### Direct Inference

```python
from transformers import pipeline
from normalizer import normalize  # https://github.com/csebuetnlp/normalizer

# Load model
classifier = pipeline(
    "text-classification",
    model="ahs95/banglabert-sentiment-analysis",
    tokenizer="ahs95/banglabert-sentiment-analysis"
)

# Prepare text
text = "আপনার পণ্যটি অসাধারণ! আমি খুবই সন্তুষ্ট।"
normalized_text = normalize(text)  # Important for BanglaBERT

# Classify
result = classifier(normalized_text)
print(f"Sentiment: {result[0]['label']} (Confidence: {result[0]['score']:.2f})")
```

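Depending on whether `id2label` was set in the model's config, the pipeline may return generic labels such as `LABEL_0` instead of readable names. A small, hypothetical mapping (it assumes label ids follow the five-class order listed above; verify against the model's `config.json` before relying on it) can translate them:

```python
# Hypothetical mapping: assumes label ids follow the 5-class order above
# (0 = Very Negative ... 4 = Very Positive); check config.json (id2label).
ID2SENTIMENT = {
    "LABEL_0": "Very Negative",
    "LABEL_1": "Negative",
    "LABEL_2": "Neutral",
    "LABEL_3": "Positive",
    "LABEL_4": "Very Positive",
}

def readable_label(pipeline_label):
    # Fall back to the raw label if the model already emits readable names
    return ID2SENTIMENT.get(pipeline_label, pipeline_label)

print(readable_label("LABEL_3"))  # Positive
```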
### Advanced Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from normalizer import normalize

# Load model and tokenizer
model_name = "ahs95/banglabert-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare inputs
texts = [
    "সেবা খুব খারাপ ছিল। আমি কখনো ফিরে আসব না।",
    "পণ্যটির গুণগত মান মোটামুটি ভাল"
]
normalized_texts = [normalize(t) for t in texts]

# Tokenize and predict
inputs = tokenizer(normalized_texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get predictions
sentiment_labels = ["Very Negative", "Negative", "Neutral", "Positive", "Very Positive"]
predictions = [sentiment_labels[p] for p in probabilities.argmax(dim=1)]

for text, pred in zip(texts, predictions):
    print(f"Text: {text}\nPredicted Sentiment: {pred}\n")
```

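As an optional post-processing idea (not part of the released model), the 5-way probabilities can be collapsed into a coarse negative/neutral/positive verdict by summing adjacent classes:

```python
# Post-processing sketch: collapse 5-way probabilities into a coarse
# negative / neutral / positive verdict by summing adjacent classes.
def coarse_sentiment(probs):
    """probs: [very_neg, neg, neu, pos, very_pos], summing to 1."""
    scores = {
        "negative": probs[0] + probs[1],
        "neutral": probs[2],
        "positive": probs[3] + probs[4],
    }
    return max(scores, key=scores.get)

print(coarse_sentiment([0.05, 0.10, 0.20, 0.40, 0.25]))  # positive
```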
## Ethical Considerations

- **Bias:** While SentiGOLD reduces bias through synthetic data, real-world validation is recommended
- **Use Cases:** Suitable for:
  * Product feedback analysis
  * Social media monitoring
  * Market research
- **Avoid:** Critical decision systems without human oversight

## Citation

If you use this model, please cite:

```bibtex
@misc{banglabert-sentiment,
  author = {Arshadul Hoque},
  title = {Fine-tuned BanglaBERT for Bengali Sentiment Analysis},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/ahs95/banglabert-sentiment-analysis}}
}
```

## Contact

For questions and support: ahsbd95@gmail.com