Swarnadeep-28/bengali-code-mix-sentiment

@@ -7,47 +7,118 @@ tags:
 model-index:
 - name: bengali-code-mix-sentiment
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# bengali-code-mix-sentiment
-This model is a fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base) on the None dataset.
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
-### Training hyperparameters
-The following hyperparameters were used during training:
-- learning_rate: 2e-05
-- train_batch_size: 16
-- eval_batch_size: 32
-- seed: 42
-- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
-- lr_scheduler_type: linear
-- num_epochs: 3
-### Training results
-### Framework versions
-- Transformers 4.56.1
-- Pytorch 2.8.0+cu126
-- Datasets 4.0.0
-- Tokenizers 0.22.0

 model-index:
 - name: bengali-code-mix-sentiment
   results: []
+datasets:
+- Swarnadeep-28/bn_code_mix_sentiment_dataset
+metrics:
+- accuracy
+- f1
+- precision
+- recall
 ---
+# Bengali-English Code-Mixed Sentiment Model
+## Model Summary
+This model is a **fine-tuned version of [xlm-roberta-base](https://huggingface.co/xlm-roberta-base)** for **sentiment analysis** on **Bengali–English code-mixed text** (social media posts, comments, and tweets).
+- **Task**: Text Classification (Sentiment Analysis)
+- **Languages**: Bengali (Romanized) + English
+- **Classes**: `0`, `1`, `2`, `3`
+- **Fine-tuning method**: Full fine-tuning
+- **Dataset**: [Bengali-English Code-Mixed Sentiment Dataset](https://huggingface.co/datasets/jojocoder28/bn_code_mix_sentiment_dataset)
+This model reaches 0.73 accuracy and 0.71 macro F1 on a 2,002-sample evaluation set, which makes it a usable baseline for code-mixed sentiment classification in social media analysis and low-resource NLP research.
+---
+## How to Use
+### Inference Example
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+
+# Load the fine-tuned tokenizer and classifier from the Hub
+model_id = "jojocoder28/bengali-code-mix-sentiment"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+
+# Tokenize a single code-mixed sentence into PyTorch tensors
+text = "Aaj match ta khub bhalo chilo! Loved it."
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+
+# Forward pass without gradient tracking (inference only)
+with torch.no_grad():
+    logits = model(**inputs).logits
+
+# Take the highest-scoring class index and map it to its label
+pred = torch.argmax(logits, dim=-1).item()
+labels = ["0", "1", "2", "3"]
+print("Predicted label:", labels[pred])
+```
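The example above reports only the top class. When a confidence score is also useful, the logits can be passed through a softmax. The following is a minimal sketch in plain Python: the logit values are made up for illustration and do not come from this model, and `softmax` is a helper defined here rather than a library function. With PyTorch tensors, `torch.softmax(logits, dim=-1)` should give the same probabilities.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    # Subtracting the max keeps exp() numerically stable without changing the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for the four classes (illustrative values only).
example_logits = [0.2, 2.1, -0.5, 0.4]
probs = softmax(example_logits)

# Same label list as in the inference example.
labels = ["0", "1", "2", "3"]
pred = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted label: {labels[pred]} (p={probs[pred]:.2f})")
```

A low top probability can be used as a signal to route an input to manual review instead of trusting the predicted label.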
+---
+## Training Details
+- **Base model**: `xlm-roberta-base`
+- **Method**: Full fine-tuning (all parameters updated)
+- **Optimizer**: AdamW
+- **Learning Rate**: 2e-5
+- **Epochs**: 3
+- **Batch Size**: 16 (train), 32 (eval)
+- **Hardware**: Trained on a single GPU (Colab T4 / equivalent)
+---
+## Evaluation
+### Classification Report
+| Label | Precision | Recall | F1-Score | Support |
+|-------|-----------|--------|----------|---------|
+| 0     | 0.80      | 0.73   | 0.77     | 528     |
+| 1     | 0.73      | 0.73   | 0.73     | 617     |
+| 2     | 0.69      | 0.76   | 0.72     | 675     |
+| 3     | 0.67      | 0.57   | 0.62     | 182     |
+### Overall Metrics
+- **Accuracy**: 0.73
+- **Macro Avg**: Precision = 0.72, Recall = 0.70, F1 = 0.71
+- **Weighted Avg**: Precision = 0.73, Recall = 0.73, F1 = 0.73
+- **Total Samples**: 2002
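The overall figures follow from the per-class rows: the macro average is the unweighted mean across classes, and the weighted average weights each class by its support. As a quick consistency check, this sketch recomputes both from the table. The variable and function names are illustrative, and because the table values are already rounded to two decimals, the recomputed figures can differ from the reported ones in the third decimal.

```python
# Per-class (precision, recall, f1, support), copied from the classification report.
report = {
    "0": (0.80, 0.73, 0.77, 528),
    "1": (0.73, 0.73, 0.73, 617),
    "2": (0.69, 0.76, 0.72, 675),
    "3": (0.67, 0.57, 0.62, 182),
}

# Total number of evaluated samples is the sum of the supports.
total = sum(row[3] for row in report.values())

def macro(col):
    """Unweighted mean of one metric column across classes."""
    return sum(row[col] for row in report.values()) / len(report)

def weighted(col):
    """Support-weighted mean of one metric column across classes."""
    return sum(row[col] * row[3] for row in report.values()) / total

macro_avg = tuple(macro(c) for c in range(3))        # (precision, recall, f1)
weighted_avg = tuple(weighted(c) for c in range(3))  # (precision, recall, f1)

print("Total samples:", total)
print("Macro avg (P, R, F1):", [f"{v:.3f}" for v in macro_avg])
print("Weighted avg (P, R, F1):", [f"{v:.3f}" for v in weighted_avg])
```

For single-label classification, support-weighted recall equals accuracy, so the weighted recall computed here should agree with the reported accuracy of 0.73 up to rounding.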
+---
+## Applications
+- Sentiment classification of Bengali-English social media text
+- Research in **code-mixed NLP for Indic languages**
+- Full fine-tuning baseline against which parameter-efficient fine-tuning can be benchmarked (compare with the LoRA model)
+---
+## Limitations
+- Accuracy may drop on heavily Romanized or slang-heavy Bengali
+- Trained primarily on short-form text (tweets, comments, reviews)
+- Not designed for abusive/toxic content moderation or safety-critical use cases
+---
+## Ethical Considerations
+- Data reflects natural biases from social media sources
+- Sarcastic or offensive text may be misclassified
+- Should not be the sole basis for critical decision-making
+---
+## Citation
+If you use this model, please cite:
+```bibtex
+@misc{das2025_bn_code_mix_sentiment,
+  author    = {Swarnadeep Das},
+  title     = {Bengali-English Code-Mixed Sentiment Model},
+  year      = {2025},
+  url       = {https://huggingface.co/jojocoder28/bengali-code-mix-sentiment}
+}
+```
+---
+## Acknowledgements
+- **Dataset**: [Bengali-English Code-Mixed Sentiment Dataset](https://huggingface.co/datasets/jojocoder28/bn_code_mix_sentiment_dataset)
+- **Base model**: [`xlm-roberta-base`](https://huggingface.co/xlm-roberta-base)
+- **Frameworks**: [Transformers](https://huggingface.co/docs/transformers), [Datasets](https://huggingface.co/docs/datasets)