Fine-tuned XLM-RoBERTa-base model for binary sentiment classification on Sindhi movie reviews.
| Field | Details |
|---|---|
| Base model | xlm-roberta-base |
| Task | Binary Sentiment Classification |
| Language | Sindhi (sd), Perso-Arabic / Nastaliq script |
| Labels | positive · negative |
| Dataset | DanishMahdi/snd_movies_sentiment_analysis |
| Training rows | ~40,000 (20k positive, 20k negative) |
| Max token length | 128 |

| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Batch size | 16 |
| Epochs | 5 (early stopping patience = 2) |
| Warmup ratio | 0.1 |
| LR scheduler | cosine |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | fp16 (if CUDA available) |
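
As a rough illustration of how the warmup ratio and cosine scheduler in the table interact, the sketch below computes the learning rate at a given optimizer step in plain Python. It assumes linear warmup followed by a half-cosine decay to zero, and a total of 12,500 steps (40,000 rows / batch size 16, times 5 epochs, with no gradient accumulation and no early stop). `lr_at_step`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical names, not taken from the training script.

```python
import math

BASE_LR = 2e-5        # learning rate from the table
WARMUP_RATIO = 0.1    # warmup ratio from the table
# Assumption: 40,000 rows / batch size 16 = 2,500 steps per epoch, times 5 epochs.
TOTAL_STEPS = (40_000 // 16) * 5
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Learning rate after `step` optimizer updates (linear warmup, cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at_step(step):.2e}")
```

If early stopping ends training before epoch 5, the schedule is simply cut short, so the learning rate never reaches zero.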
pip install transformers torch
from transformers import pipeline
pipe = pipeline(
    "text-classification",
    model="DanishMahdi/snd_sentiment_analysis",
)
reviews = [
    "هي فلم بلڪل خراب هئي، مون کي پسند نه آئي",  # negative
    "هي فلم تمام سٺي هئي، مون کي تمام گهڻي پسند آئي",  # positive
]
for review in reviews:
    result = pipe(review)[0]
    print(f"Label: {result['label']} | Score: {result['score']:.4f}")
Label: NEGATIVE | Score: 0.9873
Label: POSITIVE | Score: 0.9912
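
When scoring many reviews, it can help to flag predictions whose score falls below a cutoff so they can be checked by hand. The sketch below works on the list-of-dicts shape shown in the output above (`label` and `score` keys). `flag_uncertain` and the 0.90 cutoff are hypothetical choices for illustration; they are not part of the model or of the `transformers` API.

```python
from typing import Dict, List


def flag_uncertain(predictions: List[Dict[str, object]], cutoff: float = 0.90) -> List[Dict[str, object]]:
    """Return copies of the predictions with a boolean `needs_review` field added."""
    flagged = []
    for pred in predictions:
        item = dict(pred)  # copy so the caller's data is left untouched
        item["label"] = str(pred["label"]).lower()  # lowercase, as in the Labels row
        item["needs_review"] = float(pred["score"]) < cutoff
        flagged.append(item)
    return flagged


# Predictions in the same shape as the example output above; in practice
# these would come from the pipeline call.
example = [
    {"label": "NEGATIVE", "score": 0.9873},
    {"label": "POSITIVE", "score": 0.9912},
    {"label": "POSITIVE", "score": 0.6100},  # made-up low-confidence case
]

for item in flag_uncertain(example):
    status = "review" if item["needs_review"] else "ok"
    print(item["label"], f"{item['score']:.4f}", status)
```

The right cutoff depends on the application and is best chosen on held-out data rather than fixed in advance.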
DanishMahdi/snd_sentiment_analysis/
├── config.json                    # Model config
├── model.safetensors              # Fine-tuned weights
├── tokenizer_config.json          # Tokenizer config
├── sentencepiece.bpe.model        # SentencePiece vocab
├── evaluation/
│   ├── test_metrics.json          # Accuracy, F1, Precision, Recall
│   ├── confusion_matrix.json      # Raw confusion matrix
│   ├── confusion_matrix.png       # Confusion matrix plot
│   ├── training_curves.png        # Loss & F1 over epochs
│   ├── test_metrics_bar.png       # Bar chart of metrics
│   └── classification_report.txt  # Full sklearn report
└── README.md
See evaluation/test_metrics.json for the latest numbers.
Plots are available in the evaluation/ folder.
| Metric | Score |
|---|---|
| Accuracy | 0.8825 |
| F1 (weighted) | 0.8825 |
| Precision (weighted) | 0.8828 |
| Recall (weighted) | 0.8825 |
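
For readers unfamiliar with weighted averaging, the sketch below shows how accuracy and support-weighted precision, recall, and F1 can be derived from a 2×2 confusion matrix. The counts are made up for illustration and are not taken from evaluation/confusion_matrix.json; `weighted_metrics` is a hypothetical helper. One property it makes visible is that support-weighted recall always equals accuracy, which is consistent with the identical Accuracy and Recall values in the table.

```python
from typing import Dict, List


def weighted_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Compute accuracy and support-weighted precision, recall, and F1.

    `cm[i][j]` is the number of samples with true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = cm[c][c]
        support = sum(cm[c])                                 # true samples of class c
        predicted = sum(cm[r][c] for r in range(n_classes))  # samples predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total                             # class share of all samples
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: rows are true [negative, positive], columns are predictions.
toy_cm = [[90, 10],
          [14, 86]]
for name, value in weighted_metrics(toy_cm).items():
    print(f"{name}: {value:.4f}")
```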
The training data is sourced from DanishMahdi/snd_movies_sentiment_analysis.
@misc{danish2025snd,
  author = {Danish Mahdi},
  title = {Sindhi Movie Sentiment Analysis using XLM-RoBERTa},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DanishMahdi/snd_sentiment_analysis},
}
MIT: free to use for research and commercial purposes.