🎬 Sindhi Movie Sentiment Analysis

Fine-tuned XLM-RoBERTa-base model for binary sentiment classification on Sindhi movie reviews.


📋 Model Details

| Field | Details |
|---|---|
| Base model | xlm-roberta-base |
| Task | Binary sentiment classification |
| Language | Sindhi (sd), Perso-Arabic / Nastaliq script |
| Labels | positive · negative |
| Dataset | DanishMahdi/snd_movies_sentiment_analysis |
| Training rows | ~40,000 (20k positive, 20k negative) |
| Max token length | 128 |

📊 Training Configuration

| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Batch size | 16 |
| Epochs | 5 (early stopping, patience = 2) |
| Warmup ratio | 0.1 |
| LR scheduler | cosine |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | fp16 (when CUDA is available) |
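
To make the schedule concrete, the sketch below computes the learning rate at a few steps. It assumes linear warmup followed by cosine decay to zero, the 80% train split from the Dataset section (about 32,000 rows), a single device, no gradient accumulation, and a run that completes all 5 epochs without early stopping. The step counts follow from those assumptions and are not taken from the training logs.

```python
import math

# Values from this card; TRAIN_ROWS assumes 80% of ~40,000 rows.
TRAIN_ROWS = 32_000
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 2e-5
WARMUP_RATIO = 0.1

steps_per_epoch = math.ceil(TRAIN_ROWS / BATCH_SIZE)   # 2000
total_steps = steps_per_epoch * EPOCHS                 # 10000
warmup_steps = round(total_steps * WARMUP_RATIO)       # 1000


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 500, 1000, 5500, 10000):
    print(f"step {step:>5}: lr = {lr_at(step):.2e}")
```

If early stopping ends training before epoch 5, the rate never reaches the tail of the cosine curve, so the final steps run at a higher learning rate than this sketch suggests.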

🚀 Quick Start

Install dependencies

pip install transformers torch

Run inference

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="DanishMahdi/snd_sentiment_analysis",
)

reviews = [
    # negative: "This film was really bad; I did not like it."
    "هي فلم بلڪل خراب هئي، مون کي پسند نه آئي",
    # positive: "This film was very good; I liked it very much."
    "هي فلم تمام سٺي هئي، مون کي تمام گهڻو پسند آئي",
]

for review in reviews:
    result = pipe(review)[0]
    print(f"Label: {result['label']} | Score: {result['score']:.4f}")

Output

Label: NEGATIVE | Score: 0.9873
Label: POSITIVE | Score: 0.9912
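
The score in this output is a probability. For a single-label classifier with two classes, the text-classification pipeline by default applies a softmax to the model's two output logits and reports the most probable label. The sketch below reproduces that step with made-up logits. The id-to-label order `{0: "NEGATIVE", 1: "POSITIVE"}` is an assumption; check `config.json` for the actual mapping.

```python
import math

# Assumed mapping for illustration; the real one lives in config.json (id2label).
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(logits):
    """Return a pipeline-style dict for the highest-probability class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


# Hypothetical logits, not real model output.
example = to_prediction([-2.1, 2.3])
print(f"Label: {example['label']} | Score: {example['score']:.4f}")
```

Because the two probabilities sum to 1, the reported score is never below 0.5. A score near 0.5 signals an uncertain prediction.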

📁 Repository Structure

DanishMahdi/snd_sentiment_analysis/
├── config.json                    # Model config
├── model.safetensors              # Fine-tuned weights
├── tokenizer_config.json          # Tokenizer config
├── sentencepiece.bpe.model        # SentencePiece vocab
├── evaluation/
│   ├── test_metrics.json          # Accuracy, F1, Precision, Recall
│   ├── confusion_matrix.json      # Raw confusion matrix
│   ├── confusion_matrix.png       # Confusion matrix plot
│   ├── training_curves.png        # Loss & F1 over epochs
│   ├── test_metrics_bar.png       # Bar chart of metrics
│   └── classification_report.txt  # Full sklearn report
└── README.md

📈 Evaluation Results

See evaluation/test_metrics.json for the latest numbers. Plots are available in the evaluation/ folder.

| Metric | Score |
|---|---|
| Accuracy | 0.8825 |
| F1 (weighted) | 0.8825 |
| Precision (weighted) | 0.8828 |
| Recall (weighted) | 0.8825 |
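
Weighted metrics average the per-class scores by each class's share of the test set. One consequence is that weighted recall always equals accuracy, which is why those two rows match. The sketch below shows the computation on a hypothetical 2x2 confusion matrix. The counts are invented for illustration and are not the contents of evaluation/confusion_matrix.json.

```python
# Rows = true class, columns = predicted class, order: [negative, positive].
# Hypothetical counts for illustration only.
confusion = [
    [1800, 200],
    [260, 1740],
]


def weighted_metrics(cm):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for i in range(n_classes):
        true_pos = cm[i][i]
        support = sum(cm[i])                                 # true members of class i
        predicted = sum(cm[r][i] for r in range(n_classes))  # rows predicted as class i
        p = true_pos / predicted if predicted else 0.0
        r = true_pos / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = weighted_metrics(confusion)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```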

🔗 Dataset

The training data is sourced from DanishMahdi/snd_movies_sentiment_analysis.

  • Total rows: ~40,000
  • Positive reviews: ~20,000
  • Negative reviews: ~20,000
  • Split: 80% train / 10% validation / 10% test
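
With a balanced dataset of about 40,000 rows, an 80/10/10 split gives roughly 32,000 training, 4,000 validation, and 4,000 test rows. The card does not say how the split was drawn. The sketch below assumes a stratified shuffle with a fixed seed, so that each split keeps the 50/50 label balance. The function name, the seed, and the toy rows are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(rows, fractions=(0.8, 0.1, 0.1), seed=42):
    """Split (text, label) rows into train/val/test, preserving label ratios."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Toy stand-in for the real dataset: 20,000 rows per label.
rows = [(f"review {i}", "positive" if i % 2 else "negative") for i in range(40_000)]
train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))
```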

📝 Citation

@misc{danish2025snd,
  author    = {Danish Mahdi},
  title     = {Sindhi Movie Sentiment Analysis using XLM-RoBERTa},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DanishMahdi/snd_sentiment_analysis},
}

⚖️ License

MIT: free to use for research and commercial purposes.
