🎬 Sindhi Movie Sentiment Analysis

Fine-tuned XLM-RoBERTa-base model for binary sentiment classification on Sindhi movie reviews.


📋 Model Details

| Field | Details |
|---|---|
| Base model | xlm-roberta-base |
| Task | Binary sentiment classification |
| Language | Sindhi (sd), Perso-Arabic / Nastaliq script |
| Labels | positive · negative |
| Dataset | DanishMahdi/snd_movies_sentiment_analysis |
| Dataset rows | ~40,000 (20k positive, 20k negative), before the train/validation/test split |
| Max token length | 128 |

📊 Training Configuration

| Hyperparameter | Value |
|---|---|
| Learning rate | 2e-5 |
| Batch size | 16 |
| Epochs | 5 (early stopping patience = 2) |
| Warmup ratio | 0.1 |
| LR scheduler | cosine |
| Weight decay | 0.01 |
| Optimizer | AdamW |
| Precision | fp16 (if CUDA is available) |
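
To make the warmup and cosine settings concrete, here is a minimal sketch of how the learning rate would evolve under these hyperparameters. It is an illustration, not the training script: the function name `lr_at_step` is hypothetical, the formula is the usual linear-warmup-then-half-cosine shape, and the step count assumes roughly 32,000 training rows (80% of ~40,000), no gradient accumulation, and a run that completes all 5 epochs without early stopping.

```python
import math


def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumed sizes (not stated verbatim in the card): ~32,000 training rows,
# batch size 16, 5 full epochs.
steps_per_epoch = math.ceil(32_000 / 16)
total_steps = steps_per_epoch * 5

for step in (0, 500, 1000, 5500, 10000):
    print(step, f"{lr_at_step(step, total_steps):.2e}")
```

Under these assumptions the rate peaks at 2e-5 once the first 10% of steps have passed and then decays smoothly to zero. If early stopping ends the run before epoch 5, the schedule is simply cut off partway down the curve.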

🚀 Quick Start

Install dependencies

pip install transformers torch sentencepiece

Run inference

from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="DanishMahdi/snd_sentiment_analysis",
)

reviews = [
    "هي فلم بلڪل خراب هئي، مون کي پسند نه آئي",   # negative
    "هي فلم تمام سٺي هئي، مون کي تمام گهڻو پسند آئي",  # positive
]

for review in reviews:
    result = pipe(review)[0]
    print(f"Label: {result['label']} | Score: {result['score']:.4f}")

Output

Label: NEGATIVE | Score: 0.9873
Label: POSITIVE | Score: 0.9912
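
When a single number per review is more convenient than a label plus a confidence, the pipeline output can be folded into a probability of the positive class. The sketch below is one way to do that, assuming the model exposes exactly the two labels shown above and that the reported score is a softmax probability, so the two class probabilities sum to 1. The helper name `positive_probability` is hypothetical, and the dictionaries mirror the sample output rather than a live model call.

```python
def positive_probability(result):
    """Map one pipeline result dict to P(positive) for a two-label model."""
    label = result["label"].upper()
    score = float(result["score"])
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        # With two softmax outputs, P(positive) = 1 - P(negative).
        return 1.0 - score
    raise ValueError(f"Unexpected label: {result['label']!r}")


# Same shape as the dicts returned by pipe(review)[0] in the example above.
example_results = [
    {"label": "NEGATIVE", "score": 0.9873},
    {"label": "POSITIVE", "score": 0.9912},
]

for result in example_results:
    print(f"P(positive) = {positive_probability(result):.4f}")
```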

๐Ÿ“ Repository Structure

DanishMahdi/snd_sentiment_analysis/
├── config.json                    # Model config
├── model.safetensors              # Fine-tuned weights
├── tokenizer_config.json          # Tokenizer config
├── sentencepiece.bpe.model        # SentencePiece vocab
├── evaluation/
│   ├── test_metrics.json          # Accuracy, F1, Precision, Recall
│   ├── confusion_matrix.json      # Raw confusion matrix
│   ├── confusion_matrix.png       # Confusion matrix plot
│   ├── training_curves.png        # Loss & F1 over epochs
│   ├── test_metrics_bar.png       # Bar chart of metrics
│   └── classification_report.txt  # Full sklearn report
└── README.md

📈 Evaluation Results

See evaluation/test_metrics.json for the latest numbers. Plots are available in the evaluation/ folder.

| Metric | Score |
|---|---|
| Accuracy | 0.8825 |
| F1 (weighted) | 0.8825 |
| Precision (weighted) | 0.8828 |
| Recall (weighted) | 0.8825 |
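
For readers unfamiliar with the "weighted" variants, the sketch below shows how they follow from a confusion matrix: each class's precision, recall, and F1 is averaged with a weight equal to that class's share of the true labels. The function name `weighted_metrics` is hypothetical and the matrix is made-up toy data, not the contents of evaluation/confusion_matrix.json.

```python
def weighted_metrics(cm):
    """Compute accuracy and support-weighted precision/recall/F1.

    cm[i][j] is the number of examples whose true class is i and predicted class is j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n_classes)) / total

    precision = recall = f1 = 0.0
    for k in range(n_classes):
        tp = cm[k][k]
        support = sum(cm[k])                                # true examples of class k
        predicted = sum(cm[i][k] for i in range(n_classes))  # examples predicted as k
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy counts for illustration only: rows are true (negative, positive),
# columns are predicted (negative, positive).
toy_cm = [[45, 5], [8, 42]]
metrics = weighted_metrics(toy_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

One consequence of this definition is that support-weighted recall always equals accuracy, which is why those two rows in the table carry the same value.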

🔗 Dataset

The training data comes from the Hugging Face dataset DanishMahdi/snd_movies_sentiment_analysis:

  • Total rows: ~40,000
  • Positive reviews: ~20,000
  • Negative reviews: ~20,000
  • Split: 80% train / 10% validation / 10% test
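
As a rough illustration of what an 80/10/10 split of a balanced 40,000-row dataset looks like, the sketch below splits each class separately so the halves stay balanced. This is an assumption for illustration: the card does not say whether the actual split was stratified or which seed was used, and the `text` and `label` field names, the synthetic rows, and the `stratified_split` helper are all hypothetical.

```python
import random


def stratified_split(rows, label_key="label", fractions=(0.8, 0.1, 0.1), seed=42):
    """Split rows into train/validation/test, preserving the label balance."""
    rng = random.Random(seed)

    # Group rows by label so each class is split independently.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    train, val, test = [], [], []
    for group in by_label.values():
        group = group[:]  # copy so the caller's list is not shuffled in place
        rng.shuffle(group)
        n_train = int(len(group) * fractions[0])
        n_val = int(len(group) * fractions[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the test split
    return train, val, test


# Synthetic stand-in for the dataset: 20,000 rows per label.
rows = [{"text": f"review {i}", "label": i % 2} for i in range(40_000)]
train, val, test = stratified_split(rows)
print(len(train), len(val), len(test))
```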

๐Ÿ“ Citation

@misc{danish2025snd,
  author    = {Danish Mahdi},
  title     = {Sindhi Movie Sentiment Analysis using XLM-RoBERTa},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DanishMahdi/snd_sentiment_analysis},
}

โš–๏ธ License

MIT: free to use for research and commercial purposes.
