Sindhi Sentiment Analysis Model

A text classification model that detects positive, negative, and neutral sentiment in Sindhi language text. This is one of the first publicly available sentiment analysis models for the Sindhi language on Hugging Face.

Model Description

This model was trained on a custom Sindhi sentiment dataset collected from Sindhi newspaper corpora. It classifies Sindhi text into three sentiment categories:

  • ✅ Positive
  • ❌ Negative
  • 😐 Neutral

Model Details

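| Property    | Details                                  |
|-------------|------------------------------------------|
| Language    | Sindhi (sd)                              |
| Script      | Arabic (Nastaliq)                        |
| Task        | Sentiment Analysis / Text Classification |
| Labels      | Positive, Negative, Neutral              |
| License     | MIT                                      |
| Developer   | Ali Nawaz                                |
| Institution | Shaikh Ayaz University                   |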

Training Data

The model was trained on the Sindhi Sentiment Analysis Dataset, which contains 1,898 Sindhi sentences collected from Sindhi newspaper corpora with a semi-supervised pipeline and then manually verified.

| Column              | Description                          |
|---------------------|--------------------------------------|
| Sindhi Text         | Original Sindhi sentence             |
| English Translation | English translation                  |
| Sentiment           | Label: Positive / Negative / Neutral |
| Source              | Newspaper/corpus source              |
| Verified            | Manual verification status           |
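
As a rough illustration of how a file with these columns might be inspected, the sketch below counts sentiment labels with Python's standard `csv` module. The column names come from the table above; the CSV layout, the `Yes`/`No` values in `Verified`, and the sample rows are assumptions made for illustration, not the published dataset.

```python
import csv
import io
from collections import Counter

# Made-up placeholder rows; only the column names come from the dataset card.
# The "Yes"/"No" encoding of the Verified column is an assumption.
SAMPLE_CSV = """Sindhi Text,English Translation,Sentiment,Source,Verified
<sentence 1>,<translation 1>,Positive,<source>,Yes
<sentence 2>,<translation 2>,Negative,<source>,Yes
<sentence 3>,<translation 3>,Neutral,<source>,No
"""


def label_distribution(csv_file, verified_only=True):
    """Count rows per sentiment label, optionally keeping only verified rows."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if verified_only and row["Verified"].strip().lower() != "yes":
            continue
        counts[row["Sentiment"].strip()] += 1
    return counts


counts = label_distribution(io.StringIO(SAMPLE_CSV))
print(dict(counts))
```

With a real export, passing an open file handle instead of the `io.StringIO` wrapper should be enough, provided the header row matches these column names.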

How to Use

from transformers import pipeline

classifier = pipeline("text-classification", model="alinawazmahar/sindhi-sentiment")
# Sindhi for, roughly, "This book is very good"
result = classifier("هي ڪتاب تمام سٺو آهي")
print(result)
# Example output (the exact score will vary):
# [{'label': 'Positive', 'score': 0.95}]

Or load manually:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("alinawazmahar/sindhi-sentiment")
model = AutoModelForSequenceClassification.from_pretrained("alinawazmahar/sindhi-sentiment")
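
Loading the tokenizer and model by hand leaves the inference step to the caller. The following is a minimal sketch of that step, assuming PyTorch is installed and that the checkpoint's `config.id2label` carries the three label names. The helper names `softmax`, `pick_label`, and `predict` are illustrative and not part of the model repository.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_label(logits, id2label):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id2label[best], "score": probs[best]}


def predict(text, model_id="alinawazmahar/sindhi-sentiment"):
    """Tokenize, run one forward pass, and decode the top label.

    Heavy imports stay inside the function so the helpers above can be
    used without installing PyTorch or downloading the model.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()
    return pick_label(logits, model.config.id2label)
```

Calling `predict(...)` with a Sindhi sentence should return a dict shaped like the pipeline output, although this version reloads the model on every call and would need caching for repeated use.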

Live Demo

Try the model interactively on the Hugging Face Space:
👉 alinawazmahar/sindhi-sentiment (Space)

Intended Use

  • Sentiment analysis of Sindhi news articles
  • Social media monitoring in Sindhi
  • NLP research on low-resource South Asian languages
  • Educational and academic research
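
For monitoring-style uses such as the first two items, individual predictions are usually rolled up into label shares. The sketch below does that for any callable returning pipeline-style output (a list of dicts with `label` and `score` keys). `sentiment_shares`, the `min_score` threshold, and the `Uncertain` bucket are illustrative choices, not part of the model's API.

```python
from collections import Counter


def sentiment_shares(texts, classifier, min_score=0.0):
    """Return the fraction of texts per predicted label.

    `classifier` is any callable that maps a list of strings to a list of
    dicts with 'label' and 'score' keys, e.g. a transformers pipeline.
    Predictions scoring below `min_score` are counted as 'Uncertain'.
    """
    texts = list(texts)
    if not texts:
        return {}
    counts = Counter()
    for pred in classifier(texts):
        label = pred["label"] if pred["score"] >= min_score else "Uncertain"
        counts[label] += 1
    return {label: n / len(texts) for label, n in counts.items()}


# Stand-in classifier so the example runs offline; swap in the real pipeline.
def fake_classifier(texts):
    return [{"label": "Positive", "score": 0.9} for _ in texts]


print(sentiment_shares(["text a", "text b"], fake_classifier))
```

Raising `min_score` is one simple way to keep low-confidence predictions out of the headline numbers, which may matter given the small training set.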

Limitations

  • Trained on newspaper text; may perform differently on informal/social media Sindhi
  • Dataset size is relatively small (1,898 sentences)
  • Roman Sindhi (Latin script) is not supported; the model handles Arabic script only
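
Because only Arabic-script input is supported, it can help to screen text before classification. The following is a minimal sketch, assuming a simple majority-of-letters heuristic over the Unicode Arabic blocks. The function name and the 0.5 threshold are illustrative choices.

```python
# Unicode ranges covering Arabic-script letters, including Sindhi-specific ones.
ARABIC_RANGES = (
    (0x0600, 0x06FF),  # Arabic
    (0x0750, 0x077F),  # Arabic Supplement
    (0x08A0, 0x08FF),  # Arabic Extended-A
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A
    (0xFE70, 0xFEFF),  # Arabic Presentation Forms-B
)


def is_arabic_script(text, threshold=0.5):
    """Return True if more than `threshold` of the letters are Arabic-script."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(
        any(lo <= ord(ch) <= hi for lo, hi in ARABIC_RANGES) for ch in letters
    )
    return arabic / len(letters) > threshold


print(is_arabic_script("هي ڪتاب تمام سٺو آهي"))  # Arabic-script Sindhi
print(is_arabic_script("plain Latin text"))       # Latin script
```

Text that fails the check could be skipped or routed elsewhere rather than sent to the classifier, since predictions on Latin-script input are not expected to be meaningful.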

Citation

If you use this model or dataset in your research, please cite:

@misc{alinawaz2025sindhi,
  author = {Ali Nawaz},
  title  = {Sindhi Sentiment Analysis Model},
  year   = {2025},
  publisher = {Hugging Face},
  url    = {https://huggingface.co/alinawazmahar/sindhi-sentiment},
  institution = {Shaikh Ayaz University}
}

Acknowledgements

Dataset collected from Sindhi newspaper corpora. Developed as part of NLP research at Shaikh Ayaz University.
