# Sanskrit POS Tagger

This model is a BERT/ALBERT model fine-tuned for Sanskrit part-of-speech (POS) tagging.
## Model Description
- Fine-tuning Data: Sanskrit POS dataset (Universal Dependencies format)
## Intended Use

This model is intended for linguistic analysis of Sanskrit text, specifically for identifying the grammatical category of each word (noun, verb, etc.).
## Performance
- Accuracy: ~89.7%
- F1 Score: ~89.6%
## Usage
```python
from transformers import pipeline, AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("tanuj437/sanskrit-pos-bert")
model = AutoModelForTokenClassification.from_pretrained("tanuj437/sanskrit-pos-bert")

# aggregation_strategy="simple" merges subword pieces back into whole words
nlp = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")

result = nlp("रामः वनम् गच्छति")  # "Rāma goes to the forest"
print(result)
```
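The pipeline returns a list of dictionaries; with `aggregation_strategy="simple"`, each entry carries the merged word under `"word"` and the predicted tag under `"entity_group"`. The sketch below shows how to reduce that output to `(word, tag)` pairs for downstream analysis. The sample output is illustrative only, not an actual model prediction.

```python
# Illustrative pipeline output (not an actual prediction from this model)
sample_output = [
    {"entity_group": "NOUN", "word": "रामः", "score": 0.98, "start": 0, "end": 4},
    {"entity_group": "NOUN", "word": "वनम्", "score": 0.97, "start": 5, "end": 9},
    {"entity_group": "VERB", "word": "गच्छति", "score": 0.99, "start": 10, "end": 16},
]

def to_tagged_pairs(pipeline_output):
    """Reduce token-classification pipeline dicts to (word, tag) pairs."""
    return [(item["word"], item["entity_group"]) for item in pipeline_output]

print(to_tagged_pairs(sample_output))
```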
## Limitations

The model may be less accurate on poetry (shlokas) containing complex sandhi if the text is not pre-segmented; the tokenizer handles subwords but cannot resolve sandhi on its own.
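One way to work around this is to split known sandhi forms before tagging. The sketch below uses a tiny lookup table as a stand-in; real sandhi resolution needs a dedicated tool, and the table entries here are hypothetical examples, not part of the model.

```python
# Hypothetical sandhi-split table for illustration; a real workflow would use
# a dedicated sandhi splitter instead of a hand-written lookup.
SANDHI_SPLITS = {
    "रामोऽवदत्": ["रामः", "अवदत्"],  # rāmo 'vadat -> rāmaḥ avadat
}

def presegment(text):
    """Replace known sandhi forms with their split components before tagging."""
    tokens = []
    for word in text.split():
        tokens.extend(SANDHI_SPLITS.get(word, [word]))
    return " ".join(tokens)

print(presegment("रामोऽवदत् वनम्"))  # → "रामः अवदत् वनम्"
```

The pre-segmented string can then be passed to the tagging pipeline as usual.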
## Citation

```bibtex
@misc{sanskrit-pos-bert,
  author       = {Tanuj Saxena and Soumya Sharma},
  title        = {Sanskrit POS Tagger using BERT},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/tanuj437/sanskrit-pos-bert}}
}
```
Base model: tanuj437/SanskritBERT