SanskritBERT (Light)
SanskritBERT is a lightweight Transformer model trained specifically for the Sanskrit language. It is based on the BERT architecture and trained using the Masked Language Modeling (MLM) objective.
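The MLM objective can be illustrated with a minimal, self-contained sketch of BERT's standard corruption rule (select ~15% of tokens; of those, replace 80% with `[MASK]`, 10% with a random vocabulary token, and leave 10% unchanged). This is generic BERT behavior for illustration, not code from this repository:

```python
import random

def mlm_mask(tokens, vocab, mask_token="[MASK]", mask_prob=0.15, rng=None):
    """BERT-style MLM corruption (illustrative sketch).

    Each token is selected with probability `mask_prob`; a selected token
    becomes `mask_token` 80% of the time, a random vocab token 10%, or is
    kept unchanged 10%. Returns (corrupted tokens, [(index, original)])
    so a model can be trained to recover the originals.
    """
    rng = rng or random.Random(0)
    out, labels = [], []
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels.append((i, tok))  # the model must predict `tok` here
            r = rng.random()
            if r < 0.8:
                out.append(mask_token)
            elif r < 0.9:
                out.append(rng.choice(vocab))
            else:
                out.append(tok)
        else:
            out.append(tok)
    return out, labels
```

During pre-training only the positions recorded in `labels` contribute to the cross-entropy loss; all other positions are ignored.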
Model Description
- Shared by: Tanuj Saxena and Soumya Sharma
- Model type: Transformer encoder (BERT-like)
- Language: Sanskrit
- License: Apache 2.0
- Finetuned from model: None (Trained from scratch)
Model Architecture
- Layers: 6
- Hidden Size: 256
- Attention Heads: 4
- Feedforward Size: 1024
- Max Sequence Length: 512
- Vocab Size: 64,000
- Parameters: ~15M
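Using the standard Hugging Face hyperparameter names, this architecture could be expressed as the following config sketch; the checkpoint's published `config.json` is the authoritative source:

```python
from transformers import BertConfig

# Hypothetical reconstruction of the hyperparameters listed above;
# defer to the model's own config.json for the exact values.
config = BertConfig(
    vocab_size=64_000,
    hidden_size=256,
    num_hidden_layers=6,
    num_attention_heads=4,
    intermediate_size=1024,
    max_position_embeddings=512,
)
```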
Intended Uses & Limitations
Intended Uses
- Masked Word Prediction
- Fine-tuning for Sanskrit NLP tasks (POS tagging, NER, text classification)
- Research into low-resource language modeling
Limitations
- The model is "Light", so it may not capture as much nuance as a `bert-base` or `bert-large` model.
- Performance depends heavily on the domain of the downstream task relative to the pre-training corpus.
Training Data
Trained on a corpus of Sanskrit text drawn from general literature, wikis, and classical works.
Training Procedure
- Optimizer: AdamW
- Precision: Mixed Precision (bf16)
- Batch Size: 16
- Epochs: 6
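The stated recipe could be expressed with `transformers.TrainingArguments` roughly as below. This is a hedged reconstruction: the output directory is a placeholder, and the learning rate, scheduler, and other settings are not published here:

```python
from transformers import TrainingArguments

# Sketch of the training setup listed above; the authors' exact
# script and remaining hyperparameters are not documented here.
training_args = TrainingArguments(
    output_dir="sanskritbert-mlm",   # placeholder path
    per_device_train_batch_size=16,  # Batch Size: 16
    num_train_epochs=6,              # Epochs: 6
    bf16=True,                       # Mixed Precision (bf16)
    optim="adamw_torch",             # AdamW optimizer
)
```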
How to Get Started
You can use the model directly with the Hugging Face transformers library:
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("tanuj437/SanskritBERT")
model = AutoModelForMaskedLM.from_pretrained("tanuj437/SanskritBERT")

text = "सत्यमेव जयते [MASK]"
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)

# Decode the highest-scoring token at the [MASK] position
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
print(tokenizer.decode(outputs.logits[0, mask_index].argmax()))
```
Citation
@misc{sanskritbert2024,
  title={SanskritBERT: A Light Transformer Model for Sanskrit},
  author={Saxena, Tanuj and Sharma, Soumya and Lata, Kusum},
  year={2024},
  publisher={Hugging Face}
}