
sanskrit-bert-from-scratch

This is a BERT model that has been pre-trained from scratch on a large corpus of transliterated Sanskrit text. Unlike multilingual models, its vocabulary and weights are tailored specifically to the Sanskrit language as represented in the training data.

This model was trained as part of the Intelexsus project. A companion model, continually pre-trained from bert-base-multilingual-cased, is available at OMRIDRORI/mbert-sanskrit-continual.

Model Details

  • Model type: BERT (bert-base-uncased architecture)
  • Language: Sanskrit (sa)
  • Training Corpus: A custom corpus of transliterated Sanskrit text collected for the Intelexsus project.
  • Training objective: Masked Language Modeling (MLM).
  • Architecture: 12-layer, 768-hidden, 12-heads.
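These dimensions match the standard bert-base configuration. As a local sanity check, the architecture can be reconstructed with a BertConfig; note that the vocab_size below is the bert-base-uncased default and is an assumption here, since the model's actual vocabulary size is determined by its custom tokenizer:

```python
from transformers import BertConfig, BertForMaskedLM

# Dimensions stated in the model card; vocab_size is an assumption
# (bert-base-uncased default) -- the real vocabulary may differ.
config = BertConfig(
    num_hidden_layers=12,
    hidden_size=768,
    num_attention_heads=12,
    vocab_size=30522,
)
model = BertForMaskedLM(config)

# With these defaults the parameter count lands around 110M,
# consistent with a bert-base-sized model.
n_params = sum(p.numel() for p in model.parameters())
print(f"~{n_params / 1e6:.0f}M parameters")
```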

How to Use

You can use this model directly with the transformers library for the fill-mask task.

from transformers import pipeline

# Model ID on the Hugging Face Hub
model_name = "OMRIDRORI/sanskrit-bert-from-scratch"
unmasker = pipeline('fill-mask', model=model_name)

# Example sentence in IAST transliteration:
# "That great sage spoke with this [MASK] speech"
result = unmasker("sa maharṣir uvāca anena [MASK] vacanena")

print(result)
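Each entry in the returned list is a dict with score, token, token_str, and sequence keys (the pipeline returns the top 5 candidates by default; this is adjustable via its top_k argument). A small post-processing sketch, shown on a made-up example result rather than real model output:

```python
# Rank fill-mask candidates by score and format them for display.
def format_predictions(result):
    ranked = sorted(result, key=lambda c: c["score"], reverse=True)
    return [f"{c['token_str']}  ({c['score']:.3f})" for c in ranked]

# Hypothetical output for illustration only -- not real model predictions.
example = [
    {"score": 0.08, "token": 1234, "token_str": "tena", "sequence": "..."},
    {"score": 0.21, "token": 5678, "token_str": "evam", "sequence": "..."},
]
for line in format_predictions(example):
    print(line)
```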

You can also load the model and tokenizer directly for more control:

from transformers import AutoTokenizer, AutoModelForMaskedLM

# Model ID on the Hugging Face Hub
model_name = "OMRIDRORI/sanskrit-bert-from-scratch"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# You can now use the model for your own fine-tuning and inference tasks.
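For MLM fine-tuning on your own corpus, batches are typically prepared with DataCollatorForLanguageModeling, which applies BERT-style random masking and sets labels to -100 everywhere except at masked positions. A self-contained sketch using a toy vocabulary so it runs without downloading the model files; with the real model you would pass the tokenizer loaded above instead:

```python
import tempfile

from transformers import BertTokenizerFast, DataCollatorForLanguageModeling

# Toy WordPiece vocabulary, used only so this sketch is runnable offline.
vocab = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]",
         "sa", "maha", "##rsir", "uvaca", "anena", "vacanena"]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(vocab))
    vocab_path = f.name

tokenizer = BertTokenizerFast(vocab_file=vocab_path, do_lower_case=True)

# Randomly masks 15% of tokens (the standard BERT scheme); labels are
# -100 except at the masked positions, so the loss ignores the rest.
collator = DataCollatorForLanguageModeling(
    tokenizer, mlm=True, mlm_probability=0.15
)

encodings = [tokenizer(t) for t in ["sa maharsir uvaca", "anena vacanena"]]
batch = collator(encodings)
print(batch["input_ids"].shape, batch["labels"].shape)
```

The resulting batch can be fed directly to the model's forward pass (or to a Trainer) for masked-language-model training.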