iast_wyile_combined_final

Overview

This repository hosts a continually pretrained bert-base-multilingual-cased checkpoint focused on Sanskrit written in both Devanagari and the Wylie transliteration scheme. The training corpus merges Sanskrit material aligned to Intellexus' downstream tasks with the combined Wylie/Sanskrit dataset curated in sandbox/sanskrit_wyele_combined.

Model Details

  • Base model: bert-base-multilingual-cased
  • Objective: Masked Language Modeling
  • Tokenizer: WordPiece, compatible with mBERT's vocabulary
  • Training regime: continual pretraining with masked language modeling on mixed Sanskrit + Wylie sentences (a minimal setup sketch follows below)
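
The exact training configuration is not published in this card; the snippet below is only a minimal sketch of what continual MLM pretraining on such a corpus could look like with the Hugging Face Trainer. The file name sanskrit_wylie_sentences.txt and all hyperparameters are illustrative assumptions, not the values actually used.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical plain-text corpus: one Sanskrit/Wylie sentence per line.
raw = load_dataset("text", data_files={"train": "sanskrit_wylie_sentences.txt"})

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are selected for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="iast_wyile_combined_final",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()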

Quick Start

from transformers import AutoTokenizer, AutoModelForMaskedLM

repo_id = "Intellexus/iast_wyile_combined_final"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)
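
To sanity-check the masked-language-modeling head, you can wrap the loaded model in a fill-mask pipeline. The sentence below is only an illustration (Bhagavad Gita 1.1 with one word masked); any Devanagari or Wylie text containing the tokenizer's mask token works.

from transformers import pipeline

fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Mask one word of a well-known Sanskrit line and inspect the top predictions.
masked = f"धर्मक्षेत्रे {tokenizer.mask_token} समवेता युयुत्सवः"
for prediction in fill(masked, top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))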

Intended Use & Limitations

The model is intended for research on Sanskrit corpora, both in Devanagari Unicode and in Wylie transliteration. It inherits any biases present in mBERT and in the Intellexus Sanskrit datasets, so evaluate it on your own downstream tasks before deployment; a small pseudo-likelihood check is sketched below.
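
One lightweight check is pseudo-log-likelihood scoring: mask each token of a held-out sentence in turn and sum the log-probability of the true token. The sentences list below is a placeholder, assuming you substitute your own evaluation data; it reuses the tokenizer and model loaded in Quick Start.

import torch

# Placeholder sentences; replace with your own held-out Sanskrit/Wylie text.
sentences = ["धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः"]

model.eval()
for text in sentences:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Mask each non-special token in turn and score the true token.
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, pos], dim=-1)
        total += log_probs[input_ids[pos]].item()
    print(f"{text}  pseudo-log-likelihood={total:.2f}")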

Citation

Please cite Intellexus or link back to https://huggingface.co/Intellexus/iast_wyile_combined_final when you use this model.
