iast_wyile_combined_final

Overview

This repository hosts a continually pretrained bert-base-multilingual-cased checkpoint focused on Sanskrit written in both Devanagari and the Wylie transliteration scheme. The training corpus merges Sanskrit material aligned to Intellexus' downstream tasks with the combined Wylie/Sanskrit dataset curated in sandbox/sanskrit_wyele_combined.

Model Details

  • Base model: bert-base-multilingual-cased
  • Objective: Masked Language Modeling
  • Tokenizer: WordPiece, compatible with mBERT's vocabulary
  • Training regime: continual pretraining with masked language modeling on mixed Sanskrit + Wylie sentences (a minimal setup sketch follows below)
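
The exact training configuration is not published in this card; the snippet below is only a minimal sketch of what continual MLM pretraining on such a corpus could look like with the Hugging Face Trainer. The file name sanskrit_wylie_sentences.txt and all hyperparameters are illustrative assumptions, not the values actually used.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical plain-text corpus: one Sanskrit/Wylie sentence per line.
raw = load_dataset("text", data_files={"train": "sanskrit_wylie_sentences.txt"})

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are selected for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="iast_wyile_combined_final",
    per_device_train_batch_size=32,
    num_train_epochs=3,
    learning_rate=5e-5,
)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()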

Quick Start

from transformers import AutoTokenizer, AutoModelForMaskedLM

repo_id = "Intellexus/iast_wyile_combined_final"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)
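
To sanity-check the masked-language-modeling head, you can wrap the loaded model in a fill-mask pipeline. The sentence below is only an illustration (Bhagavad Gita 1.1 with one word masked); any Devanagari or Wylie text containing the tokenizer's mask token works.

from transformers import pipeline

fill = pipeline("fill-mask", model=model, tokenizer=tokenizer)

# Mask one word of a well-known Sanskrit line and inspect the top predictions.
masked = f"धर्मक्षेत्रे {tokenizer.mask_token} समवेता युयुत्सवः"
for prediction in fill(masked, top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))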

Intended Use & Limitations

The model is intended for research on Sanskrit corpora, both in Devanagari Unicode and in Wylie transliteration. It inherits any biases present in mBERT and in the Intellexus Sanskrit datasets, so evaluate it on your own downstream tasks before deployment; a small pseudo-likelihood check is sketched below.
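
One lightweight check is pseudo-log-likelihood scoring: mask each token of a held-out sentence in turn and sum the log-probability of the true token. The sentences list below is a placeholder, assuming you substitute your own evaluation data; it reuses the tokenizer and model loaded in Quick Start.

import torch

# Placeholder sentences; replace with your own held-out Sanskrit/Wylie text.
sentences = ["धर्मक्षेत्रे कुरुक्षेत्रे समवेता युयुत्सवः"]

model.eval()
for text in sentences:
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"][0]
    total = 0.0
    # Mask each non-special token in turn and score the true token.
    for pos in range(1, input_ids.size(0) - 1):
        masked = input_ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits
        log_probs = torch.log_softmax(logits[0, pos], dim=-1)
        total += log_probs[input_ids[pos]].item()
    print(f"{text}  pseudo-log-likelihood={total:.2f}")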

Citation

Please cite Intellexus or link back to https://huggingface.co/Intellexus/iast_wyile_combined_final when you use this model.
