iast_wyile_combined_final
Overview
This repository hosts a continually pretrained bert-base-multilingual-cased
checkpoint focused on Sanskrit written both in Devanagari and in the Wylie
transliteration scheme. The training corpus merges Sanskrit material aligned to
Intellexus' downstream tasks with the combined Wylie/Sanskrit dataset curated in
sandbox/sanskrit_wyele_combined.
Model Details
- Base model: bert-base-multilingual-cased
- Objective: Masked Language Modeling
- Tokenizer: WordPiece, compatible with the mBERT vocabulary
- Training regime: continual pretraining on mixed Sanskrit + Wylie sentences
Quick Start
```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

repo_id = "Intellexus/iast_wyile_combined_final"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForMaskedLM.from_pretrained(repo_id)
```
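Once the checkpoint is loaded, the masked-LM head can be queried for predictions at a `[MASK]` position. A minimal sketch of extracting top-k candidates from the model's logits is below; the helper name `topk_mask_predictions` is illustrative, not part of this repository's API, and it works on any `(1, seq_len, vocab_size)` logits tensor:

```python
import torch


def topk_mask_predictions(logits, mask_index, tokenizer=None, k=5):
    """Return the top-k token ids (or tokens, if a tokenizer is given)
    predicted at position `mask_index` of a (1, seq_len, vocab) logits tensor."""
    probs = torch.softmax(logits[0, mask_index], dim=-1)
    top = torch.topk(probs, k)
    ids = top.indices.tolist()
    if tokenizer is not None:
        # Map ids back to surface tokens together with their probabilities.
        return list(zip(tokenizer.convert_ids_to_tokens(ids), top.values.tolist()))
    return ids
```

Typical usage, assuming `tokenizer` and `model` from the Quick Start above: encode a sentence containing `tokenizer.mask_token`, run `model(**inputs).logits`, locate the mask position via `inputs["input_ids"] == tokenizer.mask_token_id`, and pass the logits and that index to the helper.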
Intended Use & Limitations
This model is designed for research on Sanskrit corpora, in both Unicode (Devanagari) and Wylie-transliterated form. It inherits any biases present in mBERT and in the Intellexus Sanskrit datasets; evaluate it on your downstream tasks before deployment.
Citation
Please cite Intellexus or link back to https://huggingface.co/Intellexus/iast_wyile_combined_final when you use this model.