UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction
Paper • 2606.11681 • Published
This repository provides the URBERT backbone as a Hugging Face transformers model.
The uploaded checkpoint is a BERT encoder trained in the URBERT pipeline with character-level uroman tokenization.
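To make "character-level" concrete, here is a minimal sketch of how romanized text could be split into one token per character while a special token such as "[MASK]" stays whole. The vocabulary, the id assignment, the treatment of spaces as tokens, and the helper names `char_tokenize` and `encode` are all hypothetical and chosen for illustration; the released tokenizer defines its own vocabulary and rules, and this sketch does not perform uroman romanization itself.

```python
import re

# Hypothetical toy vocabulary (an assumption, not the released one).
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[MASK]"]
CHARS = list("abcdefghijklmnopqrstuvwxyz ")
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + CHARS)}

# Try each special token first; otherwise match any single character.
_PATTERN = re.compile(
    "|".join(re.escape(tok) for tok in SPECIAL_TOKENS) + "|.",
    re.DOTALL,
)


def char_tokenize(text):
    """Split text into single characters, keeping special tokens whole."""
    return _PATTERN.findall(text)


def encode(text):
    """Map tokens to ids, falling back to [UNK] for unknown characters."""
    unk_id = VOCAB["[UNK]"]
    return [VOCAB.get(tok, unk_id) for tok in char_tokenize(text)]


tokens = char_tokenize("hello [MASK] urbert")
ids = encode("hello [MASK] urbert")
print(tokens)
print(ids)
```

In this sketch every character becomes its own token, so the sequence length tracks the length of the romanized string, and "[MASK]" counts as a single position rather than six.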
The model loads through AutoModel (bert-base-uncased config family), and the tokenizer loads through AutoTokenizer.

```python
import torch
from transformers import AutoModel, AutoTokenizer

REPO_ID = "Sanghyang00/urbert-256"
device = "cuda" if torch.cuda.is_available() else "cpu"

# force_download=True re-fetches the tokenizer files instead of reusing the local cache.
tokenizer = AutoTokenizer.from_pretrained(REPO_ID, force_download=True)
model = AutoModel.from_pretrained(REPO_ID).to(device).eval()

text = "hello urbert"
# Encode the text without adding special tokens.
inputs = tokenizer(text, add_special_tokens=False, return_tensors="pt")
inputs = {k: v.to(device) for k, v in inputs.items()}

# Run the encoder without tracking gradients.
with torch.no_grad():
    outputs = model(**inputs)
    last_hidden = outputs.last_hidden_state

print("input_ids:", inputs["input_ids"].tolist())
print("input shape:", tuple(inputs["input_ids"].shape))
print("last_hidden shape:", tuple(last_hidden.shape))
```
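The example above encodes a single string. When several texts of different lengths are encoded together, the hidden states come back padded to a common length, and the padded positions usually need to be dropped before the per-token states are passed on. The sketch below shows one way to do that with the attention mask. It uses dummy tensors in place of real model outputs, the hidden size of 8 is arbitrary, and `split_padded_states` is a hypothetical helper name. Whether the released tokenizer supports `padding=True` out of the box is an assumption to verify.

```python
import torch


def split_padded_states(last_hidden, attention_mask):
    """Return one (length_i, hidden) tensor per sequence, dropping padded positions."""
    keep = attention_mask.bool()
    return [last_hidden[i][keep[i]] for i in range(last_hidden.size(0))]


# Dummy stand-ins for model outputs: batch of 2, padded length 5, hidden size 8.
last_hidden = torch.randn(2, 5, 8)
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 0, 0]])

per_text = split_padded_states(last_hidden, attention_mask)
print([tuple(t.shape) for t in per_text])
```

Because the selection is driven by the mask rather than by slicing to a length, the helper works the same way for left-padded and right-padded batches.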
The tokenizer is loaded with AutoTokenizer.
"[MASK]" is treated as one special token by the HF tokenizer.

For training code, data processing, and experiment setup, please refer to:
If you use this model in your research or applications, please cite:
@article{lee2026urbert,
  title = {UR-BERT: Scaling Text Encoders for Massively Multilingual TTS Through Universal Romanization and Speech Token Prediction},
  author = {Lee, Sangmin and Ahn, Eekgyun and Choi, Woongjib and Kang, Hong-Goo},
  journal = {arXiv preprint arXiv:2606.11681},
  year = {2026}
}