SE-Bridge-TTS Weights
This model repository hosts the public release checkpoints for SE-Bridge-TTS, the model accompanying the ICML 2026 paper "Bridging the Stability-Expressivity Gap: Synthetic Data Scaling and Preference Alignment for Low-Resource Spoken Language Models".
Links
- Project page: https://piedpiperg.github.io/SE-Bridge-TTS/
- GitHub repository: https://github.com/piedpiperG/SE-Bridge-TTS
- arXiv paper: https://arxiv.org/abs/2605.27383
- Hugging Face model repository: https://huggingface.co/isabeth/SE-Bridge-TTS
Hugging Face Classification
- Repository type: `model`
- Task / pipeline: `text-to-speech`
- Library: `pytorch`
- Languages: Thai (`th`) and Lao (`lo`)
- Primary tags: `text-to-speech`, `speech-synthesis`, `thai`, `lao`, `low-resource`, `spoken-language-model`
Files
| File | Description |
|---|---|
| `thai_tts.pt` | Public Thai TTS checkpoint. |
| `lao_tts.pt` | Public Lao TTS checkpoint. |
| `release_config.json` | Sanitized release metadata for the two checkpoints. |
Inference
The released files are CosyVoice2 LLM checkpoints. They are intended to be loaded with a CosyVoice2-compatible checkout and the standard CosyVoice2 base model assets. The base model directory should contain the normal CosyVoice2 configuration and acoustic/vocoder weights, while this repository supplies the Thai or Lao LLM checkpoint.
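As a quick sanity check before loading, the directory layout described above can be verified with a short script. This is a minimal sketch: the file names in `ASSUMED_BASE_ASSETS` reflect a typical CosyVoice2-0.5B download and are an assumption, not something this repository specifies, so adjust them to match your base model directory. `missing_assets` is a hypothetical helper, not part of CosyVoice.

```python
from pathlib import Path

# Assumption: file names commonly found in a CosyVoice2-0.5B base model
# directory. This repository does not list them; edit to match your download.
ASSUMED_BASE_ASSETS = ("cosyvoice2.yaml", "flow.pt", "hift.pt")


def missing_assets(base_dir, expected=ASSUMED_BASE_ASSETS):
    """Return the expected file names that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in expected if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_assets("pretrained_models/CosyVoice2-0.5B")
    if missing:
        print("Base model directory is missing:", ", ".join(missing))
    else:
        print("Base model assets found.")
```

An empty list means every assumed asset is present; the Thai or Lao checkpoint from this repository is loaded separately and is not part of this check.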
Install or prepare CosyVoice first:
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
cd CosyVoice
pip install -r requirements.txt
pip install huggingface_hub torchaudio
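The inference example in this section appends `third_party/Matcha-TTS` to `sys.path` as a relative path, so it appears to assume that the script is started from the root of the CosyVoice checkout. The following is a small pre-flight sketch under that assumption; `missing_modules` is a hypothetical helper and the module list simply mirrors the imports used in the example.

```python
import importlib.util
from pathlib import Path


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the script is started from the CosyVoice repository root.
    if not Path("third_party/Matcha-TTS").is_dir():
        print(
            "third_party/Matcha-TTS not found; run from the CosyVoice root "
            "and make sure git submodules are checked out."
        )
    absent = missing_modules(["torch", "torchaudio", "huggingface_hub", "cosyvoice"])
    print("Missing modules:", absent or "none")
```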
Minimal zero-shot inference example:
import sys
from pathlib import Path
import torch
import torchaudio
from huggingface_hub import snapshot_download
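# Matcha-TTS is a git submodule of CosyVoice; this relative path assumes
# the working directory is the CosyVoice repository root.
sys.path.append("third_party/Matcha-TTS")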
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav
HF_REPO_ID = "isabeth/SE-Bridge-TTS"
BASE_MODEL_DIR = Path("pretrained_models/CosyVoice2-0.5B")
language = "lao" # choose "thai" or "lao"
checkpoint_name = {
    "thai": "thai_tts.pt",
    "lao": "lao_tts.pt",
}[language]
weights_dir = Path(snapshot_download(HF_REPO_ID))
checkpoint_path = weights_dir / checkpoint_name
cosyvoice = CosyVoice2(
    str(BASE_MODEL_DIR),
    load_jit=False,
    load_trt=False,
    load_vllm=False,
    fp16=False,
)
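# Overwrite the base LLM weights with the selected Thai or Lao checkpoint.
# strict=False tolerates keys that do not match the LLM module.
state_dict = torch.load(checkpoint_path, map_location="cpu")
cosyvoice.model.llm.load_state_dict(state_dict, strict=False)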
prompt_speech_16k = load_wav("prompt.wav", 16000)
prompt_text = "Transcript of prompt.wav."
tts_text = "Text to synthesize in the selected language."
for idx, output in enumerate(
    cosyvoice.inference_zero_shot(
        tts_text,
        prompt_text,
        prompt_speech_16k,
        stream=False,
    )
):
    torchaudio.save(
        f"se_bridge_tts_{language}_{idx}.wav",
        output["tts_speech"],
        cosyvoice.sample_rate,
    )
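Because the example passes `strict=False`, `load_state_dict` does not raise when checkpoint and model keys differ, so a mismatched checkpoint can load without any error. One way to make that visible is to compare the two key sets before loading. The helper here is a sketch, not part of CosyVoice or this release; `diff_state_dict_keys` is a hypothetical name, and it accepts any iterables of key names, for example `cosyvoice.model.llm.state_dict().keys()` and `state_dict.keys()`.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare parameter names expected by the model with those in a checkpoint.

    Returns (missing, unexpected): keys the model expects but the checkpoint
    lacks, and keys the checkpoint carries that the model does not use.
    """
    model_set = set(model_keys)
    checkpoint_set = set(checkpoint_keys)
    missing = sorted(model_set - checkpoint_set)
    unexpected = sorted(checkpoint_set - model_set)
    return missing, unexpected


if __name__ == "__main__":
    # Toy key names for illustration only; real names come from the model.
    missing, unexpected = diff_state_dict_keys(
        ["llm.weight", "llm.bias"],
        ["llm.weight", "step"],
    )
    print("missing:", missing)        # keys left at their base-model values
    print("unexpected:", unexpected)  # keys ignored by strict=False
```

PyTorch's `load_state_dict` also returns an object exposing `missing_keys` and `unexpected_keys`, which can be inspected after the call for the same purpose. A long `missing` list would suggest that the checkpoint does not correspond to the LLM module it is being loaded into.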
For cross-lingual prompting, use the same loaded model and replace the generation loop with:
for idx, output in enumerate(
    cosyvoice.inference_cross_lingual(
        tts_text,
        prompt_speech_16k,
        stream=False,
    )
):
    torchaudio.save(
        f"se_bridge_tts_{language}_cross_lingual_{idx}.wav",
        output["tts_speech"],
        cosyvoice.sample_rate,
    )
Release Notes
This release package has been sanitized for public distribution. Internal server paths, private data paths, training-stage names, and operational configuration details are intentionally omitted. The repository does not describe per-stage checkpoint construction methods.