BlueMagpie-TTS
Private BlueMagpie-TTS checkpoint for internal research and evaluation.
This repository contains the inference artifact only: model weights, AudioVAE weights, tokenizer files, config, a default Hung-yi Lee speaker centroid, and usage documentation. It does not include optimizer state, scheduler state, training logs, local configs, or training-data metadata.
Intended Use
- Mandarin and mixed Mandarin/English text-to-speech evaluation.
- Internal experiments with reference-audio prompting, continuation prompting, and optional speaker-centroid conditioning.
- Do not redistribute the checkpoint or generated speech unless rights and consent are cleared for the intended use.
Install
git clone https://github.com/OpenFormosa/BlueMagpie-TTS
cd BlueMagpie-TTS
pip install -e ".[train]"
pip install soundfile huggingface_hub
Quick Start
import os

import soundfile as sf
import torch
from huggingface_hub import snapshot_download
from transformers import PreTrainedTokenizerFast

from bluemagpie import BlueMagpieModel

# Download the private checkpoint; token=True uses your stored Hugging Face login.
model_dir = snapshot_download("OpenFormosa/BlueMagpie-TTS", token=True)

# Load the tokenizer from tokenizer.json (works with newer transformers 5.x).
tokenizer = PreTrainedTokenizerFast(tokenizer_file=os.path.join(model_dir, "tokenizer.json"))
model = BlueMagpieModel.from_local(model_dir, tokenizer=tokenizer, training=False, device="cuda")

# Load the bundled centroid table and pick the default speaker's row.
centroids = torch.load(
    f"{model_dir}/checkpoints/hung_yi_lee_speaker_centroids.pt",
    map_location="cpu",
    weights_only=True,
)
speaker_centroid = centroids["centroids"][centroids["speaker_ids"].index("hung_yi_lee")]

# Generate with the recommended defaults and the default speaker centroid.
audio = model.generate(
    target_text="這是 AI TTS code switching 測試。",
    cfg_value=2.8,
    inference_timesteps=9,
    max_len=2000,
    retry_badcase=True,
    speaker_centroid=speaker_centroid,
)

# Move the waveform to CPU and write it at the model's sample rate.
sf.write("sample.wav", audio.detach().cpu().numpy(), model.sample_rate)
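The centroid lookup in the Quick Start raises a bare ValueError if the speaker ID is missing. A minimal sketch of a friendlier lookup follows, assuming only the table layout used above (a "speaker_ids" list and a "centroids" container indexed by the same position). The helper name lookup_centroid is hypothetical and not part of the bluemagpie package.

```python
def lookup_centroid(table, speaker_id):
    """Return the centroid row for speaker_id from a centroid table.

    Assumes the layout used in the Quick Start: table["speaker_ids"] is a
    sequence of strings and table["centroids"] is indexable by position.
    """
    speaker_ids = list(table["speaker_ids"])
    if speaker_id not in speaker_ids:
        raise KeyError(
            f"speaker {speaker_id!r} not in table; available: {speaker_ids}"
        )
    return table["centroids"][speaker_ids.index(speaker_id)]
```

With the table loaded as in the Quick Start, lookup_centroid(centroids, "hung_yi_lee") should return the same row as the inline expression.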
Reference Audio Prompting
audio = model.generate(
    target_text="今天的會議改到下午三點。",
    reference_wav_path="reference_speaker.wav",
    cfg_value=2.8,
    inference_timesteps=9,
)
Only use reference audio from speakers you have permission to synthesize.
Recommended Defaults
The current recommended defaults are also recorded in config.json under
generation_defaults and in release_metadata.json under
recommended_generation_defaults.
- cfg_value=2.8
- inference_timesteps=9
- max_len=2000
- retry_badcase=True
- default speaker centroid: checkpoints/hung_yi_lee_speaker_centroids.pt (speaker_id="hung_yi_lee", source dataset voidful/hung-yi_lee)
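Since the defaults are recorded in config.json, they can be read back instead of being retyped. The sketch below assumes generation_defaults is a flat JSON object; the exact key layout is not documented here, so inspect the file before relying on it. The helper name load_generation_defaults is hypothetical.

```python
import json
import os


def load_generation_defaults(model_dir):
    """Read generation_defaults from config.json, or {} if the key is absent."""
    with open(os.path.join(model_dir, "config.json"), encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the value is a flat mapping of parameter names to values.
    return dict(config.get("generation_defaults", {}))
```

If the stored keys match the keyword arguments of model.generate (an assumption), the result could be passed as model.generate(target_text=..., **load_generation_defaults(model_dir)).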
The bundled hung_yi_lee speaker centroid is included with the speaker's
permission for use as an example voice. For any other speaker, obtain that
speaker's authorization before synthesizing.
These defaults were selected on /home/voidful/tts_hard_sentences_zh_500.txt
using MediaTek-Research/Breeze-ASR-25 with normalized CER. The best trial was
hy_cfg2p8_steps9: CER 0.09669792733863977, TER 0.0911015155363644,
with 1227/12689 character errors.
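The reported CER is the ratio of character errors to reference characters: 1227 / 12689 ≈ 0.0967. As a rough illustration of how a corpus-level CER of this kind is commonly computed, the sketch below sums character-level Levenshtein distances over all (reference, ASR transcript) pairs and divides by the total reference length. The text normalization used for the reported numbers is not specified here, so this sketch applies none, and the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Character-level Levenshtein distance between two strings."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_cer(pairs):
    """Total character errors divided by total reference characters."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total = sum(len(ref) for ref, _ in pairs)
    return errors / total if total else 0.0
```

Here each pair would hold the input text and the ASR transcript of the synthesized audio, both normalized the same way before scoring.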
Long Text
For long-form synthesis, split text into sentence-sized chunks and concatenate
the generated waveforms. For stronger continuity, pass a short approved prompt
clip with prompt_text and prompt_wav_path, then synthesize the next chunk.
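The chunk-and-concatenate approach can be sketched as follows. This is a minimal illustration, not the repository's own long-form pipeline: split_sentences and synthesize_long are hypothetical helpers, the split covers only CJK and common ASCII sentence-final marks (it deliberately ignores "." to avoid breaking decimals), and the short silence inserted between chunks is an assumption, not a documented default.

```python
import re

import numpy as np

# Zero-width split point right after a sentence-final punctuation mark.
_SENTENCE_END = re.compile(r"(?<=[。！？!?；;])")


def split_sentences(text):
    """Split text after sentence-final punctuation, keeping the punctuation."""
    return [p.strip() for p in _SENTENCE_END.split(text) if p.strip()]


def synthesize_long(text, synth, sample_rate, pause_sec=0.2):
    """Call synth(sentence) per chunk and join the 1-D waveforms with a pause."""
    pause = np.zeros(int(sample_rate * pause_sec), dtype=np.float32)
    pieces = []
    for sentence in split_sentences(text):
        wav = np.asarray(synth(sentence), dtype=np.float32).reshape(-1)
        if pieces:
            pieces.append(pause)  # silence between consecutive chunks
        pieces.append(wav)
    return np.concatenate(pieces) if pieces else np.zeros(0, dtype=np.float32)
```

In practice, synth could wrap the call from the Quick Start, for example lambda s: model.generate(target_text=s, cfg_value=2.8, inference_timesteps=9).detach().cpu().numpy(), with sample_rate=model.sample_rate.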
Evaluation
Numbers below are from an internal held-out evaluation set. The eval set and training data are intentionally not described in this private model card.
| System | Setting | CER | WER |
|---|---|---|---|
| BlueMagpie-TTS | selected checkpoint | 4.81% | 5.36% |
| Reference baseline | same internal eval | 11.45% | 14.83% |
Selected checkpoint speed diagnostics on the same internal eval:
| Metric | Value |
|---|---|
| Median duration-units/sec | 4.748 |
| Max duration-units/sec | 5.288 |
Limitations
- Metrics are not a public benchmark and should be used only for internal model selection.
- Speaker similarity depends on the quality and rights-cleared status of the supplied reference audio or centroid.
- Very long passages should be chunked to avoid stop-token and prosody drift.
- Generated speech may be incorrect; do not use it as a real-world notification without human review.
Files
- pytorch_model.bin: BlueMagpie model weights.
- audiovae.pth: AudioVAE weights.
- config.json: BlueMagpie architecture/runtime config.
- tokenizer.json, tokenizer_config.json: tokenizer files.
- checkpoints/hung_yi_lee_speaker_centroids.pt: default speaker centroid table used by the recommended defaults.
- USAGE.md: expanded usage guide.