Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis
This is a faster-whisper compatible conversion of MediaTek-Research/Breeze-ASR-26, converted to CTranslate2 format with float16 quantization.
BreezeASR-Taigi is a Taiwanese Hokkien (Taigi / 台語) automatic speech recognition (ASR) model developed as part of the Breeze Taigi framework. It is fine-tuned from openai/whisper-large-v2 on approximately 10,000 hours of large-scale synthetic Taiwanese Hokkien speech data. The model transcribes spoken Taigi audio and outputs Mandarin Chinese character transcriptions.
| Property | Value |
|---|---|
| Source model | MediaTek-Research/Breeze-ASR-26 |
| Architecture | Whisper Large V2 |
| Quantization | float16 |
| Model size | ~2.9 GB (vs ~6.2 GB float32 original) |
| CTranslate2 version | 4.7.1 |
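The card does not show the conversion step itself. As a rough sketch, a float16 conversion like this could be reproduced with CTranslate2's `ct2-transformers-converter` command-line tool. The helper names, the output directory, and the list of copied files below are assumptions for illustration, not the settings actually used for this repository.

```python
import subprocess


def build_convert_command(source_model, output_dir, quantization="float16",
                          copy_files=("tokenizer.json", "preprocessor_config.json")):
    """Build the argument list for ct2-transformers-converter.

    copy_files is an assumption: it lists files commonly copied next to
    converted Whisper weights, and may not match what the source repo contains.
    """
    command = [
        "ct2-transformers-converter",
        "--model", source_model,
        "--output_dir", output_dir,
        "--quantization", quantization,
    ]
    if copy_files:
        command += ["--copy_files", *copy_files]
    return command


def convert(source_model, output_dir):
    """Run the conversion (needs ctranslate2 and transformers installed)."""
    subprocess.run(build_convert_command(source_model, output_dir), check=True)


# Only print the command here; call convert(...) to actually run it.
command = build_convert_command("MediaTek-Research/Breeze-ASR-26", "Breeze-ASR-26-ct2")
print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect the command before downloading several gigabytes of weights.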
Run inference:
from faster_whisper import WhisperModel

# Load the converted model on GPU with float16 weights.
model = WhisperModel("MediaTek-Research/Breeze-ASR-26-ct2", device="cuda", compute_type="float16")
# For CPU: WhisperModel("...", device="cpu", compute_type="int8")

# transcribe() returns a lazy generator; decoding runs as you iterate over it.
segments, info = model.transcribe("audio.wav", language="zh", task="transcribe")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
Note: This model outputs Mandarin Chinese characters (not Taigi orthography / 台語正字). Pass language="zh" explicitly to avoid language detection overhead.
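Because each segment carries start and end times, the transcription can be turned into subtitles. The sketch below formats segments as SubRip (SRT) text. The helpers `srt_timestamp` and `to_srt` are hypothetical names, not part of faster-whisper, and the demo uses stand-in objects with the same `start`, `end`, and `text` attributes instead of real model output.

```python
from types import SimpleNamespace


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Render objects with .start, .end and .text as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


# Stand-in segments for demonstration; real ones come from model.transcribe().
demo_segments = [
    SimpleNamespace(start=0.0, end=1.5, text=" 你好"),
    SimpleNamespace(start=1.5, end=3.25, text=" 今天天氣很好"),
]
print(to_srt(demo_segments))
```

With a loaded model, the generator returned by `model.transcribe(...)` can be passed to `to_srt` directly. Transcription then runs while the subtitles are being built.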
@misc{lan2026breezetaigibenchmarksmodels,
title={Breeze Taigi: Benchmarks and Models for Taiwanese Hokkien Speech Recognition and Synthesis},
author={Yu-Siang Lan and Chia-Sheng Liu and Yi-Chang Chen and Po-Chun Hsu and Allyson Chiu and Shun-Wen Lin and Da-shan Shiu and Yuan-Fu Liao},
year={2026},
eprint={2603.19259},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.19259},
}
Apache 2.0, same as the original model.
Base model
openai/whisper-large-v2