JacobLinCool/TEA-ASR-1-mini

Automatic Speech Recognition · taiwan-mandarin · traditional-chinese


JacobLinCool committed 7 days ago

Commit 689b043 · verified · 1 parent: eab73e7

model card: fresh-eval numbers + protocol

Files changed (1)

README.md +2 -1

README.md CHANGED

@@ -136,7 +136,8 @@ allocated during inference.
   ASCEND, NTUML2021), with general + code-switch **replay** to preserve the base model's broad and bilingual
   ability. The audio encoder is left frozen.
 - **Localization**: Traditional-script + Taiwan-lexicon output is rendered through the model's **own tokenizer**
-  (the surface mapping is baked once at build time); there is **no OpenCC or string rewriting at inference**.
+  (the surface mapping is baked once at build time); there is **no post-processing at inference** — the
+  Traditional output comes straight from the model's own tokenizer decode.
 - **Packaging**: the adapter is **merged** into the base and the localized tokenizer is shipped with it, so the
   release is a single drop-in checkpoint that loads like stock Qwen3-ASR.
 - **Decoding tip**: pass `language="Chinese"` for Taiwan speech; this also prevents translation-style outputs on
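The Localization bullet says Traditional-script, Taiwan-lexicon text comes straight out of the tokenizer's decode because the surface mapping is baked in once at build time, rather than being applied as a rewriting pass after decoding. The toy sketch below illustrates that idea under loudly stated assumptions. It uses a hypothetical whole-word vocabulary (`BASE_VOCAB`), a hypothetical mapping table (`SURFACE_MAP`), and hypothetical helpers (`bake_vocab`, `decode`). It is not the model's actual build script. Real Qwen tokenizers are byte-level BPE, so an actual build would also have to handle tokens that split characters or phrases.

```python
# Toy illustration only: BASE_VOCAB, SURFACE_MAP, bake_vocab and decode are
# hypothetical names, not part of TEA-ASR-1-mini or any real tokenizer API.

# Hypothetical vocabulary: token id -> surface string (Simplified script).
BASE_VOCAB: dict[int, str] = {
    0: "这个",
    1: "软件",
    2: "的",
    3: "质量",
    4: "很",
    5: "好",
}

# Hypothetical build-time table: Simplified surface -> Taiwan Traditional surface.
# Some entries change only the script (这个 -> 這個), while others also change
# the word itself (软件 -> 軟體, 质量 -> 品質).
SURFACE_MAP: dict[str, str] = {
    "这个": "這個",
    "软件": "軟體",
    "质量": "品質",
}


def bake_vocab(vocab: dict[int, str], surface_map: dict[str, str]) -> dict[int, str]:
    """Rewrite each token's surface string once, at build time.

    Token ids are unchanged, so the model's outputs (ids) stay the same;
    only what those ids decode to is different.
    """
    return {tid: surface_map.get(text, text) for tid, text in vocab.items()}


def decode(token_ids: list[int], vocab: dict[int, str]) -> str:
    """Plain decode: concatenate token surfaces, with no post-processing step."""
    return "".join(vocab[tid] for tid in token_ids)


LOCALIZED_VOCAB = bake_vocab(BASE_VOCAB, SURFACE_MAP)

if __name__ == "__main__":
    ids = [0, 1, 2, 3, 4, 5]  # the same id sequence is decoded with both vocabularies
    print(decode(ids, BASE_VOCAB))       # 这个软件的质量很好
    print(decode(ids, LOCALIZED_VOCAB))  # 這個軟體的品質很好
```

The toy maps only whole tokens. A phrase that a real tokenizer splits across several tokens would not be converted by this approach as written. That is one reason a real build would have to work at the tokenizer's own granularity, rather than reuse a phrase-level converter unchanged.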