Inconsistent number transcription: often spelled-out words instead of digits (French)
#34
by poulpor - opened
Hi, I'm trying out parakeet-tdt-0.6b-v3 as an STT engine for Home Assistant.
For voice commands like "do this at 11:30" or "set volume to 30%", it's crucial to get digits rather than spelled-out words.
I noticed that French numbers are transcribed inconsistently, often as words instead of digits:
- "09:30" → "neuf heures trente"
- Second attempt → same result (see logs)
Is there a way to force digit output?
I suspect it's related to Inverse Text Normalization (ITN), but I don't understand why it works most of the time yet occasionally fails.
https://github.com/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb
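As a stopgap while the cause is unclear, a small post-processing step could rewrite spelled-out times and percentages in the transcript before it reaches Home Assistant. The sketch below is only an illustration under stated assumptions: it is not part of parakeet, NeMo, or wyoming_onnx_asr, the names `NUMBERS`, `spoken_to_digits`, and the regexes are hypothetical, and it only covers French numbers 0-59 followed by "heure(s)" or "pour cent".

```python
import re

# Hypothetical lookup table: spelled-out French numbers 0-59, hyphenated keys.
NUMBERS = {
    "zéro": 0, "un": 1, "une": 1, "deux": 2, "trois": 3, "quatre": 4,
    "cinq": 5, "six": 6, "sept": 7, "huit": 8, "neuf": 9, "dix": 10,
    "onze": 11, "douze": 12, "treize": 13, "quatorze": 14, "quinze": 15,
    "seize": 16, "dix-sept": 17, "dix-huit": 18, "dix-neuf": 19,
}
ONES = ["deux", "trois", "quatre", "cinq", "six", "sept", "huit", "neuf"]
for tens_word, tens in [("vingt", 20), ("trente", 30),
                        ("quarante", 40), ("cinquante", 50)]:
    NUMBERS[tens_word] = tens
    NUMBERS[f"{tens_word}-et-un"] = tens + 1
    NUMBERS[f"{tens_word}-et-une"] = tens + 1
    for offset, one in enumerate(ONES, start=2):
        NUMBERS[f"{tens_word}-{one}"] = tens + offset

# Longest alternatives first so "trente-deux" wins over "trente";
# a space is accepted wherever the key has a hyphen.
NUM = "|".join(
    key.replace("-", "[- ]") for key in sorted(NUMBERS, key=len, reverse=True)
)
TIME_RE = re.compile(
    rf"\b(?P<h>{NUM})\s+heures?(?:\s+(?P<m>{NUM}))?\b", re.IGNORECASE
)
PERCENT_RE = re.compile(rf"\b(?P<n>{NUM})\s+pour\s?cent\b", re.IGNORECASE)


def _value(words):
    """Look up a matched number phrase, normalising case and spaces."""
    return NUMBERS[words.lower().replace(" ", "-")]


def _format_time(match):
    """Render a matched time as e.g. '9h30', or '9h' when no minutes follow."""
    hour = _value(match.group("h"))
    minute = match.group("m")
    return f"{hour}h{_value(minute):02d}" if minute else f"{hour}h"


def spoken_to_digits(text):
    """Naively rewrite spelled-out French times and percentages as digits."""
    text = TIME_RE.sub(_format_time, text)
    return PERCENT_RE.sub(lambda m: f"{_value(m.group('n'))}%", text)


print(spoken_to_digits(
    "Rappelle-moi de décongeler le poisson à neuf heures trente."
))
```

This is deliberately naive: it would also rewrite durations such as "une heure", and a number word that happens to follow "heures" is treated as minutes. A real ITN grammar handles far more cases, so this is best seen as a fallback for a narrow set of commands.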
INFO:wyoming_onnx_asr.handler:Language requested: fr
INFO:wyoming_onnx_asr.handler:Available models: ['multi']
INFO:wyoming_onnx_asr.handler:Selected multilingual model for language 'fr'
INFO:wyoming_onnx_asr.handler:Starting transcription with model for language 'fr'
INFO:wyoming_onnx_asr.handler:Transcription completed successfully for language 'fr'
INFO:wyoming_onnx_asr.handler:fr:Rappelle-moi de décongeler le poisson à neuf heures trente.
INFO:wyoming_onnx_asr.handler:Language requested: fr
INFO:wyoming_onnx_asr.handler:Available models: ['multi']
INFO:wyoming_onnx_asr.handler:Selected multilingual model for language 'fr'
INFO:wyoming_onnx_asr.handler:Starting transcription with model for language 'fr'
INFO:wyoming_onnx_asr.handler:Transcription completed successfully for language 'fr'
INFO:wyoming_onnx_asr.handler:fr:Rappelle-moi de décongeler le poisson à neuf heures trente.
INFO:__main__:Loading multilingual model nemo-parakeet-tdt-0.6b-v3, None ...
INFO:__main__:Ready
INFO:__main__:Loading multilingual model nemo-parakeet-tdt-0.6b-v3, None ...
INFO:__main__:Ready
INFO:wyoming_onnx_asr.handler:Language requested: fr
INFO:wyoming_onnx_asr.handler:Available models: ['multi']
INFO:wyoming_onnx_asr.handler:Selected multilingual model for language 'fr'
INFO:wyoming_onnx_asr.handler:Starting transcription with model for language 'fr'
INFO:wyoming_onnx_asr.handler:Transcription completed successfully for language 'fr'
INFO:wyoming_onnx_asr.handler:fr:Rappelle-moi de décongeler un poisson à 9h32.