Is it possible to get numbers as digits instead of words, like the original Whisper does?
Hi, I really like this French model, thank you for it! Is there a way to get numbers as digits instead of words?
For example, I'm testing your model as the STT engine for Home Assistant, which needs times in "12:15" or "12h15" format rather than "twelve hours fifteen minutes" (in French, "douze heures quinze minutes") to trigger automations.
Your French fine-tuned model is not the only one in this situation, but I don't understand why I get numbers as digits with the original Whisper and the faster-whisper tiny/base/small models, but not with the fine-tuned variants for my native language :(
I run your model in a Docker container:
services:
  wyoming-whisper:
    ports:
      - 10300:10300
    volumes:
      - /var/lib/docker/volumes/ha-whisper/data:/data
    image: rhasspy/wyoming-whisper
    command: --stt-library transformers --model deepdml/whisper-small-mix-fr --language fr --debug --beam-size 2
    restart: unless-stopped
Hi,
Thanks a lot for your feedback and for testing the French model with Home Assistant.
The reason you’re getting times as words (“douze heures quinze minutes”) instead of digits (“12:15” or “12h15”) is related to the dataset I used for fine-tuning.
Most of the transcripts in this dataset write numbers as words, so during fine-tuning the model learns that this is the "correct" style, and this tends to override the behaviour of the original Whisper models, which have seen more mixed styles (including numeric formats like 12:15).
There are a few possible workarounds:
1. Add a post-processing step that converts French number words (e.g. “douze heures quinze”) into a numeric time format (“12:15”). This is usually the most robust option for Home Assistant, because you can enforce exactly the format you need.
2. Try to use an initial prompt at decoding time telling the model to always use digits for numbers and to format times as “12:15” or “12h15”. This can help, although it’s not always perfect.
3. Fine-tune again with an additional small dataset where numbers (especially times) are always written in digits, so the model learns this style explicitly.
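A minimal sketch of the post-processing workaround in Python, assuming the transcript is available as a plain string before it reaches Home Assistant. It only covers clock times of the form "<hours> heure(s) [<minutes> [minute(s)]]", plus "et quart" and "et demie"; "midi", "minuit" and "moins le quart" are not handled, and durations such as "deux heures de route" would also be rewritten. The names times_to_digits and NUMBERS are hypothetical, not part of Whisper or Home Assistant.

```python
import re


def _build_numbers() -> dict[str, int]:
    """Map French spellings of 0-59 (words joined by single spaces) to ints."""
    units = ["zéro", "un", "deux", "trois", "quatre", "cinq", "six", "sept",
             "huit", "neuf", "dix", "onze", "douze", "treize", "quatorze",
             "quinze", "seize", "dix sept", "dix huit", "dix neuf"]
    numbers = {word: value for value, word in enumerate(units)}
    numbers["une"] = 1
    for tens_word, tens in [("vingt", 20), ("trente", 30),
                            ("quarante", 40), ("cinquante", 50)]:
        numbers[tens_word] = tens
        numbers[f"{tens_word} et un"] = tens + 1
        numbers[f"{tens_word} et une"] = tens + 1
        for unit in range(2, 10):
            numbers[f"{tens_word} {units[unit]}"] = tens + unit
    return numbers


NUMBERS = _build_numbers()

# One alternation over every spelling; words may be separated by spaces or
# hyphens ("dix-sept", "vingt et un"). Longest first so "dix-sept" beats "dix".
_NUM = "(?:" + "|".join(sorted(
    (r"[\s-]+".join(map(re.escape, key.split())) for key in NUMBERS),
    key=len, reverse=True)) + ")"

TIME_RE = re.compile(
    rf"\b(?P<h>{_NUM})\s+heures?"
    rf"(?:\s+(?:et\s+(?P<frac>quart|demie)|(?P<m>{_NUM})(?:\s+minutes?)?))?\b",
    re.IGNORECASE,
)


def _to_digits(match: re.Match, sep: str) -> str:
    def value(words: str) -> int:
        # Normalise "Dix-Sept" or "dix  sept" to the dictionary key "dix sept".
        return NUMBERS[" ".join(re.split(r"[\s-]+", words.lower()))]

    hours = value(match["h"])
    if hours > 23:  # not a clock time: leave the text untouched
        return match.group(0)
    if match["frac"]:
        minutes = 15 if match["frac"].lower() == "quart" else 30
    elif match["m"]:
        minutes = value(match["m"])
    else:
        minutes = 0
    return f"{hours}{sep}{minutes:02d}"


def times_to_digits(text: str, sep: str = ":") -> str:
    """Rewrite spelled-out French clock times as digits, e.g. '12:15' or '12h15'."""
    return TIME_RE.sub(lambda m: _to_digits(m, sep), text)


print(times_to_digits("Allume la lumière à douze heures quinze minutes"))
# Allume la lumière à 12:15
print(times_to_digits("Réveil à sept heures et demie", sep="h"))
# Réveil à 7h30
```

Matching whole spellings (rather than summing word values) keeps invalid sequences like "deux trois" from being read as a number, at the cost of a longer regex. Ambiguous phrases ("trois heures deux personnes") will still be misread, so restrict it to the commands you actually use.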
So the behaviour you’re seeing is expected given the fine-tuning data, but it can be fixed on top of the model with a numeric normalisation layer or with an additional small fine-tuning pass focused on times and digits.
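For the fine-tuning route, one possible way to bootstrap the small digit-style dataset is to generate the target transcripts first and then record (or synthesise) matching audio. A sketch, assuming sentence templates invented here for illustration; digit_time_transcripts is a hypothetical helper, and sampling minutes in steps of five is an arbitrary choice.

```python
import random

# Hypothetical Home Assistant-style commands; replace with your own phrasing.
TEMPLATES = [
    "Allume la lumière à {t}.",
    "Réveille-moi à {t}.",
    "Programme le chauffage pour {t}.",
]


def digit_time_transcripts(n: int, sep: str = "h", seed: int = 0) -> list[str]:
    """Return n transcripts in which every clock time is written in digits."""
    rng = random.Random(seed)  # seeded so the list is reproducible
    lines = []
    for _ in range(n):
        hours, minutes = rng.randrange(24), rng.randrange(0, 60, 5)
        time_text = f"{hours}{sep}{minutes:02d}"  # e.g. "12h15" or "12:15"
        lines.append(rng.choice(TEMPLATES).format(t=time_text))
    return lines


for line in digit_time_transcripts(3):
    print(line)
```

Each generated line would then need a recording of someone saying it, so that the pair (audio, digit transcript) teaches the model the numeric style.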
If you’re interested, I can share some ideas or code snippets for a French number-to-digits post-processor for Home Assistant.
Hi,
Thank you very much for your detailed answer. As you can probably tell, I'm a newcomer to this community and to LLMs, so I still have a lot to learn, and I have to admit it's overwhelming!
I had already tried passing an initial prompt, as in your second suggestion, but without success, probably because I formatted it badly due to my inexperience.
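About the initial prompt: Whisper tends to imitate the style of the prompt text rather than follow instructions written in it, so a prompt that already contains digits usually works better than one that says "use digits". A minimal sketch with Hugging Face transformers, meant for testing outside the Wyoming container (whether the rhasspy/wyoming-whisper image exposes a prompt option is not covered here). The prompt text and function name are hypothetical; reading an audio file path requires ffmpeg, and depending on the transformers version the prompt may be echoed at the start of the returned text.

```python
# Example text written in the style we want back (digits, "12h15" times).
# The wording is an assumption; adapt it to your own commands.
DIGIT_STYLE_PROMPT = "Allume la lumière à 12h15. Réveille-moi à 7h30. Mets le chauffage à 21 degrés."


def transcribe_with_prompt(audio_path: str,
                           model_id: str = "deepdml/whisper-small-mix-fr") -> str:
    """Transcribe a French audio file, conditioning Whisper on DIGIT_STYLE_PROMPT."""
    # Imported here so the module can be loaded without transformers installed.
    from transformers import WhisperProcessor, pipeline

    processor = WhisperProcessor.from_pretrained(model_id)
    # The prompt is tokenised and given to the decoder as "previous text" context.
    prompt_ids = processor.get_prompt_ids(DIGIT_STYLE_PROMPT, return_tensors="pt")
    asr = pipeline("automatic-speech-recognition", model=model_id)
    result = asr(audio_path, generate_kwargs={
        "language": "fr",
        "task": "transcribe",
        "prompt_ids": prompt_ids,
    })
    return result["text"]
```

Usage would be something like transcribe_with_prompt("commande.wav"), where the file name is only an example.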
As for post-processing, I also found some ready-to-run Python scripts online, but couldn't figure out how to add that step to my Docker setup...
So I was about to learn how to fine-tune when I stumbled upon another fine-tuned French Whisper model, keypa/whisper-3-mls-fr, and I was really pleased to discover that its author kept the original Whisper behaviour for numeric formats, because he also needed it for voice assistants!
After some very limited tests, it works well with Home Assistant and outputs times in HHhMM format (12h15); other numbers are also written as digits!
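If an automation expects HH:MM rather than HHhMM, a small regex can bridge the two formats. A sketch, assuming the transcript is a plain string; hhhmm_to_colon is a hypothetical helper name, and hours outside 0 to 23 are deliberately left unchanged.

```python
import re

# Matches clock times such as "12h15", "7h05" or a bare "12h" (hours 0-23).
HHHMM_RE = re.compile(r"\b([01]?\d|2[0-3])h([0-5]\d)?\b")


def hhhmm_to_colon(text: str) -> str:
    """Rewrite '12h15' as '12:15' and a bare '12h' as '12:00'."""
    return HHHMM_RE.sub(lambda m: f"{m.group(1)}:{m.group(2) or '00'}", text)


print(hhhmm_to_colon("Il est 12h15."))  # Il est 12:15.
```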
So, as far as I can tell, this is currently the only fine-tuned French model that writes numbers as digits. I was starting to believe I was the one at fault, or that it was an impossible combination! :)
So, again, thank you very much for your help and expertise; it's very comforting to get spontaneous and kind support when you're alone in the dark!