The model does not return timestamps with return_timestamps=True

by alikraken - opened Jun 19, 2025

Jun 19, 2025

Hi,
Great job on fine-tuning the model! I tried deploying it with the transformers pipeline, but had no luck producing timestamped output with the return_timestamps parameter set to True in generate_kwargs. It returns a single chunk of transcribed text with both timestamps set to None, like so: {'chunks': [{'timestamp': [None, None], 'text': 'blabla'}]}.

Could you at least point me to where the issue might be? Thank you!

alikraken

Jun 19, 2025

By one chunk, I mean that the "chunks" key in the output dictionary has only one item. The pipeline transcribes audio longer than 30 s just fine, but it produces no timestamps and merges all text segments into one big block of text.
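
For anyone who wants to detect this symptom programmatically, here is a minimal sketch. It assumes the pipeline output is a dictionary with the shape shown above; the helper name has_timestamps is made up for illustration and is not part of transformers.

```python
from typing import Any, Dict


def has_timestamps(result: Dict[str, Any]) -> bool:
    """Return True if at least one chunk carries a non-None start or end time.

    `result` is assumed to look like the output quoted in this thread:
    {'chunks': [{'timestamp': [start, end], 'text': '...'}]}
    """
    for chunk in result.get("chunks") or []:
        # A missing or empty timestamp is treated the same as [None, None].
        start, end = chunk.get("timestamp") or (None, None)
        if start is not None or end is not None:
            return True
    return False


# The output reported above: one chunk, both timestamps None.
broken = {"chunks": [{"timestamp": [None, None], "text": "blabla"}]}
print(has_timestamps(broken))
```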

abilmansplus

Owner Jun 21, 2025

Hi! Thank you for your interest!
I only trained it for the transcription task without timestamps.
Perhaps that is the reason you are unable to get them.

If you want the timestamps, I recommend using a voice-activity-detection (VAD) tool.
This one is really good: https://github.com/wiseman/py-webrtcvad?tab=readme-ov-file
It comes with a usage example: https://github.com/wiseman/py-webrtcvad/blob/master/example.py

If you pass your audio file to the VAD, it will output the voiced audio segments (i.e., timestamps of the audio intervals that contain speech).
Then you can transcribe each audio segment one by one.
Finally, you can combine the transcripts with the timestamps.
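
The three steps above could look roughly like the sketch below. It is not the linked example.py: the segment-merging logic, the helper names, the max_gap_frames value, and the asr callable (anything that takes raw PCM bytes plus a sample rate and returns text) are all assumptions made for illustration. The only py-webrtcvad calls used are webrtcvad.Vad(mode) and Vad.is_speech(frame, sample_rate), which expect 16-bit mono PCM frames of 10, 20, or 30 ms at 8000, 16000, 32000, or 48000 Hz.

```python
import wave
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

FRAME_MS = 30  # webrtcvad accepts 10, 20, or 30 ms frames


def split_frames(pcm: bytes, sample_rate: int, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Yield fixed-size frames of 16-bit mono PCM; a trailing partial frame is dropped."""
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        yield pcm[start:start + frame_bytes]


def voiced_segments(flags: Iterable[bool], frame_ms: int = FRAME_MS,
                    max_gap_frames: int = 10) -> List[Tuple[float, float]]:
    """Turn per-frame speech flags into (start_sec, end_sec) segments.

    Silences of up to max_gap_frames frames are bridged so that short pauses
    do not split a sentence (the default of 10 is an arbitrary choice).
    """
    segments: List[Tuple[float, float]] = []
    start = None       # index of the first voiced frame of the open segment
    last_voiced = 0    # index of the most recent voiced frame
    for i, voiced in enumerate(flags):
        if voiced:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced > max_gap_frames:
            segments.append((start * frame_ms / 1000, (last_voiced + 1) * frame_ms / 1000))
            start = None
    if start is not None:  # close a segment that runs to the end of the audio
        segments.append((start * frame_ms / 1000, (last_voiced + 1) * frame_ms / 1000))
    return segments


def transcribe_with_timestamps(wav_path: str, asr: Callable[[bytes, int], str],
                               aggressiveness: int = 2) -> List[Dict]:
    """Run VAD over a 16-bit mono WAV file and transcribe each voiced segment.

    `asr` is a hypothetical callable: (pcm_bytes, sample_rate) -> text.
    """
    import webrtcvad  # third-party: pip install webrtcvad

    with wave.open(wav_path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        sample_rate = wf.getframerate()  # must be 8000, 16000, 32000, or 48000
        pcm = wf.readframes(wf.getnframes())

    vad = webrtcvad.Vad(aggressiveness)  # 0 (least) to 3 (most aggressive filtering)
    flags = [vad.is_speech(frame, sample_rate) for frame in split_frames(pcm, sample_rate)]

    chunks = []
    for start, end in voiced_segments(flags):
        lo = round(start * sample_rate) * 2  # byte offsets into the PCM buffer
        hi = round(end * sample_rate) * 2
        chunks.append({"timestamp": (start, end), "text": asr(pcm[lo:hi], sample_rate)})
    return chunks
```

The result mirrors the "chunks" layout from the question, so downstream code written for return_timestamps output should need few changes. To use the fine-tuned model as asr, one option would be a small wrapper that converts the PCM bytes to a float array and passes it to the transformers pipeline.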

I hope it helps,
Thanks!
