The model does not return timestamps with return_timestamps=True

#1
by alikraken - opened

Hi,
Great job on fine-tuning the model! I tried deploying the model with the transformers pipeline, but I could not get timestamped output with the return_timestamps parameter in generate_kwargs set to True. It returns a single chunk of transcribed text with both timestamps set to None, like so: {'chunks': [{'timestamp': [None, None], 'text': 'blabla'}]}.

Can you at least point me to where the issue might be? Thank you!

By one chunk I mean that the "chunks" key in the output dictionary has only one item. The pipeline transcribes audio longer than 30s just fine, but it produces no timestamps and merges all text segments into one big text.
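
For anyone hitting the same symptom, here is a minimal sketch for detecting it in code. It assumes the output dictionary has the shape quoted above; `lacks_timestamps` is a hypothetical helper name, not part of transformers.

```python
from typing import Any, Dict


def lacks_timestamps(output: Dict[str, Any]) -> bool:
    """Return True when no chunk in the pipeline output carries a usable timestamp.

    Assumes the shape quoted in this thread:
    {'chunks': [{'timestamp': [start, end], 'text': '...'}]}
    """
    chunks = output.get("chunks", [])
    if not chunks:
        return True
    # A chunk is usable only if both its start and end are set.
    return all(
        chunk.get("timestamp") is None or None in chunk["timestamp"]
        for chunk in chunks
    )


# The output reported in this thread: a single chunk with no timestamps.
reported = {"chunks": [{"timestamp": [None, None], "text": "blabla"}]}
print(lacks_timestamps(reported))
```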

Hi! Thank you for your interest!
I only trained it for the transcription task without timestamps.
Perhaps that is the reason you are unable to get them.

If you want the timestamps, I recommend using a voice-activity-detection (VAD) tool.
This one is really good: https://github.com/wiseman/py-webrtcvad?tab=readme-ov-file
It comes with a usage example: https://github.com/wiseman/py-webrtcvad/blob/master/example.py

If you pass your audio file to the VAD, it will output voiced audio segments (i.e., timestamps of audio intervals containing speech).
Then you can transcribe each audio segment one by one.
Finally, you can combine the transcripts with the timestamps.
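
The three steps above could be sketched roughly as follows. This is only an illustration under some assumptions: the audio is already 16-bit mono PCM at a sample rate webrtcvad supports (8000, 16000, 32000 or 48000 Hz), frames are 10, 20 or 30 ms long, and `transcribe` is a hypothetical callable you provide that runs the model on the audio between two times. The helper names and the gap-merging rule are my own choices, not taken from example.py.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Segment = Tuple[float, float]  # (start_s, end_s)


def speech_flags(pcm: bytes, sample_rate: int = 16000,
                 frame_ms: int = 30, mode: int = 2) -> List[bool]:
    """Step 1a: run webrtcvad over 16-bit mono PCM, one speech flag per frame."""
    import webrtcvad  # imported lazily so the helpers below work without it

    vad = webrtcvad.Vad(mode)  # 0 = least aggressive, 3 = most aggressive
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per sample
    return [
        vad.is_speech(pcm[i:i + frame_bytes], sample_rate)
        for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)
    ]


def frames_to_segments(flags: Iterable[bool], frame_ms: int = 30,
                       max_gap_frames: int = 10) -> List[Segment]:
    """Step 1b: merge per-frame flags into voiced (start_s, end_s) segments.

    Pauses of up to max_gap_frames unvoiced frames stay inside one segment.
    """
    segments: List[Segment] = []
    start = None        # first frame of the currently open segment
    last_voiced = None  # most recent voiced frame in that segment
    for i, voiced in enumerate(flags):
        if voiced:
            if start is None:
                start = i
            last_voiced = i
        elif start is not None and i - last_voiced > max_gap_frames:
            segments.append((start * frame_ms / 1000,
                             (last_voiced + 1) * frame_ms / 1000))
            start = None
    if start is not None:
        segments.append((start * frame_ms / 1000,
                         (last_voiced + 1) * frame_ms / 1000))
    return segments


def transcribe_segments(segments: Iterable[Segment],
                        transcribe: Callable[[float, float], str]) -> List[Dict]:
    """Steps 2 and 3: transcribe each segment and attach its timestamps."""
    return [{"timestamp": (start, end), "text": transcribe(start, end)}
            for start, end in segments]


# Usage with made-up flags and a stand-in transcriber (no audio needed):
flags = [False] * 2 + [True] * 3 + [False] * 5 + [True] * 2
segments = frames_to_segments(flags, frame_ms=30, max_gap_frames=2)
chunks = transcribe_segments(segments, lambda start, end: "<text>")
print(chunks)
```

The result has the same shape as the pipeline's "chunks" list, so downstream code that expects timestamped chunks should need little change. Tune `mode` and `max_gap_frames` for your audio; the values here are placeholders.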

I hope it helps,
Thanks!
