Oleslav-Ivan Boychuk
oleslav
Ā·
AI & ML interests
All AI & ML Staff
Recent Activity
reacted
to
sanchit-gandhi's
post with ā¤ļø about 21 hours ago
Why does returning timestamps help Whisper reduce hallucinations? š§
Empirically, most practitioners have found that setting `return_timestamps=True` helps reduce hallucinations, particularly when doing long-form evaluation with Transformersā āchunkedā algorithm.
But why does this work?..
My interpretation is that forcing the model to predict timestamps is contradictory to hallucinations. Suppose you have the transcription:
```markdown
The cat sat on the on the on the mat.
```
Where we have a repeated hallucination for āon theā. If we ask the model to predict timestamps, then the āon theā has to contribute to the overall segment-level timing, e.g.:
```markdown
<|0.00|> The cat sat on the on the on the mat.<|5.02|>
```
However, itās impossible to fit 3 copies of āon theā within the time allocation given to the segment, so the probability for this hallucinatory sequence becomes lower, and the model actually predicts the correct transcription with highest probability:
```markdown
<|0.00|> The cat sat on the mat.<|5.02|>
```
In this sense, the end timestamp is of the opposite of the initial timestamp constraint they describe in Section 4.5 of the paper https://huggingface.co/papers/2212.04356 ā it helps the model remove extra words at the end of the sequence (rather than the initial timestamp which helps when the model ignores words at the start), but the overall principle is the same (using timestamps to improve the probability of more realistic sequences).
Leaving it open to you: why do you think timestamps reduces Whisper hallucinations? new activity
about 22 hours ago
mistralai/Voxtral-Mini-4B-Realtime-2602:Optimize GPU KV cache memory usage new activity
6 days ago
Qwen/Qwen3-ASR-0.6B:streaming