Allow single quotes "'" and hyphens "-"

by sanchit-gandhi - opened Nov 24, 2022

base: refs/heads/main ← from: refs/pr/4


sanchit-gandhi · Nov 24, 2022 (edited Nov 24, 2022)

Remove single quotes `'` (id 6) and hyphens `-` (id 12) from `suppress_tokens`. These tokens should not be suppressed during generation. They are accepted as valid generated tokens in the official Whisper repo:
https://github.com/openai/whisper/blob/eff383b27b783e280c089475852ba83f20f64998/whisper/tokenizer.py#L258
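
To make the effect of the change concrete: a suppressed token has its logit forced to -inf before decoding, so it can never be chosen, no matter how likely the model thinks it is. The sketch below is a minimal illustration under stated assumptions, not the transformers or Whisper implementation: the `suppress_tokens` list is a hypothetical short excerpt (the real list for openai/whisper-small is much longer), the logits are toy values, and the helpers `allow_tokens`, `suppress` and `greedy_pick` are made up for this example.

```python
import json

# Hypothetical excerpt of a config; the real suppress_tokens list is much longer.
config = json.loads('{"suppress_tokens": [1, 2, 6, 7, 12, 14]}')

# Ids to stop suppressing, taken from this PR: 6 -> "'", 12 -> "-".
TOKENS_TO_ALLOW = {6, 12}


def allow_tokens(suppress_tokens, allowed):
    """Return suppress_tokens without the ids in `allowed`, preserving order."""
    return [t for t in suppress_tokens if t not in allowed]


def suppress(logits, suppress_tokens):
    """Set the logit of every suppressed id to -inf so it can never win."""
    masked = list(logits)
    for t in suppress_tokens:
        if t < len(masked):  # skip ids outside this toy vocabulary
            masked[t] = float("-inf")
    return masked


def greedy_pick(logits):
    """Greedy decoding step: return the id with the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Toy logits over a 16-token vocabulary in which id 6 ("'") scores highest.
logits = [0.0] * 16
logits[6] = 5.0
logits[3] = 1.0

before = greedy_pick(suppress(logits, config["suppress_tokens"]))
config["suppress_tokens"] = allow_tokens(config["suppress_tokens"], TOKENS_TO_ALLOW)
after = greedy_pick(suppress(logits, config["suppress_tokens"]))
print(before, after)  # prints: 3 6
```

While id 6 is in the list, the decoder falls back to the next-best token (id 3) even though the apostrophe is the most likely one; once ids 6 and 12 are removed, the apostrophe can be generated again.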

Check that we're removing the right tokens:

```python
from transformers import WhisperTokenizer

tokenizer = WhisperTokenizer.from_pretrained("openai/whisper-small")

print(tokenizer.decode(6))   # id 6: expected to be the single quote
print(tokenizer.decode(12))  # id 12: expected to be the hyphen
```

Printed output:
```
'
-
```
Commit b59b9405: Allow single quotes "'" and hyphens "-"

ArthurZ changed pull request status to merged Nov 29, 2022
