remove `<|endoftext|>`
#30 by Jaffe2718 - opened
I noticed that the Ruby test failed because the model contains the special token `<|endoftext|>` in its vocabulary, so I used my convert-h5-to-ggml.py to convert openai/whisper-base.en and removed `<|endoftext|>`. I also suggest checking every model for special tokens in its vocabulary.
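A rough sketch of the kind of check I mean is below. It is not the code in convert-h5-to-ggml.py: the helper names `find_special_tokens` and `strip_special_tokens` are hypothetical, and it assumes the vocabulary is available as a plain token-to-id dict and that special tokens follow the `<|...|>` pattern.

```python
import re

# Assumption: special tokens look like <|endoftext|>, <|notimestamps|>, <|0.00|>, ...
SPECIAL_TOKEN_RE = re.compile(r"<\|[^|]*\|>")


def find_special_tokens(vocab):
    """Return (token, id) pairs for vocabulary entries that look like special tokens, sorted by id."""
    hits = [(tok, idx) for tok, idx in vocab.items() if SPECIAL_TOKEN_RE.fullmatch(tok)]
    return sorted(hits, key=lambda pair: pair[1])


def strip_special_tokens(vocab):
    """Return a copy of the vocabulary without special-token entries (ids are not renumbered)."""
    return {tok: idx for tok, idx in vocab.items() if not SPECIAL_TOKEN_RE.fullmatch(tok)}


# Toy vocabulary for illustration only; a real one would come from the model's vocab file.
toy_vocab = {"hello": 0, " world": 1, "<|endoftext|>": 50256}

special = find_special_tokens(toy_vocab)
cleaned = strip_special_tokens(toy_vocab)
print("special tokens found:", special)
print("entries kept:", len(cleaned))
```

Running something like this over each model's vocabulary before conversion would flag any model that still carries a special token.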
It transcribes the example audio successfully on my computer (Windows 11) with both CPU and GPU (RTX 3060, CUDA 12.7 or Vulkan):
audio_path: whisper.cpp/samples/jfk.wav
audio_len: 176017 (samples)
whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (12th Gen Intel(R) Core(TM) i7-12700H)
whisper_init_with_params_no_state: devices = 1
whisper_init_with_params_no_state: backends = 1
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: n_common_vocab = 50256
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: CPU total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: device 0: CPU (type: 0)
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 16.28 MB
whisper_init_state: compute buffer (encode) = 23.09 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 96.37 MB
init model from ggml-base.en.bin
vocab info:
n_vocab: 51864
token_eot_id: 50256
token_sot_id: 50257
token_translate_id: 50357
token_transcribe_id: 50358
token_solm_id: 50359
token_prev_id: 50360
token_nosp_id: 50361
token_not_id: 50362
token_beg_id: 50363
audio_samples: 176017
whisper_full.return: 0
n_segments: 1
segment 0
text: And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
start: 0
end: 1100
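As a quick sanity check, the vocabulary counts printed in the log add up. The snippet below only copies numbers from the log; reading `token_eot_id == n_common_vocab` as "`<|endoftext|>` sits right after the common vocabulary" assumes token ids start at 0.

```python
# Values copied from the log above.
n_vocab = 51864          # whisper_model_load: n_vocab
n_common_vocab = 50256   # whisper_model_load: n_common_vocab
n_extra_tokens = 1608    # whisper_model_load: adding 1608 extra tokens
token_eot_id = 50256     # vocab info: token_eot_id

# Common tokens plus the extra tokens account for the full vocabulary size.
counts_match = (n_common_vocab + n_extra_tokens == n_vocab)

# Assuming ids start at 0, the common vocabulary covers ids 0..50255,
# so the end-of-text id is the first id after it.
eot_follows_common = (token_eot_id == n_common_vocab)

print("counts match:", counts_match)
print("eot follows common vocab:", eot_follows_common)
```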
Jaffe2718 changed pull request status to closed