remove `<|endoftext|>`

#30
by Jaffe2718 - opened

I noticed that the Ruby test failed because the model contains the special token <|endoftext|> in its vocabulary, so I used my convert-h5-to-ggml.py to convert openai/whisper-base.en and removed <|endoftext|>. I also suggest checking every model for special tokens in its vocabulary.
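As a rough illustration of the kind of check suggested above, here is a minimal sketch that scans a token-to-id mapping for entries that look like special tokens. It assumes special tokens follow the `<|name|>` pattern; the names `find_special_tokens`, `strip_special_tokens`, and `toy_vocab` are hypothetical and are not taken from convert-h5-to-ggml.py.

```python
import re

# Assumption: special tokens are written as "<|name|>", e.g. "<|endoftext|>".
SPECIAL_TOKEN_RE = re.compile(r"^<\|[^|]*\|>$")


def find_special_tokens(vocab):
    """Return (token, id) pairs that look like special tokens, sorted by id."""
    hits = [(tok, idx) for tok, idx in vocab.items() if SPECIAL_TOKEN_RE.match(tok)]
    return sorted(hits, key=lambda pair: pair[1])


def strip_special_tokens(vocab):
    """Return a copy of the vocabulary without the special-token entries."""
    return {tok: idx for tok, idx in vocab.items() if not SPECIAL_TOKEN_RE.match(tok)}


# Toy vocabulary for illustration only; a real one would be loaded from the
# tokenizer files of the model being converted.
toy_vocab = {"hello": 0, " world": 1, "<|endoftext|>": 2}

print(find_special_tokens(toy_vocab))   # [('<|endoftext|>', 2)]
print(strip_special_tokens(toy_vocab))  # {'hello': 0, ' world': 1}
```

Running something like this over each model's vocabulary before conversion would flag any entry that needs to be dropped.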

It transcribes the example audio successfully on my computer (Windows 11) with both CPU and GPU (RTX 3060, CUDA 12.7 or Vulkan):

audio_path: whisper.cpp/samples/jfk.wav
audio_len: 176017 (samples)

whisper_init_from_file_with_params_no_state: loading model from 'ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
register_backend: registered backend CPU (1 devices)
register_device: registered device CPU (12th Gen Intel(R) Core(TM) i7-12700H)
whisper_init_with_params_no_state: devices    = 1
whisper_init_with_params_no_state: backends   = 1
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: n_common_vocab = 50256
whisper_model_load: adding 1608 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:          CPU total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init_gpu: device 0: CPU (type: 0)
whisper_backend_init_gpu: no GPU found
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   16.28 MB
whisper_init_state: compute buffer (encode) =   23.09 MB
whisper_init_state: compute buffer (cross)  =    4.66 MB
whisper_init_state: compute buffer (decode) =   96.37 MB

init model from ggml-base.en.bin
vocab info:
  n_vocab: 51864
  token_eot_id: 50256
  token_sot_id: 50257
  token_translate_id: 50357
  token_transcribe_id: 50358
  token_solm_id: 50359
  token_prev_id: 50360
  token_nosp_id: 50361
  token_not_id: 50362
  token_beg_id: 50363
audio_samples: 176017
whisper_full.return: 0
n_segments: 1
segment 0
  text:  And so my fellow Americans, ask not what your country can do for you, ask what you can do for your country.
  start: 0
  end: 1100
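The counts reported in the output above can be cross-checked with a few lines of arithmetic. This is only a sketch: the helper `check_vocab_counts` is hypothetical, and reading `token_eot_id == n_common_vocab` as a sign that <|endoftext|> is no longer stored among the common tokens is my interpretation of the log, not something the loader states.

```python
def check_vocab_counts(n_vocab, n_common_vocab, n_extra, token_eot_id):
    """Return a list of inconsistencies found in the reported vocab numbers."""
    problems = []
    # Common tokens plus the extra tokens added at load time should cover the
    # whole vocabulary.
    if n_common_vocab + n_extra != n_vocab:
        problems.append("n_common_vocab + n_extra != n_vocab")
    # Assumption: with <|endoftext|> removed from the stored vocabulary, the
    # end-of-text id is the first id after the common tokens.
    if token_eot_id != n_common_vocab:
        problems.append("token_eot_id != n_common_vocab")
    return problems


# Values copied from the log and vocab info above.
print(check_vocab_counts(n_vocab=51864, n_common_vocab=50256,
                         n_extra=1608, token_eot_id=50256))  # []
```

An empty list means the reported numbers agree with each other (50256 + 1608 = 51864).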
Jaffe2718 changed pull request status to closed
