Jaffe2718 committed on
Commit 782613c · verified · 1 Parent(s): 5359861

remove `<|endoftext|>`


I noticed that the [ruby test](https://github.com/Jaffe2718/whisper.cpp/actions/runs/20052653983/job/57511670231) failed because the model contains the special token `<|endoftext|>` in its vocabulary. So I used my [convert-h5-to-ggml.py](https://github.com/Jaffe2718/whisper.cpp/blob/master/models/convert-h5-to-ggml.py) to convert [openai/whisper-base.en](https://huggingface.co/openai/whisper-base.en) with `<|endoftext|>` removed. I suggest checking the vocabulary of every model for special tokens.
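The vocabulary check suggested above could be scripted. The sketch below is hypothetical and assumes the header layout that whisper.cpp's conversion scripts write: the `0x67676d6c` ("ggml") magic, eleven int32 hyperparameters, the mel filterbank (`n_mel`, `n_fft`, then float32 values), then a token count followed by length-prefixed token bytes. It has not been checked against `convert-h5-to-ggml.py` specifically. The names `read_vocab`, `find_special_tokens` and `synthetic_model` are illustrative, and treating anything shaped like `<|...|>` as "special" is an assumption.

```python
import io
import re
import struct
from typing import BinaryIO, List

# Magic number written first by whisper.cpp's converters ("ggml" in hex).
GGML_MAGIC = 0x67676D6C
# Assumed hparams block: n_vocab, 4 audio ints, 4 text ints, n_mels, ftype.
N_HPARAMS = 11
# Heuristic: tokens shaped like <|endoftext|>, <|startoftranscript|>, <|en|>, ...
SPECIAL_TOKEN = re.compile(rb"<\|[^|]*\|>")


def _read_i32(f: BinaryIO) -> int:
    raw = f.read(4)
    if len(raw) != 4:
        raise ValueError("unexpected end of file")
    return struct.unpack("<i", raw)[0]


def read_vocab(f: BinaryIO) -> List[bytes]:
    """Return the raw vocabulary entries stored in a whisper ggml model header."""
    if _read_i32(f) != GGML_MAGIC:
        raise ValueError("not a ggml whisper model (bad magic)")
    f.seek(4 * N_HPARAMS, io.SEEK_CUR)  # skip the int32 hyperparameters
    n_mel, n_fft = _read_i32(f), _read_i32(f)
    f.seek(4 * n_mel * n_fft, io.SEEK_CUR)  # skip the float32 mel filterbank
    vocab = []
    for _ in range(_read_i32(f)):
        length = _read_i32(f)
        token = f.read(length)
        if len(token) != length:
            raise ValueError("truncated vocabulary entry")
        vocab.append(token)
    return vocab


def find_special_tokens(f: BinaryIO) -> List[bytes]:
    """Vocabulary entries that look like special tokens, e.g. b'<|endoftext|>'."""
    return [tok for tok in read_vocab(f) if SPECIAL_TOKEN.fullmatch(tok)]


def synthetic_model(tokens: List[bytes]) -> bytes:
    """Hypothetical helper: a tiny fake header in the assumed layout, for testing."""
    out = struct.pack("<i", GGML_MAGIC)
    out += struct.pack("<11i", *([0] * N_HPARAMS))
    out += struct.pack("<ii", 1, 2) + struct.pack("<2f", 0.0, 0.0)  # 1x2 filterbank
    out += struct.pack("<i", len(tokens))
    for tok in tokens:
        out += struct.pack("<i", len(tok)) + tok  # length prefix, then bytes
    return out


if __name__ == "__main__":
    demo = synthetic_model([b"hello", b"<|endoftext|>", b" world"])
    print(find_special_tokens(io.BytesIO(demo)))  # [b'<|endoftext|>']
```

Against a real file this would be `with open("ggml-base.en.bin", "rb") as f: print(find_special_tokens(f))`; under the layout assumption above, an empty list means no `<|...|>` entries were found. Only the header is read, so the tensor data is never loaded.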

Files changed (1)
  1. ggml-base.en.bin +2 -2
ggml-base.en.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a03779c86df3323075f5e796cb2ce5029f00ec8869eee3fdfb897afe36c6d002
-size 147964211
+oid sha256:e111a865d56afc4adf1379a2028544b3275dd25839c460953ed9a126632dcda2
+size 147964194
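The diff only touches the Git LFS pointer: a `version` line, an `oid` (the SHA-256 of the real file) and a `size` in bytes. The sketch below parses both pointers and shows how a downloaded `ggml-base.en.bin` could be checked against the new one. The function names are hypothetical. It also notes that the 17-byte drop is consistent with removing exactly one vocabulary entry: a 4-byte length prefix plus the 13 bytes of `<|endoftext|>`, assuming the rest of the file keeps the same length.

```python
import hashlib
from typing import Dict

OLD_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:a03779c86df3323075f5e796cb2ce5029f00ec8869eee3fdfb897afe36c6d002
size 147964211
"""

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:e111a865d56afc4adf1379a2028544b3275dd25839c460953ed9a126632dcda2
size 147964194
"""


def parse_pointer(text: str) -> Dict[str, str]:
    """Split a Git LFS pointer file into its 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer: Dict[str, str]) -> bool:
    """Check a downloaded file against the pointer's sha256 oid and byte size."""
    algo, _, expected = pointer["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            digest.update(chunk)
            size += len(chunk)
    return size == int(pointer["size"]) and digest.hexdigest() == expected


old, new = parse_pointer(OLD_POINTER), parse_pointer(NEW_POINTER)
shrink = int(old["size"]) - int(new["size"])
# One vocab entry = 4-byte int32 length prefix + the token bytes themselves.
print(shrink, 4 + len(b"<|endoftext|>"))  # 17 17
```

If the download is intact, `matches_pointer("ggml-base.en.bin", new)` should return `True` for the file uploaded in this commit. Hashing in chunks keeps memory flat for a ~148 MB model.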