remove `<|endoftext|>`
I noticed that the [ruby test](https://github.com/Jaffe2718/whisper.cpp/actions/runs/20052653983/job/57511670231) failed because the model contains the special token `<|endoftext|>` in its vocabulary, so I used my [convert-h5-to-ggml.py](https://github.com/Jaffe2718/whisper.cpp/blob/master/models/convert-h5-to-ggml.py) to re-convert the model from [openai/whisper-base.en](https://huggingface.co/openai/whisper-base.en) with `<|endoftext|>` removed. I also suggest checking every model for special tokens in its vocabulary.
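The suggested vocabulary check could look roughly like the sketch below. It assumes the vocabulary has already been loaded into a list of strings (how that is done depends on the model format and is not shown), and it assumes special tokens follow the `<|name|>` pattern that `<|endoftext|>` uses. The name `find_special_tokens` is hypothetical and is not part of whisper.cpp or the conversion script.

```python
import re
from typing import Iterable, List

# Assumption: special tokens look like `<|name|>`, as `<|endoftext|>` does.
SPECIAL_TOKEN_RE = re.compile(r"<\|[^|]*\|>")


def find_special_tokens(vocab: Iterable[str]) -> List[str]:
    """Return every vocabulary entry that is exactly a `<|...|>` token."""
    return [token for token in vocab if SPECIAL_TOKEN_RE.fullmatch(token)]


if __name__ == "__main__":
    # Toy vocabulary for illustration only, not taken from a real model.
    sample_vocab = ["hello", " world", "<|endoftext|>"]
    leftovers = find_special_tokens(sample_vocab)
    if leftovers:
        print("special tokens found in vocabulary:", leftovers)
```

A check like this could run over each converted model before upload, failing if the returned list is not empty.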
- ggml-base.en.bin +2 -2
ggml-base.en.bin CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:e111a865d56afc4adf1379a2028544b3275dd25839c460953ed9a126632dcda2
+size 147964194
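Because the diff only changes a Git LFS pointer, a downloaded `ggml-base.en.bin` can be compared against the new `oid` and `size`. The following is a minimal sketch, assuming the pointer text has the three-line `version` / `oid sha256:` / `size` layout shown above; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of Git LFS.

```python
import hashlib
import os
from typing import Tuple


def parse_lfs_pointer(text: str) -> Tuple[str, int]:
    """Extract the SHA-256 hex digest and byte size from pointer text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    # The oid field has the form "sha256:<hex digest>".
    _, _, digest = fields["oid"].partition(":")
    return digest, int(fields["size"])


def matches_pointer(path: str, pointer_text: str) -> bool:
    """Check that the file at `path` has the size and hash the pointer names."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    if os.path.getsize(path) != expected_size:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large model files are not loaded at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == expected_digest
```

Comparing the size first avoids hashing a file that is obviously the wrong one.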