Fix tokenizer compatibility with newer transformers versions
#5 by Alonadoli
Issue: A community member reported the following error when loading the model with newer versions of the transformers library: "Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 88 column 3".
Root cause: The field "prepend_scheme": "always" is deprecated and is not recognized by the Rust tokenizers backend in newer library versions, so deserializing the pre-tokenizer fails.
Solution: Updated tokenizer.json to resolve the compatibility issue with recent versions of the tokenizers library.
Changes:
- Line 84 (pre_tokenizer): changed "prepend_scheme": "always" to "add_prefix_space": true
- Line 167 (decoder): changed "prepend_scheme": "always" to "add_prefix_space": true
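For anyone hitting the same error on another checkpoint, the edit above can be applied locally with a small script. This is a minimal sketch, not part of the PR: `patch_section` and `patch_tokenizer_json` are hypothetical helper names, and the script assumes the offending field appears inside the `pre_tokenizer` and `decoder` sections of tokenizer.json as described above.

```python
import json

def patch_section(section):
    # Recursively replace the deprecated "prepend_scheme": "always"
    # field with "add_prefix_space": true (assumed structure; covers
    # nested pre-tokenizer sequences as well as flat sections).
    if isinstance(section, dict):
        if section.get("prepend_scheme") == "always":
            del section["prepend_scheme"]
            section["add_prefix_space"] = True
        for value in section.values():
            patch_section(value)
    elif isinstance(section, list):
        for item in section:
            patch_section(item)

def patch_tokenizer_json(path):
    # Hypothetical helper: rewrite tokenizer.json in place.
    with open(path) as f:
        config = json.load(f)
    patch_section(config.get("pre_tokenizer") or {})
    patch_section(config.get("decoder") or {})
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
```

After patching, re-loading the tokenizer with `AutoTokenizer.from_pretrained` should no longer raise the untagged-enum error, though it is worth diffing the file first to confirm only the two fields changed.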