Fix tokenizer compatibility with newer transformers versions

#5
by Alonadoli - opened

Issue: A community member reported the error "Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 88 column 3" when loading the model with newer versions of the transformers library.

Root cause: The field "prepend_scheme": "always" is deprecated and not recognized by the Rust tokenizers backend in newer library versions, so deserialization of tokenizer.json fails with the error above.

Solution: Updated tokenizer.json to resolve compatibility issues with recent versions of the tokenizers library.

Changes:
Line 84: Changed "prepend_scheme": "always" to "add_prefix_space": true in pre_tokenizer
Line 167: Changed "prepend_scheme": "always" to "add_prefix_space": true in decoder
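The same edit can be applied programmatically instead of by hand. Below is a minimal sketch using Python's standard json module; the surrounding structure of the pre_tokenizer and decoder entries (a Metaspace configuration with a "replacement" field) is an assumption for illustration, not taken from this repository's actual tokenizer.json.

```python
import json

def patch_tokenizer_config(config: dict) -> dict:
    """Replace the deprecated "prepend_scheme": "always" with
    "add_prefix_space": true in the pre_tokenizer and decoder sections."""
    for section_name in ("pre_tokenizer", "decoder"):
        section = config.get(section_name)
        if isinstance(section, dict) and section.get("prepend_scheme") == "always":
            del section["prepend_scheme"]
            section["add_prefix_space"] = True
    return config

# Stand-in for the relevant parts of tokenizer.json (structure assumed):
config = {
    "pre_tokenizer": {"type": "Metaspace", "replacement": "▁", "prepend_scheme": "always"},
    "decoder": {"type": "Metaspace", "replacement": "▁", "prepend_scheme": "always"},
}
patched = patch_tokenizer_config(config)
print(json.dumps(patched["pre_tokenizer"], ensure_ascii=False))
```

In practice you would load the real file with json.load, patch it, and write it back; the helper only touches sections that actually contain "prepend_scheme": "always", so it is safe to run on an already-patched file.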

