Add converted tokenizer (no trust_remote_code needed)
#17 opened by ArthurZ (HF Staff)
Tokenizer Conversion
This PR adds a converted tokenizer that works without trust_remote_code=True.
Conversion Details
- Converted using: python scripts/convert_tokenizer.py internlm/internlm2-chat-7b --push-to-hub
- Original tokenizer type: InternLM2TokenizerFast
- Converted tokenizer type: TokenizersBackend
Validation Results
- Tested on XNLI dataset (500 samples)
- All samples match 1-1 ✅
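The validation script itself is not included in this PR. As a rough sketch of what a 1-1 check of this kind could look like, the helper below compares the token ids produced by two encoders over a list of texts and reports every disagreement. The name find_mismatches and the Encoder alias are hypothetical, chosen for illustration only; they are not part of the conversion script.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical alias: any callable that maps a string to a list of token ids,
# e.g. the bound `encode` method of a transformers tokenizer.
Encoder = Callable[[str], List[int]]


def find_mismatches(
    reference: Encoder,
    converted: Encoder,
    texts: Iterable[str],
) -> List[Tuple[int, str]]:
    """Return (index, text) for every text whose token ids differ."""
    mismatches = []
    for index, text in enumerate(texts):
        # Compare the full id sequences; any difference counts as a mismatch.
        if list(reference(text)) != list(converted(text)):
            mismatches.append((index, text))
    return mismatches
```

To apply it here, one would pass the encode method of the original tokenizer (loaded with trust_remote_code=True, presumably from a revision that predates this PR) as reference and the encode method of the converted tokenizer as converted. An empty result over the sampled sentences corresponds to "all samples match".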
Usage
from transformers import AutoTokenizer
# Now works without trust_remote_code=True
tokenizer = AutoTokenizer.from_pretrained("internlm/internlm2-chat-7b")
Converted with transformers tokenizer conversion script