Upload tokenizer

by ArthurZ HF Staff - opened Mar 25, 2024

base: refs/heads/main ← from: refs/pr/4

+439343

-1

Upload tokenizer 4f6dc371

ArthurZ

Mar 25, 2024

No description provided.

ehartford

Mar 26, 2024

hello can this be merged?

Jonathan1909

Mar 27, 2024

Hey @ArthurZ, thank you for uploading the tokenizer! That will definitely simplify the loading process and improve the user experience. We're reviewing this PR and checking with @Xenova about it (and may wait for their approval as well).

Jonathan1909

Mar 27, 2024

> hello can this be merged?

Yes, it will. We're checking with the author who provided a transformers-compatible tokenizer in the discussions several days ago.

ehartford

Mar 27, 2024

Thank you!

Xenova

Mar 27, 2024

Yes, you can merge! As mentioned in another post, this tokenizer matches the original on the entire XNLI dataset (all languages)! This PR also adds the slow tokenizer in case a user wants to fall back on it.
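For readers who want to run a similar equivalence check themselves, here is a minimal sketch of comparing two tokenizers over a set of texts. It is not the script used for the XNLI comparison mentioned above; the function name `find_mismatches` and the two toy tokenizers are hypothetical stand-ins chosen so the example runs without downloading anything.

```python
from typing import Callable, Iterable, List, Tuple


def find_mismatches(
    tokenize_a: Callable[[str], List[int]],
    tokenize_b: Callable[[str], List[int]],
    texts: Iterable[str],
) -> List[Tuple[str, List[int], List[int]]]:
    """Return (text, ids_a, ids_b) for every text where the two tokenizers disagree."""
    mismatches = []
    for text in texts:
        ids_a = tokenize_a(text)
        ids_b = tokenize_b(text)
        if ids_a != ids_b:
            mismatches.append((text, ids_a, ids_b))
    return mismatches


# Toy stand-ins (assumptions for illustration only, not real tokenizers):
# one maps each character to its code point, the other to UTF-8 bytes.
def reference_tokenizer(text: str) -> List[int]:
    return [ord(ch) for ch in text]


def candidate_tokenizer(text: str) -> List[int]:
    return list(text.encode("utf-8"))


samples = ["hello", "grok", "caf\u00e9"]
mismatches = find_mismatches(reference_tokenizer, candidate_tokenizer, samples)
for text, ids_a, ids_b in mismatches:
    print(f"mismatch on {text!r}: {ids_a} != {ids_b}")
```

The toy tokenizers agree on ASCII input and diverge only on the accented sample, which is why a multilingual corpus is a more convincing test than English-only text. With real tokenizers, each callable would wrap that tokenizer's own encode method.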

Jonathan1909 changed pull request status to merged Mar 28, 2024

Jonathan1909

Mar 28, 2024

Thank you @ArthurZ. I've merged the PR and tested it. It works well!

Jonathan1909

Mar 28, 2024

> hello can this be merged?

Hey @ehartford, the PR has been merged, and you can now load the tokenizer directly:

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("hpcai-tech/grok-1", trust_remote_code=True)

If you have already downloaded the model as a repository, you may need to run git pull to fetch the updated tokenizer files.

We have also updated the usage example in both the model card and our example in the ColossalAI GitHub repository.
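Since the PR ships a slow tokenizer as a fallback, one way to use it is sketched below. This is an illustrative helper, not part of the repository: the name `load_tokenizer_with_fallback` and the injectable `loader` argument are assumptions made so the logic can be exercised without network access. By default it uses `AutoTokenizer.from_pretrained` with its `use_fast` option.

```python
from typing import Any, Callable, Optional


def load_tokenizer_with_fallback(
    repo_id: str,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Try the fast tokenizer first; fall back to the slow one if loading fails."""
    if loader is None:
        # Imported lazily so the helper can be tested with a fake loader.
        from transformers import AutoTokenizer

        loader = AutoTokenizer.from_pretrained
    try:
        return loader(repo_id, use_fast=True, trust_remote_code=True)
    except Exception as err:
        print(f"Fast tokenizer failed ({err!r}); falling back to the slow tokenizer.")
        return loader(repo_id, use_fast=False, trust_remote_code=True)
```

Calling `load_tokenizer_with_fallback("hpcai-tech/grok-1")` requires the transformers package and access to the Hub. Catching a broad `Exception` keeps the sketch short; real code would probably narrow it to the specific errors it expects.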
