Facing error when loading tokenizer using AutoTokenizer

by alirezashmo - opened Aug 15, 2024

Aug 15, 2024

Hi! I'm facing an issue when trying to load the tokenizer:

Traceback (most recent call last):
  File "testing_persian_mistral.py", line 54, in <module>
    tokenizer = AutoTokenizer.from_pretrained("aidal/Persian-Mistral-7B")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxxxxxxxxxxxxxxxxxxxxx\Lib\site-packages\transformers\models\auto\tokenization_auto.py", line 916, in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxxxxxxxxxxxxxxxxxxxxx\Lib\site-packages\transformers\tokenization_utils_base.py", line 2255, in from_pretrained
    raise EnvironmentError(
OSError: Can't load tokenizer for 'aidal/Persian-Mistral-7B'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'aidal/Persian-Mistral-7B' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

This is my code:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("aidal/Persian-Mistral-7B")  # raises OSError
model = AutoModelForCausalLM.from_pretrained("aidal/Persian-Mistral-7B")
input_text = "پایتخت ایران کجاست؟"
input_ids = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**input_ids)
print(tokenizer.decode(outputs[0]))

I have even tried logging in to Hugging Face with an access token using huggingface_hub.login, and also setting the HF_TOKEN environment variable, but the problem persists.

Thank you for helping me fix the problem so that I can use your great model.

AMRZHD

Apr 26, 2025

I think you are getting this error because the model repository contains no tokenizer files (tokenizer.json / tokenizer_config.json). So you need to define a custom tokenizer and its configs yourself.
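If you want to confirm this before building anything, here is a minimal sketch that lists the files in the repository and reports which tokenizer files are missing. The helper names (missing_tokenizer_files, check_repo) are made up for illustration, and the list of expected file names is my assumption about what a Llama-style tokenizer is usually saved with; the exact set varies between repositories. Only list_repo_files comes from huggingface_hub.

```python
from typing import Iterable, List

# Assumption: file names a Llama-style tokenizer is commonly saved with.
# Not every repository ships all four.
TOKENIZER_FILES = (
    "tokenizer.json",
    "tokenizer.model",
    "tokenizer_config.json",
    "special_tokens_map.json",
)


def missing_tokenizer_files(repo_files: Iterable[str]) -> List[str]:
    """Return the expected tokenizer files absent from the repo's top level."""
    present = set(repo_files)  # paths in subfolders do not match on purpose
    return [name for name in TOKENIZER_FILES if name not in present]


def check_repo(repo_id: str) -> List[str]:
    """Hypothetical helper: fetch the file listing from the Hub and check it."""
    # Imported here so the pure function above works without the package.
    from huggingface_hub import list_repo_files

    return missing_tokenizer_files(list_repo_files(repo_id))
```

Calling check_repo("aidal/Persian-Mistral-7B") needs network access. If it returns all four names, the OSError comes from the repository contents and not from your login or token setup.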
