
Problems loading the model, both from the Hugging Face Hub and from a local download.

#2
by LowkeySuicidal - opened

Hello,

I am working on a mini-project to identify whether a legal text was generated by AI, following https://arxiv.org/pdf/2305.17359: the idea is to compare an original text with an equivalent text generated by LLMs, including InLegalLlama. However, I am unable to load the model either directly from the Hugging Face Hub or from the files downloaded to my local machine. In both cases, the tokenizer fails to load.

This is my code for loading:

from transformers import LlamaForCausalLM, LlamaTokenizer

paper_model = "L-NLProc/InLegalLlama"
tokenizer = LlamaTokenizer.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)

However, I get this error:

TypeError                                 Traceback (most recent call last)

/tmp/ipython-input-1218085602.py in <cell line: 0>()
      4 # tokenizer = LlamaTokenizer.from_pretrained(paper_model)
      5 # model = LlamaForCausalLM.from_pretrained(paper_model)
----> 6 tokenizer = LlamaTokenizer.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)
      7 model = LlamaForCausalLM.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)

5 frames

/usr/local/lib/python3.12/dist-packages/sentencepiece/__init__.py in LoadFromFile(self, arg)
    314 
    315     def LoadFromFile(self, arg):
--> 316         return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
    317 
    318     def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):

TypeError: not a string

I even tried loading the tokenizer directly with sentencepiece, but then loading the model itself fails with the following error:

OSError                                   Traceback (most recent call last)

/tmp/ipython-input-919257988.py in <cell line: 0>()
      4 # hoping it works
      5 Papertokenizer = spm.SentencePieceProcessor(model_file="InLegalLlama/INLegalLlama/SFT/Prediction_and_Explanation/Prediction_and_Explanation SFT Llama 2/tokenizer.model")
----> 6 Papermodel = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

2 frames

/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py in _get_resolved_checkpoint_files(pretrained_model_name_or_path, subfolder, variant, gguf_file, from_tf, from_flax, use_safetensors, cache_dir, force_download, proxies, local_files_only, token, user_agent, revision, commit_hash, is_remote_code, transformers_explicit_filename)
    987                 )
    988             else:
--> 989                 raise OSError(
    990                     f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
    991                     f" {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME + '.index'} or {FLAX_WEIGHTS_NAME} found in directory"

OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory InLegalLlama.
