Problems accessing the model from Hugging Face as well as locally
#2 · by LowkeySuicidal · opened
Hello,
I am working on a mini-project on identifying whether a legal text was generated by AI, following https://arxiv.org/pdf/2305.17359: the idea is to compare an original text with an equivalent text generated by LLMs, including InLegalLlama. However, I am unable to load the model either directly from Hugging Face or after downloading the files to my local machine. In both cases, the tokenizer fails to load.
This is my code for loading:
from transformers import LlamaForCausalLM, LlamaTokenizer
paper_model = "L-NLProc/InLegalLlama"
tokenizer = LlamaTokenizer.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)
model = LlamaForCausalLM.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)
However, I get this error:
TypeError Traceback (most recent call last)
/tmp/ipython-input-1218085602.py in <cell line: 0>()
4 # tokenizer = LlamaTokenizer.from_pretrained(paper_model)
5 # model = LlamaForCausalLM.from_pretrained(paper_model)
----> 6 tokenizer = LlamaTokenizer.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)
7 model = LlamaForCausalLM.from_pretrained("L-NLProc/InLegalLlama", trust_remote_code=True)
5 frames
/usr/local/lib/python3.12/dist-packages/sentencepiece/__init__.py in LoadFromFile(self, arg)
314
315 def LoadFromFile(self, arg):
--> 316 return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
317
318 def _EncodeAsIds(self, text, enable_sampling, nbest_size, alpha, add_bos, add_eos, reverse, emit_unk_piece):
TypeError: not a string
I even tried loading the tokenizer directly using sentencepiece, and the model weights from my local download, but that also gives me the following error:
OSError Traceback (most recent call last)
/tmp/ipython-input-919257988.py in <cell line: 0>()
4 # hoping it works
5 Papertokenizer = spm.SentencePieceProcessor(model_file="InLegalLlama/INLegalLlama/SFT/Prediction_and_Explanation/Prediction_and_Explanation SFT Llama 2/tokenizer.model")
----> 6 Papermodel = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")
2 frames
/usr/local/lib/python3.12/dist-packages/transformers/modeling_utils.py in _get_resolved_checkpoint_files(pretrained_model_name_or_path, subfolder, variant, gguf_file, from_tf, from_flax, use_safetensors, cache_dir, force_download, proxies, local_files_only, token, user_agent, revision, commit_hash, is_remote_code, transformers_explicit_filename)
987 )
988 else:
--> 989 raise OSError(
990 f"Error no file named {_add_variant(WEIGHTS_NAME, variant)}, {_add_variant(SAFE_WEIGHTS_NAME, variant)},"
991 f" {TF2_WEIGHTS_NAME}, {TF_WEIGHTS_NAME + '.index'} or {FLAX_WEIGHTS_NAME} found in directory"
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory InLegalLlama.
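In case it is relevant: my current guess is that `from_pretrained` expects `tokenizer.model` and the weight files at the top level of the directory I pass it, whereas in my download they sit in nested SFT subfolders. This is a small helper I am using to check which subfolders actually contain them (`find_checkpoint_dirs` is just my own name, not part of transformers):

```python
import os

def find_checkpoint_dirs(root):
    """Walk a local model download and report which subdirectories
    contain a tokenizer.model file and/or weight files."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        has_tokenizer = "tokenizer.model" in filenames
        has_weights = any(
            f.endswith((".bin", ".safetensors")) for f in filenames
        )
        if has_tokenizer or has_weights:
            hits[dirpath] = {"tokenizer": has_tokenizer, "weights": has_weights}
    return hits
```

My plan is to point `from_pretrained` at whichever directory reports both the tokenizer and the weights, or, if I understand the `subfolder` argument of `from_pretrained` correctly, to pass the subfolder path when loading from the Hub. Does that sound right, or am I missing something about how the repo is laid out?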