ModuleNotFoundError: No module named 'llama_inference_offload' on Mac M1 chip

#8
by vijaysb - opened

Message in the terminal:
INFO:Loading TheBloke_guanaco-65B-GPTQ...
ERROR:Failed to load GPTQ-for-LLaMa
ERROR:See https://github.com/oobabooga/text-generation-webui/blob/main/docs/GPTQ-models-(4-bit-mode).md

I'm getting the following error in the WebUI:

Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 18, in <module>
    import llama_inference_offload
ModuleNotFoundError: No module named 'llama_inference_offload'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vij/development/text-generation-webui/server.py", line 71, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 97, in load_model
    output = load_func(model_name)
  File "/Users/vij/development/text-generation-webui/modules/models.py", line 289, in GPTQ_loader
    import modules.GPTQ_loader
  File "/Users/vij/development/text-generation-webui/modules/GPTQ_loader.py", line 22, in <module>
    sys.exit(-1)
SystemExit: -1

Any idea how to fix this?

GPTQ is not supported on macOS at this time.

Please use the GGML version, assuming you have 64+ GB of RAM. If not, please try a smaller model, e.g. a 33B GGML.
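To see why 64+ GB is the suggestion for a 65B model, here's a back-of-envelope estimate: a q4-style quantization stores roughly 4-5 bits per weight (4-bit values plus per-block scales), plus some runtime overhead for the KV cache and buffers. The 4.5 bits/weight and 2 GB overhead figures below are my own rough assumptions, not exact numbers for any specific quantization format:

```python
def ggml_ram_estimate_gb(n_params_billion, bits_per_weight=4.5, overhead_gb=2.0):
    """Rough RAM estimate for a 4-bit-quantized GGML model.

    bits_per_weight ~4.5 approximates q4 quantization (4-bit weights
    plus per-block scale factors); overhead_gb is a guess at the
    KV cache and runtime buffers. Both are illustrative assumptions.
    """
    weight_gb = n_params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weight_gb + overhead_gb

print(f"65B: ~{ggml_ram_estimate_gb(65):.0f} GB")
print(f"33B: ~{ggml_ram_estimate_gb(33):.0f} GB")
```

Under these assumptions a 65B model lands in the high 30s of GB for the weights alone, which fits in 64 GB but leaves the OS little headroom, while a 33B model needs roughly half that and runs much more comfortably.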

Thanks, I have exactly 64 GB of RAM. Will it be slow?

Additionally, what configuration do we need to fine-tune it?

Yeah, it'll be pretty slow. You might prefer to try a 30B model instead, like TheBloke/Guanaco-33B-GGML or, even better, TheBloke/WizardLM-30B-Uncensored-GGML.

Yes, I tried TheBloke/Guanaco-33B-GGML and it worked. It's a little slow and initially takes around 30 seconds to begin generating text.
Thanks for your support!

vijaysb changed discussion status to closed
