error loading model: missing tensor 'token_embd.weight'
#1 — opened by poita66
I'm not really surprised that it fails to run, given that the file is only 8MB.
I tried running it with llama.cpp using the provided commands and got this error:
```
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor 'token_embd.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/root/.cache/llama.cpp/CronoBJS_fix-json-GGUF_fix-json-Q8_0.gguf'
srv load_model: failed to load model, '/root/.cache/llama.cpp/CronoBJS_fix-json-GGUF_fix-json-Q8_0.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
```
Am I doing something wrong?
Oh, and the diff-apply GGUF fails the same way.
OK, so this happens because this GGUF (and the repo it was quantized from) is just a LoRA adapter, not a full model.
For anyone reading this: you'll need the base model this adapter was trained on (https://huggingface.co/unsloth/Meta-Llama-3.1-8B-Instruct) and then apply this GGUF on top of it with the --lora flag in llama.cpp.
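A minimal sketch of what that invocation might look like. The filenames below are placeholders: they assume you have already downloaded a GGUF quantization of the base model and the adapter GGUF from this repo.

```shell
# Hypothetical filenames; substitute the base-model GGUF and the
# adapter GGUF you actually downloaded.
llama-server \
  -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf \
  --lora fix-json-Q8_0.gguf
```

The same --lora flag works with llama-cli if you want to test the adapter interactively instead of serving it.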
poita66 changed discussion status to closed