Unused tensors

#25
by ttkciar - opened

Loading the GGUF of this model in llama.cpp, I get the following warnings about unused tensors:

```
load_tensors: loading model tensors, this can take a while... (mmap = true)
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 35389440 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 5570560 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 4300800 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 43253760 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 707788800 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 1032192000 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 707788800 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 29491200 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 436469760 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 436469760 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
```

Any idea what that's about?

Never mind, the folks at r/LocalLLaMA filled me in. Block 92 is the layer used for multi-token prediction (MTP), which llama.cpp does not yet support, so it skips those tensors at load time.
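Since those tensors are still stored in the GGUF file (they are only skipped at load time), it can be handy to total how much space they occupy. Here is a minimal sketch that parses the warning lines above; the regex is an assumption based on the exact message format in this log, so adjust it if llama.cpp changes its wording:

```python
import re

# Matches llama.cpp's "model has unused tensor NAME (size = N bytes)" warning.
# This format is taken from the log pasted above, not from a stable interface.
PATTERN = re.compile(r"model has unused tensor (\S+) \(size = (\d+) bytes\)")

def unused_tensor_bytes(log_lines):
    """Map each unused tensor's name to its size in bytes."""
    sizes = {}
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes

if __name__ == "__main__":
    import sys
    sizes = unused_tensor_bytes(sys.stdin)
    total = sum(sizes.values())
    print(f"{len(sizes)} unused tensors, {total / 2**30:.2f} GiB total")
```

Piping the llama.cpp load output through this shows that the MTP layer accounts for several GiB of the file even though it is never used for inference.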

Thanks for providing such a fantastic model, btw! I really appreciate it.

ttkciar changed discussion status to closed
