Unused tensors

#25
by ttkciar - opened

Loading the GGUF of this model in llama.cpp, I get the following warnings about unused tensors:

```
load_tensors: loading model tensors, this can take a while... (mmap = true)
model has unused tensor blk.92.attn_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.attn_q.weight (size = 35389440 bytes) -- ignoring
model has unused tensor blk.92.attn_k.weight (size = 5570560 bytes) -- ignoring
model has unused tensor blk.92.attn_v.weight (size = 4300800 bytes) -- ignoring
model has unused tensor blk.92.attn_q.bias (size = 49152 bytes) -- ignoring
model has unused tensor blk.92.attn_k.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_v.bias (size = 4096 bytes) -- ignoring
model has unused tensor blk.92.attn_output.weight (size = 43253760 bytes) -- ignoring
model has unused tensor blk.92.attn_q_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.attn_k_norm.weight (size = 512 bytes) -- ignoring
model has unused tensor blk.92.post_attention_norm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_inp.weight (size = 3276800 bytes) -- ignoring
model has unused tensor blk.92.exp_probs_b.bias (size = 640 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_exps.weight (size = 707788800 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_exps.weight (size = 1032192000 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_exps.weight (size = 707788800 bytes) -- ignoring
model has unused tensor blk.92.ffn_gate_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_down_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.ffn_up_shexp.weight (size = 8355840 bytes) -- ignoring
model has unused tensor blk.92.nextn.eh_proj.weight (size = 29491200 bytes) -- ignoring
model has unused tensor blk.92.nextn.embed_tokens.weight (size = 436469760 bytes) -- ignoring
model has unused tensor blk.92.nextn.enorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.hnorm.weight (size = 20480 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_head.weight (size = 436469760 bytes) -- ignoring
model has unused tensor blk.92.nextn.shared_head_norm.weight (size = 20480 bytes) -- ignoring
```

Any idea what that's about?

Never mind, the folks at r/LocalLLaMA filled me in. Block 92 is the layer used for multi-token prediction (MTP), which llama.cpp does not yet support, so it skips those tensors at load time.
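Since those tensors are still stored in the GGUF file (they are only skipped at load time), it can be handy to total how much space they occupy. Here is a minimal sketch that parses the warning lines above; the regex is an assumption based on the exact message format in this log, so adjust it if llama.cpp changes its wording:

```python
import re

# Matches llama.cpp's "model has unused tensor NAME (size = N bytes)" warning.
# This format is taken from the log pasted above, not from a stable interface.
PATTERN = re.compile(r"model has unused tensor (\S+) \(size = (\d+) bytes\)")

def unused_tensor_bytes(log_lines):
    """Map each unused tensor's name to its size in bytes."""
    sizes = {}
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            sizes[m.group(1)] = int(m.group(2))
    return sizes

if __name__ == "__main__":
    import sys
    sizes = unused_tensor_bytes(sys.stdin)
    total = sum(sizes.values())
    print(f"{len(sizes)} unused tensors, {total / 2**30:.2f} GiB total")
```

Piping the llama.cpp load output through this shows that the MTP layer accounts for several GiB of the file even though it is never used for inference.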

Thanks for providing such a fantastic model, btw! I really appreciate it.

ttkciar changed discussion status to closed
