Fix gating: per-element to match g_proj weights (was True/per-head)

#1
by joerowell - opened
Poolside org

The g_proj weight has shape [num_heads*head_dim, hidden], which corresponds to per-element gating and matches Laguna-M.1-base/FP8/NVFP4. However, config.json declared gating=true, which means per-head gating and does not match the weight shape. This PR sets gating to 'per-element'.
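For anyone wanting to check a checkpoint for the same mismatch, here is a minimal sketch of the comparison described above. It is not taken from the model code: the helper names, the assumption that a per-head g_proj would have `num_heads` output rows, and the example dimensions are all hypothetical and only illustrate the shape reasoning.

```python
# Hypothetical helpers; names and the per-head shape are assumptions for illustration.

def infer_gating_mode(g_proj_shape, num_heads, head_dim):
    """Infer gating granularity from a g_proj weight of shape [out_features, hidden]."""
    out_features = g_proj_shape[0]
    if out_features == num_heads * head_dim:
        # One gate value per attention output element.
        return "per-element"
    if out_features == num_heads:
        # Assumed layout for per-head gating: one gate value per head.
        return "per-head"
    raise ValueError(f"unexpected g_proj output size: {out_features}")


def normalize_config_gating(value):
    """Map the config value to a mode name; True meant per-head per the PR description."""
    if value is True:
        return "per-head"
    if value in ("per-element", "per-head"):
        return value
    raise ValueError(f"unsupported gating value: {value!r}")


def gating_matches_weights(config_gating, g_proj_shape, num_heads, head_dim):
    """Return True when the declared gating mode agrees with the weight shape."""
    declared = normalize_config_gating(config_gating)
    actual = infer_gating_mode(g_proj_shape, num_heads, head_dim)
    return declared == actual


# Illustrative dimensions only, not the real model's.
num_heads, head_dim, hidden = 8, 64, 1024
g_proj_shape = (num_heads * head_dim, hidden)

old_ok = gating_matches_weights(True, g_proj_shape, num_heads, head_dim)
new_ok = gating_matches_weights("per-element", g_proj_shape, num_heads, head_dim)
print(old_ok, new_ok)
```

With these example dimensions the old setting (gating=true) fails the check and 'per-element' passes. Note that the shape test is ambiguous when head_dim is 1, in which case the sketch reports per-element.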

joerowell changed pull request status to merged
