A manually pruned version of HyperNova 60B, reduced from 80 experts to 56, as my second attempt at pruning a MoE model. It works, but testing was limited. An F16 GGUF is available, but I have been unable to quantize it any lower: the script keeps ballooning the output to 40 GB for a Q4_K_M. I am not technically proficient and would appreciate help.
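For anyone willing to help: this is roughly the approach I understand for quantizing, based on the standard llama.cpp workflow (file names are placeholders, and this is a sketch rather than the exact invocation I used, so I may be missing a step):

```shell
# Quantize the F16 GGUF down to Q4_K_M using llama.cpp's llama-quantize
# (named just `quantize` in older llama.cpp builds).
# The .gguf file names below are placeholders for this repo's actual files.
./llama-quantize HyperNova-60B-F16.gguf HyperNova-60B-Q4_K_M.gguf Q4_K_M
```

If the resulting file is much larger than expected, pointers on which tensors a pruned MoE leaves behind (and whether they are being quantized at all) would be especially appreciated.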