add quantization_config.ignore=['lm_head'] (downstream audit fix) 91382f5 verified mattbucci committed on Apr 29
Devstral 24B AWQ: GPTQ-calibrated, BOS-fixed chat template, 37 tok/s on RDNA4 df87209 verified mattbucci committed on Apr 15