decoder_step: revert to fp32 outputs (fp16-out caused CoreML to auto-cast on next-step inputs, ~+5-8% RTF regression) 1e18abe verified aufklarer commited on 12 days ago
decoder_step: re-export with fp16 outputs (boundary cast removed, halves KV transfer) c192cb8 verified aufklarer commited on 12 days ago