aufklarer
decoder_step: revert to fp32 outputs (fp16-out caused CoreML to auto-cast on next-step inputs, ~+5-8% RTF regression)
1e18abe verified