remove EMA, remove redundant computation in data loading, optimize memory usage for float32, and set float32 as the default for finetuning
aa7853c klldmofashi committed on Sep 5, 2025