Update: two-stage training, per-channel FiLM gate, cosine scheduler, 9B config
b3f019f (verified) · SunXiang2025 committed 1 day ago