refactor: move training code to scripts/, add train/eval split, tune GRPO hyperparams fad16c9 Mohammed-Altaf committed 19 days ago