Fix SFTConfig: move max_seq_length + dataset_text_field to SFTTrainer (trl API change) 2e81e98 AlgoCore committed on Apr 25
Add train_sft.ipynb: SFT pre-training with 1000 gold-label examples before GRPO cf0d796 AlgoCore committed on Apr 25