Rewrite train.py: 30min max on free T4, 1000 samples, 1 epoch, G=2, 128 max tokens 1fea34c verified Rofati committed 24 days ago
Add GRPO training script for Colab/Kaggle free T4 GPU 3116855 verified Rofati committed 24 days ago