Training (Minimal GRPO)
This folder contains a minimal GRPO training script that uses MutationGym environment rewards to fine-tune a small instruct model. It is intended as a reference implementation for the OpenEnv Challenge deliverable.
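As a rough illustration of what "environment rewards" means here, the sketch below wraps a per-sample scorer into a batch reward function with the shape commonly used by GRPO trainers (prompts and completions in, one float per completion out). It is not the code in `training/grpo_train.py`: `make_reward_fn` and `toy_score` are hypothetical names, `toy_score` is only a stand-in for a real call into the MutationGym environment, and completions are assumed to be plain strings.

```python
from typing import Callable, List


def make_reward_fn(score: Callable[[str, str], float]):
    """Wrap a per-sample scorer into a batch reward function.

    `score` is assumed to take (prompt, completion) and return a number.
    """

    def reward_fn(prompts: List[str], completions: List[str], **kwargs) -> List[float]:
        # One reward per completion, in the same order as the inputs.
        return [float(score(p, c)) for p, c in zip(prompts, completions)]

    return reward_fn


def toy_score(prompt: str, completion: str) -> float:
    # Hypothetical stand-in for an environment call:
    # rewards any non-empty completion. A real scorer would go here.
    return 1.0 if completion.strip() else 0.0


reward_fn = make_reward_fn(toy_score)
rewards = reward_fn(prompts=["task A", "task A"], completions=["some answer", "   "])
```

Keeping the scorer behind a small wrapper like this makes it easy to swap the toy function for the real environment without touching the training loop.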
Quick start
pip install -U "trl>=0.10.0" "transformers>=4.45.0" "datasets>=2.18.0"
pip install -e .
python training/grpo_train.py --model Qwen/Qwen2.5-0.5B-Instruct
Notes:
- The script uses a tiny prompt set derived from the task specs.
- It scores completions using the local MutationGym environment.
- For a larger run, increase --num-generations and --max-steps.
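The number of generations matters because GRPO compares each completion against the other completions sampled for the same prompt. The snippet below is a simplified sketch of that group-relative normalization, not the script's actual implementation: `group_advantages` is a hypothetical helper, and the use of the population standard deviation and the `eps` value are assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Mixed rewards give a signal: better-than-average completions get positive advantages.
mixed = group_advantages([0.0, 1.0, 1.0, 0.0])

# Identical rewards give no signal: every advantage is zero.
flat = group_advantages([1.0, 1.0, 1.0, 1.0])
```

With very few generations per prompt, groups are more likely to end up with identical rewards and therefore zero advantages, which is one reason a larger group size can help on longer runs.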