Training (Minimal GRPO)

This folder contains a minimal GRPO training script that uses MutationGym environment rewards to fine-tune a small instruct model. It is intended as a reference implementation for the OpenEnv Challenge deliverable.
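As a rough illustration of how environment rewards can feed a GRPO trainer, the sketch below wraps a per-completion scorer as a batch reward function. The `(prompts, completions, **kwargs) -> list of floats` shape follows the custom reward function convention of TRL's `GRPOTrainer` for plain-text (non-conversational) prompts. The names `make_env_reward` and `toy_score` are hypothetical, and `toy_score` is only a stand-in: the actual MutationGym scoring call is not shown in this README and may look different.

```python
from typing import Callable, List


def make_env_reward(score_fn: Callable[[str, str], float]):
    """Wrap a (prompt, completion) -> float scorer as a batch reward function.

    Assumption: the trainer calls the reward function with keyword arguments
    `prompts` and `completions` and expects one float per completion.
    """

    def reward_func(prompts: List[str], completions: List[str], **kwargs) -> List[float]:
        # Extra keyword arguments (for example dataset columns) are ignored here.
        return [float(score_fn(p, c)) for p, c in zip(prompts, completions)]

    return reward_func


def toy_score(prompt: str, completion: str) -> float:
    # Hypothetical stand-in for the environment: a real scorer would run
    # the completion through MutationGym and return its reward.
    return 1.0 if "assert" in completion else 0.0


reward = make_env_reward(toy_score)
rewards = reward(
    prompts=["example task prompt", "example task prompt"],
    completions=["assert add(2, 2) == 4", "print('hello')"],
)
print(rewards)
```

Keeping the scorer behind a small wrapper like this means the environment call can be swapped without touching the trainer setup.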

Quick start

pip install -U "trl>=0.10.0" "transformers>=4.45.0" "datasets>=2.18.0"
pip install -e .
python training/grpo_train.py --model Qwen/Qwen2.5-0.5B-Instruct

Notes:

  • The script uses a tiny prompt set derived from the task specs.
  • It scores completions using the local MutationGym environment.
  • For a larger run, increase --num-generations and --max-steps.
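To see why the number of generations matters, recall that GRPO scores each completion relative to the other completions sampled for the same prompt. The sketch below shows that group-relative normalization in isolation. It assumes `--num-generations` sets the group size, uses the population standard deviation, and picks an arbitrary `eps`; the script and the underlying library may differ on all three. The function name `group_relative_advantages` is hypothetical.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation (an assumption)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, four sampled completions, rewards from the environment.
group = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(group)
print(advantages)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
print(flat)
```

With very small groups, identical rewards are more likely, and such groups contribute zero advantage. Larger groups make mixed outcomes more likely and give a steadier estimate of the group mean, at the cost of more sampling per step.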