Spaces:

paradox44
/

mutationgym-env

Sleeping

App Files Files Community

mutationgym-env / training /README.md

paradox44

Initial OpenEnv upload

8850413 verified 3 months ago

preview code

raw

history blame contribute delete

630 Bytes

Training (Minimal GRPO)

This folder contains a minimal GRPO training script that uses MutationGym environment rewards to fine-tune a small instruct model. It is intended as a reference implementation for the OpenEnv Challenge deliverable.

Quick start

pip install -U "trl>=0.10.0" "transformers>=4.45.0" "datasets>=2.18.0"
pip install -e .
python training/grpo_train.py --model Qwen/Qwen2.5-0.5B-Instruct

Notes:

The script uses a tiny prompt set derived from the task specs.
It scores completions using the local MutationGym environment.
For a larger run, increase --num-generations and --max-steps.