
Validation Checklist

Mandatory Hackathon Checks

OpenEnv Environment

  • openenv.yaml is valid
  • Environment starts via Docker
  • Required endpoints work: /reset, /step, /state, /tasks, /health
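The endpoint item can be smoke-tested with a short script that uses only the standard library. This is a sketch and is not part of the repository. The HTTP method and JSON body assumed for each path are GET for /health, /tasks, and /state, and POST for /reset and /step; confirm these against the server code. The `{"action": {}}` body sent to /step is a hypothetical placeholder, because the real action schema is environment-specific. The default base URL assumes the port mapping from the quick command flow.

```python
import json
import urllib.error
import urllib.request
from typing import Dict, Optional

# Assumed method and JSON body per required endpoint (hypothetical: confirm
# against the server code). The /step action is a placeholder because the
# real action schema is environment-specific.
ENDPOINTS = [
    ("GET", "/health", None),
    ("GET", "/tasks", None),
    ("POST", "/reset", {}),
    ("GET", "/state", None),
    ("POST", "/step", {"action": {}}),
]

# Opener that ignores HTTP(S)_PROXY variables so local requests go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def request_status(base_url: str, method: str, path: str,
                   body: Optional[dict], timeout: float = 10.0) -> int:
    """Send one request and return its HTTP status code, including 4xx/5xx."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with _OPENER.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The opener raises on 4xx/5xx; the status code is still what we report.
        return err.code


def check_endpoints(base_url: str = "http://127.0.0.1:7860") -> Dict[str, int]:
    """Map each required path to the status code the server returned."""
    return {path: request_status(base_url, method, path, body)
            for method, path, body in ENDPOINTS}
```

With the container running, `check_endpoints()` returns the status code for each path. A 404 suggests the route is missing, and a 5xx suggests it crashed. With the placeholder action, /step may return a 4xx validation error even when the route is healthy. If the server is unreachable, the call raises `urllib.error.URLError`.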

Inference Reproducibility

  • python inference.py runs end-to-end
  • Output format uses [START], [STEP], [END]
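The output-format item can be checked mechanically by scanning the captured stdout of inference.py for the three markers. This is a minimal sketch built on three assumptions:

- Each marker starts its own line.
- A run may contain several [START] ... [END] episodes.
- Every episode logs at least one [STEP].

The fields after each marker are not checked, since their layout is not specified here. The name `check_log_format` is hypothetical.

```python
import re
from typing import List

# A marker must open the line, e.g. "[STEP] ..."; trailing fields are not checked.
_MARKER = re.compile(r"^\[(START|STEP|END)\]")


def check_log_format(stdout: str) -> List[str]:
    """Return ordering problems in [START]/[STEP]/[END] lines (empty list = OK)."""
    markers = []
    for line in stdout.splitlines():
        match = _MARKER.match(line.strip())
        if match:
            markers.append(match.group(1))
    if not markers:
        return ["no [START]/[STEP]/[END] lines found"]

    problems = []
    in_episode, steps = False, 0
    for i, marker in enumerate(markers):
        if marker == "START":
            if in_episode:
                problems.append(f"marker {i}: [START] before the previous [END]")
            in_episode, steps = True, 0
        elif marker == "STEP":
            if not in_episode:
                problems.append(f"marker {i}: [STEP] outside an episode")
            steps += 1
        else:  # END
            if not in_episode:
                problems.append(f"marker {i}: [END] without a [START]")
            elif steps == 0:
                # Assumption: every episode logs at least one step.
                problems.append(f"marker {i}: episode has no [STEP]")
            in_episode = False
    if in_episode:
        problems.append("log ends before the final [END]")
    return problems
```

For example, pass it `subprocess.run(["python", "inference.py"], capture_output=True, text=True).stdout`. An empty list means no ordering problems were found.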

RL Training Pipeline (TRL/Unsloth)

  • Colab notebook runs: colab/PR_Review_GRPO_Training.ipynb
  • python train_grpo.py ... runs without API errors
  • Reward logs are produced
  • Reward curve image is produced
  • Before/after score table is produced
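One possible shape for the before/after score table is a Markdown table of per-task scores with a mean row. The inputs are assumptions made for illustration: a `dict` mapping task name to score for the base model and another for the trained model. The helper name `render_before_after` is also hypothetical, and the repository's actual before_after.md layout may differ. Only tasks present in both runs are compared, so a task that failed to evaluate in one run does not skew the mean.

```python
from typing import Dict


def render_before_after(before: Dict[str, float], after: Dict[str, float]) -> str:
    """Render per-task scores before/after training as a Markdown table."""
    tasks = sorted(set(before) & set(after))  # compare only tasks scored in both runs
    lines = ["| Task | Before | After | Delta |", "|---|---:|---:|---:|"]
    for task in tasks:
        b, a = before[task], after[task]
        lines.append(f"| {task} | {b:.3f} | {a:.3f} | {a - b:+.3f} |")
    if tasks:
        mean_b = sum(before[t] for t in tasks) / len(tasks)
        mean_a = sum(after[t] for t in tasks) / len(tasks)
        lines.append(f"| **mean** | {mean_b:.3f} | {mean_a:.3f} | {mean_a - mean_b:+.3f} |")
    return "\n".join(lines) + "\n"
```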

Training Artifacts

  • artifacts/<run>/logs/reward_history.csv
  • artifacts/<run>/logs/training_summary.json
  • artifacts/<run>/logs/before_after.md
  • artifacts/<run>/plots/reward_curve.png
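The artifact list can be verified with a small check over the run directory. This sketch assumes the four relative paths above are the complete required set. `check_artifacts` is a hypothetical helper that flags missing or empty files and also confirms that training_summary.json parses as JSON.

```python
import json
from pathlib import Path
from typing import List, Union

# Paths from the "Training Artifacts" list, relative to artifacts/<run>/.
REQUIRED_ARTIFACTS = [
    "logs/reward_history.csv",
    "logs/training_summary.json",
    "logs/before_after.md",
    "plots/reward_curve.png",
]


def check_artifacts(run_dir: Union[str, Path]) -> List[str]:
    """Return problems with a run directory's artifacts (empty list = all present)."""
    root = Path(run_dir)
    problems = []
    for rel in REQUIRED_ARTIFACTS:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            problems.append(f"missing or empty: {rel}")
    # Beyond existence, make sure the summary is machine-readable.
    summary = root / "logs" / "training_summary.json"
    if summary.is_file() and summary.stat().st_size > 0:
        try:
            json.loads(summary.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append("invalid JSON: logs/training_summary.json")
    return problems
```

With `--output-dir artifacts/grpo_run` from the quick command flow, the call would presumably be `check_artifacts("artifacts/grpo_run")`.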

Storytelling Requirements

  • README explains the problem, environment, rewards, and results
  • README links to the HF Space
  • README links to a mini-blog post or a video under 2 minutes

Quick Command Flow

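```bash
# Build the environment image and serve it on host port 7860
docker build -t pr-review-env .
docker run --rm -p 7860:7860 pr-review-env

# In a second terminal (docker run above stays in the foreground):
python inference.py
python train_grpo.py --env-base-url http://127.0.0.1:7860 --num-train-epochs 1 --output-dir artifacts/grpo_run
```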
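`docker run` without `-d` stays attached, and the server may take a few seconds to come up. It can therefore help to wait for /health before starting inference or training. The poller below is a minimal sketch that assumes /health returns a 2xx status once the environment is ready. `wait_for_health` is a hypothetical helper, not part of the repository.

```python
import http.client
import time
import urllib.request

# Opener that ignores HTTP(S)_PROXY variables so local requests go direct.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(base_url: str = "http://127.0.0.1:7860",
                    timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll GET <base_url>/health until it returns 2xx or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval) as resp:
                if 200 <= resp.status < 300:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts, and HTTP errors (urllib's
            # URLError and HTTPError subclass OSError): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Run it from another terminal, or at the top of a driver script. Launch inference.py and train_grpo.py only once it returns True.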