---
title: CodeReview-Professional-Workflow
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: docker
pinned: false
---
# CodeReview Professional Workflow
CodeReview Professional Workflow is an OpenEnv environment for training code-fixing agents on realistic review loops instead of one-shot coding tasks. The agent has to inspect buggy code, run tests, lint the patch, query docs, and persuade a simulated author before the episode is considered solved.
## Quick links
| Artifact | Link |
|---|---|
| Hugging Face Space | 100XZX001/CodeReview-Professional-Workflow |
| Local training script | training.py |
| OpenEnv manifest | openenv.yaml |
| Submission slide deck | submission_assets/code_review_openenv_submission.pptx |
| Training artifacts folder | outputs/README.md |
## Why this environment
Most code agents are evaluated on static patch generation. Real review work is messier:
- you have to diagnose the failure mode before patching
- you often need tool feedback before you know whether the fix is safe
- you may need to explain the fix to another developer before it is accepted
This environment turns that workflow into a multi-step RL setting with dense rewards and stateful interaction.
## How the environment works
Each episode samples one injected bug from five difficulty bands:
- `easy`: null checks, missing defaults, simple indexing mistakes
- `medium`: off-by-one and wrong-operator bugs
- `hard`: numerical safety failures like divide-by-zero
- `harder`: concurrency issues like missing locks
- `hardest`: deadlock and coordination mistakes
The agent can take actions such as:
- `inspect`
- `run_tests`
- `run_linter`
- `query_docs`
- `fix`
- `comment`
- `question`
- `done`
Rewards combine test delta, lint delta, tool usage, exploration behavior, step penalties, and terminal success. The observation includes the current code, latest tool output, previous scores, author confidence, progress counters, and recent action history.
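From the agent's side, an episode is just a reset/step loop over these actions. The sketch below is illustrative only: it assumes a Gym-style `reset()`/`step()` interface, and `env`, `agent`, and the returned fields are hypothetical names rather than this project's actual API.

```python
# Illustrative episode loop (hypothetical names, not this project's actual API).
# Assumes a Gym-style reset/step client, which is the usual shape for
# OpenEnv-hosted environments.

def run_episode(env, agent, max_steps: int = 20) -> float:
    """Roll out one review episode and return the total dense reward."""
    obs = env.reset()  # buggy code, latest tool output, scores, author confidence, history
    total_reward = 0.0
    for _ in range(max_steps):
        # The agent chooses one of the review actions (inspect, run_tests, run_linter,
        # query_docs, fix, comment, question, done) plus any payload such as a patch.
        action = agent.act(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward  # test delta, lint delta, tool usage, step penalty, ...
        if done:  # terminal success bonus (or an exhausted step budget) ends the episode
            break
    return total_reward
```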
## OpenEnv-first setup
This repo is structured as an OpenEnv environment rather than a custom one-off app:
- the environment metadata lives in openenv.yaml
- the Space is configured as a Docker-based OpenEnv deployment
- runtime dependencies are kept lightweight for the Space build
- training-only packages live separately so judges can run the environment without pulling the full training stack
The project now targets `openenv-core>=0.2.3`.
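To confirm a local checkout satisfies that floor before launching anything, a standard-library check is enough; this snippet is only a convenience sketch, with the package name and version floor taken from this README.

```python
# Sanity-check the runtime dependency floor from a local environment.
from importlib.metadata import PackageNotFoundError, version

try:
    installed = version("openenv-core")
    print(f"openenv-core {installed} is installed (project targets >= 0.2.3)")
except PackageNotFoundError:
    print("openenv-core is not installed; run `pip install -e .` first")
```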
## Training
The main training entrypoint is `training.py`, which uses Unsloth plus a PPO-style loop over real environment interaction. For judges who want a rerunnable workflow, the repo also includes a Colab-ready notebook at `notebooks/code_review_unsloth_training.ipynb`.
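For readers skimming the reward and loss curves, "PPO-style" refers to the clipped surrogate objective at the core of PPO. The snippet below is a generic version of that objective for orientation only; it is not code taken from `training.py`, and all names in it are illustrative.

```python
# Generic PPO clipped-surrogate policy loss (illustrative; not taken from training.py).
import torch


def ppo_policy_loss(
    logprobs_new: torch.Tensor,  # log pi_new(a|s) for the sampled actions
    logprobs_old: torch.Tensor,  # log pi_old(a|s) recorded when the rollout was collected
    advantages: torch.Tensor,    # advantage estimates built from the dense env rewards
    clip_eps: float = 0.2,
) -> torch.Tensor:
    ratio = torch.exp(logprobs_new - logprobs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO keeps the pessimistic (element-wise minimum) surrogate, averaged over the batch.
    return -torch.min(unclipped, clipped).mean()
```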
### Install locally

```bash
pip install -e .
pip install -r requirements-training.txt
```

### Run training

```bash
python training.py
```
The training run writes the following evidence plots to the working directory:
- `warmup_loss.png`
- `reward_curve.png`
- `loss_curve.png`
- `training_summary.png`
For submission hygiene, copy a real run into `outputs/<run-name>/` and link that folder from this README before final judging.
## Results and evidence
The expected evidence bundle for a real training run is:
- warm-up loss curve
- PPO reward curve
- PPO loss curve
- combined summary panel
Use outputs/README.md as the landing page for committed run artifacts.
## Submission materials
This repo is set up so every judge-facing artifact can be reached from this README:
- environment Space: 100XZX001/CodeReview-Professional-Workflow
- training notebook: notebooks/code_review_unsloth_training.ipynb
- slide deck: submission_assets/code_review_openenv_submission.pptx
- evidence folder: outputs/README.md
No large video files are stored in the repo; any future video or blog submission should be linked by URL from this README.