# PRobe: Design Notes

See the top-level [README](../README.md) for the full environment description,
reward function breakdown, and task catalogue.
## Repository layout

```
repo-root/
├── agent/        # Client API (ProbeEnv, ProbeAction, ProbeObservation)
├── environment/  # FastAPI server + RL environment logic
├── training/     # GRPO training and baseline evaluation scripts
├── tests/        # pytest suite
├── outputs/      # logs, reward curves, artefacts (git-ignored)
└── docs/         # design notes (this file)
```
## Environment entry point

`environment/app.py`: FastAPI app serving `/ui/` (static frontend) and `/docs` (API).

`openenv.yaml`: sets `app: environment.app:app`.
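
The `app:` value reads like the `module:attribute` import string that ASGI servers such as uvicorn accept. Assuming that convention applies here, the sketch below shows how such a string can be resolved to a Python object. `resolve_target` is a hypothetical helper written for illustration, not part of this repository, and it is demonstrated on a standard-library target so that it runs without the project installed.

```python
from importlib import import_module


def resolve_target(target: str):
    """Resolve a 'package.module:attribute' string to the object it names."""
    module_path, sep, attr_path = target.partition(":")
    if not sep or not module_path or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    obj = import_module(module_path)
    # Allow dotted attribute paths such as 'module:factory.app'.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    return obj


# Stand-in target; with the repository on sys.path the analogous call
# would be resolve_target("environment.app:app").
dumps = resolve_target("json:dumps")
```

Under this reading, `environment.app:app` names the `app` object defined in `environment/app.py`.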
## Reward function

See `environment/graders.py` for the deterministic keyword + line-range grader.
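
To make the idea of a keyword + line-range grader concrete, here is a minimal sketch under stated assumptions. It is not the code in `environment/graders.py`: the names `ExpectedFinding`, `Comment`, and `grade`, and the choice of scoring by the fraction of expected findings matched, are all hypothetical. The actual reward breakdown is described in the top-level README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedFinding:
    """Hypothetical ground-truth issue: keywords plus an inclusive line range."""
    keywords: tuple[str, ...]
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Comment:
    """Hypothetical agent output: a remark attached to one line."""
    line: int
    text: str


def matches(expected: ExpectedFinding, comment: Comment) -> bool:
    """True if the comment is inside the range and mentions any keyword."""
    in_range = expected.start_line <= comment.line <= expected.end_line
    text = comment.text.lower()
    return in_range and any(k.lower() in text for k in expected.keywords)


def grade(expected: list[ExpectedFinding], comments: list[Comment]) -> float:
    """Fraction of expected findings hit by at least one comment."""
    if not expected:
        return 0.0
    hits = sum(1 for e in expected if any(matches(e, c) for c in comments))
    return hits / len(expected)


expected = [
    ExpectedFinding(("sql injection",), 10, 14),
    ExpectedFinding(("off-by-one",), 40, 42),
]
comments = [
    Comment(12, "Possible SQL injection via string formatting"),
    Comment(90, "off-by-one here"),  # right keyword, wrong line range
]
print(grade(expected, comments))  # 0.5
```

Because the score depends only on substring checks and integer comparisons, the same inputs always give the same result, which is the property the word "deterministic" refers to.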
## Training

`training/train_grpo.py`: single-turn GRPO via Hugging Face TRL.

`training/baseline.py`: zero-shot GPT-4o-mini baseline.

`training/scripted_baseline.py`: deterministic oracle / spammer stress-tests.
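
For readers new to GRPO, its core step is to score each sampled completion relative to the other completions drawn for the same prompt. The sketch below illustrates that group-relative normalization in plain Python. It is not taken from `training/train_grpo.py` or from TRL; the function name, the `eps` value, and the use of the population standard deviation are assumptions, and real implementations may use a different estimator.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize the rewards of one prompt's completions within their group.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is only rewarded for beating its siblings.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt, graded 1.0 or 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal. This is one reason a stress-test with a constant-reward policy can be informative.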