PRobe / docs /design.md
Thakur, Mahipal
UI Integration
44bd7bd
# PRobe β€” Design Notes
See the top-level [README](../README.md) for the full environment description,
reward function breakdown, and task catalogue.
## Repository layout
```
repo-root/
β”œβ”€β”€ agent/ # Client API (ProbeEnv, ProbeAction, ProbeObservation)
β”œβ”€β”€ environment/ # FastAPI server + RL environment logic
β”œβ”€β”€ training/ # GRPO training and baseline evaluation scripts
β”œβ”€β”€ tests/ # pytest suite
β”œβ”€β”€ outputs/ # logs, reward curves, artefacts (git-ignored)
└── docs/ # design notes (this file)
```
## Environment entry point
`environment/app.py` β€” FastAPI app mounted at `/ui/` (static frontend) and `/docs` (API).
`openenv.yaml` β†’ `app: environment.app:app`.
## Reward function
See `environment/graders.py` for the deterministic keyword+line-range grader.
## Training
`training/train_grpo.py` β€” single-turn GRPO via HuggingFace TRL.
`training/baseline.py` β€” zero-shot GPT-4o-mini baseline.
`training/scripted_baseline.py` β€” deterministic oracle / spammer stress-tests.