Thakur, Mahipal

PRobe: Design Notes

See the top-level README for the full environment description, reward function breakdown, and task catalogue.

Repository layout

repo-root/
├── agent/           # Client API (ProbeEnv, ProbeAction, ProbeObservation)
├── environment/     # FastAPI server + RL environment logic
├── training/        # GRPO training and baseline evaluation scripts
├── tests/           # pytest suite
├── outputs/         # logs, reward curves, artefacts (git-ignored)
└── docs/            # design notes (this file)

Environment entry point

environment/app.py: FastAPI app mounted at /ui/ (static frontend) and /docs (API).
openenv.yaml → app: environment.app:app.
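
The value environment.app:app follows the common "module:attribute" convention for naming an ASGI application (the same shape uvicorn accepts on its command line). As a rough illustration of how such a string can be resolved to an object, here is a minimal sketch; the helper name resolve_entry_point is hypothetical and is not taken from this repository or from OpenEnv, whose actual loading logic is not shown here.

```python
import importlib


def resolve_entry_point(spec: str):
    """Resolve a 'package.module:attribute' string to the object it names.

    Hypothetical helper for illustration only; not part of PRobe or OpenEnv.
    """
    module_name, sep, attr_path = spec.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {spec!r}")

    # Import the module part, then walk the (possibly dotted) attribute path.
    obj = importlib.import_module(module_name)
    for part in attr_path.split("."):
        obj = getattr(obj, part)
    return obj
```

Run from the repository root, resolve_entry_point("environment.app:app") would be expected to return the FastAPI instance defined in environment/app.py, assuming that module imports cleanly. The same call works on any importable module, for example resolve_entry_point("json:dumps").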

Reward function

See environment/graders.py for the deterministic keyword+line-range grader.
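
To make "deterministic keyword+line-range grader" concrete, the sketch below shows one way such a grader could work: a review comment counts as a hit if it lands inside an expected line range and mentions at least one expected keyword. This is an assumption-laden illustration, not the code in environment/graders.py; the names ExpectedIssue, Comment, and grade, and the "fraction of issues matched" scoring rule, are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExpectedIssue:
    """A planted issue: where it lives and which words identify it (hypothetical)."""
    keywords: tuple[str, ...]
    start_line: int
    end_line: int


@dataclass(frozen=True)
class Comment:
    """A single review comment produced by the agent (hypothetical)."""
    line: int
    text: str


def grade(comments: list[Comment], issues: list[ExpectedIssue]) -> float:
    """Return the fraction of expected issues matched by at least one comment.

    A comment matches an issue when its line falls inside the issue's
    inclusive line range AND its text contains one of the issue's keywords
    (case-insensitive substring match). No randomness is involved, so the
    same inputs always give the same score.
    """
    if not issues:
        return 0.0

    hits = 0
    for issue in issues:
        for comment in comments:
            in_range = issue.start_line <= comment.line <= issue.end_line
            text = comment.text.lower()
            has_keyword = any(k.lower() in text for k in issue.keywords)
            if in_range and has_keyword:
                hits += 1
                break  # count each issue at most once
    return hits / len(issues)
```

Under this scheme, a comment with the right keyword on the wrong line scores nothing, which is what makes a line-range check useful against keyword spamming. Any penalty for extra or off-target comments would need to be added separately; the sketch does not model one.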

Training

training/train_grpo.py: single-turn GRPO via HuggingFace TRL.
training/baseline.py: zero-shot GPT-4o-mini baseline.
training/scripted_baseline.py: deterministic oracle / spammer stress-tests.
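
For readers unfamiliar with GRPO, its central idea is that each completion's reward is scored relative to the other completions sampled for the same prompt, rather than against a learned value baseline. The sketch below shows that normalisation step in plain Python. It is a textbook-style illustration under stated assumptions, not the code in training/train_grpo.py and not TRL's implementation: the function name and eps value are hypothetical, and the choice of population standard deviation is an assumption (implementations differ on this detail).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalise rewards within one group of completions for the same prompt.

    advantage_i = (r_i - mean(rewards)) / (std(rewards) + eps)

    eps keeps the division finite when every completion gets the same reward.
    """
    if not rewards:
        raise ValueError("rewards must contain at least one value")

    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]
```

With grader rewards of [1.0, 0.0, 0.0, 1.0] for four sampled reviews, the two successful reviews receive positive advantages and the two failures negative ones, and the advantages sum to zero. If all four rewards are identical, every advantage is zero and that group contributes no learning signal.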