ROLES.md — Team Roles & Responsibilities
Role Definitions
Environment Engineer
Owns: server/environment.py, models.py, data pipeline
- Implements `reset()`, `step()`, `state` following OpenEnv ABC
- Defines typed `Action`, `Observation`, `State` dataclasses
- Writes ground-truth data loader and reward computation logic
- Ensures `step()` is stateless-safe for concurrent sessions
- Writes unit tests for environment logic
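
As a rough illustration of the `reset()` / `step()` / `state` shape listed above, here is a minimal sketch. It deliberately defines its own stand-in `Environment` base class instead of importing the OpenEnv one, and the dataclass fields (`answer`, `prompt`, `episode_id`, `step_count`) and the `EchoEnv` class are hypothetical, chosen only for the example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Action:
    answer: str  # hypothetical field


@dataclass
class Observation:
    prompt: str  # hypothetical field
    reward: float = 0.0
    done: bool = False


@dataclass
class State:
    episode_id: int = 0
    step_count: int = 0


class Environment(ABC):
    """Local stand-in for the OpenEnv base class (not the real import)."""

    @abstractmethod
    def reset(self) -> Observation: ...

    @abstractmethod
    def step(self, action: Action) -> Observation: ...

    @property
    @abstractmethod
    def state(self) -> State: ...


class EchoEnv(Environment):
    """Toy environment: reward 1.0 when the answer matches the ground truth."""

    def __init__(self, truth: str = "42") -> None:
        self._truth = truth
        # All mutable data lives on the instance, so giving each session its
        # own instance keeps concurrent sessions from sharing episode state.
        self._state = State()

    def reset(self) -> Observation:
        self._state = State(episode_id=self._state.episode_id + 1)
        return Observation(prompt="What is 6 * 7?")

    def step(self, action: Action) -> Observation:
        self._state.step_count += 1
        reward = 1.0 if action.answer.strip() == self._truth else 0.0
        return Observation(prompt="", reward=reward, done=True)

    @property
    def state(self) -> State:
        return self._state
```

Keeping the reward computation inside `step()` and the counters inside `State` makes both easy to unit test without a running server.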
API / Infra Engineer
Owns: server/app.py, server/Dockerfile, openenv.yaml, pyproject.toml
- Wires `create_fastapi_app(env)` correctly
- Configures Dockerfile for `uvicorn` with `WORKERS`, `PORT` env vars
- Manages HF Spaces deployment (`openenv push`)
- Validates `/health`, `/reset`, `/step`, `/state` endpoints
- Configures scaling (workers, MAX_CONCURRENT_ENVS)
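
One way to picture the `WORKERS` / `PORT` wiring is a small helper that turns those variables into a `uvicorn` command line. This is only a sketch: the module path `server.app:app` assumes `server/app.py` exposes an object named `app`, and the defaults (port 8000, one worker) are placeholders, not values taken from the project.

```python
import os


def uvicorn_command(environ=None):
    """Build the uvicorn argv from PORT and WORKERS (defaults are assumptions)."""
    env = os.environ if environ is None else environ
    port = int(env.get("PORT", "8000"))
    workers = int(env.get("WORKERS", "1"))
    return [
        "uvicorn",
        "server.app:app",  # assumed import path for the FastAPI app
        "--host", "0.0.0.0",
        "--port", str(port),
        "--workers", str(workers),
    ]


if __name__ == "__main__":
    # Print the command so it can be compared against the Dockerfile CMD.
    print(" ".join(uvicorn_command()))
```

Parsing both values with `int()` fails fast on a malformed variable instead of letting the container start with a bad setting.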
Client Engineer
Owns: client.py
- Implements `HTTPEnvClient` subclass
- Implements `_step_payload()`, `_parse_result()`, `_parse_state()`
- Writes async and sync usage examples
- Ensures client works against both local Docker and HF Spaces URL
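
The three hooks named above can be sketched roughly as follows. The class is kept standalone so the example runs without OpenEnv installed; in the real project it would subclass `HTTPEnvClient`. The JSON keys (`observation`, `reward`, `done`, `episode_id`, `step_count`) and the dataclass fields are assumptions about the wire format, not the confirmed schema.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Action:
    answer: str  # hypothetical field


@dataclass
class Observation:
    prompt: str  # hypothetical field


@dataclass
class StepResult:
    observation: Observation
    reward: float
    done: bool


@dataclass
class State:
    episode_id: int
    step_count: int


class MyEnvClient:
    """Standalone sketch; the real class would subclass HTTPEnvClient."""

    def _step_payload(self, action: Action) -> dict[str, Any]:
        # Serialize the typed action into the JSON body sent to /step.
        return {"answer": action.answer}

    def _parse_result(self, payload: dict[str, Any]) -> StepResult:
        # Convert the /reset or /step response back into typed objects.
        obs = payload.get("observation", {})
        return StepResult(
            observation=Observation(prompt=obs.get("prompt", "")),
            reward=float(payload.get("reward") or 0.0),
            done=bool(payload.get("done", False)),
        )

    def _parse_state(self, payload: dict[str, Any]) -> State:
        # Convert the /state response into a typed State.
        return State(
            episode_id=int(payload.get("episode_id", 0)),
            step_count=int(payload.get("step_count", 0)),
        )
```

Because the hooks are pure functions of their input, they can be tested against canned payloads before either the local Docker image or the HF Space is up.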
Training Engineer
Owns: train.py / Colab notebook, GRPOConfig, rollout function
- Writes `rollout_func` that calls `env.reset()` + `env.step()` in loop
- Defines `GRPOConfig` (learning rate, batch size, vLLM settings)
- Registers all reward functions with `GRPOTrainer`
- Monitors training via trackio
- Pushes fine-tuned model to HF Hub
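
The `env.reset()` + `env.step()` loop at the heart of a rollout might look something like the sketch below. It is not the `rollout_func` signature that `GRPOTrainer` expects; it only shows the episode loop. `CountdownEnv`, `StepResult`, `run_episode`, and the `policy` callable are hypothetical stand-ins for the real client and model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    observation: str
    reward: float
    done: bool


class CountdownEnv:
    """Hypothetical toy env: the episode ends after `length` steps."""

    def __init__(self, length: int = 3) -> None:
        self._length = length
        self._remaining = length

    def reset(self) -> StepResult:
        self._remaining = self._length
        return StepResult(f"{self._remaining} left", 0.0, False)

    def step(self, action: str) -> StepResult:
        self._remaining -= 1
        reward = 1.0 if action == "ok" else 0.0
        return StepResult(f"{self._remaining} left", reward, self._remaining <= 0)


def run_episode(env, policy: Callable[[str], str], max_steps: int = 16) -> list[float]:
    """Run one episode and return the per-step rewards."""
    result = env.reset()
    rewards: list[float] = []
    for _ in range(max_steps):  # hard cap so a stuck env cannot loop forever
        if result.done:
            break
        result = env.step(policy(result.observation))
        rewards.append(result.reward)
    return rewards
```

In training, `policy` would wrap model generation, and the collected rewards would feed the registered reward functions.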
Reward Designer
Owns: REWARD_DESIGN.md, reward function implementations
- Designs decomposed, float-returning reward functions
- Works with Environment Engineer to embed reward signals in `step()`
- Validates reward signal is non-sparse and well-shaped
- Documents reward composition and rationale
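
To make "decomposed, float-returning" concrete, here is a small sketch of two independent reward components, each returning one float per completion. It assumes completions arrive as plain strings; the `<answer>` tag convention, the function names, and the `max_chars` budget are illustrative choices, not the project's actual reward design.

```python
import re

_ANSWER_TAG = re.compile(r"<answer>.*?</answer>", re.DOTALL)


def format_reward(completions: list[str], **kwargs) -> list[float]:
    """1.0 when the completion contains an <answer>...</answer> block."""
    return [1.0 if _ANSWER_TAG.search(c) else 0.0 for c in completions]


def length_reward(completions: list[str], max_chars: int = 200, **kwargs) -> list[float]:
    """Dense signal: decays linearly from 1.0 to 0.0 as the text grows."""
    return [max(0.0, 1.0 - len(c) / max_chars) for c in completions]
```

Keeping each component separate lets the trainer log them individually, which makes it easier to see which term drives the total. The graded `length_reward` is an example of a non-sparse signal, in contrast to the binary `format_reward`.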
QA / Evaluation Engineer
Owns: evaluation scripts, metrics
- Validates environment correctness (does `step()` behave deterministically?)
- Runs baseline policies to sanity-check reward range
- Evaluates fine-tuned model on held-out evaluation set
- Produces final metrics for hackathon submission
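
A determinism check can be as simple as replaying the same action sequence on two fresh environments and comparing the rewards. The sketch below uses a hypothetical `NoisyEnv` whose randomness comes from a seeded generator; `trace` and `is_deterministic` are illustrative helper names.

```python
import random
from typing import Callable, Sequence


class NoisyEnv:
    """Hypothetical env whose reward includes seeded random noise."""

    def __init__(self, seed: int) -> None:
        self._seed = seed
        self._rng = random.Random(seed)

    def reset(self) -> None:
        # Re-seed on reset so every episode replays the same noise.
        self._rng = random.Random(self._seed)

    def step(self, action: float) -> float:
        return action + self._rng.random()


def trace(env, actions: Sequence[float]) -> list[float]:
    """Reset the env, apply the actions in order, and collect rewards."""
    env.reset()
    return [env.step(a) for a in actions]


def is_deterministic(make_env: Callable[[], object], actions: Sequence[float]) -> bool:
    """True when two fresh envs produce identical reward traces."""
    return trace(make_env(), actions) == trace(make_env(), actions)
```

The same `trace` helper can be reused with a baseline policy to record the minimum and maximum reward seen, which covers the reward-range sanity check.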
Collaboration Rules
- All code must follow OpenEnv 5-step pattern (see ARCHITECTURE.md)
- No deviation from `step()` / `reset()` / `state` interface
- All reward functions must return `List[float]` for GRPOTrainer compatibility
- Docker image must pass `curl /health` before any deployment PR is merged
- HF Space URL must be live before training begins