	# ROLES.md — Team Roles & Responsibilities

	---

	## Role Definitions

	### Environment Engineer
	Owns: `server/environment.py`, `models.py`, data pipeline
	- Implements `reset()`, `step()`, and `state` following the OpenEnv ABC
	- Defines typed `Action`, `Observation`, `State` dataclasses
	- Writes ground-truth data loader and reward computation logic
	- Ensures `step()` is safe under concurrent sessions (no mutable state shared between sessions)
	- Writes unit tests for environment logic
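
A minimal sketch of the shape these responsibilities imply, using only the standard library. It does not import OpenEnv; the class names (`TriageAction`, `TriageObservation`, `TriageState`, `BugTriageEnvironment`), the `severity` field, and the one-step episode are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, Optional
import uuid


@dataclass
class TriageAction:
    """Hypothetical action: the severity label the agent assigns."""
    severity: str


@dataclass
class TriageObservation:
    """Hypothetical observation returned by reset() and step()."""
    bug_report: str
    reward: float = 0.0
    done: bool = False


@dataclass
class TriageState:
    """Per-episode bookkeeping exposed through the `state` property."""
    episode_id: str = ""
    step_count: int = 0


class BugTriageEnvironment:
    """Keeps all mutable data on the instance so sessions do not share state."""

    def __init__(self, ground_truth: Dict[str, str]) -> None:
        # Maps bug report text -> correct severity label (illustrative data).
        self._ground_truth = dict(ground_truth)
        self._current: Optional[str] = None
        self._state = TriageState()

    def reset(self) -> TriageObservation:
        # Start a fresh episode and serve the first bug report.
        self._state = TriageState(episode_id=str(uuid.uuid4()))
        self._current = next(iter(self._ground_truth))
        return TriageObservation(bug_report=self._current)

    def step(self, action: TriageAction) -> TriageObservation:
        if self._current is None:
            raise RuntimeError("call reset() before step()")
        self._state.step_count += 1
        correct = self._ground_truth[self._current]
        reward = 1.0 if action.severity == correct else 0.0
        # One-step episodes: every step ends the episode in this sketch.
        return TriageObservation(self._current, reward=reward, done=True)

    @property
    def state(self) -> TriageState:
        return self._state
```

Giving each session its own `BugTriageEnvironment` instance is one simple way to keep `step()` free of cross-session interference; a real implementation would subclass the OpenEnv base class instead of standing alone.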

	### API / Infra Engineer
	Owns: `server/app.py`, `server/Dockerfile`, `openenv.yaml`, `pyproject.toml`
	- Wires `create_fastapi_app(env)` correctly
	- Configures the Dockerfile to run `uvicorn` with the `WORKERS` and `PORT` env vars
	- Manages HF Spaces deployment (`openenv push`)
	- Validates `/health`, `/reset`, `/step`, `/state` endpoints
	- Configures scaling (workers, `MAX_CONCURRENT_ENVS`)
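
One way to turn the `WORKERS` and `PORT` env vars into a `uvicorn` invocation is sketched below. The module path `server.app:app`, the default values, and the helper name `uvicorn_command` are assumptions, not taken from the repository; `--host`, `--port`, and `--workers` are standard `uvicorn` CLI options.

```python
import os
from typing import List, Mapping


def uvicorn_command(env: Mapping[str, str] = os.environ) -> List[str]:
    """Build the uvicorn argv from PORT and WORKERS (defaults are assumed)."""
    port = int(env.get("PORT", "8000"))
    workers = int(env.get("WORKERS", "1"))
    if not 0 < port < 65536:
        raise ValueError(f"PORT out of range: {port}")
    if workers < 1:
        raise ValueError(f"WORKERS must be >= 1, got {workers}")
    return [
        "uvicorn",
        "server.app:app",  # assumed import path of the FastAPI app
        "--host", "0.0.0.0",
        "--port", str(port),
        "--workers", str(workers),
    ]
```

Validating the values before launch makes a misconfigured container fail with a readable error rather than at bind time.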

	### Client Engineer
	Owns: `client.py`
	- Implements `HTTPEnvClient` subclass
	- Implements `_step_payload()`, `_parse_result()`, `_parse_state()`
	- Writes async and sync usage examples
	- Ensures the client works against both a local Docker container and the HF Spaces URL
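
The serialization half of the client can be sketched as plain functions. This does not subclass the real `HTTPEnvClient`; the JSON layout (`observation`, `reward`, `done`) and the `severity` / `bug_report` fields are assumptions made for illustration, and the function names mirror `_step_payload()` and `_parse_result()` only loosely.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class TriageAction:
    severity: str  # hypothetical action field


@dataclass
class TriageObservation:
    bug_report: str  # hypothetical observation field


@dataclass
class StepResult:
    observation: TriageObservation
    reward: Optional[float]
    done: bool


def step_payload(action: TriageAction) -> Dict[str, Any]:
    """Turn a typed action into the JSON body sent to /step."""
    return asdict(action)


def parse_result(payload: Dict[str, Any]) -> StepResult:
    """Turn the JSON body of a /step or /reset response into typed objects."""
    obs = payload.get("observation", {})
    return StepResult(
        observation=TriageObservation(bug_report=obs.get("bug_report", "")),
        reward=payload.get("reward"),
        done=bool(payload.get("done", False)),
    )
```

Because these functions never touch the network, the same code path serves a local Docker container and a hosted Space; only the base URL handed to the client differs.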

	### Training Engineer
	Owns: `train.py` / Colab notebook, `GRPOConfig`, rollout function
	- Writes `rollout_func` that calls `env.reset()` and then `env.step()` in a loop
	- Defines `GRPOConfig` (learning rate, batch size, vLLM settings)
	- Registers all reward functions with `GRPOTrainer`
	- Monitors training via trackio
	- Pushes fine-tuned model to HF Hub
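
The reset-then-step loop at the core of a rollout can be sketched as follows. This is not the signature `GRPOTrainer` expects for `rollout_func`; `EchoEnv`, `run_episode`, and the `(observation, reward, done)` tuple are stand-ins invented here so the loop can run without a server or a model.

```python
from typing import Callable, List, Tuple


class EchoEnv:
    """Stand-in environment: rewards the action "fix", ends after 3 steps."""

    def __init__(self) -> None:
        self._steps = 0

    def reset(self) -> str:
        self._steps = 0
        return "bug #0"

    def step(self, action: str) -> Tuple[str, float, bool]:
        self._steps += 1
        reward = 1.0 if action == "fix" else 0.0
        done = self._steps >= 3
        return f"bug #{self._steps}", reward, done


def run_episode(env: EchoEnv, policy: Callable[[str], str],
                max_steps: int = 10) -> List[float]:
    """Call reset() once, then step() until done or max_steps is reached."""
    obs = env.reset()
    rewards: List[float] = []
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        rewards.append(reward)
        if done:
            break
    return rewards
```

In actual training the `policy` callable would be replaced by model generation, and the collected rewards would be handed back in whatever structure the trainer requires.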

	### Reward Designer
	Owns: `REWARD_DESIGN.md`, reward function implementations
	- Designs decomposed reward functions, each returning one float per completion
	- Works with Environment Engineer to embed reward signals in `step()`
	- Validates reward signal is non-sparse and well-shaped
	- Documents reward composition and rationale
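
A sketch of what "decomposed" could look like: two independent reward functions, each returning one float per completion. The severity scale, the `labels` keyword (assumed to be a dataset column forwarded to the function), and the plain-string completions are all assumptions for illustration.

```python
from typing import List

# Hypothetical ordered severity scale.
SEVERITIES = ("low", "medium", "high", "critical")


def format_reward(completions: List[str], **kwargs) -> List[float]:
    """1.0 when the completion is exactly one known severity label."""
    return [1.0 if c.strip().lower() in SEVERITIES else 0.0
            for c in completions]


def accuracy_reward(completions: List[str], labels: List[str],
                    **kwargs) -> List[float]:
    """Partial credit that shrinks with distance from the true severity."""
    scores: List[float] = []
    for completion, label in zip(completions, labels):
        pred = completion.strip().lower()
        if pred not in SEVERITIES:
            scores.append(0.0)  # unparseable output earns nothing here
            continue
        gap = abs(SEVERITIES.index(pred) - SEVERITIES.index(label))
        scores.append(1.0 - gap / (len(SEVERITIES) - 1))
    return scores
```

Splitting format from accuracy keeps the signal from being all-or-nothing: a well-formed but wrong answer still scores on `format_reward`, and a near miss earns partial credit on `accuracy_reward`.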

	### QA / Evaluation Engineer
	Owns: evaluation scripts, metrics
	- Validates environment correctness (does `step()` behave deterministically?)
	- Runs baseline policies to sanity-check reward range
	- Evaluates fine-tuned model on held-out evaluation set
	- Produces final metrics for hackathon submission
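
The determinism and reward-range checks can be sketched by replaying a fixed action sequence against fresh environment instances. `CounterEnv` is a stand-in, and the `(observation, reward, done)` return shape and helper names are assumptions, not the project's actual interface.

```python
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[str, float, bool]


class CounterEnv:
    """Stand-in environment: reward 1.0 for action "ok", ends after 2 steps."""

    def __init__(self) -> None:
        self._steps = 0

    def reset(self) -> str:
        self._steps = 0
        return "start"

    def step(self, action: str) -> Transition:
        self._steps += 1
        reward = 1.0 if action == "ok" else 0.0
        return f"step {self._steps}", reward, self._steps >= 2


def replay(make_env: Callable[[], CounterEnv],
           actions: Sequence[str]) -> List[Transition]:
    """Run a fixed action sequence on a fresh environment."""
    env = make_env()
    env.reset()
    return [env.step(a) for a in actions]


def is_deterministic(make_env: Callable[[], CounterEnv],
                     actions: Sequence[str], runs: int = 3) -> bool:
    """True when every replay yields identical transitions."""
    first = replay(make_env, actions)
    return all(replay(make_env, actions) == first for _ in range(runs - 1))


def reward_range(make_env: Callable[[], CounterEnv],
                 actions: Sequence[str]) -> Tuple[float, float]:
    """Minimum and maximum reward seen for the given action sequence."""
    rewards = [reward for _, reward, _ in replay(make_env, actions)]
    return min(rewards), max(rewards)
```

Running `reward_range` with a trivial baseline (for example, always the same action) gives a quick sanity check that rewards stay within the documented bounds.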

	---

	## Collaboration Rules
	- All code must follow the OpenEnv 5-step pattern (see ARCHITECTURE.md)
	- No deviation from the `step()` / `reset()` / `state` interface
	- All reward functions must return `List[float]` for GRPOTrainer compatibility
	- Docker image must pass `curl /health` before any deployment PR is merged
	- HF Space URL must be live before training begins
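
The `/health` gate can also be scripted rather than run by hand with `curl`. The sketch below uses only the standard library; the helper names and the rule that only HTTP 200 counts as healthy are assumptions.

```python
import urllib.request


def health_url(base_url: str) -> str:
    """Append /health to a base URL, tolerating a trailing slash."""
    return base_url.rstrip("/") + "/health"


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True only when GET <base_url>/health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url),
                                    timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses,
        # since urllib's URLError and HTTPError derive from OSError.
        return False
```

The same function can be pointed at a local container before a deployment PR and at the HF Space URL before training starts.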