ROLES.md — Team Roles & Responsibilities

Role Definitions

Environment Engineer

Owns: server/environment.py, models.py, data pipeline

Implements reset(), step(), and state following the OpenEnv abstract base class (ABC)
Defines typed Action, Observation, State dataclasses
Writes ground-truth data loader and reward computation logic
Ensures step() is safe under concurrent sessions, with no state shared between sessions
Writes unit tests for environment logic
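
The responsibilities above can be sketched as a small, dependency-free environment. This is only an illustration: the names TriageAction, TriageObservation, TriageState, and BugTriageEnv, the field layout, and the exact-match reward are all assumptions, and the sketch does not import or subclass the real OpenEnv base class.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TriageAction:
    label: str  # hypothetical: the severity label chosen by the agent


@dataclass
class TriageObservation:
    report: str          # text of the bug report to triage next
    reward: float = 0.0  # reward earned by the previous action
    done: bool = False   # True once the episode is over


@dataclass
class TriageState:
    episode_id: int = 0
    step_count: int = 0


class BugTriageEnv:
    """Illustrative environment; all state lives on the instance."""

    def __init__(self, dataset: List[Tuple[str, str]]):
        # dataset holds (report, ground_truth_label) pairs
        self._dataset = list(dataset)
        self._index = 0
        self._state = TriageState()

    def reset(self) -> TriageObservation:
        # Start a new episode at the first report.
        self._state = TriageState(episode_id=self._state.episode_id + 1)
        self._index = 0
        return TriageObservation(report=self._dataset[0][0])

    def step(self, action: TriageAction) -> TriageObservation:
        # Score the action against the ground truth, then advance.
        _, truth = self._dataset[self._index]
        reward = 1.0 if action.label == truth else 0.0
        self._index += 1
        self._state.step_count += 1
        done = self._index >= len(self._dataset)
        next_report = "" if done else self._dataset[self._index][0]
        return TriageObservation(report=next_report, reward=reward, done=done)

    @property
    def state(self) -> TriageState:
        return self._state
```

Because nothing is stored at module level, giving each session its own BugTriageEnv instance is one simple way to keep concurrent sessions from interfering with each other.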

API / Infra Engineer

Owns: server/app.py, server/Dockerfile, openenv.yaml, pyproject.toml

Wires create_fastapi_app(env) correctly
Configures the Dockerfile to run uvicorn with the WORKERS and PORT environment variables
Manages HF Spaces deployment (openenv push)
Validates /health, /reset, /step, /state endpoints
Configures scaling (workers, MAX_CONCURRENT_ENVS)
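
As a rough sketch of the WORKERS and PORT wiring, the helper below builds a uvicorn command line from environment variables. The module path server.app:app, the default values, and the helper name uvicorn_command are assumptions for illustration; only the uvicorn flags --host, --port, and --workers are taken from uvicorn itself.

```python
import os
from typing import List, Mapping, Optional


def uvicorn_command(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Build the uvicorn argv from PORT and WORKERS (defaults are placeholders)."""
    env = os.environ if environ is None else environ
    port = int(env.get("PORT", "8000"))
    workers = int(env.get("WORKERS", "1"))
    if workers < 1:
        raise ValueError("WORKERS must be at least 1")
    return [
        "uvicorn",
        "server.app:app",  # assumed location of the FastAPI app object
        "--host", "0.0.0.0",
        "--port", str(port),
        "--workers", str(workers),
    ]
```

An entrypoint script could pass the returned list to subprocess.run or os.execvp, which keeps the Dockerfile free of hard-coded port and worker counts.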

Client Engineer

Owns: client.py

Implements an HTTPEnvClient subclass
Implements _step_payload(), _parse_result(), _parse_state()
Writes async and sync usage examples
Ensures client works against both local Docker and HF Spaces URL
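
The three hook methods named above might look roughly like this. The sketch is standalone and does not subclass the real HTTPEnvClient; the JSON keys (observation, reward, done, episode_id, step_count) and the StepResult and TriageAction types are assumed shapes, so they should be checked against the actual server responses.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass
class TriageAction:
    label: str  # hypothetical action field


@dataclass
class StepResult:
    observation: Dict[str, Any]
    reward: float
    done: bool


class TriageClientSketch:
    """Illustrative only: mirrors the hook names, not the real base class."""

    def _step_payload(self, action: TriageAction) -> Dict[str, Any]:
        # Serialize the typed action into a JSON-ready dict.
        return asdict(action)

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult:
        # Tolerate missing or null fields so a partial response does not crash.
        return StepResult(
            observation=payload.get("observation") or {},
            reward=float(payload.get("reward") or 0.0),
            done=bool(payload.get("done", False)),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {
            "episode_id": payload.get("episode_id"),
            "step_count": int(payload.get("step_count", 0)),
        }
```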

Training Engineer

Owns: train.py / Colab notebook, GRPOConfig, rollout function

Writes a rollout_func that calls env.reset() and then env.step() in a loop
Defines GRPOConfig (learning rate, batch size, vLLM settings)
Registers all reward functions with GRPOTrainer
Monitors training via trackio
Pushes fine-tuned model to HF Hub
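
The reset-then-step loop at the heart of the rollout can be sketched as below. This shows only the inner episode loop, not the signature a trainer expects from rollout_func. It assumes that observations expose reward and done attributes and that policy is any callable mapping an observation to an action; the max_steps cap is also an assumption.

```python
from typing import Any, Callable, List


def run_episode(env: Any, policy: Callable[[Any], Any], max_steps: int = 16) -> List[float]:
    """Run one episode and return the per-step rewards."""
    obs = env.reset()
    rewards: List[float] = []
    for _ in range(max_steps):
        action = policy(obs)       # in training, this is the model's generation
        obs = env.step(action)
        rewards.append(float(obs.reward))
        if obs.done:
            break
    return rewards
```

Capping the loop with max_steps keeps a misbehaving environment that never sets done from stalling a training run.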

Reward Designer

Owns: REWARD_DESIGN.md, reward function implementations

Designs decomposed, float-returning reward functions
Works with Environment Engineer to embed reward signals in step()
Validates reward signal is non-sparse and well-shaped
Documents reward composition and rationale
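
One way to read "decomposed, float-returning" is a set of small scoring functions combined with explicit weights. Everything in this sketch is hypothetical: the severity levels, the component field, the partial-credit rule, and the weights are placeholders rather than the project's actual reward design.

```python
from typing import Dict

LEVELS = ["low", "medium", "high", "critical"]  # assumed severity scale
WEIGHTS: Dict[str, float] = {"severity": 0.7, "component": 0.3}  # assumed weights


def severity_reward(predicted: str, truth: str) -> float:
    """1.0 for an exact match, 0.5 for an adjacent level, 0.0 otherwise."""
    if predicted not in LEVELS or truth not in LEVELS:
        return 0.0
    distance = abs(LEVELS.index(predicted) - LEVELS.index(truth))
    return max(0.0, 1.0 - 0.5 * distance)


def component_reward(predicted: str, truth: str) -> float:
    """1.0 if the predicted component matches, else 0.0."""
    return 1.0 if predicted == truth else 0.0


def total_reward(pred: Dict[str, str], truth: Dict[str, str]) -> float:
    """Weighted sum of the individual reward components."""
    return (
        WEIGHTS["severity"] * severity_reward(pred["severity"], truth["severity"])
        + WEIGHTS["component"] * component_reward(pred["component"], truth["component"])
    )
```

The partial credit for adjacent severity levels is what keeps the signal from being sparse: a near miss still earns more than a wild guess.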

QA / Evaluation Engineer

Owns: evaluation scripts, metrics

Validates environment correctness (does step() behave deterministically?)
Runs baseline policies to sanity-check reward range
Evaluates fine-tuned model on held-out evaluation set
Produces final metrics for hackathon submission
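
The determinism question above can be turned into a small check: replay the same action sequence against two fresh environments and compare the traces. The helper name is_deterministic and the make_env factory are assumptions, as is the convention that observations carry reward and done attributes.

```python
from typing import Any, Callable, List, Sequence, Tuple


def is_deterministic(make_env: Callable[[], Any], actions: Sequence[Any]) -> bool:
    """Return True if two fresh runs of the same actions give identical traces."""

    def run() -> List[Tuple[float, bool]]:
        env = make_env()
        env.reset()
        trace: List[Tuple[float, bool]] = []
        for action in actions:
            obs = env.step(action)
            trace.append((obs.reward, obs.done))
            if obs.done:
                break
        return trace

    return run() == run()
```

A check like this only compares rewards and done flags; comparing the full observation would be stricter if observations support equality.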

Collaboration Rules

All code must follow the OpenEnv 5-step pattern (see ARCHITECTURE.md)
No deviation from the step() / reset() / state interface
All reward functions must return List[float] for GRPOTrainer compatibility
Docker image must pass curl /health before any deployment PR is merged
HF Space URL must be live before training begins
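
The List[float] rule means each reward function scores a whole batch and returns one float per completion. A minimal sketch, assuming completions arrive as plain strings and that a hypothetical labels column is passed in as a keyword argument:

```python
from typing import Any, List


def exact_match_reward(completions: List[str], labels: List[str], **kwargs: Any) -> List[float]:
    """Return one float per completion: 1.0 on a case-insensitive match, else 0.0."""
    return [
        1.0 if completion.strip().lower() == label.strip().lower() else 0.0
        for completion, label in zip(completions, labels)
    ]
```

Accepting **kwargs lets the function ignore any extra batch fields it does not need, and returning plain Python floats avoids surprises from tensor or NumPy scalar types.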