ROLES.md — Team Roles & Responsibilities


Role Definitions

Environment Engineer

Owns: server/environment.py, models.py, data pipeline

  • Implements reset(), step(), and state, following the OpenEnv ABC
  • Defines typed Action, Observation, and State dataclasses
  • Writes the ground-truth data loader and the reward computation logic
  • Ensures step() is safe for concurrent sessions (no mutable state shared between sessions)
  • Writes unit tests for environment logic
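The reset() / step() / state contract above can be sketched as follows. This is a minimal, hypothetical illustration. `EnvironmentBase` stands in for the OpenEnv ABC, so import the real base classes from OpenEnv instead. The single-turn QA task, the field names, and the binary reward are assumptions made only for this example. The RNG and episode state live on the instance rather than in module globals, which is what keeps one session from leaking into another.

```python
from __future__ import annotations

import random
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Action:
    answer: str  # hypothetical field: the agent's answer to the current question


@dataclass
class Observation:
    question: str
    reward: float = 0.0
    done: bool = False


@dataclass
class State:
    episode_id: str = ""
    step_count: int = 0


class EnvironmentBase(ABC):
    """Hypothetical stand-in for the OpenEnv ABC (not the real import)."""

    @abstractmethod
    def reset(self) -> Observation: ...

    @abstractmethod
    def step(self, action: Action) -> Observation: ...

    @property
    @abstractmethod
    def state(self) -> State: ...


class QAEnvironment(EnvironmentBase):
    """Toy single-turn QA environment backed by a ground-truth table."""

    def __init__(self, data: List[Tuple[str, str]], seed: Optional[int] = None):
        self._data = data
        self._rng = random.Random(seed)  # per-instance RNG: nothing shared across sessions
        self._state = State()
        self._current: Tuple[str, str] = ("", "")

    def reset(self) -> Observation:
        # Start a fresh episode with a new id and pick a ground-truth item.
        self._state = State(episode_id=str(uuid.uuid4()), step_count=0)
        self._current = self._rng.choice(self._data)
        return Observation(question=self._current[0])

    def step(self, action: Action) -> Observation:
        self._state.step_count += 1
        correct = action.answer.strip() == self._current[1]
        return Observation(question=self._current[0], reward=1.0 if correct else 0.0, done=True)

    @property
    def state(self) -> State:
        return self._state
```

For example, `QAEnvironment([("2+2?", "4")], seed=0)` yields a reward of 1.0 when stepped with `Action(answer="4")`. Two instances created from the same data keep independent episode ids and step counts.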

API / Infra Engineer

Owns: server/app.py, server/Dockerfile, openenv.yaml, pyproject.toml

  • Wires the environment into the server with create_fastapi_app(env)
  • Configures the Dockerfile to run uvicorn, reading the WORKERS and PORT environment variables
  • Manages HF Spaces deployment (openenv push)
  • Validates /health, /reset, /step, /state endpoints
  • Configures scaling (workers, MAX_CONCURRENT_ENVS)
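One way to keep the Dockerfile thin is to centralize the PORT, WORKERS, and MAX_CONCURRENT_ENVS parsing in Python and have the container's CMD run uvicorn with the resulting flags. The sketch below is hypothetical. The default values, the `server.app:app` import path, and the way MAX_CONCURRENT_ENVS is eventually consumed are all assumptions, so check them against openenv.yaml and the OpenEnv documentation. `--host`, `--port`, and `--workers` are standard uvicorn CLI flags.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class ServerConfig:
    port: int
    workers: int
    max_concurrent_envs: int


def _positive_int(env: Mapping[str, str], name: str, default: int) -> int:
    """Read a positive integer from the environment, falling back to a default."""
    raw = env.get(name, "")
    value = int(raw) if raw.strip() else default
    if value < 1:
        raise ValueError(f"{name} must be >= 1, got {value}")
    return value


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    # Defaults are illustrative assumptions, not documented OpenEnv values.
    return ServerConfig(
        port=_positive_int(env, "PORT", 8000),
        workers=_positive_int(env, "WORKERS", 1),
        max_concurrent_envs=_positive_int(env, "MAX_CONCURRENT_ENVS", 1),
    )


def uvicorn_command(cfg: ServerConfig, app_path: str = "server.app:app") -> List[str]:
    """Build the uvicorn invocation that a Dockerfile CMD would run."""
    return [
        "uvicorn", app_path,
        "--host", "0.0.0.0",  # listen on all interfaces inside the container
        "--port", str(cfg.port),
        "--workers", str(cfg.workers),
    ]
```

The app is passed to uvicorn as an import string rather than an object, because uvicorn can only start multiple workers when it can import the app itself.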

Client Engineer

Owns: client.py

  • Implements an HTTPEnvClient subclass
  • Implements _step_payload(), _parse_result(), _parse_state()
  • Writes async and sync usage examples
  • Ensures the client works against both a local Docker container and the HF Spaces URL
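The three client hooks are plain serialization functions, so they can be written and unit-tested without a running server. The sketch below assumes a response shape of `{"observation": {...}, "reward": float, "done": bool}` for POST /step and `{"episode_id": str, "step_count": int}` for GET /state. Both shapes, the `QAEnvClient` name, and the dataclass fields are hypothetical. In the real client the class would subclass OpenEnv's HTTPEnvClient, which handles transport.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class Action:
    answer: str


@dataclass
class Observation:
    question: str


@dataclass
class State:
    episode_id: str
    step_count: int


@dataclass
class StepResult:
    observation: Observation
    reward: Optional[float]
    done: bool


class QAEnvClient:  # hypothetical; in practice this subclasses HTTPEnvClient
    """Shows only the three serialization hooks, not the HTTP transport."""

    def _step_payload(self, action: Action) -> Dict[str, Any]:
        # Serialize the typed action into the JSON body sent with /step.
        return asdict(action)

    def _parse_result(self, payload: Dict[str, Any]) -> StepResult:
        # Assumed /step response shape; tolerate missing keys with defaults.
        obs = payload.get("observation", {})
        return StepResult(
            observation=Observation(question=obs.get("question", "")),
            reward=payload.get("reward"),
            done=bool(payload.get("done", False)),
        )

    def _parse_state(self, payload: Dict[str, Any]) -> State:
        # Assumed /state response shape.
        return State(
            episode_id=payload.get("episode_id", ""),
            step_count=int(payload.get("step_count", 0)),
        )
```

Because these hooks never touch the network, the same parsing runs unchanged whether the base URL points at a local Docker container or at the HF Spaces URL.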

Training Engineer

Owns: train.py / Colab notebook, GRPOConfig, rollout function

  • Writes the rollout_func that calls env.reset() and then env.step() in a loop
  • Defines GRPOConfig (learning rate, batch size, vLLM settings)
  • Registers all reward functions with GRPOTrainer
  • Monitors training via trackio
  • Pushes fine-tuned model to HF Hub
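The reset-then-step loop at the heart of a rollout can be sketched as a helper that a rollout_func might call once per prompt. This is not TRL's rollout_func signature, which is not shown here. The `Observation` fields, the `policy` callable (standing in for generation with the model being fine-tuned), and the `max_turns` cap are assumptions. `CountdownEnv` is a toy used only to exercise the loop.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, List, Protocol


@dataclass
class Observation:
    text: str
    reward: float = 0.0
    done: bool = False


class Env(Protocol):
    def reset(self) -> Observation: ...
    def step(self, action: str) -> Observation: ...


@dataclass
class Episode:
    actions: List[str]
    rewards: List[float]

    @property
    def total_reward(self) -> float:
        return sum(self.rewards)


def run_episode(env: Env, policy: Callable[[str], str], max_turns: int = 8) -> Episode:
    """Reset once, then alternate policy -> env.step until done or max_turns."""
    obs = env.reset()
    episode = Episode(actions=[], rewards=[])
    for _ in range(max_turns):
        action = policy(obs.text)  # in training: generate with the current model
        obs = env.step(action)
        episode.actions.append(action)
        episode.rewards.append(obs.reward)
        if obs.done:
            break
    return episode


class CountdownEnv:
    """Toy env for the usage example: ends after three steps, reward 1.0 each."""

    def reset(self) -> Observation:
        self.remaining = 3
        return Observation(text="start")

    def step(self, action: str) -> Observation:
        self.remaining -= 1
        return Observation(text=f"{self.remaining} left", reward=1.0, done=self.remaining == 0)
```

For example, `run_episode(CountdownEnv(), policy=lambda text: "go")` records three actions and a total reward of 3.0. The `max_turns` cap keeps a misbehaving environment that never sets `done` from stalling training.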

Reward Designer

Owns: REWARD_DESIGN.md, reward function implementations

  • Designs decomposed, float-returning reward functions
  • Works with Environment Engineer to embed reward signals in step()
  • Validates that the reward signal is non-sparse and well-shaped
  • Documents reward composition and rationale
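Decomposed rewards are easiest to reason about as independent functions that each return one float per completion. The sketch below follows TRL's custom reward convention as we understand it: each function receives the batch of `completions` along with dataset columns as keyword arguments, and a list of such functions goes to GRPOTrainer via `reward_funcs`. The `<answer>` tag format, the 0.5 partial credit, and the `ground_truth` column name are hypothetical. The code also assumes plain-text rather than conversational completions.

```python
from __future__ import annotations

import re
from typing import Any, List

_ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completions: List[str], **kwargs: Any) -> List[float]:
    """Partial credit (0.5) for wrapping the answer in <answer>...</answer>."""
    return [0.5 if _ANSWER_RE.search(c) else 0.0 for c in completions]


def correctness_reward(completions: List[str], ground_truth: List[str], **kwargs: Any) -> List[float]:
    """1.0 if the extracted answer exactly matches the ground truth, else 0.0."""
    scores: List[float] = []
    for completion, truth in zip(completions, ground_truth):
        match = _ANSWER_RE.search(completion)
        predicted = match.group(1).strip() if match else ""
        scores.append(1.0 if predicted == truth.strip() else 0.0)
    return scores


# Hypothetical registration list, e.g. passed as reward_funcs=REWARD_FUNCS.
REWARD_FUNCS = [format_reward, correctness_reward]
```

The split is what keeps the signal non-sparse. A well-formatted but wrong answer still earns 0.5 from `format_reward`, so early in training the model receives gradient signal before it produces any correct answers.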

QA / Evaluation Engineer

Owns: evaluation scripts, metrics

  • Validates environment correctness (does step() behave deterministically?)
  • Runs baseline policies to sanity-check the reward range
  • Evaluates the fine-tuned model on a held-out evaluation set
  • Produces final metrics for hackathon submission
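The first two QA checks can be automated as follows. The determinism check replays the same seed and action sequence on two fresh environments and compares the full traces. The baseline check runs a fixed policy across seeds and reports the (min, mean, max) episode reward. This is a hypothetical sketch: the `(observation, reward, done)` step tuple and the seeded-constructor `make_env(seed)` factory are assumptions, and `DiceEnv` is a toy that exists only to exercise the checks.

```python
from __future__ import annotations

import random
import statistics
from typing import Callable, Iterable, List, Tuple

StepOut = Tuple[str, float, bool]  # (observation text, reward, done)


class DiceEnv:
    """Toy seeded, single-step env used only to exercise the checks."""

    def __init__(self, seed: int):
        self._rng = random.Random(seed)  # all randomness flows from the seed
        self._target = 0

    def reset(self) -> str:
        self._target = self._rng.randint(1, 6)
        return "guess a number from 1 to 6"

    def step(self, action: int) -> StepOut:
        # Shaped reward in [0, 1]: closer guesses score higher.
        reward = 1.0 - abs(action - self._target) / 5.0
        return (f"target was {self._target}", reward, True)


def _trace(env, actions: List[int]) -> List[object]:
    """Record the reset observation plus every step output until done."""
    trace: List[object] = [env.reset()]
    for action in actions:
        out = env.step(action)
        trace.append(out)
        if out[2]:
            break
    return trace


def is_deterministic(make_env: Callable[[int], DiceEnv], actions: List[int], seeds: Iterable[int]) -> bool:
    """Two fresh envs with the same seed and actions must yield identical traces."""
    return all(_trace(make_env(s), actions) == _trace(make_env(s), actions) for s in seeds)


def baseline_reward_stats(make_env, policy, seeds: Iterable[int], max_turns: int = 50) -> Tuple[float, float, float]:
    """(min, mean, max) total episode reward of a fixed baseline policy across seeds."""
    totals: List[float] = []
    for seed in seeds:
        env = make_env(seed)
        obs = env.reset()
        total = 0.0
        for _ in range(max_turns):
            obs, reward, done = env.step(policy(obs))
            total += reward
            if done:
                break
        totals.append(total)
    return min(totals), statistics.mean(totals), max(totals)
```

A baseline whose reward range collapses to a single value, or sits far outside the range the Reward Designer expects, is worth flagging before any training run.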

Collaboration Rules

  • All code must follow the OpenEnv 5-step pattern (see ARCHITECTURE.md)
  • No deviation from the reset() / step() / state interface
  • All reward functions must return List[float] for GRPOTrainer compatibility
  • The Docker image must pass a curl /health check before any deployment PR is merged
  • The HF Space URL must be live before training begins
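The /health gate can also be scripted, for example in CI after `docker run` or before launching training against the HF Space. The following is a minimal standard-library sketch that polls GET /health until it returns HTTP 200 or a deadline passes, which is roughly a retry loop around `curl -f`. The timeout, the polling interval, and the assumption that a healthy server answers with status 200 are our choices. The `opener` parameter exists only so the function can be tested without a live server.

```python
from __future__ import annotations

import time
import urllib.request
from typing import Any, Callable


def wait_for_health(
    base_url: str,
    timeout_s: float = 30.0,
    interval_s: float = 1.0,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or timeout_s elapses."""
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=max(interval_s, 1.0)) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError (non-2xx), and socket timeouts are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

In a CI job, a `False` result would map to a non-zero exit code so that the deployment PR cannot be merged.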