ARCHITECTURE.md: Bug Triage OpenEnv System Architecture


1. Core Abstractions (OpenEnv Spec)

Server-side (Docker)

BugTriageEnvironment(Environment)
├── reset(task_id)  →  BugTriageObservation   # New bug report episode
├── step(action)    →  BugTriageObservation   # Agent triages; grader fires; done=True
└── state @property →  BugTriageState         # Episode metadata

Client-side (Training code)

BugTriageEnvClient
├── reset(task_id)  →  dict   # POST /reset
├── step(ep_id, action) → dict  # POST /step
└── state()  →  dict           # GET  /state

2. State Model

class BugTriageState:
    episode_id: str         # Unique per episode
    step_count: int         # 0 after reset, 1 after step
    task_id: str            # "task_1" | "task_2" | "task_3"
    bug_id: str             # Which bug report is active
    cumulative_reward: float

3. Observation Model

class BugTriageObservation:
    done: bool                        # True after step()
    reward: float                     # Shaped reward [-0.5, 1.0]
    task_id: str

    bug_report: BugReport             # Title, description, logs, env, reporter, metadata
    available_developers: List[str]   # ["Alice", "Bob", "Carol", "David", "Eve"]
    step_number: int
    feedback: str                     # Human-readable grader feedback
    grader_score: Optional[float]     # [0.0-1.0], only when done=True
    episode_id: str

BugReport fields the agent reads:

| Field | Type | Example |
| --- | --- | --- |
| title | str | "App crashes on iOS 17 uploading >50MB" |
| description | str | Full description with context |
| logs | str? | Stack traces, error output |
| environment | str? | "iOS 17.2, iPhone 15 Pro, App v3.2.1" |
| reporter | str? | "enterprise_client_a" |
| metadata | dict | {"component": "file_upload", "affected_users": 847} |

4. Action Model

class BugTriageAction:
    task_id: str                                    # Required always

    # Task 1 (Easy)
    bug_type: Optional[str]                         # crash|ui|performance|security|data_loss|compatibility

    # Task 2 (Medium)
    priority: Optional[str]                         # low|medium|high|critical

    # Task 3 (Hard): all of the above plus:
    assigned_developer: Optional[str]               # Alice|Bob|Carol|David|Eve
    suggested_action: Optional[str]                 # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
    reasoning: Optional[str]                        # Chain-of-thought (not graded)
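
The per-task field requirements above can be checked before an action reaches the grader. The following is a minimal sketch, not the project's actual validation: it uses a plain dataclass instead of the Pydantic model in models.py, and `ActionSketch`, `REQUIRED_FIELDS`, and `validate_action` are hypothetical names. It also assumes that Task 2 requires only `priority`, which the listing above does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: which fields each task requires. Task 3 follows the
# "all of the above plus" comment; Task 2 requiring only priority is a guess.
REQUIRED_FIELDS = {
    "task_1": ("bug_type",),
    "task_2": ("priority",),
    "task_3": ("bug_type", "priority", "assigned_developer", "suggested_action"),
}

# Allowed values, copied from the comments in the action model.
ALLOWED_VALUES = {
    "bug_type": {"crash", "ui", "performance", "security", "data_loss", "compatibility"},
    "priority": {"low", "medium", "high", "critical"},
    "assigned_developer": {"Alice", "Bob", "Carol", "David", "Eve"},
    "suggested_action": {
        "fix_immediately", "schedule_sprint", "needs_more_info", "wontfix", "duplicate",
    },
}


@dataclass
class ActionSketch:
    """Stand-in for BugTriageAction (the real model is Pydantic)."""
    task_id: str
    bug_type: Optional[str] = None
    priority: Optional[str] = None
    assigned_developer: Optional[str] = None
    suggested_action: Optional[str] = None
    reasoning: Optional[str] = None  # free text, never validated or graded


def validate_action(action: ActionSketch) -> List[str]:
    """Return a list of problems; an empty list means the action is usable."""
    if action.task_id not in REQUIRED_FIELDS:
        return [f"unknown task_id: {action.task_id!r}"]
    errors = []
    for name in REQUIRED_FIELDS[action.task_id]:
        value = getattr(action, name)
        if value is None:
            errors.append(f"{name} is required for {action.task_id}")
        elif value not in ALLOWED_VALUES[name]:
            errors.append(f"{name}={value!r} is not an allowed value")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller surface every missing field to the agent in a single feedback string.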

5. Episode Flow

reset(task_id="task_1")
  │
  └── Returns BugTriageObservation:
        - bug_report = random bug from dataset (15 bugs)
        - done = False
        - episode_id = "abc123"

step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
  │
  ├── Grader fires immediately (single-step episode)
  │     task1_grader.grade([action], ground_truth) → 1.0
  │
  └── Returns BugTriageObservation:
        - done = True
        - reward = 1.0 (shaped from grader score)
        - grader_score = 1.0
        - feedback = "Bug type: ✓ (predicted=crash, expected=crash)"

Key design: episodes are single-step. The agent reads one bug report, makes one decision, and the episode ends. This mirrors real-world triage, where the same bug is not re-classified iteratively.
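
The single-step lifecycle can be illustrated with a small in-memory environment. This is a sketch under stated assumptions, not the code in server/environment.py: `SingleStepEnvSketch` and the two-entry `BUGS` list are hypothetical, observations are plain dicts, only Task 1 grading is modelled, and the reward is set equal to the grader score rather than shaped into [-0.5, 1.0].

```python
import random
import uuid

# Hypothetical stand-in for data/bugs.json (the real dataset has 15 bugs).
BUGS = [
    {"bug_id": "bug_001", "title": "App crashes on upload", "bug_type": "crash"},
    {"bug_id": "bug_002", "title": "Button misaligned on settings page", "bug_type": "ui"},
]


class SingleStepEnvSketch:
    """One reset() followed by exactly one step() per episode."""

    def __init__(self, bugs, seed=None):
        self._bugs = list(bugs)
        self._rng = random.Random(seed)
        self._bug = None
        self._done = True  # no active episode until reset() is called
        self.episode_id = None
        self.task_id = None
        self.step_count = 0

    def reset(self, task_id="task_1"):
        self._bug = self._rng.choice(self._bugs)
        self._done = False
        self.episode_id = uuid.uuid4().hex[:8]
        self.task_id = task_id
        self.step_count = 0
        return {
            "episode_id": self.episode_id,
            "task_id": task_id,
            "bug_report": {"title": self._bug["title"]},
            "done": False,
            "reward": 0.0,
            "grader_score": None,  # only set once the episode is done
        }

    def step(self, action):
        if self._done:
            raise RuntimeError("episode finished; call reset() first")
        expected = self._bug["bug_type"]
        predicted = action.get("bug_type")
        score = 1.0 if predicted == expected else 0.0  # Task 1 exact match
        self._done = True
        self.step_count = 1
        return {
            "episode_id": self.episode_id,
            "task_id": self.task_id,
            "done": True,
            "reward": score,  # simplification: no reward shaping here
            "grader_score": score,
            "feedback": f"Bug type: predicted={predicted}, expected={expected}",
        }
```

Raising on a second `step()` makes the single-step contract explicit: the only way to continue is to start a new episode with `reset()`.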


6. File Layout

bug_triage_env/
├── __init__.py            ← Package exports
├── models.py              ← Pydantic typed Action/Observation/State
├── client.py              ← Sync + Async HTTP clients
├── baseline.py            ← OpenAI GPT-4o-mini inference script
├── openenv.yaml           ← Manifest
├── pyproject.toml
├── requirements.txt
│
├── data/
│   └── bugs.json          ← 15 real-world bug reports + ground truth
│
├── graders/
│   ├── __init__.py        ← GRADERS registry
│   ├── task1_grader.py    ← Bug type exact match [0/1]
│   ├── task2_grader.py    ← Priority distance scoring [0-1]
│   └── task3_grader.py    ← Weighted composite [0-1]
│
└── server/
    ├── __init__.py
    ├── environment.py     ← BugTriageEnvironment(Environment)
    ├── app.py             ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile
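
The two simpler grader descriptions in the tree ("exact match" and "priority distance scoring") can be sketched as below. The function names and the linear falloff are assumptions for illustration; the actual task2_grader.py may weight distances differently.

```python
# Ordered from least to most urgent, matching the action model's priority values.
PRIORITY_LEVELS = ["low", "medium", "high", "critical"]


def grade_bug_type(predicted, expected):
    """Task 1 style: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if predicted == expected else 0.0


def grade_priority(predicted, expected):
    """Task 2 style: score falls off linearly with distance between levels.

    Assumption: each step away from the expected level costs 1/3, so the
    largest possible miss (low vs. critical) scores 0.0.
    """
    if predicted not in PRIORITY_LEVELS or expected not in PRIORITY_LEVELS:
        return 0.0
    distance = abs(PRIORITY_LEVELS.index(predicted) - PRIORITY_LEVELS.index(expected))
    return 1.0 - distance / (len(PRIORITY_LEVELS) - 1)
```

Under this assumption, predicting "medium" when the ground truth is "high" is one level off and scores 2/3, while an exact match scores 1.0.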

7. API Endpoints

Standard OpenEnv

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /health | → {"status": "healthy"} |
| POST | /reset | Body: {"task_id": "task_1"} → observation with bug report |
| POST | /step | Body: {"episode_id": "...", "action": {...}} → scored observation |
| GET | /state | → current episode metadata |
| GET | /docs | Swagger UI |

Hackathon Required

| Method | Endpoint | Purpose |
| --- | --- | --- |
| GET | /tasks | → 3 tasks with action schemas |
| POST | /grader | Body: {"episode_id":"...","task_id":"task_1"} → score [0.0-1.0] |
| POST | /baseline | Runs baseline.py → all task scores |
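
To show how the standard endpoints map onto client calls, here is a minimal sketch using only the standard library. It is not the project's client.py: `TriageClientSketch` and `urllib_transport` are hypothetical names, and response handling is reduced to decoding JSON. The transport is injected so the request shapes can be inspected without a running server.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

# A transport takes (method, url, json_body) and returns the decoded JSON reply.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def urllib_transport(method, url, body=None):
    """Send one JSON request with urllib and decode the JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class TriageClientSketch:
    """Thin wrapper over the /reset, /step and /state endpoints."""

    def __init__(self, base_url: str, transport: Transport = urllib_transport):
        self.base_url = base_url.rstrip("/")
        self._transport = transport

    def reset(self, task_id: str) -> Dict[str, Any]:
        return self._transport("POST", f"{self.base_url}/reset", {"task_id": task_id})

    def step(self, episode_id: str, action: Dict[str, Any]) -> Dict[str, Any]:
        body = {"episode_id": episode_id, "action": action}
        return self._transport("POST", f"{self.base_url}/step", body)

    def state(self) -> Dict[str, Any]:
        return self._transport("GET", f"{self.base_url}/state", None)
```

A typical episode would call `reset("task_1")`, read `episode_id` from the returned observation, and pass it to `step()` together with the action dict.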

8. Developer Specialization Matrix

Developer Crash UI Perf Security Data Loss Compat
Alice ✓ ✓ ✓
Bob ✓ ✓
Carol ✓ ✓
David ✓ ✓
Eve ✓ ✓ ✓

The Task 3 grader uses this matrix to award partial credit on developer assignment: if the agent picks the wrong person, but that person has the right specialization for the bug, it gets 0.5 instead of 0.0.
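
The partial-credit rule can be sketched as a small function. This is an illustration only: `score_assignment` is a hypothetical name, and `EXAMPLE_MATRIX` uses placeholder developers and specializations rather than the project's real matrix. How this component is weighted inside the Task 3 composite score is not shown.

```python
from typing import Dict, Set


def score_assignment(
    predicted: str,
    expected: str,
    bug_type: str,
    specializations: Dict[str, Set[str]],
) -> float:
    """Score a developer assignment with partial credit for a plausible pick."""
    if predicted == expected:
        return 1.0  # exactly the developer in the ground truth
    if bug_type in specializations.get(predicted, set()):
        return 0.5  # wrong person, but one who specializes in this bug type
    return 0.0  # wrong person with no matching specialization (or unknown name)


# Placeholder matrix for illustration only; not the real specializations.
EXAMPLE_MATRIX = {
    "dev_a": {"crash", "performance"},
    "dev_b": {"crash"},
    "dev_c": {"ui"},
}
```

With this placeholder matrix, assigning a crash bug to `dev_b` when the expected developer is `dev_a` earns 0.5, while assigning it to `dev_c` earns 0.0.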


9. Scaling

| Deployment | Workers | Max Sessions |
| --- | --- | --- |
| Local | 8 | ~2000 |
| HF Spaces Free | 2 | ~128 |
| HF Spaces Upgrade | 4-8 | ~512 |

Thread safety: BugTriageEnvironment guards its episode storage with a threading.Lock, so concurrent requests can create and update episodes safely.
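
A lock-guarded episode store might look like the sketch below. It is an assumption about the general pattern, not the code in environment.py: `EpisodeStoreSketch` and its method names are hypothetical, and episode records are reduced to the fields of the state model.

```python
import threading
import uuid


class EpisodeStoreSketch:
    """In-memory episode storage where every access holds a single lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._episodes = {}

    def create(self, task_id, bug_id):
        episode_id = uuid.uuid4().hex
        record = {
            "task_id": task_id,
            "bug_id": bug_id,
            "step_count": 0,
            "cumulative_reward": 0.0,
        }
        with self._lock:
            self._episodes[episode_id] = record
        return episode_id

    def record_step(self, episode_id, reward):
        # Read-modify-write happens under the lock so updates are not lost.
        with self._lock:
            episode = self._episodes[episode_id]
            episode["step_count"] += 1
            episode["cumulative_reward"] += reward
            return dict(episode)  # return a copy, not the shared record

    def count(self):
        with self._lock:
            return len(self._episodes)
```

Returning a copy from `record_step` keeps callers from mutating a shared record outside the lock.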