# ARCHITECTURE.md: Bug Triage OpenEnv System Architecture

---

## 1. Core Abstractions (OpenEnv Spec)

### Server-side (Docker)

```
BugTriageEnvironment(Environment)
├── reset(task_id) → BugTriageObservation   # New bug report episode
├── step(action) → BugTriageObservation     # Agent triages; grader fires; done=True
└── state @property → BugTriageState        # Episode metadata
```

### Client-side (Training code)

```
BugTriageEnvClient
├── reset(task_id) → dict        # POST /reset
├── step(ep_id, action) → dict   # POST /step
└── state() → dict               # GET /state
```

---
## 2. State Model

```python
class BugTriageState:
    episode_id: str           # Unique per episode
    step_count: int           # 0 after reset, 1 after step
    task_id: str              # "task_1" | "task_2" | "task_3"
    bug_id: str               # Which bug report is active
    cumulative_reward: float
```

---
## 3. Observation Model

```python
class BugTriageObservation:
    done: bool                        # True after step()
    reward: float                     # Shaped reward [-0.5, 1.0]
    task_id: str
    bug_report: BugReport             # Title, description, logs, env, reporter, metadata
    available_developers: List[str]   # ["Alice", "Bob", "Carol", "David", "Eve"]
    step_number: int
    feedback: str                     # Human-readable grader feedback
    grader_score: Optional[float]     # [0.0-1.0], only when done=True
    episode_id: str
```

**BugReport fields the agent reads:**

| Field | Type | Example |
|-------|------|---------|
| title | str | "App crashes on iOS 17 uploading >50MB" |
| description | str | Full description with context |
| logs | str? | Stack traces, error output |
| environment | str? | "iOS 17.2, iPhone 15 Pro, App v3.2.1" |
| reporter | str? | "enterprise_client_a" |
| metadata | dict | `{"component": "file_upload", "affected_users": 847}` |

---
## 4. Action Model

```python
class BugTriageAction:
    task_id: str                        # Required always
    # Task 1 (Easy)
    bug_type: Optional[str]             # crash|ui|performance|security|data_loss|compatibility
    # Task 2 (Medium)
    priority: Optional[str]             # low|medium|high|critical
    # Task 3 (Hard): all of the above plus:
    assigned_developer: Optional[str]   # Alice|Bob|Carol|David|Eve
    suggested_action: Optional[str]     # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
    reasoning: Optional[str]            # Chain-of-thought (not graded)
```
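
The per-task field requirements can be checked before an action is sent. The following is a minimal sketch, not the project's `models.py`: the names `ActionSketch`, `REQUIRED_FIELDS`, and `validate_action` are hypothetical, and the mapping of tasks to required fields is an assumption inferred from the comments in the action model above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Allowed values, copied from the comments in the action model.
ALLOWED = {
    "bug_type": {"crash", "ui", "performance", "security", "data_loss", "compatibility"},
    "priority": {"low", "medium", "high", "critical"},
    "assigned_developer": {"Alice", "Bob", "Carol", "David", "Eve"},
    "suggested_action": {"fix_immediately", "schedule_sprint", "needs_more_info",
                         "wontfix", "duplicate"},
}

# Assumption: which fields each task requires (not confirmed by the source).
REQUIRED_FIELDS = {
    "task_1": ("bug_type",),
    "task_2": ("priority",),
    "task_3": ("bug_type", "priority", "assigned_developer", "suggested_action"),
}


@dataclass
class ActionSketch:
    """Hypothetical stand-in for BugTriageAction."""
    task_id: str
    bug_type: Optional[str] = None
    priority: Optional[str] = None
    assigned_developer: Optional[str] = None
    suggested_action: Optional[str] = None
    reasoning: Optional[str] = None  # free text, never validated here


def validate_action(action: ActionSketch) -> List[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    if action.task_id not in REQUIRED_FIELDS:
        return [f"unknown task_id: {action.task_id!r}"]
    errors = []
    for name in REQUIRED_FIELDS[action.task_id]:
        value = getattr(action, name)
        if value is None:
            errors.append(f"{name} is required for {action.task_id}")
        elif value not in ALLOWED[name]:
            errors.append(f"{name}={value!r} is not an allowed value")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a training loop log every missing field of a malformed Task 3 action at once.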

---

## 5. Episode Flow

```
reset(task_id="task_1")
  │
  └── Returns BugTriageObservation:
        - bug_report = random bug from dataset (15 bugs)
        - done = False
        - episode_id = "abc123"

step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
  │
  ├── Grader fires immediately (single-step episode)
  │     task1_grader.grade([action], ground_truth) → 1.0
  │
  └── Returns BugTriageObservation:
        - done = True
        - reward = 1.0 (shaped from grader score)
        - grader_score = 1.0
        - feedback = "Bug type: ✓ (predicted=crash, expected=crash)"
```

**Key design:** Episodes are **single-step**: the agent reads the bug report, makes one decision, and the episode ends. This matches real-world triage (you don't re-classify the same bug iteratively).
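
The single-step contract can be illustrated with a toy environment. This is a sketch under stated assumptions, not `server/environment.py`: the class name `SingleStepEnvSketch`, the two-entry `BUGS` list, and the dict-shaped observations are all hypothetical, and reward shaping is left out because the source does not specify how the grader score maps to the reward.

```python
import random
import uuid
from typing import Optional

# Hypothetical stand-in for data/bugs.json (the real dataset has 15 bugs).
BUGS = [
    {"bug_id": "bug_001", "title": "App crashes on upload", "bug_type": "crash"},
    {"bug_id": "bug_002", "title": "Button misaligned in settings", "bug_type": "ui"},
]


class SingleStepEnvSketch:
    """Toy Task 1 environment: one reset, one step, then the episode is over."""

    def __init__(self, seed: Optional[int] = None):
        self._rng = random.Random(seed)
        self._bug = None
        self._done = True  # no active episode until reset() is called
        self.episode_id = None

    def reset(self, task_id: str = "task_1") -> dict:
        self._bug = self._rng.choice(BUGS)
        self._done = False
        self.episode_id = uuid.uuid4().hex
        # The ground-truth bug_type is deliberately not exposed to the agent.
        return {
            "done": False,
            "task_id": task_id,
            "episode_id": self.episode_id,
            "bug_report": {"title": self._bug["title"]},
        }

    def step(self, action: dict) -> dict:
        if self._done:
            raise RuntimeError("episode is over; call reset() before step()")
        # Exact-match grading, as described for Task 1.
        score = 1.0 if action.get("bug_type") == self._bug["bug_type"] else 0.0
        self._done = True  # single-step: the first step always ends the episode
        return {"done": True, "grader_score": score, "episode_id": self.episode_id}
```

Raising on a second `step()` makes the single-step rule explicit to callers instead of silently re-grading the same bug.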

---

## 6. File Layout

```
bug_triage_env/
├── __init__.py            # Package exports
├── models.py              # Pydantic typed Action/Observation/State
├── client.py              # Sync + Async HTTP clients
├── baseline.py            # OpenAI GPT-4o-mini inference script
├── openenv.yaml           # Manifest
├── pyproject.toml
├── requirements.txt
│
├── data/
│   └── bugs.json          # 15 real-world bug reports + ground truth
│
├── graders/
│   ├── __init__.py        # GRADERS registry
│   ├── task1_grader.py    # Bug type exact match [0/1]
│   ├── task2_grader.py    # Priority distance scoring [0-1]
│   └── task3_grader.py    # Weighted composite [0-1]
│
└── server/
    ├── __init__.py
    ├── environment.py     # BugTriageEnvironment(Environment)
    ├── app.py             # FastAPI (standard + hackathon endpoints)
    └── Dockerfile
```
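
The layout describes the Task 2 grader as "priority distance scoring" without giving a formula. One plausible reading is sketched below; the linear falloff and the names `PRIORITY_ORDER` and `priority_distance_score` are assumptions, not the contents of `task2_grader.py`.

```python
# Priority levels in ascending order of urgency, as listed in the action model.
PRIORITY_ORDER = ["low", "medium", "high", "critical"]


def priority_distance_score(predicted: str, expected: str) -> float:
    """Score in [0, 1] that drops linearly with the distance between levels.

    Assumption: each level of separation costs 1/3, so an exact match scores
    1.0 and the two extremes ("low" vs "critical") score 0.0.
    """
    if predicted not in PRIORITY_ORDER:
        return 0.0  # unknown or missing prediction earns no credit
    # An unknown expected label is a dataset error; index() raises ValueError.
    distance = abs(PRIORITY_ORDER.index(predicted) - PRIORITY_ORDER.index(expected))
    return 1.0 - distance / (len(PRIORITY_ORDER) - 1)
```

Under this reading, predicting `high` for a `critical` bug earns more than predicting `low`, which is the behavior the phrase "distance scoring" suggests.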

---

## 7. API Endpoints

### Standard OpenEnv

| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/health` | → `{"status": "healthy"}` |
| POST | `/reset` | Body: `{"task_id": "task_1"}` → observation with bug report |
| POST | `/step` | Body: `{"episode_id": "...", "action": {...}}` → scored observation |
| GET | `/state` | → current episode metadata |
| GET | `/docs` | Swagger UI |

### Hackathon Required

| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/tasks` | → 3 tasks with action schemas |
| POST | `/grader` | Body: `{"episode_id":"...","task_id":"task_1"}` → score [0.0-1.0] |
| POST | `/baseline` | Runs baseline.py → all task scores |
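
The request bodies in the tables above can be assembled with the standard library alone. The sketch below only builds the requests and does not send them; the base URL `http://localhost:8000` and the helper name `build_post` are assumptions, and the bundled `client.py` may be structured differently.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server address


def build_post(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one of the endpoints."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Bodies follow the shapes listed in the endpoint tables.
reset_req = build_post(BASE_URL, "/reset", {"task_id": "task_1"})
step_req = build_post(
    BASE_URL,
    "/step",
    {"episode_id": "abc123", "action": {"task_id": "task_1", "bug_type": "crash"}},
)
grader_req = build_post(BASE_URL, "/grader", {"episode_id": "abc123", "task_id": "task_1"})

# To actually call a running server:
#   with urllib.request.urlopen(reset_req) as resp:
#       observation = json.load(resp)
```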

---

## 8. Developer Specialization Matrix

| Developer | Crash | UI | Perf | Security | Data Loss | Compat |
|-----------|-------|----|------|----------|-----------|--------|
| Alice | ✓ | | ✓ | | ✓ | |
| Bob | ✓ | | | ✓ | | |
| Carol | | ✓ | | | | ✓ |
| David | | | | ✓ | ✓ | |
| Eve | | ✓ | ✓ | | | ✓ |

The Task 3 grader uses this matrix to award **partial credit** on developer assignment: if the agent does not pick the expected developer but picks one whose specializations cover the bug type, it scores 0.5 instead of 0.0.
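
The partial-credit rule can be sketched as a lookup against the matrix. This is an illustration, not `task3_grader.py`: the function name `developer_credit` is hypothetical, and awarding 1.0 for an exact match is an assumption, since the source only states the 0.5 and 0.0 cases.

```python
# Specializations transcribed from the matrix above, using bug_type labels.
SPECIALIZATIONS = {
    "Alice": {"crash", "performance", "data_loss"},
    "Bob": {"crash", "security"},
    "Carol": {"ui", "compatibility"},
    "David": {"security", "data_loss"},
    "Eve": {"ui", "performance", "compatibility"},
}


def developer_credit(predicted, expected: str, bug_type: str) -> float:
    """Credit for the developer-assignment component of a Task 3 action."""
    if predicted == expected:
        return 1.0  # assumption: an exact match earns full credit
    if bug_type in SPECIALIZATIONS.get(predicted, set()):
        return 0.5  # wrong person, but one who specializes in this bug type
    return 0.0  # wrong person with no matching specialization, or no prediction
```

For a crash bug expected to go to Alice, assigning Bob would earn 0.5 under this sketch because Bob also covers crashes, while assigning Carol would earn 0.0.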

---

## 9. Scaling

| Deployment | Workers | Max Sessions |
|------------|---------|--------------|
| Local | 8 | ~2000 |
| HF Spaces Free | 2 | ~128 |
| HF Spaces Upgrade | 4-8 | ~512 |

Thread safety: `BugTriageEnvironment` uses a `threading.Lock` to guard concurrent access to episode storage.
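
A lock-guarded episode store along these lines might look like the sketch below. The class `EpisodeStoreSketch` and its methods are hypothetical and only illustrate the pattern; how `BugTriageEnvironment` actually organizes its storage is not shown in this document.

```python
import threading
import uuid
from typing import Dict, Optional


class EpisodeStoreSketch:
    """Dictionary of episode state guarded by a single threading.Lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._episodes: Dict[str, dict] = {}

    def create(self, state: dict) -> str:
        episode_id = uuid.uuid4().hex
        with self._lock:  # only one thread mutates the dict at a time
            self._episodes[episode_id] = dict(state)
        return episode_id

    def get(self, episode_id: str) -> Optional[dict]:
        with self._lock:
            state = self._episodes.get(episode_id)
            # Return a copy so callers cannot mutate stored state unlocked.
            return dict(state) if state is not None else None

    def pop(self, episode_id: str) -> Optional[dict]:
        with self._lock:
            return self._episodes.pop(episode_id, None)

    def __len__(self) -> int:
        with self._lock:
            return len(self._episodes)
```

Removing finished episodes with `pop()` keeps memory bounded, which matters when the session counts in the table above are approached.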