# ARCHITECTURE.md: Bug Triage OpenEnv System Architecture
---
## 1. Core Abstractions (OpenEnv Spec)
### Server-side (Docker)
```
BugTriageEnvironment(Environment)
├── reset(task_id) → BugTriageObservation    # New bug report episode
├── step(action) → BugTriageObservation      # Agent triages; grader fires; done=True
└── state @property → BugTriageState         # Episode metadata
```
### Client-side (Training code)
```
BugTriageEnvClient
├── reset(task_id) → dict          # POST /reset
├── step(ep_id, action) → dict     # POST /step
└── state() → dict                 # GET /state
```
---
## 2. State Model
```python
class BugTriageState:
    episode_id: str           # Unique per episode
    step_count: int           # 0 after reset, 1 after step
    task_id: str              # "task_1" | "task_2" | "task_3"
    bug_id: str               # Which bug report is active
    cumulative_reward: float
```
---
## 3. Observation Model
```python
from typing import List, Optional

class BugTriageObservation:
    done: bool                        # True after step()
    reward: float                     # Shaped reward [-0.5, 1.0]
    task_id: str
    bug_report: BugReport             # Title, description, logs, env, reporter, metadata
    available_developers: List[str]   # ["Alice", "Bob", "Carol", "David", "Eve"]
    step_number: int
    feedback: str                     # Human-readable grader feedback
    grader_score: Optional[float]     # [0.0-1.0]; set only when done=True
    episode_id: str
```
**BugReport fields the agent reads:**
| Field | Type | Example |
|-------|------|---------|
| title | str | "App crashes on iOS 17 uploading >50MB" |
| description | str | Full description with context |
| logs | str? | Stack traces, error output |
| environment | str? | "iOS 17.2, iPhone 15 Pro, App v3.2.1" |
| reporter | str? | "enterprise_client_a" |
| metadata | dict | `{"component": "file_upload", "affected_users": 847}` |
---
## 4. Action Model
```python
from typing import Optional

class BugTriageAction:
    task_id: str                        # Required always
    # Task 1 (Easy)
    bug_type: Optional[str]             # crash|ui|performance|security|data_loss|compatibility
    # Task 2 (Medium)
    priority: Optional[str]             # low|medium|high|critical
    # Task 3 (Hard): all of the above plus:
    assigned_developer: Optional[str]   # Alice|Bob|Carol|David|Eve
    suggested_action: Optional[str]     # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
    reasoning: Optional[str]            # Chain-of-thought (not graded)
```
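As a rough illustration of how the per-task fields could be checked before grading, the sketch below validates an action dict against the allowed values listed in the model. The `ALLOWED` and `REQUIRED_FIELDS` tables and the `validate_action` helper are hypothetical names. Which fields each task actually requires is an assumption inferred from the comments in the model, not confirmed server behaviour.

```python
from typing import Dict, List

# Allowed values, copied from the comments in the action model.
ALLOWED: Dict[str, List[str]] = {
    "bug_type": ["crash", "ui", "performance", "security", "data_loss", "compatibility"],
    "priority": ["low", "medium", "high", "critical"],
    "assigned_developer": ["Alice", "Bob", "Carol", "David", "Eve"],
    "suggested_action": ["fix_immediately", "schedule_sprint", "needs_more_info",
                         "wontfix", "duplicate"],
}

# ASSUMPTION: required fields per task, inferred from the model comments.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "task_1": ["bug_type"],
    "task_2": ["priority"],
    "task_3": ["bug_type", "priority", "assigned_developer", "suggested_action"],
}


def validate_action(action: dict) -> List[str]:
    """Return a list of problems; an empty list means the action looks valid."""
    task_id = action.get("task_id")
    if task_id not in REQUIRED_FIELDS:
        return [f"unknown task_id: {task_id!r}"]
    problems = []
    for field in REQUIRED_FIELDS[task_id]:
        value = action.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in ALLOWED[field]:
            problems.append(f"invalid {field}: {value!r}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller surface every issue in a single feedback string.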
---
## 5. Episode Flow
```
reset(task_id="task_1")
│
└── Returns BugTriageObservation:
      - bug_report = random bug from dataset (15 bugs)
      - done = False
      - episode_id = "abc123"

step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
│
├── Grader fires immediately (single-step episode)
│     task1_grader.grade([action], ground_truth) → 1.0
│
└── Returns BugTriageObservation:
      - done = True
      - reward = 1.0 (shaped from grader score)
      - grader_score = 1.0
      - feedback = "Bug type: ✓ (predicted=crash, expected=crash)"
```
**Key design:** Episodes are **single-step**: the agent reads a bug report, makes one decision, and the episode ends. This matches real-world triage, where the same bug is not re-classified iteratively.
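The single-step flow can be sketched with a toy in-memory environment. Everything here is illustrative: `ToyTriageEnv`, the shape of the bug records, and the Task 1 exact-match scoring are assumptions. The reward is simply set equal to the grader score, because the actual shaping into [-0.5, 1.0] is not specified in this document.

```python
import random
import uuid
from typing import Dict, List, Optional


class ToyTriageEnv:
    """Hypothetical single-step environment mirroring the flow above (Task 1 only)."""

    def __init__(self, bugs: List[dict], seed: Optional[int] = None) -> None:
        self._bugs = bugs
        self._rng = random.Random(seed)
        self._episodes: Dict[str, dict] = {}  # episode_id -> active bug

    def reset(self, task_id: str) -> dict:
        bug = self._rng.choice(self._bugs)
        episode_id = uuid.uuid4().hex
        self._episodes[episode_id] = bug
        return {"episode_id": episode_id, "task_id": task_id,
                "bug_report": bug["report"], "done": False,
                "reward": 0.0, "grader_score": None}

    def step(self, episode_id: str, action: dict) -> dict:
        # pop() ends the episode: a second step() on the same id raises KeyError.
        bug = self._episodes.pop(episode_id)
        expected = bug["ground_truth"]["bug_type"]
        predicted = action.get("bug_type")
        score = 1.0 if predicted == expected else 0.0  # exact match
        # ASSUMPTION: reward == grader score; the real shaping is not documented here.
        return {"episode_id": episode_id, "task_id": action["task_id"],
                "done": True, "reward": score, "grader_score": score,
                "feedback": f"Bug type: predicted={predicted}, expected={expected}"}


env = ToyTriageEnv([{"report": {"title": "App crashes on upload"},
                     "ground_truth": {"bug_type": "crash"}}], seed=0)
obs = env.reset("task_1")
result = env.step(obs["episode_id"], {"task_id": "task_1", "bug_type": "crash"})
```

Removing the episode from the store inside `step()` is one simple way to enforce the one-decision-per-episode rule.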
---
## 6. File Layout
```
bug_triage_env/
├── __init__.py             ← Package exports
├── models.py               ← Pydantic typed Action/Observation/State
├── client.py               ← Sync + Async HTTP clients
├── baseline.py             ← OpenAI GPT-4o-mini inference script
├── openenv.yaml            ← Manifest
├── pyproject.toml
├── requirements.txt
│
├── data/
│   └── bugs.json           ← 15 real-world bug reports + ground truth
│
├── graders/
│   ├── __init__.py         ← GRADERS registry
│   ├── task1_grader.py     ← Bug type exact match [0/1]
│   ├── task2_grader.py     ← Priority distance scoring [0-1]
│   └── task3_grader.py     ← Weighted composite [0-1]
│
└── server/
    ├── __init__.py
    ├── environment.py      ← BugTriageEnvironment(Environment)
    ├── app.py              ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile
```
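The layout describes `task2_grader.py` as priority distance scoring in [0, 1] without giving a formula. One plausible form, shown here purely as an assumption, is a score that falls linearly with the ordinal distance between the predicted and expected priority. The function name `priority_distance_score` is hypothetical and the real grader may weight distances differently.

```python
PRIORITIES = ["low", "medium", "high", "critical"]  # order taken from the action model


def priority_distance_score(predicted: str, expected: str) -> float:
    """ASSUMED formula: 1 - |i - j| / (n - 1); unknown labels score 0.0."""
    if predicted not in PRIORITIES or expected not in PRIORITIES:
        return 0.0
    distance = abs(PRIORITIES.index(predicted) - PRIORITIES.index(expected))
    return 1.0 - distance / (len(PRIORITIES) - 1)
```

Under this assumed formula an off-by-one prediction such as `medium` for `high` keeps two thirds of the credit, while `low` for `critical` scores zero.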
---
## 7. API Endpoints
### Standard OpenEnv
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/health` | → `{"status": "healthy"}` |
| POST | `/reset` | Body: `{"task_id": "task_1"}` → observation with bug report |
| POST | `/step` | Body: `{"episode_id": "...", "action": {...}}` → scored observation |
| GET | `/state` | → current episode metadata |
| GET | `/docs` | Swagger UI |
### Hackathon Required
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/tasks` | → 3 tasks with action schemas |
| POST | `/grader` | Body: `{"episode_id":"...","task_id":"task_1"}` → score [0.0-1.0] |
| POST | `/baseline` | Runs baseline.py → all task scores |
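The request bodies in the tables above can be assembled with the standard library alone. The sketch below is not the project's `client.py`: `build_request` and `call` are hypothetical helpers, and the base URL is a placeholder. Only the paths and body shapes come from the tables.

```python
import json
import urllib.request
from typing import Optional


def build_request(base_url: str, method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build a JSON request for one of the endpoints listed above."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(base_url.rstrip("/") + path, data=data,
                                  headers=headers, method=method)


def call(req: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


BASE_URL = "http://localhost:8000"  # ASSUMPTION: host and port are placeholders

health_req = build_request(BASE_URL, "GET", "/health")
reset_req = build_request(BASE_URL, "POST", "/reset", {"task_id": "task_1"})
step_req = build_request(BASE_URL, "POST", "/step", {
    "episode_id": "abc123",  # in practice, taken from the /reset response
    "action": {"task_id": "task_1", "bug_type": "crash"},
})
```

With a server running, `call(reset_req)` would return the observation dict. Keeping request construction separate from sending makes the payloads easy to inspect without a live endpoint.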
---
## 8. Developer Specialization Matrix
| Developer | Crash | UI | Perf | Security | Data Loss | Compat |
|-----------|-------|----|------|----------|-----------|--------|
| Alice | ✓ | | ✓ | | ✓ | |
| Bob | ✓ | | | ✓ | | |
| Carol | | ✓ | | | | ✓ |
| David | | | | ✓ | ✓ | |
| Eve | | ✓ | ✓ | | | ✓ |
The Task 3 grader uses this matrix to award **partial credit** on developer assignment: if the agent picks the wrong person but that person has the right specialization, it scores 0.5 instead of 0.0.
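The partial-credit rule can be sketched as a small lookup. The specializations are transcribed from the matrix. The `developer_score` function, its signature, and the choice to test specialization against the ground-truth bug type are assumptions about how the grader might apply the rule.

```python
from typing import Dict, Set

# Specializations transcribed from the matrix above.
SPECIALIZATIONS: Dict[str, Set[str]] = {
    "Alice": {"crash", "performance", "data_loss"},
    "Bob": {"crash", "security"},
    "Carol": {"ui", "compatibility"},
    "David": {"security", "data_loss"},
    "Eve": {"ui", "performance", "compatibility"},
}


def developer_score(predicted: str, expected: str, bug_type: str) -> float:
    """1.0 for the expected developer, 0.5 for another specialist, else 0.0."""
    if predicted == expected:
        return 1.0
    # ASSUMPTION: "right specialization" means the developer covers this bug type.
    if bug_type in SPECIALIZATIONS.get(predicted, set()):
        return 0.5
    return 0.0
```

For a crash bug whose expected assignee is Alice, picking Bob would earn 0.5 under this sketch because Bob also covers crashes, while picking Carol would earn 0.0.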
---
## 9. Scaling
| Deployment | Workers | Max Sessions |
|------------|---------|-------------|
| Local | 8 | ~2000 |
| HF Spaces Free | 2 | ~128 |
| HF Spaces Upgrade | 4-8 | ~512 |
Thread-safe: `BugTriageEnvironment` uses a `threading.Lock` for concurrent episode storage.
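A minimal sketch of lock-guarded episode storage follows. `EpisodeStore` and its methods are hypothetical and are not taken from `environment.py`. The only detail drawn from this document is the use of a `threading.Lock` around shared episode state.

```python
import threading
from typing import Dict, Optional


class EpisodeStore:
    """Hypothetical lock-guarded episode registry shared across request threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._episodes: Dict[str, dict] = {}

    def put(self, episode_id: str, episode: dict) -> None:
        with self._lock:
            self._episodes[episode_id] = episode

    def pop(self, episode_id: str) -> Optional[dict]:
        # Remove and return under the lock so two threads cannot both finish an episode.
        with self._lock:
            return self._episodes.pop(episode_id, None)

    def __len__(self) -> int:
        with self._lock:
            return len(self._episodes)


store = EpisodeStore()
threads = [threading.Thread(target=store.put, args=(f"ep{i}", {"step_count": 0}))
           for i in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that a `threading.Lock` only protects state within one process. If each worker in the table above is a separate process, episodes stored this way would not be shared between them.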