# ARCHITECTURE.md: Bug Triage OpenEnv System Architecture
---
## 1. Core Abstractions (OpenEnv Spec)
### Server-side (Docker)
```
BugTriageEnvironment(Environment)
├── reset(task_id) → BugTriageObservation   # New bug report episode
├── step(action) → BugTriageObservation     # Agent triages; grader fires; done=True
└── state @property → BugTriageState        # Episode metadata
```
### Client-side (Training code)
```
BugTriageEnvClient
├── reset(task_id) → dict          # POST /reset
├── step(ep_id, action) → dict     # POST /step
└── state() → dict                 # GET /state
```
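
A minimal sketch of how these client methods might map onto the HTTP endpoints, assuming a JSON API behind a base URL. The class name `SketchClient`, the injectable `transport` callable, and the default port are illustrative assumptions; they are not taken from `client.py`.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Optional

# Hypothetical transport signature: (method, url, json_body) -> decoded JSON dict.
Transport = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def urllib_transport(method: str, url: str, body: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Send one JSON request with the standard library and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class SketchClient:
    """Illustrative stand-in for BugTriageEnvClient (not the real implementation)."""

    def __init__(self, base_url: str = "http://localhost:8000",
                 transport: Transport = urllib_transport) -> None:
        # The port is an assumption; use whatever the server actually listens on.
        self.base_url = base_url.rstrip("/")
        self.transport = transport

    def reset(self, task_id: str) -> Dict[str, Any]:
        return self.transport("POST", f"{self.base_url}/reset", {"task_id": task_id})

    def step(self, ep_id: str, action: Dict[str, Any]) -> Dict[str, Any]:
        body = {"episode_id": ep_id, "action": action}
        return self.transport("POST", f"{self.base_url}/step", body)

    def state(self) -> Dict[str, Any]:
        return self.transport("GET", f"{self.base_url}/state", None)
```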
---
## 2. State Model
```python
class BugTriageState:
    episode_id: str           # Unique per episode
    step_count: int           # 0 after reset, 1 after step
    task_id: str              # "task_1" | "task_2" | "task_3"
    bug_id: str               # Which bug report is active
    cumulative_reward: float
```
---
## 3. Observation Model
```python
class BugTriageObservation:
    done: bool                        # True after step()
    reward: float                     # Shaped reward [-0.5, 1.0]
    task_id: str
    bug_report: BugReport             # Title, description, logs, env, reporter, metadata
    available_developers: List[str]   # ["Alice", "Bob", "Carol", "David", "Eve"]
    step_number: int
    feedback: str                     # Human-readable grader feedback
    grader_score: Optional[float]     # [0.0-1.0]; set only when done=True
    episode_id: str
```
**BugReport fields the agent reads:**
| Field | Type | Example |
|-------|------|---------|
| title | str | "App crashes on iOS 17 uploading >50MB" |
| description | str | Full description with context |
| logs | str? | Stack traces, error output |
| environment | str? | "iOS 17.2, iPhone 15 Pro, App v3.2.1" |
| reporter | str? | "enterprise_client_a" |
| metadata | dict | `{"component": "file_upload", "affected_users": 847}` |
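
The field table can be sketched as a typed record. This version uses a standard-library dataclass purely for illustration (the project itself describes its models as Pydantic), and both the class name `BugReportSketch` and the `to_prompt` helper are hypothetical. Fields marked `str?` in the table are modeled as optional.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class BugReportSketch:
    """Illustrative record mirroring the BugReport fields listed above."""

    title: str
    description: str
    logs: Optional[str] = None
    environment: Optional[str] = None
    reporter: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_prompt(self) -> str:
        """Render only the populated fields as plain text (hypothetical helper)."""
        parts = [f"Title: {self.title}", f"Description: {self.description}"]
        optional = (
            ("Logs", self.logs),
            ("Environment", self.environment),
            ("Reporter", self.reporter),
        )
        for label, value in optional:
            if value:
                parts.append(f"{label}: {value}")
        if self.metadata:
            parts.append(f"Metadata: {self.metadata}")
        return "\n".join(parts)
```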
---
## 4. Action Model
```python
class BugTriageAction:
    task_id: str                        # Required always
    # Task 1 (Easy)
    bug_type: Optional[str]             # crash|ui|performance|security|data_loss|compatibility
    # Task 2 (Medium)
    priority: Optional[str]             # low|medium|high|critical
    # Task 3 (Hard): all of the above plus:
    assigned_developer: Optional[str]   # Alice|Bob|Carol|David|Eve
    suggested_action: Optional[str]     # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
    reasoning: Optional[str]            # Chain-of-thought (not graded)
```
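
Because every task-specific field is optional in the model, a caller may want to check that an action carries the fields its task is graded on. The sketch below reads the per-task requirements off the comments in the action model; the `REQUIRED_FIELDS` mapping and the `missing_fields` helper are assumptions for illustration, and in particular it assumes Task 2 grades `priority` only.

```python
from typing import Dict, List, Optional, Tuple

# Assumed mapping, inferred from the comments in the action model.
REQUIRED_FIELDS: Dict[str, Tuple[str, ...]] = {
    "task_1": ("bug_type",),
    "task_2": ("priority",),
    "task_3": ("bug_type", "priority", "assigned_developer", "suggested_action"),
}


def missing_fields(action: Dict[str, Optional[str]]) -> List[str]:
    """Return the graded fields that the action's task expects but that are unset."""
    task_id = action.get("task_id")
    if task_id not in REQUIRED_FIELDS:
        raise ValueError(f"unknown task_id: {task_id!r}")
    return [name for name in REQUIRED_FIELDS[task_id] if not action.get(name)]
```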
---
## 5. Episode Flow
```
reset(task_id="task_1")
│
└── Returns BugTriageObservation:
      - bug_report = random bug from dataset (15 bugs)
      - done = False
      - episode_id = "abc123"

step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
│
├── Grader fires immediately (single-step episode)
│     task1_grader.grade([action], ground_truth) → 1.0
│
└── Returns BugTriageObservation:
      - done = True
      - reward = 1.0 (shaped from grader score)
      - grader_score = 1.0
      - feedback = "Bug type: ✓ (predicted=crash, expected=crash)"
**Key design:** Episodes are **single-step**: the agent reads one bug report, makes one decision, and the episode ends. This matches real-world triage, where the same bug is not re-classified iteratively.
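
The single-step contract can be illustrated with a toy in-memory environment. This is a sketch under stated assumptions, not the real `BugTriageEnvironment`: the class name, the bug dictionary layout, and the choice to raise on a second `step()` are all hypothetical, and only Task 1 style exact-match grading is modeled.

```python
import random
import uuid
from typing import Any, Dict, List


class ToySingleStepEnv:
    """Toy model of the reset/step contract (illustrative only)."""

    def __init__(self, bugs: List[Dict[str, Any]], seed: int = 0) -> None:
        self._bugs = bugs
        self._rng = random.Random(seed)
        self._episodes: Dict[str, Dict[str, Any]] = {}

    def reset(self, task_id: str) -> Dict[str, Any]:
        # Pick a random bug and open a fresh episode for it.
        bug = self._rng.choice(self._bugs)
        episode_id = uuid.uuid4().hex
        self._episodes[episode_id] = {"bug": bug, "done": False}
        return {
            "episode_id": episode_id,
            "task_id": task_id,
            "bug_report": bug["report"],
            "done": False,
        }

    def step(self, episode_id: str, action: Dict[str, Any]) -> Dict[str, Any]:
        episode = self._episodes[episode_id]
        if episode["done"]:
            # Assumed behavior: a finished episode rejects further actions.
            raise RuntimeError("episode already finished; call reset() first")
        episode["done"] = True  # single-step: the first action ends the episode
        expected = episode["bug"]["bug_type"]
        score = 1.0 if action.get("bug_type") == expected else 0.0
        return {"episode_id": episode_id, "done": True, "grader_score": score}
```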
---
## 6. File Layout
```
bug_triage_env/
├── __init__.py          ← Package exports
├── models.py            ← Pydantic typed Action/Observation/State
├── client.py            ← Sync + Async HTTP clients
├── baseline.py          ← OpenAI GPT-4o-mini inference script
├── openenv.yaml         ← Manifest
├── pyproject.toml
├── requirements.txt
│
├── data/
│   └── bugs.json        ← 15 real-world bug reports + ground truth
│
├── graders/
│   ├── __init__.py      ← GRADERS registry
│   ├── task1_grader.py  ← Bug type exact match [0/1]
│   ├── task2_grader.py  ← Priority distance scoring [0-1]
│   └── task3_grader.py  ← Weighted composite [0-1]
│
└── server/
    ├── __init__.py
    ├── environment.py   ← BugTriageEnvironment(Environment)
    ├── app.py           ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile
```
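
The first two grader descriptions in the layout ("exact match" and "priority distance scoring") can be sketched as follows. The function names, the ground-truth key names, and the linear distance formula are assumptions; the actual graders may weight distance differently. The Task 3 composite is omitted because its weights are not stated here.

```python
from typing import Dict, List

# Priority order as listed in the action model.
PRIORITIES = ["low", "medium", "high", "critical"]


def grade_bug_type(actions: List[Dict[str, str]], ground_truth: Dict[str, str]) -> float:
    """Exact match on bug type: 1.0 for a match, otherwise 0.0."""
    if not actions:
        return 0.0
    return 1.0 if actions[-1].get("bug_type") == ground_truth["bug_type"] else 0.0


def grade_priority(actions: List[Dict[str, str]], ground_truth: Dict[str, str]) -> float:
    """Assumed linear distance scoring: lose 1/3 per priority level of error."""
    if not actions:
        return 0.0
    predicted = actions[-1].get("priority")
    if predicted not in PRIORITIES:
        return 0.0
    distance = abs(PRIORITIES.index(predicted) - PRIORITIES.index(ground_truth["priority"]))
    return 1.0 - distance / (len(PRIORITIES) - 1)
```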
---
## 7. API Endpoints
### Standard OpenEnv
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/health` | Returns `{"status": "healthy"}` |
| POST | `/reset` | Body: `{"task_id": "task_1"}` → observation with bug report |
| POST | `/step` | Body: `{"episode_id": "...", "action": {...}}` → scored observation |
| GET | `/state` | Returns current episode metadata |
| GET | `/docs` | Swagger UI |
### Hackathon Required
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/tasks` | Returns the 3 tasks with their action schemas |
| POST | `/grader` | Body: `{"episode_id":"...","task_id":"task_1"}` → score [0.0-1.0] |
| POST | `/baseline` | Runs baseline.py → scores for all tasks |
---
## 8. Developer Specialization Matrix
| Developer | Crash | UI | Perf | Security | Data Loss | Compat |
|-----------|-------|----|------|----------|-----------|--------|
| Alice | ✓ | | ✓ | | ✓ | |
| Bob | ✓ | | | ✓ | | |
| Carol | | ✓ | | | | ✓ |
| David | | | | ✓ | ✓ | |
| Eve | | ✓ | ✓ | | | ✓ |

The Task 3 grader uses this matrix to award **partial credit** on developer assignment: if the agent picks a developer other than the expected one, but that developer has the right specialization for the bug, the assignment scores 0.5 instead of 0.0.
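
The partial-credit rule can be sketched as a lookup against the matrix. The `SPECIALIZATIONS` table transcribes the matrix above; the function name and signature are hypothetical, and the sketch assumes "right specialization" means the chosen developer covers the bug's type.

```python
from typing import Dict, Set

# Transcribed from the specialization matrix, keyed by bug_type values.
SPECIALIZATIONS: Dict[str, Set[str]] = {
    "Alice": {"crash", "performance", "data_loss"},
    "Bob": {"crash", "security"},
    "Carol": {"ui", "compatibility"},
    "David": {"security", "data_loss"},
    "Eve": {"ui", "performance", "compatibility"},
}


def developer_score(predicted: str, expected: str, bug_type: str) -> float:
    """1.0 for the expected developer, 0.5 for another specialist, else 0.0."""
    if predicted == expected:
        return 1.0
    if bug_type in SPECIALIZATIONS.get(predicted, set()):
        return 0.5
    return 0.0
```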
---
## 9. Scaling
| Deployment | Workers | Max Sessions |
|------------|---------|-------------|
| Local | 8 | ~2000 |
| HF Spaces Free | 2 | ~128 |
| HF Spaces Upgrade | 4-8 | ~512 |

Thread safety: `BugTriageEnvironment` guards its episode storage with a `threading.Lock`, so concurrent requests can create and update episodes without corrupting shared state.
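
A minimal sketch of lock-guarded episode storage, assuming episodes are kept in a dictionary keyed by `episode_id`. The `EpisodeStore` class and its method names are hypothetical and only illustrate the locking pattern, not the actual storage inside `BugTriageEnvironment`.

```python
import threading
from typing import Any, Dict, Optional


class EpisodeStore:
    """Dictionary of episodes where every access holds the same lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._episodes: Dict[str, Dict[str, Any]] = {}

    def put(self, episode_id: str, episode: Dict[str, Any]) -> None:
        with self._lock:
            self._episodes[episode_id] = episode

    def get(self, episode_id: str) -> Optional[Dict[str, Any]]:
        with self._lock:
            return self._episodes.get(episode_id)

    def __len__(self) -> int:
        with self._lock:
            return len(self._episodes)
```

Holding one lock around each read and write keeps individual operations atomic; a compound update such as read, modify, write would need to happen inside a single `with self._lock:` block to stay safe.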