# ARCHITECTURE.md — Bug Triage OpenEnv System Architecture

---

## 1. Core Abstractions (OpenEnv Spec)

### Server-side (Docker)
```
BugTriageEnvironment(Environment)
├── reset(task_id)  →  BugTriageObservation   # New bug report episode
├── step(action)    →  BugTriageObservation   # Agent triages; grader fires; done=True
└── state @property →  BugTriageState         # Episode metadata
```

### Client-side (Training code)
```
BugTriageEnvClient
├── reset(task_id)        →  dict   # POST /reset
├── step(ep_id, action)   →  dict   # POST /step
└── state()               →  dict   # GET  /state
```

---

## 2. State Model

```python
class BugTriageState:
    episode_id: str         # Unique per episode
    step_count: int         # 0 after reset, 1 after step
    task_id: str            # "task_1" | "task_2" | "task_3"
    bug_id: str             # Which bug report is active
    cumulative_reward: float
```

---

## 3. Observation Model

```python
from typing import List, Optional


class BugTriageObservation:
    done: bool                        # True after step()
    reward: float                     # Shaped reward [-0.5, 1.0]
    task_id: str

    bug_report: BugReport             # Title, description, logs, env, reporter, metadata
    available_developers: List[str]   # ["Alice", "Bob", "Carol", "David", "Eve"]
    step_number: int
    feedback: str                     # Human-readable grader feedback
    grader_score: Optional[float]     # [0.0-1.0] — only when done=True
    episode_id: str
```

**BugReport fields the agent reads:**
| Field | Type | Example |
|-------|------|---------|
| title | str | "App crashes on iOS 17 uploading >50MB" |
| description | str | Full description with context |
| logs | str? | Stack traces, error output |
| environment | str? | "iOS 17.2, iPhone 15 Pro, App v3.2.1" |
| reporter | str? | "enterprise_client_a" |
| metadata | dict | `{"component": "file_upload", "affected_users": 847}` |

---

## 4. Action Model

```python
from typing import Optional


class BugTriageAction:
    task_id: str                                    # Required always

    # Task 1 (Easy)
    bug_type: Optional[str]                         # crash|ui|performance|security|data_loss|compatibility

    # Task 2 (Medium)
    priority: Optional[str]                         # low|medium|high|critical

    # Task 3 (Hard) — all of the above plus:
    assigned_developer: Optional[str]               # Alice|Bob|Carol|David|Eve
    suggested_action: Optional[str]                 # fix_immediately|schedule_sprint|needs_more_info|wontfix|duplicate
    reasoning: Optional[str]                        # Chain-of-thought (not graded)
```
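The per-task field requirements above can be checked before an action is sent. The following is a minimal sketch, not the project's own validation: `validate_action`, `REQUIRED_FIELDS`, and `ALLOWED_VALUES` are hypothetical names, and it assumes Task 2 requires only `priority`, which the action model does not state explicitly.

```python
from typing import Dict, List, Optional

# Required fields per task. ASSUMPTION: task_2 needs only `priority`;
# the source does not say whether it also expects `bug_type`.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "task_1": ["bug_type"],
    "task_2": ["priority"],
    "task_3": ["bug_type", "priority", "assigned_developer", "suggested_action"],
}

# Allowed values, copied from the comments in BugTriageAction.
ALLOWED_VALUES: Dict[str, set] = {
    "bug_type": {"crash", "ui", "performance", "security", "data_loss", "compatibility"},
    "priority": {"low", "medium", "high", "critical"},
    "assigned_developer": {"Alice", "Bob", "Carol", "David", "Eve"},
    "suggested_action": {
        "fix_immediately", "schedule_sprint", "needs_more_info", "wontfix", "duplicate",
    },
}


def validate_action(action: Dict[str, Optional[str]]) -> List[str]:
    """Return a list of problems; an empty list means the action is well-formed."""
    task_id = action.get("task_id")
    if task_id not in REQUIRED_FIELDS:
        return [f"unknown task_id: {task_id!r}"]
    problems = []
    for field in REQUIRED_FIELDS[task_id]:
        value = action.get(field)
        if value is None:
            problems.append(f"missing field: {field}")
        elif value not in ALLOWED_VALUES[field]:
            problems.append(f"invalid {field}: {value!r}")
    return problems
```

Calling `validate_action({"task_id": "task_2"})` reports the missing `priority` field, which lets a client catch malformed actions before spending an episode on them.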

---

## 5. Episode Flow

```
reset(task_id="task_1")
  │
  └── Returns BugTriageObservation:
        - bug_report = random bug from dataset (15 bugs)
        - done = False
        - episode_id = "abc123"

step(action=BugTriageAction(task_id="task_1", bug_type="crash"))
  │
  ├── Grader fires immediately (single-step episode)
  │     task1_grader.grade([action], ground_truth) → 1.0
  │
  └── Returns BugTriageObservation:
        - done = True
        - reward = 1.0 (shaped from grader score)
        - grader_score = 1.0
        - feedback = "Bug type: ✓ (predicted=crash, expected=crash)"
```

**Key design:** Episodes are **single-step**: the agent reads one bug report, submits one action, and the episode ends. This mirrors real-world triage, where a bug is classified once rather than re-classified iteratively.
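The single-step contract can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the real `BugTriageEnvironment`: `ToyTriageEnv` and `TOY_BUGS` are hypothetical, the two bug records are made up, only Task 1 is modeled, and the reward is assumed to equal the grader score (the actual reward shaping is not described here).

```python
import random
import uuid
from typing import Dict, Optional

# Two made-up bug records for illustration; the real dataset lives in data/bugs.json.
TOY_BUGS = [
    {"bug_id": "toy-1", "title": "App crashes on upload", "bug_type": "crash"},
    {"bug_id": "toy-2", "title": "Button overlaps footer", "bug_type": "ui"},
]


class ToyTriageEnv:
    """Hypothetical stand-in for BugTriageEnvironment, task_1 only."""

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self._bug: Optional[Dict[str, str]] = None
        self._done = True  # no active episode until reset() is called

    def reset(self) -> Dict[str, object]:
        """Start a new episode with a randomly chosen bug."""
        self._bug = self._rng.choice(TOY_BUGS)
        self._done = False
        return {
            "episode_id": uuid.uuid4().hex,
            "done": False,
            "bug_report": {"title": self._bug["title"]},
            "grader_score": None,
        }

    def step(self, bug_type: str) -> Dict[str, object]:
        """Grade the single action and end the episode."""
        if self._done or self._bug is None:
            raise RuntimeError("episode finished; call reset() first")
        expected = self._bug["bug_type"]
        score = 1.0 if bug_type == expected else 0.0  # exact match, as in task 1
        self._done = True  # single-step: the first action ends the episode
        mark = "✓" if score == 1.0 else "✗"
        return {
            "done": True,
            "reward": score,  # ASSUMPTION: reward equals the grader score here
            "grader_score": score,
            "feedback": f"Bug type: {mark} (predicted={bug_type}, expected={expected})",
        }
```

A second `step()` without an intervening `reset()` raises, which is one way to make the "one decision per episode" rule explicit to callers.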

---

## 6. File Layout

```
bug_triage_env/
├── __init__.py            ← Package exports
├── models.py              ← Pydantic typed Action/Observation/State
├── client.py              ← Sync + Async HTTP clients
├── baseline.py            ← OpenAI GPT-4o-mini inference script
├── openenv.yaml           ← Manifest
├── pyproject.toml
├── requirements.txt
│
├── data/
│   └── bugs.json          ← 15 real-world bug reports + ground truth
│
├── graders/
│   ├── __init__.py        ← GRADERS registry
│   ├── task1_grader.py    ← Bug type exact match [0/1]
│   ├── task2_grader.py    ← Priority distance scoring [0-1]
│   └── task3_grader.py    ← Weighted composite [0-1]
│
└── server/
    ├── __init__.py
    ├── environment.py     ← BugTriageEnvironment(Environment)
    ├── app.py             ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile
```
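The layout above describes `task2_grader.py` as priority distance scoring and `task3_grader.py` as a weighted composite. The sketch below shows one plausible shape for each. It is an illustration only: the linear falloff and the function names are assumptions, and the composite takes its weights as an argument because the real weights are not given here.

```python
from typing import Dict

PRIORITY_ORDER = ["low", "medium", "high", "critical"]  # order from the action model


def priority_distance_score(predicted: str, expected: str) -> float:
    """1.0 for an exact match, dropping linearly to 0.0 at maximum distance.

    ASSUMPTION: linear falloff; the real task2_grader may use another curve.
    """
    distance = abs(PRIORITY_ORDER.index(predicted) - PRIORITY_ORDER.index(expected))
    return 1.0 - distance / (len(PRIORITY_ORDER) - 1)


def weighted_composite(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-field scores, normalized so the result stays in [0, 1]."""
    total = sum(weights.values())
    return sum(scores[name] * weight for name, weight in weights.items()) / total
```

Under this scheme, predicting `medium` when the ground truth is `high` loses one third of the score, while confusing `low` with `critical` scores zero.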

---

## 7. API Endpoints

### Standard OpenEnv
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/health` | → `{"status": "healthy"}` |
| POST | `/reset` | Body: `{"task_id": "task_1"}` → observation with bug report |
| POST | `/step` | Body: `{"episode_id": "...", "action": {...}}` → scored observation |
| GET | `/state` | → current episode metadata |
| GET | `/docs` | Swagger UI |

### Hackathon Required
| Method | Endpoint | Purpose |
|--------|----------|---------|
| GET | `/tasks` | → 3 tasks with action schemas |
| POST | `/grader` | Body: `{"episode_id":"...","task_id":"task_1"}` → score [0.0-1.0] |
| POST | `/baseline` | Runs baseline.py → all task scores |
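The request bodies in the tables above can be built with the standard library alone. This is a minimal sketch rather than the project's `client.py`: the helper names are hypothetical, and the base URL passed in is a placeholder for wherever the server actually runs.

```python
import json
import urllib.request
from typing import Any, Dict


def build_post(base_url: str, path: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reset_request(base_url: str, task_id: str) -> urllib.request.Request:
    """POST /reset with the body shape listed in the endpoint table."""
    return build_post(base_url, "/reset", {"task_id": task_id})


def step_request(
    base_url: str, episode_id: str, action: Dict[str, Any]
) -> urllib.request.Request:
    """POST /step with the episode id and the action payload."""
    return build_post(base_url, "/step", {"episode_id": episode_id, "action": action})


def send(request: urllib.request.Request) -> Dict[str, Any]:
    """Send a request and decode the JSON response (requires a running server)."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from `send()` keeps the payload shapes testable without a live server.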

---

## 8. Developer Specialization Matrix

| Developer | Crash | UI | Perf | Security | Data Loss | Compat |
|-----------|-------|----|------|----------|-----------|--------|
| Alice | ✓ | | ✓ | | ✓ | |
| Bob | ✓ | | | ✓ | | |
| Carol | | ✓ | | | | ✓ |
| David | | | | ✓ | ✓ | |
| Eve | | ✓ | ✓ | | | ✓ |

The Task 3 grader uses this matrix to award **partial credit** on developer assignment: if the agent picks a developer other than the expected one, but that developer specializes in the bug's type, the assignment scores 0.5 instead of 0.0.
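The partial-credit rule can be sketched as a lookup against the matrix. The table contents are transcribed from above; the function name `developer_score` and the treatment of unknown developer names (scored 0.0) are assumptions, not the actual `task3_grader.py` code.

```python
from typing import Dict, Set

# Transcribed from the specialization matrix above.
SPECIALIZATIONS: Dict[str, Set[str]] = {
    "Alice": {"crash", "performance", "data_loss"},
    "Bob": {"crash", "security"},
    "Carol": {"ui", "compatibility"},
    "David": {"security", "data_loss"},
    "Eve": {"ui", "performance", "compatibility"},
}


def developer_score(predicted: str, expected: str, bug_type: str) -> float:
    """Score a developer assignment: 1.0 exact, 0.5 matching specialization, else 0.0."""
    if predicted == expected:
        return 1.0
    # Wrong person, but someone whose specialization covers this bug type.
    if bug_type in SPECIALIZATIONS.get(predicted, set()):
        return 0.5
    return 0.0
```

For a crash bug whose expected assignee is Alice, choosing Bob earns 0.5 because Bob also handles crashes, while choosing Carol earns 0.0.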

---

## 9. Scaling

| Deployment | Workers | Max Sessions |
|------------|---------|-------------|
| Local | 8 | ~2000 |
| HF Spaces Free | 2 | ~128 |
| HF Spaces Upgrade | 4-8 | ~512 |

Thread safety: `BugTriageEnvironment` guards its episode storage with a `threading.Lock`, so concurrent requests can create and look up episodes safely.
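A lock-guarded episode map along these lines might look as follows. This is a hedged sketch: `EpisodeStore` and its methods are hypothetical names, and the real environment may organize its storage differently.

```python
import threading
from typing import Any, Dict, Optional


class EpisodeStore:
    """Hypothetical lock-guarded episode map; not the project's actual class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._episodes: Dict[str, Dict[str, Any]] = {}

    def put(self, episode_id: str, data: Dict[str, Any]) -> None:
        """Store episode data; the lock serializes writers."""
        with self._lock:
            self._episodes[episode_id] = data

    def pop(self, episode_id: str) -> Optional[Dict[str, Any]]:
        """Remove and return an episode, or None if it is unknown."""
        with self._lock:
            return self._episodes.pop(episode_id, None)

    def __len__(self) -> int:
        with self._lock:
            return len(self._episodes)
```

Because episodes are single-step, removing an episode with `pop()` once it has been graded would keep the map from growing without bound; whether the real server does this is not stated here.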