File size: 3,877 Bytes
0135a17
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
# TASKS.md — Bug Triage Environment Tasks

> 3 graded tasks, easy → medium → hard. Scores in [0.0, 1.0].

---

## Task 1 (Easy): Bug Type Classification

**Goal:** Given a bug report, classify its type.

**Output labels:** `crash` | `ui` | `performance` | `security` | `data_loss` | `compatibility`

**Grader:** Exact match → 1.0, wrong → 0.0

### Input Example
```
Title: "App crashes on iOS 17 when uploading files > 50MB"
Description: "Consistently crashes immediately on upload tap."
Logs: "FATAL: EXC_BAD_ACCESS KERN_INVALID_ADDRESS at 0x0"
```

### Expected Output
```json
{"task_id": "task_1", "bug_type": "crash"}
```

### Success Criteria
| Agent Output | Score |
|-------------|-------|
| `bug_type: "crash"` | 1.0 ✓ |
| `bug_type: "performance"` | 0.0 ✗ |
| Missing field | 0.0 ✗ |

---

## Task 2 (Medium): Priority Assignment

**Goal:** Given a bug report, assign the correct priority level.

**Output labels:** `low` | `medium` | `high` | `critical`

**Grader:** Distance-based partial credit.

### Input Example
```
Title: "SQL injection possible in search endpoint"
Description: "The /api/v1/search endpoint does not sanitize user input.
  Passing ' OR 1=1 -- returns all records."
Metadata: {"severity": "CVSS_9.8", "disclosure_deadline": "2025-03-27"}
```

### Expected Output
```json
{"task_id": "task_2", "priority": "critical"}
```

### Success Criteria (Partial Credit)
| Agent Output | Distance | Score |
|-------------|----------|-------|
| `priority: "critical"` | 0 | 1.00 ✓ |
| `priority: "high"` | 1 | 0.67 |
| `priority: "medium"` | 2 | 0.33 |
| `priority: "low"` | 3 | 0.00 ✗ |

---

## Task 3 (Hard): Full Bug Triage

**Goal:** Complete triage — classify + prioritize + assign developer + suggest action.

**Weights:** bug_type (30%) + priority (30%) + developer (20%) + action (20%)

### Input Example
```
Title: "Export to CSV corrupts data with special characters"
Description: "Characters like é, ü, 中文 replaced with question marks."
Logs: "UnicodeEncodeError: 'latin-1' codec can't encode character"
Metadata: {"affected_users": 3200, "regression": true}
```

### Expected Output
```json
{
  "task_id": "task_3",
  "bug_type": "data_loss",
  "priority": "high",
  "assigned_developer": "David",
  "suggested_action": "fix_immediately"
}
```

### Success Criteria (Composite)
| Dimension | Weight | Scoring |
|-----------|--------|---------|
| bug_type | 30% | exact match → 1.0, else 0.0 |
| priority | 30% | distance: 0→1.0, 1→0.67, 2→0.33, 3→0.0 |
| developer | 20% | exact→1.0, right specialization→0.5, else→0.0 |
| action | 20% | exact→1.0, adjacent→0.5, else→0.0 |

### Example Scoring
```
Agent output:  bug_type=data_loss ✓  priority=high ✓  dev=Alice (wrong, but knows data_loss) →0.5  action=fix_immediately ✓
Score = 0.30×1.0 + 0.30×1.0 + 0.20×0.5 + 0.20×1.0 = 0.30+0.30+0.10+0.20 = 0.90
```

---

## Bug Dataset (15 reports)

| Bug ID | Type | Priority | Developer | Action |
|--------|------|----------|-----------|--------|
| BUG-001 | crash | critical | Alice | fix_immediately |
| BUG-002 | ui | low | Carol | schedule_sprint |
| BUG-003 | performance | high | Alice | fix_immediately |
| BUG-004 | security | critical | Bob | fix_immediately |
| BUG-005 | data_loss | high | David | fix_immediately |
| BUG-006 | compatibility | high | Eve | fix_immediately |
| BUG-007 | ui | medium | Carol | schedule_sprint |
| BUG-008 | performance | critical | Alice | fix_immediately |
| BUG-009 | ui | medium | Carol | fix_immediately |
| BUG-010 | data_loss | critical | David | fix_immediately |
| BUG-011 | ui | low | Carol | schedule_sprint |
| BUG-012 | security | critical | Bob | fix_immediately |
| BUG-013 | compatibility | medium | Eve | schedule_sprint |
| BUG-014 | performance | high | Eve | fix_immediately |
| BUG-015 | ui | low | Carol | needs_more_info |