# PROJECT.md β€” Bug Triage OpenEnv Environment

> **Round 1 opens:** April 1, 2026 | **Deadline:** April 7, 2026, 11:59 PM IST

---

## Problem Statement

Software teams receive hundreds of bug reports daily. Manually triaging each report β€”
classifying the bug type, assigning a priority level, routing to the right developer,
and deciding the appropriate action β€” is slow, error-prone, and a significant cost
to engineering velocity.

**This environment** trains an LLM agent to perform **automated bug triage** at
production scale, simulating real-world issue tracking systems like Jira and GitHub
Issues. The agent reads raw bug reports (title, description, stack traces, metadata)
and must classify, prioritize, and route each issue accurately.

---

## Real-World Relevance

| Dimension | Detail |
|-----------|--------|
| **Domain** | Software Engineering / DevOps / Issue Tracking |
| **Industry use** | Every software company with a bug tracker (GitHub Issues, Jira, Linear, etc.) |
| **Scale** | Large teams process 100–500+ bug reports per week |
| **Cost of wrong triage** | Critical bugs missed β†’ outages; low-priority noise routed to senior devs β†’ wasted time |
| **Current solutions** | Manual labels, basic keyword rules, ML classifiers (limited context) |
| **LLM advantage** | Can reason over free-text descriptions, logs, and metadata together |

---

## Users

| User | Need |
|------|------|
| **Developers** | Only receive bugs relevant to their specialization |
| **QA Engineers** | Know which bugs to test first (priority-ordered) |
| **Project Managers** | Accurate sprint planning based on classified backlogs |
| **Engineering Leads** | Automated triage frees team from manual label overhead |

---

## Example Scenario

```
Bug Reported:
  Title:       "App crashes on iOS 17 when uploading files > 50MB"
  Description: "Consistently crashes immediately on upload tap. Blocking 
                3 enterprise customers."
  Logs:        "FATAL: EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000000
                Stack: FileUploadManager.upload(url:size:)"
  Environment: "iOS 17.2, iPhone 15 Pro, App v3.2.1"

Agent Must Output:
  bug_type:           crash          ← Task 1
  priority:           critical       ← Task 2
  assigned_developer: Alice          ← Task 3 (crash specialist)
  suggested_action:   fix_immediately ← Task 3
```
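The four output fields above map naturally onto a typed Pydantic v2 action model. A minimal sketch, assuming enum values like `crash`/`critical` drawn from the example (the class name `TriageAction` and the exact enum members are illustrative, not the actual `models.py`):

```python
from enum import Enum
from pydantic import BaseModel

class BugType(str, Enum):          # assumed label set
    crash = "crash"
    performance = "performance"
    security = "security"
    ui = "ui"

class Priority(str, Enum):         # assumed four-level scale
    critical = "critical"
    high = "high"
    medium = "medium"
    low = "low"

class TriageAction(BaseModel):
    """Hypothetical typed action the agent emits for one bug report."""
    bug_type: BugType
    priority: Priority
    assigned_developer: str
    suggested_action: str

# Validate the example scenario's expected output
action = TriageAction(
    bug_type="crash",
    priority="critical",
    assigned_developer="Alice",
    suggested_action="fix_immediately",
)
print(action.model_dump(mode="json"))
```

Typing the action this way lets the server reject malformed agent output with a 422 before it ever reaches a grader.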

---

## Real-World Constraints the Agent Must Handle

- **Ambiguous descriptions** β€” reporters with varying technical skill
- **Incomplete logs** β€” stack traces cut off or missing
- **Priority inflation** β€” reporters who label everything as "critical"
- **Routing uncertainty** β€” cross-cutting bugs (e.g., security + crash)
- **Missing environment info** β€” agent must infer from context

---

## Tech Stack

| Layer | Technology |
|-------|-----------|
| Environment server | FastAPI + Uvicorn |
| Containerization | Docker |
| Deployment | Hugging Face Spaces |
| Training | TRL GRPOTrainer + vLLM |
| Base model | Qwen/Qwen3-1.7B |
| Package manager | uv |
| Validation | Pydantic v2 |
| Baseline LLM | OpenAI GPT-4o-mini (via `OPENAI_API_KEY`) |

---

## Repository Layout

```
bug_triage_env/
β”œβ”€β”€ models.py             ← Pydantic-typed Action / Observation / State
β”œβ”€β”€ client.py             ← HTTP client for training code
β”œβ”€β”€ baseline.py           ← OpenAI-backed inference script
β”œβ”€β”€ openenv.yaml          ← Manifest
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ data/
β”‚   └── bugs.json         ← 15 diverse real-world bug reports
β”œβ”€β”€ graders/
β”‚   β”œβ”€β”€ task1_grader.py   ← Bug type classification [0.0–1.0]
β”‚   β”œβ”€β”€ task2_grader.py   ← Priority assignment [0.0–1.0]
β”‚   └── task3_grader.py   ← Full triage [0.0–1.0]
└── server/
    β”œβ”€β”€ environment.py    ← OpenEnv Environment ABC implementation
    β”œβ”€β”€ app.py            ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile
```
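Each grader maps a prediction to a score in `[0.0, 1.0]`. A minimal sketch of what a priority grader *could* look like, assuming a four-level ordinal scale with linear partial credit for near misses (both the scale and the scoring scheme are assumptions, not the actual `task2_grader.py`):

```python
PRIORITY_ORDER = ["low", "medium", "high", "critical"]  # assumed scale

def grade_priority(predicted: str, expected: str) -> float:
    """Score a priority prediction in [0.0, 1.0].

    Exact match earns 1.0; each level of ordinal distance from the
    expected priority costs one third of the score; labels outside
    the known scale earn 0.0.
    """
    pred = predicted.strip().lower()
    if pred not in PRIORITY_ORDER or expected not in PRIORITY_ORDER:
        return 0.0
    diff = abs(PRIORITY_ORDER.index(pred) - PRIORITY_ORDER.index(expected))
    return max(0.0, 1.0 - diff / (len(PRIORITY_ORDER) - 1))
```

Partial credit like this gives GRPO a smoother reward signal than a binary match, which matters when the base model's early rollouts are mostly near-misses.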

---

## Hackathon Compliance Checklist

| Requirement | Status |
|-------------|--------|
| HF Space deploys + `/reset` returns 200 | ☐ |
| `openenv.yaml` present | ☐ |
| Typed models (`Action`, `Observation`, `State`) | ☐ |
| `step()` / `reset()` / `state()` implemented | ☐ |
| Dockerfile builds | ☐ |
| `/tasks` β€” task list + action schema | ☐ |
| `/grader` β€” score in `[0.0, 1.0]` | ☐ |
| `/baseline` β€” OpenAI inference, all 3 tasks | ☐ |
| 3+ graded tasks with varying scores | ☐ |
| `baseline.py` runs without error | ☐ |