Spaces:
Sleeping
Sleeping
| # PROJECT.md β Bug Triage OpenEnv Environment | |
| > **Round 1 opens:** April 1, 2026 | **Deadline:** April 7, 2026, 11:59 PM IST | |
| --- | |
| ## Problem Statement | |
| Software teams receive hundreds of bug reports daily. Manually triaging each report β | |
| classifying the bug type, assigning a priority level, routing to the right developer, | |
| and deciding the appropriate action β is slow, error-prone, and a significant cost | |
| to engineering velocity. | |
| **This environment** trains an LLM agent to perform **automated bug triage** at | |
| production scale, simulating real-world issue tracking systems like Jira and GitHub | |
| Issues. The agent reads raw bug reports (title, description, stack traces, metadata) | |
| and must classify, prioritize, and route each issue accurately. | |
| --- | |
| ## Real-World Relevance | |
| | Dimension | Detail | | |
| |-----------|--------| | |
| | **Domain** | Software Engineering / DevOps / Issue Tracking | | |
| | **Industry use** | Every software company with a bug tracker (GitHub Issues, Jira, Linear, etc.) | | |
| | **Scale** | Large teams process 100β500+ bug reports per week | | |
| | **Cost of wrong triage** | Critical bugs missed β outages; low-priority spamming senior devs β waste | | |
| | **Current solutions** | Manual labels, basic keyword rules, ML classifiers (limited context) | | |
| | **LLM advantage** | Can reason over free-text descriptions, logs, and metadata together | | |
| --- | |
| ## Users | |
| | User | Need | | |
| |------|------| | |
| | **Developers** | Only receive bugs relevant to their specialization | | |
| | **QA Engineers** | Know which bugs to test first (priority-ordered) | | |
| | **Project Managers** | Accurate sprint planning based on classified backlogs | | |
| | **Engineering Leads** | Automated triage frees team from manual label overhead | | |
| --- | |
| ## Example Scenario | |
| ``` | |
| Bug Reported: | |
| Title: "App crashes on iOS 17 when uploading files > 50MB" | |
| Description: "Consistently crashes immediately on upload tap. Blocking | |
| 3 enterprise customers." | |
| Logs: "FATAL: EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000000 | |
| Stack: FileUploadManager.upload(url:size:)" | |
| Environment: "iOS 17.2, iPhone 15 Pro, App v3.2.1" | |
| Agent Must Output: | |
| bug_type: crash β Task 1 | |
| priority: critical β Task 2 | |
| assigned_developer: Alice β Task 3 (crash specialist) | |
| suggested_action: fix_immediately β Task 3 | |
| ``` | |
| --- | |
| ## Real-World Constraints the Agent Must Handle | |
| - **Ambiguous descriptions** β reporters with varying technical skill | |
| - **Incomplete logs** β stack traces cut off or missing | |
| - **Priority inflation** β reporters who label everything as "critical" | |
| - **Routing uncertainty** β cross-cutting bugs (e.g., security + crash) | |
| - **Missing environment info** β agent must infer from context | |
| --- | |
| ## Tech Stack | |
| | Layer | Technology | | |
| |-------|-----------| | |
| | Environment server | FastAPI + Uvicorn | | |
| | Containerisation | Docker | | |
| | Deployment | Hugging Face Spaces | | |
| | Training | TRL GRPOTrainer + vLLM | | |
| | Base model | Qwen/Qwen3-1.7B | | |
| | Package manager | uv | | |
| | Validation | Pydantic v2 | | |
| | Baseline LLM | OpenAI GPT-4o-mini (via `OPENAI_API_KEY`) | | |
| --- | |
| ## Repository Layout | |
| ``` | |
| bug_triage_env/ | |
| βββ models.py β Pydantic-typed Action / Observation / State | |
| βββ client.py β HTTP client for training code | |
| βββ baseline.py β OpenAI-backed inference script | |
| βββ openenv.yaml β Manifest | |
| βββ pyproject.toml | |
| βββ requirements.txt | |
| βββ data/ | |
| β βββ bugs.json β 15 diverse real-world bug reports | |
| βββ graders/ | |
| β βββ task1_grader.py β Bug type classification [0.0β1.0] | |
| β βββ task2_grader.py β Priority assignment [0.0β1.0] | |
| β βββ task3_grader.py β Full triage [0.0β1.0] | |
| βββ server/ | |
| βββ environment.py β OpenEnv Environment ABC implementation | |
| βββ app.py β FastAPI (standard + hackathon endpoints) | |
| βββ Dockerfile | |
| ``` | |
| --- | |
| ## Hackathon Compliance Checklist | |
| | Requirement | Status | | |
| |-------------|--------| | |
| | HF Space deploys + `/reset` returns 200 | β | | |
| | `openenv.yaml` present | β | | |
| | Typed models (`Action`, `Observation`, `State`) | β | | |
| | `step()` / `reset()` / `state()` implemented | β | | |
| | Dockerfile builds | β | | |
| | `/tasks` β task list + action schema | β | | |
| | `/grader` β score in `[0.0, 1.0]` | β | | |
| | `/baseline` β OpenAI inference, all 3 tasks | β | | |
| | 3+ graded tasks with varying scores | β | | |
| | `baseline.py` runs without error | β | | |