bug-triage-openenv / docs /PROJECT.md
savetrees's picture
Upload folder using huggingface_hub
0135a17 verified

PROJECT.md β€” Bug Triage OpenEnv Environment

Round 1 opens: April 1, 2026 | Deadline: April 7, 2026, 11:59 PM IST


Problem Statement

Software teams receive hundreds of bug reports daily. Manually triaging each report β€” classifying the bug type, assigning a priority level, routing to the right developer, and deciding the appropriate action β€” is slow, error-prone, and a significant cost to engineering velocity.

This environment trains an LLM agent to perform automated bug triage at production scale, simulating real-world issue tracking systems like Jira and GitHub Issues. The agent reads raw bug reports (title, description, stack traces, metadata) and must classify, prioritize, and route each issue accurately.


Real-World Relevance

Dimension Detail
Domain Software Engineering / DevOps / Issue Tracking
Industry use Every software company with a bug tracker (GitHub Issues, Jira, Linear, etc.)
Scale Large teams process 100–500+ bug reports per week
Cost of wrong triage Critical bugs missed β†’ outages; low-priority spamming senior devs β†’ waste
Current solutions Manual labels, basic keyword rules, ML classifiers (limited context)
LLM advantage Can reason over free-text descriptions, logs, and metadata together

Users

User Need
Developers Only receive bugs relevant to their specialization
QA Engineers Know which bugs to test first (priority-ordered)
Project Managers Accurate sprint planning based on classified backlogs
Engineering Leads Automated triage frees team from manual label overhead

Example Scenario

Bug Reported:
  Title:       "App crashes on iOS 17 when uploading files > 50MB"
  Description: "Consistently crashes immediately on upload tap. Blocking 
                3 enterprise customers."
  Logs:        "FATAL: EXC_BAD_ACCESS KERN_INVALID_ADDRESS 0x0000000000000000
                Stack: FileUploadManager.upload(url:size:)"
  Environment: "iOS 17.2, iPhone 15 Pro, App v3.2.1"

Agent Must Output:
  bug_type:           crash          ← Task 1
  priority:           critical       ← Task 2
  assigned_developer: Alice          ← Task 3 (crash specialist)
  suggested_action:   fix_immediately ← Task 3

Real-World Constraints the Agent Must Handle

  • Ambiguous descriptions β€” reporters with varying technical skill
  • Incomplete logs β€” stack traces cut off or missing
  • Priority inflation β€” reporters who label everything as "critical"
  • Routing uncertainty β€” cross-cutting bugs (e.g., security + crash)
  • Missing environment info β€” agent must infer from context

Tech Stack

Layer Technology
Environment server FastAPI + Uvicorn
Containerisation Docker
Deployment Hugging Face Spaces
Training TRL GRPOTrainer + vLLM
Base model Qwen/Qwen3-1.7B
Package manager uv
Validation Pydantic v2
Baseline LLM OpenAI GPT-4o-mini (via OPENAI_API_KEY)

Repository Layout

bug_triage_env/
β”œβ”€β”€ models.py             ← Pydantic-typed Action / Observation / State
β”œβ”€β”€ client.py             ← HTTP client for training code
β”œβ”€β”€ baseline.py           ← OpenAI-backed inference script
β”œβ”€β”€ openenv.yaml          ← Manifest
β”œβ”€β”€ pyproject.toml
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ data/
β”‚   └── bugs.json         ← 15 diverse real-world bug reports
β”œβ”€β”€ graders/
β”‚   β”œβ”€β”€ task1_grader.py   ← Bug type classification [0.0–1.0]
β”‚   β”œβ”€β”€ task2_grader.py   ← Priority assignment [0.0–1.0]
β”‚   └── task3_grader.py   ← Full triage [0.0–1.0]
└── server/
    β”œβ”€β”€ environment.py    ← OpenEnv Environment ABC implementation
    β”œβ”€β”€ app.py            ← FastAPI (standard + hackathon endpoints)
    └── Dockerfile

Hackathon Compliance Checklist

Requirement Status
HF Space deploys + /reset returns 200 ☐
openenv.yaml present ☐
Typed models (Action, Observation, State) ☐
step() / reset() / state() implemented ☐
Dockerfile builds ☐
/tasks β€” task list + action schema ☐
/grader β€” score in [0.0, 1.0] ☐
/baseline β€” OpenAI inference, all 3 tasks ☐
3+ graded tasks with varying scores ☐
baseline.py runs without error ☐