# Bug Hunting Loop
## Objective
Find, validate, and report likely bugs with reproducible evidence instead of filing speculative agent-generated issues.
## Trigger
- Schedule: weekly on active modules.
- Event: error logs spike, flaky tests cluster, user reports mention the same behavior, or a release branch opens.
- Manual bootstrap/debug command: "hunt for reproducible bugs in this module."
## Intake
- Recent errors, flaky tests, issue labels, support snippets, changed files, code ownership, logs, traces, and module documentation.
- Existing bug reports and duplicate issue search.
- Safe reproduction commands and test fixtures.
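The duplicate search in the intake list can be approximated locally before querying an issue tracker. The sketch below is illustrative only: the function name `likely_duplicates` and the 0.6 similarity cutoff are assumptions, and a real search would also use the tracker's own query facilities.

```python
import difflib


def likely_duplicates(candidate_title, existing_titles, cutoff=0.6):
    """Return existing issue titles that closely match the candidate title.

    The cutoff is an assumed starting point, not a tuned value.
    """
    # Compare case-insensitively, but return the original titles.
    by_normalized = {title.lower().strip(): title for title in existing_titles}
    matches = difflib.get_close_matches(
        candidate_title.lower().strip(), list(by_normalized), n=5, cutoff=cutoff
    )
    return [by_normalized[m] for m in matches]
```

A non-empty result is a prompt to read the matched issues, not proof of duplication; record the query and the outcome either way.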
## Agents
- Scout: discovers suspicious signals and likely affected code paths.
- Reproducer: attempts minimal reproduction in a safe environment.
- Minimizer: reduces the reproduction to the smallest failing case.
- Fix suggester: proposes a patch only when the cause is clear.
- Reporter: files evidence-backed issues or PRs.
## Workspace And Permissions
- Use a branch, worktree, sandbox, or read-only mode depending on the target.
- Allow tests, local fixtures, logs, static search, and non-production reproduction.
- Disallow production data access, destructive fuzzing, speculative mass issue creation, and broad refactors.
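One way to make these permissions checkable is a default-deny action filter. This is a minimal sketch: the action labels are hypothetical names invented for illustration and would need to be mapped to whatever the agent runner actually exposes.

```python
# Hypothetical action labels, derived from the allow and disallow bullets above.
ALLOWED_ACTIONS = frozenset({
    "run_tests",
    "use_local_fixtures",
    "read_logs",
    "static_search",
    "reproduce_non_production",
})
DISALLOWED_ACTIONS = frozenset({
    "access_production_data",
    "destructive_fuzzing",
    "mass_issue_creation",
    "broad_refactor",
})


def is_permitted(action):
    """Default-deny: only explicitly allowed actions pass."""
    if action in DISALLOWED_ACTIONS:
        return False
    return action in ALLOWED_ACTIONS
```

Unknown actions are rejected rather than allowed, so a new capability has to be added to the allow list deliberately.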
## Durable State
- Checked modules, signals inspected, duplicate issue search, reproduction steps, commands, expected/actual behavior, traces, screenshots, and final disposition.
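The durable state could be stored as one JSON line per candidate. The sketch below assumes a record shape; the class name `HuntRecord` and its field names are illustrative choices that mirror the list above, not a prescribed schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HuntRecord:
    """One candidate's state; field names are assumptions for illustration."""
    module: str
    signals_inspected: list[str] = field(default_factory=list)
    duplicate_search: str = ""  # query used and what it returned
    reproduction_steps: list[str] = field(default_factory=list)
    commands: list[str] = field(default_factory=list)
    expected: str = ""
    actual: str = ""
    evidence: list[str] = field(default_factory=list)  # trace or screenshot paths
    disposition: str = "open"


def to_json_line(record):
    """Serialize a record to a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


def from_json_line(line):
    """Rebuild a record from a line written by to_json_line."""
    return HuntRecord(**json.loads(line))
```

Appending one line per candidate keeps prior runs readable by later runs, which is what lets the loop skip areas it has already checked.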
## Loop Steps
1. Discover candidate bug signals from tests, logs, issues, traces, or recent diffs.
1. Load ownership docs, existing issues, and prior bug-hunt state.
1. Delegate signal discovery, reproduction, minimization, patch proposal, and reporting.
1. Search for duplicates before filing anything.
1. Reproduce in the smallest safe environment.
1. If root cause is obvious and patch is small, propose a PR with tests.
1. Otherwise file a precise issue with evidence and stop.
1. Persist false positives and checked areas.
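The per-candidate part of the steps above (duplicate search through persisting the outcome) can be sketched as a small dispatch function. Everything here is an assumption for illustration: the callables `find_duplicate`, `reproduce`, and `cause_is_clear` stand in for the agents, and the disposition strings are made-up labels. Signal discovery and delegation are left out.

```python
def run_hunt(signals, state, find_duplicate, reproduce, cause_is_clear):
    """Assign a disposition to each unchecked signal and persist it in state.

    state maps signal -> disposition and doubles as the prior bug-hunt state.
    """
    for signal in signals:
        if signal in state:
            continue  # already checked in an earlier run
        if find_duplicate(signal):
            state[signal] = "duplicate"  # nothing is filed for duplicates
        elif not reproduce(signal):
            state[signal] = "false-positive"
        elif cause_is_clear(signal):
            state[signal] = "patch-proposed"  # small PR with tests
        else:
            state[signal] = "issue-filed"  # precise issue with evidence, then stop
    return state
```

The duplicate check runs before reproduction so that no effort is spent reproducing something already reported.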
## Verification Gates
- A bug report includes reproducible steps or a clear trace/log link.
- A patch includes a failing test or deterministic reproduction when feasible.
- Duplicate issue search is recorded.
- Expected vs actual behavior is grounded in docs, tests, or product requirements.
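These gates can be enforced mechanically before anything is filed. The sketch below assumes a report represented as a plain dict; the key names are hypothetical, and the "when feasible" allowance for patches is not modeled.

```python
def gate_failures(report):
    """Return the list of verification gates that a draft report fails.

    Key names are assumptions; an empty list means every gate passed.
    """
    failures = []
    if not (report.get("reproduction_steps") or report.get("trace_link")):
        failures.append("missing reproducible steps or trace/log link")
    if not report.get("duplicate_search"):
        failures.append("duplicate issue search not recorded")
    if not report.get("expected_source"):
        failures.append("expected behavior not grounded in docs, tests, or requirements")
    if report.get("has_patch") and not (
        report.get("failing_test") or report.get("deterministic_repro")
    ):
        failures.append("patch lacks a failing test or deterministic reproduction")
    return failures
```

Returning the failed gates, rather than a single boolean, gives the reporter something concrete to fix or to cite when it declines to file.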
## Budget And Exit
- Max retries: 3 reproduction attempts per candidate.
- Max runtime: 90 minutes per module or signal cluster.
- Stop when a bug is reproduced and reported, a small verified patch is opened, the signal is classified as non-bug, or owner judgment is needed.
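The retry and runtime limits can be combined into one bounded reproduction helper. This is a sketch under assumptions: `attempt` is a hypothetical callable that returns true when the bug reproduces, and the returned strings are illustrative labels. The clock is injectable so the limit can be tested without waiting.

```python
import time

MAX_ATTEMPTS = 3                 # reproduction attempts per candidate
MAX_RUNTIME_SECONDS = 90 * 60    # per module or signal cluster


def reproduce_within_budget(attempt, started_at, now=time.monotonic):
    """Try to reproduce until success or until a budget limit is hit.

    started_at is the clock reading when work on the module or cluster began.
    """
    for n in range(1, MAX_ATTEMPTS + 1):
        if now() - started_at >= MAX_RUNTIME_SECONDS:
            return "runtime-exhausted"
        if attempt(n):
            return "reproduced"
    return "retries-exhausted"
```

The runtime check happens before each attempt, so an attempt already in progress is not interrupted; enforcing a hard cutoff would need a timeout inside `attempt` itself.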
## Escalation
Escalate for production-only bugs, privacy-sensitive logs, ambiguous product behavior, security-sensitive findings, data loss, or broad architectural fixes.
## Loop Instruction
```text
Hunt for reproducible bugs in <module, release, or signal cluster>.
Start from concrete signals: failing tests, logs, traces, issues, or recent changes.
Search for duplicates before filing.
Reproduce safely, minimize the case, and report expected vs actual behavior.
Open a patch only when the cause is clear and verification is available.
```
Example automation: run weekly against modules with recent churn, flaky tests, or repeated user reports.
## Failure Modes
- Filing issues from code smell without reproduction.
- Creating duplicate bug reports.
- Using private logs or customer data as public evidence.
- Patching symptoms while leaving the reproduced cause unexplained.
## References
- [Run long horizon tasks with Codex](https://developers.openai.com/blog/run-long-horizon-tasks-with-codex) - Practical plan-edit-test-observe-repair-document-repeat runbook.
- [SWE-bench](https://www.swebench.com/) - Benchmark framing around real repository issues and tests.
- [Terminal-Bench](https://www.tbench.ai/) - Evaluation context for hard terminal tasks and reproducibility.