| # Review instructions |
|
|
| These rules override the default review guidance. Treat them as the highest-priority |
| instruction block for any review of this repo. If something here contradicts a more |
| generic review habit, follow these. |
|
|
| ## Severity levels |
|
|
| Every finding carries one of three priority labels: |
|
|
| - **P0** β blocks merge. |
| - **P1** β worth fixing, not blocking. |
| - **P2** β informational. |
|
|
| Write labels as plain text (`P0`, `P1`, `P2`) in finding headers. Do not use |
| emoji or colored markers. Use judgment on what belongs at which level β this |
| repo does not enumerate P0 cases; read the code and decide. |
|
|
| ## Default bias: rigor |
|
|
| Reviews gate merges. This is an open-source repo that takes PRs from anyone; the |
| maintainer team is small and relies on the review to catch what they don't have |
| time to verify themselves. **Default bias is rigor, not speed.** When in doubt |
| on a P0-class concern, investigate further before deciding whether to flag β a |
| false negative ships a bug to production, a false positive costs the contributor |
| one round trip. |
|
|
| Rigor is not nitpicking. The P1 cap, "do not report" skip list, and verification |
| bar all still apply. Rigor means going deep on a small number of real concerns, |
| not surfacing a large number of shallow ones. Prefer one well-investigated P0 |
| over three speculative P1s. |
|
|
| **Hold the line on P0.** If the author pushes back on a P0 finding without a fix |
| that actually addresses the root cause, re-state the concern with added |
| citations. Only accept the pushback if the author points to code or behavior you |
| missed. Do not soften a P0 because the contributor is polite or new to the repo. |
|
|
| For P1 and P2: if the author defers or pushes back without fixing, accept it |
| silently β do not re-flag on subsequent commits. P1/P2 are informational; the |
| author may defer to a follow-up issue at their discretion. |
|
|
| If Claude and the author repeatedly disagree on the same class of finding, the |
| signal is that REVIEW.md is missing a rule; note it once in the PR summary as |
| `suggest-rule: <short description>` and stop. |
|
|
| ## Investigate before posting |
|
|
| The depth of your analysis determines the strength of your finding. For any |
| P0-class concern, before writing it up: |
|
|
| - Read the relevant callers and callees, not just the diff. Use Read and Grep |
| to open files the diff doesn't touch but the changed code interacts with. |
| - Trace the full chain end-to-end for routing, auth, and agent-loop findings. |
| Cite each hop by `file:line`, not just the suspicious line. |
| - Check whether the codebase already has an established pattern for this kind |
| of change (`grep` for similar call sites, similar tool definitions, similar |
| route guards). If the PR introduces a new approach where an established |
| pattern exists, flag that β divergence from the existing pattern is usually a |
| regression vector even when the new code "works." |
| - Confirm the specific behavior you're claiming. "This breaks X" must be |
| grounded in either the code handling X or a test exercising X, not in |
| inference from naming or structure. |
|
|
| A finding you "spotted" by scanning the diff is more likely to be a false |
| positive than a finding you verified by reading the code around it. |
|
|
| ## P1 cap |
|
|
| Report at most **3** P1 findings per review. If you found more, say "plus N |
| similar items" in the summary. If everything you found is P1 or below, open the |
| summary with "No blocking issues." |
|
|
| ## Re-review convergence |
|
|
| If this PR has already received a Claude review (there is a prior review comment |
| by the `claude` bot), suppress new P1 findings and post only P0 ones. Do not |
| re-post P1s that were already flagged on earlier commits. If the author pushed a |
| fix for a previously flagged issue, acknowledge it in one line rather than |
| re-flagging. |
|
|
| ## Do not report |
|
|
| Anything in these paths β skip entirely: |
|
|
| - `frontend/node_modules/**`, `**/*.lock`, `uv.lock`, `package-lock.json` |
| - `hf_agent.egg-info/**`, `.ruff_cache/**`, `.pytest_cache/**`, `.venv/**` |
| - `session_logs/**`, `reports/**` |
| - Anything under a `gen/` or `generated/` path |
|
|
| Anything speculative β do not post: |
|
|
| - "This might be slow" without a concrete complexity claim tied to a specific |
| input size |
| - Hypothetical race conditions without a concrete interleaving |
|
|
| ## Dependency PRs |
|
|
| For PRs whose diff is only a lockfile bump, a `pyproject.toml` change, or a |
| new dependency, the code rules above don't apply β risks shift to provenance |
| and framing. Every claim in the title or body (CVE IDs, version numbers, |
| behavior fixes) must match what the diff actually does, and any new |
| transitive dep needs justification. A PR that lies in its framing is P0 |
| regardless of whether the code change is safe in isolation. |
|
|
| ## Verification bar |
|
|
| Every behavior claim in a finding must cite `file:line`. "This breaks X" is not |
| actionable without a line reference. If you cannot cite a line, do not post |
| the finding. |
|
|
| ## Summary shape |
|
|
| Open the review body with a single-line tally and an explicit merge verdict, on |
| two lines: |
|
|
| ``` |
| 2 P0, 3 P1 |
| Verdict: changes requested |
| ``` |
|
|
| Valid verdicts: |
|
|
| - **Verdict: ready to merge** β no P0 findings, contributor can merge as-is |
| once any CI passes |
| - **Verdict: changes requested** β at least one P0 that must be addressed |
| before merging |
| - **Verdict: needs discussion** β a design-level concern the maintainer should |
| weigh in on before the contributor iterates (use sparingly) |
|
|
| If it's a clean review, write `LGTM` followed by `Verdict: ready to merge`. |
|
|
| Then a **What I checked** bullet list β one line per major area you examined, |
| regardless of whether you found anything. This gives the maintainer visible |
| coverage at a glance and lets them decide whether to spot-check areas you |
| didn't touch. |
|
|