Commit 899a7c7
Parent(s): cf05092
feat: Implement chunking and graph management for code review environment
- Added chunker module to split parsed Python modules into manageable chunks.
- Introduced graph builder to create edges between code chunks and modules.
- Created sample project files for authentication, cart calculations, checkout flow, and configuration.
- Implemented utility functions for inventory management and email notifications.
- Developed payment gateway wrapper with security considerations.
- Added validators for input checks and coupon validation.
- Created extensive test suite for graph manager, observation builder, and token budget enforcement.
- Documented Phase 2 plan for graph manager and observation builder integration.
- Builder.md +59 -101
- Debugger.md +57 -69
- OpenEnv +1 -0
- Phases.md +378 -240
- Reviewer.md +94 -0
- code-review-env/README.md +61 -2
- code-review-env/db/database.py +3 -0
- code-review-env/db/models.py +25 -0
- code-review-env/db/schema.py +13 -1
- code-review-env/db/seed.py +143 -0
- code-review-env/db/store.py +31 -0
- code-review-env/env/graph.py +20 -67
- code-review-env/env/observation.py +62 -0
- code-review-env/env/observation_builder.py +143 -1
- code-review-env/graph/__init__.py +5 -0
- code-review-env/graph/graph_manager.py +125 -0
- code-review-env/graph/token_budget.py +117 -0
- code-review-env/parser/ast_parser.py +41 -11
- code-review-env/parser/chunker.py +96 -0
- code-review-env/parser/graph_builder.py +114 -0
- code-review-env/parser/linter.py +29 -0
- code-review-env/requirements.txt +1 -0
- code-review-env/sample_project/auth.py +7 -0
- code-review-env/sample_project/cart.py +17 -0
- code-review-env/sample_project/checkout.py +15 -0
- code-review-env/sample_project/config.py +6 -0
- code-review-env/sample_project/database.py +6 -0
- code-review-env/sample_project/huge_module.py +628 -0
- code-review-env/sample_project/inventory.py +10 -0
- code-review-env/sample_project/notifications.py +6 -0
- code-review-env/sample_project/payments.py +15 -0
- code-review-env/sample_project/utils.py +7 -0
- code-review-env/sample_project/validators.py +8 -0
- code-review-env/tests/test_phase2_graph_manager.py +32 -0
- code-review-env/tests/test_phase2_observation.py +55 -0
- code-review-env/tests/test_phase2_token_budget.py +42 -0
- code-review-env/tests/test_seed.py +30 -0
- plans/phase-02-graph-manager-observation-plan.md +206 -0
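The commit message says a graph builder now creates edges between code chunks and modules. The earlier Builder prompt describes those edges as import relationships. Below is a minimal sketch of deriving module-level import edges with the standard-library `ast` module.

- `imported_modules` and `build_import_graph` are hypothetical names, not the committed contents of `parser/graph_builder.py`.
- Relative imports are resolved on the assumption of a flat directory such as `sample_project/`.
- The result is a plain dict-of-sets adjacency, which keeps the example dependency-free.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level names of every module imported by `source`."""
    names: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import os.path` -> "os"
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:
                # `from auth import x` or `from .auth import x` -> "auth"
                names.add(node.module.split(".")[0])
            else:
                # `from . import cart`: assumed flat layout, so names are sibling modules
                names.update(alias.name for alias in node.names)
    return names


def build_import_graph(sources: dict[str, str]) -> dict[str, set[str]]:
    """Map each project module to the project modules it imports.

    `sources` maps a flat module name (e.g. "checkout") to its source text.
    Imports of anything not in `sources` (stdlib, third-party) are ignored.
    """
    graph: dict[str, set[str]] = {}
    for module, source in sources.items():
        targets = graph.setdefault(module, set())  # keep modules with no imports
        for name in imported_modules(source):
            if name in sources and name != module:
                targets.add(name)  # edge direction: importer -> imported
    return graph


sources = {
    "auth": "def validate_token(token):\n    return None\n",
    "checkout": "import auth\nimport os\n",
    "payments": "from checkout import place_order\n",
}
print(build_import_graph(sources))
# → {'auth': set(), 'checkout': {'auth'}, 'payments': {'checkout'}}
```

To get a NetworkX graph instead, pass each importer/imported pair to `networkx.DiGraph.add_edge`. Chunk-level edges would need one more step that maps each imported name to the chunk defining it.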
Builder.md
CHANGED
@@ -1,138 +1,96 @@
-# Builder Prompt —
-You are an expert Python engineer building a

---

## What You Are Building

-An OpenEnv-compliant RL environment where an LLM agent
-The environment
-1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite via SQLModel. Nodes = modules. Edges = import relationships.
-2. Each node stores: full source code, compressed AST summary (~50 tokens), linter ground truth (pylint + bandit output), and agent-written review annotations.
-3. The agent reviews one module per episode via a multi-step loop: `reset()` → `step(action)` × N → done.
-4. The agent sees **full code of the current module only**. Neighbors are always compressed summaries — never full code. This is a hard constraint for token budget.
-5. The agent can take actions: FLAG_BUG, FLAG_STYLE, FLAG_SECURITY, FLAG_DEPENDENCY_ISSUE, ADD_COMMENT, REQUEST_CHANGES, APPROVE, REQUEST_CONTEXT (costs -0.1 reward), AMEND_REVIEW (updates a neighbor's annotation retroactively).
-6. Rewards are computed by graders against pre-computed ground truth stored in the DB.
-7. The final output is an annotated dependency graph — all module reviews, cross-module causal attributions, readable as JSON and Markdown.
-The

---

-##
-**
--
--
--
-Use
--
-- SQLModel (SQLite persistence)
-- NetworkX (graph construction and traversal)
-- FastAPI (HTTP server for OpenEnv spec)
-- Pydantic v2 (typed models)
-- pylint + bandit (linter ground truth)
-- Python `ast` module (AST parsing — stdlib, no extras)
-- OpenAI client (all LLM calls in inference.py and hard grader)
-- Docker (containerization)
-``
-code-review-env/
-├── openenv.yaml
-├── Dockerfile
-├── README.md
-├── inference.py
-├── requirements.txt
-├── env/
-│ ├── environment.py
-│ ├── models.py
-│ ├── graph.py
-│ ├── observation_builder.py
-│ └── reward.py
-├── db/
-│ ├── schema.py
-│ ├── store.py
-│ └── migrations.py
-├── parser/
-│ ├── ast_parser.py
-│ ├── linter.py
-│ └── summarizer.py
-├── graders/
-│ ├── base_grader.py
-│ ├── easy_grader.py
-│ ├── medium_grader.py
-│ └── hard_grader.py
-├── tasks/
-│ ├── task_registry.py
-│ ├── easy_task.py
-│ ├── medium_task.py
-│ └── hard_task.py
-├── server/
-│ └── app.py
-├── sample_codebase/
-│ ├── auth.py
-│ ├── checkout.py
-│ ├── cart.py
-│ ├── payments.py
-│ ├── config.py
-│ └── ground_truth.json
-└── tests/
-```

---

-## Phase
-**[INSERT PHASE NUMBER AND NAME HERE]**

---

-##

---

-##

---

-##
-If any of the following are unclear, ask before building:
+# Builder Prompt — GraphReview RL Environment
+You are an expert Python engineer building a production-quality RL environment for a competitive hackathon (OpenEnv Round 1). You have one job: build the GraphReview environment correctly, phase by phase, without breaking prior work.

---

## What You Are Building

+An OpenEnv-compliant RL environment where an LLM agent reviews Python code with full dependency graph awareness. The environment parses a Python codebase into a persistent SQLite-backed dependency graph, pre-computes ground truth linter flags, and exposes a step()/reset()/state() API for an agent to interact with.
+This is online RL — no training dataset is needed. The ground truth (pylint/bandit/pyflakes results) is computed once at seed time and stored in SQLite. The agent explores the environment and receives rewards compared against that ground truth.
+The full phase plan and architecture are provided below. Read the entire plan before writing a single line of code.

---

+## Your Operating Rules
+1. **Before building each phase, read the full plan for that phase.** Do not start coding until you understand what the phase produces and what its success criteria are.
+2. **Ask me questions before starting if any of the following are unclear:**
+- A design decision that affects DB schema or file structure
+- Anything that would be hard to change later (interfaces, Pydantic models, DB tables)
+- Ambiguity in how two components interact
+Do NOT ask about low-level implementation details — choose the best approach yourself.
+3. **Use context7 MCP to look up documentation** for: openenv-core, SQLAlchemy, NetworkX, Pyvis, astroid, pylint API, FastAPI, Pydantic v2. Do not rely on memory for library APIs — always verify.
+4. **One phase at a time.** Complete a phase fully before moving to the next. Each phase has explicit success criteria — verify them before declaring a phase done.
+5. **Never break prior phases.** If a later phase requires changing an earlier interface, explicitly flag it, explain why, and get confirmation before making the change.
+6. **DB is the source of truth.** All state lives in SQLite. Nothing important lives only in memory. reset() clears only task-run annotations — never re-parses the codebase.
+7. **Token budget is a hard constraint.** No observation may exceed 2000 tokens. Enforce this in token_budget.py — do not leave it as a soft guideline.
+8. **Graders must be deterministic.** Easy and medium graders: zero LLM calls, same input always produces same output. Hard grader: temperature=0, document prompt hash. Test this explicitly.
+9. **inference.py log format is mandatory.** [START], [STEP], [END] format must be exact. Any deviation causes evaluation failure. Treat this as a contract.
+10. **Write clean, typed Python.** All functions typed. All Pydantic models complete. No `Any` types unless unavoidable with explanation.

---

+## Phase Plan
+[INSERT FULL PHASE PLAN HERE — paste the contents of the phase plan artifact]

---

+## Sample Project Specification
+The sample_project/ directory must contain exactly these files with these injected bugs:
+```
+auth.py — validate_token() can return None (not handled)
+checkout.py — calls auth.validate_token(), doesn't check for None
+cart.py — style violations only (PEP8)
+config.py — missing required key in get_config() (root cause of cascade)
+database.py — SQL query built with string concatenation (SQL injection)
+utils.py — unused imports, dead code
+models.py — clean file (no issues, tests APPROVE path)
+payments.py — depends on checkout.py, inherits None risk
+api.py — depends on auth.py and checkout.py
+main.py — entry point, light glue code
+```
+Task mapping:
+- easy_task: cart.py (style only)
+- medium_task: checkout.py + auth.py (null reference)
+- hard_task: config.py → auth.py → checkout.py (cascade)

---

+## Tech Stack
+- Python 3.11
+- SQLite via SQLAlchemy ORM
+- NetworkX + astroid + Python ast
+- pylint + bandit + pyflakes
+- Pyvis for visualization
+- Pydantic v2
+- FastAPI
+- OpenAI client (inference.py + hard grader judge)
+- openenv-core
+- context7 MCP for all library lookups

---

+## Start Instructions
+Begin with Phase 1. Before writing any code:
+1. Use context7 MCP to look up: openenv-core spec, SQLAlchemy ORM setup, astroid API
+2. Ask me any design questions that affect DB schema or file structure
+3. Confirm the sample_project file list with me if you want to adjust it
+4. Then build Phase 1 completely and verify all success criteria before stopping
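Rule 7 in Builder.md asks for the 2000-token cap to be enforced in code, not treated as guidance. The sketch below is one way to do that, under two assumptions:

- Tokens are estimated at roughly four characters each. This is a placeholder for the agent model's real tokenizer.
- Neighbor summaries arrive ordered most relevant first.

`fit_observation`, `count_tokens` and `TokenBudgetExceeded` are hypothetical names, not the API of the committed `graph/token_budget.py`.

```python
MAX_OBSERVATION_TOKENS = 2000  # hard cap taken from Builder.md, rule 7


class TokenBudgetExceeded(ValueError):
    """Raised when the mandatory part of an observation is already over budget."""


def count_tokens(text: str) -> int:
    """Estimate tokens as ceil(len / 4).

    Placeholder heuristic (assumption): swap in the agent model's own
    tokenizer so the cap matches what the model actually receives.
    """
    return (len(text) + 3) // 4


def fit_observation(
    current_code: str,
    neighbor_summaries: list[str],
    limit: int = MAX_OBSERVATION_TOKENS,
) -> str:
    """Build observation text that never exceeds `limit` estimated tokens.

    The current module is mandatory. Neighbor summaries are assumed to be
    ordered most relevant first; once one would break the cap, it and every
    later summary are dropped.
    """
    used = count_tokens(current_code)
    if used > limit:
        raise TokenBudgetExceeded(
            f"current module needs {used} tokens, limit is {limit}"
        )
    parts = [current_code]
    for summary in neighbor_summaries:
        cost = count_tokens("\n\n" + summary)  # separator counted with the summary
        if used + cost > limit:
            break
        parts.append(summary)
        used += cost
    return "\n\n".join(parts)


obs = fit_observation("x" * 4000, ["s" * 2000, "t" * 2000])
print(count_tokens(obs))  # → 1501 (the second summary did not fit)
```

Each per-piece estimate is rounded up, so their sum can only over-count the joined text, and the returned observation stays under the cap. The sketch raises instead of truncating the current module. The v2 plan splits files over 300 lines into sub-nodes upstream, so an over-budget current node points at a chunking problem rather than something to hide.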
Debugger.md
CHANGED
@@ -1,100 +1,88 @@
-# Debugger Prompt —
-You are an expert Python debugger working on

---

-## Project

---

-##
-1. **
-2. **Neighbor observations are always compressed summaries** — never fix a context issue by passing full neighbor code
-3. **Rewards must always be in 0.0–1.0** — if a reward bug exists, fix the computation, never remove the clip
-4. **inference.py uses OpenAI client only** — do not swap to direct HTTP calls or another client
-5. **[START]/[STEP]/[END] log format is fixed** — do not change field names or ordering to fix a logging bug
-6. **Hard grader uses temperature=0 and fixed rubric** — do not relax this to fix flaky test failures
-7. **episode step limit is 10** — do not raise this to fix timeout issues, optimize the agent instead
-- Identify which layer the bug is in: parser → db → graph → observation_builder → environment → grader → server → inference
-- Do not assume the bug is where the error surfaces — trace back to root cause
-- Before changing implementation, verify the interface contract between the broken component and its dependencies
-- Use Context7 MCP to re-check library APIs if the bug involves SQLModel, NetworkX, pylint, bandit, FastAPI, or OpenEnv
-- Do not fix a bug by changing a shared interface without checking all callers
-- Fix the smallest possible change that resolves the issue
-- If the fix requires changing a DB schema, check whether a migration is needed and write it
-- If the fix changes a Pydantic model, check all serialization/deserialization paths
-- After fixing, confirm the completion criteria for the relevant phase still pass
-- Run the specific test for the broken component
-- If inference.py is affected, do a dry run and confirm [START]/[STEP]/[END] logs emit correctly
-- DB not found on startup → check migrations.py auto-init logic
-- Graph loads empty on second run → check upsert_node is committing correctly
-- Annotations not persisting across reset() → check reset() only clears annotations, not nodes/edges
-- AST parser crashes on type-annotated functions → check handling of ast.Constant vs ast.Str in Python 3.11
-- Linter returns no output → check pylint/bandit are installed in the Docker image and PATH is correct
-- Import resolution fails on relative imports → check the resolver handles both absolute and relative imports
-##
-- Reward outside 0.0–1.0 → find the unclipped computation in reward.py
-- done never becomes True → check step limit counter and REQUEST_CHANGES/APPROVE handling
-- reset() returns wrong module → check task registry is loading the correct starting module
--
--
-- Grader crashes on empty annotation → add null check before scoring
--
--
-- openenv validate fails → check openenv.yaml field names against spec exactly
--
--
-- Missing [STEP] logs → check log emit is inside the step loop, not outside
--
--
-- Port not exposed → confirm EXPOSE 7860 and uvicorn binds to 0.0.0.0

---

-##
-- What the root cause was (one sentence)
-- Which file(s) were changed
-- Whether any DB schema changed (and if so, whether a migration was added)
-- Whether any Pydantic model interface changed (and if so, which callers were updated)
-- The specific test or check that now passes
+# Debugger Prompt — GraphReview RL Environment
+You are an expert Python debugger working on a competitive hackathon RL environment called GraphReview. Your job is to diagnose and fix bugs without breaking existing working functionality.

---

+## Project Context
+GraphReview is an OpenEnv-compliant RL environment. It:
+- Parses Python codebases into a SQLite-backed NetworkX dependency graph
+- Pre-computes linter ground truth (pylint/bandit/pyflakes) at seed time
+- Exposes step()/reset()/state() for an LLM agent to review code
+- Scores agent actions against stored ground truth via deterministic graders
+- Outputs an annotated graph visualization via Pyvis
+The DB is the source of truth. Pydantic v2 models define all interfaces. FastAPI wraps the environment for HTTP. inference.py runs the baseline agent.

---

+## Your Operating Rules
+1. **Diagnose before fixing.** State exactly what is wrong and why before writing any fix. One sentence minimum: "The bug is X because Y."
+2. **Minimal surface area.** Fix only what is broken. Do not refactor, rename, or improve unrelated code while fixing a bug.
+3. **Check DB integrity first** for any bug involving missing data, wrong rewards, or incorrect state. Run: `SELECT * FROM seed_meta` to verify seeded flag. Check `modules`, `edges`, `linter_flags` are populated before assuming code is wrong.
+4. **Use context7 MCP** to verify library APIs before assuming a bug is in your code. Many bugs come from incorrect assumptions about SQLAlchemy session handling, Pydantic v2 validation, or NetworkX graph methods.
+5. **Never re-seed unless explicitly told to.** Re-seeding takes 30s and loses demo state. If a bug looks like a seeding issue, verify first.
+6. **Grader determinism is sacred.** If a grader produces different results across runs, that is a critical bug — fix it before anything else. Check: temperature settings, prompt variability, random seeds.
+7. **Do not change Pydantic model field names or types** without explicitly flagging it. These are shared interfaces — changing them breaks step()/reset()/state() and inference.py simultaneously.
+8. **inference.py log format is a contract.** [START]/[STEP]/[END] field names and order must never change. If a bug is in inference.py, fix the logic without changing the log format.
+9. **After fixing, state what you changed and why**, and identify any other components that might be affected by the change.
+10. **If the bug requires a design change** (not just a code fix), say so clearly. Do not silently implement a design change as if it were a bug fix.
+---
+## Common Bug Patterns in This Project
+**DB not seeded / partial seed**
+- Symptom: KeyError on module_id, empty linter_flags, missing edges
+- Check: seed_meta table for seeded=true, verify row counts in modules and edges
+**Pydantic v2 validation errors**
+- Symptom: ValidationError on step() or reset()
+- Check: field types match exactly, Optional fields have defaults, JSON fields are dicts not strings
+**NetworkX graph not reconstructed from DB**
+- Symptom: graph_manager returns empty neighbors, traversal order is wrong
+- Check: edges table has rows, graph_manager.load_graph() is called before queries
+**Grader returning out-of-range reward**
+- Symptom: reward > 1.0 or < -1.0
+- Check: reward aggregation logic, episode completion bonus not double-applied
+**Token budget exceeded**
+- Symptom: LLM returns truncated or incoherent response
+- Check: token_budget.py is being called, observation summaries not using raw code
+**Hard grader non-determinism**
+- Symptom: different scores for identical inputs
+- Check: temperature=0 set on judge API call, system prompt is static string not f-string with variables
+**inference.py timeout (>20 min)**
+- Symptom: evaluation fails on judge's machine
+- Check: REQUEST_CONTEXT actions in inference loop causing extra API calls, batching strategy
+**reset() clearing too much**
+- Symptom: graph annotations from prior tasks lost after reset
+- Check: reset() filters by task_id when deleting review_annotations, not deleting all rows

---

+## How to Use This Prompt
+Paste this prompt, then describe:
+1. What you were trying to do
+2. What happened instead (error message, wrong output, wrong reward value)
+3. Which phase/file the bug is in
+4. What you already tried
+Then share the relevant code. I will diagnose and fix it.
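The last bug pattern in Debugger.md says reset() must filter by task_id when deleting from review_annotations. Below is a minimal sketch of that scoped delete using the standard-library `sqlite3` module.

- It assumes review_annotations has a `task_id` column.
- The function name and the other columns are hypothetical.
- The project itself goes through the SQLAlchemy ORM; the same `WHERE task_id = ?` filter would apply there.

The parameterized placeholder also avoids string-concatenated SQL. That is the bug deliberately injected into the sample project's database.py.

```python
import sqlite3


def reset_task_annotations(conn: sqlite3.Connection, task_id: str) -> int:
    """Delete only the review annotations written during `task_id`.

    Modules, edges and linter flags are left alone, and annotations that
    belong to other tasks survive. Returns the number of rows deleted.
    """
    with conn:  # commit on success, roll back if the DELETE raises
        cur = conn.execute(
            "DELETE FROM review_annotations WHERE task_id = ?", (task_id,)
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal table; the real columns are defined in db/schema.py.
    conn.execute(
        "CREATE TABLE review_annotations ("
        "id INTEGER PRIMARY KEY, task_id TEXT NOT NULL, module_id TEXT, note TEXT)"
    )
    conn.executemany(
        "INSERT INTO review_annotations (task_id, module_id, note) VALUES (?, ?, ?)",
        [("easy_task", "cart", "PEP8 violation"), ("hard_task", "config", "missing key")],
    )
    print(reset_task_annotations(conn, "easy_task"))  # → 1
    print(conn.execute("SELECT task_id FROM review_annotations").fetchall())
    # → [('hard_task',)]
```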
OpenEnv
ADDED
@@ -0,0 +1 @@
+Subproject commit c719decf2b19175d5ca35301d58a14c83e985480
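The commit message lists a chunker that splits parsed Python modules into manageable chunks. The v2 plan in Phases.md states the rule: files over 300 lines become sub-nodes by class/function. Below is a minimal sketch of that split with the standard-library `ast` module, under these assumptions:

- Each top-level class or function becomes one chunk.
- The remaining top-level statements (imports, constants) are grouped into a single `<module>` chunk.

`Chunk`, `chunk_module` and the `<module>` naming are hypothetical, not the contents of the committed `parser/chunker.py`.

```python
import ast
from dataclasses import dataclass

MAX_LINES = 300  # split threshold from the v2 phase plan


@dataclass(frozen=True)
class Chunk:
    name: str        # "cart" for a whole module, "cart.Cart" for a sub-node
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive
    source: str


def chunk_module(module_name: str, source: str, max_lines: int = MAX_LINES) -> list[Chunk]:
    """Return the whole file as one chunk if it is short enough.

    Otherwise return one chunk per top-level class/function, plus one chunk
    holding the remaining top-level statements.
    """
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return [Chunk(module_name, 1, len(lines), source)]

    chunks: list[Chunk] = []
    rest: list[ast.stmt] = []  # imports, constants, other top-level statements
    for node in ast.parse(source).body:
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            # Decorators sit above node.lineno, so start at the first one.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno or node.lineno
            text = "\n".join(lines[start - 1 : end])
            chunks.append(Chunk(f"{module_name}.{node.name}", start, end, text))
        else:
            rest.append(node)

    if rest:
        # Gather the leftover statements into a single "<module>" chunk.
        text = "\n".join(
            "\n".join(lines[n.lineno - 1 : (n.end_lineno or n.lineno)]) for n in rest
        )
        end = rest[-1].end_lineno or rest[-1].lineno
        chunks.insert(0, Chunk(f"{module_name}.<module>", rest[0].lineno, end, text))
    return chunks
```

Two simplifications to keep in mind:

- A single class longer than the threshold stays one chunk.
- Comments and blank lines between top-level statements are not carried into the `<module>` chunk.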
Phases.md
CHANGED
@@ -1,295 +1,433 @@
-#
-## For: LLM-Assisted Development

---

-##
-An OpenEnv-compliant
-The
-The
-This is differentiated from tools like CodeRabbit because:
-- It models cascading dependency bugs (bug in B caused by design in A)
-- Reviews are stored back into the graph and can be amended as agent learns more
-- It is an RL training/evaluation environment, not a static analysis tool
-- The agent learns a policy over multi-step decisions, not a single LLM call

---

-##
--
--
--
--
--

---

-##
```
-├──
-├── Dockerfile
-├── README.md
-├── inference.py # Required by spec, root level
-├── requirements.txt
-├── pyproject.toml
-│
-├── env/
-│ ├── __init__.py
-│ ├── environment.py # Main CodeReviewEnv class
-│ ├── models.py # Pydantic: Observation, Action, Reward, GraphState
-│ ├── graph.py # Graph construction, traversal, compression
-│ ├── observation_builder.py # Assembles tiered observation per step
-│ └── reward.py # Reward computation logic
-│
-├── db/
-│ ├── __init__.py
-│ ├── schema.py # SQLModel table definitions
-│ ├── store.py # DB read/write operations
-│ └── migrations.py # Init and seed scripts
-│
-├── parser/
-│ ├── __init__.py
-│ ├── ast_parser.py # AST extraction: signatures, imports, classes
-│ ├── linter.py # Pylint + Bandit runner, stores results to DB
-│ └── summarizer.py # Converts AST output → compressed node summary
-│
-├── graders/
-│ ├── __init__.py
-│ ├── base_grader.py # Abstract grader interface
-│ ├── easy_grader.py # Linter match — fully deterministic
-│ ├── medium_grader.py # AST + line attribution match
-│ └── hard_grader.py # LLM-as-judge, temp=0, seed=42, rubric-constrained
-│
-├── tasks/
-│ ├── __init__.py
-│ ├── task_registry.py # Registers and loads tasks
-│ ├── easy_task.py # Style/linter issue in isolated module
-│ ├── medium_task.py # Logic bug with direct dependency context
-│ └── hard_task.py # Cascading bug across 2+ modules
-│
-├── server/
-│ ├── __init__.py
-│ └── app.py # FastAPI server exposing OpenEnv HTTP endpoints
-│
-├── sample_codebase/ # Synthetic test codebase for demo
│ ├── auth.py
│ ├── checkout.py
│ ├── cart.py
-│ ├──
-│ └──
```

---

-##
--

---

-##
-**Goal: Database schema, parser, graph construction. No RL yet.**
-##

---

-##
-- `env.reset("easy_task")` returns a valid typed Observation
-- `env.step(FLAG_BUG(line=12, description="null risk"))` returns reward > 0 for correct flag
-- `env.state()` returns serializable graph with annotations
-- Full episode runs without error on all 3 tasks
-- Reward values all fall in 0.0–1.0 range

---

-##
-**Goal: Wrap environment in FastAPI, pass openenv validate.**
-###
-3. Run `openenv validate` — fix all compliance errors
-4. Confirm all Pydantic models serialize/deserialize correctly over HTTP
-###
-- `
--
--

---

-##
-**Goal: Build inference.py that runs Gemma 4 as the agent. This is what judges auto-run.**
-### Critical Requirements (Non-Negotiable)
-- File must be named `inference.py` at root
-- Use OpenAI client for all LLM calls
-- Read API_BASE_URL, MODEL_NAME, HF_TOKEN from environment variables
-- Emit structured stdout logs in EXACTLY this format:
```
-[START] task=
-[STEP]
-[
```
-- Must complete all 3 tasks in under 20 minutes total
-- Must run on 2 vCPU / 8GB RAM
-### Tasks
-1. Build the agent loop — for each task: reset env, loop step() until done, collect rewards
-2. Build the LLM action parser — send observation to model with a structured prompt, parse response into typed Action. Use JSON mode or structured output. Handle parse failures gracefully (default to APPROVE with penalty).
-3. Build the action prompt — system prompt explaining the environment, action space, and output format. Include the compressed observation in user message. Tell model to output JSON action only.
-4. Implement all 3 task runs sequentially
-5. Emit all required log lines to stdout
-6. Final output: baseline scores for all 3 tasks printed to stdout
-### Completion Criteria
-- Script runs end to end without error
-- All [START]/[STEP]/[END] logs emitted correctly
-- Produces a score for each task between 0.0–1.0
-- Completes in under 20 minutes

---

-##
-**Goal:
--
--
--

---

-##
-| Phase 4 — Inference Script | 4 hrs |
-| Phase 5 — Docker + Deploy | 3 hrs |
-| Buffer / debugging | 4 hrs |

---

-##
# GraphReview RL Environment — Complete Phased Build Plan v2

---

## What You Are Building

An OpenEnv-compliant RL environment where an LLM agent learns to review Python code with full dependency-graph awareness. The environment:

1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite
2. Splits large files (>300 lines) into sub-nodes by class/function to keep observations manageable
3. Pre-computes ground-truth linter flags (pylint + bandit + pyflakes) per node at seed time
4. Presents the agent with one module at a time, plus compressed AST summaries of its neighbors
5. Receives structured actions (FLAG_BUG, ADD_COMMENT, REQUEST_CONTEXT, etc.)
6. Scores actions against the pre-computed ground truth — no training data needed; the ground truth IS the data
7. Accumulates review annotations back onto graph nodes in SQLite
8. Outputs an annotated dependency graph visualized via Pyvis (interactive HTML) plus a markdown report

**The RL loop:** The agent takes multiple actions per module episode, receives per-step rewards, and learns to reason about cascading dependency issues. This is online RL: the environment generates interaction data live, so no pre-existing dataset is required.

**The key differentiator vs. CodeRabbit:** The agent sees WHY a decision was made (upstream context) before flagging it. Reviews are stored back into the graph, and the agent can AMEND earlier reviews as it learns more about root causes downstream.

---

## Why No Training Data Is Needed

This is online RL, not offline supervised learning:

- Ground truth = pylint/bandit/pyflakes results, computed once at seed time and stored in the DB
- The agent explores the environment → receives rewards → that interaction IS the training signal
- For Round 1, the baseline inference script evaluates a pre-trained LLM (Gemma 4 E4B) acting as the agent
- You are not training a model; you are building the environment that COULD train one
- The three graders define what "correct behavior" looks like — that is your data

---

## Tech Stack (Fixed)

- Python 3.11
- OpenEnv: step() / reset() / state() + Pydantic typed models + openenv.yaml
- SQLite via SQLAlchemy ORM (persistent, file-based, ships in Docker)
- NetworkX for graph operations and traversal
- Python built-in `ast` module for structure extraction
- `astroid` for scope-aware name resolution and intra-file conflict detection
- pylint + bandit + pyflakes for ground truth generation (run once at seed time)
- Pyvis for interactive graph visualization
- OpenAI client (inference.py + hard task LLM judge)
- Gemma 4 E4B as baseline agent model
- FastAPI for HTTP server (required for HF Spaces)
- Docker + Hugging Face Spaces
- context7 MCP for library documentation during build

---

## File Structure

```
graphreview/
├── sample_project/           # synthetic input codebase with injected bugs
│   ├── auth.py
│   ├── checkout.py
│   ├── cart.py
│   ├── database.py
│   └── ...
├── parser/
│   ├── ast_parser.py         # extract signatures, imports, classes per file
│   ├── chunker.py            # split files >300 lines into sub-nodes
│   ├── graph_builder.py      # build NetworkX DiGraph from parsed output
│   └── summarizer.py         # compress each node to ~50 token summary
├── db/
│   ├── database.py           # SQLAlchemy engine, session factory
│   ├── models.py             # ORM models for all tables
│   └── seed.py               # parse once → store → skip if seeded
├── graph/
│   ├── graph_manager.py      # load graph from DB, traversal, neighbor queries
│   └── token_budget.py       # enforce token limits on observations
├── env/
│   ├── environment.py        # CodeReviewEnv main class
│   ├── observation.py        # Pydantic: CodeObservation
│   ├── action.py             # Pydantic: ReviewAction
│   ├── reward.py             # Pydantic: ReviewReward + reward table
│   └── state.py              # Pydantic: GraphState
├── graders/
│   ├── base_grader.py        # abstract interface
│   ├── easy_grader.py        # linter match (deterministic)
│   ├── medium_grader.py      # AST + line attribution (deterministic)
│   └── hard_grader.py        # graph consistency + LLM judge (temperature=0)
├── tasks/
│   ├── task_registry.py      # register 3 tasks
│   ├── easy_task.py          # style/linter review
│   ├── medium_task.py        # logic bug + direct dep context
│   └── hard_task.py          # cascading bug across 2+ module hops
├── visualizer/
│   ├── pyvis_renderer.py     # NetworkX → interactive HTML graph
│   └── report_generator.py   # markdown + JSON final report
├── server.py                 # FastAPI wrapper for OpenEnv HTTP spec
├── inference.py              # baseline agent script (mandatory, root level)
├── openenv.yaml              # spec metadata
├── Dockerfile
└── README.md
```

---

## Database Schema (SQLite — Persistent)

**modules**
```
id                TEXT PK     (relative file path, or "file.py::ClassName" for sub-nodes)
name              TEXT
code              TEXT        (full source — full file or chunked section)
ast_summary       JSON        (signatures, classes, return types, decorators)
linter_flags      JSON        (pre-computed pylint+bandit+pyflakes — GROUND TRUTH)
summary           TEXT        (~50 token natural language description)
parent_module_id  TEXT NULL   (set if this is a sub-node chunk of a larger file)
review_status     TEXT        (pending | in_progress | reviewed)
is_chunk          BOOLEAN
```

**edges**
```
source_id          TEXT FK → modules.id
target_id          TEXT FK → modules.id
edge_type          TEXT   (explicit_import | implicit_dependency | intra_file | circular)
import_line        TEXT
dependency_reason  TEXT
scope              TEXT   (module_level | function_level)
weight             FLOAT  (1.0 explicit, 0.5 implicit)
```

**review_annotations**
```
id             INTEGER PK AUTOINCREMENT
module_id      TEXT FK → modules.id
task_id        TEXT
action_type    TEXT
content        TEXT
reward_given   FLOAT
attributed_to  TEXT NULL   (module_id for cascade attribution)
is_amendment   BOOLEAN     (true if this amends a prior review)
created_at     TIMESTAMP
```

**task_runs**
```
id             INTEGER PK AUTOINCREMENT
task_id        TEXT
started_at     TIMESTAMP
completed_at   TIMESTAMP NULL
total_reward   FLOAT
total_steps    INTEGER
status         TEXT   (running | complete | failed)
```

**seed_meta**
```
key    TEXT PK
value  TEXT
```
`seed_meta` stores the `seeded=true` flag, the seed timestamp, and the codebase hash.
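
A minimal sketch of the same tables as plain SQLite DDL, using the standard-library `sqlite3` module instead of the SQLAlchemy ORM the plan specifies. JSON columns are stored as TEXT. The helper names `create_schema`, `is_seeded`, and `mark_seeded`, and the `seed_meta` key names (`seeded`, `seeded_at`, `codebase_hash`), are illustrative assumptions rather than the project's actual code.

```python
import sqlite3
from datetime import datetime, timezone

# Plain-SQL mirror of the schema above (JSON columns stored as TEXT).
SCHEMA_SQL = """
CREATE TABLE IF NOT EXISTS modules (
    id               TEXT PRIMARY KEY,  -- "file.py" or "file.py::ClassName"
    name             TEXT,
    code             TEXT,
    ast_summary      TEXT,              -- JSON
    linter_flags     TEXT,              -- JSON ground truth
    summary          TEXT,
    parent_module_id TEXT REFERENCES modules(id),
    review_status    TEXT DEFAULT 'pending'
                     CHECK (review_status IN ('pending', 'in_progress', 'reviewed')),
    is_chunk         INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS edges (
    source_id         TEXT REFERENCES modules(id),
    target_id         TEXT REFERENCES modules(id),
    edge_type         TEXT CHECK (edge_type IN
                      ('explicit_import', 'implicit_dependency', 'intra_file', 'circular')),
    import_line       TEXT,
    dependency_reason TEXT,
    scope             TEXT,
    weight            REAL
);
CREATE TABLE IF NOT EXISTS review_annotations (
    id            INTEGER PRIMARY KEY AUTOINCREMENT,
    module_id     TEXT REFERENCES modules(id),
    task_id       TEXT,
    action_type   TEXT,
    content       TEXT,
    reward_given  REAL,
    attributed_to TEXT,
    is_amendment  INTEGER NOT NULL DEFAULT 0,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS task_runs (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id      TEXT,
    started_at   TIMESTAMP,
    completed_at TIMESTAMP,
    total_reward REAL,
    total_steps  INTEGER,
    status       TEXT CHECK (status IN ('running', 'complete', 'failed'))
);
CREATE TABLE IF NOT EXISTS seed_meta (
    key   TEXT PRIMARY KEY,
    value TEXT
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create every table if it does not exist yet (safe to call repeatedly)."""
    conn.executescript(SCHEMA_SQL)


def is_seeded(conn: sqlite3.Connection, codebase_hash: str) -> bool:
    """True only if a previous seed stored this exact codebase hash."""
    row = conn.execute(
        "SELECT value FROM seed_meta WHERE key = 'codebase_hash'"
    ).fetchone()
    return row is not None and row[0] == codebase_hash


def mark_seeded(conn: sqlite3.Connection, codebase_hash: str) -> None:
    """Record the seed flag, timestamp and hash in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO seed_meta (key, value) VALUES (?, ?)",
            [
                ("seeded", "true"),
                ("seeded_at", datetime.now(timezone.utc).isoformat()),
                ("codebase_hash", codebase_hash),
            ],
        )
```

Keying the skip decision on the hash (not only on `seeded=true`) means an edited sample project is re-parsed instead of silently reusing stale ground truth.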

---

## Chunking Strategy for Large Files

```
File ≤ 300 lines → one node, id = "filename.py"

File > 300 lines → chunk by top-level class or function
  Each chunk becomes a sub-node:
    id = "filename.py::ClassName" or "filename.py::function_name"
    parent_module_id = "filename.py"

  A virtual parent node is kept for the file itself,
  with no code but with all inter-file edges

Intra-file edges added between chunks:
  if function_a calls function_b in the same file →
    edge(filename.py::function_a → filename.py::function_b, type=intra_file)

Dependency conflict detection (via astroid):
  If an import is used only inside one function → scope=function_level, weight=0.5
  If an import is used at module level          → scope=module_level, weight=1.0
  Circular imports → flagged as an edge with type=circular, added to linter_flags
```
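
The chunking rules above can be sketched with the standard-library `ast` module. This is a simplified illustration, not the project's `parser/chunker.py`: `chunk_source` and `Chunk` are hypothetical names, module-level statements (imports, constants) are not assigned to any chunk, intra-file edges only detect direct calls by bare name (`helper()`, not `self.helper()`), and the astroid-based scope analysis is omitted.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass
from typing import Optional

MAX_LINES = 300  # files longer than this are split into sub-nodes


@dataclass
class Chunk:
    id: str
    code: str
    parent_module_id: Optional[str]
    is_chunk: bool


def chunk_source(filename: str, source: str, max_lines: int = MAX_LINES):
    """Return (nodes, intra_file_edges) for one Python file."""
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return [Chunk(filename, source, None, False)], []

    tree = ast.parse(source)
    # Virtual parent: no code of its own, but it owns the file's inter-file edges.
    nodes = [Chunk(filename, "", None, False)]
    top_level = {}  # name -> (chunk id, AST node)
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # Include decorators, which start above the def/class line.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            code = "\n".join(lines[start - 1 : node.end_lineno])
            chunk_id = f"{filename}::{node.name}"
            nodes.append(Chunk(chunk_id, code, filename, True))
            top_level[node.name] = (chunk_id, node)

    edges = set()
    for caller_id, caller in top_level.values():
        for sub in ast.walk(caller):
            # Only direct calls by bare name, e.g. helper(); not self.helper().
            if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                callee = top_level.get(sub.func.id)
                if callee is not None and callee[0] != caller_id:
                    edges.add((caller_id, callee[0], "intra_file"))
    return nodes, sorted(edges)
```

A 400-line file with a `Cart` class that calls a top-level `helper()` yields three nodes (`file.py`, `file.py::helper`, `file.py::Cart`) and one `intra_file` edge from `Cart` to `helper`.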

---

## Observation Token Budget

```
Current module full code:        ~800 tokens (hard cap, truncate with notice)
AST summary of current:          ~100 tokens
Direct dependency summaries:     ~50 tokens × up to 5 deps = 250 tokens
Dependent summaries:             ~50 tokens × up to 3 = 150 tokens
Existing neighbor reviews:       ~30 tokens × up to 4 = 120 tokens
Task description + action space: ~200 tokens
Buffer:                          ~280 tokens
─────────────────────────────────────────────
Total:                           ~1900 tokens (well within E4B 128K window)
```

If a module has >5 direct dependencies, rank by betweenness centrality and include top 5 only.
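
A sketch of per-component budget enforcement and top-5 dependency selection. Assumptions: tokens are estimated with a crude ~4 characters per token heuristic rather than the agent model's real tokenizer, and centrality scores are computed elsewhere (for example with `networkx.betweenness_centrality`). `BUDGETS`, `estimate_tokens`, `truncate_to_budget`, `top_dependencies`, and `apply_budget` are hypothetical names.

```python
BUDGETS = {
    "current_code": 800,          # hard cap, truncated with a notice
    "ast_summary": 100,
    "dependency_summaries": 250,  # ~50 tokens x up to 5 deps
    "dependent_summaries": 150,   # ~50 tokens x up to 3 dependents
    "neighbor_reviews": 120,      # ~30 tokens x up to 4 reviews
    "task_description": 200,
}
HARD_CAP = 2000
CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer
TRUNCATION_NOTICE = "\n# [truncated to fit token budget]"


def estimate_tokens(text: str) -> int:
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Cut text so that text + notice fits in max_tokens (by the estimate above)."""
    if estimate_tokens(text) <= max_tokens:
        return text
    keep = max(0, max_tokens * CHARS_PER_TOKEN - len(TRUNCATION_NOTICE))
    return text[:keep] + TRUNCATION_NOTICE


def top_dependencies(centrality: dict[str, float], limit: int = 5) -> list[str]:
    """Highest-centrality modules first; ties broken by module id for determinism."""
    return sorted(centrality, key=lambda m: (-centrality[m], m))[:limit]


def apply_budget(components: dict[str, str]) -> dict[str, str]:
    budgeted = {
        name: truncate_to_budget(components.get(name, ""), limit)
        for name, limit in BUDGETS.items()
    }
    total = sum(estimate_tokens(text) for text in budgeted.values())
    if total > HARD_CAP:  # guard in case BUDGETS is edited past the cap
        raise ValueError(f"observation is {total} tokens, above the {HARD_CAP} cap")
    return budgeted
```

Because the component limits sum to 1620 tokens, the ~280-token buffer stays free for formatting overhead even when every component is at its limit.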

---

## Action Space

```python
from enum import Enum


class ActionType(str, Enum):
    FLAG_STYLE = "FLAG_STYLE"                        # style/formatting issue
    FLAG_BUG = "FLAG_BUG"                            # logic error
    FLAG_SECURITY = "FLAG_SECURITY"                  # security vulnerability
    FLAG_DEPENDENCY_ISSUE = "FLAG_DEPENDENCY_ISSUE"  # issue caused by upstream module
    ADD_COMMENT = "ADD_COMMENT"                      # explanation (requires content field)
    REQUEST_CONTEXT = "REQUEST_CONTEXT"              # fetch full code of a neighbor (-0.1 reward cost)
    REQUEST_CHANGES = "REQUEST_CHANGES"              # end episode, verdict = changes needed
    APPROVE = "APPROVE"                              # end episode, verdict = approved
    AMEND_REVIEW = "AMEND_REVIEW"                    # update a prior annotation on a neighbor node


# Fields:
#   action_type:      required
#   target_line:      optional int
#   content:          required for ADD_COMMENT, AMEND_REVIEW
#   attributed_to:    optional module_id (for FLAG_DEPENDENCY_ISSUE, AMEND_REVIEW)
#   context_request:  required for REQUEST_CONTEXT (module_id to fetch)
```
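
One way to enforce the field rules of the action space at construction time. The plan specifies a Pydantic model; this sketch uses a standard-library dataclass so it runs without extra dependencies, and the enum string values are an assumption. A Pydantic v2 `model_validator` could host the same checks.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    FLAG_STYLE = "FLAG_STYLE"
    FLAG_BUG = "FLAG_BUG"
    FLAG_SECURITY = "FLAG_SECURITY"
    FLAG_DEPENDENCY_ISSUE = "FLAG_DEPENDENCY_ISSUE"
    ADD_COMMENT = "ADD_COMMENT"
    REQUEST_CONTEXT = "REQUEST_CONTEXT"
    REQUEST_CHANGES = "REQUEST_CHANGES"
    APPROVE = "APPROVE"
    AMEND_REVIEW = "AMEND_REVIEW"


NEEDS_CONTENT = {ActionType.ADD_COMMENT, ActionType.AMEND_REVIEW}
TERMINAL = {ActionType.REQUEST_CHANGES, ActionType.APPROVE}


@dataclass(frozen=True)
class ReviewAction:
    action_type: ActionType
    target_line: Optional[int] = None
    content: Optional[str] = None
    attributed_to: Optional[str] = None    # module_id of the suspected root cause
    context_request: Optional[str] = None  # module_id to fetch

    def __post_init__(self) -> None:
        # Accept plain strings such as "FLAG_BUG"; unknown names raise ValueError.
        object.__setattr__(self, "action_type", ActionType(self.action_type))
        if self.action_type in NEEDS_CONTENT and not self.content:
            raise ValueError(f"{self.action_type.value} requires content")
        if self.action_type is ActionType.REQUEST_CONTEXT and not self.context_request:
            raise ValueError("REQUEST_CONTEXT requires context_request")
        if self.target_line is not None and self.target_line < 1:
            raise ValueError("target_line must be a 1-based line number")

    @property
    def ends_episode(self) -> bool:
        return self.action_type in TERMINAL
```

Rejecting malformed actions before they reach a grader keeps the graders simple: they never have to guess what an `ADD_COMMENT` without content should score.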

---

## Reward Table

```
Correct FLAG_* matching linter ground truth:          +0.5
Accurate ADD_COMMENT (keyword match to linter desc):  +0.3
FLAG_DEPENDENCY_ISSUE with correct attribution:       +0.6
FLAG_DEPENDENCY_ISSUE with wrong attribution:         +0.1
AMEND_REVIEW correctly updating prior annotation:     +0.4
REQUEST_CONTEXT (investigation cost):                 -0.1
False positive flag (no linter match):                -0.2
APPROVE on module with unflagged critical issues:     -1.0
REQUEST_CHANGES on clean module:                      -0.3
Episode completion bonus (all issues caught):         +0.2
```
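
The reward table can be expressed as a lookup plus two small rules. How the completion bonus attaches to the terminal action is an assumption inferred from the example log (where `REQUEST_CHANGES` earns +0.2); `REWARDS`, `dependency_reward`, and `verdict_reward` are hypothetical names.

```python
REWARDS = {
    "correct_flag": 0.5,
    "accurate_comment": 0.3,
    "dependency_correct": 0.6,
    "dependency_wrong": 0.1,
    "correct_amend": 0.4,
    "request_context": -0.1,
    "false_positive": -0.2,
    "approve_with_critical": -1.0,
    "request_changes_on_clean": -0.3,
    "completion_bonus": 0.2,
}


def dependency_reward(attributed_to: str, expected_root: str) -> float:
    """FLAG_DEPENDENCY_ISSUE: full credit only for the right upstream module."""
    key = "dependency_correct" if attributed_to == expected_root else "dependency_wrong"
    return REWARDS[key]


def verdict_reward(action_type: str, has_unflagged_critical: bool,
                   is_clean: bool, all_issues_caught: bool) -> float:
    """Reward for the terminal APPROVE / REQUEST_CHANGES action."""
    if action_type == "APPROVE" and has_unflagged_critical:
        return REWARDS["approve_with_critical"]
    if action_type == "REQUEST_CHANGES" and is_clean:
        return REWARDS["request_changes_on_clean"]
    # Assumption: the completion bonus is paid on the verdict step.
    return REWARDS["completion_bonus"] if all_issues_caught else 0.0
```

Summing flag (+0.5), comment (+0.3), correct attribution (+0.6), and verdict (+0.2) reproduces the 1.6 cumulative reward of the `checkout.py` episode in the example log.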

---

## Grader Architecture

### Easy Grader (fully deterministic)
- Load linter_flags JSON from DB for current module
- For each agent FLAG_* action: check if a matching linter flag exists (type + line ±3)
- Score per action, aggregate for episode
- No LLM call. Zero variance.
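
The easy grader's matching rule might look like the sketch below. Assumptions: each linter flag was normalized at seed time to a `category` (`style`, `bug`, or `security`) and a `line`; each ground-truth flag can be claimed at most once, so duplicate flags on one issue count as false positives; `FLAG_DEPENDENCY_ISSUE` is left to the hard grader. `grade_flags` is a hypothetical name.

```python
FLAG_TO_CATEGORY = {"FLAG_STYLE": "style", "FLAG_BUG": "bug", "FLAG_SECURITY": "security"}
LINE_TOLERANCE = 3
MATCH_REWARD, FALSE_POSITIVE = 0.5, -0.2


def grade_flags(actions: list[dict], linter_flags: list[dict]) -> tuple[list[float], float]:
    """Return per-action rewards for FLAG_* actions and the episode recall."""
    unmatched = list(linter_flags)
    rewards = []
    for action in actions:
        category = FLAG_TO_CATEGORY.get(action["action_type"])
        if category is None:
            continue  # not an easy-grader action (comments, verdicts, ...)
        line = action.get("target_line")
        hit = None
        if line is not None:
            hit = next(
                (i for i, flag in enumerate(unmatched)
                 if flag["category"] == category
                 and abs(flag["line"] - line) <= LINE_TOLERANCE),
                None,
            )
        if hit is None:
            rewards.append(FALSE_POSITIVE)
        else:
            unmatched.pop(hit)  # each ground-truth flag can be claimed once
            rewards.append(MATCH_REWARD)
    recall = 1.0 - len(unmatched) / len(linter_flags) if linter_flags else 1.0
    return rewards, recall
```

Iteration follows list order only, with no randomness or model call, so the same actions always produce the same rewards.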

### Medium Grader (fully deterministic)
- Easy grader logic PLUS:
- For ADD_COMMENT: extract keywords from linter flag description, check overlap with agent comment (Jaccard similarity > 0.3 = match)
- For line attribution: ±3 line tolerance
- Still no LLM call.
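
The medium grader's comment check reduces to keyword sets and Jaccard similarity (size of the intersection divided by size of the union). The tokenizer regex and the stop-word list are assumptions; the plan only fixes the > 0.3 threshold.

```python
import re

# Assumed stop-word list; tune it against real linter messages.
STOPWORDS = frozenset({
    "a", "an", "the", "is", "are", "be", "by", "can", "may", "of", "to",
    "in", "on", "for", "and", "or", "this", "that", "with", "it",
})


def keywords(text: str) -> set[str]:
    """Lower-cased identifier-like words, minus stop words."""
    return {w for w in re.findall(r"[a-z_][a-z0-9_]*", text.lower()) if w not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def comment_matches(comment: str, linter_description: str, threshold: float = 0.3) -> bool:
    return jaccard(keywords(comment), keywords(linter_description)) > threshold
```

For the description "Value returned by validate_token may be None" and the comment "validate_token can return None here", the shared keywords are `validate_token` and `none` out of six distinct keywords, so the similarity is 1/3 and the comment counts as a match.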

### Hard Grader (quasi-deterministic)
- Graph consistency check (deterministic):
  - If FLAG_DEPENDENCY_ISSUE with attributed_to=X: verify edge(current → X) or edge(X → current) exists in graph
  - If no edge: reward = 0.0, feedback = "no dependency relationship found"
- LLM-as-judge (temperature=0, fixed rubric):
  - Separate API call to judge model (NOT the agent)
  - Fixed system prompt with scoring rubric
  - Scores cascade reasoning quality: 0.0 | 0.5 | 1.0
  - Document prompt hash in README for reproducibility
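
A sketch of the deterministic half of the hard grader. With the default `max_hops=1` it performs the check described above, treating an edge in either direction as a dependency relationship. If `checkout.py` does not import `config.py` directly, Task 3's expected root cause sits two hops away, and a strict one-hop check would reject it; the `max_hops` parameter is a hypothetical extension for that case. `graph_consistency` returns `None` as the reward when the LLM judge should score the reasoning.

```python
from collections import deque


def within_hops(edges, current: str, target: str, max_hops: int = 1) -> bool:
    """Breadth-first search over edges treated as undirected, up to max_hops."""
    adjacency: dict[str, set[str]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        adjacency.setdefault(dst, set()).add(src)
    frontier = deque([(current, 0)])
    seen = {current}
    while frontier:
        node, depth = frontier.popleft()
        if node == target and depth > 0:
            return True
        if depth == max_hops:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


def graph_consistency(edges, current: str, attributed_to, max_hops: int = 1):
    """Return (reward_override, feedback); reward_override None means 'ask the judge'."""
    if attributed_to is None or not within_hops(edges, current, attributed_to, max_hops):
        return 0.0, "no dependency relationship found"
    return None, "dependency verified; pass to LLM judge"
```

Treating edges as undirected beyond one hop can also connect sibling modules that merely share an importer, so a larger `max_hops` trades strictness for coverage.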

---

## Three Tasks

### Task 1: style_review (Easy)
- Input: single module with 3 pylint style violations
- Agent must: flag all 3 style issues
- No dependency context needed
- Grader: easy_grader only
- Expected baseline score: 0.7–0.9

### Task 2: logic_review (Medium)
- Input: checkout.py with a null-reference bug
- auth.py (its dependency) has validate_token that can return None
- Agent must: flag the bug + add comment referencing the None return risk
- Grader: medium_grader
- Expected baseline score: 0.4–0.7

### Task 3: cascade_review (Hard)
- Input: 3-module chain: config.py → auth.py → checkout.py
- Bug originates in config.py (missing key), propagates through auth.py, surfaces in checkout.py
- Agent must: flag issue in checkout.py AND attribute root cause to config.py
- Grader: hard_grader (graph consistency + LLM judge)
- Expected baseline score: 0.2–0.5
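
To make the medium and hard tasks concrete, here is a hypothetical three-module cascade collapsed into one file. It is not the contents of `sample_project/`: apart from `validate_token`, which the task description names, the identifiers and the missing `token_secret` key are invented for illustration.

```python
from __future__ import annotations

# --- config.py (hypothetical): root cause, the "token_secret" key is missing
CONFIG = {"db_url": "sqlite:///shop.db"}


# --- auth.py (hypothetical): turns the missing key into a None return
def validate_token(token: str) -> dict | None:
    secret = CONFIG.get("token_secret")  # None because config.py never defines it
    if secret is None or token != secret:
        return None
    return {"user_id": 42}


# --- checkout.py (hypothetical): where the failure finally surfaces
def checkout(token: str, total: float) -> str:
    user = validate_token(token)
    # Bug: no None check, so this raises TypeError for every caller.
    return f"charged {total:.2f} to user {user['user_id']}"
```

A medium-task review would flag the unchecked subscript in `checkout` and comment on the `None` return; the hard task additionally expects the agent to trace the failure back to the missing config key.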

---

## Visualization

### Pyvis Interactive Graph (primary)
- Nodes colored by review status: grey=pending, yellow=in_progress, green=reviewed (approved), red=reviewed (changes requested)
- Node size = number of dependents (centrality)
- Edge color: blue=explicit_import, orange=implicit, red=circular
- Edge thickness = weight (1.0 explicit, 0.5 implicit)
- Click node → shows review_annotations panel
- Rendered as standalone HTML, embedded in HF Space
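
A stdlib-only sketch of the styling rules above, producing keyword dictionaries that could be passed to pyvis's `Network.add_node` / `Network.add_edge` (for example `net.add_node(node_id, **node_style(...))`). Assumptions: the schema's `reviewed` status is split into approved vs changes-requested using the module's terminal verdict; the size scale (10 + 5 per dependent) and width scale (4 × weight) are arbitrary; unspecified edge types such as `intra_file` fall back to grey; annotations go into the `title` attribute, which pyvis shows as a hover tooltip rather than a click panel.

```python
from __future__ import annotations

STATUS_COLORS = {"pending": "grey", "in_progress": "yellow",
                 "approved": "green", "changes_requested": "red"}
EDGE_COLORS = {"explicit_import": "blue", "implicit_dependency": "orange", "circular": "red"}


def display_status(review_status: str, verdict: str | None = None) -> str:
    """Split the schema's 'reviewed' status by the module's terminal verdict."""
    if review_status != "reviewed":
        return review_status
    return "changes_requested" if verdict == "REQUEST_CHANGES" else "approved"


def node_style(review_status: str, verdict: str | None = None,
               dependents: int = 0, annotations=()) -> dict:
    return {
        "color": STATUS_COLORS[display_status(review_status, verdict)],
        "size": 10 + 5 * dependents,      # bigger = more modules depend on it
        "title": "\n".join(annotations),  # pyvis renders title as a hover tooltip
    }


def edge_style(edge_type: str, weight: float) -> dict:
    return {"color": EDGE_COLORS.get(edge_type, "grey"), "width": 4 * weight}
```

Keeping the mapping in plain functions lets the color rules be unit-tested without rendering any HTML.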

### Final Report Output (end of all episodes)
- `graphreview_report.md`: per-module sections with verdict + issues + cascade attributions
- `graphreview_report.json`: machine-readable full graph + annotations
- `graphreview_graph.html`: pyvis interactive visualization

---

## inference.py Log Format (Mandatory)

```
[START] task=cascade_review module_count=3
[STEP] module=checkout.py action=FLAG_BUG line=24 reward=0.5 cumulative=0.5
[STEP] module=checkout.py action=ADD_COMMENT content="null risk from auth" reward=0.3 cumulative=0.8
[STEP] module=checkout.py action=FLAG_DEPENDENCY_ISSUE attributed_to=auth.py reward=0.6 cumulative=1.4
[STEP] module=checkout.py action=REQUEST_CHANGES reward=0.2 cumulative=1.6 done=true
[STEP] module=auth.py action=FLAG_BUG line=15 reward=0.5 cumulative=2.1
[STEP] module=auth.py action=FLAG_DEPENDENCY_ISSUE attributed_to=config.py reward=0.6 cumulative=2.7
[STEP] module=auth.py action=REQUEST_CHANGES reward=0.2 cumulative=2.9 done=true
[STEP] module=config.py action=FLAG_BUG line=8 reward=0.5 cumulative=3.4
[STEP] module=config.py action=REQUEST_CHANGES reward=0.2 cumulative=3.6 done=true
[END] task=cascade_review total_reward=3.6 modules_reviewed=3 report=graphreview_report.md
```
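
A small formatter can keep emitted lines identical to the examples above. Assumptions: field order follows keyword-argument order, values containing spaces are double-quoted, booleans are lowercase, and floats are printed in shortest form after rounding away accumulation noise. Confirm these rules against the official log specification before relying on them; `log_line` is a hypothetical helper.

```python
def _fmt(value) -> str:
    if isinstance(value, bool):  # check before int/float: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, float):
        # Round away float noise from running sums, then print compactly ("0.5", "3.6").
        return f"{round(value, 4):g}"
    text = str(value)
    return f'"{text}"' if " " in text else text


def log_line(tag: str, **fields) -> str:
    """Build one [START]/[STEP]/[END] line; fields whose value is None are omitted."""
    parts = [f"[{tag}]"]
    parts += [f"{key}={_fmt(value)}" for key, value in fields.items() if value is not None]
    return " ".join(parts)
```

Printing through a single helper (and never with ad-hoc f-strings in the agent loop) keeps the judging contract in one place.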

---

## Phase 1 — Persistence Layer & Sample Project
**Goal: Parse once, store forever, never re-parse**

Build:
- `sample_project/` — 10 Python files, ~50 functions total, with injected known bugs for each task
- `db/models.py` — all SQLAlchemy ORM models
- `db/database.py` — engine setup, session factory, init_db()
- `db/seed.py` — orchestrate full parse → lint → store pipeline
- `parser/ast_parser.py` — extract structure per file using Python ast
- `parser/chunker.py` — split files >300 lines by class/function into sub-nodes
- `parser/graph_builder.py` — build NetworkX DiGraph, explicit + implicit edges
- `parser/summarizer.py` — ~50 token summaries per node

Success criteria:
- seed.py completes in <30s on sample_project
- Second run detects seeded flag, loads in <1s
- All modules, edges, linter_flags correctly stored
- Chunking correctly splits a 400-line test file into sub-nodes

---

## Phase 2 — Graph Manager & Observation Builder
**Goal: Efficient, token-budgeted observations from DB**

Build:
- `graph/graph_manager.py` — load graph, traversal order, neighbor queries
- `graph/token_budget.py` — enforce per-component token limits
- `env/observation.py` — Pydantic CodeObservation model

Success criteria:
- Observation for any node fits within 2000 token budget
- Traversal order: leaf nodes first, high-centrality nodes last
- REQUEST_CONTEXT returns full neighbor code within budget
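
A deterministic leaf-first ordering can be sketched as Kahn's topological sort in which each module is visited after the modules it depends on, and ties among ready modules are broken by dependent count (a cheap stand-in for centrality) and then by id. Assumptions: an edge `(a, b)` means `a` depends on `b`, as in the intra-file edge convention of the chunking strategy, and modules caught in import cycles are appended at the end in the same tie-break order. `traversal_order` is a hypothetical name.

```python
from __future__ import annotations

import heapq


def traversal_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Leaf-first order: every module comes after the modules it depends on."""
    depends_on = {n: set() for n in nodes}
    dependents = {n: set() for n in nodes}
    for src, dst in edges:  # (a, b) means a depends on b
        if src != dst:
            depends_on[src].add(dst)
            dependents[dst].add(src)

    pending = {n: len(deps) for n, deps in depends_on.items()}
    # Heap key (dependent count, id): low-centrality modules first, ties stable.
    ready = [(len(dependents[n]), n) for n, count in pending.items() if count == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for parent in dependents[node]:
            pending[parent] -= 1
            if pending[parent] == 0:
                heapq.heappush(ready, (len(dependents[parent]), parent))

    # Modules on an import cycle never reach zero pending deps; append them stably.
    visited = set(order)
    leftover = sorted((n for n in nodes if n not in visited),
                      key=lambda n: (len(dependents[n]), n))
    return order + leftover
```

For `checkout.py → auth.py → config.py` plus `checkout.py → cart.py`, the order is `cart.py`, `config.py`, `auth.py`, `checkout.py`: leaves first, and the module that depends on everything last.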

---

## Phase 3 — Action Space, Reward Engine & Graders
**Goal: All actions scored correctly and deterministically**

Build:
- `env/action.py` — Pydantic ReviewAction
- `env/reward.py` — Pydantic ReviewReward + reward table logic
- `graders/base_grader.py` — abstract interface
- `graders/easy_grader.py` — linter match
- `graders/medium_grader.py` — linter + keyword + line attribution
- `graders/hard_grader.py` — graph consistency + LLM judge

Success criteria:
- Easy grader: same input always gives same output (verified with 10 runs)
- Hard grader: temperature=0 verified, prompt hash documented
- All grader task scores within the 0.0–1.0 range (individual step rewards follow the reward table and may be negative)
- False positive and false negative cases handled explicitly

---

## Phase 4 — OpenEnv Core
**Goal: Fully compliant step() / reset() / state()**

Build:
- `env/environment.py` — CodeReviewEnv main class
- `env/state.py` — GraphState Pydantic model
- `tasks/task_registry.py` + 3 task files
- `openenv.yaml`
- `server.py` — FastAPI HTTP wrapper

Success criteria:
- `openenv validate` passes
- All 3 tasks run end-to-end without error
- state() correctly returns full annotated graph
- reset() clears only current task annotations, not full DB
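
A sketch of a task-scoped reset against the `review_annotations` table. Only rows with the given `task_id` are deleted; modules, edges, and other tasks' annotations stay. Resetting a module's `review_status` to `pending` once it has no remaining annotations is an assumption, and `reset_task` is a hypothetical helper name.

```python
import sqlite3


def reset_task(conn: sqlite3.Connection, task_id: str) -> int:
    """Delete one task's annotations and return how many rows were removed."""
    with conn:  # one transaction: commit on success, roll back on error
        touched = [row[0] for row in conn.execute(
            "SELECT DISTINCT module_id FROM review_annotations WHERE task_id = ?",
            (task_id,),
        )]
        deleted = conn.execute(
            "DELETE FROM review_annotations WHERE task_id = ?", (task_id,)
        ).rowcount
        for module_id in touched:
            remaining = conn.execute(
                "SELECT COUNT(*) FROM review_annotations WHERE module_id = ?",
                (module_id,),
            ).fetchone()[0]
            if remaining == 0:  # no other task has reviewed this module
                conn.execute(
                    "UPDATE modules SET review_status = 'pending' WHERE id = ?",
                    (module_id,),
                )
    return deleted
```

Scoping the DELETE by `task_id` is what lets state() keep returning annotations accumulated by earlier tasks after a reset.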

---

## Phase 5 — Visualization & Reporting
**Goal: Useful output the user actually sees**

Build:
- `visualizer/pyvis_renderer.py` — interactive HTML graph
- `visualizer/report_generator.py` — markdown + JSON report

Success criteria:
- Graph colors update correctly as reviews accumulate
- Report correctly attributes cascade issues across modules
- HTML renders in browser without external dependencies

---

## Phase 6 — inference.py & Deployment
**Goal: Baseline script + Docker + HF Space**

Build:
- `inference.py` — runs Gemma 4 E4B against all 3 tasks, emits mandatory log format
- `Dockerfile` — clean build + run
- `README.md` — full documentation
- HF Space deployment

Success criteria:
- inference.py completes all 3 tasks in <20 minutes
- Runs on 2 vCPU / 8GB RAM
- docker build && docker run works cleanly
- HF Space deploys and responds to reset() ping
- Baseline scores reproducible across 3 runs

Reviewer.md
ADDED

@@ -0,0 +1,94 @@

# Phase Reviewer Prompt — GraphReview RL Environment

You are a senior engineer and RL systems expert reviewing completed phases of a competitive hackathon project called GraphReview. Your job is to catch problems before they compound into later phases.

---

## Project Context

GraphReview is an OpenEnv-compliant RL environment for graph-aware Python code review. Key constraints:
- SQLite is the persistent store — DB schema changes are expensive after Phase 1
- Pydantic v2 models are shared interfaces — field changes break multiple components
- Graders must be deterministic — non-determinism is a disqualification risk
- inference.py log format is a judging contract — any deviation fails automated scoring
- Must run in <20 min on 2 vCPU / 8GB RAM
- Must pass `openenv validate` and `docker build && docker run`

---

## Your Review Checklist

For every phase submitted to you, check ALL of the following:

### Correctness
- [ ] Does the code do what the phase plan says it should do?
- [ ] Are all success criteria from the phase plan met?
- [ ] Are edge cases handled (empty files, circular imports, modules with no dependencies, modules with >5 deps)?
- [ ] Does reset() only clear current task annotations, not the full DB?
- [ ] Does state() return the full graph including all prior annotations?

### Interface Integrity
- [ ] Do all Pydantic models match the spec exactly (field names, types, Optional handling)?
- [ ] Do function signatures match what later phases will call?
- [ ] Are all DB foreign keys correct and consistent?
- [ ] Is the module_id format consistent everywhere (relative path, sub-node format)?

### Determinism & Reproducibility
- [ ] Do easy and medium graders make zero LLM calls?
- [ ] Is hard grader temperature explicitly set to 0?
- [ ] Would running the same input twice produce the same reward?
- [ ] Is the LLM judge prompt a static string (not variable-dependent)?

### Performance & Resource Constraints
- [ ] Will seed.py complete in <30s on the sample_project?
- [ ] Will inference.py complete all 3 tasks in <20 minutes?
- [ ] Does token_budget.py enforce the 2000 token cap?
- [ ] Will the environment run on 2 vCPU / 8GB RAM?

### OpenEnv Compliance
- [ ] Does openenv.yaml include all required fields?
- [ ] Do step()/reset()/state() match the OpenEnv spec exactly?
- [ ] Will `openenv validate` pass based on what's been built?

### Code Quality
- [ ] Are all functions fully typed?
- [ ] Are Pydantic models complete with no missing fields?
- [ ] Is SQLAlchemy session handling correct (no session leaks)?
- [ ] Are there no hardcoded paths that break in Docker?

### Forward Compatibility
- [ ] Will this phase's output work cleanly with the next phase's inputs?
- [ ] Are there any design decisions that will cause pain in later phases?
- [ ] Is the DB schema flexible enough for the remaining phases?

---

## How to Report Issues

For each issue found, report:

**Severity:** Critical | Major | Minor

- **Critical** — will cause disqualification or break a later phase entirely
- **Major** — will cause incorrect behavior or significant rework
- **Minor** — suboptimal but won't break anything

**Format:**
```
[CRITICAL] File: graders/hard_grader.py
Issue: temperature not set to 0 on judge API call
Why it matters: grader will produce different scores on identical inputs, failing reproducibility check
Fix: add temperature=0 to API call parameters
```

---

## After Reviewing

Summarise:
1. Total issues found by severity
2. Whether the phase passes (no Criticals) or fails (any Critical)
3. The single most important thing to fix before moving to the next phase
4. Any forward-looking risks the builder should keep in mind for upcoming phases

Do not approve a phase with any Critical issues. Do not nitpick Minor issues if the phase is under time pressure — flag them but do not block.

code-review-env/README.md
CHANGED

@@ -1,11 +1,70 @@

# CodeReviewEnv

## Quickstart

```bash
pip install -r requirements.txt
python -m
python -m db.store --module checkout
```

# CodeReviewEnv

Dependency-aware code review RL environment with persistent SQLite graph storage.

## Current Status

- Phase 1: implemented and validated
  - persistent seed pipeline with hash-based cache
  - parser/chunker/graph builder + linter findings persistence
- Phase 2: implemented
  - graph manager for DB-backed graph loading and deterministic traversal
  - hard token budget enforcement (max 2000 tokens)
  - strict Pydantic v2 observation models
  - observation builder with neighbor summaries and REQUEST_CONTEXT support

## Implemented Phase 2 Components

- [graph/graph_manager.py](graph/graph_manager.py)
  - Loads graph nodes/edges from SQLite.
  - Exposes neighbor queries (in/out/both).
  - Provides deterministic traversal ordering with leaf-first preference.

- [graph/token_budget.py](graph/token_budget.py)
  - Enforces hard observation token cap (<= 2000).
  - Applies per-component token limits.
  - Truncates oversized components with explicit marker.

- [env/observation.py](env/observation.py)
  - Strict Pydantic models: `NeighborSummary`, `RequestedContext`, `CodeObservation`.
  - Forbids extra fields and type coercion.
  - Enforces `total_tokens <= 2000`.

- [env/observation_builder.py](env/observation_builder.py)
  - Builds observation payloads from DB graph state.
  - Ranks dependency context using graph centrality.
  - Produces validated `CodeObservation` objects.

## Compatibility

- [env/graph.py](env/graph.py) remains stable for existing callers and now delegates to GraphManager.

## Quickstart

```bash
pip install -r requirements.txt
python -m db.seed sample_project/
python -m db.store --module checkout
```

## Validation

Run tests:

```bash
pytest -q
```

Phase 2-focused tests:

```bash
pytest -q tests/test_phase2_graph_manager.py tests/test_phase2_token_budget.py tests/test_phase2_observation.py
```

## Security and Quality Notes

- SQLite is used as the source of truth for graph and review state.
- No dynamic code execution is introduced in Phase 2 paths.
- Input handling fails closed for unknown `module_id` values.
- Observations are hard-capped to prevent context overflow.
- Code follows typed interfaces and minimal stateful behavior.

code-review-env/db/database.py
ADDED

@@ -0,0 +1,3 @@

from db.migrations import get_default_db_path, get_engine, init_db

__all__ = ["get_default_db_path", "get_engine", "init_db"]

code-review-env/db/models.py
ADDED

@@ -0,0 +1,25 @@

from db.schema import (
    EdgeType,
    EpisodeRecord,
    LinterFinding,
    ModuleEdge,
    ModuleNode,
    ReviewAnnotation,
    ReviewStatus,
    SeedMeta,
    Severity,
    TaskDefinition,
)

__all__ = [
    "EdgeType",
    "EpisodeRecord",
    "LinterFinding",
    "ModuleEdge",
    "ModuleNode",
    "ReviewAnnotation",
    "ReviewStatus",
    "SeedMeta",
    "Severity",
    "TaskDefinition",
]

code-review-env/db/schema.py
CHANGED
|
@@ -9,7 +9,9 @@ from sqlmodel import Field, SQLModel
 
 class EdgeType(StrEnum):
     EXPLICIT_IMPORT = "explicit_import"
-
+    IMPLICIT_DEPENDENCY = "implicit_dependency"
+    INTRA_FILE = "intra_file"
+    CIRCULAR = "circular"
 
 
 class ReviewStatus(StrEnum):
@@ -28,8 +30,13 @@ class ModuleNode(SQLModel, table=True):
     id: Optional[int] = Field(default=None, primary_key=True)
     source_root: str = Field(index=True)
     module_id: str = Field(index=True)
+    name: Optional[str] = None
     raw_code: str
     ast_summary: str
+    summary: Optional[str] = None
+    linter_flags: str = "[]"
+    parent_module_id: Optional[str] = Field(default=None, index=True)
+    is_chunk: bool = False
     dependency_reason: str = ""
     review_annotation: Optional[str] = None
     review_status: ReviewStatus = Field(default=ReviewStatus.PENDING)
@@ -89,3 +96,8 @@ class TaskDefinition(SQLModel, table=True):
     target_module_id: str = Field(index=True)
     description: str
     ground_truth_ref: str
+
+
+class SeedMeta(SQLModel, table=True):
+    key: str = Field(primary_key=True)
+    value: str
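`SeedMeta` is a plain key/value table. Later in this commit, `seed.py` writes one JSON payload per source root under a key of the form `seeded:<root>`. The sketch below reproduces that round trip with the standard-library `sqlite3` module rather than SQLModel. The table name `seed_meta`, the sample path and both helper functions are assumptions for illustration. `INSERT OR REPLACE` stands in for the read-then-update-or-add logic of `Store.set_meta`.

```python
from __future__ import annotations

import json
import sqlite3

# In-memory stand-in for the SeedMeta table (table name and schema assumed for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE seed_meta (key TEXT PRIMARY KEY, value TEXT NOT NULL)")


def set_meta(key: str, value: str) -> None:
    # INSERT OR REPLACE keeps exactly one row per key, like the update-or-add in Store.set_meta.
    conn.execute("INSERT OR REPLACE INTO seed_meta (key, value) VALUES (?, ?)", (key, value))
    conn.commit()


def get_meta(key: str) -> str | None:
    row = conn.execute("SELECT value FROM seed_meta WHERE key = ?", (key,)).fetchone()
    return row[0] if row else None


key = "seeded:/srv/sample_project"  # same shape as seed.py's _seed_meta_key()
set_meta(key, json.dumps({"seeded": True, "node_count": 3}))
set_meta(key, json.dumps({"seeded": True, "node_count": 5}))  # second write replaces the first
print(json.loads(get_meta(key))["node_count"])  # 5
```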
code-review-env/db/seed.py
ADDED
@@ -0,0 +1,143 @@
+from __future__ import annotations
+
+import argparse
+import hashlib
+import json
+from datetime import UTC, datetime
+from pathlib import Path
+
+from db.store import Store
+from parser.ast_parser import parse_python_file
+from parser.chunker import chunk_module
+from parser.graph_builder import build_edges
+from parser.linter import run_linters
+from parser.summarizer import summarize_module
+
+
+def _codebase_hash(target_dir: Path) -> str:
+    digest = hashlib.sha256()
+    for path in sorted(target_dir.rglob("*.py")):
+        rel = path.relative_to(target_dir).as_posix()
+        digest.update(rel.encode("utf-8"))
+        digest.update(path.read_bytes())
+    return digest.hexdigest()
+
+
+def _seed_meta_key(source_root: str) -> str:
+    return f"seeded:{source_root}"
+
+
+def seed_project(target_dir: Path, db_path: str | None = None, force: bool = False) -> dict[str, object]:
+    target_dir = target_dir.resolve()
+    store = Store(source_root=str(target_dir), db_path=db_path)
+
+    current_hash = _codebase_hash(target_dir)
+    meta_key = _seed_meta_key(str(target_dir))
+    existing_raw = store.get_meta(meta_key)
+    existing = json.loads(existing_raw) if existing_raw else {}
+
+    if (
+        not force
+        and store.has_nodes()
+        and existing.get("codebase_hash") == current_hash
+        and existing.get("seeded") is True
+    ):
+        return {
+            "seeded": True,
+            "loaded_from_cache": True,
+            "codebase_hash": current_hash,
+            "node_count": int(existing.get("node_count", 0)),
+            "edge_count": int(existing.get("edge_count", 0)),
+        }
+
+    store.clear_source_graph()
+
+    py_files = sorted(target_dir.rglob("*.py"))
+    parsed_modules = [parse_python_file(path, target_dir) for path in py_files]
+    module_ids = {parsed.module_id for parsed in parsed_modules}
+
+    chunk_ids_by_parent: dict[str, set[str]] = {}
+
+    for path, parsed in zip(py_files, parsed_modules):
+        issues = run_linters(path)
+        summary = summarize_module(parsed, issues)
+        linter_flags = json.dumps([issue.model_dump() for issue in issues])
+
+        chunk_result = chunk_module(parsed, max_lines=300)
+        parent = chunk_result.parent
+        store.upsert_node(
+            module_id=parent.module_id,
+            name=parent.name,
+            raw_code=parent.code,
+            ast_summary=summary,
+            summary=summary,
+            linter_flags=linter_flags,
+            dependency_reason="Imports and symbol usage captured from AST",
+            parent_module_id=parent.parent_module_id,
+            is_chunk=parent.is_chunk,
+        )
+
+        if chunk_result.chunks:
+            chunk_ids_by_parent[parent.module_id] = {chunk.module_id for chunk in chunk_result.chunks}
+
+        for chunk in chunk_result.chunks:
+            chunk_summary = f"Chunk {chunk.name} lines {chunk.start_line}-{chunk.end_line}"
+            store.upsert_node(
+                module_id=chunk.module_id,
+                name=chunk.name,
+                raw_code=chunk.code,
+                ast_summary=chunk_summary,
+                summary=chunk_summary,
+                linter_flags="[]",
+                dependency_reason="Top-level class/function chunk",
+                parent_module_id=chunk.parent_module_id,
+                is_chunk=chunk.is_chunk,
+            )
+
+        store.replace_findings_for_module(parsed.module_id, [issue.model_dump() for issue in issues])
+
+    edges = build_edges(parsed_modules, module_ids, chunk_ids_by_parent)
+    for edge in edges:
+        store.upsert_edge(
+            source_module_id=edge.source_module_id,
+            target_module_id=edge.target_module_id,
+            edge_type=edge.edge_type,
+            import_line=edge.import_line,
+            weight=edge.weight,
+        )
+
+    snapshot = store.get_full_graph()
+    meta_payload = {
+        "seeded": True,
+        "seeded_at": datetime.now(UTC).isoformat(),
+        "codebase_hash": current_hash,
+        "node_count": len(snapshot.nodes),
+        "edge_count": len(snapshot.edges),
+    }
+    store.set_meta(meta_key, json.dumps(meta_payload))
+
+    return {
+        "seeded": True,
+        "loaded_from_cache": False,
+        "codebase_hash": current_hash,
+        "node_count": len(snapshot.nodes),
+        "edge_count": len(snapshot.edges),
+    }
+
+
+def _build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Seed graph database from Python project")
+    parser.add_argument("target", help="Path to target codebase")
+    parser.add_argument("--db-path", default=None, help="Path to SQLite database")
+    parser.add_argument("--force", action="store_true", help="Force re-parse even if seeded")
+    return parser
+
+
+def main() -> None:
+    args = _build_parser().parse_args()
+    result = seed_project(Path(args.target), db_path=args.db_path, force=args.force)
+    print(json.dumps(result, indent=2))
+
+
+if __name__ == "__main__":
+    main()
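`seed_project` skips re-parsing only when the stored `codebase_hash` matches a fresh SHA-256 over every `*.py` file's relative path and bytes (and the other cache conditions hold). The stdlib sketch below copies `_codebase_hash` from the diff under the hypothetical name `codebase_hash` and runs it on a throwaway directory. It shows which edits invalidate the cache and which do not. The file names are illustrative only.

```python
import hashlib
import tempfile
from pathlib import Path


def codebase_hash(target_dir: Path) -> str:
    """Copy of the diff's _codebase_hash: sorted relative paths plus file bytes."""
    digest = hashlib.sha256()
    for path in sorted(target_dir.rglob("*.py")):
        rel = path.relative_to(target_dir).as_posix()
        digest.update(rel.encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cart.py").write_text("TOTAL = 1\n")
    (root / "notes.txt").write_text("not a Python file\n")
    first = codebase_hash(root)

    # Unchanged sources give the same digest, so a re-seed would hit the cache.
    print(codebase_hash(root) == first)  # True

    # Files outside the *.py glob do not affect the digest.
    (root / "notes.txt").write_text("edited\n")
    print(codebase_hash(root) == first)  # True

    # Any change to a .py file changes the digest and forces a full re-parse.
    (root / "cart.py").write_text("TOTAL = 2\n")
    print(codebase_hash(root) == first)  # False
```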
code-review-env/db/store.py
CHANGED
@@ -17,6 +17,7 @@ from db.schema (
     ModuleNode,
     ReviewAnnotation,
     ReviewStatus,
+    SeedMeta,
     Severity,
 )
 
@@ -77,6 +78,11 @@ class Store:
         raw_code: str,
         ast_summary: str,
         dependency_reason: str,
+        name: str | None = None,
+        summary: str | None = None,
+        linter_flags: str = "[]",
+        parent_module_id: str | None = None,
+        is_chunk: bool = False,
     ) -> ModuleNode:
         with Session(self.engine) as session:
             existing = session.exec(
@@ -86,8 +92,13 @@ class Store:
                 )
             ).first()
             if existing:
+                existing.name = name or existing.name
                 existing.raw_code = raw_code
                 existing.ast_summary = ast_summary
+                existing.summary = summary or existing.summary
+                existing.linter_flags = linter_flags
+                existing.parent_module_id = parent_module_id
+                existing.is_chunk = is_chunk
                 existing.dependency_reason = dependency_reason
                 existing.updated_at = datetime.now(UTC)
                 session.add(existing)
@@ -98,8 +109,13 @@ class Store:
             node = ModuleNode(
                 source_root=self.config.source_root,
                 module_id=module_id,
+                name=name,
                 raw_code=raw_code,
                 ast_summary=ast_summary,
+                summary=summary,
+                linter_flags=linter_flags,
+                parent_module_id=parent_module_id,
+                is_chunk=is_chunk,
                 dependency_reason=dependency_reason,
             )
             session.add(node)
@@ -322,6 +338,21 @@ class Store:
             ).first()
             return first_node is not None
 
+    def get_meta(self, key: str) -> Optional[str]:
+        with Session(self.engine) as session:
+            record = session.get(SeedMeta, key)
+            return record.value if record else None
+
+    def set_meta(self, key: str, value: str) -> None:
+        with Session(self.engine) as session:
+            record = session.get(SeedMeta, key)
+            if record:
+                record.value = value
+                session.add(record)
+            else:
+                session.add(SeedMeta(key=key, value=value))
+            session.commit()
+
     def clear_source_graph(self) -> None:
         with Session(self.engine) as session:
             session.exec(
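On update, `upsert_node` does not treat the new columns uniformly. `name` and `summary` fall back to the stored value when the caller passes a falsy value (`name or existing.name`). `linter_flags`, `parent_module_id` and `is_chunk` are always overwritten, so their defaults (`"[]"`, `None`, `False`) replace whatever was stored. The plain-dataclass sketch below mirrors those per-field rules. `NodeRecord` and `merge_update` are hypothetical stand-ins, not part of `Store`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeRecord:
    # Hypothetical stand-in for the ModuleNode columns touched by upsert_node.
    name: str | None
    summary: str | None
    linter_flags: str
    parent_module_id: str | None
    is_chunk: bool


def merge_update(
    existing: NodeRecord,
    *,
    name: str | None = None,
    summary: str | None = None,
    linter_flags: str = "[]",
    parent_module_id: str | None = None,
    is_chunk: bool = False,
) -> NodeRecord:
    """Apply the same per-field rules as the `if existing:` branch in the diff."""
    return replace(
        existing,
        name=name or existing.name,  # falsy input keeps the stored name
        summary=summary or existing.summary,  # same fallback for summary
        linter_flags=linter_flags,  # always overwritten
        parent_module_id=parent_module_id,  # None clears the parent link
        is_chunk=is_chunk,  # always overwritten
    )


old = NodeRecord("cart", "Cart totals", '[{"code": "E501"}]', "shop", True)
print(merge_update(old))
```

With only defaults passed, `name` and `summary` survive but the lint flags, parent link and chunk marker are reset. This matters if a caller re-upserts a chunk node without passing every field.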
code-review-env/env/graph.py
CHANGED
@@ -4,11 +4,9 @@ from dataclasses import dataclass
 from pathlib import Path
 
 import networkx as nx
-from sqlmodel import Session, select
 
-from db.
-from
-from parser.ast_parser import parse_directory
+from db.seed import seed_project
+from graph.graph_manager import GraphManager
 
 
 @dataclass
@@ -20,79 +18,34 @@ class GraphLoadResult:
 class DependencyGraph:
     def __init__(self, target_dir: str | Path, db_path: str | Path | None = None) -> None:
         self.target_dir = Path(target_dir).resolve()
-        self.
+        self.graph_manager = GraphManager(source_root=self.target_dir, db_path=db_path)
 
     def load_or_build(self, force_reparse: bool = False) -> GraphLoadResult:
-
-
-
-
-
+        result = seed_project(
+            self.target_dir,
+            db_path=str(self.graph_manager.store.config.db_path),
+            force=force_reparse,
+        )
+        loaded_from_cache = bool(result.get("loaded_from_cache", False))
         return GraphLoadResult(graph=self._build_graph(), loaded_from_cache=loaded_from_cache)
 
     def _build_graph(self) -> nx.DiGraph:
-
-        with Session(self.store.engine) as session:
-            nodes = list(
-                session.exec(
-                    select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
-                ).all()
-            )
-            edges = list(
-                session.exec(
-                    select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
-                ).all()
-            )
-
-        for node in nodes:
-            graph.add_node(
-                node.module_id,
-                ast_summary=node.ast_summary,
-                review_status=node.review_status.value,
-            )
-
-        for edge in edges:
-            graph.add_edge(
-                edge.source_module_id,
-                edge.target_module_id,
-                import_line=edge.import_line,
-                edge_type=edge.edge_type.value,
-                weight=edge.weight,
-            )
-
-        return graph
+        return self.graph_manager.load_graph()
 
     def traversal_order(self, graph: nx.DiGraph | None = None) -> list[str]:
-
+        if graph is None:
+            return self.graph_manager.traversal_order()
         if graph.number_of_nodes() == 0:
             return []
-
-
-
-
-
-        centrality = nx.betweenness_centrality(graph)
-        indegree = {node: graph.in_degree(node) for node in graph.nodes()}
-        queue = [node for node, deg in indegree.items() if deg == 0]
-        order: list[str] = []
-
-        def rank(node: str) -> tuple[float, float, str]:
-            return (
-                float(graph.out_degree(node)),
+        centrality = nx.betweenness_centrality(graph, normalized=True)
+        return sorted(
+            graph.nodes(),
+            key=lambda node: (
+                int(graph.out_degree(node)),
                 float(centrality.get(node, 0.0)),
-                node,
-            )
-
-        while queue:
-            queue.sort(key=rank)
-            current = queue.pop(0)
-            order.append(current)
-            for successor in sorted(graph.successors(current)):
-                indegree[successor] -= 1
-                if indegree[successor] == 0:
-                    queue.append(successor)
-
-        return order
+                str(node),
+            ),
+        )
 
 
 if __name__ == "__main__":
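When a graph is passed in explicitly, the new `traversal_order` drops the Kahn-style queue. It sorts all nodes ascending by `(out_degree, betweenness, name)`, so modules that import the least come first, and among equals, lower-centrality modules come first. The stdlib sketch below reproduces that sort key over a plain adjacency dict. The module names are illustrative, and the centrality values are hand-picked rather than computed with `nx.betweenness_centrality`.

```python
from __future__ import annotations


def traversal_order(adjacency: dict[str, set[str]], centrality: dict[str, float]) -> list[str]:
    """Sort by (number of outgoing edges, centrality, name), ascending."""
    nodes = set(adjacency)
    for targets in adjacency.values():
        nodes |= targets  # include modules that only appear as import targets
    return sorted(
        nodes,
        key=lambda node: (
            len(adjacency.get(node, ())),
            float(centrality.get(node, 0.0)),
            node,
        ),
    )


# Edges point from the importing module to the imported one (source -> target).
deps = {
    "checkout": {"cart", "payment"},
    "cart": {"config"},
    "payment": {"config"},
}
scores = {"cart": 0.25, "payment": 0.1}  # hand-picked, not computed
print(traversal_order(deps, scores))  # ['config', 'payment', 'cart', 'checkout']
```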
code-review-env/env/observation.py
ADDED
@@ -0,0 +1,62 @@
+from __future__ import annotations
+
+from typing import Literal
+
+from pydantic import BaseModel, ConfigDict, Field, field_validator
+
+from graph.token_budget import MAX_TOTAL_TOKENS
+
+
+class NeighborSummary(BaseModel):
+    model_config = ConfigDict(strict=True, extra="forbid")
+
+    module_id: str
+    relation: Literal["dependency", "dependent"]
+    summary: str
+    review_snippet: str | None = None
+
+
+class RequestedContext(BaseModel):
+    model_config = ConfigDict(strict=True, extra="forbid")
+
+    module_id: str
+    code: str
+    was_truncated: bool
+
+
+class CodeObservation(BaseModel):
+    model_config = ConfigDict(strict=True, extra="forbid")
+
+    module_id: str
+    code: str
+    ast_summary: dict[str, object]
+    dependency_summaries: list[NeighborSummary] = Field(default_factory=list)
+    dependent_summaries: list[NeighborSummary] = Field(default_factory=list)
+    neighbor_reviews: list[str] = Field(default_factory=list)
+    task_description: str
+    available_actions: list[str] = Field(default_factory=list)
+    requested_context: RequestedContext | None = None
+    token_usage: dict[str, int]
+    total_tokens: int
+    within_budget: bool
+
+    @field_validator("module_id", "code", "task_description")
+    @classmethod
+    def _must_not_be_empty(cls, value: str) -> str:
+        if not value.strip():
+            raise ValueError("Field cannot be empty")
+        return value
+
+    @field_validator("total_tokens")
+    @classmethod
+    def _budget_hard_cap(cls, value: int) -> int:
+        if value > MAX_TOTAL_TOKENS:
+            raise ValueError(f"total_tokens exceeds hard cap: {MAX_TOTAL_TOKENS}")
+        return value
+
+    @field_validator("within_budget")
+    @classmethod
+    def _must_be_true(cls, value: bool) -> bool:
+        if not value:
+            raise ValueError("within_budget must be True")
+        return value
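`CodeObservation` enforces three rules at construction time. `module_id`, `code` and `task_description` must not be empty or whitespace-only. `total_tokens` may not exceed `MAX_TOTAL_TOKENS` (2000, inclusive). `within_budget` must be `True`. The dependency-free sketch below reproduces those checks with a dataclass `__post_init__` instead of pydantic `field_validator`s, which makes the accepted and rejected inputs easy to see. `ObservationChecks` is a hypothetical name, and the field-name prefix in the first error message is added by this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_TOTAL_TOKENS = 2000  # same cap as graph/token_budget.py in this commit


@dataclass(frozen=True)
class ObservationChecks:
    """Stdlib stand-in for the three CodeObservation field validators."""

    module_id: str
    code: str
    task_description: str
    total_tokens: int
    within_budget: bool

    def __post_init__(self) -> None:
        # Mirrors _must_not_be_empty: whitespace-only strings are rejected as well.
        for field_name in ("module_id", "code", "task_description"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name}: Field cannot be empty")
        # Mirrors _budget_hard_cap: the cap is inclusive, so exactly 2000 passes.
        if self.total_tokens > MAX_TOTAL_TOKENS:
            raise ValueError(f"total_tokens exceeds hard cap: {MAX_TOTAL_TOKENS}")
        # Mirrors _must_be_true: an over-budget observation cannot be constructed at all.
        if not self.within_budget:
            raise ValueError("within_budget must be True")


ObservationChecks("cart", "x = 1", "Review cart.py", 2000, True)  # accepted
for bad in (("  ", "x", "t", 1, True), ("cart", "x", "t", 2001, True)):
    try:
        ObservationChecks(*bad)
    except ValueError as exc:
        print(exc)
```

Because `within_budget` can only ever validate as `True`, the caller (`ObservationBuilder.build`) has to keep `total_tokens` under the cap before constructing the model.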
code-review-env/env/observation_builder.py
CHANGED
@@ -1 +1,143 @@
-
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from sqlmodel import Session, select
+
+from db.schema import ModuleNode
+from env.observation import CodeObservation, NeighborSummary, RequestedContext
+from graph.graph_manager import GraphManager
+from graph.token_budget import TokenBudget
+
+
+DEFAULT_ACTIONS = [
+    "FLAG_STYLE",
+    "FLAG_BUG",
+    "FLAG_SECURITY",
+    "FLAG_DEPENDENCY_ISSUE",
+    "ADD_COMMENT",
+    "REQUEST_CONTEXT",
+    "REQUEST_CHANGES",
+    "APPROVE",
+    "AMEND_REVIEW",
+]
+
+
+class ObservationBuilder:
+    def __init__(self, source_root: str | Path, db_path: str | Path | None = None) -> None:
+        self.graph_manager = GraphManager(source_root=source_root, db_path=db_path)
+        self.token_budget = TokenBudget()
+
+    def _fetch_node(self, module_id: str) -> ModuleNode:
+        with Session(self.graph_manager.store.engine) as session:
+            node = session.exec(
+                select(ModuleNode).where(
+                    ModuleNode.source_root == self.graph_manager.store.config.source_root,
+                    ModuleNode.module_id == module_id,
+                )
+            ).first()
+        if not node:
+            raise ValueError(f"Unknown module_id: {module_id}")
+        return node
+
+    @staticmethod
+    def _ast_summary_payload(ast_summary: str) -> dict[str, object]:
+        try:
+            loaded = json.loads(ast_summary)
+        except json.JSONDecodeError:
+            return {"text": ast_summary}
+        return loaded if isinstance(loaded, dict) else {"items": loaded}
+
+    def build(
+        self,
+        module_id: str,
+        task_description: str,
+        available_actions: list[str] | None = None,
+        context_request: str | None = None,
+    ) -> CodeObservation:
+        graph = self.graph_manager.load_graph()
+        if module_id not in graph:
+            raise ValueError(f"Unknown module_id: {module_id}")
+
+        node = self._fetch_node(module_id)
+        centrality = self.graph_manager.centrality()
+
+        dependencies = list(graph.successors(module_id))
+        dependents = list(graph.predecessors(module_id))
+
+        dep_ranked = sorted(dependencies, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:5]
+        dependent_ranked = sorted(dependents, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:3]
+
+        dependency_summaries: list[NeighborSummary] = []
+        dependent_summaries: list[NeighborSummary] = []
+        neighbor_reviews: list[str] = []
+
+        for dep_id in dep_ranked:
+            dep_node = self._fetch_node(dep_id)
+            dependency_summaries.append(
+                NeighborSummary(
+                    module_id=dep_id,
+                    relation="dependency",
+                    summary=dep_node.summary or dep_node.ast_summary,
+                    review_snippet=dep_node.review_summary,
+                )
+            )
+            if dep_node.review_summary:
+                neighbor_reviews.append(f"{dep_id}: {dep_node.review_summary}")
+
+        for depd_id in dependent_ranked:
+            depd_node = self._fetch_node(depd_id)
+            dependent_summaries.append(
+                NeighborSummary(
+                    module_id=depd_id,
+                    relation="dependent",
+                    summary=depd_node.summary or depd_node.ast_summary,
+                    review_snippet=depd_node.review_summary,
+                )
+            )
+            if depd_node.review_summary:
+                neighbor_reviews.append(f"{depd_id}: {depd_node.review_summary}")
+
+        requested_context: RequestedContext | None = None
+        requested_context_code = ""
+        if context_request:
+            context_node = self._fetch_node(context_request)
+            requested_context_code = context_node.raw_code
+
+        actions = available_actions or DEFAULT_ACTIONS
+        budgeted = self.token_budget.enforce(
+            {
+                "code": node.raw_code,
+                "ast_summary_text": node.ast_summary,
+                "dependency_summaries": [item.model_dump_json() for item in dependency_summaries],
+                "dependent_summaries": [item.model_dump_json() for item in dependent_summaries],
+                "neighbor_reviews": neighbor_reviews[:4],
+                "task_description": task_description,
+                "available_actions": actions,
+                "requested_context_code": requested_context_code,
+            }
+        )
+
+        if context_request:
+            context_trimmed = budgeted.payload.get("requested_context_code", "")
+            requested_context = RequestedContext(
+                module_id=context_request,
+                code=str(context_trimmed),
+                was_truncated=str(context_trimmed) != requested_context_code,
+            )
+
+        return CodeObservation(
+            module_id=module_id,
+            code=str(budgeted.payload.get("code", "")),
+            ast_summary=self._ast_summary_payload(str(budgeted.payload.get("ast_summary_text", ""))),
+            dependency_summaries=dependency_summaries,
+            dependent_summaries=dependent_summaries,
+            neighbor_reviews=neighbor_reviews[:4],
+            task_description=task_description,
+            available_actions=actions,
+            requested_context=requested_context,
+            token_usage=budgeted.token_usage,
+            total_tokens=budgeted.total_tokens,
+            within_budget=budgeted.total_tokens <= self.token_budget.max_total_tokens,
+        )
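`build` keeps at most five dependencies and three dependents. It orders each list by descending betweenness centrality, with `module_id` as a deterministic tie-breaker, before slicing. The sketch below isolates that sort key. `rank_neighbors` is a hypothetical helper, and the centrality numbers are invented for illustration.

```python
from __future__ import annotations


def rank_neighbors(neighbors: list[str], centrality: dict[str, float], limit: int) -> list[str]:
    """Highest centrality first, module_id breaks ties, then keep the first `limit` entries."""
    return sorted(neighbors, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:limit]


centrality = {"config": 0.4, "db": 0.4, "utils": 0.1}  # invented values
dependencies = ["utils", "db", "config", "validators", "auth", "email", "payment"]
print(rank_neighbors(dependencies, centrality, limit=5))  # ['config', 'db', 'utils', 'auth', 'email']
```

Negating the score inside the key, rather than passing `reverse=True`, keeps the name tie-breaker in ascending order.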
code-review-env/graph/__init__.py
ADDED
@@ -0,0 +1,5 @@
+"""Graph utilities for loading and querying dependency graphs."""
+
+from graph.graph_manager import GraphManager
+
+__all__ = ["GraphManager"]
code-review-env/graph/graph_manager.py
ADDED
@@ -0,0 +1,125 @@
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Literal
+
+import networkx as nx
+from sqlmodel import Session, select
+
+from db.schema import ModuleEdge, ModuleNode
+from db.store import Store
+
+
+class GraphManager:
+    """Load and query dependency graph state from SQLite."""
+
+    def __init__(self, source_root: str | Path, db_path: str | Path | None = None) -> None:
+        self.source_root = str(Path(source_root).resolve())
+        self.store = Store(source_root=self.source_root, db_path=db_path)
+
+    def load_graph(self) -> nx.DiGraph:
+        graph = nx.DiGraph()
+        with Session(self.store.engine) as session:
+            nodes = list(
+                session.exec(
+                    select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
+                ).all()
+            )
+            edges = list(
+                session.exec(
+                    select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
+                ).all()
+            )
+
+        for node in nodes:
+            graph.add_node(
+                node.module_id,
+                name=node.name,
+                raw_code=node.raw_code,
+                ast_summary=node.ast_summary,
+                summary=node.summary or "",
+                linter_flags=node.linter_flags,
+                parent_module_id=node.parent_module_id,
+                review_status=node.review_status.value,
+                review_summary=node.review_summary or "",
+                is_chunk=node.is_chunk,
+            )
+
+        for edge in edges:
+            graph.add_edge(
+                edge.source_module_id,
+                edge.target_module_id,
+                edge_type=edge.edge_type.value,
+                import_line=edge.import_line,
+                weight=edge.weight,
+            )
+
+        return graph
+
+    def get_node(self, module_id: str) -> dict[str, object]:
+        graph = self.load_graph()
+        if module_id not in graph:
+            raise ValueError(f"Unknown module_id: {module_id}")
+        return dict(graph.nodes[module_id])
+
+    def get_neighbors(
+        self,
+        module_id: str,
+        direction: Literal["out", "in", "both"] = "both",
+        limit: int | None = None,
+    ) -> list[str]:
+        graph = self.load_graph()
+        if module_id not in graph:
+            raise ValueError(f"Unknown module_id: {module_id}")
+
+        if direction == "out":
+            neighbors = set(graph.successors(module_id))
+        elif direction == "in":
+            neighbors = set(graph.predecessors(module_id))
+        else:
+            neighbors = set(graph.successors(module_id))
+            neighbors.update(graph.predecessors(module_id))
+
+        ordered = sorted(neighbors)
+        if limit is None:
+            return ordered
+        return ordered[: max(limit, 0)]
+
+    def centrality(self) -> dict[str, float]:
+        graph = self.load_graph()
+        if graph.number_of_nodes() == 0:
+            return {}
+        return nx.betweenness_centrality(graph, normalized=True)
+
+    def traversal_order(self) -> list[str]:
+        """
+        Return a deterministic, leaf-first traversal where high-centrality nodes are later.
+        """
+        graph = self.load_graph()
+        if graph.number_of_nodes() == 0:
+            return []
+
+        centrality = self.centrality()
+
+        # For DAGs, reverse topological order visits leaves first.
+        if nx.is_directed_acyclic_graph(graph):
+            topo_reversed = list(reversed(list(nx.lexicographical_topological_sort(graph))))
+            topo_rank = {node: idx for idx, node in enumerate(topo_reversed)}
+            return sorted(
+                graph.nodes(),
+                key=lambda node: (
+                    int(topo_rank.get(node, 0)),
+                    float(centrality.get(node, 0.0)),
+                    str(node),
+                ),
+            )
+
+        # Stable fallback for cyclic graphs.
+        return sorted(
+            graph.nodes(),
+            key=lambda node: (
+                int(graph.out_degree(node)),
+                float(centrality.get(node, 0.0)),
+                str(node),
+            ),
+        )
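For acyclic graphs, `GraphManager.traversal_order` reverses a lexicographic topological sort, so imported modules (edge targets) are visited before the modules that import them. Every node receives a distinct `topo_rank`, so the `centrality` and `str(node)` entries in that branch's sort key never actually break a tie. The order is fully determined by the reversed topological sort. The sketch below reproduces the idea with a `heapq`-based Kahn's algorithm standing in for `networkx.lexicographical_topological_sort`. `lexicographic_topo_sort` and `leaf_first` are hypothetical names.

```python
from __future__ import annotations

import heapq


def lexicographic_topo_sort(edges: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm that always emits the smallest ready node (min-heap by name)."""
    nodes = set(edges)
    for targets in edges.values():
        nodes |= targets
    indegree = {node: 0 for node in nodes}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1
    ready = [node for node, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for target in edges.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                heapq.heappush(ready, target)
    if len(order) != len(nodes):
        # The diff handles this case with its out-degree fallback sort instead.
        raise ValueError("graph has a cycle")
    return order


def leaf_first(edges: dict[str, set[str]]) -> list[str]:
    # Reversing a topological order puts imported modules before their importers.
    return list(reversed(lexicographic_topo_sort(edges)))


deps = {"checkout": {"cart", "payment"}, "cart": {"config"}, "payment": {"config"}}
print(leaf_first(deps))  # ['config', 'payment', 'cart', 'checkout']
```

Reversal also flips tie order: `payment` precedes `cart` because the forward sort emitted `cart` first.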
code-review-env/graph/token_budget.py
ADDED
@@ -0,0 +1,117 @@
from __future__ import annotations

import math
from dataclasses import dataclass

MAX_TOTAL_TOKENS = 2000

COMPONENT_LIMITS: dict[str, int] = {
    "current_code": 800,
    "ast_summary": 100,
    "direct_deps": 250,
    "dependents": 150,
    "neighbor_reviews": 120,
    "task_and_actions": 200,
    "requested_context": 800,
}


def estimate_tokens(text: str) -> int:
    """Deterministic approximation with conservative floor for non-empty text."""
    if not text:
        return 0
    return max(1, int(math.ceil(len(text) / 4)))


def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str = "\n... [TRUNCATED]") -> str:
    if max_tokens <= 0:
        return ""

    current = estimate_tokens(text)
    if current <= max_tokens:
        return text

    notice_tokens = estimate_tokens(suffix_notice)
    content_budget = max(max_tokens - notice_tokens, 0)
    max_chars = content_budget * 4
    trimmed = text[:max_chars]
    return f"{trimmed}{suffix_notice}" if trimmed else suffix_notice.strip()


@dataclass(frozen=True)
class BudgetResult:
    payload: dict[str, object]
    token_usage: dict[str, int]
    total_tokens: int


class TokenBudget:
    def __init__(self, max_total_tokens: int = MAX_TOTAL_TOKENS) -> None:
        self.max_total_tokens = max_total_tokens

    def _trim_component(self, text: str, component_name: str) -> str:
        limit = COMPONENT_LIMITS.get(component_name, self.max_total_tokens)
        return truncate_to_budget(text, limit)

    def enforce(self, payload: dict[str, object]) -> BudgetResult:
        normalized = dict(payload)
        usage: dict[str, int] = {}

        current_code = str(normalized.get("code", ""))
        ast_summary = str(normalized.get("ast_summary_text", ""))
        dep_text = "\n".join(str(item) for item in normalized.get("dependency_summaries", []))
        dependent_text = "\n".join(str(item) for item in normalized.get("dependent_summaries", []))
        review_text = "\n".join(str(item) for item in normalized.get("neighbor_reviews", []))
        task_actions = "\n".join(
            [
                str(normalized.get("task_description", "")),
                " ".join(str(a) for a in normalized.get("available_actions", [])),
            ]
        )
        requested_context = str(normalized.get("requested_context_code", ""))

        current_code = self._trim_component(current_code, "current_code")
        ast_summary = self._trim_component(ast_summary, "ast_summary")
        dep_text = self._trim_component(dep_text, "direct_deps")
        dependent_text = self._trim_component(dependent_text, "dependents")
        review_text = self._trim_component(review_text, "neighbor_reviews")
        task_actions = self._trim_component(task_actions, "task_and_actions")
        requested_context = self._trim_component(requested_context, "requested_context")

        normalized["code"] = current_code
        normalized["ast_summary_text"] = ast_summary
        normalized["dependency_summaries_text"] = dep_text
        normalized["dependent_summaries_text"] = dependent_text
        normalized["neighbor_reviews_text"] = review_text
        normalized["task_actions_text"] = task_actions
        normalized["requested_context_code"] = requested_context

        usage["current_code"] = estimate_tokens(current_code)
        usage["ast_summary"] = estimate_tokens(ast_summary)
        usage["direct_deps"] = estimate_tokens(dep_text)
        usage["dependents"] = estimate_tokens(dependent_text)
        usage["neighbor_reviews"] = estimate_tokens(review_text)
        usage["task_and_actions"] = estimate_tokens(task_actions)
        usage["requested_context"] = estimate_tokens(requested_context)

        total = sum(usage.values())
        if total > self.max_total_tokens:
            overflow = total - self.max_total_tokens
            requested_limit = max(estimate_tokens(requested_context) - overflow, 0)
            requested_context = truncate_to_budget(requested_context, requested_limit)
            normalized["requested_context_code"] = requested_context
            usage["requested_context"] = estimate_tokens(requested_context)
            total = sum(usage.values())

        if total > self.max_total_tokens:
            overflow = total - self.max_total_tokens
            code_limit = max(estimate_tokens(current_code) - overflow, 0)
            current_code = truncate_to_budget(current_code, code_limit)
            normalized["code"] = current_code
            usage["current_code"] = estimate_tokens(current_code)
            total = sum(usage.values())

        if total > self.max_total_tokens:
            raise ValueError("Unable to enforce token budget within hard limit")

        return BudgetResult(payload=normalized, token_usage=usage, total_tokens=total)
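The budget logic above combines a characters-divided-by-four token estimate (an approximation, not a real tokenizer) with per-component caps. Those caps add up to 2420 tokens, more than the 2000-token hard limit, so `enforce` may need a second trimming pass that takes the overflow out of `requested_context` first. The standalone sketch below restates `estimate_tokens` and `truncate_to_budget` without importing the project and walks one overflow step. The input strings and the 620-token figure for the other five components are made-up values chosen for illustration.

```python
import math


# Standalone restatement of the helpers in token_budget.py (no project imports).
def estimate_tokens(text: str) -> int:
    """About four characters per token, rounded up, minimum 1 for non-empty text."""
    if not text:
        return 0
    return max(1, math.ceil(len(text) / 4))


def truncate_to_budget(text: str, max_tokens: int, notice: str = "\n... [TRUNCATED]") -> str:
    if max_tokens <= 0:
        return ""
    if estimate_tokens(text) <= max_tokens:
        return text
    # Reserve tokens for the notice, then keep four characters per remaining token.
    content_budget = max(max_tokens - estimate_tokens(notice), 0)
    trimmed = text[: content_budget * 4]
    return f"{trimmed}{notice}" if trimmed else notice.strip()


COMPONENT_LIMITS = {
    "current_code": 800, "ast_summary": 100, "direct_deps": 250, "dependents": 150,
    "neighbor_reviews": 120, "task_and_actions": 200, "requested_context": 800,
}
MAX_TOTAL_TOKENS = 2000

# Step 1: per-component caps (each input is 1000 tokens before capping).
code = truncate_to_budget("x" * 4000, COMPONENT_LIMITS["current_code"])
context = truncate_to_budget("y" * 4000, COMPONENT_LIMITS["requested_context"])
other_tokens = 620  # assumed combined usage of the remaining five components

# Step 2: overflow past the hard cap is taken out of requested_context first.
total = estimate_tokens(code) + estimate_tokens(context) + other_tokens
overflow = max(total - MAX_TOTAL_TOKENS, 0)
if overflow:
    context = truncate_to_budget(context, estimate_tokens(context) - overflow)
final_total = estimate_tokens(code) + estimate_tokens(context) + other_tokens
```

Here the capped total is 800 + 800 + 620 = 2220, so 220 tokens come out of the context, leaving 580. The sketch omits the commit's later fallbacks: if the context alone cannot absorb the overflow, `enforce` trims `code` the same way and raises `ValueError` when the total is still over the limit.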
code-review-env/parser/ast_parser.py
CHANGED
@@ -15,6 +15,8 @@ from parser.summarizer import summarize_module
 class ImportRef(BaseModel):
     target_module: str
     import_line: str
+    scope: str = "module_level"
+    weight: float = 1.0
     edge_type: EdgeType = EdgeType.EXPLICIT_IMPORT


@@ -33,7 +35,12 @@ class _Visitor(ast.NodeVisitor):
         self.function_signatures: list[str] = []
         self.classes: list[str] = []
         self.constants: list[str] = []
-        self.imports: list[tuple[str, str]] = []
+        self.imports: list[tuple[str, str, str]] = []
+        self._scope_stack: list[str] = []
+
+    @property
+    def _scope(self) -> str:
+        return "function_level" if self._scope_stack else "module_level"

     def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
         args: list[str] = []
@@ -44,7 +51,11 @@ class _Visitor(ast.NodeVisitor):
             args.append(arg.arg)
         returns = ast.unparse(node.returns) if node.returns is not None else "None"
         self.function_signatures.append(f"{node.name}({', '.join(args)})->{returns}")
-        self.
+        self._scope_stack.append(node.name)
+        try:
+            self.generic_visit(node)
+        finally:
+            self._scope_stack.pop()

     def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
         fake = ast.FunctionDef(
@@ -59,19 +70,23 @@ class _Visitor(ast.NodeVisitor):

     def visit_ClassDef(self, node: ast.ClassDef) -> None:
         self.classes.append(node.name)
-        self.
+        self._scope_stack.append(node.name)
+        try:
+            self.generic_visit(node)
+        finally:
+            self._scope_stack.pop()

     def visit_Import(self, node: ast.Import) -> None:
         line = ast.get_source_segment(self._source, node) or "import"
         for alias in node.names:
-            self.imports.append((alias.name, line))
+            self.imports.append((alias.name, line, self._scope))

     def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
         module = node.module or ""
         level = node.level or 0
         dotted = "." * level + module
         line = ast.get_source_segment(self._source, node) or "from"
-        self.imports.append((dotted, line))
+        self.imports.append((dotted, line, self._scope))

     def visit_Assign(self, node: ast.Assign) -> None:
         if isinstance(node.value, ast.Constant):
@@ -105,7 +120,18 @@ def _resolve_relative_import(current_module: str, ref: str) -> str:
 def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
     source = path.read_text(encoding="utf-8")
     module_id = _to_module_id(path, root_dir)
-
+    try:
+        tree = ast.parse(source)
+    except SyntaxError:
+        return ParsedModule(
+            module_id=module_id,
+            raw_code=source,
+            function_signatures=[],
+            classes=[],
+            imports=[],
+            constants=[],
+            dependencies=[],
+        )

     visitor = _Visitor()
     visitor.parse(tree, source)
@@ -114,9 +140,11 @@ def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
         ImportRef(
             target_module=_resolve_relative_import(module_id, name),
             import_line=line,
+            scope=scope,
+            weight=0.5 if scope == "function_level" else 1.0,
             edge_type=EdgeType.EXPLICIT_IMPORT,
         )
-        for name, line in visitor.imports
+        for name, line, scope in visitor.imports
     ]

     dependencies = [imp.target_module for imp in imports if imp.target_module]
@@ -138,8 +166,10 @@ def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
     store.clear_source_graph()

     py_files = sorted(target_dir.rglob("*.py"))
-    for py_file in py_files
-
+    parsed_modules = [parse_python_file(py_file, target_dir) for py_file in py_files]
+    known_module_ids = {parsed.module_id for parsed in parsed_modules}
+
+    for py_file, parsed in zip(py_files, parsed_modules):
         issues = run_linters(py_file)
         summary = summarize_module(parsed, issues)

@@ -155,13 +185,13 @@ def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
             [issue.model_dump() for issue in issues],
         )
         for imported in parsed.imports:
-            if imported.target_module:
+            if imported.target_module and imported.target_module in known_module_ids:
                 store.upsert_edge(
                     source_module_id=parsed.module_id,
                     target_module_id=imported.target_module,
                     edge_type=imported.edge_type,
                     import_line=imported.import_line,
-                    weight=
+                    weight=imported.weight,
                 )

     return store
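The `_Visitor` changes above keep a stack of enclosing function and class names, so each import is tagged `module_level` or `function_level`. `parse_python_file` then gives function-level imports a weight of 0.5 instead of 1.0. The sketch below is a standalone visitor that applies the same rule to a small source string. The class name `ImportScopeVisitor` and the helper `edge_weight` are hypothetical. Like the diff, it also pushes class names onto the stack, so an import inside a class body is reported as `function_level`.

```python
import ast


class ImportScopeVisitor(ast.NodeVisitor):
    """Hypothetical standalone version of the scope tracking added to _Visitor."""

    def __init__(self) -> None:
        self.imports: list[tuple[str, str]] = []
        self._scope_stack: list[str] = []

    @property
    def scope(self) -> str:
        return "function_level" if self._scope_stack else "module_level"

    def _visit_nested(self, node: ast.AST, name: str) -> None:
        # Push the enclosing name, visit children, and always pop, even on error.
        self._scope_stack.append(name)
        try:
            self.generic_visit(node)
        finally:
            self._scope_stack.pop()

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self._visit_nested(node, node.name)

    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
        self._visit_nested(node, node.name)

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        # Class bodies count as nested scope too, matching the diff.
        self._visit_nested(node, node.name)

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self.imports.append((alias.name, self.scope))

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        dotted = "." * (node.level or 0) + (node.module or "")
        self.imports.append((dotted, self.scope))


def edge_weight(scope: str) -> float:
    # Same rule as parse_python_file: deferred imports count half.
    return 0.5 if scope == "function_level" else 1.0


SOURCE = '''
import config

def lazy_charge(total):
    import payments
    return payments.charge(total)

class Checkout:
    from cart import calculate_total
'''

visitor = ImportScopeVisitor()
visitor.visit(ast.parse(SOURCE))
weights = [edge_weight(scope) for _, scope in visitor.imports]
```

In `build_edges` (graph_builder.py), the same `scope` value decides the edge type: module-level imports become `EXPLICIT_IMPORT` edges and the rest become `IMPLICIT_DEPENDENCY` edges.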
code-review-env/parser/chunker.py
ADDED
@@ -0,0 +1,96 @@
from __future__ import annotations

import ast
from pydantic import BaseModel

from parser.ast_parser import ParsedModule


class ChunkNode(BaseModel):
    module_id: str
    name: str
    code: str
    parent_module_id: str | None = None
    is_chunk: bool = False
    start_line: int = 1
    end_line: int = 1


class ChunkResult(BaseModel):
    parent: ChunkNode
    chunks: list[ChunkNode]


def _slice_lines(source: str, start: int, end: int) -> str:
    lines = source.splitlines()
    start_idx = max(start - 1, 0)
    end_idx = min(end, len(lines))
    return "\n".join(lines[start_idx:end_idx]).strip()


def chunk_module(parsed: ParsedModule, max_lines: int = 300) -> ChunkResult:
    line_count = len(parsed.raw_code.splitlines())
    if line_count <= max_lines:
        parent = ChunkNode(
            module_id=parsed.module_id,
            name=parsed.module_id.split(".")[-1],
            code=parsed.raw_code,
            is_chunk=False,
            start_line=1,
            end_line=line_count,
        )
        return ChunkResult(parent=parent, chunks=[])

    try:
        tree = ast.parse(parsed.raw_code)
    except SyntaxError:
        parent = ChunkNode(
            module_id=parsed.module_id,
            name=parsed.module_id.split(".")[-1],
            code=parsed.raw_code,
            is_chunk=False,
            start_line=1,
            end_line=line_count,
        )
        return ChunkResult(parent=parent, chunks=[])

    chunks: list[ChunkNode] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start_line = int(getattr(node, "lineno", 1))
            end_line = int(getattr(node, "end_lineno", start_line))
            chunk_id = f"{parsed.module_id}::{node.name}"
            chunks.append(
                ChunkNode(
                    module_id=chunk_id,
                    name=node.name,
                    code=_slice_lines(parsed.raw_code, start_line, end_line),
                    parent_module_id=parsed.module_id,
                    is_chunk=True,
                    start_line=start_line,
                    end_line=end_line,
                )
            )

    if not chunks:
        chunks.append(
            ChunkNode(
                module_id=f"{parsed.module_id}::module_body",
                name="module_body",
                code=parsed.raw_code,
                parent_module_id=parsed.module_id,
                is_chunk=True,
                start_line=1,
                end_line=line_count,
            )
        )

    parent = ChunkNode(
        module_id=parsed.module_id,
        name=parsed.module_id.split(".")[-1],
        code="",
        is_chunk=False,
        start_line=1,
        end_line=line_count,
    )
    return ChunkResult(parent=parent, chunks=chunks)
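`chunk_module` only splits modules longer than `max_lines` (300 by default). It cuts along top-level `def`/`class` nodes using their `lineno`/`end_lineno` and gives each chunk an id of the form `module_id::name`. As a consequence, top-level statements outside any definition are not copied into a chunk, and the parent node's `code` is set to an empty string. The simplified sketch below reproduces that splitting rule with plain dicts instead of the project's `ChunkNode` model. The function name `split_top_level` is hypothetical, and the sketch leaves out the `SyntaxError` and `module_body` fallbacks.

```python
import ast


def split_top_level(module_id: str, source: str, max_lines: int = 300) -> list[dict[str, object]]:
    """Hypothetical helper: one chunk per top-level def/class, or [] for small modules."""
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return []  # small modules are reviewed whole, as in chunk_module
    tree = ast.parse(source)
    chunks: list[dict[str, object]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno
            end = node.end_lineno or node.lineno
            chunks.append(
                {
                    "id": f"{module_id}::{node.name}",
                    "start_line": start,
                    "end_line": end,
                    # 1-based inclusive line numbers -> 0-based slice, like _slice_lines.
                    "code": "\n".join(lines[start - 1 : end]).strip(),
                }
            )
    return chunks


source = "X = 1\n\ndef a():\n    return X\n\nclass B:\n    pass\n"
chunks = split_top_level("demo", source, max_lines=3)  # tiny threshold to force a split
```

Since Python 3.8, `lineno` on a decorated definition points at the `def` or `class` line rather than the first decorator, so decorator lines fall outside the slice. Including them would require looking at `node.decorator_list`.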
code-review-env/parser/graph_builder.py
ADDED
@@ -0,0 +1,114 @@
from __future__ import annotations

import ast
import networkx as nx
from pydantic import BaseModel

from db.schema import EdgeType
from parser.ast_parser import ParsedModule


class EdgeRecord(BaseModel):
    source_module_id: str
    target_module_id: str
    edge_type: EdgeType
    import_line: str
    scope: str
    weight: float


def _build_intra_file_edges(parsed: ParsedModule, available_chunk_ids: set[str]) -> list[EdgeRecord]:
    try:
        tree = ast.parse(parsed.raw_code)
    except SyntaxError:
        return []

    function_names = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    call_edges: list[EdgeRecord] = []

    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        source_id = f"{parsed.module_id}::{node.name}"
        if source_id not in available_chunk_ids:
            continue
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                called = inner.func.id
                if called in function_names:
                    target_id = f"{parsed.module_id}::{called}"
                    if target_id in available_chunk_ids and target_id != source_id:
                        call_edges.append(
                            EdgeRecord(
                                source_module_id=source_id,
                                target_module_id=target_id,
                                edge_type=EdgeType.INTRA_FILE,
                                import_line=f"call:{called}",
                                scope="function_level",
                                weight=0.5,
                            )
                        )

    dedup: dict[tuple[str, str, str], EdgeRecord] = {}
    for edge in call_edges:
        key = (edge.source_module_id, edge.target_module_id, edge.import_line)
        dedup[key] = edge
    return list(dedup.values())


def build_edges(
    parsed_modules: list[ParsedModule],
    module_ids: set[str],
    chunk_ids_by_parent: dict[str, set[str]],
) -> list[EdgeRecord]:
    edges: list[EdgeRecord] = []

    for parsed in parsed_modules:
        source_module_id = parsed.module_id
        for imp in parsed.imports:
            if imp.target_module and imp.target_module in module_ids:
                edge_type = (
                    EdgeType.EXPLICIT_IMPORT
                    if imp.scope == "module_level"
                    else EdgeType.IMPLICIT_DEPENDENCY
                )
                edges.append(
                    EdgeRecord(
                        source_module_id=source_module_id,
                        target_module_id=imp.target_module,
                        edge_type=edge_type,
                        import_line=imp.import_line,
                        scope=imp.scope,
                        weight=imp.weight,
                    )
                )

        available_chunk_ids = chunk_ids_by_parent.get(parsed.module_id, set())
        edges.extend(_build_intra_file_edges(parsed, available_chunk_ids))

    graph = nx.DiGraph()
    for edge in edges:
        graph.add_edge(edge.source_module_id, edge.target_module_id)

    for source_module_id, target_module_id in list(graph.edges()):
        if graph.has_edge(target_module_id, source_module_id):
            edges.append(
                EdgeRecord(
                    source_module_id=source_module_id,
                    target_module_id=target_module_id,
                    edge_type=EdgeType.CIRCULAR,
                    import_line="cycle_detected",
                    scope="module_level",
                    weight=1.0,
                )
            )

    dedup: dict[tuple[str, str, str], EdgeRecord] = {}
    for edge in edges:
        key = (edge.source_module_id, edge.target_module_id, edge.import_line)
        dedup[key] = edge
    return list(dedup.values())
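`build_edges` tags a pair as `CIRCULAR` only when both directions exist: for each edge it checks `graph.has_edge(target, source)`. A longer loop such as A -> B -> C -> A is therefore not flagged. Because the same `DiGraph` also holds the intra-file call edges, two functions in one file that call each other would be flagged the same way. The stdlib-only sketch below contrasts that mutual-edge check with a depth-first search that detects cycles of any length. Neither function is part of the project. With networkx already a dependency, `nx.simple_cycles` or `nx.strongly_connected_components` would be the library route for longer cycles.

```python
from collections import defaultdict


def mutual_edges(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Edges whose reverse also exists: the pattern build_edges tags as CIRCULAR."""
    edge_set = set(edges)
    return sorted((src, dst) for src, dst in edge_set if (dst, src) in edge_set)


def has_cycle(edges: list[tuple[str, str]]) -> bool:
    """Three-colour DFS: reaching a node that is still on the stack means a cycle."""
    graph: dict[str, list[str]] = defaultdict(list)
    for src, dst in edges:
        graph[src].append(dst)

    WHITE, GRAY, BLACK = 0, 1, 2
    color: dict[str, int] = defaultdict(int)  # every node starts WHITE (0)

    def visit(node: str) -> bool:
        color[node] = GRAY  # on the current DFS path
        for nxt in graph[node]:
            if color[nxt] == GRAY:
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through here
        return False

    # Iterate over a snapshot: visiting leaves adds keys to the defaultdict.
    return any(color[node] == WHITE and visit(node) for node in list(graph))


three_cycle = [("a", "b"), ("b", "c"), ("c", "a")]
two_cycle = [("checkout", "cart"), ("cart", "checkout"), ("checkout", "payments")]
```

The mutual-edge check finds nothing in `three_cycle` even though the DFS reports a cycle. Both approaches agree on `two_cycle`.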
code-review-env/parser/linter.py
CHANGED
@@ -98,7 +98,36 @@ def run_bandit(path: Path) -> list[LinterIssue]:
     return issues


+def run_pyflakes(path: Path) -> list[LinterIssue]:
+    cmd = [sys.executable, "-m", "pyflakes", str(path)]
+    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
+    payload = (proc.stdout or "").strip()
+    if not payload:
+        return []
+
+    issues: list[LinterIssue] = []
+    for raw_line in payload.splitlines():
+        line = 0
+        message = raw_line.strip()
+        if ":" in raw_line:
+            parts = raw_line.split(":", 3)
+            if len(parts) >= 3 and parts[1].isdigit():
+                line = int(parts[1])
+                message = parts[3].strip() if len(parts) == 4 else message
+        issues.append(
+            LinterIssue(
+                tool="pyflakes",
+                line=line,
+                severity="medium",
+                code="PYF000",
+                message=message,
+            )
+        )
+    return issues
+
+
 def run_linters(path: Path) -> list[LinterIssue]:
     issues = run_pylint(path)
     issues.extend(run_bandit(path))
+    issues.extend(run_pyflakes(path))
     return issues
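`run_pyflakes` splits each output line on `:` and expects the line number in `parts[1]`. On Windows, a path such as `C:\repo\cart.py` puts the drive letter in `parts[0]`, so `parts[1]` is not numeric and the line number falls back to 0. The sketch below swaps in a regular expression instead. It is an alternative, not what the commit does, and it assumes pyflakes lines look like `<path>:<line>:<col>: <message>` with the column optional. The names `_PYFLAKES_LINE` and `parse_pyflakes_line` are hypothetical.

```python
import re

# Assumed line shape: "<path>:<line>:<col>: <message>", column optional.
# The lazy path group lets a drive letter such as "C:" stay inside the path.
_PYFLAKES_LINE = re.compile(r"^(?P<path>.*?):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<msg>.*)$")


def parse_pyflakes_line(raw_line: str) -> tuple[int, str]:
    """Return (line, message); fall back to (0, whole line) as run_pyflakes does."""
    text = raw_line.strip()
    match = _PYFLAKES_LINE.match(text)
    if match is None:
        return 0, text
    return int(match.group("line")), match.group("msg").strip()


posix = parse_pyflakes_line("sample_project/cart.py:3:1: 'os' imported but unused")
windows = parse_pyflakes_line(r"C:\repo\cart.py:9:5: undefined name 'x'")
```

The 0-line fallback for unparseable lines matches the commit's behavior, so downstream consumers of `LinterIssue` see the same shape either way.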
code-review-env/requirements.txt
CHANGED
@@ -3,6 +3,7 @@ networkx>=3.2
 pydantic>=2.7
 pylint>=3.2
 bandit>=1.7
+pyflakes>=3.2
 fastapi>=0.115
 uvicorn>=0.30
 openai>=1.40
code-review-env/sample_project/auth.py
ADDED
@@ -0,0 +1,7 @@
"""Auth helpers."""

import config


def issue_session_token(user_id: str) -> str:
    return f"{user_id}:{config.SECRET_KEY}:session-token-generated-with-a-very-long-suffix-that-triggers-style-rules-and-is-hard-to-read"
code-review-env/sample_project/cart.py
ADDED
@@ -0,0 +1,17 @@
"""Cart calculations."""

import config


def calculate_subtotal(items: list[dict[str, float]]) -> float:
    subtotal = 0.0
    for item in items:
        subtotal += float(item.get("price", 0.0)) * float(item.get("qty", 0.0))
    return subtotal


def calculate_total(items: list[dict[str, float]]) -> float:
    subtotal = calculate_subtotal(items)
    # BUG: config.DISCOUNT_RATE is intended to be 0.20, but set to 20 in config.
    discounted = subtotal - (subtotal * config.DISCOUNT_RATE)
    return discounted + (discounted * config.TAX_RATE)
code-review-env/sample_project/checkout.py
ADDED
@@ -0,0 +1,15 @@
"""Checkout flow."""

import cart
import payments


def submit_order(items: list[dict[str, float]]) -> str:
    total = cart.calculate_total(items)
    # Cascading symptom: negative total is observed here but root cause is config -> cart.
    if total < 0:
        return "error: negative total"
    gateway_ok = payments.run_gateway_check("https://gateway.example.com/health")
    if gateway_ok != 0:
        return "error: gateway"
    return payments.charge(total)
code-review-env/sample_project/config.py
ADDED
@@ -0,0 +1,6 @@
"""Configuration defaults for the checkout flow."""

DISCOUNT_RATE = 20
TAX_RATE = 0.07
PAYMENT_TIMEOUT_SECONDS = 30
SECRET_KEY = "hardcoded-dev-key"
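The sample project plants a cascading bug. `config.DISCOUNT_RATE` is 20, where the comment in `cart.py` says 0.20 was intended, so the discount term is twenty times the subtotal and `checkout.submit_order` sees a negative total. The arithmetic can be checked in isolation. The function below restates `cart.calculate_total` with the rate passed as a parameter, so the committed and intended values can be compared side by side. The item list is made up.

```python
TAX_RATE = 0.07  # value from sample_project/config.py


def calculate_total(items: list[dict[str, float]], discount_rate: float) -> float:
    """Standalone copy of cart.calculate_total with the discount rate passed in."""
    subtotal = sum(float(i.get("price", 0.0)) * float(i.get("qty", 0.0)) for i in items)
    discounted = subtotal - (subtotal * discount_rate)
    return discounted + (discounted * TAX_RATE)


items = [{"price": 50.0, "qty": 2.0}]               # subtotal = 100.0
buggy = calculate_total(items, discount_rate=20)     # DISCOUNT_RATE as committed
intended = calculate_total(items, discount_rate=0.20)
```

With a subtotal of 100, the committed rate gives 100 - 2000 = -1900 before tax and about -2033 after it, which trips the `total < 0` guard in `checkout.py`. The intended rate gives 80 before tax and about 85.6 after.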
code-review-env/sample_project/database.py
ADDED
@@ -0,0 +1,6 @@
from config import SETTINGS


def get_connection_url() -> str:
    # Intentional bug for lint/security testing: unsafely concatenated DSN-like value
    return "sqlite:///" + SETTINGS.get("db_path")
code-review-env/sample_project/huge_module.py
ADDED
@@ -0,0 +1,628 @@
"""Large synthetic file for chunking checks."""


def bootstrap() -> int:
    return 1
LINE_1 = 1
LINE_2 = 2
LINE_3 = 3
LINE_4 = 4
LINE_5 = 5
LINE_6 = 6
LINE_7 = 7
LINE_8 = 8
LINE_9 = 9
LINE_10 = 10
LINE_11 = 11
LINE_12 = 12
LINE_13 = 13
LINE_14 = 14
LINE_15 = 15
LINE_16 = 16
LINE_17 = 17
LINE_18 = 18
LINE_19 = 19
LINE_20 = 20
LINE_21 = 21
LINE_22 = 22
LINE_23 = 23
LINE_24 = 24
LINE_25 = 25
LINE_26 = 26
LINE_27 = 27
LINE_28 = 28
LINE_29 = 29
LINE_30 = 30
LINE_31 = 31
LINE_32 = 32
LINE_33 = 33
LINE_34 = 34
LINE_35 = 35
LINE_36 = 36
LINE_37 = 37
LINE_38 = 38
LINE_39 = 39
LINE_40 = 40
LINE_41 = 41
LINE_42 = 42
LINE_43 = 43
LINE_44 = 44
LINE_45 = 45
LINE_46 = 46
LINE_47 = 47
LINE_48 = 48
LINE_49 = 49
LINE_50 = 50
LINE_51 = 51
LINE_52 = 52
LINE_53 = 53
LINE_54 = 54
LINE_55 = 55
LINE_56 = 56
LINE_57 = 57
LINE_58 = 58
LINE_59 = 59
LINE_60 = 60
LINE_61 = 61
LINE_62 = 62
LINE_63 = 63
LINE_64 = 64
LINE_65 = 65
LINE_66 = 66
LINE_67 = 67
LINE_68 = 68
LINE_69 = 69
LINE_70 = 70
LINE_71 = 71
LINE_72 = 72
LINE_73 = 73
LINE_74 = 74
LINE_75 = 75
LINE_76 = 76
LINE_77 = 77
LINE_78 = 78
LINE_79 = 79
LINE_80 = 80
LINE_81 = 81
LINE_82 = 82
LINE_83 = 83
LINE_84 = 84
LINE_85 = 85
LINE_86 = 86
LINE_87 = 87
LINE_88 = 88
LINE_89 = 89
LINE_90 = 90
LINE_91 = 91
LINE_92 = 92
LINE_93 = 93
LINE_94 = 94
LINE_95 = 95
LINE_96 = 96
LINE_97 = 97
LINE_98 = 98
LINE_99 = 99
LINE_100 = 100
LINE_101 = 101
LINE_102 = 102
LINE_103 = 103
LINE_104 = 104
LINE_105 = 105
LINE_106 = 106
LINE_107 = 107
LINE_108 = 108
LINE_109 = 109
LINE_110 = 110
LINE_111 = 111
LINE_112 = 112
LINE_113 = 113
LINE_114 = 114
LINE_115 = 115
LINE_116 = 116
LINE_117 = 117
LINE_118 = 118
LINE_119 = 119
LINE_120 = 120
LINE_121 = 121
LINE_122 = 122
LINE_123 = 123
LINE_124 = 124
LINE_125 = 125
LINE_126 = 126
LINE_127 = 127
LINE_128 = 128
LINE_129 = 129
LINE_130 = 130
LINE_131 = 131
LINE_132 = 132
LINE_133 = 133
LINE_134 = 134
LINE_135 = 135
LINE_136 = 136
LINE_137 = 137
LINE_138 = 138
LINE_139 = 139
LINE_140 = 140
LINE_141 = 141
LINE_142 = 142
LINE_143 = 143
LINE_144 = 144
LINE_145 = 145
LINE_146 = 146
LINE_147 = 147
LINE_148 = 148
LINE_149 = 149
LINE_150 = 150
LINE_151 = 151
LINE_152 = 152
LINE_153 = 153
LINE_154 = 154
LINE_155 = 155
LINE_156 = 156
LINE_157 = 157
LINE_158 = 158
LINE_159 = 159
LINE_160 = 160
LINE_161 = 161
LINE_162 = 162
LINE_163 = 163
LINE_164 = 164
LINE_165 = 165
LINE_166 = 166
LINE_167 = 167
LINE_168 = 168
LINE_169 = 169
LINE_170 = 170
LINE_171 = 171
LINE_172 = 172
LINE_173 = 173
LINE_174 = 174
LINE_175 = 175
LINE_176 = 176
| 182 |
+
LINE_177 = 177
|
| 183 |
+
LINE_178 = 178
|
| 184 |
+
LINE_179 = 179
|
| 185 |
+
LINE_180 = 180
|
| 186 |
+
LINE_181 = 181
|
| 187 |
+
LINE_182 = 182
|
| 188 |
+
LINE_183 = 183
|
| 189 |
+
LINE_184 = 184
|
| 190 |
+
LINE_185 = 185
|
| 191 |
+
LINE_186 = 186
|
| 192 |
+
LINE_187 = 187
|
| 193 |
+
LINE_188 = 188
|
| 194 |
+
LINE_189 = 189
|
| 195 |
+
LINE_190 = 190
|
| 196 |
+
LINE_191 = 191
|
| 197 |
+
LINE_192 = 192
|
| 198 |
+
LINE_193 = 193
|
| 199 |
+
LINE_194 = 194
|
| 200 |
+
LINE_195 = 195
|
| 201 |
+
LINE_196 = 196
|
| 202 |
+
LINE_197 = 197
|
| 203 |
+
LINE_198 = 198
|
| 204 |
+
LINE_199 = 199
|
| 205 |
+
LINE_200 = 200
|
| 206 |
+
LINE_201 = 201
|
| 207 |
+
LINE_202 = 202
|
| 208 |
+
LINE_203 = 203
|
| 209 |
+
LINE_204 = 204
|
| 210 |
+
LINE_205 = 205
|
| 211 |
+
LINE_206 = 206
|
| 212 |
+
LINE_207 = 207
|
| 213 |
+
LINE_208 = 208
|
| 214 |
+
LINE_209 = 209
|
| 215 |
+
LINE_210 = 210
|
| 216 |
+
LINE_211 = 211
|
| 217 |
+
LINE_212 = 212
|
| 218 |
+
LINE_213 = 213
|
| 219 |
+
LINE_214 = 214
|
| 220 |
+
LINE_215 = 215
|
| 221 |
+
LINE_216 = 216
|
| 222 |
+
LINE_217 = 217
|
| 223 |
+
LINE_218 = 218
|
| 224 |
+
LINE_219 = 219
|
| 225 |
+
LINE_220 = 220
|
| 226 |
+
LINE_221 = 221
|
| 227 |
+
LINE_222 = 222
|
| 228 |
+
LINE_223 = 223
|
| 229 |
+
LINE_224 = 224
|
| 230 |
+
LINE_225 = 225
|
| 231 |
+
LINE_226 = 226
|
| 232 |
+
LINE_227 = 227
|
| 233 |
+
LINE_228 = 228
|
| 234 |
+
LINE_229 = 229
|
| 235 |
+
LINE_230 = 230
|
| 236 |
+
LINE_231 = 231
|
| 237 |
+
LINE_232 = 232
|
| 238 |
+
LINE_233 = 233
|
| 239 |
+
LINE_234 = 234
|
| 240 |
+
LINE_235 = 235
|
| 241 |
+
LINE_236 = 236
|
| 242 |
+
LINE_237 = 237
|
| 243 |
+
LINE_238 = 238
|
| 244 |
+
LINE_239 = 239
|
| 245 |
+
LINE_240 = 240
|
| 246 |
+
LINE_241 = 241
|
| 247 |
+
LINE_242 = 242
|
| 248 |
+
LINE_243 = 243
|
| 249 |
+
LINE_244 = 244
|
| 250 |
+
LINE_245 = 245
|
| 251 |
+
LINE_246 = 246
|
| 252 |
+
LINE_247 = 247
|
| 253 |
+
LINE_248 = 248
|
| 254 |
+
LINE_249 = 249
|
| 255 |
+
LINE_250 = 250
|
| 256 |
+
LINE_251 = 251
|
| 257 |
+
LINE_252 = 252
|
| 258 |
+
LINE_253 = 253
|
| 259 |
+
LINE_254 = 254
|
| 260 |
+
LINE_255 = 255
|
| 261 |
+
LINE_256 = 256
|
| 262 |
+
LINE_257 = 257
|
| 263 |
+
LINE_258 = 258
|
| 264 |
+
LINE_259 = 259
|
| 265 |
+
LINE_260 = 260
|
| 266 |
+
LINE_261 = 261
|
| 267 |
+
LINE_262 = 262
|
| 268 |
+
LINE_263 = 263
|
| 269 |
+
LINE_264 = 264
|
| 270 |
+
LINE_265 = 265
|
| 271 |
+
LINE_266 = 266
|
| 272 |
+
LINE_267 = 267
|
| 273 |
+
LINE_268 = 268
|
| 274 |
+
LINE_269 = 269
|
| 275 |
+
LINE_270 = 270
|
| 276 |
+
LINE_271 = 271
|
| 277 |
+
LINE_272 = 272
|
| 278 |
+
LINE_273 = 273
|
| 279 |
+
LINE_274 = 274
|
| 280 |
+
LINE_275 = 275
|
| 281 |
+
LINE_276 = 276
|
| 282 |
+
LINE_277 = 277
|
| 283 |
+
LINE_278 = 278
|
| 284 |
+
LINE_279 = 279
|
| 285 |
+
LINE_280 = 280
|
| 286 |
+
LINE_281 = 281
|
| 287 |
+
LINE_282 = 282
|
| 288 |
+
LINE_283 = 283
|
| 289 |
+
LINE_284 = 284
|
| 290 |
+
LINE_285 = 285
|
| 291 |
+
LINE_286 = 286
|
| 292 |
+
LINE_287 = 287
|
| 293 |
+
LINE_288 = 288
|
| 294 |
+
LINE_289 = 289
|
| 295 |
+
LINE_290 = 290
|
| 296 |
+
LINE_291 = 291
|
| 297 |
+
LINE_292 = 292
|
| 298 |
+
LINE_293 = 293
|
| 299 |
+
LINE_294 = 294
|
| 300 |
+
LINE_295 = 295
|
| 301 |
+
LINE_296 = 296
|
| 302 |
+
LINE_297 = 297
|
| 303 |
+
LINE_298 = 298
|
| 304 |
+
LINE_299 = 299
|
| 305 |
+
LINE_300 = 300
|
| 306 |
+
LINE_301 = 301
|
| 307 |
+
LINE_302 = 302
|
| 308 |
+
LINE_303 = 303
|
| 309 |
+
LINE_304 = 304
|
| 310 |
+
LINE_305 = 305
|
| 311 |
+
LINE_306 = 306
|
| 312 |
+
LINE_307 = 307
|
| 313 |
+
LINE_308 = 308
|
| 314 |
+
LINE_309 = 309
|
| 315 |
+
LINE_310 = 310
|
| 316 |
+
LINE_311 = 311
|
| 317 |
+
LINE_312 = 312
|
| 318 |
+
LINE_313 = 313
|
| 319 |
+
LINE_314 = 314
|
| 320 |
+
LINE_315 = 315
|
| 321 |
+
LINE_316 = 316
|
| 322 |
+
LINE_317 = 317
|
| 323 |
+
LINE_318 = 318
|
| 324 |
+
LINE_319 = 319
|
| 325 |
+
LINE_320 = 320
|
| 326 |
+
LINE_321 = 321
|
| 327 |
+
LINE_322 = 322
|
| 328 |
+
LINE_323 = 323
|
| 329 |
+
LINE_324 = 324
|
| 330 |
+
LINE_325 = 325
|
| 331 |
+
LINE_326 = 326
|
| 332 |
+
LINE_327 = 327
|
| 333 |
+
LINE_328 = 328
|
| 334 |
+
LINE_329 = 329
|
| 335 |
+
LINE_330 = 330
|
| 336 |
+
LINE_331 = 331
|
| 337 |
+
LINE_332 = 332
|
| 338 |
+
LINE_333 = 333
|
| 339 |
+
LINE_334 = 334
|
| 340 |
+
LINE_335 = 335
|
| 341 |
+
LINE_336 = 336
|
| 342 |
+
LINE_337 = 337
|
| 343 |
+
LINE_338 = 338
|
| 344 |
+
LINE_339 = 339
|
| 345 |
+
LINE_340 = 340
|
| 346 |
+
LINE_341 = 341
|
| 347 |
+
LINE_342 = 342
|
| 348 |
+
LINE_343 = 343
|
| 349 |
+
LINE_344 = 344
|
| 350 |
+
LINE_345 = 345
|
| 351 |
+
LINE_346 = 346
|
| 352 |
+
LINE_347 = 347
|
| 353 |
+
LINE_348 = 348
|
| 354 |
+
LINE_349 = 349
|
| 355 |
+
LINE_350 = 350
|
| 356 |
+
LINE_351 = 351
|
| 357 |
+
LINE_352 = 352
|
| 358 |
+
LINE_353 = 353
|
| 359 |
+
LINE_354 = 354
|
| 360 |
+
LINE_355 = 355
|
| 361 |
+
LINE_356 = 356
|
| 362 |
+
LINE_357 = 357
|
| 363 |
+
LINE_358 = 358
|
| 364 |
+
LINE_359 = 359
|
| 365 |
+
LINE_360 = 360
|
| 366 |
+
LINE_361 = 361
|
| 367 |
+
LINE_362 = 362
|
| 368 |
+
LINE_363 = 363
|
| 369 |
+
LINE_364 = 364
|
| 370 |
+
LINE_365 = 365
|
| 371 |
+
LINE_366 = 366
|
| 372 |
+
LINE_367 = 367
|
| 373 |
+
LINE_368 = 368
|
| 374 |
+
LINE_369 = 369
|
| 375 |
+
LINE_370 = 370
|
| 376 |
+
LINE_371 = 371
|
| 377 |
+
LINE_372 = 372
|
| 378 |
+
LINE_373 = 373
|
| 379 |
+
LINE_374 = 374
|
| 380 |
+
LINE_375 = 375
|
| 381 |
+
LINE_376 = 376
|
| 382 |
+
LINE_377 = 377
|
| 383 |
+
LINE_378 = 378
|
| 384 |
+
LINE_379 = 379
|
| 385 |
+
LINE_380 = 380
|
| 386 |
+
LINE_381 = 381
|
| 387 |
+
LINE_382 = 382
|
| 388 |
+
LINE_383 = 383
|
| 389 |
+
LINE_384 = 384
|
| 390 |
+
LINE_385 = 385
|
| 391 |
+
LINE_386 = 386
|
| 392 |
+
LINE_387 = 387
|
| 393 |
+
LINE_388 = 388
|
| 394 |
+
LINE_389 = 389
|
| 395 |
+
LINE_390 = 390
|
| 396 |
+
LINE_391 = 391
|
| 397 |
+
LINE_392 = 392
|
| 398 |
+
LINE_393 = 393
|
| 399 |
+
LINE_394 = 394
|
| 400 |
+
LINE_395 = 395
|
| 401 |
+
LINE_396 = 396
|
| 402 |
+
LINE_397 = 397
|
| 403 |
+
LINE_398 = 398
|
| 404 |
+
LINE_399 = 399
|
| 405 |
+
LINE_400 = 400
|
| 406 |
+
LINE_401 = 401
|
| 407 |
+
LINE_402 = 402
|
| 408 |
+
LINE_403 = 403
|
| 409 |
+
LINE_404 = 404
|
| 410 |
+
LINE_405 = 405
|
| 411 |
+
LINE_406 = 406
|
| 412 |
+
LINE_407 = 407
|
| 413 |
+
LINE_408 = 408
|
| 414 |
+
LINE_409 = 409
|
| 415 |
+
LINE_410 = 410
|
| 416 |
+
LINE_411 = 411
|
| 417 |
+
LINE_412 = 412
|
| 418 |
+
LINE_413 = 413
|
| 419 |
+
LINE_414 = 414
|
| 420 |
+
LINE_415 = 415
|
| 421 |
+
LINE_416 = 416
|
| 422 |
+
LINE_417 = 417
|
| 423 |
+
LINE_418 = 418
|
| 424 |
+
LINE_419 = 419
|
| 425 |
+
LINE_420 = 420
|
| 426 |
+
LINE_421 = 421
|
| 427 |
+
LINE_422 = 422
|
| 428 |
+
LINE_423 = 423
|
| 429 |
+
LINE_424 = 424
|
| 430 |
+
LINE_425 = 425
|
| 431 |
+
LINE_426 = 426
|
| 432 |
+
LINE_427 = 427
|
| 433 |
+
LINE_428 = 428
|
| 434 |
+
LINE_429 = 429
|
| 435 |
+
LINE_430 = 430
|
| 436 |
+
|
| 437 |
+
|
| 438 |
+
def helper_alpha() -> int:
|
| 439 |
+
return LINE_10 + LINE_20
|
| 440 |
+
|
| 441 |
+
|
| 442 |
+
def helper_beta() -> int:
|
| 443 |
+
return helper_alpha()
|
| 444 |
+
|
| 445 |
+
|
| 446 |
+
class GiantService:
|
| 447 |
+
def run(self) -> int:
|
| 448 |
+
return helper_beta()
|
| 449 |
+
|
| 450 |
+
|
| 451 |
+
def auto_func_1() -> int:
|
| 452 |
+
return 1
|
| 453 |
+
|
| 454 |
+
|
| 455 |
+
def auto_func_2() -> int:
|
| 456 |
+
return 2
|
| 457 |
+
|
| 458 |
+
|
| 459 |
+
def auto_func_3() -> int:
|
| 460 |
+
return 3
|
| 461 |
+
|
| 462 |
+
|
| 463 |
+
def auto_func_4() -> int:
|
| 464 |
+
return 4
|
| 465 |
+
|
| 466 |
+
|
| 467 |
+
def auto_func_5() -> int:
|
| 468 |
+
return 5
|
| 469 |
+
|
| 470 |
+
|
| 471 |
+
def auto_func_6() -> int:
|
| 472 |
+
return 6
|
| 473 |
+
|
| 474 |
+
|
| 475 |
+
def auto_func_7() -> int:
|
| 476 |
+
return 7
|
| 477 |
+
|
| 478 |
+
|
| 479 |
+
def auto_func_8() -> int:
|
| 480 |
+
return 8
|
| 481 |
+
|
| 482 |
+
|
| 483 |
+
def auto_func_9() -> int:
|
| 484 |
+
return 9
|
| 485 |
+
|
| 486 |
+
|
| 487 |
+
def auto_func_10() -> int:
|
| 488 |
+
return 10
|
| 489 |
+
|
| 490 |
+
|
| 491 |
+
def auto_func_11() -> int:
|
| 492 |
+
return 11
|
| 493 |
+
|
| 494 |
+
|
| 495 |
+
def auto_func_12() -> int:
|
| 496 |
+
return 12
|
| 497 |
+
|
| 498 |
+
|
| 499 |
+
def auto_func_13() -> int:
|
| 500 |
+
return 13
|
| 501 |
+
|
| 502 |
+
|
| 503 |
+
def auto_func_14() -> int:
|
| 504 |
+
return 14
|
| 505 |
+
|
| 506 |
+
|
| 507 |
+
def auto_func_15() -> int:
|
| 508 |
+
return 15
|
| 509 |
+
|
| 510 |
+
|
| 511 |
+
def auto_func_16() -> int:
|
| 512 |
+
return 16
|
| 513 |
+
|
| 514 |
+
|
| 515 |
+
def auto_func_17() -> int:
|
| 516 |
+
return 17
|
| 517 |
+
|
| 518 |
+
|
| 519 |
+
def auto_func_18() -> int:
|
| 520 |
+
return 18
|
| 521 |
+
|
| 522 |
+
|
| 523 |
+
def auto_func_19() -> int:
|
| 524 |
+
return 19
|
| 525 |
+
|
| 526 |
+
|
| 527 |
+
def auto_func_20() -> int:
|
| 528 |
+
return 20
|
| 529 |
+
|
| 530 |
+
|
| 531 |
+
def auto_func_21() -> int:
|
| 532 |
+
return 21
|
| 533 |
+
|
| 534 |
+
|
| 535 |
+
def auto_func_22() -> int:
|
| 536 |
+
return 22
|
| 537 |
+
|
| 538 |
+
|
| 539 |
+
def auto_func_23() -> int:
|
| 540 |
+
return 23
|
| 541 |
+
|
| 542 |
+
|
| 543 |
+
def auto_func_24() -> int:
|
| 544 |
+
return 24
|
| 545 |
+
|
| 546 |
+
|
| 547 |
+
def auto_func_25() -> int:
|
| 548 |
+
return 25
|
| 549 |
+
|
| 550 |
+
|
| 551 |
+
def auto_func_26() -> int:
|
| 552 |
+
return 26
|
| 553 |
+
|
| 554 |
+
|
| 555 |
+
def auto_func_27() -> int:
|
| 556 |
+
return 27
|
| 557 |
+
|
| 558 |
+
|
| 559 |
+
def auto_func_28() -> int:
|
| 560 |
+
return 28
|
| 561 |
+
|
| 562 |
+
|
| 563 |
+
def auto_func_29() -> int:
|
| 564 |
+
return 29
|
| 565 |
+
|
| 566 |
+
|
| 567 |
+
def auto_func_30() -> int:
|
| 568 |
+
return 30
|
| 569 |
+
|
| 570 |
+
|
| 571 |
+
def auto_func_31() -> int:
|
| 572 |
+
return 31
|
| 573 |
+
|
| 574 |
+
|
| 575 |
+
def auto_func_32() -> int:
|
| 576 |
+
return 32
|
| 577 |
+
|
| 578 |
+
|
| 579 |
+
def auto_func_33() -> int:
|
| 580 |
+
return 33
|
| 581 |
+
|
| 582 |
+
|
| 583 |
+
def auto_func_34() -> int:
|
| 584 |
+
return 34
|
| 585 |
+
|
| 586 |
+
|
| 587 |
+
def auto_func_35() -> int:
|
| 588 |
+
return 35
|
| 589 |
+
|
| 590 |
+
|
| 591 |
+
def auto_func_36() -> int:
|
| 592 |
+
return 36
|
| 593 |
+
|
| 594 |
+
|
| 595 |
+
def auto_func_37() -> int:
|
| 596 |
+
return 37
|
| 597 |
+
|
| 598 |
+
|
| 599 |
+
def auto_func_38() -> int:
|
| 600 |
+
return 38
|
| 601 |
+
|
| 602 |
+
|
| 603 |
+
def auto_func_39() -> int:
|
| 604 |
+
return 39
|
| 605 |
+
|
| 606 |
+
|
| 607 |
+
def auto_func_40() -> int:
|
| 608 |
+
return 40
|
| 609 |
+
|
| 610 |
+
|
| 611 |
+
def auto_func_41() -> int:
|
| 612 |
+
return 41
|
| 613 |
+
|
| 614 |
+
|
| 615 |
+
def auto_func_42() -> int:
|
| 616 |
+
return 42
|
| 617 |
+
|
| 618 |
+
|
| 619 |
+
def auto_func_43() -> int:
|
| 620 |
+
return 43
|
| 621 |
+
|
| 622 |
+
|
| 623 |
+
def auto_func_44() -> int:
|
| 624 |
+
return 44
|
| 625 |
+
|
| 626 |
+
|
| 627 |
+
def auto_func_45() -> int:
|
| 628 |
+
return 45
|
code-review-env/sample_project/inventory.py
ADDED
@@ -0,0 +1,10 @@
from validators import is_non_empty


STOCK = {"widget": 4, "gizmo": 0}


def is_available(item_name: str) -> bool:
    if not is_non_empty(item_name):
        return False
    return STOCK.get(item_name, 0) > 0
code-review-env/sample_project/notifications.py
ADDED
@@ -0,0 +1,6 @@
import smtplib


def send_email(recipient: str, body: str) -> None:
    client = smtplib.SMTP("localhost")
    client.sendmail("noreply@example.com", [recipient], body)
code-review-env/sample_project/payments.py
ADDED
@@ -0,0 +1,15 @@
"""Payment gateway wrapper."""

import subprocess


def run_gateway_check(endpoint: str) -> int:
    # SECURITY ISSUE: user-provided endpoint is interpolated in a shell command.
    command = f"curl -s {endpoint}"
    return subprocess.call(command, shell=True)


def charge(total: float) -> str:
    if total <= 0:
        return "rejected"
    return "charged"
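
The sample `run_gateway_check` deliberately interpolates `endpoint` into a shell string, so an input such as `x; rm -rf ~` would run a second command. A minimal sketch of the non-vulnerable shape a reviewer might suggest passes curl an argument list with the default `shell=False`, so no shell ever parses the input. The http(s)-only allow-list and the names `build_gateway_command` / `run_gateway_check_safe` are assumptions for illustration, not part of the sample project.

```python
import subprocess
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}  # assumption: the gateway is reached over HTTP(S)


def build_gateway_command(endpoint: str) -> list[str]:
    """Return curl's argv as a list, rejecting anything that is not an http(s) URL."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ALLOWED_SCHEMES or not parsed.netloc:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    # The URL travels as a single argv element, so shell metacharacters stay inert.
    return ["curl", "-s", endpoint]


def run_gateway_check_safe(endpoint: str) -> int:
    # With a list and the default shell=False, no shell ever parses `endpoint`.
    return subprocess.call(build_gateway_command(endpoint))
```

Separating command construction from execution also makes the validation testable without actually invoking curl.
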
code-review-env/sample_project/utils.py
ADDED
@@ -0,0 +1,7 @@
from inventory import is_available


def pick_item(preferred: str, fallback: str) -> str:
    if is_available(preferred):
        return preferred
    return fallback
code-review-env/sample_project/validators.py
ADDED
@@ -0,0 +1,8 @@

def is_non_empty(value: str | None) -> bool:
    return value is not None and value.strip() != ""


def validate_coupon(code: str | None) -> bool:
    # Intentional bug: accepts invalid short code when value is None
    return (code or "").startswith("SAVE")
code-review-env/tests/test_phase2_graph_manager.py
ADDED
@@ -0,0 +1,32 @@
from pathlib import Path

from db.seed import seed_project
from graph.graph_manager import GraphManager


def test_graph_manager_traversal_is_deterministic(tmp_path: Path) -> None:
    db_path = tmp_path / "phase2_graph.db"
    seed_project(Path("sample_project"), db_path=str(db_path), force=True)

    manager = GraphManager(source_root="sample_project", db_path=db_path)
    first = manager.traversal_order()
    second = manager.traversal_order()

    assert first == second
    assert len(first) > 0


def test_graph_manager_neighbor_queries(tmp_path: Path) -> None:
    db_path = tmp_path / "phase2_graph_neighbors.db"
    seed_project(Path("sample_project"), db_path=str(db_path), force=True)

    manager = GraphManager(source_root="sample_project", db_path=db_path)
    graph = manager.load_graph()
    candidate = next(iter(graph.nodes()))

    both = manager.get_neighbors(candidate, direction="both")
    only_out = manager.get_neighbors(candidate, direction="out")
    only_in = manager.get_neighbors(candidate, direction="in")

    assert set(only_out).issubset(set(both))
    assert set(only_in).issubset(set(both))
code-review-env/tests/test_phase2_observation.py
ADDED
@@ -0,0 +1,55 @@
from pathlib import Path

import pytest

from db.seed import seed_project
from env.observation import CodeObservation
from env.observation_builder import ObservationBuilder


def test_code_observation_strict_rejects_bad_types() -> None:
    with pytest.raises(Exception):
        CodeObservation(
            module_id="checkout",
            code="print('x')",
            ast_summary={},
            dependency_summaries=[],
            dependent_summaries=[],
            neighbor_reviews=[],
            task_description="review",
            available_actions=[],
            requested_context=None,
            token_usage={},
            total_tokens="100",  # type: ignore[arg-type]
            within_budget=True,
        )


def test_observation_builder_within_budget(tmp_path: Path) -> None:
    db_path = tmp_path / "phase2_obs.db"
    seed_project(Path("sample_project"), db_path=str(db_path), force=True)

    builder = ObservationBuilder(source_root="sample_project", db_path=db_path)
    observation = builder.build(
        module_id="checkout",
        task_description="Find logic and dependency issues",
    )

    assert observation.within_budget is True
    assert observation.total_tokens <= 2000
    assert observation.module_id == "checkout"


def test_request_context_is_bounded(tmp_path: Path) -> None:
    db_path = tmp_path / "phase2_context.db"
    seed_project(Path("sample_project"), db_path=str(db_path), force=True)

    builder = ObservationBuilder(source_root="sample_project", db_path=db_path)
    observation = builder.build(
        module_id="checkout",
        task_description="Investigate dependencies",
        context_request="auth",
    )

    assert observation.requested_context is not None
    assert observation.total_tokens <= 2000
code-review-env/tests/test_phase2_token_budget.py
ADDED
@@ -0,0 +1,42 @@
from graph.token_budget import MAX_TOTAL_TOKENS, TokenBudget


def test_token_budget_enforces_hard_cap() -> None:
    budget = TokenBudget()
    huge = "x" * 50000

    result = budget.enforce(
        {
            "code": huge,
            "ast_summary_text": huge,
            "dependency_summaries": [huge, huge],
            "dependent_summaries": [huge],
            "neighbor_reviews": [huge],
            "task_description": huge,
            "available_actions": ["FLAG_BUG"],
            "requested_context_code": huge,
        }
    )

    assert result.total_tokens <= MAX_TOTAL_TOKENS


def test_token_budget_marks_truncation() -> None:
    budget = TokenBudget()
    huge = "z" * 20000

    result = budget.enforce(
        {
            "code": huge,
            "ast_summary_text": "{}",
            "dependency_summaries": [],
            "dependent_summaries": [],
            "neighbor_reviews": [],
            "task_description": "task",
            "available_actions": ["REQUEST_CONTEXT"],
            "requested_context_code": huge,
        }
    )

    trimmed_code = str(result.payload["code"])
    assert "[TRUNCATED]" in trimmed_code
code-review-env/tests/test_seed.py
ADDED
@@ -0,0 +1,30 @@
from pathlib import Path

from db.seed import seed_project
from parser.ast_parser import parse_python_file
from parser.chunker import chunk_module


def test_seed_project_uses_hash_cache(tmp_path: Path) -> None:
    db_path = tmp_path / "seed.db"
    target = Path("sample_project")

    first = seed_project(target, db_path=str(db_path), force=False)
    second = seed_project(target, db_path=str(db_path), force=False)

    assert first["loaded_from_cache"] is False
    assert second["loaded_from_cache"] is True
    assert first["node_count"] == second["node_count"]
    assert first["edge_count"] == second["edge_count"]


def test_chunker_splits_large_module_into_sub_nodes() -> None:
    root = Path("sample_project")
    parsed = parse_python_file(root / "huge_module.py", root)
    chunked = chunk_module(parsed, max_lines=300)

    assert chunked.parent.module_id == "huge_module"
    assert chunked.parent.code == ""
    assert len(chunked.chunks) >= 2
    assert all(chunk.parent_module_id == "huge_module" for chunk in chunked.chunks)
    assert any("::helper_alpha" in chunk.module_id for chunk in chunked.chunks)
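
The chunker test expects a module longer than `max_lines` to become sub-nodes whose ids look like `huge_module::helper_alpha` and which point back to their parent. One way to get that shape with only the standard-library `ast` module is to emit one chunk per top-level function or class, using the `lineno`/`end_lineno` positions on each node. This is a minimal sketch under that assumption; `Chunk` and `split_top_level` are hypothetical names, not the commit's parser/chunker.py.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    module_id: str          # "<parent>::<definition name>"
    parent_module_id: str
    start_line: int
    end_line: int
    code: str


def split_top_level(source: str, parent_id: str, max_lines: int = 300) -> list[Chunk]:
    """Return [] for small modules; otherwise one chunk per top-level def/class."""
    lines = source.splitlines()
    if len(lines) <= max_lines:
        return []
    chunks: list[Chunk] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # end_lineno is populated on parsed nodes since Python 3.8.
            start, end = node.lineno, node.end_lineno or node.lineno
            chunks.append(
                Chunk(
                    module_id=f"{parent_id}::{node.name}",
                    parent_module_id=parent_id,
                    start_line=start,
                    end_line=end,
                    code="\n".join(lines[start - 1 : end]),
                )
            )
    return chunks
```

In this sketch, module-level assignments such as the `LINE_N` constants belong to no chunk; a real chunker would need its own policy for them.
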
plans/phase-02-graph-manager-observation-plan.md
ADDED
@@ -0,0 +1,206 @@
# Phase 2 Plan — Graph Manager & Observation Builder (for GPT-5.3)

## Objective
Deliver Phase 2 only:
- graph/graph_manager.py: load graph from SQLite, traversal order, neighbor queries
- graph/token_budget.py: hard 2000-token enforcement with per-component limits
- env/observation.py: strict Pydantic v2 CodeObservation model

No Phase 3+ implementation in this phase.

## Context7-Validated Constraints To Use
1. SQLAlchemy 2.0 + SQLite:
   - Use SQLAlchemy ORM patterns with Declarative models and explicit Session boundaries.
   - Keep read-heavy graph fetches in short-lived sessions.

2. NetworkX traversal and determinism:
   - Use DAG topological utilities when possible.
   - Use deterministic ordering (lexicographical tie-breaking) to avoid run-to-run drift.
   - Betweenness centrality is available for ranking high-impact nodes.

3. Pydantic v2 model strictness:
   - Use BaseModel with strict config and forbid unknown fields.
   - Use model_validate/model_dump APIs consistently.
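
Constraint 1 asks for read-heavy graph fetches in short-lived sessions. As a dependency-free illustration of the same lifetime rule, the sketch below uses the standard-library `sqlite3` module instead of SQLAlchemy; the `modules` and `edges` tables and the `module_id`, `source_id`, `target_id` columns are hypothetical, not the project's db/schema.py. One detail worth knowing: using a `sqlite3.Connection` directly in a `with` block only commits or rolls back the transaction and leaves the connection open, so `contextlib.closing` is what actually ends its lifetime.

```python
import sqlite3
from contextlib import closing


def fetch_graph(db_path: str) -> tuple[list[str], list[tuple[str, str]]]:
    """Read all module ids and edges inside one short-lived connection."""
    # closing() guarantees conn.close(); `with conn:` alone would only manage the transaction.
    with closing(sqlite3.connect(db_path)) as conn:
        modules = [row[0] for row in conn.execute("SELECT module_id FROM modules ORDER BY module_id")]
        edges = [
            (src, dst)
            for src, dst in conn.execute(
                "SELECT source_id, target_id FROM edges ORDER BY source_id, target_id"
            )
        ]
    return modules, edges
```

The explicit `ORDER BY` clauses also keep the loaded node and edge order stable, which supports the determinism constraint in item 2.
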

## Current Codebase Reality (important for Phase 2)
1. Existing graph logic is in env/graph.py, not graph/graph_manager.py.
2. env/observation_builder.py and env/models.py are placeholders.
3. DB layer currently uses SQLModel schema classes in db/schema.py.

Implication: Phase 2 should add the target files while preserving compatibility with existing imports/tests where possible.

## Proposed Phase 2 Deliverables

### 1) Create graph package and GraphManager
Files:
- code-review-env/graph/__init__.py
- code-review-env/graph/graph_manager.py

Planned API:
- class GraphManager:
  - __init__(self, source_root: str, db_path: str | None = None)
  - load_graph(self) -> nx.DiGraph
  - get_node(self, module_id: str) -> dict[str, object]
  - get_neighbors(self, module_id: str, direction: Literal["out", "in", "both"], limit: int | None = None) -> list[str]
  - traversal_order(self) -> list[str]
  - centrality(self) -> dict[str, float]
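
A dependency-free sketch of the neighbor-query part of this API, using plain adjacency sets instead of `nx.DiGraph` so the direction semantics are easy to see. It assumes an edge `A -> B` means module A depends on (imports) module B, so `"out"` returns dependencies and `"in"` returns dependents; results are sorted so that `limit` truncation is deterministic. `SimpleGraph` is a hypothetical stand-in, not the commit's GraphManager.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Literal

Direction = Literal["out", "in", "both"]


class SimpleGraph:
    """Adjacency-set digraph; edge (a, b) means `a` depends on `b` (assumption)."""

    def __init__(self, edges: list[tuple[str, str]]) -> None:
        self._out: dict[str, set[str]] = defaultdict(set)
        self._in: dict[str, set[str]] = defaultdict(set)
        self.nodes: set[str] = set()
        for src, dst in edges:
            self._out[src].add(dst)
            self._in[dst].add(src)
            self.nodes.update((src, dst))

    def get_neighbors(self, module_id: str, direction: Direction, limit: int | None = None) -> list[str]:
        if module_id not in self.nodes:
            raise KeyError(module_id)
        found: set[str] = set()
        if direction in ("out", "both"):
            found |= self._out.get(module_id, set())
        if direction in ("in", "both"):
            found |= self._in.get(module_id, set())
        ordered = sorted(found)  # stable order before any truncation
        return ordered if limit is None else ordered[:limit]
```

Because `"both"` is the union of the other two directions, the subset assertions in test_phase2_graph_manager.py hold by construction.
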

Implementation rules:
- Load modules/edges from SQLite as source of truth.
- Add all module metadata needed for observations as node attributes.
- traversal_order target behavior:
  - Prefer leaf-first review order.
  - Push high-centrality nodes later.
  - Deterministic tie-breaker by module_id.
- Recommended approach:
  - Reverse-edge DAG ordering for leaf-first when acyclic.
  - If cyclic, condense SCCs or apply stable fallback ordering by:
    1) out_degree ascending
    2) betweenness centrality ascending
    3) module_id ascending
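
The ordering rules above can be prototyped without NetworkX. The sketch below runs Kahn's algorithm on dependency edges so a module is emitted only after all of its dependencies (leaf-first), uses a heap keyed on `(centrality, module_id)` so ties resolve deterministically and hubs drift later, and appends nodes stuck on cycles using the stated fallback key `(out_degree, centrality, module_id)`. Centrality is taken as a precomputed input (it could come from `networkx.betweenness_centrality`); the edge direction and the choice to flatten cycles rather than condense SCCs are assumptions.

```python
from __future__ import annotations

import heapq


def traversal_order(
    nodes: list[str],
    edges: list[tuple[str, str]],
    centrality: dict[str, float] | None = None,
) -> list[str]:
    """Leaf-first order; edge (a, b) means `a` depends on `b` (assumption)."""
    score = centrality or {}
    deps_left = {n: 0 for n in nodes}                    # unmet dependencies per node
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        deps_left[src] += 1
        dependents[dst].append(src)
    out_degree = dict(deps_left)                         # snapshot for the cycle fallback

    def key(n: str) -> tuple[float, str]:
        return (score.get(n, 0.0), n)

    ready = [key(n) for n in nodes if deps_left[n] == 0]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        _, node = heapq.heappop(ready)
        order.append(node)
        for parent in dependents[node]:
            deps_left[parent] -= 1
            if deps_left[parent] == 0:
                heapq.heappush(ready, key(parent))

    # Nodes never released sit on (or behind) a cycle: apply the stable fallback.
    emitted = set(order)
    leftover = sorted(
        (n for n in nodes if n not in emitted),
        key=lambda n: (out_degree[n], score.get(n, 0.0), n),
    )
    return order + leftover
```

Because every choice goes through an explicit sort key, the result does not depend on the order in which nodes or edges are supplied.
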

Compatibility note:
- Keep env/graph.py as a thin wrapper or adapter to GraphManager until all callers migrate.

### 2) Implement hard token budget module
File:
- code-review-env/graph/token_budget.py

Constants:
- MAX_TOTAL_TOKENS = 2000
- COMPONENT_BUDGETS (initial defaults from plan):
  - current_code: 800
  - ast_summary: 100
  - direct_deps: 250
  - dependents: 150
  - neighbor_reviews: 120
  - task_and_actions: 200
  - buffer: 280

Planned API:
- estimate_tokens(text: str) -> int
- truncate_to_budget(text: str, max_tokens: int, suffix_notice: str) -> str
- allocate_budget(components: dict[str, str | list[str]]) -> dict[str, object]
  - returns included/truncated text + per-component token usage + total
- enforce_observation_budget(observation_payload: dict[str, object]) -> dict[str, object]
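
A minimal sketch of the first two planned functions plus a simplified `allocate_budget`, assuming the chars/4 heuristic from the estimator policy and the `[TRUNCATED]` marker that test_phase2_token_budget.py looks for. It also makes the budget arithmetic explicit: the seven defaults sum to 800 + 100 + 250 + 150 + 120 + 200 + 280 = 1900, which leaves 100 tokens of headroom under `MAX_TOTAL_TOKENS`. Joining list components with newlines and the default value for `suffix_notice` are assumptions.

```python
from __future__ import annotations

import math

MAX_TOTAL_TOKENS = 2000
COMPONENT_BUDGETS = {
    "current_code": 800,
    "ast_summary": 100,
    "direct_deps": 250,
    "dependents": 150,
    "neighbor_reviews": 120,
    "task_and_actions": 200,
    "buffer": 280,
}
# The defaults sum to 1900, so filling every component still stays under the hard cap.
assert sum(COMPONENT_BUDGETS.values()) == 1900 <= MAX_TOTAL_TOKENS

TRUNCATION_NOTICE = "\n[TRUNCATED]"


def estimate_tokens(text: str) -> int:
    """Deterministic chars/4 approximation, rounded up."""
    return math.ceil(len(text) / 4)


def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str = TRUNCATION_NOTICE) -> str:
    """Return `text` if it fits, else a prefix plus the notice that fits in max_tokens."""
    if estimate_tokens(text) <= max_tokens:
        return text
    notice_tokens = estimate_tokens(suffix_notice)
    if max_tokens < notice_tokens:
        # No room for the notice: fall back to a bare prefix that still fits.
        return text[: max(0, max_tokens) * 4]
    keep_chars = (max_tokens - notice_tokens) * 4
    return text[:keep_chars] + suffix_notice


def allocate_budget(components: dict[str, str | list[str]]) -> dict[str, object]:
    """Cap each named component (must be a COMPONENT_BUDGETS key) and report usage."""
    included: dict[str, str] = {}
    usage: dict[str, int] = {}
    for name, value in components.items():
        text = "\n".join(value) if isinstance(value, list) else value
        included[name] = truncate_to_budget(text, COMPONENT_BUDGETS[name])
        usage[name] = estimate_tokens(included[name])
    return {"included": included, "token_usage": usage, "total_tokens": sum(usage.values())}
```
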

Implementation rules:
- Budget must be enforced, never advisory.
- If the full payload exceeds 2000 tokens, trim in this priority order:
  1) dependent summaries
  2) neighbor reviews
  3) direct dependency summaries (lowest-ranked first)
  4) current code (but preserve the critical context header + truncation notice)
- The REQUEST_CONTEXT path must still obey MAX_TOTAL_TOKENS and return the full neighbor code only when it fits; otherwise it returns bounded code + an explicit truncation marker.
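
The trim order can be expressed as a loop that removes whole list items from the lowest-priority component first and only then cuts into the current code. This sketch assumes the payload keys shown (`header`, `code`, `dependent_summaries`, `neighbor_reviews`, `dependency_summaries`), that dependency summaries arrive sorted best-ranked first so popping from the end drops the lowest-ranked one, and the same chars/4 estimate; the `header` field stands in for the "critical context header" and is hypothetical.

```python
import math

MAX_TOTAL_TOKENS = 2000
NOTICE = "\n[TRUNCATED]"
# Trimmed first to last. dependency_summaries is assumed sorted best-ranked first,
# so popping from the end drops the lowest-ranked summary.
TRIM_ORDER = ("dependent_summaries", "neighbor_reviews", "dependency_summaries")


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)


def payload_tokens(payload: dict) -> int:
    total = estimate_tokens(payload["header"]) + estimate_tokens(payload["code"])
    for key in TRIM_ORDER:
        total += sum(estimate_tokens(item) for item in payload[key])
    return total


def enforce_priority_budget(payload: dict, limit: int = MAX_TOTAL_TOKENS) -> dict:
    """Return a trimmed copy of `payload` whose estimated size is <= limit.

    If the header plus the notice alone exceed `limit`, this sketch cannot meet the cap.
    """
    trimmed = {**payload, **{key: list(payload[key]) for key in TRIM_ORDER}}
    # Steps 1-3: drop whole items from the end of each list, lowest priority first.
    for key in TRIM_ORDER:
        while trimmed[key] and payload_tokens(trimmed) > limit:
            trimmed[key].pop()
    # Step 4: cut the code, keeping the header intact and appending a visible notice.
    overflow = payload_tokens(trimmed) - limit
    if overflow > 0:
        keep_tokens = estimate_tokens(trimmed["code"]) - overflow - estimate_tokens(NOTICE)
        trimmed["code"] = trimmed["code"][: max(0, keep_tokens) * 4] + NOTICE
    return trimmed
```

Recomputing the total after every pop is quadratic in the number of items, which is acceptable for the handful of neighbors an observation carries.
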

Token estimator policy:
- Start with a deterministic approximation for stability (for example, a chars/4 heuristic).
- Keep the estimator in one function so it can later be swapped for a model-specific tokenizer without an API break.
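
One way to keep the estimator in a single swappable place is to inject it as a plain callable. In this sketch `TokenCounter`, `chars_div_4`, and `BudgetChecker` are hypothetical names; the default is the chars/4 heuristic, and the alternative used in the usage lines is a crude whitespace splitter chosen only to demonstrate the swap, not a real model tokenizer.

```python
import math
from typing import Callable

TokenCounter = Callable[[str], int]


def chars_div_4(text: str) -> int:
    """Default deterministic estimate: ceil(len / 4)."""
    return math.ceil(len(text) / 4)


class BudgetChecker:
    def __init__(self, max_tokens: int = 2000, counter: TokenCounter = chars_div_4) -> None:
        self.max_tokens = max_tokens
        self.count = counter  # every size check goes through this one function

    def fits(self, *parts: str) -> bool:
        return sum(self.count(p) for p in parts) <= self.max_tokens


default = BudgetChecker()
words = BudgetChecker(max_tokens=3, counter=lambda t: len(t.split()))
```

Callers only depend on `fits` and `count`, so replacing `chars_div_4` with a model-specific tokenizer later would not change the public API.
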

### 3) Implement strict Pydantic observation model
File:
- code-review-env/env/observation.py

Planned models:
- class NeighborSummary(BaseModel)
  - module_id: str
  - relation: Literal["dependency", "dependent"]
  - summary: str
  - review_snippet: str | None

- class RequestedContext(BaseModel)
  - module_id: str
  - code: str
  - was_truncated: bool

- class CodeObservation(BaseModel)
  - module_id: str
  - code: str
  - ast_summary: dict[str, object]
  - dependency_summaries: list[NeighborSummary]
  - dependent_summaries: list[NeighborSummary]
  - neighbor_reviews: list[str]
  - task_description: str
  - available_actions: list[str]
  - requested_context: RequestedContext | None = None
  - token_usage: dict[str, int]
  - total_tokens: int
  - within_budget: bool

Model config:
- strict=True
- extra="forbid"

Validation rules:
- total_tokens <= 2000 must be true.
- module_id and code cannot be empty.
- dependency/dependent list limits enforced before serialization.
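
A sketch of the planned models in Pydantic v2, assuming a shared base class carrying `ConfigDict(strict=True, extra="forbid")` and expressing the first two validation rules as `Field` constraints (`le=2000`, `min_length=1`). Two small deviations are assumptions: `dict[str, Any]` replaces `dict[str, object]` for `ast_summary`, and `Optional[...]` replaces `X | None` so the block also imports on Python 3.9. List-length limits are left to the builder, as the plan states.

```python
from typing import Any, Literal, Optional

from pydantic import BaseModel, ConfigDict, Field


class StrictModel(BaseModel):
    # strict=True disables coercion (e.g. "100" -> 100); extra="forbid" rejects unknown keys.
    model_config = ConfigDict(strict=True, extra="forbid")


class NeighborSummary(StrictModel):
    module_id: str
    relation: Literal["dependency", "dependent"]
    summary: str
    review_snippet: Optional[str]


class RequestedContext(StrictModel):
    module_id: str
    code: str
    was_truncated: bool


class CodeObservation(StrictModel):
    module_id: str = Field(min_length=1)  # "module_id and code cannot be empty"
    code: str = Field(min_length=1)
    ast_summary: dict[str, Any]
    dependency_summaries: list[NeighborSummary]
    dependent_summaries: list[NeighborSummary]
    neighbor_reviews: list[str]
    task_description: str
    available_actions: list[str]
    requested_context: Optional[RequestedContext] = None
    token_usage: dict[str, int]
    total_tokens: int = Field(le=2000)  # "total_tokens <= 2000 must be true"
    within_budget: bool
```

With this config, the `total_tokens="100"` case in test_phase2_observation.py raises `pydantic.ValidationError` rather than being coerced to an integer.
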

### 4) Observation assembly integration path
File to update in Phase 2:
- code-review-env/env/observation_builder.py

Plan:
- Replace the placeholder with a builder that composes:
  - GraphManager neighbor and ordering queries
  - DB-backed module source + summaries + review annotations
  - TokenBudget allocation and enforcement
  - CodeObservation validation

Behavior:
- The default observation returns the current module + compressed neighbors.
- REQUEST_CONTEXT(module_id): include the requested neighbor code in requested_context while still meeting the global budget.
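
For REQUEST_CONTEXT, the requested code has to fit into whatever the default observation leaves unused. A minimal sketch, assuming the chars/4 estimate and the truncate-with-marker answer to design question 3 (the plan leaves open whether to truncate or fail); `build_requested_context` is a hypothetical helper returning the three `RequestedContext` fields as a dict.

```python
import math

MAX_TOTAL_TOKENS = 2000
MARKER = "\n[TRUNCATED]"


def estimate_tokens(text: str) -> int:
    return math.ceil(len(text) / 4)


def build_requested_context(module_id: str, code: str, used_tokens: int) -> dict[str, object]:
    """Fit `code` into the tokens left after the base observation used `used_tokens`."""
    remaining = MAX_TOTAL_TOKENS - used_tokens
    if estimate_tokens(code) <= remaining:
        return {"module_id": module_id, "code": code, "was_truncated": False}
    if remaining < estimate_tokens(MARKER):
        # Not even the marker fits: return an empty body and flag the truncation.
        return {"module_id": module_id, "code": "", "was_truncated": True}
    # Keep the longest prefix that, together with the marker, stays within `remaining`.
    keep_chars = (remaining - estimate_tokens(MARKER)) * 4
    return {"module_id": module_id, "code": code[:keep_chars] + MARKER, "was_truncated": True}
```
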
|
| 153 |
+
|
| 154 |
+
## Verification Plan (must pass before Phase 2 complete)
|
| 155 |
+
|
| 156 |
+
### A) Unit tests to add/update
|
| 157 |
+
1. tests/test_graph_manager_phase2.py
|
| 158 |
+
- load_graph builds expected node/edge counts from seeded DB.
|
| 159 |
+
- traversal_order places leaf nodes earlier than high-centrality hubs.
|
| 160 |
+
- ordering is deterministic across repeated calls.
|
| 161 |
+
|
| 162 |
+
2. tests/test_token_budget_phase2.py
|
| 163 |
+
- enforce_observation_budget always returns total_tokens <= 2000.
|
| 164 |
+
- long current code is truncated with explicit notice.
|
| 165 |
+
- REQUEST_CONTEXT path stays within 2000.
|
| 166 |
+
|
| 167 |
+
3. tests/test_observation_phase2.py
|
| 168 |
+
- CodeObservation strict validation rejects unknown fields/type coercion.
|
| 169 |
+
- valid payload serializes with model_dump and preserves token fields.
### B) Scenario checks
1. Seed sample_project SQLite DB.
2. Build observation for every module_id in modules table.
3. Assert all observations are within budget.
4. Trigger REQUEST_CONTEXT for high-fanout node and validate bounded response.
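Scenario steps 1 to 3 can be sketched against an in-memory SQLite database. The `modules` table layout, the seeded sources, and `build_observation` are hypothetical stand-ins for the seeded sample_project DB and the real builder. Only the seed, build, and assert shape of the check is meant to carry over.

```python
import sqlite3

BUDGET = 2000


def estimate_tokens(text: str) -> int:
    # Assumed estimator: whitespace word count.
    return len(text.split())


def seed(conn: sqlite3.Connection) -> None:
    # Hypothetical schema and sources standing in for the seeded sample_project DB.
    conn.execute("CREATE TABLE modules (module_id TEXT PRIMARY KEY, source TEXT NOT NULL)")
    sources = {
        "config": "DEBUG = False",
        "cart": "def total(items): return sum(i.price for i in items)",
        # Large enough to force truncation (4 tokens x 800 lines).
        "huge_module": "\n".join(f"def f{i}(): return {i}" for i in range(800)),
    }
    conn.executemany("INSERT INTO modules VALUES (?, ?)", sources.items())
    conn.commit()


def build_observation(conn: sqlite3.Connection, module_id: str) -> dict:
    # Hypothetical builder: load the source, crudely truncate with a marker if needed.
    (source,) = conn.execute(
        "SELECT source FROM modules WHERE module_id = ?", (module_id,)
    ).fetchone()
    words = source.split()
    if len(words) > BUDGET:
        source = " ".join(words[: BUDGET - 1]) + " [truncated]"
    return {"module_id": module_id, "code": source, "total_tokens": estimate_tokens(source)}


def run_scenario() -> list[dict]:
    conn = sqlite3.connect(":memory:")
    seed(conn)  # step 1
    rows = conn.execute("SELECT module_id FROM modules ORDER BY module_id")
    observations = [build_observation(conn, row[0]) for row in rows.fetchall()]  # step 2
    assert all(o["total_tokens"] <= BUDGET for o in observations), "budget exceeded"  # step 3
    conn.close()
    return observations
```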
### C) Determinism checks
1. Run traversal_order 10 times on same DB snapshot.
2. Output order must be identical each run.
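The sketch below shows a traversal order that satisfies these checks. It assumes degree centrality (in-degree plus out-degree) as the centrality measure. The plan does not fix the measure, so the GraphManager version may differ. Sorting on a `(degree, module_id)` key gives a total order, so shuffled inputs yield the same output. The sample graph is illustrative and is not taken from the seeded DB.

```python
import random
from collections import defaultdict


def traversal_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Leaf-first order: ascending degree, ties broken by module_id.

    Degree centrality (in-degree + out-degree) is an assumption here.
    """
    degree: dict[str, int] = defaultdict(int)
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1
    # (degree, name) is a total order, so the result never depends on input order.
    return sorted(set(nodes), key=lambda n: (degree[n], n))


# Illustrative graph: edge (a, b) means "a imports b".
NODES = ["auth", "cart", "checkout", "config", "inventory", "payment"]
EDGES = [
    ("cart", "config"),
    ("checkout", "cart"),
    ("checkout", "config"),
    ("checkout", "payment"),
    ("inventory", "config"),
]


def check_determinism(runs: int = 10) -> list[str]:
    """Run traversal_order on shuffled copies and require identical output."""
    baseline = traversal_order(NODES, EDGES)
    for seed in range(runs):
        rng = random.Random(seed)
        nodes, edges = NODES[:], EDGES[:]
        rng.shuffle(nodes)
        rng.shuffle(edges)
        assert traversal_order(nodes, edges) == baseline
    return baseline
```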
## Risks and Mitigations
1. Existing env/graph.py may conflict with new graph/graph_manager.py.
   - Mitigation: keep env/graph.py as a compatibility wrapper until callers migrate.

2. SQLModel vs SQLAlchemy ORM naming mismatch in the current schema.
   - Mitigation: Phase 2 consumes the existing schema as-is; DB table redesign is deferred unless explicitly approved.

3. Token estimation mismatch vs the actual model tokenizer.
   - Mitigation: enforce a conservative budget with a safety buffer; keep the estimator swappable.
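Mitigation 3 could be structured as a budget object that plans against the hard limit minus a safety buffer and takes the estimator as a parameter. The names `TokenBudgetSketch`, `char_estimator`, and `whitespace_estimator`, the 200-token buffer, and the rough 4-characters-per-token heuristic are assumptions for illustration. None of them are values from the plan.

```python
import math
from typing import Callable

TokenEstimator = Callable[[str], int]


def whitespace_estimator(text: str) -> int:
    return len(text.split())


def char_estimator(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return math.ceil(len(text) / 4)


class TokenBudgetSketch:
    """Hypothetical budget helper with a safety buffer and a pluggable estimator."""

    def __init__(
        self,
        hard_limit: int = 2000,
        safety_buffer: int = 200,
        estimator: TokenEstimator = char_estimator,
    ) -> None:
        if not 0 <= safety_buffer < hard_limit:
            raise ValueError("safety_buffer must be in [0, hard_limit)")
        self.hard_limit = hard_limit
        # Plan against a smaller limit so estimator error cannot breach hard_limit
        # unless the estimate is off by more than the buffer.
        self.effective_limit = hard_limit - safety_buffer
        self.estimator = estimator

    def count(self, *parts: str) -> int:
        return sum(self.estimator(p) for p in parts)

    def fits(self, *parts: str) -> bool:
        return self.count(*parts) <= self.effective_limit
```

Because the estimator is a plain callable, a real tokenizer can replace `char_estimator` without changing callers.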
## Design Questions To Resolve Before Implementation
1. File structure decision:
   - Should Phase 2 introduce the new graph/ package now and keep an env/graph.py compatibility wrapper, or refactor callers immediately?

2. Schema alignment decision:
   - Keep the current SQLModel-backed tables in Phase 2 and map them to the planned names later, or perform a schema migration now?

3. REQUEST_CONTEXT strictness:
   - If the full neighbor code cannot fit, should the response be truncated (with a marker), or should the action fail with an explicit error and no code body?

## Definition of Done for Phase 2
1. graph/graph_manager.py, graph/token_budget.py, and env/observation.py are implemented with type hints and docstrings.
2. observation_builder builds validated CodeObservation objects.
3. All Phase 2 tests pass.
4. Every generated observation satisfies the hard <= 2000 token limit.
5. Traversal order behavior matches the leaf-first, high-centrality-last intent, with deterministic tie-breaking.