shreyas-joshi committed on
Commit
899a7c7
·
1 Parent(s): cf05092

feat: Implement chunking and graph management for code review environment

- Added chunker module to split parsed Python modules into manageable chunks.
- Introduced graph builder to create edges between code chunks and modules.
- Created sample project files for authentication, cart calculations, checkout flow, and configuration.
- Implemented utility functions for inventory management and email notifications.
- Developed payment gateway wrapper with security considerations.
- Added validators for input checks and coupon validation.
- Created extensive test suite for graph manager, observation builder, and token budget enforcement.
- Documented Phase 2 plan for graph manager and observation builder integration.

Files changed (38)
  1. Builder.md +59 -101
  2. Debugger.md +57 -69
  3. OpenEnv +1 -0
  4. Phases.md +378 -240
  5. Reviewer.md +94 -0
  6. code-review-env/README.md +61 -2
  7. code-review-env/db/database.py +3 -0
  8. code-review-env/db/models.py +25 -0
  9. code-review-env/db/schema.py +13 -1
  10. code-review-env/db/seed.py +143 -0
  11. code-review-env/db/store.py +31 -0
  12. code-review-env/env/graph.py +20 -67
  13. code-review-env/env/observation.py +62 -0
  14. code-review-env/env/observation_builder.py +143 -1
  15. code-review-env/graph/__init__.py +5 -0
  16. code-review-env/graph/graph_manager.py +125 -0
  17. code-review-env/graph/token_budget.py +117 -0
  18. code-review-env/parser/ast_parser.py +41 -11
  19. code-review-env/parser/chunker.py +96 -0
  20. code-review-env/parser/graph_builder.py +114 -0
  21. code-review-env/parser/linter.py +29 -0
  22. code-review-env/requirements.txt +1 -0
  23. code-review-env/sample_project/auth.py +7 -0
  24. code-review-env/sample_project/cart.py +17 -0
  25. code-review-env/sample_project/checkout.py +15 -0
  26. code-review-env/sample_project/config.py +6 -0
  27. code-review-env/sample_project/database.py +6 -0
  28. code-review-env/sample_project/huge_module.py +628 -0
  29. code-review-env/sample_project/inventory.py +10 -0
  30. code-review-env/sample_project/notifications.py +6 -0
  31. code-review-env/sample_project/payments.py +15 -0
  32. code-review-env/sample_project/utils.py +7 -0
  33. code-review-env/sample_project/validators.py +8 -0
  34. code-review-env/tests/test_phase2_graph_manager.py +32 -0
  35. code-review-env/tests/test_phase2_observation.py +55 -0
  36. code-review-env/tests/test_phase2_token_budget.py +42 -0
  37. code-review-env/tests/test_seed.py +30 -0
  38. plans/phase-02-graph-manager-observation-plan.md +206 -0
Builder.md CHANGED
@@ -1,138 +1,96 @@
1
- # Builder Prompt — CodeReviewEnv
2
 
3
- You are an expert Python engineer building a reinforcement learning environment called **CodeReviewEnv** for the OpenEnv Hackathon Round 1. Read everything below before writing a single line of code.
4
 
5
  ---
6
 
7
  ## What You Are Building
8
 
9
- An OpenEnv-compliant RL environment where an LLM agent learns to perform dependency-aware code review on a Python codebase.
10
 
11
- The environment:
12
- 1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite via SQLModel. Nodes = modules. Edges = import relationships.
13
- 2. Each node stores: full source code, compressed AST summary (~50 tokens), linter ground truth (pylint + bandit output), and agent-written review annotations.
14
- 3. The agent reviews one module per episode via a multi-step loop: `reset()` → `step(action)` × N → done.
15
- 4. The agent sees **full code of the current module only**. Neighbors are always compressed summaries — never full code. This is a hard constraint for token budget.
16
- 5. The agent can take actions: FLAG_BUG, FLAG_STYLE, FLAG_SECURITY, FLAG_DEPENDENCY_ISSUE, ADD_COMMENT, REQUEST_CHANGES, APPROVE, REQUEST_CONTEXT (costs -0.1 reward), AMEND_REVIEW (updates a neighbor's annotation retroactively).
17
- 6. Rewards are computed by graders against pre-computed ground truth stored in the DB.
18
- 7. The final output is an annotated dependency graph — all module reviews, cross-module causal attributions, readable as JSON and Markdown.
19
 
20
- The key differentiator: the environment models **cascading bugs** — where a bug in module B is caused by a design decision in module A. The agent is rewarded for identifying the upstream root cause, not just flagging the surface symptom.
21
 
22
  ---
23
 
24
- ## Persistence Strategy
25
 
26
- **SQLite + SQLModel. This is non-negotiable for demo performance.**
27
 
28
- - On first run: parse sample_codebase/ → populate DB with all nodes, edges, linter flags
29
- - On subsequent runs: detect DB exists → skip parsing → load graph directly
30
- - `reset()` clears only review annotations, never graph structure
31
- - All episode history is stored for reproducibility
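The parse-once strategy in the list above can be sketched as follows. This is a minimal illustration using the stdlib `sqlite3` module instead of SQLModel so that it stays self-contained; the table names (`nodes`, `annotations`) and the helpers `seed_if_empty` and `reset_annotations` are assumptions made for the example, not the project's actual schema.

```python
import sqlite3


def seed_if_empty(conn: sqlite3.Connection, modules: dict[str, str]) -> bool:
    """Populate the graph tables only on first run; return True if seeding ran."""
    # Hypothetical tables: one for graph nodes, one for agent annotations.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes (module_id TEXT PRIMARY KEY, raw_code TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotations (module_id TEXT, note TEXT)"
    )
    (count,) = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()
    if count:
        # Graph already stored: skip parsing and load from the DB instead.
        return False
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", list(modules.items()))
    conn.commit()
    return True


def reset_annotations(conn: sqlite3.Connection) -> None:
    """Clear review annotations only; nodes (graph structure) are untouched."""
    conn.execute("DELETE FROM annotations")
    conn.commit()
```

Calling `seed_if_empty` twice against the same connection seeds on the first call and is a no-op on the second, which is the behaviour the strategy relies on for fast demos.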
 
32
 
33
- Use Context7 MCP to look up SQLModel, NetworkX, pylint programmatic API, bandit API, and OpenEnv spec documentation before implementing each component. Do not guess at APIs — look them up.
34
 
35
- ---
36
 
37
- ## Tech Stack
38
 
39
- - Python 3.11
40
- - SQLModel (SQLite persistence)
41
- - NetworkX (graph construction and traversal)
42
- - FastAPI (HTTP server for OpenEnv spec)
43
- - Pydantic v2 (typed models)
44
- - pylint + bandit (linter ground truth)
45
- - Python `ast` module (AST parsing — stdlib, no extras)
46
- - OpenAI client (all LLM calls in inference.py and hard grader)
47
- - Docker (containerization)
48
 
49
- ---
50
 
51
- ## Project Structure
52
 
53
- Follow this structure exactly; do not deviate:
54
 
55
- ```
56
- code-review-env/
57
- ├── openenv.yaml
58
- ├── Dockerfile
59
- ├── README.md
60
- ├── inference.py
61
- ├── requirements.txt
62
- ├── env/
63
- │ ├── environment.py
64
- │ ├── models.py
65
- │ ├── graph.py
66
- │ ├── observation_builder.py
67
- │ └── reward.py
68
- ├── db/
69
- │ ├── schema.py
70
- │ ├── store.py
71
- │ └── migrations.py
72
- ├── parser/
73
- │ ├── ast_parser.py
74
- │ ├── linter.py
75
- │ └── summarizer.py
76
- ├── graders/
77
- │ ├── base_grader.py
78
- │ ├── easy_grader.py
79
- │ ├── medium_grader.py
80
- │ └── hard_grader.py
81
- ├── tasks/
82
- │ ├── task_registry.py
83
- │ ├── easy_task.py
84
- │ ├── medium_task.py
85
- │ └── hard_task.py
86
- ├── server/
87
- │ └── app.py
88
- ├── sample_codebase/
89
- │ ├── auth.py
90
- │ ├── checkout.py
91
- │ ├── cart.py
92
- │ ├── payments.py
93
- │ ├── config.py
94
- │ └── ground_truth.json
95
- └── tests/
96
- ```
97
 
98
  ---
99
 
100
- ## Phase You Are Currently Building
101
-
102
- **[INSERT PHASE NUMBER AND NAME HERE]**
103
 
104
- Refer to the phase plan for exact tasks and completion criteria for this phase. Build only what is scoped to this phase. Do not build ahead.
105
 
106
  ---
107
 
108
- ## Non-Negotiable Constraints
109
 
110
- 1. All rewards must be clipped to 0.0–1.0. Never return outside this range.
111
- 2. Never feed full neighbor code into observations. Always use compressed summaries.
112
- 3. inference.py must use OpenAI client. Read API_BASE_URL, MODEL_NAME, HF_TOKEN from env vars.
113
- 4. inference.py must emit [START], [STEP], [END] log format exactly — no deviations.
114
- 5. Hard grader must use temperature=0 and a fixed rubric prompt stored as a constant.
115
- 6. DB must auto-populate on first Docker run without manual intervention.
116
- 7. All Pydantic models must be fully typed: no `Any`, no `dict` without a model.
117
- 8. Episode step limit is 10. Hard cap. Enforce in environment.py.
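Constraints 1 and 8 above (reward clipping and the 10-step cap) can be sketched as below. The names `clip_reward` and `EpisodeCounter` are hypothetical and chosen for illustration; the real logic would live in reward.py and environment.py.

```python
MAX_STEPS = 10  # hard cap taken from constraint 8


def clip_reward(raw: float) -> float:
    """Clamp a raw reward into the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, raw))


class EpisodeCounter:
    """Tracks steps in one episode and reports when the cap is reached."""

    def __init__(self, max_steps: int = MAX_STEPS) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def tick(self) -> bool:
        """Record one step; return True once the episode must end."""
        self.steps += 1
        return self.steps >= self.max_steps
```

Clipping at the last moment, after all penalties and bonuses are summed, keeps the individual reward terms readable while still guaranteeing the returned value stays in range.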
118
 
119
  ---
120
 
121
- ## Before You Start Each File
122
 
123
- 1. Use Context7 MCP to look up the relevant library documentation
124
- 2. Check if the schema/interface you are about to implement has dependencies on already-built files: import them, don't reimplement
125
- 3. If you need to make a design choice not covered in this prompt (e.g. exact DB column types, traversal tie-breaking, summary format), **ask the user before proceeding**
126
- 4. Write tests alongside implementation — not after
 
 
 
 
 
 
127
 
128
  ---
129
 
130
- ## Questions To Ask The User Before Starting
131
-
132
- If any of the following are unclear, ask before building:
133
 
134
- - What Python codebase should be used as the demo target? (default: the sample_codebase/ provided)
135
- - Should the hard grader use the same MODEL_NAME from env vars, or a fixed model?
136
- - Should REQUEST_CONTEXT return the full raw code or the full AST + raw code?
137
- - Should AMEND_REVIEW require the agent to specify what was wrong with the original review?
138
- - What is the maximum number of neighbors to include in an observation? (recommend: 5, confirm)
 
1
+ # Builder Prompt — GraphReview RL Environment
2
 
3
+ You are an expert Python engineer building a production-quality RL environment for a competitive hackathon (OpenEnv Round 1). You have one job: build the GraphReview environment correctly, phase by phase, without breaking prior work.
4
 
5
  ---
6
 
7
  ## What You Are Building
8
 
9
+ An OpenEnv-compliant RL environment where an LLM agent reviews Python code with full dependency graph awareness. The environment parses a Python codebase into a persistent SQLite-backed dependency graph, pre-computes ground truth linter flags, and exposes a step()/reset()/state() API for an agent to interact with.
10
 
11
+ This is online RL — no training dataset is needed. The ground truth (pylint/bandit/pyflakes results) is computed once at seed time and stored in SQLite. The agent explores the environment and receives rewards compared against that ground truth.
 
 
 
 
 
 
 
12
 
13
+ The full phase plan and architecture are provided below. Read the entire plan before writing a single line of code.
14
 
15
  ---
16
 
17
+ ## Your Operating Rules
18
 
19
+ 1. **Before building each phase, read the full plan for that phase.** Do not start coding until you understand what the phase produces and what its success criteria are.
20
 
21
+ 2. **Ask me questions before starting if any of the following are unclear:**
22
+ - A design decision that affects DB schema or file structure
23
+ - Anything that would be hard to change later (interfaces, Pydantic models, DB tables)
24
+ - Ambiguity in how two components interact
25
+ Do NOT ask about low-level implementation details — choose the best approach yourself.
26
 
27
+ 3. **Use context7 MCP to look up documentation** for: openenv-core, SQLAlchemy, NetworkX, Pyvis, astroid, pylint API, FastAPI, Pydantic v2. Do not rely on memory for library APIs — always verify.
28
 
29
+ 4. **One phase at a time.** Complete a phase fully before moving to the next. Each phase has explicit success criteria — verify them before declaring a phase done.
30
 
31
+ 5. **Never break prior phases.** If a later phase requires changing an earlier interface, explicitly flag it, explain why, and get confirmation before making the change.
32
 
33
+ 6. **DB is the source of truth.** All state lives in SQLite. Nothing important lives only in memory. reset() clears only task-run annotations — never re-parses the codebase.
 
 
 
 
 
 
 
 
34
 
35
+ 7. **Token budget is a hard constraint.** No observation may exceed 2000 tokens. Enforce this in token_budget.py — do not leave it as a soft guideline.
36
 
37
+ 8. **Graders must be deterministic.** Easy and medium graders: zero LLM calls, same input always produces same output. Hard grader: temperature=0, document prompt hash. Test this explicitly.
38
 
39
+ 9. **inference.py log format is mandatory.** [START], [STEP], [END] format must be exact. Any deviation causes evaluation failure. Treat this as a contract.
40
 
41
+ 10. **Write clean, typed Python.** All functions typed. All Pydantic models complete. No `Any` types unless unavoidable with explanation.
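Rule 7 above (the 2000-token hard limit) can be sketched as follows. This is only an illustration: the character-based `estimate_tokens` heuristic and the `enforce_budget` helper are assumptions, and the project's token_budget.py may count tokens differently, for example with a real tokenizer.

```python
MAX_OBSERVATION_TOKENS = 2000  # hard limit from rule 7


def estimate_tokens(text: str) -> int:
    """Rough estimate, assuming about one token per four characters."""
    return (len(text) + 3) // 4


def enforce_budget(
    current_code: str,
    neighbor_summaries: list[str],
    limit: int = MAX_OBSERVATION_TOKENS,
) -> list[str]:
    """Return the neighbor summaries that fit alongside the current module.

    Raises ValueError when the current module alone exceeds the limit, so
    the violation surfaces as an error instead of a silent overrun.
    """
    used = estimate_tokens(current_code)
    if used > limit:
        raise ValueError(f"current module needs {used} tokens, limit is {limit}")
    kept: list[str] = []
    for summary in neighbor_summaries:
        cost = estimate_tokens(summary)
        if used + cost > limit:
            break  # drop this and all later summaries
        kept.append(summary)
        used += cost
    return kept
```

Raising on an oversized current module, instead of truncating it, is one way to make the limit a hard constraint; chunking the module would be the alternative.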
42
 
43
  ---
44
 
45
+ ## Phase Plan
 
 
46
 
47
+ [INSERT FULL PHASE PLAN HERE: paste the contents of the phase plan artifact]
48
 
49
  ---
50
 
51
+ ## Sample Project Specification
52
 
53
+ The sample_project/ directory must contain exactly these files with these injected bugs:
54
+
55
+ ```
56
+ auth.py — validate_token() can return None (not handled)
57
+ checkout.py — calls auth.validate_token(), doesn't check for None
58
+ cart.py — style violations only (PEP8)
59
+ config.py — missing required key in get_config() (root cause of cascade)
60
+ database.py — SQL query built with string concatenation (SQL injection)
61
+ utils.py — unused imports, dead code
62
+ models.py — clean file (no issues, tests APPROVE path)
63
+ payments.py — depends on checkout.py, inherits None risk
64
+ api.py — depends on auth.py and checkout.py
65
+ main.py — entry point, light glue code
66
+ ```
67
+
68
+ Task mapping:
69
+ - easy_task: cart.py (style only)
70
+ - medium_task: checkout.py + auth.py (null reference)
71
+ - hard_task: config.py → auth.py → checkout.py (cascade)
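The hard_task cascade above (config.py → auth.py → checkout.py) can be illustrated with a condensed, single-file sketch. The function bodies and the `SECRET_KEY` and `TOKEN_TTL` keys are invented for illustration; only the file roles and the None-propagation pattern come from the specification.

```python
from typing import Optional


def get_config() -> dict[str, str]:
    """Stand-in for config.py: the required key is missing (root cause)."""
    return {"TOKEN_TTL": "3600"}  # hypothetical: "SECRET_KEY" was never added


def validate_token(token: str) -> Optional[str]:
    """Stand-in for auth.py: returns None when validation cannot succeed."""
    secret = get_config().get("SECRET_KEY")
    if secret is None or not token.startswith(secret):
        return None
    return token[len(secret):]


def checkout(token: str) -> str:
    """Stand-in for checkout.py: uses the result without checking for None."""
    user_id = validate_token(token)
    return "order for " + user_id.upper()  # fails when user_id is None
```

The failure surfaces in `checkout`, but the fix belongs in `get_config`; this gap between symptom and root cause is what the hard task asks the agent to attribute.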
72
 
73
  ---
74
 
75
+ ## Tech Stack
76
 
77
+ - Python 3.11
78
+ - SQLite via SQLAlchemy ORM
79
+ - NetworkX + astroid + Python ast
80
+ - pylint + bandit + pyflakes
81
+ - Pyvis for visualization
82
+ - Pydantic v2
83
+ - FastAPI
84
+ - OpenAI client (inference.py + hard grader judge)
85
+ - openenv-core
86
+ - context7 MCP for all library lookups
87
 
88
  ---
89
 
90
+ ## Start Instructions
 
 
91
 
92
+ Begin with Phase 1. Before writing any code:
93
+ 1. Use context7 MCP to look up: openenv-core spec, SQLAlchemy ORM setup, astroid API
94
+ 2. Ask me any design questions that affect DB schema or file structure
95
+ 3. Confirm the sample_project file list with me if you want to adjust it
96
+ 4. Then build Phase 1 completely and verify all success criteria before stopping
Debugger.md CHANGED
@@ -1,100 +1,88 @@
1
- # Debugger Prompt — CodeReviewEnv
2
 
3
- You are an expert Python debugger working on **CodeReviewEnv**, an OpenEnv-compliant RL environment for the OpenEnv Hackathon. Your job is to diagnose and fix issues without breaking the architecture.
4
 
5
  ---
6
 
7
- ## Project Summary
8
 
9
- This is a reinforcement learning environment where an LLM agent reviews Python codebases using a persistent dependency graph. The graph is stored in SQLite via SQLModel. The RL loop uses OpenEnv's step()/reset()/state() spec. There are 3 tasks (easy/medium/hard) with deterministic graders. The inference script must run in under 20 minutes on 2 vCPU / 8GB RAM.
 
 
 
 
 
 
 
10
 
11
  ---
12
 
13
- ## Architecture Rules — Never Violate These When Fixing
14
 
15
- 1. **Persistence is SQLite/SQLModel**: do not switch to in-memory or another DB to fix a bug
16
- 2. **Neighbor observations are always compressed summaries** — never fix a context issue by passing full neighbor code
17
- 3. **Rewards must always be in 0.0–1.0** — if a reward bug exists, fix the computation, never remove the clip
18
- 4. **inference.py uses OpenAI client only** — do not swap to direct HTTP calls or another client
19
- 5. **[START]/[STEP]/[END] log format is fixed** — do not change field names or ordering to fix a logging bug
20
- 6. **Hard grader uses temperature=0 and fixed rubric** — do not relax this to fix flaky test failures
21
- 7. **episode step limit is 10** — do not raise this to fix timeout issues, optimize the agent instead
22
 
23
- ---
24
 
25
- ## How To Approach Any Bug
26
 
27
- ### Step 1: Locate
28
- - Identify which layer the bug is in: parser → db → graph → observation_builder → environment → grader → server → inference
29
- - Do not assume the bug is where the error surfaces — trace back to root cause
30
 
31
- ### Step 2: Check Interfaces First
32
- - Before changing implementation, verify the interface contract between the broken component and its dependencies
33
- - Use Context7 MCP to re-check library APIs if the bug involves SQLModel, NetworkX, pylint, bandit, FastAPI, or OpenEnv
34
- - Do not fix a bug by changing a shared interface without checking all callers
35
 
36
- ### Step 3: Fix Minimally
37
- - Fix the smallest possible change that resolves the issue
38
- - If the fix requires changing a DB schema, check whether a migration is needed and write it
39
- - If the fix changes a Pydantic model, check all serialization/deserialization paths
40
 
41
- ### Step 4: Verify
42
- - After fixing, confirm the completion criteria for the relevant phase still pass
43
- - Run the specific test for the broken component
44
- - If inference.py is affected, do a dry run and confirm [START]/[STEP]/[END] logs emit correctly
45
 
46
- ---
47
 
48
- ## Common Failure Modes To Check First
49
 
50
- ### DB / Persistence
51
- - DB not found on startup → check migrations.py auto-init logic
52
- - Graph loads empty on second run → check upsert_node is committing correctly
53
- - Annotations not persisting across reset() → check reset() only clears annotations, not nodes/edges
54
 
55
- ### Parser
56
- - AST parser crashes on type-annotated functions → check handling of ast.Constant vs ast.Str in Python 3.11
57
- - Linter returns no output → check pylint/bandit are installed in the Docker image and PATH is correct
58
- - Import resolution fails on relative imports → check the resolver handles both absolute and relative imports
59
 
60
- ### RL Environment
61
- - Reward outside 0.0–1.0 → find the unclipped computation in reward.py
62
- - done never becomes True → check step limit counter and REQUEST_CHANGES/APPROVE handling
63
- - reset() returns wrong module → check task registry is loading the correct starting module
64
 
65
- ### Graders
66
- - Easy grader always returns 0 → check linter_flags were populated in DB during parsing
67
- - Hard grader is non-deterministic → confirm temperature=0 and seed param is being passed
68
- - Grader crashes on empty annotation → add null check before scoring
69
 
70
- ### Server
71
- - /health returns 404 → check route is registered in app.py
72
- - /step rejects valid action → check discriminated union deserialization in Pydantic v2
73
- - openenv validate fails → check openenv.yaml field names against spec exactly
74
 
75
- ### Inference Script
76
- - Runs over 20 minutes → profile which task is slowest, reduce max steps or add timeout per episode
77
- - LLM returns unparseable action → check JSON mode is enabled, add fallback to APPROVE
78
- - Missing [STEP] logs → check log emit is inside the step loop, not outside
79
 
80
- ### Docker
81
- - Build fails on pylint/bandit install → add gcc and build-essential to apt-get
82
- - DB not found inside container → check WORKDIR and DB path are consistent
83
- - Port not exposed → confirm EXPOSE 7860 and uvicorn binds to 0.0.0.0
84
 
85
- ---
 
 
 
 
 
 
86
 
87
- ## When You Find An Ambiguity
 
 
88
 
89
- If fixing the bug requires a design decision (e.g. "should reset() preserve REQUEST_CONTEXT history?"), **ask the user before implementing**. Do not make silent architectural decisions while debugging.
 
 
90
 
91
  ---
92
 
93
- ## Context To Always Include When Reporting A Fix
 
 
 
 
 
 
94
 
95
- After fixing, always report:
96
- - What the root cause was (one sentence)
97
- - Which file(s) were changed
98
- - Whether any DB schema changed (and if so, whether a migration was added)
99
- - Whether any Pydantic model interface changed (and if so, which callers were updated)
100
- - The specific test or check that now passes
 
1
+ # Debugger Prompt — GraphReview RL Environment
2
 
3
+ You are an expert Python debugger working on a competitive hackathon RL environment called GraphReview. Your job is to diagnose and fix bugs without breaking existing working functionality.
4
 
5
  ---
6
 
7
+ ## Project Context
8
 
9
+ GraphReview is an OpenEnv-compliant RL environment. It:
10
+ - Parses Python codebases into a SQLite-backed NetworkX dependency graph
11
+ - Pre-computes linter ground truth (pylint/bandit/pyflakes) at seed time
12
+ - Exposes step()/reset()/state() for an LLM agent to review code
13
+ - Scores agent actions against stored ground truth via deterministic graders
14
+ - Outputs an annotated graph visualization via Pyvis
15
+
16
+ The DB is the source of truth. Pydantic v2 models define all interfaces. FastAPI wraps the environment for HTTP. inference.py runs the baseline agent.
17
 
18
  ---
19
 
20
+ ## Your Operating Rules
21
 
22
+ 1. **Diagnose before fixing.** State exactly what is wrong and why before writing any fix. One sentence minimum: "The bug is X because Y."
 
 
 
 
 
 
23
 
24
+ 2. **Minimal surface area.** Fix only what is broken. Do not refactor, rename, or improve unrelated code while fixing a bug.
25
 
26
+ 3. **Check DB integrity first** for any bug involving missing data, wrong rewards, or incorrect state. Run: `SELECT * FROM seed_meta` to verify seeded flag. Check `modules`, `edges`, `linter_flags` are populated before assuming code is wrong.
27
 
28
+ 4. **Use context7 MCP** to verify library APIs before assuming a bug is in your code. Many bugs come from incorrect assumptions about SQLAlchemy session handling, Pydantic v2 validation, or NetworkX graph methods.
 
 
29
 
30
+ 5. **Never re-seed unless explicitly told to.** Re-seeding takes 30s and loses demo state. If a bug looks like a seeding issue, verify first.
 
 
 
31
 
32
+ 6. **Grader determinism is sacred.** If a grader produces different results across runs, that is a critical bug; fix it before anything else. Check: temperature settings, prompt variability, random seeds.
 
 
 
33
 
34
+ 7. **Do not change Pydantic model field names or types** without explicitly flagging it. These are shared interfaces; changing them breaks step()/reset()/state() and inference.py simultaneously.
 
 
 
35
 
36
+ 8. **inference.py log format is a contract.** [START]/[STEP]/[END] field names and order must never change. If a bug is in inference.py, fix the logic without changing the log format.
37
 
38
+ 9. **After fixing, state what you changed and why**, and identify any other components that might be affected by the change.
39
 
40
+ 10. **If the bug requires a design change** (not just a code fix), say so clearly. Do not silently implement a design change as if it were a bug fix.
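The DB integrity check from rule 3 above can be sketched with the stdlib `sqlite3` module. The table names come from the rule itself; the `seeded` column on `seed_meta` and the `check_db_integrity` helper are assumptions for illustration.

```python
import sqlite3

# Tables that rule 3 says must be populated before suspecting the code.
REQUIRED_TABLES = ("modules", "edges", "linter_flags")


def check_db_integrity(conn: sqlite3.Connection) -> dict[str, int]:
    """Verify the seeded flag and return row counts for the core tables."""
    # Assumes seed_meta has a truthy `seeded` column once seeding finished.
    row = conn.execute("SELECT seeded FROM seed_meta LIMIT 1").fetchone()
    if row is None or not row[0]:
        raise RuntimeError("seed_meta does not report a completed seed")
    counts: dict[str, int] = {}
    for table in REQUIRED_TABLES:
        # Table names are fixed constants above, so the f-string is safe here.
        (n,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
        counts[table] = n
    empty = [name for name, n in counts.items() if n == 0]
    if empty:
        raise RuntimeError(f"empty tables after seeding: {empty}")
    return counts
```

Running a check like this first separates "the data is missing" from "the code is wrong" without triggering a re-seed.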
 
 
 
41
 
42
+ ---
 
 
 
43
 
44
+ ## Common Bug Patterns in This Project
 
 
 
45
 
46
+ **DB not seeded / partial seed**
47
+ - Symptom: KeyError on module_id, empty linter_flags, missing edges
48
+ - Check: seed_meta table for seeded=true, verify row counts in modules and edges
 
49
 
50
+ **Pydantic v2 validation errors**
51
+ - Symptom: ValidationError on step() or reset()
52
+ - Check: field types match exactly, Optional fields have defaults, JSON fields are dicts not strings
 
53
 
54
+ **NetworkX graph not reconstructed from DB**
55
+ - Symptom: graph_manager returns empty neighbors, traversal order is wrong
56
+ - Check: edges table has rows, graph_manager.load_graph() is called before queries
 
57
 
58
+ **Grader returning out-of-range reward**
59
+ - Symptom: reward > 1.0 or < -1.0
60
+ - Check: reward aggregation logic, episode completion bonus not double-applied
 
61
 
62
+ **Token budget exceeded**
63
+ - Symptom: LLM returns truncated or incoherent response
64
+ - Check: token_budget.py is being called, observation summaries not using raw code
65
+
66
+ **Hard grader non-determinism**
67
+ - Symptom: different scores for identical inputs
68
+ - Check: temperature=0 set on judge API call, system prompt is static string not f-string with variables
69
 
70
+ **inference.py timeout (>20 min)**
71
+ - Symptom: evaluation fails on judge's machine
72
+ - Check: REQUEST_CONTEXT actions in inference loop causing extra API calls, batching strategy
73
 
74
+ **reset() clearing too much**
75
+ - Symptom: graph annotations from prior tasks lost after reset
76
+ - Check: reset() filters by task_id when deleting review_annotations, not deleting all rows
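For the hard grader non-determinism pattern above, one way to confirm that the judge prompt is a static string is to record its hash and compare it across runs. This is a minimal sketch: the prompt text is a placeholder and `prompt_hash` is a hypothetical helper.

```python
import hashlib

# Placeholder text; the real rubric prompt would be a module-level constant.
JUDGE_SYSTEM_PROMPT = "Score the review against the rubric. Reply with a number."


def prompt_hash(prompt: str) -> str:
    """Return the SHA-256 hex digest of the prompt, for logging and tests."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()
```

If the logged hash differs between two runs with identical inputs, something is interpolating variables into the prompt, which points at the f-string problem named in the check.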
77
 
78
  ---
79
 
80
+ ## How to Use This Prompt
81
+
82
+ Paste this prompt, then describe:
83
+ 1. What you were trying to do
84
+ 2. What happened instead (error message, wrong output, wrong reward value)
85
+ 3. Which phase/file the bug is in
86
+ 4. What you already tried
87
 
88
+ Then share the relevant code. I will diagnose and fix it.
 
 
 
 
 
OpenEnv ADDED
@@ -0,0 +1 @@
 
 
1
+ Subproject commit c719decf2b19175d5ca35301d58a14c83e985480
Phases.md CHANGED
@@ -1,295 +1,433 @@
1
- # CodeReviewEnv — Phased Build Plan
2
- ## For: LLM-Assisted Development
3
 
4
  ---
5
 
6
- ## 🧠 What You Are Building
7
 
8
- An OpenEnv-compliant reinforcement learning environment where an LLM agent learns to perform **dependency-aware code review**.
9
 
10
- The environment parses a Python codebase into a **persistent dependency graph** (nodes = modules, edges = import relationships). Each node stores compressed AST summaries, linter-generated ground truth issues, and agent-written review annotations.
 
 
 
 
 
 
 
11
 
12
- The agent reviews one module per episode. It receives the **full code of the current module** plus **compressed AST summaries of its neighbors** (never full neighbor code — token budget). It takes multi-step actions (flag bugs, add comments, request context, amend upstream reviews). The environment rewards correct, well-attributed findings and penalizes false positives.
13
 
14
- The final output is an **annotated dependency graph**: a machine-readable + human-readable map of the entire codebase with reviews on every module, including cross-module causal attributions.
15
-
16
- This is differentiated from tools like CodeRabbit because:
17
- - It models cascading dependency bugs (bug in B caused by design in A)
18
- - Reviews are stored back into the graph and can be amended as agent learns more
19
- - It is an RL training/evaluation environment, not a static analysis tool
20
- - The agent learns a policy over multi-step decisions, not a single LLM call
21
 
22
  ---
23
 
24
- ## 🗂️ Persistence Strategy
25
 
26
- **Use SQLite via SQLModel** for all persistent state. Do NOT reparse the codebase on every run. The database stores:
27
- - Parsed module nodes (code, AST summary, linter flags)
28
- - Graph edges (dependency relationships + reasons)
29
- - Review annotations (written by agent, updatable)
30
- - Episode history (for reproducibility)
31
- - Task definitions and ground truth
32
 
33
- On startup: check if DB exists → if yes, load graph from DB → if no, parse codebase and populate DB.
34
 
35
- This makes demos fast (parse once, review many times) and makes `reset()` cheap (clear annotations only, keep graph structure).
 
 
 
 
 
 
 
 
 
 
 
 
 
 
36
 
37
  ---
38
 
39
- ## 📁 Target Project Structure
40
 
41
  ```
42
- code-review-env/
43
- ├── openenv.yaml
44
- ├── Dockerfile
45
- ├── README.md
46
- ├── inference.py # Required by spec, root level
47
- ├── requirements.txt
48
- ├── pyproject.toml
49
-
50
- ├── env/
51
- │ ├── __init__.py
52
- │ ├── environment.py # Main CodeReviewEnv class
53
- │ ├── models.py # Pydantic: Observation, Action, Reward, GraphState
54
- │ ├── graph.py # Graph construction, traversal, compression
55
- │ ├── observation_builder.py # Assembles tiered observation per step
56
- │ └── reward.py # Reward computation logic
57
-
58
- ├── db/
59
- │ ├── __init__.py
60
- │ ├── schema.py # SQLModel table definitions
61
- │ ├── store.py # DB read/write operations
62
- │ └── migrations.py # Init and seed scripts
63
-
64
- ├── parser/
65
- │ ├── __init__.py
66
- │ ├── ast_parser.py # AST extraction: signatures, imports, classes
67
- │ ├── linter.py # Pylint + Bandit runner, stores results to DB
68
- │ └── summarizer.py # Converts AST output → compressed node summary
69
-
70
- ├── graders/
71
- │ ├── __init__.py
72
- │ ├── base_grader.py # Abstract grader interface
73
- │ ├── easy_grader.py # Linter match — fully deterministic
74
- │ ├── medium_grader.py # AST + line attribution match
75
- │ └── hard_grader.py # LLM-as-judge, temp=0, seed=42, rubric-constrained
76
-
77
- ├── tasks/
78
- │ ├── __init__.py
79
- │ ├── task_registry.py # Registers and loads tasks
80
- │ ├── easy_task.py # Style/linter issue in isolated module
81
- │ ├── medium_task.py # Logic bug with direct dependency context
82
- │ └── hard_task.py # Cascading bug across 2+ modules
83
-
84
- ├── server/
85
- │ ├── __init__.py
86
- │ └── app.py # FastAPI server exposing OpenEnv HTTP endpoints
87
-
88
- ├── sample_codebase/ # Synthetic test codebase for demo
89
  │ ├── auth.py
90
  │ ├── checkout.py
91
  │ ├── cart.py
92
- │ ├── payments.py
93
- │ └── config.py
94
-
95
- └── tests/
96
- ├── test_parser.py
97
- ├── test_graders.py
98
- ├── test_environment.py
99
- └── test_inference.py
100
  ```
101
 
102
  ---
103
 
104
- ## 📐 Core Data Models (Design Intent; Implementation Is Your Choice)
105
-
106
- ### Graph Node
107
- Stores everything about one module. Persisted in DB.
108
- - module_id (filename/path)
109
- - raw_code (full source)
110
- - ast_summary (compressed: signatures, classes, exports)
111
- - linter_flags (pre-computed ground truth from pylint/bandit)
112
- - dependency_reason (why this module needs its neighbors, extracted from import context)
113
- - review_annotation (agent-written, nullable, updatable)
114
- - review_status (pending | in_progress | reviewed)
115
- - review_summary (one-line, written at episode end)
116
-
117
- ### Graph Edge
118
- - source_module_id
119
- - target_module_id
120
- - edge_type (explicit_import | implicit_name_resolution)
121
- - import_line (the actual import statement)
122
- - weight (1.0 explicit, 0.5 implicit)
123
-
124
- ### Observation (Pydantic)
125
- - current_module: full code + full AST summary
126
- - direct_dependencies: list of compressed node summaries (NOT full code)
127
- - dependents: list of compressed node summaries
128
- - existing_reviews: list of one-line review summaries from already-reviewed neighbors
129
- - constraint_flags: any known forced decisions from upstream
130
- - step_number: int
131
- - episode_id: str
132
-
133
- ### Action (Pydantic, discriminated union)
134
- - APPROVE
135
- - FLAG_STYLE(line: int, description: str)
136
- - FLAG_BUG(line: int, description: str)
137
- - FLAG_SECURITY(line: int, description: str)
138
- - FLAG_DEPENDENCY_ISSUE(source_module: str, description: str)
139
- - ADD_COMMENT(text: str)
140
- - REQUEST_CHANGES(summary: str)
141
- - REQUEST_CONTEXT(module_id: str) ← costs -0.1 reward, returns full code of neighbor
142
- - AMEND_REVIEW(module_id: str, note: str) ← retroactively updates neighbor annotation
143
-
144
- ### Reward (Pydantic)
145
- - value: float (0.0–1.0)
146
- - reason: str
147
- - cumulative: float
148
 
149
  ---
150
 
151
- ## 🏗️ PHASE 1 — Foundation & Persistence
152
- **Goal: Database schema, parser, graph construction. No RL yet.**
153
 
154
- ### Tasks
155
- 1. Define SQLModel schema for all tables (nodes, edges, annotations, episodes, tasks)
156
- 2. Build `ast_parser.py` — extract from any .py file: all function signatures with type hints, all class definitions, all import statements with source resolution, all module-level constants
157
- 3. Build `linter.py`: run pylint and bandit programmatically on a file, parse output into structured list of {line, severity, code, message}. Store results directly to DB as ground truth.
158
- 4. Build `summarizer.py` — convert AST output into a compressed summary string under 100 tokens. Format: "exports: [fn(args)->return, ...] | issues: N | depends_on: [module, ...]"
159
- 5. Build `store.py`: CRUD operations for all tables. Key operations: upsert_node, upsert_edge, get_node_with_neighbors, update_annotation, get_full_graph
160
- 6. Build `graph.py` — on first run: parse all files in target directory → populate DB. On subsequent runs: load from DB. Build NetworkX DiGraph from DB records. Implement traversal order: topological sort weighted by betweenness centrality (leaf modules first, high-centrality modules last).
161
- 7. Build `sample_codebase/` — 5 Python files with known injected issues: one style issue, one logic bug with a direct dependency cause, one security issue, one cascading bug where the root cause is 2 hops away. Document every injected issue in a ground_truth.json file.
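The compressed summary format from task 4 above can be sketched with the stdlib `ast` module. This is an illustration under stated assumptions: the `summarize` name, the `?` marker for a missing return annotation, and the choice to consider only top-level functions and imports are not taken from the plan.

```python
import ast


def summarize(source: str, issue_count: int = 0) -> str:
    """Build an 'exports | issues | depends_on' summary line for one module."""
    tree = ast.parse(source)
    exports: list[str] = []
    deps: list[str] = []
    for node in tree.body:  # top-level statements only
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            # ast.unparse (Python 3.9+) renders the annotation back to source.
            ret = ast.unparse(node.returns) if node.returns else "?"
            exports.append(f"{node.name}({args})->{ret}")
        elif isinstance(node, ast.Import):
            deps.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.append(node.module)
    return (
        f"exports: [{', '.join(exports)}] | issues: {issue_count} "
        f"| depends_on: [{', '.join(deps)}]"
    )
```

A real summarizer would also need to enforce the 100-token ceiling, for instance by truncating the exports list once the string grows too long.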
162
 
163
- ### Completion Criteria
164
- - `python -m parser.ast_parser sample_codebase/` populates DB with all nodes and edges
165
- - DB persists across runs (second run loads from DB, does not reparse)
166
- - `python -m db.store` can query a node and return its summary and neighbors
167
- - ground_truth.json matches linter output for easy/medium tasks
168
 
169
  ---
170
 
171
- ## 🏗️ PHASE 2 — OpenEnv Core (RL Environment)
172
- **Goal: Full step()/reset()/state() loop with reward. This is the RL part.**
173
-
174
- ### Tasks
175
- 1. Build `models.py` - all Pydantic models: Observation, Action (discriminated union), Reward, GraphState, EpisodeRecord. Must be fully typed.
176
- 2. Build `observation_builder.py` — given a module_id and current graph state, assemble the tiered observation: full code for current module, compressed summaries for neighbors (pulled from DB), existing review annotations for already-reviewed neighbors, constraint flags
177
- 3. Build `reward.py` — implement reward logic:
178
- - Easy: compare agent flags against linter ground truth. Correct flag = +0.5, false positive = -0.2, missed critical = -0.4
179
- - Medium: check flag + line number within ±3 lines of ground truth = +0.5, correct comment attribution = +0.3
180
- - Hard: call hard_grader with agent's FLAG_DEPENDENCY_ISSUE and the known root cause. Score returned by judge × 0.8 as reward.
181
- - REQUEST_CONTEXT action always costs -0.1 (thinking cost)
182
- - AMEND_REVIEW with correct attribution = +0.4 (high reward: this is the key cascading behavior)
183
- - Episode completion bonus: +0.2 if all critical issues found, -0.1 if APPROVE on module with known critical bugs
184
- 4. Build `graders/` — implement all three graders per spec above. Hard grader must use OpenAI client (per competition spec), temperature=0, fixed rubric prompt stored as a constant.
185
- 5. Build `environment.py` — main class implementing full OpenEnv interface:
186
- - `reset(task_id)` → clears annotations for task modules, returns first observation
187
- - `step(action)` → validates action, updates graph annotations in DB, computes reward, returns (obs, reward, done, info)
188
- - `state()` → returns full GraphState (serialized NetworkX graph + all annotations)
189
- - Episode ends when: agent calls APPROVE or REQUEST_CHANGES, OR step limit reached (max 10 steps)
190
- 6. Build `tasks/` - register 3 tasks pointing to specific modules in sample_codebase with known ground truth issues
191
-
192
- ### Completion Criteria
193
- - `env.reset("easy_task")` returns a valid typed Observation
194
- - `env.step(FLAG_BUG(line=12, description="null risk"))` returns reward > 0 for correct flag
195
- - `env.state()` returns serializable graph with annotations
196
- - Full episode runs without error on all 3 tasks
197
- - Reward values all fall in 0.0–1.0 range
198
 
199
  ---
200
 
201
- ## 🏗️ PHASE 3 — HTTP Server & OpenEnv Spec Compliance
202
- **Goal: Wrap environment in FastAPI, pass openenv validate.**
203
 
204
- ### Tasks
205
- 1. Build `server/app.py` - FastAPI app exposing:
206
- - POST /reset → calls env.reset(), returns Observation JSON
207
- - POST /step → calls env.step(action), returns (obs, reward, done, info) JSON
208
- - GET /state → calls env.state(), returns GraphState JSON
209
- - GET /health → returns 200 (required for HF Space ping)
210
- 2. Build `openenv.yaml` - fill all required metadata: name, version, description, tasks list, observation_space, action_space, reward_range
211
- 3. Run `openenv validate` — fix all compliance errors
212
- 4. Confirm all Pydantic models serialize/deserialize correctly over HTTP
213
 
214
- ### Completion Criteria
215
- - `openenv validate` passes with no errors
216
- - All endpoints return correct typed responses
217
- - GET /health returns 200
218
 
219
  ---
220
 
221
- ## 🏗️ PHASE 4 — Inference Script
222
- **Goal: Build inference.py that runs Gemma 4 as the agent. This is what judges auto-run.**
223
 
224
- ### Critical Requirements (Non-Negotiable)
225
- - File must be named `inference.py` at root
226
- - Use OpenAI client for all LLM calls
227
- - Read API_BASE_URL, MODEL_NAME, HF_TOKEN from environment variables
228
- - Emit structured stdout logs in EXACTLY this format:
229
  ```
230
- [START] task=<task_id> episode=<n>
231
- [STEP] step=<n> action=<action_type> reward=<float> cumulative=<float>
232
- [END] task=<task_id> total_reward=<float> steps=<n>
233
  ```
234
- - Must complete all 3 tasks in under 20 minutes total
235
- - Must run on 2 vCPU / 8GB RAM
236
-
237
- ### Tasks
238
- 1. Build the agent loop — for each task: reset env, loop step() until done, collect rewards
239
- 2. Build the LLM action parser — send observation to model with a structured prompt, parse response into typed Action. Use JSON mode or structured output. Handle parse failures gracefully (default to APPROVE with penalty).
240
- 3. Build the action prompt — system prompt explaining the environment, action space, and output format. Include the compressed observation in user message. Tell model to output JSON action only.
241
- 4. Implement all 3 task runs sequentially
242
- 5. Emit all required log lines to stdout
243
- 6. Final output: baseline scores for all 3 tasks printed to stdout
244
-
245
- ### Completion Criteria
246
- - Script runs end to end without error
247
- - All [START]/[STEP]/[END] logs emitted correctly
248
- - Produces a score for each task between 0.0–1.0
249
- - Completes in under 20 minutes
250
 
251
  ---
252
 
253
- ## 🏗️ PHASE 5 - Containerization & Deployment
254
- **Goal: Docker build works, HF Space deploys, pre-validation script passes.**
255
-
256
- ### Tasks
257
- 1. Write `Dockerfile`:
258
- - Base: python:3.11-slim
259
- - Install system deps for pylint, bandit, networkx
260
- - Copy project, install requirements
261
- - On container start: run parser to populate DB if not exists, then start FastAPI server
262
- - Expose port 7860 (HF Spaces default)
263
- 2. Write `README.md` with all required sections: environment description and motivation, observation and action space definitions, all 3 task descriptions with difficulty, setup instructions, baseline scores
264
- 3. Run pre-submission validation script - fix all failures
265
- 4. Deploy to HF Space with `openenv push`
266
- 5. Confirm Space URL returns 200 on GET /health and responds to POST /reset
267
-
268
- ### Completion Criteria
269
- - `docker build .` succeeds
270
- - `docker run -p 7860:7860` starts server cleanly
271
- - HF Space URL responds to reset()
272
- - Pre-validation script passes all checks
273
 
274
  ---
275
 
276
- ## ⏱️ Suggested Time Allocation (Given ~36hrs remaining)
277
 
278
- | Phase | Time |
279
- |---|---|
280
- | Phase 1 - Foundation | 6 hrs |
281
- | Phase 2 - RL Environment | 8 hrs |
282
- | Phase 3 - Server + Spec | 3 hrs |
283
- | Phase 4 — Inference Script | 4 hrs |
284
- | Phase 5 — Docker + Deploy | 3 hrs |
285
- | Buffer / debugging | 4 hrs |
286
 
287
  ---
288
 
289
- ## ⚠️ Known Risk Areas (Watch These)
290
 
291
- 1. **Hard grader reproducibility** - document judge prompt and seed explicitly
292
- 2. **DB migration on fresh Docker build** - first run must auto-populate DB from sample_codebase
293
- 3. **Inference script runtime** — test full 3-task run locally before submitting, must be under 20 min
294
- 4. **openenv validate strictness** — run it early in Phase 3, not at the end
295
- 5. **Reward always in 0.0–1.0** - clip all reward values, graders must never return outside range
1
+ # GraphReview RL Environment - Complete Phased Build Plan v2
 
2
 
3
  ---
4
 
5
+ ## What You Are Building
6
 
7
+ An OpenEnv-compliant RL environment where an LLM agent learns to review Python code with full dependency graph awareness. The environment:
8
 
9
+ 1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite
10
+ 2. Splits large files (>300 lines) into sub-nodes by class/function to keep observations manageable
11
+ 3. Pre-computes ground truth linter flags (pylint + bandit + pyflakes) per node at seed time
12
+ 4. Presents the agent with one module at a time + compressed AST summaries of neighbors
13
+ 5. Receives structured actions (FLAG_BUG, ADD_COMMENT, REQUEST_CONTEXT, etc.)
14
+ 6. Scores actions against pre-computed ground truth — no training data needed, ground truth IS the data
15
+ 7. Accumulates review annotations back onto graph nodes in SQLite
16
+ 8. Outputs an annotated dependency graph visualized via Pyvis (interactive HTML) + markdown report
17
 
18
+ **The RL loop:** Agent takes multi-step actions per module episode, receives per-step rewards, learns to reason about cascading dependency issues. This is online RL: the environment generates interaction data live. No pre-existing dataset required.
19
 
20
+ **The key differentiator vs CodeRabbit:** Agent sees WHY a decision was made (upstream context) before flagging it. Reviews are stored back into the graph. Agent can AMEND earlier reviews as it learns more about root causes downstream.
21
 
22
  ---
23
 
24
+ ## Why No Training Data Is Needed
25
 
26
+ This is online RL, not offline supervised learning:
27
+ - Ground truth = pylint/bandit/pyflakes results, computed once at seed time, stored in DB
28
+ - Agent explores environment → receives rewards → that interaction IS the training signal
29
+ - For Round 1, the baseline inference script evaluates a pre-trained LLM (Gemma 4 E4B) acting as agent
30
+ - You are not training a model — you are building the environment that COULD train one
31
+ - The three graders define what "correct behavior" looks like — that is your data
32
 
33
+ ---
34
 
35
+ ## Tech Stack (Fixed)
36
+
37
+ - Python 3.11
38
+ - OpenEnv: step() / reset() / state() + Pydantic typed models + openenv.yaml
39
+ - SQLite via SQLAlchemy ORM (persistent, file-based, ships in Docker)
40
+ - NetworkX for graph operations and traversal
41
+ - Python built-in `ast` module for structure extraction
42
+ - `astroid` for scope-aware name resolution and intra-file conflict detection
43
+ - pylint + bandit + pyflakes for ground truth generation (run once at seed time)
44
+ - Pyvis for interactive graph visualization
45
+ - OpenAI client (inference.py + hard task LLM judge)
46
+ - Gemma 4 E4B as baseline agent model
47
+ - FastAPI for HTTP server (required for HF Spaces)
48
+ - Docker + Hugging Face Spaces
49
+ - context7 MCP for library documentation during build
50
 
51
  ---
52
 
53
+ ## File Structure
54
 
55
  ```
56
+ graphreview/
57
+ ├── sample_project/ # synthetic input codebase with injected bugs
58
  │ ├── auth.py
59
  │ ├── checkout.py
60
  │ ├── cart.py
61
+ │ ├── database.py
62
+ │ └── ...
63
+ ├── parser/
64
+ │ ├── ast_parser.py # extract signatures, imports, classes per file
65
+ │ ├── chunker.py # split files >300 lines into sub-nodes
66
+ │ ├── graph_builder.py # build NetworkX DiGraph from parsed output
67
+ │ └── summarizer.py # compress each node to ~50 token summary
68
+ ├── db/
69
+ │ ├── database.py # SQLAlchemy engine, session factory
70
+ │ ├── models.py # ORM models for all tables
71
+ │ └── seed.py # parse once → store → skip if seeded
72
+ ├── graph/
73
+ │ ├── graph_manager.py # load graph from DB, traversal, neighbor queries
74
+ │ └── token_budget.py # enforce token limits on observations
75
+ ├── env/
76
+ │ ├── environment.py # CodeReviewEnv main class
77
+ │ ├── observation.py # Pydantic: CodeObservation
78
+ │ ├── action.py # Pydantic: ReviewAction
79
+ │ ├── reward.py # Pydantic: ReviewReward + reward table
80
+ │ └── state.py # Pydantic: GraphState
81
+ ├── graders/
82
+ │ ├── base_grader.py # abstract interface
83
+ │ ├── easy_grader.py # linter match (deterministic)
84
+ │ ├── medium_grader.py # AST + line attribution (deterministic)
85
+ │ └── hard_grader.py # graph consistency + LLM judge (temperature=0)
86
+ ├── tasks/
87
+ │ ├── task_registry.py # register 3 tasks
88
+ │ ├── easy_task.py # style/linter review
89
+ │ ├── medium_task.py # logic bug + direct dep context
90
+ │ └── hard_task.py # cascading bug across 2+ module hops
91
+ ├── visualizer/
92
+ │ ├── pyvis_renderer.py # NetworkX → interactive HTML graph
93
+ │ └── report_generator.py # markdown + JSON final report
94
+ ├── server.py # FastAPI wrapper for OpenEnv HTTP spec
95
+ ├── inference.py # baseline agent script (mandatory, root level)
96
+ ├── openenv.yaml # spec metadata
97
+ ├── Dockerfile
98
+ └── README.md
99
  ```
100
 
101
  ---
102
 
103
+ ## Database Schema (SQLite - Persistent)
104
+
105
+ **modules**
106
+ ```
107
+ id TEXT PK (relative file path, or "file.py::ClassName" for sub-nodes)
108
+ name TEXT
109
+ code TEXT (full source: full file or chunked section)
110
+ ast_summary JSON (signatures, classes, return types, decorators)
111
+ linter_flags JSON (pre-computed pylint+bandit+pyflakes - GROUND TRUTH)
112
+ summary TEXT (~50 token natural language description)
113
+ parent_module_id TEXT NULL (set if this is a sub-node chunk of a larger file)
114
+ review_status TEXT (pending | in_progress | reviewed)
115
+ is_chunk BOOLEAN
116
+ ```
117
+
118
+ **edges**
119
+ ```
120
+ source_id TEXT FK modules.id
121
+ target_id TEXT FK modules.id
122
+ edge_type TEXT (explicit_import | implicit_dependency | intra_file)
123
+ import_line TEXT
124
+ dependency_reason TEXT
125
+ scope TEXT (module_level | function_level)
126
+ weight FLOAT (1.0 explicit, 0.5 implicit)
127
+ ```
128
+
129
+ **review_annotations**
130
+ ```
131
+ id INTEGER PK AUTOINCREMENT
132
+ module_id TEXT FK modules.id
133
+ task_id TEXT
134
+ action_type TEXT
135
+ content TEXT
136
+ reward_given FLOAT
137
+ attributed_to TEXT NULL (module_id for cascade attribution)
138
+ is_amendment BOOLEAN (true if this amends a prior review)
139
+ created_at TIMESTAMP
140
+ ```
141
+
142
+ **task_runs**
143
+ ```
144
+ id INTEGER PK AUTOINCREMENT
145
+ task_id TEXT
146
+ started_at TIMESTAMP
147
+ completed_at TIMESTAMP NULL
148
+ total_reward FLOAT
149
+ total_steps INTEGER
150
+ status TEXT (running | complete | failed)
151
+ ```
152
+
153
+ **seed_meta**
154
+ ```
155
+ key TEXT PK
156
+ value TEXT
157
+ ```
158
+ (stores seeded=true flag, seed timestamp, codebase hash)
159
 
160
  ---
161
 
162
+ ## Chunking Strategy for Large Files
 
163
 
164
+ ```
165
+ File ≤ 300 lines → one node, id = "filename.py"
166
+
167
+ File > 300 lines → chunk by top-level class or function
168
+ Each chunk becomes a sub-node:
169
+ id = "filename.py::ClassName" or "filename.py::function_name"
170
+ parent_module_id = "filename.py"
171
+
172
+ A virtual parent node is kept for the file itself
173
+ with no code but with all inter-file edges
174
+
175
+ Intra-file edges added between chunks:
176
+ if function_a calls function_b in same file →
177
+ edge(filename.py::function_a → filename.py::function_b, type=intra_file)
178
+
179
+ Dependency conflict detection (via astroid):
180
+ If import is used only inside one function → scope=function_level, weight=0.5
181
+ If import used at module level → scope=module_level, weight=1.0
182
+ Circular imports → flagged as edge with type=circular, added to linter_flags
183
+ ```
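
The chunking rule above can be sketched with the standard `ast` module. This is a minimal illustration, not the committed `parser/chunker.py`: the function name `chunk_source` and the dict-shaped nodes are assumptions made for the example, and decorators above a `def` are not included in the chunk because `lineno` points at the `def` line.

```python
import ast

CHUNK_THRESHOLD = 300  # lines; the threshold named in the plan


def chunk_source(filename: str, source: str, threshold: int = CHUNK_THRESHOLD) -> list[dict]:
    """Return one node for a small file, or a virtual parent plus sub-nodes."""
    lines = source.splitlines()
    if len(lines) <= threshold:
        return [{"id": filename, "parent_module_id": None, "is_chunk": False, "code": source}]

    tree = ast.parse(source)
    # Virtual parent: keeps the file-level id (for inter-file edges) but no code.
    nodes = [{"id": filename, "parent_module_id": None, "is_chunk": False, "code": ""}]
    for stmt in tree.body:
        if isinstance(stmt, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno is 1-based and end_lineno is inclusive (Python 3.8+).
            code = "\n".join(lines[stmt.lineno - 1 : stmt.end_lineno])
            nodes.append(
                {
                    "id": f"{filename}::{stmt.name}",
                    "parent_module_id": filename,
                    "is_chunk": True,
                    "code": code,
                }
            )
    return nodes
```

Module-level statements that are neither classes nor functions are dropped by this sketch; a real chunker would need a policy for them (for example, a `filename.py::__module__` chunk, which is a hypothetical id not defined in the plan).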
184
+
185
+ ---
186
 
187
+ ## Observation Token Budget
188
+
189
+ ```
190
+ Current module full code: ~800 tokens (hard cap, truncate with notice)
191
+ AST summary of current: ~100 tokens
192
+ Direct dependency summaries: ~50 tokens × up to 5 deps = 250 tokens
193
+ Dependent summaries: ~50 tokens × up to 3 = 150 tokens
194
+ Existing neighbor reviews: ~30 tokens × up to 4 = 120 tokens
195
+ Task description + action space: ~200 tokens
196
+ Buffer: ~280 tokens
197
+ ─────────────────────────────────────────────
198
+ Total: ~1900 tokens (well within E4B 128K window)
199
+ ```
200
+
201
+ If a module has >5 direct dependencies, rank by betweenness centrality and include top 5 only.
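
A rough sketch of how the per-component caps and the top-5 rule might be enforced. Everything here is an assumption for illustration: tokens are approximated by whitespace-separated words (the real `graph/token_budget.py` may use a proper tokenizer), the `[truncated]` marker is a placeholder, and the centrality scores are assumed to be precomputed elsewhere and passed in as a dict.

```python
TRUNCATION_NOTICE = "[truncated]"  # placeholder marker, counted as one token


def count_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words."""
    return len(text.split())


def truncate_to_budget(text: str, max_tokens: int) -> str:
    """Keep whole lines until the cap is reached, then append a notice."""
    if count_tokens(text) <= max_tokens:
        return text
    kept: list[str] = []
    used = 1  # reserve one token for the truncation notice
    for line in text.splitlines():
        cost = count_tokens(line)
        if used + cost > max_tokens:
            break
        kept.append(line)
        used += cost
    kept.append(TRUNCATION_NOTICE)
    return "\n".join(kept)


def top_dependencies(deps: list[str], centrality: dict[str, float], limit: int = 5) -> list[str]:
    """Rank by centrality (highest first), breaking ties by name for determinism."""
    return sorted(deps, key=lambda d: (-centrality.get(d, 0.0), d))[:limit]
```

Truncating on line boundaries keeps the remaining code readable for the agent, at the cost of sometimes leaving part of the budget unused.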
202
+
203
+ ---
204
+
205
+ ## Action Space
206
+
207
+ ```python
208
+ action_type options:
209
+ FLAG_STYLE # style/formatting issue
210
+ FLAG_BUG # logic error
211
+ FLAG_SECURITY # security vulnerability
212
+ FLAG_DEPENDENCY_ISSUE # issue caused by upstream module
213
+ ADD_COMMENT # explanation (requires content field)
214
+ REQUEST_CONTEXT # fetch full code of a neighbor (-0.1 reward cost)
215
+ REQUEST_CHANGES # end episode, verdict = changes needed
216
+ APPROVE # end episode, verdict = approved
217
+ AMEND_REVIEW # update a prior annotation on a neighbor node
218
+
219
+ Fields:
220
+ action_type: required
221
+ target_line: optional int
222
+ content: required for ADD_COMMENT, AMEND_REVIEW
223
+ attributed_to: optional module_id (for FLAG_DEPENDENCY_ISSUE, AMEND_REVIEW)
224
+ context_request: required for REQUEST_CONTEXT (module_id to fetch)
225
+ ```
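
The field rules above could be checked as follows. The plan specifies a Pydantic `ReviewAction`; this sketch deliberately swaps in a standard-library dataclass so it runs without dependencies, and the `ends_episode` property is a hypothetical convenience, not something the plan defines.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ActionType(str, Enum):
    FLAG_STYLE = "FLAG_STYLE"
    FLAG_BUG = "FLAG_BUG"
    FLAG_SECURITY = "FLAG_SECURITY"
    FLAG_DEPENDENCY_ISSUE = "FLAG_DEPENDENCY_ISSUE"
    ADD_COMMENT = "ADD_COMMENT"
    REQUEST_CONTEXT = "REQUEST_CONTEXT"
    REQUEST_CHANGES = "REQUEST_CHANGES"
    APPROVE = "APPROVE"
    AMEND_REVIEW = "AMEND_REVIEW"


@dataclass(frozen=True)
class ReviewAction:
    action_type: ActionType
    target_line: Optional[int] = None
    content: Optional[str] = None
    attributed_to: Optional[str] = None
    context_request: Optional[str] = None

    def __post_init__(self) -> None:
        # content is required for ADD_COMMENT and AMEND_REVIEW.
        needs_content = (ActionType.ADD_COMMENT, ActionType.AMEND_REVIEW)
        if self.action_type in needs_content and not self.content:
            raise ValueError(f"{self.action_type.value} requires content")
        # context_request is required for REQUEST_CONTEXT.
        if self.action_type is ActionType.REQUEST_CONTEXT and not self.context_request:
            raise ValueError("REQUEST_CONTEXT requires context_request")

    @property
    def ends_episode(self) -> bool:
        """APPROVE and REQUEST_CHANGES are the two terminal verdicts."""
        return self.action_type in (ActionType.APPROVE, ActionType.REQUEST_CHANGES)
```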
226
+
227
+ ---
228
+
229
+ ## Reward Table
230
+
231
+ ```
232
+ Correct FLAG_* matching linter ground truth: +0.5
233
+ Accurate ADD_COMMENT (keyword match to linter desc): +0.3
234
+ FLAG_DEPENDENCY_ISSUE with correct attribution: +0.6
235
+ FLAG_DEPENDENCY_ISSUE wrong attribution: +0.1
236
+ AMEND_REVIEW correctly updating prior annotation: +0.4
237
+ REQUEST_CONTEXT (investigation cost): -0.1
238
+ False positive flag (no linter match): -0.2
239
+ APPROVE on module with unflagged critical issues: -1.0
240
+ REQUEST_CHANGES on clean module: -0.3
241
+ Episode completion bonus (all issues caught): +0.2
242
+ ```
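
One way to read the table is as a lookup keyed by outcome. In the sketch below the outcome names (such as `correct_flag`) are invented labels for the rows above, and `score_episode` is a hypothetical helper; deciding which outcome an action produced is the graders' job and is not shown. No clipping is applied here.

```python
REWARD_TABLE: dict[str, float] = {
    "correct_flag": 0.5,
    "accurate_comment": 0.3,
    "dependency_issue_correct_attribution": 0.6,
    "dependency_issue_wrong_attribution": 0.1,
    "correct_amend_review": 0.4,
    "request_context": -0.1,
    "false_positive_flag": -0.2,
    "approve_with_unflagged_critical": -1.0,
    "request_changes_on_clean_module": -0.3,
    "episode_completion_bonus": 0.2,
}


def score_episode(outcomes: list[str]) -> list[tuple[str, float, float]]:
    """Return (outcome, reward, cumulative) for each step of an episode."""
    cumulative = 0.0
    trace = []
    for outcome in outcomes:
        reward = REWARD_TABLE[outcome]
        # Round to avoid float drift showing up in logged cumulative values.
        cumulative = round(cumulative + reward, 6)
        trace.append((outcome, reward, cumulative))
    return trace
```

For example, a correct flag, an accurate comment, a correctly attributed dependency issue and the completion bonus accumulate to 0.5, 0.8, 1.4 and 1.6.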
243
+
244
+ ---
245
+
246
+ ## Grader Architecture
247
+
248
+ ### Easy Grader (fully deterministic)
249
+ - Load linter_flags JSON from DB for current module
250
+ - For each agent FLAG_* action: check if a matching linter flag exists (type + line ±3)
251
+ - Score per action, aggregate for episode
252
+ - No LLM call. Zero variance.
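
A minimal sketch of the type-plus-line match described above. The shape of a linter flag (`{"type": ..., "line": ...}`) and of an agent action dict is an assumption; the stored `linter_flags` JSON may look different. Each ground-truth flag is consumed once, so flagging the same issue twice counts as a false positive in this sketch.

```python
LINE_TOLERANCE = 3  # the ±3 line window from the grader spec


def grade_flags(actions: list[dict], linter_flags: list[dict], tolerance: int = LINE_TOLERANCE) -> list[float]:
    """Return one reward per FLAG_* action: +0.5 on a match, -0.2 otherwise."""
    unmatched = list(linter_flags)  # copy so the caller's ground truth is untouched
    rewards = []
    for action in actions:
        hit = next(
            (
                flag
                for flag in unmatched
                if flag["type"] == action["action_type"]
                and abs(flag["line"] - action["target_line"]) <= tolerance
            ),
            None,
        )
        if hit is None:
            rewards.append(-0.2)
        else:
            unmatched.remove(hit)
            rewards.append(0.5)
    return rewards
```

Because the function is a pure computation over its inputs, repeated runs on the same input give the same rewards.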
253
+
254
+ ### Medium Grader (fully deterministic)
255
+ - Easy grader logic PLUS:
256
+ - For ADD_COMMENT: extract keywords from linter flag description, check overlap with agent comment (Jaccard similarity > 0.3 = match)
257
+ - For line attribution: ±3 line tolerance
258
+ - Still no LLM call.
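
The keyword-overlap check can be sketched as below. The tokenization regex and the small stopword list are assumptions for the example; only the Jaccard measure and the 0.3 threshold come from the spec above.

```python
import re

# Hypothetical stopword list; the plan does not specify one.
STOPWORDS = frozenset({"a", "an", "the", "is", "of", "to", "in", "from", "can", "may"})


def keywords(text: str) -> set[str]:
    """Lowercase identifier-like tokens, minus stopwords."""
    return set(re.findall(r"[a-z_]+", text.lower())) - STOPWORDS


def jaccard(a: set[str], b: set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def comment_matches(comment: str, linter_description: str, threshold: float = 0.3) -> bool:
    """True when keyword overlap exceeds the threshold (strictly greater)."""
    return jaccard(keywords(comment), keywords(linter_description)) > threshold
```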
259
+
260
+ ### Hard Grader (quasi-deterministic)
261
+ - Graph consistency check (deterministic):
262
+ If FLAG_DEPENDENCY_ISSUE with attributed_to=X: verify edge(current → X) or edge(X → current) exists in graph
263
+ If no edge: reward = 0.0, feedback = "no dependency relationship found"
264
+ - LLM-as-judge (temperature=0, fixed rubric):
265
+ Separate API call to judge model (NOT the agent)
266
+ Fixed system prompt with scoring rubric
267
+ Scores cascade reasoning quality: 0.0 | 0.5 | 1.0
268
+ Document prompt hash in README for reproducibility
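
The deterministic half of the hard grader reduces to an edge lookup. In this sketch the graph is assumed to be available as a set of `(source_id, target_id)` tuples, and returning `None` as the score is a convention invented here to mean "passed, defer to the LLM judge".

```python
from typing import Optional


def graph_consistency_check(
    edges: set[tuple[str, str]], current: str, attributed_to: str
) -> tuple[Optional[float], str]:
    """Check that an attribution is backed by a direct edge in either direction."""
    if (current, attributed_to) in edges or (attributed_to, current) in edges:
        return None, "dependency edge found"
    return 0.0, "no dependency relationship found"
```

Read literally, the rule accepts only direct edges. With a chain such as checkout.py → auth.py → config.py, an attribution from checkout.py straight to config.py would score 0.0 unless the check is extended to paths, which may matter for the cascade task.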
269
 
270
  ---
271
 
272
+ ## Three Tasks
273
+
274
+ ### Task 1: style_review (Easy)
275
+ - Input: single module with 3 pylint style violations
276
+ - Agent must: flag all 3 style issues
277
+ - No dependency context needed
278
+ - Grader: easy_grader only
279
+ - Expected baseline score: 0.7–0.9
280
+
281
+ ### Task 2: logic_review (Medium)
282
+ - Input: checkout.py with a null-reference bug
283
+ - auth.py (its dependency) has validate_token that can return None
284
+ - Agent must: flag the bug + add comment referencing the None return risk
285
+ - Grader: medium_grader
286
+ - Expected baseline score: 0.4–0.7
287
+
288
+ ### Task 3: cascade_review (Hard)
289
+ - Input: 3-module chain: config.py → auth.py → checkout.py
290
+ - Bug originates in config.py (missing key), propagates through auth.py, surfaces in checkout.py
291
+ - Agent must: flag issue in checkout.py AND attribute root cause to config.py
292
+ - Grader: hard_grader (graph consistency + LLM judge)
293
+ - Expected baseline score: 0.2–0.5
294
 
295
  ---
296
 
297
+ ## Visualization
 
298
 
299
+ ### Pyvis Interactive Graph (primary)
300
+ - Nodes colored by review_status: grey=pending, yellow=in_progress, green=approved, red=changes_requested
301
+ - Node size = number of dependents (centrality)
302
+ - Edge color: blue=explicit_import, orange=implicit, red=circular
303
+ - Edge thickness = weight (1.0 explicit, 0.5 implicit)
304
+ - Click node → shows review_annotations panel
305
+ - Rendered as standalone HTML, embedded in HF Space
 
 
306
 
307
+ ### Final Report Output (end of all episodes)
308
+ - `graphreview_report.md`: per-module sections with verdict + issues + cascade attributions
309
+ - `graphreview_report.json`: machine-readable full graph + annotations
310
+ - `graphreview_graph.html`: pyvis interactive visualization
311
 
312
  ---
313
 
314
+ ## inference.py Log Format (Mandatory)
 
315
316
  ```
317
+ [START] task=cascade_review module_count=3
318
+ [STEP] module=checkout.py action=FLAG_BUG line=24 reward=0.5 cumulative=0.5
319
+ [STEP] module=checkout.py action=ADD_COMMENT content="null risk from auth" reward=0.3 cumulative=0.8
320
+ [STEP] module=checkout.py action=FLAG_DEPENDENCY_ISSUE attributed_to=auth.py reward=0.6 cumulative=1.4
321
+ [STEP] module=checkout.py action=REQUEST_CHANGES reward=0.2 cumulative=1.6 done=true
322
+ [STEP] module=auth.py action=FLAG_BUG line=15 reward=0.5 cumulative=2.1
323
+ [STEP] module=auth.py action=FLAG_DEPENDENCY_ISSUE attributed_to=config.py reward=0.6 cumulative=2.7
324
+ [STEP] module=auth.py action=REQUEST_CHANGES reward=0.2 cumulative=2.9 done=true
325
+ [STEP] module=config.py action=FLAG_BUG line=8 reward=0.5 cumulative=3.4
326
+ [STEP] module=config.py action=REQUEST_CHANGES reward=0.2 cumulative=3.6 done=true
327
+ [END] task=cascade_review total_reward=3.6 modules_reviewed=3 report=graphreview_report.md
328
  ```
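
Since the log format is treated as a contract, it may help to build the lines through small formatters rather than ad hoc f-strings. The helper names below are hypothetical, and the field order simply mirrors the example above: `module`, `action`, any action-specific fields, then `reward`, `cumulative` and an optional `done=true`.

```python
def _fmt(value: object) -> str:
    """Quote strings containing spaces; render everything else with str()."""
    if isinstance(value, str) and " " in value:
        return f'"{value}"'
    return str(value)


def format_start(task: str, module_count: int) -> str:
    return f"[START] task={task} module_count={module_count}"


def format_step(
    module: str, action: str, reward: float, cumulative: float, done: bool = False, **extras: object
) -> str:
    parts = [f"module={module}", f"action={action}"]
    # Action-specific fields (line, content, attributed_to) keep call order.
    parts += [f"{key}={_fmt(value)}" for key, value in extras.items()]
    parts += [f"reward={reward}", f"cumulative={cumulative}"]
    if done:
        parts.append("done=true")
    return "[STEP] " + " ".join(parts)


def format_end(task: str, total_reward: float, modules_reviewed: int, report: str) -> str:
    return (
        f"[END] task={task} total_reward={total_reward} "
        f"modules_reviewed={modules_reviewed} report={report}"
    )
```

Callers would pass already-rounded floats, since `str()` on an unrounded sum can print trailing digits.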
329
 
330
  ---
331
 
332
+ ## Phase 1 - Persistence Layer & Sample Project
333
+ **Goal: Parse once, store forever, never re-parse**
334
+
335
+ Build:
336
+ - `sample_project/` — 10 Python files, ~50 functions total, with injected known bugs for each task
337
+ - `db/models.py` — all SQLAlchemy ORM models
338
+ - `db/database.py` - engine setup, session factory, init_db()
339
+ - `db/seed.py` - orchestrate full parse → lint → store pipeline
340
+ - `parser/ast_parser.py` - extract structure per file using Python ast
341
+ - `parser/chunker.py` - split files >300 lines by class/function into sub-nodes
342
+ - `parser/graph_builder.py` - build NetworkX DiGraph, explicit + implicit edges
343
+ - `parser/summarizer.py` - ~50 token summaries per node
344
+
345
+ Success criteria:
346
+ - seed.py completes in <30s on sample_project
347
+ - Second run detects seeded flag, loads in <1s
348
+ - All modules, edges, linter_flags correctly stored
349
+ - Chunking correctly splits a 400-line test file into sub-nodes
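
The "parse once, skip if seeded" behaviour could be keyed on a codebase hash stored in `seed_meta`. The sketch below uses the standard `sqlite3` module directly rather than SQLAlchemy to stay dependency-free; the function names and the `codebase_hash` / `seeded` keys are assumptions, not the committed `db/seed.py`.

```python
import hashlib
import sqlite3
from pathlib import Path


def codebase_hash(root: Path) -> str:
    """SHA-256 over relative paths and contents of all .py files, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(root.rglob("*.py")):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def _ensure_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS seed_meta (key TEXT PRIMARY KEY, value TEXT)")


def needs_seed(conn: sqlite3.Connection, current_hash: str) -> bool:
    """True on first run, or when the codebase changed since the last seed."""
    _ensure_table(conn)
    row = conn.execute("SELECT value FROM seed_meta WHERE key = 'codebase_hash'").fetchone()
    return row is None or row[0] != current_hash


def mark_seeded(conn: sqlite3.Connection, current_hash: str) -> None:
    """Record the seeded flag and the hash that was seeded."""
    _ensure_table(conn)
    conn.executemany(
        "INSERT OR REPLACE INTO seed_meta (key, value) VALUES (?, ?)",
        [("seeded", "true"), ("codebase_hash", current_hash)],
    )
    conn.commit()
```

A seed script would call `needs_seed` first, run the parse and lint pipeline only when it returns true, and call `mark_seeded` last so a crashed seed is retried on the next run.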
350
+
351
+ ---
352
+
353
+ ## Phase 2 — Graph Manager & Observation Builder
354
+ **Goal: Efficient, token-budgeted observations from DB**
355
+
356
+ Build:
357
+ - `graph/graph_manager.py` — load graph, traversal order, neighbor queries
358
+ - `graph/token_budget.py` — enforce per-component token limits
359
+ - `env/observation.py` — Pydantic CodeObservation model
360
+
361
+ Success criteria:
362
+ - Observation for any node fits within 2000 token budget
363
+ - Traversal order: leaf nodes first, high-centrality nodes last
364
+ - REQUEST_CONTEXT returns full neighbor code within budget
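
A dependency-first traversal with a deterministic tie-break might look like the sketch below. Several assumptions are baked in: an edge `(source, target)` means "source depends on target", a "leaf" is a module with no dependencies of its own, and the number of direct dependents stands in for betweenness centrality (which the plan computes with NetworkX). If leaves are instead meant to be modules that nothing depends on, the edge direction would be flipped.

```python
import heapq
from collections import defaultdict


def traversal_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn-style ordering: dependencies first, fewer dependents first on ties."""
    deps: dict[str, set[str]] = {node: set() for node in nodes}
    dependents: dict[str, set[str]] = defaultdict(set)
    for source, target in edges:
        deps[source].add(target)
        dependents[target].add(source)

    remaining = {node: len(d) for node, d in deps.items()}
    # Heap entries are (dependent count, name), so ordering is fully deterministic.
    heap = [(len(dependents[node]), node) for node in nodes if remaining[node] == 0]
    heapq.heapify(heap)

    order: list[str] = []
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
        for parent in sorted(dependents[node]):
            remaining[parent] -= 1
            if remaining[parent] == 0:
                heapq.heappush(heap, (len(dependents[parent]), parent))

    if len(order) != len(deps):
        raise ValueError("cycle detected: circular imports need separate handling")
    return order
```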
365
+
366
+ ---
367
+
368
+ ## Phase 3 — Action Space, Reward Engine & Graders
369
+ **Goal: All actions scored correctly and deterministically**
370
+
371
+ Build:
372
+ - `env/action.py` — Pydantic ReviewAction
373
+ - `env/reward.py` — Pydantic ReviewReward + reward table logic
374
+ - `graders/base_grader.py` — abstract interface
375
+ - `graders/easy_grader.py` — linter match
376
+ - `graders/medium_grader.py` — linter + keyword + line attribution
377
+ - `graders/hard_grader.py` — graph consistency + LLM judge
378
+
379
+ Success criteria:
380
+ - Easy grader: same input always gives same output (verified with 10 runs)
381
+ - Hard grader: temperature=0 verified, prompt hash documented
382
+ - All reward values within 0.0–1.0 range
383
+ - False positive and false negative cases handled explicitly
384
 
385
  ---
386
 
387
+ ## Phase 4 - OpenEnv Core
388
+ **Goal: Fully compliant step() / reset() / state()**
389
+
390
+ Build:
391
+ - `env/environment.py` — CodeReviewEnv main class
392
+ - `env/state.py` — GraphState Pydantic model
393
+ - `tasks/task_registry.py` + 3 task files
394
+ - `openenv.yaml`
395
+ - `server.py` — FastAPI HTTP wrapper
396
 
397
+ Success criteria:
398
+ - `openenv validate` passes
399
+ - All 3 tasks run end-to-end without error
400
+ - state() correctly returns full annotated graph
401
+ - reset() clears only current task annotations, not full DB
 
 
 
402
 
403
  ---
404
 
405
+ ## Phase 5 - Visualization & Reporting
406
+ **Goal: Useful output the user actually sees**
407
+
408
+ Build:
409
+ - `visualizer/pyvis_renderer.py` — interactive HTML graph
410
+ - `visualizer/report_generator.py` — markdown + JSON report
411
+
412
+ Success criteria:
413
+ - Graph colors update correctly as reviews accumulate
414
+ - Report correctly attributes cascade issues across modules
415
+ - HTML renders in browser without external dependencies
416
+
417
+ ---
418
 
419
+ ## Phase 6 - inference.py & Deployment
420
+ **Goal: Baseline script + Docker + HF Space**
421
+
422
+ Build:
423
+ - `inference.py` runs Gemma 4 E4B against all 3 tasks, emits mandatory log format
424
+ - `Dockerfile` — clean build + run
425
+ - `README.md` — full documentation
426
+ - HF Space deployment
427
+
428
+ Success criteria:
429
+ - inference.py completes all 3 tasks in <20 minutes
430
+ - Runs on 2 vCPU / 8GB RAM
431
+ - docker build && docker run works cleanly
432
+ - HF Space deploys and responds to reset() ping
433
+ - Baseline scores reproducible across 3 runs
Reviewer.md ADDED
@@ -0,0 +1,94 @@
1
+ # Phase Reviewer Prompt — GraphReview RL Environment
2
+
3
+ You are a senior engineer and RL systems expert reviewing completed phases of a competitive hackathon project called GraphReview. Your job is to catch problems before they compound into later phases.
4
+
5
+ ---
6
+
7
+ ## Project Context
8
+
9
+ GraphReview is an OpenEnv-compliant RL environment for graph-aware Python code review. Key constraints:
10
+ - SQLite is the persistent store — DB schema changes are expensive after Phase 1
11
+ - Pydantic v2 models are shared interfaces — field changes break multiple components
12
+ - Graders must be deterministic — non-determinism is a disqualification risk
13
+ - inference.py log format is a judging contract — any deviation fails automated scoring
14
+ - Must run in <20 min on 2 vCPU / 8GB RAM
15
+ - Must pass `openenv validate` and `docker build && docker run`
16
+
17
+ ---
18
+
19
+ ## Your Review Checklist
20
+
21
+ For every phase submitted to you, check ALL of the following:
22
+
23
+ ### Correctness
24
+ - [ ] Does the code do what the phase plan says it should do?
25
+ - [ ] Are all success criteria from the phase plan met?
26
+ - [ ] Are edge cases handled (empty files, circular imports, modules with no dependencies, modules with >5 deps)?
27
+ - [ ] Does reset() only clear current task annotations, not the full DB?
28
+ - [ ] Does state() return the full graph including all prior annotations?
29
+
30
+ ### Interface Integrity
31
+ - [ ] Do all Pydantic models match the spec exactly (field names, types, Optional handling)?
32
+ - [ ] Do function signatures match what later phases will call?
33
+ - [ ] Are all DB foreign keys correct and consistent?
34
+ - [ ] Is the module_id format consistent everywhere (relative path, sub-node format)?
35
+
36
+ ### Determinism & Reproducibility
37
+ - [ ] Do easy and medium graders make zero LLM calls?
38
+ - [ ] Is hard grader temperature explicitly set to 0?
39
+ - [ ] Would running the same input twice produce the same reward?
40
+ - [ ] Is the LLM judge prompt a static string (not variable-dependent)?
41
+
42
+ ### Performance & Resource Constraints
43
+ - [ ] Will seed.py complete in <30s on the sample_project?
44
+ - [ ] Will inference.py complete all 3 tasks in <20 minutes?
45
+ - [ ] Does token_budget.py enforce the 2000 token cap?
46
+ - [ ] Will the environment run on 2 vCPU / 8GB RAM?
47
+
48
+ ### OpenEnv Compliance
49
+ - [ ] Does openenv.yaml include all required fields?
50
+ - [ ] Do step()/reset()/state() match the OpenEnv spec exactly?
51
+ - [ ] Will `openenv validate` pass based on what's been built?
52
+
53
+ ### Code Quality
54
+ - [ ] Are all functions fully typed?
55
+ - [ ] Are Pydantic models complete with no missing fields?
56
+ - [ ] Is SQLAlchemy session handling correct (no session leaks)?
57
+ - [ ] Are there no hardcoded paths that break in Docker?
58
+
59
+ ### Forward Compatibility
60
+ - [ ] Will this phase's output work cleanly with the next phase's inputs?
61
+ - [ ] Are there any design decisions that will cause pain in later phases?
62
+ - [ ] Is the DB schema flexible enough for the remaining phases?
63
+
64
+ ---
65
+
66
+ ## How to Report Issues
67
+
68
+ For each issue found, report:
69
+
70
+ **Severity:** Critical | Major | Minor
71
+
72
+ **Critical** — will cause disqualification or break a later phase entirely
73
+ **Major** — will cause incorrect behavior or significant rework
74
+ **Minor** — suboptimal but won't break anything
75
+
76
+ **Format:**
77
+ ```
78
+ [CRITICAL] File: graders/hard_grader.py
79
+ Issue: temperature not set to 0 on judge API call
80
+ Why it matters: grader will produce different scores on identical inputs, failing reproducibility check
81
+ Fix: add temperature=0 to API call parameters
82
+ ```
83
+
84
+ ---
85
+
86
+ ## After Reviewing
87
+
88
+ Summarise:
89
+ 1. Total issues found by severity
90
+ 2. Whether the phase passes (no Criticals) or fails (any Critical)
91
+ 3. The single most important thing to fix before moving to the next phase
92
+ 4. Any forward-looking risks the builder should keep in mind for upcoming phases
93
+
94
+ Do not approve a phase with any Critical issues. Do not nitpick Minor issues if the phase is under time pressure — flag them but do not block.
code-review-env/README.md CHANGED
@@ -1,11 +1,70 @@
1
  # CodeReviewEnv
2
 
3
- Phase 1 foundation for dependency-aware code review environment.
4
 
5
  ## Quickstart
6
 
7
  ```bash
8
  pip install -r requirements.txt
9
- python -m parser.ast_parser sample_codebase/
10
  python -m db.store --module checkout
11
  ```
1
  # CodeReviewEnv
2
 
3
+ Dependency-aware code review RL environment with persistent SQLite graph storage.
4
+
5
+ ## Current Status
6
+
7
+ - Phase 1: implemented and validated
8
+ - persistent seed pipeline with hash-based cache
9
+ - parser/chunker/graph builder + linter findings persistence
10
+ - Phase 2: implemented
11
+ - graph manager for DB-backed graph loading and deterministic traversal
12
+ - hard token budget enforcement (max 2000 tokens)
13
+ - strict Pydantic v2 observation models
14
+ - observation builder with neighbor summaries and REQUEST_CONTEXT support
15
+
16
+ ## Implemented Phase 2 Components
17
+
18
+ - [graph/graph_manager.py](graph/graph_manager.py)
19
+   - Loads graph nodes/edges from SQLite.
20
+   - Exposes neighbor queries (in/out/both).
21
+   - Provides deterministic traversal ordering with leaf-first preference.
22
+
23
+ - [graph/token_budget.py](graph/token_budget.py)
24
+   - Enforces hard observation token cap (<= 2000).
25
+   - Applies per-component token limits.
26
+   - Truncates oversized components with explicit marker.
27
+
28
+ - [env/observation.py](env/observation.py)
29
+   - Strict Pydantic models: `NeighborSummary`, `RequestedContext`, `CodeObservation`.
30
+   - Forbids extra fields and type coercion.
31
+   - Enforces `total_tokens <= 2000`.
32
+
33
+ - [env/observation_builder.py](env/observation_builder.py)
34
+   - Builds observation payloads from DB graph state.
35
+   - Ranks dependency context using graph centrality.
36
+   - Produces validated `CodeObservation` objects.
37
+
38
+ ## Compatibility
39
+
40
+ - [env/graph.py](env/graph.py) remains stable for existing callers and now delegates to GraphManager.
41
 
42
  ## Quickstart
43
 
44
  ```bash
45
  pip install -r requirements.txt
46
+ python -m db.seed sample_project/
47
  python -m db.store --module checkout
48
  ```
49
+
50
+ ## Validation
51
+
52
+ Run tests:
53
+
54
+ ```bash
55
+ pytest -q
56
+ ```
57
+
58
+ Phase 2-focused tests:
59
+
60
+ ```bash
61
+ pytest -q tests/test_phase2_graph_manager.py tests/test_phase2_token_budget.py tests/test_phase2_observation.py
62
+ ```
63
+
64
+ ## Security and Quality Notes
65
+
66
+ - SQLite is used as the source of truth for graph and review state.
67
+ - No dynamic code execution is introduced in Phase 2 paths.
68
+ - Input handling fails closed for unknown `module_id` values.
69
+ - Observations are hard-capped to prevent context overflow.
70
+ - Code follows typed interfaces and minimal stateful behavior.
code-review-env/db/database.py ADDED
@@ -0,0 +1,3 @@
1
+ from db.migrations import get_default_db_path, get_engine, init_db
2
+
3
+ __all__ = ["get_default_db_path", "get_engine", "init_db"]
code-review-env/db/models.py ADDED
@@ -0,0 +1,25 @@
1
+ from db.schema import (
2
+ EdgeType,
3
+ EpisodeRecord,
4
+ LinterFinding,
5
+ ModuleEdge,
6
+ ModuleNode,
7
+ ReviewAnnotation,
8
+ ReviewStatus,
9
+ SeedMeta,
10
+ Severity,
11
+ TaskDefinition,
12
+ )
13
+
14
+ __all__ = [
15
+ "EdgeType",
16
+ "EpisodeRecord",
17
+ "LinterFinding",
18
+ "ModuleEdge",
19
+ "ModuleNode",
20
+ "ReviewAnnotation",
21
+ "ReviewStatus",
22
+ "SeedMeta",
23
+ "Severity",
24
+ "TaskDefinition",
25
+ ]
code-review-env/db/schema.py CHANGED
@@ -9,7 +9,9 @@ from sqlmodel import Field, SQLModel
9
 
10
  class EdgeType(StrEnum):
11
  EXPLICIT_IMPORT = "explicit_import"
12
- IMPLICIT_NAME_RESOLUTION = "implicit_name_resolution"
 
 
13
 
14
 
15
  class ReviewStatus(StrEnum):
@@ -28,8 +30,13 @@ class ModuleNode(SQLModel, table=True):
28
  id: Optional[int] = Field(default=None, primary_key=True)
29
  source_root: str = Field(index=True)
30
  module_id: str = Field(index=True)
 
31
  raw_code: str
32
  ast_summary: str
 
 
 
 
33
  dependency_reason: str = ""
34
  review_annotation: Optional[str] = None
35
  review_status: ReviewStatus = Field(default=ReviewStatus.PENDING)
@@ -89,3 +96,8 @@ class TaskDefinition(SQLModel, table=True):
89
  target_module_id: str = Field(index=True)
90
  description: str
91
  ground_truth_ref: str
 
 
 
 
 
 
9
 
10
  class EdgeType(StrEnum):
11
  EXPLICIT_IMPORT = "explicit_import"
12
+ IMPLICIT_DEPENDENCY = "implicit_dependency"
13
+ INTRA_FILE = "intra_file"
14
+ CIRCULAR = "circular"
15
 
16
 
17
  class ReviewStatus(StrEnum):
 
30
  id: Optional[int] = Field(default=None, primary_key=True)
31
  source_root: str = Field(index=True)
32
  module_id: str = Field(index=True)
33
+ name: Optional[str] = None
34
  raw_code: str
35
  ast_summary: str
36
+ summary: Optional[str] = None
37
+ linter_flags: str = "[]"
38
+ parent_module_id: Optional[str] = Field(default=None, index=True)
39
+ is_chunk: bool = False
40
  dependency_reason: str = ""
41
  review_annotation: Optional[str] = None
42
  review_status: ReviewStatus = Field(default=ReviewStatus.PENDING)
 
96
  target_module_id: str = Field(index=True)
97
  description: str
98
  ground_truth_ref: str
99
+
100
+
101
+ class SeedMeta(SQLModel, table=True):
102
+ key: str = Field(primary_key=True)
103
+ value: str
code-review-env/db/seed.py ADDED
@@ -0,0 +1,143 @@
1
+ from __future__ import annotations
2
+
3
+ import argparse
4
+ import hashlib
5
+ import json
6
+ from datetime import UTC, datetime
7
+ from pathlib import Path
8
+
9
+ from db.store import Store
10
+ from parser.ast_parser import parse_python_file
11
+ from parser.chunker import chunk_module
12
+ from parser.graph_builder import build_edges
13
+ from parser.linter import run_linters
14
+ from parser.summarizer import summarize_module
15
+
16
+
17
+ def _codebase_hash(target_dir: Path) -> str:
18
+ digest = hashlib.sha256()
19
+ for path in sorted(target_dir.rglob("*.py")):
20
+ rel = path.relative_to(target_dir).as_posix()
21
+ digest.update(rel.encode("utf-8"))
22
+ digest.update(path.read_bytes())
23
+ return digest.hexdigest()
24
+
25
+
26
+ def _seed_meta_key(source_root: str) -> str:
27
+ return f"seeded:{source_root}"
28
+
29
+
30
+ def seed_project(target_dir: Path, db_path: str | None = None, force: bool = False) -> dict[str, object]:
31
+ target_dir = target_dir.resolve()
32
+ store = Store(source_root=str(target_dir), db_path=db_path)
33
+
34
+ current_hash = _codebase_hash(target_dir)
35
+ meta_key = _seed_meta_key(str(target_dir))
36
+ existing_raw = store.get_meta(meta_key)
37
+ existing = json.loads(existing_raw) if existing_raw else {}
38
+
39
+ if (
40
+ not force
41
+ and store.has_nodes()
42
+ and existing.get("codebase_hash") == current_hash
43
+ and existing.get("seeded") is True
44
+ ):
45
+ return {
46
+ "seeded": True,
47
+ "loaded_from_cache": True,
48
+ "codebase_hash": current_hash,
49
+ "node_count": int(existing.get("node_count", 0)),
50
+ "edge_count": int(existing.get("edge_count", 0)),
51
+ }
52
+
53
+ store.clear_source_graph()
54
+
55
+ py_files = sorted(target_dir.rglob("*.py"))
56
+ parsed_modules = [parse_python_file(path, target_dir) for path in py_files]
57
+ module_ids = {parsed.module_id for parsed in parsed_modules}
58
+
59
+ chunk_ids_by_parent: dict[str, set[str]] = {}
60
+
61
+ for path, parsed in zip(py_files, parsed_modules):
62
+ issues = run_linters(path)
63
+ summary = summarize_module(parsed, issues)
64
+ linter_flags = json.dumps([issue.model_dump() for issue in issues])
65
+
66
+ chunk_result = chunk_module(parsed, max_lines=300)
67
+ parent = chunk_result.parent
68
+ store.upsert_node(
69
+ module_id=parent.module_id,
70
+ name=parent.name,
71
+ raw_code=parent.code,
72
+ ast_summary=summary,
73
+ summary=summary,
74
+ linter_flags=linter_flags,
75
+ dependency_reason="Imports and symbol usage captured from AST",
76
+ parent_module_id=parent.parent_module_id,
77
+ is_chunk=parent.is_chunk,
78
+ )
79
+
80
+ if chunk_result.chunks:
81
+ chunk_ids_by_parent[parent.module_id] = {chunk.module_id for chunk in chunk_result.chunks}
82
+
83
+ for chunk in chunk_result.chunks:
84
+ chunk_summary = f"Chunk {chunk.name} lines {chunk.start_line}-{chunk.end_line}"
85
+ store.upsert_node(
86
+ module_id=chunk.module_id,
87
+ name=chunk.name,
88
+ raw_code=chunk.code,
89
+ ast_summary=chunk_summary,
90
+ summary=chunk_summary,
91
+ linter_flags="[]",
92
+ dependency_reason="Top-level class/function chunk",
93
+ parent_module_id=chunk.parent_module_id,
94
+ is_chunk=chunk.is_chunk,
95
+ )
96
+
97
+ store.replace_findings_for_module(parsed.module_id, [issue.model_dump() for issue in issues])
98
+
99
+ edges = build_edges(parsed_modules, module_ids, chunk_ids_by_parent)
100
+ for edge in edges:
101
+ store.upsert_edge(
102
+ source_module_id=edge.source_module_id,
103
+ target_module_id=edge.target_module_id,
104
+ edge_type=edge.edge_type,
105
+ import_line=edge.import_line,
106
+ weight=edge.weight,
107
+ )
108
+
109
+ snapshot = store.get_full_graph()
110
+ meta_payload = {
111
+ "seeded": True,
112
+ "seeded_at": datetime.now(UTC).isoformat(),
113
+ "codebase_hash": current_hash,
114
+ "node_count": len(snapshot.nodes),
115
+ "edge_count": len(snapshot.edges),
116
+ }
117
+ store.set_meta(meta_key, json.dumps(meta_payload))
118
+
119
+ return {
120
+ "seeded": True,
121
+ "loaded_from_cache": False,
122
+ "codebase_hash": current_hash,
123
+ "node_count": len(snapshot.nodes),
124
+ "edge_count": len(snapshot.edges),
125
+ }
126
+
127
+
128
+ def _build_parser() -> argparse.ArgumentParser:
129
+ parser = argparse.ArgumentParser(description="Seed graph database from Python project")
130
+ parser.add_argument("target", help="Path to target codebase")
131
+ parser.add_argument("--db-path", default=None, help="Path to SQLite database")
132
+ parser.add_argument("--force", action="store_true", help="Force re-parse even if seeded")
133
+ return parser
134
+
135
+
136
+ def main() -> None:
137
+ args = _build_parser().parse_args()
138
+ result = seed_project(Path(args.target), db_path=args.db_path, force=args.force)
139
+ print(json.dumps(result, indent=2))
140
+
141
+
142
+ if __name__ == "__main__":
143
+ main()
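
The cache decision in `seed_project` can be exercised without a database. The sketch below is a minimal stand-alone version, assuming the same SHA-256 scheme as `_codebase_hash` above. `needs_reseed` is a hypothetical helper name that does not appear in the commit, and the real check additionally requires `store.has_nodes()`, which is left out here.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def codebase_hash(target_dir: Path) -> str:
    """Hash every *.py file under target_dir (relative path + bytes), in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(target_dir.rglob("*.py")):
        digest.update(path.relative_to(target_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_reseed(target_dir: Path, cached_meta: dict[str, object], force: bool = False) -> bool:
    """Hypothetical helper: True when the cached seed metadata cannot be reused."""
    if force:
        return True
    up_to_date = (
        cached_meta.get("seeded") is True
        and cached_meta.get("codebase_hash") == codebase_hash(target_dir)
    )
    return not up_to_date
```

Under these assumptions, editing, adding, or renaming any `.py` file changes the digest and triggers a re-parse, while an untouched tree with matching metadata is served from cache.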
code-review-env/db/store.py CHANGED
@@ -17,6 +17,7 @@ from db.schema import (
17
  ModuleNode,
18
  ReviewAnnotation,
19
  ReviewStatus,
 
20
  Severity,
21
  )
22
 
@@ -77,6 +78,11 @@ class Store:
77
  raw_code: str,
78
  ast_summary: str,
79
  dependency_reason: str,
 
 
 
 
 
80
  ) -> ModuleNode:
81
  with Session(self.engine) as session:
82
  existing = session.exec(
@@ -86,8 +92,13 @@ class Store:
86
  )
87
  ).first()
88
  if existing:
 
89
  existing.raw_code = raw_code
90
  existing.ast_summary = ast_summary
 
 
 
 
91
  existing.dependency_reason = dependency_reason
92
  existing.updated_at = datetime.now(UTC)
93
  session.add(existing)
@@ -98,8 +109,13 @@ class Store:
98
  node = ModuleNode(
99
  source_root=self.config.source_root,
100
  module_id=module_id,
 
101
  raw_code=raw_code,
102
  ast_summary=ast_summary,
 
 
 
 
103
  dependency_reason=dependency_reason,
104
  )
105
  session.add(node)
@@ -322,6 +338,21 @@ class Store:
322
  ).first()
323
  return first_node is not None
324
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
325
  def clear_source_graph(self) -> None:
326
  with Session(self.engine) as session:
327
  session.exec(
 
17
  ModuleNode,
18
  ReviewAnnotation,
19
  ReviewStatus,
20
+ SeedMeta,
21
  Severity,
22
  )
23
 
 
78
  raw_code: str,
79
  ast_summary: str,
80
  dependency_reason: str,
81
+ name: str | None = None,
82
+ summary: str | None = None,
83
+ linter_flags: str = "[]",
84
+ parent_module_id: str | None = None,
85
+ is_chunk: bool = False,
86
  ) -> ModuleNode:
87
  with Session(self.engine) as session:
88
  existing = session.exec(
 
92
  )
93
  ).first()
94
  if existing:
95
+ existing.name = name or existing.name
96
  existing.raw_code = raw_code
97
  existing.ast_summary = ast_summary
98
+ existing.summary = summary or existing.summary
99
+ existing.linter_flags = linter_flags
100
+ existing.parent_module_id = parent_module_id
101
+ existing.is_chunk = is_chunk
102
  existing.dependency_reason = dependency_reason
103
  existing.updated_at = datetime.now(UTC)
104
  session.add(existing)
 
109
  node = ModuleNode(
110
  source_root=self.config.source_root,
111
  module_id=module_id,
112
+ name=name,
113
  raw_code=raw_code,
114
  ast_summary=ast_summary,
115
+ summary=summary,
116
+ linter_flags=linter_flags,
117
+ parent_module_id=parent_module_id,
118
+ is_chunk=is_chunk,
119
  dependency_reason=dependency_reason,
120
  )
121
  session.add(node)
 
338
  ).first()
339
  return first_node is not None
340
 
341
+ def get_meta(self, key: str) -> Optional[str]:
342
+ with Session(self.engine) as session:
343
+ record = session.get(SeedMeta, key)
344
+ return record.value if record else None
345
+
346
+ def set_meta(self, key: str, value: str) -> None:
347
+ with Session(self.engine) as session:
348
+ record = session.get(SeedMeta, key)
349
+ if record:
350
+ record.value = value
351
+ session.add(record)
352
+ else:
353
+ session.add(SeedMeta(key=key, value=value))
354
+ session.commit()
355
+
356
  def clear_source_graph(self) -> None:
357
  with Session(self.engine) as session:
358
  session.exec(
code-review-env/env/graph.py CHANGED
@@ -4,11 +4,9 @@ from dataclasses import dataclass
4
  from pathlib import Path
5
 
6
  import networkx as nx
7
- from sqlmodel import Session, select
8
 
9
- from db.schema import ModuleEdge, ModuleNode
10
- from db.store import Store
11
- from parser.ast_parser import parse_directory
12
 
13
 
14
  @dataclass
@@ -20,79 +18,34 @@ class GraphLoadResult:
20
  class DependencyGraph:
21
  def __init__(self, target_dir: str | Path, db_path: str | Path | None = None) -> None:
22
  self.target_dir = Path(target_dir).resolve()
23
- self.store = Store(source_root=str(self.target_dir), db_path=db_path)
24
 
25
  def load_or_build(self, force_reparse: bool = False) -> GraphLoadResult:
26
- if force_reparse or not self.store.has_nodes():
27
- parse_directory(self.target_dir, db_path=str(self.store.config.db_path))
28
- loaded_from_cache = False
29
- else:
30
- loaded_from_cache = True
 
31
  return GraphLoadResult(graph=self._build_graph(), loaded_from_cache=loaded_from_cache)
32
 
33
  def _build_graph(self) -> nx.DiGraph:
34
- graph = nx.DiGraph()
35
- with Session(self.store.engine) as session:
36
- nodes = list(
37
- session.exec(
38
- select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
39
- ).all()
40
- )
41
- edges = list(
42
- session.exec(
43
- select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
44
- ).all()
45
- )
46
-
47
- for node in nodes:
48
- graph.add_node(
49
- node.module_id,
50
- ast_summary=node.ast_summary,
51
- review_status=node.review_status.value,
52
- )
53
-
54
- for edge in edges:
55
- graph.add_edge(
56
- edge.source_module_id,
57
- edge.target_module_id,
58
- import_line=edge.import_line,
59
- edge_type=edge.edge_type.value,
60
- weight=edge.weight,
61
- )
62
-
63
- return graph
64
 
65
  def traversal_order(self, graph: nx.DiGraph | None = None) -> list[str]:
66
- graph = graph or self._build_graph()
 
67
  if graph.number_of_nodes() == 0:
68
  return []
69
-
70
- if not nx.is_directed_acyclic_graph(graph):
71
- # Fall back to deterministic ordering if cyclic imports exist.
72
- return sorted(graph.nodes())
73
-
74
- centrality = nx.betweenness_centrality(graph)
75
- indegree = {node: graph.in_degree(node) for node in graph.nodes()}
76
- queue = [node for node, deg in indegree.items() if deg == 0]
77
- order: list[str] = []
78
-
79
- def rank(node: str) -> tuple[float, float, str]:
80
- return (
81
- float(graph.out_degree(node)),
82
  float(centrality.get(node, 0.0)),
83
- node,
84
- )
85
-
86
- while queue:
87
- queue.sort(key=rank)
88
- current = queue.pop(0)
89
- order.append(current)
90
- for successor in sorted(graph.successors(current)):
91
- indegree[successor] -= 1
92
- if indegree[successor] == 0:
93
- queue.append(successor)
94
-
95
- return order
96
 
97
 
98
  if __name__ == "__main__":
 
4
  from pathlib import Path
5
 
6
  import networkx as nx
 
7
 
8
+ from db.seed import seed_project
9
+ from graph.graph_manager import GraphManager
 
10
 
11
 
12
  @dataclass
 
18
  class DependencyGraph:
19
  def __init__(self, target_dir: str | Path, db_path: str | Path | None = None) -> None:
20
  self.target_dir = Path(target_dir).resolve()
21
+ self.graph_manager = GraphManager(source_root=self.target_dir, db_path=db_path)
22
 
23
  def load_or_build(self, force_reparse: bool = False) -> GraphLoadResult:
24
+ result = seed_project(
25
+ self.target_dir,
26
+ db_path=str(self.graph_manager.store.config.db_path),
27
+ force=force_reparse,
28
+ )
29
+ loaded_from_cache = bool(result.get("loaded_from_cache", False))
30
  return GraphLoadResult(graph=self._build_graph(), loaded_from_cache=loaded_from_cache)
31
 
32
  def _build_graph(self) -> nx.DiGraph:
33
+ return self.graph_manager.load_graph()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
34
 
35
  def traversal_order(self, graph: nx.DiGraph | None = None) -> list[str]:
36
+ if graph is None:
37
+ return self.graph_manager.traversal_order()
38
  if graph.number_of_nodes() == 0:
39
  return []
40
+ centrality = nx.betweenness_centrality(graph, normalized=True)
41
+ return sorted(
42
+ graph.nodes(),
43
+ key=lambda node: (
44
+ int(graph.out_degree(node)),
 
 
 
 
 
 
 
 
45
  float(centrality.get(node, 0.0)),
46
+ str(node),
47
+ ),
48
+ )
 
 
 
 
 
 
 
 
 
 
49
 
50
 
51
  if __name__ == "__main__":
code-review-env/env/observation.py ADDED
@@ -0,0 +1,62 @@
1
+ from __future__ import annotations
2
+
3
+ from typing import Literal
4
+
5
+ from pydantic import BaseModel, ConfigDict, Field, field_validator
6
+
7
+ from graph.token_budget import MAX_TOTAL_TOKENS
8
+
9
+
10
+ class NeighborSummary(BaseModel):
11
+ model_config = ConfigDict(strict=True, extra="forbid")
12
+
13
+ module_id: str
14
+ relation: Literal["dependency", "dependent"]
15
+ summary: str
16
+ review_snippet: str | None = None
17
+
18
+
19
+ class RequestedContext(BaseModel):
20
+ model_config = ConfigDict(strict=True, extra="forbid")
21
+
22
+ module_id: str
23
+ code: str
24
+ was_truncated: bool
25
+
26
+
27
+ class CodeObservation(BaseModel):
28
+ model_config = ConfigDict(strict=True, extra="forbid")
29
+
30
+ module_id: str
31
+ code: str
32
+ ast_summary: dict[str, object]
33
+ dependency_summaries: list[NeighborSummary] = Field(default_factory=list)
34
+ dependent_summaries: list[NeighborSummary] = Field(default_factory=list)
35
+ neighbor_reviews: list[str] = Field(default_factory=list)
36
+ task_description: str
37
+ available_actions: list[str] = Field(default_factory=list)
38
+ requested_context: RequestedContext | None = None
39
+ token_usage: dict[str, int]
40
+ total_tokens: int
41
+ within_budget: bool
42
+
43
+ @field_validator("module_id", "code", "task_description")
44
+ @classmethod
45
+ def _must_not_be_empty(cls, value: str) -> str:
46
+ if not value.strip():
47
+ raise ValueError("Field cannot be empty")
48
+ return value
49
+
50
+ @field_validator("total_tokens")
51
+ @classmethod
52
+ def _budget_hard_cap(cls, value: int) -> int:
53
+ if value > MAX_TOTAL_TOKENS:
54
+ raise ValueError(f"total_tokens exceeds hard cap: {MAX_TOTAL_TOKENS}")
55
+ return value
56
+
57
+ @field_validator("within_budget")
58
+ @classmethod
59
+ def _must_be_true(cls, value: bool) -> bool:
60
+ if not value:
61
+ raise ValueError("within_budget must be True")
62
+ return value
code-review-env/env/observation_builder.py CHANGED
@@ -1 +1,143 @@
1
- """Phase 2 implementation placeholder."""
1
+ from __future__ import annotations
2
+
3
+ import json
4
+ from pathlib import Path
5
+
6
+ from sqlmodel import Session, select
7
+
8
+ from db.schema import ModuleNode
9
+ from env.observation import CodeObservation, NeighborSummary, RequestedContext
10
+ from graph.graph_manager import GraphManager
11
+ from graph.token_budget import TokenBudget
12
+
13
+
14
+ DEFAULT_ACTIONS = [
15
+ "FLAG_STYLE",
16
+ "FLAG_BUG",
17
+ "FLAG_SECURITY",
18
+ "FLAG_DEPENDENCY_ISSUE",
19
+ "ADD_COMMENT",
20
+ "REQUEST_CONTEXT",
21
+ "REQUEST_CHANGES",
22
+ "APPROVE",
23
+ "AMEND_REVIEW",
24
+ ]
25
+
26
+
27
+ class ObservationBuilder:
28
+ def __init__(self, source_root: str | Path, db_path: str | Path | None = None) -> None:
29
+ self.graph_manager = GraphManager(source_root=source_root, db_path=db_path)
30
+ self.token_budget = TokenBudget()
31
+
32
+ def _fetch_node(self, module_id: str) -> ModuleNode:
33
+ with Session(self.graph_manager.store.engine) as session:
34
+ node = session.exec(
35
+ select(ModuleNode).where(
36
+ ModuleNode.source_root == self.graph_manager.store.config.source_root,
37
+ ModuleNode.module_id == module_id,
38
+ )
39
+ ).first()
40
+ if not node:
41
+ raise ValueError(f"Unknown module_id: {module_id}")
42
+ return node
43
+
44
+ @staticmethod
45
+ def _ast_summary_payload(ast_summary: str) -> dict[str, object]:
46
+ try:
47
+ loaded = json.loads(ast_summary)
48
+ except json.JSONDecodeError:
49
+ return {"text": ast_summary}
50
+ return loaded if isinstance(loaded, dict) else {"items": loaded}
51
+
52
+ def build(
53
+ self,
54
+ module_id: str,
55
+ task_description: str,
56
+ available_actions: list[str] | None = None,
57
+ context_request: str | None = None,
58
+ ) -> CodeObservation:
59
+ graph = self.graph_manager.load_graph()
60
+ if module_id not in graph:
61
+ raise ValueError(f"Unknown module_id: {module_id}")
62
+
63
+ node = self._fetch_node(module_id)
64
+ centrality = self.graph_manager.centrality()
65
+
66
+ dependencies = list(graph.successors(module_id))
67
+ dependents = list(graph.predecessors(module_id))
68
+
69
+ dep_ranked = sorted(dependencies, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:5]
70
+ dependent_ranked = sorted(dependents, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:3]
71
+
72
+ dependency_summaries: list[NeighborSummary] = []
73
+ dependent_summaries: list[NeighborSummary] = []
74
+ neighbor_reviews: list[str] = []
75
+
76
+ for dep_id in dep_ranked:
77
+ dep_node = self._fetch_node(dep_id)
78
+ dependency_summaries.append(
79
+ NeighborSummary(
80
+ module_id=dep_id,
81
+ relation="dependency",
82
+ summary=dep_node.summary or dep_node.ast_summary,
83
+ review_snippet=dep_node.review_summary,
84
+ )
85
+ )
86
+ if dep_node.review_summary:
87
+ neighbor_reviews.append(f"{dep_id}: {dep_node.review_summary}")
88
+
89
+ for depd_id in dependent_ranked:
90
+ depd_node = self._fetch_node(depd_id)
91
+ dependent_summaries.append(
92
+ NeighborSummary(
93
+ module_id=depd_id,
94
+ relation="dependent",
95
+ summary=depd_node.summary or depd_node.ast_summary,
96
+ review_snippet=depd_node.review_summary,
97
+ )
98
+ )
99
+ if depd_node.review_summary:
100
+ neighbor_reviews.append(f"{depd_id}: {depd_node.review_summary}")
101
+
102
+ requested_context: RequestedContext | None = None
103
+ requested_context_code = ""
104
+ if context_request:
105
+ context_node = self._fetch_node(context_request)
106
+ requested_context_code = context_node.raw_code
107
+
108
+ actions = available_actions or DEFAULT_ACTIONS
109
+ budgeted = self.token_budget.enforce(
110
+ {
111
+ "code": node.raw_code,
112
+ "ast_summary_text": node.ast_summary,
113
+ "dependency_summaries": [item.model_dump_json() for item in dependency_summaries],
114
+ "dependent_summaries": [item.model_dump_json() for item in dependent_summaries],
115
+ "neighbor_reviews": neighbor_reviews[:4],
116
+ "task_description": task_description,
117
+ "available_actions": actions,
118
+ "requested_context_code": requested_context_code,
119
+ }
120
+ )
121
+
122
+ if context_request:
123
+ context_trimmed = budgeted.payload.get("requested_context_code", "")
124
+ requested_context = RequestedContext(
125
+ module_id=context_request,
126
+ code=str(context_trimmed),
127
+ was_truncated=str(context_trimmed) != requested_context_code,
128
+ )
129
+
130
+ return CodeObservation(
131
+ module_id=module_id,
132
+ code=str(budgeted.payload.get("code", "")),
133
+ ast_summary=self._ast_summary_payload(str(budgeted.payload.get("ast_summary_text", ""))),
134
+ dependency_summaries=dependency_summaries,
135
+ dependent_summaries=dependent_summaries,
136
+ neighbor_reviews=neighbor_reviews[:4],
137
+ task_description=task_description,
138
+ available_actions=actions,
139
+ requested_context=requested_context,
140
+ token_usage=budgeted.token_usage,
141
+ total_tokens=budgeted.total_tokens,
142
+ within_budget=budgeted.total_tokens <= self.token_budget.max_total_tokens,
143
+ )
code-review-env/graph/__init__.py ADDED
@@ -0,0 +1,5 @@
1
+ """Graph utilities for loading and querying dependency graphs."""
2
+
3
+ from graph.graph_manager import GraphManager
4
+
5
+ __all__ = ["GraphManager"]
code-review-env/graph/graph_manager.py ADDED
@@ -0,0 +1,125 @@
1
+ from __future__ import annotations
2
+
3
+ from pathlib import Path
4
+ from typing import Literal
5
+
6
+ import networkx as nx
7
+ from sqlmodel import Session, select
8
+
9
+ from db.schema import ModuleEdge, ModuleNode
10
+ from db.store import Store
11
+
12
+
13
+ class GraphManager:
14
+ """Load and query dependency graph state from SQLite."""
15
+
16
+ def __init__(self, source_root: str | Path, db_path: str | Path | None = None) -> None:
17
+ self.source_root = str(Path(source_root).resolve())
18
+ self.store = Store(source_root=self.source_root, db_path=db_path)
19
+
20
+ def load_graph(self) -> nx.DiGraph:
21
+ graph = nx.DiGraph()
22
+ with Session(self.store.engine) as session:
23
+ nodes = list(
24
+ session.exec(
25
+ select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
26
+ ).all()
27
+ )
28
+ edges = list(
29
+ session.exec(
30
+ select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
31
+ ).all()
32
+ )
33
+
34
+ for node in nodes:
35
+ graph.add_node(
36
+ node.module_id,
37
+ name=node.name,
38
+ raw_code=node.raw_code,
39
+ ast_summary=node.ast_summary,
40
+ summary=node.summary or "",
41
+ linter_flags=node.linter_flags,
42
+ parent_module_id=node.parent_module_id,
43
+ review_status=node.review_status.value,
44
+ review_summary=node.review_summary or "",
45
+ is_chunk=node.is_chunk,
46
+ )
47
+
48
+ for edge in edges:
49
+ graph.add_edge(
50
+ edge.source_module_id,
51
+ edge.target_module_id,
52
+ edge_type=edge.edge_type.value,
53
+ import_line=edge.import_line,
54
+ weight=edge.weight,
55
+ )
56
+
57
+ return graph
58
+
59
+ def get_node(self, module_id: str) -> dict[str, object]:
60
+ graph = self.load_graph()
61
+ if module_id not in graph:
62
+ raise ValueError(f"Unknown module_id: {module_id}")
63
+ return dict(graph.nodes[module_id])
64
+
65
+ def get_neighbors(
66
+ self,
67
+ module_id: str,
68
+ direction: Literal["out", "in", "both"] = "both",
69
+ limit: int | None = None,
70
+ ) -> list[str]:
71
+ graph = self.load_graph()
72
+ if module_id not in graph:
73
+ raise ValueError(f"Unknown module_id: {module_id}")
74
+
75
+ if direction == "out":
76
+ neighbors = set(graph.successors(module_id))
77
+ elif direction == "in":
78
+ neighbors = set(graph.predecessors(module_id))
79
+ else:
80
+ neighbors = set(graph.successors(module_id))
81
+ neighbors.update(graph.predecessors(module_id))
82
+
83
+ ordered = sorted(neighbors)
84
+ if limit is None:
85
+ return ordered
86
+ return ordered[: max(limit, 0)]
87
+
88
+ def centrality(self) -> dict[str, float]:
89
+ graph = self.load_graph()
90
+ if graph.number_of_nodes() == 0:
91
+ return {}
92
+ return nx.betweenness_centrality(graph, normalized=True)
93
+
94
+ def traversal_order(self) -> list[str]:
95
+ """
96
+ Return a deterministic, leaf-first traversal where high-centrality nodes are later.
97
+ """
98
+ graph = self.load_graph()
99
+ if graph.number_of_nodes() == 0:
100
+ return []
101
+
102
+ centrality = self.centrality()
103
+
104
+ # For DAGs, reverse topological order visits leaves first.
105
+ if nx.is_directed_acyclic_graph(graph):
106
+ topo_reversed = list(reversed(list(nx.lexicographical_topological_sort(graph))))
107
+ topo_rank = {node: idx for idx, node in enumerate(topo_reversed)}
108
+ return sorted(
109
+ graph.nodes(),
110
+ key=lambda node: (
111
+ int(topo_rank.get(node, 0)),
112
+ float(centrality.get(node, 0.0)),
113
+ str(node),
114
+ ),
115
+ )
116
+
117
+ # Stable fallback for cyclic graphs.
118
+ return sorted(
119
+ graph.nodes(),
120
+ key=lambda node: (
121
+ int(graph.out_degree(node)),
122
+ float(centrality.get(node, 0.0)),
123
+ str(node),
124
+ ),
125
+ )
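
The leaf-first idea behind `traversal_order` can be illustrated without networkx. The sketch below is a dependency-free stand-in, not the method used in `GraphManager` (which reverses `nx.lexicographical_topological_sort`). It uses Kahn's algorithm with a heap for alphabetical tie-breaking, so the order among ties may differ from the networkx-based version. `leaf_first_order` and the `deps` mapping are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import heapq


def leaf_first_order(deps: dict[str, set[str]]) -> list[str]:
    """Order modules so that each one appears after everything it imports.

    deps maps a module id to the set of module ids it depends on.
    Ties are broken alphabetically, which keeps the result deterministic.
    """
    nodes = set(deps) | {target for targets in deps.values() for target in targets}
    remaining = {node: set(deps.get(node, ())) for node in nodes}

    # Reverse index: for each module, which modules import it.
    dependents: dict[str, set[str]] = {node: set() for node in nodes}
    for node, targets in remaining.items():
        for target in targets:
            dependents[target].add(node)

    # Leaves (no outgoing dependencies) are ready immediately.
    ready = [node for node, targets in remaining.items() if not targets]
    heapq.heapify(ready)

    order: list[str] = []
    while ready:
        current = heapq.heappop(ready)
        order.append(current)
        for parent in dependents[current]:
            remaining[parent].discard(current)
            if not remaining[parent]:
                heapq.heappush(ready, parent)

    if len(order) != len(nodes):
        raise ValueError("cycle detected; a fallback ordering is needed")
    return order
```

For a graph where `checkout` imports `cart` and `auth`, and both of those import `config`, this returns `["config", "auth", "cart", "checkout"]`. Cyclic input raises instead of falling back to the degree-based sort used in the commit.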
code-review-env/graph/token_budget.py ADDED
@@ -0,0 +1,117 @@
1
+ from __future__ import annotations
2
+
3
+ import math
4
+ from dataclasses import dataclass
5
+
6
+ MAX_TOTAL_TOKENS = 2000
7
+
8
+ COMPONENT_LIMITS: dict[str, int] = {
9
+ "current_code": 800,
10
+ "ast_summary": 100,
11
+ "direct_deps": 250,
12
+ "dependents": 150,
13
+ "neighbor_reviews": 120,
14
+ "task_and_actions": 200,
15
+ "requested_context": 800,
16
+ }
17
+
18
+
19
+ def estimate_tokens(text: str) -> int:
20
+ """Deterministic approximation with conservative floor for non-empty text."""
21
+ if not text:
22
+ return 0
23
+ return max(1, int(math.ceil(len(text) / 4)))
24
+
25
+
26
+ def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str = "\n... [TRUNCATED]") -> str:
27
+ if max_tokens <= 0:
28
+ return ""
29
+
30
+ current = estimate_tokens(text)
31
+ if current <= max_tokens:
32
+ return text
33
+
34
+ notice_tokens = estimate_tokens(suffix_notice)
35
+ content_budget = max(max_tokens - notice_tokens, 0)
36
+ max_chars = content_budget * 4
37
+ trimmed = text[:max_chars]
38
+ return f"{trimmed}{suffix_notice}" if trimmed else suffix_notice.strip()[: max_tokens * 4]
39
+
40
+
41
+ @dataclass(frozen=True)
42
+ class BudgetResult:
43
+ payload: dict[str, object]
44
+ token_usage: dict[str, int]
45
+ total_tokens: int
46
+
47
+
48
+ class TokenBudget:
49
+ def __init__(self, max_total_tokens: int = MAX_TOTAL_TOKENS) -> None:
50
+ self.max_total_tokens = max_total_tokens
51
+
52
+ def _trim_component(self, text: str, component_name: str) -> str:
53
+ limit = COMPONENT_LIMITS.get(component_name, self.max_total_tokens)
54
+ return truncate_to_budget(text, limit)
55
+
56
+ def enforce(self, payload: dict[str, object]) -> BudgetResult:
57
+ normalized = dict(payload)
58
+ usage: dict[str, int] = {}
59
+
60
+ current_code = str(normalized.get("code", ""))
61
+ ast_summary = str(normalized.get("ast_summary_text", ""))
62
+ dep_text = "\n".join(str(item) for item in normalized.get("dependency_summaries", []))
63
+ dependent_text = "\n".join(str(item) for item in normalized.get("dependent_summaries", []))
64
+ review_text = "\n".join(str(item) for item in normalized.get("neighbor_reviews", []))
65
+ task_actions = "\n".join(
66
+ [
67
+ str(normalized.get("task_description", "")),
68
+ " ".join(str(a) for a in normalized.get("available_actions", [])),
69
+ ]
70
+ )
71
+ requested_context = str(normalized.get("requested_context_code", ""))
72
+
73
+ current_code = self._trim_component(current_code, "current_code")
74
+ ast_summary = self._trim_component(ast_summary, "ast_summary")
75
+ dep_text = self._trim_component(dep_text, "direct_deps")
76
+ dependent_text = self._trim_component(dependent_text, "dependents")
77
+ review_text = self._trim_component(review_text, "neighbor_reviews")
78
+ task_actions = self._trim_component(task_actions, "task_and_actions")
79
+ requested_context = self._trim_component(requested_context, "requested_context")
80
+
81
+ normalized["code"] = current_code
82
+ normalized["ast_summary_text"] = ast_summary
83
+ normalized["dependency_summaries_text"] = dep_text
84
+ normalized["dependent_summaries_text"] = dependent_text
85
+ normalized["neighbor_reviews_text"] = review_text
86
+ normalized["task_actions_text"] = task_actions
87
+ normalized["requested_context_code"] = requested_context
88
+
89
+ usage["current_code"] = estimate_tokens(current_code)
90
+ usage["ast_summary"] = estimate_tokens(ast_summary)
91
+ usage["direct_deps"] = estimate_tokens(dep_text)
92
+ usage["dependents"] = estimate_tokens(dependent_text)
93
+ usage["neighbor_reviews"] = estimate_tokens(review_text)
94
+ usage["task_and_actions"] = estimate_tokens(task_actions)
95
+ usage["requested_context"] = estimate_tokens(requested_context)
96
+
97
+ total = sum(usage.values())
98
+ if total > self.max_total_tokens:
99
+ overflow = total - self.max_total_tokens
100
+ requested_limit = max(estimate_tokens(requested_context) - overflow, 0)
101
+ requested_context = truncate_to_budget(requested_context, requested_limit)
102
+ normalized["requested_context_code"] = requested_context
103
+ usage["requested_context"] = estimate_tokens(requested_context)
104
+ total = sum(usage.values())
105
+
106
+ if total > self.max_total_tokens:
107
+ overflow = total - self.max_total_tokens
108
+ code_limit = max(estimate_tokens(current_code) - overflow, 0)
109
+ current_code = truncate_to_budget(current_code, code_limit)
110
+ normalized["code"] = current_code
111
+ usage["current_code"] = estimate_tokens(current_code)
112
+ total = sum(usage.values())
113
+
114
+ if total > self.max_total_tokens:
115
+ raise ValueError("Unable to enforce token budget within hard limit")
116
+
117
+ return BudgetResult(payload=normalized, token_usage=usage, total_tokens=total)
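The overflow handling in `enforce` can be read as a priority cascade: shrink the requested context first, then the current code, and fail only if the payload still does not fit. The sketch below is a minimal, standalone illustration of that idea. `estimate_tokens` and `truncate_to_budget` are not shown in this part of the diff, so the versions here are hypothetical stand-ins (roughly four characters per token), and `shrink_in_priority_order` is an illustrative name rather than part of the commit.

```python
def estimate_tokens(text: str) -> int:
    # Hypothetical estimator: about 4 characters per token, rounded up.
    return (len(text) + 3) // 4


def truncate_to_budget(text: str, limit: int) -> str:
    # Hypothetical truncation: keep only the first `limit` tokens' worth of characters.
    return text[: max(limit, 0) * 4]


def shrink_in_priority_order(
    parts: dict[str, str], order: list[str], max_total: int
) -> dict[str, str]:
    """Trim components in `order` until the estimated total fits `max_total`."""
    result = dict(parts)
    for name in order:
        total = sum(estimate_tokens(value) for value in result.values())
        if total <= max_total:
            break
        overflow = total - max_total
        # Give back exactly the overflow from this component, never going below zero.
        limit = max(estimate_tokens(result[name]) - overflow, 0)
        result[name] = truncate_to_budget(result[name], limit)
    if sum(estimate_tokens(value) for value in result.values()) > max_total:
        raise ValueError("Unable to enforce token budget within hard limit")
    return result


if __name__ == "__main__":
    payload = {"code": "a" * 40, "requested_context": "b" * 40, "task": "c" * 8}
    trimmed = shrink_in_priority_order(payload, ["requested_context", "code"], 15)
    print({name: estimate_tokens(text) for name, text in trimmed.items()})
```

Components that are not listed in `order` (here `task`) are never trimmed, which is why a very small budget can still end in `ValueError`.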
code-review-env/parser/ast_parser.py CHANGED
@@ -15,6 +15,8 @@ from parser.summarizer import summarize_module
15
  class ImportRef(BaseModel):
16
  target_module: str
17
  import_line: str
 
 
18
  edge_type: EdgeType = EdgeType.EXPLICIT_IMPORT
19
 
20
 
@@ -33,7 +35,12 @@ class _Visitor(ast.NodeVisitor):
33
  self.function_signatures: list[str] = []
34
  self.classes: list[str] = []
35
  self.constants: list[str] = []
36
- self.imports: list[tuple[str, str]] = []
 
 
 
 
 
37
 
38
  def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
39
  args: list[str] = []
@@ -44,7 +51,11 @@ class _Visitor(ast.NodeVisitor):
44
  args.append(arg.arg)
45
  returns = ast.unparse(node.returns) if node.returns is not None else "None"
46
  self.function_signatures.append(f"{node.name}({', '.join(args)})->{returns}")
47
- self.generic_visit(node)
 
 
 
 
48
 
49
  def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
50
  fake = ast.FunctionDef(
@@ -59,19 +70,23 @@ class _Visitor(ast.NodeVisitor):
59
 
60
  def visit_ClassDef(self, node: ast.ClassDef) -> None:
61
  self.classes.append(node.name)
62
- self.generic_visit(node)
 
 
 
 
63
 
64
  def visit_Import(self, node: ast.Import) -> None:
65
  line = ast.get_source_segment(self._source, node) or "import"
66
  for alias in node.names:
67
- self.imports.append((alias.name, line))
68
 
69
  def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
70
  module = node.module or ""
71
  level = node.level or 0
72
  dotted = "." * level + module
73
  line = ast.get_source_segment(self._source, node) or "from"
74
- self.imports.append((dotted, line))
75
 
76
  def visit_Assign(self, node: ast.Assign) -> None:
77
  if isinstance(node.value, ast.Constant):
@@ -105,7 +120,18 @@ def _resolve_relative_import(current_module: str, ref: str) -> str:
105
  def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
106
  source = path.read_text(encoding="utf-8")
107
  module_id = _to_module_id(path, root_dir)
108
- tree = ast.parse(source)
 
 
 
 
 
 
 
 
 
 
 
109
 
110
  visitor = _Visitor()
111
  visitor.parse(tree, source)
@@ -114,9 +140,11 @@ def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
114
  ImportRef(
115
  target_module=_resolve_relative_import(module_id, name),
116
  import_line=line,
 
 
117
  edge_type=EdgeType.EXPLICIT_IMPORT,
118
  )
119
- for name, line in visitor.imports
120
  ]
121
 
122
  dependencies = [imp.target_module for imp in imports if imp.target_module]
@@ -138,8 +166,10 @@ def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
138
  store.clear_source_graph()
139
 
140
  py_files = sorted(target_dir.rglob("*.py"))
141
- for py_file in py_files:
142
- parsed = parse_python_file(py_file, target_dir)
 
 
143
  issues = run_linters(py_file)
144
  summary = summarize_module(parsed, issues)
145
 
@@ -155,13 +185,13 @@ def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
155
  [issue.model_dump() for issue in issues],
156
  )
157
  for imported in parsed.imports:
158
- if imported.target_module:
159
  store.upsert_edge(
160
  source_module_id=parsed.module_id,
161
  target_module_id=imported.target_module,
162
  edge_type=imported.edge_type,
163
  import_line=imported.import_line,
164
- weight=1.0,
165
  )
166
 
167
  return store
 
15
  class ImportRef(BaseModel):
16
  target_module: str
17
  import_line: str
18
+ scope: str = "module_level"
19
+ weight: float = 1.0
20
  edge_type: EdgeType = EdgeType.EXPLICIT_IMPORT
21
 
22
 
 
35
  self.function_signatures: list[str] = []
36
  self.classes: list[str] = []
37
  self.constants: list[str] = []
38
+ self.imports: list[tuple[str, str, str]] = []
39
+ self._scope_stack: list[str] = []
40
+
41
+ @property
42
+ def _scope(self) -> str:
43
+ return "function_level" if self._scope_stack else "module_level"
44
 
45
  def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
46
  args: list[str] = []
 
51
  args.append(arg.arg)
52
  returns = ast.unparse(node.returns) if node.returns is not None else "None"
53
  self.function_signatures.append(f"{node.name}({', '.join(args)})->{returns}")
54
+ self._scope_stack.append(node.name)
55
+ try:
56
+ self.generic_visit(node)
57
+ finally:
58
+ self._scope_stack.pop()
59
 
60
  def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
61
  fake = ast.FunctionDef(
 
70
 
71
  def visit_ClassDef(self, node: ast.ClassDef) -> None:
72
  self.classes.append(node.name)
73
+ self._scope_stack.append(node.name)
74
+ try:
75
+ self.generic_visit(node)
76
+ finally:
77
+ self._scope_stack.pop()
78
 
79
  def visit_Import(self, node: ast.Import) -> None:
80
  line = ast.get_source_segment(self._source, node) or "import"
81
  for alias in node.names:
82
+ self.imports.append((alias.name, line, self._scope))
83
 
84
  def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
85
  module = node.module or ""
86
  level = node.level or 0
87
  dotted = "." * level + module
88
  line = ast.get_source_segment(self._source, node) or "from"
89
+ self.imports.append((dotted, line, self._scope))
90
 
91
  def visit_Assign(self, node: ast.Assign) -> None:
92
  if isinstance(node.value, ast.Constant):
 
120
  def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
121
  source = path.read_text(encoding="utf-8")
122
  module_id = _to_module_id(path, root_dir)
123
+ try:
124
+ tree = ast.parse(source)
125
+ except SyntaxError:
126
+ return ParsedModule(
127
+ module_id=module_id,
128
+ raw_code=source,
129
+ function_signatures=[],
130
+ classes=[],
131
+ imports=[],
132
+ constants=[],
133
+ dependencies=[],
134
+ )
135
 
136
  visitor = _Visitor()
137
  visitor.parse(tree, source)
 
140
  ImportRef(
141
  target_module=_resolve_relative_import(module_id, name),
142
  import_line=line,
143
+ scope=scope,
144
+ weight=0.5 if scope == "function_level" else 1.0,
145
  edge_type=EdgeType.EXPLICIT_IMPORT,
146
  )
147
+ for name, line, scope in visitor.imports
148
  ]
149
 
150
  dependencies = [imp.target_module for imp in imports if imp.target_module]
 
166
  store.clear_source_graph()
167
 
168
  py_files = sorted(target_dir.rglob("*.py"))
169
+ parsed_modules = [parse_python_file(py_file, target_dir) for py_file in py_files]
170
+ known_module_ids = {parsed.module_id for parsed in parsed_modules}
171
+
172
+ for py_file, parsed in zip(py_files, parsed_modules):
173
  issues = run_linters(py_file)
174
  summary = summarize_module(parsed, issues)
175
 
 
185
  [issue.model_dump() for issue in issues],
186
  )
187
  for imported in parsed.imports:
188
+ if imported.target_module and imported.target_module in known_module_ids:
189
  store.upsert_edge(
190
  source_module_id=parsed.module_id,
191
  target_module_id=imported.target_module,
192
  edge_type=imported.edge_type,
193
  import_line=imported.import_line,
194
+ weight=imported.weight,
195
  )
196
 
197
  return store
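The scope tracking added to `_Visitor` can be exercised in isolation. The following is a simplified sketch, not the commit's class: `ScopedImportVisitor` and `scoped_imports` are hypothetical names, and only the import-related behaviour is reproduced. As in the diff, any enclosing function or class counts as `function_level`, and such imports get weight 0.5 instead of 1.0.

```python
import ast


class ScopedImportVisitor(ast.NodeVisitor):
    """Collects (module_name, scope) pairs; a reduced stand-in for _Visitor."""

    def __init__(self) -> None:
        self.imports: list[tuple[str, str]] = []
        self._scope_stack: list[str] = []

    def _current_scope(self) -> str:
        return "function_level" if self._scope_stack else "module_level"

    def _visit_scoped(self, node) -> None:
        # Push the enclosing name while visiting children, and always pop it again.
        self._scope_stack.append(node.name)
        try:
            self.generic_visit(node)
        finally:
            self._scope_stack.pop()

    visit_FunctionDef = _visit_scoped
    visit_AsyncFunctionDef = _visit_scoped
    visit_ClassDef = _visit_scoped

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self.imports.append((alias.name, self._current_scope()))

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        dotted = "." * (node.level or 0) + (node.module or "")
        self.imports.append((dotted, self._current_scope()))


def scoped_imports(source: str) -> list[tuple[str, str, float]]:
    """Return (name, scope, weight) for every import statement in `source`."""
    visitor = ScopedImportVisitor()
    visitor.visit(ast.parse(source))
    return [
        (name, scope, 0.5 if scope == "function_level" else 1.0)
        for name, scope in visitor.imports
    ]


if __name__ == "__main__":
    sample = "import os\n\ndef f():\n    from . import helpers\n    import json\n"
    for row in scoped_imports(sample):
        print(row)
```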
code-review-env/parser/chunker.py ADDED
@@ -0,0 +1,96 @@
1
+ from __future__ import annotations
2
+
3
+ import ast
4
+ from pydantic import BaseModel
5
+
6
+ from parser.ast_parser import ParsedModule
7
+
8
+
9
+ class ChunkNode(BaseModel):
10
+ module_id: str
11
+ name: str
12
+ code: str
13
+ parent_module_id: str | None = None
14
+ is_chunk: bool = False
15
+ start_line: int = 1
16
+ end_line: int = 1
17
+
18
+
19
+ class ChunkResult(BaseModel):
20
+ parent: ChunkNode
21
+ chunks: list[ChunkNode]
22
+
23
+
24
+ def _slice_lines(source: str, start: int, end: int) -> str:
25
+ lines = source.splitlines()
26
+ start_idx = max(start - 1, 0)
27
+ end_idx = min(end, len(lines))
28
+ return "\n".join(lines[start_idx:end_idx]).strip()
29
+
30
+
31
+ def chunk_module(parsed: ParsedModule, max_lines: int = 300) -> ChunkResult:
32
+ line_count = len(parsed.raw_code.splitlines())
33
+ if line_count <= max_lines:
34
+ parent = ChunkNode(
35
+ module_id=parsed.module_id,
36
+ name=parsed.module_id.split(".")[-1],
37
+ code=parsed.raw_code,
38
+ is_chunk=False,
39
+ start_line=1,
40
+ end_line=line_count,
41
+ )
42
+ return ChunkResult(parent=parent, chunks=[])
43
+
44
+ try:
45
+ tree = ast.parse(parsed.raw_code)
46
+ except SyntaxError:
47
+ parent = ChunkNode(
48
+ module_id=parsed.module_id,
49
+ name=parsed.module_id.split(".")[-1],
50
+ code=parsed.raw_code,
51
+ is_chunk=False,
52
+ start_line=1,
53
+ end_line=line_count,
54
+ )
55
+ return ChunkResult(parent=parent, chunks=[])
56
+
57
+ chunks: list[ChunkNode] = []
58
+ for node in tree.body:
59
+ if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
60
+ start_line = int(getattr(node, "lineno", 1))
61
+ end_line = int(getattr(node, "end_lineno", start_line))
62
+ chunk_id = f"{parsed.module_id}::{node.name}"
63
+ chunks.append(
64
+ ChunkNode(
65
+ module_id=chunk_id,
66
+ name=node.name,
67
+ code=_slice_lines(parsed.raw_code, start_line, end_line),
68
+ parent_module_id=parsed.module_id,
69
+ is_chunk=True,
70
+ start_line=start_line,
71
+ end_line=end_line,
72
+ )
73
+ )
74
+
75
+ if not chunks:
76
+ chunks.append(
77
+ ChunkNode(
78
+ module_id=f"{parsed.module_id}::module_body",
79
+ name="module_body",
80
+ code=parsed.raw_code,
81
+ parent_module_id=parsed.module_id,
82
+ is_chunk=True,
83
+ start_line=1,
84
+ end_line=line_count,
85
+ )
86
+ )
87
+
88
+ parent = ChunkNode(
89
+ module_id=parsed.module_id,
90
+ name=parsed.module_id.split(".")[-1],
91
+ code="",
92
+ is_chunk=False,
93
+ start_line=1,
94
+ end_line=line_count,
95
+ )
96
+ return ChunkResult(parent=parent, chunks=chunks)
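For readers who want to see how the chunk boundaries come about, here is a small standalone sketch of the same slicing idea: take `lineno` and `end_lineno` from each top-level definition and cut the source by line. The helper names `top_level_spans` and `slice_lines` are hypothetical and do not import anything from the commit.

```python
import ast


def top_level_spans(source: str) -> list[tuple[str, int, int]]:
    """Return (name, start_line, end_line) for each top-level function or class."""
    spans: list[tuple[str, int, int]] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            end_line = node.end_lineno if node.end_lineno is not None else node.lineno
            spans.append((node.name, node.lineno, end_line))
    return spans


def slice_lines(source: str, start: int, end: int) -> str:
    """Return the 1-indexed, inclusive line range [start, end] of `source`."""
    lines = source.splitlines()
    return "\n".join(lines[max(start - 1, 0) : min(end, len(lines))])


if __name__ == "__main__":
    sample = (
        "X = 1\n"
        "\n"
        "def a():\n"
        "    return X\n"
        "\n"
        "class B:\n"
        "    def run(self):\n"
        "        return a()\n"
    )
    for name, start, end in top_level_spans(sample):
        print(name, start, end)
        print(slice_lines(sample, start, end))
```

Module-level statements such as `X = 1` fall outside every span. That matches `chunk_module`, which emits chunks only for top-level functions and classes and leaves the parent node's `code` empty once chunks exist.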
code-review-env/parser/graph_builder.py ADDED
@@ -0,0 +1,114 @@
1
+ from __future__ import annotations
2
+
3
+ import ast
4
+ import networkx as nx
5
+ from pydantic import BaseModel
6
+
7
+ from db.schema import EdgeType
8
+ from parser.ast_parser import ParsedModule
9
+
10
+
11
+ class EdgeRecord(BaseModel):
12
+ source_module_id: str
13
+ target_module_id: str
14
+ edge_type: EdgeType
15
+ import_line: str
16
+ scope: str
17
+ weight: float
18
+
19
+
20
+ def _build_intra_file_edges(parsed: ParsedModule, available_chunk_ids: set[str]) -> list[EdgeRecord]:
21
+ try:
22
+ tree = ast.parse(parsed.raw_code)
23
+ except SyntaxError:
24
+ return []
25
+
26
+ function_names = {
27
+ node.name
28
+ for node in tree.body
29
+ if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
30
+ }
31
+ call_edges: list[EdgeRecord] = []
32
+
33
+ for node in tree.body:
34
+ if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
35
+ continue
36
+ source_id = f"{parsed.module_id}::{node.name}"
37
+ if source_id not in available_chunk_ids:
38
+ continue
39
+ for inner in ast.walk(node):
40
+ if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
41
+ called = inner.func.id
42
+ if called in function_names:
43
+ target_id = f"{parsed.module_id}::{called}"
44
+ if target_id in available_chunk_ids and target_id != source_id:
45
+ call_edges.append(
46
+ EdgeRecord(
47
+ source_module_id=source_id,
48
+ target_module_id=target_id,
49
+ edge_type=EdgeType.INTRA_FILE,
50
+ import_line=f"call:{called}",
51
+ scope="function_level",
52
+ weight=0.5,
53
+ )
54
+ )
55
+
56
+ dedup: dict[tuple[str, str, str], EdgeRecord] = {}
57
+ for edge in call_edges:
58
+ key = (edge.source_module_id, edge.target_module_id, edge.import_line)
59
+ dedup[key] = edge
60
+ return list(dedup.values())
61
+
62
+
63
+ def build_edges(
64
+ parsed_modules: list[ParsedModule],
65
+ module_ids: set[str],
66
+ chunk_ids_by_parent: dict[str, set[str]],
67
+ ) -> list[EdgeRecord]:
68
+ edges: list[EdgeRecord] = []
69
+
70
+ for parsed in parsed_modules:
71
+ source_module_id = parsed.module_id
72
+ for imp in parsed.imports:
73
+ if imp.target_module and imp.target_module in module_ids:
74
+ edge_type = (
75
+ EdgeType.EXPLICIT_IMPORT
76
+ if imp.scope == "module_level"
77
+ else EdgeType.IMPLICIT_DEPENDENCY
78
+ )
79
+ edges.append(
80
+ EdgeRecord(
81
+ source_module_id=source_module_id,
82
+ target_module_id=imp.target_module,
83
+ edge_type=edge_type,
84
+ import_line=imp.import_line,
85
+ scope=imp.scope,
86
+ weight=imp.weight,
87
+ )
88
+ )
89
+
90
+ available_chunk_ids = chunk_ids_by_parent.get(parsed.module_id, set())
91
+ edges.extend(_build_intra_file_edges(parsed, available_chunk_ids))
92
+
93
+ graph = nx.DiGraph()
94
+ for edge in edges:
95
+ graph.add_edge(edge.source_module_id, edge.target_module_id)
96
+
97
+ for source_module_id, target_module_id in list(graph.edges()):
98
+ if graph.has_edge(target_module_id, source_module_id):
99
+ edges.append(
100
+ EdgeRecord(
101
+ source_module_id=source_module_id,
102
+ target_module_id=target_module_id,
103
+ edge_type=EdgeType.CIRCULAR,
104
+ import_line="cycle_detected",
105
+ scope="module_level",
106
+ weight=1.0,
107
+ )
108
+ )
109
+
110
+ dedup: dict[tuple[str, str, str], EdgeRecord] = {}
111
+ for edge in edges:
112
+ key = (edge.source_module_id, edge.target_module_id, edge.import_line)
113
+ dedup[key] = edge
114
+ return list(dedup.values())
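The `CIRCULAR` pass in `build_edges` checks each edge for a reverse edge, so it flags reciprocal pairs (A imports B and B imports A). A minimal sketch of that check using plain sets instead of `networkx` is given here; `reciprocal_pairs` is a hypothetical helper written for illustration only.

```python
def reciprocal_pairs(edges: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return every edge (s, t) whose reverse (t, s) is also present."""
    edge_set = set(edges)
    return sorted((s, t) for s, t in edge_set if (t, s) in edge_set)


if __name__ == "__main__":
    sample_edges = [
        ("a", "b"),
        ("b", "a"),  # reciprocal pair with ("a", "b")
        ("b", "c"),
        ("c", "d"),
        ("d", "b"),  # closes the three-node cycle b -> c -> d -> b
    ]
    print(reciprocal_pairs(sample_edges))
```

With this kind of check, the three-node cycle `b -> c -> d -> b` is not reported because no single edge in it has a direct reverse. If longer cycles matter for the review environment, a full cycle search would be needed; that is outside what this sketch covers.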
code-review-env/parser/linter.py CHANGED
@@ -98,7 +98,36 @@ def run_bandit(path: Path) -> list[LinterIssue]:
98
  return issues
99
 
100
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
101
  def run_linters(path: Path) -> list[LinterIssue]:
102
  issues = run_pylint(path)
103
  issues.extend(run_bandit(path))
 
104
  return issues
 
98
  return issues
99
 
100
 
101
+ def run_pyflakes(path: Path) -> list[LinterIssue]:
102
+ cmd = [sys.executable, "-m", "pyflakes", str(path)]
103
+ proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
104
+ payload = (proc.stdout or "").strip()
105
+ if not payload:
106
+ return []
107
+
108
+ issues: list[LinterIssue] = []
109
+ for raw_line in payload.splitlines():
110
+ line = 0
111
+ message = raw_line.strip()
112
+ if ":" in raw_line:
113
+ parts = raw_line.split(":", 3)
114
+ if len(parts) >= 3 and parts[1].isdigit():
115
+ line = int(parts[1])
116
+ message = parts[3].strip() if len(parts) == 4 else message
117
+ issues.append(
118
+ LinterIssue(
119
+ tool="pyflakes",
120
+ line=line,
121
+ severity="medium",
122
+ code="PYF000",
123
+ message=message,
124
+ )
125
+ )
126
+ return issues
127
+
128
+
129
  def run_linters(path: Path) -> list[LinterIssue]:
130
  issues = run_pylint(path)
131
  issues.extend(run_bandit(path))
132
+ issues.extend(run_pyflakes(path))
133
  return issues
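The line parsing in `run_pyflakes` splits on `:`, which is fragile when the path itself contains a colon (for example a Windows drive letter). A possible alternative is sketched here under the assumption that each report line looks like `path:line:col: message` or `path:line: message`. `parse_pyflakes_line` is a hypothetical helper, not part of the commit.

```python
import re

# Assumed line shapes: "path:line:col: message" or "path:line: message".
_REPORT_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<message>.*)$"
)


def parse_pyflakes_line(raw_line: str) -> tuple[int, str]:
    """Return (line_number, message); fall back to (0, stripped line) if unparsed."""
    match = _REPORT_LINE.match(raw_line.strip())
    if match is None:
        return 0, raw_line.strip()
    return int(match.group("line")), match.group("message").strip()


if __name__ == "__main__":
    for text in (
        "C:\\proj\\a.py:3:1: 'os' imported but unused",
        "a.py:7: undefined name 'x'",
        "unparseable output",
    ):
        print(parse_pyflakes_line(text))
```

The non-greedy `path` group stops at the first `:<digits>:` sequence, so a drive-letter colon is absorbed into the path rather than mistaken for the line-number separator.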
code-review-env/requirements.txt CHANGED
@@ -3,6 +3,7 @@ networkx>=3.2
3
  pydantic>=2.7
4
  pylint>=3.2
5
  bandit>=1.7
 
6
  fastapi>=0.115
7
  uvicorn>=0.30
8
  openai>=1.40
 
3
  pydantic>=2.7
4
  pylint>=3.2
5
  bandit>=1.7
6
+ pyflakes>=3.2
7
  fastapi>=0.115
8
  uvicorn>=0.30
9
  openai>=1.40
code-review-env/sample_project/auth.py ADDED
@@ -0,0 +1,7 @@
1
+ """Auth helpers."""
2
+
3
+ import config
4
+
5
+
6
+ def issue_session_token(user_id: str) -> str:
7
+ return f"{user_id}:{config.SECRET_KEY}:session-token-generated-with-a-very-long-suffix-that-triggers-style-rules-and-is-hard-to-read"
code-review-env/sample_project/cart.py ADDED
@@ -0,0 +1,17 @@
1
+ """Cart calculations."""
2
+
3
+ import config
4
+
5
+
6
+ def calculate_subtotal(items: list[dict[str, float]]) -> float:
7
+ subtotal = 0.0
8
+ for item in items:
9
+ subtotal += float(item.get("price", 0.0)) * float(item.get("qty", 0.0))
10
+ return subtotal
11
+
12
+
13
+ def calculate_total(items: list[dict[str, float]]) -> float:
14
+ subtotal = calculate_subtotal(items)
15
+ # BUG: config.DISCOUNT_RATE is intended to be 0.20, but set to 20 in config.
16
+ discounted = subtotal - (subtotal * config.DISCOUNT_RATE)
17
+ return discounted + (discounted * config.TAX_RATE)
code-review-env/sample_project/checkout.py ADDED
@@ -0,0 +1,15 @@
1
+ """Checkout flow."""
2
+
3
+ import cart
4
+ import payments
5
+
6
+
7
+ def submit_order(items: list[dict[str, float]]) -> str:
8
+ total = cart.calculate_total(items)
9
+ # Cascading symptom: negative total is observed here but root cause is config -> cart.
10
+ if total < 0:
11
+ return "error: negative total"
12
+ gateway_ok = payments.run_gateway_check("https://gateway.example.com/health")
13
+ if gateway_ok != 0:
14
+ return "error: gateway"
15
+ return payments.charge(total)
code-review-env/sample_project/config.py ADDED
@@ -0,0 +1,6 @@
1
+ """Configuration defaults for the checkout flow."""
2
+
3
+ DISCOUNT_RATE = 20
4
+ TAX_RATE = 0.07
5
+ PAYMENT_TIMEOUT_SECONDS = 30
6
+ SECRET_KEY = "hardcoded-dev-key"
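To make the seeded `config -> cart -> checkout` bug concrete, the arithmetic from `cart.calculate_total` can be replayed with both the configured rate (20) and the intended rate (0.20). This is a standalone sketch; `total_with` is a hypothetical helper that copies the formula rather than importing the sample project.

```python
def total_with(subtotal: float, discount_rate: float, tax_rate: float) -> float:
    """Same formula as cart.calculate_total, with the rates passed in explicitly."""
    discounted = subtotal - (subtotal * discount_rate)
    return discounted + (discounted * tax_rate)


if __name__ == "__main__":
    # Configured value: DISCOUNT_RATE = 20 means a 2000% discount, so the total goes negative.
    print(round(total_with(100.0, 20, 0.07), 2))
    # Intended value: 0.20 means a 20% discount.
    print(round(total_with(100.0, 0.20, 0.07), 2))
```

For a subtotal of 100, the configured rate gives a discounted amount of 100 - 2000 = -1900, and tax then makes it more negative; this is the negative total that `checkout.submit_order` observes downstream.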
code-review-env/sample_project/database.py ADDED
@@ -0,0 +1,6 @@
1
+ from config import SETTINGS
2
+
3
+
4
+ def get_connection_url() -> str:
5
+ # Intentional bug for lint/security testing: unsafely concatenated DSN-like value
6
+ return "sqlite:///" + SETTINGS.get("db_path")
code-review-env/sample_project/huge_module.py ADDED
@@ -0,0 +1,628 @@
1
+ """Large synthetic file for chunking checks."""
2
+
3
+
4
+ def bootstrap() -> int:
5
+ return 1
6
+ LINE_1 = 1
7
+ LINE_2 = 2
8
+ LINE_3 = 3
9
+ LINE_4 = 4
10
+ LINE_5 = 5
11
+ LINE_6 = 6
12
+ LINE_7 = 7
13
+ LINE_8 = 8
14
+ LINE_9 = 9
15
+ LINE_10 = 10
16
+ LINE_11 = 11
17
+ LINE_12 = 12
18
+ LINE_13 = 13
19
+ LINE_14 = 14
20
+ LINE_15 = 15
21
+ LINE_16 = 16
22
+ LINE_17 = 17
23
+ LINE_18 = 18
24
+ LINE_19 = 19
25
+ LINE_20 = 20
26
+ LINE_21 = 21
27
+ LINE_22 = 22
28
+ LINE_23 = 23
29
+ LINE_24 = 24
30
+ LINE_25 = 25
31
+ LINE_26 = 26
32
+ LINE_27 = 27
33
+ LINE_28 = 28
34
+ LINE_29 = 29
35
+ LINE_30 = 30
36
+ LINE_31 = 31
37
+ LINE_32 = 32
38
+ LINE_33 = 33
39
+ LINE_34 = 34
40
+ LINE_35 = 35
41
+ LINE_36 = 36
42
+ LINE_37 = 37
43
+ LINE_38 = 38
44
+ LINE_39 = 39
45
+ LINE_40 = 40
46
+ LINE_41 = 41
47
+ LINE_42 = 42
48
+ LINE_43 = 43
49
+ LINE_44 = 44
50
+ LINE_45 = 45
51
+ LINE_46 = 46
52
+ LINE_47 = 47
53
+ LINE_48 = 48
54
+ LINE_49 = 49
55
+ LINE_50 = 50
56
+ LINE_51 = 51
57
+ LINE_52 = 52
58
+ LINE_53 = 53
59
+ LINE_54 = 54
60
+ LINE_55 = 55
61
+ LINE_56 = 56
62
+ LINE_57 = 57
63
+ LINE_58 = 58
64
+ LINE_59 = 59
65
+ LINE_60 = 60
66
+ LINE_61 = 61
67
+ LINE_62 = 62
68
+ LINE_63 = 63
69
+ LINE_64 = 64
70
+ LINE_65 = 65
71
+ LINE_66 = 66
72
+ LINE_67 = 67
73
+ LINE_68 = 68
74
+ LINE_69 = 69
75
+ LINE_70 = 70
76
+ LINE_71 = 71
77
+ LINE_72 = 72
78
+ LINE_73 = 73
79
+ LINE_74 = 74
80
+ LINE_75 = 75
81
+ LINE_76 = 76
82
+ LINE_77 = 77
83
+ LINE_78 = 78
84
+ LINE_79 = 79
85
+ LINE_80 = 80
86
+ LINE_81 = 81
87
+ LINE_82 = 82
88
+ LINE_83 = 83
89
+ LINE_84 = 84
90
+ LINE_85 = 85
91
+ LINE_86 = 86
92
+ LINE_87 = 87
93
+ LINE_88 = 88
94
+ LINE_89 = 89
95
+ LINE_90 = 90
96
+ LINE_91 = 91
97
+ LINE_92 = 92
98
+ LINE_93 = 93
99
+ LINE_94 = 94
100
+ LINE_95 = 95
101
+ LINE_96 = 96
102
+ LINE_97 = 97
103
+ LINE_98 = 98
104
+ LINE_99 = 99
105
+ LINE_100 = 100
106
+ LINE_101 = 101
107
+ LINE_102 = 102
108
+ LINE_103 = 103
109
+ LINE_104 = 104
110
+ LINE_105 = 105
111
+ LINE_106 = 106
112
+ LINE_107 = 107
113
+ LINE_108 = 108
114
+ LINE_109 = 109
115
+ LINE_110 = 110
116
+ LINE_111 = 111
117
+ LINE_112 = 112
118
+ LINE_113 = 113
119
+ LINE_114 = 114
120
+ LINE_115 = 115
121
+ LINE_116 = 116
122
+ LINE_117 = 117
123
+ LINE_118 = 118
124
+ LINE_119 = 119
125
+ LINE_120 = 120
126
+ LINE_121 = 121
127
+ LINE_122 = 122
128
+ LINE_123 = 123
129
+ LINE_124 = 124
130
+ LINE_125 = 125
131
+ LINE_126 = 126
132
+ LINE_127 = 127
133
+ LINE_128 = 128
134
+ LINE_129 = 129
135
+ LINE_130 = 130
136
+ LINE_131 = 131
137
+ LINE_132 = 132
138
+ LINE_133 = 133
139
+ LINE_134 = 134
140
+ LINE_135 = 135
141
+ LINE_136 = 136
142
+ LINE_137 = 137
143
+ LINE_138 = 138
144
+ LINE_139 = 139
145
+ LINE_140 = 140
146
+ LINE_141 = 141
147
+ LINE_142 = 142
148
+ LINE_143 = 143
149
+ LINE_144 = 144
150
+ LINE_145 = 145
151
+ LINE_146 = 146
152
+ LINE_147 = 147
153
+ LINE_148 = 148
154
+ LINE_149 = 149
155
+ LINE_150 = 150
156
+ LINE_151 = 151
157
+ LINE_152 = 152
158
+ LINE_153 = 153
159
+ LINE_154 = 154
160
+ LINE_155 = 155
161
+ LINE_156 = 156
162
+ LINE_157 = 157
163
+ LINE_158 = 158
164
+ LINE_159 = 159
165
+ LINE_160 = 160
166
+ LINE_161 = 161
167
+ LINE_162 = 162
168
+ LINE_163 = 163
169
+ LINE_164 = 164
170
+ LINE_165 = 165
171
+ LINE_166 = 166
172
+ LINE_167 = 167
173
+ LINE_168 = 168
174
+ LINE_169 = 169
175
+ LINE_170 = 170
176
+ LINE_171 = 171
177
+ LINE_172 = 172
178
+ LINE_173 = 173
179
+ LINE_174 = 174
180
+ LINE_175 = 175
181
+ LINE_176 = 176
182
+ LINE_177 = 177
183
+ LINE_178 = 178
184
+ LINE_179 = 179
185
+ LINE_180 = 180
186
+ LINE_181 = 181
187
+ LINE_182 = 182
188
+ LINE_183 = 183
189
+ LINE_184 = 184
190
+ LINE_185 = 185
191
+ LINE_186 = 186
192
+ LINE_187 = 187
193
+ LINE_188 = 188
194
+ LINE_189 = 189
195
+ LINE_190 = 190
196
+ LINE_191 = 191
197
+ LINE_192 = 192
198
+ LINE_193 = 193
199
+ LINE_194 = 194
200
+ LINE_195 = 195
201
+ LINE_196 = 196
202
+ LINE_197 = 197
203
+ LINE_198 = 198
204
+ LINE_199 = 199
205
+ LINE_200 = 200
206
+ LINE_201 = 201
207
+ LINE_202 = 202
208
+ LINE_203 = 203
209
+ LINE_204 = 204
210
+ LINE_205 = 205
211
+ LINE_206 = 206
212
+ LINE_207 = 207
213
+ LINE_208 = 208
214
+ LINE_209 = 209
215
+ LINE_210 = 210
216
+ LINE_211 = 211
217
+ LINE_212 = 212
218
+ LINE_213 = 213
219
+ LINE_214 = 214
220
+ LINE_215 = 215
221
+ LINE_216 = 216
222
+ LINE_217 = 217
223
+ LINE_218 = 218
224
+ LINE_219 = 219
225
+ LINE_220 = 220
226
+ LINE_221 = 221
227
+ LINE_222 = 222
228
+ LINE_223 = 223
229
+ LINE_224 = 224
230
+ LINE_225 = 225
231
+ LINE_226 = 226
232
+ LINE_227 = 227
233
+ LINE_228 = 228
234
+ LINE_229 = 229
235
+ LINE_230 = 230
236
+ LINE_231 = 231
237
+ LINE_232 = 232
238
+ LINE_233 = 233
239
+ LINE_234 = 234
240
+ LINE_235 = 235
241
+ LINE_236 = 236
242
+ LINE_237 = 237
243
+ LINE_238 = 238
244
+ LINE_239 = 239
245
+ LINE_240 = 240
246
+ LINE_241 = 241
247
+ LINE_242 = 242
248
+ LINE_243 = 243
249
+ LINE_244 = 244
250
+ LINE_245 = 245
251
+ LINE_246 = 246
252
+ LINE_247 = 247
253
+ LINE_248 = 248
254
+ LINE_249 = 249
255
+ LINE_250 = 250
256
+ LINE_251 = 251
257
+ LINE_252 = 252
258
+ LINE_253 = 253
259
+ LINE_254 = 254
260
+ LINE_255 = 255
261
+ LINE_256 = 256
262
+ LINE_257 = 257
263
+ LINE_258 = 258
264
+ LINE_259 = 259
265
+ LINE_260 = 260
266
+ LINE_261 = 261
267
+ LINE_262 = 262
268
+ LINE_263 = 263
269
+ LINE_264 = 264
270
+ LINE_265 = 265
271
+ LINE_266 = 266
272
+ LINE_267 = 267
273
+ LINE_268 = 268
274
+ LINE_269 = 269
275
+ LINE_270 = 270
276
+ LINE_271 = 271
277
+ LINE_272 = 272
278
+ LINE_273 = 273
279
+ LINE_274 = 274
280
+ LINE_275 = 275
281
+ LINE_276 = 276
282
+ LINE_277 = 277
283
+ LINE_278 = 278
284
+ LINE_279 = 279
285
+ LINE_280 = 280
286
+ LINE_281 = 281
287
+ LINE_282 = 282
288
+ LINE_283 = 283
289
+ LINE_284 = 284
290
+ LINE_285 = 285
291
+ LINE_286 = 286
292
+ LINE_287 = 287
293
+ LINE_288 = 288
294
+ LINE_289 = 289
295
+ LINE_290 = 290
296
+ LINE_291 = 291
297
+ LINE_292 = 292
298
+ LINE_293 = 293
299
+ LINE_294 = 294
300
+ LINE_295 = 295
301
+ LINE_296 = 296
302
+ LINE_297 = 297
303
+ LINE_298 = 298
304
+ LINE_299 = 299
305
+ LINE_300 = 300
306
+ LINE_301 = 301
307
+ LINE_302 = 302
308
+ LINE_303 = 303
309
+ LINE_304 = 304
310
+ LINE_305 = 305
311
+ LINE_306 = 306
312
+ LINE_307 = 307
313
+ LINE_308 = 308
314
+ LINE_309 = 309
315
+ LINE_310 = 310
316
+ LINE_311 = 311
317
+ LINE_312 = 312
318
+ LINE_313 = 313
319
+ LINE_314 = 314
320
+ LINE_315 = 315
321
+ LINE_316 = 316
322
+ LINE_317 = 317
323
+ LINE_318 = 318
324
+ LINE_319 = 319
325
+ LINE_320 = 320
326
+ LINE_321 = 321
327
+ LINE_322 = 322
328
+ LINE_323 = 323
329
+ LINE_324 = 324
330
+ LINE_325 = 325
331
+ LINE_326 = 326
332
+ LINE_327 = 327
333
+ LINE_328 = 328
334
+ LINE_329 = 329
335
+ LINE_330 = 330
336
+ LINE_331 = 331
337
+ LINE_332 = 332
338
+ LINE_333 = 333
339
+ LINE_334 = 334
340
+ LINE_335 = 335
341
+ LINE_336 = 336
342
+ LINE_337 = 337
343
+ LINE_338 = 338
344
+ LINE_339 = 339
345
+ LINE_340 = 340
346
+ LINE_341 = 341
347
+ LINE_342 = 342
348
+ LINE_343 = 343
349
+ LINE_344 = 344
350
+ LINE_345 = 345
351
+ LINE_346 = 346
352
+ LINE_347 = 347
353
+ LINE_348 = 348
354
+ LINE_349 = 349
355
+ LINE_350 = 350
356
+ LINE_351 = 351
357
+ LINE_352 = 352
358
+ LINE_353 = 353
359
+ LINE_354 = 354
360
+ LINE_355 = 355
361
+ LINE_356 = 356
362
+ LINE_357 = 357
363
+ LINE_358 = 358
364
+ LINE_359 = 359
365
+ LINE_360 = 360
366
+ LINE_361 = 361
367
+ LINE_362 = 362
368
+ LINE_363 = 363
369
+ LINE_364 = 364
370
+ LINE_365 = 365
371
+ LINE_366 = 366
372
+ LINE_367 = 367
373
+ LINE_368 = 368
374
+ LINE_369 = 369
375
+ LINE_370 = 370
376
+ LINE_371 = 371
377
+ LINE_372 = 372
378
+ LINE_373 = 373
379
+ LINE_374 = 374
380
+ LINE_375 = 375
381
+ LINE_376 = 376
382
+ LINE_377 = 377
383
+ LINE_378 = 378
384
+ LINE_379 = 379
385
+ LINE_380 = 380
386
+ LINE_381 = 381
387
+ LINE_382 = 382
388
+ LINE_383 = 383
389
+ LINE_384 = 384
390
+ LINE_385 = 385
391
+ LINE_386 = 386
392
+ LINE_387 = 387
393
+ LINE_388 = 388
394
+ LINE_389 = 389
395
+ LINE_390 = 390
396
+ LINE_391 = 391
397
+ LINE_392 = 392
398
+ LINE_393 = 393
399
+ LINE_394 = 394
400
+ LINE_395 = 395
401
+ LINE_396 = 396
402
+ LINE_397 = 397
403
+ LINE_398 = 398
404
+ LINE_399 = 399
405
+ LINE_400 = 400
406
+ LINE_401 = 401
407
+ LINE_402 = 402
408
+ LINE_403 = 403
409
+ LINE_404 = 404
410
+ LINE_405 = 405
411
+ LINE_406 = 406
412
+ LINE_407 = 407
413
+ LINE_408 = 408
414
+ LINE_409 = 409
415
+ LINE_410 = 410
416
+ LINE_411 = 411
417
+ LINE_412 = 412
418
+ LINE_413 = 413
419
+ LINE_414 = 414
420
+ LINE_415 = 415
421
+ LINE_416 = 416
422
+ LINE_417 = 417
423
+ LINE_418 = 418
424
+ LINE_419 = 419
425
+ LINE_420 = 420
426
+ LINE_421 = 421
427
+ LINE_422 = 422
428
+ LINE_423 = 423
429
+ LINE_424 = 424
430
+ LINE_425 = 425
431
+ LINE_426 = 426
432
+ LINE_427 = 427
433
+ LINE_428 = 428
434
+ LINE_429 = 429
435
+ LINE_430 = 430
436
+
437
+
438
+ def helper_alpha() -> int:
439
+ return LINE_10 + LINE_20
440
+
441
+
442
+ def helper_beta() -> int:
443
+ return helper_alpha()
444
+
445
+
446
+ class GiantService:
447
+ def run(self) -> int:
448
+ return helper_beta()
449
+
450
+
451
+ def auto_func_1() -> int:
452
+ return 1
453
+
454
+
455
+ def auto_func_2() -> int:
456
+ return 2
457
+
458
+
459
+ def auto_func_3() -> int:
460
+ return 3
461
+
462
+
463
+ def auto_func_4() -> int:
464
+ return 4
465
+
466
+
467
+ def auto_func_5() -> int:
468
+ return 5
469
+
470
+
471
+ def auto_func_6() -> int:
472
+ return 6
473
+
474
+
475
+ def auto_func_7() -> int:
476
+ return 7
477
+
478
+
479
+ def auto_func_8() -> int:
480
+ return 8
481
+
482
+
483
+ def auto_func_9() -> int:
484
+ return 9
485
+
486
+
487
+ def auto_func_10() -> int:
488
+ return 10
489
+
490
+
491
+ def auto_func_11() -> int:
492
+ return 11
493
+
494
+
495
+ def auto_func_12() -> int:
496
+ return 12
497
+
498
+
499
+ def auto_func_13() -> int:
500
+ return 13
501
+
502
+
503
+ def auto_func_14() -> int:
504
+ return 14
505
+
506
+
507
+ def auto_func_15() -> int:
508
+ return 15
509
+
510
+
511
+ def auto_func_16() -> int:
512
+ return 16
513
+
514
+
515
+ def auto_func_17() -> int:
516
+ return 17
517
+
518
+
519
+ def auto_func_18() -> int:
520
+ return 18
521
+
522
+
523
+ def auto_func_19() -> int:
524
+ return 19
525
+
526
+
527
+ def auto_func_20() -> int:
528
+ return 20
529
+
530
+
531
+ def auto_func_21() -> int:
532
+ return 21
533
+
534
+
535
+ def auto_func_22() -> int:
536
+ return 22
537
+
538
+
539
+ def auto_func_23() -> int:
540
+ return 23
541
+
542
+
543
+ def auto_func_24() -> int:
544
+ return 24
545
+
546
+
547
+ def auto_func_25() -> int:
548
+ return 25
549
+
550
+
551
+ def auto_func_26() -> int:
552
+ return 26
553
+
554
+
555
+ def auto_func_27() -> int:
556
+ return 27
557
+
558
+
559
+ def auto_func_28() -> int:
560
+ return 28
561
+
562
+
563
+ def auto_func_29() -> int:
564
+ return 29
565
+
566
+
567
+ def auto_func_30() -> int:
568
+ return 30
569
+
570
+
571
+ def auto_func_31() -> int:
572
+ return 31
573
+
574
+
575
+ def auto_func_32() -> int:
576
+ return 32
577
+
578
+
579
+ def auto_func_33() -> int:
580
+ return 33
581
+
582
+
583
+ def auto_func_34() -> int:
584
+ return 34
585
+
586
+
587
+ def auto_func_35() -> int:
588
+ return 35
589
+
590
+
591
+ def auto_func_36() -> int:
592
+ return 36
593
+
594
+
595
+ def auto_func_37() -> int:
596
+ return 37
597
+
598
+
599
+ def auto_func_38() -> int:
600
+ return 38
601
+
602
+
603
+ def auto_func_39() -> int:
604
+ return 39
605
+
606
+
607
+ def auto_func_40() -> int:
608
+ return 40
609
+
610
+
611
+ def auto_func_41() -> int:
612
+ return 41
613
+
614
+
615
+ def auto_func_42() -> int:
616
+ return 42
617
+
618
+
619
+ def auto_func_43() -> int:
620
+ return 43
621
+
622
+
623
+ def auto_func_44() -> int:
624
+ return 44
625
+
626
+
627
+ def auto_func_45() -> int:
628
+ return 45
code-review-env/sample_project/inventory.py ADDED
@@ -0,0 +1,10 @@
1
+ from validators import is_non_empty
2
+
3
+
4
+ STOCK = {"widget": 4, "gizmo": 0}
5
+
6
+
7
+ def is_available(item_name: str) -> bool:
8
+ if not is_non_empty(item_name):
9
+ return False
10
+ return STOCK.get(item_name, 0) > 0
code-review-env/sample_project/notifications.py ADDED
@@ -0,0 +1,6 @@
1
+ import smtplib
2
+
3
+
4
+ def send_email(recipient: str, body: str) -> None:
5
+ client = smtplib.SMTP("localhost")
6
+ client.sendmail("noreply@example.com", [recipient], body)
code-review-env/sample_project/payments.py ADDED
@@ -0,0 +1,15 @@
+ """Payment gateway wrapper."""
+
+ import subprocess
+
+
+ def run_gateway_check(endpoint: str) -> int:
+     # SECURITY ISSUE: user-provided endpoint is interpolated in a shell command.
+     command = f"curl -s {endpoint}"
+     return subprocess.call(command, shell=True)
+
+
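The `SECURITY ISSUE` comment in `run_gateway_check` marks a deliberately planted flaw in the sample project: with `shell=True`, an endpoint such as `https://example.com; rm -rf /tmp/x` would be interpreted by the shell. For reviewers who want a reference point, here is a minimal sketch of one safer shape. It is not part of the commit; `build_gateway_command` is a hypothetical helper, and the restriction to `http`/`https` is an assumption about what a gateway endpoint should look like.

```python
import subprocess
from urllib.parse import urlsplit


def build_gateway_command(endpoint: str) -> list[str]:
    """Return an argv list for curl, rejecting anything that is not an http(s) URL.

    Hypothetical helper for illustration; the allowed schemes are an assumption.
    """
    parts = urlsplit(endpoint)
    if parts.scheme not in {"http", "https"} or not parts.netloc:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    # The endpoint stays a single argv entry, so shell metacharacters are inert.
    return ["curl", "-s", endpoint]


def run_gateway_check(endpoint: str) -> int:
    # No shell=True: the command is executed directly from the argv list.
    return subprocess.call(build_gateway_command(endpoint))
```

Passing a list instead of a formatted string means the operating system receives the endpoint as one argument, so `;`, `|`, or backticks inside it are never parsed as shell syntax.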
+ def charge(total: float) -> str:
+     if total <= 0:
+         return "rejected"
+     return "charged"
code-review-env/sample_project/utils.py ADDED
@@ -0,0 +1,7 @@
+ from inventory import is_available
+
+
+ def pick_item(preferred: str, fallback: str) -> str:
+     if is_available(preferred):
+         return preferred
+     return fallback
code-review-env/sample_project/validators.py ADDED
@@ -0,0 +1,8 @@
+
+ def is_non_empty(value: str | None) -> bool:
+     return value is not None and value.strip() != ""
+
+
+ def validate_coupon(code: str | None) -> bool:
+     # Intentional bug: accepts invalid short code when value is None
+     return (code or "").startswith("SAVE")
code-review-env/tests/test_phase2_graph_manager.py ADDED
@@ -0,0 +1,32 @@
+ from pathlib import Path
+
+ from db.seed import seed_project
+ from graph.graph_manager import GraphManager
+
+
+ def test_graph_manager_traversal_is_deterministic(tmp_path: Path) -> None:
+     db_path = tmp_path / "phase2_graph.db"
+     seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+
+     manager = GraphManager(source_root="sample_project", db_path=db_path)
+     first = manager.traversal_order()
+     second = manager.traversal_order()
+
+     assert first == second
+     assert len(first) > 0
+
+
+ def test_graph_manager_neighbor_queries(tmp_path: Path) -> None:
+     db_path = tmp_path / "phase2_graph_neighbors.db"
+     seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+
+     manager = GraphManager(source_root="sample_project", db_path=db_path)
+     graph = manager.load_graph()
+     candidate = next(iter(graph.nodes()))
+
+     both = manager.get_neighbors(candidate, direction="both")
+     only_out = manager.get_neighbors(candidate, direction="out")
+     only_in = manager.get_neighbors(candidate, direction="in")
+
+     assert set(only_out).issubset(set(both))
+     assert set(only_in).issubset(set(both))
code-review-env/tests/test_phase2_observation.py ADDED
@@ -0,0 +1,55 @@
+ from pathlib import Path
+
+ import pytest
+
+ from db.seed import seed_project
+ from env.observation import CodeObservation
+ from env.observation_builder import ObservationBuilder
+
+
+ def test_code_observation_strict_rejects_bad_types() -> None:
+     with pytest.raises(Exception):
+         CodeObservation(
+             module_id="checkout",
+             code="print('x')",
+             ast_summary={},
+             dependency_summaries=[],
+             dependent_summaries=[],
+             neighbor_reviews=[],
+             task_description="review",
+             available_actions=[],
+             requested_context=None,
+             token_usage={},
+             total_tokens="100",  # type: ignore[arg-type]
+             within_budget=True,
+         )
+
+
+ def test_observation_builder_within_budget(tmp_path: Path) -> None:
+     db_path = tmp_path / "phase2_obs.db"
+     seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+
+     builder = ObservationBuilder(source_root="sample_project", db_path=db_path)
+     observation = builder.build(
+         module_id="checkout",
+         task_description="Find logic and dependency issues",
+     )
+
+     assert observation.within_budget is True
+     assert observation.total_tokens <= 2000
+     assert observation.module_id == "checkout"
+
+
+ def test_request_context_is_bounded(tmp_path: Path) -> None:
+     db_path = tmp_path / "phase2_context.db"
+     seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+
+     builder = ObservationBuilder(source_root="sample_project", db_path=db_path)
+     observation = builder.build(
+         module_id="checkout",
+         task_description="Investigate dependencies",
+         context_request="auth",
+     )
+
+     assert observation.requested_context is not None
+     assert observation.total_tokens <= 2000
code-review-env/tests/test_phase2_token_budget.py ADDED
@@ -0,0 +1,42 @@
+ from graph.token_budget import MAX_TOTAL_TOKENS, TokenBudget
+
+
+ def test_token_budget_enforces_hard_cap() -> None:
+     budget = TokenBudget()
+     huge = "x" * 50000
+
+     result = budget.enforce(
+         {
+             "code": huge,
+             "ast_summary_text": huge,
+             "dependency_summaries": [huge, huge],
+             "dependent_summaries": [huge],
+             "neighbor_reviews": [huge],
+             "task_description": huge,
+             "available_actions": ["FLAG_BUG"],
+             "requested_context_code": huge,
+         }
+     )
+
+     assert result.total_tokens <= MAX_TOTAL_TOKENS
+
+
+ def test_token_budget_marks_truncation() -> None:
+     budget = TokenBudget()
+     huge = "z" * 20000
+
+     result = budget.enforce(
+         {
+             "code": huge,
+             "ast_summary_text": "{}",
+             "dependency_summaries": [],
+             "dependent_summaries": [],
+             "neighbor_reviews": [],
+             "task_description": "task",
+             "available_actions": ["REQUEST_CONTEXT"],
+             "requested_context_code": huge,
+         }
+     )
+
+     trimmed_code = str(result.payload["code"])
+     assert "[TRUNCATED]" in trimmed_code
code-review-env/tests/test_seed.py ADDED
@@ -0,0 +1,30 @@
+ from pathlib import Path
+
+ from db.seed import seed_project
+ from parser.ast_parser import parse_python_file
+ from parser.chunker import chunk_module
+
+
+ def test_seed_project_uses_hash_cache(tmp_path: Path) -> None:
+     db_path = tmp_path / "seed.db"
+     target = Path("sample_project")
+
+     first = seed_project(target, db_path=str(db_path), force=False)
+     second = seed_project(target, db_path=str(db_path), force=False)
+
+     assert first["loaded_from_cache"] is False
+     assert second["loaded_from_cache"] is True
+     assert first["node_count"] == second["node_count"]
+     assert first["edge_count"] == second["edge_count"]
+
+
+ def test_chunker_splits_large_module_into_sub_nodes() -> None:
+     root = Path("sample_project")
+     parsed = parse_python_file(root / "huge_module.py", root)
+     chunked = chunk_module(parsed, max_lines=300)
+
+     assert chunked.parent.module_id == "huge_module"
+     assert chunked.parent.code == ""
+     assert len(chunked.chunks) >= 2
+     assert all(chunk.parent_module_id == "huge_module" for chunk in chunked.chunks)
+     assert any("::helper_alpha" in chunk.module_id for chunk in chunked.chunks)
plans/phase-02-graph-manager-observation-plan.md ADDED
@@ -0,0 +1,206 @@
+ # Phase 2 Plan: Graph Manager & Observation Builder (for GPT-5.3)
+
+ ## Objective
+ Deliver Phase 2 only:
+ - graph/graph_manager.py: load graph from SQLite, traversal order, neighbor queries
+ - graph/token_budget.py: hard 2000-token enforcement with per-component limits
+ - env/observation.py: strict Pydantic v2 CodeObservation model
+
+ No Phase 3+ implementation in this phase.
+
+ ## Context7-Validated Constraints To Use
+ 1. SQLAlchemy 2.0 + SQLite:
+    - Use SQLAlchemy ORM patterns with Declarative models and explicit Session boundaries.
+    - Keep read-heavy graph fetches in short-lived sessions.
+
+ 2. NetworkX traversal and determinism:
+    - Use DAG topological utilities when possible.
+    - Use deterministic ordering (lexicographical tie-breaking) to avoid run-to-run drift.
+    - Betweenness centrality is available for ranking high-impact nodes.
+
+ 3. Pydantic v2 model strictness:
+    - Use BaseModel with strict config and forbid unknown fields.
+    - Use model_validate/model_dump APIs consistently.
+
+ ## Current Codebase Reality (important for Phase 2)
+ 1. Existing graph logic is in env/graph.py, not graph/graph_manager.py.
+ 2. env/observation_builder.py and env/models.py are placeholders.
+ 3. DB layer currently uses SQLModel schema classes in db/schema.py.
+
+ Implication: Phase 2 should add the target files while preserving compatibility with existing imports/tests where possible.
+
+ ## Proposed Phase 2 Deliverables
+
+ ### 1) Create graph package and GraphManager
+ Files:
+ - code-review-env/graph/__init__.py
+ - code-review-env/graph/graph_manager.py
+
+ Planned API:
+ - class GraphManager:
+   - __init__(self, source_root: str, db_path: str | None = None)
+   - load_graph(self) -> nx.DiGraph
+   - get_node(self, module_id: str) -> dict[str, object]
+   - get_neighbors(self, module_id: str, direction: Literal["out", "in", "both"], limit: int | None = None) -> list[str]
+   - traversal_order(self) -> list[str]
+   - centrality(self) -> dict[str, float]
+
+ Implementation rules:
+ - Load modules/edges from SQLite as source of truth.
+ - Add all module metadata needed for observations as node attributes.
+ - traversal_order target behavior:
+   - Prefer leaf-first review order.
+   - Push high-centrality nodes later.
+   - Deterministic tie-breaker by module_id.
+ - Recommended approach:
+   - Reverse-edge DAG ordering for leaf-first when acyclic.
+   - If cyclic, condense SCCs or apply stable fallback ordering by:
+     1) out_degree ascending
+     2) betweenness centrality ascending
+     3) module_id ascending
+
+ Compatibility note:
+ - Keep env/graph.py as a thin wrapper or adapter to GraphManager until all callers migrate.
+
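The traversal rules in section 1 (leaf-first, deterministic ties by module_id, stable fallback for cycles) can be illustrated with a small standard-library sketch. This is not the commit's `GraphManager.traversal_order`; `leaf_first_order` is a hypothetical name, an edge `(a, b)` is assumed to mean "a depends on b", and the betweenness-centrality tie-break from the plan is left out to keep the example dependency-free. With NetworkX available, `nx.lexicographical_topological_sort` on the reversed graph would be a natural alternative for the acyclic case.

```python
import heapq
from collections import defaultdict
from collections.abc import Iterable


def leaf_first_order(nodes: Iterable[str], edges: Iterable[tuple[str, str]]) -> list[str]:
    """Order modules so dependencies come before the modules that import them.

    Assumption: an edge (a, b) means "a depends on b". Ties are broken by
    module_id so repeated calls return the same list.
    """
    pending = {node: 0 for node in nodes}  # unresolved dependency count per module
    dependents: dict[str, set[str]] = defaultdict(set)  # b -> modules depending on b

    for src, dst in set(edges):
        pending.setdefault(src, 0)
        pending.setdefault(dst, 0)
        if src == dst:
            continue  # self-loops are ignored in this sketch
        dependents[dst].add(src)
        pending[src] += 1

    out_degree = dict(pending)  # snapshot used by the cyclic fallback
    ready = [node for node, count in pending.items() if count == 0]
    heapq.heapify(ready)  # min-heap gives lexicographic tie-breaking

    order: list[str] = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for parent in dependents[node]:
            pending[parent] -= 1
            if pending[parent] == 0:
                heapq.heappush(ready, parent)

    # Modules on a cycle never reach zero; append them by (out_degree, module_id).
    placed = set(order)
    leftovers = sorted(
        (node for node in pending if node not in placed),
        key=lambda node: (out_degree[node], node),
    )
    return order + leftovers


# Edges taken from the sample project imports shown in this diff.
print(leaf_first_order(
    ["utils", "inventory", "validators", "notifications"],
    [("utils", "inventory"), ("inventory", "validators")],
))
# ['notifications', 'validators', 'inventory', 'utils']
```

Because the ready set is a min-heap keyed on module_id, the result does not depend on the order in which nodes or edges are supplied, which is the property the determinism checks later in the plan rely on.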
+ ### 2) Implement hard token budget module
+ File:
+ - code-review-env/graph/token_budget.py
+
+ Constants:
+ - MAX_TOTAL_TOKENS = 2000
+ - COMPONENT_BUDGETS (initial defaults from plan):
+   - current_code: 800
+   - ast_summary: 100
+   - direct_deps: 250
+   - dependents: 150
+   - neighbor_reviews: 120
+   - task_and_actions: 200
+   - buffer: 280
+
+ Planned API:
+ - estimate_tokens(text: str) -> int
+ - truncate_to_budget(text: str, max_tokens: int, suffix_notice: str) -> str
+ - allocate_budget(components: dict[str, str | list[str]]) -> dict[str, object]
+   - returns included/truncated text + per-component token usage + total
+ - enforce_observation_budget(observation_payload: dict[str, object]) -> dict[str, object]
+
+ Implementation rules:
+ - Budget must be enforced, never advisory.
+ - If full payload exceeds 2000, trim in priority order:
+   1) dependent summaries
+   2) neighbor reviews
+   3) direct dependency summaries (lowest-ranked first)
+   4) current code (but preserve critical context header + truncation notice)
+ - REQUEST_CONTEXT path must still obey MAX_TOTAL_TOKENS and return full neighbor code only when it fits; otherwise return bounded code + explicit truncation marker.
+
+ Token estimator policy:
+ - Start with deterministic approximation for stability (for example chars/4 heuristic).
+ - Keep estimator in one function to allow later swap to model-specific tokenizer without API break.
+
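To make the budget rules in section 2 concrete, here is a hedged sketch of the chars/4 estimator and the priority-ordered trimming. It reuses the planned names `estimate_tokens` and `truncate_to_budget`, but the bodies are assumptions, as are `enforce_total`, `TRIM_ORDER`, and the `[TRUNCATED]` notice text (borrowed from the test suite in this commit). The per-component caps, the "lowest-ranked first" rule for dependencies, and the preserved context header are not modelled. Note that the listed COMPONENT_BUDGETS add up to 1900 rather than 2000, so the sketch works only against the total cap.

```python
import math

MAX_TOTAL_TOKENS = 2000
TRUNCATION_NOTICE = "\n[TRUNCATED]"  # assumed marker text
# Trim order from the plan: dependents first, current code last.
TRIM_ORDER = ["dependents", "neighbor_reviews", "direct_deps", "current_code"]


def estimate_tokens(text: str) -> int:
    """Deterministic approximation: about four characters per token, rounded up."""
    return math.ceil(len(text) / 4)


def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str = TRUNCATION_NOTICE) -> str:
    """Cut text so that the result, including the notice, fits in max_tokens."""
    if estimate_tokens(text) <= max_tokens:
        return text
    max_chars = max(0, max_tokens) * 4
    if len(suffix_notice) >= max_chars:
        return suffix_notice[:max_chars]  # not even the notice fits in full
    return text[: max_chars - len(suffix_notice)] + suffix_notice


def enforce_total(components: dict[str, str], limit: int = MAX_TOTAL_TOKENS) -> dict[str, str]:
    """Trim components in priority order until the estimated total fits the limit."""
    result = dict(components)
    # Assumption: components outside TRIM_ORDER are trimmed last, in name order,
    # so the cap still holds when only those components are oversized.
    order = [name for name in TRIM_ORDER if name in result]
    order += sorted(name for name in result if name not in TRIM_ORDER)
    for name in order:
        overflow = sum(estimate_tokens(text) for text in result.values()) - limit
        if overflow <= 0:
            break
        allowed = max(0, estimate_tokens(result[name]) - overflow)
        result[name] = truncate_to_budget(result[name], allowed)
    return result


payload = {
    "current_code": "c" * 8000,      # about 2000 tokens
    "dependents": "d" * 4000,        # about 1000 tokens
    "task_and_actions": "t" * 400,   # about 100 tokens
}
trimmed = enforce_total(payload)
print(sum(estimate_tokens(text) for text in trimmed.values()))  # 2000
```

Keeping the estimate inside one function, as the estimator policy asks, means a model-specific tokenizer could later replace the heuristic without touching `truncate_to_budget` or `enforce_total`.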
+ ### 3) Implement strict Pydantic observation model
+ File:
+ - code-review-env/env/observation.py
+
+ Planned models:
+ - class NeighborSummary(BaseModel)
+   - module_id: str
+   - relation: Literal["dependency", "dependent"]
+   - summary: str
+   - review_snippet: str | None
+
+ - class RequestedContext(BaseModel)
+   - module_id: str
+   - code: str
+   - was_truncated: bool
+
+ - class CodeObservation(BaseModel)
+   - module_id: str
+   - code: str
+   - ast_summary: dict[str, object]
+   - dependency_summaries: list[NeighborSummary]
+   - dependent_summaries: list[NeighborSummary]
+   - neighbor_reviews: list[str]
+   - task_description: str
+   - available_actions: list[str]
+   - requested_context: RequestedContext | None = None
+   - token_usage: dict[str, int]
+   - total_tokens: int
+   - within_budget: bool
+
+ Model config:
+ - strict=True
+ - extra="forbid"
+
+ Validation rules:
+ - total_tokens <= 2000 must be true.
+ - module_id and code cannot be empty.
+ - dependency/dependent list limits enforced before serialization.
+
+ ### 4) Observation assembly integration path
+ File to update in Phase 2:
+ - code-review-env/env/observation_builder.py
+
+ Plan:
+ - Replace placeholder with builder that composes:
+   - GraphManager neighbor and ordering queries
+   - DB-backed module source + summaries + review annotations
+   - TokenBudget allocation and enforcement
+   - CodeObservation validation
+
+ Behavior:
+ - Default observation returns current module + compressed neighbors.
+ - REQUEST_CONTEXT(module_id): include requested neighbor code in requested_context while still meeting global budget.
+
+ ## Verification Plan (must pass before Phase 2 complete)
+
+ ### A) Unit tests to add/update
+ 1. tests/test_graph_manager_phase2.py
+    - load_graph builds expected node/edge counts from seeded DB.
+    - traversal_order places leaf nodes earlier than high-centrality hubs.
+    - ordering is deterministic across repeated calls.
+
+ 2. tests/test_token_budget_phase2.py
+    - enforce_observation_budget always returns total_tokens <= 2000.
+    - long current code is truncated with explicit notice.
+    - REQUEST_CONTEXT path stays within 2000.
+
+ 3. tests/test_observation_phase2.py
+    - CodeObservation strict validation rejects unknown fields/type coercion.
+    - valid payload serializes with model_dump and preserves token fields.
+
+ ### B) Scenario checks
+ 1. Seed sample_project SQLite DB.
+ 2. Build observation for every module_id in modules table.
+ 3. Assert all observations are within budget.
+ 4. Trigger REQUEST_CONTEXT for high-fanout node and validate bounded response.
+
+ ### C) Determinism checks
+ 1. Run traversal_order 10 times on same DB snapshot.
+ 2. Output order must be identical each run.
+
+ ## Risks and Mitigations
+ 1. Existing env/graph.py may conflict with new graph/graph_manager.py.
+    - Mitigation: keep wrapper compatibility until callers migrate.
+
+ 2. SQLModel vs SQLAlchemy ORM naming mismatch in current schema.
+    - Mitigation: Phase 2 consumes existing schema as-is; DB table redesign deferred unless explicitly approved.
+
+ 3. Token estimation mismatch vs actual model tokenizer.
+    - Mitigation: enforce conservative budget with safety buffer; keep estimator swappable.
+
+ ## Design Questions To Resolve Before Implementation
+ 1. File structure decision:
+    - Should Phase 2 introduce new graph/ package now and keep env/graph.py compatibility wrapper, or refactor callers immediately?
+
+ 2. Schema alignment decision:
+    - Keep current SQLModel-backed tables in Phase 2 and map to planned names later, or perform a schema migration now?
+
+ 3. REQUEST_CONTEXT strictness:
+    - If full neighbor code cannot fit, should response be truncated (with marker) or should the action fail with explicit error and no code body?
+
+ ## Definition of Done for Phase 2
+ 1. graph/graph_manager.py, graph/token_budget.py, env/observation.py implemented with type hints and docstrings.
+ 2. observation_builder builds validated CodeObservation objects.
+ 3. All Phase 2 tests pass.
+ 4. Every generated observation satisfies hard <= 2000 token limit.
+ 5. Traversal order behavior matches leaf-first and high-centrality-last intent with deterministic ties.