Spaces: Athmabhiram1/nodeaudit-openenv

shreyas-joshi committed on Apr 7

Commit 899a7c7 (1 parent: cf05092)

feat: Implement chunking and graph management for code review environment

- Added chunker module to split parsed Python modules into manageable chunks.
- Introduced graph builder to create edges between code chunks and modules.
- Created sample project files for authentication, cart calculations, checkout flow, and configuration.
- Implemented utility functions for inventory management and email notifications.
- Developed payment gateway wrapper with security considerations.
- Added validators for input checks and coupon validation.
- Created extensive test suite for graph manager, observation builder, and token budget enforcement.
- Documented Phase 2 plan for graph manager and observation builder integration.

Files changed (38)

Builder.md +59 -101
Debugger.md +57 -69
OpenEnv +1 -0
Phases.md +378 -240
Reviewer.md +94 -0
code-review-env/README.md +61 -2
code-review-env/db/database.py +3 -0
code-review-env/db/models.py +25 -0
code-review-env/db/schema.py +13 -1
code-review-env/db/seed.py +143 -0
code-review-env/db/store.py +31 -0
code-review-env/env/graph.py +20 -67
code-review-env/env/observation.py +62 -0
code-review-env/env/observation_builder.py +143 -1
code-review-env/graph/__init__.py +5 -0
code-review-env/graph/graph_manager.py +125 -0
code-review-env/graph/token_budget.py +117 -0
code-review-env/parser/ast_parser.py +41 -11
code-review-env/parser/chunker.py +96 -0
code-review-env/parser/graph_builder.py +114 -0
code-review-env/parser/linter.py +29 -0
code-review-env/requirements.txt +1 -0
code-review-env/sample_project/auth.py +7 -0
code-review-env/sample_project/cart.py +17 -0
code-review-env/sample_project/checkout.py +15 -0
code-review-env/sample_project/config.py +6 -0
code-review-env/sample_project/database.py +6 -0
code-review-env/sample_project/huge_module.py +628 -0
code-review-env/sample_project/inventory.py +10 -0
code-review-env/sample_project/notifications.py +6 -0
code-review-env/sample_project/payments.py +15 -0
code-review-env/sample_project/utils.py +7 -0
code-review-env/sample_project/validators.py +8 -0
code-review-env/tests/test_phase2_graph_manager.py +32 -0
code-review-env/tests/test_phase2_observation.py +55 -0
code-review-env/tests/test_phase2_token_budget.py +42 -0
code-review-env/tests/test_seed.py +30 -0
plans/phase-02-graph-manager-observation-plan.md +206 -0

Builder.md CHANGED

@@ -1,138 +1,96 @@
-# Builder Prompt — CodeReviewEnv
-You are an expert Python engineer building a reinforcement learning environment called **CodeReviewEnv** for the OpenEnv Hackathon Round 1. Read everything below before writing a single line of code.
 ---
 ## What You Are Building
-An OpenEnv-compliant RL environment where an LLM agent learns to perform dependency-aware code review on a Python codebase.
-The environment:
-1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite via SQLModel. Nodes = modules. Edges = import relationships.
-2. Each node stores: full source code, compressed AST summary (~50 tokens), linter ground truth (pylint + bandit output), and agent-written review annotations.
-3. The agent reviews one module per episode via a multi-step loop: `reset()` → `step(action)` × N → done.
-4. The agent sees **full code of the current module only**. Neighbors are always compressed summaries — never full code. This is a hard constraint for token budget.
-5. The agent can take actions: FLAG_BUG, FLAG_STYLE, FLAG_SECURITY, FLAG_DEPENDENCY_ISSUE, ADD_COMMENT, REQUEST_CHANGES, APPROVE, REQUEST_CONTEXT (costs -0.1 reward), AMEND_REVIEW (updates a neighbor's annotation retroactively).
-6. Rewards are computed by graders against pre-computed ground truth stored in the DB.
-7. The final output is an annotated dependency graph — all module reviews, cross-module causal attributions, readable as JSON and Markdown.
-The key differentiator: the environment models **cascading bugs** — where a bug in module B is caused by a design decision in module A. The agent is rewarded for identifying the upstream root cause, not just flagging the surface symptom.
 ---
-## Persistence Strategy
-**SQLite + SQLModel. This is non-negotiable for demo performance.**
-- On first run: parse sample_codebase/ → populate DB with all nodes, edges, linter flags
-- On subsequent runs: detect DB exists → skip parsing → load graph directly
-- `reset()` clears only review annotations, never graph structure
-- All episode history is stored for reproducibility
-Use Context7 MCP to look up SQLModel, NetworkX, pylint programmatic API, bandit API, and OpenEnv spec documentation before implementing each component. Do not guess at APIs — look them up.
----
-## Tech Stack
-- Python 3.11
-- SQLModel (SQLite persistence)
-- NetworkX (graph construction and traversal)
-- FastAPI (HTTP server for OpenEnv spec)
-- Pydantic v2 (typed models)
-- pylint + bandit (linter ground truth)
-- Python `ast` module (AST parsing — stdlib, no extras)
-- OpenAI client (all LLM calls in inference.py and hard grader)
-- Docker (containerization)
----
-## Project Structure
-Follow this structure exactly — do not deviate:
-```
-code-review-env/
-├── openenv.yaml
-├── Dockerfile
-├── README.md
-├── inference.py
-├── requirements.txt
-├── env/
-│   ├── environment.py
-│   ├── models.py
-│   ├── graph.py
-│   ├── observation_builder.py
-│   └── reward.py
-├── db/
-│   ├── schema.py
-│   ├── store.py
-│   └── migrations.py
-├── parser/
-│   ├── ast_parser.py
-│   ├── linter.py
-│   └── summarizer.py
-├── graders/
-│   ├── base_grader.py
-│   ├── easy_grader.py
-│   ├── medium_grader.py
-│   └── hard_grader.py
-├── tasks/
-│   ├── task_registry.py
-│   ├── easy_task.py
-│   ├── medium_task.py
-│   └── hard_task.py
-├── server/
-│   └── app.py
-├── sample_codebase/
-│   ├── auth.py
-│   ├── checkout.py
-│   ├── cart.py
-│   ├── payments.py
-│   ├── config.py
-│   └── ground_truth.json
-└── tests/
-```
 ---
-## Phase You Are Currently Building
-**[INSERT PHASE NUMBER AND NAME HERE]**
-Refer to the phase plan for exact tasks and completion criteria for this phase. Build only what is scoped to this phase. Do not build ahead.
 ---
-## Non-Negotiable Constraints
-1. All rewards must be clipped to 0.0–1.0. Never return outside this range.
-2. Never feed full neighbor code into observations. Always use compressed summaries.
-3. inference.py must use OpenAI client. Read API_BASE_URL, MODEL_NAME, HF_TOKEN from env vars.
-4. inference.py must emit [START], [STEP], [END] log format exactly — no deviations.
-5. Hard grader must use temperature=0 and a fixed rubric prompt stored as a constant.
-6. DB must auto-populate on first Docker run without manual intervention.
-7. All Pydantic models must be fully typed — no `Any`, no `dict` without a model.
-8. Episode step limit is 10. Hard cap. Enforce in environment.py.
 ---
-## Before You Start Each File
-1. Use Context7 MCP to look up the relevant library documentation
-2. Check if the schema/interface you are about to implement has dependencies on already-built files — import them, don't reimplement
-3. If you need to make a design choice not covered in this prompt (e.g. exact DB column types, traversal tie-breaking, summary format), **ask the user before proceeding**
-4. Write tests alongside implementation — not after
 ---
-## Questions To Ask The User Before Starting
-If any of the following are unclear, ask before building:
-- What Python codebase should be used as the demo target? (default: the sample_codebase/ provided)
-- Should the hard grader use the same MODEL_NAME from env vars, or a fixed model?
-- Should REQUEST_CONTEXT return the full raw code or the full AST + raw code?
-- Should AMEND_REVIEW require the agent to specify what was wrong with the original review?
-- What is the maximum number of neighbors to include in an observation? (recommend: 5, confirm)

+# Builder Prompt — GraphReview RL Environment
+You are an expert Python engineer building a production-quality RL environment for a competitive hackathon (OpenEnv Round 1). You have one job: build the GraphReview environment correctly, phase by phase, without breaking prior work.
 ---
 ## What You Are Building
+An OpenEnv-compliant RL environment where an LLM agent reviews Python code with full dependency graph awareness. The environment parses a Python codebase into a persistent SQLite-backed dependency graph, pre-computes ground truth linter flags, and exposes a step()/reset()/state() API for an agent to interact with.
+This is online RL — no training dataset is needed. The ground truth (pylint/bandit/pyflakes results) is computed once at seed time and stored in SQLite. The agent explores the environment and receives rewards compared against that ground truth.
+The full phase plan and architecture are provided below. Read the entire plan before writing a single line of code.
 ---
+## Your Operating Rules
+1. **Before building each phase, read the full plan for that phase.** Do not start coding until you understand what the phase produces and what its success criteria are.
+2. **Ask me questions before starting if any of the following are unclear:**
+   - A design decision that affects DB schema or file structure
+   - Anything that would be hard to change later (interfaces, Pydantic models, DB tables)
+   - Ambiguity in how two components interact
+   Do NOT ask about low-level implementation details — choose the best approach yourself.
+3. **Use context7 MCP to look up documentation** for: openenv-core, SQLAlchemy, NetworkX, Pyvis, astroid, pylint API, FastAPI, Pydantic v2. Do not rely on memory for library APIs — always verify.
+4. **One phase at a time.** Complete a phase fully before moving to the next. Each phase has explicit success criteria — verify them before declaring a phase done.
+5. **Never break prior phases.** If a later phase requires changing an earlier interface, explicitly flag it, explain why, and get confirmation before making the change.
+6. **DB is the source of truth.** All state lives in SQLite. Nothing important lives only in memory. reset() clears only task-run annotations — never re-parses the codebase.
+7. **Token budget is a hard constraint.** No observation may exceed 2000 tokens. Enforce this in token_budget.py — do not leave it as a soft guideline.
+8. **Graders must be deterministic.** Easy and medium graders: zero LLM calls, same input always produces same output. Hard grader: temperature=0, document prompt hash. Test this explicitly.
+9. **inference.py log format is mandatory.** [START], [STEP], [END] format must be exact. Any deviation causes evaluation failure. Treat this as a contract.
+10. **Write clean, typed Python.** All functions typed. All Pydantic models complete. No `Any` types unless unavoidable with explanation.
 ---
+## Phase Plan
+[INSERT FULL PHASE PLAN HERE — paste the contents of the phase plan artifact]
 ---
+## Sample Project Specification
+The sample_project/ directory must contain exactly these files with these injected bugs:
+```
+auth.py          — validate_token() can return None (not handled)
+checkout.py      — calls auth.validate_token(), doesn't check for None
+cart.py          — style violations only (PEP8)
+config.py        — missing required key in get_config() (root cause of cascade)
+database.py      — SQL query built with string concatenation (SQL injection)
+utils.py         — unused imports, dead code
+models.py        — clean file (no issues, tests APPROVE path)
+payments.py      — depends on checkout.py, inherits None risk
+api.py           — depends on auth.py and checkout.py
+main.py          — entry point, light glue code
+```
+Task mapping:
+- easy_task: cart.py (style only)
+- medium_task: checkout.py + auth.py (null reference)
+- hard_task: config.py → auth.py → checkout.py (cascade)
 ---
+## Tech Stack
+- Python 3.11
+- SQLite via SQLAlchemy ORM
+- NetworkX + astroid + Python ast
+- pylint + bandit + pyflakes
+- Pyvis for visualization
+- Pydantic v2
+- FastAPI
+- OpenAI client (inference.py + hard grader judge)
+- openenv-core
+- context7 MCP for all library lookups
 ---
+## Start Instructions
+Begin with Phase 1. Before writing any code:
+1. Use context7 MCP to look up: openenv-core spec, SQLAlchemy ORM setup, astroid API
+2. Ask me any design questions that affect DB schema or file structure
+3. Confirm the sample_project file list with me if you want to adjust it
+4. Then build Phase 1 completely and verify all success criteria before stopping
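
Reviewer note on rule 7 of the new Builder.md ("No observation may exceed 2000 tokens"): the diff for `token_budget.py` is not shown in this part of the commit, so the following is only a minimal sketch of how a hard limit could be enforced. The `Observation` shape, the `enforce_budget` name, and the four-characters-per-token estimate are all assumptions made for illustration, not the project's actual code or tokenizer.

```python
from dataclasses import dataclass

MAX_OBSERVATION_TOKENS = 2000  # limit stated in rule 7 of Builder.md


def estimate_tokens(text: str) -> int:
    """Rough estimate of ~4 characters per token (assumption, not a real tokenizer)."""
    return (len(text) + 3) // 4


@dataclass
class Observation:
    """Hypothetical observation: full current module plus neighbor summaries."""
    current_code: str
    neighbor_summaries: list[str]

    def token_count(self) -> int:
        return estimate_tokens(self.current_code) + sum(
            estimate_tokens(s) for s in self.neighbor_summaries
        )


def enforce_budget(obs: Observation, limit: int = MAX_OBSERVATION_TOKENS) -> Observation:
    """Drop trailing neighbor summaries until the observation fits, else raise."""
    kept = list(obs.neighbor_summaries)  # assumed ordered most-relevant first
    while kept and Observation(obs.current_code, kept).token_count() > limit:
        kept.pop()
    trimmed = Observation(obs.current_code, kept)
    if trimmed.token_count() > limit:
        # Fail loudly rather than silently truncating the module under review.
        raise ValueError("current module alone exceeds the token budget")
    return trimmed
```

Raising when the current module alone is over budget keeps the limit a hard constraint; whether the real environment should chunk such a module instead (the commit adds a chunker) is a design decision this sketch does not settle.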

Debugger.md CHANGED

@@ -1,100 +1,88 @@
-# Debugger Prompt — CodeReviewEnv
-You are an expert Python debugger working on **CodeReviewEnv**, an OpenEnv-compliant RL environment for the OpenEnv Hackathon. Your job is to diagnose and fix issues without breaking the architecture.
 ---
-## Project Summary
-This is a reinforcement learning environment where an LLM agent reviews Python codebases using a persistent dependency graph. The graph is stored in SQLite via SQLModel. The RL loop uses OpenEnv's step()/reset()/state() spec. There are 3 tasks (easy/medium/hard) with deterministic graders. The inference script must run in under 20 minutes on 2 vCPU / 8GB RAM.
 ---
-## Architecture Rules — Never Violate These When Fixing
-1. **Persistence is SQLite/SQLModel** — do not switch to in-memory or another DB to fix a bug
-2. **Neighbor observations are always compressed summaries** — never fix a context issue by passing full neighbor code
-3. **Rewards must always be in 0.0–1.0** — if a reward bug exists, fix the computation, never remove the clip
-4. **inference.py uses OpenAI client only** — do not swap to direct HTTP calls or another client
-5. **[START]/[STEP]/[END] log format is fixed** — do not change field names or ordering to fix a logging bug
-6. **Hard grader uses temperature=0 and fixed rubric** — do not relax this to fix flaky test failures
-7. **episode step limit is 10** — do not raise this to fix timeout issues, optimize the agent instead
----
-## How To Approach Any Bug
-### Step 1 — Locate
-- Identify which layer the bug is in: parser → db → graph → observation_builder → environment → grader → server → inference
-- Do not assume the bug is where the error surfaces — trace back to root cause
-### Step 2 — Check Interfaces First
-- Before changing implementation, verify the interface contract between the broken component and its dependencies
-- Use Context7 MCP to re-check library APIs if the bug involves SQLModel, NetworkX, pylint, bandit, FastAPI, or OpenEnv
-- Do not fix a bug by changing a shared interface without checking all callers
-### Step 3 — Fix Minimally
-- Fix the smallest possible change that resolves the issue
-- If the fix requires changing a DB schema, check whether a migration is needed and write it
-- If the fix changes a Pydantic model, check all serialization/deserialization paths
-### Step 4 — Verify
-- After fixing, confirm the completion criteria for the relevant phase still pass
-- Run the specific test for the broken component
-- If inference.py is affected, do a dry run and confirm [START]/[STEP]/[END] logs emit correctly
----
-## Common Failure Modes To Check First
-### DB / Persistence
-- DB not found on startup → check migrations.py auto-init logic
-- Graph loads empty on second run → check upsert_node is committing correctly
-- Annotations not persisting across reset() → check reset() only clears annotations, not nodes/edges
-### Parser
-- AST parser crashes on type-annotated functions → check handling of ast.Constant vs ast.Str in Python 3.11
-- Linter returns no output → check pylint/bandit are installed in the Docker image and PATH is correct
-- Import resolution fails on relative imports → check the resolver handles both absolute and relative imports
-### RL Environment
-- Reward outside 0.0–1.0 → find the unclipped computation in reward.py
-- done never becomes True → check step limit counter and REQUEST_CHANGES/APPROVE handling
-- reset() returns wrong module → check task registry is loading the correct starting module
-### Graders
-- Easy grader always returns 0 → check linter_flags were populated in DB during parsing
-- Hard grader is non-deterministic → confirm temperature=0 and seed param is being passed
-- Grader crashes on empty annotation → add null check before scoring
-### Server
-- /health returns 404 → check route is registered in app.py
-- /step rejects valid action → check discriminated union deserialization in Pydantic v2
-- openenv validate fails → check openenv.yaml field names against spec exactly
-### Inference Script
-- Runs over 20 minutes → profile which task is slowest, reduce max steps or add timeout per episode
-- LLM returns unparseable action → check JSON mode is enabled, add fallback to APPROVE
-- Missing [STEP] logs → check log emit is inside the step loop, not outside
-### Docker
-- Build fails on pylint/bandit install → add gcc and build-essential to apt-get
-- DB not found inside container → check WORKDIR and DB path are consistent
-- Port not exposed → confirm EXPOSE 7860 and uvicorn binds to 0.0.0.0
----
-## When You Find An Ambiguity
-If fixing the bug requires a design decision (e.g. "should reset() preserve REQUEST_CONTEXT history?"), **ask the user before implementing**. Do not make silent architectural decisions while debugging.
 ---
-## Context To Always Include When Reporting A Fix
-After fixing, always report:
-- What the root cause was (one sentence)
-- Which file(s) were changed
-- Whether any DB schema changed (and if so, whether a migration was added)
-- Whether any Pydantic model interface changed (and if so, which callers were updated)
-- The specific test or check that now passes

+# Debugger Prompt — GraphReview RL Environment
+You are an expert Python debugger working on a competitive hackathon RL environment called GraphReview. Your job is to diagnose and fix bugs without breaking existing working functionality.
 ---
+## Project Context
+GraphReview is an OpenEnv-compliant RL environment. It:
+- Parses Python codebases into a SQLite-backed NetworkX dependency graph
+- Pre-computes linter ground truth (pylint/bandit/pyflakes) at seed time
+- Exposes step()/reset()/state() for an LLM agent to review code
+- Scores agent actions against stored ground truth via deterministic graders
+- Outputs an annotated graph visualization via Pyvis
+The DB is the source of truth. Pydantic v2 models define all interfaces. FastAPI wraps the environment for HTTP. inference.py runs the baseline agent.
 ---
+## Your Operating Rules
+1. **Diagnose before fixing.** State exactly what is wrong and why before writing any fix. One sentence minimum: "The bug is X because Y."
+2. **Minimal surface area.** Fix only what is broken. Do not refactor, rename, or improve unrelated code while fixing a bug.
+3. **Check DB integrity first** for any bug involving missing data, wrong rewards, or incorrect state. Run: `SELECT * FROM seed_meta` to verify seeded flag. Check `modules`, `edges`, `linter_flags` are populated before assuming code is wrong.
+4. **Use context7 MCP** to verify library APIs before assuming a bug is in your code. Many bugs come from incorrect assumptions about SQLAlchemy session handling, Pydantic v2 validation, or NetworkX graph methods.
+5. **Never re-seed unless explicitly told to.** Re-seeding takes 30s and loses demo state. If a bug looks like a seeding issue, verify first.
+6. **Grader determinism is sacred.** If a grader produces different results across runs, that is a critical bug — fix it before anything else. Check: temperature settings, prompt variability, random seeds.
+7. **Do not change Pydantic model field names or types** without explicitly flagging it. These are shared interfaces — changing them breaks step()/reset()/state() and inference.py simultaneously.
+8. **inference.py log format is a contract.** [START]/[STEP]/[END] field names and order must never change. If a bug is in inference.py, fix the logic without changing the log format.
+9. **After fixing, state what you changed and why**, and identify any other components that might be affected by the change.
+10. **If the bug requires a design change** (not just a code fix), say so clearly. Do not silently implement a design change as if it were a bug fix.
+---
+## Common Bug Patterns in This Project
+**DB not seeded / partial seed**
+- Symptom: KeyError on module_id, empty linter_flags, missing edges
+- Check: seed_meta table for seeded=true, verify row counts in modules and edges
+**Pydantic v2 validation errors**
+- Symptom: ValidationError on step() or reset()
+- Check: field types match exactly, Optional fields have defaults, JSON fields are dicts not strings
+**NetworkX graph not reconstructed from DB**
+- Symptom: graph_manager returns empty neighbors, traversal order is wrong
+- Check: edges table has rows, graph_manager.load_graph() is called before queries
+**Grader returning out-of-range reward**
+- Symptom: reward > 1.0 or < -1.0
+- Check: reward aggregation logic, episode completion bonus not double-applied
+**Token budget exceeded**
+- Symptom: LLM returns truncated or incoherent response
+- Check: token_budget.py is being called, observation summaries not using raw code
+**Hard grader non-determinism**
+- Symptom: different scores for identical inputs
+- Check: temperature=0 set on judge API call, system prompt is static string not f-string with variables
+**inference.py timeout (>20 min)**
+- Symptom: evaluation fails on judge's machine
+- Check: REQUEST_CONTEXT actions in inference loop causing extra API calls, batching strategy
+**reset() clearing too much**
+- Symptom: graph annotations from prior tasks lost after reset
+- Check: reset() filters by task_id when deleting review_annotations, not deleting all rows
 ---
+## How to Use This Prompt
+Paste this prompt, then describe:
+1. What you were trying to do
+2. What happened instead (error message, wrong output, wrong reward value)
+3. Which phase/file the bug is in
+4. What you already tried
+Then share the relevant code. I will diagnose and fix it.
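
Reviewer note on the "reset() clearing too much" pattern in the new Debugger.md: the check it describes (filter by `task_id` when deleting from `review_annotations`) can be illustrated with the standard-library `sqlite3` module. The table and column `task_id` are named in the prompt; the other columns, the `reset_task` helper, and the sample rows are hypothetical, and the project itself is described as using SQLAlchemy rather than raw `sqlite3`.

```python
import sqlite3


def reset_task(conn: sqlite3.Connection, task_id: str) -> int:
    """Delete only the annotations for one task; return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM review_annotations WHERE task_id = ?", (task_id,)
    )
    conn.commit()
    return cur.rowcount


def demo() -> list[tuple[str, str]]:
    """Seed two tasks' annotations, reset one, and return what survives."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: only task_id is taken from the prompt text.
    conn.execute(
        "CREATE TABLE review_annotations (task_id TEXT, module_id TEXT, note TEXT)"
    )
    conn.executemany(
        "INSERT INTO review_annotations VALUES (?, ?, ?)",
        [
            ("easy_task", "cart.py", "style"),
            ("medium_task", "checkout.py", "None not handled"),
        ],
    )
    reset_task(conn, "easy_task")
    return conn.execute(
        "SELECT task_id, module_id FROM review_annotations"
    ).fetchall()
```

Calling `demo()` leaves the `medium_task` row in place, which is the behavior the checklist item asks the debugger to confirm.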

OpenEnv ADDED

@@ -0,0 +1 @@
+Subproject commit c719decf2b19175d5ca35301d58a14c83e985480

Phases.md CHANGED

@@ -1,295 +1,433 @@
-# CodeReviewEnv — Phased Build Plan
-## For: LLM-Assisted Development
 ---
-## 🧠 What You Are Building
-An OpenEnv-compliant reinforcement learning environment where an LLM agent learns to perform **dependency-aware code review**.
-The environment parses a Python codebase into a **persistent dependency graph** (nodes = modules, edges = import relationships). Each node stores compressed AST summaries, linter-generated ground truth issues, and agent-written review annotations.
-The agent reviews one module per episode. It receives the **full code of the current module** plus **compressed AST summaries of its neighbors** (never full neighbor code — token budget). It takes multi-step actions (flag bugs, add comments, request context, amend upstream reviews). The environment rewards correct, well-attributed findings and penalizes false positives.
-The final output is an **annotated dependency graph** — a machine-readable + human-readable map of the entire codebase with reviews on every module, including cross-module causal attributions.
-This is differentiated from tools like CodeRabbit because:
-- It models cascading dependency bugs (bug in B caused by design in A)
-- Reviews are stored back into the graph and can be amended as agent learns more
-- It is an RL training/evaluation environment, not a static analysis tool
-- The agent learns a policy over multi-step decisions, not a single LLM call
 ---
-## 🗂️ Persistence Strategy
-**Use SQLite via SQLModel** for all persistent state. Do NOT reparse the codebase on every run. The database stores:
-- Parsed module nodes (code, AST summary, linter flags)
-- Graph edges (dependency relationships + reasons)
-- Review annotations (written by agent, updatable)
-- Episode history (for reproducibility)
-- Task definitions and ground truth
-On startup: check if DB exists → if yes, load graph from DB → if no, parse codebase and populate DB.
-This makes demos fast (parse once, review many times) and makes `reset()` cheap (clear annotations only, keep graph structure).
 ---
-## 📁 Target Project Structure
 ```
-code-review-env/
-├── openenv.yaml
-├── Dockerfile
-├── README.md
-├── inference.py               # Required by spec, root level
-├── requirements.txt
-├── pyproject.toml
-│
-├── env/
-│   ├── __init__.py
-│   ├── environment.py         # Main CodeReviewEnv class
-│   ├── models.py              # Pydantic: Observation, Action, Reward, GraphState
-│   ├── graph.py               # Graph construction, traversal, compression
-│   ├── observation_builder.py # Assembles tiered observation per step
-│   └── reward.py              # Reward computation logic
-│
-├── db/
-│   ├── __init__.py
-│   ├── schema.py              # SQLModel table definitions
-│   ├── store.py               # DB read/write operations
-│   └── migrations.py          # Init and seed scripts
-│
-├── parser/
-│   ├── __init__.py
-│   ├── ast_parser.py          # AST extraction: signatures, imports, classes
-│   ├── linter.py              # Pylint + Bandit runner, stores results to DB
-│   └── summarizer.py          # Converts AST output → compressed node summary
-│
-├── graders/
-│   ├── __init__.py
-│   ├── base_grader.py         # Abstract grader interface
-│   ├── easy_grader.py         # Linter match — fully deterministic
-│   ├── medium_grader.py       # AST + line attribution match
-│   └── hard_grader.py         # LLM-as-judge, temp=0, seed=42, rubric-constrained
-│
-├── tasks/
-│   ├── __init__.py
-│   ├── task_registry.py       # Registers and loads tasks
-│   ├── easy_task.py           # Style/linter issue in isolated module
-│   ├── medium_task.py         # Logic bug with direct dependency context
-│   └── hard_task.py           # Cascading bug across 2+ modules
-│
-├── server/
-│   ├── __init__.py
-│   └── app.py                 # FastAPI server exposing OpenEnv HTTP endpoints
-│
-├── sample_codebase/           # Synthetic test codebase for demo
 │   ├── auth.py
 │   ├── checkout.py
 │   ├── cart.py
-│   ├── payments.py
-│   └── config.py
-│
-└── tests/
-    ├── test_parser.py
-    ├── test_graders.py
-    ├── test_environment.py
-    └── test_inference.py
 ```
 ---
-## 📐 Core Data Models (Design Intent — Implementation Is Your Choice)
-### Graph Node
-Stores everything about one module. Persisted in DB.
-- module_id (filename/path)
-- raw_code (full source)
-- ast_summary (compressed: signatures, classes, exports)
-- linter_flags (pre-computed ground truth from pylint/bandit)
-- dependency_reason (why this module needs its neighbors — extracted from import context)
-- review_annotation (agent-written, nullable, updatable)
-- review_status (pending | in_progress | reviewed)
-- review_summary (one-line, written at episode end)
-### Graph Edge
-- source_module_id
-- target_module_id
-- edge_type (explicit_import | implicit_name_resolution)
-- import_line (the actual import statement)
-- weight (1.0 explicit, 0.5 implicit)
-### Observation (Pydantic)
-- current_module: full code + full AST summary
-- direct_dependencies: list of compressed node summaries (NOT full code)
-- dependents: list of compressed node summaries
-- existing_reviews: list of one-line review summaries from already-reviewed neighbors
-- constraint_flags: any known forced decisions from upstream
-- step_number: int
-- episode_id: str
-### Action (Pydantic, discriminated union)
-- APPROVE
-- FLAG_STYLE(line: int, description: str)
-- FLAG_BUG(line: int, description: str)
-- FLAG_SECURITY(line: int, description: str)
-- FLAG_DEPENDENCY_ISSUE(source_module: str, description: str)
-- ADD_COMMENT(text: str)
-- REQUEST_CHANGES(summary: str)
-- REQUEST_CONTEXT(module_id: str)  ← costs -0.1 reward, returns full code of neighbor
-- AMEND_REVIEW(module_id: str, note: str)  ← retroactively updates neighbor annotation
-### Reward (Pydantic)
-- value: float (0.0–1.0)
-- reason: str
-- cumulative: float
 ---
-## 🏗️ PHASE 1 — Foundation & Persistence
-**Goal: Database schema, parser, graph construction. No RL yet.**
-### Tasks
-1. Define SQLModel schema for all tables (nodes, edges, annotations, episodes, tasks)
-2. Build `ast_parser.py` — extract from any .py file: all function signatures with type hints, all class definitions, all import statements with source resolution, all module-level constants
-3. Build `linter.py` — run pylint and bandit programmatically on a file, parse output into structured list of {line, severity, code, message}. Store results directly to DB as ground truth.
-4. Build `summarizer.py` — convert AST output into a compressed summary string under 100 tokens. Format: "exports: [fn(args)->return, ...] | issues: N | depends_on: [module, ...]"
-5. Build `store.py` — CRUD operations for all tables. Key operations: upsert_node, upsert_edge, get_node_with_neighbors, update_annotation, get_full_graph
-6. Build `graph.py` — on first run: parse all files in target directory → populate DB. On subsequent runs: load from DB. Build NetworkX DiGraph from DB records. Implement traversal order: topological sort weighted by betweenness centrality (leaf modules first, high-centrality modules last).
-7. Build `sample_codebase/` — 5 Python files with known injected issues: one style issue, one logic bug with a direct dependency cause, one security issue, one cascading bug where the root cause is 2 hops away. Document every injected issue in a ground_truth.json file.
-### Completion Criteria
-- `python -m parser.ast_parser sample_codebase/` populates DB with all nodes and edges
-- DB persists across runs (second run loads from DB, does not reparse)
-- `python -m db.store` can query a node and return its summary and neighbors
-- ground_truth.json matches linter output for easy/medium tasks
 ---
-## 🏗️ PHASE 2 — OpenEnv Core (RL Environment)
-**Goal: Full step()/reset()/state() loop with reward. This is the RL part.**
-### Tasks
-1. Build `models.py` — all Pydantic models: Observation, Action (discriminated union), Reward, GraphState, EpisodeRecord. Must be fully typed.
-2. Build `observation_builder.py` — given a module_id and current graph state, assemble the tiered observation: full code for current module, compressed summaries for neighbors (pulled from DB), existing review annotations for already-reviewed neighbors, constraint flags
-3. Build `reward.py` — implement reward logic:
-   - Easy: compare agent flags against linter ground truth. Correct flag = +0.5, false positive = -0.2, missed critical = -0.4
-   - Medium: check flag + line number within ±3 lines of ground truth = +0.5, correct comment attribution = +0.3
-   - Hard: call hard_grader with agent's FLAG_DEPENDENCY_ISSUE and the known root cause. Score returned by judge × 0.8 as reward.
-   - REQUEST_CONTEXT action always costs -0.1 (thinking cost)
-   - AMEND_REVIEW with correct attribution = +0.4 (high reward — this is the key cascading behavior)
-   - Episode completion bonus: +0.2 if all critical issues found, -0.1 if APPROVE on module with known critical bugs
-4. Build `graders/` — implement all three graders per spec above. Hard grader must use OpenAI client (per competition spec), temperature=0, fixed rubric prompt stored as a constant.
-5. Build `environment.py` — main class implementing full OpenEnv interface:
-   - `reset(task_id)` → clears annotations for task modules, returns first observation
-   - `step(action)` → validates action, updates graph annotations in DB, computes reward, returns (obs, reward, done, info)
-   - `state()` → returns full GraphState (serialized NetworkX graph + all annotations)
-   - Episode ends when: agent calls APPROVE or REQUEST_CHANGES, OR step limit reached (max 10 steps)
-6. Build `tasks/` — register 3 tasks pointing to specific modules in sample_codebase with known ground truth issues
-### Completion Criteria
-- `env.reset("easy_task")` returns a valid typed Observation
-- `env.step(FLAG_BUG(line=12, description="null risk"))` returns reward > 0 for correct flag
-- `env.state()` returns serializable graph with annotations
-- Full episode runs without error on all 3 tasks
-- Reward values all fall in 0.0–1.0 range
 ---
-## 🏗️ PHASE 3 — HTTP Server & OpenEnv Spec Compliance
-**Goal: Wrap environment in FastAPI, pass openenv validate.**
-### Tasks
-1. Build `server/app.py` — FastAPI app exposing:
-   - POST /reset → calls env.reset(), returns Observation JSON
-   - POST /step → calls env.step(action), returns (obs, reward, done, info) JSON
-   - GET /state → calls env.state(), returns GraphState JSON
-   - GET /health → returns 200 (required for HF Space ping)
-2. Build `openenv.yaml` — fill all required metadata: name, version, description, tasks list, observation_space, action_space, reward_range
-3. Run `openenv validate` — fix all compliance errors
-4. Confirm all Pydantic models serialize/deserialize correctly over HTTP
-### Completion Criteria
-- `openenv validate` passes with no errors
-- All endpoints return correct typed responses
-- GET /health returns 200
 ---
-## 🏗️ PHASE 4 — Inference Script
-**Goal: Build inference.py that runs Gemma 4 as the agent. This is what judges auto-run.**
-### Critical Requirements (Non-Negotiable)
-- File must be named `inference.py` at root
-- Use OpenAI client for all LLM calls
-- Read API_BASE_URL, MODEL_NAME, HF_TOKEN from environment variables
-- Emit structured stdout logs in EXACTLY this format:
 ```
-[START] task=<task_id> episode=<n>
-[STEP] step=<n> action=<action_type> reward=<float> cumulative=<float>
-[END] task=<task_id> total_reward=<float> steps=<n>
 ```
-- Must complete all 3 tasks in under 20 minutes total
-- Must run on 2 vCPU / 8GB RAM
-### Tasks
-1. Build the agent loop — for each task: reset env, loop step() until done, collect rewards
-2. Build the LLM action parser — send observation to model with a structured prompt, parse response into typed Action. Use JSON mode or structured output. Handle parse failures gracefully (default to APPROVE with penalty).
-3. Build the action prompt — system prompt explaining the environment, action space, and output format. Include the compressed observation in user message. Tell model to output JSON action only.
-4. Implement all 3 task runs sequentially
-5. Emit all required log lines to stdout
-6. Final output: baseline scores for all 3 tasks printed to stdout
-### Completion Criteria
-- Script runs end to end without error
-- All [START]/[STEP]/[END] logs emitted correctly
-- Produces a score for each task between 0.0–1.0
-- Completes in under 20 minutes
 ---
-## 🏗️ PHASE 5 — Containerization & Deployment
-**Goal: Docker build works, HF Space deploys, pre-validation script passes.**
-### Tasks
-1. Write `Dockerfile`:
-   - Base: python:3.11-slim
-   - Install system deps for pylint, bandit, networkx
-   - Copy project, install requirements
-   - On container start: run parser to populate DB if not exists, then start FastAPI server
-   - Expose port 7860 (HF Spaces default)
-2. Write `README.md` with all required sections: environment description and motivation, observation and action space definitions, all 3 task descriptions with difficulty, setup instructions, baseline scores
-3. Run pre-submission validation script — fix all failures
-4. Deploy to HF Space with `openenv push`
-5. Confirm Space URL returns 200 on GET /health and responds to POST /reset
-### Completion Criteria
-- `docker build .` succeeds
-- `docker run -p 7860:7860` starts server cleanly
-- HF Space URL responds to reset()
-- Pre-validation script passes all checks
 ---
-## ⏱️ Suggested Time Allocation (Given ~36hrs remaining)
-| Phase | Time |
-|---|---|
-| Phase 1 — Foundation | 6 hrs |
-| Phase 2 — RL Environment | 8 hrs |
-| Phase 3 — Server + Spec | 3 hrs |
-| Phase 4 — Inference Script | 4 hrs |
-| Phase 5 — Docker + Deploy | 3 hrs |
-| Buffer / debugging | 4 hrs |
 ---
-## ⚠️ Known Risk Areas (Watch These)
-1. **Hard grader reproducibility** — document judge prompt and seed explicitly
-2. **DB migration on fresh Docker build** — first run must auto-populate DB from sample_codebase
-3. **Inference script runtime** — test full 3-task run locally before submitting, must be under 20 min
-4. **openenv validate strictness** — run it early in Phase 3, not at the end
-5. **Reward always in 0.0–1.0** — clip all reward values, graders must never return outside range

+# GraphReview RL Environment — Complete Phased Build Plan v2
 ---
+## What You Are Building
+An OpenEnv-compliant RL environment where an LLM agent learns to review Python code with full dependency graph awareness. The environment:
+1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite
+2. Splits large files (>300 lines) into sub-nodes by class/function to keep observations manageable
+3. Pre-computes ground truth linter flags (pylint + bandit + pyflakes) per node at seed time
+4. Presents the agent with one module at a time + compressed AST summaries of neighbors
+5. Receives structured actions (FLAG_BUG, ADD_COMMENT, REQUEST_CONTEXT, etc.)
+6. Scores actions against pre-computed ground truth — no training data needed, ground truth IS the data
+7. Accumulates review annotations back onto graph nodes in SQLite
+8. Outputs an annotated dependency graph visualized via Pyvis (interactive HTML) + markdown report
+**The RL loop:** Agent takes multi-step actions per module episode, receives per-step rewards, learns to reason about cascading dependency issues. This is online RL — the environment generates interaction data live. No pre-existing dataset required.
+**The key differentiator vs CodeRabbit:** Agent sees WHY a decision was made (upstream context) before flagging it. Reviews are stored back into the graph. Agent can AMEND earlier reviews as it learns more about root causes downstream.
 ---
+## Why No Training Data Is Needed
+This is online RL, not offline supervised learning:
+- Ground truth = pylint/bandit/pyflakes results, computed once at seed time, stored in DB
+- Agent explores environment → receives rewards → that interaction IS the training signal
+- For Round 1, the baseline inference script evaluates a pre-trained LLM (Gemma 4 E4B) acting as agent
+- You are not training a model — you are building the environment that COULD train one
+- The three graders define what "correct behavior" looks like — that is your data
+---
+## Tech Stack (Fixed)
+- Python 3.11
+- OpenEnv: step() / reset() / state() + Pydantic typed models + openenv.yaml
+- SQLite via SQLAlchemy ORM (persistent, file-based, ships in Docker)
+- NetworkX for graph operations and traversal
+- Python built-in `ast` module for structure extraction
+- `astroid` for scope-aware name resolution and intra-file conflict detection
+- pylint + bandit + pyflakes for ground truth generation (run once at seed time)
+- Pyvis for interactive graph visualization
+- OpenAI client (inference.py + hard task LLM judge)
+- Gemma 4 E4B as baseline agent model
+- FastAPI for HTTP server (required for HF Spaces)
+- Docker + Hugging Face Spaces
+- context7 MCP for library documentation during build
 ---
+## File Structure
 ```
+graphreview/
+├── sample_project/          # synthetic input codebase with injected bugs
 │   ├── auth.py
 │   ├── checkout.py
 │   ├── cart.py
+│   ├── database.py
+│   └── ...
+├── parser/
+│   ├── ast_parser.py        # extract signatures, imports, classes per file
+│   ├── chunker.py           # split files >300 lines into sub-nodes
+│   ├── graph_builder.py     # build NetworkX DiGraph from parsed output
+│   └── summarizer.py        # compress each node to ~50 token summary
+├── db/
+│   ├── database.py          # SQLAlchemy engine, session factory
+│   ├── models.py            # ORM models for all tables
+│   └── seed.py              # parse once → store → skip if seeded
+├── graph/
+│   ├── graph_manager.py     # load graph from DB, traversal, neighbor queries
+│   └── token_budget.py      # enforce token limits on observations
+├── env/
+│   ├── environment.py       # CodeReviewEnv main class
+│   ├── observation.py       # Pydantic: CodeObservation
+│   ├── action.py            # Pydantic: ReviewAction
+│   ├── reward.py            # Pydantic: ReviewReward + reward table
+│   └── state.py             # Pydantic: GraphState
+├── graders/
+│   ├── base_grader.py       # abstract interface
+│   ├── easy_grader.py       # linter match (deterministic)
+│   ├── medium_grader.py     # AST + line attribution (deterministic)
+│   └── hard_grader.py       # graph consistency + LLM judge (temperature=0)
+├── tasks/
+│   ├── task_registry.py     # register 3 tasks
+│   ├── easy_task.py         # style/linter review
+│   ├── medium_task.py       # logic bug + direct dep context
+│   └── hard_task.py         # cascading bug across 2+ module hops
+├── visualizer/
+│   ├── pyvis_renderer.py    # NetworkX → interactive HTML graph
+│   └── report_generator.py  # markdown + JSON final report
+├── server.py                # FastAPI wrapper for OpenEnv HTTP spec
+├── inference.py             # baseline agent script (mandatory, root level)
+├── openenv.yaml             # spec metadata
+├── Dockerfile
+└── README.md
 ```
 ---
+## Database Schema (SQLite — Persistent)
+**modules**
+```
+id                TEXT PK      (relative file path, or "file.py::ClassName" for sub-nodes)
+name              TEXT
+code              TEXT         (full source — full file or chunked section)
+ast_summary       JSON         (signatures, classes, return types, decorators)
+linter_flags      JSON         (pre-computed pylint+bandit+pyflakes — GROUND TRUTH)
+summary           TEXT         (~50 token natural language description)
+parent_module_id  TEXT NULL    (set if this is a sub-node chunk of a larger file)
+review_status     TEXT         (pending | in_progress | reviewed)
+is_chunk          BOOLEAN
+```
+**edges**
+```
+source_id         TEXT FK → modules.id
+target_id         TEXT FK → modules.id
+edge_type         TEXT         (explicit_import | implicit_dependency | intra_file | circular)
+import_line       TEXT
+dependency_reason TEXT
+scope             TEXT         (module_level | function_level)
+weight            FLOAT        (1.0 explicit, 0.5 implicit)
+```
+**review_annotations**
+```
+id                INTEGER PK AUTOINCREMENT
+module_id         TEXT FK → modules.id
+task_id           TEXT
+action_type       TEXT
+content           TEXT
+reward_given      FLOAT
+attributed_to     TEXT NULL    (module_id for cascade attribution)
+is_amendment      BOOLEAN      (true if this amends a prior review)
+created_at        TIMESTAMP
+```
+**task_runs**
+```
+id                INTEGER PK AUTOINCREMENT
+task_id           TEXT
+started_at        TIMESTAMP
+completed_at      TIMESTAMP NULL
+total_reward      FLOAT
+total_steps       INTEGER
+status            TEXT         (running | complete | failed)
+```
+**seed_meta**
+```
+key               TEXT PK
+value             TEXT
+```
+(stores seeded=true flag, seed timestamp, codebase hash)
 ---
+## Chunking Strategy for Large Files
+```
+File ≤ 300 lines  → one node, id = "filename.py"
+File > 300 lines  → chunk by top-level class or function
+  Each chunk becomes a sub-node:
+  id = "filename.py::ClassName" or "filename.py::function_name"
+  parent_module_id = "filename.py"
+  A virtual parent node is kept for the file itself
+  with no code but with all inter-file edges
+  Intra-file edges added between chunks:
+  if function_a calls function_b in same file →
+  edge(filename.py::function_a → filename.py::function_b, type=intra_file)
+Dependency conflict detection (via astroid):
+  If import is used only inside one function → scope=function_level, weight=0.5
+  If import used at module level → scope=module_level, weight=1.0
+  Circular imports → flagged as edge with type=circular, added to linter_flags
+```
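The chunking rule above can be sketched with the standard-library `ast` module. This is a minimal illustration, not the contents of `parser/chunker.py`: the function name `chunk_source` and the dictionary shape are assumptions made for the example, and intra-file edges are left out.

```python
import ast

MAX_LINES = 300  # threshold taken from the chunking strategy above


def chunk_source(filename: str, source: str, max_lines: int = MAX_LINES) -> list[dict]:
    """Return node dicts for one file: a single node, or a virtual parent plus chunks.

    Hypothetical helper for illustration; the real chunker may differ.
    """
    if len(source.splitlines()) <= max_lines:
        return [{"id": filename, "parent_module_id": None, "is_chunk": False, "code": source}]

    # Virtual parent keeps the file-level id but carries no code.
    nodes = [{"id": filename, "parent_module_id": None, "is_chunk": False, "code": ""}]
    tree = ast.parse(source)
    for stmt in tree.body:
        # Only top-level classes and functions become sub-nodes.
        if isinstance(stmt, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
            nodes.append(
                {
                    "id": f"{filename}::{stmt.name}",
                    "parent_module_id": filename,
                    "is_chunk": True,
                    "code": ast.get_source_segment(source, stmt) or "",
                }
            )
    return nodes
```

Note that `ast.get_source_segment` does not include decorator lines, so a production chunker would probably slice by line numbers instead.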
+---
+## Observation Token Budget
+```
+Current module full code:        ~800 tokens  (hard cap, truncate with notice)
+AST summary of current:          ~100 tokens
+Direct dependency summaries:     ~50 tokens × up to 5 deps = 250 tokens
+Dependent summaries:             ~50 tokens × up to 3 = 150 tokens
+Existing neighbor reviews:       ~30 tokens × up to 4 = 120 tokens
+Task description + action space: ~200 tokens
+Buffer:                          ~280 tokens
+─────────────────────────────────────────────
+Total:                           ~1900 tokens (well within E4B 128K window)
+```
+If a module has >5 direct dependencies, rank by betweenness centrality and include top 5 only.
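A rough sketch of how the budget table and the top-5 rule could be enforced. Everything here is an assumption for illustration: tokens are approximated by whitespace splitting (a real implementation would use the model's tokenizer), the names `CAPS`, `truncate_to_budget` and `top_dependencies` are hypothetical, and the centrality scores are assumed to be precomputed, for example with `networkx.betweenness_centrality`.

```python
# Per-component caps copied from the budget table above (in tokens).
CAPS = {
    "current_code": 800,
    "ast_summary": 100,
    "dependency_summaries": 250,
    "dependent_summaries": 150,
    "neighbor_reviews": 120,
    "task_description": 200,
}
BUFFER = 280
TRUNCATION_NOTICE = "# [truncated]"


def estimate_tokens(text: str) -> int:
    """Crude token estimate: whitespace-separated words (assumption, not a real tokenizer)."""
    return len(text.split())


def truncate_to_budget(text: str, cap: int) -> str:
    """Keep whole lines until the cap is reached, then append a truncation notice."""
    if estimate_tokens(text) <= cap:
        return text
    kept: list[str] = []
    used = estimate_tokens(TRUNCATION_NOTICE)  # reserve room for the notice itself
    for line in text.splitlines():
        cost = estimate_tokens(line)
        if used + cost > cap:
            break
        kept.append(line)
        used += cost
    kept.append(TRUNCATION_NOTICE)
    return "\n".join(kept)


def top_dependencies(centrality: dict[str, float], limit: int = 5) -> list[str]:
    """Pick the highest-centrality dependencies; ties break on module id for determinism."""
    return sorted(centrality, key=lambda m: (-centrality[m], m))[:limit]
```

Truncating on line boundaries keeps the remaining code readable, at the cost of sometimes using slightly less than the full cap.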
+---
+## Action Space
+```
+action_type options:
+  FLAG_STYLE              # style/formatting issue
+  FLAG_BUG                # logic error
+  FLAG_SECURITY           # security vulnerability
+  FLAG_DEPENDENCY_ISSUE   # issue caused by upstream module
+  ADD_COMMENT             # explanation (requires content field)
+  REQUEST_CONTEXT         # fetch full code of a neighbor (-0.1 reward cost)
+  REQUEST_CHANGES         # end episode, verdict = changes needed
+  APPROVE                 # end episode, verdict = approved
+  AMEND_REVIEW            # update a prior annotation on a neighbor node
+Fields:
+  action_type:     required
+  target_line:     optional int
+  content:         required for ADD_COMMENT, AMEND_REVIEW
+  attributed_to:   optional module_id (for FLAG_DEPENDENCY_ISSUE, AMEND_REVIEW)
+  context_request: required for REQUEST_CONTEXT (module_id to fetch)
+```
+---
+## Reward Table
+```
+Correct FLAG_* matching linter ground truth:          +0.5
+Accurate ADD_COMMENT (keyword match to linter desc):  +0.3
+FLAG_DEPENDENCY_ISSUE with correct attribution:       +0.6
+FLAG_DEPENDENCY_ISSUE wrong attribution:              +0.1
+AMEND_REVIEW correctly updating prior annotation:     +0.4
+REQUEST_CONTEXT (investigation cost):                 -0.1
+False positive flag (no linter match):                -0.2
+APPROVE on module with unflagged critical issues:     -1.0
+REQUEST_CHANGES on clean module:                      -0.3
+Episode completion bonus (all issues caught):         +0.2
+```
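One way to encode the table is a lookup for per-step outcomes plus a small function for the episode-ending actions. This is a sketch under assumptions: the outcome keys and the function `terminal_reward` are hypothetical names, and the plan does not say how the terminal penalties combine with the completion bonus, so the precedence below is a guess.

```python
# Per-step rewards keyed by outcome name (key names are illustrative assumptions).
STEP_REWARDS = {
    "correct_flag": 0.5,
    "accurate_comment": 0.3,
    "dependency_issue_correct_attribution": 0.6,
    "dependency_issue_wrong_attribution": 0.1,
    "correct_amendment": 0.4,
    "request_context": -0.1,
    "false_positive_flag": -0.2,
}


def terminal_reward(
    action_type: str,
    total_issues: int,
    unflagged_critical: int,
    all_caught: bool,
) -> float:
    """Score APPROVE / REQUEST_CHANGES at the end of an episode.

    Assumed precedence: penalties win over the completion bonus.
    """
    if action_type == "APPROVE" and unflagged_critical > 0:
        return -1.0  # approved a module that still has unflagged critical issues
    if action_type == "REQUEST_CHANGES" and total_issues == 0:
        return -0.3  # requested changes on a clean module
    return 0.2 if all_caught else 0.0  # completion bonus
```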
+---
+## Grader Architecture
+### Easy Grader (fully deterministic)
+- Load linter_flags JSON from DB for current module
+- For each agent FLAG_* action: check if a matching linter flag exists (type + line ±3)
+- Score per action, aggregate for episode
+- No LLM call. Zero variance.
+### Medium Grader (fully deterministic)
+- Easy grader logic PLUS:
+- For ADD_COMMENT: extract keywords from linter flag description, check overlap with agent comment (Jaccard similarity > 0.3 = match)
+- For line attribution: ±3 line tolerance
+- Still no LLM call.
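The two deterministic checks described for the easy and medium graders (type plus line within ±3, and keyword Jaccard similarity above 0.3) might look like the following. The linter-flag dictionary shape (`type`, `line`) and the keyword extraction by regex are assumptions for this sketch; the plan does not specify either.

```python
import re

LINE_TOLERANCE = 3       # ±3 lines, from the grader description
JACCARD_THRESHOLD = 0.3  # strict "greater than" comparison, as stated above


def flag_matches(agent_type: str, agent_line: int, linter_flags: list[dict]) -> bool:
    """True if any ground-truth flag has the same type and a line within tolerance."""
    return any(
        flag["type"] == agent_type and abs(flag["line"] - agent_line) <= LINE_TOLERANCE
        for flag in linter_flags
    )


def keywords(text: str) -> set[str]:
    """Lowercased word set; a real grader might also drop stop words."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two keyword sets (0.0 when both are empty)."""
    ka, kb = keywords(a), keywords(b)
    if not ka and not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def comment_matches(comment: str, linter_description: str) -> bool:
    return jaccard(comment, linter_description) > JACCARD_THRESHOLD
```

Both functions are pure and make no network calls, which is what keeps the easy and medium graders at zero variance.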
+### Hard Grader (quasi-deterministic)
+- Graph consistency check (deterministic):
+  If FLAG_DEPENDENCY_ISSUE with attributed_to=X: verify edge(current → X) or edge(X → current) exists in graph
+  If no edge: reward = 0.0, feedback = "no dependency relationship found"
+- LLM-as-judge (temperature=0, fixed rubric):
+  Separate API call to judge model (NOT the agent)
+  Fixed system prompt with scoring rubric
+  Scores cascade reasoning quality: 0.0 | 0.5 | 1.0
+  Document prompt hash in README for reproducibility
 ---
+## Three Tasks
+### Task 1: style_review (Easy)
+- Input: single module with 3 pylint style violations
+- Agent must: flag all 3 style issues
+- No dependency context needed
+- Grader: easy_grader only
+- Expected baseline score: 0.7–0.9
+### Task 2: logic_review (Medium)
+- Input: checkout.py with a null-reference bug
+- auth.py (its dependency) has validate_token that can return None
+- Agent must: flag the bug + add comment referencing the None return risk
+- Grader: medium_grader
+- Expected baseline score: 0.4–0.7
+### Task 3: cascade_review (Hard)
+- Input: 3-module chain: config.py → auth.py → checkout.py
+- Bug originates in config.py (missing key), propagates through auth.py, surfaces in checkout.py
+- Agent must: flag issue in checkout.py AND attribute root cause to config.py
+- Grader: hard_grader (graph consistency + LLM judge)
+- Expected baseline score: 0.2–0.5
 ---
+## Visualization
+### Pyvis Interactive Graph (primary)
+- Nodes colored by review_status: grey=pending, yellow=in_progress, green=approved, red=changes_requested
+- Node size = number of dependents (centrality)
+- Edge color: blue=explicit_import, orange=implicit, red=circular
+- Edge thickness = weight (1.0 explicit, 0.5 implicit)
+- Click node → shows review_annotations panel
+- Rendered as standalone HTML, embedded in HF Space
+### Final Report Output (end of all episodes)
+- `graphreview_report.md`: per-module sections with verdict + issues + cascade attributions
+- `graphreview_report.json`: machine-readable full graph + annotations
+- `graphreview_graph.html`: pyvis interactive visualization
 ---
+## inference.py Log Format (Mandatory)
 ```
+[START] task=cascade_review module_count=3
+[STEP] module=checkout.py action=FLAG_BUG line=24 reward=0.5 cumulative=0.5
+[STEP] module=checkout.py action=ADD_COMMENT content="null risk from auth" reward=0.3 cumulative=0.8
+[STEP] module=checkout.py action=FLAG_DEPENDENCY_ISSUE attributed_to=auth.py reward=0.6 cumulative=1.4
+[STEP] module=checkout.py action=REQUEST_CHANGES reward=0.2 cumulative=1.6 done=true
+[STEP] module=auth.py action=FLAG_BUG line=15 reward=0.5 cumulative=2.1
+[STEP] module=auth.py action=FLAG_DEPENDENCY_ISSUE attributed_to=config.py reward=0.6 cumulative=2.7
+[STEP] module=auth.py action=REQUEST_CHANGES reward=0.2 cumulative=2.9 done=true
+[STEP] module=config.py action=FLAG_BUG line=8 reward=0.5 cumulative=3.4
+[STEP] module=config.py action=REQUEST_CHANGES reward=0.2 cumulative=3.6 done=true
+[END] task=cascade_review total_reward=3.6 modules_reviewed=3 report=graphreview_report.md
 ```
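Because the log format is parsed automatically, it may help to build every line through small formatting helpers rather than ad hoc f-strings. The helper names below are hypothetical, and formatting rewards with one decimal place is an assumption inferred from the sample log, not a stated requirement.

```python
def format_start(task: str, module_count: int) -> str:
    return f"[START] task={task} module_count={module_count}"


def format_step(
    module: str,
    action: str,
    reward: float,
    cumulative: float,
    done: bool = False,
    **fields: object,
) -> str:
    """Build one [STEP] line; extra fields (line=, attributed_to=, ...) go after the action."""
    parts = [f"module={module}", f"action={action}"]
    parts += [f"{key}={value}" for key, value in fields.items()]
    # One decimal place avoids float noise such as 1.4000000000000001.
    parts += [f"reward={reward:.1f}", f"cumulative={cumulative:.1f}"]
    if done:
        parts.append("done=true")
    return "[STEP] " + " ".join(parts)


def format_end(task: str, total_reward: float, modules_reviewed: int, report: str) -> str:
    return (
        f"[END] task={task} total_reward={total_reward:.1f} "
        f"modules_reviewed={modules_reviewed} report={report}"
    )
```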
 ---
+## Phase 1 — Persistence Layer & Sample Project
+**Goal: Parse once, store forever, never re-parse**
+Build:
+- `sample_project/` — 10 Python files, ~50 functions total, with injected known bugs for each task
+- `db/models.py` — all SQLAlchemy ORM models
+- `db/database.py` — engine setup, session factory, init_db()
+- `db/seed.py` — orchestrate full parse → lint → store pipeline
+- `parser/ast_parser.py` — extract structure per file using Python ast
+- `parser/chunker.py` — split files >300 lines by class/function into sub-nodes
+- `parser/graph_builder.py` — build NetworkX DiGraph, explicit + implicit edges
+- `parser/summarizer.py` — ~50 token summaries per node
+Success criteria:
+- seed.py completes in <30s on sample_project
+- Second run detects seeded flag, loads in <1s
+- All modules, edges, linter_flags correctly stored
+- Chunking correctly splits a 400-line test file into sub-nodes
+---
+## Phase 2 — Graph Manager & Observation Builder
+**Goal: Efficient, token-budgeted observations from DB**
+Build:
+- `graph/graph_manager.py` — load graph, traversal order, neighbor queries
+- `graph/token_budget.py` — enforce per-component token limits
+- `env/observation.py` — Pydantic CodeObservation model
+Success criteria:
+- Observation for any node fits within 2000 token budget
+- Traversal order: leaf nodes first, high-centrality nodes last
+- REQUEST_CONTEXT returns full neighbor code within budget
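The traversal criterion (leaf nodes first, high-centrality nodes last) could be met with a simple deterministic sort. This sketch makes two assumptions the plan does not confirm: a leaf is a module with no outgoing dependency edges, and total degree stands in for centrality. The function name `traversal_order` is hypothetical.

```python
def traversal_order(deps: dict[str, set[str]]) -> list[str]:
    """Order modules for review.

    `deps` maps each module id to the set of module ids it depends on.
    Leaves (no dependencies) come first; within each group, lower total
    degree comes first, and module id breaks ties so the order is stable.
    """
    nodes = set(deps)
    for targets in deps.values():
        nodes |= set(targets)

    in_degree = {node: 0 for node in nodes}
    for targets in deps.values():
        for target in targets:
            in_degree[target] += 1

    def sort_key(node: str) -> tuple[bool, int, str]:
        out_degree = len(deps.get(node, ()))
        return (out_degree > 0, in_degree[node] + out_degree, node)

    return sorted(nodes, key=sort_key)
```

Sorting on an explicit key rather than relying on dictionary or set iteration order is what makes repeated runs produce the same sequence.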
+---
+## Phase 3 — Action Space, Reward Engine & Graders
+**Goal: All actions scored correctly and deterministically**
+Build:
+- `env/action.py` — Pydantic ReviewAction
+- `env/reward.py` — Pydantic ReviewReward + reward table logic
+- `graders/base_grader.py` — abstract interface
+- `graders/easy_grader.py` — linter match
+- `graders/medium_grader.py` — linter + keyword + line attribution
+- `graders/hard_grader.py` — graph consistency + LLM judge
+Success criteria:
+- Easy grader: same input always gives same output (verified with 10 runs)
+- Hard grader: temperature=0 verified, prompt hash documented
+- All reward values within 0.0–1.0 range
+- False positive and false negative cases handled explicitly
 ---
+## Phase 4 — OpenEnv Core
+**Goal: Fully compliant step() / reset() / state()**
+Build:
+- `env/environment.py` — CodeReviewEnv main class
+- `env/state.py` — GraphState Pydantic model
+- `tasks/task_registry.py` + 3 task files
+- `openenv.yaml`
+- `server.py` — FastAPI HTTP wrapper
+Success criteria:
+- `openenv validate` passes
+- All 3 tasks run end-to-end without error
+- state() correctly returns full annotated graph
+- reset() clears only current task annotations, not full DB
 ---
+## Phase 5 — Visualization & Reporting
+**Goal: Useful output the user actually sees**
+Build:
+- `visualizer/pyvis_renderer.py` — interactive HTML graph
+- `visualizer/report_generator.py` — markdown + JSON report
+Success criteria:
+- Graph colors update correctly as reviews accumulate
+- Report correctly attributes cascade issues across modules
+- HTML renders in browser without external dependencies
+---
+## Phase 6 — inference.py & Deployment
+**Goal: Baseline script + Docker + HF Space**
+Build:
+- `inference.py` — runs Gemma 4 E4B against all 3 tasks, emits mandatory log format
+- `Dockerfile` — clean build + run
+- `README.md` — full documentation
+- HF Space deployment
+Success criteria:
+- inference.py completes all 3 tasks in <20 minutes
+- Runs on 2 vCPU / 8GB RAM
+- docker build && docker run works cleanly
+- HF Space deploys and responds to reset() ping
+- Baseline scores reproducible across 3 runs

Reviewer.md ADDED

	@@ -0,0 +1,94 @@

+# Phase Reviewer Prompt — GraphReview RL Environment
+You are a senior engineer and RL systems expert reviewing completed phases of a competitive hackathon project called GraphReview. Your job is to catch problems before they compound into later phases.
+---
+## Project Context
+GraphReview is an OpenEnv-compliant RL environment for graph-aware Python code review. Key constraints:
+- SQLite is the persistent store — DB schema changes are expensive after Phase 1
+- Pydantic v2 models are shared interfaces — field changes break multiple components
+- Graders must be deterministic — non-determinism is a disqualification risk
+- inference.py log format is a judging contract — any deviation fails automated scoring
+- Must run in <20 min on 2 vCPU / 8GB RAM
+- Must pass `openenv validate` and `docker build && docker run`
+---
+## Your Review Checklist
+For every phase submitted to you, check ALL of the following:
+### Correctness
+- [ ] Does the code do what the phase plan says it should do?
+- [ ] Are all success criteria from the phase plan met?
+- [ ] Are edge cases handled (empty files, circular imports, modules with no dependencies, modules with >5 deps)?
+- [ ] Does reset() only clear current task annotations, not the full DB?
+- [ ] Does state() return the full graph including all prior annotations?
+### Interface Integrity
+- [ ] Do all Pydantic models match the spec exactly (field names, types, Optional handling)?
+- [ ] Do function signatures match what later phases will call?
+- [ ] Are all DB foreign keys correct and consistent?
+- [ ] Is the module_id format consistent everywhere (relative path, sub-node format)?
+### Determinism & Reproducibility
+- [ ] Do easy and medium graders make zero LLM calls?
+- [ ] Is hard grader temperature explicitly set to 0?
+- [ ] Would running the same input twice produce the same reward?
+- [ ] Is the LLM judge prompt a static string (not variable-dependent)?
+### Performance & Resource Constraints
+- [ ] Will seed.py complete in <30s on the sample_project?
+- [ ] Will inference.py complete all 3 tasks in <20 minutes?
+- [ ] Does token_budget.py enforce the 2000 token cap?
+- [ ] Will the environment run on 2 vCPU / 8GB RAM?
+### OpenEnv Compliance
+- [ ] Does openenv.yaml include all required fields?
+- [ ] Do step()/reset()/state() match the OpenEnv spec exactly?
+- [ ] Will `openenv validate` pass based on what's been built?
+### Code Quality
+- [ ] Are all functions fully typed?
+- [ ] Are Pydantic models complete with no missing fields?
+- [ ] Is SQLAlchemy session handling correct (no session leaks)?
+- [ ] Are there no hardcoded paths that break in Docker?
+### Forward Compatibility
+- [ ] Will this phase's output work cleanly with the next phase's inputs?
+- [ ] Are there any design decisions that will cause pain in later phases?
+- [ ] Is the DB schema flexible enough for the remaining phases?
+---
+## How to Report Issues
+For each issue found, report:
+**Severity:** Critical | Major | Minor
+**Critical** — will cause disqualification or break a later phase entirely
+**Major** — will cause incorrect behavior or significant rework
+**Minor** — suboptimal but won't break anything
+**Format:**
+```
+[CRITICAL] File: graders/hard_grader.py
+Issue: temperature not set to 0 on judge API call
+Why it matters: grader will produce different scores on identical inputs, failing reproducibility check
+Fix: add temperature=0 to API call parameters
+```
+---
+## After Reviewing
+Summarise:
+1. Total issues found by severity
+2. Whether the phase passes (no Criticals) or fails (any Critical)
+3. The single most important thing to fix before moving to the next phase
+4. Any forward-looking risks the builder should keep in mind for upcoming phases
+Do not approve a phase with any Critical issues. Do not nitpick Minor issues if the phase is under time pressure — flag them but do not block.

code-review-env/README.md CHANGED

@@ -1,11 +1,70 @@
 # CodeReviewEnv
-Phase 1 foundation for dependency-aware code review environment.
+Dependency-aware code review RL environment with persistent SQLite graph storage.
+## Current Status
+- Phase 1: implemented and validated
+  - persistent seed pipeline with hash-based cache
+  - parser/chunker/graph builder + linter findings persistence
+- Phase 2: implemented
+  - graph manager for DB-backed graph loading and deterministic traversal
+  - hard token budget enforcement (max 2000 tokens)
+  - strict Pydantic v2 observation models
+  - observation builder with neighbor summaries and REQUEST_CONTEXT support
+## Implemented Phase 2 Components
+- [graph/graph_manager.py](graph/graph_manager.py)
+  - Loads graph nodes/edges from SQLite.
+  - Exposes neighbor queries (in/out/both).
+  - Provides deterministic traversal ordering with leaf-first preference.
+- [graph/token_budget.py](graph/token_budget.py)
+  - Enforces hard observation token cap (<= 2000).
+  - Applies per-component token limits.
+  - Truncates oversized components with explicit marker.
+- [env/observation.py](env/observation.py)
+  - Strict Pydantic models: `NeighborSummary`, `RequestedContext`, `CodeObservation`.
+  - Forbids extra fields and type coercion.
+  - Enforces `total_tokens <= 2000`.
+- [env/observation_builder.py](env/observation_builder.py)
+  - Builds observation payloads from DB graph state.
+  - Ranks dependency context using graph centrality.
+  - Produces validated `CodeObservation` objects.
+## Compatibility
+- [env/graph.py](env/graph.py) remains stable for existing callers and now delegates to GraphManager.
 ## Quickstart
 ```bash
 pip install -r requirements.txt
-python -m parser.ast_parser sample_codebase/
+python -m db.seed sample_project/
 python -m db.store --module checkout
 ```
+## Validation
+Run tests:
+```bash
+pytest -q
+```
+Phase 2-focused tests:
+```bash
+pytest -q tests/test_phase2_graph_manager.py tests/test_phase2_token_budget.py tests/test_phase2_observation.py
+```
+## Security and Quality Notes
+- SQLite is used as the source of truth for graph and review state.
+- No dynamic code execution is introduced in Phase 2 paths.
+- Input handling fails closed for unknown `module_id` values.
+- Observations are hard-capped to prevent context overflow.
+- Code follows typed interfaces and minimal stateful behavior.

code-review-env/db/database.py ADDED

	@@ -0,0 +1,3 @@


+from db.migrations import get_default_db_path, get_engine, init_db
+
+__all__ = ["get_default_db_path", "get_engine", "init_db"]

code-review-env/db/models.py ADDED

	@@ -0,0 +1,25 @@

+from db.schema import (
+    EdgeType,
+    EpisodeRecord,
+    LinterFinding,
+    ModuleEdge,
+    ModuleNode,
+    ReviewAnnotation,
+    ReviewStatus,
+    SeedMeta,
+    Severity,
+    TaskDefinition,
+)
+__all__ = [
+    "EdgeType",
+    "EpisodeRecord",
+    "LinterFinding",
+    "ModuleEdge",
+    "ModuleNode",
+    "ReviewAnnotation",
+    "ReviewStatus",
+    "SeedMeta",
+    "Severity",
+    "TaskDefinition",
+]

code-review-env/db/schema.py CHANGED

@@ -9,7 +9,9 @@ from sqlmodel import Field, SQLModel
 class EdgeType(StrEnum):
     EXPLICIT_IMPORT = "explicit_import"
-    IMPLICIT_NAME_RESOLUTION = "implicit_name_resolution"
+    IMPLICIT_DEPENDENCY = "implicit_dependency"
+    INTRA_FILE = "intra_file"
+    CIRCULAR = "circular"
 class ReviewStatus(StrEnum):
@@ -28,8 +30,13 @@ class ModuleNode(SQLModel, table=True):
     id: Optional[int] = Field(default=None, primary_key=True)
     source_root: str = Field(index=True)
     module_id: str = Field(index=True)
+    name: Optional[str] = None
     raw_code: str
     ast_summary: str
+    summary: Optional[str] = None
+    linter_flags: str = "[]"
+    parent_module_id: Optional[str] = Field(default=None, index=True)
+    is_chunk: bool = False
     dependency_reason: str = ""
     review_annotation: Optional[str] = None
     review_status: ReviewStatus = Field(default=ReviewStatus.PENDING)
@@ -89,3 +96,8 @@ class TaskDefinition(SQLModel, table=True):
     target_module_id: str = Field(index=True)
     description: str
     ground_truth_ref: str
+class SeedMeta(SQLModel, table=True):
+    key: str = Field(primary_key=True)
+    value: str

code-review-env/db/seed.py ADDED

	@@ -0,0 +1,143 @@

+from __future__ import annotations
+import argparse
+import hashlib
+import json
+from datetime import UTC, datetime
+from pathlib import Path
+from db.store import Store
+from parser.ast_parser import parse_python_file
+from parser.chunker import chunk_module
+from parser.graph_builder import build_edges
+from parser.linter import run_linters
+from parser.summarizer import summarize_module
+def _codebase_hash(target_dir: Path) -> str:
+    digest = hashlib.sha256()
+    for path in sorted(target_dir.rglob("*.py")):
+        rel = path.relative_to(target_dir).as_posix()
+        digest.update(rel.encode("utf-8"))
+        digest.update(path.read_bytes())
+    return digest.hexdigest()
+def _seed_meta_key(source_root: str) -> str:
+    return f"seeded:{source_root}"
+def seed_project(target_dir: Path, db_path: str | None = None, force: bool = False) -> dict[str, object]:
+    target_dir = target_dir.resolve()
+    store = Store(source_root=str(target_dir), db_path=db_path)
+    current_hash = _codebase_hash(target_dir)
+    meta_key = _seed_meta_key(str(target_dir))
+    existing_raw = store.get_meta(meta_key)
+    existing = json.loads(existing_raw) if existing_raw else {}
+    if (
+        not force
+        and store.has_nodes()
+        and existing.get("codebase_hash") == current_hash
+        and existing.get("seeded") is True
+    ):
+        return {
+            "seeded": True,
+            "loaded_from_cache": True,
+            "codebase_hash": current_hash,
+            "node_count": int(existing.get("node_count", 0)),
+            "edge_count": int(existing.get("edge_count", 0)),
+        }
+    store.clear_source_graph()
+    py_files = sorted(target_dir.rglob("*.py"))
+    parsed_modules = [parse_python_file(path, target_dir) for path in py_files]
+    module_ids = {parsed.module_id for parsed in parsed_modules}
+    chunk_ids_by_parent: dict[str, set[str]] = {}
+    for path, parsed in zip(py_files, parsed_modules):
+        issues = run_linters(path)
+        summary = summarize_module(parsed, issues)
+        linter_flags = json.dumps([issue.model_dump() for issue in issues])
+        chunk_result = chunk_module(parsed, max_lines=300)
+        parent = chunk_result.parent
+        store.upsert_node(
+            module_id=parent.module_id,
+            name=parent.name,
+            raw_code=parent.code,
+            ast_summary=summary,
+            summary=summary,
+            linter_flags=linter_flags,
+            dependency_reason="Imports and symbol usage captured from AST",
+            parent_module_id=parent.parent_module_id,
+            is_chunk=parent.is_chunk,
+        )
+        if chunk_result.chunks:
+            chunk_ids_by_parent[parent.module_id] = {chunk.module_id for chunk in chunk_result.chunks}
+        for chunk in chunk_result.chunks:
+            chunk_summary = f"Chunk {chunk.name} lines {chunk.start_line}-{chunk.end_line}"
+            store.upsert_node(
+                module_id=chunk.module_id,
+                name=chunk.name,
+                raw_code=chunk.code,
+                ast_summary=chunk_summary,
+                summary=chunk_summary,
+                linter_flags="[]",
+                dependency_reason="Top-level class/function chunk",
+                parent_module_id=chunk.parent_module_id,
+                is_chunk=chunk.is_chunk,
+            )
+        store.replace_findings_for_module(parsed.module_id, [issue.model_dump() for issue in issues])
+    edges = build_edges(parsed_modules, module_ids, chunk_ids_by_parent)
+    for edge in edges:
+        store.upsert_edge(
+            source_module_id=edge.source_module_id,
+            target_module_id=edge.target_module_id,
+            edge_type=edge.edge_type,
+            import_line=edge.import_line,
+            weight=edge.weight,
+        )
+    snapshot = store.get_full_graph()
+    meta_payload = {
+        "seeded": True,
+        "seeded_at": datetime.now(UTC).isoformat(),
+        "codebase_hash": current_hash,
+        "node_count": len(snapshot.nodes),
+        "edge_count": len(snapshot.edges),
+    }
+    store.set_meta(meta_key, json.dumps(meta_payload))
+    return {
+        "seeded": True,
+        "loaded_from_cache": False,
+        "codebase_hash": current_hash,
+        "node_count": len(snapshot.nodes),
+        "edge_count": len(snapshot.edges),
+    }
+def _build_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser(description="Seed graph database from Python project")
+    parser.add_argument("target", help="Path to target codebase")
+    parser.add_argument("--db-path", default=None, help="Path to SQLite database")
+    parser.add_argument("--force", action="store_true", help="Force re-parse even if seeded")
+    return parser
+def main() -> None:
+    args = _build_parser().parse_args()
+    result = seed_project(Path(args.target), db_path=args.db_path, force=args.force)
+    print(json.dumps(result, indent=2))
+if __name__ == "__main__":
+    main()
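
Review note: the cache check in `seed_project` depends on `_codebase_hash` being stable for an unchanged tree and changing on any edit to a `.py` file. Below is a minimal standalone sketch of that behaviour, using only the standard library. The function body mirrors the diff; the `demo` helper, the temporary directory, and the file names inside it are hypothetical and exist only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def codebase_hash(target_dir: Path) -> str:
    """Hash relative paths and contents of every *.py file, in sorted order."""
    digest = hashlib.sha256()
    for path in sorted(target_dir.rglob("*.py")):
        digest.update(path.relative_to(target_dir).as_posix().encode("utf-8"))
        digest.update(path.read_bytes())
    return digest.hexdigest()


def demo() -> tuple[str, str, str]:
    """Return hashes: initial, after adding a non-Python file, after editing a .py file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cart.py").write_text("TOTAL = 1\n", encoding="utf-8")
        first = codebase_hash(root)
        # Files that do not match *.py are not part of the digest.
        (root / "notes.txt").write_text("scratch\n", encoding="utf-8")
        after_txt = codebase_hash(root)
        # Any change to a .py file's bytes changes the digest.
        (root / "cart.py").write_text("TOTAL = 2\n", encoding="utf-8")
        edited = codebase_hash(root)
    return first, after_txt, edited


if __name__ == "__main__":
    first, after_txt, edited = demo()
    print(first == after_txt, first != edited)
```

Under this scheme a change to a non-Python file (for example a config or template) would not invalidate the cached graph, so `--force` is the fallback when that matters.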

code-review-env/db/store.py CHANGED

@@ -17,6 +17,7 @@ from db.schema import (
     ModuleNode,
     ReviewAnnotation,
     ReviewStatus,
     Severity,
 )
@@ -77,6 +78,11 @@ class Store:
         raw_code: str,
         ast_summary: str,
         dependency_reason: str,
     ) -> ModuleNode:
         with Session(self.engine) as session:
             existing = session.exec(
@@ -86,8 +92,13 @@ class Store:
                 )
             ).first()
             if existing:
                 existing.raw_code = raw_code
                 existing.ast_summary = ast_summary
                 existing.dependency_reason = dependency_reason
                 existing.updated_at = datetime.now(UTC)
                 session.add(existing)
@@ -98,8 +109,13 @@ class Store:
             node = ModuleNode(
                 source_root=self.config.source_root,
                 module_id=module_id,
                 raw_code=raw_code,
                 ast_summary=ast_summary,
                 dependency_reason=dependency_reason,
             )
             session.add(node)
@@ -322,6 +338,21 @@ class Store:
             ).first()
             return first_node is not None
     def clear_source_graph(self) -> None:
         with Session(self.engine) as session:
             session.exec(

     ModuleNode,
     ReviewAnnotation,
     ReviewStatus,
+    SeedMeta,
     Severity,
 )
         raw_code: str,
         ast_summary: str,
         dependency_reason: str,
+        name: str | None = None,
+        summary: str | None = None,
+        linter_flags: str = "[]",
+        parent_module_id: str | None = None,
+        is_chunk: bool = False,
     ) -> ModuleNode:
         with Session(self.engine) as session:
             existing = session.exec(
                 )
             ).first()
             if existing:
+                existing.name = name or existing.name
                 existing.raw_code = raw_code
                 existing.ast_summary = ast_summary
+                existing.summary = summary or existing.summary
+                existing.linter_flags = linter_flags
+                existing.parent_module_id = parent_module_id
+                existing.is_chunk = is_chunk
                 existing.dependency_reason = dependency_reason
                 existing.updated_at = datetime.now(UTC)
                 session.add(existing)
             node = ModuleNode(
                 source_root=self.config.source_root,
                 module_id=module_id,
+                name=name,
                 raw_code=raw_code,
                 ast_summary=ast_summary,
+                summary=summary,
+                linter_flags=linter_flags,
+                parent_module_id=parent_module_id,
+                is_chunk=is_chunk,
                 dependency_reason=dependency_reason,
             )
             session.add(node)
             ).first()
             return first_node is not None
+    def get_meta(self, key: str) -> Optional[str]:
+        with Session(self.engine) as session:
+            record = session.get(SeedMeta, key)
+            return record.value if record else None
+    def set_meta(self, key: str, value: str) -> None:
+        with Session(self.engine) as session:
+            record = session.get(SeedMeta, key)
+            if record:
+                record.value = value
+                session.add(record)
+            else:
+                session.add(SeedMeta(key=key, value=value))
+            session.commit()
     def clear_source_graph(self) -> None:
         with Session(self.engine) as session:
             session.exec(

code-review-env/env/graph.py CHANGED

@@ -4,11 +4,9 @@ from dataclasses import dataclass
 from pathlib import Path
 import networkx as nx
-from sqlmodel import Session, select
-from db.schema import ModuleEdge, ModuleNode
-from db.store import Store
-from parser.ast_parser import parse_directory
 @dataclass
@@ -20,79 +18,34 @@ class GraphLoadResult:
 class DependencyGraph:
     def __init__(self, target_dir: str | Path, db_path: str | Path | None = None) -> None:
         self.target_dir = Path(target_dir).resolve()
-        self.store = Store(source_root=str(self.target_dir), db_path=db_path)
     def load_or_build(self, force_reparse: bool = False) -> GraphLoadResult:
-        if force_reparse or not self.store.has_nodes():
-            parse_directory(self.target_dir, db_path=str(self.store.config.db_path))
-            loaded_from_cache = False
-        else:
-            loaded_from_cache = True
         return GraphLoadResult(graph=self._build_graph(), loaded_from_cache=loaded_from_cache)
     def _build_graph(self) -> nx.DiGraph:
-        graph = nx.DiGraph()
-        with Session(self.store.engine) as session:
-            nodes = list(
-                session.exec(
-                    select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
-                ).all()
-            )
-            edges = list(
-                session.exec(
-                    select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
-                ).all()
-            )
-        for node in nodes:
-            graph.add_node(
-                node.module_id,
-                ast_summary=node.ast_summary,
-                review_status=node.review_status.value,
-            )
-        for edge in edges:
-            graph.add_edge(
-                edge.source_module_id,
-                edge.target_module_id,
-                import_line=edge.import_line,
-                edge_type=edge.edge_type.value,
-                weight=edge.weight,
-            )
-        return graph
     def traversal_order(self, graph: nx.DiGraph | None = None) -> list[str]:
-        graph = graph or self._build_graph()
         if graph.number_of_nodes() == 0:
             return []
-        if not nx.is_directed_acyclic_graph(graph):
-            # Fall back to deterministic ordering if cyclic imports exist.
-            return sorted(graph.nodes())
-        centrality = nx.betweenness_centrality(graph)
-        indegree = {node: graph.in_degree(node) for node in graph.nodes()}
-        queue = [node for node, deg in indegree.items() if deg == 0]
-        order: list[str] = []
-        def rank(node: str) -> tuple[float, float, str]:
-            return (
-                float(graph.out_degree(node)),
                 float(centrality.get(node, 0.0)),
-                node,
-            )
-        while queue:
-            queue.sort(key=rank)
-            current = queue.pop(0)
-            order.append(current)
-            for successor in sorted(graph.successors(current)):
-                indegree[successor] -= 1
-                if indegree[successor] == 0:
-                    queue.append(successor)
-        return order
 if __name__ == "__main__":

 from pathlib import Path
 import networkx as nx
+from db.seed import seed_project
+from graph.graph_manager import GraphManager
 @dataclass
 class DependencyGraph:
     def __init__(self, target_dir: str | Path, db_path: str | Path | None = None) -> None:
         self.target_dir = Path(target_dir).resolve()
+        self.graph_manager = GraphManager(source_root=self.target_dir, db_path=db_path)
     def load_or_build(self, force_reparse: bool = False) -> GraphLoadResult:
+        result = seed_project(
+            self.target_dir,
+            db_path=str(self.graph_manager.store.config.db_path),
+            force=force_reparse,
+        )
+        loaded_from_cache = bool(result.get("loaded_from_cache", False))
         return GraphLoadResult(graph=self._build_graph(), loaded_from_cache=loaded_from_cache)
     def _build_graph(self) -> nx.DiGraph:
+        return self.graph_manager.load_graph()
     def traversal_order(self, graph: nx.DiGraph | None = None) -> list[str]:
+        if graph is None:
+            return self.graph_manager.traversal_order()
         if graph.number_of_nodes() == 0:
             return []
+        centrality = nx.betweenness_centrality(graph, normalized=True)
+        return sorted(
+            graph.nodes(),
+            key=lambda node: (
+                int(graph.out_degree(node)),
                 float(centrality.get(node, 0.0)),
+                str(node),
+            ),
+        )
 if __name__ == "__main__":

code-review-env/env/observation.py ADDED

	@@ -0,0 +1,62 @@

+from __future__ import annotations
+from typing import Literal
+from pydantic import BaseModel, ConfigDict, Field, field_validator
+from graph.token_budget import MAX_TOTAL_TOKENS
+class NeighborSummary(BaseModel):
+    model_config = ConfigDict(strict=True, extra="forbid")
+    module_id: str
+    relation: Literal["dependency", "dependent"]
+    summary: str
+    review_snippet: str | None = None
+class RequestedContext(BaseModel):
+    model_config = ConfigDict(strict=True, extra="forbid")
+    module_id: str
+    code: str
+    was_truncated: bool
+class CodeObservation(BaseModel):
+    model_config = ConfigDict(strict=True, extra="forbid")
+    module_id: str
+    code: str
+    ast_summary: dict[str, object]
+    dependency_summaries: list[NeighborSummary] = Field(default_factory=list)
+    dependent_summaries: list[NeighborSummary] = Field(default_factory=list)
+    neighbor_reviews: list[str] = Field(default_factory=list)
+    task_description: str
+    available_actions: list[str] = Field(default_factory=list)
+    requested_context: RequestedContext | None = None
+    token_usage: dict[str, int]
+    total_tokens: int
+    within_budget: bool
+    @field_validator("module_id", "code", "task_description")
+    @classmethod
+    def _must_not_be_empty(cls, value: str) -> str:
+        if not value.strip():
+            raise ValueError("Field cannot be empty")
+        return value
+    @field_validator("total_tokens")
+    @classmethod
+    def _budget_hard_cap(cls, value: int) -> int:
+        if value > MAX_TOTAL_TOKENS:
+            raise ValueError(f"total_tokens exceeds hard cap: {MAX_TOTAL_TOKENS}")
+        return value
+    @field_validator("within_budget")
+    @classmethod
+    def _must_be_true(cls, value: bool) -> bool:
+        if not value:
+            raise ValueError("within_budget must be True")
+        return value

code-review-env/env/observation_builder.py CHANGED

	@@ -1 +1,143 @@
-"""Phase 2 implementation placeholder."""

+from __future__ import annotations
+import json
+from pathlib import Path
+from sqlmodel import Session, select
+from db.schema import ModuleNode
+from env.observation import CodeObservation, NeighborSummary, RequestedContext
+from graph.graph_manager import GraphManager
+from graph.token_budget import TokenBudget
+DEFAULT_ACTIONS = [
+	"FLAG_STYLE",
+	"FLAG_BUG",
+	"FLAG_SECURITY",
+	"FLAG_DEPENDENCY_ISSUE",
+	"ADD_COMMENT",
+	"REQUEST_CONTEXT",
+	"REQUEST_CHANGES",
+	"APPROVE",
+	"AMEND_REVIEW",
+]
+class ObservationBuilder:
+	def __init__(self, source_root: str | Path, db_path: str | Path | None = None) -> None:
+		self.graph_manager = GraphManager(source_root=source_root, db_path=db_path)
+		self.token_budget = TokenBudget()
+	def _fetch_node(self, module_id: str) -> ModuleNode:
+		with Session(self.graph_manager.store.engine) as session:
+			node = session.exec(
+				select(ModuleNode).where(
+					ModuleNode.source_root == self.graph_manager.store.config.source_root,
+					ModuleNode.module_id == module_id,
+				)
+			).first()
+		if not node:
+			raise ValueError(f"Unknown module_id: {module_id}")
+		return node
+	@staticmethod
+	def _ast_summary_payload(ast_summary: str) -> dict[str, object]:
+		try:
+			loaded = json.loads(ast_summary)
+		except json.JSONDecodeError:
+			return {"text": ast_summary}
+		return loaded if isinstance(loaded, dict) else {"items": loaded}
+	def build(
+		self,
+		module_id: str,
+		task_description: str,
+		available_actions: list[str] | None = None,
+		context_request: str | None = None,
+	) -> CodeObservation:
+		graph = self.graph_manager.load_graph()
+		if module_id not in graph:
+			raise ValueError(f"Unknown module_id: {module_id}")
+		node = self._fetch_node(module_id)
+		centrality = self.graph_manager.centrality()
+		dependencies = list(graph.successors(module_id))
+		dependents = list(graph.predecessors(module_id))
+		dep_ranked = sorted(dependencies, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:5]
+		dependent_ranked = sorted(dependents, key=lambda n: (-float(centrality.get(n, 0.0)), n))[:3]
+		dependency_summaries: list[NeighborSummary] = []
+		dependent_summaries: list[NeighborSummary] = []
+		neighbor_reviews: list[str] = []
+		for dep_id in dep_ranked:
+			dep_node = self._fetch_node(dep_id)
+			dependency_summaries.append(
+				NeighborSummary(
+					module_id=dep_id,
+					relation="dependency",
+					summary=dep_node.summary or dep_node.ast_summary,
+					review_snippet=dep_node.review_summary,
+				)
+			)
+			if dep_node.review_summary:
+				neighbor_reviews.append(f"{dep_id}: {dep_node.review_summary}")
+		for depd_id in dependent_ranked:
+			depd_node = self._fetch_node(depd_id)
+			dependent_summaries.append(
+				NeighborSummary(
+					module_id=depd_id,
+					relation="dependent",
+					summary=depd_node.summary or depd_node.ast_summary,
+					review_snippet=depd_node.review_summary,
+				)
+			)
+			if depd_node.review_summary:
+				neighbor_reviews.append(f"{depd_id}: {depd_node.review_summary}")
+		requested_context: RequestedContext | None = None
+		requested_context_code = ""
+		if context_request:
+			context_node = self._fetch_node(context_request)
+			requested_context_code = context_node.raw_code
+		actions = available_actions or DEFAULT_ACTIONS
+		budgeted = self.token_budget.enforce(
+			{
+				"code": node.raw_code,
+				"ast_summary_text": node.ast_summary,
+				"dependency_summaries": [item.model_dump_json() for item in dependency_summaries],
+				"dependent_summaries": [item.model_dump_json() for item in dependent_summaries],
+				"neighbor_reviews": neighbor_reviews[:4],
+				"task_description": task_description,
+				"available_actions": actions,
+				"requested_context_code": requested_context_code,
+			}
+		)
+		if context_request:
+			context_trimmed = budgeted.payload.get("requested_context_code", "")
+			requested_context = RequestedContext(
+				module_id=context_request,
+				code=str(context_trimmed),
+				was_truncated=str(context_trimmed) != requested_context_code,
+			)
+		return CodeObservation(
+			module_id=module_id,
+			code=str(budgeted.payload.get("code", "")),
+			ast_summary=self._ast_summary_payload(str(budgeted.payload.get("ast_summary_text", ""))),
+			dependency_summaries=dependency_summaries,
+			dependent_summaries=dependent_summaries,
+			neighbor_reviews=neighbor_reviews[:4],
+			task_description=task_description,
+			available_actions=actions,
+			requested_context=requested_context,
+			token_usage=budgeted.token_usage,
+			total_tokens=budgeted.total_tokens,
+			within_budget=budgeted.total_tokens <= self.token_budget.max_total_tokens,
+		)

code-review-env/graph/__init__.py ADDED

	@@ -0,0 +1,5 @@

+"""Graph utilities for loading and querying dependency graphs."""
+from graph.graph_manager import GraphManager
+__all__ = ["GraphManager"]

code-review-env/graph/graph_manager.py ADDED

	@@ -0,0 +1,125 @@

+from __future__ import annotations
+from pathlib import Path
+from typing import Literal
+import networkx as nx
+from sqlmodel import Session, select
+from db.schema import ModuleEdge, ModuleNode
+from db.store import Store
+class GraphManager:
+    """Load and query dependency graph state from SQLite."""
+    def __init__(self, source_root: str | Path, db_path: str | Path | None = None) -> None:
+        self.source_root = str(Path(source_root).resolve())
+        self.store = Store(source_root=self.source_root, db_path=db_path)
+    def load_graph(self) -> nx.DiGraph:
+        graph = nx.DiGraph()
+        with Session(self.store.engine) as session:
+            nodes = list(
+                session.exec(
+                    select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
+                ).all()
+            )
+            edges = list(
+                session.exec(
+                    select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
+                ).all()
+            )
+        for node in nodes:
+            graph.add_node(
+                node.module_id,
+                name=node.name,
+                raw_code=node.raw_code,
+                ast_summary=node.ast_summary,
+                summary=node.summary or "",
+                linter_flags=node.linter_flags,
+                parent_module_id=node.parent_module_id,
+                review_status=node.review_status.value,
+                review_summary=node.review_summary or "",
+                is_chunk=node.is_chunk,
+            )
+        for edge in edges:
+            graph.add_edge(
+                edge.source_module_id,
+                edge.target_module_id,
+                edge_type=edge.edge_type.value,
+                import_line=edge.import_line,
+                weight=edge.weight,
+            )
+        return graph
+    def get_node(self, module_id: str) -> dict[str, object]:
+        graph = self.load_graph()
+        if module_id not in graph:
+            raise ValueError(f"Unknown module_id: {module_id}")
+        return dict(graph.nodes[module_id])
+    def get_neighbors(
+        self,
+        module_id: str,
+        direction: Literal["out", "in", "both"] = "both",
+        limit: int | None = None,
+    ) -> list[str]:
+        graph = self.load_graph()
+        if module_id not in graph:
+            raise ValueError(f"Unknown module_id: {module_id}")
+        if direction == "out":
+            neighbors = set(graph.successors(module_id))
+        elif direction == "in":
+            neighbors = set(graph.predecessors(module_id))
+        else:
+            neighbors = set(graph.successors(module_id))
+            neighbors.update(graph.predecessors(module_id))
+        ordered = sorted(neighbors)
+        if limit is None:
+            return ordered
+        return ordered[: max(limit, 0)]
+    def centrality(self) -> dict[str, float]:
+        graph = self.load_graph()
+        if graph.number_of_nodes() == 0:
+            return {}
+        return nx.betweenness_centrality(graph, normalized=True)
+    def traversal_order(self) -> list[str]:
+        """
+        Return a deterministic, leaf-first traversal where high-centrality nodes are later.
+        """
+        graph = self.load_graph()
+        if graph.number_of_nodes() == 0:
+            return []
+        centrality = self.centrality()
+        # For DAGs, reverse topological order visits leaves first.
+        if nx.is_directed_acyclic_graph(graph):
+            topo_reversed = list(reversed(list(nx.lexicographical_topological_sort(graph))))
+            topo_rank = {node: idx for idx, node in enumerate(topo_reversed)}
+            return sorted(
+                graph.nodes(),
+                key=lambda node: (
+                    int(topo_rank.get(node, 0)),
+                    float(centrality.get(node, 0.0)),
+                    str(node),
+                ),
+            )
+        # Stable fallback for cyclic graphs.
+        return sorted(
+            graph.nodes(),
+            key=lambda node: (
+                int(graph.out_degree(node)),
+                float(centrality.get(node, 0.0)),
+                str(node),
+            ),
+        )
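
Review note: `traversal_order` is documented as leaf-first, meaning a module is visited only after the modules it imports. The sketch below illustrates that ordering property with a plain Kahn-style sort from the standard library. It is a swapped-in technique, not the `networkx` call used in the diff, and it may order ties differently. The module names are borrowed from `sample_project`, but the import relations between them are assumed for the example.

```python
import heapq


def leaf_first_order(imports: dict[str, set[str]]) -> list[str]:
    """Order modules so each appears after everything it imports (ties broken by name)."""
    nodes = set(imports) | {dep for deps in imports.values() for dep in deps}
    # remaining[m] = imports of m that have not been emitted yet
    remaining = {node: set(imports.get(node, ())) for node in nodes}
    # dependents[d] = modules that import d
    dependents: dict[str, set[str]] = {node: set() for node in nodes}
    for module, deps in remaining.items():
        for dep in deps:
            dependents[dep].add(module)

    ready = [node for node, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    order: list[str] = []
    while ready:
        current = heapq.heappop(ready)
        order.append(current)
        for module in dependents[current]:
            remaining[module].discard(current)
            if not remaining[module]:
                heapq.heappush(ready, module)

    if len(order) != len(nodes):
        raise ValueError("import cycle detected; no leaf-first order exists")
    return order


# Hypothetical import relations, for illustration only.
SAMPLE_IMPORTS = {
    "checkout": {"cart", "auth"},
    "cart": {"config"},
    "auth": {"config"},
    "config": set(),
}

if __name__ == "__main__":
    print(leaf_first_order(SAMPLE_IMPORTS))
    # ['config', 'auth', 'cart', 'checkout']
```

One observation on the diff itself: in the DAG branch every node gets a distinct `topo_rank`, so the centrality and name components of the sort key only take effect in the cyclic fallback.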

code-review-env/graph/token_budget.py ADDED

	@@ -0,0 +1,117 @@

+from __future__ import annotations
+import math
+from dataclasses import dataclass
+MAX_TOTAL_TOKENS = 2000
+COMPONENT_LIMITS: dict[str, int] = {
+    "current_code": 800,
+    "ast_summary": 100,
+    "direct_deps": 250,
+    "dependents": 150,
+    "neighbor_reviews": 120,
+    "task_and_actions": 200,
+    "requested_context": 800,
+}
+def estimate_tokens(text: str) -> int:
+    """Deterministic approximation with conservative floor for non-empty text."""
+    if not text:
+        return 0
+    return max(1, int(math.ceil(len(text) / 4)))
+def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str = "\n... [TRUNCATED]") -> str:
+    if max_tokens <= 0:
+        return ""
+    current = estimate_tokens(text)
+    if current <= max_tokens:
+        return text
+    notice_tokens = estimate_tokens(suffix_notice)
+    content_budget = max(max_tokens - notice_tokens, 0)
+    max_chars = content_budget * 4
+    trimmed = text[:max_chars]
+    return f"{trimmed}{suffix_notice}" if trimmed else suffix_notice.strip()
+@dataclass(frozen=True)
+class BudgetResult:
+    payload: dict[str, object]
+    token_usage: dict[str, int]
+    total_tokens: int
+class TokenBudget:
+    def __init__(self, max_total_tokens: int = MAX_TOTAL_TOKENS) -> None:
+        self.max_total_tokens = max_total_tokens
+    def _trim_component(self, text: str, component_name: str) -> str:
+        limit = COMPONENT_LIMITS.get(component_name, self.max_total_tokens)
+        return truncate_to_budget(text, limit)
+    def enforce(self, payload: dict[str, object]) -> BudgetResult:
+        normalized = dict(payload)
+        usage: dict[str, int] = {}
+        current_code = str(normalized.get("code", ""))
+        ast_summary = str(normalized.get("ast_summary_text", ""))
+        dep_text = "\n".join(str(item) for item in normalized.get("dependency_summaries", []))
+        dependent_text = "\n".join(str(item) for item in normalized.get("dependent_summaries", []))
+        review_text = "\n".join(str(item) for item in normalized.get("neighbor_reviews", []))
+        task_actions = "\n".join(
+            [
+                str(normalized.get("task_description", "")),
+                " ".join(str(a) for a in normalized.get("available_actions", [])),
+            ]
+        )
+        requested_context = str(normalized.get("requested_context_code", ""))
+        current_code = self._trim_component(current_code, "current_code")
+        ast_summary = self._trim_component(ast_summary, "ast_summary")
+        dep_text = self._trim_component(dep_text, "direct_deps")
+        dependent_text = self._trim_component(dependent_text, "dependents")
+        review_text = self._trim_component(review_text, "neighbor_reviews")
+        task_actions = self._trim_component(task_actions, "task_and_actions")
+        requested_context = self._trim_component(requested_context, "requested_context")
+        normalized["code"] = current_code
+        normalized["ast_summary_text"] = ast_summary
+        normalized["dependency_summaries_text"] = dep_text
+        normalized["dependent_summaries_text"] = dependent_text
+        normalized["neighbor_reviews_text"] = review_text
+        normalized["task_actions_text"] = task_actions
+        normalized["requested_context_code"] = requested_context
+        usage["current_code"] = estimate_tokens(current_code)
+        usage["ast_summary"] = estimate_tokens(ast_summary)
+        usage["direct_deps"] = estimate_tokens(dep_text)
+        usage["dependents"] = estimate_tokens(dependent_text)
+        usage["neighbor_reviews"] = estimate_tokens(review_text)
+        usage["task_and_actions"] = estimate_tokens(task_actions)
+        usage["requested_context"] = estimate_tokens(requested_context)
+        total = sum(usage.values())
+        if total > self.max_total_tokens:
+            overflow = total - self.max_total_tokens
+            requested_limit = max(estimate_tokens(requested_context) - overflow, 0)
+            requested_context = truncate_to_budget(requested_context, requested_limit)
+            normalized["requested_context_code"] = requested_context
+            usage["requested_context"] = estimate_tokens(requested_context)
+            total = sum(usage.values())
+        if total > self.max_total_tokens:
+            overflow = total - self.max_total_tokens
+            code_limit = max(estimate_tokens(current_code) - overflow, 0)
+            current_code = truncate_to_budget(current_code, code_limit)
+            normalized["code"] = current_code
+            usage["current_code"] = estimate_tokens(current_code)
+            total = sum(usage.values())
+        if total > self.max_total_tokens:
+            raise ValueError("Unable to enforce token budget within hard limit")
+        return BudgetResult(payload=normalized, token_usage=usage, total_tokens=total)
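
Review note: the budget logic rests on a four-characters-per-token estimate and a truncation helper that reserves room for a notice. The standalone sketch below copies those two helpers from the diff so they can be exercised in isolation; the sample inputs are made up. The last call probes a limit smaller than the notice itself, which appears to return the bare notice even though it costs more than the limit. That is a reading of the code rather than a reported failure, but it may be worth a test because `enforce` passes small computed limits in its overflow branches.

```python
import math

NOTICE = "\n... [TRUNCATED]"


def estimate_tokens(text: str) -> int:
    """Approximate token count: one token per four characters, rounded up."""
    if not text:
        return 0
    return max(1, int(math.ceil(len(text) / 4)))


def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str = NOTICE) -> str:
    """Trim text so that text plus notice fits within max_tokens, when possible."""
    if max_tokens <= 0:
        return ""
    if estimate_tokens(text) <= max_tokens:
        return text
    notice_tokens = estimate_tokens(suffix_notice)
    content_budget = max(max_tokens - notice_tokens, 0)
    trimmed = text[: content_budget * 4]
    return f"{trimmed}{suffix_notice}" if trimmed else suffix_notice.strip()


if __name__ == "__main__":
    print(estimate_tokens("abcd"), estimate_tokens("abcde"))   # 1 2
    fitted = truncate_to_budget("x" * 100, 10)
    print(estimate_tokens(fitted))                              # 10
    # Limit below the notice cost: the bare notice comes back.
    tiny = truncate_to_budget("x" * 100, 2)
    print(repr(tiny), estimate_tokens(tiny))                    # '... [TRUNCATED]' 4
```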

code-review-env/parser/ast_parser.py CHANGED

@@ -15,6 +15,8 @@ from parser.summarizer import summarize_module
 class ImportRef(BaseModel):
     target_module: str
     import_line: str
     edge_type: EdgeType = EdgeType.EXPLICIT_IMPORT
@@ -33,7 +35,12 @@ class _Visitor(ast.NodeVisitor):
         self.function_signatures: list[str] = []
         self.classes: list[str] = []
         self.constants: list[str] = []
-        self.imports: list[tuple[str, str]] = []
     def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
         args: list[str] = []
@@ -44,7 +51,11 @@ class _Visitor(ast.NodeVisitor):
                 args.append(arg.arg)
         returns = ast.unparse(node.returns) if node.returns is not None else "None"
         self.function_signatures.append(f"{node.name}({', '.join(args)})->{returns}")
-        self.generic_visit(node)
     def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
         fake = ast.FunctionDef(
@@ -59,19 +70,23 @@ class _Visitor(ast.NodeVisitor):
     def visit_ClassDef(self, node: ast.ClassDef) -> None:
         self.classes.append(node.name)
-        self.generic_visit(node)
     def visit_Import(self, node: ast.Import) -> None:
         line = ast.get_source_segment(self._source, node) or "import"
         for alias in node.names:
-            self.imports.append((alias.name, line))
     def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
         module = node.module or ""
         level = node.level or 0
         dotted = "." * level + module
         line = ast.get_source_segment(self._source, node) or "from"
-        self.imports.append((dotted, line))
     def visit_Assign(self, node: ast.Assign) -> None:
         if isinstance(node.value, ast.Constant):
@@ -105,7 +120,18 @@ def _resolve_relative_import(current_module: str, ref: str) -> str:
 def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
     source = path.read_text(encoding="utf-8")
     module_id = _to_module_id(path, root_dir)
-    tree = ast.parse(source)
     visitor = _Visitor()
     visitor.parse(tree, source)
@@ -114,9 +140,11 @@ def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
         ImportRef(
             target_module=_resolve_relative_import(module_id, name),
             import_line=line,
             edge_type=EdgeType.EXPLICIT_IMPORT,
         )
-        for name, line in visitor.imports
     ]
     dependencies = [imp.target_module for imp in imports if imp.target_module]
@@ -138,8 +166,10 @@ def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
     store.clear_source_graph()
     py_files = sorted(target_dir.rglob("*.py"))
-    for py_file in py_files:
-        parsed = parse_python_file(py_file, target_dir)
         issues = run_linters(py_file)
         summary = summarize_module(parsed, issues)
@@ -155,13 +185,13 @@ def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
             [issue.model_dump() for issue in issues],
         )
         for imported in parsed.imports:
-            if imported.target_module:
                 store.upsert_edge(
                     source_module_id=parsed.module_id,
                     target_module_id=imported.target_module,
                     edge_type=imported.edge_type,
                     import_line=imported.import_line,
-                    weight=1.0,
                 )
     return store

 class ImportRef(BaseModel):
     target_module: str
     import_line: str
+    scope: str = "module_level"
+    weight: float = 1.0
     edge_type: EdgeType = EdgeType.EXPLICIT_IMPORT
         self.function_signatures: list[str] = []
         self.classes: list[str] = []
         self.constants: list[str] = []
+        self.imports: list[tuple[str, str, str]] = []
+        self._scope_stack: list[str] = []
+    @property
+    def _scope(self) -> str:
+        return "function_level" if self._scope_stack else "module_level"
     def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
         args: list[str] = []
                 args.append(arg.arg)
         returns = ast.unparse(node.returns) if node.returns is not None else "None"
         self.function_signatures.append(f"{node.name}({', '.join(args)})->{returns}")
+        self._scope_stack.append(node.name)
+        try:
+            self.generic_visit(node)
+        finally:
+            self._scope_stack.pop()
     def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
         fake = ast.FunctionDef(
     def visit_ClassDef(self, node: ast.ClassDef) -> None:
         self.classes.append(node.name)
+        self._scope_stack.append(node.name)
+        try:
+            self.generic_visit(node)
+        finally:
+            self._scope_stack.pop()
     def visit_Import(self, node: ast.Import) -> None:
         line = ast.get_source_segment(self._source, node) or "import"
         for alias in node.names:
+            self.imports.append((alias.name, line, self._scope))
     def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
         module = node.module or ""
         level = node.level or 0
         dotted = "." * level + module
         line = ast.get_source_segment(self._source, node) or "from"
+        self.imports.append((dotted, line, self._scope))
     def visit_Assign(self, node: ast.Assign) -> None:
         if isinstance(node.value, ast.Constant):
 def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
     source = path.read_text(encoding="utf-8")
     module_id = _to_module_id(path, root_dir)
+    try:
+        tree = ast.parse(source)
+    except SyntaxError:
+        return ParsedModule(
+            module_id=module_id,
+            raw_code=source,
+            function_signatures=[],
+            classes=[],
+            imports=[],
+            constants=[],
+            dependencies=[],
+        )
     visitor = _Visitor()
     visitor.parse(tree, source)
         ImportRef(
             target_module=_resolve_relative_import(module_id, name),
             import_line=line,
+            scope=scope,
+            weight=0.5 if scope == "function_level" else 1.0,
             edge_type=EdgeType.EXPLICIT_IMPORT,
         )
+        for name, line, scope in visitor.imports
     ]
     dependencies = [imp.target_module for imp in imports if imp.target_module]
     store.clear_source_graph()
     py_files = sorted(target_dir.rglob("*.py"))
+    parsed_modules = [parse_python_file(py_file, target_dir) for py_file in py_files]
+    known_module_ids = {parsed.module_id for parsed in parsed_modules}
+    for py_file, parsed in zip(py_files, parsed_modules):
         issues = run_linters(py_file)
         summary = summarize_module(parsed, issues)
             [issue.model_dump() for issue in issues],
         )
         for imported in parsed.imports:
+            if imported.target_module and imported.target_module in known_module_ids:
                 store.upsert_edge(
                     source_module_id=parsed.module_id,
                     target_module_id=imported.target_module,
                     edge_type=imported.edge_type,
                     import_line=imported.import_line,
+                    weight=imported.weight,
                 )
     return store
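
Review note: the scope tracking added to `_Visitor` can be reproduced in isolation. The sketch below is a simplified stand-in, not the committed class: `ImportScopeVisitor` and `import_weights` are hypothetical names, and a depth counter replaces `_scope_stack`. Like the diff, it labels any import nested inside a `def` or a `class` body as `function_level` and gives it weight 0.5.

```python
import ast


class ImportScopeVisitor(ast.NodeVisitor):
    """Collect (name, scope) pairs for every import statement (illustrative only)."""

    def __init__(self) -> None:
        self.imports: list[tuple[str, str]] = []
        self._depth = 0  # number of enclosing def/class bodies

    def _visit_scoped(self, node: ast.AST) -> None:
        # Enter a nested scope, visit children, then always restore the depth.
        self._depth += 1
        try:
            self.generic_visit(node)
        finally:
            self._depth -= 1

    visit_FunctionDef = _visit_scoped
    visit_AsyncFunctionDef = _visit_scoped
    visit_ClassDef = _visit_scoped

    def _scope(self) -> str:
        return "function_level" if self._depth else "module_level"

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            self.imports.append((alias.name, self._scope()))

    def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
        dotted = "." * (node.level or 0) + (node.module or "")
        self.imports.append((dotted, self._scope()))


def import_weights(source: str) -> dict[str, float]:
    """Map each imported name to 1.0 (module level) or 0.5 (nested)."""
    visitor = ImportScopeVisitor()
    visitor.visit(ast.parse(source))
    return {
        name: (0.5 if scope == "function_level" else 1.0)
        for name, scope in visitor.imports
    }


SAMPLE = "import config\n\ndef charge():\n    import payments\n    return payments\n"
WEIGHTS = import_weights(SAMPLE)
```

With this sample, `config` keeps the full weight and the lazily imported `payments` is down-weighted, which is the distinction `parse_directory` now passes through via `weight=imported.weight`.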

code-review-env/parser/chunker.py ADDED

	@@ -0,0 +1,96 @@

+from __future__ import annotations
+import ast
+from pydantic import BaseModel
+from parser.ast_parser import ParsedModule
+class ChunkNode(BaseModel):
+    module_id: str
+    name: str
+    code: str
+    parent_module_id: str | None = None
+    is_chunk: bool = False
+    start_line: int = 1
+    end_line: int = 1
+class ChunkResult(BaseModel):
+    parent: ChunkNode
+    chunks: list[ChunkNode]
+def _slice_lines(source: str, start: int, end: int) -> str:
+    lines = source.splitlines()
+    start_idx = max(start - 1, 0)
+    end_idx = min(end, len(lines))
+    return "\n".join(lines[start_idx:end_idx]).strip()
+def chunk_module(parsed: ParsedModule, max_lines: int = 300) -> ChunkResult:
+    line_count = len(parsed.raw_code.splitlines())
+    if line_count <= max_lines:
+        parent = ChunkNode(
+            module_id=parsed.module_id,
+            name=parsed.module_id.split(".")[-1],
+            code=parsed.raw_code,
+            is_chunk=False,
+            start_line=1,
+            end_line=line_count,
+        )
+        return ChunkResult(parent=parent, chunks=[])
+    try:
+        tree = ast.parse(parsed.raw_code)
+    except SyntaxError:
+        parent = ChunkNode(
+            module_id=parsed.module_id,
+            name=parsed.module_id.split(".")[-1],
+            code=parsed.raw_code,
+            is_chunk=False,
+            start_line=1,
+            end_line=line_count,
+        )
+        return ChunkResult(parent=parent, chunks=[])
+    chunks: list[ChunkNode] = []
+    for node in tree.body:
+        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
+            start_line = int(getattr(node, "lineno", 1))
+            end_line = int(getattr(node, "end_lineno", start_line))
+            chunk_id = f"{parsed.module_id}::{node.name}"
+            chunks.append(
+                ChunkNode(
+                    module_id=chunk_id,
+                    name=node.name,
+                    code=_slice_lines(parsed.raw_code, start_line, end_line),
+                    parent_module_id=parsed.module_id,
+                    is_chunk=True,
+                    start_line=start_line,
+                    end_line=end_line,
+                )
+            )
+    if not chunks:
+        chunks.append(
+            ChunkNode(
+                module_id=f"{parsed.module_id}::module_body",
+                name="module_body",
+                code=parsed.raw_code,
+                parent_module_id=parsed.module_id,
+                is_chunk=True,
+                start_line=1,
+                end_line=line_count,
+            )
+        )
+    parent = ChunkNode(
+        module_id=parsed.module_id,
+        name=parsed.module_id.split(".")[-1],
+        code="",
+        is_chunk=False,
+        start_line=1,
+        end_line=line_count,
+    )
+    return ChunkResult(parent=parent, chunks=chunks)
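
Review note: when a module exceeds `max_lines`, `chunk_module` emits one chunk per top-level `def` or `class` and sets the parent's `code` to an empty string, so module-level statements between definitions appear in no chunk. The sketch below illustrates that coverage gap under simplified assumptions; `top_level_spans` and `uncovered_lines` are hypothetical helpers, not part of the commit.

```python
import ast


def top_level_spans(source: str) -> list[tuple[str, int, int]]:
    """Return (name, start_line, end_line) for each top-level def or class."""
    spans: list[tuple[str, int, int]] = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            end_line = node.end_lineno or node.lineno
            spans.append((node.name, node.lineno, end_line))
    return spans


def uncovered_lines(source: str, spans: list[tuple[str, int, int]]) -> list[int]:
    """List non-blank line numbers that fall outside every span."""
    covered: set[int] = set()
    for _, start, end in spans:
        covered.update(range(start, end + 1))
    return [
        number
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip() and number not in covered
    ]


SAMPLE = (
    "LIMIT = 3\n"
    "\n"
    "def a():\n"
    "    return LIMIT\n"
    "\n"
    "class B:\n"
    "    def run(self):\n"
    "        return a()\n"
)
SPANS = top_level_spans(SAMPLE)
GAPS = uncovered_lines(SAMPLE, SPANS)
```

Here line 1 (`LIMIT = 3`) is reported as uncovered. For `huge_module.py` the same reasoning applies to the `LINE_n` constants, so a reviewer looking only at the `helper_alpha` chunk would not see the values it adds together.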

code-review-env/parser/graph_builder.py ADDED

	@@ -0,0 +1,114 @@

+from __future__ import annotations
+import ast
+import networkx as nx
+from pydantic import BaseModel
+from db.schema import EdgeType
+from parser.ast_parser import ParsedModule
+class EdgeRecord(BaseModel):
+    source_module_id: str
+    target_module_id: str
+    edge_type: EdgeType
+    import_line: str
+    scope: str
+    weight: float
+def _build_intra_file_edges(parsed: ParsedModule, available_chunk_ids: set[str]) -> list[EdgeRecord]:
+    try:
+        tree = ast.parse(parsed.raw_code)
+    except SyntaxError:
+        return []
+    function_names = {
+        node.name
+        for node in tree.body
+        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
+    }
+    call_edges: list[EdgeRecord] = []
+    for node in tree.body:
+        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
+            continue
+        source_id = f"{parsed.module_id}::{node.name}"
+        if source_id not in available_chunk_ids:
+            continue
+        for inner in ast.walk(node):
+            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
+                called = inner.func.id
+                if called in function_names:
+                    target_id = f"{parsed.module_id}::{called}"
+                    if target_id in available_chunk_ids and target_id != source_id:
+                        call_edges.append(
+                            EdgeRecord(
+                                source_module_id=source_id,
+                                target_module_id=target_id,
+                                edge_type=EdgeType.INTRA_FILE,
+                                import_line=f"call:{called}",
+                                scope="function_level",
+                                weight=0.5,
+                            )
+                        )
+    dedup: dict[tuple[str, str, str], EdgeRecord] = {}
+    for edge in call_edges:
+        key = (edge.source_module_id, edge.target_module_id, edge.import_line)
+        dedup[key] = edge
+    return list(dedup.values())
+def build_edges(
+    parsed_modules: list[ParsedModule],
+    module_ids: set[str],
+    chunk_ids_by_parent: dict[str, set[str]],
+) -> list[EdgeRecord]:
+    edges: list[EdgeRecord] = []
+    for parsed in parsed_modules:
+        source_module_id = parsed.module_id
+        for imp in parsed.imports:
+            if imp.target_module and imp.target_module in module_ids:
+                edge_type = (
+                    EdgeType.EXPLICIT_IMPORT
+                    if imp.scope == "module_level"
+                    else EdgeType.IMPLICIT_DEPENDENCY
+                )
+                edges.append(
+                    EdgeRecord(
+                        source_module_id=source_module_id,
+                        target_module_id=imp.target_module,
+                        edge_type=edge_type,
+                        import_line=imp.import_line,
+                        scope=imp.scope,
+                        weight=imp.weight,
+                    )
+                )
+        available_chunk_ids = chunk_ids_by_parent.get(parsed.module_id, set())
+        edges.extend(_build_intra_file_edges(parsed, available_chunk_ids))
+    graph = nx.DiGraph()
+    for edge in edges:
+        graph.add_edge(edge.source_module_id, edge.target_module_id)
+    for source_module_id, target_module_id in list(graph.edges()):
+        if graph.has_edge(target_module_id, source_module_id):
+            edges.append(
+                EdgeRecord(
+                    source_module_id=source_module_id,
+                    target_module_id=target_module_id,
+                    edge_type=EdgeType.CIRCULAR,
+                    import_line="cycle_detected",
+                    scope="module_level",
+                    weight=1.0,
+                )
+            )
+    dedup: dict[tuple[str, str, str], EdgeRecord] = {}
+    for edge in edges:
+        key = (edge.source_module_id, edge.target_module_id, edge.import_line)
+        dedup[key] = edge
+    return list(dedup.values())
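
Review note: the `CIRCULAR` pass in `build_edges` tests `graph.has_edge(target, source)`, which flags mutual pairs (a imports b and b imports a) but not longer loops such as x -> y -> z -> x. The standard-library sketch below contrasts the two checks; `mutual_pairs` and `nodes_on_cycles` are hypothetical names, and the reachability search is one possible alternative, not what the commit implements.

```python
from collections import defaultdict


def mutual_pairs(edges: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Edges whose reverse edge also exists (the check used in the diff)."""
    edge_set = set(edges)
    return {(src, dst) for src, dst in edge_set if (dst, src) in edge_set}


def nodes_on_cycles(edges: list[tuple[str, str]]) -> set[str]:
    """Nodes that can reach themselves through one or more edges."""
    graph: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        graph[src].add(dst)

    def reaches(start: str, goal: str) -> bool:
        # Iterative depth-first search from the successors of start.
        stack = list(graph.get(start, ()))
        seen: set[str] = set()
        while stack:
            current = stack.pop()
            if current == goal:
                return True
            if current in seen:
                continue
            seen.add(current)
            stack.extend(graph.get(current, ()))
        return False

    return {node for node in list(graph) if reaches(node, node)}


EDGES = [("a", "b"), ("b", "a"), ("x", "y"), ("y", "z"), ("z", "x")]
PAIRS = mutual_pairs(EDGES)
CYCLIC = nodes_on_cycles(EDGES)
```

The pair check only reports the a/b edges, while the reachability check also finds x, y and z. Since `networkx` is already a dependency, its strongly connected component utilities would likely be the more natural fit if longer cycles need to be marked.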

code-review-env/parser/linter.py CHANGED

@@ -98,7 +98,36 @@ def run_bandit(path: Path) -> list[LinterIssue]:
     return issues
 def run_linters(path: Path) -> list[LinterIssue]:
     issues = run_pylint(path)
     issues.extend(run_bandit(path))
     return issues

     return issues
+def run_pyflakes(path: Path) -> list[LinterIssue]:
+    cmd = [sys.executable, "-m", "pyflakes", str(path)]
+    proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
+    payload = (proc.stdout or "").strip()
+    if not payload:
+        return []
+    issues: list[LinterIssue] = []
+    for raw_line in payload.splitlines():
+        line = 0
+        message = raw_line.strip()
+        if ":" in raw_line:
+            parts = raw_line.split(":", 3)
+            if len(parts) >= 3 and parts[1].isdigit():
+                line = int(parts[1])
+                message = parts[3].strip() if len(parts) == 4 else message
+        issues.append(
+            LinterIssue(
+                tool="pyflakes",
+                line=line,
+                severity="medium",
+                code="PYF000",
+                message=message,
+            )
+        )
+    return issues
 def run_linters(path: Path) -> list[LinterIssue]:
     issues = run_pylint(path)
     issues.extend(run_bandit(path))
+    issues.extend(run_pyflakes(path))
     return issues
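
Review note: `run_pyflakes` splits each output line on `:` with `maxsplit=3`, which misreads paths that themselves contain a colon (for example a Windows drive letter) and falls back to the whole raw line when no column is present. A regex is one alternative. The sketch below assumes output shaped like `path:line:col: message` with the column optional; `PYFLAKES_LINE` and `parse_pyflakes_line` are hypothetical names.

```python
import re

# path, line number, optional column, then the message text.
PYFLAKES_LINE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?:(?P<col>\d+):)?\s*(?P<message>.*)$"
)


def parse_pyflakes_line(raw_line: str) -> tuple[int, str]:
    """Return (line_number, message); line 0 means the format was not recognised."""
    text = raw_line.strip()
    match = PYFLAKES_LINE.match(text)
    if match is None:
        return 0, text
    return int(match.group("line")), match.group("message")


EXAMPLES = [
    "cart.py:3:1: 'os' imported but unused",
    "cart.py:3: undefined name 'x'",
    "C:\\proj\\cart.py:7:1: unused variable",
    "something unexpected",
]
PARSED = [parse_pyflakes_line(line) for line in EXAMPLES]
```

The fallback keeps the same convention as the committed code: an unparseable line is still reported, with `line=0` and the raw text as the message.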

code-review-env/requirements.txt CHANGED

@@ -3,6 +3,7 @@ networkx>=3.2
 pydantic>=2.7
 pylint>=3.2
 bandit>=1.7
 fastapi>=0.115
 uvicorn>=0.30
 openai>=1.40

 pydantic>=2.7
 pylint>=3.2
 bandit>=1.7
+pyflakes>=3.2
 fastapi>=0.115
 uvicorn>=0.30
 openai>=1.40

code-review-env/sample_project/auth.py ADDED

	@@ -0,0 +1,7 @@

+"""Auth helpers."""
+import config
+def issue_session_token(user_id: str) -> str:
+    return f"{user_id}:{config.SECRET_KEY}:session-token-generated-with-a-very-long-suffix-that-triggers-style-rules-and-is-hard-to-read"

code-review-env/sample_project/cart.py ADDED

	@@ -0,0 +1,17 @@

+"""Cart calculations."""
+import config
+def calculate_subtotal(items: list[dict[str, float]]) -> float:
+    subtotal = 0.0
+    for item in items:
+        subtotal += float(item.get("price", 0.0)) * float(item.get("qty", 0.0))
+    return subtotal
+def calculate_total(items: list[dict[str, float]]) -> float:
+    subtotal = calculate_subtotal(items)
+    # BUG: config.DISCOUNT_RATE is intended to be 0.20, but set to 20 in config.
+    discounted = subtotal - (subtotal * config.DISCOUNT_RATE)
+    return discounted + (discounted * config.TAX_RATE)

code-review-env/sample_project/checkout.py ADDED

	@@ -0,0 +1,15 @@

+"""Checkout flow."""
+import cart
+import payments
+def submit_order(items: list[dict[str, float]]) -> str:
+    total = cart.calculate_total(items)
+    # Cascading symptom: negative total is observed here but root cause is config -> cart.
+    if total < 0:
+        return "error: negative total"
+    gateway_ok = payments.run_gateway_check("https://gateway.example.com/health")
+    if gateway_ok != 0:
+        return "error: gateway"
+    return payments.charge(total)

code-review-env/sample_project/config.py ADDED

	@@ -0,0 +1,6 @@

+"""Configuration defaults for the checkout flow."""
+DISCOUNT_RATE = 20
+TAX_RATE = 0.07
+PAYMENT_TIMEOUT_SECONDS = 30
+SECRET_KEY = "hardcoded-dev-key"
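
Review note: the planted bug is easiest to see with numbers. The sketch below restates the arithmetic from `cart.calculate_total` as a standalone function (the `total_for` name and the 100.0 subtotal are illustrative assumptions) and compares `DISCOUNT_RATE = 20` with the 0.20 that the comment in `cart.py` says was intended.

```python
def total_for(subtotal: float, discount_rate: float, tax_rate: float) -> float:
    """Same formula as cart.calculate_total, with the config values passed in."""
    discounted = subtotal - (subtotal * discount_rate)
    return discounted + (discounted * tax_rate)


TAX_RATE = 0.07

# With the committed config value the discount is 20x the subtotal.
BUGGY_TOTAL = total_for(100.0, 20, TAX_RATE)       # roughly -2033

# With the intended fractional rate the total is positive.
INTENDED_TOTAL = total_for(100.0, 0.20, TAX_RATE)  # roughly 85.6
```

A subtotal of 100 becomes a discounted value of -1900 and a total near -2033, which is the negative total that `checkout.submit_order` observes even though the root cause sits two modules upstream in `config.py`.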

code-review-env/sample_project/database.py ADDED

	@@ -0,0 +1,6 @@

+from config import SETTINGS
+def get_connection_url() -> str:
+    # Intentional bug for lint/security testing: unsafely concatenated DSN-like value
+    return "sqlite:///" + SETTINGS.get("db_path")

code-review-env/sample_project/huge_module.py ADDED

	@@ -0,0 +1,628 @@

+"""Large synthetic file for chunking checks."""
+def bootstrap() -> int:
+    return 1
+LINE_1 = 1
+LINE_2 = 2
+LINE_3 = 3
+LINE_4 = 4
+LINE_5 = 5
+LINE_6 = 6
+LINE_7 = 7
+LINE_8 = 8
+LINE_9 = 9
+LINE_10 = 10
+LINE_11 = 11
+LINE_12 = 12
+LINE_13 = 13
+LINE_14 = 14
+LINE_15 = 15
+LINE_16 = 16
+LINE_17 = 17
+LINE_18 = 18
+LINE_19 = 19
+LINE_20 = 20
+LINE_21 = 21
+LINE_22 = 22
+LINE_23 = 23
+LINE_24 = 24
+LINE_25 = 25
+LINE_26 = 26
+LINE_27 = 27
+LINE_28 = 28
+LINE_29 = 29
+LINE_30 = 30
+LINE_31 = 31
+LINE_32 = 32
+LINE_33 = 33
+LINE_34 = 34
+LINE_35 = 35
+LINE_36 = 36
+LINE_37 = 37
+LINE_38 = 38
+LINE_39 = 39
+LINE_40 = 40
+LINE_41 = 41
+LINE_42 = 42
+LINE_43 = 43
+LINE_44 = 44
+LINE_45 = 45
+LINE_46 = 46
+LINE_47 = 47
+LINE_48 = 48
+LINE_49 = 49
+LINE_50 = 50
+LINE_51 = 51
+LINE_52 = 52
+LINE_53 = 53
+LINE_54 = 54
+LINE_55 = 55
+LINE_56 = 56
+LINE_57 = 57
+LINE_58 = 58
+LINE_59 = 59
+LINE_60 = 60
+LINE_61 = 61
+LINE_62 = 62
+LINE_63 = 63
+LINE_64 = 64
+LINE_65 = 65
+LINE_66 = 66
+LINE_67 = 67
+LINE_68 = 68
+LINE_69 = 69
+LINE_70 = 70
+LINE_71 = 71
+LINE_72 = 72
+LINE_73 = 73
+LINE_74 = 74
+LINE_75 = 75
+LINE_76 = 76
+LINE_77 = 77
+LINE_78 = 78
+LINE_79 = 79
+LINE_80 = 80
+LINE_81 = 81
+LINE_82 = 82
+LINE_83 = 83
+LINE_84 = 84
+LINE_85 = 85
+LINE_86 = 86
+LINE_87 = 87
+LINE_88 = 88
+LINE_89 = 89
+LINE_90 = 90
+LINE_91 = 91
+LINE_92 = 92
+LINE_93 = 93
+LINE_94 = 94
+LINE_95 = 95
+LINE_96 = 96
+LINE_97 = 97
+LINE_98 = 98
+LINE_99 = 99
+LINE_100 = 100
+LINE_101 = 101
+LINE_102 = 102
+LINE_103 = 103
+LINE_104 = 104
+LINE_105 = 105
+LINE_106 = 106
+LINE_107 = 107
+LINE_108 = 108
+LINE_109 = 109
+LINE_110 = 110
+LINE_111 = 111
+LINE_112 = 112
+LINE_113 = 113
+LINE_114 = 114
+LINE_115 = 115
+LINE_116 = 116
+LINE_117 = 117
+LINE_118 = 118
+LINE_119 = 119
+LINE_120 = 120
+LINE_121 = 121
+LINE_122 = 122
+LINE_123 = 123
+LINE_124 = 124
+LINE_125 = 125
+LINE_126 = 126
+LINE_127 = 127
+LINE_128 = 128
+LINE_129 = 129
+LINE_130 = 130
+LINE_131 = 131
+LINE_132 = 132
+LINE_133 = 133
+LINE_134 = 134
+LINE_135 = 135
+LINE_136 = 136
+LINE_137 = 137
+LINE_138 = 138
+LINE_139 = 139
+LINE_140 = 140
+LINE_141 = 141
+LINE_142 = 142
+LINE_143 = 143
+LINE_144 = 144
+LINE_145 = 145
+LINE_146 = 146
+LINE_147 = 147
+LINE_148 = 148
+LINE_149 = 149
+LINE_150 = 150
+LINE_151 = 151
+LINE_152 = 152
+LINE_153 = 153
+LINE_154 = 154
+LINE_155 = 155
+LINE_156 = 156
+LINE_157 = 157
+LINE_158 = 158
+LINE_159 = 159
+LINE_160 = 160
+LINE_161 = 161
+LINE_162 = 162
+LINE_163 = 163
+LINE_164 = 164
+LINE_165 = 165
+LINE_166 = 166
+LINE_167 = 167
+LINE_168 = 168
+LINE_169 = 169
+LINE_170 = 170
+LINE_171 = 171
+LINE_172 = 172
+LINE_173 = 173
+LINE_174 = 174
+LINE_175 = 175
+LINE_176 = 176
+LINE_177 = 177
+LINE_178 = 178
+LINE_179 = 179
+LINE_180 = 180
+LINE_181 = 181
+LINE_182 = 182
+LINE_183 = 183
+LINE_184 = 184
+LINE_185 = 185
+LINE_186 = 186
+LINE_187 = 187
+LINE_188 = 188
+LINE_189 = 189
+LINE_190 = 190
+LINE_191 = 191
+LINE_192 = 192
+LINE_193 = 193
+LINE_194 = 194
+LINE_195 = 195
+LINE_196 = 196
+LINE_197 = 197
+LINE_198 = 198
+LINE_199 = 199
+LINE_200 = 200
+LINE_201 = 201
+LINE_202 = 202
+LINE_203 = 203
+LINE_204 = 204
+LINE_205 = 205
+LINE_206 = 206
+LINE_207 = 207
+LINE_208 = 208
+LINE_209 = 209
+LINE_210 = 210
+LINE_211 = 211
+LINE_212 = 212
+LINE_213 = 213
+LINE_214 = 214
+LINE_215 = 215
+LINE_216 = 216
+LINE_217 = 217
+LINE_218 = 218
+LINE_219 = 219
+LINE_220 = 220
+LINE_221 = 221
+LINE_222 = 222
+LINE_223 = 223
+LINE_224 = 224
+LINE_225 = 225
+LINE_226 = 226
+LINE_227 = 227
+LINE_228 = 228
+LINE_229 = 229
+LINE_230 = 230
+LINE_231 = 231
+LINE_232 = 232
+LINE_233 = 233
+LINE_234 = 234
+LINE_235 = 235
+LINE_236 = 236
+LINE_237 = 237
+LINE_238 = 238
+LINE_239 = 239
+LINE_240 = 240
+LINE_241 = 241
+LINE_242 = 242
+LINE_243 = 243
+LINE_244 = 244
+LINE_245 = 245
+LINE_246 = 246
+LINE_247 = 247
+LINE_248 = 248
+LINE_249 = 249
+LINE_250 = 250
+LINE_251 = 251
+LINE_252 = 252
+LINE_253 = 253
+LINE_254 = 254
+LINE_255 = 255
+LINE_256 = 256
+LINE_257 = 257
+LINE_258 = 258
+LINE_259 = 259
+LINE_260 = 260
+LINE_261 = 261
+LINE_262 = 262
+LINE_263 = 263
+LINE_264 = 264
+LINE_265 = 265
+LINE_266 = 266
+LINE_267 = 267
+LINE_268 = 268
+LINE_269 = 269
+LINE_270 = 270
+LINE_271 = 271
+LINE_272 = 272
+LINE_273 = 273
+LINE_274 = 274
+LINE_275 = 275
+LINE_276 = 276
+LINE_277 = 277
+LINE_278 = 278
+LINE_279 = 279
+LINE_280 = 280
+LINE_281 = 281
+LINE_282 = 282
+LINE_283 = 283
+LINE_284 = 284
+LINE_285 = 285
+LINE_286 = 286
+LINE_287 = 287
+LINE_288 = 288
+LINE_289 = 289
+LINE_290 = 290
+LINE_291 = 291
+LINE_292 = 292
+LINE_293 = 293
+LINE_294 = 294
+LINE_295 = 295
+LINE_296 = 296
+LINE_297 = 297
+LINE_298 = 298
+LINE_299 = 299
+LINE_300 = 300
+LINE_301 = 301
+LINE_302 = 302
+LINE_303 = 303
+LINE_304 = 304
+LINE_305 = 305
+LINE_306 = 306
+LINE_307 = 307
+LINE_308 = 308
+LINE_309 = 309
+LINE_310 = 310
+LINE_311 = 311
+LINE_312 = 312
+LINE_313 = 313
+LINE_314 = 314
+LINE_315 = 315
+LINE_316 = 316
+LINE_317 = 317
+LINE_318 = 318
+LINE_319 = 319
+LINE_320 = 320
+LINE_321 = 321
+LINE_322 = 322
+LINE_323 = 323
+LINE_324 = 324
+LINE_325 = 325
+LINE_326 = 326
+LINE_327 = 327
+LINE_328 = 328
+LINE_329 = 329
+LINE_330 = 330
+LINE_331 = 331
+LINE_332 = 332
+LINE_333 = 333
+LINE_334 = 334
+LINE_335 = 335
+LINE_336 = 336
+LINE_337 = 337
+LINE_338 = 338
+LINE_339 = 339
+LINE_340 = 340
+LINE_341 = 341
+LINE_342 = 342
+LINE_343 = 343
+LINE_344 = 344
+LINE_345 = 345
+LINE_346 = 346
+LINE_347 = 347
+LINE_348 = 348
+LINE_349 = 349
+LINE_350 = 350
+LINE_351 = 351
+LINE_352 = 352
+LINE_353 = 353
+LINE_354 = 354
+LINE_355 = 355
+LINE_356 = 356
+LINE_357 = 357
+LINE_358 = 358
+LINE_359 = 359
+LINE_360 = 360
+LINE_361 = 361
+LINE_362 = 362
+LINE_363 = 363
+LINE_364 = 364
+LINE_365 = 365
+LINE_366 = 366
+LINE_367 = 367
+LINE_368 = 368
+LINE_369 = 369
+LINE_370 = 370
+LINE_371 = 371
+LINE_372 = 372
+LINE_373 = 373
+LINE_374 = 374
+LINE_375 = 375
+LINE_376 = 376
+LINE_377 = 377
+LINE_378 = 378
+LINE_379 = 379
+LINE_380 = 380
+LINE_381 = 381
+LINE_382 = 382
+LINE_383 = 383
+LINE_384 = 384
+LINE_385 = 385
+LINE_386 = 386
+LINE_387 = 387
+LINE_388 = 388
+LINE_389 = 389
+LINE_390 = 390
+LINE_391 = 391
+LINE_392 = 392
+LINE_393 = 393
+LINE_394 = 394
+LINE_395 = 395
+LINE_396 = 396
+LINE_397 = 397
+LINE_398 = 398
+LINE_399 = 399
+LINE_400 = 400
+LINE_401 = 401
+LINE_402 = 402
+LINE_403 = 403
+LINE_404 = 404
+LINE_405 = 405
+LINE_406 = 406
+LINE_407 = 407
+LINE_408 = 408
+LINE_409 = 409
+LINE_410 = 410
+LINE_411 = 411
+LINE_412 = 412
+LINE_413 = 413
+LINE_414 = 414
+LINE_415 = 415
+LINE_416 = 416
+LINE_417 = 417
+LINE_418 = 418
+LINE_419 = 419
+LINE_420 = 420
+LINE_421 = 421
+LINE_422 = 422
+LINE_423 = 423
+LINE_424 = 424
+LINE_425 = 425
+LINE_426 = 426
+LINE_427 = 427
+LINE_428 = 428
+LINE_429 = 429
+LINE_430 = 430
+def helper_alpha() -> int:
+    return LINE_10 + LINE_20
+def helper_beta() -> int:
+    return helper_alpha()
+class GiantService:
+    def run(self) -> int:
+        return helper_beta()
+def auto_func_1() -> int:
+    return 1
+def auto_func_2() -> int:
+    return 2
+def auto_func_3() -> int:
+    return 3
+def auto_func_4() -> int:
+    return 4
+def auto_func_5() -> int:
+    return 5
+def auto_func_6() -> int:
+    return 6
+def auto_func_7() -> int:
+    return 7
+def auto_func_8() -> int:
+    return 8
+def auto_func_9() -> int:
+    return 9
+def auto_func_10() -> int:
+    return 10
+def auto_func_11() -> int:
+    return 11
+def auto_func_12() -> int:
+    return 12
+def auto_func_13() -> int:
+    return 13
+def auto_func_14() -> int:
+    return 14
+def auto_func_15() -> int:
+    return 15
+def auto_func_16() -> int:
+    return 16
+def auto_func_17() -> int:
+    return 17
+def auto_func_18() -> int:
+    return 18
+def auto_func_19() -> int:
+    return 19
+def auto_func_20() -> int:
+    return 20
+def auto_func_21() -> int:
+    return 21
+def auto_func_22() -> int:
+    return 22
+def auto_func_23() -> int:
+    return 23
+def auto_func_24() -> int:
+    return 24
+def auto_func_25() -> int:
+    return 25
+def auto_func_26() -> int:
+    return 26
+def auto_func_27() -> int:
+    return 27
+def auto_func_28() -> int:
+    return 28
+def auto_func_29() -> int:
+    return 29
+def auto_func_30() -> int:
+    return 30
+def auto_func_31() -> int:
+    return 31
+def auto_func_32() -> int:
+    return 32
+def auto_func_33() -> int:
+    return 33
+def auto_func_34() -> int:
+    return 34
+def auto_func_35() -> int:
+    return 35
+def auto_func_36() -> int:
+    return 36
+def auto_func_37() -> int:
+    return 37
+def auto_func_38() -> int:
+    return 38
+def auto_func_39() -> int:
+    return 39
+def auto_func_40() -> int:
+    return 40
+def auto_func_41() -> int:
+    return 41
+def auto_func_42() -> int:
+    return 42
+def auto_func_43() -> int:
+    return 43
+def auto_func_44() -> int:
+    return 44
+def auto_func_45() -> int:
+    return 45
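
Review note: the tail of `huge_module.py` is a useful fixture for `_build_intra_file_edges`. Under the rule in `graph_builder.py` (sources are top-level functions only, and only calls through a bare name count), `helper_beta -> helper_alpha` produces an edge, while the call inside `GiantService.run` does not, because `GiantService` is a `ClassDef`. The sketch below reimplements that rule in simplified form; `direct_call_edges` is a hypothetical name and chunk-id filtering is omitted.

```python
import ast


def direct_call_edges(source: str) -> set[tuple[str, str]]:
    """Return (caller, callee) pairs between top-level functions of one module."""
    tree = ast.parse(source)
    function_names = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    edges: set[tuple[str, str]] = set()
    for node in tree.body:
        # Classes are skipped as edge sources, mirroring the committed loop.
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                callee = inner.func.id
                if callee in function_names and callee != node.name:
                    edges.add((node.name, callee))
    return edges


SAMPLE = (
    "def helper_alpha():\n"
    "    return 1\n"
    "\n"
    "def helper_beta():\n"
    "    return helper_alpha()\n"
    "\n"
    "class GiantService:\n"
    "    def run(self):\n"
    "        return helper_beta()\n"
)
EDGES = direct_call_edges(SAMPLE)
```

If class chunks are meant to participate in the intra-file graph, the source filter would need to accept `ast.ClassDef` as well; whether that is desired is a design question for the authors.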

code-review-env/sample_project/inventory.py ADDED

	@@ -0,0 +1,10 @@

+from validators import is_non_empty
+STOCK = {"widget": 4, "gizmo": 0}
+def is_available(item_name: str) -> bool:
+    if not is_non_empty(item_name):
+        return False
+    return STOCK.get(item_name, 0) > 0

code-review-env/sample_project/notifications.py ADDED

	@@ -0,0 +1,6 @@

+import smtplib
+def send_email(recipient: str, body: str) -> None:
+    client = smtplib.SMTP("localhost")
+    client.sendmail("noreply@example.com", [recipient], body)

code-review-env/sample_project/payments.py ADDED

	@@ -0,0 +1,15 @@

+"""Payment gateway wrapper."""
+import subprocess
+def run_gateway_check(endpoint: str) -> int:
+    # SECURITY ISSUE: user-provided endpoint is interpolated in a shell command.
+    command = f"curl -s {endpoint}"
+    return subprocess.call(command, shell=True)
+def charge(total: float) -> str:
+    if total <= 0:
+        return "rejected"
+    return "charged"
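
Review note: the shell interpolation in `run_gateway_check` is planted on purpose so that bandit and the review agent have something to flag. For reference, a fix a reviewer might propose is sketched below, assuming `curl` stays the health-check tool. `build_gateway_command` and `run_gateway_check_safe` are hypothetical names, and the 30-second default only echoes `PAYMENT_TIMEOUT_SECONDS` from `config.py`.

```python
import subprocess
from urllib.parse import urlparse


def build_gateway_command(endpoint: str) -> list[str]:
    """Validate the endpoint and return an argument list (no shell involved)."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"unsupported gateway endpoint: {endpoint!r}")
    # Each list element reaches curl as a single argument, so shell
    # metacharacters in the endpoint are never interpreted.
    return ["curl", "-s", endpoint]


def run_gateway_check_safe(endpoint: str, timeout: float = 30.0) -> int:
    """Run the health check and return curl's exit code."""
    completed = subprocess.run(
        build_gateway_command(endpoint),
        check=False,
        timeout=timeout,
    )
    return completed.returncode
```

Passing a list without `shell=True` removes the injection path, and the scheme check rejects values that are not HTTP(S) URLs before any process is started.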

code-review-env/sample_project/utils.py ADDED

	@@ -0,0 +1,7 @@

+from inventory import is_available
+def pick_item(preferred: str, fallback: str) -> str:
+    if is_available(preferred):
+        return preferred
+    return fallback

code-review-env/sample_project/validators.py ADDED

	@@ -0,0 +1,8 @@

+def is_non_empty(value: str | None) -> bool:
+    return value is not None and value.strip() != ""
+def validate_coupon(code: str | None) -> bool:
+    # Intentional bug: accepts invalid short code when value is None
+    return (code or "").startswith("SAVE")

code-review-env/tests/test_phase2_graph_manager.py ADDED

	@@ -0,0 +1,32 @@

+from pathlib import Path
+from db.seed import seed_project
+from graph.graph_manager import GraphManager
+def test_graph_manager_traversal_is_deterministic(tmp_path: Path) -> None:
+    db_path = tmp_path / "phase2_graph.db"
+    seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+    manager = GraphManager(source_root="sample_project", db_path=db_path)
+    first = manager.traversal_order()
+    second = manager.traversal_order()
+    assert first == second
+    assert len(first) > 0
+def test_graph_manager_neighbor_queries(tmp_path: Path) -> None:
+    db_path = tmp_path / "phase2_graph_neighbors.db"
+    seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+    manager = GraphManager(source_root="sample_project", db_path=db_path)
+    graph = manager.load_graph()
+    candidate = next(iter(graph.nodes()))
+    both = manager.get_neighbors(candidate, direction="both")
+    only_out = manager.get_neighbors(candidate, direction="out")
+    only_in = manager.get_neighbors(candidate, direction="in")
+    assert set(only_out).issubset(set(both))
+    assert set(only_in).issubset(set(both))

code-review-env/tests/test_phase2_observation.py ADDED

	@@ -0,0 +1,55 @@

+from pathlib import Path
+import pytest
+from db.seed import seed_project
+from env.observation import CodeObservation
+from env.observation_builder import ObservationBuilder
+def test_code_observation_strict_rejects_bad_types() -> None:
+    with pytest.raises(Exception):
+        CodeObservation(
+            module_id="checkout",
+            code="print('x')",
+            ast_summary={},
+            dependency_summaries=[],
+            dependent_summaries=[],
+            neighbor_reviews=[],
+            task_description="review",
+            available_actions=[],
+            requested_context=None,
+            token_usage={},
+            total_tokens="100",  # type: ignore[arg-type]
+            within_budget=True,
+        )
+def test_observation_builder_within_budget(tmp_path: Path) -> None:
+    db_path = tmp_path / "phase2_obs.db"
+    seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+    builder = ObservationBuilder(source_root="sample_project", db_path=db_path)
+    observation = builder.build(
+        module_id="checkout",
+        task_description="Find logic and dependency issues",
+    )
+    assert observation.within_budget is True
+    assert observation.total_tokens <= 2000
+    assert observation.module_id == "checkout"
+def test_request_context_is_bounded(tmp_path: Path) -> None:
+    db_path = tmp_path / "phase2_context.db"
+    seed_project(Path("sample_project"), db_path=str(db_path), force=True)
+    builder = ObservationBuilder(source_root="sample_project", db_path=db_path)
+    observation = builder.build(
+        module_id="checkout",
+        task_description="Investigate dependencies",
+        context_request="auth",
+    )
+    assert observation.requested_context is not None
+    assert observation.total_tokens <= 2000

code-review-env/tests/test_phase2_token_budget.py ADDED

	@@ -0,0 +1,42 @@

+from graph.token_budget import MAX_TOTAL_TOKENS, TokenBudget
+def test_token_budget_enforces_hard_cap() -> None:
+    budget = TokenBudget()
+    huge = "x" * 50000
+    result = budget.enforce(
+        {
+            "code": huge,
+            "ast_summary_text": huge,
+            "dependency_summaries": [huge, huge],
+            "dependent_summaries": [huge],
+            "neighbor_reviews": [huge],
+            "task_description": huge,
+            "available_actions": ["FLAG_BUG"],
+            "requested_context_code": huge,
+        }
+    )
+    assert result.total_tokens <= MAX_TOTAL_TOKENS
+def test_token_budget_marks_truncation() -> None:
+    budget = TokenBudget()
+    huge = "z" * 20000
+    result = budget.enforce(
+        {
+            "code": huge,
+            "ast_summary_text": "{}",
+            "dependency_summaries": [],
+            "dependent_summaries": [],
+            "neighbor_reviews": [],
+            "task_description": "task",
+            "available_actions": ["REQUEST_CONTEXT"],
+            "requested_context_code": huge,
+        }
+    )
+    trimmed_code = str(result.payload["code"])
+    assert "[TRUNCATED]" in trimmed_code

code-review-env/tests/test_seed.py ADDED

	@@ -0,0 +1,30 @@

+from pathlib import Path
+from db.seed import seed_project
+from parser.ast_parser import parse_python_file
+from parser.chunker import chunk_module
+def test_seed_project_uses_hash_cache(tmp_path: Path) -> None:
+    db_path = tmp_path / "seed.db"
+    target = Path("sample_project")
+    first = seed_project(target, db_path=str(db_path), force=False)
+    second = seed_project(target, db_path=str(db_path), force=False)
+    assert first["loaded_from_cache"] is False
+    assert second["loaded_from_cache"] is True
+    assert first["node_count"] == second["node_count"]
+    assert first["edge_count"] == second["edge_count"]
+def test_chunker_splits_large_module_into_sub_nodes() -> None:
+    root = Path("sample_project")
+    parsed = parse_python_file(root / "huge_module.py", root)
+    chunked = chunk_module(parsed, max_lines=300)
+    assert chunked.parent.module_id == "huge_module"
+    assert chunked.parent.code == ""
+    assert len(chunked.chunks) >= 2
+    assert all(chunk.parent_module_id == "huge_module" for chunk in chunked.chunks)
+    assert any("::helper_alpha" in chunk.module_id for chunk in chunked.chunks)

plans/phase-02-graph-manager-observation-plan.md ADDED

	@@ -0,0 +1,206 @@

+# Phase 2 Plan — Graph Manager & Observation Builder (for GPT-5.3)
+## Objective
+Deliver Phase 2 only:
+- graph/graph_manager.py: load graph from SQLite, traversal order, neighbor queries
+- graph/token_budget.py: hard 2000-token enforcement with per-component limits
+- env/observation.py: strict Pydantic v2 CodeObservation model
+No Phase 3+ implementation in this phase.
+## Context7-Validated Constraints To Use
+1. SQLAlchemy 2.0 + SQLite:
+- Use SQLAlchemy ORM patterns with Declarative models and explicit Session boundaries.
+- Keep read-heavy graph fetches in short-lived sessions.
+2. NetworkX traversal and determinism:
+- Use DAG topological utilities when possible.
+- Use deterministic ordering (lexicographical tie-breaking) to avoid run-to-run drift.
+- Betweenness centrality is available for ranking high-impact nodes.
+3. Pydantic v2 model strictness:
+- Use BaseModel with strict config and forbid unknown fields.
+- Use model_validate/model_dump APIs consistently.
+## Current Codebase Reality (important for Phase 2)
+1. Existing graph logic is in env/graph.py, not graph/graph_manager.py.
+2. env/observation_builder.py and env/models.py are placeholders.
+3. DB layer currently uses SQLModel schema classes in db/schema.py.
+Implication: Phase 2 should add the target files while preserving compatibility with existing imports/tests where possible.
+## Proposed Phase 2 Deliverables
+### 1) Create graph package and GraphManager
+Files:
+- code-review-env/graph/__init__.py
+- code-review-env/graph/graph_manager.py
+Planned API:
+- class GraphManager:
+  - __init__(self, source_root: str, db_path: str | None = None)
+  - load_graph(self) -> nx.DiGraph
+  - get_node(self, module_id: str) -> dict[str, object]
+  - get_neighbors(self, module_id: str, direction: Literal["out", "in", "both"], limit: int | None = None) -> list[str]
+  - traversal_order(self) -> list[str]
+  - centrality(self) -> dict[str, float]
+Implementation rules:
+- Load modules/edges from SQLite as source of truth.
+- Add all module metadata needed for observations as node attributes.
+- traversal_order target behavior:
+  - Prefer leaf-first review order.
+  - Push high-centrality nodes later.
+  - Deterministic tie-breaker by module_id.
+- Recommended approach:
+  - Reverse-edge DAG ordering for leaf-first when acyclic.
+  - If cyclic, condense SCCs or apply stable fallback ordering by:
+    1) out_degree ascending
+    2) betweenness centrality ascending
+    3) module_id ascending
+Compatibility note:
+- Keep env/graph.py as a thin wrapper or adapter to GraphManager until all callers migrate.
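The leaf-first ordering with a deterministic tie-breaker described above can be sketched as follows. This is a minimal illustration, not the GraphManager implementation: it uses only the standard library (Kahn's algorithm with a heap) instead of NetworkX, it assumes an edge `(a, b)` means "a depends on b", and the function name `leaf_first_order` and the example edges are hypothetical. The cyclic fallback here sorts only by out-degree and module_id; the betweenness-centrality key from the plan is omitted because it needs a graph library.

```python
import heapq
from collections import defaultdict


def leaf_first_order(edges: list[tuple[str, str]]) -> list[str]:
    """Order modules so each one appears after everything it depends on.

    Assumption: an edge (a, b) means "a depends on b", so leaves are
    modules with no outgoing edges. Ties are broken by module_id.
    """
    deps: dict[str, set[str]] = {}
    dependents: dict[str, set[str]] = defaultdict(set)
    for src, dst in edges:
        deps.setdefault(src, set()).add(dst)
        deps.setdefault(dst, set())
        dependents[dst].add(src)

    # Number of not-yet-reviewed dependencies per module.
    remaining = {node: len(targets) for node, targets in deps.items()}
    ready = [node for node, count in remaining.items() if count == 0]
    heapq.heapify(ready)  # the heap always pops the smallest module_id

    order: list[str] = []
    while ready:
        node = heapq.heappop(ready)
        order.append(node)
        for parent in dependents[node]:
            remaining[parent] -= 1
            if remaining[parent] == 0:
                heapq.heappush(ready, parent)

    # Modules inside or downstream of a cycle never become ready.
    # Stable fallback: out_degree ascending, then module_id ascending.
    placed = set(order)
    leftover = sorted(
        (node for node in deps if node not in placed),
        key=lambda node: (len(deps[node]), node),
    )
    return order + leftover


# Hypothetical dependency edges between sample_project modules.
order = leaf_first_order(
    [
        ("checkout", "cart"),
        ("checkout", "auth"),
        ("cart", "config"),
        ("auth", "config"),
    ]
)
cyclic_order = leaf_first_order([("a", "b"), ("b", "a"), ("c", "a")])
```

Because the heap decides which ready module is emitted next, the result does not depend on set iteration order, which is what keeps repeated runs identical.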
+### 2) Implement hard token budget module
+File:
+- code-review-env/graph/token_budget.py
+Constants:
+- MAX_TOTAL_TOKENS = 2000
+- COMPONENT_BUDGETS (initial defaults from plan; they sum to 1900, leaving 100 tokens unallocated below MAX_TOTAL_TOKENS):
+  - current_code: 800
+  - ast_summary: 100
+  - direct_deps: 250
+  - dependents: 150
+  - neighbor_reviews: 120
+  - task_and_actions: 200
+  - buffer: 280
+Planned API:
+- estimate_tokens(text: str) -> int
+- truncate_to_budget(text: str, max_tokens: int, suffix_notice: str) -> str
+- allocate_budget(components: dict[str, str | list[str]]) -> dict[str, object]
+  - returns included/truncated text + per-component token usage + total
+- enforce_observation_budget(observation_payload: dict[str, object]) -> dict[str, object]
+Implementation rules:
+- Budget must be enforced, never advisory.
+- If full payload exceeds 2000, trim in priority order:
+  1) dependent summaries
+  2) neighbor reviews
+  3) direct dependency summaries (lowest-ranked first)
+  4) current code (but preserve critical context header + truncation notice)
+- The REQUEST_CONTEXT path must still obey MAX_TOTAL_TOKENS: return full neighbor code only when it fits; otherwise return bounded code plus an explicit truncation marker.
+Token estimator policy:
+- Start with deterministic approximation for stability (for example chars/4 heuristic).
+- Keep estimator in one function to allow later swap to model-specific tokenizer without API break.
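One way the estimator, truncation, and priority trimming could fit together is sketched below. It is an assumption-laden simplification: components are plain strings (the plan also allows lists), the helper name `enforce_total` and the notice text are hypothetical, and the "preserve critical context header" rule for current code is not implemented. If the untrimmable components alone exceed the limit, this sketch cannot bring the total under budget.

```python
import math

MAX_TOTAL_TOKENS = 2000

# Components are trimmed in this order, mirroring the priority list above.
TRIM_ORDER = ["dependents", "neighbor_reviews", "direct_deps", "current_code"]


def estimate_tokens(text: str) -> int:
    """Deterministic approximation: roughly four characters per token."""
    return math.ceil(len(text) / 4)


def truncate_to_budget(text: str, max_tokens: int, suffix_notice: str) -> str:
    """Cut text so that the text plus the notice fits within max_tokens."""
    if estimate_tokens(text) <= max_tokens:
        return text
    max_chars = max_tokens * 4
    if len(suffix_notice) >= max_chars:
        return suffix_notice[:max_chars]
    return text[: max_chars - len(suffix_notice)] + suffix_notice


def enforce_total(
    components: dict[str, str], limit: int = MAX_TOTAL_TOKENS
) -> dict[str, str]:
    """Trim components in TRIM_ORDER until the estimated total fits."""
    result = dict(components)
    for name in TRIM_ORDER:
        total = sum(estimate_tokens(value) for value in result.values())
        overflow = total - limit
        if overflow <= 0:
            break
        if name not in result:
            continue
        # Shrink this component by the overflow, or drop it entirely.
        target = max(estimate_tokens(result[name]) - overflow, 0)
        result[name] = truncate_to_budget(result[name], target, "\n# [truncated]")
    return result


# Made-up payload: 1500 + 500 + 300 + 100 = 2400 estimated tokens.
components = {
    "current_code": "x" * 6000,
    "direct_deps": "d" * 2000,
    "dependents": "p" * 1200,
    "task_and_actions": "t" * 400,
}
trimmed = enforce_total(components)
total_after = sum(estimate_tokens(value) for value in trimmed.values())
```

In this example the dependent summaries are dropped first, the dependency summaries are then cut by the remaining 100 tokens, and the current code is left untouched.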
+### 3) Implement strict Pydantic observation model
+File:
+- code-review-env/env/observation.py
+Planned models:
+- class NeighborSummary(BaseModel)
+  - module_id: str
+  - relation: Literal["dependency", "dependent"]
+  - summary: str
+  - review_snippet: str | None
+- class RequestedContext(BaseModel)
+  - module_id: str
+  - code: str
+  - was_truncated: bool
+- class CodeObservation(BaseModel)
+  - module_id: str
+  - code: str
+  - ast_summary: dict[str, object]
+  - dependency_summaries: list[NeighborSummary]
+  - dependent_summaries: list[NeighborSummary]
+  - neighbor_reviews: list[str]
+  - task_description: str
+  - available_actions: list[str]
+  - requested_context: RequestedContext | None = None
+  - token_usage: dict[str, int]
+  - total_tokens: int
+  - within_budget: bool
+Model config:
+- strict=True
+- extra="forbid"
+Validation rules:
+- total_tokens <= 2000 must be true.
+- module_id and code cannot be empty.
+- dependency/dependent list limits enforced before serialization.
+### 4) Observation assembly integration path
+File to update in Phase 2:
+- code-review-env/env/observation_builder.py
+Plan:
+- Replace placeholder with builder that composes:
+  - GraphManager neighbor and ordering queries
+  - DB-backed module source + summaries + review annotations
+  - TokenBudget allocation and enforcement
+  - CodeObservation validation
+Behavior:
+- Default observation returns current module + compressed neighbors.
+- REQUEST_CONTEXT(module_id): include requested neighbor code in requested_context while still meeting global budget.
+## Verification Plan (must pass before Phase 2 complete)
+### A) Unit tests to add/update
+1. tests/test_graph_manager_phase2.py
+- load_graph builds expected node/edge counts from seeded DB.
+- traversal_order places leaf nodes earlier than high-centrality hubs.
+- ordering is deterministic across repeated calls.
+2. tests/test_token_budget_phase2.py
+- enforce_observation_budget always returns total_tokens <= 2000.
+- long current code is truncated with explicit notice.
+- REQUEST_CONTEXT path stays within 2000.
+3. tests/test_observation_phase2.py
+- CodeObservation strict validation rejects unknown fields and refuses implicit type coercion.
+- valid payload serializes with model_dump and preserves token fields.
+### B) Scenario checks
+1. Seed sample_project SQLite DB.
+2. Build observation for every module_id in modules table.
+3. Assert all observations are within budget.
+4. Trigger REQUEST_CONTEXT for high-fanout node and validate bounded response.
+### C) Determinism checks
+1. Run traversal_order 10 times on same DB snapshot.
+2. Output order must be identical each run.
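The repeated-run check could be written as a small helper like the one below. The name `is_deterministic` and both stand-in ordering functions are hypothetical; a real test would pass the GraphManager's traversal_order instead.

```python
import itertools
from typing import Callable


def is_deterministic(order_fn: Callable[[], list[str]], runs: int = 10) -> bool:
    """Call order_fn `runs` times and report whether every result matches."""
    baseline = order_fn()
    return all(order_fn() == baseline for _ in range(runs - 1))


def stable_order() -> list[str]:
    # Stand-in for a traversal that sorts by module_id.
    return sorted({"cart", "auth", "config"})


_calls = itertools.count()


def unstable_order() -> list[str]:
    # Stand-in for a buggy traversal whose output drifts between calls.
    modules = ["auth", "cart", "config"]
    shift = next(_calls) % len(modules)
    return modules[shift:] + modules[:shift]


stable_result = is_deterministic(stable_order)
unstable_result = is_deterministic(unstable_order)
```

Repeated calls inside one process will not expose ordering that depends on string hash randomization, which changes between interpreter runs unless PYTHONHASHSEED is fixed, so running the check in separate processes as well is worth considering.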
+## Risks and Mitigations
+1. Existing env/graph.py may conflict with new graph/graph_manager.py.
+- Mitigation: keep wrapper compatibility until callers migrate.
+2. SQLModel vs SQLAlchemy ORM naming mismatch in current schema.
+- Mitigation: Phase 2 consumes existing schema as-is; DB table redesign deferred unless explicitly approved.
+3. Token estimation mismatch vs actual model tokenizer.
+- Mitigation: enforce conservative budget with safety buffer; keep estimator swappable.
+## Design Questions To Resolve Before Implementation
+1. File structure decision:
+- Should Phase 2 introduce new graph/ package now and keep env/graph.py compatibility wrapper, or refactor callers immediately?
+2. Schema alignment decision:
+- Keep current SQLModel-backed tables in Phase 2 and map to planned names later, or perform a schema migration now?
+3. REQUEST_CONTEXT strictness:
+- If full neighbor code cannot fit, should response be truncated (with marker) or should the action fail with explicit error and no code body?
+## Definition of Done for Phase 2
+1. graph/graph_manager.py, graph/token_budget.py, env/observation.py implemented with type hints and docstrings.
+2. observation_builder builds validated CodeObservation objects.
+3. All Phase 2 tests pass.
+4. Every generated observation satisfies the hard limit of at most 2000 tokens.
+5. Traversal order behavior matches leaf-first and high-centrality-last intent with deterministic ties.