Commit · cf05092 · 0 parent(s)
feat: initialize CodeReviewEnv with foundational components
- Add Dockerfile for containerized environment setup.
- Create README.md with quickstart instructions.
- Implement database package with migrations and schema definitions.
- Develop store module for database interactions and data management.
- Introduce parser module for AST parsing and code analysis.
- Establish environment and graph management for dependency tracking.
- Set up grading and task management placeholders for future phases.
- Include sample codebase and ground truth for testing and validation.
- Add tests for environment, parser, and graph functionalities.
- .gitignore +5 -0
- Builder.md +138 -0
- Debugger.md +100 -0
- Phases.md +295 -0
- code-review-env/Dockerfile +7 -0
- code-review-env/README.md +11 -0
- code-review-env/db/__init__.py +1 -0
- code-review-env/db/migrations.py +28 -0
- code-review-env/db/schema.py +91 -0
- code-review-env/db/store.py +384 -0
- code-review-env/env/__init__.py +1 -0
- code-review-env/env/environment.py +6 -0
- code-review-env/env/graph.py +105 -0
- code-review-env/env/models.py +1 -0
- code-review-env/env/observation_builder.py +1 -0
- code-review-env/env/reward.py +1 -0
- code-review-env/graders/__init__.py +1 -0
- code-review-env/graders/base_grader.py +5 -0
- code-review-env/graders/easy_grader.py +1 -0
- code-review-env/graders/hard_grader.py +1 -0
- code-review-env/graders/medium_grader.py +1 -0
- code-review-env/inference.py +4 -0
- code-review-env/openenv.yaml +3 -0
- code-review-env/parser/__init__.py +1 -0
- code-review-env/parser/ast_parser.py +189 -0
- code-review-env/parser/linter.py +104 -0
- code-review-env/parser/summarizer.py +24 -0
- code-review-env/pyproject.toml +13 -0
- code-review-env/requirements.txt +9 -0
- code-review-env/sample_codebase/auth.py +7 -0
- code-review-env/sample_codebase/cart.py +17 -0
- code-review-env/sample_codebase/checkout.py +15 -0
- code-review-env/sample_codebase/config.py +6 -0
- code-review-env/sample_codebase/ground_truth.json +39 -0
- code-review-env/sample_codebase/payments.py +15 -0
- code-review-env/server/__init__.py +1 -0
- code-review-env/server/app.py +1 -0
- code-review-env/tasks/__init__.py +1 -0
- code-review-env/tasks/easy_task.py +1 -0
- code-review-env/tasks/hard_task.py +1 -0
- code-review-env/tasks/medium_task.py +1 -0
- code-review-env/tasks/task_registry.py +1 -0
- code-review-env/tests/test_environment.py +21 -0
- code-review-env/tests/test_graders.py +2 -0
- code-review-env/tests/test_inference.py +2 -0
- code-review-env/tests/test_parser.py +13 -0
.gitignore
ADDED
|
@@ -0,0 +1,5 @@
|
| 1 |
+
.venv
|
| 2 |
+
.env
|
| 3 |
+
__pycache__/
|
| 4 |
+
*.pyc
|
| 5 |
+
code-review-env/code_review_env.db
|
Builder.md
ADDED
|
@@ -0,0 +1,138 @@
|
| 1 |
+
# Builder Prompt — CodeReviewEnv
|
| 2 |
+
|
| 3 |
+
You are an expert Python engineer building a reinforcement learning environment called **CodeReviewEnv** for the OpenEnv Hackathon Round 1. Read everything below before writing a single line of code.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## What You Are Building
|
| 8 |
+
|
| 9 |
+
An OpenEnv-compliant RL environment where an LLM agent learns to perform dependency-aware code review on a Python codebase.
|
| 10 |
+
|
| 11 |
+
The environment:
|
| 12 |
+
1. Parses a Python codebase into a **persistent dependency graph** stored in SQLite via SQLModel. Nodes = modules. Edges = import relationships.
|
| 13 |
+
2. Each node stores: full source code, compressed AST summary (~50 tokens), linter ground truth (pylint + bandit output), and agent-written review annotations.
|
| 14 |
+
3. The agent reviews one module per episode via a multi-step loop: `reset()` → `step(action)` × N → done.
|
| 15 |
+
4. The agent sees **full code of the current module only**. Neighbors are always compressed summaries — never full code. This is a hard constraint for token budget.
|
| 16 |
+
5. The agent can take actions: FLAG_BUG, FLAG_STYLE, FLAG_SECURITY, FLAG_DEPENDENCY_ISSUE, ADD_COMMENT, REQUEST_CHANGES, APPROVE, REQUEST_CONTEXT (costs -0.1 reward), AMEND_REVIEW (updates a neighbor's annotation retroactively).
|
| 17 |
+
6. Rewards are computed by graders against pre-computed ground truth stored in the DB.
|
| 18 |
+
7. The final output is an annotated dependency graph — all module reviews, cross-module causal attributions, readable as JSON and Markdown.
|
| 19 |
+
|
| 20 |
+
The key differentiator: the environment models **cascading bugs** — where a bug in module B is caused by a design decision in module A. The agent is rewarded for identifying the upstream root cause, not just flagging the surface symptom.
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## Persistence Strategy
|
| 25 |
+
|
| 26 |
+
**SQLite + SQLModel. This is non-negotiable for demo performance.**
|
| 27 |
+
|
| 28 |
+
- On first run: parse sample_codebase/ → populate DB with all nodes, edges, linter flags
|
| 29 |
+
- On subsequent runs: detect DB exists → skip parsing → load graph directly
|
| 30 |
+
- `reset()` clears only review annotations, never graph structure
|
| 31 |
+
- All episode history is stored for reproducibility
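A minimal sketch of the parse-once behaviour described in the bullets above. It uses the standard-library `sqlite3` module as a stand-in for SQLModel, and the names `open_graph_db`, `reset_annotations`, `parse_codebase`, and the `nodes` table layout are hypothetical, not taken from the project:

```python
import sqlite3
from pathlib import Path


def open_graph_db(db_path, parse_codebase):
    """Open the graph DB, parsing the codebase only when the file is new.

    `parse_codebase` is a hypothetical callable that returns
    (module_id, raw_code) pairs. Returns (connection, parsed), where
    `parsed` reports whether parsing ran on this call.
    """
    first_run = not Path(db_path).exists()
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nodes ("
        "module_id TEXT PRIMARY KEY, "
        "raw_code TEXT NOT NULL, "
        "review_annotation TEXT)"
    )
    if first_run:
        # First run: populate the DB from the parsed codebase.
        conn.executemany(
            "INSERT INTO nodes (module_id, raw_code) VALUES (?, ?)",
            parse_codebase(),
        )
        conn.commit()
    return conn, first_run


def reset_annotations(conn):
    """Clear agent-written annotations; nodes themselves are untouched."""
    conn.execute("UPDATE nodes SET review_annotation = NULL")
    conn.commit()
```

Checking for the file before connecting matters here because `sqlite3.connect` creates the file if it is missing, so an existence check made afterwards would always succeed.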
|
| 32 |
+
|
| 33 |
+
Use Context7 MCP to look up SQLModel, NetworkX, pylint programmatic API, bandit API, and OpenEnv spec documentation before implementing each component. Do not guess at APIs — look them up.
|
| 34 |
+
|
| 35 |
+
---
|
| 36 |
+
|
| 37 |
+
## Tech Stack
|
| 38 |
+
|
| 39 |
+
- Python 3.11
|
| 40 |
+
- SQLModel (SQLite persistence)
|
| 41 |
+
- NetworkX (graph construction and traversal)
|
| 42 |
+
- FastAPI (HTTP server for OpenEnv spec)
|
| 43 |
+
- Pydantic v2 (typed models)
|
| 44 |
+
- pylint + bandit (linter ground truth)
|
| 45 |
+
- Python `ast` module (AST parsing — stdlib, no extras)
|
| 46 |
+
- OpenAI client (all LLM calls in inference.py and hard grader)
|
| 47 |
+
- Docker (containerization)
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
## Project Structure
|
| 52 |
+
|
| 53 |
+
Follow this structure exactly — do not deviate:
|
| 54 |
+
|
| 55 |
+
```
|
| 56 |
+
code-review-env/
|
| 57 |
+
├── openenv.yaml
|
| 58 |
+
├── Dockerfile
|
| 59 |
+
├── README.md
|
| 60 |
+
├── inference.py
|
| 61 |
+
├── requirements.txt
|
| 62 |
+
├── env/
|
| 63 |
+
│ ├── environment.py
|
| 64 |
+
│ ├── models.py
|
| 65 |
+
│ ├── graph.py
|
| 66 |
+
│ ├── observation_builder.py
|
| 67 |
+
│ └── reward.py
|
| 68 |
+
├── db/
|
| 69 |
+
│ ├── schema.py
|
| 70 |
+
│ ├── store.py
|
| 71 |
+
│ └── migrations.py
|
| 72 |
+
├── parser/
|
| 73 |
+
│ ├── ast_parser.py
|
| 74 |
+
│ ├── linter.py
|
| 75 |
+
│ └── summarizer.py
|
| 76 |
+
├── graders/
|
| 77 |
+
│ ├── base_grader.py
|
| 78 |
+
│ ├── easy_grader.py
|
| 79 |
+
│ ├── medium_grader.py
|
| 80 |
+
│ └── hard_grader.py
|
| 81 |
+
├── tasks/
|
| 82 |
+
│ ├── task_registry.py
|
| 83 |
+
│ ├── easy_task.py
|
| 84 |
+
│ ├── medium_task.py
|
| 85 |
+
│ └── hard_task.py
|
| 86 |
+
├── server/
|
| 87 |
+
│ └── app.py
|
| 88 |
+
├── sample_codebase/
|
| 89 |
+
│ ├── auth.py
|
| 90 |
+
│ ├── checkout.py
|
| 91 |
+
│ ├── cart.py
|
| 92 |
+
│ ├── payments.py
|
| 93 |
+
│ ├── config.py
|
| 94 |
+
│ └── ground_truth.json
|
| 95 |
+
└── tests/
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
---
|
| 99 |
+
|
| 100 |
+
## Phase You Are Currently Building
|
| 101 |
+
|
| 102 |
+
**[INSERT PHASE NUMBER AND NAME HERE]**
|
| 103 |
+
|
| 104 |
+
Refer to the phase plan for exact tasks and completion criteria for this phase. Build only what is scoped to this phase. Do not build ahead.
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
|
| 108 |
+
## Non-Negotiable Constraints
|
| 109 |
+
|
| 110 |
+
1. All rewards must be clipped to 0.0–1.0. Never return outside this range.
|
| 111 |
+
2. Never feed full neighbor code into observations. Always use compressed summaries.
|
| 112 |
+
3. inference.py must use OpenAI client. Read API_BASE_URL, MODEL_NAME, HF_TOKEN from env vars.
|
| 113 |
+
4. inference.py must emit [START], [STEP], [END] log format exactly — no deviations.
|
| 114 |
+
5. Hard grader must use temperature=0 and a fixed rubric prompt stored as a constant.
|
| 115 |
+
6. DB must auto-populate on first Docker run without manual intervention.
|
| 116 |
+
7. All Pydantic models must be fully typed — no `Any`, no `dict` without a model.
|
| 117 |
+
8. Episode step limit is 10. Hard cap. Enforce in environment.py.
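Constraints 1 and 8 can be sketched together as follows. This is an illustration under stated assumptions: `EpisodeCounter`, `clip_reward`, and the action-type strings are hypothetical names, the terminal actions follow the phase plan (APPROVE or REQUEST_CHANGES ends the episode), and clipping is applied per step, which means a raw penalty such as the -0.1 for REQUEST_CONTEXT surfaces as 0.0:

```python
MAX_STEPS = 10  # hard cap from constraint 8
TERMINAL_ACTIONS = {"APPROVE", "REQUEST_CHANGES"}


def clip_reward(raw):
    """Clamp a raw score into the required 0.0-1.0 range."""
    return max(0.0, min(1.0, raw))


class EpisodeCounter:
    """Tracks the step count and decides when an episode is done."""

    def __init__(self):
        self.steps = 0
        self.done = False

    def advance(self, action_type, raw_reward):
        """Record one step and return (clipped_reward, done)."""
        if self.done:
            raise RuntimeError("episode already finished; call reset()")
        self.steps += 1
        self.done = action_type in TERMINAL_ACTIONS or self.steps >= MAX_STEPS
        return clip_reward(raw_reward), self.done
```

Whether penalties should be clipped per step or accumulated and clipped once at episode end is a design choice the prompt leaves open.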
|
| 118 |
+
|
| 119 |
+
---
|
| 120 |
+
|
| 121 |
+
## Before You Start Each File
|
| 122 |
+
|
| 123 |
+
1. Use Context7 MCP to look up the relevant library documentation
|
| 124 |
+
2. Check if the schema/interface you are about to implement has dependencies on already-built files — import them, don't reimplement
|
| 125 |
+
3. If you need to make a design choice not covered in this prompt (e.g. exact DB column types, traversal tie-breaking, summary format), **ask the user before proceeding**
|
| 126 |
+
4. Write tests alongside implementation — not after
|
| 127 |
+
|
| 128 |
+
---
|
| 129 |
+
|
| 130 |
+
## Questions To Ask The User Before Starting
|
| 131 |
+
|
| 132 |
+
If any of the following are unclear, ask before building:
|
| 133 |
+
|
| 134 |
+
- What Python codebase should be used as the demo target? (default: the sample_codebase/ provided)
|
| 135 |
+
- Should the hard grader use the same MODEL_NAME from env vars, or a fixed model?
|
| 136 |
+
- Should REQUEST_CONTEXT return the full raw code or the full AST + raw code?
|
| 137 |
+
- Should AMEND_REVIEW require the agent to specify what was wrong with the original review?
|
| 138 |
+
- What is the maximum number of neighbors to include in an observation? (recommend: 5, confirm)
|
Debugger.md
ADDED
|
@@ -0,0 +1,100 @@
|
| 1 |
+
# Debugger Prompt — CodeReviewEnv
|
| 2 |
+
|
| 3 |
+
You are an expert Python debugger working on **CodeReviewEnv**, an OpenEnv-compliant RL environment for the OpenEnv Hackathon. Your job is to diagnose and fix issues without breaking the architecture.
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## Project Summary
|
| 8 |
+
|
| 9 |
+
This is a reinforcement learning environment where an LLM agent reviews Python codebases using a persistent dependency graph. The graph is stored in SQLite via SQLModel. The RL loop uses OpenEnv's step()/reset()/state() spec. There are 3 tasks (easy/medium/hard) with deterministic graders. The inference script must run in under 20 minutes on 2 vCPU / 8GB RAM.
|
| 10 |
+
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
## Architecture Rules — Never Violate These When Fixing
|
| 14 |
+
|
| 15 |
+
1. **Persistence is SQLite/SQLModel** — do not switch to in-memory or another DB to fix a bug
|
| 16 |
+
2. **Neighbor observations are always compressed summaries** — never fix a context issue by passing full neighbor code
|
| 17 |
+
3. **Rewards must always be in 0.0–1.0** — if a reward bug exists, fix the computation, never remove the clip
|
| 18 |
+
4. **inference.py uses OpenAI client only** — do not swap to direct HTTP calls or another client
|
| 19 |
+
5. **[START]/[STEP]/[END] log format is fixed** — do not change field names or ordering to fix a logging bug
|
| 20 |
+
6. **Hard grader uses temperature=0 and fixed rubric** — do not relax this to fix flaky test failures
|
| 21 |
+
7. **Episode step limit is 10** — do not raise this to fix timeout issues; optimize the agent instead
|
| 22 |
+
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
## How To Approach Any Bug
|
| 26 |
+
|
| 27 |
+
### Step 1 — Locate
|
| 28 |
+
- Identify which layer the bug is in: parser → db → graph → observation_builder → environment → grader → server → inference
|
| 29 |
+
- Do not assume the bug is where the error surfaces — trace back to root cause
|
| 30 |
+
|
| 31 |
+
### Step 2 — Check Interfaces First
|
| 32 |
+
- Before changing implementation, verify the interface contract between the broken component and its dependencies
|
| 33 |
+
- Use Context7 MCP to re-check library APIs if the bug involves SQLModel, NetworkX, pylint, bandit, FastAPI, or OpenEnv
|
| 34 |
+
- Do not fix a bug by changing a shared interface without checking all callers
|
| 35 |
+
|
| 36 |
+
### Step 3 — Fix Minimally
|
| 37 |
+
- Make the smallest possible change that resolves the issue
|
| 38 |
+
- If the fix requires changing a DB schema, check whether a migration is needed and write it
|
| 39 |
+
- If the fix changes a Pydantic model, check all serialization/deserialization paths
|
| 40 |
+
|
| 41 |
+
### Step 4 — Verify
|
| 42 |
+
- After fixing, confirm the completion criteria for the relevant phase still pass
|
| 43 |
+
- Run the specific test for the broken component
|
| 44 |
+
- If inference.py is affected, do a dry run and confirm [START]/[STEP]/[END] logs emit correctly
|
| 45 |
+
|
| 46 |
+
---
|
| 47 |
+
|
| 48 |
+
## Common Failure Modes To Check First
|
| 49 |
+
|
| 50 |
+
### DB / Persistence
|
| 51 |
+
- DB not found on startup → check migrations.py auto-init logic
|
| 52 |
+
- Graph loads empty on second run → check upsert_node is committing correctly
|
| 53 |
+
- Annotations not persisting across reset() → check reset() only clears annotations, not nodes/edges
|
| 54 |
+
|
| 55 |
+
### Parser
|
| 56 |
+
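- AST parser crashes on type-annotated functions → check handling of ast.Constant vs ast.Str (since Python 3.8 the parser emits ast.Constant for literals and ast.Str is deprecated, so Python 3.11 code should match on ast.Constant)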
|
| 57 |
+
- Linter returns no output → check pylint/bandit are installed in the Docker image and PATH is correct
|
| 58 |
+
- Import resolution fails on relative imports → check the resolver handles both absolute and relative imports
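The two parser checks above can be illustrated with the standard-library `ast` module. This is a sketch, not the project's `ast_parser.py`: the function names are hypothetical, `from . import x` is assumed to name a sibling module, and annotated assignments (`ast.AnnAssign`) are not handled:

```python
import ast


def resolve_imports(source, module_name):
    """Return the dotted names a module imports, resolving relative imports.

    `module_name` is the dotted name of the module being parsed,
    for example "pkg.cart".
    """
    package_parts = module_name.split(".")[:-1]
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.level == 0:
                found.append(node.module)  # absolute import
                continue
            # level=1 is the current package, level=2 its parent, and so on.
            base = package_parts[: len(package_parts) - (node.level - 1)]
            if node.module:
                found.append(".".join(base + [node.module]))
            else:
                # "from . import x": assume x is a sibling module.
                found.extend(".".join(base + [a.name]) for a in node.names)
    return sorted(set(found))


def module_constants(source):
    """Collect UPPER_CASE module-level assignments whose value is a literal."""
    constants = {}
    for node in ast.parse(source).body:
        # Literals are ast.Constant nodes; ast.Str is deprecated.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name) and target.id.isupper():
                    constants[target.id] = node.value.value
    return constants
```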
|
| 59 |
+
|
| 60 |
+
### RL Environment
|
| 61 |
+
- Reward outside 0.0–1.0 → find the unclipped computation in reward.py
|
| 62 |
+
- done never becomes True → check step limit counter and REQUEST_CHANGES/APPROVE handling
|
| 63 |
+
- reset() returns wrong module → check task registry is loading the correct starting module
|
| 64 |
+
|
| 65 |
+
### Graders
|
| 66 |
+
- Easy grader always returns 0 → check linter_flags were populated in DB during parsing
|
| 67 |
+
- Hard grader is non-deterministic → confirm temperature=0 and seed param is being passed
|
| 68 |
+
- Grader crashes on empty annotation → add null check before scoring
|
| 69 |
+
|
| 70 |
+
### Server
|
| 71 |
+
- /health returns 404 → check route is registered in app.py
|
| 72 |
+
- /step rejects valid action → check discriminated union deserialization in Pydantic v2
|
| 73 |
+
- openenv validate fails → check openenv.yaml field names against spec exactly
|
| 74 |
+
|
| 75 |
+
### Inference Script
|
| 76 |
+
- Runs over 20 minutes → profile which task is slowest, reduce max steps or add timeout per episode
|
| 77 |
+
- LLM returns unparseable action → check JSON mode is enabled, add fallback to APPROVE
|
| 78 |
+
- Missing [STEP] logs → check log emit is inside the step loop, not outside
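For the unparseable-action case, a fallback parser might look like the sketch below. The `"type"` key and the dict-shaped action are assumptions made for illustration; the real action schema is whatever the Pydantic models define:

```python
import json

VALID_ACTIONS = {
    "APPROVE", "FLAG_STYLE", "FLAG_BUG", "FLAG_SECURITY",
    "FLAG_DEPENDENCY_ISSUE", "ADD_COMMENT", "REQUEST_CHANGES",
    "REQUEST_CONTEXT", "AMEND_REVIEW",
}
FALLBACK = {"type": "APPROVE"}


def parse_action(raw_text):
    """Parse model output into an action dict.

    Returns (action, parsed_ok). Any malformed or unknown output falls
    back to APPROVE so the episode can still terminate cleanly.
    """
    try:
        data = json.loads(raw_text)
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK), False
    if not isinstance(data, dict):
        return dict(FALLBACK), False
    action_type = data.get("type")
    if not isinstance(action_type, str) or action_type not in VALID_ACTIONS:
        return dict(FALLBACK), False
    return data, True
```

Returning the `parsed_ok` flag alongside the action lets the caller log or penalize fallbacks without guessing from the action alone.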
|
| 79 |
+
|
| 80 |
+
### Docker
|
| 81 |
+
- Build fails on pylint/bandit install → add gcc and build-essential to the apt-get install step in the Dockerfile
|
| 82 |
+
- DB not found inside container → check WORKDIR and DB path are consistent
|
| 83 |
+
- Port not exposed → confirm EXPOSE 7860 and uvicorn binds to 0.0.0.0
|
| 84 |
+
|
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
## When You Find An Ambiguity
|
| 88 |
+
|
| 89 |
+
If fixing the bug requires a design decision (e.g. "should reset() preserve REQUEST_CONTEXT history?"), **ask the user before implementing**. Do not make silent architectural decisions while debugging.
|
| 90 |
+
|
| 91 |
+
---
|
| 92 |
+
|
| 93 |
+
## Context To Always Include When Reporting A Fix
|
| 94 |
+
|
| 95 |
+
After fixing, always report:
|
| 96 |
+
- What the root cause was (one sentence)
|
| 97 |
+
- Which file(s) were changed
|
| 98 |
+
- Whether any DB schema changed (and if so, whether a migration was added)
|
| 99 |
+
- Whether any Pydantic model interface changed (and if so, which callers were updated)
|
| 100 |
+
- The specific test or check that now passes
|
Phases.md
ADDED
|
@@ -0,0 +1,295 @@
|
| 1 |
+
# CodeReviewEnv — Phased Build Plan
|
| 2 |
+
## For: LLM-Assisted Development
|
| 3 |
+
|
| 4 |
+
---
|
| 5 |
+
|
| 6 |
+
## 🧠 What You Are Building
|
| 7 |
+
|
| 8 |
+
An OpenEnv-compliant reinforcement learning environment where an LLM agent learns to perform **dependency-aware code review**.
|
| 9 |
+
|
| 10 |
+
The environment parses a Python codebase into a **persistent dependency graph** (nodes = modules, edges = import relationships). Each node stores compressed AST summaries, linter-generated ground truth issues, and agent-written review annotations.
|
| 11 |
+
|
| 12 |
+
The agent reviews one module per episode. It receives the **full code of the current module** plus **compressed AST summaries of its neighbors** (never full neighbor code — token budget). It takes multi-step actions (flag bugs, add comments, request context, amend upstream reviews). The environment rewards correct, well-attributed findings and penalizes false positives.
|
| 13 |
+
|
| 14 |
+
The final output is an **annotated dependency graph** — a machine-readable + human-readable map of the entire codebase with reviews on every module, including cross-module causal attributions.
|
| 15 |
+
|
| 16 |
+
This is differentiated from tools like CodeRabbit because:
|
| 17 |
+
- It models cascading dependency bugs (bug in B caused by design in A)
|
| 18 |
+
- Reviews are stored back into the graph and can be amended as the agent learns more
|
| 19 |
+
- It is an RL training/evaluation environment, not a static analysis tool
|
| 20 |
+
- The agent learns a policy over multi-step decisions, not a single LLM call
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## 🗂️ Persistence Strategy
|
| 25 |
+
|
| 26 |
+
**Use SQLite via SQLModel** for all persistent state. Do NOT reparse the codebase on every run. The database stores:
|
| 27 |
+
- Parsed module nodes (code, AST summary, linter flags)
|
| 28 |
+
- Graph edges (dependency relationships + reasons)
|
| 29 |
+
- Review annotations (written by agent, updatable)
|
| 30 |
+
- Episode history (for reproducibility)
|
| 31 |
+
- Task definitions and ground truth
|
| 32 |
+
|
| 33 |
+
On startup: check if DB exists → if yes, load graph from DB → if no, parse codebase and populate DB.
|
| 34 |
+
|
| 35 |
+
This makes demos fast (parse once, review many times) and makes `reset()` cheap (clear annotations only, keep graph structure).
|
| 36 |
+
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
## 📁 Target Project Structure
|
| 40 |
+
|
| 41 |
+
```
|
| 42 |
+
code-review-env/
|
| 43 |
+
├── openenv.yaml
|
| 44 |
+
├── Dockerfile
|
| 45 |
+
├── README.md
|
| 46 |
+
├── inference.py # Required by spec, root level
|
| 47 |
+
├── requirements.txt
|
| 48 |
+
├── pyproject.toml
|
| 49 |
+
│
|
| 50 |
+
├── env/
|
| 51 |
+
│ ├── __init__.py
|
| 52 |
+
│ ├── environment.py # Main CodeReviewEnv class
|
| 53 |
+
│ ├── models.py # Pydantic: Observation, Action, Reward, GraphState
|
| 54 |
+
│ ├── graph.py # Graph construction, traversal, compression
|
| 55 |
+
│ ├── observation_builder.py # Assembles tiered observation per step
|
| 56 |
+
│ └── reward.py # Reward computation logic
|
| 57 |
+
│
|
| 58 |
+
├── db/
|
| 59 |
+
│ ├── __init__.py
|
| 60 |
+
│ ├── schema.py # SQLModel table definitions
|
| 61 |
+
│ ├── store.py # DB read/write operations
|
| 62 |
+
│ └── migrations.py # Init and seed scripts
|
| 63 |
+
│
|
| 64 |
+
├── parser/
|
| 65 |
+
│ ├── __init__.py
|
| 66 |
+
│ ├── ast_parser.py # AST extraction: signatures, imports, classes
|
| 67 |
+
│ ├── linter.py # Pylint + Bandit runner, stores results to DB
|
| 68 |
+
│ └── summarizer.py # Converts AST output → compressed node summary
|
| 69 |
+
│
|
| 70 |
+
├── graders/
|
| 71 |
+
│ ├── __init__.py
|
| 72 |
+
│ ├── base_grader.py # Abstract grader interface
|
| 73 |
+
│ ├── easy_grader.py # Linter match — fully deterministic
|
| 74 |
+
│ ├── medium_grader.py # AST + line attribution match
|
| 75 |
+
│ └── hard_grader.py # LLM-as-judge, temp=0, seed=42, rubric-constrained
|
| 76 |
+
│
|
| 77 |
+
├── tasks/
|
| 78 |
+
│ ├── __init__.py
|
| 79 |
+
│ ├── task_registry.py # Registers and loads tasks
|
| 80 |
+
│ ├── easy_task.py # Style/linter issue in isolated module
|
| 81 |
+
│ ├── medium_task.py # Logic bug with direct dependency context
|
| 82 |
+
│ └── hard_task.py # Cascading bug across 2+ modules
|
| 83 |
+
│
|
| 84 |
+
├── server/
|
| 85 |
+
│ ├── __init__.py
|
| 86 |
+
│ └── app.py # FastAPI server exposing OpenEnv HTTP endpoints
|
| 87 |
+
│
|
| 88 |
+
├── sample_codebase/ # Synthetic test codebase for demo
|
| 89 |
+
│ ├── auth.py
|
| 90 |
+
│ ├── checkout.py
|
| 91 |
+
│ ├── cart.py
|
| 92 |
+
│ ├── payments.py
|
| 93 |
+
│ └── config.py
|
| 94 |
+
│
|
| 95 |
+
└── tests/
|
| 96 |
+
├── test_parser.py
|
| 97 |
+
├── test_graders.py
|
| 98 |
+
├── test_environment.py
|
| 99 |
+
└── test_inference.py
|
| 100 |
+
```
|
| 101 |
+
|
| 102 |
+
---
|
| 103 |
+
|
| 104 |
+
## 📐 Core Data Models (Design Intent — Implementation Is Your Choice)
|
| 105 |
+
|
| 106 |
+
### Graph Node
|
| 107 |
+
Stores everything about one module. Persisted in DB.
|
| 108 |
+
- module_id (filename/path)
|
| 109 |
+
- raw_code (full source)
|
| 110 |
+
- ast_summary (compressed: signatures, classes, exports)
|
| 111 |
+
- linter_flags (pre-computed ground truth from pylint/bandit)
|
| 112 |
+
- dependency_reason (why this module needs its neighbors — extracted from import context)
|
| 113 |
+
- review_annotation (agent-written, nullable, updatable)
|
| 114 |
+
- review_status (pending | in_progress | reviewed)
|
| 115 |
+
- review_summary (one-line, written at episode end)
|
| 116 |
+
|
| 117 |
+
### Graph Edge
|
| 118 |
+
- source_module_id
|
| 119 |
+
- target_module_id
|
| 120 |
+
- edge_type (explicit_import | implicit_name_resolution)
|
| 121 |
+
- import_line (the actual import statement)
|
| 122 |
+
- weight (1.0 explicit, 0.5 implicit)
|
| 123 |
+
|
| 124 |
+
### Observation (Pydantic)
|
| 125 |
+
- current_module: full code + full AST summary
|
| 126 |
+
- direct_dependencies: list of compressed node summaries (NOT full code)
|
| 127 |
+
- dependents: list of compressed node summaries
|
| 128 |
+
- existing_reviews: list of one-line review summaries from already-reviewed neighbors
|
| 129 |
+
- constraint_flags: any known forced decisions from upstream
|
| 130 |
+
- step_number: int
|
| 131 |
+
- episode_id: str
|
| 132 |
+
|
| 133 |
+
### Action (Pydantic, discriminated union)
|
| 134 |
+
- APPROVE
|
| 135 |
+
- FLAG_STYLE(line: int, description: str)
|
| 136 |
+
- FLAG_BUG(line: int, description: str)
|
| 137 |
+
- FLAG_SECURITY(line: int, description: str)
|
| 138 |
+
- FLAG_DEPENDENCY_ISSUE(source_module: str, description: str)
|
| 139 |
+
- ADD_COMMENT(text: str)
|
| 140 |
+
- REQUEST_CHANGES(summary: str)
|
| 141 |
+
- REQUEST_CONTEXT(module_id: str) ← costs -0.1 reward, returns full code of neighbor
|
| 142 |
+
- AMEND_REVIEW(module_id: str, note: str) ← retroactively updates neighbor annotation
|
| 143 |
+
|
| 144 |
+
### Reward (Pydantic)
|
| 145 |
+
- value: float (0.0–1.0)
|
| 146 |
+
- reason: str
|
| 147 |
+
- cumulative: float
|
| 148 |
+
|
| 149 |
+
---
|
| 150 |
+
|
| 151 |
+
## 🏗️ PHASE 1 — Foundation & Persistence
|
| 152 |
+
**Goal: Database schema, parser, graph construction. No RL yet.**
|
| 153 |
+
|
| 154 |
+
### Tasks
|
| 155 |
+
1. Define SQLModel schema for all tables (nodes, edges, annotations, episodes, tasks)
|
| 156 |
+
2. Build `ast_parser.py` — extract from any .py file: all function signatures with type hints, all class definitions, all import statements with source resolution, all module-level constants
|
| 157 |
+
3. Build `linter.py` — run pylint and bandit programmatically on a file, parse output into structured list of {line, severity, code, message}. Store results directly to DB as ground truth.
|
| 158 |
+
4. Build `summarizer.py` — convert AST output into a compressed summary string under 100 tokens. Format: "exports: [fn(args)->return, ...] | issues: N | depends_on: [module, ...]"
|
| 159 |
+
5. Build `store.py` — CRUD operations for all tables. Key operations: upsert_node, upsert_edge, get_node_with_neighbors, update_annotation, get_full_graph
|
| 160 |
+
6. Build `graph.py` — on first run: parse all files in target directory → populate DB. On subsequent runs: load from DB. Build NetworkX DiGraph from DB records. Implement traversal order: topological sort weighted by betweenness centrality (leaf modules first, high-centrality modules last).
|
| 161 |
+
7. Build `sample_codebase/` — 5 Python files with known injected issues: one style issue, one logic bug with a direct dependency cause, one security issue, one cascading bug where the root cause is 2 hops away. Document every injected issue in a ground_truth.json file.
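The summary format named in task 4 can be sketched with the standard-library `ast` module. The function name `summarize_module` and the `issue_count` parameter are hypothetical, only top-level definitions are listed, and the sketch does not enforce the token limit:

```python
import ast


def summarize_module(source, issue_count):
    """Build a one-line summary: exports | issues | depends_on."""
    exports, deps = [], []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = ", ".join(a.arg for a in node.args.args)
            # "?" marks a missing return annotation.
            returns = ast.unparse(node.returns) if node.returns else "?"
            exports.append(f"{node.name}({args})->{returns}")
        elif isinstance(node, ast.ClassDef):
            exports.append(f"class {node.name}")
        elif isinstance(node, ast.Import):
            deps.extend(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.append(node.module.split(".")[0])
    unique_deps = sorted(set(deps))
    return (
        f"exports: [{', '.join(exports)}] | issues: {issue_count} "
        f"| depends_on: [{', '.join(unique_deps)}]"
    )
```

A real summarizer would also need a truncation rule for modules with many exports, since nothing here bounds the output length.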
|
| 162 |
+
|
| 163 |
+
### Completion Criteria
|
| 164 |
+
- `python -m parser.ast_parser sample_codebase/` populates DB with all nodes and edges
|
| 165 |
+
- DB persists across runs (second run loads from DB, does not reparse)
|
| 166 |
+
- `python -m db.store` can query a node and return its summary and neighbors
|
| 167 |
+
- ground_truth.json matches linter output for easy/medium tasks
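The traversal order from task 6 can be partly illustrated with the standard library. This sketch covers only the topological part (modules with no dependencies first) and uses `graphlib` in place of NetworkX; the betweenness-centrality weighting is omitted, and `review_order` is a hypothetical name:

```python
from graphlib import TopologicalSorter


def review_order(edges):
    """Order modules so each one comes after the modules it imports.

    `edges` is an iterable of (source_module_id, target_module_id) pairs
    in which the source imports the target. A circular import raises
    graphlib.CycleError.
    """
    sorter = TopologicalSorter()
    for source, target in edges:
        # The imported module must be reviewed before its importer.
        sorter.add(source, target)
    return list(sorter.static_order())
```

The relative order of modules that do not depend on each other is not fixed by this approach, which is where a centrality-based tie-break would come in.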
|
| 168 |
+
|
| 169 |
+
---
|
| 170 |
+
|
| 171 |
+
## 🏗️ PHASE 2 — OpenEnv Core (RL Environment)
|
| 172 |
+
**Goal: Full step()/reset()/state() loop with reward. This is the RL part.**
|
| 173 |
+
|
| 174 |
+
### Tasks
|
| 175 |
+
1. Build `models.py` — all Pydantic models: Observation, Action (discriminated union), Reward, GraphState, EpisodeRecord. Must be fully typed.
|
| 176 |
+
2. Build `observation_builder.py` — given a module_id and current graph state, assemble the tiered observation: full code for current module, compressed summaries for neighbors (pulled from DB), existing review annotations for already-reviewed neighbors, constraint flags
|
| 177 |
+
3. Build `reward.py` — implement reward logic:
|
| 178 |
+
- Easy: compare agent flags against linter ground truth. Correct flag = +0.5, false positive = -0.2, missed critical = -0.4
|
| 179 |
+
- Medium: a flag whose line number falls within ±3 lines of ground truth = +0.5, correct comment attribution = +0.3
|
| 180 |
+
- Hard: call hard_grader with agent's FLAG_DEPENDENCY_ISSUE and the known root cause. Score returned by judge × 0.8 as reward.
|
| 181 |
+
- REQUEST_CONTEXT action always costs -0.1 (thinking cost)
|
| 182 |
+
- AMEND_REVIEW with correct attribution = +0.4 (high reward — this is the key cascading behavior)
|
| 183 |
+
- Episode completion bonus: +0.2 if all critical issues found, -0.1 if APPROVE on module with known critical bugs
|
| 184 |
+
4. Build `graders/` — implement all three graders per spec above. Hard grader must use OpenAI client (per competition spec), temperature=0, fixed rubric prompt stored as a constant.
|
| 185 |
+
5. Build `environment.py` — main class implementing full OpenEnv interface:
|
| 186 |
+
- `reset(task_id)` → clears annotations for task modules, returns first observation
|
| 187 |
+
- `step(action)` → validates action, updates graph annotations in DB, computes reward, returns (obs, reward, done, info)
|
| 188 |
+
- `state()` → returns full GraphState (serialized NetworkX graph + all annotations)
|
| 189 |
+
- Episode ends when: agent calls APPROVE or REQUEST_CHANGES, OR step limit reached (max 10 steps)
|
| 190 |
+
6. Build `tasks/` — register 3 tasks pointing to specific modules in sample_codebase with known ground truth issues
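The easy and medium reward rules from task 3 might be computed along these lines. This is a sketch under assumptions the plan does not settle: flags are reduced to line numbers, missed non-critical issues carry no penalty, scores are summed per episode, and clipping happens once at the end. The function names are hypothetical:

```python
def score_easy(agent_flags, truth):
    """Score flagged lines against linter ground truth.

    `agent_flags` is a set of flagged line numbers.
    `truth` maps each ground-truth line number to True if it is critical.
    """
    raw = 0.0
    for line in agent_flags:
        # Correct flag = +0.5, false positive = -0.2.
        raw += 0.5 if line in truth else -0.2
    for line, critical in truth.items():
        if critical and line not in agent_flags:
            raw -= 0.4  # missed critical issue
    return max(0.0, min(1.0, raw))


def score_medium(flag_line, truth_line, attribution_correct):
    """+0.5 for a flag within 3 lines of the truth, +0.3 for attribution."""
    raw = 0.5 if abs(flag_line - truth_line) <= 3 else 0.0
    if attribution_correct:
        raw += 0.3
    return max(0.0, min(1.0, raw))
```

Because the result is clipped to 0.0–1.0, an episode with only false positives and an episode with no flags at all both score 0.0; distinguishing them would require reporting the raw score separately.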
|
| 191 |
+
|
| 192 |
+
### Completion Criteria
|
| 193 |
+
- `env.reset("easy_task")` returns a valid typed Observation
|
| 194 |
+
- `env.step(FLAG_BUG(line=12, description="null risk"))` returns reward > 0 for correct flag
|
| 195 |
+
- `env.state()` returns serializable graph with annotations
|
| 196 |
+
- Full episode runs without error on all 3 tasks
|
| 197 |
+
- Reward values all fall in 0.0–1.0 range
|
| 198 |
+
|
---

## 🏗️ PHASE 3 — HTTP Server & OpenEnv Spec Compliance
**Goal: Wrap environment in FastAPI, pass openenv validate.**

### Tasks
1. Build `server/app.py` — FastAPI app exposing:
   - POST /reset → calls env.reset(), returns Observation JSON
   - POST /step → calls env.step(action), returns (obs, reward, done, info) JSON
   - GET /state → calls env.state(), returns GraphState JSON
   - GET /health → returns 200 (required for HF Space ping)
2. Build `openenv.yaml` — fill all required metadata: name, version, description, tasks list, observation_space, action_space, reward_range
3. Run `openenv validate` — fix all compliance errors
4. Confirm all Pydantic models serialize/deserialize correctly over HTTP

### Completion Criteria
- `openenv validate` passes with no errors
- All endpoints return correct typed responses
- GET /health returns 200

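The serialize/deserialize check in task 4 amounts to a JSON round trip on the step response. The sketch below uses standard-library dataclasses as a stand-in for the planned Pydantic models; the `StepResponse` name and its field names are assumptions, not the project's actual schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class StepResponse:
    """Hypothetical shape of the POST /step payload: (obs, reward, done, info)."""

    observation: dict
    reward: float
    done: bool
    info: dict = field(default_factory=dict)


def encode(response: StepResponse) -> str:
    """Serialize the response to the JSON text that would go over HTTP."""
    return json.dumps(asdict(response))


def decode(payload: str) -> StepResponse:
    """Rebuild the typed response from JSON text received by a client."""
    return StepResponse(**json.loads(payload))
```

A round trip that returns an equal object is a cheap regression test to run before wiring the models into the server.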
---

## 🏗️ PHASE 4 — Inference Script
**Goal: Build inference.py that runs Gemma 4 as the agent. This is what judges auto-run.**

### Critical Requirements (Non-Negotiable)
- File must be named `inference.py` at root
- Use OpenAI client for all LLM calls
- Read API_BASE_URL, MODEL_NAME, HF_TOKEN from environment variables
- Emit structured stdout logs in EXACTLY this format:
  ```
  [START] task=<task_id> episode=<n>
  [STEP] step=<n> action=<action_type> reward=<float> cumulative=<float>
  [END] task=<task_id> total_reward=<float> steps=<n>
  ```
- Must complete all 3 tasks in under 20 minutes total
- Must run on 2 vCPU / 8GB RAM

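One way to keep the three log lines consistent is to build them in small helper functions. This is a minimal sketch: the helper names are hypothetical, and the two-decimal float formatting is an assumption, since the format above only specifies `<float>`.

```python
def start_line(task_id: str, episode: int) -> str:
    """Build the [START] line for one episode."""
    return f"[START] task={task_id} episode={episode}"


def step_line(step: int, action_type: str, reward: float, cumulative: float) -> str:
    """Build the [STEP] line; two decimals is an assumed float format."""
    return (
        f"[STEP] step={step} action={action_type} "
        f"reward={reward:.2f} cumulative={cumulative:.2f}"
    )


def end_line(task_id: str, total_reward: float, steps: int) -> str:
    """Build the [END] line summarizing the episode."""
    return f"[END] task={task_id} total_reward={total_reward:.2f} steps={steps}"
```

Printing only through these helpers makes it harder for an ad hoc `print` call to drift from the required format.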
### Tasks
1. Build the agent loop — for each task: reset env, loop step() until done, collect rewards
2. Build the LLM action parser — send observation to model with a structured prompt, parse response into typed Action. Use JSON mode or structured output. Handle parse failures gracefully (default to APPROVE with penalty).
3. Build the action prompt — system prompt explaining the environment, action space, and output format. Include the compressed observation in user message. Tell model to output JSON action only.
4. Implement all 3 task runs sequentially
5. Emit all required log lines to stdout
6. Final output: baseline scores for all 3 tasks printed to stdout

### Completion Criteria
- Script runs end to end without error
- All [START]/[STEP]/[END] logs emitted correctly
- Produces a score for each task between 0.0–1.0
- Completes in under 20 minutes

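The parse-failure handling in task 2 could look roughly like the sketch below. The dict-based action, the `type` key, and the `parse_failed` flag are assumptions for illustration; only the action names FLAG_BUG, APPROVE, and REQUEST_CHANGES come from the plan, and the full action space may be larger.

```python
import json

# Assumed subset of the action space; extend once the typed Action model exists.
VALID_ACTIONS = {"FLAG_BUG", "APPROVE", "REQUEST_CHANGES"}
FALLBACK = {"type": "APPROVE", "parse_failed": True}


def parse_action(raw: str) -> dict:
    """Parse a model reply into an action dict, defaulting to APPROVE on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)
    # Reject valid JSON that is not an object or names an unknown action.
    if not isinstance(data, dict) or data.get("type") not in VALID_ACTIONS:
        return dict(FALLBACK)
    return {**data, "parse_failed": False}
```

The `parse_failed` flag gives the caller a hook for applying the penalty mentioned in task 2 without the parser needing to know about rewards.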
---

## 🏗️ PHASE 5 — Containerization & Deployment
**Goal: Docker build works, HF Space deploys, pre-validation script passes.**

### Tasks
1. Write `Dockerfile`:
   - Base: python:3.11-slim
   - Install system deps for pylint, bandit, networkx
   - Copy project, install requirements
   - On container start: run parser to populate DB if not exists, then start FastAPI server
   - Expose port 7860 (HF Spaces default)
2. Write `README.md` with all required sections: environment description and motivation, observation and action space definitions, all 3 task descriptions with difficulty, setup instructions, baseline scores
3. Run pre-submission validation script — fix all failures
4. Deploy to HF Space with `openenv push`
5. Confirm Space URL returns 200 on GET /health and responds to POST /reset

### Completion Criteria
- `docker build .` succeeds
- `docker run -p 7860:7860` starts server cleanly
- HF Space URL responds to reset()
- Pre-validation script passes all checks

---

## ⏱️ Suggested Time Allocation (Given ~36hrs remaining)

| Phase | Time |
|---|---|
| Phase 1 — Foundation | 6 hrs |
| Phase 2 — RL Environment | 8 hrs |
| Phase 3 — Server + Spec | 3 hrs |
| Phase 4 — Inference Script | 4 hrs |
| Phase 5 — Docker + Deploy | 3 hrs |
| Buffer / debugging | 4 hrs |

---

## ⚠️ Known Risk Areas (Watch These)

1. **Hard grader reproducibility** — document judge prompt and seed explicitly
2. **DB migration on fresh Docker build** — first run must auto-populate DB from sample_codebase
3. **Inference script runtime** — test full 3-task run locally before submitting, must be under 20 min
4. **openenv validate strictness** — run it early in Phase 3, not at the end
5. **Reward always in 0.0–1.0** — clip all reward values, graders must never return outside range
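Risk 5 can be handled with a single clipping helper that every grader calls before returning. The function name and the choice to map NaN to 0.0 are assumptions for this sketch, not something the plan specifies.

```python
import math


def clip_reward(value: float) -> float:
    """Clamp a raw grader score into the 0.0-1.0 range."""
    # NaN compares false against everything, so handle it explicitly.
    if math.isnan(value):
        return 0.0
    return max(0.0, min(1.0, value))
```

Applying the clip at the grader boundary, rather than inside each reward term, keeps the guarantee in one place.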
code-review-env/Dockerfile
ADDED
@@ -0,0 +1,7 @@
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt /app/
RUN pip install --no-cache-dir -r requirements.txt
COPY . /app
CMD ["python", "-m", "parser.ast_parser", "sample_codebase/"]
code-review-env/README.md
ADDED
@@ -0,0 +1,11 @@
# CodeReviewEnv

Phase 1 foundation for dependency-aware code review environment.

## Quickstart

```bash
pip install -r requirements.txt
python -m parser.ast_parser sample_codebase/
python -m db.store --module checkout
```
code-review-env/db/__init__.py
ADDED
@@ -0,0 +1 @@
"""Database package for CodeReviewEnv."""
code-review-env/db/migrations.py
ADDED
@@ -0,0 +1,28 @@
from __future__ import annotations

from pathlib import Path

from sqlmodel import SQLModel, create_engine


def get_default_db_path() -> Path:
    project_root = Path(__file__).resolve().parents[1]
    return project_root / "code_review_env.db"


def get_engine(db_path: str | Path | None = None, echo: bool = False):
    path = Path(db_path) if db_path else get_default_db_path()
    path.parent.mkdir(parents=True, exist_ok=True)
    return create_engine(f"sqlite:///{path}", echo=echo)


def init_db(db_path: str | Path | None = None, echo: bool = False) -> None:
    from db import schema  # noqa: F401

    engine = get_engine(db_path=db_path, echo=echo)
    SQLModel.metadata.create_all(engine)


if __name__ == "__main__":
    init_db()
    print("Database initialized")
code-review-env/db/schema.py
ADDED
@@ -0,0 +1,91 @@
from __future__ import annotations

from datetime import UTC, datetime
from enum import StrEnum
from typing import Optional

from sqlmodel import Field, SQLModel


class EdgeType(StrEnum):
    EXPLICIT_IMPORT = "explicit_import"
    IMPLICIT_NAME_RESOLUTION = "implicit_name_resolution"


class ReviewStatus(StrEnum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    REVIEWED = "reviewed"


class Severity(StrEnum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class ModuleNode(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    source_root: str = Field(index=True)
    module_id: str = Field(index=True)
    raw_code: str
    ast_summary: str
    dependency_reason: str = ""
    review_annotation: Optional[str] = None
    review_status: ReviewStatus = Field(default=ReviewStatus.PENDING)
    review_summary: Optional[str] = None
    created_at: datetime = Field(default_factory=lambda: datetime.now(UTC))
    updated_at: datetime = Field(default_factory=lambda: datetime.now(UTC))


class ModuleEdge(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    source_root: str = Field(index=True)
    source_module_id: str = Field(index=True)
    target_module_id: str = Field(index=True)
    edge_type: EdgeType = Field(default=EdgeType.EXPLICIT_IMPORT)
    import_line: str
    weight: float = 1.0


class LinterFinding(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    source_root: str = Field(index=True)
    module_id: str = Field(index=True)
    tool: str = Field(index=True)
    line: int
    severity: Severity
    code: str
    message: str


class ReviewAnnotation(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    source_root: str = Field(index=True)
    module_id: str = Field(index=True)
    episode_id: str = Field(index=True)
    step_number: int
    action_type: str
    note: str
    created_at: datetime = Field(default_factory=lambda: datetime.now(UTC))


class EpisodeRecord(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    source_root: str = Field(index=True)
    episode_id: str = Field(index=True)
    task_id: str = Field(index=True)
    module_id: str = Field(index=True)
    total_steps: int
    cumulative_reward: float = 0.0
    created_at: datetime = Field(default_factory=lambda: datetime.now(UTC))


class TaskDefinition(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    source_root: str = Field(index=True)
    task_id: str = Field(index=True)
    task_level: str = Field(index=True)
    target_module_id: str = Field(index=True)
    description: str
    ground_truth_ref: str
code-review-env/db/store.py
ADDED
@@ -0,0 +1,384 @@
from __future__ import annotations

import argparse
from dataclasses import dataclass
from datetime import UTC, datetime
from pathlib import Path
from typing import Iterator, Optional

from pydantic import BaseModel
from sqlmodel import Session, delete, select

from db.migrations import get_default_db_path, get_engine, init_db
from db.schema import (
    EdgeType,
    LinterFinding,
    ModuleEdge,
    ModuleNode,
    ReviewAnnotation,
    ReviewStatus,
    Severity,
)


@dataclass
class DBConfig:
    source_root: str
    db_path: Path


class NeighborSummary(BaseModel):
    module_id: str
    ast_summary: str
    review_summary: Optional[str]


class NodeWithNeighbors(BaseModel):
    module_id: str
    ast_summary: str
    review_status: ReviewStatus
    neighbors: list[NeighborSummary]


class GraphNodeRecord(BaseModel):
    module_id: str
    ast_summary: str
    review_status: ReviewStatus


class GraphEdgeRecord(BaseModel):
    source_module_id: str
    target_module_id: str
    weight: float
    import_line: str


class GraphSnapshot(BaseModel):
    nodes: list[GraphNodeRecord]
    edges: list[GraphEdgeRecord]


class Store:
    def __init__(self, source_root: str, db_path: str | Path | None = None) -> None:
        self.config = DBConfig(
            source_root=str(Path(source_root).resolve()),
            db_path=Path(db_path) if db_path else get_default_db_path(),
        )
        init_db(db_path=self.config.db_path)
        self.engine = get_engine(self.config.db_path)

    def session(self) -> Iterator[Session]:  # plain generator, not a context manager
        with Session(self.engine) as session:
            yield session

    def upsert_node(
        self,
        module_id: str,
        raw_code: str,
        ast_summary: str,
        dependency_reason: str,
    ) -> ModuleNode:
        with Session(self.engine) as session:
            existing = session.exec(
                select(ModuleNode).where(
                    ModuleNode.source_root == self.config.source_root,
                    ModuleNode.module_id == module_id,
                )
            ).first()
            if existing:
                existing.raw_code = raw_code
                existing.ast_summary = ast_summary
                existing.dependency_reason = dependency_reason
                existing.updated_at = datetime.now(UTC)
                session.add(existing)
                session.commit()
                session.refresh(existing)
                return existing

            node = ModuleNode(
                source_root=self.config.source_root,
                module_id=module_id,
                raw_code=raw_code,
                ast_summary=ast_summary,
                dependency_reason=dependency_reason,
            )
            session.add(node)
            session.commit()
            session.refresh(node)
            return node

    def upsert_edge(
        self,
        source_module_id: str,
        target_module_id: str,
        edge_type: EdgeType,
        import_line: str,
        weight: float,
    ) -> ModuleEdge:
        with Session(self.engine) as session:
            existing = session.exec(
                select(ModuleEdge).where(
                    ModuleEdge.source_root == self.config.source_root,
                    ModuleEdge.source_module_id == source_module_id,
                    ModuleEdge.target_module_id == target_module_id,
                    ModuleEdge.import_line == import_line,
                )
            ).first()
            if existing:
                existing.edge_type = edge_type
                existing.weight = weight
                session.add(existing)
                session.commit()
                session.refresh(existing)
                return existing

            edge = ModuleEdge(
                source_root=self.config.source_root,
                source_module_id=source_module_id,
                target_module_id=target_module_id,
                edge_type=edge_type,
                import_line=import_line,
                weight=weight,
            )
            session.add(edge)
            session.commit()
            session.refresh(edge)
            return edge

    def replace_findings_for_module(self, module_id: str, findings: list[dict[str, str | int]]) -> None:
        with Session(self.engine) as session:
            session.exec(
                delete(LinterFinding).where(
                    LinterFinding.source_root == self.config.source_root,
                    LinterFinding.module_id == module_id,
                )
            )
            for finding in findings:
                session.add(
                    LinterFinding(
                        source_root=self.config.source_root,
                        module_id=module_id,
                        tool=str(finding["tool"]),
                        line=int(finding["line"]),
                        severity=Severity(str(finding["severity"])),
                        code=str(finding["code"]),
                        message=str(finding["message"]),
                    )
                )
            session.commit()

    def get_findings(self, module_id: str) -> list[LinterFinding]:
        with Session(self.engine) as session:
            return list(
                session.exec(
                    select(LinterFinding).where(
                        LinterFinding.source_root == self.config.source_root,
                        LinterFinding.module_id == module_id,
                    )
                ).all()
            )

    def get_node(self, module_id: str) -> Optional[ModuleNode]:
        with Session(self.engine) as session:
            return session.exec(
                select(ModuleNode).where(
                    ModuleNode.source_root == self.config.source_root,
                    ModuleNode.module_id == module_id,
                )
            ).first()

    def get_node_with_neighbors(self, module_id: str) -> Optional[NodeWithNeighbors]:
        with Session(self.engine) as session:
            node = session.exec(
                select(ModuleNode).where(
                    ModuleNode.source_root == self.config.source_root,
                    ModuleNode.module_id == module_id,
                )
            ).first()
            if not node:
                return None

            outgoing = list(
                session.exec(
                    select(ModuleEdge).where(
                        ModuleEdge.source_root == self.config.source_root,
                        ModuleEdge.source_module_id == module_id,
                    )
                ).all()
            )
            incoming = list(
                session.exec(
                    select(ModuleEdge).where(
                        ModuleEdge.source_root == self.config.source_root,
                        ModuleEdge.target_module_id == module_id,
                    )
                ).all()
            )

            neighbor_ids = {edge.target_module_id for edge in outgoing}
            neighbor_ids.update(edge.source_module_id for edge in incoming)

            neighbors: list[NeighborSummary] = []
            for neighbor_id in sorted(neighbor_ids):
                neighbor = session.exec(
                    select(ModuleNode).where(
                        ModuleNode.source_root == self.config.source_root,
                        ModuleNode.module_id == neighbor_id,
                    )
                ).first()
                if neighbor:
                    neighbors.append(
                        NeighborSummary(
                            module_id=neighbor.module_id,
                            ast_summary=neighbor.ast_summary,
                            review_summary=neighbor.review_summary,
                        )
                    )

            return NodeWithNeighbors(
                module_id=node.module_id,
                ast_summary=node.ast_summary,
                review_status=node.review_status,
                neighbors=neighbors,
            )

    def update_annotation(
        self,
        module_id: str,
        episode_id: str,
        step_number: int,
        action_type: str,
        note: str,
        review_summary: str | None = None,
        review_status: ReviewStatus | None = None,
    ) -> None:
        with Session(self.engine) as session:
            node = session.exec(
                select(ModuleNode).where(
                    ModuleNode.source_root == self.config.source_root,
                    ModuleNode.module_id == module_id,
                )
            ).first()
            if not node:
                raise ValueError(f"Unknown module: {module_id}")

            node.review_annotation = note
            if review_summary is not None:
                node.review_summary = review_summary
            if review_status is not None:
                node.review_status = review_status
            node.updated_at = datetime.now(UTC)

            session.add(node)
            session.add(
                ReviewAnnotation(
                    source_root=self.config.source_root,
                    module_id=module_id,
                    episode_id=episode_id,
                    step_number=step_number,
                    action_type=action_type,
                    note=note,
                )
            )
            session.commit()

    def get_full_graph(self) -> GraphSnapshot:
        with Session(self.engine) as session:
            nodes = list(
                session.exec(
                    select(ModuleNode).where(ModuleNode.source_root == self.config.source_root)
                ).all()
            )
            edges = list(
                session.exec(
                    select(ModuleEdge).where(ModuleEdge.source_root == self.config.source_root)
                ).all()
            )

            return GraphSnapshot(
                nodes=[
                    GraphNodeRecord(
                        module_id=node.module_id,
                        ast_summary=node.ast_summary,
                        review_status=node.review_status,
                    )
                    for node in nodes
                ],
                edges=[
                    GraphEdgeRecord(
                        source_module_id=edge.source_module_id,
                        target_module_id=edge.target_module_id,
                        weight=edge.weight,
                        import_line=edge.import_line,
                    )
                    for edge in edges
                ],
            )

    def has_nodes(self) -> bool:
        with Session(self.engine) as session:
            first_node = session.exec(
                select(ModuleNode.id).where(ModuleNode.source_root == self.config.source_root)
            ).first()
            return first_node is not None

    def clear_source_graph(self) -> None:
        with Session(self.engine) as session:
            session.exec(
                delete(ReviewAnnotation).where(
                    ReviewAnnotation.source_root == self.config.source_root
                )
            )
            session.exec(
                delete(LinterFinding).where(
                    LinterFinding.source_root == self.config.source_root
                )
            )
            session.exec(
                delete(ModuleEdge).where(
                    ModuleEdge.source_root == self.config.source_root
                )
            )
            session.exec(
                delete(ModuleNode).where(
                    ModuleNode.source_root == self.config.source_root
                )
            )
            session.commit()

    def clear_annotations(self) -> None:
        with Session(self.engine) as session:
            nodes = list(
                session.exec(
                    select(ModuleNode).where(ModuleNode.source_root == self.config.source_root)
                ).all()
            )
            for node in nodes:
                node.review_annotation = None
                node.review_summary = None
                node.review_status = ReviewStatus.PENDING
                node.updated_at = datetime.now(UTC)
                session.add(node)
            session.commit()


def _build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Store query helper")
    parser.add_argument("--root", default="sample_codebase", help="Source root directory")
    parser.add_argument("--db-path", default=None, help="SQLite path")
    parser.add_argument("--module", required=True, help="Module id (without .py)")
    return parser


def main() -> None:
    args = _build_parser().parse_args()
    store = Store(source_root=args.root, db_path=args.db_path)
    result = store.get_node_with_neighbors(args.module)
    if result is None:
        print(f"Module '{args.module}' not found")
        return
    print(result.model_dump_json(indent=2))


if __name__ == "__main__":
    main()
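The neighbor lookup in `get_node_with_neighbors` treats the graph as undirected for context purposes: it unions the targets of outgoing edges with the sources of incoming edges. A database-free sketch of just that step, assuming edges are given as plain `(source, target)` tuples rather than `ModuleEdge` rows, might look like this.

```python
def neighbor_ids(module_id: str, edges: list[tuple[str, str]]) -> list[str]:
    """Return sorted ids of modules that import, or are imported by, module_id."""
    # Outgoing edges: modules that module_id depends on.
    found = {target for source, target in edges if source == module_id}
    # Incoming edges: modules that depend on module_id.
    found.update(source for source, target in edges if target == module_id)
    return sorted(found)
```

Sorting the result mirrors the `sorted(neighbor_ids)` call in the store, which keeps observations deterministic across runs.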
code-review-env/env/__init__.py
ADDED
@@ -0,0 +1 @@
"""Environment package for CodeReviewEnv."""
code-review-env/env/environment.py
ADDED
@@ -0,0 +1,6 @@
"""Phase 2 implementation placeholder."""


class CodeReviewEnv:
    def __init__(self) -> None:
        raise NotImplementedError("Phase 2 implementation pending")
code-review-env/env/graph.py
ADDED
@@ -0,0 +1,105 @@
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

import networkx as nx
from sqlmodel import Session, select

from db.schema import ModuleEdge, ModuleNode
from db.store import Store
from parser.ast_parser import parse_directory


@dataclass
class GraphLoadResult:
    graph: nx.DiGraph
    loaded_from_cache: bool


class DependencyGraph:
    def __init__(self, target_dir: str | Path, db_path: str | Path | None = None) -> None:
        self.target_dir = Path(target_dir).resolve()
        self.store = Store(source_root=str(self.target_dir), db_path=db_path)

    def load_or_build(self, force_reparse: bool = False) -> GraphLoadResult:
        if force_reparse or not self.store.has_nodes():
            parse_directory(self.target_dir, db_path=str(self.store.config.db_path))
            loaded_from_cache = False
        else:
            loaded_from_cache = True
        return GraphLoadResult(graph=self._build_graph(), loaded_from_cache=loaded_from_cache)

    def _build_graph(self) -> nx.DiGraph:
        graph = nx.DiGraph()
        with Session(self.store.engine) as session:
            nodes = list(
                session.exec(
                    select(ModuleNode).where(ModuleNode.source_root == self.store.config.source_root)
                ).all()
            )
            edges = list(
                session.exec(
                    select(ModuleEdge).where(ModuleEdge.source_root == self.store.config.source_root)
                ).all()
            )

        for node in nodes:
            graph.add_node(
                node.module_id,
                ast_summary=node.ast_summary,
                review_status=node.review_status.value,
            )

        for edge in edges:
            graph.add_edge(
                edge.source_module_id,
                edge.target_module_id,
                import_line=edge.import_line,
                edge_type=edge.edge_type.value,
                weight=edge.weight,
            )

        return graph

    def traversal_order(self, graph: nx.DiGraph | None = None) -> list[str]:
        graph = graph or self._build_graph()
        if graph.number_of_nodes() == 0:
            return []

        if not nx.is_directed_acyclic_graph(graph):
            # Fall back to deterministic ordering if cyclic imports exist.
            return sorted(graph.nodes())

        centrality = nx.betweenness_centrality(graph)
        indegree = {node: graph.in_degree(node) for node in graph.nodes()}
        queue = [node for node, deg in indegree.items() if deg == 0]
        order: list[str] = []

        def rank(node: str) -> tuple[float, float, str]:
            return (
                float(graph.out_degree(node)),
                float(centrality.get(node, 0.0)),
                node,
            )

        while queue:
            queue.sort(key=rank)
            current = queue.pop(0)
            order.append(current)
            for successor in sorted(graph.successors(current)):
                indegree[successor] -= 1
                if indegree[successor] == 0:
                    queue.append(successor)

        return order


if __name__ == "__main__":
    manager = DependencyGraph(target_dir="sample_codebase")
    result = manager.load_or_build()
    print(
        f"Loaded graph with {result.graph.number_of_nodes()} nodes and "
        f"{result.graph.number_of_edges()} edges (cache={result.loaded_from_cache})"
    )
    print("Traversal order:", manager.traversal_order(result.graph))
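`traversal_order` is Kahn's topological sort with a deterministic tie-break among ready nodes. The sketch below reproduces the idea with the standard library only, under two stated simplifications: it ranks ready nodes by out-degree and then name, omitting the betweenness-centrality term, and it detects cycles after the fact (unprocessed nodes remain) instead of checking up front. The module names in the usage note are illustrative.

```python
from collections import defaultdict


def traversal_order(nodes: list[str], edges: list[tuple[str, str]]) -> list[str]:
    """Kahn's algorithm; ties among ready nodes break by (out-degree, name)."""
    successors: dict[str, set[str]] = defaultdict(set)
    indegree = {node: 0 for node in nodes}
    for source, target in edges:
        # Ignore duplicate edges so in-degrees match a simple digraph.
        if target not in successors[source]:
            successors[source].add(target)
            indegree[target] += 1

    queue = [node for node, degree in indegree.items() if degree == 0]
    order: list[str] = []
    while queue:
        queue.sort(key=lambda node: (len(successors[node]), node))
        current = queue.pop(0)
        order.append(current)
        for successor in sorted(successors[current]):
            indegree[successor] -= 1
            if indegree[successor] == 0:
                queue.append(successor)

    # Nodes left unprocessed mean a cycle; fall back to alphabetical order.
    if len(order) != len(nodes):
        return sorted(nodes)
    return order
```

With edges pointing from importer to imported module, for example `checkout -> cart`, `checkout -> auth`, `cart -> config`, and `auth -> config`, importers come before the modules they depend on.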
code-review-env/env/models.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Phase 2 implementation placeholder."""
|
code-review-env/env/observation_builder.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Phase 2 implementation placeholder."""
|
code-review-env/env/reward.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Phase 2 implementation placeholder."""
|
code-review-env/graders/__init__.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Graders package placeholder for later phases."""
|
code-review-env/graders/base_grader.py
ADDED
|
@@ -0,0 +1,5 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
class BaseGrader:
|
| 5 |
+
pass
|
code-review-env/graders/easy_grader.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/graders/hard_grader.py
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/graders/medium_grader.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/inference.py
ADDED
|
@@ -0,0 +1,4 @@
| 1 |
+
"""Phase 4 implementation placeholder."""
|
| 2 |
+
|
| 3 |
+
if __name__ == "__main__":
|
| 4 |
+
raise SystemExit("inference.py is not implemented in Phase 1")
|
code-review-env/openenv.yaml
ADDED
|
@@ -0,0 +1,3 @@
| 1 |
+
name: code-review-env
|
| 2 |
+
version: 0.1.0
|
| 3 |
+
description: Phase 1 scaffold
|
code-review-env/parser/__init__.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Parser package for CodeReviewEnv."""
|
code-review-env/parser/ast_parser.py
ADDED
|
@@ -0,0 +1,189 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import argparse
|
| 4 |
+
import ast
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
|
| 7 |
+
from pydantic import BaseModel
|
| 8 |
+
|
| 9 |
+
from db.schema import EdgeType
|
| 10 |
+
from db.store import Store
|
| 11 |
+
from parser.linter import run_linters
|
| 12 |
+
from parser.summarizer import summarize_module
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
class ImportRef(BaseModel):
|
| 16 |
+
target_module: str
|
| 17 |
+
import_line: str
|
| 18 |
+
edge_type: EdgeType = EdgeType.EXPLICIT_IMPORT
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
class ParsedModule(BaseModel):
|
| 22 |
+
module_id: str
|
| 23 |
+
raw_code: str
|
| 24 |
+
function_signatures: list[str]
|
| 25 |
+
classes: list[str]
|
| 26 |
+
imports: list[ImportRef]
|
| 27 |
+
constants: list[str]
|
| 28 |
+
dependencies: list[str]
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
class _Visitor(ast.NodeVisitor):
|
| 32 |
+
def __init__(self) -> None:
|
| 33 |
+
self.function_signatures: list[str] = []
|
| 34 |
+
self.classes: list[str] = []
|
| 35 |
+
self.constants: list[str] = []
|
| 36 |
+
self.imports: list[tuple[str, str]] = []
|
| 37 |
+
|
| 38 |
+
def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
|
| 39 |
+
args: list[str] = []
|
| 40 |
+
for arg in node.args.args:
|
| 41 |
+
if arg.annotation is not None:
|
| 42 |
+
args.append(f"{arg.arg}: {ast.unparse(arg.annotation)}")
|
| 43 |
+
else:
|
| 44 |
+
args.append(arg.arg)
|
| 45 |
+
returns = ast.unparse(node.returns) if node.returns is not None else "None"
|
| 46 |
+
self.function_signatures.append(f"{node.name}({', '.join(args)})->{returns}")
|
| 47 |
+
self.generic_visit(node)
|
| 48 |
+
|
| 49 |
+
def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> None:
|
| 50 |
+
fake = ast.FunctionDef(
|
| 51 |
+
name=node.name,
|
| 52 |
+
args=node.args,
|
| 53 |
+
body=node.body,
|
| 54 |
+
decorator_list=node.decorator_list,
|
| 55 |
+
returns=node.returns,
|
| 56 |
+
type_comment=node.type_comment,
|
| 57 |
+
)
|
| 58 |
+
self.visit_FunctionDef(fake)
|
| 59 |
+
|
| 60 |
+
def visit_ClassDef(self, node: ast.ClassDef) -> None:
|
| 61 |
+
self.classes.append(node.name)
|
| 62 |
+
self.generic_visit(node)
|
| 63 |
+
|
| 64 |
+
def visit_Import(self, node: ast.Import) -> None:
|
| 65 |
+
line = ast.get_source_segment(self._source, node) or "import"
|
| 66 |
+
for alias in node.names:
|
| 67 |
+
self.imports.append((alias.name, line))
|
| 68 |
+
|
| 69 |
+
def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
|
| 70 |
+
module = node.module or ""
|
| 71 |
+
level = node.level or 0
|
| 72 |
+
dotted = "." * level + module
|
| 73 |
+
line = ast.get_source_segment(self._source, node) or "from"
|
| 74 |
+
self.imports.append((dotted, line))
|
| 75 |
+
|
| 76 |
+
def visit_Assign(self, node: ast.Assign) -> None:
|
| 77 |
+
if isinstance(node.value, ast.Constant):
|
| 78 |
+
for target in node.targets:
|
| 79 |
+
if isinstance(target, ast.Name) and target.id.isupper():
|
| 80 |
+
self.constants.append(target.id)
|
| 81 |
+
self.generic_visit(node)
|
| 82 |
+
|
| 83 |
+
def parse(self, tree: ast.AST, source: str) -> None:
|
| 84 |
+
self._source = source
|
| 85 |
+
self.visit(tree)
|
| 86 |
+
|
| 87 |
+
|
| 88 |
+
def _to_module_id(path: Path, root: Path) -> str:
|
| 89 |
+
rel = path.resolve().relative_to(root.resolve())
|
| 90 |
+
return ".".join(rel.with_suffix("").parts)  # join path parts so Windows separators also map to dots
|
| 91 |
+
|
| 92 |
+
|
| 93 |
+
def _resolve_relative_import(current_module: str, ref: str) -> str:
|
| 94 |
+
if not ref.startswith("."):
|
| 95 |
+
return ref
|
| 96 |
+
dots = len(ref) - len(ref.lstrip("."))
|
| 97 |
+
suffix = ref.lstrip(".")
|
| 98 |
+
parts = current_module.split(".")
|
| 99 |
+
base = parts[:-dots] if dots <= len(parts) else []
|
| 100 |
+
if suffix:
|
| 101 |
+
base.append(suffix)
|
| 102 |
+
return ".".join(part for part in base if part)
|
| 103 |
+
|
| 104 |
+
|
| 105 |
+
def parse_python_file(path: Path, root_dir: Path) -> ParsedModule:
|
| 106 |
+
source = path.read_text(encoding="utf-8")
|
| 107 |
+
module_id = _to_module_id(path, root_dir)
|
| 108 |
+
tree = ast.parse(source)
|
| 109 |
+
|
| 110 |
+
visitor = _Visitor()
|
| 111 |
+
visitor.parse(tree, source)
|
| 112 |
+
|
| 113 |
+
imports = [
|
| 114 |
+
ImportRef(
|
| 115 |
+
target_module=_resolve_relative_import(module_id, name),
|
| 116 |
+
import_line=line,
|
| 117 |
+
edge_type=EdgeType.EXPLICIT_IMPORT,
|
| 118 |
+
)
|
| 119 |
+
for name, line in visitor.imports
|
| 120 |
+
]
|
| 121 |
+
|
| 122 |
+
dependencies = [imp.target_module for imp in imports if imp.target_module]
|
| 123 |
+
|
| 124 |
+
return ParsedModule(
|
| 125 |
+
module_id=module_id,
|
| 126 |
+
raw_code=source,
|
| 127 |
+
function_signatures=visitor.function_signatures,
|
| 128 |
+
classes=visitor.classes,
|
| 129 |
+
imports=imports,
|
| 130 |
+
constants=visitor.constants,
|
| 131 |
+
dependencies=dependencies,
|
| 132 |
+
)
|
| 133 |
+
|
| 134 |
+
|
| 135 |
+
def parse_directory(target_dir: Path, db_path: str | None = None) -> Store:
|
| 136 |
+
target_dir = target_dir.resolve()
|
| 137 |
+
store = Store(source_root=str(target_dir), db_path=db_path)
|
| 138 |
+
store.clear_source_graph()
|
| 139 |
+
|
| 140 |
+
py_files = sorted(target_dir.rglob("*.py"))
|
| 141 |
+
for py_file in py_files:
|
| 142 |
+
parsed = parse_python_file(py_file, target_dir)
|
| 143 |
+
issues = run_linters(py_file)
|
| 144 |
+
summary = summarize_module(parsed, issues)
|
| 145 |
+
|
| 146 |
+
dep_reason = "Imports used by module-level and callable logic"
|
| 147 |
+
store.upsert_node(
|
| 148 |
+
module_id=parsed.module_id,
|
| 149 |
+
raw_code=parsed.raw_code,
|
| 150 |
+
ast_summary=summary,
|
| 151 |
+
dependency_reason=dep_reason,
|
| 152 |
+
)
|
| 153 |
+
store.replace_findings_for_module(
|
| 154 |
+
parsed.module_id,
|
| 155 |
+
[issue.model_dump() for issue in issues],
|
| 156 |
+
)
|
| 157 |
+
for imported in parsed.imports:
|
| 158 |
+
if imported.target_module:
|
| 159 |
+
store.upsert_edge(
|
| 160 |
+
source_module_id=parsed.module_id,
|
| 161 |
+
target_module_id=imported.target_module,
|
| 162 |
+
edge_type=imported.edge_type,
|
| 163 |
+
import_line=imported.import_line,
|
| 164 |
+
weight=1.0,
|
| 165 |
+
)
|
| 166 |
+
|
| 167 |
+
return store
|
| 168 |
+
|
| 169 |
+
|
| 170 |
+
def _build_parser() -> argparse.ArgumentParser:
|
| 171 |
+
parser = argparse.ArgumentParser(description="Parse Python codebase into SQLite graph")
|
| 172 |
+
parser.add_argument("target", help="Path to target codebase")
|
| 173 |
+
parser.add_argument("--db-path", default=None, help="Path to SQLite database")
|
| 174 |
+
return parser
|
| 175 |
+
|
| 176 |
+
|
| 177 |
+
def main() -> None:
|
| 178 |
+
args = _build_parser().parse_args()
|
| 179 |
+
target_dir = Path(args.target)
|
| 180 |
+
store = parse_directory(target_dir=target_dir, db_path=args.db_path)
|
| 181 |
+
snapshot = store.get_full_graph()
|
| 182 |
+
print(
|
| 183 |
+
f"Populated DB for {target_dir} with "
|
| 184 |
+
f"{len(snapshot.nodes)} nodes and {len(snapshot.edges)} edges"
|
| 185 |
+
)
|
| 186 |
+
|
| 187 |
+
|
| 188 |
+
if __name__ == "__main__":
|
| 189 |
+
main()
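Reviewer note: the import handling in `ast_parser.py` combines two steps, collecting `Import`/`ImportFrom` nodes and then resolving leading dots against the current module id. The sketch below walks through both on an in-memory source string. `resolve_relative` mirrors the rule in `_resolve_relative_import`; `collect_imports`, the sample source, and the module id `pkg.sub.views` are hypothetical names chosen for the example.

```python
import ast


def resolve_relative(current_module: str, ref: str) -> str:
    """Drop one trailing component of current_module per leading dot in ref."""
    if not ref.startswith("."):
        return ref
    suffix = ref.lstrip(".")
    dots = len(ref) - len(suffix)
    parts = current_module.split(".")
    base = parts[:-dots] if dots <= len(parts) else []
    if suffix:
        base.append(suffix)
    return ".".join(part for part in base if part)


def collect_imports(source: str, current_module: str) -> list[str]:
    """Return the resolved target module of every import statement in source."""
    targets: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.level counts the leading dots; node.module is None for "from . import x".
            dotted = "." * node.level + (node.module or "")
            targets.append(resolve_relative(current_module, dotted))
    return targets


sample = (
    "import os\n"
    "from . import helpers\n"
    "from .models import User\n"
    "from ..db import store\n"
)
found = collect_imports(sample, "pkg.sub.views")
print(found)
```

Under this rule `from . import helpers` resolves to the package `pkg.sub` rather than `pkg.sub.helpers`, because only the module part of the statement is recorded and the imported names are ignored.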
|
code-review-env/parser/linter.py
ADDED
|
@@ -0,0 +1,104 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
import json
|
| 4 |
+
import subprocess
|
| 5 |
+
from pathlib import Path
|
| 6 |
+
import sys
|
| 7 |
+
|
| 8 |
+
from pydantic import BaseModel
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
class LinterIssue(BaseModel):
|
| 12 |
+
tool: str
|
| 13 |
+
line: int
|
| 14 |
+
severity: str
|
| 15 |
+
code: str
|
| 16 |
+
message: str
|
| 17 |
+
|
| 18 |
+
|
| 19 |
+
_PYLINT_SEVERITY_MAP = {
|
| 20 |
+
"fatal": "high",
|
| 21 |
+
"error": "high",
|
| 22 |
+
"warning": "medium",
|
| 23 |
+
"refactor": "low",
|
| 24 |
+
"convention": "low",
|
| 25 |
+
"info": "low",
|
| 26 |
+
}
|
| 27 |
+
|
| 28 |
+
_BANDIT_SEVERITY_MAP = {
|
| 29 |
+
"high": "high",
|
| 30 |
+
"medium": "medium",
|
| 31 |
+
"low": "low",
|
| 32 |
+
}
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
def run_pylint(path: Path) -> list[LinterIssue]:
|
| 36 |
+
cmd = [
|
| 37 |
+
sys.executable,
|
| 38 |
+
"-m",
|
| 39 |
+
"pylint",
|
| 40 |
+
str(path),
|
| 41 |
+
"--output-format=json2",
|
| 42 |
+
"--score=n",
|
| 43 |
+
"--reports=n",
|
| 44 |
+
]
|
| 45 |
+
proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
|
| 46 |
+
|
| 47 |
+
payload = (proc.stdout or "").strip()
|
| 48 |
+
if not payload:
|
| 49 |
+
return []
|
| 50 |
+
|
| 51 |
+
try:
|
| 52 |
+
data = json.loads(payload)
|
| 53 |
+
except json.JSONDecodeError:
|
| 54 |
+
return []
|
| 55 |
+
|
| 56 |
+
messages = data.get("messages", []) if isinstance(data, dict) else []
|
| 57 |
+
issues: list[LinterIssue] = []
|
| 58 |
+
for message in messages:
|
| 59 |
+
severity = _PYLINT_SEVERITY_MAP.get(str(message.get("type", "")).lower(), "low")
|
| 60 |
+
issues.append(
|
| 61 |
+
LinterIssue(
|
| 62 |
+
tool="pylint",
|
| 63 |
+
line=int(message.get("line", 0)),
|
| 64 |
+
severity=severity,
|
| 65 |
+
code=str(message.get("messageId", "PL0000")),
|
| 66 |
+
message=str(message.get("message", "")),
|
| 67 |
+
)
|
| 68 |
+
)
|
| 69 |
+
return issues
|
| 70 |
+
|
| 71 |
+
|
| 72 |
+
def run_bandit(path: Path) -> list[LinterIssue]:
|
| 73 |
+
cmd = [sys.executable, "-m", "bandit", "-q", "-f", "json", str(path)]
|
| 74 |
+
proc = subprocess.run(cmd, capture_output=True, text=True, check=False)
|
| 75 |
+
|
| 76 |
+
payload = (proc.stdout or "").strip()
|
| 77 |
+
if not payload:
|
| 78 |
+
return []
|
| 79 |
+
|
| 80 |
+
try:
|
| 81 |
+
data = json.loads(payload)
|
| 82 |
+
except json.JSONDecodeError:
|
| 83 |
+
return []
|
| 84 |
+
|
| 85 |
+
results = data.get("results", []) if isinstance(data, dict) else []
|
| 86 |
+
issues: list[LinterIssue] = []
|
| 87 |
+
for item in results:
|
| 88 |
+
raw_sev = str(item.get("issue_severity", "LOW")).lower()
|
| 89 |
+
issues.append(
|
| 90 |
+
LinterIssue(
|
| 91 |
+
tool="bandit",
|
| 92 |
+
line=int(item.get("line_number", 0)),
|
| 93 |
+
severity=_BANDIT_SEVERITY_MAP.get(raw_sev, "low"),
|
| 94 |
+
code=str(item.get("test_id", "B000")),
|
| 95 |
+
message=str(item.get("issue_text", "")),
|
| 96 |
+
)
|
| 97 |
+
)
|
| 98 |
+
return issues
|
| 99 |
+
|
| 100 |
+
|
| 101 |
+
def run_linters(path: Path) -> list[LinterIssue]:
|
| 102 |
+
issues = run_pylint(path)
|
| 103 |
+
issues.extend(run_bandit(path))
|
| 104 |
+
return issues
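Reviewer note: `run_pylint` can be exercised without spawning a subprocess by feeding the parsing step a canned payload. The sketch below does that for the severity mapping. The function name `parse_pylint_json2` and the payload are hand-written assumptions shaped like the keys the code above reads (`messages`, `type`, `messageId`, `line`); the payload is not captured pylint output.

```python
import json

# Same mapping as _PYLINT_SEVERITY_MAP in the diff above.
SEVERITY = {
    "fatal": "high",
    "error": "high",
    "warning": "medium",
    "refactor": "low",
    "convention": "low",
    "info": "low",
}


def parse_pylint_json2(payload: str) -> list[dict[str, object]]:
    """Turn a json2-style payload into normalized issue dicts; bad input yields []."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return []
    messages = data.get("messages", []) if isinstance(data, dict) else []
    return [
        {
            "line": int(message.get("line", 0)),
            "severity": SEVERITY.get(str(message.get("type", "")).lower(), "low"),
            "code": str(message.get("messageId", "PL0000")),
        }
        for message in messages
    ]


# Hypothetical payload for illustration only.
sample_payload = json.dumps(
    {
        "messages": [
            {
                "type": "convention",
                "messageId": "C0301",
                "line": 7,
                "message": "Line too long",
            }
        ],
        "statistics": {},
    }
)
issues = parse_pylint_json2(sample_payload)
print(issues)
```

Separating the parse step from the subprocess call this way would also make the empty-list fallbacks for malformed or non-dict output easy to unit test.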
|
code-review-env/parser/summarizer.py
ADDED
|
@@ -0,0 +1,24 @@
| 1 |
+
from __future__ import annotations
|
| 2 |
+
|
| 3 |
+
from typing import TYPE_CHECKING
|
| 4 |
+
|
| 5 |
+
from parser.linter import LinterIssue
|
| 6 |
+
|
| 7 |
+
if TYPE_CHECKING:
|
| 8 |
+
from parser.ast_parser import ParsedModule
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
def _truncate_tokens(text: str, max_tokens: int = 100) -> str:
|
| 12 |
+
words = text.split()
|
| 13 |
+
if len(words) <= max_tokens:
|
| 14 |
+
return text
|
| 15 |
+
return " ".join(words[:max_tokens])
|
| 16 |
+
|
| 17 |
+
|
| 18 |
+
def summarize_module(parsed: ParsedModule, issues: list[LinterIssue]) -> str:
|
| 19 |
+
exports = ", ".join(parsed.function_signatures[:5])
|
| 20 |
+
deps = ", ".join(sorted(set(parsed.dependencies))[:5])
|
| 21 |
+
summary = (
|
| 22 |
+
f"exports: [{exports}] | issues: {len(issues)} | depends_on: [{deps}]"
|
| 23 |
+
)
|
| 24 |
+
return _truncate_tokens(summary, max_tokens=100)
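Reviewer note: `_truncate_tokens` counts whitespace-separated words, which only approximates a model tokenizer. The sketch below shows what that means for a summary in the `exports | issues | depends_on` shape. The name `truncate_words` and the sample summary string are assumptions for the example.

```python
def truncate_words(text: str, max_words: int = 100) -> str:
    """Keep at most max_words whitespace-separated words; short text is returned unchanged."""
    words = text.split()
    if len(words) <= max_words:
        return text
    return " ".join(words[:max_words])


# Hypothetical summary in the format produced by summarize_module.
summary = (
    "exports: [calculate_subtotal(items: list[dict[str, float]])->float] "
    "| issues: 3 | depends_on: [config]"
)
short = truncate_words(summary, max_words=4)
print(short)
```

Because the split is on whitespace, a single signature with annotated arguments already counts as several words, so long signatures can be cut mid-bracket once the limit is reached.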
|
code-review-env/pyproject.toml
ADDED
|
@@ -0,0 +1,13 @@
| 1 |
+
[build-system]
|
| 2 |
+
requires = ["setuptools>=68", "wheel"]
|
| 3 |
+
build-backend = "setuptools.build_meta"
|
| 4 |
+
|
| 5 |
+
[project]
|
| 6 |
+
name = "code-review-env"
|
| 7 |
+
version = "0.1.0"
|
| 8 |
+
description = "OpenEnv CodeReviewEnv"
|
| 9 |
+
requires-python = ">=3.11"
|
| 10 |
+
|
| 11 |
+
[tool.pytest.ini_options]
|
| 12 |
+
pythonpath = ["."]
|
| 13 |
+
testpaths = ["tests"]
|
code-review-env/requirements.txt
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
sqlmodel>=0.0.24
|
| 2 |
+
networkx>=3.2
|
| 3 |
+
pydantic>=2.7
|
| 4 |
+
pylint>=3.2
|
| 5 |
+
bandit>=1.7
|
| 6 |
+
fastapi>=0.115
|
| 7 |
+
uvicorn>=0.30
|
| 8 |
+
openai>=1.40
|
| 9 |
+
pytest>=8.2
|
code-review-env/sample_codebase/auth.py
ADDED
|
@@ -0,0 +1,7 @@
| 1 |
+
"""Auth helpers."""
|
| 2 |
+
|
| 3 |
+
import config
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def issue_session_token(user_id: str) -> str:
|
| 7 |
+
return f"{user_id}:{config.SECRET_KEY}:session-token-generated-with-a-very-long-suffix-that-triggers-style-rules-and-is-hard-to-read"
|
code-review-env/sample_codebase/cart.py
ADDED
|
@@ -0,0 +1,17 @@
| 1 |
+
"""Cart calculations."""
|
| 2 |
+
|
| 3 |
+
import config
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def calculate_subtotal(items: list[dict[str, float]]) -> float:
|
| 7 |
+
subtotal = 0.0
|
| 8 |
+
for item in items:
|
| 9 |
+
subtotal += float(item.get("price", 0.0)) * float(item.get("qty", 0.0))
|
| 10 |
+
return subtotal
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
def calculate_total(items: list[dict[str, float]]) -> float:
|
| 14 |
+
subtotal = calculate_subtotal(items)
|
| 15 |
+
# BUG: config.DISCOUNT_RATE is intended to be 0.20, but set to 20 in config.
|
| 16 |
+
discounted = subtotal - (subtotal * config.DISCOUNT_RATE)
|
| 17 |
+
return discounted + (discounted * config.TAX_RATE)
|
code-review-env/sample_codebase/checkout.py
ADDED
|
@@ -0,0 +1,15 @@
| 1 |
+
"""Checkout flow."""
|
| 2 |
+
|
| 3 |
+
import cart
|
| 4 |
+
import payments
|
| 5 |
+
|
| 6 |
+
|
| 7 |
+
def submit_order(items: list[dict[str, float]]) -> str:
|
| 8 |
+
total = cart.calculate_total(items)
|
| 9 |
+
# Cascading symptom: negative total is observed here but root cause is config -> cart.
|
| 10 |
+
if total < 0:
|
| 11 |
+
return "error: negative total"
|
| 12 |
+
gateway_ok = payments.run_gateway_check("https://gateway.example.com/health")
|
| 13 |
+
if gateway_ok != 0:
|
| 14 |
+
return "error: gateway"
|
| 15 |
+
return payments.charge(total)
|
code-review-env/sample_codebase/config.py
ADDED
|
@@ -0,0 +1,6 @@
| 1 |
+
"""Configuration defaults for the checkout flow."""
|
| 2 |
+
|
| 3 |
+
DISCOUNT_RATE = 20
|
| 4 |
+
TAX_RATE = 0.07
|
| 5 |
+
PAYMENT_TIMEOUT_SECONDS = 30
|
| 6 |
+
SECRET_KEY = "hardcoded-dev-key"
|
code-review-env/sample_codebase/ground_truth.json
ADDED
|
@@ -0,0 +1,39 @@
| 1 |
+
{
|
| 2 |
+
"issues": [
|
| 3 |
+
{
|
| 4 |
+
"id": "STYLE_001",
|
| 5 |
+
"module": "auth",
|
| 6 |
+
"line": 7,
|
| 7 |
+
"type": "style",
|
| 8 |
+
"tool": "pylint",
|
| 9 |
+
"code": "C0301",
|
| 10 |
+
"message_contains": "Line too long"
|
| 11 |
+
},
|
| 12 |
+
{
|
| 13 |
+
"id": "LOGIC_001",
|
| 14 |
+
"module": "checkout",
|
| 15 |
+
"line": 7,
|
| 16 |
+
"type": "logic",
|
| 17 |
+
"description": "Negative total symptom due to dependency behavior in cart"
|
| 18 |
+
},
|
| 19 |
+
{
|
| 20 |
+
"id": "SECURITY_001",
|
| 21 |
+
"module": "payments",
|
| 22 |
+
"line": 9,
|
| 23 |
+
"type": "security",
|
| 24 |
+
"tool": "bandit",
|
| 25 |
+
"code": "B602",
|
| 26 |
+
"message_contains": "shell=True"
|
| 27 |
+
},
|
| 28 |
+
{
|
| 29 |
+
"id": "CASCADE_001",
|
| 30 |
+
"module": "checkout",
|
| 31 |
+
"line": 7,
|
| 32 |
+
"type": "dependency_cascade",
|
| 33 |
+
"root_cause_module": "config",
|
| 34 |
+
"surface_module": "checkout",
|
| 35 |
+
"path": ["config", "cart", "checkout"],
|
| 36 |
+
"description": "DISCOUNT_RATE configured as 20 instead of 0.20 causes cart miscalculation and checkout failure"
|
| 37 |
+
}
|
| 38 |
+
]
|
| 39 |
+
}
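Reviewer note: the `CASCADE_001` entry can be checked by hand. The sketch below repeats the arithmetic from `cart.calculate_total` with the discount rate passed in explicitly, so the configured value (20) and the intended value (0.20) can be compared side by side. The standalone function signature and the subtotal of 100.0 are assumptions for the example; the tax rate 0.07 is the `TAX_RATE` from `config.py`.

```python
def calculate_total(subtotal: float, discount_rate: float, tax_rate: float = 0.07) -> float:
    """Apply the discount, then add tax on the discounted amount."""
    discounted = subtotal - (subtotal * discount_rate)
    return discounted + (discounted * tax_rate)


# Rate as configured in config.py: 100 - 100 * 20 = -1900 before tax.
buggy = calculate_total(100.0, 20)

# Rate the comment in cart.py says was intended: 100 - 100 * 0.20 = 80 before tax.
intended = calculate_total(100.0, 0.20)

print(f"configured rate: {buggy:.2f}, intended rate: {intended:.2f}")
```

With the configured rate the total is about -2033, which is what trips the `total < 0` branch in `checkout.submit_order`; with the intended rate the total is about 85.60.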
|
code-review-env/sample_codebase/payments.py
ADDED
|
@@ -0,0 +1,15 @@
| 1 |
+
"""Payment gateway wrapper."""
|
| 2 |
+
|
| 3 |
+
import subprocess
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def run_gateway_check(endpoint: str) -> int:
|
| 7 |
+
# SECURITY ISSUE: user-provided endpoint is interpolated in a shell command.
|
| 8 |
+
command = f"curl -s {endpoint}"
|
| 9 |
+
return subprocess.call(command, shell=True)
|
| 10 |
+
|
| 11 |
+
|
| 12 |
+
def charge(total: float) -> str:
|
| 13 |
+
if total <= 0:
|
| 14 |
+
return "rejected"
|
| 15 |
+
return "charged"
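Reviewer note: `run_gateway_check` is deliberately vulnerable so that Bandit reports B602. For contrast, the sketch below shows one way the same check could be written without a shell, passing an argument list to `subprocess.call`. The helper `build_gateway_command` and the scheme check are assumptions introduced here, not part of the commit, and the sample codebase should keep the vulnerable version for the ground truth to stay valid.

```python
import subprocess
from urllib.parse import urlparse


def build_gateway_command(endpoint: str) -> list[str]:
    """Build an argv list for curl; reject anything that is not an http(s) URL."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in {"http", "https"} or not parsed.netloc:
        raise ValueError(f"unsupported endpoint: {endpoint!r}")
    # The endpoint is a single argv element, so shell metacharacters are not interpreted.
    return ["curl", "-s", endpoint]


def run_gateway_check(endpoint: str) -> int:
    """Run curl without shell=True and return its exit code."""
    return subprocess.call(build_gateway_command(endpoint))


# Building the command does not execute anything; curl only runs in run_gateway_check.
argv = build_gateway_command("https://gateway.example.com/health; rm -rf /tmp/x")
print(argv)
```

In the list form the `; rm -rf /tmp/x` suffix stays inside one argument handed to curl instead of becoming a second shell command.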
|
code-review-env/server/__init__.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Server package placeholder for later phases."""
|
code-review-env/server/app.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Phase 3 implementation placeholder."""
|
code-review-env/tasks/__init__.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Tasks package placeholder for later phases."""
|
code-review-env/tasks/easy_task.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/tasks/hard_task.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/tasks/medium_task.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/tasks/task_registry.py
ADDED
|
@@ -0,0 +1 @@
| 1 |
+
"""Phase 2+ implementation placeholder."""
|
code-review-env/tests/test_environment.py
ADDED
|
@@ -0,0 +1,21 @@
| 1 |
+
from pathlib import Path
|
| 2 |
+
|
| 3 |
+
from env.graph import DependencyGraph
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def test_graph_builds_from_sample_codebase(tmp_path: Path) -> None:
|
| 7 |
+
db_path = tmp_path / "graph.db"
|
| 8 |
+
graph_mgr = DependencyGraph(target_dir="sample_codebase", db_path=db_path)
|
| 9 |
+
result = graph_mgr.load_or_build(force_reparse=True)
|
| 10 |
+
|
| 11 |
+
assert result.graph.number_of_nodes() >= 5
|
| 12 |
+
assert result.loaded_from_cache is False
|
| 13 |
+
|
| 14 |
+
|
| 15 |
+
def test_graph_second_load_uses_cache(tmp_path: Path) -> None:
|
| 16 |
+
db_path = tmp_path / "graph.db"
|
| 17 |
+
graph_mgr = DependencyGraph(target_dir="sample_codebase", db_path=db_path)
|
| 18 |
+
graph_mgr.load_or_build(force_reparse=True)
|
| 19 |
+
second = graph_mgr.load_or_build(force_reparse=False)
|
| 20 |
+
|
| 21 |
+
assert second.loaded_from_cache is True
|
code-review-env/tests/test_graders.py
ADDED
|
@@ -0,0 +1,2 @@
| 1 |
+
def test_phase1_placeholder() -> None:
|
| 2 |
+
assert True
|
code-review-env/tests/test_inference.py
ADDED
|
@@ -0,0 +1,2 @@
| 1 |
+
def test_inference_placeholder() -> None:
|
| 2 |
+
assert True
|
code-review-env/tests/test_parser.py
ADDED
|
@@ -0,0 +1,13 @@
| 1 |
+
from pathlib import Path
|
| 2 |
+
|
| 3 |
+
from parser.ast_parser import parse_python_file
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
def test_parse_python_file_extracts_core_elements() -> None:
|
| 7 |
+
root = Path("sample_codebase")
|
| 8 |
+
path = root / "cart.py"
|
| 9 |
+
parsed = parse_python_file(path=path, root_dir=root)
|
| 10 |
+
|
| 11 |
+
assert parsed.module_id == "cart"
|
| 12 |
+
assert any(sig.startswith("calculate_total(") for sig in parsed.function_signatures)
|
| 13 |
+
assert "config" in parsed.dependencies
|