
---
title: Python Code Review Environment
emoji: snake
colorFrom: yellow
colorTo: blue
sdk: docker
pinned: false
app_port: 8000
tags:
  - openenv
  - code-review
  - python
---

# python_code_review_env

`python_code_review_env` is a production-style OpenEnv environment that simulates a realistic Python code review workflow. An agent inspects broken code, edits it, runs tests, and submits a final solution, which deterministic graders score across three task types: syntax repair, bug fixing, and optimization/refactoring.

## Environment design

- **Observation** includes the task instructions, current code, syntax errors, public test output, action history, and remaining attempts.
- **Action** is one of `analyze_code`, `edit_code`, `run_tests`, or `submit_solution`.
- **Reward** is shaped rather than binary. The environment rewards syntax progress, test progress, correctness, and quality improvements, and penalizes invalid actions, timeouts, regressions, and unchanged edits.
- **State** exposes the internal episode snapshot through the `/state` endpoint.
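The shaped reward described in the environment design can be sketched as a weighted sum of progress deltas minus penalties. This is a minimal illustration under stated assumptions, not the environment's actual reward code: the `StepSignals` fields, every weight, and every penalty value are hypothetical, because the README names the reward components but not how they are combined. Correctness is folded into the test-pass fraction for brevity.

```python
from dataclasses import dataclass

# Hypothetical weights and penalties: the README lists the components,
# not their values, so these numbers are placeholders.
SYNTAX_WEIGHT = 0.2
TEST_WEIGHT = 0.5
QUALITY_WEIGHT = 0.3
INVALID_ACTION_PENALTY = 0.1
TIMEOUT_PENALTY = 0.2
UNCHANGED_EDIT_PENALTY = 0.05


@dataclass
class StepSignals:
    """Hypothetical per-step measurements, each in [0.0, 1.0]."""
    syntax_before: float
    syntax_after: float
    tests_before: float  # fraction of tests passing before the action
    tests_after: float
    quality_before: float
    quality_after: float
    invalid_action: bool = False
    timed_out: bool = False
    code_unchanged: bool = False


def shaped_reward(s: StepSignals) -> float:
    """Reward progress on each signal; a regression yields a negative delta."""
    reward = (
        SYNTAX_WEIGHT * (s.syntax_after - s.syntax_before)
        + TEST_WEIGHT * (s.tests_after - s.tests_before)
        + QUALITY_WEIGHT * (s.quality_after - s.quality_before)
    )
    if s.invalid_action:
        reward -= INVALID_ACTION_PENALTY
    if s.timed_out:
        reward -= TIMEOUT_PENALTY
    if s.code_unchanged:
        reward -= UNCHANGED_EDIT_PENALTY
    return reward
```

Regressions need no separate term in this sketch: when tests that used to pass start failing, the test delta is already negative.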

## Task set

- `syntax_fix_invoice_totals` (easy): fix a syntax regression in an invoice normalization helper.
- `bug_fix_session_windows` (medium): repair a session-collapsing bug, graded by deterministic public and hidden tests.
- `optimization_rank_active_users` (hard): refactor a slow ranking function; additional score comes from runtime improvement and AST/style quality.
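For the optimization task, one way to turn runtime improvement into a bounded score is to compare best-of-N timings of the original and refactored functions. The sketch below is an illustration under assumptions: `best_time`, `runtime_score`, the full-credit threshold `target_speedup=2.0`, and the lookup functions in the usage example are all hypothetical and do not come from the environment's grader.

```python
import timeit
from typing import Callable


def best_time(fn: Callable[[], object], number: int = 50, repeat: int = 5) -> float:
    """Best of `repeat` timings, each covering `number` calls to fn."""
    return min(timeit.repeat(fn, number=number, repeat=repeat))


def runtime_score(
    baseline: Callable[[], object],
    candidate: Callable[[], object],
    target_speedup: float = 2.0,  # hypothetical: full credit at a 2x speedup
) -> float:
    """Map the speedup to [0.0, 1.0]: 0 for no gain, 1 at target_speedup or better."""
    speedup = best_time(baseline) / max(best_time(candidate), 1e-12)
    return max(0.0, min(1.0, (speedup - 1.0) / (target_speedup - 1.0)))


# Usage: list membership (O(n) per lookup) versus set membership (O(1) average).
data = list(range(500))
lookups = list(range(0, 1000, 5))


def slow_lookup():
    return [x for x in lookups if x in data]


def fast_lookup():
    data_set = set(data)
    return [x for x in lookups if x in data_set]


print(runtime_score(slow_lookup, fast_lookup))
```

Taking the minimum over several repeats reduces noise from other processes. A real grader would also confirm that the candidate still returns correct results before rewarding speed.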

## Action schema

```json
{
  "action_type": "edit_code",
  "code": "def function(...):\n    ..."
}
```

Supported `action_type` values:

- `analyze_code`
- `edit_code`
- `run_tests`
- `submit_solution`
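A small helper can build payloads that match the action schema and reject unknown action types before they reach the server. This is a hypothetical client-side sketch (`make_action` is not part of the project), and it assumes that only `edit_code` requires a `code` field; the README shows `code` alongside `edit_code` but does not say whether other actions accept it.

```python
import json
from typing import Optional

# The four action types listed in the README.
ACTION_TYPES = {"analyze_code", "edit_code", "run_tests", "submit_solution"}


def make_action(action_type: str, code: Optional[str] = None) -> dict:
    """Build an action payload; assumes only edit_code requires "code"."""
    if action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action_type: {action_type!r}")
    if action_type == "edit_code" and code is None:
        raise ValueError("edit_code requires a 'code' string")
    action = {"action_type": action_type}
    if code is not None:
        action["code"] = code
    return action


# Usage: serialize an edit action as JSON.
print(json.dumps(make_action("edit_code", "def add(a, b):\n    return a + b\n"), indent=2))
```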

## Observation schema

```json
{
  "task_description": "...",
  "current_code": "...",
  "errors": "...",
  "test_results": "...",
  "history": []
}
```

The full observation also includes `task_id`, `difficulty`, `task_kind`, `visible_tests`, `attempts_remaining`, `score`, `last_action_status`, `reward`, `done`, and a structured `reward_details` breakdown.
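On the client side, the full observation can be mirrored with a dataclass. The field names come from the README; the field types and defaults are assumptions, and `Observation` and `parse_observation` are hypothetical names. Unknown keys are dropped so the sketch keeps working if the server adds fields.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class Observation:
    # Core fields from the schema example.
    task_description: str = ""
    current_code: str = ""
    errors: str = ""
    test_results: str = ""
    history: List[Any] = field(default_factory=list)
    # Additional fields listed in the README; types are assumptions.
    task_id: Optional[str] = None
    difficulty: Optional[str] = None
    task_kind: Optional[str] = None
    visible_tests: Any = None
    attempts_remaining: Optional[int] = None
    score: float = 0.0
    last_action_status: Optional[str] = None
    reward: float = 0.0
    done: bool = False
    reward_details: Dict[str, Any] = field(default_factory=dict)


def parse_observation(payload: Dict[str, Any]) -> Observation:
    """Keep the keys Observation knows about and ignore any extras."""
    known = {f.name for f in fields(Observation)}
    return Observation(**{k: v for k, v in payload.items() if k in known})
```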

## Deterministic grading

- Syntax tasks use `compile()` plus hidden behavioral checks.
- Bug-fix tasks use deterministic function-call cases that behave like pytest assertions.
- Optimization tasks combine correctness, runtime benchmarking, and AST/style quality scoring.
- Submissions run in a subprocess with a timeout, so infinite loops and long-running solutions are stopped and penalized.
- All scores are clamped to `[0.0, 1.0]`.
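The mechanisms in the grading list (compilation checks, subprocess timeouts, and score clamping) can be sketched with the standard library alone. This is an illustrative approximation, not the code in `graders/`: the function names `syntax_ok`, `run_sandboxed`, and `clamp_score` and the 2-second default timeout are hypothetical.

```python
import subprocess
import sys


def syntax_ok(code: str) -> bool:
    """True if the code compiles; compile() raises SyntaxError on bad syntax."""
    try:
        compile(code, "<submission>", "exec")
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def run_sandboxed(code: str, timeout: float = 2.0) -> str:
    """Run code in a separate interpreter and return "ok", "error", or "timeout"."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"  # the child has already been killed by subprocess.run
    return "ok" if result.returncode == 0 else "error"


def clamp_score(score: float) -> float:
    """Clamp a raw score into [0.0, 1.0]."""
    return max(0.0, min(1.0, score))
```

Running submissions in a child interpreter means an infinite loop cannot hang the server process: when the timeout expires, `subprocess.run` kills the child and then raises `TimeoutExpired`. A plain subprocess limits runtime but does not by itself restrict file or network access.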

## Run locally

Install dependencies:

```bash
pip install .
```

Start the API server:

```bash
uvicorn server.app:app --host 0.0.0.0 --port 8000
```

Smoke-test the environment:

```bash
curl http://localhost:8000/health
curl http://localhost:8000/state
```

OpenEnv validation:

```bash
openenv validate
```

## Docker build

The Docker image no longer uses `ghcr.io/meta-pytorch/openenv-base:latest` as its base, which avoids the TLS handshake failure the original build hit when pulling that image.

```bash
docker build -t python-code-review-env -f server/Dockerfile .
docker run --rm -p 8000:8000 python-code-review-env
```

Check that the container responds to the health check:

```bash
curl http://localhost:8000/health
```

## Hugging Face Spaces deployment

1. Create a Docker Space.
2. Push the contents of this repository to the Space.
3. Make sure port 8000 is exposed (the front matter sets `app_port: 8000`).
4. Wait for the container to build.
5. Verify that `/reset` and `/health` return HTTP 200.
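Because the Space (or a local `docker run`) takes a while to start, a deployment check can poll `/health` until it returns HTTP 200 instead of failing on the first attempt. This is a minimal standard-library sketch; `wait_for_health` and its timing defaults are hypothetical. It only checks `/health`, because the README does not say which HTTP method `/reset` expects.

```python
import time
import urllib.request


def wait_for_health(base_url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll GET {base_url}/health until it returns HTTP 200 or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    url = base_url.rstrip("/") + "/health"
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, HTTPError, refused connections, and socket timeouts
            # are all OSError subclasses: the server is not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage against a local container:
# wait_for_health("http://localhost:8000")
```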

The image is CPU-friendly and designed for a small Hugging Face Space such as 2 vCPU / 8 GB RAM.

## Inference baseline

`inference.py` uses an OpenAI-compatible client:

```python
from openai import OpenAI
client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
```

Supported providers include:

- Gemini through an OpenAI-compatible gateway
- OpenRouter
- Together AI
- OpenAI-compatible DeepSeek endpoints

To run it against a free model on OpenRouter (Windows `cmd` syntax; on macOS/Linux, use `export` instead of `set`):

```bat
set API_BASE_URL=https://openrouter.ai/api/v1
set API_KEY=...
set MODEL=deepseek/deepseek-chat-v3-0324:free
python inference.py
```

If no credentials are supplied, the script falls back to a deterministic smoke-test policy that applies the reference fix for each task so the environment can still be validated end to end.
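The credential fallback can be sketched as a small configuration step that reads the same environment variables used in the run example. This is a hypothetical sketch, not the logic in `inference.py`: it assumes the LLM policy is used only when `API_BASE_URL`, `API_KEY`, and `MODEL` are all set and non-empty, and that the deterministic reference policy is selected otherwise.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class InferenceConfig:
    base_url: Optional[str]
    api_key: Optional[str]
    model: Optional[str]

    @property
    def use_llm(self) -> bool:
        # Assumption: all three settings are required for the LLM policy;
        # otherwise fall back to the deterministic reference policy.
        return bool(self.base_url and self.api_key and self.model)


def load_config(env: Mapping[str, str] = os.environ) -> InferenceConfig:
    """Read settings, treating empty strings as missing."""
    return InferenceConfig(
        base_url=env.get("API_BASE_URL") or None,
        api_key=env.get("API_KEY") or None,
        model=env.get("MODEL") or None,
    )


config = load_config()
print("LLM policy" if config.use_llm else "deterministic reference policy")
```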

Example output:

```text
Task 1 Score: 1.0
Task 2 Score: 1.0
Task 3 Score: 0.9
Final Score: 1.0
```

## Project structure

```text
python_env/
├── client.py
├── graders/
│   ├── bug_fix.py
│   ├── dispatch.py
│   ├── optimization.py
│   ├── shared.py
│   └── syntax.py
├── inference.py
├── models.py
├── openenv.yaml
├── README.md
├── server/
│   ├── app.py
│   ├── Dockerfile
│   ├── env.py
│   └── python_env_environment.py
└── tasks/
    └── catalog.py
```