	---
	title: Python Code Review Environment
	emoji: snake
	colorFrom: yellow
	colorTo: blue
	sdk: docker
	pinned: false
	app_port: 8000
	tags:
	- openenv
	- code-review
	- python
	---

	# python_code_review_env

	`python_code_review_env` is a production-style OpenEnv environment that simulates a realistic Python code review workflow. An agent inspects broken code, edits it, runs tests, and submits a final solution against deterministic graders for syntax repair, bug fixing, and optimization/refactoring.

	## Environment design

	- `Observation` includes task instructions, current code, syntax errors, public test output, action history, and remaining attempts.
	- `Action` is structured as `analyze_code`, `edit_code`, `run_tests`, or `submit_solution`.
	- `Reward` is shaped and non-binary. The environment rewards syntax progress, test progress, correctness, and quality improvements, and penalizes invalid actions, timeouts, regressions, and unchanged edits.
	- `State` exposes the internal episode snapshot through `/state`.
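
To make the idea of a shaped, non-binary reward concrete, here is a minimal sketch of how per-step progress and penalties might be combined. The `StepOutcome` type, the `shaped_reward` function, and every weight below are hypothetical and chosen only for illustration; the environment's actual reward logic may differ.

```python
from dataclasses import dataclass

# Illustrative weights only; these are assumptions, not the environment's values.
SYNTAX_BONUS = 0.2
TEST_WEIGHT = 0.5
QUALITY_WEIGHT = 0.1
PENALTIES = {
    "invalid_action": 0.1,
    "timeout": 0.2,
    "regression": 0.1,
    "unchanged_edit": 0.05,
}


@dataclass
class StepOutcome:
    """Hypothetical summary of what changed during one step."""
    syntax_ok_before: bool
    syntax_ok_after: bool
    tests_passed_before: float  # fraction of public tests passing, in [0, 1]
    tests_passed_after: float
    quality_delta: float = 0.0  # change in a quality score, in [-1, 1]
    flags: tuple = ()           # names of penalties triggered by this step


def shaped_reward(outcome: StepOutcome) -> float:
    """Combine progress signals and penalties into one step reward."""
    reward = 0.0
    # One-off bonus when the code goes from unparsable to parsable.
    if outcome.syntax_ok_after and not outcome.syntax_ok_before:
        reward += SYNTAX_BONUS
    # Credit (or debit) proportional to the change in passing tests.
    reward += TEST_WEIGHT * (outcome.tests_passed_after - outcome.tests_passed_before)
    reward += QUALITY_WEIGHT * outcome.quality_delta
    # Subtract a fixed amount for each penalty flag; unknown flags cost nothing.
    for flag in outcome.flags:
        reward -= PENALTIES.get(flag, 0.0)
    return round(reward, 4)
```

Under these assumed weights, an edit that repairs the syntax and moves half of the public tests to passing earns a partial reward, while a no-op edit yields a small negative reward.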

	## Task set

	1. `syntax_fix_invoice_totals` (easy)
	Fix a syntax regression in an invoice normalization helper.
	2. `bug_fix_session_windows` (medium)
	Repair a session-collapsing bug using deterministic public and hidden tests.
	3. `optimization_rank_active_users` (hard)
	Refactor a slow ranking function and earn additional score from runtime improvement plus AST/style quality.

	## Action schema

	```json
	{
	"action_type": "edit_code",
	"code": "def function(...):\n ..."
	}
	```

	Supported `action_type` values:

	- `analyze_code`
	- `edit_code`
	- `run_tests`
	- `submit_solution`
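
A client can validate an action before sending it. The sketch below builds the JSON payload shown above; the `Action` class is a hypothetical helper, not the model defined in `models.py`, and it assumes that only `edit_code` requires a `code` field.

```python
import json
from dataclasses import dataclass
from typing import Optional

# The four action types listed in this README.
ACTION_TYPES = ("analyze_code", "edit_code", "run_tests", "submit_solution")


@dataclass(frozen=True)
class Action:
    """Hypothetical client-side action with basic validation."""
    action_type: str
    code: Optional[str] = None

    def __post_init__(self):
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"unsupported action_type: {self.action_type!r}")
        # Assumption: only edit_code needs a non-empty code payload.
        if self.action_type == "edit_code" and not self.code:
            raise ValueError("edit_code requires non-empty code")

    def to_json(self) -> str:
        """Serialize to the JSON shape shown in the action schema."""
        payload = {"action_type": self.action_type}
        if self.code is not None:
            payload["code"] = self.code
        return json.dumps(payload)


if __name__ == "__main__":
    print(Action("edit_code", code="def total(items):\n    return sum(items)\n").to_json())
    print(Action("run_tests").to_json())
```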

	## Observation schema

	```json
	{
	"task_description": "...",
	"current_code": "...",
	"errors": "...",
	"test_results": "...",
	"history": []
	}
	```

	The full observation also includes `task_id`, `difficulty`, `task_kind`, `visible_tests`, `attempts_remaining`, `score`, `last_action_status`, `reward`, `done`, and a structured `reward_details` breakdown.
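
Because the full observation carries more fields than the abbreviated schema, a client may want to parse it tolerantly. The following sketch is one way to do that; the `Observation` dataclass here is hypothetical, its field types are assumptions, and any field it does not model is kept in `extra` rather than dropped.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Observation:
    """Hypothetical client-side view of an observation; types are assumed."""
    task_description: str = ""
    current_code: str = ""
    errors: str = ""
    test_results: str = ""
    history: list = field(default_factory=list)
    attempts_remaining: int = 0
    score: float = 0.0
    reward: float = 0.0
    done: bool = False
    # Fields not modelled above (task_id, difficulty, reward_details, ...) land here.
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data: dict) -> "Observation":
        known = {f.name for f in fields(cls)} - {"extra"}
        kwargs = {k: v for k, v in data.items() if k in known}
        extra = {k: v for k, v in data.items() if k not in known}
        return cls(**kwargs, extra=extra)


if __name__ == "__main__":
    obs = Observation.from_dict({
        "task_description": "Fix the syntax error.",
        "current_code": "def f(:\n    pass\n",
        "task_id": "syntax_fix_invoice_totals",
        "done": False,
    })
    print(obs.done, obs.extra)
```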

	## Deterministic grading

	- Syntax tasks use `compile()` plus hidden behavioral checks.
	- Bug-fix tasks use deterministic function-call cases that behave like pytest assertions.
	- Optimization tasks combine correctness, runtime benchmarking, and AST/style quality scoring.
	- Infinite loops and long-running solutions are sandboxed with subprocess timeouts and receive penalties.
	- All scores are clamped to `[0.0, 1.0]`.
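
The grading ideas above can be sketched in a few small functions: a `compile()`-based syntax check, a subprocess run with a timeout, score clamping, and a weighted optimization score. This is not the code in `graders/`; the function names, the weights, and the speedup-to-credit mapping are all assumptions made for illustration.

```python
import subprocess
import sys


def syntax_ok(source: str) -> bool:
    """Return True if the source compiles as a Python module."""
    try:
        compile(source, "<submission>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def run_with_timeout(source: str, timeout_s: float = 2.0) -> str:
    """Run source in a child interpreter; return 'ok', 'error', or 'timeout'."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout_s,  # the child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "error"


def clamp(score: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Clamp a score into [lo, hi]."""
    return max(lo, min(hi, score))


def optimization_score(correctness: float, speedup: float, quality: float) -> float:
    """Blend correctness, runtime, and quality with assumed weights."""
    # Assumption: no credit at 1x speed, full runtime credit at 2x or faster.
    runtime = clamp(speedup - 1.0)
    return clamp(0.6 * correctness + 0.25 * runtime + 0.15 * quality)


if __name__ == "__main__":
    print(syntax_ok("def f(:\n    pass"))
    print(run_with_timeout("while True: pass", timeout_s=0.5))
    print(optimization_score(1.0, 1.5, 0.8))
```

Running the submission in a separate process is what makes the timeout enforceable: an infinite loop in the child cannot block the grader, which can then apply a penalty instead of hanging.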

	## Run locally

	Install dependencies:

	```bash
	pip install .
	```

	Start the API server:

	```bash
	uvicorn server.app:app --host 0.0.0.0 --port 8000
	```

	Smoke-test the environment:

	```bash
	curl http://localhost:8000/health
	curl http://localhost:8000/state
	```

	OpenEnv validation:

	```bash
	openenv validate
	```

	## Docker build

	The Docker image no longer depends on `ghcr.io/meta-pytorch/openenv-base:latest`, which avoids the TLS handshake failure seen in the original build path.

	```bash
	docker build -t python-code-review-env -f server/Dockerfile .
	docker run --rm -p 8000:8000 python-code-review-env
	```

	Expected health check:

	```bash
	curl http://localhost:8000/health
	```

	## Hugging Face Spaces deployment

	1. Create a Docker Space.
	2. Push this repository content to the Space.
	3. Ensure port `8000` is exposed.
	4. Wait for the container to build.
	5. Verify `/reset` and `/health` return `200`.

	The image is CPU-friendly and designed for a small Hugging Face Space such as `2 vCPU / 8 GB RAM`.
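
The verification in step 5 can be scripted with the standard library. The sketch below assumes that `/health` answers `GET` and that `/reset` answers `POST` with an empty JSON body; both are assumptions about the API, and `http_status` and `check_space` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def http_status(url: str, method: str = "GET", timeout: float = 10.0) -> int:
    """Return the HTTP status code for a request; connection errors propagate."""
    data = b"{}" if method == "POST" else None
    request = urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses raise HTTPError; report their status code instead.
        return exc.code


def check_space(base_url: str, fetch=http_status) -> dict:
    """Map each endpoint path to True if it returned 200."""
    # Assumed methods: GET for /health, POST for /reset.
    checks = {"/health": "GET", "/reset": "POST"}
    base = base_url.rstrip("/")
    return {path: fetch(base + path, method) == 200 for path, method in checks.items()}
```

Calling `check_space("http://localhost:8000")` against a running container returns a dictionary such as `{"/health": True, "/reset": True}` when both endpoints respond with `200`; substitute the Space's URL to check a deployment.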

	## Inference baseline

	`inference.py` uses an OpenAI-compatible client:

	```python
	client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
	```

	Supported providers include:

	- Gemini through an OpenAI-compatible gateway
	- OpenRouter
	- Together AI
	- DeepSeek-compatible OpenAI endpoints

	Run it with a free/open provider:

	```bash
	# On Windows cmd, use `set` instead of `export`.
	export API_BASE_URL=https://openrouter.ai/api/v1
	export API_KEY=...
	export MODEL=deepseek/deepseek-chat-v3-0324:free
	python inference.py
	```

	If no credentials are supplied, the script falls back to a deterministic smoke-test policy that applies the reference fix for each task so the environment can still be validated end to end.
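
The fallback behaviour can be pictured roughly as follows. This is a sketch, not the logic in `inference.py`: the function names are hypothetical, it assumes the presence of `API_KEY` is what selects the LLM policy, and it assumes the smoke-test policy edits, tests, and then submits.

```python
import os


def select_policy(env=None) -> str:
    """Return 'llm' when credentials are present, else 'smoke_test'."""
    env = os.environ if env is None else env
    # Assumption: a non-empty API_KEY is what counts as "credentials supplied".
    return "llm" if env.get("API_KEY") else "smoke_test"


def smoke_test_actions(reference_fix: str) -> list:
    """Deterministic action sequence that applies a known-good fix."""
    return [
        {"action_type": "edit_code", "code": reference_fix},
        {"action_type": "run_tests"},
        {"action_type": "submit_solution"},
    ]


if __name__ == "__main__":
    print(select_policy())
    for action in smoke_test_actions("def total(items):\n    return sum(items)\n"):
        print(action["action_type"])
```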

	Example output:

	```text
	Task 1 Score: 1.0
	Task 2 Score: 1.0
	Task 3 Score: 0.9
	Final Score: 1.0
	```

	## Project structure

	```text
	python_env/
	├── client.py
	├── graders/
	│ ├── bug_fix.py
	│ ├── dispatch.py
	│ ├── optimization.py
	│ ├── shared.py
	│ └── syntax.py
	├── inference.py
	├── models.py
	├── openenv.yaml
	├── README.md
	├── server/
	│ ├── app.py
	│ ├── Dockerfile
	│ ├── env.py
	│ └── python_env_environment.py
	└── tasks/
	  └── catalog.py
	```