codesensei-env / README.md

vineetshukla.work@gmail.com

docs: rewrite README, clean up repo structure

f3f5cb0 about 1 month ago

	---
	title: CodeSensei Environment
	emoji: 🧠
	colorFrom: purple
	colorTo: blue
	sdk: docker
	app_port: 7860
	license: mit
	short_description: RL environment for teaching LLMs to debug Python code
	---

	# CodeSensei

	An RL environment built on OpenEnv that trains LLMs to fix buggy Python code. The model gets a broken function, proposes a fix, sees the test results, and learns from them. It is the same loop a developer goes through when debugging, automated with reinforcement learning.

	## How it works

	1. The environment picks a buggy Python function from the dataset
	2. The LLM reads the code + failing test output
	3. It proposes a corrected version
	4. We run the tests in a sandboxed subprocess
	5. A multi-signal reward tells the model what went well (or didn't)
	6. Repeat for up to 6 attempts per bug

	The reward isn't just pass/fail — it accounts for partial progress, syntax validity, code variety, and whether the model is actually improving or just submitting the same thing over and over.
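
The loop described above can be sketched in plain Python. This is a minimal illustration, not the project's code: `ToyEnv`, `propose_fix`, and the `Observation` fields are hypothetical stand-ins, the reward is simplified to two values, and `exec` replaces the sandboxed subprocess the real environment uses.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 6  # from this README: up to 6 attempts per bug


@dataclass
class Observation:
    # Hypothetical fields; the real Observation is defined in env/models.py.
    code: str
    test_output: str
    done: bool = False
    reward: float = 0.0


class ToyEnv:
    """Stand-in environment with one hard-coded bug (illustration only)."""

    BUGGY = "def add(a, b):\n    return a - b\n"

    def reset(self) -> Observation:
        self.attempts = 0
        return Observation(code=self.BUGGY,
                           test_output="add(1, 2) returned -1, expected 3")

    def step(self, fix: str) -> Observation:
        self.attempts += 1
        namespace = {}
        exec(fix, namespace)  # the real environment runs this in a sandbox
        passed = namespace["add"](1, 2) == 3
        done = passed or self.attempts >= MAX_ATTEMPTS
        return Observation(code=fix,
                           test_output="ok" if passed else "still failing",
                           done=done,
                           reward=2.0 if passed else -0.3)


def propose_fix(obs: Observation, attempt: int) -> str:
    """Placeholder for the LLM call: resubmits the bug once, then fixes it."""
    if attempt == 0:
        return obs.code
    return obs.code.replace("a - b", "a + b")


def run_episode(env: ToyEnv) -> list[float]:
    """Run one debugging episode and return the reward from each attempt."""
    obs = env.reset()
    rewards = []
    for attempt in range(MAX_ATTEMPTS):
        obs = env.step(propose_fix(obs, attempt))
        rewards.append(obs.reward)
        if obs.done:
            break
    return rewards
```

Calling `run_episode(ToyEnv())` takes two attempts: the first resubmits the broken code, the second passes the test and ends the episode.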

	## Reward breakdown

	| Signal | Meaning | Reward |
	|---|---|---|
	| All tests pass | Bug fully fixed | +2.0 |
	| More tests pass than before | Making progress | +0.5 |
	| No improvement over previous best | Stuck | -0.3 |
	| Code crashes at runtime | Regression | -0.5 |
	| Syntax error | Invalid Python | -1.0 |
	| Duplicate submission | Same fix as before | -0.5 |
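
As a rough sketch, the table can be read as a single scoring function. The README does not say how the signals combine, so the code below assumes they are checked in a fixed priority order and only the first match applies. The function name and parameters are hypothetical, not taken from `environment.py`.

```python
import ast


def compute_reward(code, passed, total, best_passed, crashed, previous_submissions):
    """Return one scalar reward using the values from the reward table.

    Assumption: checks run in priority order and the first match wins.
    """
    try:
        ast.parse(code)  # syntax validity is checked before anything else
    except SyntaxError:
        return -1.0
    if code in previous_submissions:
        return -0.5  # duplicate submission
    if crashed:
        return -0.5  # code crashed at runtime
    if total > 0 and passed == total:
        return 2.0   # all tests pass
    if passed > best_passed:
        return 0.5   # more tests pass than the previous best
    return -0.3      # no improvement
```

For example, a syntactically valid fix that passes 2 of 3 tests when the previous best was 1 scores `0.5` under these assumptions.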

	## Project layout

	```
	├── inference.py            # main inference script (OpenEnv submission)
	├── openenv.yaml            # environment spec
	├── Dockerfile
	├── requirements.txt
	├── env/
	│   ├── client.py           # async client with from_docker_image()
	│   ├── models.py           # Action, Observation, State dataclasses
	│   ├── data/
	│   │   └── bug_dataset.json    # 10 bugs with test suites
	│   └── server/
	│       ├── app.py          # FastAPI: /reset, /step, /health, /ws
	│       ├── environment.py  # core logic (reset/step/state)
	│       ├── sandbox.py      # restricted code execution
	│       └── test_runner.py  # runs tests against proposed fixes
	├── server/
	│   └── app.py              # entry point for openenv validate
	├── training/
	│   └── colab_train.py      # GRPO training (Colab T4)
	└── demo/
	    └── app.py              # Gradio demo
	```

	## Running locally

	```bash
	pip install -r requirements.txt
	uvicorn env.server.app:app --host 0.0.0.0 --port 7860
	```

	Then send `POST /reset` with an empty JSON body (`{}`) to start an episode, and `POST /step` with your proposed fix to iterate.
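
A minimal client sketch using only the standard library is shown below. The `/step` payload is an assumption: the field names `session_id` and `code` are hypothetical, so check `env/models.py` for the actual schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # matches the uvicorn command above


def build_post(path, payload, base_url=BASE_URL):
    """Build (but do not send) a JSON POST request for the environment server."""
    return urllib.request.Request(
        url=base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


reset_request = build_post("/reset", {})
# Hypothetical payload: "session_id" and "code" are assumed field names.
step_request = build_post(
    "/step",
    {"session_id": "X", "code": "def add(a, b):\n    return a + b\n"},
)
```

With the server running, `send(reset_request)` should return the first observation and `send(step_request)` the result of one attempt.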

	## Inference

	The inference script uses the OpenAI-compatible client pointed at HuggingFace's inference router. It connects to the environment via `from_docker_image()`, runs the debug loop, and logs everything in the required `[START]`/`[STEP]`/`[END]` format.

	```bash
	export HF_TOKEN="your_token"
	python inference.py
	```

	Default model is `Qwen/Qwen2.5-Coder-32B-Instruct` (free via HF router). You can swap it by setting `MODEL_NAME`.
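
The environment variables above could be resolved along these lines. This is a sketch, not the contents of `inference.py`: the function name is hypothetical, and the router base URL is an assumption to confirm against the script.

```python
import os

DEFAULT_MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"  # default named in this README
# Assumed router endpoint; confirm against inference.py before relying on it.
ROUTER_BASE_URL = "https://router.huggingface.co/v1"


def client_settings(env=None):
    """Collect the settings an OpenAI-compatible client would need."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")
    if not token:
        raise RuntimeError("HF_TOKEN is not set")
    return {
        "base_url": ROUTER_BASE_URL,
        "api_key": token,
        "model": env.get("MODEL_NAME", DEFAULT_MODEL),
    }
```

The `base_url` and `api_key` values map onto the `OpenAI(base_url=..., api_key=...)` constructor arguments, and `model` is passed with each completion request.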

	## Training

	Open `training/colab_train.py` in Google Colab with a T4 runtime. It uses GRPO from HuggingFace TRL with QLoRA (4-bit quantization + LoRA adapters) so the whole run fits in 15 GB of VRAM. Checkpoints are pushed to the HF Hub automatically.
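
To give a feel for what GRPO does with the environment's rewards, here is a simplified sketch of the group-relative advantage: several fixes are sampled for the same bug, and each reward is normalized against the mean and spread of its own group. The function name and `eps` value are illustrative, and TRL's implementation may differ in details such as the standard deviation estimator.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled fixes for the same bug, scored with values from the reward table.
advantages = group_relative_advantages([2.0, 0.5, -0.3, -1.0])
```

Fixes that beat the group average get a positive advantage and are reinforced; fixes below it get a negative one. If every sample in a group scores the same, all advantages are zero and that group contributes no learning signal.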

	## API endpoints

	| Method | Path | What it does |
	|---|---|---|
	| POST | `/reset` | Start a new debugging episode |
	| POST | `/step` | Submit a proposed fix |
	| GET | `/state?session_id=X` | Get current episode state |
	| GET | `/health` | Health check |
	| WS | `/ws` | WebSocket interface |

	## Tech used

	- Environment: FastAPI + OpenEnv protocol
	- Training: TRL GRPO + QLoRA on Qwen2.5-Coder-32B-Instruct
	- Inference: OpenAI Python client → HuggingFace router (free tier)
	- Deployment: Docker on HF Spaces
	- Security: Code execution in sandboxed subprocesses with restricted builtins
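
The security bullet can be illustrated with a small sketch of the general technique: run submitted code in a separate interpreter process with a timeout and a reduced set of builtins. The allow-list, function name, and return shape are hypothetical, and the project's `sandbox.py` may work differently.

```python
import subprocess
import sys

# Child-process script: exec the submitted code with an allow-list of builtins.
# Hypothetical allow-list; the project's sandbox may permit a different set.
RUNNER = """
import builtins, sys
ALLOWED = ("abs", "len", "range", "print", "sum", "min", "max", "sorted", "enumerate")
safe = {name: getattr(builtins, name) for name in ALLOWED}
exec(sys.stdin.read(), {"__builtins__": safe})
"""


def run_sandboxed(code, timeout=5.0):
    """Run `code` in a separate interpreter and report what happened."""
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", RUNNER],  # -I: isolated mode
            input=code,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "error": "timeout"}
    return {
        "ok": result.returncode == 0,
        "stdout": result.stdout,
        "error": result.stderr.strip(),
    }
```

In this sketch, `import os` and `open(...)` fail inside the child because `__import__` and `open` are not in the allow-list, and an infinite loop is cut off by the timeout. Restricted builtins are not a complete security boundary on their own, which is one reason to pair them with process isolation.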

	## License

	MIT