Spaces:

Pandaisop
/

codesensei-env

Sleeping

App Files Files Community

codesensei-env / README.md

vineetshukla.work@gmail.com

docs: rewrite README, clean up repo structure

f3f5cb0 about 1 month ago

preview code

raw

history blame contribute delete

3.91 kB

metadata

title: CodeSensei Environment
emoji: 🧠
colorFrom: purple
colorTo: blue
sdk: docker
app_port: 7860
license: mit
short_description: RL environment for teaching LLMs to debug Python code

CodeSensei

An RL environment built on OpenEnv that trains LLMs to fix buggy Python code. The model gets a broken function, proposes a fix, runs tests, and learns from the results — basically the same loop a developer goes through when debugging, but automated with reinforcement learning.

How it works

The environment picks a buggy Python function from the dataset
The LLM reads the code + failing test output
It proposes a corrected version
We run the tests in a sandboxed subprocess
A multi-signal reward tells the model what went well (or didn't)
Repeat for up to 6 attempts per bug

The reward isn't just pass/fail — it accounts for partial progress, syntax validity, code variety, and whether the model is actually improving or just submitting the same thing over and over.

Reward breakdown

Signal	When	Value
All tests pass	Bug fully fixed	+2.0
More tests pass than before	Making progress	+0.5
No improvement over previous best	Stuck	-0.3
Code crashes at runtime	Regression	-0.5
Syntax error	Invalid Python	-1.0
Duplicate submission	Same fix as before	-0.5

Project layout

├── inference.py             # main inference script (OpenEnv submission)
├── openenv.yaml             # environment spec
├── Dockerfile
├── requirements.txt
├── env/
│   ├── client.py            # async client with from_docker_image()
│   ├── models.py            # Action, Observation, State dataclasses
│   ├── data/
│   │   └── bug_dataset.json # 10 bugs with test suites
│   └── server/
│       ├── app.py           # FastAPI — /reset, /step, /health, /ws
│       ├── environment.py   # core logic (reset/step/state)
│       ├── sandbox.py       # restricted code execution
│       └── test_runner.py   # runs tests against proposed fixes
├── server/
│   └── app.py               # entry point for openenv validate
├── training/
│   └── colab_train.py       # GRPO training (Colab T4)
└── demo/
    └── app.py               # Gradio demo

Running locally

pip install -r requirements.txt
uvicorn env.server.app:app --host 0.0.0.0 --port 7860

Then hit POST /reset with {} to start an episode, and POST /step with your fix to iterate.

Inference

The inference script uses the OpenAI-compatible client pointed at HuggingFace's inference router. It connects to the environment via from_docker_image(), runs the debug loop, and logs everything in the required [START]/[STEP]/[END] format.

export HF_TOKEN="your_token"
python inference.py

Default model is Qwen/Qwen2.5-Coder-32B-Instruct (free via HF router). You can swap it by setting MODEL_NAME.

Training

Open training/colab_train.py in Google Colab with a T4 runtime. It uses GRPO from HuggingFace TRL with QLoRA (4-bit quantization + LoRA adapters) so the whole thing fits in 15GB VRAM. Checkpoints get pushed to HF Hub automatically.

API endpoints

Method	Path	What it does
POST	`/reset`	Start a new debugging episode
POST	`/step`	Submit a proposed fix
GET	`/state?session_id=X`	Get current episode state
GET	`/health`	Health check
WS	`/ws`	WebSocket interface

Tech used

Environment: FastAPI + OpenEnv protocol
Training: TRL GRPO + QLoRA on Qwen2.5-Coder-32B-Instruct
Inference: OpenAI Python client → HuggingFace router (free tier)
Deployment: Docker on HF Spaces
Security: Code execution in sandboxed subprocesses with restricted builtins

License

MIT