---
title: ADAPT DSA Tutor OpenEnv
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
- reinforcement-learning
- code-generation
---

# ADAPT DSA Tutor OpenEnv

ADAPT, the Adversarial DSA Tutor, is an OpenEnv-compliant RLVR (reinforcement learning with verifiable rewards) environment for training code-generation agents on small data structures and algorithms (DSA) tasks. The agent receives a problem prompt, examples, and visible tests, then submits Python code. The environment runs the code against both visible and hidden tests and returns a reward, pass-rate metrics, execution status, and feedback.

This repo now covers only the environment layer. The verifier and the training scripts are maintained separately.

## Why This Environment

The hackathon asks for OpenEnv environments that can improve LLM behavior through verifiable interaction. ADAPT targets a simple but useful skill loop:

```text
agent writes code -> environment executes it -> hidden tests and reward signals score it -> trainer improves the agent
```

The differentiator is curriculum-ready DSA practice: each episode carries a problem id and a difficulty tier, so training can track success per tier instead of only aggregate reward.
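Per-tier tracking can be as simple as grouping logged episodes by tier. A minimal sketch, assuming each finished episode is logged as a dict with hypothetical `difficulty` and `pass_rate` keys and that an episode counts as solved only when every test passes (the record shape and the threshold are illustrative, not this repo's logging format):

```python
from collections import defaultdict


def success_by_tier(episodes):
    """Return {tier: fraction of episodes fully solved}.

    Each episode is assumed to be a dict with hypothetical keys
    "difficulty" (e.g. "easy") and "pass_rate" (a float in [0, 1]).
    """
    solved = defaultdict(int)
    total = defaultdict(int)
    for ep in episodes:
        tier = ep["difficulty"]
        total[tier] += 1
        # Count a success only if every test passed.
        if ep["pass_rate"] >= 1.0:
            solved[tier] += 1
    return {tier: solved[tier] / total[tier] for tier in total}


log = [
    {"difficulty": "easy", "pass_rate": 1.0},
    {"difficulty": "easy", "pass_rate": 0.5},
    {"difficulty": "hard", "pass_rate": 0.0},
]
print(success_by_tier(log))  # {'easy': 0.5, 'hard': 0.0}
```

Reporting these per-tier rates alongside the mean reward shows whether gains come from easy problems alone.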

## OpenEnv Interface

The environment is built on OpenEnv's generic `Environment` base class:

- `AdaptEnvironment(Environment[AdaptAction, AdaptObservation, AdaptState])`
- `reset()` returns a typed observation.
- `step(action)` accepts an `AdaptAction` containing a Python `code` string.
- `state` exposes the episode id, step count, current problem id, difficulty, and recent metrics.
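The shape of this contract can be sketched with plain dataclasses. This is a self-contained illustration, not the repo's code: it does not subclass OpenEnv's `Environment`, the `ToyAdaptEnv` class, the pluggable `scorer`, and any field names beyond those listed above are hypothetical, and single-step episodes are assumed for brevity.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AdaptAction:
    code: str  # the Python program submitted by the agent


@dataclass
class AdaptObservation:
    problem_id: str
    difficulty: str
    statement: str
    feedback: str = ""
    pass_rate: float = 0.0
    reward: float = 0.0
    done: bool = False


@dataclass
class AdaptState:
    episode_id: str = ""
    step_count: int = 0
    problem_id: str = ""
    difficulty: str = ""
    recent_metrics: dict = field(default_factory=dict)


class ToyAdaptEnv:
    """Hypothetical stand-in that only illustrates reset/step/state."""

    PROBLEM = ("easy_double", "easy", "Read an integer n and print 2 * n.")

    def __init__(self, scorer: Callable[[str], float]):
        # scorer maps submitted code to a pass rate in [0, 1]; a real
        # environment would execute the code against its tests instead.
        self._scorer = scorer
        self._state = AdaptState()

    def reset(self) -> AdaptObservation:
        pid, tier, statement = self.PROBLEM
        self._state = AdaptState(episode_id=str(uuid.uuid4()),
                                 problem_id=pid, difficulty=tier)
        return AdaptObservation(problem_id=pid, difficulty=tier, statement=statement)

    def step(self, action: AdaptAction) -> AdaptObservation:
        pid, tier, statement = self.PROBLEM
        self._state.step_count += 1
        rate = self._scorer(action.code)
        self._state.recent_metrics = {"pass_rate": rate}
        # Single-step episodes: every step ends the episode (an assumption).
        return AdaptObservation(problem_id=pid, difficulty=tier, statement=statement,
                                feedback=f"{rate:.0%} of tests passed",
                                pass_rate=rate, reward=rate, done=True)

    @property
    def state(self) -> AdaptState:
        return self._state


env = ToyAdaptEnv(scorer=lambda code: 1.0)
first = env.reset()
result = env.step(AdaptAction(code="n = int(input())\nprint(n * 2)"))
print(result.feedback, env.state.step_count)  # 100% of tests passed 1
```

Keeping action, observation, and state as separate typed objects is what lets a client validate payloads against the server schema.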

`openenv.yaml` declares the server entry point and port:

```yaml
app: server.app:app
port: 7860
```
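`server.app:app` follows the `module:attribute` convention that uvicorn also uses: import the module `server.app`, then read its `app` attribute. A minimal sketch of that lookup with `importlib` (the `load_app` helper is hypothetical and is not OpenEnv's or uvicorn's own loader):

```python
import importlib
import json


def load_app(target: str):
    """Resolve a "package.module:attribute" string to the named object."""
    module_name, sep, attr_path = target.partition(":")
    if not sep or not module_name or not attr_path:
        raise ValueError(f"expected 'module:attribute', got {target!r}")
    obj = importlib.import_module(module_name)
    # Walk dotted attribute paths such as "module:factory.app".
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return obj


# Inside this repo, load_app("server.app:app") would return the ASGI app.
# A standard-library target that runs anywhere:
print(load_app("json:dumps") is json.dumps)  # True
```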

## Action

An action carries a single `code` field containing the Python program to run:

```python
{
    "code": "n = int(input())\nprint(n * 2)"
}
```
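The example program reads from standard input and writes to standard output, which suggests that tests are stdin/stdout pairs. Under that assumption, here is a minimal sketch of running one action against one test case with `subprocess`. The `run_case` helper, its result labels, and the 2-second default timeout are hypothetical and not taken from `env/executor.py`:

```python
import subprocess
import sys


def run_case(code: str, stdin: str, expected: str, timeout: float = 2.0) -> str:
    """Run `code` in a fresh Python process and classify the result.

    Returns one of "pass", "wrong_answer", "runtime_error", "timeout".
    Not a sandbox: the code runs with the caller's permissions.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    if proc.returncode != 0:
        return "runtime_error"
    # Compare outputs ignoring leading/trailing whitespace.
    return "pass" if proc.stdout.strip() == expected.strip() else "wrong_answer"


code = "n = int(input())\nprint(n * 2)"
print(run_case(code, "21\n", "42"))  # pass
print(run_case(code, "x\n", "42"))   # runtime_error
```

`subprocess.run` kills the child process when `timeout` expires, so an infinite loop becomes a `timeout` result instead of a hung episode.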

## Observation

Reset and step observations include:

- problem statement
- input format
- constraints
- examples
- visible tests
- problem id
- difficulty tier
- feedback
- pass rate, visible pass rate, and hidden pass rate
- syntax/runtime/timeout status
- reward components

Hidden test inputs and expected outputs are never returned in observations.
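A minimal sketch of deriving the three pass rates while keeping hidden cases out of the observation. It assumes per-test boolean results and computes the overall rate by pooling visible and hidden results. The pooling rule and the `build_observation` helper are illustrative assumptions, not the repo's implementation:

```python
def pass_rate(results):
    """Fraction of True values; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


def build_observation(problem, visible_results, hidden_results):
    """Assemble observation fields; hidden test data is deliberately omitted."""
    return {
        "problem_id": problem["id"],
        "difficulty": problem["difficulty"],
        # Only visible tests are exposed to the agent.
        "visible_tests": problem["visible_tests"],
        "visible_pass_rate": pass_rate(visible_results),
        "hidden_pass_rate": pass_rate(hidden_results),
        # Overall rate pools visible and hidden results (an assumption).
        "pass_rate": pass_rate(visible_results + hidden_results),
    }


problem = {
    "id": "easy_double",
    "difficulty": "easy",
    "visible_tests": [{"input": "2\n", "output": "4"}],
    "hidden_tests": [{"input": "7\n", "output": "14"}, {"input": "0\n", "output": "0"}],
}
obs = build_observation(problem, [True], [True, False])
print(obs["visible_pass_rate"], obs["hidden_pass_rate"], obs["pass_rate"])
```

Building the observation from an explicit allow-list of fields, rather than copying the problem record, makes it hard to leak hidden tests by accident.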

## Reward

Reward is clipped to `[0.0, 1.0]` and combines several environment-level signals:

- correctness, from the visible and hidden pass rates
- syntax validity
- clean execution
- output format compliance
- timeout penalty
- runtime error penalty
- static safety rejection of dangerous imports such as `os`, `subprocess`, `socket`, `pathlib`, and `shutil`
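A minimal sketch of how these signals could be combined. The weights, the penalty sizes, averaging visible and hidden pass rates, and returning 0.0 for invalid syntax are all hypothetical choices, not the values used by this environment:

```python
import ast

# Modules named in the list above; a real deny-list may be longer.
BLOCKED_MODULES = {"os", "subprocess", "socket", "pathlib", "shutil"}

# Hypothetical weights: the positive terms sum to 1.0 before penalties.
WEIGHTS = {"correctness": 0.7, "syntax": 0.1, "clean_exec": 0.1, "format": 0.1}
PENALTIES = {"timeout": 0.2, "runtime_error": 0.2}


def imports_blocked_module(tree: ast.AST) -> bool:
    """True if any import statement names a blocked top-level module."""
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        if any(name.split(".")[0] in BLOCKED_MODULES for name in names):
            return True
    return False


def compute_reward(code, visible_rate, hidden_rate, clean_exec,
                   format_ok, timed_out, runtime_error):
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return 0.0  # invalid syntax earns no credit (an assumption)
    if imports_blocked_module(tree):
        return 0.0  # static safety rejection
    correctness = (visible_rate + hidden_rate) / 2
    reward = (WEIGHTS["correctness"] * correctness
              + WEIGHTS["syntax"]
              + WEIGHTS["clean_exec"] * clean_exec
              + WEIGHTS["format"] * format_ok)
    if timed_out:
        reward -= PENALTIES["timeout"]
    if runtime_error:
        reward -= PENALTIES["runtime_error"]
    return min(1.0, max(0.0, reward))  # clip to [0.0, 1.0]


print(compute_reward("import os\nprint(1)", 1.0, 1.0, True, True, False, False))  # 0.0
print(compute_reward("print(2)", 1.0, 0.5, True, True, False, False))
```

The `ast` check only sees `import` statements, so a dynamic call such as `__import__("os")` passes it; it works as a cheap first filter, not as a sandbox.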

If `verifier.verifier.verify(code, test_cases)` is available, the environment can use it as an optional reward augmentation. If the verifier is absent, the environment still works and falls back to the executor-derived reward alone.
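The optional-dependency pattern can be sketched with `importlib`. This assumes `verify(code, test_cases)` returns a score in `[0, 1]`, and the 50/50 blend between executor and verifier scores is a hypothetical choice; `load_optional_verifier` and `final_reward` are illustrative helpers:

```python
import importlib


def load_optional_verifier():
    """Return verifier.verifier.verify if that module imports, else None."""
    try:
        module = importlib.import_module("verifier.verifier")
    except ImportError:  # also covers ModuleNotFoundError
        return None
    return getattr(module, "verify", None)


def final_reward(executor_reward, code, test_cases, verify=None, blend=0.5):
    """Blend the executor reward with an optional verifier score.

    Assumes verify(code, test_cases) returns a float in [0, 1].
    """
    if verify is None:
        return executor_reward  # executor-only fallback
    score = float(verify(code, test_cases))
    mixed = (1 - blend) * executor_reward + blend * score
    return min(1.0, max(0.0, mixed))


verify = load_optional_verifier()  # None when the verifier package is absent
print(final_reward(0.8, "print(1)", [], verify))
```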

## Local Setup

Use Python `3.10+`.

```powershell
cd <path-to>\meta-rl-dsa-solver
python -m venv .venv
.\.venv\Scripts\pip install -e .
```

During development, a local checkout of the OpenEnv repo can be used instead of an installed package by adding its `src` directory to `PYTHONPATH`:

```powershell
$env:PYTHONPATH="<path-to>\OpenEnv\src;$PWD"
```

## Smoke Tests

Run the local smoke test:

```powershell
python test.py
```

Check syntax:

```powershell
python -m py_compile models.py env\adapt_env.py env\executor.py env\test_cases.py server\app.py
```
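The same check can be run from Python with the standard-library `py_compile` module, which avoids Windows-only path separators. A small sketch: the `check_syntax` helper is hypothetical, and the file list mirrors the command above.

```python
import py_compile
from pathlib import Path

FILES = [
    "models.py",
    "env/adapt_env.py",
    "env/executor.py",
    "env/test_cases.py",
    "server/app.py",
]


def check_syntax(paths):
    """Compile each file; return a list of (path, message) failures."""
    failures = []
    for path in paths:
        try:
            py_compile.compile(str(Path(path)), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append((path, exc.msg))
        except OSError as exc:  # e.g. the file does not exist
            failures.append((path, str(exc)))
    return failures


for path, msg in check_syntax(FILES):
    print(f"{path}: {msg}")
```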

Start the OpenEnv server:

```powershell
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

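Useful endpoints:

- `GET /health`
- `GET /schema`
- `POST /reset`
- `POST /step`
- `GET /state`

Example step request from PowerShell, using `Invoke-RestMethod` so that `ConvertTo-Json` builds the JSON body instead of escaping it by hand:

```powershell
$body = @{ action = @{ code = "n=int(input())`nprint(n*2)" } } | ConvertTo-Json -Depth 3
Invoke-RestMethod -Method Post -Uri http://localhost:7860/step -ContentType "application/json" -Body $body
```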
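The same requests can be sent from Python using only the standard library. A sketch, assuming the uvicorn server started above is listening on `localhost:7860` and that `/reset` accepts an empty JSON object as its body; `build_request` and `call` are hypothetical helpers:

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # assumes the local uvicorn server


def build_request(path, payload=None):
    """Build a GET (no payload) or JSON POST request for an endpoint."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def call(path, payload=None, timeout=10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


step_payload = {"action": {"code": "n=int(input())\nprint(n*2)"}}
# With the server running:
#   call("/health")
#   call("/reset", {})
#   result = call("/step", step_payload)
```

`json.dumps` escapes the newline in the code string as `\n`, matching the payload shape used in the PowerShell example.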

Validate with OpenEnv once dependencies are installed:

```powershell
openenv validate .
```

## Hugging Face Spaces

This repo is ready to deploy as a Docker Space:

```powershell
openenv push --repo-id <your-hf-username>/adapt-dsa-tutor
```

Before final submission, add:

- a link to the live Hugging Face Space
- training reward/loss plots from Disha's run
- a before/after code example showing a problem the model failed before training and solved after training
- a link to a mini-blog or short video

## Current Problem Bank

The environment ships with a small, curated problem bank:

- `easy_double`
- `easy_sum_two`
- `medium_maximum`
- `medium_count_even`
- `hard_reverse_words`

The bank is intentionally small to keep the minimum submission stable. It can later be expanded to 30-50 tiered problems without changing the OpenEnv API.
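The ids above encode the tier as a prefix. A minimal sketch, assuming that naming convention holds as the bank grows, of grouping problems by tier and sampling one from a requested tier for curriculum scheduling (`group_by_tier` and `sample_problem` are hypothetical helpers, not part of the environment API):

```python
import random
from collections import defaultdict

PROBLEM_IDS = [
    "easy_double",
    "easy_sum_two",
    "medium_maximum",
    "medium_count_even",
    "hard_reverse_words",
]


def group_by_tier(problem_ids):
    """Map the tier prefix (text before the first '_') to its problem ids."""
    tiers = defaultdict(list)
    for pid in problem_ids:
        tier, _, _ = pid.partition("_")
        tiers[tier].append(pid)
    return dict(tiers)


def sample_problem(tiers, tier, rng=random):
    """Pick a random problem id from the requested tier."""
    if tier not in tiers:
        raise KeyError(f"no problems in tier {tier!r}")
    return rng.choice(tiers[tier])


tiers = group_by_tier(PROBLEM_IDS)
print(tiers["medium"])  # ['medium_maximum', 'medium_count_even']
print(sample_problem(tiers, "easy", random.Random(0)))
```

Passing a seeded `random.Random` makes curriculum sampling reproducible across training runs.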