| --- |
| title: ADAPT DSA Tutor OpenEnv |
| sdk: docker |
| pinned: false |
| app_port: 7860 |
| base_path: /web |
| tags: |
| - openenv |
| - reinforcement-learning |
| - code-generation |
| --- |
| |
| # ADAPT DSA Tutor OpenEnv |
|
|
| ADAPT, the Adversarial DSA Tutor, is an OpenEnv-compliant RLVR environment for training code-generation agents on small DSA tasks. The agent receives a problem prompt, examples, and visible tests, then submits Python code. The environment runs the code against visible and hidden tests and returns reward, pass-rate metrics, execution status, and feedback. |
|
|
| This repo now focuses on the environment layer only. Verifier work and training scripts are owned separately. |
|
|
| ## Why This Environment |
|
|
| The hackathon asks for OpenEnv environments that can improve LLM behavior through verifiable interaction. ADAPT targets a simple but useful skill loop: |
|
|
| ```text |
| agent writes code -> environment executes it -> hidden tests and reward signals score it -> trainer improves the agent |
| ``` |
|
|
| The differentiator is curriculum-ready DSA practice: each episode carries a problem id and difficulty tier so training can track per-tier success instead of only aggregate reward. |
|
|
| ## OpenEnv Interface |
|
|
| The environment uses the latest OpenEnv API shape: |
|
|
| - `AdaptEnvironment(Environment[AdaptAction, AdaptObservation, AdaptState])` |
| - `reset()` returns a typed observation. |
| - `step(action)` accepts an `AdaptAction` with a Python `code` string. |
| - `state` exposes episode id, step count, current problem id, difficulty, and recent metrics. |
|
|
| `openenv.yaml` points to: |
|
|
| ```yaml |
| app: server.app:app |
| port: 7860 |
| ``` |
|
|
| ## Action |
|
|
| ```python |
| { |
| "code": "n = int(input())\nprint(n * 2)" |
| } |
| ``` |
|
|
| ## Observation |
|
|
| Reset and step observations include: |
|
|
| - problem statement |
| - input format |
| - constraints |
| - examples |
| - visible tests |
| - problem id |
| - difficulty tier |
| - feedback |
| - pass rate, visible pass rate, and hidden pass rate |
| - syntax/runtime/timeout status |
| - reward components |
|
|
| Hidden test inputs and expected outputs are never returned in observations. |
|
|
| ## Reward |
|
|
| Reward is clipped to `[0.0, 1.0]` and combines multiple environment-level signals: |
|
|
| - correctness from visible and hidden pass rate |
| - syntax validity |
| - clean execution |
| - output format compliance |
| - timeout penalty |
| - runtime error penalty |
| - static safety rejection for dangerous imports such as `os`, `subprocess`, `socket`, `pathlib`, and `shutil` |
|
|
| If `verifier.verifier.verify(code, test_cases)` exists, the environment can use it as an optional reward augmentation. If the verifier is absent, the environment still works using executor-derived reward. |
|
|
| ## Local Setup |
|
|
| Use Python `3.10+`. |
|
|
| ```powershell |
| cd C:\Users\kaust\PycharmProjects\meta-rl-dsa-solver |
| python -m venv .venv |
| .\.venv\Scripts\pip install -e . |
| ``` |
|
|
| For this local machine, the existing checked-out OpenEnv repo can also be used during development: |
|
|
| ```powershell |
| $env:PYTHONPATH="C:\Users\kaust\PycharmProjects\OpenEnv\src;$PWD" |
| ``` |
|
|
| ## Smoke Tests |
|
|
| Run the local smoke test: |
|
|
| ```powershell |
| python test.py |
| ``` |
|
|
| Check syntax: |
|
|
| ```powershell |
| python -m py_compile models.py env\adapt_env.py env\executor.py env\test_cases.py server\app.py |
| ``` |
|
|
| Start the OpenEnv server: |
|
|
| ```powershell |
| uvicorn server.app:app --host 0.0.0.0 --port 7860 |
| ``` |
|
|
| Useful endpoints: |
|
|
| - `GET /health` |
| - `GET /schema` |
| - `POST /reset` |
| - `POST /step` |
| - `GET /state` |
|
|
| Example step request: |
|
|
| ```powershell |
| curl -X POST http://localhost:7860/step -H "Content-Type: application/json" -d "{\"action\":{\"code\":\"n=int(input())\nprint(n*2)\"}}" |
| ``` |
|
|
| Validate with OpenEnv once dependencies are installed: |
|
|
| ```powershell |
| openenv validate . |
| ``` |
|
|
| ## Hugging Face Spaces |
|
|
| This repo is Docker Space ready: |
|
|
| ```powershell |
| openenv push --repo-id <your-hf-username>/adapt-dsa-tutor |
| ``` |
|
|
| Before final submission, add: |
|
|
| - live Hugging Face Space link |
| - training reward/loss plots from Disha's run |
| - before/after code example showing a problem the model failed before training and solved after training |
| - mini-blog or short video link |
|
|
| ## Current Problem Bank |
|
|
| The environment includes a lightweight curated bank: |
|
|
| - `easy_double` |
| - `easy_sum_two` |
| - `medium_maximum` |
| - `medium_count_even` |
| - `hard_reverse_words` |
|
|
| This is intentionally small for submission-minimum stability. Later work can expand it to 30-50 tiered problems without changing the OpenEnv API. |
|
|