---
title: ADAPT DSA Tutor OpenEnv
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
- openenv
- reinforcement-learning
- code-generation
---
# ADAPT DSA Tutor OpenEnv
ADAPT, the Adversarial DSA Tutor, is an OpenEnv-compliant RLVR environment for training code-generation agents on small DSA tasks. The agent receives a problem prompt, examples, and visible tests, then submits Python code. The environment runs the code against visible and hidden tests and returns reward, pass-rate metrics, execution status, and feedback.
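The execute-and-score loop can be sketched with the standard library alone. This is a minimal sketch, not the repo's `env/executor.py`. It assumes stdin/stdout test cases given as `(input, expected_output)` pairs, runs each submission in a fresh `python` subprocess with a timeout, and compares whitespace-stripped output. The function name `run_tests` and the status strings are hypothetical, and the sketch provides no sandboxing of its own.

```python
import subprocess
import sys


def run_tests(code: str, tests: list[tuple[str, str]], timeout: float = 2.0) -> dict:
    """Hypothetical executor: run `code` once per test case and report pass rate and status."""
    # Check syntax first so syntax errors are reported without running anything.
    try:
        compile(code, "<submission>", "exec")
    except SyntaxError:
        return {"status": "syntax_error", "pass_rate": 0.0}

    passed = 0
    status = "ok"  # Overwritten by the most recent failure kind, if any.
    for stdin_text, expected in tests:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            status = "timeout"
            continue
        if proc.returncode != 0:
            status = "runtime_error"
            continue
        # Compare outputs ignoring leading/trailing whitespace.
        if proc.stdout.strip() == expected.strip():
            passed += 1
    return {"status": status, "pass_rate": passed / len(tests) if tests else 0.0}
```

For example, `run_tests("n = int(input())\nprint(n * 2)", [("3\n", "6"), ("10\n", "20")])` reports status `"ok"` and a pass rate of `1.0`.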
This repo now focuses on the environment layer only. Verifier work and training scripts are owned separately.
## Why This Environment
The hackathon asks for OpenEnv environments that can improve LLM behavior through verifiable interaction. ADAPT targets a simple but useful skill loop:
```text
agent writes code -> environment executes it -> hidden tests and reward signals score it -> trainer improves the agent
```
The differentiator is curriculum-ready DSA practice: each episode carries a problem id and difficulty tier so training can track per-tier success instead of only aggregate reward.
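Because every episode carries a difficulty tier, per-tier success can be tracked with a small counter on the training side. A minimal sketch; the `TierTracker` class and the default `success_threshold` of `1.0` (every test passed) are hypothetical choices, not part of the repo.

```python
from collections import defaultdict


class TierTracker:
    """Hypothetical helper: running success rate per difficulty tier."""

    def __init__(self, success_threshold: float = 1.0) -> None:
        self.success_threshold = success_threshold
        self.attempts: dict[str, int] = defaultdict(int)
        self.successes: dict[str, int] = defaultdict(int)

    def record(self, difficulty: str, pass_rate: float) -> None:
        # An episode counts as a success only if its pass rate reaches the threshold.
        self.attempts[difficulty] += 1
        if pass_rate >= self.success_threshold:
            self.successes[difficulty] += 1

    def success_rates(self) -> dict[str, float]:
        # Only tiers that have been attempted appear in the result.
        return {tier: self.successes[tier] / n for tier, n in self.attempts.items()}
```

Recording `("easy", 1.0)`, `("easy", 0.5)`, and `("hard", 1.0)` yields `{"easy": 0.5, "hard": 1.0}`, which exposes a weak tier that an aggregate reward would hide.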
## OpenEnv Interface
The environment uses the latest OpenEnv API shape:
- `AdaptEnvironment(Environment[AdaptAction, AdaptObservation, AdaptState])`
- `reset()` returns a typed observation.
- `step(action)` accepts an `AdaptAction` with a Python `code` string.
- `state` exposes episode id, step count, current problem id, difficulty, and recent metrics.
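The state bookkeeping described above can be illustrated with plain dataclasses. This is a sketch only: the real classes build on OpenEnv's base types, and the field names here are assumed from the description rather than copied from `models.py`. `ToyAdaptEnv` is a hypothetical stand-in (with `state` as a plain attribute) that shows how `reset()` starts a new episode and `step()` advances the counter; deriving the difficulty from the problem id prefix is also an assumption.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AdaptAction:
    code: str  # Python source submitted by the agent


@dataclass
class AdaptState:
    episode_id: str = ""
    step_count: int = 0
    problem_id: str = ""
    difficulty: str = ""
    recent_metrics: dict = field(default_factory=dict)


class ToyAdaptEnv:
    """Hypothetical stand-in mirroring the reset()/step()/state shape."""

    def __init__(self) -> None:
        self.state = AdaptState()

    def reset(self, problem_id: str = "easy_double") -> dict:
        # A fresh episode gets a new id and a zeroed step counter.
        self.state = AdaptState(
            episode_id=str(uuid.uuid4()),
            problem_id=problem_id,
            difficulty=problem_id.split("_", 1)[0],  # assumed: tier is the id prefix
        )
        return {"problem_id": problem_id, "difficulty": self.state.difficulty}

    def step(self, action: AdaptAction) -> dict:
        self.state.step_count += 1
        # Scoring is omitted; a real environment would execute action.code here.
        return {"problem_id": self.state.problem_id, "step": self.state.step_count}
```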
`openenv.yaml` points to:
```yaml
app: server.app:app
port: 7860
```
## Action
```python
{
    "code": "n = int(input())\nprint(n * 2)"
}
```
## Observation
Reset and step observations include:
- problem statement
- input format
- constraints
- examples
- visible tests
- problem id
- difficulty tier
- feedback
- pass rate, visible pass rate, and hidden pass rate
- syntax/runtime/timeout status
- reward components
Hidden test inputs and expected outputs are never returned in observations.
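One way to report the three pass rates while keeping hidden tests private is to score the two splits separately and copy only the visible cases and aggregate numbers into the observation. A minimal sketch; `build_observation`, its arguments, the problem dict keys, and the convention that the overall pass rate is computed over all tests combined are assumptions, not the repo's code.

```python
def build_observation(problem: dict, visible_results: list[bool], hidden_results: list[bool]) -> dict:
    """Hypothetical: summarize per-test pass/fail flags without exposing hidden tests."""

    def rate(results: list[bool]) -> float:
        return sum(results) / len(results) if results else 0.0

    return {
        "problem_id": problem["id"],
        "difficulty": problem["difficulty"],
        # Visible tests are safe to return verbatim.
        "visible_tests": problem["visible_tests"],
        "visible_pass_rate": rate(visible_results),
        # Hidden tests contribute only an aggregate number, never their contents.
        "hidden_pass_rate": rate(hidden_results),
        "pass_rate": rate(visible_results + hidden_results),
    }
```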
## Reward
Reward is clipped to `[0.0, 1.0]` and combines multiple environment-level signals:
- correctness from visible and hidden pass rate
- syntax validity
- clean execution
- output format compliance
- timeout penalty
- runtime error penalty
- static safety rejection for dangerous imports such as `os`, `subprocess`, `socket`, `pathlib`, and `shutil`
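The static safety rejection can happen before anything runs by walking the submission's syntax tree. A sketch using the standard `ast` module; the blocklist mirrors the modules named above, and `is_rejected` is a hypothetical name. A check like this catches plain `import` and `from ... import` statements (including nested ones) but not dynamic tricks such as `__import__("os")`, so it complements process isolation rather than replacing it.

```python
import ast

# Modules named in the reward description above.
BLOCKED_MODULES = {"os", "subprocess", "socket", "pathlib", "shutil"}


def is_rejected(code: str) -> bool:
    """Return True if `code` statically imports a blocked module."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        # Syntax errors are scored by the syntax-validity signal, not here.
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # module is None for "from . import x"
        else:
            continue
        # Compare top-level packages so "os.path" is caught as "os".
        if any(name.split(".")[0] in BLOCKED_MODULES for name in names):
            return True
    return False
```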
If `verifier.verifier.verify(code, test_cases)` exists, the environment can use it as an optional reward augmentation. If the verifier is absent, the environment still works using executor-derived reward.
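A reward that combines these signals and loads the optional verifier only when it is importable might look like the following. The weights, penalty sizes, the choice to return `0.0` on a safety rejection, the 50/50 blend with the verifier score, and the assumption that `verify` returns a float in `[0, 1]` are all hypothetical. Only the clipping to `[0.0, 1.0]` and the verifier's module path come from the description above.

```python
import importlib
from typing import Callable, Optional


def load_verifier() -> Optional[Callable]:
    """Return verifier.verifier.verify if that module is importable, else None."""
    try:
        module = importlib.import_module("verifier.verifier")
    except ImportError:
        return None
    return getattr(module, "verify", None)


def compute_reward(metrics: dict, code: str = "", test_cases: Optional[list] = None) -> float:
    # Hypothetical: a static safety rejection short-circuits every other signal.
    if metrics["unsafe"]:
        return 0.0
    # Hypothetical weights for the executor-derived components.
    reward = 0.6 * metrics["pass_rate"]
    reward += 0.1 if metrics["syntax_ok"] else 0.0
    reward += 0.1 if metrics["clean_exit"] else 0.0
    reward += 0.2 if metrics["format_ok"] else 0.0
    reward -= 0.3 if metrics["timed_out"] else 0.0
    reward -= 0.2 if metrics["runtime_error"] else 0.0

    verify = load_verifier()
    if verify is not None:
        # Assumed: verify(code, test_cases) returns a score in [0, 1]; blend it in equally.
        reward = 0.5 * reward + 0.5 * float(verify(code, test_cases or []))
    # Clip to [0.0, 1.0].
    return min(1.0, max(0.0, reward))
```

Without a `verifier` package on the path, the function falls back to the executor-derived score, matching the behavior described above.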
## Local Setup
Use Python `3.10+`.
```powershell
cd C:\Users\kaust\PycharmProjects\meta-rl-dsa-solver
python -m venv .venv
.\.venv\Scripts\pip install -e .
```
During development, a local checkout of the OpenEnv repo can be used instead of an installed package by putting its `src` directory on `PYTHONPATH`:
```powershell
$env:PYTHONPATH="C:\Users\kaust\PycharmProjects\OpenEnv\src;$PWD"
```
## Smoke Tests
Run the local smoke test:
```powershell
python test.py
```
Check syntax:
```powershell
python -m py_compile models.py env\adapt_env.py env\executor.py env\test_cases.py server\app.py
```
Start the OpenEnv server:
```powershell
uvicorn server.app:app --host 0.0.0.0 --port 7860
```
Useful endpoints:
- `GET /health`
- `GET /schema`
- `POST /reset`
- `POST /step`
- `GET /state`
Example step request:
```powershell
$body = @{ action = @{ code = "n=int(input())`nprint(n*2)" } } | ConvertTo-Json -Depth 3
Invoke-RestMethod -Method Post -Uri http://localhost:7860/step -ContentType "application/json" -Body $body
```
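The same request can also be sent from Python with only the standard library, which avoids shell quoting altogether. A sketch assuming the server is listening on `localhost:7860` and that `/step` accepts an `{"action": {"code": ...}}` body; `make_step_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request


def make_step_request(code: str, base_url: str = "http://localhost:7860") -> urllib.request.Request:
    """Build (but do not send) a POST /step request carrying an action payload."""
    # json.dumps escapes the newline in `code` as \n, so no manual escaping is needed.
    body = json.dumps({"action": {"code": code}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())
```

With the server running, `send(make_step_request("n = int(input())\nprint(n * 2)"))` returns the decoded step response as a dict.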
Validate with OpenEnv once dependencies are installed:
```powershell
openenv validate .
```
## Hugging Face Spaces
This repo is ready to deploy as a Docker Space:
```powershell
openenv push --repo-id <your-hf-username>/adapt-dsa-tutor
```
Before final submission, add:
- live Hugging Face Space link
- training reward/loss plots from Disha's run
- before/after code example showing a problem the model failed before training and solved after training
- mini-blog or short video link
## Current Problem Bank
The environment includes a lightweight curated bank:
- `easy_double`
- `easy_sum_two`
- `medium_maximum`
- `medium_count_even`
- `hard_reverse_words`
The bank is intentionally small to keep the minimum submission stable. Later work can expand it to 30-50 tiered problems without changing the OpenEnv API.
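Since each problem id above starts with its tier, a curriculum sampler can group the bank by that prefix and draw from whichever tier training is focusing on. A sketch; `group_by_tier` and `sample_problem` are hypothetical, and a larger bank would more robustly store the tier as an explicit field than parse it from the id.

```python
import random
from collections import defaultdict

PROBLEM_IDS = ["easy_double", "easy_sum_two", "medium_maximum", "medium_count_even", "hard_reverse_words"]


def group_by_tier(problem_ids: list[str]) -> dict[str, list[str]]:
    """Group problem ids by their tier prefix, e.g. "easy_double" -> "easy"."""
    tiers: dict[str, list[str]] = defaultdict(list)
    for pid in problem_ids:
        tiers[pid.split("_", 1)[0]].append(pid)
    return dict(tiers)


def sample_problem(tiers: dict[str, list[str]], tier: str, rng: random.Random) -> str:
    # Draw uniformly within the requested tier; a seeded rng keeps runs reproducible.
    return rng.choice(tiers[tier])
```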