---
title: ADAPT DSA Tutor OpenEnv
sdk: docker
pinned: false
app_port: 7860
base_path: /web
tags:
  - openenv
  - reinforcement-learning
  - code-generation
---

# ADAPT DSA Tutor OpenEnv

ADAPT, the Adversarial DSA Tutor, is an OpenEnv-compliant RLVR (reinforcement learning with verifiable rewards) environment for training code-generation agents on small data structures and algorithms (DSA) tasks. The agent receives a problem prompt, examples, and visible tests, then submits Python code. The environment runs the code against visible and hidden tests and returns a reward, pass-rate metrics, execution status, and feedback.

This repo now focuses on the environment layer only. Verifier work and training scripts are owned separately.

## Why This Environment

The hackathon asks for OpenEnv environments that can improve LLM behavior through verifiable interaction. ADAPT targets a simple but useful skill loop:

```text
agent writes code -> environment executes it -> hidden tests and reward signals score it -> trainer improves the agent
```

The differentiator is curriculum-ready DSA practice: each episode carries a problem id and difficulty tier so training can track per-tier success instead of only aggregate reward.
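
As a rough illustration of per-tier tracking, the sketch below accumulates success rates keyed by difficulty. `TierTracker` and its methods are hypothetical names that are not part of this repo, and treating an episode as solved only at a pass rate of 1.0 is an assumption.

```python
from collections import defaultdict


class TierTracker:
    """Hypothetical helper: accumulates episode outcomes per difficulty tier."""

    def __init__(self):
        # tier -> [episodes solved, episodes seen]
        self._counts = defaultdict(lambda: [0, 0])

    def record(self, difficulty: str, pass_rate: float) -> None:
        solved, seen = self._counts[difficulty]
        # Assumption: an episode counts as solved only when every test passes.
        self._counts[difficulty] = [solved + int(pass_rate >= 1.0), seen + 1]

    def success_rates(self) -> dict:
        return {tier: solved / seen for tier, (solved, seen) in self._counts.items()}


tracker = TierTracker()
tracker.record("easy", 1.0)
tracker.record("easy", 0.5)
tracker.record("hard", 0.0)
print(tracker.success_rates())  # {'easy': 0.5, 'hard': 0.0}
```

A trainer could log these rates next to the aggregate reward to see whether gains come from easy problems only.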

## OpenEnv Interface

The environment uses the latest OpenEnv API shape:

- `AdaptEnvironment(Environment[AdaptAction, AdaptObservation, AdaptState])`
- `reset()` returns a typed observation.
- `step(action)` accepts an `AdaptAction` with a Python `code` string.
- `state` exposes episode id, step count, current problem id, difficulty, and recent metrics.

`openenv.yaml` points to:

```yaml
app: server.app:app
port: 7860
```

## Action

```python
{
    "code": "n = int(input())\nprint(n * 2)"
}
```
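
Submitted code reads from standard input and writes to standard output. The sketch below shows one way such a submission could be run against a single test input. It is not the repo's `env/executor.py`; `run_submission`, its status strings, and the 2-second timeout are assumptions made for illustration.

```python
import subprocess
import sys


def run_submission(code: str, stdin_text: str, timeout_s: float = 2.0) -> dict:
    """Hypothetical runner: execute `code` in a fresh Python process."""
    try:
        # Catch syntax errors up front so they are reported separately.
        compile(code, "<submission>", "exec")
    except SyntaxError:
        return {"status": "syntax_error", "stdout": ""}
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": ""}
    status = "ok" if proc.returncode == 0 else "runtime_error"
    return {"status": status, "stdout": proc.stdout.strip()}


action = {"code": "n = int(input())\nprint(n * 2)"}
result = run_submission(action["code"], "21\n")
print(result)  # {'status': 'ok', 'stdout': '42'}
```

A plain subprocess is not a sandbox; it only isolates the interpreter state and enforces a wall-clock limit.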

## Observation

Reset and step observations include:

- problem statement
- input format
- constraints
- examples
- visible tests
- problem id
- difficulty tier
- feedback
- pass rate, visible pass rate, and hidden pass rate
- syntax/runtime/timeout status
- reward components

Hidden test inputs and expected outputs are never returned in observations.
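
One way to honor this rule is to report hidden tests only as an aggregate rate. The sketch below assumes per-test result dicts with `input`, `expected`, and `passed` keys and pools visible and hidden tests for the overall pass rate; `summarize_results` and both of those choices are hypothetical.

```python
def summarize_results(visible_results: list, hidden_results: list) -> dict:
    """Hypothetical helper: build the test-related part of an observation."""

    def rate(results):
        # Fraction of passed tests; 0.0 when there are no tests.
        return sum(r["passed"] for r in results) / len(results) if results else 0.0

    return {
        # Visible tests are echoed back in full.
        "visible_tests": [
            {"input": r["input"], "expected": r["expected"], "passed": r["passed"]}
            for r in visible_results
        ],
        "visible_pass_rate": rate(visible_results),
        # Hidden tests contribute numbers only, never inputs or expected outputs.
        "hidden_pass_rate": rate(hidden_results),
        "pass_rate": rate(visible_results + hidden_results),
    }


visible = [
    {"input": "2", "expected": "4", "passed": True},
    {"input": "5", "expected": "10", "passed": True},
]
hidden = [
    {"input": "999", "expected": "1998", "passed": True},
    {"input": "-7", "expected": "-14", "passed": False},
]
obs = summarize_results(visible, hidden)
print(obs["visible_pass_rate"], obs["hidden_pass_rate"], obs["pass_rate"])  # 1.0 0.5 0.75
```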

## Reward

Reward is clipped to `[0.0, 1.0]` and combines multiple environment-level signals:

- correctness from visible and hidden pass rate
- syntax validity
- clean execution
- output format compliance
- timeout penalty
- runtime error penalty
- static safety rejection for dangerous imports such as `os`, `subprocess`, `socket`, `pathlib`, and `shutil`
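
The signals above could be combined along the following lines. This is a sketch, not the environment's actual formula: the weights, the function names, and the choice to return zero reward on a safety rejection are all assumptions. Only the blocked module names and the `[0.0, 1.0]` clipping come from this README.

```python
import ast

BLOCKED_MODULES = {"os", "subprocess", "socket", "pathlib", "shutil"}


def imports_blocked_module(code: str) -> bool:
    """Return True if the code statically imports a blocked module."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False  # syntax validity is scored as a separate signal
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        # Compare the top-level package, so `os.path` counts as `os`.
        if any(name.split(".")[0] in BLOCKED_MODULES for name in names):
            return True
    return False


def compute_reward(code, pass_rate, syntax_ok, clean_run, format_ok,
                   timed_out, runtime_error) -> float:
    """Hypothetical weighting of the listed signals, clipped to [0.0, 1.0]."""
    if imports_blocked_module(code):
        return 0.0  # assumption: a safety rejection zeroes the reward
    reward = 0.625 * pass_rate
    reward += 0.125 * syntax_ok + 0.125 * clean_run + 0.125 * format_ok
    reward -= 0.25 * timed_out + 0.25 * runtime_error
    return max(0.0, min(1.0, reward))


print(compute_reward("import os", 1.0, True, True, True, False, False))  # 0.0
print(compute_reward("print(1)", 1.0, True, True, True, False, False))   # 1.0
```

A static import check like this does not catch dynamic imports such as `__import__("os")`, so it complements process isolation rather than replacing it.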

If `verifier.verifier.verify(code, test_cases)` exists, the environment can use it as an optional reward augmentation. If the verifier is absent, the environment still works using executor-derived reward.
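
The optional dependency can be handled with a guarded import, as sketched below. The module path and call signature come from the sentence above; the return type of `verify`, the blending weight, and the helper names `load_optional_verifier` and `blend_reward` are assumptions.

```python
import importlib


def load_optional_verifier():
    """Return verifier.verifier.verify if it can be imported, else None."""
    try:
        module = importlib.import_module("verifier.verifier")
    except ImportError:
        return None
    return getattr(module, "verify", None)


def blend_reward(executor_reward, verify, code, test_cases, verifier_weight=0.5):
    """Hypothetical blend of executor reward and verifier score."""
    if verify is None:
        # No verifier available: fall back to the executor-derived reward.
        return executor_reward
    # Assumption: verify(...) returns a score in [0, 1].
    score = float(verify(code, test_cases))
    blended = (1 - verifier_weight) * executor_reward + verifier_weight * score
    return max(0.0, min(1.0, blended))


# Without a verifier the executor reward passes through unchanged.
print(blend_reward(0.75, None, "print(1)", []))  # 0.75

# With a stand-in verifier that always returns 1.0.
print(blend_reward(0.5, lambda code, tests: 1.0, "print(1)", []))  # 0.75
```

In the environment, `load_optional_verifier()` would be called once at startup and its result passed as `verify`.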

## Local Setup

Use Python `3.10+`.

```powershell
cd C:\Users\kaust\PycharmProjects\meta-rl-dsa-solver
python -m venv .venv
.\.venv\Scripts\pip install -e .
```

During development on this machine, an existing OpenEnv checkout can be used instead by adding it to `PYTHONPATH`:

```powershell
$env:PYTHONPATH="C:\Users\kaust\PycharmProjects\OpenEnv\src;$PWD"
```

## Smoke Tests

Run the local smoke test:

```powershell
python test.py
```

Check syntax:

```powershell
python -m py_compile models.py env\adapt_env.py env\executor.py env\test_cases.py server\app.py
```

Start the OpenEnv server:

```powershell
uvicorn server.app:app --host 0.0.0.0 --port 7860
```

Useful endpoints:

- `GET /health`
- `GET /schema`
- `POST /reset`
- `POST /step`
- `GET /state`

Example step request:

```powershell
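# Single quotes keep the JSON literal, so \n reaches the server as a JSON newline escape.
Invoke-RestMethod -Method Post -Uri http://localhost:7860/step -ContentType "application/json" -Body '{"action":{"code":"n=int(input())\nprint(n*2)"}}'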
```
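
The same request can also be built from Python with only the standard library, which avoids shell quoting issues. The payload shape follows the example above; the helper names are hypothetical, and the response schema is not documented here, so `post_step` simply returns the decoded JSON.

```python
import json
import urllib.request


def build_step_request(code: str, base_url: str = "http://localhost:7860"):
    """Build a POST /step request carrying the submitted code."""
    payload = json.dumps({"action": {"code": code}}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/step",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_step(code: str) -> dict:
    """Send the request; this needs the server to be running locally."""
    with urllib.request.urlopen(build_step_request(code), timeout=10) as resp:
        return json.load(resp)


# Building the request does not touch the network, so it can be inspected offline.
request = build_step_request("n=int(input())\nprint(n*2)")
print(request.full_url)  # http://localhost:7860/step
print(request.data.decode("utf-8"))
```

With the server started as shown earlier, `post_step("n=int(input())\nprint(n*2)")` sends the request.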

Validate with OpenEnv once dependencies are installed:

```powershell
openenv validate .
```

## Hugging Face Spaces

This repo is ready to deploy as a Docker Space:

```powershell
openenv push --repo-id <your-hf-username>/adapt-dsa-tutor
```

Before final submission, add:

- live Hugging Face Space link
- training reward/loss plots from Disha's run
- before/after code example showing a problem the model failed before training and solved after training
- mini-blog or short video link

## Current Problem Bank

The environment includes a lightweight curated bank:

- `easy_double`
- `easy_sum_two`
- `medium_maximum`
- `medium_count_even`
- `hard_reverse_words`

The bank is intentionally small to keep the minimum viable submission stable. Later work can expand it to 30-50 tiered problems without changing the OpenEnv API.
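
The ids above appear to carry the tier as a prefix. Assuming that convention holds, which the repo does not state, a curriculum sampler could group problems by tier as in this sketch. `group_by_tier` is a hypothetical helper.

```python
from collections import defaultdict

PROBLEM_IDS = [
    "easy_double",
    "easy_sum_two",
    "medium_maximum",
    "medium_count_even",
    "hard_reverse_words",
]


def group_by_tier(problem_ids):
    """Group ids by the text before the first underscore (assumed to be the tier)."""
    tiers = defaultdict(list)
    for pid in problem_ids:
        tier, _, _name = pid.partition("_")
        tiers[tier].append(pid)
    return dict(tiers)


print(group_by_tier(PROBLEM_IDS))
```

If the bank grows, storing the tier as an explicit field on each problem would be more robust than parsing it from the id.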