databoysu committed on
Commit
ba3fae8
·
1 Parent(s): 5813a84
.gitignore CHANGED
@@ -1,3 +1,4 @@
1
  .venv
2
  .agents
3
  .env
 
 
1
  .venv
2
  .agents
3
  .env
4
+ uv.lock
CLAUDE.md ADDED
@@ -0,0 +1,356 @@
1
+ # CLAUDE.md — Python Debugging Gym (RL_ENV)
2
+
3
+ Codebase knowledge for AI assistants. Read before making changes.
4
+
5
+ **Phase status:**
6
+ - Phase 1 — REPLACE_LINES action space ✅
7
+ - Phase 1B — Hackathon compliance (clamper + inference.py) ✅
8
+ - Phase 2 — Mini-Git (UNDO_EDIT + RESET_TO_ORIGINAL) ✅
9
+ - Phase 3 — Curriculum learning + static task registry ✅
10
+ - Phase 4 — Constrained decoding + Chain-of-Thought + "Reasoning Model" support ✅
11
+
12
+ ---
13
+
14
+ ## File Map
15
+
16
+ ```
17
+ models.py ← Pydantic v1 schema (action + observation spaces)
18
+ tasks.py ← Static task registry (16 hardcoded curated tasks)
19
+ sandbox.py ← Isolated multiprocessing executor
20
+ environment.py ← RL state machine (reset/step/reward/clamping/curriculum)
21
+ context.py ← ±10-line localized view around last edit
22
+ server.py ← FastAPI WebSocket + HTTP server
23
+ inference.py ← Hackathon baseline agent (OpenAI client)
24
+ openenv.yaml ← OpenEnv metadata (validate-submission.sh)
25
+ Dockerfile ← 2-stage HuggingFace Spaces build (production)
26
+ requirements.txt
27
+
28
+ ── Offline tools (NOT in Docker image) ────────────────────────────────────
29
+ mutation_engine.py ← Bug injection operators — run locally to generate tasks
30
+ dataset_generator.py← Validate + build task dicts from base solutions
31
+ ```
32
+
33
+ **Critical:** `mutation_engine.py` and `dataset_generator.py` are **not copied into the Docker image**. They are local data-science tools only. `tasks.py` must not import them.
34
+
35
+ ---
36
+
37
+ ## `models.py`
38
+
39
+ - **Pydantic v1** (`pydantic==1.10.17`). Never upgrade — `.dict()`, `.parse_raw()`, `.json()`, `@validator` are v1 APIs used everywhere.
40
+ - `ActionType`: exactly **6** strings — `"VIEW_CODE"`, `"RUN_TESTS"`, `"REPLACE_LINES"`, `"UNDO_EDIT"`, `"RESET_TO_ORIGINAL"`, `"SUBMIT"`.
41
+ - `CodeAction(extra="forbid")` — any extra JSON key raises `ValidationError`.
42
+
43
+ ### CodeAction fields
44
+
45
+ | Field | Type | Required for |
46
+ |---|---|---|
47
+ | `thought` | `Optional[str]` | always (chain-of-thought scratchpad) |
48
+ | `action_type` | `ActionType` | always |
49
+ | `start_line` | `Optional[int]` (ge=1) | `REPLACE_LINES` |
50
+ | `end_line` | `Optional[int]` (ge=1) | `REPLACE_LINES` |
51
+ | `new_code_block` | `Optional[str]` | `REPLACE_LINES` |
52
+
53
+ No extra fields for `UNDO_EDIT` or `RESET_TO_ORIGINAL`.
54
+
55
+ ### CodeObservation key fields
56
+
57
+ | Field | Notes |
58
+ |---|---|
59
+ | `code_lines: List[str]` | Complete current source (authoritative) |
60
+ | `localized_context: str` | ±10 lines around last edit; empty until first REPLACE_LINES |
61
+ | `last_execution_output: str` | Tail of stdout+stderr from last RUN_TESTS/SUBMIT |
62
+ | `syntax_error: bool` | `ast.parse()` check, updated every step |
63
+ | `test_results: List[TestResult]` | Per-test pass/fail + error_message |
64
+ | `step_count / steps_remaining` | Progress vs MAX_STEPS=50 |
65
+ | `reward_last_step: float` | Per-step RL signal |
66
+ | `done: bool` | Episode ended |
67
+ | `info: dict` | `episode_id`, `task_name`, `task_difficulty` |
68
+
69
+ ---
70
+
71
+ ## `tasks.py` — Static Registry
72
+
73
+ **This file is a dumb registry.** It contains only hardcoded dicts — no imports from `mutation_engine` or `dataset_generator`. Zero cold-start cost; fully deterministic for evaluators.
74
+
75
+ To add new tasks: run `mutation_engine.py` + `dataset_generator.py` locally, curate the best outputs, paste them in as hardcoded dicts.
76
+
77
+ ### Exported symbols
78
+
79
+ | Symbol | Type | Description |
80
+ |---|---|---|
81
+ | `TASKS_BY_DIFFICULTY` | `Dict[str, List[Dict]]` | Tasks grouped by difficulty tier |
82
+ | `ALL_TASKS` | `List[Dict]` | Flat list of all tasks (for random sampling) |
83
+
84
+ **Current registry size:** `easy=4`, `medium=6`, `hard=6` → 16 tasks total.
85
+
86
+ ### Task dict schema
87
+
88
+ ```python
89
+ {
90
+ "name": str, # e.g. "binary_search_off_by_one"
91
+ "description": str,
92
+ "code": List[str], # buggy version, lines without trailing \n
93
+ "solution": List[str], # correct version
94
+ "tests": List[Callable],# accept (namespace_dict), raise AssertionError
95
+ "difficulty": str, # "easy" | "medium" | "hard"
96
+ "bug_type": str, # e.g. "wrong_operator" or "logic_inversion"
97
+ }
98
+ ```
99
+
100
+ ### Task catalogue
101
+
102
+ | Name | Bug | Difficulty |
103
+ |---|---|---|
104
+ | `sum_even_wrong_condition` | `!= 0` instead of `== 0` | easy |
105
+ | `sum_even_missing_accumulator` | `-=` instead of `+=` | easy |
106
+ | `reverse_string_wrong_step` | `[::-2]` instead of `[::-1]` | easy |
107
+ | `reverse_string_returns_original` | `[::1]` instead of `[::-1]` | easy |
108
+ | `binary_search_off_by_one` | `right = len(arr)` instead of `len(arr)-1` | medium |
109
+ | `binary_search_wrong_mid` | `left + right` instead of `(left + right) // 2` | medium |
110
+ | `flatten_missing_recursion` | `append` instead of `extend(flatten(item))` | medium |
111
+ | `flatten_inverted_branch` | `not isinstance` inverts the recursive branch | medium |
112
+ | `word_count_no_lower` | missing `text = text.lower()` | medium |
113
+ | `word_count_no_punct_strip` | missing punctuation stripping | medium |
114
+ | `lru_cache_wrong_eviction` | `pop(-1)` instead of `pop(0)` — evicts MRU | hard |
115
+ | `lru_cache_no_promotion` | `get()` doesn't move key to most-recently-used | hard |
116
+ | `valid_parentheses_wrong_mapping` | all three bracket mappings are wrong | hard |
117
+ | `valid_parentheses_no_empty_check` | missing `not stack or` guard before `pop()` | hard |
118
+ | `merge_intervals_strict_overlap` | `<` instead of `<=` — touching intervals not merged | hard |
119
+ | `merge_intervals_missing_sort` | missing `intervals.sort()` | hard |
120
+
121
+ ---
122
+
123
+ ## `environment.py`
124
+
125
+ ### Interface
126
+ ```python
127
+ obs, system_prompt = env.reset()
128
+ obs, reward, done, info = env.step(action: CodeAction)
129
+ ```
130
+
131
+ ### Reward constants
132
+ ```python
133
+ R_STEP_COST = -0.01 # every step (RL signal only)
134
+ R_RUN_TESTS = +0.10
135
+ R_PER_NEW_PASS = +0.05 # per newly passing test
136
+ R_INVALID_LINE = -0.02
137
+ R_SYNTAX_ERROR = -0.10 # inside _act_run_tests on syntax failure
138
+ R_UNDO_RESET = -0.10 # UNDO_EDIT and RESET_TO_ORIGINAL
139
+ MAX_STEPS = 50
140
+ ```
141
+
142
+ ### Episode state (ALL reset in `reset()`)
143
+
144
+ - **System Prompt**: Enforces SOP (Standard Operating Procedure: ORIENT → DIAGNOSE → FIX → VERIFY → REPEAT → SUBMIT) and strictly forbids consecutive `VIEW_CODE` calls.
145
+
146
+ | Field | Description |
147
+ |---|---|
148
+ | `_code_lines` | Working copy of buggy code |
149
+ | `_task` | Current task dict |
150
+ | `_step_count` | Steps this episode |
151
+ | `_prev_pass_count` | Test passes at last RUN_TESTS |
152
+ | `_last_test_results` | From last RUN_TESTS/SUBMIT |
153
+ | `_last_output` | Text output for observation |
154
+ | `_last_edited_line` | 1-indexed anchor for context.py |
155
+ | `_episode_id` | 8-char UUID prefix |
156
+ | `_done` | Episode finished |
157
+ | `_cumulative_reward` | Sum of all step rewards |
158
+ | `_accumulated_step_costs` | `count × 0.01` — used by hackathon clamper |
159
+ | `_original_code` | Deep copy of episode-start code; never mutated |
160
+ | `_edit_history` | Stack of `List[str]` snapshots; one pushed before each REPLACE_LINES |
161
+
162
+ `training_step: int = 0` — **not reset by `reset()`**. Persists across episodes. Set externally by trainer.
163
+
164
+ ### `_sample_task()` — Evaluation-safe curriculum sampler
165
+
166
+ Priority order:
167
+
168
+ 1. **`task_override=dict`** → return it directly (eval/test pinning)
169
+ 2. **`training_step == 0`** → random pick from `ALL_TASKS` ← **judge-safe default**
170
+ - The Meta evaluator calls `reset()` without setting `training_step`, so this must not crash or bias to one bucket
171
+ 3. **`training_step > 0`** → curriculum bucketing:
172
+ - `< 1000` → easy
173
+ - `1000 – 4999` → medium
174
+ - `>= 5000` → hard
175
+ - Falls back to any non-empty bucket if the target is empty
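The priority order above can be sketched as a small standalone function. This is an illustration only, not the code in `environment.py`: the two-task registry is a hypothetical stand-in for `tasks.py`, the `rng` parameter is added for testability, and "falls back to any non-empty bucket" is interpreted here as "the first non-empty bucket", which is an assumption.

```python
import random
from typing import Dict, List, Optional

# Hypothetical stand-ins for the registry exported by tasks.py.
TASKS_BY_DIFFICULTY: Dict[str, List[dict]] = {
    "easy": [{"name": "easy_task", "difficulty": "easy"}],
    "medium": [{"name": "medium_task", "difficulty": "medium"}],
    "hard": [],  # deliberately empty to exercise the fallback path
}
ALL_TASKS: List[dict] = [t for bucket in TASKS_BY_DIFFICULTY.values() for t in bucket]


def sample_task(
    training_step: int = 0,
    task_override: Optional[dict] = None,
    rng: Optional[random.Random] = None,
) -> dict:
    """Pick a task following the documented priority order."""
    rng = rng or random.Random()

    # 1. A pinned task always wins (eval/test pinning).
    if task_override is not None:
        return task_override

    # 2. Judge-safe default: uniform pick over every task.
    if training_step == 0:
        return rng.choice(ALL_TASKS)

    # 3. Curriculum bucketing by training progress.
    if training_step < 1000:
        tier = "easy"
    elif training_step < 5000:
        tier = "medium"
    else:
        tier = "hard"

    bucket = TASKS_BY_DIFFICULTY.get(tier) or []
    if not bucket:
        # Assumed fallback policy: first non-empty bucket in registry order.
        bucket = next(b for b in TASKS_BY_DIFFICULTY.values() if b)
    return rng.choice(bucket)
```

With the stand-in registry, `sample_task(5000)` targets the empty `hard` bucket and therefore falls back to an easy task.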
176
+
177
+ ### Action handlers
178
+
179
+ | Method | Delta reward | Key behavior |
180
+ |---|---|---|
181
+ | `_act_view_code()` | 0.0 | Sets `_last_output` with numbered source |
182
+ | `_act_run_tests()` | `R_RUN_TESTS` ± syntax ± new passes | Updates `_prev_pass_count` |
183
+ | `_act_replace_lines(s, e, block)` | 0.0, `R_INVALID_LINE`, or `R_DESTRUCTIVE_PENALTY` | Snapshots before mutating; slice-assigns the new lines; anchor = end of new block; rejects edits that delete more than 5 lines (`R_DESTRUCTIVE_PENALTY`) |
184
+ | `_act_undo_edit()` | `R_UNDO_RESET` (-0.10) | Pops `_edit_history`; sets `_last_edited_line = None` |
185
+ | `_act_reset_to_original()` | `R_UNDO_RESET` (-0.10) | Restores `_original_code`; clears `_edit_history`; sets `_last_edited_line = None` |
186
+ | `_act_submit()` | clamped [0.0, 1.0] | Hackathon score formula |
187
+
188
+ **Action Penalties**:
189
+ - **Anti-Loop**: `step()` applies an escalating `-0.05 * n` penalty if the agent chooses the exact same `action_type` repeatedly.
190
+ - **Escape Hatch Rule**: The prompt explicitly warns against manual space-fixing on syntax/indent errors, directing the agent to use `UNDO_EDIT` or `RESET_TO_ORIGINAL`.
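One way to read the anti-loop rule is sketched below. The class name `RepeatTracker` is hypothetical, and the sketch assumes that `n` counts consecutive repeats of the same `action_type` (so the first occurrence costs nothing); the real `step()` may count differently.

```python
from typing import Optional


class RepeatTracker:
    """Tracks consecutive identical actions and returns an escalating penalty."""

    def __init__(self, unit_penalty: float = 0.05) -> None:
        self.unit_penalty = unit_penalty
        self._last_action: Optional[str] = None
        self._repeats = 0  # assumed meaning of n: consecutive repeats so far

    def penalty(self, action_type: str) -> float:
        if action_type == self._last_action:
            self._repeats += 1
        else:
            # A different action resets the streak.
            self._last_action = action_type
            self._repeats = 0
        return -self.unit_penalty * self._repeats
```

Under this reading, three `VIEW_CODE` calls in a row yield penalties of 0, -0.05, and -0.10, and switching to `RUN_TESTS` resets the streak.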
191
+
192
+ ### Hackathon Reward Clamper (`_act_submit` & Timeout)
193
+
194
+ ```python
195
+ proportion = passes / total # 0.0 on syntax error
196
+ raw_score = proportion - self._accumulated_step_costs
197
+ final_score = max(0.0, min(1.0, raw_score))
198
+ ```
199
+
200
+ - **Deterministic Evaluation**: The final score is guaranteed to lie in `[0.0, 1.0]`.
201
+ - **Trigger**: Runs on `SUBMIT` **or** when hitting `MAX_STEPS` timeout. Never trusts the LLM to call `SUBMIT`.
202
+ - Stored in `info["final_score"]` when `done=True`.
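As a worked example of the clamper, the formula can be wrapped in a pure function. This is a sketch, not the body of `_act_submit`: the function name, the `had_syntax_error` flag, and the `total == 0` guard are assumptions, and `_accumulated_step_costs` is modelled as `step_count * 0.01`.

```python
def final_score(
    passes: int,
    total: int,
    step_count: int,
    had_syntax_error: bool = False,
    step_cost: float = 0.01,
) -> float:
    """Clamp (pass proportion - accumulated step costs) into [0.0, 1.0]."""
    # A syntax error scores 0.0; the empty-suite guard is an added assumption.
    proportion = 0.0 if had_syntax_error or total == 0 else passes / total
    raw_score = proportion - step_count * step_cost
    return max(0.0, min(1.0, raw_score))
```

For instance, passing 4 of 4 tests after 10 steps gives roughly 0.9, while passing nothing after 50 steps gives a raw score of -0.5 that is clamped to 0.0.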
203
+
204
+ ---
205
+
206
+ ## `context.py`
207
+
208
+ `get_localized_context(code_lines, anchor_line, window=10) -> str`
209
+ - Returns `""` if `anchor_line is None` or `code_lines` is empty.
210
+ - Uses `len(code_lines)` dynamically — handles REPLACE_LINES growth/shrink correctly.
211
+ - Hard cap: `MAX_CONTEXT_CHARS = 2_000`.
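A minimal sketch of a function with this contract follows. The rendering (right-aligned line number, then ` | `), the clamping of an out-of-range anchor, and head-based truncation at the cap are all assumptions; the actual `context.py` may format and truncate differently.

```python
from typing import List, Optional

MAX_CONTEXT_CHARS = 2_000


def get_localized_context(
    code_lines: List[str], anchor_line: Optional[int], window: int = 10
) -> str:
    """Return up to `window` lines on each side of a 1-indexed anchor line."""
    if anchor_line is None or not code_lines:
        return ""

    # Recompute bounds from the current length so edits that grow or
    # shrink the file never index out of range.
    n = len(code_lines)
    anchor = max(1, min(anchor_line, n))
    start = max(1, anchor - window)
    end = min(n, anchor + window)

    rendered = "\n".join(
        f"{i:>4} | {code_lines[i - 1]}" for i in range(start, end + 1)
    )
    return rendered[:MAX_CONTEXT_CHARS]  # assumed: keep the head when over the cap
```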
212
+
213
+ ---
214
+
215
+ ## `sandbox.py`
216
+
217
+ `run_code_with_tests(source: str, callables, timeout=5) -> (output_str, List[TestResult], had_syntax_error)`
218
+
219
+ - **Always a 3-tuple.** Never access as an object (no `.all_pass`, no `.test_results`).
220
+ - `source` must be a `str` — call `"\n".join(code_lines)` before passing.
221
+ - Isolation: `multiprocessing.Process`, SIGTERM → SIGKILL on timeout.
222
+ - Output tail-truncated to `MAX_OUTPUT_CHARS = 1_000`.
223
+
224
+ ---
225
+
226
+ ## `server.py`
227
+
228
+ FastAPI WebSocket layer. Port: `os.environ.get("PORT", 7860)`.
229
+
230
+ | Endpoint | Notes |
231
+ |---|---|
232
+ | `GET /health` | Liveness probe |
233
+ | `GET /info` | Env metadata + `CodeAction.schema()` |
234
+ | `POST /reset` | Stateless, new env per request |
235
+ | `WS /ws` | Primary RL channel — auto-resets on `done=True`. Append `?difficulty=easy|medium|hard` to set tier. |
236
+
237
+ ---
238
+
239
+ ## `inference.py`
240
+
241
+ Config from `os.getenv`:
242
+
243
+ | Variable | Default | Notes |
244
+ |---|---|---|
245
+ | `API_BASE_URL` | `https://api.openai.com/v1` | OpenEnv compatible proxy URL |
246
+ | `MODEL_NAME` | `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8` | Fallback model used when the variable is unset |
247
+ | `HF_TOKEN` | `""` | Optional HuggingFace Token |
248
+ | `ENV_WS_URL` | `ws://localhost:7860/ws` | Connecting environment URL |
249
+ | `DEBUG_LOG` | `0` | Set to `1` to print raw LLM output |
250
+
251
+ **CLI Flags:**
252
+ - `python inference.py --easy` (or `--medium`, `--hard`) appends a `?difficulty=...` query parameter to the WS URL to override `training_step` bucketing.
253
+
254
+ ### Decoding & Fallbacks
255
+
256
+ - **Structured Output**: Uses `json_schema` protocol with strict `CodeAction` forcing `thought` generation before `action_type`.
257
+ - **Reasoning Models**: Directly parses `.model_dump()["reasoning_content"]` if `content` is empty (e.g. DeepSeek-R1 / Nemotron in LM Studio).
258
+ - **Mask-Free Parser**: Invalid JSON explicitly returns `PARSE_ERROR` to the server (preventing silent `VIEW_CODE` loops), forcing LLM self-correction.
259
+
260
+ **Exact stdout log format (regex-parsed by validation judge):**
261
+ ```
262
+ [START] task=<task_name> env=PythonDebuggingGym model=<model_name>
263
+ [STEP] step=<n> action=<action_type> reward=<r.rr> done=<true|false> error=<msg|null>
264
+ [END] success=<true|false> steps=<n> score=<s.sss> rewards=<r1,r2,...,rn>
265
+ ```
266
+
267
+ - `reward` → `:.2f`; `done` → lowercase; `error` → `"null"` on success.
268
+ - `score` → `:.3f` — pulled from `info["final_score"]` (the clamped [0,1] value).
269
+ - `rewards` → comma-separated, no spaces.
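The formatting rules above could be implemented with three small helpers. The function names are hypothetical, the field spacing follows the template exactly as shown here, and formatting each entry of `rewards` with `:.2f` is an assumption, since the rules only specify the separator.

```python
from typing import List, Optional


def format_start(task_name: str, model_name: str) -> str:
    return f"[START] task={task_name} env=PythonDebuggingGym model={model_name}"


def format_step(
    step: int, action_type: str, reward: float, done: bool, error: Optional[str] = None
) -> str:
    # Booleans are lowercased; a missing error is rendered as the literal "null".
    return (
        f"[STEP] step={step} action={action_type} reward={reward:.2f} "
        f"done={str(done).lower()} error={error if error else 'null'}"
    )


def format_end(success: bool, steps: int, score: float, rewards: List[float]) -> str:
    # Assumed: each reward uses the same two-decimal format as [STEP] lines.
    rewards_str = ",".join(f"{r:.2f}" for r in rewards)
    return (
        f"[END] success={str(success).lower()} steps={steps} "
        f"score={score:.3f} rewards={rewards_str}"
    )
```

Because the judge parses these lines with a regex, any change to the helpers should be checked against the validator's expected pattern rather than against this sketch.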
270
+
271
+ ---
272
+
273
+ ## `openenv.yaml`
274
+
275
+ Consumed by the `openenv validate` step in `validate-submission.sh`.
276
+
277
+ Key fields: `reward_range: [0.0, 1.0]`, `inference_script: inference.py`, `websocket_path: /ws`, `port: 7860`.
278
+
279
+ ---
280
+
281
+ ## `Dockerfile`
282
+
283
+ Two-stage build. Runtime COPY (all with `--chown=appuser:appuser`):
284
+ ```
285
+ models.py environment.py sandbox.py tasks.py
286
+ server.py context.py inference.py
287
+ ```
288
+
289
+ **`mutation_engine.py` and `dataset_generator.py` are NOT copied.** They are offline tools.
290
+
291
+ ---
292
+
293
+ ## Offline Tools (local only)
294
+
295
+ ### `mutation_engine.py`
296
+
297
+ `MutationEngine(seed).mutate(code_lines, difficulty, max_attempts=10)`
298
+ → `(List[str], {"bug_type": str, "num_bugs": int})` or `(None, None)`
299
+
300
+ Operator sets:
301
+
302
+ | Difficulty | Operators |
303
+ |---|---|
304
+ | easy | `_var_name_error`, `_wrong_operator` |
305
+ | medium | easy + `_off_by_one`, `_logic_inversion`, `_index_error` |
306
+ | hard | medium + `_mutable_default`, `_remove_return`, `_wrong_function_call` |
307
+
308
+ ### `dataset_generator.py`
309
+
310
+ `validate_task(original, mutated, tests)` — original must pass all tests; mutated must fail ≥ 1.
311
+ `generate_task(base_task, mutator)` — calls mutate + validate; returns task dict or `None`.
312
+
313
+ **Workflow to add new tasks:**
314
+ ```bash
315
+ python -c "
316
+ from mutation_engine import MutationEngine
317
+ from dataset_generator import generate_task
318
+ # define base_task with solution + tests
319
+ # run generate_task, inspect output, paste into tasks.py
320
+ "
321
+ ```
322
+
323
+ ---
324
+
325
+ ## Dependencies
326
+
327
+ ```
328
+ fastapi==0.111.0
329
+ uvicorn[standard]==0.30.1
330
+ pydantic==1.10.17 ← v1 ONLY
331
+ websockets==12.0
332
+ openai>=1.30.0 ← inference.py only
333
+ ```
334
+
335
+ IDE lint warnings for these packages are expected false positives: the packages are installed in the Docker image, not in the system Python.
336
+
337
+ ---
338
+
339
+ ## Invariants
340
+
341
+ 1. **Pydantic v1 only.** Never upgrade.
342
+ 2. **1-indexed lines in public API**; 0-indexed in `_code_lines`.
343
+ 3. `reset()` wipes every mutable field including `_accumulated_step_costs`, `_original_code`, `_edit_history`. `training_step` is NOT reset.
344
+ 4. Reward delta model — handlers return delta; `R_STEP_COST` applied once per step before routing.
345
+ 5. REPLACE_LINES anchor = `min(start + len(new_lines) - 1, file_length)`.
346
+ 6. SUBMIT reward clamped `[0.0, 1.0]` — this is the grader score. Floor guaranteed ≥ 0.0.
347
+ 7. `_act_run_tests()` updates `_prev_pass_count`; `_act_submit()` does not.
348
+ 8. Task `code` strings have no trailing `\n`; `_source()` joins with `\n`.
349
+ 9. `context.py` is already fully dynamic — no changes needed for REPLACE_LINES growth/shrink.
350
+ 10. Output truncation is **tail-based** (end of traceback = actionable info).
351
+ 11. **Mini-Git snapshot timing**: snapshot pushed **before** slice assignment. Rejected edits (OOB, inverted range) produce no snapshot.
352
+ 12. **Context desync invariant**: Both rollback handlers set `_last_edited_line = None`. Without this, `context.py` anchors to a ghost line after revert.
353
+ 13. **`_original_code` is immutable**: set once in `reset()`, only read in `_act_reset_to_original()`.
354
+ 14. **`sandbox.run_code_with_tests` returns a 3-tuple**: `(output_str, List[TestResult], had_syntax_error)`. Never treat as object.
355
+ 15. **`tasks.py` must not import `mutation_engine` or `dataset_generator`**: those are offline tools not in the Docker image.
356
+ 16. **`training_step == 0` → random from ALL_TASKS**: the judge calls `reset()` with default `training_step=0`, so this path must work correctly and not bias to one difficulty bucket.
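Invariants 2, 5, 11, 12, and 13 interact closely, so a compact sketch may help. The class `MiniGitBuffer` is hypothetical and reuses the documented field names only for readability; it omits rewards and the guard against deleting more than 5 lines, and it is not the code in `environment.py`.

```python
from typing import List, Optional


class MiniGitBuffer:
    """Edit buffer with snapshot-before-mutate undo and reset."""

    def __init__(self, code_lines: List[str]) -> None:
        self._original_code = list(code_lines)  # never mutated after this point
        self._code_lines = list(code_lines)     # 0-indexed working copy
        self._edit_history: List[List[str]] = []
        self._last_edited_line: Optional[int] = None

    def replace_lines(self, start: int, end: int, new_code_block: str) -> bool:
        """Replace 1-indexed lines start..end (inclusive)."""
        n = len(self._code_lines)
        if start < 1 or end < start or end > n:
            return False  # rejected edits (OOB, inverted range) push no snapshot
        new_lines = new_code_block.split("\n")
        self._edit_history.append(list(self._code_lines))  # snapshot BEFORE mutating
        self._code_lines[start - 1:end] = new_lines        # slice assignment
        # Anchor on the last line of the new block, capped at the file length.
        self._last_edited_line = min(start + len(new_lines) - 1, len(self._code_lines))
        return True

    def undo_edit(self) -> bool:
        if not self._edit_history:
            return False
        self._code_lines = self._edit_history.pop()
        self._last_edited_line = None  # avoid anchoring context to a ghost line
        return True

    def reset_to_original(self) -> None:
        self._code_lines = list(self._original_code)
        self._edit_history.clear()
        self._last_edited_line = None
```

Copying the list on every snapshot keeps the sketch simple; for the short task files in the registry the cost is negligible.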
__pycache__/__init__.cpython-312.pyc ADDED
Binary file (388 Bytes)
__pycache__/client.cpython-312.pyc ADDED
Binary file (4.29 kB)
__pycache__/context.cpython-312.pyc ADDED
Binary file (3.42 kB)
__pycache__/environment.cpython-312.pyc ADDED
Binary file (26.2 kB)
__pycache__/models.cpython-312.pyc ADDED
Binary file (4.01 kB)
__pycache__/sandbox.cpython-312.pyc ADDED
Binary file (12.7 kB)
__pycache__/tasks.cpython-312.pyc ADDED
Binary file (16.7 kB)
inference.py CHANGED
@@ -35,9 +35,9 @@ from openai import OpenAI
35
  # ---------------------------------------------------------------------------
36
 
37
  API_BASE_URL: str = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
38
- MODEL_NAME: str = os.getenv("MODEL_NAME", "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4")
39
  HF_TOKEN: str = os.getenv("HF_TOKEN", "")
40
- ENV_WS_URL: str = os.getenv("ENV_WS_URL", "ws://localhost:8000/ws")
41
 
42
  # ---------------------------------------------------------------------------
43
  # OpenAI client
 
35
  # ---------------------------------------------------------------------------
36
 
37
  API_BASE_URL: str = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
38
+ MODEL_NAME: str = os.getenv("MODEL_NAME", "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8")
39
  HF_TOKEN: str = os.getenv("HF_TOKEN", "")
40
+ ENV_WS_URL: str = os.getenv("ENV_WS_URL", "ws://localhost:7860/ws")
41
 
42
  # ---------------------------------------------------------------------------
43
  # OpenAI client
openenv_python_debugging_gym.egg-info/PKG-INFO ADDED
@@ -0,0 +1,11 @@
1
+ Metadata-Version: 2.4
2
+ Name: openenv-python-debugging-gym
3
+ Version: 0.1.0
4
+ Summary: Python Debugging Gym environment for OpenEnv
5
+ Requires-Python: >=3.10
6
+ Requires-Dist: openenv-core[core]>=0.2.2
7
+ Requires-Dist: openai>=1.30.0
8
+ Requires-Dist: websockets>=12.0
9
+ Provides-Extra: dev
10
+ Requires-Dist: pytest>=8.0.0; extra == "dev"
11
+ Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
openenv_python_debugging_gym.egg-info/SOURCES.txt ADDED
@@ -0,0 +1,19 @@
1
+ README.md
2
+ pyproject.toml
3
+ ./__init__.py
4
+ ./client.py
5
+ ./context.py
6
+ ./environment.py
7
+ ./inference.py
8
+ ./models.py
9
+ ./sandbox.py
10
+ ./tasks.py
11
+ openenv_python_debugging_gym.egg-info/PKG-INFO
12
+ openenv_python_debugging_gym.egg-info/SOURCES.txt
13
+ openenv_python_debugging_gym.egg-info/dependency_links.txt
14
+ openenv_python_debugging_gym.egg-info/entry_points.txt
15
+ openenv_python_debugging_gym.egg-info/requires.txt
16
+ openenv_python_debugging_gym.egg-info/top_level.txt
17
+ server/__init__.py
18
+ server/app.py
19
+ server/my_env_environment.py
openenv_python_debugging_gym.egg-info/dependency_links.txt ADDED
@@ -0,0 +1 @@
 
 
1
+
openenv_python_debugging_gym.egg-info/entry_points.txt ADDED
@@ -0,0 +1,2 @@
 
 
 
1
+ [console_scripts]
2
+ server = my_env.server.app:main
openenv_python_debugging_gym.egg-info/requires.txt ADDED
@@ -0,0 +1,7 @@
 
 
 
 
 
 
 
 
1
+ openenv-core[core]>=0.2.2
2
+ openai>=1.30.0
3
+ websockets>=12.0
4
+
5
+ [dev]
6
+ pytest>=8.0.0
7
+ pytest-cov>=4.0.0
openenv_python_debugging_gym.egg-info/top_level.txt ADDED
@@ -0,0 +1 @@
 
 
1
+ my_env
requirements.txt ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ fastapi==0.111.0
2
+ uvicorn[standard]==0.30.1
3
+ pydantic==1.10.17
4
+ websockets==12.0
5
+ openai>=1.30.0
server/Dockerfile CHANGED
@@ -21,8 +21,7 @@ RUN apt-get update && \
21
  rm -rf /var/lib/apt/lists/*
22
 
23
  # Build argument to control whether we're building standalone or in-repo
24
- ARG BUILD_MODE=in-repo
25
- ARG ENV_NAME=my_env
26
 
27
  # Copy environment code (always at root of build context)
28
  COPY . /app/env
@@ -73,8 +72,8 @@ ENV PYTHONPATH="/app/env:$PYTHONPATH"
73
 
74
  # Health check
75
  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
76
- CMD curl -f http://localhost:8000/health || exit 1
77
 
78
  # Run the FastAPI server
79
  # The module path is constructed to work with the /app/env structure
80
- CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 8000"]
 
21
  rm -rf /var/lib/apt/lists/*
22
 
23
  # Build argument to control whether we're building standalone or in-repo
24
+ ARG BUILD_MODE=standalone
 
25
 
26
  # Copy environment code (always at root of build context)
27
  COPY . /app/env
 
72
 
73
  # Health check
74
  HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
75
+ CMD curl -f http://localhost:7860/health || exit 1
76
 
77
  # Run the FastAPI server
78
  # The module path is constructed to work with the /app/env structure
79
+ CMD ["sh", "-c", "cd /app/env && uvicorn server.app:app --host 0.0.0.0 --port 7860"]
server/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (305 Bytes)
server/__pycache__/app.cpython-312.pyc ADDED
Binary file (1.79 kB)
server/__pycache__/my_env_environment.cpython-312.pyc ADDED
Binary file (2.85 kB)
server/app.py CHANGED
@@ -41,7 +41,7 @@ def main() -> None:
41
  import uvicorn
42
 
43
  host = os.environ.get("HOST", "0.0.0.0")
44
- port = int(os.environ.get("PORT", "8000"))
45
  uvicorn.run(app, host=host, port=port)
46
 
47
 
 
41
  import uvicorn
42
 
43
  host = os.environ.get("HOST", "0.0.0.0")
44
+ port = int(os.environ.get("PORT", "7860"))
45
  uvicorn.run(app, host=host, port=port)
46
 
47
 
uv.lock CHANGED
@@ -1599,11 +1599,13 @@ core = [
1599
  ]
1600
 
1601
  [[package]]
1602
- name = "openenv-my-env"
1603
  version = "0.1.0"
1604
  source = { editable = "." }
1605
  dependencies = [
 
1606
  { name = "openenv-core", extra = ["core"] },
 
1607
  ]
1608
 
1609
  [package.optional-dependencies]
@@ -1614,9 +1616,11 @@ dev = [
1614
 
1615
  [package.metadata]
1616
  requires-dist = [
 
1617
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
1618
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
1619
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
 
1620
  ]
1621
  provides-extras = ["dev"]
1622
 
 
1599
  ]
1600
 
1601
  [[package]]
1602
+ name = "openenv-python-debugging-gym"
1603
  version = "0.1.0"
1604
  source = { editable = "." }
1605
  dependencies = [
1606
+ { name = "openai" },
1607
  { name = "openenv-core", extra = ["core"] },
1608
+ { name = "websockets" },
1609
  ]
1610
 
1611
  [package.optional-dependencies]
 
1616
 
1617
  [package.metadata]
1618
  requires-dist = [
1619
+ { name = "openai", specifier = ">=1.30.0" },
1620
  { name = "openenv-core", extras = ["core"], specifier = ">=0.2.2" },
1621
  { name = "pytest", marker = "extra == 'dev'", specifier = ">=8.0.0" },
1622
  { name = "pytest-cov", marker = "extra == 'dev'", specifier = ">=4.0.0" },
1623
+ { name = "websockets", specifier = ">=12.0" },
1624
  ]
1625
  provides-extras = ["dev"]
1626