# CLAUDE.md - TraceFix-RL (RL_ENV_FINAL)
Current, code-backed notes for assistants working in this repository.
Last updated: 2026-04-08
## Project Status Snapshot
- Repo: `code_reasoner_rl_env`
- Branch: `master`
- Working tree: dirty
  - Modified: `.gitignore`, `inference.py`, `models.py`, `__pycache__/models.cpython-312.pyc`
  - Untracked: `.hfignore`
- Last recorded pre-validation command in terminal:
  - `./pre-val.sh https://sus-human-tracefix-rl.hf.space .`
  - Exit code: `1`

This file describes the current implementation in `RL_ENV_FINAL` only.
## High-Level Architecture
- `environment.py`: core gym-style state machine (`TraceFixRLGym`)
- `server/tracefix_rl_environment.py`: OpenEnv adapter (`Environment` interface)
- `server/app.py`: FastAPI app creation and uvicorn entrypoint
- `models.py`: action/observation schemas (`CodeAction`, `CodeObservation`, `TestResult`)
- `sandbox.py`: isolated code execution + test running + timeout handling
- `tasks.py`: static task registry (easy/medium/hard)
- `context.py`: localized context windowing around last edit
- `client.py`: typed OpenEnv client (`TraceFixRLEnv` / `MyEnv`)
- `inference.py`: baseline agent runner with OpenAI-compatible API
- `openenv.yaml`: OpenEnv runtime metadata (`app: server.app:app`, `port: 7860`)
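`context.py` is summarized here only as "localized context windowing around last edit". The sketch below shows that idea. It assumes a fixed radius of lines on each side of a 1-indexed anchor (such as `_last_edited_line`) and numbered output. The function name and the `radius` default are hypothetical and are not taken from `context.py`.

```python
from typing import Optional


def context_window(code: str, anchor_line: Optional[int], radius: int = 5) -> str:
    """Return numbered lines within `radius` of a 1-indexed anchor line.

    Hypothetical helper: with no anchor yet (no edits), the whole file is shown.
    """
    lines = code.splitlines()
    if anchor_line is None or not lines:
        start, end = 1, len(lines)
    else:
        # Clamp the anchor into range, then widen by `radius` on each side.
        anchor = min(max(anchor_line, 1), len(lines))
        start = max(1, anchor - radius)
        end = min(len(lines), anchor + radius)
    width = len(str(end))  # right-align line numbers in the gutter
    return "\n".join(
        f"{n:>{width}} | {lines[n - 1]}" for n in range(start, end + 1)
    )
```

A small window keeps the observation short while still anchoring the agent near its most recent change. Falling back to the full file before any edit is one reasonable default, not confirmed behavior.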
## Runtime and Entry Points
- Local server via project script:
  - `uv run --project . server`
- Container command in `Dockerfile`:
  - `uvicorn server.app:app --host 0.0.0.0 --port 7860`
- OpenEnv spec points to:
  - `server.app:app`
## Environment Behavior (`environment.py`)
Action space:
- `VIEW_CODE`
- `RUN_TESTS`
- `REPLACE_LINES`
- `UNDO_EDIT`
- `RESET_TO_ORIGINAL`
- `SUBMIT`

Reward constants currently defined:
- `R_STEP_COST = -0.01`
- `R_RUN_TESTS = +0.10`
- `R_PER_NEW_PASS = +0.05`
- `R_SYNTAX_ERROR = -0.10`
- `R_INVALID_LINE = -0.02`
- `R_DESTRUCTIVE_PENALTY = -0.20`
- `R_UNDO_RESET = -0.10`
- `MAX_STEPS = 50`

Episode internals include:
- code snapshotting (`_original_code`, `_edit_history`)
- anti-loop penalty for repeated identical `action_type`
- contextual anchor (`_last_edited_line`) for localized context
- cumulative step-cost tracking (`_accumulated_step_costs`)
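The snapshot fields suggest a stack-based edit history. The sketch below is a minimal version of that design under three assumptions: `REPLACE_LINES` replaces an inclusive, 1-indexed range; each successful edit pushes the pre-edit code onto `_edit_history`; and `RESET_TO_ORIGINAL` clears that history. The `EditBuffer` class and its method names are hypothetical, and rewards are omitted.

```python
from typing import List, Optional


class EditBuffer:
    """Hypothetical stand-in for the code-snapshot state kept by TraceFixRLGym."""

    def __init__(self, original_code: str) -> None:
        self._original_code = original_code
        self.code = original_code
        self._edit_history: List[str] = []  # pre-edit snapshots, newest last
        self._last_edited_line: Optional[int] = None  # anchor for context windowing

    def replace_lines(self, start_line: int, end_line: int, new_code_block: str) -> bool:
        """Replace lines start_line..end_line (1-indexed, inclusive); False if out of range."""
        lines = self.code.splitlines()
        if not 1 <= start_line <= end_line <= len(lines):
            return False  # R_INVALID_LINE presumably applies in the real env
        self._edit_history.append(self.code)  # snapshot before mutating
        lines[start_line - 1:end_line] = new_code_block.splitlines()
        self.code = "\n".join(lines)
        self._last_edited_line = start_line
        return True

    def undo(self) -> bool:
        """Restore the most recent snapshot; False if there is nothing to undo."""
        if not self._edit_history:
            return False
        self.code = self._edit_history.pop()
        return True

    def reset_to_original(self) -> None:
        """Discard all edits (clearing history here is an assumption)."""
        self.code = self._original_code
        self._edit_history.clear()
        self._last_edited_line = None
```

Storing whole-file snapshots is simple and cheap at the task sizes involved. A diff-based history would only matter for much larger files.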

Submit scoring model:
- `proportion = passing_tests / total_tests` (or `0` on syntax error)
- `raw_score = proportion - _accumulated_step_costs`
- `final_score = clamp(raw_score, 0.0, 1.0)`
- the same clamped score is used when an episode reaches `MAX_STEPS` and is auto-evaluated
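The submit scoring model reduces to a few lines. The sketch below assumes `_accumulated_step_costs` is stored as a positive magnitude (for example `0.01` per step, given `R_STEP_COST = -0.01`). It also treats `total_tests == 0` as a zero proportion; that guard is an added assumption. The function name is hypothetical.

```python
def submit_score(passing_tests: int, total_tests: int, had_syntax_error: bool,
                 accumulated_step_costs: float) -> float:
    """Clamped proportion-minus-step-cost score, as described for SUBMIT."""
    if had_syntax_error or total_tests == 0:
        # Syntax errors score 0; the total_tests == 0 guard is an assumption.
        proportion = 0.0
    else:
        proportion = passing_tests / total_tests
    raw_score = proportion - accumulated_step_costs
    return min(max(raw_score, 0.0), 1.0)  # clamp into [0.0, 1.0]
```

Under that assumption, 3 of 4 tests passing after 10 steps gives 0.75 - 0.10 = 0.65. A fully passing solution can therefore score below 1.0 once step costs accumulate, which matters when comparing against `SUCCESS_SCORE_THRESHOLD = 0.99`.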

Task sampling policy:
- `training_step == 0`: random from `ALL_TASKS`
- `< 1000`: easy
- `< 5000`: medium
- `>= 5000`: hard
- fallback to first non-empty bucket
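The curriculum rule can be sketched as follows. The sketch makes three assumptions: tasks are plain dicts grouped under `"easy"`, `"medium"` and `"hard"` keys; the fallback scans buckets in that order; and choices within a bucket are uniform. The function signature is hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence


def sample_task(training_step: int,
                tasks_by_difficulty: Dict[str, List[dict]],
                all_tasks: Sequence[dict],
                rng: Optional[random.Random] = None) -> dict:
    """Pick a task by training_step, mirroring the thresholds listed above."""
    rng = rng or random.Random()
    if training_step == 0:
        return rng.choice(list(all_tasks))  # step 0: uniform over every task
    if training_step < 1000:
        bucket = "easy"
    elif training_step < 5000:
        bucket = "medium"
    else:
        bucket = "hard"
    candidates = tasks_by_difficulty.get(bucket) or []
    if not candidates:
        # Fallback: first non-empty bucket (scan order is an assumption).
        candidates = next(
            (tasks_by_difficulty[k] for k in ("easy", "medium", "hard")
             if tasks_by_difficulty.get(k)),
            [],
        )
    if not candidates:
        raise ValueError("task registry is empty")
    return rng.choice(candidates)
```

With this rule, an adapter that maps a requested reset difficulty to a `training_step` hint only needs one value inside each band, for example 1, 1000 and 5000. Those are illustrative values, not read from `server/tracefix_rl_environment.py`.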
## Schema Notes (`models.py`)
Important: current code uses Pydantic v2-style validation APIs.
- `CodeAction` uses `@model_validator(mode="before")`
- Non-`REPLACE_LINES` actions force `start_line`, `end_line`, `new_code_block` to `None`
- `REPLACE_LINES` enforces required fields and 1-indexed positive range constraints

This code will not import under Pydantic v1, which provides `root_validator` rather than `model_validator`.
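The normalization rules above can be written as a plain function over the raw input dict, which is roughly what a `mode="before"` validator receives. The sketch below uses only the standard library. Three things in it are assumptions: the function name, treating `action_type` as a plain string (the real field may be an enum), and requiring `end_line >= start_line`.

```python
from typing import Any, Dict

LINE_FIELDS = ("start_line", "end_line", "new_code_block")


def normalize_action(data: Dict[str, Any]) -> Dict[str, Any]:
    """Mimic the CodeAction pre-validation rules on a raw dict (hypothetical helper)."""
    data = dict(data)  # do not mutate the caller's payload
    if data.get("action_type") != "REPLACE_LINES":
        for field in LINE_FIELDS:
            data[field] = None  # line fields are forced to None for other actions
        return data
    missing = [f for f in LINE_FIELDS if data.get(f) is None]
    if missing:
        raise ValueError(f"REPLACE_LINES requires: {', '.join(missing)}")
    start, end = data["start_line"], data["end_line"]
    if start < 1 or end < 1:
        raise ValueError("line numbers are 1-indexed and must be positive")
    if end < start:  # assumption: ranges must be ordered
        raise ValueError("end_line must be >= start_line")
    return data
```

In Pydantic v2 this body would sit inside a `@model_validator(mode="before")` classmethod. A `ValueError` raised there surfaces as a `ValidationError`.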
## Sandbox Notes (`sandbox.py`)
`run_code_with_tests(...)` returns a strict 3-tuple:
- `output_str`
- `List[TestResult]`
- `had_syntax_error: bool`

Execution safeguards:
- subprocess isolation via `multiprocessing.Process`
- timeout terminate/kill path
- tail truncation (`MAX_OUTPUT_CHARS = 1000`)
- restricted builtins to block risky operations
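The timeout and truncation path can be sketched with the standard library alone. The sketch assumes a fork-based `multiprocessing` context (POSIX only), a small builtins whitelist and a result queue. The names `run_isolated`, `SAFE_BUILTINS` and `_worker`, along with the 2-second default, are hypothetical. The real `run_code_with_tests(...)` also runs tests and returns the 3-tuple described above. Restricting builtins inside `exec` is a guard rail, not a security boundary.

```python
import builtins
import contextlib
import io
import multiprocessing as mp
import queue
from typing import Tuple

MAX_OUTPUT_CHARS = 1000  # tail-truncation limit quoted in these notes

# Hypothetical whitelist; anything absent (e.g. __import__, open) is unavailable.
SAFE_BUILTINS = {
    name: getattr(builtins, name)
    for name in ("print", "range", "len", "abs", "min", "max", "sum",
                 "sorted", "enumerate", "zip", "list", "dict", "set",
                 "tuple", "str", "int", "float", "bool", "isinstance",
                 "Exception", "ValueError", "TypeError")
}


def _tail(text: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Keep only the last `limit` characters, where errors usually appear."""
    return text if len(text) <= limit else text[-limit:]


def _worker(source: str, results: "mp.Queue") -> None:
    """Child process: compile and exec `source`, capturing stdout."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(compile(source, "<candidate>", "exec"), {"__builtins__": SAFE_BUILTINS})
        results.put(("ok", buf.getvalue()))
    except BaseException as exc:  # report every failure, including SyntaxError
        results.put(("error", buf.getvalue() + f"{type(exc).__name__}: {exc}"))


def run_isolated(source: str, timeout: float = 2.0) -> Tuple[str, str]:
    """Run `source` in a child process; return (status, truncated_output)."""
    ctx = mp.get_context("fork")
    results = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(source, results), daemon=True)
    proc.start()
    try:
        status, output = results.get(timeout=timeout)
    except queue.Empty:
        status, output = "timeout", f"Execution timed out after {timeout}s"
    proc.join(0.5)
    if proc.is_alive():  # escalate: SIGTERM first, then SIGKILL
        proc.terminate()
        proc.join(0.5)
        if proc.is_alive():
            proc.kill()
            proc.join()
    return status, _tail(output)
```

Reading the queue before joining avoids the classic deadlock where a child blocks on a full pipe while the parent waits for it to exit. Tail truncation keeps the end of a traceback, which is usually the most informative part.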
## Tasks Registry (`tasks.py`)
- Static hardcoded registry grouped by difficulty
- Exports:
  - `TASKS_BY_DIFFICULTY`
  - `ALL_TASKS`
- Expected total currently: 16 tasks
  - easy: 4
  - medium: 6
  - hard: 6
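Below is a quick sanity check for the expected counts. It is written against a generic mapping so it does not depend on the real task objects. The helper name is hypothetical; in the repository it would presumably be called with `TASKS_BY_DIFFICULTY` from `tasks.py`.

```python
from typing import Dict, List, Mapping, Sequence

EXPECTED_COUNTS = {"easy": 4, "medium": 6, "hard": 6}  # per the counts above; 16 total


def registry_count_errors(tasks_by_difficulty: Mapping[str, Sequence[object]],
                          expected: Dict[str, int] = EXPECTED_COUNTS) -> List[str]:
    """Return human-readable mismatches; an empty list means the registry matches."""
    errors: List[str] = []
    for level, want in expected.items():
        got = len(tasks_by_difficulty.get(level, ()))
        if got != want:
            errors.append(f"{level}: expected {want}, found {got}")
    extra = set(tasks_by_difficulty) - set(expected)
    if extra:
        errors.append(f"unexpected difficulty keys: {sorted(extra)}")
    return errors
```

If `ALL_TASKS` is the flat concatenation of the buckets (an assumption), `len(ALL_TASKS) == 16` is a useful companion check.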
## OpenEnv Adapter and Client
`server/tracefix_rl_environment.py`:
- Maps optional reset difficulty to `training_step` hints
- Writes `system_prompt` into observation metadata
- Sets observation reward/done from gym step output

`client.py`:
- Sends actions using `model_dump(exclude_none=True)`
- Parses OpenEnv payloads into typed `CodeObservation`
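With `exclude_none=True`, any action other than `REPLACE_LINES` is sent with only its set fields, because the validator has already nulled the line fields. The sketch below approximates this with a standard-library dataclass in place of the Pydantic model. It includes only the fields named in these notes, and treating `action_type` as a plain string is an assumption.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ActionSketch:
    """Hypothetical stand-in for CodeAction; only the fields named in these notes."""
    action_type: str
    start_line: Optional[int] = None
    end_line: Optional[int] = None
    new_code_block: Optional[str] = None


def to_payload(action: ActionSketch) -> Dict[str, Any]:
    """Approximate `model_dump(exclude_none=True)`: drop fields whose value is None."""
    return {k: v for k, v in asdict(action).items() if v is not None}
```

An empty `new_code_block` (`""`) is not `None`, so it is still sent. Pydantic's `exclude_none` behaves the same way.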
## Inference Runner (`inference.py`)
Key defaults:
- `API_BASE_URL = https://router.huggingface.co/v1`
- `MODEL_NAME = Qwen/Qwen2.5-72B-Instruct`
- `MAX_STEPS = 50`
- `SUCCESS_SCORE_THRESHOLD = 0.99`
- `THINKING_TOKEN_LIMIT = 512`

Behavior:
- Logs in strict sequence: `[START]`, repeated `[STEP]`, then `[END]`
- Uses JSON extraction fallback path from model text
- Falls back to `RUN_TESTS` on parse or validation failure
- Supports `--easy`, `--medium`, `--hard`, `--debug`
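The parse-or-fallback behavior can be sketched as follows. The sketch assumes the model is asked to emit a single JSON object with an `action_type` key. The helper name and the scan-for-the-first-valid-object strategy are illustrative and may differ from the extraction path in `inference.py`. Full schema validation would still be left to `CodeAction`.

```python
import json
from typing import Any, Dict

VALID_ACTIONS = {"VIEW_CODE", "RUN_TESTS", "REPLACE_LINES",
                 "UNDO_EDIT", "RESET_TO_ORIGINAL", "SUBMIT"}
FALLBACK_ACTION = {"action_type": "RUN_TESTS"}


def extract_action(text: str) -> Dict[str, Any]:
    """Return the first JSON object in `text` with a valid action_type, else RUN_TESTS."""
    decoder = json.JSONDecoder()
    idx = text.find("{")
    while idx != -1:
        try:
            obj, _ = decoder.raw_decode(text, idx)  # parse one object starting at idx
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and obj.get("action_type") in VALID_ACTIONS:
            return obj
        idx = text.find("{", idx + 1)  # try the next candidate brace
    return dict(FALLBACK_ACTION)  # copy so callers cannot mutate the default
```

`raw_decode` handles braces inside string values (for example a `new_code_block` containing `{}`), which a naive regex over `{...}` would not.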
## Drift and Risk Notes
1. `requirements.txt` currently pins `pydantic==1.10.17`, but code in `models.py` uses v2 APIs (`model_validator`).
2. `pyproject.toml` is the active dependency source for `uv sync`; `requirements.txt` appears stale relative to runtime assumptions.
3. `environment.py` defines `R_SUBMIT_ALL_PASS` and `R_SUBMIT_FAIL`, but submit currently uses clamped proportion-minus-step-cost scoring instead of those constants.
4. `server/tracefix_rl_environment.py` advertises support for concurrent sessions, but `create_app(..., max_concurrent_envs=1)` caps server-level concurrency at a single environment.
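Drift item 1 can be caught mechanically. The sketch below scans `requirements.txt`-style text for a Pydantic pin and reports its major version. It only understands simple `==`, `>=` and `~=` specifiers without extras, and the function name is hypothetical.

```python
import re
from typing import Optional

# Matches lines like "pydantic==1.10.17" but not "pydantic-core==2.14.1".
_PIN = re.compile(r"^\s*pydantic\s*(==|>=|~=)\s*(\d+)", re.IGNORECASE)


def pydantic_pin_major(requirements_text: str) -> Optional[int]:
    """Return the major version in the first pydantic pin, or None if unpinned."""
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0]  # ignore trailing comments
        match = _PIN.match(line)
        if match:
            return int(match.group(2))
    return None
```

Feeding it the contents of `requirements.txt` and flagging a result below 2 would surface the `pydantic==1.10.17` pin before a v1 install breaks `models.py`.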
## Practical Checklist Before Validation
1. Confirm dependency source of truth (`pyproject.toml` vs `requirements.txt`) and align Pydantic version expectations.
2. Re-run pre-validation and capture the first failing check/output.
3. Remove tracked cache artifacts from version control if unintended (for example `__pycache__/*.pyc`).
4. Keep stdout format in `inference.py` unchanged, as validator parsing depends on it.