Upload folder using huggingface_hub
- README.md +456 -33
- graphify-out/GRAPH_REPORT.md +24 -19
- graphify-out/cache/28c79dacba9b7f6e353406b2afa843edb89da380af82cb43906368decabb8bb9.json +1 -0
- graphify-out/graph.html +0 -0
- graphify-out/graph.json +90 -90
- openenv.yaml +2 -1
- server/app.py +1 -1
README.md
CHANGED
|
@@ -4,61 +4,484 @@ emoji: 🧪
|
|
| 4 |
colorFrom: blue
|
| 5 |
colorTo: green
|
| 6 |
sdk: docker
|
|
|
|
| 7 |
pinned: false
|
| 8 |
base_path: /web
|
| 9 |
---
|
| 10 |
|
| 11 |
# DebugZero Environment
|
| 12 |
|
| 13 |
-
|
| 14 |
|
| 15 |
-
|
| 16 |
-
1. **Proposer**: Injects realistic bugs into clean Python functions using AST-level edits.
|
| 17 |
-
2. **Solver**: Fixes the bugs, practicing on the generated adversarial examples.
|
| 18 |
|
| 19 |
-
|
|
|
|
| 20 |
|
| 21 |
-
|
| 22 |
|
| 23 |
-
First, install dependencies:
|
| 24 |
```bash
|
| 25 |
-
|
| 26 |
```
|
| 27 |
|
| 28 |
-
## Running
|
| 29 |
|
| 30 |
-
|
| 31 |
|
| 32 |
```bash
|
| 33 |
-
|
| 34 |
```
|
| 35 |
|
| 36 |
-
|
| 37 |
-
[`notebooks/train_colab.ipynb`](notebooks/train_colab.ipynb). It installs DebugZero
|
| 38 |
-
from GitHub, connects through the packaged OpenEnv client, trains with TRL/Unsloth
|
| 39 |
-
against live `reset`/`step` environment rollouts, and saves reward/loss plots for
|
| 40 |
-
the README.
|
| 41 |
|
| 42 |
-
|
| 43 |
|
| 44 |
```
|
| 45 |
debugZero/
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
```
|
| 61 |
|
| 62 |
-
|
| 63 |
|
| 64 |
-
|
| 4 |
colorFrom: blue
|
| 5 |
colorTo: green
|
| 6 |
sdk: docker
|
| 7 |
+
app_port: 8000
|
| 8 |
pinned: false
|
| 9 |
base_path: /web
|
| 10 |
---
|
| 11 |
|
| 12 |
# DebugZero Environment
|
| 13 |
|
| 14 |
+
DebugZero is an OpenEnv environment for training code models through adversarial debugging self-play.
|
| 15 |
|
| 16 |
+
One model plays two roles:
|
| 17 |
|
| 18 |
+
1. **Proposer**: receives a clean Python function and submits a realistic buggy version.
|
| 19 |
+
2. **Solver**: receives the buggy function and submits a repaired version.
|
| 20 |
|
| 21 |
+
The environment executes the submitted code against tests in a constrained Python sandbox and returns structured OpenEnv observations. The training pipeline turns those observations into scalar rewards for GRPO/Unsloth training.
|
| 22 |
+
|
| 23 |
+
The goal is to teach an LLM a debugging skill that static supervised examples do not capture well: generating plausible failures, diagnosing them, and repairing code based on executable feedback.
|
| 24 |
+
|
| 25 |
+
## Submission Links
|
| 26 |
+
|
| 27 |
+
- **Hugging Face Space**: add the final submitted Space URL here before the deadline.
|
| 28 |
+
- **Training notebook**: [`notebooks/train_colab.ipynb`](notebooks/train_colab.ipynb)
|
| 29 |
+
- **OpenEnv manifest**: [`openenv.yaml`](openenv.yaml)
|
| 30 |
+
|
| 31 |
+
## Why This Environment Matters
|
| 32 |
+
|
| 33 |
+
Most code training data shows finished solutions. DebugZero instead creates a loop where the model has to reason about failure:
|
| 34 |
+
|
| 35 |
+
- What kind of bug would a real programmer accidentally introduce?
|
| 36 |
+
- Does the mutated program still parse and run?
|
| 37 |
+
- Does it fail tests for a meaningful reason?
|
| 38 |
+
- Can the solver recover the original intended behavior?
|
| 39 |
+
|
| 40 |
+
That makes the environment useful for training debugging, program repair, adversarial test thinking, and execution-grounded code reasoning.
|
| 41 |
+
|
| 42 |
+
## OpenEnv Integration
|
| 43 |
+
|
| 44 |
+
DebugZero uses the standard OpenEnv client/server pattern.
|
| 45 |
+
|
| 46 |
+
The manifest is:
|
| 47 |
+
|
| 48 |
+
```yaml
|
| 49 |
+
spec_version: 1
|
| 50 |
+
name: debugZero
|
| 51 |
+
type: space
|
| 52 |
+
runtime: fastapi
|
| 53 |
+
app: server.app:app
|
| 54 |
+
port: 8000
|
| 55 |
+
workers: 4
|
| 56 |
+
max_concurrent_envs: 100
|
| 57 |
+
```
|
| 58 |
+
|
| 59 |
+
The FastAPI app is created with OpenEnv's server helper in [`server/app.py`](server/app.py):
|
| 60 |
+
|
| 61 |
+
```python
|
| 62 |
+
app = create_app(
|
| 63 |
+
    DebugzeroEnvironment,
|
| 64 |
+
    DebugzeroAction,
|
| 65 |
+
    DebugzeroObservation,
|
| 66 |
+
    env_name="debugZero",
|
| 67 |
+
    max_concurrent_envs=int(os.environ.get("MAX_CONCURRENT_ENVS", "100")),
|
| 68 |
+
)
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
Clients should interact with the environment through [`DebugzeroEnv`](client.py), not by importing server internals. The client serializes `DebugzeroAction` objects, parses OpenEnv `StepResult` payloads, and exposes the normal `reset`, `step`, and `state` flow.
|
| 72 |
+
|
| 73 |
+
## Episode Flow
|
| 74 |
+
|
| 75 |
+
Each episode is a two-turn game over one seed function.
|
| 76 |
+
|
| 77 |
+
### 1. Reset
|
| 78 |
+
|
| 79 |
+
`reset()` creates a fresh `DebugzeroState`:
|
| 80 |
+
|
| 81 |
+
- `episode_id`: new UUID
|
| 82 |
+
- `step_count`: `0`
|
| 83 |
+
- `seed_id`: currently `HumanEval/0`
|
| 84 |
+
- `original_code`: clean HumanEval seed implementation
|
| 85 |
+
- `current_code`: initially the same clean code
|
| 86 |
+
- `role_turn`: `proposer`
|
| 87 |
+
|
| 88 |
+
The reset observation tells the agent that the proposer acts first and provides the clean function.
|
| 89 |
+
|
| 90 |
+
### 2. Proposer Step
|
| 91 |
+
|
| 92 |
+
The proposer sends:
|
| 93 |
+
|
| 94 |
+
```json
|
| 95 |
+
{
|
| 96 |
+
  "role": "proposer",
|
| 97 |
+
  "code": "<complete mutated Python function>"
|
| 98 |
+
}
|
| 99 |
+
```
|
| 100 |
+
|
| 101 |
+
The environment:
|
| 102 |
+
|
| 103 |
+
1. Stores the submitted code as `current_code`.
|
| 104 |
+
2. Runs it with the seed tests using `execute_code`.
|
| 105 |
+
3. Returns an observation with:
|
| 106 |
+
   - `role_next = "solver"`
|
| 107 |
+
   - `tests_passed`
|
| 108 |
+
   - `syntax_error`
|
| 109 |
+
   - truncated `execution_result`
|
| 110 |
+
   - `done = false`
|
| 111 |
+
|
| 112 |
+
A good proposer submission is syntax-valid, safe to execute, close to the original code, and causes tests to fail.
|
| 113 |
+
|
| 114 |
+
### 3. Solver Step
|
| 115 |
+
|
| 116 |
+
The solver sends:
|
| 117 |
+
|
| 118 |
+
```json
|
| 119 |
+
{
|
| 120 |
+
  "role": "solver",
|
| 121 |
+
  "code": "<complete repaired Python function>"
|
| 122 |
+
}
|
| 123 |
+
```
|
| 124 |
+
|
| 125 |
+
The environment:
|
| 126 |
+
|
| 127 |
+
1. Stores the submitted repair as `current_code`.
|
| 128 |
+
2. Runs it against the same tests.
|
| 129 |
+
3. Returns an observation with:
|
| 130 |
+
   - `role_next = "end"`
|
| 131 |
+
   - `tests_passed`
|
| 132 |
+
   - `syntax_error`
|
| 133 |
+
   - truncated `execution_result`
|
| 134 |
+
   - `done = true`
|
| 135 |
+
|
| 136 |
+
A good solver submission passes tests without syntax errors.
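
The two-turn flow described in this section can be sketched as a small state machine. This is an illustrative toy, not the code in `server/debugZero_environment.py`: `ToyEpisode` and its `run_tests` callback are hypothetical names, and the real environment runs tests through `execute_code` and returns OpenEnv observations rather than plain dicts.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyEpisode:
    """Hypothetical stand-in for one proposer -> solver episode."""

    original_code: str
    run_tests: Callable[[str], bool]  # assumed callback: True when tests pass
    episode_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    step_count: int = 0
    role_turn: str = "proposer"
    current_code: str = ""

    def __post_init__(self) -> None:
        # After reset, the current code is the clean seed implementation.
        self.current_code = self.original_code

    def step(self, role: str, code: str) -> dict:
        # Reject out-of-turn actions, including any action after the episode ends.
        if role != self.role_turn:
            raise ValueError(f"expected {self.role_turn!r}, got {role!r}")
        self.current_code = code
        self.step_count += 1
        # Turn order: proposer -> solver -> end.
        self.role_turn = "solver" if role == "proposer" else "end"
        return {
            "role_next": self.role_turn,
            "tests_passed": self.run_tests(code),
            "done": self.role_turn == "end",
        }
```

In this sketch the episode is `done` only after the solver acts, which mirrors the `done = false` and `done = true` values listed above.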
|
| 137 |
+
|
| 138 |
+
## Action, Observation, and State Schemas
|
| 139 |
+
|
| 140 |
+
The OpenEnv models live in [`models.py`](models.py).
|
| 141 |
+
|
| 142 |
+
### Action
|
| 143 |
+
|
| 144 |
+
`DebugzeroAction` extends OpenEnv `Action`:
|
| 145 |
+
|
| 146 |
+
| Field | Type | Meaning |
|
| 147 |
+
| --- | --- | --- |
|
| 148 |
+
| `role` | `str` | Either `proposer` or `solver`. |
|
| 149 |
+
| `code` | `str` | The complete buggy or repaired Python function. |
|
| 150 |
+
|
| 151 |
+
### Observation
|
| 152 |
+
|
| 153 |
+
`DebugzeroObservation` extends OpenEnv `Observation`:
|
| 154 |
+
|
| 155 |
+
| Field | Type | Meaning |
|
| 156 |
+
| --- | --- | --- |
|
| 157 |
+
| `role_next` | `str` | Which role should act next. |
|
| 158 |
+
| `current_code` | `str` | Current code after reset or step. |
|
| 159 |
+
| `execution_result` | `str` | Captured stdout/stderr summary from sandbox execution. |
|
| 160 |
+
| `tests_passed` | `bool` | Whether the submitted code passed the environment tests. |
|
| 161 |
+
| `syntax_error` | `bool` | Whether parsing or execution produced a syntax error. |
|
| 162 |
+
| `done` | `bool` | OpenEnv completion flag. |
|
| 163 |
+
| `reward` | `float` | Server currently returns `0.0`; training code computes shaped rewards externally. |
|
| 164 |
+
|
| 165 |
+
### State
|
| 166 |
+
|
| 167 |
+
`DebugzeroState` extends OpenEnv `State`:
|
| 168 |
+
|
| 169 |
+
| Field | Type | Meaning |
|
| 170 |
+
| --- | --- | --- |
|
| 171 |
+
| `seed_id` | `str` | Identifier for the seed task. |
|
| 172 |
+
| `original_code` | `str` | Clean reference code. |
|
| 173 |
+
| `current_code` | `str` | Latest proposer or solver code. |
|
| 174 |
+
| `role_turn` | `str` | Internal turn marker: `proposer`, `solver`, or `end`. |
|
| 175 |
+
|
| 176 |
+
## Reward and Grading Logic
|
| 177 |
+
|
| 178 |
+
DebugZero separates **verification** from **reward shaping**.
|
| 179 |
+
|
| 180 |
+
- The OpenEnv server is the verifier. It runs submitted code and returns observations.
|
| 181 |
+
- The training layer is the grader. It reads `tests_passed`, `syntax_error`, plausibility, and solve history, then computes scalar rewards.
|
| 182 |
+
|
| 183 |
+
This is intentional: the same environment can support different reward rubrics without changing the OpenEnv API.
|
| 184 |
+
|
| 185 |
+
### Server Verifier
|
| 186 |
+
|
| 187 |
+
[`DebugzeroEnvironment.step`](server/debugZero_environment.py) always executes code and reports the result, but currently returns `reward=0.0` in the observation. The meaningful reward is computed by the training code from the observation fields.
|
| 188 |
+
|
| 189 |
+
For proposer actions:
|
| 190 |
+
|
| 191 |
+
- Syntax error: bad mutation.
|
| 192 |
+
- Tests still pass: mutation did not create a useful bug.
|
| 193 |
+
- Tests fail without syntax error: likely useful bug.
|
| 194 |
+
|
| 195 |
+
For solver actions:
|
| 196 |
+
|
| 197 |
+
- Tests pass without syntax error: solved.
|
| 198 |
+
- Tests fail or syntax error: not solved.
|
| 199 |
+
|
| 200 |
+
### Proposer Reward
|
| 201 |
+
|
| 202 |
+
Implemented in [`training/rewards.py`](training/rewards.py):
|
| 203 |
+
|
| 204 |
+
```python
|
| 205 |
+
reward = validity + plausibility + learnability
|
| 206 |
+
```
|
| 207 |
+
|
| 208 |
+
Components:
|
| 209 |
+
|
| 210 |
+
| Component | Logic | Reason |
|
| 211 |
+
| --- | --- | --- |
|
| 212 |
+
| `validity` | `-1.0` if syntax error, `+1.0` if tests fail, `0.0` if tests still pass | Rewards executable bugs, rejects broken syntax. |
|
| 213 |
+
| `plausibility` | AST similarity score from `compute_ast_distance` | Rewards small realistic edits over random corruption. |
|
| 214 |
+
| `learnability` | `+1.0` when recent solver success rate is between `0.1` and `0.9` | Rewards bugs that are neither trivial nor impossible. |
|
| 215 |
+
|
| 216 |
+
The proposer is therefore rewarded for bugs that are:
|
| 217 |
+
|
| 218 |
+
- valid Python,
|
| 219 |
+
- test-breaking,
|
| 220 |
+
- close to the original AST,
|
| 221 |
+
- useful training examples for the solver.
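
As a rough illustration of how the three components in the table add up, here is a minimal sketch. It is not the code in `training/rewards.py`: the function name `proposer_reward_sketch` and its signature are hypothetical, the plausibility score is passed in rather than computed, and treating the `0.1` and `0.9` bounds as inclusive is an assumption.

```python
def proposer_reward_sketch(
    tests_passed: bool,
    syntax_error: bool,
    plausibility: float,
    solve_rate: float,
) -> float:
    """Sum validity, plausibility, and learnability as described in the table."""
    if syntax_error:
        validity = -1.0  # broken syntax is penalized
    elif not tests_passed:
        validity = 1.0  # executable code that breaks tests is a useful bug
    else:
        validity = 0.0  # tests still pass, so no bug was introduced

    # Assumption: the learnable band includes both endpoints.
    learnability = 1.0 if 0.1 <= solve_rate <= 0.9 else 0.0
    return validity + plausibility + learnability
```

Under these assumptions, a small test-breaking mutation that the solver fixes about half the time scores `3.0`, while an unchanged function scores at most the learnability bonus.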
|
| 222 |
+
|
| 223 |
+
### Solver Reward
|
| 224 |
+
|
| 225 |
+
Implemented in [`training/rewards.py`](training/rewards.py):
|
| 226 |
+
|
| 227 |
+
```python
|
| 228 |
+
solved = tests_passed and not syntax_error
|
| 229 |
+
reward = 1.0 if solved else 0.0
|
| 230 |
+
```
|
| 231 |
+
|
| 232 |
+
Every solver result is recorded in a per-seed rolling deque of length `20`. The proposer uses this history through `get_solve_rate(seed_id)` to estimate whether a bug is learnable.
|
| 233 |
+
|
| 234 |
+
### Plausibility Grader
|
| 235 |
+
|
| 236 |
+
Implemented in [`server/plausibility.py`](server/plausibility.py).
|
| 237 |
+
|
| 238 |
+
The plausibility score compares AST dumps of the clean and mutated code using a Levenshtein-style fuzz ratio:
|
| 239 |
+
|
| 240 |
+
| AST similarity ratio | Score | Interpretation |
|
| 241 |
+
| --- | --- | --- |
|
| 242 |
+
| `100` | `0.0` | No edit, not a useful bug. |
|
| 243 |
+
| `85` to `99` | `1.0` | Small realistic mutation. |
|
| 244 |
+
| `50` to `84` | Linear decay down to `0.1` | Medium-sized change. |
|
| 245 |
+
| `< 50` | `0.0` | Too different, likely unrealistic. |
|
| 246 |
+
|
| 247 |
+
This discourages the proposer from replacing the whole function with nonsense.
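
The score bands in the table can be written as a small piecewise function. The sketch below is an assumption-laden illustration rather than the code in `server/plausibility.py`: it uses the standard library's `difflib.SequenceMatcher` as a stand-in for the fuzz ratio, and it assumes the linear segment runs from `1.0` at a ratio of 85 down to `0.1` at a ratio of 50.

```python
import ast
import difflib


def ast_similarity(original: str, mutated: str) -> float:
    """Similarity of the two AST dumps on a 0-100 scale (difflib stand-in)."""
    a = ast.dump(ast.parse(original))
    b = ast.dump(ast.parse(mutated))
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


def plausibility_from_ratio(ratio: float) -> float:
    """Map a 0-100 similarity ratio onto the score bands from the table."""
    if ratio >= 100:
        return 0.0  # identical AST: no bug was injected
    if ratio >= 85:
        return 1.0  # small, realistic edit
    if ratio >= 50:
        # Assumed interpolation: 1.0 at ratio 85 falling linearly to 0.1 at ratio 50.
        return 0.1 + 0.9 * (ratio - 50) / 35
    return 0.0  # too different from the original
```

Comparing AST dumps instead of raw source means that whitespace and comment changes alone do not count as a mutation.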
|
| 248 |
+
|
| 249 |
+
### Notebook Reward
|
| 250 |
+
|
| 251 |
+
The Colab notebook at [`notebooks/train_colab.ipynb`](notebooks/train_colab.ipynb) uses the live OpenEnv server inside the reward function.
|
| 252 |
+
|
| 253 |
+
For each model completion, it:
|
| 254 |
+
|
| 255 |
+
1. Extracts Python code from the model output.
|
| 256 |
+
2. Calls `env.reset()`.
|
| 257 |
+
3. Calls `env.step(DebugzeroAction(...))`.
|
| 258 |
+
4. Computes reward from the returned observation.
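
Step 1 above, pulling code out of a model completion, might look like the following. This is a hypothetical helper, not the notebook's actual implementation: `extract_python_code` is an invented name, and falling back to the whole completion when no fenced block is found is an assumption.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly in Markdown

# Match the first fenced block, with an optional "python" or "py" language tag.
_CODE_BLOCK = re.compile(
    FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)


def extract_python_code(completion: str) -> str:
    """Return the first fenced code block, or the stripped completion if none."""
    match = _CODE_BLOCK.search(completion)
    if match:
        return match.group(1).strip()
    return completion.strip()
```

The extracted string is what would then be passed as the `code` field of a `DebugzeroAction`.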
|
| 259 |
+
|
| 260 |
+
That means training is connected to the real environment, not a static dataset. The notebook also evaluates baseline and trained policies and saves:
|
| 261 |
+
|
| 262 |
+
- `results/reward_curve.png`
|
| 263 |
+
- `results/loss_curve.png`
|
| 264 |
+
- `results/baseline_vs_trained_reward.png`
|
| 265 |
+
- `results/training_log.csv`
|
| 266 |
+
|
| 267 |
+
## Bug Injection Logic
|
| 268 |
+
|
| 269 |
+
The AST mutation engine lives in [`server/bug_injector.py`](server/bug_injector.py).
|
| 270 |
+
|
| 271 |
+
`inject_bug(original_code, proposed_operator)` parses the clean code, applies one AST mutation, unparses the result, and accepts it only if all safety checks pass.
|
| 272 |
+
|
| 273 |
+
Supported mutation operators:
|
| 274 |
+
|
| 275 |
+
| Operator | Example behavior |
|
| 276 |
+
| --- | --- |
|
| 277 |
+
| `off_by_one` | Integer constants are shifted by `+1` or `-1`. |
|
| 278 |
+
| `wrong_operator` | Comparisons and arithmetic operators are swapped, such as `<` to `>=` or `+` to `-`. |
|
| 279 |
+
| `wrong_builtin` | Built-ins are swapped, such as `min`/`max`, `any`/`all`, or `sum`/`len`. |
|
| 280 |
+
| `loop_boundary_shift` | `range(n)` becomes `range(n + 1)`, or a two-argument range shifts the start. |
|
| 281 |
+
| `condition_negation` | `if condition` becomes `if not condition`. |
|
| 282 |
+
| `missing_base_case` | A return inside an `if` body is replaced with `pass`. |
|
| 283 |
+
| `slice_boundary_corruption` | Slice lower or upper bounds are shifted. |
|
| 284 |
+
| `variable_swap` | Tuple assignment targets are swapped. |
|
| 285 |
+
|
| 286 |
+
Accepted mutations must satisfy four checks:
|
| 287 |
+
|
| 288 |
+
1. Original code parses.
|
| 289 |
+
2. Mutated code is actually different.
|
| 290 |
+
3. Mutated code does not include blocked imports.
|
| 291 |
+
4. Mutated code parses after mutation.
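
To make the parse, mutate, unparse, and check pipeline concrete, here is a minimal sketch of a single operator in the spirit of `condition_negation`. It is not the code in `server/bug_injector.py`: `NegateFirstIf` and `negate_condition` are hypothetical names, only the first `if` is mutated, and the blocked-import check is omitted for brevity.

```python
import ast
from typing import Optional


class NegateFirstIf(ast.NodeTransformer):
    """Wrap the test of the first `if` statement in `not`."""

    def __init__(self) -> None:
        self.mutated = False

    def visit_If(self, node: ast.If) -> ast.If:
        if not self.mutated:
            node.test = ast.UnaryOp(op=ast.Not(), operand=node.test)
            self.mutated = True
        return node


def negate_condition(source: str) -> Optional[str]:
    """Return the mutated source, or None if any check fails."""
    try:
        tree = ast.parse(source)  # check: original code parses
    except SyntaxError:
        return None
    baseline = ast.unparse(tree)

    mutator = NegateFirstIf()
    new_tree = ast.fix_missing_locations(mutator.visit(tree))
    mutated = ast.unparse(new_tree)

    if not mutator.mutated or mutated == baseline:
        return None  # check: mutated code is actually different
    try:
        ast.parse(mutated)  # check: mutated code still parses
    except SyntaxError:
        return None
    return mutated
```

`ast.unparse` requires Python 3.9 or newer. Returning `None` on failure lets a caller try a different operator instead of submitting an unchanged function.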
|
| 292 |
+
|
| 293 |
+
## Sandbox and Safety
|
| 294 |
+
|
| 295 |
+
Execution is handled by [`server/executor.py`](server/executor.py).
|
| 296 |
+
|
| 297 |
+
The executor builds:
|
| 298 |
+
|
| 299 |
+
```python
|
| 300 |
+
full_code = submitted_code + "\n\n" + tests
|
| 301 |
+
```
|
| 302 |
+
|
| 303 |
+
Then it validates and executes the code in a temporary file with a timeout.
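
The temporary-file-plus-timeout pattern can be sketched with the standard library alone. This is an illustration under assumptions, not the code in `server/executor.py`: `run_with_tests` is a hypothetical name, the return shape is invented, and the output truncation length is arbitrary.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def run_with_tests(
    submitted_code: str, tests: str, timeout_s: float = 5.0
) -> Tuple[bool, str]:
    """Run code plus tests in a fresh temp directory; return (passed, output)."""
    full_code = submitted_code + "\n\n" + tests
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(full_code, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                cwd=tmp,  # keep relative file access inside the temp directory
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
    # A zero exit code means no assertion failed and nothing raised.
    output = (proc.stdout + proc.stderr)[-2000:]  # arbitrary truncation length
    return proc.returncode == 0, output
```

A subprocess timeout bounds wall-clock time but does not by itself restrict memory, network, or filesystem access, so it is only one layer of a sandbox.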
|
| 304 |
+
|
| 305 |
+
Safety checks include:
|
| 306 |
+
|
| 307 |
+
- blocked imports: `os`, `sys`, `subprocess`, `shutil`, `pathlib`
|
| 308 |
+
- blocked built-ins: `__import__`, `eval`, `exec`, `open`
|
| 309 |
+
- AST parsing before execution
|
| 310 |
+
- AST walk to catch direct `Import`, `ImportFrom`, and blocked function calls
|
| 311 |
+
- subprocess timeout, currently `5` seconds
|
| 312 |
+
- temporary directory isolation for each execution
|
| 313 |
+
|
| 314 |
+
If code is unsafe but parses, the executor returns:
|
| 315 |
+
|
| 316 |
+
```text
|
| 317 |
+
Unsafe import detected.
|
| 318 |
+
```
|
| 319 |
+
|
| 320 |
+
If code does not parse, the executor returns a syntax-error observation.
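
The AST walk listed among the safety checks can be illustrated as follows. This is a sketch, not the `is_safe` function from `server/executor.py`: the name `is_safe_sketch` is hypothetical, and the blocklists are copied from the bullets above.

```python
import ast

BLOCKED_IMPORTS = {"os", "sys", "subprocess", "shutil", "pathlib"}
BLOCKED_CALLS = {"__import__", "eval", "exec", "open"}


def is_safe_sketch(code: str) -> bool:
    """Return False if the code imports a blocked module or calls a blocked built-in.

    Raises SyntaxError if the code does not parse, so callers can report
    syntax errors separately from unsafe code.
    """
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import os.path` is blocked via its top-level package name.
            if any(alias.name.split(".")[0] in BLOCKED_IMPORTS for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] in BLOCKED_IMPORTS:
                return False
        elif isinstance(node, ast.Call):
            # Only direct calls by name are caught, e.g. eval("...").
            if isinstance(node.func, ast.Name) and node.func.id in BLOCKED_CALLS:
                return False
    return True
```

A static walk like this only catches direct imports and direct calls; indirect access (for example through `getattr` or aliased names) would need additional checks or OS-level isolation.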
|
| 321 |
+
|
| 322 |
+
## Training Pipeline
|
| 323 |
+
|
| 324 |
+
There are two training paths.
|
| 325 |
+
|
| 326 |
+
### Recommended: Colab Notebook
|
| 327 |
+
|
| 328 |
+
Use [`notebooks/train_colab.ipynb`](notebooks/train_colab.ipynb) for the hackathon submission.
|
| 329 |
+
|
| 330 |
+
It:
|
| 331 |
+
|
| 332 |
+
1. Installs DebugZero from GitHub.
|
| 333 |
+
2. Starts the packaged OpenEnv FastAPI server, or connects to a remote HF Space URL.
|
| 334 |
+
3. Runs an OpenEnv smoke test through `DebugzeroEnv`.
|
| 335 |
+
4. Builds prompts from live environment resets.
|
| 336 |
+
5. Uses TRL `GRPOTrainer`.
|
| 337 |
+
6. Uses Unsloth when available, with native TRL fallback.
|
| 338 |
+
7. Computes rewards through live `reset` and `step` calls.
|
| 339 |
+
8. Saves plots for the README and final presentation.
|
| 340 |
+
|
| 341 |
+
### Experimental Script
|
| 342 |
+
|
| 343 |
+
[`training/grpo_train.py`](training/grpo_train.py) contains an experimental GRPO trainer configuration and the richer reward functions from [`training/rewards.py`](training/rewards.py). It is useful as an implementation reference, but the notebook is the clearer end-to-end artifact for judges because it connects directly to the environment and saves visible training evidence.
|
| 344 |
+
|
| 345 |
+
## Prompt Templates
|
| 346 |
+
|
| 347 |
+
[`training/dual_role_sampler.py`](training/dual_role_sampler.py) defines two role prompts.
|
| 348 |
+
|
| 349 |
+
The proposer prompt asks the model to:
|
| 350 |
+
|
| 351 |
+
- inject an adversarial but plausible bug,
|
| 352 |
+
- keep code syntax-valid,
|
| 353 |
+
- make the function fail tests,
|
| 354 |
+
- return only modified code.
|
| 355 |
+
|
| 356 |
+
The solver prompt asks the model to:
|
| 357 |
+
|
| 358 |
+
- inspect buggy code,
|
| 359 |
+
- repair it,
|
| 360 |
+
- return only corrected code.
|
| 361 |
+
|
| 362 |
+
## Evaluation
|
| 363 |
+
|
| 364 |
+
Tests live under [`eval/`](eval/).
|
| 365 |
+
|
| 366 |
+
Current checks cover:
|
| 367 |
+
|
| 368 |
+
- AST mutation behavior:
|
| 369 |
+
  - missing base case,
|
| 370 |
+
  - off-by-one mutation,
|
| 371 |
+
  - loop boundary shift,
|
| 372 |
+
  - wrong built-in,
|
| 373 |
+
  - condition negation,
|
| 374 |
+
  - safety checks.
|
| 375 |
+
- Executor behavior:
|
| 376 |
+
  - safe code passes,
|
| 377 |
+
  - blocked imports are rejected,
|
| 378 |
+
  - syntax errors are rejected,
|
| 379 |
+
  - correct code passes tests,
|
| 380 |
+
  - buggy code fails tests.
|
| 381 |
+
|
| 382 |
+
There is also a plausibility evaluation scaffold in [`eval/plausibility_eval.py`](eval/plausibility_eval.py) for comparing generated bugs with human-like bugs from the navidadkhah dataset.
|
| 383 |
+
|
| 384 |
+
Run local checks with:
|
| 385 |
|
|
|
|
| 386 |
```bash
|
| 387 |
+
pytest eval
|
| 388 |
```
|
| 389 |
|
| 390 |
+
## Running Locally
|
| 391 |
|
| 392 |
+
Install dependencies:
|
| 393 |
|
| 394 |
```bash
|
| 395 |
+
uv sync
|
| 396 |
```
|
| 397 |
|
| 398 |
+
Start the OpenEnv server:
|
| 399 |
|
| 400 |
+
```bash
|
| 401 |
+
python -m uvicorn server.app:app --host 0.0.0.0 --port 8000
|
| 402 |
+
```
|
| 403 |
+
|
| 404 |
+
Smoke-test with the client:
|
| 405 |
|
| 406 |
+
```python
|
| 407 |
+
from debugZero.client import DebugzeroEnv
|
| 408 |
+
from debugZero.models import DebugzeroAction
|
| 409 |
+
|
| 410 |
+
with DebugzeroEnv(base_url="http://localhost:8000") as env:
|
| 411 |
+
    obs = env.reset().observation
|
| 412 |
+
    print(obs.role_next)
|
| 413 |
+
    print(obs.current_code)
|
| 414 |
+
|
| 415 |
+
    buggy = obs.current_code.replace("distance < threshold", "distance <= threshold")
|
| 416 |
+
    result = env.step(DebugzeroAction(role="proposer", code=buggy))
|
| 417 |
+
    print(result.observation.tests_passed)
|
| 418 |
```
|
| 419 |
+
|
| 420 |
+
## Repository Structure
|
| 421 |
+
|
| 422 |
+
```text
|
| 423 |
debugZero/
|
| 424 |
+
|-- openenv.yaml # OpenEnv manifest
|
| 425 |
+
|-- README.md # Project and submission documentation
|
| 426 |
+
|-- models.py # Action, observation, and state schemas
|
| 427 |
+
|-- client.py # OpenEnv client
|
| 428 |
+
|-- server/
|
| 429 |
+
| |-- app.py # FastAPI OpenEnv app
|
| 430 |
+
| |-- debugZero_environment.py # Environment state machine
|
| 431 |
+
| |-- executor.py # Code execution and safety checks
|
| 432 |
+
| |-- bug_injector.py # AST mutation engine
|
| 433 |
+
| |-- plausibility.py # AST similarity grader
|
| 434 |
+
| `-- requirements.txt # HF Space server dependencies
|
| 435 |
+
|-- training/
|
| 436 |
+
| |-- rewards.py # Proposer and solver reward functions
|
| 437 |
+
| |-- dual_role_sampler.py # Prompt templates
|
| 438 |
+
| `-- grpo_train.py # Experimental GRPO trainer script
|
| 439 |
+
|-- notebooks/
|
| 440 |
+
| `-- train_colab.ipynb # Recommended rerunnable training notebook
|
| 441 |
+
`-- eval/
|
| 442 |
+
    |-- test_bug_injector.py # Mutation tests
|
| 443 |
+
    |-- test_executor.py # Executor tests
|
| 444 |
+
    `-- plausibility_eval.py # Plausibility evaluation scaffold
|
| 445 |
+
```
|
| 446 |
+
|
| 447 |
+
## Deployment Notes
|
| 448 |
+
|
| 449 |
+
The HF Space runs `server.app:app`, so imports are written to support both:
|
| 450 |
+
|
| 451 |
+
- top-level Space import mode: `server.app`
|
| 452 |
+
- installed package mode: `debugZero.server.app`
|
| 453 |
+
|
| 454 |
+
Server dependencies for the Space are in [`server/requirements.txt`](server/requirements.txt). The server requires `thefuzz` because `server/plausibility.py` imports it during app startup.
|
| 455 |
+
|
| 456 |
+
Because the Docker Space serves Uvicorn on port `8000`, the Hugging Face README metadata must include:
|
| 457 |
+
|
| 458 |
+
```yaml
|
| 459 |
+
sdk: docker
|
| 460 |
+
app_port: 8000
|
| 461 |
```
|
| 462 |
|
| 463 |
+
After pushing to Hugging Face, confirm:
|
| 464 |
+
|
| 465 |
+
- the Space builds successfully,
|
| 466 |
+
- `/schema` returns a valid OpenEnv schema,
|
| 467 |
+
- `reset` returns the HumanEval seed code,
|
| 468 |
+
- `step` returns `tests_passed` and `syntax_error`,
|
| 469 |
+
- the README links to the final Space URL and training evidence.
|
| 470 |
+
|
| 471 |
+
## Current Limitations and Next Steps
|
| 472 |
+
|
| 473 |
+
Current implementation details to be aware of:
|
| 474 |
+
|
| 475 |
+
- The server seed is currently a single HumanEval-style function, `HumanEval/0`.
|
| 476 |
+
- The server verifies behavior but does not emit shaped scalar rewards yet. Training computes those externally from observations.
|
| 477 |
+
- Tests are currently bundled in the environment seed. For a stronger benchmark, split public and hidden tests.
|
| 478 |
+
- The AST bug injector exists as a utility, while proposer actions currently submit full mutated code.
|
| 479 |
+
- The training notebook is the preferred proof artifact because it uses the live OpenEnv path and produces plots.
|
| 480 |
+
|
| 481 |
+
High-impact next steps:
|
| 482 |
|
| 483 |
+
- Add more HumanEval or curated seed tasks.
|
| 484 |
+
- Move shaped reward metadata into observations for easier external analysis.
|
| 485 |
+
- Add hidden tests and baseline-vs-trained examples to the README.
|
| 486 |
+
- Use the AST injector to generate proposer warm-start examples.
|
| 487 |
+
- Record qualitative before/after solver repairs for the final presentation.
|
graphify-out/GRAPH_REPORT.md
CHANGED
|
@@ -1,11 +1,11 @@
|
|
| 1 |
# Graph Report - C:\Users\astra\Desktop\hackon\debugZero (2026-04-25)
|
| 2 |
|
| 3 |
## Corpus Check
|
| 4 |
-
- 16 files · ~
|
| 5 |
- Verdict: corpus is large enough that graph structure adds value.
|
| 6 |
|
| 7 |
## Summary
|
| 8 |
-
- 85 nodes · 139 edges ·
|
| 9 |
- Extraction: 60% EXTRACTED · 40% INFERRED · 0% AMBIGUOUS · INFERRED: 55 edges (avg confidence: 0.62)
|
| 10 |
- Token cost: 0 input · 0 output
|
| 11 |
|
|
@@ -21,6 +21,7 @@
|
|
| 21 |
- [[_COMMUNITY_Community 8|Community 8]]
|
| 22 |
- [[_COMMUNITY_Community 9|Community 9]]
|
| 23 |
- [[_COMMUNITY_Community 10|Community 10]]
|
|
|
|
| 24 |
|
| 25 |
## God Nodes (most connected - your core abstractions)
|
| 26 |
1. `DebugzeroObservation` - 16 edges
|
|
@@ -35,15 +36,15 @@
|
|
| 35 |
10. `test_local_env()` - 5 edges
|
| 36 |
|
| 37 |
## Surprising Connections (you probably didn't know these)
|
| 38 |
-
- `DebugzeroEnv` --uses--> `
|
| 39 |
C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
|
| 40 |
- `DebugzeroEnv` --uses--> `DebugzeroState` [INFERRED]
|
| 41 |
C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
|
| 42 |
-
- `Client for the DebugZero Environment. This client maintains a persistent We` --uses--> `
|
| 43 |
C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
|
| 44 |
- `Client for the DebugZero Environment. This client maintains a persistent We` --uses--> `DebugzeroState` [INFERRED]
|
| 45 |
C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
|
| 46 |
-
- `Convert DebugzeroAction to JSON payload for step message. Args:` --uses--> `
|
| 47 |
C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
|
| 48 |
|
| 49 |
## Communities
|
|
@@ -57,38 +58,42 @@ Cohesion: 0.2
|
|
| 57 |
Nodes (1): BugInjectorVisitor
|
| 58 |
|
| 59 |
### Community 2 - "Community 2"
|
| 60 |
-
Cohesion: 0.33
|
| 61 |
-
Nodes (7): DebugzeroEnv, Client for the DebugZero Environment. This client maintains a persistent We, Convert DebugzeroAction to JSON payload for step message. Args:, Parse server response into StepResult[DebugzeroObservation]. Args:, DebugzeroObservation, Observation from the DebugZero environment following sandbox execution., Observation
|
| 62 |
-
|
| 63 |
-
### Community 3 - "Community 3"
|
| 64 |
Cohesion: 0.28
|
| 65 |
Nodes (2): main(), Entry point for direct execution via uv run or python -m. This function ena
|
| 66 |
|
| 67 |
-
### Community
|
| 68 |
Cohesion: 0.33
|
| 69 |
Nodes (7): create_dataset(), main(), reward_fn(), compute_proposer_reward(), compute_solver_reward(), get_solve_rate(), record_solve_result()
|
| 70 |
|
| 71 |
-
### Community
|
| 72 |
-
Cohesion: 0.
|
| 73 |
-
Nodes (7): Action, Entry point for direct execution via uv run or python -m. This function ena,
|
| 74 |
|
| 75 |
-
### Community
|
| 76 |
Cohesion: 0.32
|
| 77 |
Nodes (6): execute_code(), is_safe(), Check if the code contains any blocked imports strings. Also performs a qu, Executes the provided python code alongside its tests in an isolated subprocess., test_execute_code(), test_executor_is_safe()
|
| 78 |
|
| 79 |
### Community 7 - "Community 7"
|
| 80 |
Cohesion: 0.47
|
| 81 |
Nodes (3): DebugzeroEnvironment, Environment, test_local_env()
|
| 82 |
|
| 83 |
### Community 8 - "Community 8"
|
| 84 |
-
Cohesion: 0.4
|
| 85 |
-
Nodes (4): Parse server response into State object. Args: payload: JSO, DebugzeroState, State for the DebugZero environment, extending default state with seed context., State
|
| 86 |
-
|
| 87 |
-
### Community 9 - "Community 9"
|
| 88 |
Cohesion: 0.33
|
| 89 |
Nodes (4): compute_ast_distance(), evaluate_navidadkhah_plausibility(), Offline evaluation of generated bugs against the navidadkhah 25k bug dataset., Computes the string similarity distance between the AST dumps of the original
|
| 90 |
|
| 91 |
### Community 10 - "Community 10"
|
| 92 |
Cohesion: 0.67
|
| 93 |
Nodes (0):
|
| 94 |
|
|
@@ -99,7 +104,7 @@ Nodes (0):
|
|
| 99 |
## Suggested Questions
|
| 100 |
_Questions this graph is uniquely positioned to answer:_
|
| 101 |
|
| 102 |
-
- **Why does `DebugzeroEnvironment` connect `Community 7` to `Community
|
| 103 |
_High betweenness centrality (0.213) - this node is a cross-community bridge._
|
| 104 |
- **Why does `BugInjectorVisitor` connect `Community 1` to `Community 0`?**
|
| 105 |
_High betweenness centrality (0.173) - this node is a cross-community bridge._
|
|
|
|
| 1 |
# Graph Report - C:\Users\astra\Desktop\hackon\debugZero (2026-04-25)
|
| 2 |
|
| 3 |
## Corpus Check
|
| 4 |
+
- 16 files · ~13,482 words
|
| 5 |
- Verdict: corpus is large enough that graph structure adds value.
|
| 6 |
|
| 7 |
## Summary
|
| 8 |
+
- 85 nodes · 139 edges · 12 communities detected
|
| 9 |
- Extraction: 60% EXTRACTED · 40% INFERRED · 0% AMBIGUOUS · INFERRED: 55 edges (avg confidence: 0.62)
|
| 10 |
- Token cost: 0 input · 0 output
|
| 11 |
|
|
|
|
| 21 |
- [[_COMMUNITY_Community 8|Community 8]]
|
| 22 |
- [[_COMMUNITY_Community 9|Community 9]]
|
| 23 |
- [[_COMMUNITY_Community 10|Community 10]]
|
| 24 |
+
- [[_COMMUNITY_Community 11|Community 11]]
|
| 25 |
|
| 26 |
## God Nodes (most connected - your core abstractions)
|
| 27 |
1. `DebugzeroObservation` - 16 edges
|
|
|
|
 10. `test_local_env()` - 5 edges

 ## Surprising Connections (you probably didn't know these)
+- `DebugzeroEnv` --uses--> `DebugzeroObservation` [INFERRED]
 C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
 - `DebugzeroEnv` --uses--> `DebugzeroState` [INFERRED]
 C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
+- `Client for the DebugZero Environment. This client maintains a persistent We` --uses--> `DebugzeroObservation` [INFERRED]
 C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
 - `Client for the DebugZero Environment. This client maintains a persistent We` --uses--> `DebugzeroState` [INFERRED]
 C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py
+- `Convert DebugzeroAction to JSON payload for step message. Args:` --uses--> `DebugzeroObservation` [INFERRED]
 C:\Users\astra\Desktop\hackon\debugZero\client.py → C:\Users\astra\Desktop\hackon\debugZero\models.py

 ## Communities
|
|
|
 Nodes (1): BugInjectorVisitor

 ### Community 2 - "Community 2"
 Cohesion: 0.28
 Nodes (2): main(), Entry point for direct execution via uv run or python -m. This function ena

+### Community 3 - "Community 3"
 Cohesion: 0.33
 Nodes (7): create_dataset(), main(), reward_fn(), compute_proposer_reward(), compute_solver_reward(), get_solve_rate(), record_solve_result()

+### Community 4 - "Community 4"
+Cohesion: 0.32
+Nodes (7): Action, Entry point for direct execution via uv run or python -m. This function ena, DebugzeroEnv, Client for the DebugZero Environment. This client maintains a persistent We, Convert DebugzeroAction to JSON payload for step message. Args:, DebugzeroAction, Action for the DebugZero environment representing the Proposer or Solver inputs.

+### Community 5 - "Community 5"
 Cohesion: 0.32
 Nodes (6): execute_code(), is_safe(), Check if the code contains any blocked imports strings. Also performs a qu, Executes the provided python code alongside its tests in an isolated subprocess., test_execute_code(), test_executor_is_safe()

+### Community 6 - "Community 6"
+Cohesion: 0.4
+Nodes (4): Parse server response into State object. Args: payload: JSO, DebugzeroState, State for the DebugZero environment, extending default state with seed context., State
+
 ### Community 7 - "Community 7"
 Cohesion: 0.47
 Nodes (3): DebugzeroEnvironment, Environment, test_local_env()

 ### Community 8 - "Community 8"
 Cohesion: 0.33
 Nodes (4): compute_ast_distance(), evaluate_navidadkhah_plausibility(), Offline evaluation of generated bugs against the navidadkhah 25k bug dataset., Computes the string similarity distance between the AST dumps of the original

+### Community 9 - "Community 9"
+Cohesion: 0.5
+Nodes (4): Parse server response into StepResult[DebugzeroObservation]. Args:, DebugzeroObservation, Observation from the DebugZero environment following sandbox execution., Observation
+
 ### Community 10 - "Community 10"
+Cohesion: 0.5
+Nodes (3): Dual-role DebugZero Environment wrapping a Python sandbox execution for Prop, Dual-role DebugZero Environment wrapping a Python sandbox execution for Prop, ExecutionResult
+
+### Community 11 - "Community 11"
 Cohesion: 0.67
 Nodes (0):

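Community 8 lists `compute_ast_distance()` with the truncated description "Computes the string similarity distance between the AST dumps of the original". The repository's implementation is not shown in this diff, so the following is only a minimal sketch of that idea, assuming the distance is one minus a `difflib.SequenceMatcher` ratio over `ast.dump` output. The function name `ast_dump_distance` and the sample sources are hypothetical.

```python
import ast
import difflib


def ast_dump_distance(original_src: str, mutated_src: str) -> float:
    """Return 1 - similarity ratio between the AST dumps of two sources.

    Hypothetical helper for illustration; it is not the repository's
    compute_ast_distance(). 0.0 means both sources parse to the same tree.
    """
    # ast.dump() omits line/column attributes by default, so comments and
    # whitespace differences do not affect the result.
    original_dump = ast.dump(ast.parse(original_src))
    mutated_dump = ast.dump(ast.parse(mutated_src))
    ratio = difflib.SequenceMatcher(None, original_dump, mutated_dump).ratio()
    return 1.0 - ratio


# Assumed sample inputs: a clean function and a one-operator mutation of it.
clean = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"

print(ast_dump_distance(clean, clean))  # 0.0: identical trees
print(ast_dump_distance(clean, buggy))  # small positive value: one operator differs
```

Comparing dumps rather than raw source means formatting-only changes score as zero distance, which seems to match the intent of measuring how far an injected bug moves the program structure.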
|
|
|
 ## Suggested Questions
 _Questions this graph is uniquely positioned to answer:_

+- **Why does `DebugzeroEnvironment` connect `Community 7` to `Community 2`, `Community 4`, `Community 6`, `Community 9`, `Community 10`?**
 _High betweenness centrality (0.213) - this node is a cross-community bridge._
 - **Why does `BugInjectorVisitor` connect `Community 1` to `Community 0`?**
 _High betweenness centrality (0.173) - this node is a cross-community bridge._
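The report flags `DebugzeroEnvironment` and `BugInjectorVisitor` as cross-community bridges because of their betweenness centrality. How the report generator computes or normalizes that score is not shown here, so the sketch below only illustrates the general measure: Brandes' algorithm on a small undirected, unweighted graph, normalized by `(n-1)(n-2)`. The toy graph and the function name `betweenness_centrality` are assumptions for illustration.

```python
from collections import deque


def betweenness_centrality(adj):
    """Normalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    n = len(adj)
    # Each unordered pair is visited from both endpoints, so dividing by
    # (n-1)(n-2) yields the usual normalized undirected score.
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: s * scale for v, s in score.items()}


# Assumed toy graph: two tight pairs joined only through "bridge".
graph = {
    "a1": ["a2", "bridge"],
    "a2": ["a1", "bridge"],
    "bridge": ["a1", "a2", "c1", "c2"],
    "c1": ["bridge", "c2"],
    "c2": ["bridge", "c1"],
}
scores = betweenness_centrality(graph)
print(scores)
```

In this toy graph every shortest path between the `a` pair and the `c` pair passes through `bridge`, so it scores 2/3 while every other node scores 0. A node with a high score in the report plays the same role between communities.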
graphify-out/cache/28c79dacba9b7f6e353406b2afa843edb89da380af82cb43906368decabb8bb9.json
ADDED
|
@@ -0,0 +1 @@
+
{"nodes": [{"id": "c_users_astra_desktop_hackon_debugzero_server_app_py", "label": "app.py", "file_type": "code", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L1"}, {"id": "app_main", "label": "main()", "file_type": "code", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L63"}, {"id": "app_rationale_64", "label": "Entry point for direct execution via uv run or python -m. This function ena", "file_type": "rationale", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L64"}], "edges": [{"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "os", "relation": "imports", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L30", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "openenv_core_env_server_http_server", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L36", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "models", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L43", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "c_users_astra_desktop_hackon_debugzero_server_debugzero_environment_py", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L44", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "c_users_astra_desktop_hackon_debugzero_models_py", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": 
"C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L46", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "c_users_astra_desktop_hackon_debugzero_server_debugzero_environment_py", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L47", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "models", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L49", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "server_debugzero_environment", "relation": "imports_from", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L50", "weight": 1.0}, {"source": "c_users_astra_desktop_hackon_debugzero_server_app_py", "target": "app_main", "relation": "contains", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L63", "weight": 1.0}, {"source": "app_rationale_64", "target": "app_main", "relation": "rationale_for", "confidence": "EXTRACTED", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L64", "weight": 1.0}], "raw_calls": [{"caller_nid": "app_main", "callee": "ArgumentParser", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L75"}, {"caller_nid": "app_main", "callee": "add_argument", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L76"}, {"caller_nid": "app_main", "callee": "add_argument", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L77"}, {"caller_nid": "app_main", 
"callee": "parse_args", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L78"}, {"caller_nid": "app_main", "callee": "run", "source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py", "source_location": "L80"}]}
|
graphify-out/graph.html
CHANGED
|
The diff for this file is too large to render.
graphify-out/graph.json
CHANGED
|
@@ -9,7 +9,7 @@
|
|
| 9 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 10 |
"source_location": "L1",
|
| 11 |
"id": "c_users_astra_desktop_hackon_debugzero_client_py",
|
| 12 |
-
"community":
|
| 13 |
"norm_label": "client.py"
|
| 14 |
},
|
| 15 |
{
|
|
@@ -18,7 +18,7 @@
|
|
| 18 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 19 |
"source_location": "L18",
|
| 20 |
"id": "client_debugzeroenv",
|
| 21 |
-
"community":
|
| 22 |
"norm_label": "debugzeroenv"
|
| 23 |
},
|
| 24 |
{
|
|
@@ -27,7 +27,7 @@
|
|
| 27 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 28 |
"source_location": "L47",
|
| 29 |
"id": "client_debugzeroenv_step_payload",
|
| 30 |
-
"community":
|
| 31 |
"norm_label": "._step_payload()"
|
| 32 |
},
|
| 33 |
{
|
|
@@ -36,7 +36,7 @@
|
|
| 36 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 37 |
"source_location": "L62",
|
| 38 |
"id": "client_debugzeroenv_parse_result",
|
| 39 |
-
"community":
|
| 40 |
"norm_label": "._parse_result()"
|
| 41 |
},
|
| 42 |
{
|
|
@@ -45,7 +45,7 @@
|
|
| 45 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 46 |
"source_location": "L90",
|
| 47 |
"id": "client_debugzeroenv_parse_state",
|
| 48 |
-
"community":
|
| 49 |
"norm_label": "._parse_state()"
|
| 50 |
},
|
| 51 |
{
|
|
@@ -53,7 +53,7 @@
|
|
| 53 |
"file_type": "rationale",
|
| 54 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 55 |
"source_location": "L21",
|
| 56 |
-
"community":
|
| 57 |
"norm_label": "client for the debugzero environment. this client maintains a persistent we",
|
| 58 |
"id": "client_rationale_21"
|
| 59 |
},
|
|
@@ -62,7 +62,7 @@
|
|
| 62 |
"file_type": "rationale",
|
| 63 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 64 |
"source_location": "L48",
|
| 65 |
-
"community":
|
| 66 |
"norm_label": "convert debugzeroaction to json payload for step message. args:",
|
| 67 |
"id": "client_rationale_48"
|
| 68 |
},
|
|
@@ -71,7 +71,7 @@
|
|
| 71 |
"file_type": "rationale",
|
| 72 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 73 |
"source_location": "L63",
|
| 74 |
-
"community":
|
| 75 |
"norm_label": "parse server response into stepresult[debugzeroobservation]. args:",
|
| 76 |
"id": "client_rationale_63"
|
| 77 |
},
|
|
@@ -80,7 +80,7 @@
|
|
| 80 |
"file_type": "rationale",
|
| 81 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 82 |
"source_location": "L91",
|
| 83 |
-
"community":
|
| 84 |
"norm_label": "parse server response into state object. args: payload: jso",
|
| 85 |
"id": "client_rationale_91"
|
| 86 |
},
|
|
@@ -90,7 +90,7 @@
|
|
| 90 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 91 |
"source_location": "L1",
|
| 92 |
"id": "c_users_astra_desktop_hackon_debugzero_models_py",
|
| 93 |
-
"community":
|
| 94 |
"norm_label": "models.py"
|
| 95 |
},
|
| 96 |
{
|
|
@@ -99,7 +99,7 @@
|
|
| 99 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 100 |
"source_location": "L18",
|
| 101 |
"id": "models_debugzeroaction",
|
| 102 |
-
"community":
|
| 103 |
"norm_label": "debugzeroaction"
|
| 104 |
},
|
| 105 |
{
|
|
@@ -108,7 +108,7 @@
|
|
| 108 |
"source_file": "",
|
| 109 |
"source_location": "",
|
| 110 |
"id": "action",
|
| 111 |
-
"community":
|
| 112 |
"norm_label": "action"
|
| 113 |
},
|
| 114 |
{
|
|
@@ -117,7 +117,7 @@
|
|
| 117 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 118 |
"source_location": "L25",
|
| 119 |
"id": "models_debugzeroobservation",
|
| 120 |
-
"community":
|
| 121 |
"norm_label": "debugzeroobservation"
|
| 122 |
},
|
| 123 |
{
|
|
@@ -126,7 +126,7 @@
|
|
| 126 |
"source_file": "",
|
| 127 |
"source_location": "",
|
| 128 |
"id": "observation",
|
| 129 |
-
"community":
|
| 130 |
"norm_label": "observation"
|
| 131 |
},
|
| 132 |
{
|
|
@@ -135,7 +135,7 @@
|
|
| 135 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 136 |
"source_location": "L34",
|
| 137 |
"id": "models_debugzerostate",
|
| 138 |
-
"community":
|
| 139 |
"norm_label": "debugzerostate"
|
| 140 |
},
|
| 141 |
{
|
|
@@ -144,7 +144,7 @@
|
|
| 144 |
"source_file": "",
|
| 145 |
"source_location": "",
|
| 146 |
"id": "state",
|
| 147 |
-
"community":
|
| 148 |
"norm_label": "state"
|
| 149 |
},
|
| 150 |
{
|
|
@@ -152,7 +152,7 @@
|
|
| 152 |
"file_type": "rationale",
|
| 153 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 154 |
"source_location": "L19",
|
| 155 |
-
"community":
|
| 156 |
"norm_label": "action for the debugzero environment representing the proposer or solver inputs.",
|
| 157 |
"id": "models_rationale_19"
|
| 158 |
},
|
|
@@ -161,7 +161,7 @@
|
|
| 161 |
"file_type": "rationale",
|
| 162 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 163 |
"source_location": "L26",
|
| 164 |
-
"community":
|
| 165 |
"norm_label": "observation from the debugzero environment following sandbox execution.",
|
| 166 |
"id": "models_rationale_26"
|
| 167 |
},
|
|
@@ -170,7 +170,7 @@
|
|
| 170 |
"file_type": "rationale",
|
| 171 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 172 |
"source_location": "L35",
|
| 173 |
-
"community":
|
| 174 |
"norm_label": "state for the debugzero environment, extending default state with seed context.",
|
| 175 |
"id": "models_rationale_35"
|
| 176 |
},
|
|
@@ -198,7 +198,7 @@
|
|
| 198 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\__init__.py",
|
| 199 |
"source_location": "L1",
|
| 200 |
"id": "c_users_astra_desktop_hackon_debugzero_init_py",
|
| 201 |
-
"community":
|
| 202 |
"norm_label": "__init__.py"
|
| 203 |
},
|
| 204 |
{
|
|
@@ -207,7 +207,7 @@
|
|
| 207 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\plausibility_eval.py",
|
| 208 |
"source_location": "L1",
|
| 209 |
"id": "c_users_astra_desktop_hackon_debugzero_eval_plausibility_eval_py",
|
| 210 |
-
"community":
|
| 211 |
"norm_label": "plausibility_eval.py"
|
| 212 |
},
|
| 213 |
{
|
|
@@ -216,7 +216,7 @@
|
|
| 216 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\plausibility_eval.py",
|
| 217 |
"source_location": "L5",
|
| 218 |
"id": "plausibility_eval_evaluate_navidadkhah_plausibility",
|
| 219 |
-
"community":
|
| 220 |
"norm_label": "evaluate_navidadkhah_plausibility()"
|
| 221 |
},
|
| 222 |
{
|
|
@@ -224,7 +224,7 @@
|
|
| 224 |
"file_type": "rationale",
|
| 225 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\plausibility_eval.py",
|
| 226 |
"source_location": "L6",
|
| 227 |
-
"community":
|
| 228 |
"norm_label": "offline evaluation of generated bugs against the navidadkhah 25k bug dataset.",
|
| 229 |
"id": "plausibility_eval_rationale_6"
|
| 230 |
},
|
|
@@ -297,7 +297,7 @@
|
|
| 297 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\test_executor.py",
|
| 298 |
"source_location": "L1",
|
| 299 |
"id": "c_users_astra_desktop_hackon_debugzero_eval_test_executor_py",
|
| 300 |
-
"community":
|
| 301 |
"norm_label": "test_executor.py"
|
| 302 |
},
|
| 303 |
{
|
|
@@ -306,7 +306,7 @@
|
|
| 306 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\test_executor.py",
|
| 307 |
"source_location": "L4",
|
| 308 |
"id": "test_executor_test_executor_is_safe",
|
| 309 |
-
"community":
|
| 310 |
"norm_label": "test_executor_is_safe()"
|
| 311 |
},
|
| 312 |
{
|
|
@@ -315,7 +315,7 @@
|
|
| 315 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\test_executor.py",
|
| 316 |
"source_location": "L16",
|
| 317 |
"id": "test_executor_test_execute_code",
|
| 318 |
-
"community":
|
| 319 |
"norm_label": "test_execute_code()"
|
| 320 |
},
|
| 321 |
{
|
|
@@ -324,7 +324,7 @@
|
|
| 324 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 325 |
"source_location": "L1",
|
| 326 |
"id": "c_users_astra_desktop_hackon_debugzero_server_app_py",
|
| 327 |
-
"community":
|
| 328 |
"norm_label": "app.py"
|
| 329 |
},
|
| 330 |
{
|
|
@@ -333,7 +333,7 @@
|
|
| 333 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 334 |
"source_location": "L63",
|
| 335 |
"id": "app_main",
|
| 336 |
-
"community":
|
| 337 |
"norm_label": "main()"
|
| 338 |
},
|
| 339 |
{
|
|
@@ -341,9 +341,9 @@
|
|
| 341 |
"file_type": "rationale",
|
| 342 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 343 |
"source_location": "L64",
|
| 344 |
-
"
|
| 345 |
-
"
|
| 346 |
-
"
|
| 347 |
},
|
| 348 |
{
|
| 349 |
"label": "bug_injector.py",
|
|
@@ -477,7 +477,7 @@
|
|
| 477 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 478 |
"source_location": "L1",
|
| 479 |
"id": "c_users_astra_desktop_hackon_debugzero_server_debugzero_environment_py",
|
| 480 |
-
"community":
|
| 481 |
"norm_label": "debugzero_environment.py"
|
| 482 |
},
|
| 483 |
{
|
|
@@ -504,7 +504,7 @@
|
|
| 504 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 505 |
"source_location": "L47",
|
| 506 |
"id": "debugzero_environment_debugzeroenvironment_init",
|
| 507 |
-
"community":
|
| 508 |
"norm_label": ".__init__()"
|
| 509 |
},
|
| 510 |
{
|
|
@@ -531,7 +531,7 @@
|
|
| 531 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 532 |
"source_location": "L129",
|
| 533 |
"id": "debugzero_environment_state",
|
| 534 |
-
"community":
|
| 535 |
"norm_label": "state()"
|
| 536 |
},
|
| 537 |
{
|
|
@@ -539,9 +539,9 @@
|
|
| 539 |
"file_type": "rationale",
|
| 540 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 541 |
"source_location": "L41",
|
| 542 |
-
"
|
| 543 |
-
"
|
| 544 |
-
"
|
| 545 |
},
|
| 546 |
{
|
| 547 |
"label": "executor.py",
|
|
@@ -549,7 +549,7 @@
|
|
| 549 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 550 |
"source_location": "L1",
|
| 551 |
"id": "c_users_astra_desktop_hackon_debugzero_server_executor_py",
|
| 552 |
-
"community":
|
| 553 |
"norm_label": "executor.py"
|
| 554 |
},
|
| 555 |
{
|
|
@@ -558,7 +558,7 @@
|
|
| 558 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 559 |
"source_location": "L9",
|
| 560 |
"id": "executor_is_safe",
|
| 561 |
-
"community":
|
| 562 |
"norm_label": "is_safe()"
|
| 563 |
},
|
| 564 |
{
|
|
@@ -567,7 +567,7 @@
|
|
| 567 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 568 |
"source_location": "L44",
|
| 569 |
"id": "executor_executionresult",
|
| 570 |
-
"community":
|
| 571 |
"norm_label": "executionresult"
|
| 572 |
},
|
| 573 |
{
|
|
@@ -576,7 +576,7 @@
|
|
| 576 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 577 |
"source_location": "L45",
|
| 578 |
"id": "executor_executionresult_init",
|
| 579 |
-
"community":
|
| 580 |
"norm_label": ".__init__()"
|
| 581 |
},
|
| 582 |
{
|
|
@@ -585,7 +585,7 @@
|
|
| 585 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 586 |
"source_location": "L51",
|
| 587 |
"id": "executor_execute_code",
|
| 588 |
-
"community":
|
| 589 |
"norm_label": "execute_code()"
|
| 590 |
},
|
| 591 |
{
|
|
@@ -593,7 +593,7 @@
|
|
| 593 |
"file_type": "rationale",
|
| 594 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 595 |
"source_location": "L10",
|
| 596 |
-
"community":
|
| 597 |
"norm_label": "check if the code contains any blocked imports strings. also performs a qu",
|
| 598 |
"id": "executor_rationale_10"
|
| 599 |
},
|
|
@@ -602,7 +602,7 @@
|
|
| 602 |
"file_type": "rationale",
|
| 603 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 604 |
"source_location": "L52",
|
| 605 |
-
"community":
|
| 606 |
"norm_label": "executes the provided python code alongside its tests in an isolated subprocess.",
|
| 607 |
"id": "executor_rationale_52"
|
| 608 |
},
|
|
@@ -612,7 +612,7 @@
|
|
| 612 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\plausibility.py",
|
| 613 |
"source_location": "L1",
|
| 614 |
"id": "c_users_astra_desktop_hackon_debugzero_server_plausibility_py",
|
| 615 |
-
"community":
|
| 616 |
"norm_label": "plausibility.py"
|
| 617 |
},
|
| 618 |
{
|
|
@@ -621,7 +621,7 @@
|
|
| 621 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\plausibility.py",
|
| 622 |
"source_location": "L4",
|
| 623 |
"id": "plausibility_compute_ast_distance",
|
| 624 |
-
"community":
|
| 625 |
"norm_label": "compute_ast_distance()"
|
| 626 |
},
|
| 627 |
{
|
|
@@ -629,7 +629,7 @@
|
|
| 629 |
"file_type": "rationale",
|
| 630 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\plausibility.py",
|
| 631 |
"source_location": "L5",
|
| 632 |
-
"community":
|
| 633 |
"norm_label": "computes the string similarity distance between the ast dumps of the original",
|
| 634 |
"id": "plausibility_rationale_5"
|
| 635 |
},
|
|
@@ -639,7 +639,7 @@
|
|
| 639 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\__init__.py",
|
| 640 |
"source_location": "L1",
|
| 641 |
"id": "c_users_astra_desktop_hackon_debugzero_server_init_py",
|
| 642 |
-
"community":
|
| 643 |
"norm_label": "__init__.py"
|
| 644 |
},
|
| 645 |
{
|
|
@@ -648,7 +648,7 @@
|
|
| 648 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\dual_role_sampler.py",
|
| 649 |
"source_location": "L1",
|
| 650 |
"id": "c_users_astra_desktop_hackon_debugzero_training_dual_role_sampler_py",
|
| 651 |
-
"community":
|
| 652 |
"norm_label": "dual_role_sampler.py"
|
| 653 |
},
|
| 654 |
{
|
|
@@ -657,7 +657,7 @@
|
|
| 657 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\dual_role_sampler.py",
|
| 658 |
"source_location": "L23",
|
| 659 |
"id": "dual_role_sampler_sample_proposer_prompt",
|
| 660 |
-
"community":
|
| 661 |
"norm_label": "sample_proposer_prompt()"
|
| 662 |
},
|
| 663 |
{
|
|
@@ -666,7 +666,7 @@
|
|
| 666 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\dual_role_sampler.py",
|
| 667 |
"source_location": "L26",
|
| 668 |
"id": "dual_role_sampler_sample_solver_prompt",
|
| 669 |
-
"community":
|
| 670 |
"norm_label": "sample_solver_prompt()"
|
| 671 |
},
|
| 672 |
{
|
|
@@ -675,7 +675,7 @@
|
|
| 675 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 676 |
"source_location": "L1",
|
| 677 |
"id": "c_users_astra_desktop_hackon_debugzero_training_grpo_train_py",
|
| 678 |
-
"community":
|
| 679 |
"norm_label": "grpo_train.py"
|
| 680 |
},
|
| 681 |
{
|
|
@@ -684,7 +684,7 @@
|
|
| 684 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 685 |
"source_location": "L25",
|
| 686 |
"id": "grpo_train_reward_fn",
|
| 687 |
-
"community":
|
| 688 |
"norm_label": "reward_fn()"
|
| 689 |
},
|
| 690 |
{
|
|
@@ -693,7 +693,7 @@
|
|
| 693 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 694 |
"source_location": "L44",
|
| 695 |
"id": "grpo_train_create_dataset",
|
| 696 |
-
"community":
|
| 697 |
"norm_label": "create_dataset()"
|
| 698 |
},
|
| 699 |
{
|
|
@@ -702,7 +702,7 @@
|
|
| 702 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 703 |
"source_location": "L57",
|
| 704 |
"id": "grpo_train_main",
|
| 705 |
-
"community":
|
| 706 |
"norm_label": "main()"
|
| 707 |
},
|
| 708 |
{
|
|
@@ -711,7 +711,7 @@
|
|
| 711 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 712 |
"source_location": "L1",
|
| 713 |
"id": "c_users_astra_desktop_hackon_debugzero_training_rewards_py",
|
| 714 |
-
"community":
|
| 715 |
"norm_label": "rewards.py"
|
| 716 |
},
|
| 717 |
{
|
|
@@ -720,7 +720,7 @@
|
|
| 720 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 721 |
"source_location": "L7",
|
| 722 |
"id": "rewards_get_solve_rate",
|
| 723 |
-
"community":
|
| 724 |
"norm_label": "get_solve_rate()"
|
| 725 |
},
|
| 726 |
{
|
|
@@ -729,7 +729,7 @@
|
|
| 729 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 730 |
"source_location": "L13",
|
| 731 |
"id": "rewards_record_solve_result",
|
| 732 |
-
"community":
|
| 733 |
"norm_label": "record_solve_result()"
|
| 734 |
},
|
| 735 |
{
|
|
@@ -738,7 +738,7 @@
|
|
| 738 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 739 |
"source_location": "L18",
|
| 740 |
"id": "rewards_compute_proposer_reward",
|
| 741 |
-
"community":
|
| 742 |
"norm_label": "compute_proposer_reward()"
|
| 743 |
},
|
| 744 |
{
|
|
@@ -747,7 +747,7 @@
|
|
| 747 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 748 |
"source_location": "L37",
|
| 749 |
"id": "rewards_compute_solver_reward",
|
| 750 |
-
"community":
|
| 751 |
"norm_label": "compute_solver_reward()"
|
| 752 |
},
|
| 753 |
{
|
|
@@ -755,7 +755,7 @@
|
|
| 755 |
"file_type": "rationale",
|
| 756 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 757 |
"source_location": "L61",
|
| 758 |
-
"community":
|
| 759 |
"norm_label": "entry point for direct execution via uv run or python -m. this function ena",
|
| 760 |
"id": "app_rationale_61"
|
| 761 |
},
|
|
@@ -764,7 +764,7 @@
|
|
| 764 |
"file_type": "rationale",
|
| 765 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 766 |
"source_location": "L39",
|
| 767 |
-
"community":
|
| 768 |
"norm_label": "dual-role debugzero environment wrapping a python sandbox execution for prop",
|
| 769 |
"id": "debugzero_environment_rationale_39"
|
| 770 |
}
|
|
@@ -1196,11 +1196,11 @@
|
|
| 1196 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 1197 |
"source_location": "L49",
|
| 1198 |
"weight": 0.8,
|
| 1199 |
-
"_src": "
|
| 1200 |
-
"_tgt": "
|
|
|
|
| 1201 |
"source": "models_debugzeroaction",
|
| 1202 |
-
"target": "app_rationale_64"
|
| 1203 |
-
"confidence_score": 0.5
|
| 1204 |
},
|
| 1205 |
{
|
| 1206 |
"relation": "uses",
|
|
@@ -1220,11 +1220,11 @@
|
|
| 1220 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 1221 |
"source_location": "L21",
|
| 1222 |
"weight": 0.8,
|
| 1223 |
-
"_src": "
|
| 1224 |
-
"_tgt": "
|
|
|
|
| 1225 |
"source": "models_debugzeroaction",
|
| 1226 |
-
"target": "debugzero_environment_rationale_41"
|
| 1227 |
-
"confidence_score": 0.5
|
| 1228 |
},
|
| 1229 |
{
|
| 1230 |
"relation": "calls",
|
|
@@ -1292,11 +1292,11 @@
|
|
| 1292 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 1293 |
"source_location": "L49",
|
| 1294 |
"weight": 0.8,
|
| 1295 |
-
"_src": "
|
| 1296 |
-
"_tgt": "
|
|
|
|
| 1297 |
"source": "models_debugzeroobservation",
|
| 1298 |
-
"target": "app_rationale_64"
|
| 1299 |
-
"confidence_score": 0.5
|
| 1300 |
},
|
| 1301 |
{
|
| 1302 |
"relation": "uses",
|
|
@@ -1316,11 +1316,11 @@
|
|
| 1316 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 1317 |
"source_location": "L21",
|
| 1318 |
"weight": 0.8,
|
| 1319 |
-
"_src": "
|
| 1320 |
-
"_tgt": "
|
|
|
|
| 1321 |
"source": "models_debugzeroobservation",
|
| 1322 |
-
"target": "debugzero_environment_rationale_41"
|
| 1323 |
-
"confidence_score": 0.5
|
| 1324 |
},
|
| 1325 |
{
|
| 1326 |
"relation": "calls",
|
|
@@ -1412,11 +1412,11 @@
|
|
| 1412 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 1413 |
"source_location": "L21",
|
| 1414 |
"weight": 0.8,
|
| 1415 |
-
"_src": "
|
| 1416 |
-
"_tgt": "
|
|
|
|
| 1417 |
"source": "models_debugzerostate",
|
| 1418 |
-
"target": "debugzero_environment_rationale_41"
|
| 1419 |
-
"confidence_score": 0.5
|
| 1420 |
},
|
| 1421 |
{
|
| 1422 |
"relation": "calls",
|
|
@@ -1774,9 +1774,9 @@
|
|
| 1774 |
"weight": 0.8,
|
| 1775 |
"_src": "app_rationale_64",
|
| 1776 |
"_tgt": "debugzero_environment_debugzeroenvironment",
|
|
|
|
| 1777 |
"source": "app_rationale_64",
|
| 1778 |
-
"target": "debugzero_environment_debugzeroenvironment"
|
| 1779 |
-
"confidence_score": 0.5
|
| 1780 |
},
|
| 1781 |
{
|
| 1782 |
"relation": "contains",
|
|
@@ -2108,8 +2108,8 @@
|
|
| 2108 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 2109 |
"source_location": "L47",
|
| 2110 |
"weight": 0.8,
|
| 2111 |
-
"_src": "
|
| 2112 |
-
"_tgt": "
|
| 2113 |
"confidence_score": 0.5,
|
| 2114 |
"source": "debugzero_environment_debugzeroenvironment",
|
| 2115 |
"target": "app_rationale_61"
|
|
@@ -2134,9 +2134,9 @@
|
|
| 2134 |
"weight": 0.8,
|
| 2135 |
"_src": "debugzero_environment_rationale_41",
|
| 2136 |
"_tgt": "executor_executionresult",
|
|
|
|
| 2137 |
"source": "debugzero_environment_rationale_41",
|
| 2138 |
-
"target": "executor_executionresult"
|
| 2139 |
-
"confidence_score": 0.5
|
| 2140 |
},
|
| 2141 |
{
|
| 2142 |
"relation": "contains",
|
|
@@ -2228,8 +2228,8 @@
|
|
| 2228 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 2229 |
"source_location": "L26",
|
| 2230 |
"weight": 0.8,
|
| 2231 |
-
"_src": "
|
| 2232 |
-
"_tgt": "
|
| 2233 |
"confidence_score": 0.5,
|
| 2234 |
"source": "executor_executionresult",
|
| 2235 |
"target": "debugzero_environment_rationale_39"
|
|
|
|
| 9 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 10 |
"source_location": "L1",
|
| 11 |
"id": "c_users_astra_desktop_hackon_debugzero_client_py",
|
| 12 |
+
"community": 2,
|
| 13 |
"norm_label": "client.py"
|
| 14 |
},
|
| 15 |
{
|
|
|
|
| 18 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 19 |
"source_location": "L18",
|
| 20 |
"id": "client_debugzeroenv",
|
| 21 |
+
"community": 4,
|
| 22 |
"norm_label": "debugzeroenv"
|
| 23 |
},
|
| 24 |
{
|
|
|
|
| 27 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 28 |
"source_location": "L47",
|
| 29 |
"id": "client_debugzeroenv_step_payload",
|
| 30 |
+
"community": 4,
|
| 31 |
"norm_label": "._step_payload()"
|
| 32 |
},
|
| 33 |
{
|
|
|
|
| 36 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 37 |
"source_location": "L62",
|
| 38 |
"id": "client_debugzeroenv_parse_result",
|
| 39 |
+
"community": 9,
|
| 40 |
"norm_label": "._parse_result()"
|
| 41 |
},
|
| 42 |
{
|
|
|
|
| 45 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 46 |
"source_location": "L90",
|
| 47 |
"id": "client_debugzeroenv_parse_state",
|
| 48 |
+
"community": 6,
|
| 49 |
"norm_label": "._parse_state()"
|
| 50 |
},
|
| 51 |
{
|
|
|
|
| 53 |
"file_type": "rationale",
|
| 54 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 55 |
"source_location": "L21",
|
| 56 |
+
"community": 4,
|
| 57 |
"norm_label": "client for the debugzero environment. this client maintains a persistent we",
|
| 58 |
"id": "client_rationale_21"
|
| 59 |
},
|
|
|
|
| 62 |
"file_type": "rationale",
|
| 63 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 64 |
"source_location": "L48",
|
| 65 |
+
"community": 4,
|
| 66 |
"norm_label": "convert debugzeroaction to json payload for step message. args:",
|
| 67 |
"id": "client_rationale_48"
|
| 68 |
},
|
|
|
|
| 71 |
"file_type": "rationale",
|
| 72 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 73 |
"source_location": "L63",
|
| 74 |
+
"community": 9,
|
| 75 |
"norm_label": "parse server response into stepresult[debugzeroobservation]. args:",
|
| 76 |
"id": "client_rationale_63"
|
| 77 |
},
|
|
|
|
| 80 |
"file_type": "rationale",
|
| 81 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\client.py",
|
| 82 |
"source_location": "L91",
|
| 83 |
+
"community": 6,
|
| 84 |
"norm_label": "parse server response into state object. args: payload: jso",
|
| 85 |
"id": "client_rationale_91"
|
| 86 |
},
|
|
|
|
| 90 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 91 |
"source_location": "L1",
|
| 92 |
"id": "c_users_astra_desktop_hackon_debugzero_models_py",
|
| 93 |
+
"community": 2,
|
| 94 |
"norm_label": "models.py"
|
| 95 |
},
|
| 96 |
{
|
|
|
|
| 99 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 100 |
"source_location": "L18",
|
| 101 |
"id": "models_debugzeroaction",
|
| 102 |
+
"community": 4,
|
| 103 |
"norm_label": "debugzeroaction"
|
| 104 |
},
|
| 105 |
{
|
|
|
|
| 108 |
"source_file": "",
|
| 109 |
"source_location": "",
|
| 110 |
"id": "action",
|
| 111 |
+
"community": 4,
|
| 112 |
"norm_label": "action"
|
| 113 |
},
|
| 114 |
{
|
|
|
|
| 117 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 118 |
"source_location": "L25",
|
| 119 |
"id": "models_debugzeroobservation",
|
| 120 |
+
"community": 9,
|
| 121 |
"norm_label": "debugzeroobservation"
|
| 122 |
},
|
| 123 |
{
|
|
|
|
| 126 |
"source_file": "",
|
| 127 |
"source_location": "",
|
| 128 |
"id": "observation",
|
| 129 |
+
"community": 9,
|
| 130 |
"norm_label": "observation"
|
| 131 |
},
|
| 132 |
{
|
|
|
|
| 135 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 136 |
"source_location": "L34",
|
| 137 |
"id": "models_debugzerostate",
|
| 138 |
+
"community": 6,
|
| 139 |
"norm_label": "debugzerostate"
|
| 140 |
},
|
| 141 |
{
|
|
|
|
| 144 |
"source_file": "",
|
| 145 |
"source_location": "",
|
| 146 |
"id": "state",
|
| 147 |
+
"community": 6,
|
| 148 |
"norm_label": "state"
|
| 149 |
},
|
| 150 |
{
|
|
|
|
| 152 |
"file_type": "rationale",
|
| 153 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 154 |
"source_location": "L19",
|
| 155 |
+
"community": 4,
|
| 156 |
"norm_label": "action for the debugzero environment representing the proposer or solver inputs.",
|
| 157 |
"id": "models_rationale_19"
|
| 158 |
},
|
|
|
|
| 161 |
"file_type": "rationale",
|
| 162 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 163 |
"source_location": "L26",
|
| 164 |
+
"community": 9,
|
| 165 |
"norm_label": "observation from the debugzero environment following sandbox execution.",
|
| 166 |
"id": "models_rationale_26"
|
| 167 |
},
|
|
|
|
| 170 |
"file_type": "rationale",
|
| 171 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\models.py",
|
| 172 |
"source_location": "L35",
|
| 173 |
+
"community": 6,
|
| 174 |
"norm_label": "state for the debugzero environment, extending default state with seed context.",
|
| 175 |
"id": "models_rationale_35"
|
| 176 |
},
|
|
|
|
| 198 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\__init__.py",
|
| 199 |
"source_location": "L1",
|
| 200 |
"id": "c_users_astra_desktop_hackon_debugzero_init_py",
|
| 201 |
+
"community": 2,
|
| 202 |
"norm_label": "__init__.py"
|
| 203 |
},
|
| 204 |
{
|
|
|
|
| 207 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\plausibility_eval.py",
|
| 208 |
"source_location": "L1",
|
| 209 |
"id": "c_users_astra_desktop_hackon_debugzero_eval_plausibility_eval_py",
|
| 210 |
+
"community": 8,
|
| 211 |
"norm_label": "plausibility_eval.py"
|
| 212 |
},
|
| 213 |
{
|
|
|
|
| 216 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\plausibility_eval.py",
|
| 217 |
"source_location": "L5",
|
| 218 |
"id": "plausibility_eval_evaluate_navidadkhah_plausibility",
|
| 219 |
+
"community": 8,
|
| 220 |
"norm_label": "evaluate_navidadkhah_plausibility()"
|
| 221 |
},
|
| 222 |
{
|
|
|
|
| 224 |
"file_type": "rationale",
|
| 225 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\plausibility_eval.py",
|
| 226 |
"source_location": "L6",
|
| 227 |
+
"community": 8,
|
| 228 |
"norm_label": "offline evaluation of generated bugs against the navidadkhah 25k bug dataset.",
|
| 229 |
"id": "plausibility_eval_rationale_6"
|
| 230 |
},
|
|
|
|
| 297 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\test_executor.py",
|
| 298 |
"source_location": "L1",
|
| 299 |
"id": "c_users_astra_desktop_hackon_debugzero_eval_test_executor_py",
|
| 300 |
+
"community": 5,
|
| 301 |
"norm_label": "test_executor.py"
|
| 302 |
},
|
| 303 |
{
|
|
|
|
| 306 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\test_executor.py",
|
| 307 |
"source_location": "L4",
|
| 308 |
"id": "test_executor_test_executor_is_safe",
|
| 309 |
+
"community": 5,
|
| 310 |
"norm_label": "test_executor_is_safe()"
|
| 311 |
},
|
| 312 |
{
|
|
|
|
| 315 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\eval\\test_executor.py",
|
| 316 |
"source_location": "L16",
|
| 317 |
"id": "test_executor_test_execute_code",
|
| 318 |
+
"community": 5,
|
| 319 |
"norm_label": "test_execute_code()"
|
| 320 |
},
|
| 321 |
{
|
|
|
|
| 324 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 325 |
"source_location": "L1",
|
| 326 |
"id": "c_users_astra_desktop_hackon_debugzero_server_app_py",
|
| 327 |
+
"community": 2,
|
| 328 |
"norm_label": "app.py"
|
| 329 |
},
|
| 330 |
{
|
|
|
|
| 333 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 334 |
"source_location": "L63",
|
| 335 |
"id": "app_main",
|
| 336 |
+
"community": 2,
|
| 337 |
"norm_label": "main()"
|
| 338 |
},
|
| 339 |
{
|
|
|
|
| 341 |
"file_type": "rationale",
|
| 342 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 343 |
"source_location": "L64",
|
| 344 |
+
"community": 2,
|
| 345 |
+
"norm_label": "entry point for direct execution via uv run or python -m. this function ena",
|
| 346 |
+
"id": "app_rationale_64"
|
| 347 |
},
|
| 348 |
{
|
| 349 |
"label": "bug_injector.py",
|
|
|
|
| 477 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 478 |
"source_location": "L1",
|
| 479 |
"id": "c_users_astra_desktop_hackon_debugzero_server_debugzero_environment_py",
|
| 480 |
+
"community": 2,
|
| 481 |
"norm_label": "debugzero_environment.py"
|
| 482 |
},
|
| 483 |
{
|
|
|
|
| 504 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 505 |
"source_location": "L47",
|
| 506 |
"id": "debugzero_environment_debugzeroenvironment_init",
|
| 507 |
+
"community": 6,
|
| 508 |
"norm_label": ".__init__()"
|
| 509 |
},
|
| 510 |
{
|
|
|
|
| 531 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 532 |
"source_location": "L129",
|
| 533 |
"id": "debugzero_environment_state",
|
| 534 |
+
"community": 2,
|
| 535 |
"norm_label": "state()"
|
| 536 |
},
|
| 537 |
{
|
|
|
|
| 539 |
"file_type": "rationale",
|
| 540 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 541 |
"source_location": "L41",
|
| 542 |
+
"community": 10,
|
| 543 |
+
"norm_label": "dual-role debugzero environment wrapping a python sandbox execution for prop",
|
| 544 |
+
"id": "debugzero_environment_rationale_41"
|
| 545 |
},
|
| 546 |
{
|
| 547 |
"label": "executor.py",
|
|
|
|
| 549 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 550 |
"source_location": "L1",
|
| 551 |
"id": "c_users_astra_desktop_hackon_debugzero_server_executor_py",
|
| 552 |
+
"community": 5,
|
| 553 |
"norm_label": "executor.py"
|
| 554 |
},
|
| 555 |
{
|
|
|
|
| 558 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 559 |
"source_location": "L9",
|
| 560 |
"id": "executor_is_safe",
|
| 561 |
+
"community": 5,
|
| 562 |
"norm_label": "is_safe()"
|
| 563 |
},
|
| 564 |
{
|
|
|
|
| 567 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 568 |
"source_location": "L44",
|
| 569 |
"id": "executor_executionresult",
|
| 570 |
+
"community": 10,
|
| 571 |
"norm_label": "executionresult"
|
| 572 |
},
|
| 573 |
{
|
|
|
|
| 576 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 577 |
"source_location": "L45",
|
| 578 |
"id": "executor_executionresult_init",
|
| 579 |
+
"community": 10,
|
| 580 |
"norm_label": ".__init__()"
|
| 581 |
},
|
| 582 |
{
|
|
|
|
| 585 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 586 |
"source_location": "L51",
|
| 587 |
"id": "executor_execute_code",
|
| 588 |
+
"community": 5,
|
| 589 |
"norm_label": "execute_code()"
|
| 590 |
},
|
| 591 |
{
|
|
|
|
| 593 |
"file_type": "rationale",
|
| 594 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 595 |
"source_location": "L10",
|
| 596 |
+
"community": 5,
|
| 597 |
"norm_label": "check if the code contains any blocked imports strings. also performs a qu",
|
| 598 |
"id": "executor_rationale_10"
|
| 599 |
},
|
|
|
|
| 602 |
"file_type": "rationale",
|
| 603 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\executor.py",
|
| 604 |
"source_location": "L52",
|
| 605 |
+
"community": 5,
|
| 606 |
"norm_label": "executes the provided python code alongside its tests in an isolated subprocess.",
|
| 607 |
"id": "executor_rationale_52"
|
| 608 |
},
|
|
|
|
| 612 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\plausibility.py",
|
| 613 |
"source_location": "L1",
|
| 614 |
"id": "c_users_astra_desktop_hackon_debugzero_server_plausibility_py",
|
| 615 |
+
"community": 8,
|
| 616 |
"norm_label": "plausibility.py"
|
| 617 |
},
|
| 618 |
{
|
|
|
|
| 621 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\plausibility.py",
|
| 622 |
"source_location": "L4",
|
| 623 |
"id": "plausibility_compute_ast_distance",
|
| 624 |
+
"community": 8,
|
| 625 |
"norm_label": "compute_ast_distance()"
|
| 626 |
},
|
| 627 |
{
|
|
|
|
| 629 |
"file_type": "rationale",
|
| 630 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\plausibility.py",
|
| 631 |
"source_location": "L5",
|
| 632 |
+
"community": 8,
|
| 633 |
"norm_label": "computes the string similarity distance between the ast dumps of the original",
|
| 634 |
"id": "plausibility_rationale_5"
|
| 635 |
},
|
|
|
|
| 639 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\__init__.py",
|
| 640 |
"source_location": "L1",
|
| 641 |
"id": "c_users_astra_desktop_hackon_debugzero_server_init_py",
|
| 642 |
+
"community": 2,
|
| 643 |
"norm_label": "__init__.py"
|
| 644 |
},
|
| 645 |
{
|
|
|
|
| 648 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\dual_role_sampler.py",
|
| 649 |
"source_location": "L1",
|
| 650 |
"id": "c_users_astra_desktop_hackon_debugzero_training_dual_role_sampler_py",
|
| 651 |
+
"community": 11,
|
| 652 |
"norm_label": "dual_role_sampler.py"
|
| 653 |
},
|
| 654 |
{
|
|
|
|
| 657 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\dual_role_sampler.py",
|
| 658 |
"source_location": "L23",
|
| 659 |
"id": "dual_role_sampler_sample_proposer_prompt",
|
| 660 |
+
"community": 11,
|
| 661 |
"norm_label": "sample_proposer_prompt()"
|
| 662 |
},
|
| 663 |
{
|
|
|
|
| 666 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\dual_role_sampler.py",
|
| 667 |
"source_location": "L26",
|
| 668 |
"id": "dual_role_sampler_sample_solver_prompt",
|
| 669 |
+
"community": 11,
|
| 670 |
"norm_label": "sample_solver_prompt()"
|
| 671 |
},
|
| 672 |
{
|
|
|
|
| 675 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 676 |
"source_location": "L1",
|
| 677 |
"id": "c_users_astra_desktop_hackon_debugzero_training_grpo_train_py",
|
| 678 |
+
"community": 3,
|
| 679 |
"norm_label": "grpo_train.py"
|
| 680 |
},
|
| 681 |
{
|
|
|
|
| 684 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 685 |
"source_location": "L25",
|
| 686 |
"id": "grpo_train_reward_fn",
|
| 687 |
+
"community": 3,
|
| 688 |
"norm_label": "reward_fn()"
|
| 689 |
},
|
| 690 |
{
|
|
|
|
| 693 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 694 |
"source_location": "L44",
|
| 695 |
"id": "grpo_train_create_dataset",
|
| 696 |
+
"community": 3,
|
| 697 |
"norm_label": "create_dataset()"
|
| 698 |
},
|
| 699 |
{
|
|
|
|
| 702 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\grpo_train.py",
|
| 703 |
"source_location": "L57",
|
| 704 |
"id": "grpo_train_main",
|
| 705 |
+
"community": 3,
|
| 706 |
"norm_label": "main()"
|
| 707 |
},
|
| 708 |
{
|
|
|
|
| 711 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 712 |
"source_location": "L1",
|
| 713 |
"id": "c_users_astra_desktop_hackon_debugzero_training_rewards_py",
|
| 714 |
+
"community": 3,
|
| 715 |
"norm_label": "rewards.py"
|
| 716 |
},
|
| 717 |
{
|
|
|
|
| 720 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 721 |
"source_location": "L7",
|
| 722 |
"id": "rewards_get_solve_rate",
|
| 723 |
+
"community": 3,
|
| 724 |
"norm_label": "get_solve_rate()"
|
| 725 |
},
|
| 726 |
{
|
|
|
|
| 729 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 730 |
"source_location": "L13",
|
| 731 |
"id": "rewards_record_solve_result",
|
| 732 |
+
"community": 3,
|
| 733 |
"norm_label": "record_solve_result()"
|
| 734 |
},
|
| 735 |
{
|
|
|
|
| 738 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 739 |
"source_location": "L18",
|
| 740 |
"id": "rewards_compute_proposer_reward",
|
| 741 |
+
"community": 3,
|
| 742 |
"norm_label": "compute_proposer_reward()"
|
| 743 |
},
|
| 744 |
{
|
|
|
|
| 747 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\training\\rewards.py",
|
| 748 |
"source_location": "L37",
|
| 749 |
"id": "rewards_compute_solver_reward",
|
| 750 |
+
"community": 3,
|
| 751 |
"norm_label": "compute_solver_reward()"
|
| 752 |
},
|
| 753 |
{
|
|
|
|
| 755 |
"file_type": "rationale",
|
| 756 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 757 |
"source_location": "L61",
|
| 758 |
+
"community": 4,
|
| 759 |
"norm_label": "entry point for direct execution via uv run or python -m. this function ena",
|
| 760 |
"id": "app_rationale_61"
|
| 761 |
},
|
|
|
|
| 764 |
"file_type": "rationale",
|
| 765 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 766 |
"source_location": "L39",
|
| 767 |
+
"community": 10,
|
| 768 |
"norm_label": "dual-role debugzero environment wrapping a python sandbox execution for prop",
|
| 769 |
"id": "debugzero_environment_rationale_39"
|
| 770 |
}
|
|
|
|
| 1196 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 1197 |
"source_location": "L49",
|
| 1198 |
"weight": 0.8,
|
| 1199 |
+
"_src": "models_debugzeroaction",
|
| 1200 |
+
"_tgt": "app_rationale_64",
|
| 1201 |
+
"confidence_score": 0.5,
|
| 1202 |
"source": "models_debugzeroaction",
|
| 1203 |
+
"target": "app_rationale_64"
|
|
|
|
| 1204 |
},
|
| 1205 |
{
|
| 1206 |
"relation": "uses",
|
|
|
|
| 1220 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 1221 |
"source_location": "L21",
|
| 1222 |
"weight": 0.8,
|
| 1223 |
+
"_src": "models_debugzeroaction",
|
| 1224 |
+
"_tgt": "debugzero_environment_rationale_41",
|
| 1225 |
+
"confidence_score": 0.5,
|
| 1226 |
"source": "models_debugzeroaction",
|
| 1227 |
+
"target": "debugzero_environment_rationale_41"
|
|
|
|
| 1228 |
},
|
| 1229 |
{
|
| 1230 |
"relation": "calls",
|
|
|
|
| 1292 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 1293 |
"source_location": "L49",
|
| 1294 |
"weight": 0.8,
|
| 1295 |
+
"_src": "models_debugzeroobservation",
|
| 1296 |
+
"_tgt": "app_rationale_64",
|
| 1297 |
+
"confidence_score": 0.5,
|
| 1298 |
"source": "models_debugzeroobservation",
|
| 1299 |
+
"target": "app_rationale_64"
|
|
|
|
| 1300 |
},
|
| 1301 |
{
|
| 1302 |
"relation": "uses",
|
|
|
|
| 1316 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 1317 |
"source_location": "L21",
|
| 1318 |
"weight": 0.8,
|
| 1319 |
+
"_src": "models_debugzeroobservation",
|
| 1320 |
+
"_tgt": "debugzero_environment_rationale_41",
|
| 1321 |
+
"confidence_score": 0.5,
|
| 1322 |
"source": "models_debugzeroobservation",
|
| 1323 |
+
"target": "debugzero_environment_rationale_41"
|
|
|
|
| 1324 |
},
|
| 1325 |
{
|
| 1326 |
"relation": "calls",
|
|
|
|
| 1412 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 1413 |
"source_location": "L21",
|
| 1414 |
"weight": 0.8,
|
| 1415 |
+
"_src": "models_debugzerostate",
|
| 1416 |
+
"_tgt": "debugzero_environment_rationale_41",
|
| 1417 |
+
"confidence_score": 0.5,
|
| 1418 |
"source": "models_debugzerostate",
|
| 1419 |
+
"target": "debugzero_environment_rationale_41"
|
|
|
|
| 1420 |
},
|
| 1421 |
{
|
| 1422 |
"relation": "calls",
|
|
|
|
| 1774 |
"weight": 0.8,
|
| 1775 |
"_src": "app_rationale_64",
|
| 1776 |
"_tgt": "debugzero_environment_debugzeroenvironment",
|
| 1777 |
+
"confidence_score": 0.5,
|
| 1778 |
"source": "app_rationale_64",
|
| 1779 |
+
"target": "debugzero_environment_debugzeroenvironment"
|
|
|
|
| 1780 |
},
|
| 1781 |
{
|
| 1782 |
"relation": "contains",
|
|
|
|
| 2108 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\app.py",
|
| 2109 |
"source_location": "L47",
|
| 2110 |
"weight": 0.8,
|
| 2111 |
+
"_src": "debugzero_environment_debugzeroenvironment",
|
| 2112 |
+
"_tgt": "app_rationale_61",
|
| 2113 |
"confidence_score": 0.5,
|
| 2114 |
"source": "debugzero_environment_debugzeroenvironment",
|
| 2115 |
"target": "app_rationale_61"
|
|
|
|
| 2134 |
"weight": 0.8,
|
| 2135 |
"_src": "debugzero_environment_rationale_41",
|
| 2136 |
"_tgt": "executor_executionresult",
|
| 2137 |
+
"confidence_score": 0.5,
|
| 2138 |
"source": "debugzero_environment_rationale_41",
|
| 2139 |
+
"target": "executor_executionresult"
|
|
|
|
| 2140 |
},
|
| 2141 |
{
|
| 2142 |
"relation": "contains",
|
|
|
|
| 2228 |
"source_file": "C:\\Users\\astra\\Desktop\\hackon\\debugZero\\server\\debugZero_environment.py",
|
| 2229 |
"source_location": "L26",
|
| 2230 |
"weight": 0.8,
|
| 2231 |
+
"_src": "executor_executionresult",
|
| 2232 |
+
"_tgt": "debugzero_environment_rationale_39",
|
| 2233 |
"confidence_score": 0.5,
|
| 2234 |
"source": "executor_executionresult",
|
| 2235 |
"target": "debugzero_environment_rationale_39"
|
openenv.yaml
CHANGED
@@ -5,4 +5,5 @@ type: space
 runtime: fastapi
 app: server.app:app
 port: 8000
-
+workers: 4
+max_concurrent_envs: 100
|
server/app.py
CHANGED
@@ -56,7 +56,7 @@ app = create_app(
     DebugzeroAction,
     DebugzeroObservation,
     env_name="debugZero",
-    max_concurrent_envs=
+    max_concurrent_envs=int(os.environ.get("MAX_CONCURRENT_ENVS", "100")),
 )
 
 
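The server/app.py change reads the concurrency limit from the `MAX_CONCURRENT_ENVS` environment variable and falls back to `"100"`, the same value set in openenv.yaml. The sketch below illustrates that pattern in isolation. The helper name `read_max_concurrent_envs` and its validation are hypothetical and not part of this commit; the committed line calls `int(os.environ.get(...))` directly, which assumes `os` is imported in server/app.py (the import is not shown in this hunk).

```python
import os


def read_max_concurrent_envs(environ=None, default=100):
    """Hypothetical helper mirroring int(os.environ.get("MAX_CONCURRENT_ENVS", "100")).

    Unlike the one-liner in the diff, it reports a clearer error when the
    variable is set to something that is not a positive integer.
    """
    # Accept an explicit mapping so the function can be exercised without
    # touching the real process environment.
    environ = os.environ if environ is None else environ
    raw = environ.get("MAX_CONCURRENT_ENVS", str(default))
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(
            f"MAX_CONCURRENT_ENVS must be an integer, got {raw!r}"
        ) from None
    if value < 1:
        raise ValueError(f"MAX_CONCURRENT_ENVS must be >= 1, got {value}")
    return value


if __name__ == "__main__":
    # Uses the real environment; prints the default when the variable is unset.
    print(read_max_concurrent_envs())
```

With a helper like this, the limit could be overridden per deployment (for example by exporting `MAX_CONCURRENT_ENVS=8` before starting the server) without editing the source.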