databoysu committed
Commit 0bc4dba · 1 Parent(s): 1a7ff25

improve tooling and whitespace error handling error

Files changed (1)
  1. inference.py +14 -19
inference.py CHANGED
@@ -96,13 +96,6 @@ Action policy:
  - After RUN_TESTS, do not choose RUN_TESTS again immediately unless test evidence is genuinely missing.
  - Treat "no output" as invalid reasoning when pass_count_summary or traceback text is present.
 
- Strict state transition rules (no looping on same action):
- - If pass_count_summary=unknown (no test output yet), MUST run RUN_TESTS next (never VIEW_CODE or REPLACE_LINES).
- - After VIEW_CODE, prefer RUN_TESTS next to get test evidence (never do VIEW_CODE twice in a row).
- - CRITICAL RULE: After you use REPLACE_LINES, your VERY NEXT action MUST be RUN_TESTS to verify the edit. Do NOT use REPLACE_LINES twice in a row. Do NOT guess whether you made a syntax error; always run tests to get proof.
- - If REPLACE_LINES+RUN_TESTS did not fix the issue, do VIEW_CODE next to re-orient before another edit.
- - Do not attempt the same edit twice; if it fails, change strategy (VIEW_CODE or UNDO_EDIT or RESET_TO_ORIGINAL).
-
  Worked examples (generic, no benchmark task leakage):
 
  Example 1: failing tests after RUN_TESTS -> choose REPLACE_LINES
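
The transition rules removed from the prompt in this hunk can also be read as a small state machine. The following is only a sketch of that reading, not code from inference.py: the function name `required_next_action` and its signature are hypothetical, and the "prefer RUN_TESTS" rule is treated as mandatory for simplicity.

```python
from typing import List, Optional


def required_next_action(
    trajectory: List[str],
    pass_count_summary: str,
    all_tests_pass: bool,
) -> Optional[str]:
    """Return the action the removed prompt rules would force, or None if the choice is open.

    Hypothetical helper: it mirrors the wording of the removed rules only.
    """
    # Rule: no test output yet means tests must run first.
    if pass_count_summary == "unknown":
        return "RUN_TESTS"
    if not trajectory:
        return None
    last = trajectory[-1]
    # Rule: an edit must be verified immediately.
    if last == "REPLACE_LINES":
        return "RUN_TESTS"
    # Rule: after viewing code, collect test evidence (stated as a preference in the prompt).
    if last == "VIEW_CODE":
        return "RUN_TESTS"
    # Rule: an edit followed by a still-failing test run calls for re-orientation.
    if trajectory[-2:] == ["REPLACE_LINES", "RUN_TESTS"] and not all_tests_pass:
        return "VIEW_CODE"
    return None
```

For example, `required_next_action(["REPLACE_LINES", "RUN_TESTS"], "1/3", False)` yields `"VIEW_CODE"`, while a lone failing `RUN_TESTS` leaves the choice open.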
@@ -125,17 +118,6 @@ Observation: output explicitly shows Tests Passed: 3/3 and includes the success
  Valid action JSON:
  {"thought":"Observation: ... Diagnosis: ... Plan: ...","action_type":"SUBMIT","start_line":null,"end_line":null,"new_code_block":null}
 
- Example 4: NO TEST OUTPUT YET + UNKNOWN PASS COUNT -> MUST choose RUN_TESTS (not VIEW_CODE)
- Input evidence snippet:
- - pass_count_summary=unknown
- - all_tests_pass_signal=false
- - last_execution_output is empty ""
- Valid thought:
- Observation: no test execution has occurred; pass_count_summary is unknown and no test output is available. Diagnosis: cannot diagnose code bugs without seeing test failures; must run tests first. Plan: choose RUN_TESTS to collect test evidence and determine what needs fixing.
- Valid action JSON (WRONG - violates state rule):
- BAD: {"thought":"...","action_type":"VIEW_CODE","start_line":null,"end_line":null,"new_code_block":null}
- CORRECT: {"thought":"...","action_type":"RUN_TESTS","start_line":null,"end_line":null,"new_code_block":null}
-
  Submit gate (hard rule):
  - If any failure, error, traceback, xfailed/unfinished signal, or uncertainty remains, do not SUBMIT.
  - If all-tests-passed signal is present, do SUBMIT immediately on this turn.
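
The submit gate depends on `pass_count_summary` and `all_tests_pass_signal`, which inference.py derives through `_extract_pass_signal_fields`. That function's body is not part of this diff, so the sketch below is an assumption about how such a parser might look, based only on the `Tests Passed: 3/3` format quoted in the hunk header. The name `extract_pass_signal_sketch` and the regular expression are hypothetical.

```python
import re
from typing import Tuple

# Assumed output format, taken from the "Tests Passed: 3/3" text in the prompt examples.
_PASS_RE = re.compile(r"Tests Passed:\s*(\d+)\s*/\s*(\d+)")


def extract_pass_signal_sketch(output: str) -> Tuple[str, bool]:
    """Return (pass_count_summary, all_tests_pass_signal) for a test-run output string."""
    match = _PASS_RE.search(output)
    if match is None:
        # No recognizable summary: this corresponds to pass_count_summary=unknown.
        return "unknown", False
    passed, total = int(match.group(1)), int(match.group(2))
    # Require at least one test so that "0/0" never opens the submit gate.
    return f"{passed}/{total}", total > 0 and passed == total
```

With this reading, an empty `last_execution_output` maps to `("unknown", False)`, which is the state the removed Example 4 describes.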
@@ -368,6 +350,18 @@ async def run(difficulty: Optional[str] = None, show_thought: bool = False) -> N
  obs_last_output = str(getattr(result.observation, "last_execution_output", "") or "")
  pass_count_text, all_tests_pass_signal = _extract_pass_signal_fields(obs_last_output)
  last_action = action_trajectory[-1] if action_trajectory else "none"
+ dynamic_override = ""
+ if action_trajectory and action_trajectory[-1] == "REPLACE_LINES":
+     dynamic_override = (
+         "\n[SYSTEM OVERRIDE]: Your last action was REPLACE_LINES. "
+         "You are STRICTLY FORBIDDEN from editing the code again. "
+         "Your action_type MUST be RUN_TESTS to verify the changes.\n"
+     )
+ elif action_trajectory and action_trajectory[-1] == "VIEW_CODE":
+     dynamic_override = (
+         "\n[SYSTEM OVERRIDE]: Your last action was VIEW_CODE. "
+         "You MUST choose RUN_TESTS next to get test evidence.\n"
+     )
  if show_thought:
      output_preview = "\\n".join(obs_last_output.splitlines()[:6])
      print("[OBS_DEBUG]", file=sys.stderr, flush=True)
@@ -386,8 +380,9 @@ async def run(difficulty: Optional[str] = None, show_thought: bool = False) -> N
          "If all_tests_pass_signal=true, you must choose SUBMIT now and must not choose RUN_TESTS again. "
          "Do not wait for additional test output when all_tests_pass_signal=true. "
          "If last_action was RUN_TESTS and all_tests_pass_signal=false, choose REPLACE_LINES or VIEW_CODE next, not RUN_TESTS again.\n\n"
+         f"action_trajectory={(' -> '.join(action_trajectory) if action_trajectory else 'none')}\n"
+         f"{dynamic_override}\n"
          f"decision_guard: last_action={last_action}, pass_count_summary={pass_count_text}, all_tests_pass_signal={str(all_tests_pass_signal).lower()}\n\n"
-         f"action_trajectory={(' -> '.join(action_trajectory) if action_trajectory else 'none')}\n\n"
          f"{obs_text}"
      ),
  }