databoysu committed
Commit 1a7ff25 · 1 parent: 807adf9
improve tooling and whitespace error handling error
inference.py +1 -23
inference.py
CHANGED
@@ -99,7 +99,7 @@ Action policy:
 Strict state transition rules (no looping on same action):
 - If pass_count_summary=unknown (no test output yet), MUST run RUN_TESTS next (never VIEW_CODE or REPLACE_LINES).
 - After VIEW_CODE, prefer RUN_TESTS next to get test evidence (never do VIEW_CODE twice in a row).
-- After REPLACE_LINES,
+- CRITICAL RULE: After you use REPLACE_LINES, your VERY NEXT action MUST be RUN_TESTS to verify the edit. Do NOT use REPLACE_LINES twice in a row. Do NOT guess whether you made a syntax error; always run tests to get proof.
 - If REPLACE_LINES+RUN_TESTS did not fix the issue, do VIEW_CODE next to re-orient before another edit.
 - Do not attempt the same edit twice; if it fails, change strategy (VIEW_CODE or UNDO_EDIT or RESET_TO_ORIGINAL).
 
@@ -125,16 +125,6 @@ Observation: output explicitly shows Tests Passed: 3/3 and includes the success
 Valid action JSON:
 {"thought":"Observation: ... Diagnosis: ... Plan: ...","action_type":"SUBMIT","start_line":null,"end_line":null,"new_code_block":null}
 
-Example 3: tests ran but syntax error present -> choose REPLACE_LINES (or UNDO_EDIT)
-Input evidence snippet:
-- syntax_error=true
-- pass_count_summary=Tests Passed: 0/3
-- traceback includes SyntaxError with a file line reference.
-Valid thought:
-Observation: syntax_error=true and traceback provides a concrete syntax failure location. Diagnosis: code is syntactically invalid, so functional debugging is blocked until syntax is repaired. Plan: apply REPLACE_LINES at the indicated lines (or UNDO_EDIT if the latest edit caused this), then RUN_TESTS.
-Valid action JSON:
-{"thought":"Observation: ... Diagnosis: ... Plan: ...","action_type":"REPLACE_LINES","start_line":8,"end_line":9,"new_code_block":" # syntax-fixed code"}
-
 Example 4: NO TEST OUTPUT YET + UNKNOWN PASS COUNT -> MUST choose RUN_TESTS (not VIEW_CODE)
 Input evidence snippet:
 - pass_count_summary=unknown
@@ -146,18 +136,6 @@ Valid action JSON (WRONG - violates state rule):
 BAD: {"thought":"...","action_type":"VIEW_CODE","start_line":null,"end_line":null,"new_code_block":null}
 CORRECT: {"thought":"...","action_type":"RUN_TESTS","start_line":null,"end_line":null,"new_code_block":null}
 
-Example 5: EDIT FAILED - SAME ACTION TWICE IS WRONG
-Previous step: REPLACE_LINES on line 4 (indentation fix attempt 1)
-Current observation:
-- syntax_error=true (indentation still wrong)
-- same error message about dedentation
-- pass_count_summary=unknown
-Valid thought:
-Observation: same indentation error persists despite the previous REPLACE_LINES attempt. The fix did not work. Diagnosis: attempting the identical REPLACE_LINES again will create an infinite loop; must change strategy. Plan: call VIEW_CODE next to re-examine the context and understand why the fix didn't apply properly, then revise the approach.
-Valid action JSON (WRONG - causes looping):
-BAD (same edit): {"thought":"...","action_type":"REPLACE_LINES","start_line":4,"end_line":4,"new_code_block":" fixed code"}
-CORRECT (change strategy): {"thought":"...","action_type":"VIEW_CODE","start_line":null,"end_line":null,"new_code_block":null}
-
 Submit gate (hard rule):
 - If any failure, error, traceback, xfailed/unfinished signal, or uncertainty remains, do not SUBMIT.
 - If all-tests-passed signal is present, do SUBMIT immediately on this turn.
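
Review note: the strict state transition rules and the submit gate are stated only as prompt text, so nothing in this diff shows them being enforced in code. As a rough sketch of how a caller might check a proposed action against those rules, the following uses two hypothetical helpers, `allowed_next` and `may_submit`, which are not part of inference.py. The action names and the `Tests Passed: N/M` summary format are taken from the prompt text; the function signatures and the idea of passing the previous action explicitly are assumptions.

```python
import re
from typing import Optional


def allowed_next(prev_action: Optional[str], pass_count_summary: str, candidate: str) -> bool:
    """Hypothetical check: does `candidate` respect the strict state transition rules?"""
    # No test output yet: RUN_TESTS is the only valid next action.
    if pass_count_summary == "unknown":
        return candidate == "RUN_TESTS"
    # After REPLACE_LINES, the very next action must be RUN_TESTS.
    if prev_action == "REPLACE_LINES":
        return candidate == "RUN_TESTS"
    # Never do VIEW_CODE twice in a row.
    if prev_action == "VIEW_CODE" and candidate == "VIEW_CODE":
        return False
    return True


def may_submit(pass_count_summary: str, syntax_error: bool = False) -> bool:
    """Hypothetical submit gate: allow SUBMIT only on an explicit all-tests-passed signal."""
    match = re.fullmatch(r"Tests Passed: (\d+)/(\d+)", pass_count_summary)
    if match is None or syntax_error:
        # Unknown summary or a known error counts as remaining uncertainty.
        return False
    passed, total = int(match.group(1)), int(match.group(2))
    return total > 0 and passed == total
```

A check like this would only cover the rules that can be decided from the previous action and the pass-count summary. The "do not attempt the same edit twice" rule would need the edit history as well, which is left out here.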