databoysu committed
Commit 1a7ff25 · 1 Parent(s): 807adf9

improve tooling and whitespace error handling error

  1. inference.py +1 -23
inference.py CHANGED
@@ -99,7 +99,7 @@ Action policy:
  Strict state transition rules (no looping on same action):
  - If pass_count_summary=unknown (no test output yet), MUST run RUN_TESTS next (never VIEW_CODE or REPLACE_LINES).
  - After VIEW_CODE, prefer RUN_TESTS next to get test evidence (never do VIEW_CODE twice in a row).
- - After REPLACE_LINES, always do RUN_TESTS next to validate the fix (never do REPLACE_LINES twice in a row).
+ - CRITICAL RULE: After you use REPLACE_LINES, your VERY NEXT action MUST be RUN_TESTS to verify the edit. Do NOT use REPLACE_LINES twice in a row. Do NOT guess whether you made a syntax error; always run tests to get proof.
  - If REPLACE_LINES+RUN_TESTS did not fix the issue, do VIEW_CODE next to re-orient before another edit.
  - Do not attempt the same edit twice; if it fails, change strategy (VIEW_CODE or UNDO_EDIT or RESET_TO_ORIGINAL).

@@ -125,16 +125,6 @@ Observation: output explicitly shows Tests Passed: 3/3 and includes the success
  Valid action JSON:
  {"thought":"Observation: ... Diagnosis: ... Plan: ...","action_type":"SUBMIT","start_line":null,"end_line":null,"new_code_block":null}

- Example 3: tests ran but syntax error present -> choose REPLACE_LINES (or UNDO_EDIT)
- Input evidence snippet:
- - syntax_error=true
- - pass_count_summary=Tests Passed: 0/3
- - traceback includes SyntaxError with a file line reference.
- Valid thought:
- Observation: syntax_error=true and traceback provides a concrete syntax failure location. Diagnosis: code is syntactically invalid, so functional debugging is blocked until syntax is repaired. Plan: apply REPLACE_LINES at the indicated lines (or UNDO_EDIT if the latest edit caused this), then RUN_TESTS.
- Valid action JSON:
- {"thought":"Observation: ... Diagnosis: ... Plan: ...","action_type":"REPLACE_LINES","start_line":8,"end_line":9,"new_code_block":" # syntax-fixed code"}
-
  Example 4: NO TEST OUTPUT YET + UNKNOWN PASS COUNT -> MUST choose RUN_TESTS (not VIEW_CODE)
  Input evidence snippet:
  - pass_count_summary=unknown
@@ -146,18 +136,6 @@ Valid action JSON (WRONG - violates state rule):
  BAD: {"thought":"...","action_type":"VIEW_CODE","start_line":null,"end_line":null,"new_code_block":null}
  CORRECT: {"thought":"...","action_type":"RUN_TESTS","start_line":null,"end_line":null,"new_code_block":null}

- Example 5: EDIT FAILED - SAME ACTION TWICE IS WRONG
- Previous step: REPLACE_LINES on line 4 (indentation fix attempt 1)
- Current observation:
- - syntax_error=true (indentation still wrong)
- - same error message about dedentation
- - pass_count_summary=unknown
- Valid thought:
- Observation: same indentation error persists despite the previous REPLACE_LINES attempt. The fix did not work. Diagnosis: attempting the identical REPLACE_LINES again will create an infinite loop; must change strategy. Plan: call VIEW_CODE next to re-examine the context and understand why the fix didn't apply properly, then revise the approach.
- Valid action JSON (WRONG - causes looping):
- BAD (same edit): {"thought":"...","action_type":"REPLACE_LINES","start_line":4,"end_line":4,"new_code_block":" fixed code"}
- CORRECT (change strategy): {"thought":"...","action_type":"VIEW_CODE","start_line":null,"end_line":null,"new_code_block":null}
-
  Submit gate (hard rule):
  - If any failure, error, traceback, xfailed/unfinished signal, or uncertainty remains, do not SUBMIT.
  - If all-tests-passed signal is present, do SUBMIT immediately on this turn.
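
The transition rules and submit gate in this prompt could also be checked in code rather than left to the model alone. The following is a minimal sketch of such a check, not code from inference.py: the function name `check_transition`, its parameters, and the order in which the rules are applied are all assumptions made for illustration. Only the action names and the `pass_count_summary=unknown` convention come from the prompt text.

```python
from typing import Optional, Sequence


def check_transition(
    history: Sequence[str],
    next_action: str,
    pass_count_summary: str,
    all_tests_passed: bool,
) -> Optional[str]:
    """Return a violation message, or None if next_action is allowed.

    Hypothetical helper: the rule precedence below is an assumption,
    the prompt itself does not define one.
    """
    last = history[-1] if history else None

    # Submit gate: SUBMIT only on an all-tests-passed signal, and then immediately.
    if all_tests_passed:
        return None if next_action == "SUBMIT" else "all tests passed: SUBMIT now"
    if next_action == "SUBMIT":
        return "failures or uncertainty remain: do not SUBMIT"

    # After REPLACE_LINES the very next action must be RUN_TESTS.
    if last == "REPLACE_LINES" and next_action != "RUN_TESTS":
        return "after REPLACE_LINES, run RUN_TESTS next"

    # No test output yet: only RUN_TESTS is allowed.
    if pass_count_summary == "unknown" and next_action != "RUN_TESTS":
        return "no test output yet: run RUN_TESTS first"

    # Never VIEW_CODE twice in a row.
    if last == "VIEW_CODE" and next_action == "VIEW_CODE":
        return "do not VIEW_CODE twice in a row"

    # A REPLACE_LINES + RUN_TESTS pair that did not fix the issue
    # must be followed by VIEW_CODE before another edit.
    if list(history[-2:]) == ["REPLACE_LINES", "RUN_TESTS"] and next_action != "VIEW_CODE":
        return "edit did not fix the issue: VIEW_CODE before another edit"

    return None


if __name__ == "__main__":
    # A second REPLACE_LINES right after the first is rejected.
    print(check_transition(["RUN_TESTS", "REPLACE_LINES"], "REPLACE_LINES",
                           "Tests Passed: 0/3", False))
```

A caller could run this on each proposed action and, on a violation, either re-prompt the model with the message or substitute the required action. Which of the two is appropriate depends on how the surrounding loop is built, which the diff does not show.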