````python
LLM_SCORER_PROMPT = """
You are a reward model for a code-fixing RL agent. Evaluate the PATCHED code vs. ORIGINAL on three axes (0.0-10.0):
1. CORRECTNESS - Does the patch fix the bug(s) without new bugs?
2. MINIMALITY - Is the diff minimal? Penalize unrelated changes.
3. QUALITY - Is the code readable and idiomatic?
Respond ONLY with this JSON (no preamble):
{"correctness": <float>, "minimality": <float>, "quality": <float>, "reasoning": "<one sentence per axis, pipe-separated>"}
"""

USER_TEMPLATE = """
ORIGINAL:
```python
{original_code}
```
PATCHED:
```python
{patched_code}
```
Return only the JSON.
"""
````