fix(parser): bump quoted_span limit 200 → 400 chars
Discovered via the SC-005 re-baseline after the trust_remote_code fix
landed on the Space. Phi-4-mini-instruct, when asked for a 5-15 word
quoted_span, consistently generates 200-220 char spans:
⚠ The model returned malformed output.
Detail: Score decreasing_marginal_cost.quoted_span must be ≤200 chars,
got 215
Two ways to fix:
(a) tighten the prompt to enforce a hard word/char limit
(b) loosen the parser ceiling
Picked (b). The 200-char ceiling was always a soft constraint to prevent
runaway model output, not a hard contract. Bumping to 400 chars gives
generous headroom for typical model output while still catching
pathological cases (a single 1000-char span would still fail loud).
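
As an illustration of the soft-ceiling idea described above, here is a minimal standalone sketch. It is not the code in app.py: `QUOTED_SPAN_MAX_CHARS` and `check_quoted_span` are hypothetical names, and `MalformedResponseError` is redefined locally (its real base class is an assumption) so the snippet runs on its own.

```python
QUOTED_SPAN_MAX_CHARS = 400  # hypothetical constant; app.py inlines the literal


class MalformedResponseError(ValueError):
    """Stand-in for the parser's error type so this sketch is self-contained."""


def check_quoted_span(key: str, span: object) -> str:
    """Return the span unchanged if it is a non-empty string within the ceiling."""
    if not isinstance(span, str) or not span:
        raise MalformedResponseError(
            f"Score {key}.quoted_span must be a non-empty string"
        )
    if len(span) > QUOTED_SPAN_MAX_CHARS:
        raise MalformedResponseError(
            f"Score {key}.quoted_span must be ≤{QUOTED_SPAN_MAX_CHARS} chars, "
            f"got {len(span)}"
        )
    return span


# A 215-char span (the case from the error above) is accepted under the new ceiling.
check_quoted_span("decreasing_marginal_cost", "x" * 215)
```

A 1000-char span passed to the same function would still raise, which is the runaway-output guard the ceiling is meant to preserve.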
Changes (symmetric Python + TS, per the parser-parity invariant):
gradio-apps/compounding-test/app.py:
parse_response — quoted_span ceiling 200 → 400
src/lib/diagnose-parser.ts:
parseResponse — same change
Test updates:
test_diagnose.py:
test_quoted_span_over_200_chars_raises → test_quoted_span_over_400_chars_raises
+ new test_quoted_span_up_to_400_chars_accepted (confirms 250-char
typical-output passes)
diagnose-parser.test.ts:
'rejects quoted_span over 200 chars' → 'rejects quoted_span over 400 chars'
+ new 'accepts quoted_span up to 400 chars (typical Phi-4-mini output)'
Verified:
pytest test_diagnose.py 64 passed, 1 skipped (+1 from new acceptance test)
vitest diagnose-parser 22 passed (+1 from new acceptance test)
npm run build clean
The Space currently runs the OLD ceiling; a deploy.sh push to the Space
will land this fix. Visitors whose first attempt hit the malformed-output
error can retry after the rebuild (~3-5 min).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- app.py +2 -2
- test_diagnose.py +17 -2
@@ -148,9 +148,9 @@ def parse_response(raw: str) -> Response:
             )
         if not isinstance(s["quoted_span"], str) or not s["quoted_span"]:
             raise MalformedResponseError(f"Score {key}.quoted_span must be a non-empty string")
-        if len(s["quoted_span"]) > 200:
+        if len(s["quoted_span"]) > 400:
             raise MalformedResponseError(
-                f"Score {key}.quoted_span must be ≤200 chars, got {len(s['quoted_span'])}"
+                f"Score {key}.quoted_span must be ≤400 chars, got {len(s['quoted_span'])}"
             )
         scores[key] = Score(
             score=s["score"], rationale=s["rationale"], quoted_span=s["quoted_span"]
@@ -165,8 +165,12 @@ def test_empty_quoted_span_raises():
         parse_response(raw)
 
 
-def test_quoted_span_over_200_chars_raises():
-    over_limit = …
+def test_quoted_span_over_400_chars_raises():
+    """The 400-char limit is a generous ceiling — Phi-4-mini consistently
+    generates ~200-220 char quoted_spans when asked for 5-15 words, so
+    we bumped from 200 to 400 to accommodate normal model output without
+    losing the runaway-output guard."""
+    over_limit = "x" * 401
     raw = VALID_JSON_BLOCK.replace(
         '"quoted_span": "claim outcomes Progressive observes directly"',
         f'"quoted_span": "{over_limit}"',
@@ -175,6 +179,17 @@ def test_quoted_span_over_200_chars_raises():
         parse_response(raw)
 
 
+def test_quoted_span_up_to_400_chars_accepted():
+    """Confirms the new ceiling lets typical Phi-4-mini output through."""
+    at_limit = "x" * 250  # well above the prior 200-char cap
+    raw = VALID_JSON_BLOCK.replace(
+        '"quoted_span": "claim outcomes Progressive observes directly"',
+        f'"quoted_span": "{at_limit}"',
+    )
+    r = parse_response(raw)
+    assert len(r.scores["proprietary_data"].quoted_span) == 250
+
+
 # --- Tolerance: forward-compat and whitespace ------------------------------
 
 
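
The acceptance test above exercises a 250-char span, not the exact boundary. As a hedged sketch of how the 400/401 edge could also be pinned, the snippet below uses a hypothetical helper, `within_ceiling`, that only mirrors the parser's `len(...) > 400` comparison. It does not call the real `parse_response`, and `LIMIT` is an illustrative name.

```python
LIMIT = 400  # ceiling introduced by this commit


def within_ceiling(span: str) -> bool:
    """Mirror the parser's comparison: a span is rejected only when len(span) > LIMIT."""
    return len(span) <= LIMIT


# Lengths on both sides of the boundary, plus the typical and pathological cases
# mentioned in the commit message.
cases = {250: True, 400: True, 401: False, 1000: False}
for length, expected in cases.items():
    assert within_ceiling("x" * length) is expected
```

In the real suite the same lengths would presumably be fed through `VALID_JSON_BLOCK.replace(...)` and `parse_response`, as the existing tests do.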