apingali Claude Opus 4.7 (1M context) committed on
Commit e084f76 · 1 Parent(s): 75a8a07

fix(parser): bump quoted_span limit 200 → 400 chars

Discovered via the SC-005 re-baseline after the trust_remote_code fix
landed on the Space. Phi-4-mini-instruct, when asked for a 5-15 word
quoted_span, consistently generates 200-220 char spans:

⚠ The model returned malformed output.
Detail: Score decreasing_marginal_cost.quoted_span must be ≤200 chars,
got 215

Two ways to fix:
(a) tighten the prompt to enforce a hard word/char limit
(b) loosen the parser ceiling

Picked (b). The 200-char ceiling was always a soft constraint to prevent
runaway model output, not a hard contract. Bumping to 400 chars gives
generous headroom for typical model output while still catching
pathological cases (a single 1000-char span would still fail loudly).
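
The soft-ceiling reasoning above can be sketched as a standalone check. This is a minimal illustration, not the code in app.py: `QUOTED_SPAN_MAX_CHARS` and `check_quoted_span` are hypothetical names, and `MalformedResponseError` is redefined locally (as a `ValueError` subclass, which is an assumption) so the snippet runs on its own.

```python
class MalformedResponseError(ValueError):
    """Local stand-in for the parser's error type (assumed; the real one lives in app.py)."""


# Hypothetical named constant; the commit itself hard-codes 400 inline.
QUOTED_SPAN_MAX_CHARS = 400


def check_quoted_span(key: str, span: object) -> str:
    """Validate one quoted_span and return it unchanged if it is acceptable."""
    if not isinstance(span, str) or not span:
        raise MalformedResponseError(f"Score {key}.quoted_span must be a non-empty string")
    if len(span) > QUOTED_SPAN_MAX_CHARS:
        raise MalformedResponseError(
            f"Score {key}.quoted_span must be ≤{QUOTED_SPAN_MAX_CHARS} chars, got {len(span)}"
        )
    return span
```

With this shape, the 215-char span from the error above passes, while a 1000-char span still raises.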

Changes (symmetric Python + TS, per the parser-parity invariant):
gradio-apps/compounding-test/app.py:
parse_response — quoted_span ceiling 200 → 400
src/lib/diagnose-parser.ts:
parseResponse — same change
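
One thing that may be worth checking for the parity invariant: Python's `len()` counts Unicode code points, while a JavaScript string's `.length` counts UTF-16 code units, so a span containing emoji or other astral-plane characters can measure differently on each side. Whether `parseResponse` actually uses `.length` is an assumption here. The sketch below only demonstrates the discrepancy from the Python side, using a hypothetical helper `utf16_length`.

```python
def utf16_length(text: str) -> int:
    """Return the number of UTF-16 code units in text (what JS `.length` reports)."""
    return len(text.encode("utf-16-le")) // 2


# 399 ASCII characters plus one astral-plane emoji (U+1F600).
span = "x" * 399 + "\U0001F600"

python_len = len(span)             # counts code points
js_like_len = utf16_length(span)   # counts UTF-16 code units
```

Here the span measures 400 by Python's count and 401 by the UTF-16 count, so the same input could pass one parser and fail the other right at the boundary.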

Test updates:
test_diagnose.py:
test_quoted_span_over_200_chars_raises → test_quoted_span_over_400_chars_raises
+ new test_quoted_span_up_to_400_chars_accepted (confirms 250-char
typical-output passes)
diagnose-parser.test.ts:
'rejects quoted_span over 200 chars' → 'rejects quoted_span over 400 chars'
+ new 'accepts quoted_span up to 400 chars (typical Phi-4-mini output)'
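
The new acceptance test uses a 250-char span, so the exact boundary (400 accepted, 401 rejected) is only covered from the rejecting side. A pair of boundary cases could look like the sketch below. `span_is_accepted` is a hypothetical stand-in for the length check inside `parse_response`, used so the snippet runs without the real fixtures.

```python
LIMIT = 400  # ceiling introduced by this commit


def span_is_accepted(span: str) -> bool:
    """Hypothetical stand-in for the quoted_span check inside parse_response."""
    return bool(span) and len(span) <= LIMIT


def test_span_exactly_at_limit_is_accepted():
    # The check is `> 400`, so exactly 400 chars should pass.
    assert span_is_accepted("x" * LIMIT)


def test_span_one_over_limit_is_rejected():
    # 401 chars is the smallest span that should fail.
    assert not span_is_accepted("x" * (LIMIT + 1))
```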

Verified:
pytest test_diagnose.py: 64 passed, 1 skipped (+1 from new acceptance test)
vitest diagnose-parser: 22 passed (+1 from new acceptance test)
npm run build: clean

The Space currently runs the OLD ceiling; a deploy.sh push to the Space
will land this fix. Visitors whose first attempt hit the malformed-output
error can retry after the rebuild (~3-5 min).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (2)
  1. app.py +2 -2
  2. test_diagnose.py +17 -2
app.py CHANGED
@@ -148,9 +148,9 @@ def parse_response(raw: str) -> Response:
             )
         if not isinstance(s["quoted_span"], str) or not s["quoted_span"]:
             raise MalformedResponseError(f"Score {key}.quoted_span must be a non-empty string")
-        if len(s["quoted_span"]) > 200:
+        if len(s["quoted_span"]) > 400:
             raise MalformedResponseError(
-                f"Score {key}.quoted_span must be ≤200 chars, got {len(s['quoted_span'])}"
+                f"Score {key}.quoted_span must be ≤400 chars, got {len(s['quoted_span'])}"
             )
         scores[key] = Score(
             score=s["score"], rationale=s["rationale"], quoted_span=s["quoted_span"]
test_diagnose.py CHANGED
@@ -165,8 +165,12 @@ def test_empty_quoted_span_raises():
         parse_response(raw)
 
 
-def test_quoted_span_over_200_chars_raises():
-    over_limit = "x" * 201
+def test_quoted_span_over_400_chars_raises():
+    """The 400-char limit is a generous ceiling — Phi-4-mini consistently
+    generates ~200-220 char quoted_spans when asked for 5-15 words, so
+    we bumped from 200 to 400 to accommodate normal model output without
+    losing the runaway-output guard."""
+    over_limit = "x" * 401
     raw = VALID_JSON_BLOCK.replace(
         '"quoted_span": "claim outcomes Progressive observes directly"',
         f'"quoted_span": "{over_limit}"',
@@ -175,6 +179,17 @@ def test_quoted_span_over_200_chars_raises():
         parse_response(raw)
 
 
+def test_quoted_span_up_to_400_chars_accepted():
+    """Confirms the new ceiling lets typical Phi-4-mini output through."""
+    at_limit = "x" * 250  # well above the prior 200-char cap
+    raw = VALID_JSON_BLOCK.replace(
+        '"quoted_span": "claim outcomes Progressive observes directly"',
+        f'"quoted_span": "{at_limit}"',
+    )
+    r = parse_response(raw)
+    assert len(r.scores["proprietary_data"].quoted_span) == 250
+
+
 # --- Tolerance: forward-compat and whitespace ------------------------------
 
 