apingali Claude Opus 4.7 (1M context) committed on
Commit
21da7b6
·
1 Parent(s): 61393d7

test(hf-space): cover diagnose() Premium path + Anthropic call shape


User asked: do we have tests verifying the Premium path actually
works, barring an invalid API key? Honest answer was no: the existing
31 tests covered parse_response and the provider dispatcher but
diagnose() and _call_anthropic had zero coverage. That's why my
Premium-key-override logic shipped unverified.

Added 11 unit tests + 1 opt-in integration test:

diagnose() input validation (4):
- empty description → "Please describe" friendly error
- <200 words → "at least 200 words" error with current count
- >5000 words → "under 5000 words" error
- unknown provider → "Unknown model provider" error
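
A minimal sketch of the kind of word-count gate the first three of these tests exercise. app.py is not part of this diff, so `validate_description` and the full message wording are hypothetical; only the 200/5000 thresholds and the quoted message fragments come from the tests.

```python
from typing import Optional

MIN_WORDS = 200   # lower bound named in the commit message
MAX_WORDS = 5000  # upper bound named in the commit message


def validate_description(text: str) -> Optional[str]:
    """Return a friendly error string, or None when the text is acceptable.

    Hypothetical helper: the real diagnose() may structure this differently.
    """
    count = len(text.split())  # whitespace-delimited word count
    if count == 0:
        return "Please describe your initiative."
    if count < MIN_WORDS:
        return f"Please write at least {MIN_WORDS} words (currently {count})."
    if count > MAX_WORDS:
        return f"Please keep the description under {MAX_WORDS} words (currently {count})."
    return None
```

Returning a message instead of raising keeps the caller free to render it directly in the UI, which matches the (writeup, json_str) pairs the tests unpack.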

diagnose() Premium (anthropic) path (6):
- no env key + no user key → "Premium needs an API key"
- whitespace-only user key → same friendly error
- env key only → dispatches; env key visible at backend call time
- user-supplied key OVERRIDES env key: the
  visitor-key-overrides-Space-owner-key invariant for public Spaces
- backend raises → F14 wraps as markdown with provider/model/
  exception name/detail, no stack trace leak
- backend returns garbage → MalformedResponseError surfaces as
  "model returned malformed output", no crash
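
One way the key-precedence invariant could be implemented, sketched under assumptions: the tests observe the user key through os.environ at backend call time, which suggests the env var is swapped for the duration of the call. `resolve_key` and `temporary_env_key` are hypothetical names, not functions from app.py.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional

ENV_VAR = "ANTHROPIC_API_KEY"


def resolve_key(user_key: Optional[str]) -> Optional[str]:
    """Prefer a non-blank user key; otherwise fall back to the env key."""
    candidate = (user_key or "").strip()  # whitespace-only counts as missing
    if candidate:
        return candidate
    return os.environ.get(ENV_VAR) or None


@contextmanager
def temporary_env_key(key: str) -> Iterator[None]:
    """Expose `key` via the env var for one call, then restore the old value."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = key
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous
```

Note that os.environ is process-global, so a swap like this is not isolated between concurrent requests; passing the key to the client explicitly would avoid that, at the cost of a different call shape than the one these tests pin down.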

_call_anthropic call shape (1):
Mocks anthropic.Anthropic and asserts the SDK is invoked with:
- model = ANTHROPIC_MODEL_ID (claude-opus-4-7)
- max_tokens = 2500
- system block + cache_control: ephemeral
- user messages array
- response unwrapped via .content[0].text
Catches typos / forgotten cache_control / wrong arg names without
hitting the network or costing money.
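
For orientation, a function with the call shape listed above might look like the sketch below. It is inferred from the assertions only: `call_anthropic_sketch` and `make_fake_client` are hypothetical, and the client is injected here so the example runs without the SDK installed (the real _call_anthropic appears to construct anthropic.Anthropic() itself).

```python
from types import SimpleNamespace
from typing import Any, Dict

MODEL_ID = "claude-opus-4-7"  # value of ANTHROPIC_MODEL_ID per the commit message
MAX_TOKENS = 2500


def call_anthropic_sketch(client: Any, system: str, user: str) -> str:
    """Send one system block (marked cacheable) plus one user message."""
    message = client.messages.create(
        model=MODEL_ID,
        max_tokens=MAX_TOKENS,
        system=[{
            "type": "text",
            "text": system,
            "cache_control": {"type": "ephemeral"},
        }],
        messages=[{"role": "user", "content": user}],
    )
    return message.content[0].text  # unwrap the first content block


def make_fake_client(captured: Dict[str, Any]) -> SimpleNamespace:
    """Build a stand-in client that records the kwargs passed to create()."""
    def create(**kwargs: Any) -> SimpleNamespace:
        captured.update(kwargs)
        return SimpleNamespace(content=[SimpleNamespace(text="raw response text")])

    return SimpleNamespace(messages=SimpleNamespace(create=create))


captured: Dict[str, Any] = {}
result = call_anthropic_sketch(make_fake_client(captured), "SYS", "USER")
```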

Integration test (opt-in, 1):
test_call_anthropic_real_api_returns_text: skipped unless BOTH
ANTHROPIC_API_KEY AND ANTHROPIC_INTEGRATION=1 are set. Hits the
real API, costs ~$0.05/call. Run manually to verify keys + model
availability end-to-end; routine CI leaves it skipped.
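
The two-variable gate can be factored into a small predicate, sketched here. `integration_enabled` is a hypothetical helper; the test in this commit inlines the same condition in its skipif decorator.

```python
from typing import Mapping


def integration_enabled(environ: Mapping[str, str]) -> bool:
    """True only when a key is present AND the opt-in flag is exactly "1"."""
    has_key = bool(environ.get("ANTHROPIC_API_KEY"))
    opted_in = environ.get("ANTHROPIC_INTEGRATION") == "1"
    return has_key and opted_in
```

It would be used as `pytest.mark.skipif(not integration_enabled(os.environ), reason=...)`. Requiring both variables means a developer who merely has a key exported in their shell does not pay for an API call on every local test run.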

All 42 unit tests pass + 1 skipped. No backend code changes; this is
pure new coverage. Run all:
pytest gradio-apps/compounding-test/test_diagnose.py -v
Run integration too:
ANTHROPIC_API_KEY=sk-... ANTHROPIC_INTEGRATION=1 pytest ...

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Files changed (1)
  1. test_diagnose.py +262 -0
test_diagnose.py CHANGED
@@ -9,11 +9,18 @@ from __future__ import annotations

 import pytest

+import json
+import os
+from unittest.mock import MagicMock
+
 from app import (
+    ANTHROPIC_MODEL_ID,
     MalformedResponseError,
     PROVIDERS,
+    _call_anthropic,
     _call_model,
     _detect_provider,
+    diagnose,
     parse_response,
 )
 import app as app_module
@@ -270,6 +277,261 @@ def test_call_model_unknown_provider_raises():
         _call_model("s", "u", "bogus-provider")


+# --- diagnose() input validation -------------------------------------------
+
+# Reused across diagnose() tests: a description long enough to pass the
+# 200-word minimum. The actual content doesn't matter for these tests
+# because we mock the backend.
+_LONG_DESCRIPTION = " ".join(["word"] * 250)
+
+
+def test_diagnose_empty_description_returns_friendly_error():
+    writeup, json_str = diagnose("", None, None, None, provider="zerogpu")
+    assert "Please describe" in writeup
+    assert json_str == ""
+
+
+def test_diagnose_short_description_returns_word_count_error():
+    short = " ".join(["word"] * 50)
+    writeup, json_str = diagnose(short, None, None, None, provider="zerogpu")
+    assert "at least 200 words" in writeup
+    assert "50" in writeup  # current word count
+    assert json_str == ""
+
+
+def test_diagnose_long_description_returns_word_count_error():
+    long = " ".join(["word"] * 5001)
+    writeup, json_str = diagnose(long, None, None, None, provider="zerogpu")
+    assert "under 5000 words" in writeup
+    assert json_str == ""
+
+
+def test_diagnose_unknown_provider_returns_friendly_error():
+    writeup, json_str = diagnose(_LONG_DESCRIPTION, None, None, None, provider="bogus")
+    assert "Unknown model provider" in writeup
+    assert "bogus" in writeup
+    assert json_str == ""
+
+
+# --- diagnose() Premium (Anthropic) path -----------------------------------
+
+
+def test_diagnose_premium_without_any_key_returns_friendly_error(monkeypatch):
+    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    assert "Premium" in writeup
+    assert "API key" in writeup
+    assert json_str == ""
+
+
+def test_diagnose_premium_with_empty_string_key_returns_friendly_error(monkeypatch):
+    # Whitespace-only key should not count as supplied
+    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
+    writeup, _ = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=" ",
+    )
+    assert "Premium" in writeup
+    assert "API key" in writeup
+
+
+def test_diagnose_premium_with_env_key_dispatches_to_anthropic(monkeypatch):
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-env-xxx")
+    captured = {}
+
+    def fake_anthropic(system, user):
+        captured["system"] = system
+        captured["user"] = user
+        captured["env_key_at_call_time"] = os.environ.get("ANTHROPIC_API_KEY")
+        return _VALID_BACKEND_RESPONSE
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", fake_anthropic)
+
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, "insurance", "enterprise", "$1M–$10M",
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    # Backend was called (so dispatch worked) and the env key was visible
+    assert captured.get("env_key_at_call_time") == "sk-env-xxx"
+    # And the response made it through the parser → JSON tab populated
+    assert json_str
+    parsed = json.loads(json_str)
+    assert parsed["quadrant"] == "compounder"
+
+
+def test_diagnose_premium_user_key_overrides_env(monkeypatch):
+    """The page's API-key field should take precedence over any
+    ANTHROPIC_API_KEY env var the Space owner has configured. This is
+    the F15-ish defense: a visitor pasting their own key should never
+    inadvertently use the Space owner's key."""
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-space-owner-xxx")
+    captured = {}
+
+    def fake_anthropic(system, user):
+        captured["env_key_at_call_time"] = os.environ.get("ANTHROPIC_API_KEY")
+        return _VALID_BACKEND_RESPONSE
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", fake_anthropic)
+
+    diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key="sk-user-yyy",
+    )
+    assert captured.get("env_key_at_call_time") == "sk-user-yyy"
+
+
+def test_diagnose_premium_backend_exception_returns_friendly_error(monkeypatch):
+    """When the Anthropic SDK raises (auth fail, rate limit, network),
+    F14 should wrap it in a markdown message that names the provider,
+    model, exception class, and exception detail, never a raw trace."""
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test")
+
+    class FakeAuthError(Exception):
+        pass
+
+    def failing_anthropic(system, user):
+        raise FakeAuthError("invalid x-api-key header")
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", failing_anthropic)
+
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    assert "anthropic" in writeup
+    assert ANTHROPIC_MODEL_ID in writeup
+    assert "FakeAuthError" in writeup
+    assert "invalid x-api-key header" in writeup
+    assert "stack" not in writeup.lower()  # no stack trace leaked
+    assert json_str == ""
+
+
+def test_diagnose_premium_backend_returns_malformed_response(monkeypatch):
+    """When the backend returns something that fails the JSON schema
+    validator, surface the parser's error message; don't crash."""
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test")
+
+    def garbage_anthropic(system, user):
+        return "Sorry, I cannot help with that request."  # no JSON block
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", garbage_anthropic)
+
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    assert "malformed output" in writeup
+    assert json_str == ""
+
+
+# --- _call_anthropic: Anthropic SDK call shape -----------------------------
+#
+# Per Principle VII the actual API call is exempt from automated tests
+# (the SDK and the remote API are not our code). But the SHAPE of the
+# call we make IS our code: model id, system block, cache_control flag,
+# messages structure, response-unwrap path. These are easy to typo and
+# easy to miss in review. Shape tests catch that without hitting the
+# network.
+
+
+def test_call_anthropic_passes_system_block_with_cache_control(monkeypatch):
+    captured = {}
+
+    class FakeContentBlock:
+        text = "raw response text"
+
+    class FakeMessage:
+        content = [FakeContentBlock()]
+
+    class FakeClient:
+        class messages:  # noqa: N801 - mirroring SDK's nested .messages.create
+            @staticmethod
+            def create(**kwargs):
+                captured.update(kwargs)
+                return FakeMessage()
+
+    import anthropic as anthropic_module
+    monkeypatch.setattr(anthropic_module, "Anthropic", lambda: FakeClient())
+
+    result = _call_anthropic("MY SYSTEM BLOCK", "MY USER PROMPT")
+
+    # The function unwrapped content[0].text correctly
+    assert result == "raw response text"
+    # Model + token budget
+    assert captured["model"] == ANTHROPIC_MODEL_ID
+    assert captured["max_tokens"] == 2500
+    # System block is a list of one dict with cache_control
+    sys_block = captured["system"]
+    assert isinstance(sys_block, list)
+    assert len(sys_block) == 1
+    assert sys_block[0]["type"] == "text"
+    assert sys_block[0]["text"] == "MY SYSTEM BLOCK"
+    assert sys_block[0]["cache_control"] == {"type": "ephemeral"}
+    # User prompt in the messages array
+    assert captured["messages"] == [{"role": "user", "content": "MY USER PROMPT"}]
+
+
+# --- Integration test (opt-in; hits the real Anthropic API) ----------------
+#
+# Skipped unless ANTHROPIC_API_KEY is set AND ANTHROPIC_INTEGRATION=1 is
+# set. Costs money to run (~$0.05 per call to Opus 4.7). Use this when
+# you want to verify end-to-end that the key works and the model is
+# reachable; routine CI should leave this skipped.
+
+import pytest as _pytest  # already imported above as pytest, but kept explicit
+
+
+@_pytest.mark.skipif(
+    not (os.environ.get("ANTHROPIC_API_KEY") and os.environ.get("ANTHROPIC_INTEGRATION") == "1"),
+    reason="needs ANTHROPIC_API_KEY + ANTHROPIC_INTEGRATION=1 to hit the real API",
+)
+def test_call_anthropic_real_api_returns_text():
+    result = _call_anthropic(
+        "You are a one-word echo. Reply with exactly one word.",
+        "Say hello.",
+    )
+    assert isinstance(result, str)
+    assert len(result) > 0
+    assert len(result.split()) < 20  # one-word reply, generously bounded
+
+
+# ---------------------------------------------------------------------------
+# Fixture: a backend response that satisfies parse_response() so the
+# diagnose-Premium happy-path tests can assert on parser output without
+# duplicating the full JSON shape per test.
+# ---------------------------------------------------------------------------
+
+_VALID_BACKEND_RESPONSE = """```json
+{
+  "constraint": "Underwriting at the quote screen.",
+  "scores": {
+    "proprietary_data": { "score": 4, "rationale": "first-party.", "quoted_span": "12-year claims database" },
+    "self_labeling": { "score": 4, "rationale": "policies self-label.", "quoted_span": "Every policy we write" },
+    "decreasing_marginal_cost": { "score": 3, "rationale": "amortized pipeline.", "quoted_span": "feed back into the next quarter" },
+    "defensible_asymmetry": { "score": 3, "rationale": "no data sharing.", "quoted_span": "we don't share data with" }
+  },
+  "quadrant": "compounder",
+  "closest_portrait": "progressive",
+  "closest_portrait_paragraph": "Your case tracks Progressive most closely.",
+  "warnings": []
+}
+```
+
+# The Verdict
+
+Your initiative is a compounder. Here's why.
+"""
+
+
 def test_warnings_populated_for_failure_quadrant():
     raw = VALID_JSON_BLOCK.replace('"quadrant": "compounder"', '"quadrant": "roman-candle"').replace(
         '"warnings": []',