test(hf-space): cover diagnose() Premium path + Anthropic call shape
User asked: do we have tests verifying the Premium path actually
works, barring an invalid API key? The honest answer was no: the existing
31 tests covered parse_response and the provider dispatcher, but
diagnose() and _call_anthropic had zero coverage. That's why my
Premium-key-override logic shipped unverified.
Added 11 unit tests + 1 opt-in integration test:
diagnose() input validation (4):
- empty description → "Please describe" friendly error
- <200 words → "at least 200 words" error with current count
- >5000 words → "under 5000 words" error
- unknown provider → "Unknown model provider" error
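The four validation tests only assert on message fragments ("Please describe", "at least 200 words" plus the current count, "under 5000 words", "Unknown model provider" plus the rejected name). app.py's diagnose() is not part of this diff, so the following is a hypothetical sketch of checks that would satisfy those assertions. The function name `validate_description`, the full message wording, and the choice to accept exactly 5000 words are assumptions, not the app's code.

```python
from typing import Iterable, Optional

MIN_WORDS = 200   # assumption: 200 words exactly is accepted
MAX_WORDS = 5000  # assumption: 5000 words exactly is accepted


def validate_description(
    text: str, provider: str, known_providers: Iterable[str]
) -> Optional[str]:
    """Hypothetical stand-in for diagnose()'s input checks.

    Returns a friendly error message, or None if the input is acceptable.
    """
    if not text or not text.strip():
        return "Please describe your initiative before running a diagnosis."
    n_words = len(text.split())
    if n_words < MIN_WORDS:
        return f"Please write at least {MIN_WORDS} words (you have {n_words})."
    if n_words > MAX_WORDS:
        return f"Please keep the description under {MAX_WORDS} words (you have {n_words})."
    if provider not in set(known_providers):
        return f"Unknown model provider: {provider!r}."
    return None
```

Returning a message instead of raising mirrors what the tests expect from diagnose(): a friendly writeup and an empty JSON string, never an exception.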
diagnose() Premium (anthropic) path (6):
- no env key + no user key → "Premium needs an API key"
- whitespace-only user key → same friendly error
- env key only → dispatches; env key visible at backend call time
- user-supplied key OVERRIDES env key (the visitor-key-overrides-
  Space-owner-key invariant for public Spaces)
- backend raises → F14 wraps it as markdown with provider, model,
  exception name and detail; no stack trace leaked
- backend returns garbage → MalformedResponseError surfaces as
  "model returned malformed output", no crash
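The key-precedence tests read ANTHROPIC_API_KEY from os.environ inside the fake backend, which suggests diagnose() exposes whichever key wins through the environment for the duration of the call. Below is a minimal sketch of that precedence rule. `resolve_api_key`, `env_key`, `call_with_key` and `MissingKeyError` are hypothetical names, and restoring the owner's value after the call is an assumption this diff does not show.

```python
import os
from contextlib import contextmanager
from typing import Callable, Iterator, Mapping, Optional, TypeVar

T = TypeVar("T")
ENV_VAR = "ANTHROPIC_API_KEY"


class MissingKeyError(ValueError):
    """Raised when neither the visitor nor the Space owner supplied a key."""


def resolve_api_key(user_key: Optional[str], environ: Mapping[str, str] = os.environ) -> str:
    """Visitor key wins; the owner's env key is only a fallback.

    Whitespace-only input counts as "not supplied".
    """
    if user_key and user_key.strip():
        return user_key.strip()
    env_value = environ.get(ENV_VAR, "").strip()
    if env_value:
        return env_value
    raise MissingKeyError("Premium needs an API key.")


@contextmanager
def env_key(key: str) -> Iterator[None]:
    """Temporarily expose `key` as ANTHROPIC_API_KEY, then restore the old value."""
    previous = os.environ.get(ENV_VAR)
    os.environ[ENV_VAR] = key
    try:
        yield
    finally:
        if previous is None:
            os.environ.pop(ENV_VAR, None)
        else:
            os.environ[ENV_VAR] = previous


def call_with_key(user_key: Optional[str], backend: Callable[[], T]) -> T:
    """Run `backend` with the winning key visible in the environment."""
    with env_key(resolve_api_key(user_key)):
        return backend()
```

Mutating os.environ is process-wide, so concurrent Gradio requests could observe each other's keys. Passing the key explicitly, e.g. `anthropic.Anthropic(api_key=key)`, would sidestep that if it ever matters.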
_call_anthropic call shape (1):
Mocks anthropic.Anthropic and asserts the SDK is invoked with:
- model = ANTHROPIC_MODEL_ID (claude-opus-4-7)
- max_tokens = 2500
- system block + cache_control: ephemeral
- user messages array
- response unwrapped via .content[0].text
Catches typos / forgotten cache_control / wrong arg names without
hitting the network or costing money.
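app.py's _call_anthropic is not in this diff, but the shape test pins down most of it. Here is a sketch reconstructed only from those assertions. The `client_factory` parameter is a hypothetical addition that lets a test hand in a fake client directly instead of monkeypatching `anthropic.Anthropic`.

```python
from typing import Any, Callable, Optional

ANTHROPIC_MODEL_ID = "claude-opus-4-7"  # value quoted in the commit message
MAX_TOKENS = 2500


def call_anthropic(
    system: str, user: str, client_factory: Optional[Callable[[], Any]] = None
) -> str:
    """Hypothetical sketch of the call shape the test asserts on."""
    if client_factory is None:
        import anthropic  # lazy import so the sketch loads without the SDK installed

        client_factory = anthropic.Anthropic  # reads ANTHROPIC_API_KEY from the environment
    client = client_factory()
    message = client.messages.create(
        model=ANTHROPIC_MODEL_ID,
        max_tokens=MAX_TOKENS,
        # One text block, marked cacheable so repeated calls can reuse the prompt prefix
        system=[{"type": "text", "text": system, "cache_control": {"type": "ephemeral"}}],
        messages=[{"role": "user", "content": user}],
    )
    # Unwrap the first content block's text, as the test expects
    return message.content[0].text
```

One subtlety with the monkeypatch approach the test actually uses: patching `anthropic.Anthropic` only takes effect if app.py looks the class up at call time (`import anthropic`, then `anthropic.Anthropic()`). A module-level `from anthropic import Anthropic` binds the real class at import, and the fake would never be called.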
Integration test (opt-in, 1):
test_call_anthropic_real_api_returns_text: skipped unless BOTH
ANTHROPIC_API_KEY AND ANTHROPIC_INTEGRATION=1 are set. Hits the
real API, costs ~$0.05/call. Run manually to verify keys + model
availability end-to-end; routine CI leaves it skipped.
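The opt-in gate is a two-condition check on the environment. Factoring it into a small predicate makes the rule testable on its own; `integration_enabled` is a hypothetical helper sketched here, not something in the test file.

```python
import os
from typing import Mapping


def integration_enabled(environ: Mapping[str, str] = os.environ) -> bool:
    """True only if a key is present AND ANTHROPIC_INTEGRATION is exactly "1"."""
    return bool(environ.get("ANTHROPIC_API_KEY")) and environ.get("ANTHROPIC_INTEGRATION") == "1"
```

A shared marker could then wrap it, e.g. `requires_live_api = pytest.mark.skipif(not integration_enabled(), reason="needs ANTHROPIC_API_KEY + ANTHROPIC_INTEGRATION=1")`. Like the inline skipif in the diff, the condition is evaluated once when the module is imported at collection time, so both variables must be set before pytest starts.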
All 42 unit tests pass + 1 skipped. No backend code changes; this is
pure new coverage. Run all:
pytest gradio-apps/compounding-test/test_diagnose.py -v
Run integration too:
ANTHROPIC_API_KEY=sk-... ANTHROPIC_INTEGRATION=1 pytest ...
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- test_diagnose.py +262 -0

@@ -9,11 +9,18 @@ from __future__ import annotations
 
 import pytest
 
+import json
+import os
+from unittest.mock import MagicMock
+
 from app import (
+    ANTHROPIC_MODEL_ID,
     MalformedResponseError,
     PROVIDERS,
+    _call_anthropic,
     _call_model,
     _detect_provider,
+    diagnose,
     parse_response,
 )
 import app as app_module

@@ -270,6 +277,261 @@ def test_call_model_unknown_provider_raises():
         _call_model("s", "u", "bogus-provider")
 
 
+# --- diagnose() input validation -------------------------------------------
+
+# Reused across diagnose() tests: a description long enough to pass the
+# 200-word minimum. The actual content doesn't matter for these tests
+# because we mock the backend.
+_LONG_DESCRIPTION = " ".join(["word"] * 250)
+
+
+def test_diagnose_empty_description_returns_friendly_error():
+    writeup, json_str = diagnose("", None, None, None, provider="zerogpu")
+    assert "Please describe" in writeup
+    assert json_str == ""
+
+
+def test_diagnose_short_description_returns_word_count_error():
+    short = " ".join(["word"] * 50)
+    writeup, json_str = diagnose(short, None, None, None, provider="zerogpu")
+    assert "at least 200 words" in writeup
+    assert "50" in writeup  # current word count
+    assert json_str == ""
+
+
+def test_diagnose_long_description_returns_word_count_error():
+    long = " ".join(["word"] * 5001)
+    writeup, json_str = diagnose(long, None, None, None, provider="zerogpu")
+    assert "under 5000 words" in writeup
+    assert json_str == ""
+
+
+def test_diagnose_unknown_provider_returns_friendly_error():
+    writeup, json_str = diagnose(_LONG_DESCRIPTION, None, None, None, provider="bogus")
+    assert "Unknown model provider" in writeup
+    assert "bogus" in writeup
+    assert json_str == ""
+
+
+# --- diagnose() Premium (Anthropic) path -----------------------------------
+
+
+def test_diagnose_premium_without_any_key_returns_friendly_error(monkeypatch):
+    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    assert "Premium" in writeup
+    assert "API key" in writeup
+    assert json_str == ""
+
+
+def test_diagnose_premium_with_empty_string_key_returns_friendly_error(monkeypatch):
+    # Whitespace-only key should not count as supplied
+    monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
+    writeup, _ = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=" ",
+    )
+    assert "Premium" in writeup
+    assert "API key" in writeup
+
+
+def test_diagnose_premium_with_env_key_dispatches_to_anthropic(monkeypatch):
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-env-xxx")
+    captured = {}
+
+    def fake_anthropic(system, user):
+        captured["system"] = system
+        captured["user"] = user
+        captured["env_key_at_call_time"] = os.environ.get("ANTHROPIC_API_KEY")
+        return _VALID_BACKEND_RESPONSE
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", fake_anthropic)
+
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, "insurance", "enterprise", "$1M–$10M",
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    # Backend was called (so dispatch worked) and the env key was visible
+    assert captured.get("env_key_at_call_time") == "sk-env-xxx"
+    # And the response made it through the parser; JSON tab populated
+    assert json_str
+    parsed = json.loads(json_str)
+    assert parsed["quadrant"] == "compounder"
+
+
+def test_diagnose_premium_user_key_overrides_env(monkeypatch):
+    """The page's API-key field should take precedence over any
+    ANTHROPIC_API_KEY env var the Space owner has configured. This is
+    the F15-ish defense: a visitor pasting their own key should never
+    inadvertently use the Space owner's key."""
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-space-owner-xxx")
+    captured = {}
+
+    def fake_anthropic(system, user):
+        captured["env_key_at_call_time"] = os.environ.get("ANTHROPIC_API_KEY")
+        return _VALID_BACKEND_RESPONSE
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", fake_anthropic)
+
+    diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key="sk-user-yyy",
+    )
+    assert captured.get("env_key_at_call_time") == "sk-user-yyy"
+
+
+def test_diagnose_premium_backend_exception_returns_friendly_error(monkeypatch):
+    """When the Anthropic SDK raises (auth fail, rate limit, network),
+    F14 should wrap it in a markdown message that names the provider,
+    model, exception class, and exception detail, never a raw trace."""
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test")
+
+    class FakeAuthError(Exception):
+        pass
+
+    def failing_anthropic(system, user):
+        raise FakeAuthError("invalid x-api-key header")
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", failing_anthropic)
+
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    assert "anthropic" in writeup
+    assert ANTHROPIC_MODEL_ID in writeup
+    assert "FakeAuthError" in writeup
+    assert "invalid x-api-key header" in writeup
+    assert "stack" not in writeup.lower()  # no stack trace leaked
+    assert json_str == ""
+
+
+def test_diagnose_premium_backend_returns_malformed_response(monkeypatch):
+    """When the backend returns something that fails the JSON schema
+    validator, surface the parser's error message; don't crash."""
+    monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-test")
+
+    def garbage_anthropic(system, user):
+        return "Sorry, I cannot help with that request."  # no JSON block
+
+    monkeypatch.setitem(PROVIDERS, "anthropic", garbage_anthropic)
+
+    writeup, json_str = diagnose(
+        _LONG_DESCRIPTION, None, None, None,
+        provider="anthropic",
+        anthropic_api_key=None,
+    )
+    assert "malformed output" in writeup
+    assert json_str == ""
+
+
+# --- _call_anthropic: Anthropic SDK call shape -----------------------------
+#
+# Per Principle VII the actual API call is exempt from automated tests
+# (the SDK and the remote API are not our code). But the SHAPE of the
+# call we make IS our code: model id, system block, cache_control flag,
+# messages structure, response-unwrap path. These are easy to typo and
+# easy to miss in review. Shape tests catch that without hitting the
+# network.
+
+
+def test_call_anthropic_passes_system_block_with_cache_control(monkeypatch):
+    captured = {}
+
+    class FakeContentBlock:
+        text = "raw response text"
+
+    class FakeMessage:
+        content = [FakeContentBlock()]
+
+    class FakeClient:
+        class messages:  # noqa: N801 - mirroring SDK's nested .messages.create
+            @staticmethod
+            def create(**kwargs):
+                captured.update(kwargs)
+                return FakeMessage()
+
+    import anthropic as anthropic_module
+    monkeypatch.setattr(anthropic_module, "Anthropic", lambda: FakeClient())
+
+    result = _call_anthropic("MY SYSTEM BLOCK", "MY USER PROMPT")
+
+    # The function unwrapped content[0].text correctly
+    assert result == "raw response text"
+    # Model + token budget
+    assert captured["model"] == ANTHROPIC_MODEL_ID
+    assert captured["max_tokens"] == 2500
+    # System block is a list of one dict with cache_control
+    sys_block = captured["system"]
+    assert isinstance(sys_block, list)
+    assert len(sys_block) == 1
+    assert sys_block[0]["type"] == "text"
+    assert sys_block[0]["text"] == "MY SYSTEM BLOCK"
+    assert sys_block[0]["cache_control"] == {"type": "ephemeral"}
+    # User prompt in the messages array
+    assert captured["messages"] == [{"role": "user", "content": "MY USER PROMPT"}]
+
+
+# --- Integration test (opt-in; hits the real Anthropic API) ----------------
+#
+# Skipped unless ANTHROPIC_API_KEY is set AND ANTHROPIC_INTEGRATION=1 is
+# set. Costs money to run (~$0.05 per call to Opus 4.7). Use this when
+# you want to verify end-to-end that the key works and the model is
+# reachable; routine CI should leave this skipped.
+
+import pytest as _pytest  # already imported above as pytest, but kept explicit
+
+
+@_pytest.mark.skipif(
+    not (os.environ.get("ANTHROPIC_API_KEY") and os.environ.get("ANTHROPIC_INTEGRATION") == "1"),
+    reason="needs ANTHROPIC_API_KEY + ANTHROPIC_INTEGRATION=1 to hit the real API",
+)
+def test_call_anthropic_real_api_returns_text():
+    result = _call_anthropic(
+        "You are a one-word echo. Reply with exactly one word.",
+        "Say hello.",
+    )
+    assert isinstance(result, str)
+    assert len(result) > 0
+    assert len(result.split()) < 20  # one-word reply, generously bounded
+
+
+# ---------------------------------------------------------------------------
+# Fixture: a backend response that satisfies parse_response() so the
+# diagnose-Premium happy-path tests can assert on parser output without
+# duplicating the full JSON shape per test.
+# ---------------------------------------------------------------------------
+
+_VALID_BACKEND_RESPONSE = """```json
+{
+  "constraint": "Underwriting at the quote screen.",
+  "scores": {
+    "proprietary_data": { "score": 4, "rationale": "first-party.", "quoted_span": "12-year claims database" },
+    "self_labeling": { "score": 4, "rationale": "policies self-label.", "quoted_span": "Every policy we write" },
+    "decreasing_marginal_cost": { "score": 3, "rationale": "amortized pipeline.", "quoted_span": "feed back into the next quarter" },
+    "defensible_asymmetry": { "score": 3, "rationale": "no data sharing.", "quoted_span": "we don't share data with" }
+  },
+  "quadrant": "compounder",
+  "closest_portrait": "progressive",
+  "closest_portrait_paragraph": "Your case tracks Progressive most closely.",
+  "warnings": []
+}
+```
+
+# The Verdict
+
+Your initiative is a compounder. Here's why.
+"""
+
+
 def test_warnings_populated_for_failure_quadrant():
     raw = VALID_JSON_BLOCK.replace('"quadrant": "compounder"', '"quadrant": "roman-candle"').replace(
         '"warnings": []',