deploy_env_space_tests.md: Test Plan for docs/modules/deploy_env_space.md
Target artifact: app.py (FastAPI entrypoint) + driftcall/routes/*.py (per-endpoint handlers: reset.py, step.py, state.py, close.py, health.py) + driftcall/session_cache.py (in-process session cache + eviction sweep) + Dockerfile + openenv.yaml
Spec doc: DRIFTCALL/docs/modules/deploy_env_space.md (final, sealed 2026-04-24)
Framework: pytest + httpx (via fastapi.testclient.TestClient) + hypothesis (properties) + docker CLI (integration only)
Owner: Person B (Rewards & Tests), domain-reviewed by Person D (Deploy & Story)
Implements: deploy_env_space.md §2 (interface), §3 (behavior), §4 (data structures), §5 (error modes M1-M12), §7 (edge cases); DRIFTCALL/CLAUDE.md §3.1 (nine-section test-plan doc; this plan supplies the five required sections: Unit, Property, Integration, Coverage, Fixtures).
Coverage targets: 100% line + ≥ 95% branch on app.py + driftcall/routes/*.py + driftcall/session_cache.py. All 12 error modes M1-M12 must be raised by at least one test.
Numeric invariants: HTTP status codes are exact integers (200, 400, 401, 404, 409, 413, 429, 500, 503). TTL values in tests use time.monotonic() monkey-patched via a freezegun-style fixture; wall-clock time is never read directly. Bearer tokens are secrets.token_urlsafe(32) strings; never hardcoded magic values outside the valid_bearer_token fixture.
Mandatory assertion on every error response: json.loads(resp.text) has the shape {"error": {"code": <slug>, "message": <str>}} (additional keys inside "error", such as request_id in U18, are allowed) and resp.headers["Cache-Control"] == "no-store". This is enforced by the helper assert_error_envelope(resp, code, http_status), which all error-path tests call.
Mandatory assertion on every success response: resp.headers["Content-Type"].startswith("application/json") (except /healthz which is text/plain).
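The error-path assertion above can be sketched as a single helper. This is a minimal illustration of what assert_error_envelope might look like, not the project's actual implementation. FakeResponse is a hypothetical stand-in for the response object returned by TestClient, included only so the sketch runs without FastAPI installed; the Retry-After rule follows the M5 description later in this plan.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing only the attributes the helper reads."""
    status_code: int
    text: str
    headers: dict = field(default_factory=dict)


def assert_error_envelope(resp, code: str, http_status: int) -> None:
    """Assert status code, envelope shape, and cache header of an error response."""
    assert resp.status_code == http_status
    body = json.loads(resp.text)
    # The top level carries only "error"; extra inner keys (e.g. request_id) are tolerated.
    assert set(body) == {"error"}
    error = body["error"]
    assert error["code"] == code
    assert isinstance(error["message"], str)
    assert resp.headers["Cache-Control"] == "no-store"
    # Only the max_sessions error (M5) is expected to carry Retry-After.
    if code == "max_sessions":
        assert resp.headers.get("Retry-After") == "30"
    else:
        assert "Retry-After" not in resp.headers
```

A real response object has case-insensitive headers, so the plain dict used here is stricter than necessary; the checks themselves carry over unchanged.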
Fixtures defined in §5 are shared with deploy_demo_space_tests.md (same names, same canonicalised content). If any fixture changes here, the shared copy in tests/conftest.py MUST be updated in lockstep, and deploy_demo_space_tests.md §5 cross-checked.
1. Unit Tests
Organisation: one pytest sub-package mirroring the route layout under tests/test_deploy_env/:
tests/test_deploy_env/
    __init__.py
    conftest.py                   # fixtures from §5, plus assert_error_envelope helper
    test_healthz.py               # /healthz: unauthenticated, cheap
    test_auth.py                  # bearer enforcement across all mutating endpoints
    test_session_header.py        # X-Session-Id header validation
    test_reset.py                 # POST /reset happy + error paths
    test_step.py                  # POST /step happy + error paths
    test_state.py                 # GET /state happy + error paths
    test_close.py                 # POST /close happy + error paths
    test_body_schemas.py          # §2.1.1 shape conformance (envelope, not inner dataclass)
    test_session_cache_unit.py    # LRU, TTL, eviction sweep: direct cache tests
    test_error_modes_mapping.py   # M1..M12 matrix: every error mode hit at least once
    test_status_code_map.py       # every row of §2.2 table asserted
    test_lifespan_eager_load.py   # app.py lifespan loads Kokoro+Whisper BEFORE serving
Unit test case inventory: 28 cases total (exceeds the ≥ 20 requirement).
1.1 /healthz (test_healthz.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U1 | test_healthz_returns_200_plaintext_ok | No auth header. | resp.status_code == 200; resp.text == "ok"; resp.headers["Content-Type"].startswith("text/plain"); endpoint does not require bearer (§3.5 "unauthenticated"). |
| U2 | test_healthz_works_when_models_loaded | Lifespan fixture loads stub Kokoro+Whisper. | resp.status_code == 200; no 503 raised even under no-auth request. Confirms /healthz bypass is independent of model readiness gate for probe liveness. |
1.2 Bearer auth (test_auth.py)
Applies to every bearer-protected endpoint (/reset, /step, /state, /close); /healthz is exempt.
| # | Name | Setup | Assertion |
|---|---|---|---|
| U3 | test_reset_missing_authorization_returns_401_M1 | POST /reset with X-Session-Id but no Authorization header. | assert_error_envelope(resp, code="unauthorized", http_status=401); matches M1. |
| U4 | test_step_bad_bearer_returns_401_M1 | POST /step with Authorization: Bearer not-the-token. | assert_error_envelope(resp, code="unauthorized", http_status=401); matches M1. Body must not leak the expected token. |
| U5 | test_state_missing_bearer_returns_401_M1 | GET /state with no Authorization. | assert_error_envelope(resp, code="unauthorized", http_status=401). |
| U6 | test_close_wrong_scheme_returns_401_M1 | POST /close with Authorization: Basic <token> (wrong scheme). | assert_error_envelope(resp, code="unauthorized", http_status=401). Only Bearer scheme accepted (§3.5). |
1.3 X-Session-Id header (test_session_header.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U7 | test_reset_missing_x_session_id_returns_400_M2 | POST /reset with valid bearer, no X-Session-Id. | assert_error_envelope(resp, code="missing_session_id", http_status=400); matches M2. |
| U8 | test_step_malformed_x_session_id_returns_400_M2 | POST /step with X-Session-Id: "bad session!" (space + !, violates [A-Za-z0-9_-] charset). | assert_error_envelope(resp, code="missing_session_id", http_status=400); matches M2 (treated as "not a valid session id"). |
| U9 | test_step_x_session_id_over_64_chars_returns_400_M2 | POST /step with X-Session-Id of length 65. | assert_error_envelope(resp, code="missing_session_id", http_status=400); matches M2. |
1.4 POST /reset (test_reset.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U10 | test_reset_happy_path_returns_200_and_observation_envelope | Valid bearer, X-Session-Id: session_id_alpha, body {"seed": 42, "config": {"curriculum_stage": 1}}. | resp.status_code == 200; body top-level keys == {"observation", "episode_id", "max_turns"}; episode_id is a uuid4 string; max_turns is int, 1 ≤ value ≤ 16; observation is a dict. Envelope conformance per §2.1.1. |
| U11 | test_reset_with_language_weights_returns_200 | Valid bearer, valid session id, body {"config": {"language_weights": {"hi": 0.5, "ta": 0.5}}}. | resp.status_code == 200; observation includes the requested language distribution's imprint (via info.config_echo if exposed; otherwise just assert the envelope). |
| U12 | test_reset_bad_json_returns_400_M7 | POST /reset with body b"{not json" and Content-Type: application/json. | assert_error_envelope(resp, code="bad_json", http_status=400); matches M7. |
| U13 | test_reset_invalid_curriculum_stage_returns_400_M8 | Body {"config": {"curriculum_stage": 99}}. | assert_error_envelope(resp, code="invalid_action", http_status=400); matches M8 (dataclass validation failure on reset config). |
| U14 | test_reset_payload_over_1mib_returns_413_M11 | Body size = 1 MiB + 1 byte (padded config dict). | assert_error_envelope(resp, code="payload_too_large", http_status=413); matches M11. |
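The oversize payload for U14 has to land exactly one byte past the limit while remaining valid JSON, so that the 413 is triggered by size rather than by a parse error. One way to build it is sketched below. The function name make_oversize_body and the "padding" key are hypothetical, and the sketch assumes 1 MiB means 1024 * 1024 bytes.

```python
import json

ONE_MIB = 1024 * 1024  # assumed definition of the 1 MiB limit


def make_oversize_body(target_len: int = ONE_MIB + 1) -> bytes:
    """Return a valid JSON body whose encoded size is exactly `target_len` bytes."""
    # Measure the body with empty padding, then fill the gap with ASCII characters,
    # each of which encodes to exactly one byte.
    skeleton = json.dumps({"config": {"padding": ""}}).encode("utf-8")
    pad = target_len - len(skeleton)
    if pad < 0:
        raise ValueError("target_len is smaller than the empty skeleton")
    body = json.dumps({"config": {"padding": "x" * pad}}).encode("utf-8")
    assert len(body) == target_len
    return body
```

Calling make_oversize_body(ONE_MIB) gives the matching boundary case that should still be accepted, if the limit is inclusive.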
1.5 POST /step (test_step.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U15 | test_step_happy_path_returns_200 | Session pre-created via /reset; body {"action": {"action_type": "tool_call", "tool_name": "airline.search", "tool_args": {}}}. | resp.status_code == 200; body keys == {"observation", "reward", "done", "info"}; reward is float or None; done is bool. Envelope per §2.1.1. |
| U16 | test_step_unknown_session_returns_404_M3 | No prior /reset; POST /step with X-Session-Id: never-existed-0001. | assert_error_envelope(resp, code="session_not_found", http_status=404); matches M3. |
| U17 | test_step_invalid_action_shape_returns_400_M8 | Session pre-created; body {"action": {"action_type": "tool_call"}} (missing tool_name). | assert_error_envelope(resp, code="invalid_action", http_status=400); matches M8. |
| U18 | test_step_internal_exception_returns_500_M9_no_stacktrace | Monkey-patch env.step to raise RuntimeError("boom"). | assert_error_envelope(resp, code="internal_error", http_status=500); matches M9. "boom" does not appear in body (stack-trace suppression §5 rule 1). resp.json()["error"]["request_id"] is present (ASGI scope id). |
1.6 GET /state (test_state.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U19 | test_state_happy_path_returns_200 | Session pre-created via /reset then two /steps. | resp.status_code == 200; body keys == {"state", "turn"}; turn == 2 (int). Envelope per §2.1.1. |
| U20 | test_state_expired_session_returns_404_M4 | Session exists at t0; monotonic clock advanced by 3601 s via fixture; sweep runs; GET /state. | assert_error_envelope(resp, code="session_expired", http_status=404); matches M4. |
1.7 POST /close (test_close.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U21 | test_close_happy_path_returns_200_and_final_state | Session pre-created. | resp.status_code == 200; body keys == {"closed", "final_state"}; closed is True; final_state is a dict. |
| U22 | test_close_on_already_evicted_session_returns_200_with_null_final_state | Session was evicted by sweep before /close arrives. | resp.status_code == 200; resp.json() == {"closed": True, "final_state": None} (§2.1.1 "null if session was already evicted"). |
1.8 Session cache direct unit tests (test_session_cache_unit.py)
These bypass HTTP and call the cache API directly, to pin the policy invariants from §3.2.
| # | Name | Setup | Assertion |
|---|---|---|---|
| U23 | test_cache_lru_eviction_on_11th_session | Fill cache with sessions s0..s9 (max=10); insert s10. | Cache size remains == 10; s0 (oldest last_touched) is evicted; s10 is present; env.close() was called on the evicted entry (spy assertion). §3.2 invariant. |
| U24 | test_cache_ttl_sweep_evicts_stale_entries | Insert s_old at t0; advance monotonic clock by 3601 s; call cache.sweep(). | s_old no longer in cache; spy confirms env.close() called; cache remains internally consistent (len == 0). §3.3. |
| U25 | test_cache_max_sessions_returns_429_M5_with_retry_after | Cache full of 10 fresh sessions (all touched < 1 s ago); POST /reset with a new X-Session-Id. | resp.status_code == 429; assert_error_envelope(resp, code="max_sessions", http_status=429); resp.headers["Retry-After"] == "30" (only M5 carries Retry-After; §5 rules). Matches M5. |
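The invariants pinned by U23 and U24 can be illustrated with a small stand-alone cache. This is only a sketch under stated assumptions: SessionCache, put, and StubEnv are hypothetical names rather than the real driftcall.session_cache API, the cap of 10 and TTL of 3600 s are taken from the tests in this plan, and the policy that chooses between LRU eviction and the 429 response of U25 is left to the spec and not modelled here.

```python
import time
from collections import OrderedDict
from typing import Callable

MAX_SESSIONS = 10      # cap used in the U23 setup
TTL_SECONDS = 3600.0   # TTL used by the expiry tests


class StubEnv:
    """Hypothetical env that records how often close() was called."""

    def __init__(self) -> None:
        self.close_calls = 0

    def close(self) -> None:
        self.close_calls += 1


class SessionCache:
    """Illustrative LRU + TTL cache with an injectable monotonic clock."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: "OrderedDict[str, tuple]" = OrderedDict()

    def __len__(self) -> int:
        return len(self._store)

    def __contains__(self, sid: str) -> bool:
        return sid in self._store

    def put(self, sid: str, env: StubEnv) -> None:
        """Insert a session, evicting the least recently touched entry at the cap."""
        if sid in self._store:
            old, _ = self._store.pop(sid)
            if old is not env:
                old.close()
        elif len(self._store) >= MAX_SESSIONS:
            _, (evicted, _) = self._store.popitem(last=False)
            evicted.close()
        self._store[sid] = (env, self._clock())

    def sweep(self) -> int:
        """Close and drop entries idle for TTL_SECONDS or longer; return the count."""
        now = self._clock()
        stale = [sid for sid, (_, touched) in self._store.items()
                 if now - touched >= TTL_SECONDS]
        for sid in stale:
            env, _ = self._store.pop(sid)
            env.close()
        return len(stale)
```

Passing the clock in as a constructor argument lets a test drive expiry without touching the global time module.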
1.9 Error-mode matrix (test_error_modes_mapping.py)
One parametrized test asserting M1..M12 are each reachable and return the expected HTTP code + slug. Parameters:
[
    ("M1", "unauthorized", 401, <bad_bearer_request>),
    ("M2", "missing_session_id", 400, <no_session_header_request>),
    ("M3", "session_not_found", 404, <step_on_unknown_sid>),
    ("M4", "session_expired", 404, <step_after_ttl_expiry>),
    ("M5", "max_sessions", 429, <reset_when_cache_full>),
    ("M6", "model_not_ready", 503, <step_before_lifespan_load>),
    ("M7", "bad_json", 400, <malformed_body>),
    ("M8", "invalid_action", 400, <wrong_action_shape>),
    ("M9", "internal_error", 500, <env_step_raises>),
    ("M10", "io_error", 500, <tmpfs_full_monkeypatch>),
    ("M11", "payload_too_large", 413, <oversize_body>),
    ("M12", "reset_in_progress", 409, <concurrent_reset_same_sid>),
]
| # | Name | Setup | Assertion |
|---|---|---|---|
| U26 | test_error_modes_M1_through_M12_full_matrix | Parametrized over the 12 tuples above. | For every row: resp.status_code == expected_http; resp.json()["error"]["code"] == expected_slug; resp.headers["Cache-Control"] == "no-store"; resp.headers.get("Retry-After") == "30" if the row is M5, otherwise the header is absent. |
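The per-row assertion of U26 can be sketched as a table lookup plus one check function. The slugs and status codes below are copied from the parameter list above; ERROR_MODES, expected_retry_after, and check_row are hypothetical names, and the request-building placeholders are deliberately left out.

```python
# (slug, HTTP status) per error mode, copied from the parameter list in this section.
ERROR_MODES = {
    "M1": ("unauthorized", 401),
    "M2": ("missing_session_id", 400),
    "M3": ("session_not_found", 404),
    "M4": ("session_expired", 404),
    "M5": ("max_sessions", 429),
    "M6": ("model_not_ready", 503),
    "M7": ("bad_json", 400),
    "M8": ("invalid_action", 400),
    "M9": ("internal_error", 500),
    "M10": ("io_error", 500),
    "M11": ("payload_too_large", 413),
    "M12": ("reset_in_progress", 409),
}


def expected_retry_after(mode: str):
    """Return the Retry-After value a response for `mode` should carry, or None."""
    return "30" if mode == "M5" else None


def check_row(mode: str, status_code: int, body: dict, headers: dict) -> None:
    """Apply the U26 assertions to one already-decoded response."""
    slug, http_status = ERROR_MODES[mode]
    assert status_code == http_status
    assert body["error"]["code"] == slug
    assert headers.get("Cache-Control") == "no-store"
    assert headers.get("Retry-After") == expected_retry_after(mode)
```

In a pytest suite the dictionary items would typically feed pytest.mark.parametrize, with one request builder per mode.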
1.10 Lifespan eager load (test_lifespan_eager_load.py)
| # | Name | Setup | Assertion |
|---|---|---|---|
| U27 | test_lifespan_loads_models_before_serving_requests | Instrument audio.tts_kokoro.load and audio.asr_whisper.load with call-counter. Start app via LifespanManager; issue /reset immediately after startup event fires. | Call-counters == 1 each before any request handler runs (assertion inside lifespan startup). Request returns 200, never 503. §7.3. |
| U28 | test_step_before_lifespan_complete_returns_503_M6 | Monkey-patch lifespan to defer model load; issue /step during the deferred window. | assert_error_envelope(resp, code="model_not_ready", http_status=503); matches M6. Confirms the guard exists before models are ready. |
2. Property Tests
Hypothesis-driven invariants on the deployment surface. Minimum 5 properties; this plan specifies 7 (two extra for margin).
2.1 P1: /step is idempotent on invalid action (env state unchanged)
Strategy: invalid_action_strategy = hypothesis.strategies.dictionaries(...) producing action bodies that fail pydantic validation (missing fields, wrong types, unknown action_type).
Invariant:
    pre_state = GET /state (turn = T)
    resp = POST /step with invalid action   # → 400 M8
    post_state = GET /state
    assert pre_state == post_state          # turn unchanged, drift_schedule unchanged
    assert resp.status_code == 400
Confirms §7.5 transactional step semantics: state only mutates after all work succeeds; a rejected action is a no-op.
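The transactional behaviour that P1 relies on amounts to validating first and mutating last. A toy illustration follows; ToyEnv and InvalidAction are hypothetical stand-ins for the real environment and its validation error, and the three checks are assumptions chosen to mirror the tool_call action used in U15 and U17.

```python
from dataclasses import dataclass, field


class InvalidAction(ValueError):
    """Hypothetical error that a route handler would map to 400 invalid_action (M8)."""


@dataclass
class ToyEnv:
    """Hypothetical stand-in for the real env; tracks a turn counter and a call log."""
    turn: int = 0
    history: list = field(default_factory=list)

    def step(self, action) -> int:
        # Validate everything first so that a rejected action cannot touch state.
        if not isinstance(action, dict):
            raise InvalidAction("action must be an object")
        if action.get("action_type") != "tool_call":
            raise InvalidAction("unknown action_type")
        if not isinstance(action.get("tool_name"), str):
            raise InvalidAction("tool_name is required")
        # Mutate only after all validation has passed.
        self.history.append(action["tool_name"])
        self.turn += 1
        return self.turn
```

A property test then only needs to snapshot (turn, history) before the call and compare it afterwards for every generated invalid action.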
2.2 P2: Session expiration is monotonic and consistent
Strategy: st.integers(min_value=0, max_value=7200) for synthetic elapsed seconds.
Invariant:
    For any elapsed ∈ [0, 7200]:
        if elapsed < 3600: /step returns 200 (session alive)
        if elapsed >= 3600: /step returns 404 M4 (session expired)
    Once expired, the session NEVER becomes alive again without a new /reset.
Tests monotone one-way transition: alive → expired is terminal. §3.2 TTL = 3600 s.
2.3 P3: Error envelope shape is universal
Strategy: parametrized across all 12 error-triggering inputs (from U26 matrix).
Invariant: every error response satisfies:
    body = resp.json()
    set(body.keys()) == {"error"}
    set(body["error"].keys()) >= {"code", "message"}
    isinstance(body["error"]["code"], str) and body["error"]["code"] != ""
    isinstance(body["error"]["message"], str)
    "traceback" not in json.dumps(body).lower()
    "bearer" not in body["error"]["message"].lower()   # no token leakage
2.4 P4: X-Session-Id charset and length round-trip
Strategy: st.text(alphabet=string.ascii_letters + string.digits + "_-", min_size=1, max_size=64) generates valid session ids; a second strategy generates invalid ones (containing !@#, length 0, length 65+).
Invariant:
    valid_sid → /reset returns 200
    invalid_sid → /reset returns 400 M2
    After /reset with valid_sid:
        GET /state with the same sid returns 200
        GET /state with ANY other sid returns 404 M3
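The accept/reject split that P4 generates inputs for can be expressed as a single regular expression. The sketch below assumes the charset [A-Za-z0-9_-] and the 1 to 64 character length bound stated in this plan; is_valid_session_id is a hypothetical name, not the validator the routes actually use.

```python
import re

# Charset and length limits as stated for X-Session-Id in this plan.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value) -> bool:
    """Return True when `value` is a 1-64 character string over [A-Za-z0-9_-]."""
    # fullmatch must consume the whole string, so a trailing newline is rejected,
    # which a pattern anchored with "$" under re.match would let through.
    return isinstance(value, str) and _SESSION_ID_RE.fullmatch(value) is not None
```

For example, is_valid_session_id("session-alpha-0001") holds, while the U8 input "bad session!" and any 65-character id are rejected.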
2.5 P5: LRU eviction preserves cache size cap
Strategy: st.lists(st.text(alphabet=string.ascii_letters, min_size=8, max_size=16), min_size=11, max_size=50, unique=True), i.e. sequences of distinct session ids.
Invariant: after POSTing /reset for every sid in the list (one at a time):
    len(cache) == min(len(sids), 10)
    The 10 present sids are exactly the 10 most-recently-inserted (by last_touched).
    No env instance is leaked (every evicted env had .close() called exactly once).
2.6 P6: Reward field is float-or-null
Strategy: parametrized over valid actions per DriftCallAction shape.
Invariant: every /step 200-response body satisfies:
    reward = body["reward"]
    assert reward is None or (isinstance(reward, float) and -1.0 <= reward <= 1.0)
    assert isinstance(body["done"], bool)
Pins §2.1.1 envelope: reward: float | null, range aligned with openenv.yaml reward.range: [-1.0, 1.0] (§4.3).
2.7 P7: Concurrent /reset on same sid never produces two envs
Strategy: hypothesis.stateful.RuleBasedStateMachine driving concurrent /reset calls on the same X-Session-Id via anyio.create_task_group.
Invariant:
    Across N concurrent /reset calls on the same sid:
        exactly one succeeds with 200 (winner)
        the remaining N-1 return 409 M12 (reset_in_progress)
        cache ends with exactly one env under that sid
        no env instance is leaked
§7.1 per-session asyncio lock invariant.
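The one-winner outcome of P7 can be reproduced in a few lines of plain asyncio. Note the substitution: this sketch uses a set of in-flight session ids as a non-blocking guard rather than an asyncio.Lock, because the losing callers must fail fast with 409 instead of waiting their turn. ResetGate and run_concurrent_resets are hypothetical names, and the short sleep only stands in for env construction.

```python
import asyncio


class ResetGate:
    """Hypothetical guard allowing at most one in-flight reset per session id."""

    def __init__(self) -> None:
        self._in_progress = set()
        self.envs = {}

    async def reset(self, sid: str) -> int:
        """Return 200 for the winning reset and 409 for any overlapping one."""
        # The check and the add run without an await between them, so no other
        # task can interleave on the event loop.
        if sid in self._in_progress:
            return 409  # reset_in_progress (M12)
        self._in_progress.add(sid)
        try:
            await asyncio.sleep(0.01)  # stands in for env construction
            self.envs[sid] = object()
            return 200
        finally:
            self._in_progress.discard(sid)


def run_concurrent_resets(n: int, sid: str = "same-sid"):
    """Fire n overlapping resets on one sid; return (status codes, env count)."""
    async def main():
        gate = ResetGate()
        statuses = await asyncio.gather(*(gate.reset(sid) for _ in range(n)))
        return list(statuses), len(gate.envs)

    return asyncio.run(main())
```

The same shape works with anyio task groups; only the scheduling calls change.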
3. Integration Tests
Cross-cutting scenarios that exercise real subsystems. Marked @pytest.mark.integration; run in CI only, not in the fast pytest tests/ loop.
3.1 I1: End-to-end curl flow, /reset → 6× /step → /state → /close
Mechanism: subprocess.run(["curl", ...]) against a locally-booted FastAPI app (via uvicorn subprocess, port 7860). Uses the real curl binary to exercise headers + HTTP/1.1 semantics exactly as judges will.
Flow:
- Start uvicorn in a subprocess, wait for /healthz to return ok (max 45 s, matches HEALTHCHECK --start-period=45s in §4.2).
- curl -X POST /reset with bearer + X-Session-Id: e2e-001, body {"seed": 42, "config": {"curriculum_stage": 1}}. Assert 200.
- Loop 6 times: curl -X POST /step with a tool_call action. Assert 200 each time; accumulate done values.
- curl /state. Assert 200; turn >= 6.
- curl -X POST /close. Assert 200; closed is True.
- Kill uvicorn subprocess; assert no zombie process.
Budget: single test must complete under 60 s including subprocess boot.
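The first step of the flow, waiting up to 45 s for /healthz, is a bounded polling loop. A stdlib-only sketch follows; wait_until_healthy and healthz_probe are hypothetical helpers, the URL assumes the app listens on 127.0.0.1:7860, and the clock and sleep parameters exist only so the loop can be tested without real waiting.

```python
import time
import urllib.request
from typing import Callable


def healthz_probe(url: str = "http://127.0.0.1:7860/healthz") -> bool:
    """Return True when /healthz answers 200 with the body "ok"."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200 and resp.read().decode() == "ok"


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout_s: float = 45.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it returns True or `timeout_s` has elapsed."""
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # connection refused or HTTP error: server not ready yet
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the integration test this would be called as wait_until_healthy(healthz_probe) right after starting the uvicorn subprocess, failing the test if it returns False.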
3.2 I2: Docker build locally + openenv validate
Mechanism: docker build -t driftcall-env:test -f DRIFTCALL/Dockerfile DRIFTCALL/ then docker run -d -p 7860:7860 -e DRIFTCALL_ENV_TOKEN=test-token driftcall-env:test, then openenv validate http://localhost:7860 --auth-bearer test-token.
Assertions:
- docker build exits 0.
- Image size < 2 GB (docker image inspect driftcall-env:test --format '{{.Size}}' < 2 * 1024**3).
- Container healthz returns ok within 60 s of docker run.
- openenv validate exits 0 and its stdout contains each of:
  - openenv.yaml parses, schema v1.0
  - POST /reset success line
  - POST /step success line
  - GET /state success line
  - POST /close success line
  - 6 endpoints validated, 0 errors
- Container cleanup: docker rm -f in finally block.
Gating: marked @pytest.mark.skipif(not shutil.which("docker"), reason="docker CLI not found") (pytest requires a reason string for boolean conditions); locally opt-in, mandatory in CI.
3.3 I3: HF Space deploy dry-run (no actual push)
Mechanism: hf upload --dry-run <team>/driftcall-env . --repo-type=space. Captures the file manifest that would be pushed.
Assertions:
- Exit code 0.
- Manifest includes: app.py, openenv.yaml, Dockerfile, requirements.txt, README.md, driftcall/ subtree.
- Manifest excludes: tests/, training/, data/raw/, .env*, *.ipynb, .git/.
- README.md YAML frontmatter contains required keys: title, sdk: docker, app_port: 7860, emoji, colorFrom, colorTo (§4.4).
- No actual network call to huggingface.co, enforced via monkeypatch on the huggingface_hub outbound session to raise if reached.
3.4 I4: Concurrent 10-session load test
Mechanism: anyio.create_task_group spawning 10 coroutines, each driving a unique X-Session-Id through /reset → 3× /step → /close against TestClient(app).
Assertions:
- All 10 /reset calls return 200 (cache is exactly at cap).
- An 11th concurrent /reset (while the first 10 are still last_touched < TTL) returns 429 M5 with Retry-After: 30 (proves cap enforcement under contention).
- All 30 /step calls (3 × 10 sessions) return 200; no cross-session state bleed: each session's observation.turn progresses independently (1, 2, 3).
- All 10 /close calls return 200.
- Wall-clock budget: total test completes in < 30 s on CI 2-vCPU runner.
3.5 I5: Cold-start lifespan blocks request serving until models loaded
Mechanism: Instrument audio.tts_kokoro.load with an artificial 2 s anyio.sleep. Boot the app via LifespanManager and concurrently fire a /reset request at t=0 (before startup completes).
Assertions:
- The /reset request blocks until lifespan startup is complete; it does not return 503 during the loading window if app.py correctly awaits lifespan before accepting requests (this is the FastAPI default).
- If instead we disable the lifespan gate (test variant), the request returns 503 M6 with code="model_not_ready", which proves M6 is reachable and the guard is load-bearing.
- /healthz responds 200 throughout (probe endpoint is cheap and does not require models; §3.5 "unauthenticated").
3.6 I6: TTL sweep liveness under sustained traffic
Mechanism: Run the TestClient against the app for 70 s of simulated traffic (monotonic clock advanced via fixture), issuing one /reset per synthetic minute with a fresh X-Session-Id. Sweep runs every 60 s per §3.3.
Assertions:
- After the 61st synthetic second, the first session's entry has been evicted by the sweep task.
- A /step on that first session returns 404 M4.
- The sweep task itself does not raise; logs contain exactly one "swept 1 expired session" structured log line per sweep cycle (§3.7 logging fields).
4. Coverage Target
Targets (enforced in CI via pytest --cov-fail-under):
| Artifact | Line coverage | Branch coverage |
|---|---|---|
| app.py | 100% | ≥ 95% |
| driftcall/routes/reset.py | 100% | ≥ 95% |
| driftcall/routes/step.py | 100% | ≥ 95% |
| driftcall/routes/state.py | 100% | ≥ 95% |
| driftcall/routes/close.py | 100% | ≥ 95% |
| driftcall/routes/health.py | 100% | 100% (trivial file) |
| driftcall/session_cache.py | 100% | ≥ 95% |
Command:
pytest tests/test_deploy_env/ \
    --cov=app \
    --cov=driftcall.routes \
    --cov=driftcall.session_cache \
    --cov-branch \
    --cov-report=term-missing \
    --cov-fail-under=100
Branch-coverage carve-outs (documented pragmas, not silent): the except asyncio.CancelledError: raise guard at the bottom of the sweep task's loop is excluded via # pragma: no cover. Re-raising a cancellation is standard-library contract, and triggering it requires injecting a cancellation into the lifespan shutdown, which is covered by the lifespan test (I5) at the event-loop level.
Error-mode coverage ledger: every one of M1..M12 is raised by at least one test:
| Mode | Raised by | HTTP |
|---|---|---|
| M1 unauthorized | U3, U4, U5, U6, U26 | 401 |
| M2 missing_session_id | U7, U8, U9, U26, P4 | 400 |
| M3 session_not_found | U16, U26, P4 | 404 |
| M4 session_expired | U20, U26, P2, I6 | 404 |
| M5 max_sessions | U25, U26, I4 | 429 |
| M6 model_not_ready | U28, U26, I5 | 503 |
| M7 bad_json | U12, U26 | 400 |
| M8 invalid_action | U13, U17, U26, P1 | 400 |
| M9 internal_error | U18, U26 | 500 |
| M10 io_error | U26 (monkeypatched tmpfs full) | 500 |
| M11 payload_too_large | U14, U26 | 413 |
| M12 reset_in_progress | U26, P7 | 409 |
HTTP status codes asserted at least once: 200, 400, 401, 404, 409, 413, 429, 500, 503 (all nine from §2.2).
5. Fixtures
Defined in tests/conftest.py (project-wide) and imported by tests/test_deploy_env/conftest.py. Shared with deploy_demo_space_tests.md; any change here propagates there and vice versa.
5.1 fastapi_test_client
import pytest
from fastapi.testclient import TestClient

@pytest.fixture
def fastapi_test_client(monkeypatch, valid_bearer_token, stub_audio_models):
    """
    Boots the FastAPI app with lifespan, stub Kokoro+Whisper loaded,
    and bearer token injected into app config.
    Yields a `fastapi.testclient.TestClient` that supports all HTTP verbs
    against the live app (in-process, no socket).
    Lifecycle: entering the TestClient context manager fires the lifespan
    startup/shutdown events; cache is flushed between tests via the autouse
    cache-reset fixture.
    """
    # Set the token before importing app (assumed to be read when the app loads its config).
    monkeypatch.setenv("DRIFTCALL_ENV_TOKEN", valid_bearer_token)
    from app import app
    with TestClient(app) as client:
        yield client
Used by: every unit test in §1, properties P1-P7, integration tests I1, I4, I5, I6.
5.2 valid_bearer_token
import secrets

@pytest.fixture(scope="session")
def valid_bearer_token() -> str:
    """A freshly-generated URL-safe token, session-scoped so it is stable
    across tests in one pytest run but distinct between runs."""
    return secrets.token_urlsafe(32)
Used by: every test that asserts 200 on a mutating endpoint, plus the "bad bearer" tests (which receive valid_bearer_token + "x" as the wrong token).
5.3 session_id_alpha
@pytest.fixture
def session_id_alpha() -> str:
    """Deterministic session id for tests that only need one sid."""
    return "session-alpha-0001"
Charset and length both pass the header validator (§2.1 headers table).
5.4 session_id_beta
@pytest.fixture
def session_id_beta() -> str:
    """Second deterministic session id for cross-session tests
    (e.g., asserting no state bleed between alpha and beta)."""
    return "session-beta-0002"
5.5 Helper fixtures (non-shared, internal to this test package)
- stub_audio_models: monkeypatches audio.tts_kokoro.load and audio.asr_whisper.load to return lightweight stubs so lifespan completes in < 50 ms. Used everywhere except I5 (which tests real-ish load behavior).
- monotonic_clock: monkeypatches time.monotonic() to advance deterministically; used by U20, U24, P2, I6.
- cache_reset (autouse): clears session_cache._store between tests; prevents cross-test bleed.
- assert_error_envelope(resp, code, http_status): imported helper, asserts envelope shape + Cache-Control: no-store header + optional Retry-After when code == "max_sessions".
- one_mib_plus_one_body: precomputed bytes payload for U14 (M11 oversize test).
Fixture ownership note: fastapi_test_client, valid_bearer_token, session_id_alpha, session_id_beta live in tests/conftest.py at the project root and are the shared set with deploy_demo_space_tests.md. Helper fixtures (§5.5) are local to tests/test_deploy_env/conftest.py and are not shared.
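The monotonic_clock helper listed above could be built around a small controllable clock object. The sketch below is one possible shape, not the project's fixture: FakeMonotonic and elapsed_with_fake_clock are hypothetical names, and unittest.mock.patch is used here only to keep the example free of pytest; inside a fixture, monkeypatch.setattr would play the same role.

```python
import time
from unittest import mock


class FakeMonotonic:
    """Hypothetical controllable replacement for time.monotonic()."""

    def __init__(self, start: float = 1000.0) -> None:
        self._now = start

    def __call__(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        """Move the fake clock forward; going backwards is rejected."""
        if seconds < 0:
            raise ValueError("a monotonic clock cannot go backwards")
        self._now += seconds


def elapsed_with_fake_clock(seconds: float) -> float:
    """Patch time.monotonic, advance the fake clock, and return the observed delta."""
    clock = FakeMonotonic()
    with mock.patch("time.monotonic", clock):
        t0 = time.monotonic()
        clock.advance(seconds)
        return time.monotonic() - t0
```

Patching the attribute on the time module only affects code that calls time.monotonic() through the module; a module that did from time import monotonic at import time keeps the real function and would need its own patch target.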
6. Non-goals (out of scope for this plan)
- Deep per-field validation of DriftCallObservation / DriftCallAction / DriftCallState: owned by env_tests.md + models_tests.md.
- Reward math correctness: owned by rewards_tests.md.
- Kokoro / Whisper model quality: owned by audio_tests.md.
- Actual HF Hub pushes: forbidden in tests (§3.3 dry-run only); real push happens in Batch C3 manual verification.
- GPU behavior: deployment is CPU-only (deploy_env_space.md §1, §6.5 explicit non-dependency).
- Cross-worker cache coherence: documented as acceptable 404 path in §3.2 of the spec; not a test target for this hackathon (future hardening).