
	# deploy_env_space_tests.md — Test Plan for `docs/modules/deploy_env_space.md`

	Target artifact: `app.py` (FastAPI entrypoint) + `driftcall/routes/*.py` (per-endpoint handlers: `reset.py`, `step.py`, `state.py`, `close.py`, `health.py`) + `driftcall/session_cache.py` (in-process session cache + eviction sweep) + `Dockerfile` + `openenv.yaml`
	Spec doc: `DRIFTCALL/docs/modules/deploy_env_space.md` (final, sealed 2026-04-24)
	Framework: `pytest` + `httpx` (via `fastapi.testclient.TestClient`) + `hypothesis` (properties) + `docker` CLI (integration only)
	Owner: Person B (Rewards & Tests) — domain-reviewed by Person D (Deploy & Story)
	Implements: deploy_env_space.md §2 (interface), §3 (behavior), §4 (data structures), §5 (error modes M1–M12), §7 (edge cases); `DRIFTCALL/CLAUDE.md §3.1` (nine-section test-plan doc — this plan supplies the five required sections: Unit, Property, Integration, Coverage, Fixtures).
	Coverage targets: 100% line + ≥ 95% branch on `app.py` + `driftcall/routes/*.py` + `driftcall/session_cache.py`. All 12 error modes M1–M12 must be raised by at least one test.
	Numeric invariants: HTTP status codes are exact integers (200, 400, 401, 404, 409, 413, 429, 500, 503). TTL values in tests use `time.monotonic()` monkey-patched via `freezegun`-style fixture — wall-clock is never read directly. Bearer tokens are `secrets.token_urlsafe(32)` strings; never hardcoded magic values outside the `valid_bearer_token` fixture.
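	Mandatory assertion on every error response: `json.loads(resp.text)` has the shape `{"error": {"code": <slug>, "message": <str>}}` (the `error` object carries at least `code` and `message`; M9 responses additionally carry `request_id`, see U18) and `resp.headers["Cache-Control"] == "no-store"`. Both checks are enforced by the helper `assert_error_envelope(resp, code, http_status)`, which every error-path test calls.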
	Mandatory assertion on every success response: `resp.headers["Content-Type"].startswith("application/json")` (except `/healthz` which is `text/plain`).
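
The helper named above could look roughly like the following. This is a minimal sketch, not the project's implementation: `FakeResponse` is a hypothetical stand-in for the `httpx` response object, and tolerating extra keys inside `error` (such as `request_id` from U18) is an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in exposing the three attributes the helper reads."""
    status_code: int
    text: str
    headers: dict = field(default_factory=dict)


def assert_error_envelope(resp, code, http_status):
    """Check status code, envelope shape, and cache headers of an error response."""
    assert resp.status_code == http_status
    body = json.loads(resp.text)
    assert set(body) == {"error"}
    error = body["error"]
    assert error["code"] == code
    assert isinstance(error["message"], str)
    assert resp.headers["Cache-Control"] == "no-store"
    # Only M5 (max_sessions) carries Retry-After; every other mode must omit it.
    if code == "max_sessions":
        assert resp.headers.get("Retry-After") == "30"
    else:
        assert "Retry-After" not in resp.headers


ok = FakeResponse(
    status_code=401,
    text=json.dumps({"error": {"code": "unauthorized", "message": "bad token"}}),
    headers={"Cache-Control": "no-store"},
)
assert_error_envelope(ok, code="unauthorized", http_status=401)
```

Against a real `TestClient` response the same function works unchanged, since `status_code`, `text`, and `headers` are the only attributes it touches.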

	Fixtures defined in §5 are shared with `deploy_demo_space_tests.md` (same names, same canonicalised content). If any fixture changes here, the shared copy in `tests/conftest.py` MUST be updated in lockstep, and `deploy_demo_space_tests.md §5` cross-checked.

	---

	## 1. Unit Tests

	Organisation: one `pytest` sub-package mirroring the route layout under `tests/test_deploy_env/`:

	```
	tests/test_deploy_env/
	    __init__.py
	    conftest.py                    # fixtures from §5, plus assert_error_envelope helper
	    test_healthz.py                # /healthz — unauthenticated, cheap
	    test_auth.py                   # bearer enforcement across all mutating endpoints
	    test_session_header.py         # X-Session-Id header validation
	    test_reset.py                  # POST /reset happy + error paths
	    test_step.py                   # POST /step happy + error paths
	    test_state.py                  # GET /state happy + error paths
	    test_close.py                  # POST /close happy + error paths
	    test_body_schemas.py           # §2.1.1 shape conformance (envelope, not inner dataclass)
	    test_session_cache_unit.py     # LRU, TTL, eviction sweep — direct cache tests
	    test_error_modes_mapping.py    # M1..M12 matrix — every error mode hit at least once
	    test_status_code_map.py        # every row of §2.2 table asserted
	    test_lifespan_eager_load.py    # app.py lifespan loads Kokoro+Whisper BEFORE serving
	```

	Unit test case inventory — 28 cases total (exceeds the ≥ 20 requirement).

	### 1.1 `/healthz` — `test_healthz.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U1 \| `test_healthz_returns_200_plaintext_ok` \| No auth header. \| `resp.status_code == 200`; `resp.text == "ok"`; `resp.headers["Content-Type"].startswith("text/plain")`; endpoint does not require bearer (§3.5 "unauthenticated"). \|
	\| U2 \| `test_healthz_works_when_models_loaded` \| Lifespan fixture loads stub Kokoro+Whisper. \| `resp.status_code == 200`; no 503 raised even under no-auth request. Confirms `/healthz` bypass is independent of model readiness gate for probe liveness. \|

	### 1.2 Bearer auth — `test_auth.py`

	Applies to every authenticated endpoint (`/reset`, `/step`, `/state`, `/close`); `/healthz` is the only route exempt from the bearer check (§3.5).

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U3 \| `test_reset_missing_authorization_returns_401_M1` \| POST `/reset` with `X-Session-Id` but no `Authorization` header. \| `assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches M1. \|
	\| U4 \| `test_step_bad_bearer_returns_401_M1` \| POST `/step` with `Authorization: Bearer not-the-token`. \| `assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches M1. Body must not leak the expected token. \|
	\| U5 \| `test_state_missing_bearer_returns_401_M1` \| GET `/state` with no `Authorization`. \| `assert_error_envelope(resp, code="unauthorized", http_status=401)`. \|
	\| U6 \| `test_close_wrong_scheme_returns_401_M1` \| POST `/close` with `Authorization: Basic <token>` (wrong scheme). \| `assert_error_envelope(resp, code="unauthorized", http_status=401)`. Only `Bearer` scheme accepted (§3.5). \|

	### 1.3 `X-Session-Id` header — `test_session_header.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U7 \| `test_reset_missing_x_session_id_returns_400_M2` \| POST `/reset` with valid bearer, no `X-Session-Id`. \| `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches M2. \|
	\| U8 \| `test_step_malformed_x_session_id_returns_400_M2` \| POST `/step` with `X-Session-Id: "bad session!"` (space + `!`, violates `[A-Za-z0-9_-]` charset). \| `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches M2 (treated as "not a valid session id"). \|
	\| U9 \| `test_step_x_session_id_over_64_chars_returns_400_M2` \| POST `/step` with `X-Session-Id` of length 65. \| `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches M2. \|
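
The three cases above pin a charset (`[A-Za-z0-9_-]`) and a length window (1 to 64 characters). A minimal sketch of a validator satisfying them follows; the function and constant names are hypothetical, and the real check lives wherever the spec's header validator is implemented.

```python
import re

# Charset and length bounds taken from U8, U9, and P4; names are illustrative.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value):
    """Return True only for a non-empty ASCII id of at most 64 allowed characters."""
    # fullmatch avoids the classic `$` pitfall of accepting a trailing newline.
    return isinstance(value, str) and SESSION_ID_RE.fullmatch(value) is not None


print(is_valid_session_id("session-alpha-0001"))  # valid id used by the fixtures
print(is_valid_session_id("bad session!"))        # space and '!' are rejected
print(is_valid_session_id("a" * 65))              # one character over the limit
```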

	### 1.4 `POST /reset` — `test_reset.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U10 \| `test_reset_happy_path_returns_200_and_observation_envelope` \| Valid bearer, `X-Session-Id: session_id_alpha`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`. \| `resp.status_code == 200`; body top-level keys `== {"observation", "episode_id", "max_turns"}`; `episode_id` is a uuid4 string; `max_turns` is `int`, 1 ≤ value ≤ 16; `observation` is a dict. Envelope conformance per §2.1.1. \|
	\| U11 \| `test_reset_with_language_weights_returns_200` \| Valid bearer, valid session id, body `{"config": {"language_weights": {"hi": 0.5, "ta": 0.5}}}`. \| `resp.status_code == 200`; observation includes the requested language distribution's imprint (via `info.config_echo` if exposed — else just assert envelope). \|
	\| U12 \| `test_reset_bad_json_returns_400_M7` \| POST `/reset` with body `b"{not json"` and `Content-Type: application/json`. \| `assert_error_envelope(resp, code="bad_json", http_status=400)`; matches M7. \|
	\| U13 \| `test_reset_invalid_curriculum_stage_returns_400_M8` \| Body `{"config": {"curriculum_stage": 99}}`. \| `assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches M8 (dataclass validation failure on reset config). \|
	\| U14 \| `test_reset_payload_over_1mib_returns_413_M11` \| Body size = 1 MiB + 1 byte (padded `config` dict). \| `assert_error_envelope(resp, code="payload_too_large", http_status=413)`; matches M11. \|
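
For U14, the body has to be valid JSON and exactly one byte over the limit, so that the 413 is attributable to size and not to a parse error. One way to build such a payload is sketched below; the `pad` key inside `config` is an assumption made for illustration.

```python
import json

ONE_MIB = 1024 * 1024


def oversize_body(limit=ONE_MIB):
    """Build a valid JSON body whose encoded size is exactly limit + 1 bytes."""
    target = limit + 1
    # Measure the JSON skeleton first, then fill the remainder with ASCII padding.
    skeleton = json.dumps({"config": {"pad": ""}}).encode("utf-8")
    padding = "x" * (target - len(skeleton))
    body = json.dumps({"config": {"pad": padding}}).encode("utf-8")
    assert len(body) == target
    return body


body = oversize_body()
print(len(body) - ONE_MIB)
```

Because the padding is plain ASCII, each padding character adds exactly one byte, which keeps the size arithmetic exact.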

	### 1.5 `POST /step` — `test_step.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U15 \| `test_step_happy_path_returns_200` \| Session pre-created via `/reset`; body `{"action": {"action_type": "tool_call", "tool_name": "airline.search", "tool_args": {}}}`. \| `resp.status_code == 200`; body keys `== {"observation", "reward", "done", "info"}`; `reward` is `float` or `None`; `done` is `bool`. Envelope per §2.1.1. \|
	\| U16 \| `test_step_unknown_session_returns_404_M3` \| No prior `/reset`; POST `/step` with `X-Session-Id: never-existed-0001`. \| `assert_error_envelope(resp, code="session_not_found", http_status=404)`; matches M3. \|
	\| U17 \| `test_step_invalid_action_shape_returns_400_M8` \| Session pre-created; body `{"action": {"action_type": "tool_call"}}` (missing `tool_name`). \| `assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches M8. \|
	\| U18 \| `test_step_internal_exception_returns_500_M9_no_stacktrace` \| Monkey-patch `env.step` to raise `RuntimeError("boom")`. \| `assert_error_envelope(resp, code="internal_error", http_status=500)`; matches M9. `"boom"` does not appear in body (stack-trace suppression §5 rule 1). `resp.json()["error"]["request_id"]` is present (ASGI scope id). \|

	### 1.6 `GET /state` — `test_state.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U19 \| `test_state_happy_path_returns_200` \| Session pre-created via `/reset` then two `/step`s. \| `resp.status_code == 200`; body keys `== {"state", "turn"}`; `turn == 2` (int). Envelope per §2.1.1. \|
	\| U20 \| `test_state_expired_session_returns_404_M4` \| Session exists at `t0`; monotonic clock advanced by 3601 s via fixture; sweep runs; GET `/state`. \| `assert_error_envelope(resp, code="session_expired", http_status=404)`; matches M4. \|

	### 1.7 `POST /close` — `test_close.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U21 \| `test_close_happy_path_returns_200_and_final_state` \| Session pre-created. \| `resp.status_code == 200`; body keys `== {"closed", "final_state"}`; `closed is True`; `final_state` is a dict. \|
	\| U22 \| `test_close_on_already_evicted_session_returns_200_with_null_final_state` \| Session was evicted by sweep before `/close` arrives. \| `resp.status_code == 200`; `resp.json() == {"closed": True, "final_state": None}` (§2.1.1 "null if session was already evicted"). \|

	### 1.8 Session cache direct unit tests — `test_session_cache_unit.py`

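	U23 and U24 bypass HTTP and call the cache API directly, to pin the policy invariants from §3.2 and §3.3. U25 goes through `POST /reset` because it asserts the HTTP mapping of a full cache (M5).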

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U23 \| `test_cache_lru_eviction_on_11th_session` \| Fill cache with sessions `s0..s9` (max=10); insert `s10`. \| Cache size remains `== 10`; `s0` (oldest `last_touched`) is evicted; `s10` is present; `env.close()` was called on the evicted entry (spy assertion). §3.2 invariant. \|
	\| U24 \| `test_cache_ttl_sweep_evicts_stale_entries` \| Insert `s_old` at `t0`; advance monotonic clock by 3601 s; call `cache.sweep()`. \| `s_old` no longer in cache; spy confirms `env.close()` called; cache remains internally consistent (len == 0). §3.3. \|
	\| U25 \| `test_cache_max_sessions_returns_429_M5_with_retry_after` \| Cache full of 10 fresh sessions (all touched < 1 s ago); POST `/reset` with a new `X-Session-Id`. \| `resp.status_code == 429`; `assert_error_envelope(resp, code="max_sessions", http_status=429)`; `resp.headers["Retry-After"] == "30"` (only M5 carries Retry-After — §5 rules). Matches M5. \|
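
To make the U23 and U24 invariants concrete, here is a minimal sketch of an LRU + TTL cache with an injectable clock. It is not the `driftcall/session_cache.py` implementation: the class shape, the `put` and `sweep` method names, and `FakeEnv` are assumptions, and the 429 policy from U25 is deliberately not modelled.

```python
from collections import OrderedDict


class FakeEnv:
    """Spy env that counts close() calls."""

    def __init__(self):
        self.close_calls = 0

    def close(self):
        self.close_calls += 1


class SessionCache:
    def __init__(self, clock, max_sessions=10, ttl_s=3600.0):
        self._clock = clock
        self._max = max_sessions
        self._ttl = ttl_s
        self._store = OrderedDict()  # sid -> (env, last_touched), oldest first

    def __len__(self):
        return len(self._store)

    def __contains__(self, sid):
        return sid in self._store

    def put(self, sid, env):
        """Insert or replace a session, evicting the least recently touched one at the cap."""
        if sid in self._store:
            old_env, _ = self._store.pop(sid)
            if old_env is not env:
                old_env.close()
        elif len(self._store) >= self._max:
            _, (evicted_env, _) = self._store.popitem(last=False)
            evicted_env.close()
        self._store[sid] = (env, self._clock())

    def sweep(self):
        """Close and drop every entry whose age has reached the TTL; return the count."""
        now = self._clock()
        stale = [sid for sid, (_, touched) in self._store.items() if now - touched >= self._ttl]
        for sid in stale:
            env, _ = self._store.pop(sid)
            env.close()
        return len(stale)


# U23-style scenario: ten sessions fill the cache, the eleventh evicts s0.
now = [0.0]
cache = SessionCache(clock=lambda: now[0])
envs = {}
for i in range(11):
    sid = f"s{i}"
    envs[sid] = FakeEnv()
    cache.put(sid, envs[sid])
    now[0] += 1.0

print(len(cache), "s0" in cache, "s10" in cache, envs["s0"].close_calls)
```

Passing the clock in as a callable keeps the cache testable without patching `time.monotonic()` globally; a test advances `now[0]` by 3601 and calls `sweep()` to reproduce U24.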

	### 1.9 Error-mode matrix — `test_error_modes_mapping.py`

	One parametrized test asserting M1..M12 are each reachable and return the expected HTTP code + slug. Parameters:

	```
	[
	    ("M1", "unauthorized", 401, <bad_bearer_request>),
	    ("M2", "missing_session_id", 400, <no_session_header_request>),
	    ("M3", "session_not_found", 404, <step_on_unknown_sid>),
	    ("M4", "session_expired", 404, <step_after_ttl_expiry>),
	    ("M5", "max_sessions", 429, <reset_when_cache_full>),
	    ("M6", "model_not_ready", 503, <step_before_lifespan_load>),
	    ("M7", "bad_json", 400, <malformed_body>),
	    ("M8", "invalid_action", 400, <wrong_action_shape>),
	    ("M9", "internal_error", 500, <env_step_raises>),
	    ("M10", "io_error", 500, <tmpfs_full_monkeypatch>),
	    ("M11", "payload_too_large", 413, <oversize_body>),
	    ("M12", "reset_in_progress", 409, <concurrent_reset_same_sid>),
	]
	```

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U26 \| `test_error_modes_M1_through_M12_full_matrix` \| Parametrized over the 12 tuples above. \| For every row: `resp.status_code == expected_http`; `resp.json()["error"]["code"] == expected_slug`; `resp.headers["Cache-Control"] == "no-store"`; `resp.headers.get("Retry-After")` is `"30"` iff row is M5 else absent. \|
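
The per-row assertion in U26 can be expressed as one table-driven check. The sketch below keeps the matrix as plain data and leaves out the request builders, which the plan only names as placeholders; `check_error_response` is a hypothetical helper, and in the real suite the rows would feed `pytest.mark.parametrize`.

```python
ERROR_MODES = [
    # (mode, slug, http_status), copied from the matrix above
    ("M1", "unauthorized", 401),
    ("M2", "missing_session_id", 400),
    ("M3", "session_not_found", 404),
    ("M4", "session_expired", 404),
    ("M5", "max_sessions", 429),
    ("M6", "model_not_ready", 503),
    ("M7", "bad_json", 400),
    ("M8", "invalid_action", 400),
    ("M9", "internal_error", 500),
    ("M10", "io_error", 500),
    ("M11", "payload_too_large", 413),
    ("M12", "reset_in_progress", 409),
]
RETRY_AFTER_MODES = {"M5"}  # only M5 carries Retry-After
ROWS = {mode: (slug, http) for mode, slug, http in ERROR_MODES}


def check_error_response(mode, status_code, body, headers):
    """Return a list of mismatches between a response and its matrix row."""
    slug, http = ROWS[mode]
    problems = []
    if status_code != http:
        problems.append(f"{mode}: status {status_code}, expected {http}")
    if body.get("error", {}).get("code") != slug:
        problems.append(f"{mode}: wrong error code, expected {slug}")
    if headers.get("Cache-Control") != "no-store":
        problems.append(f"{mode}: Cache-Control is not no-store")
    expected_retry = "30" if mode in RETRY_AFTER_MODES else None
    if headers.get("Retry-After") != expected_retry:
        problems.append(f"{mode}: Retry-After should be {expected_retry!r}")
    return problems


print(check_error_response(
    "M5",
    429,
    {"error": {"code": "max_sessions", "message": "cache full"}},
    {"Cache-Control": "no-store", "Retry-After": "30"},
))
```

Returning a list of mismatches instead of asserting inline makes a failing row report every deviation at once.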

	### 1.10 Lifespan eager load — `test_lifespan_eager_load.py`

	\| # \| Name \| Setup \| Assertion \|
	\|---\|---\|---\|---\|
	\| U27 \| `test_lifespan_loads_models_before_serving_requests` \| Instrument `audio.tts_kokoro.load` and `audio.asr_whisper.load` with call-counter. Start app via `LifespanManager`; issue `/reset` immediately after startup event fires. \| Call-counters `== 1` each before any request handler runs (assertion inside lifespan startup). Request returns 200, never 503. §7.3. \|
	\| U28 \| `test_step_before_lifespan_complete_returns_503_M6` \| Monkey-patch lifespan to defer model load; issue `/step` during the deferred window. \| `assert_error_envelope(resp, code="model_not_ready", http_status=503)`; matches M6. Confirms the guard exists before models are ready. \|

	---

	## 2. Property Tests

	Hypothesis-driven invariants on the deployment surface. Minimum 5 properties; this plan specifies 7 (two extra for margin).

	### 2.1 `P1` — `/step` is idempotent on invalid action (env state unchanged)

	Strategy: `invalid_action_strategy = hypothesis.strategies.dictionaries(...)` producing action bodies that fail pydantic validation (missing fields, wrong types, unknown `action_type`).

	Invariant:
	```
	pre_state = GET /state (turn = T)
	resp = POST /step with invalid action # → 400 M8
	post_state = GET /state
	assert pre_state == post_state # turn unchanged, drift_schedule unchanged
	assert resp.status_code == 400
	```

	Confirms §7.5 transactional step semantics: state only mutates after all work succeeds; a rejected action is a no-op.

	### 2.2 `P2` — Session expiration is monotonic and consistent

	Strategy: `st.integers(min_value=0, max_value=7200)` for synthetic elapsed seconds.

	Invariant:
	```
	For any elapsed ∈ [0, 7200]:
	    if elapsed < 3600: /step returns 200 (session alive)
	    if elapsed >= 3600: /step returns 404 M4 (session expired)
	Once expired, the session NEVER becomes alive again without a new /reset.
	```

	Tests monotone one-way transition: `alive → expired` is terminal. §3.2 TTL = 3600 s.

	### 2.3 `P3` — Error envelope shape is universal

	Strategy: parametrized across all 12 error-triggering inputs (from U26 matrix).

	Invariant: every error response satisfies:
	```
	body = resp.json()
	set(body.keys()) == {"error"}
	set(body["error"].keys()) >= {"code", "message"}
	isinstance(body["error"]["code"], str) and body["error"]["code"] != ""
	isinstance(body["error"]["message"], str)
	"traceback" not in json.dumps(body).lower()
	"bearer" not in body["error"]["message"].lower() # no token leakage
	```

	### 2.4 `P4` — `X-Session-Id` charset and length round-trip

	Strategy: `st.text(alphabet=string.ascii_letters + string.digits + "_-", min_size=1, max_size=64)` generates valid session ids; a second strategy generates invalid ones (containing `!@# `, length 0, length 65+).

	Invariant:
	```
	valid_sid → /reset returns 200
	invalid_sid → /reset returns 400 M2
	After /reset with valid_sid:
	    GET /state with the same sid returns 200
	    GET /state with ANY other sid returns 404 M3
	```

	### 2.5 `P5` — LRU eviction preserves cache size cap

	Strategy: `st.lists(st.text(alphabet=string.ascii_letters, min_size=8, max_size=16), min_size=11, max_size=50, unique=True)` — sequences of distinct session ids.

	Invariant: after POSTing `/reset` for every sid in the list (one at a time):
	```
	len(cache) == min(len(sids), 10)
	The 10 present sids are exactly the 10 most-recently-inserted (by last_touched).
	No env instance is leaked (every evicted env had .close() called exactly once).
	```

	### 2.6 `P6` — Reward field is float-or-null

	Strategy: parametrized over valid actions per `DriftCallAction` shape.

	Invariant: every `/step` 200-response body satisfies:
	```
	reward = body["reward"]
	assert reward is None or (isinstance(reward, float) and -1.0 <= reward <= 1.0)
	assert isinstance(body["done"], bool)
	```

	Pins §2.1.1 envelope: `reward: float \| null`, range aligned with `openenv.yaml` `reward.range: [-1.0, 1.0]` (§4.3).

	### 2.7 `P7` — Concurrent `/reset` on same sid never produces two envs

	Strategy: `hypothesis.stateful.RuleBasedStateMachine` driving concurrent `/reset` calls on the same `X-Session-Id` via `anyio.create_task_group`.

	Invariant:
	```
	Across N concurrent /reset calls on the same sid:
	    exactly one succeeds with 200 (winner)
	    the remaining N-1 return 409 M12 (reset_in_progress)
	    cache ends with exactly one env under that sid
	    no env instance is leaked
	```

	§7.1 per-session asyncio lock invariant.
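
The winner-plus-409 behaviour described by P7 can be illustrated with a per-session `asyncio.Lock` that is checked without waiting. This is a sketch under assumptions: `ResetGuard` is a hypothetical class, the `sleep` stands in for env construction, and the real handler may implement the guard differently.

```python
import asyncio


class ResetGuard:
    """Per-session lock: the first /reset wins, overlapping ones get 409."""

    def __init__(self):
        self._locks = {}
        self.envs = {}

    async def reset(self, sid):
        lock = self._locks.setdefault(sid, asyncio.Lock())
        if lock.locked():
            return 409  # M12 reset_in_progress
        # Acquiring a free lock does not suspend, so no other task can slip in
        # between the locked() check above and the acquisition here.
        async with lock:
            await asyncio.sleep(0.01)  # stand-in for building the env
            self.envs[sid] = object()
            return 200


async def race(n):
    guard = ResetGuard()
    results = await asyncio.gather(*(guard.reset("same-sid") for _ in range(n)))
    return results, guard


results, guard = asyncio.run(race(5))
print(sorted(results), len(guard.envs))
```

The losers return immediately instead of queueing behind the lock, which is what turns contention into a 409 and not a second env.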

	---

	## 3. Integration Tests

	Cross-cutting scenarios that exercise real subsystems. Marked `@pytest.mark.integration`; run in CI only, not in the fast `pytest tests/` loop.

	### 3.1 `I1` — End-to-end curl flow: `/reset` → 6× `/step` → `/state` → `/close`

	Mechanism: `subprocess.run(["curl", ...])` against a locally-booted FastAPI app (via `uvicorn` subprocess, port 7860). Uses the real `curl` binary to exercise headers + HTTP/1.1 semantics exactly as judges will.

	Flow:
	1. Start uvicorn in a subprocess, wait for `/healthz` to return `ok` (max 45 s, matches `HEALTHCHECK --start-period=45s` in §4.2).
	2. `curl -X POST /reset` with bearer + `X-Session-Id: e2e-001`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`. Assert 200.
	3. Loop 6 times: `curl -X POST /step` with a `tool_call` action. Assert 200 each time; accumulate `done` values.
	4. `curl /state`. Assert 200; `turn >= 6`.
	5. `curl -X POST /close`. Assert 200; `closed is True`.
	6. Kill uvicorn subprocess; assert no zombie process.

	Budget: single test must complete under 60 s including subprocess boot.
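
Step 1 of the flow is a bounded poll. A minimal sketch of that wait loop follows, with the clock and sleep injected so it can be tested without a server; `wait_until` and `healthz_ok` are hypothetical helper names, and the URL in the comment is only an example.

```python
import time
import urllib.error
import urllib.request


def healthz_ok(url):
    """Return True when the probe answers 200 with the exact body 'ok'."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200 and resp.read() == b"ok"
    except (urllib.error.URLError, OSError):
        return False  # server not up yet, or connection refused


def wait_until(probe, timeout_s, interval_s=0.5, clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True or timeout_s elapses; return the outcome."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Real use (assumed URL): wait_until(lambda: healthz_ok("http://127.0.0.1:7860/healthz"), 45)
# Demo with a fake probe that succeeds on the third attempt and a fake clock.
attempts = []
fake_now = [0.0]


def fake_probe():
    attempts.append(1)
    return len(attempts) >= 3


def fake_sleep(seconds):
    fake_now[0] += seconds


ready = wait_until(fake_probe, 45, 1.0, clock=lambda: fake_now[0], sleep=fake_sleep)
print(ready, len(attempts))
```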

	### 3.2 `I2` — Docker build locally + `openenv validate`

	Mechanism: `docker build -t driftcall-env:test -f DRIFTCALL/Dockerfile DRIFTCALL/` then `docker run -d -p 7860:7860 -e DRIFTCALL_ENV_TOKEN=test-token driftcall-env:test`, then `openenv validate http://localhost:7860 --auth-bearer test-token`.

	Assertions:
	1. `docker build` exits 0.
	2. Image size < 2 GB (`docker image inspect driftcall-env:test --format '{{.Size}}'` < `2 * 1024**3`).
	3. Container healthz returns `ok` within 60 s of `docker run`.
	4. `openenv validate` exits 0 and its stdout contains each of:
	   - `openenv.yaml parses, schema v1.0`
	   - `POST /reset` success line
	   - `POST /step` success line
	   - `GET /state` success line
	   - `POST /close` success line
	   - `6 endpoints validated, 0 errors`
	5. Container cleanup: `docker rm -f` in `finally` block.

	Gating: marked `@pytest.mark.skipif(shutil.which("docker") is None, reason="docker CLI not on PATH")` (pytest requires a `reason` when the condition is a boolean). Locally opt-in, mandatory in CI.

	### 3.3 `I3` — HF Space deploy dry-run (no actual push)

	Mechanism: `hf upload --dry-run <team>/driftcall-env . --repo-type=space`. Captures the file manifest that would be pushed.

	Assertions:
	1. Exit code 0.
	2. Manifest includes: `app.py`, `openenv.yaml`, `Dockerfile`, `requirements.txt`, `README.md`, `driftcall/` subtree.
	3. Manifest excludes: `tests/`, `training/`, `data/raw/`, `.env`, `.ipynb`, `.git/`.
	4. `README.md` YAML frontmatter contains required keys: `title`, `sdk: docker`, `app_port: 7860`, `emoji`, `colorFrom`, `colorTo` (§4.4).
	5. No actual network call to `huggingface.co` — enforced via `monkeypatch` on `huggingface_hub` outbound session to raise if reached.
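
Assertion 4 can be checked without a YAML dependency when the frontmatter is flat `key: value` pairs. The sketch below makes that assumption explicitly; the sample README content is invented for illustration, and a suite that already depends on PyYAML could use `yaml.safe_load` on the frontmatter block instead.

```python
REQUIRED_KEYS = {"title", "sdk", "app_port", "emoji", "colorFrom", "colorTo"}


def parse_frontmatter(text):
    """Parse flat 'key: value' frontmatter between leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return {}  # closing delimiter never found


def check_space_readme(text):
    """Return a list of problems with the Space README frontmatter."""
    fields = parse_frontmatter(text)
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(fields))]
    if "sdk" in fields and fields["sdk"] != "docker":
        problems.append("sdk must be docker")
    if "app_port" in fields and fields["app_port"] != "7860":
        problems.append("app_port must be 7860")
    return problems


# Hypothetical README content; only the keys and the sdk/app_port values come from §4.4.
SAMPLE = """---
title: DriftCall Env
emoji: 📞
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# DriftCall
"""

print(check_space_readme(SAMPLE))
```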

	### 3.4 `I4` — Concurrent 10-session load test

	Mechanism: `anyio.create_task_group` spawning 10 coroutines, each driving a unique `X-Session-Id` through `/reset` → 3× `/step` → `/close` against `TestClient(app)`.

	Assertions:
	1. All 10 `/reset` calls return 200 (cache is exactly at cap).
	2. An 11th concurrent `/reset` (while the first 10 are still `last_touched < TTL`) returns 429 M5 with `Retry-After: 30` (proves cap enforcement under contention).
	3. All 30 `/step` calls (3 × 10 sessions) return 200; no cross-session state bleed — each session's `observation.turn` progresses independently (`1, 2, 3`).
	4. All 10 `/close` calls return 200.
	5. Wall-clock budget: total test completes in < 30 s on CI 2-vCPU runner.

	### 3.5 `I5` — Cold-start lifespan blocks request serving until models loaded

	Mechanism: Instrument `audio.tts_kokoro.load` with an artificial 2 s `anyio.sleep`. Boot the app via `LifespanManager` and concurrently fire a `/reset` request at `t=0` (before startup completes).

	Assertions:
	1. The `/reset` request blocks until lifespan startup is complete — it does not return 503 during the loading window if `app.py` correctly awaits lifespan before accepting requests (this is the FastAPI default).
	2. If instead we disable the lifespan gate (test variant), the request returns 503 M6 with `code="model_not_ready"` — proves M6 is reachable and the guard is load-bearing.
	3. `/healthz` responds 200 throughout (probe endpoint is cheap and does not require models — §3.5 "unauthenticated").

	### 3.6 `I6` — TTL sweep liveness under sustained traffic

	Mechanism: Run the `TestClient` against the app for 70 s of simulated traffic (monotonic clock advanced via fixture), issuing one `/reset` per synthetic minute with a fresh `X-Session-Id`. Sweep runs every 60 s per §3.3.

	Assertions:
	1. After the 61st synthetic second, the first session's entry has been evicted by the sweep task.
	2. A `/step` on that first session returns 404 M4.
	3. The sweep task itself does not raise; logs contain exactly one "swept 1 expired session" structured log line per sweep cycle (§3.7 logging fields).

	---

	## 4. Coverage Target

	Targets (enforced in CI via `pytest --cov-fail-under`):

	\| Artifact \| Line coverage \| Branch coverage \|
	\|---\|---\|---\|
	\| `app.py` \| 100% \| ≥ 95% \|
	\| `driftcall/routes/reset.py` \| 100% \| ≥ 95% \|
	\| `driftcall/routes/step.py` \| 100% \| ≥ 95% \|
	\| `driftcall/routes/state.py` \| 100% \| ≥ 95% \|
	\| `driftcall/routes/close.py` \| 100% \| ≥ 95% \|
	\| `driftcall/routes/health.py` \| 100% \| 100% (trivial file) \|
	\| `driftcall/session_cache.py` \| 100% \| ≥ 95% \|

	Command:
	```
	pytest tests/test_deploy_env/ \
	    --cov=app \
	    --cov=driftcall.routes \
	    --cov=driftcall.session_cache \
	    --cov-branch \
	    --cov-report=term-missing \
	    --cov-fail-under=100
	```

	Branch-coverage carve-outs (documented pragmas, not silent): the `except asyncio.CancelledError: raise` guard at the bottom of the sweep task's loop is excluded via `# pragma: no cover` — re-raising a cancellation is standard-library contract and triggering it requires injecting a cancellation into the `lifespan` shutdown, which is covered by the lifespan test (I5) at the event-loop level.
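
For orientation, the guard in question has roughly the following shape. This is a sketch only: `sweep_loop` and its signature are assumptions, and the real task in `driftcall/session_cache.py` may differ. Since Python 3.8 `asyncio.CancelledError` derives from `BaseException`, so the explicit re-raise mainly documents that cancellation must propagate.

```python
import asyncio


async def sweep_loop(sweep, interval_s):
    """Run sweep() every interval_s seconds until the task is cancelled."""
    while True:
        try:
            await asyncio.sleep(interval_s)
            sweep()
        except asyncio.CancelledError:  # pragma: no cover
            # Never swallow cancellation: lifespan shutdown relies on it.
            raise


async def demo():
    count = [0]

    def sweep():
        count[0] += 1

    task = asyncio.create_task(sweep_loop(sweep, 0.01))
    await asyncio.sleep(0.05)
    task.cancel()  # what lifespan shutdown would do
    try:
        await task
    except asyncio.CancelledError:
        pass
    return count[0], task.cancelled()


sweeps, cancelled = asyncio.run(demo())
print(sweeps >= 1, cancelled)
```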

	Error-mode coverage ledger — every one of M1..M12 is raised by at least one test:

	\| Mode \| Raised by \| HTTP \|
	\|---\|---\|---\|
	\| M1 `unauthorized` \| U3, U4, U5, U6, U26 \| 401 \|
	\| M2 `missing_session_id` \| U7, U8, U9, U26, P4 \| 400 \|
	\| M3 `session_not_found` \| U16, U26, P4 \| 404 \|
	\| M4 `session_expired` \| U20, U26, P2, I6 \| 404 \|
	\| M5 `max_sessions` \| U25, U26, I4 \| 429 \|
	\| M6 `model_not_ready` \| U28, U26, I5 \| 503 \|
	\| M7 `bad_json` \| U12, U26 \| 400 \|
	\| M8 `invalid_action` \| U13, U17, U26, P1 \| 400 \|
	\| M9 `internal_error` \| U18, U26 \| 500 \|
	\| M10 `io_error` \| U26 (monkeypatched tmpfs full) \| 500 \|
	\| M11 `payload_too_large` \| U14, U26 \| 413 \|
	\| M12 `reset_in_progress` \| U26, P7 \| 409 \|

	HTTP status codes asserted at least once: `200, 400, 401, 404, 409, 413, 429, 500, 503` — all nine from §2.2.

	---

	## 5. Fixtures

	Defined in `tests/conftest.py` (project-wide) and imported by `tests/test_deploy_env/conftest.py`. Shared with `deploy_demo_space_tests.md` — any change here propagates there and vice versa.

	### 5.1 `fastapi_test_client`

	```
	import pytest
	from fastapi.testclient import TestClient


	@pytest.fixture
	def fastapi_test_client(monkeypatch, valid_bearer_token, stub_audio_models):
	    """
	    Boots the FastAPI app with lifespan, stub Kokoro+Whisper loaded,
	    and bearer token injected into app config.

	    Yields a `fastapi.testclient.TestClient` that supports all HTTP verbs
	    against the live app (in-process, no socket).

	    Lifecycle: entering `TestClient` as a context manager fires the
	    lifespan startup/shutdown events; the cache is flushed between tests
	    via the autouse cache-reset fixture.
	    """
	    # Set the token before the app module is imported.
	    monkeypatch.setenv("DRIFTCALL_ENV_TOKEN", valid_bearer_token)
	    from app import app

	    with TestClient(app) as client:
	        yield client
	```

	Used by: every unit test in §1, properties P1–P7, integration tests I1, I4, I5, I6.

	### 5.2 `valid_bearer_token`

	```
	import secrets

	import pytest


	@pytest.fixture(scope="session")
	def valid_bearer_token() -> str:
	    """A freshly-generated URL-safe token, session-scoped so it is stable
	    across tests in one pytest run but distinct between runs."""
	    return secrets.token_urlsafe(32)
	```

	Used by: every test that asserts 200 on a mutating endpoint, plus the "bad bearer" tests (which receive `valid_bearer_token + "x"` as the wrong token).

	### 5.3 `session_id_alpha`

	```
	@pytest.fixture
	def session_id_alpha() -> str:
	    """Deterministic session id for tests that only need one sid."""
	    return "session-alpha-0001"
	```

	Charset and length both pass the header validator (§2.1 headers table).

	### 5.4 `session_id_beta`

	```
	@pytest.fixture
	def session_id_beta() -> str:
	    """Second deterministic session id for cross-session tests
	    (e.g., asserting no state bleed between alpha and beta)."""
	    return "session-beta-0002"
	```

	### 5.5 Helper fixtures (non-shared, internal to this test package)

	- `stub_audio_models` — monkeypatches `audio.tts_kokoro.load` and `audio.asr_whisper.load` to return lightweight stubs so lifespan completes in < 50 ms. Used everywhere except I5 (which tests real-ish load behavior).
	- `monotonic_clock` — monkeypatches `time.monotonic()` to advance deterministically; used by U20, U24, P2, I6.
	- `cache_reset` (autouse) — clears `session_cache._store` between tests; prevents cross-test bleed.
	- `assert_error_envelope(resp, code, http_status)` — imported helper, asserts envelope shape + `Cache-Control: no-store` header + optional `Retry-After` when `code == "max_sessions"`.
	- `one_mib_plus_one_body` — precomputed `bytes` payload for U14 (M11 oversize test).
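
As an illustration of the `monotonic_clock` helper listed above, the following sketch patches `time.monotonic` with a controllable fake. `FakeMonotonic` and `session_is_expired` are hypothetical names; inside a pytest fixture the patch would typically be `monkeypatch.setattr(time, "monotonic", clock)` instead of `unittest.mock.patch`.

```python
import time
from unittest import mock


class FakeMonotonic:
    """Callable clock that only moves when advance() is called."""

    def __init__(self, start=1000.0):
        self.now = start

    def __call__(self):
        return self.now

    def advance(self, seconds):
        if seconds < 0:
            raise ValueError("a monotonic clock never goes backwards")
        self.now += seconds


def session_is_expired(last_touched, ttl_s=3600.0):
    """Expiry rule from P2: a session is expired once its age reaches the TTL."""
    return time.monotonic() - last_touched >= ttl_s


clock = FakeMonotonic()
with mock.patch("time.monotonic", clock):
    t0 = time.monotonic()
    alive = not session_is_expired(t0)
    clock.advance(3601)  # same jump that U20 and U24 use
    expired = session_is_expired(t0)

print(alive, expired)
```

The patch only affects code that looks up `time.monotonic` at call time; a module that did `from time import monotonic` at import would keep the real clock, which is one reason to route clock reads through a single place.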

	Fixture ownership note: `fastapi_test_client`, `valid_bearer_token`, `session_id_alpha`, `session_id_beta` live in `tests/conftest.py` at the project root and are the shared set with `deploy_demo_space_tests.md`. Helper fixtures (§5.5) are local to `tests/test_deploy_env/conftest.py` and are not shared.

	---

	## 6. Non-goals (out of scope for this plan)

	- Deep per-field validation of `DriftCallObservation` / `DriftCallAction` / `DriftCallState` — owned by `env_tests.md` + `models_tests.md`.
	- Reward math correctness — owned by `rewards_tests.md`.
	- Kokoro / Whisper model quality — owned by `audio_tests.md`.
	- Actual HF Hub pushes — forbidden in tests (§3.3 dry-run only); real push happens in Batch C3 manual verification.
	- GPU behavior — deployment is CPU-only (deploy_env_space.md §1, §6.5 explicit non-dependency).
	- Cross-worker cache coherence — documented as acceptable 404 path in §3.2 of the spec; not a test target for this hackathon (future hardening).