# deploy_env_space_tests.md — Test Plan for `docs/modules/deploy_env_space.md`

**Target artifact:** `app.py` (FastAPI entrypoint) + `driftcall/routes/*.py` (per-endpoint handlers: `reset.py`, `step.py`, `state.py`, `close.py`, `health.py`) + `driftcall/session_cache.py` (in-process session cache + eviction sweep) + `Dockerfile` + `openenv.yaml`

**Spec doc:** `DRIFTCALL/docs/modules/deploy_env_space.md` (final, sealed 2026-04-24)

**Framework:** `pytest` + `httpx` (via `fastapi.testclient.TestClient`) + `hypothesis` (properties) + `docker` CLI (integration only)

**Owner:** Person B (Rewards & Tests) — domain-reviewed by Person D (Deploy & Story)

**Implements:** deploy_env_space.md §2 (interface), §3 (behavior), §4 (data structures), §5 (error modes M1–M12), §7 (edge cases); `DRIFTCALL/CLAUDE.md §3.1` (nine-section test-plan doc — this plan supplies the five required sections: Unit, Property, Integration, Coverage, Fixtures).

**Coverage targets:** **100% line** + **≥ 95% branch** on `app.py` + `driftcall/routes/*.py` + `driftcall/session_cache.py`. All 12 error modes **M1–M12** must be raised by at least one test.

**Numeric invariants:** HTTP status codes are exact integers (200, 400, 401, 404, 409, 413, 429, 500, 503). TTL values in tests use `time.monotonic()`, monkey-patched via a `freezegun`-style fixture; wall-clock time is never read directly. Bearer tokens are `secrets.token_urlsafe(32)` strings. Hardcoded magic values are never used outside the `valid_bearer_token` fixture.

**Mandatory assertion on every error response:** the body parses as `{"error": {...}}`, whose `error` object carries at least a `code` slug and a `message` string (M9 additionally carries `request_id`; see U18). In addition, `resp.headers["Cache-Control"] == "no-store"`. The helper `assert_error_envelope(resp, code, http_status)` enforces both checks, and every error-path test calls it.

**Mandatory assertion on every success response:** `resp.headers["Content-Type"].startswith("application/json")` (except `/healthz`, which is `text/plain`).
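The following is a minimal sketch of what `assert_error_envelope` could look like. It is a hypothetical implementation, not the project's. It assumes only that `resp` exposes `.status_code`, `.headers` and `.text`, which the `TestClient` response does. The sketch accepts extra keys in the `error` object so that M9's `request_id` passes. It also applies the Retry-After rule described for the helper in §5.5 and for the U26 matrix: the header is present with value `"30"` only for `max_sessions`.

```python
import json
from typing import Any


def assert_error_envelope(resp: Any, code: str, http_status: int) -> dict:
    """Hypothetical shared assertion for every error-path test.

    `resp` needs `.status_code`, `.headers` and `.text`, as on the response
    returned by `fastapi.testclient.TestClient`.
    """
    assert resp.status_code == http_status, (resp.status_code, http_status)
    body = json.loads(resp.text)
    # Top level is exactly {"error": {...}}.
    assert set(body) == {"error"}, body
    err = body["error"]
    # code + message are mandatory; extra keys (e.g. M9's request_id) are tolerated.
    assert {"code", "message"} <= set(err), err
    assert err["code"] == code, err
    assert isinstance(err["message"], str) and err["message"] != ""
    assert resp.headers["Cache-Control"] == "no-store"
    # Only M5 (max_sessions) carries Retry-After.
    if code == "max_sessions":
        assert resp.headers["Retry-After"] == "30"
    else:
        assert "Retry-After" not in resp.headers
    return body
```

Returning the parsed body lets a test such as U18 make follow-up checks (for example, on `request_id`) without decoding the response a second time.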
The shared fixtures in §5 (§5.1–§5.4) are also used by `deploy_demo_space_tests.md`, with the same names and the same canonicalised content. If any of them changes here, the shared copy in `tests/conftest.py` MUST be updated in lockstep, and `deploy_demo_space_tests.md §5` must be cross-checked.

---

## 1. Unit Tests

**Organisation:** one `pytest` sub-package under `tests/test_deploy_env/`, mirroring the route layout:

```
tests/test_deploy_env/
  __init__.py
  conftest.py                   # fixtures from §5, plus assert_error_envelope helper
  test_healthz.py               # /healthz — unauthenticated, cheap
  test_auth.py                  # bearer enforcement across all authenticated endpoints
  test_session_header.py        # X-Session-Id header validation
  test_reset.py                 # POST /reset happy + error paths
  test_step.py                  # POST /step happy + error paths
  test_state.py                 # GET /state happy + error paths
  test_close.py                 # POST /close happy + error paths
  test_body_schemas.py          # §2.1.1 shape conformance (envelope, not inner dataclass)
  test_session_cache_unit.py    # LRU, TTL, eviction sweep — direct cache tests
  test_error_modes_mapping.py   # M1..M12 matrix — every error mode hit at least once
  test_status_code_map.py       # every row of §2.2 table asserted
  test_lifespan_eager_load.py   # app.py lifespan loads Kokoro+Whisper BEFORE serving
```

**Unit test case inventory — 28 cases total (exceeds the ≥ 20 requirement).**

### 1.1 `/healthz` — `test_healthz.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U1 | `test_healthz_returns_200_plaintext_ok` | No auth header. | `resp.status_code == 200`; `resp.text == "ok"`; `resp.headers["Content-Type"].startswith("text/plain")`; the endpoint does **not** require a bearer token (§3.5 "unauthenticated"). |
| U2 | `test_healthz_works_when_models_loaded` | Lifespan fixture loads stub Kokoro+Whisper. | `resp.status_code == 200`; no 503 is raised, even for an unauthenticated request. Confirms that the `/healthz` liveness probe is independent of the model-readiness gate. |

### 1.2 Bearer auth — `test_auth.py`

Applies to every authenticated endpoint (`/reset`, `/step`, `/state`, `/close`); only `/healthz` is exempt.

| # | Name | Setup | Assertion |
|---|---|---|---|
| U3 | `test_reset_missing_authorization_returns_401_M1` | POST `/reset` with `X-Session-Id` but **no** `Authorization` header. | `assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches **M1**. |
| U4 | `test_step_bad_bearer_returns_401_M1` | POST `/step` with a wrong bearer token (`valid_bearer_token + "x"`, per §5.2). | `assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches **M1**. The body must **not** leak the expected token. |
| U5 | `test_state_missing_bearer_returns_401_M1` | GET `/state` with no `Authorization` header. | `assert_error_envelope(resp, code="unauthorized", http_status=401)`. |
| U6 | `test_close_wrong_scheme_returns_401_M1` | POST `/close` with `Authorization: Basic ` (wrong scheme). | `assert_error_envelope(resp, code="unauthorized", http_status=401)`. Only the `Bearer` scheme is accepted (§3.5). |

### 1.3 `X-Session-Id` header — `test_session_header.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U7 | `test_reset_missing_x_session_id_returns_400_M2` | POST `/reset` with a valid bearer and **no** `X-Session-Id`. | `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches **M2**. |
| U8 | `test_step_malformed_x_session_id_returns_400_M2` | POST `/step` with `X-Session-Id: "bad session!"` (the space and `!` violate the `[A-Za-z0-9_-]` charset). | `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches **M2** (treated as "not a valid session id"). |
| U9 | `test_step_x_session_id_over_64_chars_returns_400_M2` | POST `/step` with an `X-Session-Id` of length 65. | `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches **M2**. |

### 1.4 `POST /reset` — `test_reset.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U10 | `test_reset_happy_path_returns_200_and_observation_envelope` | Valid bearer, `X-Session-Id: session_id_alpha`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`. | `resp.status_code == 200`; body top-level keys `== {"observation", "episode_id", "max_turns"}`; `episode_id` is a uuid4 string; `max_turns` is an `int` with 1 ≤ value ≤ 16; `observation` is a dict. Envelope conformance per §2.1.1. |
| U11 | `test_reset_with_language_weights_returns_200` | Valid bearer, valid session id, body `{"config": {"language_weights": {"hi": 0.5, "ta": 0.5}}}`. | `resp.status_code == 200`. If the response echoes the reset config (e.g., `info.config_echo`, when exposed), assert that the requested language weights appear in it; otherwise assert the envelope only. |
| U12 | `test_reset_bad_json_returns_400_M7` | POST `/reset` with body `b"{not json"` and `Content-Type: application/json`. | `assert_error_envelope(resp, code="bad_json", http_status=400)`; matches **M7**. |
| U13 | `test_reset_invalid_curriculum_stage_returns_400_M8` | Body `{"config": {"curriculum_stage": 99}}`. | `assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches **M8** (dataclass validation failure on the reset config). |
| U14 | `test_reset_payload_over_1mib_returns_413_M11` | Body size = 1 MiB + 1 byte (padded `config` dict). | `assert_error_envelope(resp, code="payload_too_large", http_status=413)`; matches **M11**. |

### 1.5 `POST /step` — `test_step.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U15 | `test_step_happy_path_returns_200` | Session pre-created via `/reset`; body `{"action": {"action_type": "tool_call", "tool_name": "airline.search", "tool_args": {}}}`. | `resp.status_code == 200`; body keys `== {"observation", "reward", "done", "info"}`; `reward` is a `float` **or** `None`; `done` is a `bool`. Envelope per §2.1.1. |
| U16 | `test_step_unknown_session_returns_404_M3` | No prior `/reset`; POST `/step` with `X-Session-Id: never-existed-0001`. | `assert_error_envelope(resp, code="session_not_found", http_status=404)`; matches **M3**. |
| U17 | `test_step_invalid_action_shape_returns_400_M8` | Session pre-created; body `{"action": {"action_type": "tool_call"}}` (missing `tool_name`). | `assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches **M8**. |
| U18 | `test_step_internal_exception_returns_500_M9_no_stacktrace` | Monkey-patch `env.step` to raise `RuntimeError("boom")`. | `assert_error_envelope(resp, code="internal_error", http_status=500)`; matches **M9**. `"boom"` does **not** appear in the body (stack-trace suppression, §5 rule 1). `resp.json()["error"]["request_id"]` is present (ASGI scope id). |

### 1.6 `GET /state` — `test_state.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U19 | `test_state_happy_path_returns_200` | Session pre-created via `/reset`, followed by two `/step`s. | `resp.status_code == 200`; body keys `== {"state", "turn"}`; `turn == 2` (int). Envelope per §2.1.1. |
| U20 | `test_state_expired_session_returns_404_M4` | Session exists at `t0`; monotonic clock advanced by 3601 s via fixture; sweep runs; GET `/state`. | `assert_error_envelope(resp, code="session_expired", http_status=404)`; matches **M4**. |

### 1.7 `POST /close` — `test_close.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U21 | `test_close_happy_path_returns_200_and_final_state` | Session pre-created. | `resp.status_code == 200`; body keys `== {"closed", "final_state"}`; `closed is True`; `final_state` is a dict. |
| U22 | `test_close_on_already_evicted_session_returns_200_with_null_final_state` | The sweep evicted the session before `/close` arrives. | `resp.status_code == 200`; `resp.json() == {"closed": True, "final_state": None}` (§2.1.1 "null if session was already evicted"). |

### 1.8 Session cache direct unit tests — `test_session_cache_unit.py`

U23 and U24 bypass HTTP and call the cache API directly, pinning the policy invariants from §3.2. U25 drives `/reset`, because it asserts the HTTP mapping of M5.

| # | Name | Setup | Assertion |
|---|---|---|---|
| U23 | `test_cache_lru_eviction_on_11th_session` | Fill the cache with sessions `s0..s9` (max=10); insert `s10`. | Cache size remains `== 10`; `s0` (oldest `last_touched`) is evicted; `s10` is present; `env.close()` was called on the evicted entry (spy assertion). §3.2 invariant. |
| U24 | `test_cache_ttl_sweep_evicts_stale_entries` | Insert `s_old` at `t0`; advance the monotonic clock by 3601 s; call `cache.sweep()`. | `s_old` is no longer in the cache; the spy confirms `env.close()` was called; the cache remains internally consistent (len == 0). §3.3. |
| U25 | `test_cache_max_sessions_returns_429_M5_with_retry_after` | Cache full of 10 fresh sessions (all touched < 1 s ago); POST `/reset` with a new `X-Session-Id`. | `resp.status_code == 429`; `assert_error_envelope(resp, code="max_sessions", http_status=429)`; `resp.headers["Retry-After"] == "30"` (only M5 carries Retry-After, per the §5 rules). Matches **M5**. |

### 1.9 Error-mode matrix — `test_error_modes_mapping.py`

One parametrized test asserts that each of **M1..M12** is reachable and returns the expected HTTP code and slug. Parameters:

```
[
    ("M1",  "unauthorized",       401, ),
    ("M2",  "missing_session_id", 400, ),
    ("M3",  "session_not_found",  404, ),
    ("M4",  "session_expired",    404, ),
    ("M5",  "max_sessions",       429, ),
    ("M6",  "model_not_ready",    503, ),
    ("M7",  "bad_json",           400, ),
    ("M8",  "invalid_action",     400, ),
    ("M9",  "internal_error",     500, ),
    ("M10", "io_error",           500, ),
    ("M11", "payload_too_large",  413, ),
    ("M12", "reset_in_progress",  409, ),
]
```

| # | Name | Setup | Assertion |
|---|---|---|---|
| U26 | `test_error_modes_M1_through_M12_full_matrix` | Parametrized over the 12 tuples above. | For every row: `resp.status_code == expected_http`; `resp.json()["error"]["code"] == expected_slug`; `resp.headers["Cache-Control"] == "no-store"`; `resp.headers.get("Retry-After")` is `"30"` if the row is M5 and absent otherwise. |

### 1.10 Lifespan eager load — `test_lifespan_eager_load.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U27 | `test_lifespan_loads_models_before_serving_requests` | Instrument `audio.tts_kokoro.load` and `audio.asr_whisper.load` with call counters. Start the app via `LifespanManager`; issue `/reset` immediately after the startup event fires. | Each call counter is `== 1` **before** any request handler runs (asserted inside lifespan startup). The request returns 200, never 503. §7.3. |
| U28 | `test_step_before_lifespan_complete_returns_503_M6` | Monkey-patch the lifespan to defer model loading; issue `/step` during the deferred window. | `assert_error_envelope(resp, code="model_not_ready", http_status=503)`; matches **M6**. Confirms the guard is active before the models are ready. |

---

## 2. Property Tests

Hypothesis-driven invariants on the deployment surface. The minimum is **5 properties**; this plan specifies **7** (two extra for margin).

### 2.1 `P1` — `/step` with an invalid action is a no-op (env state unchanged)

**Strategy:** `invalid_action_strategy = hypothesis.strategies.dictionaries(...)`, producing action bodies that fail pydantic validation (missing fields, wrong types, unknown `action_type`).

**Invariant:**

```
pre_state  = GET /state                        (turn = T)
resp       = POST /step with invalid action    # → 400 M8
post_state = GET /state
assert pre_state == post_state                 # turn unchanged, drift_schedule unchanged
assert resp.status_code == 400
```

Confirms the §7.5 transactional step semantics: state mutates only after all work succeeds, so a rejected action is a no-op.

### 2.2 `P2` — Session expiration is monotonic and consistent

**Strategy:** `st.integers(min_value=0, max_value=7200)` for synthetic elapsed seconds.
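The boundary this strategy sweeps can be pinned with a pure-Python oracle, and each HTTP result can then be compared against it. This is a sketch under stated assumptions. `TTL_S`, `session_status`, `expected_step_status` and `is_one_way` are hypothetical names. The `elapsed >= 3600` boundary comes from the invariant below. `elapsed` is measured from the session's last touch, assuming that a successful `/step` refreshes `last_touched`; each example should therefore start from a fresh session.

```python
TTL_S = 3600  # session TTL from §3.2, in seconds (hypothetical constant name)


def session_status(last_touched: float, now: float, ttl_s: float = TTL_S) -> str:
    """Classify a session purely from monotonic timestamps."""
    elapsed = now - last_touched
    if elapsed < 0:
        raise ValueError("monotonic clock went backwards")
    return "alive" if elapsed < ttl_s else "expired"


def expected_step_status(elapsed: int) -> int:
    """HTTP status that P2 expects from /step after `elapsed` idle seconds."""
    return 200 if session_status(0.0, float(elapsed)) == "alive" else 404


def is_one_way(statuses: list[int]) -> bool:
    """True if, read in time order, no 200 ever follows a 404 (expired is terminal)."""
    seen_expired = False
    for status in statuses:
        if status == 404:
            seen_expired = True
        elif seen_expired:
            return False
    return True
```

For example, `is_one_way([expected_step_status(e) for e in range(0, 7201)])` checks the whole strategy range at once.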
**Invariant:**

```
For any elapsed ∈ [0, 7200]:
  if elapsed <  3600: /step returns 200 (session alive)
  if elapsed >= 3600: /step returns 404 M4 (session expired)
Once expired, the session NEVER becomes alive again without a new /reset.
```

This pins a one-way transition: `alive → expired` is terminal. §3.2 TTL = 3600 s.

### 2.3 `P3` — Error envelope shape is universal

**Strategy:** parametrized across all 12 error-triggering inputs (from the U26 matrix).

**Invariant:** every error response satisfies:

```
body = resp.json()
set(body.keys()) == {"error"}
set(body["error"].keys()) >= {"code", "message"}
isinstance(body["error"]["code"], str) and body["error"]["code"] != ""
isinstance(body["error"]["message"], str)
"traceback" not in json.dumps(body).lower()
"bearer" not in body["error"]["message"].lower()   # no token leakage
```

### 2.4 `P4` — `X-Session-Id` charset and length round-trip

**Strategy:** `st.text(alphabet=string.ascii_letters + string.digits + "_-", min_size=1, max_size=64)` generates valid session ids. A second strategy generates invalid ones (containing `!@# `, length 0, length 65+).

**Invariant:**

```
valid_sid   → /reset returns 200
invalid_sid → /reset returns 400 M2
After /reset with valid_sid:
  GET /state with the same sid        returns 200
  GET /state with ANY other valid sid returns 404 M3
```

### 2.5 `P5` — LRU eviction preserves the cache size cap

**Strategy:** `st.lists(st.text(alphabet=string.ascii_letters, min_size=8, max_size=16), min_size=11, max_size=50, unique=True)`, i.e. sequences of distinct session ids.

**Invariant:** after POSTing `/reset` for every sid in the list (one at a time):

```
len(cache) == min(len(sids), 10)
The 10 present sids are exactly the 10 most-recently-inserted (by last_touched).
No env instance is leaked (every evicted env had .close() called exactly once).
```

### 2.6 `P6` — Reward field is float-or-null

**Strategy:** parametrized over valid actions per the `DriftCallAction` shape.
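The invariant below relies on a subtlety of Python's `json` module, which decodes `1` as `int` and `1.0` as `float`. An integral reward serialized without a decimal point therefore fails the `isinstance(reward, float)` check. A `NaN` reward, which `json.loads` accepts by default, fails the range check. The following is a minimal sketch of the check as standalone predicates; `reward_is_valid` and `step_body_is_valid` are hypothetical names.

```python
import json


def reward_is_valid(reward: object) -> bool:
    """P6: reward is JSON null, or a float in [-1.0, 1.0]."""
    if reward is None:
        return True
    # json.loads returns int for `1`, so only decimal literals pass the float check.
    # bool is not a float subclass, so `true`/`false` are rejected as well.
    # NaN compares False against both bounds, so it fails the range check.
    return isinstance(reward, float) and -1.0 <= reward <= 1.0


def step_body_is_valid(raw: str) -> bool:
    """Apply the P6 invariant to a raw /step 200-response body."""
    body = json.loads(raw)
    return reward_is_valid(body["reward"]) and isinstance(body["done"], bool)
```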
**Invariant:** every `/step` 200-response body satisfies:

```
reward = body["reward"]
assert reward is None or (isinstance(reward, float) and -1.0 <= reward <= 1.0)
assert isinstance(body["done"], bool)
```

Pins the §2.1.1 envelope (`reward: float | null`), with the range aligned to `reward.range: [-1.0, 1.0]` in `openenv.yaml` (§4.3).

### 2.7 `P7` — Concurrent `/reset` on the same sid never produces two envs

**Strategy:** a `hypothesis.stateful.RuleBasedStateMachine` that drives concurrent `/reset` calls on the same `X-Session-Id` via `anyio.create_task_group`.

**Invariant:**

```
Across N concurrent /reset calls on the same sid:
  exactly one succeeds with 200 (winner)
  the remaining N-1 return 409 M12 (reset_in_progress)
  the cache ends with exactly one env under that sid
  no env instance is leaked
```

This is the §7.1 per-session asyncio lock invariant.

---

## 3. Integration Tests

These are cross-cutting scenarios that exercise real subsystems. They are marked `@pytest.mark.integration` and run in CI only; they are deselected from the fast `pytest tests/` loop (e.g., with `-m "not integration"`).

### 3.1 `I1` — End-to-end curl flow: `/reset` → 6× `/step` → `/state` → `/close`

**Mechanism:** `subprocess.run(["curl", ...])` against a locally booted FastAPI app (a `uvicorn` subprocess on port 7860). Using the **real** `curl` binary exercises headers and HTTP/1.1 semantics exactly as the judges will.

**Flow:**

1. Start uvicorn in a subprocess and wait for `/healthz` to return `ok` (max 45 s, matching `HEALTHCHECK --start-period=45s` in §4.2).
2. `curl -X POST /reset` with the bearer token + `X-Session-Id: e2e-001`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`. Assert 200.
3. Loop 6 times: `curl -X POST /step` with a `tool_call` action. Assert 200 each time; accumulate the `done` values.
4. `curl /state`. Assert 200 and `turn >= 6`.
5. `curl -X POST /close`. Assert 200 and `closed is True`.
6. Kill the uvicorn subprocess; assert that no zombie process remains.

**Budget:** the test must complete in under 60 s, including subprocess boot.
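Step 1 of the flow polls `/healthz` until it answers `ok`, within 45 s, and can be sketched with the standard library alone. `wait_for_healthz` is a hypothetical helper. Its `fetch`, `clock` and `sleep` parameters exist only so the loop can be exercised without a live server. The deadline uses `time.monotonic()`, following the no-wall-clock rule at the top of this plan.

```python
import time
import urllib.request
from typing import Callable


def _fetch_text(url: str) -> str:
    """GET `url` and return the decoded body (2 s socket timeout per attempt)."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.read().decode("utf-8")


def wait_for_healthz(
    url: str,
    timeout_s: float = 45.0,
    poll_s: float = 0.5,
    fetch: Callable[[str], str] = _fetch_text,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `url` until it returns "ok"; return seconds waited or raise TimeoutError."""
    start = clock()
    while True:
        try:
            if fetch(url).strip() == "ok":
                return clock() - start
        except OSError:
            # URLError and connection refused/reset are OSError subclasses;
            # they are expected while uvicorn is still booting.
            pass
        if clock() - start >= timeout_s:
            raise TimeoutError(f"{url} not healthy after {timeout_s} s")
        sleep(poll_s)
```

In the test itself, a call such as `wait_for_healthz("http://127.0.0.1:7860/healthz")` would run right after the uvicorn subprocess is spawned, and any remaining time would count against the 60 s budget.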
### 3.2 `I2` — Local Docker build + `openenv validate`

**Mechanism:** `docker build -t driftcall-env:test -f DRIFTCALL/Dockerfile DRIFTCALL/`, then `docker run -d -p 7860:7860 -e DRIFTCALL_ENV_TOKEN=test-token driftcall-env:test`, then `openenv validate http://localhost:7860 --auth-bearer test-token`.

**Assertions:**

1. `docker build` exits 0.
2. Image size < 2 GB (`docker image inspect driftcall-env:test --format '{{.Size}}'` < `2 * 1024**3` bytes).
3. The container's `/healthz` returns `ok` within 60 s of `docker run`.
4. `openenv validate` exits 0, and its stdout contains each of:
   - `openenv.yaml parses, schema v1.0`
   - `POST /reset` success line
   - `POST /step` success line
   - `GET /state` success line
   - `POST /close` success line
   - `6 endpoints validated, 0 errors`
5. Container cleanup: `docker rm -f` in a `finally` block.

**Gating:** marked `@pytest.mark.skipif(shutil.which("docker") is None, reason="docker CLI not found")`. The test is opt-in locally and mandatory in CI. pytest requires a `reason=` string whenever the `skipif` condition is a boolean.

### 3.3 `I3` — HF Space deploy dry-run (no actual push)

**Mechanism:** `hf upload --dry-run /driftcall-env . --repo-type=space`. Captures the file manifest that **would** be pushed.

**Assertions:**

1. Exit code 0.
2. The manifest includes: `app.py`, `openenv.yaml`, `Dockerfile`, `requirements.txt`, `README.md`, and the `driftcall/` subtree.
3. The manifest **excludes**: `tests/`, `training/`, `data/raw/`, `.env*`, `*.ipynb`, `.git/`.
4. The `README.md` YAML frontmatter contains the required keys: `title`, `sdk: docker`, `app_port: 7860`, `emoji`, `colorFrom`, `colorTo` (§4.4).
5. **No actual network call** to `huggingface.co`. This is enforced via a `monkeypatch` on the `huggingface_hub` outbound session that raises if it is reached. A `monkeypatch` only affects the test process, so this guard holds only when the dry-run is driven in-process rather than through an `hf` CLI subprocess.

### 3.4 `I4` — Concurrent 10-session load test

**Mechanism:** `anyio.create_task_group` spawns 10 coroutines. Each one drives a unique `X-Session-Id` through `/reset` → 3× `/step` → `/close` against the in-process app. The coroutines need an async client, for example `httpx.AsyncClient` over `httpx.ASGITransport(app=app)`, with the lifespan driven by `LifespanManager`. A synchronous `TestClient(app)` call blocks the event loop until it returns, so the requests would not actually overlap.

**Assertions:**

1. All 10 `/reset` calls return 200 (the cache is exactly at cap).
2. An 11th concurrent `/reset`, issued while the first 10 still have `last_touched < TTL`, returns **429 M5** with `Retry-After: 30`. This proves cap enforcement under contention.
3. All 30 `/step` calls (3 × 10 sessions) return 200 with no cross-session state bleed: each session's `observation.turn` progresses independently (`1, 2, 3`).
4. All 10 `/close` calls return 200.
5. Wall-clock budget: the whole test completes in < 30 s on the CI 2-vCPU runner.

### 3.5 `I5` — Cold-start lifespan blocks request serving until models are loaded

**Mechanism:** instrument `audio.tts_kokoro.load` with an artificial 2 s `anyio.sleep`. Boot the app via `LifespanManager` and concurrently fire a `/reset` request at `t=0`, before startup completes.

**Assertions:**

1. The `/reset` request **blocks** until lifespan startup is complete. It does **not** return 503 during the loading window, provided `app.py` awaits lifespan startup before accepting requests. The ASGI server provides this ordering: `uvicorn` completes the lifespan startup event before it starts serving.
2. In a variant with the lifespan gate disabled, the request returns **503 M6** with `code="model_not_ready"`. This proves that M6 is reachable and that the guard is load-bearing.
3. `/healthz` responds 200 throughout (the probe endpoint is cheap and does not require models — §3.5 "unauthenticated").

### 3.6 `I6` — TTL sweep liveness under sustained traffic

**Mechanism:** run the `TestClient` against the app for 70 s of simulated traffic (monotonic clock advanced via fixture), issuing one `/reset` per synthetic minute with a fresh `X-Session-Id`. Per §3.3, the sweep runs every 60 s.

**Assertions:**

1. After the 61st synthetic second, the sweep task has evicted the first session's entry.
2. A `/step` on that first session returns 404 M4.
3. The sweep task itself does not raise. The logs contain exactly one "swept 1 expired session" structured log line per sweep cycle (§3.7 logging fields).

---

## 4. Coverage Target

**Targets (enforced in CI via `pytest --cov-fail-under`):**

| Artifact | Line coverage | Branch coverage |
|---|---|---|
| `app.py` | **100%** | **≥ 95%** |
| `driftcall/routes/reset.py` | **100%** | **≥ 95%** |
| `driftcall/routes/step.py` | **100%** | **≥ 95%** |
| `driftcall/routes/state.py` | **100%** | **≥ 95%** |
| `driftcall/routes/close.py` | **100%** | **≥ 95%** |
| `driftcall/routes/health.py` | **100%** | **100%** (trivial file) |
| `driftcall/session_cache.py` | **100%** | **≥ 95%** |

**Command:**

```
pytest tests/test_deploy_env/ \
  --cov=app \
  --cov=driftcall.routes \
  --cov=driftcall.session_cache \
  --cov-branch \
  --cov-report=term-missing \
  --cov-fail-under=100
```

With `--cov-branch`, coverage.py reports a single combined line + branch percentage, and `--cov-fail-under` gates on that total. `--cov-fail-under=100` therefore demands 100% of branches as well. This flag does not check the per-file ≥ 95% branch floors in the table individually.

**Branch-coverage carve-outs (documented pragmas, not silent):** the `except asyncio.CancelledError: raise` guard at the bottom of the sweep task's loop is excluded via `# pragma: no cover`. Re-raising a cancellation is standard-library contract. Triggering it requires injecting a cancellation into the `lifespan` shutdown, which the lifespan test (I5) covers at the event-loop level.

**Error-mode coverage ledger — every one of M1..M12 is raised by at least one test:**

| Mode | Raised by | HTTP |
|---|---|---|
| M1 `unauthorized` | U3, U4, U5, U6, U26 | 401 |
| M2 `missing_session_id` | U7, U8, U9, U26, P4 | 400 |
| M3 `session_not_found` | U16, U26, P4 | 404 |
| M4 `session_expired` | U20, U26, P2, I6 | 404 |
| M5 `max_sessions` | U25, U26, I4 | 429 |
| M6 `model_not_ready` | U28, U26, I5 | 503 |
| M7 `bad_json` | U12, U26 | 400 |
| M8 `invalid_action` | U13, U17, U26, P1 | 400 |
| M9 `internal_error` | U18, U26 | 500 |
| M10 `io_error` | U26 (monkeypatched tmpfs full) | 500 |
| M11 `payload_too_large` | U14, U26 | 413 |
| M12 `reset_in_progress` | U26, P7 | 409 |

**HTTP status codes asserted at least once:** `200, 400, 401, 404, 409, 413, 429, 500, 503` — all nine from §2.2.

---

## 5. Fixtures

These fixtures are defined in `tests/conftest.py` (project-wide). pytest's conftest discovery makes them available to `tests/test_deploy_env/` without an explicit import. The package's own `tests/test_deploy_env/conftest.py` adds the §5.5 helpers. The §5.1–§5.4 fixtures are **shared** with `deploy_demo_space_tests.md`: any change here propagates there, and vice versa.

### 5.1 `fastapi_test_client`

```
import pytest
from fastapi.testclient import TestClient


@pytest.fixture
def fastapi_test_client(monkeypatch, valid_bearer_token, stub_audio_models):
    """
    Boots the FastAPI app with lifespan, stub Kokoro+Whisper loaded, and the
    bearer token injected into the app config. Yields a
    `fastapi.testclient.TestClient` that supports all HTTP verbs against the
    live app (in-process, no socket).

    Lifecycle: entering `with TestClient(app)` fires the startup/shutdown
    lifespan events; the autouse `cache_reset` fixture flushes the cache
    between tests.
    """
    monkeypatch.setenv("DRIFTCALL_ENV_TOKEN", valid_bearer_token)
    from app import app  # imported after setenv so an import-time token read sees it

    with TestClient(app) as client:
        yield client
```

Used by: every unit test in §1, properties P1–P7, and integration test I6. I1 runs against a real `uvicorn` subprocess, I4 needs an async client, and I5 avoids `stub_audio_models`.

### 5.2 `valid_bearer_token`

```
import secrets

import pytest


@pytest.fixture(scope="session")
def valid_bearer_token() -> str:
    """A freshly generated URL-safe token, session-scoped so it is stable
    across tests in one pytest run but distinct between runs."""
    return secrets.token_urlsafe(32)
```

Used by: every test that asserts 200 on an authenticated endpoint, plus the "bad bearer" tests, which send `valid_bearer_token + "x"` as the wrong token.

### 5.3 `session_id_alpha`

```
@pytest.fixture
def session_id_alpha() -> str:
    """Deterministic session id for tests that only need one sid."""
    return "session-alpha-0001"
```

Its charset and length both pass the header validator (§2.1 headers table).
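These fixtures satisfy the header rule that U7–U9 and P4 probe: 1 to 64 characters from `[A-Za-z0-9_-]`. Below is a minimal sketch of a matching validator, which could serve as an oracle in P4. `is_valid_session_id` is a hypothetical name, not the app's own function. It uses `re.fullmatch` rather than `^...$`, because `$` also matches just before a trailing newline and would let `"abc\n"` through.

```python
import re
from typing import Optional

# 1-64 characters from [A-Za-z0-9_-], per U7-U9 and P4 (hypothetical constant name).
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value: Optional[str]) -> bool:
    """True if `value` is an acceptable X-Session-Id header value."""
    # A missing header (None) is rejected, matching U7's M2 path.
    return value is not None and _SESSION_ID_RE.fullmatch(value) is not None
```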
### 5.4 `session_id_beta`

```
@pytest.fixture
def session_id_beta() -> str:
    """Second deterministic session id for cross-session tests (e.g., asserting
    no state bleed between alpha and beta)."""
    return "session-beta-0002"
```

### 5.5 Helper fixtures (non-shared, internal to this test package)

- `stub_audio_models` — monkeypatches `audio.tts_kokoro.load` and `audio.asr_whisper.load` to return lightweight stubs, so lifespan startup completes in < 50 ms. Used everywhere except I5, which tests real-ish load behavior.
- `monotonic_clock` — monkeypatches `time.monotonic()` to advance deterministically; used by U20, U24, P2, I6.
- `cache_reset` (autouse) — clears `session_cache._store` between tests to prevent cross-test bleed. Hypothesis reuses function-scoped fixtures across all examples of a single test. Hypothesis-driven properties that create sessions must therefore also clear the cache at the start of each example.
- `assert_error_envelope(resp, code, http_status)` — imported helper; asserts the envelope shape and the `Cache-Control: no-store` header. It also asserts a `Retry-After` header when `code == "max_sessions"`.
- `one_mib_plus_one_body` — precomputed `bytes` payload for U14 (the M11 oversize test).

**Fixture ownership note:** `fastapi_test_client`, `valid_bearer_token`, `session_id_alpha` and `session_id_beta` live in `tests/conftest.py` at the project root. They form the set shared with `deploy_demo_space_tests.md`. The helper fixtures (§5.5) are local to `tests/test_deploy_env/conftest.py` and are **not** shared.

---

## 6. Non-goals (out of scope for this plan)

- Deep per-field validation of `DriftCallObservation` / `DriftCallAction` / `DriftCallState` — owned by `env_tests.md` + `models_tests.md`.
- Reward math correctness — owned by `rewards_tests.md`.
- Kokoro / Whisper model quality — owned by `audio_tests.md`.
- Actual HF Hub pushes — forbidden in tests (§3.3 dry-run only); the real push happens in Batch C3 manual verification.
- GPU behavior — deployment is CPU-only (deploy_env_space.md §1, §6.5 explicit non-dependency).
- Cross-worker cache coherence — documented as an acceptable 404 path in §3.2 of the spec; not a test target for this hackathon (future hardening).