# deploy_env_space_tests.md: Test Plan for `docs/modules/deploy_env_space.md`

**Target artifact:** `app.py` (FastAPI entrypoint) + `driftcall/routes/*.py` (per-endpoint handlers: `reset.py`, `step.py`, `state.py`, `close.py`, `health.py`) + `driftcall/session_cache.py` (in-process session cache + eviction sweep) + `Dockerfile` + `openenv.yaml`

**Spec doc:** `DRIFTCALL/docs/modules/deploy_env_space.md` (final, sealed 2026-04-24)

**Framework:** `pytest` + `httpx` (via `fastapi.testclient.TestClient`) + `hypothesis` (properties) + `docker` CLI (integration only)

**Owner:** Person B (Rewards & Tests), domain-reviewed by Person D (Deploy & Story)

**Implements:** deploy_env_space.md §2 (interface), §3 (behavior), §4 (data structures), §5 (error modes M1–M12), §7 (edge cases); `DRIFTCALL/CLAUDE.md §3.1` (nine-section test-plan doc; this plan supplies the five required sections: Unit, Property, Integration, Coverage, Fixtures).

**Coverage targets:** **100% line** + **≥ 95% branch** on `app.py` + `driftcall/routes/*.py` + `driftcall/session_cache.py`. All 12 error modes **M1–M12** must be raised by at least one test.

**Numeric invariants:** HTTP status codes are exact integers (200, 400, 401, 404, 409, 413, 429, 500, 503). TTL tests read time through `time.monotonic()`, monkey-patched by a `freezegun`-style fixture; the wall clock is never read directly. Bearer tokens are `secrets.token_urlsafe(32)` strings, never hardcoded magic values outside the `valid_bearer_token` fixture.

**Mandatory assertion on every error response:** `json.loads(resp.text)` has the single top-level key `"error"`, whose object contains at least `"code": <slug>` and `"message": <str>` (M9 additionally carries `request_id`, see U18), and `resp.headers["Cache-Control"] == "no-store"`. Both are enforced by the helper `assert_error_envelope(resp, code, http_status)`, which every error-path test calls.

**Mandatory assertion on every success response:** `resp.headers["Content-Type"].startswith("application/json")` (except `/healthz`, which is `text/plain`).

Fixtures defined in §5 are **shared** with `deploy_demo_space_tests.md` (same names, same canonicalised content). If any fixture changes here, the shared copy in `tests/conftest.py` MUST be updated in lockstep, and `deploy_demo_space_tests.md §5` cross-checked.
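
The mandatory error-response assertion can be sketched as follows. This is a minimal illustration, not the project's helper: `FakeResponse` is a hypothetical stand-in for an `httpx` response, the subset check on the `error` keys is an assumption made so that the `request_id` expected by U18 is tolerated, and the `Retry-After` rule follows the M5-only statement in §1.8 and §1.9.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an httpx response; only what the helper reads."""
    status_code: int
    text: str
    headers: dict = field(default_factory=dict)


def assert_error_envelope(resp, code: str, http_status: int) -> None:
    """Assert status code, envelope shape, and cache header on an error response."""
    assert resp.status_code == http_status
    body = json.loads(resp.text)
    # The top level carries only "error"; the error object must hold at least
    # "code" and "message" (extra keys such as "request_id" are tolerated).
    assert set(body) == {"error"}
    error = body["error"]
    assert {"code", "message"} <= set(error)
    assert error["code"] == code
    assert isinstance(error["message"], str)
    assert resp.headers["Cache-Control"] == "no-store"
    # Assumption: Retry-After is "30" for max_sessions and absent otherwise.
    if code == "max_sessions":
        assert resp.headers.get("Retry-After") == "30"
    else:
        assert "Retry-After" not in resp.headers


ok = FakeResponse(
    status_code=401,
    text=json.dumps({"error": {"code": "unauthorized", "message": "bad credentials"}}),
    headers={"Cache-Control": "no-store"},
)
assert_error_envelope(ok, code="unauthorized", http_status=401)
```

Real `httpx` headers are case-insensitive, whereas the plain `dict` used here is not; the sketch sidesteps that by using the canonical header spelling.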

---

## 1. Unit Tests

**Organisation:** one `pytest` sub-package mirroring the route layout under `tests/test_deploy_env/`:

```
tests/test_deploy_env/
    __init__.py
    conftest.py                    # fixtures from §5, plus assert_error_envelope helper
    test_healthz.py                # /healthz: unauthenticated, cheap
    test_auth.py                   # bearer enforcement across all authenticated endpoints
    test_session_header.py         # X-Session-Id header validation
    test_reset.py                  # POST /reset happy + error paths
    test_step.py                   # POST /step happy + error paths
    test_state.py                  # GET /state happy + error paths
    test_close.py                  # POST /close happy + error paths
    test_body_schemas.py           # §2.1.1 shape conformance (envelope, not inner dataclass)
    test_session_cache_unit.py     # LRU, TTL, eviction sweep: direct cache tests
    test_error_modes_mapping.py    # M1..M12 matrix: every error mode hit at least once
    test_status_code_map.py        # every row of §2.2 table asserted
    test_lifespan_eager_load.py    # app.py lifespan loads Kokoro+Whisper BEFORE serving
```

**Unit test case inventory: 28 cases total (exceeds the ≥ 20 requirement).**

### 1.1 `/healthz`: `test_healthz.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U1 | `test_healthz_returns_200_plaintext_ok` | No auth header. | `resp.status_code == 200`; `resp.text == "ok"`; `resp.headers["Content-Type"].startswith("text/plain")`; endpoint does **not** require bearer (§3.5 "unauthenticated"). |
| U2 | `test_healthz_works_when_models_loaded` | Lifespan fixture loads stub Kokoro+Whisper. | `resp.status_code == 200`; no 503 raised even under no-auth request. Confirms `/healthz` bypass is independent of model readiness gate for probe liveness. |

### 1.2 Bearer auth: `test_auth.py`

Applies to every authenticated endpoint (`/reset`, `/step`, `/state`, `/close`); `GET /state` is read-only but still requires the bearer token.

| # | Name | Setup | Assertion |
|---|---|---|---|
| U3 | `test_reset_missing_authorization_returns_401_M1` | POST `/reset` with `X-Session-Id` but **no** `Authorization` header. | `assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches **M1**. |
| U4 | `test_step_bad_bearer_returns_401_M1` | POST `/step` with `Authorization: Bearer not-the-token`. | `assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches **M1**. Body must **not** leak the expected token. |
| U5 | `test_state_missing_bearer_returns_401_M1` | GET `/state` with no `Authorization`. | `assert_error_envelope(resp, code="unauthorized", http_status=401)`. |
| U6 | `test_close_wrong_scheme_returns_401_M1` | POST `/close` with `Authorization: Basic <token>` (wrong scheme). | `assert_error_envelope(resp, code="unauthorized", http_status=401)`. Only `Bearer` scheme accepted (§3.5). |

### 1.3 `X-Session-Id` header: `test_session_header.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U7 | `test_reset_missing_x_session_id_returns_400_M2` | POST `/reset` with valid bearer, **no** `X-Session-Id`. | `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches **M2**. |
| U8 | `test_step_malformed_x_session_id_returns_400_M2` | POST `/step` with `X-Session-Id: "bad session!"` (space + `!`, violates `[A-Za-z0-9_-]` charset). | `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches **M2** (treated as "not a valid session id"). |
| U9 | `test_step_x_session_id_over_64_chars_returns_400_M2` | POST `/step` with `X-Session-Id` of length 65. | `assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches **M2**. |
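
The header rule exercised by U7–U9 can be sketched as a single full-match check. This is an illustration under assumptions: the plan states only the `[A-Za-z0-9_-]` charset and the 64-character cap, so the pattern and the name `is_valid_session_id` are hypothetical and not taken from the spec.

```python
import re

# Assumption: validation is a full match on this pattern; the plan only
# states the charset [A-Za-z0-9_-] and the 64-character cap.
SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value) -> bool:
    """Return True when the header value is a well-formed session id."""
    return value is not None and SESSION_ID_RE.fullmatch(value) is not None


print(is_valid_session_id("session-alpha-0001"))
print(is_valid_session_id("bad session!"))
print(is_valid_session_id("a" * 65))
```

Using `fullmatch` rather than `match` matters here: `match` would accept an id with a valid prefix followed by disallowed characters.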

### 1.4 `POST /reset`: `test_reset.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U10 | `test_reset_happy_path_returns_200_and_observation_envelope` | Valid bearer, `X-Session-Id: session_id_alpha`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`. | `resp.status_code == 200`; body top-level keys `== {"observation", "episode_id", "max_turns"}`; `episode_id` is a uuid4 string; `max_turns` is `int`, 1 ≤ value ≤ 16; `observation` is a dict. Envelope conformance per §2.1.1. |
| U11 | `test_reset_with_language_weights_returns_200` | Valid bearer, valid session id, body `{"config": {"language_weights": {"hi": 0.5, "ta": 0.5}}}`. | `resp.status_code == 200`; observation includes the requested language distribution's imprint (via `info.config_echo` if exposed; otherwise just assert the envelope). |
| U12 | `test_reset_bad_json_returns_400_M7` | POST `/reset` with body `b"{not json"` and `Content-Type: application/json`. | `assert_error_envelope(resp, code="bad_json", http_status=400)`; matches **M7**. |
| U13 | `test_reset_invalid_curriculum_stage_returns_400_M8` | Body `{"config": {"curriculum_stage": 99}}`. | `assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches **M8** (dataclass validation failure on reset config). |
| U14 | `test_reset_payload_over_1mib_returns_413_M11` | Body size = 1 MiB + 1 byte (padded `config` dict). | `assert_error_envelope(resp, code="payload_too_large", http_status=413)`; matches **M11**. |
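
For U14, the body has to be valid JSON and exactly one byte over the limit, so that the 413 is caused by size rather than by a parse error. A minimal sketch of such a payload builder, assuming a hypothetical padding key `pad` inside `config` (the plan says only "padded `config` dict"):

```python
import json

ONE_MIB = 1024 * 1024


def build_oversize_reset_body(target_size: int = ONE_MIB + 1) -> bytes:
    """Return a valid JSON /reset body whose encoded length is exactly target_size."""
    # Hypothetical padding key "pad"; the plan does not name the key.
    skeleton = json.dumps({"config": {"pad": ""}}).encode("utf-8")
    padding = target_size - len(skeleton)
    if padding < 0:
        raise ValueError("target_size is smaller than the empty skeleton")
    # ASCII padding encodes to one byte per character, so the size is exact.
    return json.dumps({"config": {"pad": "a" * padding}}).encode("utf-8")


body = build_oversize_reset_body()
print(len(body) - ONE_MIB)
```

Computing the padding from the encoded skeleton avoids off-by-a-few errors from braces, quotes, and separators.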

### 1.5 `POST /step`: `test_step.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U15 | `test_step_happy_path_returns_200` | Session pre-created via `/reset`; body `{"action": {"action_type": "tool_call", "tool_name": "airline.search", "tool_args": {}}}`. | `resp.status_code == 200`; body keys `== {"observation", "reward", "done", "info"}`; `reward` is `float` **or** `None`; `done` is `bool`. Envelope per §2.1.1. |
| U16 | `test_step_unknown_session_returns_404_M3` | No prior `/reset`; POST `/step` with `X-Session-Id: never-existed-0001`. | `assert_error_envelope(resp, code="session_not_found", http_status=404)`; matches **M3**. |
| U17 | `test_step_invalid_action_shape_returns_400_M8` | Session pre-created; body `{"action": {"action_type": "tool_call"}}` (missing `tool_name`). | `assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches **M8**. |
| U18 | `test_step_internal_exception_returns_500_M9_no_stacktrace` | Monkey-patch `env.step` to raise `RuntimeError("boom")`. | `assert_error_envelope(resp, code="internal_error", http_status=500)`; matches **M9**. `"boom"` does **not** appear in body (stack-trace suppression §5 rule 1). `resp.json()["error"]["request_id"]` is present (ASGI scope id). |

### 1.6 `GET /state`: `test_state.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U19 | `test_state_happy_path_returns_200` | Session pre-created via `/reset` then two `/step`s. | `resp.status_code == 200`; body keys `== {"state", "turn"}`; `turn == 2` (int). Envelope per §2.1.1. |
| U20 | `test_state_expired_session_returns_404_M4` | Session exists at `t0`; monotonic clock advanced by 3601 s via fixture; sweep runs; GET `/state`. | `assert_error_envelope(resp, code="session_expired", http_status=404)`; matches **M4**. |

### 1.7 `POST /close`: `test_close.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U21 | `test_close_happy_path_returns_200_and_final_state` | Session pre-created. | `resp.status_code == 200`; body keys `== {"closed", "final_state"}`; `closed is True`; `final_state` is a dict. |
| U22 | `test_close_on_already_evicted_session_returns_200_with_null_final_state` | Session was evicted by sweep before `/close` arrives. | `resp.status_code == 200`; `resp.json() == {"closed": True, "final_state": None}` (§2.1.1 "null if session was already evicted"). |

### 1.8 Session cache direct unit tests: `test_session_cache_unit.py`

U23 and U24 bypass HTTP and call the cache API directly, to pin the policy invariants from §3.2. U25 goes through `POST /reset` to check how a full cache surfaces over HTTP.

| # | Name | Setup | Assertion |
|---|---|---|---|
| U23 | `test_cache_lru_eviction_on_11th_session` | Fill cache with sessions `s0..s9` (max=10); insert `s10`. | Cache size remains `== 10`; `s0` (oldest `last_touched`) is evicted; `s10` is present; `env.close()` was called on the evicted entry (spy assertion). §3.2 invariant. |
| U24 | `test_cache_ttl_sweep_evicts_stale_entries` | Insert `s_old` at `t0`; advance monotonic clock by 3601 s; call `cache.sweep()`. | `s_old` no longer in cache; spy confirms `env.close()` called; cache remains internally consistent (len == 0). §3.3. |
| U25 | `test_cache_max_sessions_returns_429_M5_with_retry_after` | Cache full of 10 fresh sessions (all touched < 1 s ago); POST `/reset` with a new `X-Session-Id`. | `resp.status_code == 429`; `assert_error_envelope(resp, code="max_sessions", http_status=429)`; `resp.headers["Retry-After"] == "30"` (only M5 carries Retry-After, per §5 rules). Matches **M5**. |
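
The two policies pinned by U23 and U24 (evict the least-recently-touched entry at the cap, sweep entries idle past the TTL) can be sketched with an injected clock. This is a hedged model, not `driftcall/session_cache.py`: the class name, method names, and the choice of `OrderedDict` are assumptions, and the 429 path of U25 is deliberately not modelled.

```python
from collections import OrderedDict
from typing import Callable


class SessionCacheSketch:
    """LRU + TTL cache keyed by session id; the clock is injected for tests."""

    def __init__(self, clock: Callable[[], float], max_sessions: int = 10,
                 ttl_s: float = 3600.0):
        self._clock = clock
        self._max = max_sessions
        self._ttl = ttl_s
        # sid -> (env, last_touched), ordered least- to most-recently touched.
        self._store: OrderedDict = OrderedDict()

    def put(self, sid, env) -> None:
        """Insert a session, closing whichever env it displaces."""
        if sid in self._store:
            old_env, _ = self._store.pop(sid)  # assumption: replacing closes the old env
            old_env.close()
        elif len(self._store) >= self._max:
            _, (old_env, _) = self._store.popitem(last=False)  # least recently touched
            old_env.close()
        self._store[sid] = (env, self._clock())

    def get(self, sid):
        """Return the env and refresh last_touched, or None when absent."""
        if sid not in self._store:
            return None
        env, _ = self._store.pop(sid)
        self._store[sid] = (env, self._clock())
        return env

    def sweep(self) -> int:
        """Close and drop every entry idle for at least the TTL; return the count."""
        now = self._clock()
        stale = [sid for sid, (_, touched) in self._store.items()
                 if now - touched >= self._ttl]
        for sid in stale:
            env, _ = self._store.pop(sid)
            env.close()
        return len(stale)

    def __len__(self) -> int:
        return len(self._store)

    def __contains__(self, sid) -> bool:
        return sid in self._store
```

Passing the clock in as a callable keeps the cache free of direct `time.monotonic()` calls, which is what lets U24 advance time by 3601 s without sleeping.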

### 1.9 Error-mode matrix: `test_error_modes_mapping.py`

One parametrized test asserting **M1..M12** are each reachable and return the expected HTTP code + slug. Parameters:

```
[
    ("M1", "unauthorized", 401, <bad_bearer_request>),
    ("M2", "missing_session_id", 400, <no_session_header_request>),
    ("M3", "session_not_found", 404, <step_on_unknown_sid>),
    ("M4", "session_expired", 404, <step_after_ttl_expiry>),
    ("M5", "max_sessions", 429, <reset_when_cache_full>),
    ("M6", "model_not_ready", 503, <step_before_lifespan_load>),
    ("M7", "bad_json", 400, <malformed_body>),
    ("M8", "invalid_action", 400, <wrong_action_shape>),
    ("M9", "internal_error", 500, <env_step_raises>),
    ("M10", "io_error", 500, <tmpfs_full_monkeypatch>),
    ("M11", "payload_too_large", 413, <oversize_body>),
    ("M12", "reset_in_progress", 409, <concurrent_reset_same_sid>),
]
```

| # | Name | Setup | Assertion |
|---|---|---|---|
| U26 | `test_error_modes_M1_through_M12_full_matrix` | Parametrized over the 12 tuples above. | For every row: `resp.status_code == expected_http`; `resp.json()["error"]["code"] == expected_slug`; `resp.headers["Cache-Control"] == "no-store"`; `resp.headers.get("Retry-After")` is `"30"` iff row is M5 else absent. |
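
Because U26 is the only test that reaches M10, it can help to guard the parameter table itself against a dropped or duplicated row. The sketch below restates the matrix as data (without the request builders, which are placeholders in the plan) and checks its shape; `ErrorMode`, `expected_retry_after`, and `check_matrix` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class ErrorMode(NamedTuple):
    mode: str
    slug: str
    http_status: int


ERROR_MODES = [
    ErrorMode("M1", "unauthorized", 401),
    ErrorMode("M2", "missing_session_id", 400),
    ErrorMode("M3", "session_not_found", 404),
    ErrorMode("M4", "session_expired", 404),
    ErrorMode("M5", "max_sessions", 429),
    ErrorMode("M6", "model_not_ready", 503),
    ErrorMode("M7", "bad_json", 400),
    ErrorMode("M8", "invalid_action", 400),
    ErrorMode("M9", "internal_error", 500),
    ErrorMode("M10", "io_error", 500),
    ErrorMode("M11", "payload_too_large", 413),
    ErrorMode("M12", "reset_in_progress", 409),
]


def expected_retry_after(row: ErrorMode) -> Optional[str]:
    """Only M5 carries Retry-After; every other mode must omit the header."""
    return "30" if row.mode == "M5" else None


def check_matrix(rows) -> None:
    """Guard the parametrization: M1..M12 in order, unique slugs, known codes."""
    assert [r.mode for r in rows] == [f"M{i}" for i in range(1, 13)]
    assert len({r.slug for r in rows}) == len(rows)
    # The eight error codes from the plan's status list (200 is the success code).
    assert {r.http_status for r in rows} == {400, 401, 404, 409, 413, 429, 500, 503}


check_matrix(ERROR_MODES)
```

In a pytest suite the same list could feed `pytest.mark.parametrize`, with the request builder added as a fourth field.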

### 1.10 Lifespan eager load: `test_lifespan_eager_load.py`

| # | Name | Setup | Assertion |
|---|---|---|---|
| U27 | `test_lifespan_loads_models_before_serving_requests` | Instrument `audio.tts_kokoro.load` and `audio.asr_whisper.load` with call-counter. Start app via `LifespanManager`; issue `/reset` immediately after startup event fires. | Call-counters `== 1` each **before** any request handler runs (assertion inside lifespan startup). Request returns 200, never 503. §7.3. |
| U28 | `test_step_before_lifespan_complete_returns_503_M6` | Monkey-patch lifespan to defer model load; issue `/step` during the deferred window. | `assert_error_envelope(resp, code="model_not_ready", http_status=503)`; matches **M6**. Confirms the guard exists before models are ready. |

---

## 2. Property Tests

Hypothesis-driven invariants on the deployment surface. Minimum **5 properties**; this plan specifies **7** (two extra for margin).

### 2.1 `P1`: `/step` is idempotent on invalid action (env state unchanged)

**Strategy:** `invalid_action_strategy = hypothesis.strategies.dictionaries(...)` producing action bodies that fail pydantic validation (missing fields, wrong types, unknown `action_type`).

**Invariant:**

```
pre_state = GET /state (turn = T)
resp = POST /step with invalid action    # → 400 M8
post_state = GET /state
assert pre_state == post_state           # turn unchanged, drift_schedule unchanged
assert resp.status_code == 400
```

Confirms §7.5 transactional step semantics: state only mutates after all work succeeds; a rejected action is a no-op.

### 2.2 `P2`: Session expiration is monotonic and consistent

**Strategy:** `st.integers(min_value=0, max_value=7200)` for synthetic elapsed seconds.

**Invariant:**

```
For any elapsed ∈ [0, 7200]:
    if elapsed < 3600: /step returns 200 (session alive)
    if elapsed >= 3600: /step returns 404 M4 (session expired)
Once expired, the session NEVER becomes alive again without a new /reset.
```

Tests the monotone one-way transition: `alive → expired` is terminal. §3.2 TTL = 3600 s.
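
The boundary in P2 can be stated as a pure predicate and checked exhaustively over the same range the strategy draws from. This is a model of the invariant only, under the assumption that expiry depends solely on elapsed time since the last touch; `is_expired` and `first_expired_second` are hypothetical names.

```python
TTL_S = 3600  # from the plan: §3.2 TTL = 3600 s


def is_expired(elapsed_s: int, ttl_s: int = TTL_S) -> bool:
    """Alive strictly below the TTL, expired at or beyond it."""
    return elapsed_s >= ttl_s


def first_expired_second(max_elapsed: int = 7200) -> int:
    """Scan the whole range and confirm the alive -> expired flip happens once."""
    states = [is_expired(t) for t in range(max_elapsed + 1)]
    flip = states.index(True)
    # Monotone: nothing before the flip is expired, everything from it on is.
    assert not any(states[:flip]) and all(states[flip:])
    return flip


print(first_expired_second())
```

The scan makes the off-by-one explicit: second 3599 is still alive and second 3600 is the first expired one.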

### 2.3 `P3`: Error envelope shape is universal

**Strategy:** parametrized across all 12 error-triggering inputs (from U26 matrix).

**Invariant:** every error response satisfies:

```
body = resp.json()
set(body.keys()) == {"error"}
set(body["error"].keys()) >= {"code", "message"}
isinstance(body["error"]["code"], str) and body["error"]["code"] != ""
isinstance(body["error"]["message"], str)
"traceback" not in json.dumps(body).lower()
valid_bearer_token not in json.dumps(body)    # no token leakage
```

### 2.4 `P4`: `X-Session-Id` charset and length round-trip

**Strategy:** `st.text(alphabet=string.ascii_letters + string.digits + "_-", min_size=1, max_size=64)` generates valid session ids; a second strategy generates invalid ones (containing `!@# `, length 0, length 65+).

**Invariant:**

```
valid_sid → /reset returns 200
invalid_sid → /reset returns 400 M2
After /reset with valid_sid:
    GET /state with the same sid returns 200
    GET /state with ANY other valid sid returns 404 M3
```

### 2.5 `P5`: LRU eviction preserves cache size cap

**Strategy:** `st.lists(st.text(alphabet=string.ascii_letters, min_size=8, max_size=16), min_size=11, max_size=50, unique=True)`, i.e. sequences of distinct session ids.

**Invariant:** after POSTing `/reset` for every sid in the list (one at a time):

```
len(cache) == min(len(sids), 10)
The 10 present sids are exactly the 10 most-recently-inserted (by last_touched).
No env instance is leaked (every evicted env had .close() called exactly once).
```

### 2.6 `P6`: Reward field is float-or-null

**Strategy:** parametrized over valid actions per `DriftCallAction` shape.

**Invariant:** every `/step` 200-response body satisfies:

```
reward = body["reward"]
assert reward is None or (isinstance(reward, float) and -1.0 <= reward <= 1.0)
assert isinstance(body["done"], bool)
```

Pins §2.1.1 envelope: `reward: float | null`, range aligned with `openenv.yaml` `reward.range: [-1.0, 1.0]` (§4.3).

### 2.7 `P7`: Concurrent `/reset` on same sid never produces two envs

**Strategy:** `hypothesis.stateful.RuleBasedStateMachine` driving concurrent `/reset` calls on the same `X-Session-Id` via `anyio.create_task_group`.

**Invariant:**

```
Across N concurrent /reset calls on the same sid:
    exactly one succeeds with 200 (winner)
    the remaining N-1 return 409 M12 (reset_in_progress)
    cache ends with exactly one env under that sid
    no env instance is leaked
```

§7.1 per-session asyncio lock invariant.
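
The invariant above can be illustrated with a per-session `asyncio.Lock` that is checked, not awaited: a caller that finds the lock held is rejected with 409 instead of queueing. This is a sketch of one way to get the stated behaviour, not the handler in `driftcall/routes/reset.py`; `ResetGuardSketch` and `race` are hypothetical, and the sleep stands in for env construction.

```python
import asyncio


class ResetGuardSketch:
    """Rejects a /reset that arrives while another holds the same session id."""

    def __init__(self):
        self.envs = {}     # sid -> env (a plain dict stands in for the env)
        self._locks = {}   # sid -> asyncio.Lock
        self.created = 0   # how many envs were ever constructed

    async def reset(self, sid: str) -> int:
        """Return the HTTP status the handler would send: 200 or 409."""
        lock = self._locks.setdefault(sid, asyncio.Lock())
        if lock.locked():
            return 409  # M12 reset_in_progress
        # No await sits between the check above and the acquire below, and
        # acquiring a free lock does not suspend, so check-then-acquire is
        # atomic with respect to other tasks on the same event loop.
        async with lock:
            await asyncio.sleep(0.01)  # stands in for env construction
            self.created += 1
            self.envs[sid] = {"sid": sid}
            return 200


async def race(n: int):
    """Fire n concurrent resets at one sid and collect the statuses."""
    guard = ResetGuardSketch()
    statuses = await asyncio.gather(*(guard.reset("same-sid") for _ in range(n)))
    return guard, list(statuses)


guard, statuses = asyncio.run(race(8))
print(statuses.count(200), statuses.count(409))
```

The design choice worth noting is that the losers fail fast. Awaiting the lock instead would serialise the resets and produce N successive envs, which is exactly what P7 forbids.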

---

## 3. Integration Tests

Cross-cutting scenarios that exercise real subsystems. Marked `@pytest.mark.integration`; run in CI only, not in the fast `pytest tests/` loop.

### 3.1 `I1`: End-to-end curl flow: `/reset` → 6× `/step` → `/state` → `/close`

**Mechanism:** `subprocess.run(["curl", ...])` against a locally-booted FastAPI app (via `uvicorn` subprocess, port 7860). Uses the **real** `curl` binary to exercise headers + HTTP/1.1 semantics exactly as judges will.

**Flow:**

1. Start uvicorn in a subprocess, wait for `/healthz` to return `ok` (max 45 s, matches `HEALTHCHECK --start-period=45s` in §4.2).
2. `curl -X POST /reset` with bearer + `X-Session-Id: e2e-001`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`. Assert 200.
3. Loop 6 times: `curl -X POST /step` with a `tool_call` action. Assert 200 each time; accumulate `done` values.
4. `curl /state`. Assert 200; `turn >= 6`.
5. `curl -X POST /close`. Assert 200; `closed is True`.
6. Kill uvicorn subprocess; assert no zombie process.

**Budget:** single test must complete under 60 s including subprocess boot.
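
Step 1 of the flow needs a bounded wait on `/healthz`. A minimal sketch using only the standard library is shown below; `healthz_probe` and `wait_until_ready` are hypothetical helper names, and the clock and sleep are injectable so the wait logic itself can be tested without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def healthz_probe(base_url: str) -> Callable[[], bool]:
    """Build a probe that is True once GET {base_url}/healthz answers 200 "ok"."""
    def probe() -> bool:
        try:
            with urllib.request.urlopen(f"{base_url}/healthz", timeout=2.0) as resp:
                return resp.status == 200 and resp.read().decode("utf-8").strip() == "ok"
        except (urllib.error.URLError, OSError):
            return False  # not accepting connections yet, or a non-2xx answer
    return probe


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 45.0,
                     interval_s: float = 0.25,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it returns True or timeout_s elapses on the given clock."""
    deadline = clock() + timeout_s
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

In the integration test this would be called as `wait_until_ready(healthz_probe("http://127.0.0.1:7860"))` right after spawning uvicorn, with the 45 s default mirroring the start period quoted in step 1.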

### 3.2 `I2`: Docker build locally + `openenv validate`

**Mechanism:** `docker build -t driftcall-env:test -f DRIFTCALL/Dockerfile DRIFTCALL/` then `docker run -d -p 7860:7860 -e DRIFTCALL_ENV_TOKEN=test-token driftcall-env:test`, then `openenv validate http://localhost:7860 --auth-bearer test-token`.

**Assertions:**

1. `docker build` exits 0.
2. Image size < 2 GB (`docker image inspect driftcall-env:test --format '{{.Size}}'` < `2 * 1024**3`).
3. Container healthz returns `ok` within 60 s of `docker run`.
4. `openenv validate` exits 0 and its stdout contains each of:
    - `openenv.yaml parses, schema v1.0`
    - `POST /reset` success line
    - `POST /step` success line
    - `GET /state` success line
    - `POST /close` success line
    - `6 endpoints validated, 0 errors`
5. Container cleanup: `docker rm -f` in `finally` block.

**Gating:** marked `@pytest.mark.skipif(not shutil.which("docker"), reason="docker CLI not available")` (pytest requires a `reason` when the condition is a boolean), so the test is opt-in locally and mandatory in CI.

### 3.3 `I3`: HF Space deploy dry-run (no actual push)

**Mechanism:** `hf upload --dry-run <team>/driftcall-env . --repo-type=space`. Captures the file manifest that **would** be pushed.

**Assertions:**

1. Exit code 0.
2. Manifest includes: `app.py`, `openenv.yaml`, `Dockerfile`, `requirements.txt`, `README.md`, `driftcall/` subtree.
3. Manifest **excludes**: `tests/`, `training/`, `data/raw/`, `.env*`, `*.ipynb`, `.git/`.
4. `README.md` YAML frontmatter contains required keys: `title`, `sdk: docker`, `app_port: 7860`, `emoji`, `colorFrom`, `colorTo` (§4.4).
5. **No actual network call** to `huggingface.co`, enforced via `monkeypatch` on the `huggingface_hub` outbound session to raise if reached.
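
Assertion 4 can be sketched as a small frontmatter check. This is an illustration under assumptions: it handles only flat `key: value` lines between the leading `---` fences, which is enough for the six keys listed, whereas a real test would more likely use a YAML parser. `parse_frontmatter` and `check_space_frontmatter` are hypothetical names.

```python
REQUIRED_KEYS = ("title", "sdk", "app_port", "emoji", "colorFrom", "colorTo")


def parse_frontmatter(readme_text: str) -> dict:
    """Parse flat `key: value` frontmatter between the leading `---` fences."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("README does not start with a frontmatter fence")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("frontmatter fence is never closed")


def check_space_frontmatter(readme_text: str) -> list:
    """Return a list of problems; an empty list means the frontmatter passes."""
    fields = parse_frontmatter(readme_text)
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in fields]
    if fields.get("sdk") not in (None, "docker"):
        problems.append("sdk must be docker")
    if fields.get("app_port") not in (None, "7860"):
        problems.append("app_port must be 7860")
    return problems


# Illustrative README; the title, emoji, and colours are made-up sample values.
SAMPLE = """---
title: Example Space
emoji: X
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
---
# Body
"""
print(check_space_frontmatter(SAMPLE))
```

Returning a list of problems rather than raising on the first one gives a single test failure that names every missing key at once.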

### 3.4 `I4`: Concurrent 10-session load test

**Mechanism:** `anyio.create_task_group` spawning 10 coroutines, each driving a unique `X-Session-Id` through `/reset` → 3× `/step` → `/close` against `TestClient(app)`.

**Assertions:**

1. All 10 `/reset` calls return 200 (cache is exactly at cap).
2. An 11th concurrent `/reset` (while the first 10 are still `last_touched < TTL`) returns **429 M5** with `Retry-After: 30` (proves cap enforcement under contention).
3. All 30 `/step` calls (3 × 10 sessions) return 200; no cross-session state bleed: each session's `observation.turn` progresses independently (`1, 2, 3`).
4. All 10 `/close` calls return 200.
5. Wall-clock budget: total test completes in < 30 s on CI 2-vCPU runner.

### 3.5 `I5`: Cold-start lifespan blocks request serving until models loaded

**Mechanism:** Instrument `audio.tts_kokoro.load` with an artificial 2 s `anyio.sleep`. Boot the app via `LifespanManager` and concurrently fire a `/reset` request at `t=0` (before startup completes).

**Assertions:**

1. The `/reset` request **blocks** until lifespan startup is complete. It does **not** return 503 during the loading window if `app.py` correctly awaits lifespan before accepting requests (this is the FastAPI default).
2. If instead we disable the lifespan gate (test variant), the request returns **503 M6** with `code="model_not_ready"`, which proves M6 is reachable and the guard is load-bearing.
3. `/healthz` responds 200 throughout (probe endpoint is cheap and does not require models; §3.5 "unauthenticated").

### 3.6 `I6`: TTL sweep liveness under sustained traffic

**Mechanism:** Run the `TestClient` against the app for 70 s of simulated traffic (monotonic clock advanced via fixture), issuing one `/reset` per synthetic minute with a fresh `X-Session-Id`. Sweep runs every 60 s per §3.3.

**Assertions:**

1. After the 61st synthetic second, the first session's entry has been evicted by the sweep task.
2. A `/step` on that first session returns 404 M4.
3. The sweep task itself does not raise; logs contain exactly one "swept 1 expired session" structured log line per sweep cycle (§3.7 logging fields).

---

## 4. Coverage Target

**Targets (enforced in CI via `pytest --cov-fail-under`):**

| Artifact | Line coverage | Branch coverage |
|---|---|---|
| `app.py` | **100%** | **≥ 95%** |
| `driftcall/routes/reset.py` | **100%** | **≥ 95%** |
| `driftcall/routes/step.py` | **100%** | **≥ 95%** |
| `driftcall/routes/state.py` | **100%** | **≥ 95%** |
| `driftcall/routes/close.py` | **100%** | **≥ 95%** |
| `driftcall/routes/health.py` | **100%** | **100%** (trivial file) |
| `driftcall/session_cache.py` | **100%** | **≥ 95%** |

**Command:**

```
pytest tests/test_deploy_env/ \
    --cov=app \
    --cov=driftcall.routes \
    --cov=driftcall.session_cache \
    --cov-branch \
    --cov-report=term-missing \
    --cov-fail-under=100
```

Because `--cov-branch` is on, `--cov-fail-under=100` is evaluated against the combined line-and-branch total, so this gate is stricter than the ≥ 95% branch target in the table.

**Branch-coverage carve-outs (documented pragmas, not silent):** the `except asyncio.CancelledError: raise` guard at the bottom of the sweep task's loop is excluded via `# pragma: no cover`. Re-raising a cancellation is standard-library contract, and triggering it requires injecting a cancellation into the `lifespan` shutdown, which is covered by the lifespan test (I5) at the event-loop level.

**Error-mode coverage ledger (every one of M1..M12 is raised by at least one test):**

| Mode | Raised by | HTTP |
|---|---|---|
| M1 `unauthorized` | U3, U4, U5, U6, U26 | 401 |
| M2 `missing_session_id` | U7, U8, U9, U26, P4 | 400 |
| M3 `session_not_found` | U16, U26, P4 | 404 |
| M4 `session_expired` | U20, U26, P2, I6 | 404 |
| M5 `max_sessions` | U25, U26, I4 | 429 |
| M6 `model_not_ready` | U28, U26, I5 | 503 |
| M7 `bad_json` | U12, U26 | 400 |
| M8 `invalid_action` | U13, U17, U26, P1 | 400 |
| M9 `internal_error` | U18, U26 | 500 |
| M10 `io_error` | U26 (monkeypatched tmpfs full) | 500 |
| M11 `payload_too_large` | U14, U26 | 413 |
| M12 `reset_in_progress` | U26, P7 | 409 |

**HTTP status codes asserted at least once:** `200, 400, 401, 404, 409, 413, 429, 500, 503` (all nine from §2.2).

---

## 5. Fixtures

Defined in `tests/conftest.py` (project-wide) and imported by `tests/test_deploy_env/conftest.py`. **Shared** with `deploy_demo_space_tests.md`: any change here propagates there and vice versa.

### 5.1 `fastapi_test_client`

```
import pytest
from fastapi.testclient import TestClient


@pytest.fixture
def fastapi_test_client(monkeypatch, valid_bearer_token, stub_audio_models):
    """
    Boots the FastAPI app with lifespan, stub Kokoro+Whisper loaded,
    and bearer token injected into app config.
    Yields a `fastapi.testclient.TestClient` that supports all HTTP verbs
    against the live app (in-process, no socket).
    Lifecycle: entering TestClient as a context manager fires the lifespan
    startup/shutdown events; the cache is flushed between tests by the
    autouse cache-reset fixture.
    """
    # Set the token before importing the app, in case it is read at import time.
    monkeypatch.setenv("DRIFTCALL_ENV_TOKEN", valid_bearer_token)
    from app import app
    with TestClient(app) as client:
        yield client
```

Used by: the HTTP-level unit tests in §1, properties P1–P7, and integration tests I4 and I6. I1 drives a `uvicorn` subprocess with `curl`, and I5 boots the app itself with an instrumented loader, so neither goes through this fixture.

### 5.2 `valid_bearer_token`

```
import secrets

import pytest


@pytest.fixture(scope="session")
def valid_bearer_token() -> str:
    """A freshly-generated URL-safe token, session-scoped so it is stable
    across tests in one pytest run but distinct between runs."""
    return secrets.token_urlsafe(32)
```

Used by: every test that asserts 200 on an authenticated endpoint, plus the "bad bearer" tests (which receive `valid_bearer_token + "x"` as the wrong token).

### 5.3 `session_id_alpha`

```
@pytest.fixture
def session_id_alpha() -> str:
    """Deterministic session id for tests that only need one sid."""
    return "session-alpha-0001"
```

Charset and length both pass the header validator (§2.1 headers table).

### 5.4 `session_id_beta`

```
@pytest.fixture
def session_id_beta() -> str:
    """Second deterministic session id for cross-session tests
    (e.g., asserting no state bleed between alpha and beta)."""
    return "session-beta-0002"
```

### 5.5 Helper fixtures (non-shared, internal to this test package)

- `stub_audio_models`: monkeypatches `audio.tts_kokoro.load` and `audio.asr_whisper.load` to return lightweight stubs so lifespan completes in < 50 ms. Used everywhere except I5 (which tests real-ish load behavior).
- `monotonic_clock`: monkeypatches `time.monotonic()` to advance deterministically; used by U20, U24, P2, I6.
- `cache_reset` (autouse): clears `session_cache._store` between tests; prevents cross-test bleed.
- `assert_error_envelope(resp, code, http_status)`: imported helper, asserts envelope shape + `Cache-Control: no-store` header + optional `Retry-After` when `code == "max_sessions"`.
- `one_mib_plus_one_body`: precomputed `bytes` payload for U14 (M11 oversize test).

**Fixture ownership note:** `fastapi_test_client`, `valid_bearer_token`, `session_id_alpha`, `session_id_beta` live in `tests/conftest.py` at the project root and are the shared set with `deploy_demo_space_tests.md`. Helper fixtures (§5.5) are local to `tests/test_deploy_env/conftest.py` and are **not** shared.
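
The `monotonic_clock` helper can be pictured as a callable that replaces `time.monotonic` and moves only when told to. The sketch below uses `unittest.mock.patch` so it runs outside pytest; inside a fixture, `monkeypatch.setattr(time, "monotonic", clock)` would be the equivalent. `FakeMonotonic` is a hypothetical name, and the start value is arbitrary.

```python
import time
from unittest import mock


class FakeMonotonic:
    """Callable replacement for time.monotonic that only moves when told to."""

    def __init__(self, start: float = 1000.0):
        self._now = start

    def __call__(self) -> float:
        return self._now

    def advance(self, seconds: float) -> None:
        """Move the clock forward; a monotonic clock never goes backwards."""
        if seconds < 0:
            raise ValueError("a monotonic clock cannot go backwards")
        self._now += seconds


clock = FakeMonotonic()
with mock.patch("time.monotonic", clock):
    t0 = time.monotonic()
    clock.advance(3601)  # the TTL plus one second, as in U20 and U24
    elapsed = time.monotonic() - t0
print(elapsed)
```

One caveat with this approach: code that did `from time import monotonic` at import time holds its own reference and will not see the patch, so the cache module needs to call `time.monotonic()` through the module (or accept an injected clock) for the fixture to take effect.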

---

## 6. Non-goals (out of scope for this plan)

- Deep per-field validation of `DriftCallObservation` / `DriftCallAction` / `DriftCallState`: owned by `env_tests.md` + `models_tests.md`.
- Reward math correctness: owned by `rewards_tests.md`.
- Kokoro / Whisper model quality: owned by `audio_tests.md`.
- Actual HF Hub pushes: forbidden in tests (§3.3 dry-run only); real push happens in Batch C3 manual verification.
- GPU behavior: deployment is CPU-only (deploy_env_space.md §1, §6.5 explicit non-dependency).
- Cross-worker cache coherence: documented as acceptable 404 path in §3.2 of the spec; not a test target for this hackathon (future hardening).