deploy_env_space_tests.md — Test Plan for `docs/modules/deploy_env_space.md`

Target artifact: app.py (FastAPI entrypoint) + driftcall/routes/*.py (per-endpoint handlers: reset.py, step.py, state.py, close.py, health.py) + driftcall/session_cache.py (in-process session cache + eviction sweep) + Dockerfile + openenv.yaml

Spec doc: DRIFTCALL/docs/modules/deploy_env_space.md (final, sealed 2026-04-24)

Framework: pytest + httpx (via fastapi.testclient.TestClient) + hypothesis (properties) + docker CLI (integration only)

Owner: Person B (Rewards & Tests); domain-reviewed by Person D (Deploy & Story)

Implements: deploy_env_space.md §2 (interface), §3 (behavior), §4 (data structures), §5 (error modes M1–M12), §7 (edge cases); DRIFTCALL/CLAUDE.md §3.1 (nine-section test-plan doc; this plan supplies the five required sections: Unit, Property, Integration, Coverage, Fixtures).

Coverage targets: 100% line + ≥ 95% branch on app.py + driftcall/routes/*.py + driftcall/session_cache.py. All 12 error modes M1–M12 must be raised by at least one test.

Numeric invariants: HTTP status codes are exact integers (200, 400, 401, 404, 409, 413, 429, 500, 503). Tests control TTL expiry by monkey-patching time.monotonic() through a freezegun-style fixture; the wall clock is never read directly. Bearer tokens are secrets.token_urlsafe(32) strings and are never hardcoded magic values outside the valid_bearer_token fixture.

Mandatory assertion on every error response: json.loads(resp.text) has the shape {"error": {"code": <slug>, "message": <str>}} and resp.headers["Cache-Control"] == "no-store". The error object may carry extra keys (U18 requires request_id on M9, and P3 checks for a superset of code and message), so the check is on required keys rather than strict equality. It is enforced by the helper assert_error_envelope(resp, code, http_status), which all error-path tests call.

Mandatory assertion on every success response: resp.headers["Content-Type"].startswith("application/json") (except /healthz, which is text/plain).

Fixtures defined in §5 are shared with deploy_demo_space_tests.md (same names, same canonicalised content). If any fixture changes here, the shared copy in tests/conftest.py MUST be updated in lockstep, and deploy_demo_space_tests.md §5 cross-checked.

1. Unit Tests

Organisation: one pytest sub-package mirroring the route layout under tests/test_deploy_env/:

tests/test_deploy_env/
  __init__.py
  conftest.py                        # fixtures from §5, plus assert_error_envelope helper
  test_healthz.py                    # /healthz — unauthenticated, cheap
  test_auth.py                       # bearer enforcement across all mutating endpoints
  test_session_header.py             # X-Session-Id header validation
  test_reset.py                      # POST /reset happy + error paths
  test_step.py                       # POST /step happy + error paths
  test_state.py                      # GET /state happy + error paths
  test_close.py                      # POST /close happy + error paths
  test_body_schemas.py               # §2.1.1 shape conformance (envelope, not inner dataclass)
  test_session_cache_unit.py         # LRU, TTL, eviction sweep — direct cache tests
  test_error_modes_mapping.py        # M1..M12 matrix — every error mode hit at least once
  test_status_code_map.py            # every row of §2.2 table asserted
  test_lifespan_eager_load.py        # app.py lifespan loads Kokoro+Whisper BEFORE serving

Unit test case inventory — 28 cases total (exceeds the ≥ 20 requirement).

1.1 `/healthz` — `test_healthz.py`

#	Name	Setup	Assertion
U1	`test_healthz_returns_200_plaintext_ok`	No auth header.	`resp.status_code == 200`; `resp.text == "ok"`; `resp.headers["Content-Type"].startswith("text/plain")`; endpoint does not require bearer (§3.5 "unauthenticated").
U2	`test_healthz_works_when_models_loaded`	Lifespan fixture loads stub Kokoro+Whisper.	`resp.status_code == 200`; no 503 raised even under no-auth request. Confirms `/healthz` bypass is independent of model readiness gate for probe liveness.

1.2 Bearer auth — `test_auth.py`

Applies to every bearer-protected endpoint (/reset, /step, /state, /close). GET /state is read-only but still requires the token (U5).

#	Name	Setup	Assertion
U3	`test_reset_missing_authorization_returns_401_M1`	POST `/reset` with `X-Session-Id` but no `Authorization` header.	`assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches M1.
U4	`test_step_bad_bearer_returns_401_M1`	POST `/step` with `Authorization: Bearer not-the-token`.	`assert_error_envelope(resp, code="unauthorized", http_status=401)`; matches M1. Body must not leak the expected token.
U5	`test_state_missing_bearer_returns_401_M1`	GET `/state` with no `Authorization`.	`assert_error_envelope(resp, code="unauthorized", http_status=401)`.
U6	`test_close_wrong_scheme_returns_401_M1`	POST `/close` with `Authorization: Basic <token>` (wrong scheme).	`assert_error_envelope(resp, code="unauthorized", http_status=401)`. Only `Bearer` scheme accepted (§3.5).

1.3 `X-Session-Id` header — `test_session_header.py`

#	Name	Setup	Assertion
U7	`test_reset_missing_x_session_id_returns_400_M2`	POST `/reset` with valid bearer, no `X-Session-Id`.	`assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches M2.
U8	`test_step_malformed_x_session_id_returns_400_M2`	POST `/step` with `X-Session-Id: "bad session!"` (space + `!`, violates `[A-Za-z0-9_-]` charset).	`assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches M2 (treated as "not a valid session id").
U9	`test_step_x_session_id_over_64_chars_returns_400_M2`	POST `/step` with `X-Session-Id` of length 65.	`assert_error_envelope(resp, code="missing_session_id", http_status=400)`; matches M2.
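
The header rule exercised by U7 to U9 (charset `[A-Za-z0-9_-]`, at most 64 characters, and non-empty per P4) can be sketched as a single predicate. This is a minimal illustration, not the validator in the route handlers; the names `_SESSION_ID_RE` and `is_valid_session_id` are assumptions made for the example.

```python
import re

# Charset and 64-character cap as stated for U8/U9; minimum length 1 per P4.
# Both names below are illustrative, not taken from the driftcall code base.
_SESSION_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def is_valid_session_id(value: object) -> bool:
    """Return True only for a 1..64 character string drawn from [A-Za-z0-9_-]."""
    # fullmatch (rather than match with "$") also rejects a trailing newline.
    return isinstance(value, str) and _SESSION_ID_RE.fullmatch(value) is not None
```

Using `fullmatch` matters here: a pattern anchored with `$` would accept `"abc\n"`, which is the kind of malformed header value U8 is meant to reject.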

1.4 `POST /reset` — `test_reset.py`

#	Name	Setup	Assertion
U10	`test_reset_happy_path_returns_200_and_observation_envelope`	Valid bearer, `X-Session-Id: session_id_alpha`, body `{"seed": 42, "config": {"curriculum_stage": 1}}`.	`resp.status_code == 200`; body top-level keys `== {"observation", "episode_id", "max_turns"}`; `episode_id` is a uuid4 string; `max_turns` is `int`, 1 ≤ value ≤ 16; `observation` is a dict. Envelope conformance per §2.1.1.
U11	`test_reset_with_language_weights_returns_200`	Valid bearer, valid session id, body `{"config": {"language_weights": {"hi": 0.5, "ta": 0.5}}}`.	`resp.status_code == 200`; observation includes the requested language distribution's imprint (via `info.config_echo` if exposed — else just assert envelope).
U12	`test_reset_bad_json_returns_400_M7`	POST `/reset` with body `b"{not json"` and `Content-Type: application/json`.	`assert_error_envelope(resp, code="bad_json", http_status=400)`; matches M7.
U13	`test_reset_invalid_curriculum_stage_returns_400_M8`	Body `{"config": {"curriculum_stage": 99}}`.	`assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches M8 (dataclass validation failure on reset config).
U14	`test_reset_payload_over_1mib_returns_413_M11`	Body size = 1 MiB + 1 byte (padded `config` dict).	`assert_error_envelope(resp, code="payload_too_large", http_status=413)`; matches M11.
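
For U14, the body has to be exactly 1 MiB + 1 byte while still being well-formed JSON, so that the server rejects it for size (M11) rather than for syntax (M7). One possible way to build such a payload is sketched below; the helper name and the `pad` key inside `config` are hypothetical and only serve the example.

```python
import json

ONE_MIB = 1024 * 1024


def make_padded_reset_body(total_bytes: int) -> bytes:
    """Build a valid JSON /reset body whose encoded size is exactly total_bytes.

    The "pad" key is a made-up filler field; any ASCII-only string works
    because each ASCII character encodes to one byte.
    """
    skeleton = json.dumps({"config": {"pad": ""}}).encode("utf-8")
    pad_len = total_bytes - len(skeleton)
    if pad_len < 0:
        raise ValueError("total_bytes is smaller than the empty skeleton")
    body = json.dumps({"config": {"pad": "x" * pad_len}}).encode("utf-8")
    assert len(body) == total_bytes  # guard against encoding surprises
    return body
```

A call such as `make_padded_reset_body(ONE_MIB + 1)` would give the oversize payload, and `make_padded_reset_body(ONE_MIB)` the largest body on the other side of the boundary, assuming the limit is "greater than 1 MiB is rejected".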

1.5 `POST /step` — `test_step.py`

#	Name	Setup	Assertion
U15	`test_step_happy_path_returns_200`	Session pre-created via `/reset`; body `{"action": {"action_type": "tool_call", "tool_name": "airline.search", "tool_args": {}}}`.	`resp.status_code == 200`; body keys `== {"observation", "reward", "done", "info"}`; `reward` is `float` or `None`; `done` is `bool`. Envelope per §2.1.1.
U16	`test_step_unknown_session_returns_404_M3`	No prior `/reset`; POST `/step` with `X-Session-Id: never-existed-0001`.	`assert_error_envelope(resp, code="session_not_found", http_status=404)`; matches M3.
U17	`test_step_invalid_action_shape_returns_400_M8`	Session pre-created; body `{"action": {"action_type": "tool_call"}}` (missing `tool_name`).	`assert_error_envelope(resp, code="invalid_action", http_status=400)`; matches M8.
U18	`test_step_internal_exception_returns_500_M9_no_stacktrace`	Monkey-patch `env.step` to raise `RuntimeError("boom")`.	`assert_error_envelope(resp, code="internal_error", http_status=500)`; matches M9. `"boom"` does not appear in body (stack-trace suppression §5 rule 1). `resp.json()["error"]["request_id"]` is present (ASGI scope id).

1.6 `GET /state` — `test_state.py`

#	Name	Setup	Assertion
U19	`test_state_happy_path_returns_200`	Session pre-created via `/reset` then two `/step`s.	`resp.status_code == 200`; body keys `== {"state", "turn"}`; `turn == 2` (int). Envelope per §2.1.1.
U20	`test_state_expired_session_returns_404_M4`	Session exists at `t0`; monotonic clock advanced by 3601 s via fixture; sweep runs; GET `/state`.	`assert_error_envelope(resp, code="session_expired", http_status=404)`; matches M4.

1.7 `POST /close` — `test_close.py`

#	Name	Setup	Assertion
U21	`test_close_happy_path_returns_200_and_final_state`	Session pre-created.	`resp.status_code == 200`; body keys `== {"closed", "final_state"}`; `closed is True`; `final_state` is a dict.
U22	`test_close_on_already_evicted_session_returns_200_with_null_final_state`	Session was evicted by sweep before `/close` arrives.	`resp.status_code == 200`; `resp.json() == {"closed": True, "final_state": None}` (§2.1.1 "null if session was already evicted").

1.8 Session cache direct unit tests — `test_session_cache_unit.py`

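These bypass HTTP and call the cache API directly, to pin the policy invariants from §3.2. U25 is the exception: it goes through POST /reset, because the 429 status and the Retry-After header only exist at the HTTP layer.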

#	Name	Setup	Assertion
U23	`test_cache_lru_eviction_on_11th_session`	Fill cache with sessions `s0..s9` (max=10); insert `s10`.	Cache size remains `== 10`; `s0` (oldest `last_touched`) is evicted; `s10` is present; `env.close()` was called on the evicted entry (spy assertion). §3.2 invariant.
U24	`test_cache_ttl_sweep_evicts_stale_entries`	Insert `s_old` at `t0`; advance monotonic clock by 3601 s; call `cache.sweep()`.	`s_old` no longer in cache; spy confirms `env.close()` called; cache remains internally consistent (len == 0). §3.3.
U25	`test_cache_max_sessions_returns_429_M5_with_retry_after`	Cache full of 10 fresh sessions (all touched < 1 s ago); POST `/reset` with a new `X-Session-Id`.	`resp.status_code == 429`; `assert_error_envelope(resp, code="max_sessions", http_status=429)`; `resp.headers["Retry-After"] == "30"` (only M5 carries Retry-After — §5 rules). Matches M5.
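
The LRU and TTL behaviour pinned by U23 and U24 can be illustrated with a toy cache. This is a sketch under assumptions, not the contents of driftcall/session_cache.py: the class layout, the `put`/`get` method names, and the injectable `clock` argument are invented for the example, and the 429 path of U25 is deliberately not modelled.

```python
import time
from collections import OrderedDict
from typing import Callable


class SessionCache:
    """Toy LRU + TTL cache. Entries are ordered from least to most recently touched."""

    def __init__(self, max_sessions: int = 10, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._store: "OrderedDict[str, tuple]" = OrderedDict()  # sid -> (env, last_touched)
        self._max = max_sessions
        self._ttl = ttl_seconds
        self._clock = clock

    def put(self, sid: str, env) -> None:
        if sid in self._store:
            # Replacing an existing session: close the old env first.
            old_env, _ = self._store.pop(sid)
            old_env.close()
        elif len(self._store) >= self._max:
            # Cache is full: evict the least recently touched entry.
            _, (old_env, _) = self._store.popitem(last=False)
            old_env.close()
        self._store[sid] = (env, self._clock())

    def get(self, sid: str):
        env, _ = self._store[sid]  # raises KeyError for an unknown sid
        self._store[sid] = (env, self._clock())
        self._store.move_to_end(sid)  # mark as most recently touched
        return env

    def sweep(self) -> int:
        """Close and drop every entry idle for at least the TTL; return the count."""
        now = self._clock()
        stale = [sid for sid, (_, touched) in self._store.items()
                 if now - touched >= self._ttl]
        for sid in stale:
            env, _ = self._store.pop(sid)
            env.close()
        return len(stale)

    def __len__(self) -> int:
        return len(self._store)

    def __contains__(self, sid: str) -> bool:
        return sid in self._store
```

Passing the clock in as a parameter is one way to keep the tests off the wall clock: a test can hand in a callable that returns a value it controls, then advance that value by 3601 before calling `sweep()`.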

1.9 Error-mode matrix — `test_error_modes_mapping.py`

One parametrized test asserting M1..M12 are each reachable and return the expected HTTP code + slug. Parameters:

[
  ("M1",  "unauthorized",       401, <bad_bearer_request>),
  ("M2",  "missing_session_id", 400, <no_session_header_request>),
  ("M3",  "session_not_found",  404, <step_on_unknown_sid>),
  ("M4",  "session_expired",    404, <step_after_ttl_expiry>),
  ("M5",  "max_sessions",       429, <reset_when_cache_full>),
  ("M6",  "model_not_ready",    503, <step_before_lifespan_load>),
  ("M7",  "bad_json",           400, <malformed_body>),
  ("M8",  "invalid_action",     400, <wrong_action_shape>),
  ("M9",  "internal_error",     500, <env_step_raises>),
  ("M10", "io_error",           500, <tmpfs_full_monkeypatch>),
  ("M11", "payload_too_large",  413, <oversize_body>),
  ("M12", "reset_in_progress",  409, <concurrent_reset_same_sid>),
]

#	Name	Setup	Assertion
U26	`test_error_modes_M1_through_M12_full_matrix`	Parametrized over the 12 tuples above.	For every row: `resp.status_code == expected_http`; `resp.json()["error"]["code"] == expected_slug`; `resp.headers["Cache-Control"] == "no-store"`; `resp.headers.get("Retry-After")` is `"30"` iff row is M5 else absent.

1.10 Lifespan eager load — `test_lifespan_eager_load.py`

#	Name	Setup	Assertion
U27	`test_lifespan_loads_models_before_serving_requests`	Instrument `audio.tts_kokoro.load` and `audio.asr_whisper.load` with call-counter. Start app via `LifespanManager`; issue `/reset` immediately after startup event fires.	Call-counters `== 1` each before any request handler runs (assertion inside lifespan startup). Request returns 200, never 503. §7.3.
U28	`test_step_before_lifespan_complete_returns_503_M6`	Monkey-patch lifespan to defer model load; issue `/step` during the deferred window.	`assert_error_envelope(resp, code="model_not_ready", http_status=503)`; matches M6. Confirms the guard exists before models are ready.

2. Property Tests

Hypothesis-driven invariants on the deployment surface. Minimum 5 properties; this plan specifies 7 (two extra for margin).

2.1 `P1` — `/step` is idempotent on invalid action (env state unchanged)

Strategy: invalid_action_strategy = hypothesis.strategies.dictionaries(...) producing action bodies that fail pydantic validation (missing fields, wrong types, unknown action_type).

Invariant:

pre_state  = GET /state (turn = T)
resp       = POST /step with invalid action     # → 400 M8
post_state = GET /state
assert pre_state == post_state                  # turn unchanged, drift_schedule unchanged
assert resp.status_code == 400

Confirms §7.5 transactional step semantics: state only mutates after all work succeeds; a rejected action is a no-op.
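
The "validate first, commit last" pattern behind this invariant can be shown with a toy environment. It is only an illustration of the transactional idea; `ToyEnv`, its fields, and its validation rules are assumptions and do not reflect the real DriftCall environment.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical stand-in for the environment; only the step ordering matters."""
    turn: int = 0
    history: list = field(default_factory=list)

    def step(self, action: dict) -> dict:
        # 1. Validate everything up front. No attribute is touched on failure.
        if action.get("action_type") != "tool_call":
            raise ValueError("unknown action_type")
        if not isinstance(action.get("tool_name"), str):
            raise ValueError("tool_name is required")
        # 2. Compute the next state on the side.
        new_history = self.history + [action["tool_name"]]
        # 3. Commit in one place, after all work has succeeded.
        self.turn, self.history = self.turn + 1, new_history
        return {"turn": self.turn}

    def snapshot(self) -> tuple:
        return (self.turn, tuple(self.history))
```

With this ordering, a rejected action leaves `snapshot()` unchanged, which is the same before/after comparison P1 performs through GET /state.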

2.2 `P2` — Session expiration is monotonic and consistent

Strategy: st.integers(min_value=0, max_value=7200) for synthetic elapsed seconds.

Invariant:

For any elapsed ∈ [0, 7200]:
  if elapsed < 3600: /step returns 200 (session alive)
  if elapsed >= 3600: /step returns 404 M4 (session expired)
Once expired, the session NEVER becomes alive again without a new /reset.

Tests monotone one-way transition: alive → expired is terminal. §3.2 TTL = 3600 s.
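
A controllable clock plus an expiry predicate is enough to express the boundary that P2 probes. The sketch below assumes the boundary is inclusive (`elapsed >= 3600` means expired), as the invariant above states; `FakeMonotonic` and `is_expired` are hypothetical names, and the plan's own monotonic_clock fixture may be built differently.

```python
TTL_SECONDS = 3600  # §3.2 value quoted in this plan


class FakeMonotonic:
    """Callable replacement for time.monotonic() that only moves when told to."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        if seconds < 0:
            raise ValueError("a monotonic clock cannot go backwards")
        self.now += seconds


def is_expired(last_touched: float, now: float, ttl: float = TTL_SECONDS) -> bool:
    """Inclusive boundary: a session idle for exactly ttl seconds is expired."""
    return now - last_touched >= ttl
```

In a test, an instance could be installed with `monkeypatch.setattr(time, "monotonic", clock)`. Because `advance` refuses negative values, an expired session cannot drift back into the alive range, which mirrors the one-way transition the property asserts.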

2.3 `P3` — Error envelope shape is universal

Strategy: parametrized across all 12 error-triggering inputs (from U26 matrix).

Invariant: every error response satisfies:

body = resp.json()
set(body.keys()) == {"error"}
set(body["error"].keys()) >= {"code", "message"}
isinstance(body["error"]["code"], str) and body["error"]["code"] != ""
isinstance(body["error"]["message"], str)
"traceback" not in json.dumps(body).lower()
"bearer" not in body["error"]["message"].lower()   # no token leakage

2.4 `P4` — `X-Session-Id` charset and length round-trip

Strategy: st.text(alphabet=string.ascii_letters + string.digits + "_-", min_size=1, max_size=64) generates valid session ids; a second strategy generates invalid ones (containing characters such as !@#, or of length 0, or of length 65 or more).

Invariant:

valid_sid   → /reset returns 200
invalid_sid → /reset returns 400 M2
After /reset with valid_sid:
  GET /state with the same sid returns 200
  GET /state with ANY other sid returns 404 M3

2.5 `P5` — LRU eviction preserves cache size cap

Strategy: st.lists(st.text(alphabet=string.ascii_letters, min_size=8, max_size=16), min_size=11, max_size=50, unique=True) — sequences of distinct session ids.

Invariant: after POSTing /reset for every sid in the list (one at a time):

len(cache) == min(len(sids), 10)
The 10 present sids are exactly the 10 most-recently-inserted (by last_touched).
No env instance is leaked (every evicted env had .close() called exactly once).

2.6 `P6` — Reward field is float-or-null

Strategy: parametrized over valid actions per DriftCallAction shape.

Invariant: every /step 200-response body satisfies:

reward = body["reward"]
assert reward is None or (isinstance(reward, float) and -1.0 <= reward <= 1.0)
assert isinstance(body["done"], bool)

Pins §2.1.1 envelope: reward: float | null, range aligned with openenv.yaml reward.range: [-1.0, 1.0] (§4.3).

2.7 `P7` — Concurrent `/reset` on same sid never produces two envs

Strategy: hypothesis.stateful.RuleBasedStateMachine driving concurrent /reset calls on the same X-Session-Id via anyio.create_task_group.

Invariant:

Across N concurrent /reset calls on the same sid:
  exactly one succeeds with 200 (winner)
  the remaining N-1 return 409 M12 (reset_in_progress)
  cache ends with exactly one env under that sid
  no env instance is leaked

§7.1 per-session asyncio lock invariant.
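
One way a per-session lock can produce the "one winner, everyone else gets M12" outcome is sketched below with plain asyncio. It is an assumption about the mechanism, not the spec's implementation: `ResetGuard`, `ResetInProgress`, and the `build_env` callback are hypothetical names, and a real handler would translate the exception into a 409 response.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class ResetInProgress(Exception):
    """Raised when a reset for the same sid is already running (would map to M12)."""


class ResetGuard:
    def __init__(self) -> None:
        self._locks: Dict[str, asyncio.Lock] = {}

    async def reset(self, sid: str, build_env: Callable[[], Awaitable[object]]) -> object:
        lock = self._locks.setdefault(sid, asyncio.Lock())
        # No await sits between this check and the acquire below, so on a
        # single event loop no other task can slip in between the two.
        if lock.locked():
            raise ResetInProgress(sid)
        async with lock:
            return await build_env()
```

Losers fail fast instead of queueing on the lock, so at most one environment is ever built per session id at a time. The sketch never removes entries from `_locks`; a real cache would likely drop the lock when the session is evicted.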

3. Integration Tests

Cross-cutting scenarios that exercise real subsystems. Marked @pytest.mark.integration; run in CI only, not in the fast pytest tests/ loop.

3.1 `I1` — End-to-end curl flow: `/reset` → 6× `/step` → `/state` → `/close`

Mechanism: subprocess.run(["curl", ...]) against a locally-booted FastAPI app (via uvicorn subprocess, port 7860). Uses the real curl binary to exercise headers + HTTP/1.1 semantics exactly as judges will.

Flow:

Start uvicorn in a subprocess, wait for /healthz to return ok (max 45 s, matches HEALTHCHECK --start-period=45s in §4.2).
curl -X POST /reset with bearer + X-Session-Id: e2e-001, body {"seed": 42, "config": {"curriculum_stage": 1}}. Assert 200.
Loop 6 times: curl -X POST /step with a tool_call action. Assert 200 each time; accumulate done values.
curl /state. Assert 200; turn >= 6.
curl -X POST /close. Assert 200; closed is True.
Kill uvicorn subprocess; assert no zombie process.

Budget: single test must complete under 60 s including subprocess boot.
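
The first step of the flow (wait until /healthz answers before sending real traffic) could be implemented with a small polling loop. The sketch below uses the standard library's urllib rather than curl so it stays self-contained; `wait_for_healthz` and its parameters are illustrative names, with the 45 s default taken from the start period quoted above.

```python
import time
import urllib.request


def wait_for_healthz(base_url: str, timeout_s: float = 45.0,
                     interval_s: float = 0.5) -> bool:
    """Poll GET {base_url}/healthz until it returns 200 "ok" or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/healthz", timeout=2.0) as resp:
                if resp.status == 200 and resp.read().decode().strip() == "ok":
                    return True
        except OSError:
            # URLError, HTTPError and refused connections are all OSError
            # subclasses: the server is not ready yet, so keep polling.
            pass
        time.sleep(interval_s)
    return False
```

A test would call `wait_for_healthz("http://127.0.0.1:7860")` right after launching the uvicorn subprocess and fail immediately if it returns False, which keeps a boot failure from surfacing later as a confusing /reset error.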

3.2 `I2` — Docker build locally + `openenv validate`

Mechanism: docker build -t driftcall-env:test -f DRIFTCALL/Dockerfile DRIFTCALL/ then docker run -d -p 7860:7860 -e DRIFTCALL_ENV_TOKEN=test-token driftcall-env:test, then openenv validate http://localhost:7860 --auth-bearer test-token.

Assertions:

docker build exits 0.
Image size < 2 GB (docker image inspect driftcall-env:test --format '{{.Size}}' < 2 * 1024**3).
Container healthz returns ok within 60 s of docker run.
openenv validate exits 0 and its stdout contains each of:
- openenv.yaml parses, schema v1.0
- POST /reset success line
- POST /step success line
- GET /state success line
- POST /close success line
- 6 endpoints validated, 0 errors
Container cleanup: docker rm -f in finally block.

Gating: marked @pytest.mark.skipif(not shutil.which("docker"), reason="docker CLI not found"); pytest requires a reason string when the condition is a boolean. Opt-in locally, mandatory in CI.

3.3 `I3` — HF Space deploy dry-run (no actual push)

Mechanism: hf upload --dry-run <team>/driftcall-env . --repo-type=space. Captures the file manifest that would be pushed.

Assertions:

Exit code 0.
Manifest includes: app.py, openenv.yaml, Dockerfile, requirements.txt, README.md, driftcall/ subtree.
Manifest excludes: tests/, training/, data/raw/, .env*, *.ipynb, .git/.
README.md YAML frontmatter contains required keys: title, sdk: docker, app_port: 7860, emoji, colorFrom, colorTo (§4.4).
No actual network call to huggingface.co — enforced via monkeypatch on huggingface_hub outbound session to raise if reached.
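
Once the dry run has produced a list of paths, the include and exclude assertions reduce to set and pattern checks. The sketch below assumes the manifest has already been parsed into POSIX-style relative paths; how the CLI prints the manifest is not specified here, and `check_manifest` is a hypothetical helper name.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Lists copied from the assertions above.
REQUIRED_FILES = {"app.py", "openenv.yaml", "Dockerfile", "requirements.txt", "README.md"}
REQUIRED_DIR = "driftcall/"
FORBIDDEN_DIRS = ("tests/", "training/", "data/raw/", ".git/")
FORBIDDEN_NAME_GLOBS = (".env*", "*.ipynb")


def check_manifest(paths):
    """Return (missing, forbidden) for a list of relative POSIX paths."""
    paths = list(paths)
    missing = sorted(REQUIRED_FILES - set(paths))
    if not any(p.startswith(REQUIRED_DIR) for p in paths):
        missing.append(REQUIRED_DIR)
    forbidden = sorted(
        p for p in paths
        if p.startswith(FORBIDDEN_DIRS)  # str.startswith accepts a tuple
        or any(fnmatch(PurePosixPath(p).name, g) for g in FORBIDDEN_NAME_GLOBS)
    )
    return missing, forbidden
```

The test would then assert that both returned lists are empty. Matching the globs against the file name only means a notebook or `.env` file is caught at any depth, not just at the repository root.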

3.4 `I4` — Concurrent 10-session load test

Mechanism: anyio.create_task_group spawning 10 coroutines, each driving a unique X-Session-Id through /reset → 3× /step → /close against TestClient(app).

Assertions:

All 10 /reset calls return 200 (cache is exactly at cap).
An 11th concurrent /reset (while the first 10 are still last_touched < TTL) returns 429 M5 with Retry-After: 30 (proves cap enforcement under contention).
All 30 /step calls (3 × 10 sessions) return 200; no cross-session state bleed — each session's observation.turn progresses independently (1, 2, 3).
All 10 /close calls return 200.
Wall-clock budget: total test completes in < 30 s on CI 2-vCPU runner.

3.5 `I5` — Cold-start lifespan blocks request serving until models loaded

Mechanism: Instrument audio.tts_kokoro.load with an artificial 2 s anyio.sleep. Boot the app via LifespanManager and concurrently fire a /reset request at t=0 (before startup completes).

Assertions:

The /reset request blocks until lifespan startup is complete — it does not return 503 during the loading window if app.py correctly awaits lifespan before accepting requests (this is the FastAPI default).
If instead we disable the lifespan gate (test variant), the request returns 503 M6 with code="model_not_ready" — proves M6 is reachable and the guard is load-bearing.
/healthz responds 200 throughout (probe endpoint is cheap and does not require models — §3.5 "unauthenticated").

3.6 `I6` — TTL sweep liveness under sustained traffic

Mechanism: Run the TestClient against the app for 70 s of simulated traffic (monotonic clock advanced via fixture), issuing one /reset per synthetic minute with a fresh X-Session-Id. Sweep runs every 60 s per §3.3.

Assertions:

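After the 61st synthetic second (the first sweep cycle), the first session's entry has been evicted by the sweep task. This timeline presumes the test configures a TTL shorter than 60 s; under the 3600 s TTL of §3.2 the first session would still be alive at that point.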
A /step on that first session returns 404 M4.
The sweep task itself does not raise; logs contain exactly one "swept 1 expired session" structured log line per sweep cycle (§3.7 logging fields).

4. Coverage Target

Targets (enforced in CI via pytest --cov-fail-under):

Artifact	Line coverage	Branch coverage
`app.py`	100%	≥ 95%
`driftcall/routes/reset.py`	100%	≥ 95%
`driftcall/routes/step.py`	100%	≥ 95%
`driftcall/routes/state.py`	100%	≥ 95%
`driftcall/routes/close.py`	100%	≥ 95%
`driftcall/routes/health.py`	100%	100% (trivial file)
`driftcall/session_cache.py`	100%	≥ 95%

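Command (with --cov-branch, coverage.py folds branch misses into the single total that --cov-fail-under checks, so the value 100 below is stricter than the ≥ 95% branch target in the table):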

pytest tests/test_deploy_env/ \
  --cov=app \
  --cov=driftcall.routes \
  --cov=driftcall.session_cache \
  --cov-branch \
  --cov-report=term-missing \
  --cov-fail-under=100

Branch-coverage carve-outs (documented pragmas, not silent): the except asyncio.CancelledError: raise guard at the bottom of the sweep task's loop is excluded via # pragma: no cover — re-raising a cancellation is standard-library contract and triggering it requires injecting a cancellation into the lifespan shutdown, which is covered by the lifespan test (I5) at the event-loop level.

Error-mode coverage ledger — every one of M1..M12 is raised by at least one test:

Mode	Raised by	HTTP
M1 `unauthorized`	U3, U4, U5, U6, U26	401
M2 `missing_session_id`	U7, U8, U9, U26, P4	400
M3 `session_not_found`	U16, U26, P4	404
M4 `session_expired`	U20, U26, P2, I6	404
M5 `max_sessions`	U25, U26, I4	429
M6 `model_not_ready`	U28, U26, I5	503
M7 `bad_json`	U12, U26	400
M8 `invalid_action`	U13, U17, U26, P1	400
M9 `internal_error`	U18, U26	500
M10 `io_error`	U26 (monkeypatched tmpfs full)	500
M11 `payload_too_large`	U14, U26	413
M12 `reset_in_progress`	U26, P7	409

HTTP status codes asserted at least once: 200, 400, 401, 404, 409, 413, 429, 500, 503 — all nine from §2.2.
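
A ledger like this is easy to let drift as tests are renamed, so it may be worth keeping a machine-checkable copy. The sketch below transcribes the table into a dictionary and checks the two claims made in this section: every mode has at least one test, and the status codes (plus 200 from the happy paths) cover all nine. `LEDGER` and `check_ledger` are illustrative names, not existing project code.

```python
# Transcribed from the ledger table: mode -> (slug, HTTP status, tests raising it).
LEDGER = {
    "M1": ("unauthorized", 401, ["U3", "U4", "U5", "U6", "U26"]),
    "M2": ("missing_session_id", 400, ["U7", "U8", "U9", "U26", "P4"]),
    "M3": ("session_not_found", 404, ["U16", "U26", "P4"]),
    "M4": ("session_expired", 404, ["U20", "U26", "P2", "I6"]),
    "M5": ("max_sessions", 429, ["U25", "U26", "I4"]),
    "M6": ("model_not_ready", 503, ["U28", "U26", "I5"]),
    "M7": ("bad_json", 400, ["U12", "U26"]),
    "M8": ("invalid_action", 400, ["U13", "U17", "U26", "P1"]),
    "M9": ("internal_error", 500, ["U18", "U26"]),
    "M10": ("io_error", 500, ["U26"]),
    "M11": ("payload_too_large", 413, ["U14", "U26"]),
    "M12": ("reset_in_progress", 409, ["U26", "P7"]),
}
EXPECTED_STATUSES = {200, 400, 401, 404, 409, 413, 429, 500, 503}


def check_ledger(ledger):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    expected_modes = {"M%d" % i for i in range(1, 13)}
    for mode in sorted(expected_modes - set(ledger)):
        problems.append(mode + " missing from ledger")
    for mode, (_slug, _status, tests) in ledger.items():
        if not tests:
            problems.append(mode + " is not raised by any test")
    # 200 comes from the happy-path tests, not from an error mode.
    seen = {status for _slug, status, _tests in ledger.values()} | {200}
    if seen != EXPECTED_STATUSES:
        problems.append("status codes differ: %s" % sorted(seen ^ EXPECTED_STATUSES))
    return problems
```

Note that M10 and M12 each rely on the U26 matrix for their only unit-level coverage, which this check makes visible: removing U26 from either row without a replacement would leave the row nearly or completely empty.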

5. Fixtures

Defined in tests/conftest.py (project-wide) and imported by tests/test_deploy_env/conftest.py. Shared with deploy_demo_space_tests.md — any change here propagates there and vice versa.

5.1 `fastapi_test_client`

import pytest
from fastapi.testclient import TestClient


@pytest.fixture
def fastapi_test_client(monkeypatch, valid_bearer_token, stub_audio_models):
    """
    Boots the FastAPI app with lifespan, stub Kokoro+Whisper loaded,
    and bearer token injected into app config.

    Yields a `fastapi.testclient.TestClient` that supports all HTTP verbs
    against the live app (in-process, no socket).

    Lifecycle: entering the TestClient context manager fires the lifespan
    startup event and leaving it fires shutdown; the cache is flushed
    between tests via the autouse cache-reset fixture.
    """
    # Set the token before importing app, in case app reads it at import time.
    monkeypatch.setenv("DRIFTCALL_ENV_TOKEN", valid_bearer_token)
    from app import app
    with TestClient(app) as client:
        yield client

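Used by: the HTTP-level unit tests in §1, properties P1–P7, and integration tests I4 and I6. I1 talks to a uvicorn subprocess through curl, and I5 boots the app via LifespanManager without the stub models, so neither goes through this client.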

5.2 `valid_bearer_token`

import secrets

import pytest


@pytest.fixture(scope="session")
def valid_bearer_token() -> str:
    """A freshly-generated URL-safe token, session-scoped so it is stable
    across tests in one pytest run but distinct between runs."""
    return secrets.token_urlsafe(32)

Used by: every test that asserts 200 on a mutating endpoint, plus the "bad bearer" tests (which receive valid_bearer_token + "x" as the wrong token).

5.3 `session_id_alpha`

@pytest.fixture
def session_id_alpha() -> str:
    """Deterministic session id for tests that only need one sid."""
    return "session-alpha-0001"

Charset and length both pass the header validator (§2.1 headers table).

5.4 `session_id_beta`

@pytest.fixture
def session_id_beta() -> str:
    """Second deterministic session id for cross-session tests
    (e.g., asserting no state bleed between alpha and beta)."""
    return "session-beta-0002"

5.5 Helper fixtures (non-shared, internal to this test package)

stub_audio_models — monkeypatches audio.tts_kokoro.load and audio.asr_whisper.load to return lightweight stubs so lifespan completes in < 50 ms. Used everywhere except I5 (which tests real-ish load behavior).
monotonic_clock — monkeypatches time.monotonic() to advance deterministically; used by U20, U24, P2, I6.
cache_reset (autouse) — clears session_cache._store between tests; prevents cross-test bleed.
assert_error_envelope(resp, code, http_status) — imported helper, asserts envelope shape + Cache-Control: no-store header + optional Retry-After when code == "max_sessions".
one_mib_plus_one_body — precomputed bytes payload for U14 (M11 oversize test).
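
As a concrete reading of the assert_error_envelope description above, the helper could look roughly like the sketch below. It follows the rules stated in this plan (required keys code and message, Cache-Control: no-store, Retry-After of "30" only for max_sessions), but the body is an assumption about how those rules combine. `FakeResponse` is a hypothetical stand-in so the example runs without httpx.

```python
import json
from dataclasses import dataclass, field


@dataclass
class FakeResponse:
    """Minimal stand-in for an httpx response; only the attributes used below."""
    status_code: int
    text: str
    headers: dict = field(default_factory=dict)


def assert_error_envelope(resp, code: str, http_status: int) -> None:
    assert resp.status_code == http_status
    body = json.loads(resp.text)
    assert set(body) == {"error"}
    error = body["error"]
    # Superset check: M9 responses also carry request_id (see U18).
    assert {"code", "message"} <= set(error)
    assert error["code"] == code
    assert isinstance(error["message"], str)
    assert resp.headers["Cache-Control"] == "no-store"
    if code == "max_sessions":
        assert resp.headers.get("Retry-After") == "30"
    else:
        assert "Retry-After" not in resp.headers
```

Keeping every check in one helper means a change to the envelope contract is made in one place, and each error-path test stays a single line.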

Fixture ownership note: fastapi_test_client, valid_bearer_token, session_id_alpha, session_id_beta live in tests/conftest.py at the project root and are the shared set with deploy_demo_space_tests.md. Helper fixtures (§5.5) are local to tests/test_deploy_env/conftest.py and are not shared.

6. Non-goals (out of scope for this plan)

Deep per-field validation of DriftCallObservation / DriftCallAction / DriftCallState — owned by env_tests.md + models_tests.md.
Reward math correctness — owned by rewards_tests.md.
Kokoro / Whisper model quality — owned by audio_tests.md.
Actual HF Hub pushes — forbidden in tests (§3.3 dry-run only); real push happens in Batch C3 manual verification.
GPU behavior — deployment is CPU-only (deploy_env_space.md §1, §6.5 explicit non-dependency).
Cross-worker cache coherence — documented as acceptable 404 path in §3.2 of the spec; not a test target for this hackathon (future hardening).
