env_tests.md — Test Plan for `driftcall/env.py`

Module under test: driftcall/env.py (class DriftCallEnv)
Design doc: DRIFTCALL/docs/modules/env.md (final sealed, 9-section spec)
Owner: Person B (Rewards & Tests); reviewed by Person A (Environment)
Implements test coverage for: DESIGN.md §4 (OpenEnv Interface), §4.2–4.5 (reset/step/budget), §6.2 (drift trigger), §7 (reward invariants), §9.4 (audio boundary), §11.1 (one env per session)
Framework: pytest + hypothesis (+ pytest-cov)
Coverage tool: pytest --cov=driftcall.env --cov-branch --cov-report=term-missing
Status: Test plan, pre-critic-gate
Last updated: 2026-04-24
Training path constraint: All tests are CUDA-free (text-only). Audio-boundary tests use in-process stub engines: no Kokoro / Whisper model loads, no network, no disk writes.

This plan specifies 100% line coverage and ≥ 95% branch coverage on driftcall/env.py. Every behavior clause in env.md §2–§3, every error mode E1–E12 in env.md §5, every edge case in env.md §7, and every worked example in env.md §8 has at least one dedicated test. Fixtures are shared with docs/tests/deploy_env_space_tests.md and reuse factories already defined in models_tests.md, vendors_tests.md, drift_injector_tests.md, task_generator_tests.md, and rewards_tests.md — single source of truth in tests/conftest.py.

Test count target: ≥ 25 unit + ≥ 5 property + 4 integration = 34 cases minimum; inventory below sums to 45 (35 unit + 6 property + 4 integration).

0. Scope & Contract

Covered (public surface of DriftCallEnv + EnvConfig.from_mapping):

DriftCallEnv.__init__(config) — config validation, unknown-key rejection, mutually-exclusive fields
reset(seed) — deterministic trajectory, curriculum_stage derivation, language_weights propagation, audio_boundary_enabled toggle invokes tts_engine.synthesize
step(action) — full pipeline ordering per env.md §2.3: (1a pure _validate_action → 1b caller handles repeated failures → 2 turn increment → 3 drift fold → 4 side-channel emit → 5 dispatch → 6 dataclasses.replace record → 7 terminal check → 8 compute_rewards once → 9 observation)
state() — frozen reference return (no deepcopy), E2 when unready
close() — idempotent, E3 afterwards, does NOT free shared audio singletons (env.md §9 open question 7)
episode(), rewards(), done() — terminal-only gating, memoized return
All 12 typed exceptions in driftcall.env.errors rooted at DriftCallEnvError

Not covered here (covered elsewhere, referenced only):

Vendor dispatch internals → vendors_tests.md
Drift pattern catalogue → drift_injector_tests.md
Reward arithmetic → rewards_tests.md
Sim-caller responder body → resolved via env.md §9 Q1 at critic gate; this plan only asserts the responder is deterministic (seed, turn)-keyed.

1. Unit tests (≥ 25 cases — inventory: 35)

All unit tests live in tests/test_env/. Layout:

tests/test_env/
  __init__.py
  test_init_config_validation.py
  test_reset.py
  test_step_ordering.py
  test_step_validation_purity.py
  test_state_accessor.py
  test_close_idempotent.py
  test_terminal_accessors.py
  test_audio_boundary_toggle.py
  test_error_taxonomy.py

1.1 `__init__` + `EnvConfig.from_mapping` — config validation (9 cases)

Scope: E1 InvalidConfigError on every malformed-config branch. __init__ performs no I/O.

#	Name	Setup	Assertion
U1	`test_init_default_config_ok`	`DriftCallEnv()` (no arg)	Succeeds. `env._config.curriculum_stage == 1`. `env._config.language_weights == {"en":0.4,"hinglish":0.4,"hi":0.1,"ta":0.05,"kn":0.05}`. `env._config.audio_boundary_enabled is False`. `env._state is None`.
U2	`test_init_rejects_unknown_key`	`DriftCallEnv({"curriculum_stage":1, "frobnicate":True})`	Raises `InvalidConfigError`; message contains `"frobnicate"` and the full allowed-key list.
U3	`test_init_rejects_invalid_stage`	Parametrized: `0, 4, -1, "1", 1.0, None`	Raises `InvalidConfigError` with `"curriculum_stage"`.
U4	`test_init_rejects_weights_wrong_sum`	`language_weights={"en":0.5,"hinglish":0.4}` (sum=0.9)	Raises `InvalidConfigError`; message cites `"sum"`.
U5	`test_init_rejects_weights_negative`	`language_weights={"en":0.6,"hinglish":0.5,"hi":-0.1}`	Raises `InvalidConfigError`; cites `"negative"`.
U6	`test_init_rejects_audio_enabled_missing_tts`	`audio_boundary_enabled=True, tts_engine=None, asr_engine=<stub>`	Raises `InvalidConfigError`; cites `"tts_engine"`.
U7	`test_init_rejects_audio_disabled_with_tts`	`audio_boundary_enabled=False, tts_engine=<stub>`	Raises `InvalidConfigError` ("tts_engine must be None when audio_boundary_enabled is False" — env.md §7.5).
U8	`test_init_is_pure_no_io`	Patch `builtins.open`, `socket.socket`, and `os.urandom` to raise. `DriftCallEnv({"curriculum_stage":2})`.	Succeeds without invoking any patched callable. Asserts env.md §2.1 "no I/O, no model load, no network call".
U9	`test_init_stores_frozen_config_copy`	Pass a mutable `weights` dict; mutate it after construction.	`env._config.language_weights` unchanged. `EnvConfig` instance has `__dataclass_params__.frozen is True`.

1.2 `reset()` — trajectory setup (8 cases)

#	Name	Setup	Assertion
U10	`test_reset_stage1_sets_max_turns_8`	`env=DriftCallEnv({"curriculum_stage":1}); obs=env.reset(seed=1)`	`env._state.max_turns == 8`. `obs.budget_remaining == 8`. `obs.turn == 0`.
U11	`test_reset_stage2_sets_max_turns_12`	stage 2	`max_turns == 12`; `budget_remaining == 12`.
U12	`test_reset_stage3_sets_max_turns_16`	stage 3	`max_turns == 16`; `budget_remaining == 16`.
U13	`test_reset_populates_curriculum_stage_on_state`	stage 2	`env._state.stage == 2` (or equivalent attribute; matches env.md §4.3 `stage` field piped into `Episode`).
U14	`test_reset_passes_language_weights_to_task_generator`	Monkeypatch `task_generator.generate` to record args. `reset(seed=7)` with custom weights.	Recorded `language_weights` argument is byte-identical to `env._config.language_weights` (not merely equal-by-value).
U15	`test_reset_same_seed_same_goal_and_schedule`	`env.reset(seed=42)` twice (construct two envs)	`obs_a.goal == obs_b.goal`; `env_a._state.drift_schedule == env_b._state.drift_schedule`; `env_a._state.vendor_states == env_b._state.vendor_states`.
U16	`test_reset_none_seed_populates_from_urandom`	`reset(seed=None)`	`env._seed` is an `int`. Two calls produce different `_seed` with high probability (assert inequality across 3 calls — tolerates 1-in-2^64 flake).
U17	`test_reset_audio_boundary_enabled_invokes_tts_synthesize`	Stub `tts_engine` with a recording synthesize. `audio_boundary_enabled=True`. `reset(seed=11)`.	Stub recorded exactly one call with args `(goal.seed_utterance, goal.language)`. `obs.last_transcript == obs.goal.seed_utterance` (canonical source unchanged — env.md §3.7 clause 1).
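
The turn budgets pinned by U10–U12 and the seed fallback in U16 reduce to two small rules, sketched below. `MAX_TURNS_BY_STAGE`, `initial_budget`, and `resolve_seed` are hypothetical names, and drawing 8 bytes from `os.urandom` is an assumption inferred from the "1-in-2^64" flake bound in U16, not a confirmed detail of `env.py`.

```python
import os
from typing import Optional

# Budgets asserted by U10 (stage 1), U11 (stage 2) and U12 (stage 3).
MAX_TURNS_BY_STAGE = {1: 8, 2: 12, 3: 16}


def initial_budget(stage: int) -> int:
    """Return the max_turns / budget_remaining value expected right after reset()."""
    return MAX_TURNS_BY_STAGE[stage]


def resolve_seed(seed: Optional[int]) -> int:
    """Keep an explicit seed; otherwise draw 64 random bits (assumed width)."""
    if seed is not None:
        return seed
    return int.from_bytes(os.urandom(8), "big")
```

With a 64-bit draw, comparing the seeds from three `reset(seed=None)` calls for inequality fails only if the draws collide, which is what makes U16's inequality assertion acceptable in practice.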

1.3 `step()` — pipeline ordering (7 cases)

Every case instruments the env by monkeypatching private helpers (_validate_action, _fire_drifts, _emit_side_channel, _dispatch, _record_action, _check_terminal, _build_observation) to append their names to a shared call_log list, proving the order.

#	Name	Setup	Assertion
U18	`test_step_validates_before_any_mutation`	Valid stage-1 env after `reset`. Issue a valid TOOL_CALL.	`call_log == ["_validate_action", "_fire_drifts", "_emit_side_channel", "_dispatch", "_record_action", "_check_terminal", "_build_observation"]` — this is the env.md §2.3 order.
U19	`test_step_increments_turn_after_validate_before_dispatch`	Valid TOOL_CALL.	`obs.turn == 1` post-step. Turn counter bump occurs between `_validate_action` and `_fire_drifts` (per env.md §2.3 step 2). Instrumented via snapshot of `self._state.turn` inside stubbed `_fire_drifts`.
U20	`test_step_fires_drifts_before_dispatch`	Scripted scheduler fires `airline.price_rename` at turn 1. Agent action: `TOOL_CALL airline.search` at turn 1.	`obs.tool_results[-1].schema_version == "v2"` (tool saw post-drift schema). `obs.drift_log[-1].pattern_id == "airline.price_rename"`.
U21	`test_step_records_action_via_dataclasses_replace`	Valid TOOL_CALL.	`prev_state = env._state; env.step(a); next_state = env._state`. Assert `prev_state is not next_state`, `id(prev_state.actions) != id(next_state.actions)`, `next_state.actions == prev_state.actions + (a,)`.
U22	`test_step_checks_terminal_after_record`	Stage-1 env; issue 8 benign SPEAK actions (budget=8).	8th step: `env.done() is True`. `env.episode().terminated_by == "TIMEOUT"`. Turn counter = 8.
U23	`test_step_submit_calls_compute_rewards_exactly_once`	Monkeypatch `rewards.compute_rewards` with a recorder. Issue TOOL_CALL then SUBMIT.	Recorder called once. `env.rewards()` returns the exact object the recorder produced. A second call to `env.rewards()` returns the same identity (memoized — env.md §3.6).
U24	`test_step_abort_forces_r1_zero`	`reset(seed=1)`; `step(ABORT)`.	`env.episode().terminated_by == "ABORT"`. `env.rewards().r1 == 0.0`. R2…R5 still computed (non-None).
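
The call-log instrumentation used by U18 can be sketched without pytest as follows. `ToyEnv` is a hypothetical stand-in whose helpers do nothing; only the helper names and their order come from this plan. In the real suite the same wrapping would presumably go through pytest's `monkeypatch.setattr` so it is undone after each test.

```python
HELPERS = (
    "_validate_action",
    "_fire_drifts",
    "_emit_side_channel",
    "_dispatch",
    "_record_action",
    "_check_terminal",
    "_build_observation",
)


class ToyEnv:
    """Stand-in env: helper bodies are empty, step() only fixes the call order."""

    def _validate_action(self, action): pass
    def _fire_drifts(self): pass
    def _emit_side_channel(self): pass
    def _dispatch(self, action): pass
    def _record_action(self, action): pass
    def _check_terminal(self): pass
    def _build_observation(self): return {"turn": 1}

    def step(self, action):
        self._validate_action(action)
        self._fire_drifts()
        self._emit_side_channel()
        self._dispatch(action)
        self._record_action(action)
        self._check_terminal()
        return self._build_observation()


def instrument(env, call_log):
    """Wrap each helper on this instance so it logs its name, then delegates."""
    for name in HELPERS:
        original = getattr(env, name)  # bound method, captured before patching

        def wrapper(*args, _name=name, _original=original, **kwargs):
            call_log.append(_name)
            return _original(*args, **kwargs)

        setattr(env, name, wrapper)


env = ToyEnv()
call_log = []
instrument(env, call_log)
obs = env.step("TOOL_CALL")
```

Binding `name` and `original` as keyword defaults avoids the late-binding closure pitfall, where every wrapper would otherwise log the last helper name in the loop.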

1.4 `_validate_action` purity & `InvalidActionError` (4 cases)

These cases pin env.md §3.5 / E4 behavior: _validate_action raises before any mutation; env remains valid for a subsequent step().

#	Name	Setup	Assertion
U25	`test_invalid_action_raises_no_state_mutation`	Valid stage-1 env. Snapshot `prev_state = env._state`. Call `env.step(DriftCallAction(action_type=TOOL_CALL, tool_name="airline.search"))` with `tool_args=None` (required dict).	Raises `InvalidActionError`. `env._state is prev_state`. `env._state.turn == prev_state.turn`. `len(env._state.actions) == len(prev_state.actions)`. `env._state.done is False`. No Rewards cached (`env._rewards is None`).
U26	`test_env_valid_after_invalid_action`	U25's env, then issue a valid TOOL_CALL.	Succeeds. `env._state.turn == 1`. Observation returned normally. Proves env is still steppable.
U27	`test_invalid_action_no_drift_fired_no_terminal_marker`	Scripted scheduler places drift at turn 1. Attempt invalid action.	Raises `InvalidActionError`. `env._state.drift_fired == ()`. `env.done() is False`. The drift did NOT fire (drift firing is inside step 3, after validate).
U28	`test_oversize_rationale_raises_invalid_action`	`DriftCallAction(action_type=SUBMIT, confidence=0.5, rationale="x"*201)`	Raises `InvalidActionError` with `"rationale"`. State unchanged (repeat U25's state-preservation asserts).
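
The identity-based assertions in U21 and U25 rely on validation running before any state is touched, and on transitions replacing the frozen state rather than mutating it. A minimal sketch under those assumptions: `ToyEnv`, `State`, and `Action` are hypothetical stand-ins, and the 200-character rationale limit is inferred from U28's 201-character input.

```python
import dataclasses
from typing import Optional, Tuple


class InvalidActionError(ValueError):
    """Hypothetical stand-in for the E4 exception."""


@dataclasses.dataclass(frozen=True)
class Action:
    action_type: str
    rationale: Optional[str] = None


@dataclasses.dataclass(frozen=True)
class State:
    turn: int = 0
    actions: Tuple[Action, ...] = ()


class ToyEnv:
    def __init__(self) -> None:
        self._state = State()

    @staticmethod
    def _validate_action(action: Action) -> None:
        # Pure check: reads the action only, never touches env state.
        if action.rationale is not None and len(action.rationale) > 200:
            raise InvalidActionError("rationale exceeds 200 characters")

    def step(self, action: Action) -> State:
        self._validate_action(action)  # may raise; nothing has changed yet
        self._state = dataclasses.replace(
            self._state,
            turn=self._state.turn + 1,
            actions=self._state.actions + (action,),
        )
        return self._state
```

Because a failed validation never reaches `dataclasses.replace`, a test can assert `env._state is prev_state` rather than comparing fields one by one.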

1.5 `state()` — frozen reference (2 cases)

#	Name	Setup	Assertion
U29	`test_state_returns_frozen_reference`	Post-reset env.	`env.state() is env._state`. `env.state().__dataclass_params__.frozen is True`. Attempting `env.state().turn = 99` raises `dataclasses.FrozenInstanceError`.
U30	`test_state_unready_raises_e2`	Fresh `DriftCallEnv()` without reset.	`env.state()` raises `EnvNotReadyError`. `env.done() is False` (not an error — env.md §7.1).

1.6 `close()` — idempotency (2 cases)

#	Name	Setup	Assertion
U31	`test_close_idempotent`	`env.close(); env.close(); env.close()`	No exception; `env._closed is True` after first call and stays True.
U32	`test_close_does_not_free_shared_audio_engines`	Build env with `audio_boundary_enabled=True` and stub TTS/ASR engines. `env.close()`.	`env._closed is True`; `env._state is None`; the stub engines expose no `close()` method at all (`assert not hasattr(tts_stub, "close")` and same for `asr_stub`) — env.md §9 Q7: engines are process-global singletons, and audio.md §2.1–2.2 define no `close()` on `TTSEngine`/`ASREngine`.

1.7 Terminal-only accessors + error taxonomy (3 cases)

#	Name	Setup	Assertion
U33	`test_episode_before_terminal_raises_e6`	Post-reset, mid-episode.	`env.episode()` raises `EpisodeNotTerminalError`. Same for `env.rewards()`. `env.done() is False`.
U34	`test_double_submit_raises_e5`	Submit, then attempt another step.	Second `step(...)` raises `EpisodeAlreadyTerminalError` (E5 — env.md §7.2). `env.done()` still True. Rewards object identity preserved.
U35	`test_all_12_errors_derive_from_driftcallenverror`	Introspect `driftcall.env.errors`.	Every member of the set `{InvalidConfigError, EnvNotReadyError, EnvClosedError, InvalidActionError, EpisodeAlreadyTerminalError, EpisodeNotTerminalError, ConcurrentStepError, UnknownDomainError, UnknownToolError, DriftInjectionError, RewardComputationError, AudioPipelineError}` subclasses `DriftCallEnvError`, which in turn subclasses `Exception`. Count is exactly 12.
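
The introspection in U35 could look roughly like the sketch below. Since `driftcall.env.errors` is not importable here, a throwaway module with the same 12 class names is built in place; `concrete_errors` is a hypothetical helper, not part of the package.

```python
import inspect
import types

NAMES = (
    "InvalidConfigError", "EnvNotReadyError", "EnvClosedError",
    "InvalidActionError", "EpisodeAlreadyTerminalError",
    "EpisodeNotTerminalError", "ConcurrentStepError", "UnknownDomainError",
    "UnknownToolError", "DriftInjectionError", "RewardComputationError",
    "AudioPipelineError",
)

# Stand-in for `driftcall.env.errors`: one root plus 12 direct subclasses.
errors = types.ModuleType("toy_errors")
errors.DriftCallEnvError = type("DriftCallEnvError", (Exception,), {})
for _name in NAMES:
    setattr(errors, _name, type(_name, (errors.DriftCallEnvError,), {}))


def concrete_errors(module, root):
    """Collect every class in `module` that derives from `root`, excluding `root`."""
    return {
        name: obj
        for name, obj in vars(module).items()
        if inspect.isclass(obj) and issubclass(obj, root) and obj is not root
    }


found = concrete_errors(errors, errors.DriftCallEnvError)
```

Comparing `set(found)` against the expected names catches both a missing exception and an unplanned thirteenth one, which a bare count check would conflate.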

2. Property tests (≥ 5 — inventory: 6)

Written with hypothesis. Strategies live in tests/test_env/strategies.py (shared with test_rewards where applicable).

#	Name	Property	Strategy
P1	`test_step_is_pure_per_call`	For two fresh envs `e1` and `e2` constructed with the same config and `reset(seed=s)`, and fed the same action sequence, every `step()` return is equal and the post-step states are equal. In other words, `(state, action) → (state', observation)` is deterministic.	Seeds in `integers(0, 2**31-1)`; action sequences built from a `DriftCallAction` strategy over valid types; stage in `sampled_from([1,2,3])`. ≥ 200 examples.
P2	`test_validation_failure_preserves_pre_step_state`	For any env in a steppable state and any `DriftCallAction` that fails `_validate_action`: state after the raised `InvalidActionError` equals state before (by identity — `env._state is prev`).	Mixed-validity action strategy; hypothesis `assume()` filters to invalid ones.
P3	`test_turn_counter_monotone_non_decreasing`	Across any legal step sequence, `env._state.turn` is monotone non-decreasing; it strictly increases on every non-raising `step()` and is unchanged on every raised `InvalidActionError`.	Random action sequences up to length 20; assume `stage=3` to permit budget 16.
P4	`test_frozen_state_identity_changes_on_transition`	After every successful `step()`, `prev_state is not next_state` and `id(prev_state.actions) != id(next_state.actions)` whenever `len(next.actions) > len(prev.actions)`. (env.md §3.8 invariant.)	As P1.
P5	`test_rewards_memoized_identity`	After termination, `env.rewards() is env.rewards()` (identity, not just equality) across 10 calls. Same for `env.episode()`.	Parametrized over `terminated_by ∈ {"SUBMIT","ABORT","TIMEOUT"}`.
P6	`test_available_tools_fixed_for_episode`	The set `obs.available_tools` is equal across every observation in an episode, regardless of drifts fired. (env.md §3.4 clause 4.)	Random schedules over stage 2/3; ≥ 50 episodes.
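
The memoization property behind P5 (and U23's "exactly once" check) can be shown with a toy accessor. This is a sketch under assumptions: `ToyEnv`, its `terminate()` method, and the injected `compute_rewards` callable are hypothetical and only mirror the terminal-only gating described in this plan.

```python
class EpisodeNotTerminalError(RuntimeError):
    """Hypothetical stand-in for the E6 exception."""


class ToyEnv:
    def __init__(self, compute_rewards):
        self._compute_rewards = compute_rewards
        self._done = False
        self._rewards = None

    def terminate(self):
        # Rewards are computed once, at the moment the episode becomes terminal.
        if not self._done:
            self._done = True
            self._rewards = self._compute_rewards()

    def rewards(self):
        if not self._done:
            raise EpisodeNotTerminalError("rewards() is only available after termination")
        return self._rewards  # cached object, never recomputed


calls = []


def recording_compute():
    calls.append("compute_rewards")
    return {"r1": 1.0}


env = ToyEnv(recording_compute)
env.terminate()
first = env.rewards()
```

Asserting identity (`is`) rather than equality is what distinguishes a cached result from a recomputed but equal one.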

3. Integration tests (4 cases)

Live in tests/test_env/test_e2e.py. These are full episode traces matching env.md §8 examples. All dependencies are real (real task_generator, real drift_injector, real vendors, real rewards.compute_rewards) — only the audio engines are stubbed in I4.

#	Name	Maps to	Scenario
I1	`test_episode_stage1_airline_happy_submit`	env.md §8.1	`DriftCallEnv({"curriculum_stage":1})`; `reset(seed=42)`. Replay the 5-turn script: `airline.search` → 3 more tool calls → `SUBMIT(confidence=0.9)`. Assertions: `env.done() is True`; `env.episode().terminated_by == "SUBMIT"`; `env.episode().turns_used == 5`; `obs.drift_log == ()`; `env.rewards().r1 == 1.0`; `env.rewards().r2 == 0.5` (stage-1 neutral); `env.rewards().reward` in `[0.85, 1.0]`.
I2	`test_episode_stage2_drift_detect_adapt`	env.md §8.2	`stage=2; seed=7`. Scripted sequence through turn 6 terminating in SUBMIT. Drift `airline.price_rename` fires turn 3. Agent SPEAK at turn 4 mentions `"total_fare_inr"`. Assertions: `obs.drift_log[0].pattern_id == "airline.price_rename"`; `obs.drift_log[0].turn == 3`; `obs.tool_results[-2].response` references `"total_fare_inr"` (not `"price"`); `env.rewards().r1 == 1.0`; `env.rewards().r2 == 1.0`; `env.rewards().reward ≈ 0.90 ± 0.05`.
I3	`test_episode_stage3_compound_drift_timeout`	env.md §8.3	`stage=3; seed=2026`. Script designed to consume all 16 turns. Two drifts fire (airline turn 3, payment turn 9). Assertions: `env.done() is True`; `env.episode().terminated_by == "TIMEOUT"`; `env.episode().turns_used == 16`; `env.rewards().r1 == 0.0`; `env.rewards().r2 in {0.5, 1.0}`; `env.rewards().reward < 0.3`.
I4	`test_episode_audio_boundary_enabled_stubs`	env.md §8.4	`audio_boundary_enabled=True`, `tts_engine=StubTTS()`, `asr_engine=StubWhisper()` (contracts in §5; signatures match `audio.md §2.1–2.2`). Stubs are in-process, CUDA-free, deterministic: `synthesize(text, language_code, voice_pack=None, *, seed=0, sample_rate_hz=16000) → f"WAV[{text}:{language_code}:{seed}:{sample_rate_hz}]".encode("utf-8")`; `transcribe(audio_bytes, language_hint, *, beam_size=1, vad_filter=True, max_duration_s=30.0) → TranscriptResult(text=<scripted>, language_detected="hinglish", confidence=0.82, duration_s=1.250)`. Episode: `reset(seed=11)` → `CLARIFY` → `TOOL_CALL` → `SUBMIT`. Assertions: stub TTS synthesize called on `reset` and on every CLARIFY/SPEAK side-channel emission; `obs.last_transcript` after CLARIFY equals the stubbed ASR text; `obs.last_confidence == 0.82`; reward computation is 100% textual, so no TTS bytes reach `compute_rewards` (verified by asserting `episode.actions` and `episode.tool_results` contain no `bytes` objects).
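
I4's final assertion, that no `bytes` objects appear in `episode.actions` or `episode.tool_results`, needs a recursive walk, because audio could hide inside nested dataclasses or dicts. One possible helper is sketched below; `contains_bytes` and the `ToolResult` stand-in are hypothetical and not part of the codebase.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


def contains_bytes(obj: Any) -> bool:
    """Return True if any bytes-like value is reachable inside `obj`."""
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return True
    if is_dataclass(obj) and not isinstance(obj, type):
        return any(contains_bytes(getattr(obj, f.name)) for f in fields(obj))
    if isinstance(obj, dict):
        return any(contains_bytes(k) or contains_bytes(v) for k, v in obj.items())
    if isinstance(obj, (list, tuple, set, frozenset)):
        return any(contains_bytes(item) for item in obj)
    return False


@dataclass(frozen=True)
class ToolResult:
    """Stand-in record; the real tool-result type has more fields."""
    tool_name: str
    response: dict


clean = (ToolResult("airline.search", {"flights": [{"total_fare_inr": 5400}]}),)
leaky = (ToolResult("airline.search", {"audio": b"WAV[hello:en:0:16000]"}),)
```

In the integration test this would be applied as `assert not contains_bytes(episode.actions)` and likewise for `episode.tool_results`.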

All integration tests reuse fixtures:

goal_airline, goal_restaurant — from drift_injector_tests.md §5.2 (session-scoped GoalSpec instances)
airline_v1, airline_v2, payment_v2 — from vendors_tests.md §5.1 (per-domain aliases over vendor_states_v{1,2,3}; payment_v2 is the post-auth_scope_bump state)
drift_patterns_fixture — from drift_injector_tests.md §5.1 (authoritative 20-pattern catalogue; individual events + compound schedules used by I2/I3 are defined locally in §5 below as drift_event_airline_price_rename_turn3, drift_event_payment_auth_turn9, and schedule_stage3_compound, because drift_injector_tests.md only ships the catalogue, not pre-composed schedules)
episode_happy_airline, episode_timeout — from rewards_tests.md §5 (§5.1 and §5.4 respectively)
valid_tool_call_action, valid_submit_action, valid_observation_reset — from models_tests.md §5.4 (the factory/instance fixtures used to assemble per-step action sequences)

No integration test touches the network. No test loads a real Kokoro/Whisper model.

4. Coverage target

100% line coverage and ≥ 95% branch coverage on driftcall/env.py under pytest --cov=driftcall.env --cov-branch --cov-report=term-missing.

4.1 Error-mode coverage matrix (every E1–E12 raised at least once)

Code	Exception	Raised by which test
E1	`InvalidConfigError`	U2 (unknown key), U3 (bad stage), U4 (weights sum), U5 (negative weight), U6 (missing TTS), U7 (forbidden TTS). Also raised from U4.3 reset if scripted scheduler produces turn > max_turns — covered by a dedicated test `test_reset_scripted_bad_schedule_raises_e1`.
E2	`EnvNotReadyError`	U30 (`state()`), plus `test_step_before_reset_raises_e2`, `test_episode_before_reset_raises_e2`.
E3	`EnvClosedError`	`test_step_after_close_raises_e3`, `test_reset_after_close_raises_e3`.
E4	`InvalidActionError`	U25, U26, U27, U28, plus per-ActionType parametrized cases: missing `tool_name` on TOOL_CALL, message len 0 and len 2001 on SPEAK, NUL byte in message on CLARIFY, missing `confidence` on SUBMIT, forbidden `tool_name` on ABORT.
E5	`EpisodeAlreadyTerminalError`	U34 (double SUBMIT).
E6	`EpisodeNotTerminalError`	U33.
E7	`ConcurrentStepError`	`test_reentrant_step_raises_e7` — stub a vendor `dispatch` that re-invokes `env.step(other_action)`; assert E7 raised on the inner call; assert outer state unchanged.
E8	`UnknownDomainError`	`test_probe_schema_unknown_domain_raises_e8` — PROBE_SCHEMA with `tool_name="spaceship"`.
E9	`UnknownToolError`	`test_tool_call_unknown_tool_raises_e9` — `tool_name="airline.teleport"`.
E10	`DriftInjectionError`	`test_drift_fold_error_propagates_e10` — scripted scheduler yields event with unknown `pattern_id`; env must not swallow.
E11	`RewardComputationError`	`test_reward_compute_error_propagates_e11` — monkeypatch `rewards.compute_rewards` to raise; env must surface.
E12	`AudioPipelineError`	`test_audio_pipeline_error_on_clarify` — stub ASR that raises on 2nd transcribe; assert E12 surfaces from `step(CLARIFY)`; episode does NOT terminate (env.md §5 E12 note). Second test: `test_audio_pipeline_error_on_reset_is_e1_class` — stub TTS that raises on `reset`; the env is unready afterwards per env.md §5 E12.
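
The E7 row relies on the env detecting re-entry into `step()`. A common way to do that is an in-flight flag cleared in a `finally` block, sketched below. `ToyEnv`, the `_in_step` flag, and the injected `dispatch` callable are assumptions made for illustration; the actual guard in `env.py` may differ.

```python
class ConcurrentStepError(RuntimeError):
    """Hypothetical stand-in for the E7 exception."""


class ToyEnv:
    def __init__(self, dispatch):
        self._dispatch = dispatch
        self._in_step = False
        self.turn = 0

    def step(self, action):
        if self._in_step:
            raise ConcurrentStepError("step() re-entered while a step is in flight")
        self._in_step = True
        try:
            self.turn += 1
            return self._dispatch(self, action)
        finally:
            # Always release the guard, even if dispatch raised.
            self._in_step = False


caught = []


def reentrant_dispatch(env, action):
    """Vendor stub that illegally calls back into the env, as in the E7 test."""
    try:
        env.step("inner")
    except ConcurrentStepError as exc:
        caught.append(exc)
    return "ok"


env = ToyEnv(reentrant_dispatch)
result = env.step("outer")
```

The guard fires before the inner call touches any state, so the turn counter reflects only the outer step.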

Total dedicated error-mode tests: 12 exceptions × ≥ 1 = 12 minimum; inventory covers 18 error-mode paths.

4.2 Line/branch targets

DriftCallEnv.__init__ — 100% line; 100% branch (both config is None and config is dict branches hit in U1, U2).
EnvConfig.from_mapping — 100% line; 100% branch (all 7 raise branches covered by U2–U7 + reset-bad-schedule).
reset — 100% line; step 7b audio branch covered by U17 (True) and U10 (False).
step — 100% line; all 6 ActionType dispatch branches (TOOL_CALL / SPEAK / CLARIFY / PROBE_SCHEMA / SUBMIT / ABORT) each have ≥ 1 unit test + integration coverage; drift-fold-empty vs non-empty both covered (I1 empty, I2 non-empty); terminal vs non-terminal both covered (U22 TIMEOUT, U23 SUBMIT, I1/I2/I3 mix).
state, close, episode, rewards, done — 100% line; all raise/early-return branches covered.
_validate_action — 100% line; every row of env.md §3.1 Table is parametrized (per-ActionType forbidden-field matrix).
_build_observation — 100% line; last_transcript branches for turn 0 vs mid-episode vs audio-enabled all covered (U17, I4, I1).

Branch coverage < 95% is a hard CI fail.

5. Fixtures

All fixtures defined in tests/conftest.py under the env_* namespace. Shared with docs/tests/deploy_env_space_tests.md (same names, same content).

Name	Scope	Purpose	Reuses
`env_stage1_airline`	function	`DriftCallEnv({"curriculum_stage":1})` already `reset(seed=42)`, goal forced to airline via scripted `task_generator` monkeypatch when hermetic goal needed. Provides `(env, obs0)` tuple.	`goal_airline` from `drift_injector_tests.md §5.2`; `airline_v1` from `vendors_tests.md §5.1`.
`env_stage2_restaurant_drift`	function	Stage-2 env `reset(seed=7)` with restaurant goal, scripted scheduler that fires `restaurant.items_shape_bump` at turn 3. Returns `(env, obs0, drift_event)`.	`goal_restaurant` from `drift_injector_tests.md §5.2`; `drift_event_restaurant_items_shape_bump_turn3` defined below.
`env_stage3_compound`	function	Stage-3 env `reset(seed=2026)`, scripted scheduler with compound drift (airline turn 3 + payment turn 9). Used by I3.	`schedule_stage3_compound` defined below; reuses `drift_patterns_fixture` catalogue from `drift_injector_tests.md §5.1`.
`env_audio_enabled`	function	Stage-1 env with `audio_boundary_enabled=True`, `tts_engine=StubTTS()`, `asr_engine=StubWhisper()`. Stubs are pure Python, CUDA-free, deterministic. Returns `(env, tts_stub, asr_stub)` for assertions on call counts.	`StubTTS`, `StubWhisper` classes defined in `tests/stubs/audio_stubs.py`.
`env_config_invalid_key`	function	`{"curriculum_stage":1, "frobnicate":True}` — a single malformed config dict reused across U2 and any critic-requested smoke test.	—

Stub engine contracts (pinned here for cross-doc consistency with audio_tests.md; signatures match docs/modules/audio.md §2.1 and §2.2 exactly):

from driftcall.audio.asr_whisper import TranscriptResult
from driftcall.audio.tts_kokoro import VoicePack

class StubTTS:
    """In-process TTS double. Matches audio.md §2.1 `TTSEngine.synthesize` signature."""
    def __init__(self) -> None:
        self.calls: list[tuple[str, str, VoicePack | None, int, int]] = []

    def synthesize(
        self,
        text: str,
        language_code: str,
        voice_pack: VoicePack | None = None,
        *,
        seed: int = 0,
        sample_rate_hz: int = 16000,
    ) -> bytes:
        self.calls.append((text, language_code, voice_pack, seed, sample_rate_hz))
        return f"WAV[{text}:{language_code}:{seed}:{sample_rate_hz}]".encode("utf-8")

class StubWhisper:
    """In-process ASR double. Matches audio.md §2.2 `ASREngine.transcribe` signature
    and the 4-field `TranscriptResult` contract (text, language_detected, confidence, duration_s)."""
    def __init__(self, scripted: dict[int, str] | None = None) -> None:
        self.calls: list[bytes] = []
        self._scripted = scripted or {}

    def transcribe(
        self,
        audio_bytes: bytes,
        language_hint: str | None,
        *,
        beam_size: int = 1,
        vad_filter: bool = True,
        max_duration_s: float = 30.0,
    ) -> TranscriptResult:
        self.calls.append(audio_bytes)
        turn = len(self.calls)
        return TranscriptResult(
            text=self._scripted.get(turn, "shaam ko, 7 baje"),
            language_detected="hinglish",
            confidence=0.82,
            duration_s=1.250,
        )

Neither stub exposes a .close() method. audio.md §2.1–2.2 defines no such method on TTSEngine/ASREngine, and the engines are process-global singletons (env.md §9 Q7). U32 asserts that env.close() does NOT invoke anything engine-side, so the stubs must not carry a close() attribute at all; U32's earlier "call count is 0" check is therefore upgraded to not hasattr(stub, "close") to match the real contract.

5.1 Locally-defined drift events and schedules (not shipped by `drift_injector_tests.md`)

drift_injector_tests.md §5.1 publishes the 20-pattern catalogue (drift_patterns_fixture) but does NOT pre-compose per-test DriftEvent instances or full DriftSchedule objects. Those are composed locally here because scheduling is an env-side concern. All four fixtures below (three DriftEvents and one DriftSchedule) are session-scoped; the three DriftEvent fixtures take drift_patterns_fixture as a dependency to look up the authoritative pattern record.

import pytest

from driftcall.models import DriftEvent
from driftcall.drift_injector import DriftSchedule

@pytest.fixture(scope="session")
def drift_event_airline_price_rename_turn3(drift_patterns_fixture) -> DriftEvent:
    """Used by I2. Pattern id asserted byte-identical to drift_patterns_fixture entry."""
    pattern = next(p for p in drift_patterns_fixture if p.id == "airline.price_rename")
    return DriftEvent(
        turn=3,
        drift_type=pattern.drift_type,       # "schema"
        domain=pattern.domain,               # "airline"
        description=pattern.description,
        from_version=pattern.from_version,   # "v1"
        to_version=pattern.to_version,       # "v2"
    )

@pytest.fixture(scope="session")
def drift_event_restaurant_items_shape_bump_turn3(drift_patterns_fixture) -> DriftEvent:
    """Used by env_stage2_restaurant_drift. `restaurant.items_shape_bump` is the
    canonical restaurant schema drift per drift_injector.md §4.4 (items gain required `modifiers`)."""
    pattern = next(p for p in drift_patterns_fixture if p.id == "restaurant.items_shape_bump")
    return DriftEvent(
        turn=3,
        drift_type=pattern.drift_type,
        domain=pattern.domain,
        description=pattern.description,
        from_version=pattern.from_version,
        to_version=pattern.to_version,
    )

@pytest.fixture(scope="session")
def drift_event_payment_auth_turn9(drift_patterns_fixture) -> DriftEvent:
    """Used by I3. Pattern id `payment.auth_scope_upgrade` (Auth axis, drift_injector.md §4.4)."""
    pattern = next(p for p in drift_patterns_fixture if p.id == "payment.auth_scope_upgrade")
    return DriftEvent(
        turn=9,
        drift_type=pattern.drift_type,       # "auth"
        domain=pattern.domain,               # "payment"
        description=pattern.description,
        from_version=pattern.from_version,
        to_version=pattern.to_version,
    )

@pytest.fixture(scope="session")
def schedule_stage3_compound(
    drift_event_airline_price_rename_turn3,
    drift_event_payment_auth_turn9,
) -> DriftSchedule:
    """Used by I3. Two drifts, one per domain, matching env.md §8.3 worked example."""
    return DriftSchedule(events=(
        drift_event_airline_price_rename_turn3,
        drift_event_payment_auth_turn9,
    ))

These three DriftEvents plus one DriftSchedule are the only drift fixtures defined in env_tests.md; the goal, vendor-state, episode, and action fixtures are imported from the sibling test plans cited in §3 above.

Fixture immutability rule: if any field of any fixture changes here, the matching fixture in deploy_env_space_tests.md §5 must be updated in the same commit — they share a single conftest.py definition. CI guards this via a grep-based pre-commit hook (scripts/check_fixture_parity.sh).
