env_tests.md: Test Plan for driftcall/env.py

  • Module under test: driftcall/env.py (class DriftCallEnv)
  • Design doc: DRIFTCALL/docs/modules/env.md (final sealed, 9-section spec)
  • Owner: Person B (Rewards & Tests); reviewed by Person A (Environment)
  • Implements test coverage for: DESIGN.md §4 (OpenEnv Interface), §4.2–4.5 (reset/step/budget), §6.2 (drift trigger), §7 (reward invariants), §9.4 (audio boundary), §11.1 (one env per session)
  • Framework: pytest + hypothesis (+ pytest-cov)
  • Coverage tool: pytest --cov=driftcall.env --cov-branch --cov-report=term-missing
  • Status: Test plan, pre-critic-gate
  • Last updated: 2026-04-24
  • Training path constraint: All tests are CUDA-free (text-only). Audio-boundary tests use in-process stub engines: no Kokoro / Whisper model loads, no network, no disk writes.

This plan specifies 100% line coverage and ≥ 95% branch coverage on driftcall/env.py. Every behavior clause in env.md §2–§3, every error mode E1–E12 in env.md §5, every edge case in env.md §7, and every worked example in env.md §8 has at least one dedicated test. Fixtures are shared with docs/tests/deploy_env_space_tests.md and reuse factories already defined in models_tests.md, vendors_tests.md, drift_injector_tests.md, task_generator_tests.md, and rewards_tests.md, with a single source of truth in tests/conftest.py.

Test count target: ≥ 25 unit + ≥ 5 property + 4 integration = 34 cases minimum; inventory below sums to 45 (35 unit + 6 property + 4 integration).


0. Scope & Contract

Covered (public surface of DriftCallEnv + EnvConfig.from_mapping):

  • DriftCallEnv.__init__(config): config validation, unknown-key rejection, mutually-exclusive fields
  • reset(seed): deterministic trajectory, curriculum_stage derivation, language_weights propagation, audio_boundary_enabled toggle invokes tts_engine.synthesize
  • step(action): full pipeline ordering per env.md §2.3 (1a pure _validate_action → 1b caller handles repeated failures → 2 turn increment → 3 drift fold → 4 side-channel emit → 5 dispatch → 6 dataclasses.replace record → 7 terminal check → 8 compute_rewards once → 9 observation)
  • state(): frozen reference return (no deepcopy), E2 when unready
  • close(): idempotent, E3 afterwards, does NOT free shared audio singletons (env.md §9 open question 7)
  • episode(), rewards(), done(): terminal-only gating, memoized return
  • All 12 typed exceptions in driftcall.env.errors rooted at DriftCallEnvError

Not covered here (covered elsewhere, referenced only):

  • Vendor dispatch internals → vendors_tests.md
  • Drift pattern catalogue → drift_injector_tests.md
  • Reward arithmetic → rewards_tests.md
  • Sim-caller responder body → resolved via env.md §9 Q1 at critic gate; this plan only asserts that the responder is deterministic and keyed on (seed, turn).

1. Unit tests (≥ 25 cases; inventory: 35)

All unit tests live in tests/test_env/. Layout:

tests/test_env/
  __init__.py
  test_init_config_validation.py
  test_reset.py
  test_step_ordering.py
  test_step_validation_purity.py
  test_state_accessor.py
  test_close_idempotent.py
  test_terminal_accessors.py
  test_audio_boundary_toggle.py
  test_error_taxonomy.py

1.1 __init__ + EnvConfig.from_mapping: config validation (9 cases)

Scope: E1 InvalidConfigError on every malformed-config branch. __init__ performs no I/O.

| # | Name | Setup | Assertion |
|---|---|---|---|
| U1 | test_init_default_config_ok | DriftCallEnv() (no arg) | Succeeds. env._config.curriculum_stage == 1. env._config.language_weights == {"en":0.4,"hinglish":0.4,"hi":0.1,"ta":0.05,"kn":0.05}. env._config.audio_boundary_enabled is False. env._state is None. |
| U2 | test_init_rejects_unknown_key | DriftCallEnv({"curriculum_stage":1, "frobnicate":True}) | Raises InvalidConfigError; message contains "frobnicate" and the full allowed-key list. |
| U3 | test_init_rejects_invalid_stage | Parametrized: 0, 4, -1, "1", 1.0, None | Raises InvalidConfigError with "curriculum_stage". |
| U4 | test_init_rejects_weights_wrong_sum | language_weights={"en":0.5,"hinglish":0.4} (sum=0.9) | Raises InvalidConfigError; message cites "sum". |
| U5 | test_init_rejects_weights_negative | language_weights={"en":0.6,"hinglish":0.5,"hi":-0.1} | Raises InvalidConfigError; cites "negative". |
| U6 | test_init_rejects_audio_enabled_missing_tts | audio_boundary_enabled=True, tts_engine=None, asr_engine=<stub> | Raises InvalidConfigError; cites "tts_engine". |
| U7 | test_init_rejects_audio_disabled_with_tts | audio_boundary_enabled=False, tts_engine=<stub> | Raises InvalidConfigError ("tts_engine must be None when audio_boundary_enabled is False"; see env.md §7.5). |
| U8 | test_init_is_pure_no_io | Patch builtins.open, socket.socket, and os.urandom to raise. DriftCallEnv({"curriculum_stage":2}). | Succeeds without invoking any patched callable. Asserts env.md §2.1 "no I/O, no model load, no network call". |
| U9 | test_init_stores_frozen_config_copy | Pass a mutable weights dict; mutate it after construction. | env._config.language_weights unchanged. EnvConfig instance has __dataclass_params__.frozen is True. |
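
The validation rules exercised by U2–U5 and U9 can be sketched in a few lines. This is a minimal stand-in, not the real driftcall.env code: SketchConfig, the two-key ALLOWED_KEYS tuple, the local InvalidConfigError, and the 1e-9 sum tolerance are all assumptions made for illustration. Only the rules themselves (unknown-key rejection, integer stage in 1..3, non-negative weights summing to 1, defensive frozen copy) come from the table above.

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Any, Mapping, Optional


class InvalidConfigError(ValueError):
    """Local stand-in for the E1 exception (assumption, not the real class)."""


ALLOWED_KEYS = ("curriculum_stage", "language_weights")  # reduced, hypothetical key set
DEFAULT_WEIGHTS = {"en": 0.4, "hinglish": 0.4, "hi": 0.1, "ta": 0.05, "kn": 0.05}


def _default_weights() -> Mapping[str, float]:
    return MappingProxyType(dict(DEFAULT_WEIGHTS))


@dataclass(frozen=True)
class SketchConfig:
    curriculum_stage: int = 1
    language_weights: Mapping[str, float] = field(default_factory=_default_weights)

    @classmethod
    def from_mapping(cls, raw: Optional[Mapping[str, Any]] = None) -> "SketchConfig":
        raw = dict(raw or {})
        unknown = sorted(set(raw) - set(ALLOWED_KEYS))
        if unknown:
            raise InvalidConfigError(
                f"unknown key(s) {unknown}; allowed keys: {list(ALLOWED_KEYS)}"
            )
        stage = raw.get("curriculum_stage", 1)
        # type(...) is int rejects "1", 1.0, None and bool in one check.
        if type(stage) is not int or stage not in (1, 2, 3):
            raise InvalidConfigError(f"curriculum_stage must be 1, 2 or 3, got {stage!r}")
        # dict(...) copies the caller's mapping so later mutation cannot leak in.
        weights = dict(raw.get("language_weights", DEFAULT_WEIGHTS))
        if any(w < 0 for w in weights.values()):
            raise InvalidConfigError("language_weights contains a negative weight")
        total = sum(weights.values())
        if abs(total - 1.0) > 1e-9:
            raise InvalidConfigError(f"language_weights must sum to 1.0, got sum={total}")
        return cls(curriculum_stage=stage, language_weights=MappingProxyType(weights))
```

The negative-weight check runs before the sum check on purpose: the U5 input sums to 1.0, so only the sign check can reject it.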

1.2 reset(): trajectory setup (8 cases)

| # | Name | Setup | Assertion |
|---|---|---|---|
| U10 | test_reset_stage1_sets_max_turns_8 | env=DriftCallEnv({"curriculum_stage":1}); obs=env.reset(seed=1) | env._state.max_turns == 8. obs.budget_remaining == 8. obs.turn == 0. |
| U11 | test_reset_stage2_sets_max_turns_12 | stage 2 | max_turns == 12; budget_remaining == 12. |
| U12 | test_reset_stage3_sets_max_turns_16 | stage 3 | max_turns == 16; budget_remaining == 16. |
| U13 | test_reset_populates_curriculum_stage_on_state | stage 2 | env._state.stage == 2 (or equivalent attribute; matches env.md §4.3 stage field piped into Episode). |
| U14 | test_reset_passes_language_weights_to_task_generator | Monkeypatch task_generator.generate to record args. reset(seed=7) with custom weights. | Recorded language_weights argument is byte-identical to env._config.language_weights (not merely equal-by-value). |
| U15 | test_reset_same_seed_same_goal_and_schedule | env.reset(seed=42) twice (construct two envs) | obs_a.goal == obs_b.goal; env_a._state.drift_schedule == env_b._state.drift_schedule; env_a._state.vendor_states == env_b._state.vendor_states. |
| U16 | test_reset_none_seed_populates_from_urandom | reset(seed=None) | env._seed is an int. Two calls produce different _seed with high probability (assert inequality across 3 calls; tolerates a 1-in-2^64 flake). |
| U17 | test_reset_audio_boundary_enabled_invokes_tts_synthesize | Stub tts_engine with a recording synthesize. audio_boundary_enabled=True. reset(seed=11). | Stub recorded exactly one call with args (goal.seed_utterance, goal.language). obs.last_transcript == obs.goal.seed_utterance (canonical source unchanged; see env.md §3.7 clause 1). |
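
The determinism that U10–U12, U15 and U16 pin down can be illustrated with a toy reset. Everything here is hypothetical: sketch_reset, resolve_seed, the domain list, and the 8-byte os.urandom draw (chosen to match U16's 2^64 figure) are assumptions, not the env.md implementation. The point is only that every random choice flows from one resolved seed.

```python
import os
import random
from typing import Optional

# Turn budgets per curriculum stage, as asserted by U10-U12.
MAX_TURNS_BY_STAGE = {1: 8, 2: 12, 3: 16}


def resolve_seed(seed: Optional[int]) -> int:
    """Return the caller's seed, or draw 64 random bits when seed is None."""
    if seed is None:
        return int.from_bytes(os.urandom(8), "big")
    return seed


def sketch_reset(stage: int, seed: Optional[int]) -> dict:
    """Toy reset: goal and drift schedule both derive from one seeded RNG."""
    resolved = resolve_seed(seed)
    rng = random.Random(resolved)
    max_turns = MAX_TURNS_BY_STAGE[stage]
    return {
        "seed": resolved,
        "turn": 0,
        "max_turns": max_turns,
        "budget_remaining": max_turns,
        "goal_domain": rng.choice(["airline", "restaurant", "payment"]),
        "drift_turn": rng.randint(1, max_turns),
    }
```

Two envs built with the same stage and seed therefore agree on every field, which is the shape of the U15 assertion.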

1.3 step(): pipeline ordering (7 cases)

Every case instruments the env by monkeypatching private helpers (_validate_action, _fire_drifts, _emit_side_channel, _dispatch, _record_action, _check_terminal, _build_observation) so that each appends its name to a shared call_log list, which proves the order.

| # | Name | Setup | Assertion |
|---|---|---|---|
| U18 | test_step_validates_before_any_mutation | Valid stage-1 env after reset. Issue a valid TOOL_CALL. | call_log == ["_validate_action", "_fire_drifts", "_emit_side_channel", "_dispatch", "_record_action", "_check_terminal", "_build_observation"]; this is the env.md §2.3 order. |
| U19 | test_step_increments_turn_after_validate_before_dispatch | Valid TOOL_CALL. | obs.turn == 1 post-step. Turn counter bump occurs between _validate_action and _fire_drifts (per env.md §2.3 step 2). Instrumented via snapshot of self._state.turn inside stubbed _fire_drifts. |
| U20 | test_step_fires_drifts_before_dispatch | Scripted scheduler fires airline.price_rename at turn 1. Agent action: TOOL_CALL airline.search at turn 1. | obs.tool_results[-1].schema_version == "v2" (tool saw post-drift schema). obs.drift_log[-1].pattern_id == "airline.price_rename". |
| U21 | test_step_records_action_via_dataclasses_replace | Valid TOOL_CALL. | prev_state = env._state; env.step(a); next_state = env._state. Assert prev_state is not next_state, id(prev_state.actions) != id(next_state.actions), next_state.actions == prev_state.actions + (a,). |
| U22 | test_step_checks_terminal_after_record | Stage-1 env; issue 8 benign SPEAK actions (budget=8). | 8th step: env.done() is True. env.episode().terminated_by == "TIMEOUT". Turn counter = 8. |
| U23 | test_step_submit_calls_compute_rewards_exactly_once | Monkeypatch rewards.compute_rewards with a recorder. Issue TOOL_CALL then SUBMIT. | Recorder called once. env.rewards() returns the exact object the recorder produced. A second call to env.rewards() returns the same identity (memoized; see env.md §3.6). |
| U24 | test_step_abort_forces_r1_zero | reset(seed=1); step(ABORT). | env.episode().terminated_by == "ABORT". env.rewards().r1 == 0.0. R2…R5 still computed (non-None). |
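
The call_log instrumentation used throughout this section can be sketched as below. ToyEnv is a hypothetical stand-in whose step() simply calls seven no-op helpers in the order U18 expects, and instrument() is an illustrative helper, not part of the real suite. In the actual tests the same wrapping would presumably go through pytest's monkeypatch.setattr so that it is undone after each test.

```python
from typing import List, Sequence

PIPELINE = (
    "_validate_action",
    "_fire_drifts",
    "_emit_side_channel",
    "_dispatch",
    "_record_action",
    "_check_terminal",
    "_build_observation",
)


class ToyEnv:
    """Hypothetical env: step() calls the helpers in the U18 order."""

    def step(self, action: str) -> str:
        self._validate_action(action)
        self._fire_drifts()
        self._emit_side_channel()
        self._dispatch(action)
        self._record_action(action)
        self._check_terminal()
        return self._build_observation()

    def _validate_action(self, action: str) -> None: ...
    def _fire_drifts(self) -> None: ...
    def _emit_side_channel(self) -> None: ...
    def _dispatch(self, action: str) -> None: ...
    def _record_action(self, action: str) -> None: ...
    def _check_terminal(self) -> None: ...
    def _build_observation(self) -> str:
        return "obs"


def instrument(env: object, names: Sequence[str], call_log: List[str]) -> None:
    """Wrap each named helper so it logs its name, then delegates."""
    for name in names:
        original = getattr(env, name)

        # Default arguments bind the current name/original for this iteration.
        def wrapper(*args, _name=name, _original=original, **kwargs):
            call_log.append(_name)
            return _original(*args, **kwargs)

        setattr(env, name, wrapper)
```

Because the wrappers delegate to the originals, the step still returns a normal observation while the log records the order.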

1.4 _validate_action purity & InvalidActionError (4 cases)

These cases pin env.md §3.5 / E4 behavior: _validate_action raises before any mutation, and the env remains valid for a subsequent step().

| # | Name | Setup | Assertion |
|---|---|---|---|
| U25 | test_invalid_action_raises_no_state_mutation | Valid stage-1 env. Snapshot prev_state = env._state. Call env.step(DriftCallAction(action_type=TOOL_CALL, tool_name="airline.search")) with tool_args=None (required dict). | Raises InvalidActionError. env._state is prev_state. env._state.turn == prev_state.turn. len(env._state.actions) == len(prev_state.actions). env._state.done is False. No Rewards cached (env._rewards is None). |
| U26 | test_env_valid_after_invalid_action | U25's env, then issue a valid TOOL_CALL. | Succeeds. env._state.turn == 1. Observation returned normally. Proves env is still steppable. |
| U27 | test_invalid_action_no_drift_fired_no_terminal_marker | Scripted scheduler places drift at turn 1. Attempt invalid action. | Raises InvalidActionError. env._state.drift_fired == (). env.done() is False. The drift did NOT fire (drift firing is inside step 3, after validate). |
| U28 | test_oversize_rationale_raises_invalid_action | DriftCallAction(action_type=SUBMIT, confidence=0.5, rationale="x"*201) | Raises InvalidActionError with "rationale". State unchanged (repeat U25's state-preservation asserts). |
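
A compact way to see why U25 and U26 hold is a step() that validates first and only then builds a new frozen state with dataclasses.replace. The sketch below is a toy under stated assumptions: actions are plain dicts rather than DriftCallAction, ToyState carries only two fields, and the 200-character rationale limit is inferred from U28's 201-character input rather than quoted from env.md.

```python
from dataclasses import dataclass, replace
from typing import Tuple


class InvalidActionError(ValueError):
    """Local stand-in for E4."""


@dataclass(frozen=True)
class ToyState:
    turn: int = 0
    actions: Tuple[dict, ...] = ()


class ToyEnv:
    def __init__(self) -> None:
        self._state = ToyState()

    @staticmethod
    def _validate_action(action: dict) -> None:
        # Pure: inspects the action only and never reads or writes env state.
        if action.get("action_type") == "TOOL_CALL" and not isinstance(
            action.get("tool_args"), dict
        ):
            raise InvalidActionError("tool_args must be a dict for TOOL_CALL")
        if len(action.get("rationale") or "") > 200:  # limit inferred from U28
            raise InvalidActionError("rationale exceeds 200 characters")

    def step(self, action: dict) -> ToyState:
        self._validate_action(action)  # may raise; nothing has been mutated yet
        self._state = replace(
            self._state,
            turn=self._state.turn + 1,
            actions=self._state.actions + (action,),  # new tuple, old one untouched
        )
        return self._state
```

Since the only assignment to self._state sits after validation, a raised InvalidActionError leaves the very same state object in place, which is what the identity assertions check.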

1.5 state(): frozen reference (2 cases)

| # | Name | Setup | Assertion |
|---|---|---|---|
| U29 | test_state_returns_frozen_reference | Post-reset env. | env.state() is env._state. env.state().__dataclass_params__.frozen is True. Attempting env.state().turn = 99 raises dataclasses.FrozenInstanceError. |
| U30 | test_state_unready_raises_e2 | Fresh DriftCallEnv() without reset. | env.state() raises EnvNotReadyError. env.done() is False (not an error; see env.md §7.1). |
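
The two state() cases rely on standard dataclass behavior, shown here on a toy. ToyEnv, ToyState and the local EnvNotReadyError are hypothetical stand-ins. The sketch assumes, as U29 states, that state() hands back the stored frozen object itself rather than a copy.

```python
import dataclasses
from dataclasses import dataclass
from typing import Optional


class EnvNotReadyError(RuntimeError):
    """Local stand-in for E2."""


@dataclass(frozen=True)
class ToyState:
    turn: int = 0


class ToyEnv:
    def __init__(self) -> None:
        self._state: Optional[ToyState] = None

    def reset(self) -> None:
        self._state = ToyState()

    def state(self) -> ToyState:
        if self._state is None:
            raise EnvNotReadyError("reset() has not been called")
        return self._state  # same object, no deepcopy; safe because it is frozen

    def done(self) -> bool:
        # Unready is reported as "not done" rather than as an error (U30).
        return False
```

Returning the reference is cheap and still safe, because any attempt to assign to a field of a frozen dataclass raises dataclasses.FrozenInstanceError.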

1.6 close(): idempotency (2 cases)

| # | Name | Setup | Assertion |
|---|---|---|---|
| U31 | test_close_idempotent | env.close(); env.close(); env.close() | No exception; env._closed is True after first call and stays True. |
| U32 | test_close_does_not_free_shared_audio_engines | Build env with audio_boundary_enabled=True and stub TTS/ASR engines. env.close(). | env._closed is True; env._state is None; the stub engines expose no close() method at all (assert not hasattr(tts_stub, "close") and same for asr_stub). Per env.md §9 Q7, engines are process-global singletons, and audio.md §2.1–2.2 define no close() on TTSEngine/ASREngine. |
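
An idempotent close() usually reduces to an early return on a flag. The following toy illustrates that pattern together with the E3 behavior afterwards. ToyEnv, BareEngine and the local EnvClosedError are assumptions for illustration, and the real env may release other per-episode resources that this sketch does not model.

```python
class EnvClosedError(RuntimeError):
    """Local stand-in for E3."""


class BareEngine:
    """Engine double with no close() method, mirroring the stub contract."""


class ToyEnv:
    def __init__(self, tts_engine: object = None) -> None:
        self._tts_engine = tts_engine
        self._state: object = "ready"
        self._closed = False

    def close(self) -> None:
        if self._closed:
            return  # second and later calls are no-ops
        self._closed = True
        self._state = None
        # The engine reference is deliberately left alone: shared singletons
        # outlive any single env instance.

    def reset(self) -> None:
        if self._closed:
            raise EnvClosedError("env is closed")
        self._state = "ready"
```

Because the engine double has no close attribute, any accidental engine.close() call inside ToyEnv.close() would surface as an AttributeError.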

1.7 Terminal-only accessors + error taxonomy (3 cases)

| # | Name | Setup | Assertion |
|---|---|---|---|
| U33 | test_episode_before_terminal_raises_e6 | Post-reset, mid-episode. | env.episode() raises EpisodeNotTerminalError. Same for env.rewards(). env.done() is False. |
| U34 | test_double_submit_raises_e5 | Submit, then attempt another step. | Second step(...) raises EpisodeAlreadyTerminalError (E5; see env.md §7.2). env.done() still True. Rewards object identity preserved. |
| U35 | test_all_12_errors_derive_from_driftcallenverror | Introspect driftcall.env.errors. | Every class in the set {InvalidConfigError, EnvNotReadyError, EnvClosedError, InvalidActionError, EpisodeAlreadyTerminalError, EpisodeNotTerminalError, ConcurrentStepError, UnknownDomainError, UnknownToolError, DriftInjectionError, RewardComputationError, AudioPipelineError} subclasses DriftCallEnvError, which subclasses Exception. Count is exactly 12. |
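
The introspection in U35 can be done with inspect over the module namespace. Since driftcall.env.errors is not importable here, the sketch builds a throwaway module with the same twelve names; build_sketch_errors_module and leaf_errors are hypothetical helpers, and the real test would pass the actual module instead.

```python
import inspect
import types
from typing import Set

ERROR_NAMES = (
    "InvalidConfigError", "EnvNotReadyError", "EnvClosedError",
    "InvalidActionError", "EpisodeAlreadyTerminalError", "EpisodeNotTerminalError",
    "ConcurrentStepError", "UnknownDomainError", "UnknownToolError",
    "DriftInjectionError", "RewardComputationError", "AudioPipelineError",
)


def build_sketch_errors_module() -> types.ModuleType:
    """Create a stand-in module shaped like the taxonomy described in U35."""
    module = types.ModuleType("sketch_errors")
    root = type("DriftCallEnvError", (Exception,), {})
    module.DriftCallEnvError = root
    for name in ERROR_NAMES:
        setattr(module, name, type(name, (root,), {}))
    return module


def leaf_errors(module: types.ModuleType, root: type) -> Set[str]:
    """Names of every class in the module that derives from root, root excluded."""
    return {
        name
        for name, obj in vars(module).items()
        if inspect.isclass(obj) and issubclass(obj, root) and obj is not root
    }
```

Comparing the discovered set against the expected names catches both a missing exception and an extra one that was added without updating the plan.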

2. Property tests (≥ 5; inventory: 6)

Written with hypothesis. Strategies live in tests/test_env/strategies.py (shared with test_rewards where applicable).

| # | Name | Property | Strategy |
|---|---|---|---|
| P1 | test_step_is_pure_per_call | For fresh envs e1 and e2 constructed with the same config and reset(seed=s), given the same action sequence, every step() return is equal and the post-step states are equal. Same (state, action) → (state', observation). | Seeds in integers(0, 2**31-1); action sequences built from a DriftCallAction strategy over valid types; stage in sampled_from([1,2,3]). ≥ 200 examples. |
| P2 | test_validation_failure_preserves_pre_step_state | For any env in a steppable state and any DriftCallAction that fails _validate_action: state after the raised InvalidActionError equals state before (by identity: env._state is prev). | Mixed-validity action strategy; hypothesis assume() filters to invalid ones. |
| P3 | test_turn_counter_monotone_non_decreasing | Across any legal step sequence, env._state.turn is monotone non-decreasing; it strictly increases on every non-raising step() and is unchanged on every raised InvalidActionError. | Random action sequences up to length 20; assume stage=3 to permit budget 16. |
| P4 | test_frozen_state_identity_changes_on_transition | After every successful step(), prev_state is not next_state and id(prev_state.actions) != id(next_state.actions) whenever len(next.actions) > len(prev.actions). (env.md §3.8 invariant.) | As P1. |
| P5 | test_rewards_memoized_identity | After termination, env.rewards() is env.rewards() (identity, not just equality) across 10 calls. Same for env.episode(). | Parametrized over terminated_by ∈ {"SUBMIT","ABORT","TIMEOUT"}. |
| P6 | test_available_tools_fixed_for_episode | The set obs.available_tools is equal across every observation in an episode, regardless of drifts fired. (env.md §3.4 clause 4.) | Random schedules over stage 2/3; ≥ 50 episodes. |
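
To make the shape of P3 concrete without pulling in hypothesis, the sketch below drives a toy env with seeded random action sequences from the standard library. This is a dependency-free illustration only: in the real suite hypothesis would generate and shrink the inputs, and ToyEnv, check_turn_monotone and the "None means invalid" convention are all assumptions.

```python
import random


class InvalidActionError(ValueError):
    """Local stand-in for E4."""


class ToyEnv:
    def __init__(self, max_turns: int = 16) -> None:
        self.turn = 0
        self.max_turns = max_turns  # stage-3 budget, per P3's strategy note

    def step(self, action) -> None:
        if action is None:  # toy rule: None plays the role of an invalid action
            raise InvalidActionError("empty action")
        self.turn += 1


def check_turn_monotone(seed: int, length: int = 20) -> int:
    """Run one random sequence and assert the P3 invariant after every step."""
    rng = random.Random(seed)
    env = ToyEnv()
    for _ in range(length):
        if env.turn >= env.max_turns:
            break  # a further step would no longer be a legal one
        before = env.turn
        action = rng.choice(["SPEAK", "TOOL_CALL", None])
        try:
            env.step(action)
        except InvalidActionError:
            assert env.turn == before  # unchanged on a raised error
        else:
            assert env.turn == before + 1  # strictly increases on success
    return env.turn
```

Looping check_turn_monotone over a range of seeds gives a rough analogue of running the property over many generated examples.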

3. Integration tests (4 cases)

Live in tests/test_env/test_e2e.py. These are full episode traces matching env.md §8 examples. All dependencies are real (real task_generator, real drift_injector, real vendors, real rewards.compute_rewards); only the audio engines are stubbed in I4.

| # | Name | Maps to | Scenario |
|---|---|---|---|
| I1 | test_episode_stage1_airline_happy_submit | env.md §8.1 | DriftCallEnv({"curriculum_stage":1}); reset(seed=42). Replay the 5-turn script: airline.search → 3 more tool calls → SUBMIT(confidence=0.9). Assertions: env.done() is True; env.episode().terminated_by == "SUBMIT"; env.episode().turns_used == 5; obs.drift_log == (); env.rewards().r1 == 1.0; env.rewards().r2 == 0.5 (stage-1 neutral); env.rewards().reward in [0.85, 1.0]. |
| I2 | test_episode_stage2_drift_detect_adapt | env.md §8.2 | stage=2; seed=7. Scripted sequence through turn 6 terminating in SUBMIT. Drift airline.price_rename fires turn 3. Agent SPEAK at turn 4 mentions "total_fare_inr". Assertions: obs.drift_log[0].pattern_id == "airline.price_rename"; obs.drift_log[0].turn == 3; obs.tool_results[-2].response references "total_fare_inr" (not "price"); env.rewards().r1 == 1.0; env.rewards().r2 == 1.0; env.rewards().reward ≈ 0.90 ± 0.05. |
| I3 | test_episode_stage3_compound_drift_timeout | env.md §8.3 | stage=3; seed=2026. Script designed to consume all 16 turns. Two drifts fire (airline turn 3, payment turn 9). Assertions: env.done() is True; env.episode().terminated_by == "TIMEOUT"; env.episode().turns_used == 16; env.rewards().r1 == 0.0; env.rewards().r2 in {0.5, 1.0}; env.rewards().reward < 0.3. |
| I4 | test_episode_audio_boundary_enabled_stubs | env.md §8.4 | audio_boundary_enabled=True, tts_engine=StubTTS(), asr_engine=StubWhisper() (contracts in §5; signatures match audio.md §2.1–2.2). Stubs are in-process, CUDA-free, deterministic: synthesize(text, language_code, voice_pack=None, *, seed=0, sample_rate_hz=16000) → f"WAV[{text}:{language_code}:{seed}:{sample_rate_hz}]".encode("utf-8"); transcribe(audio_bytes, language_hint, *, beam_size=1, vad_filter=True, max_duration_s=30.0) → TranscriptResult(text=<scripted>, language_detected="hinglish", confidence=0.82, duration_s=1.250). Episode: reset(seed=11) → CLARIFY → TOOL_CALL → SUBMIT. Assertions: stub TTS synthesize called on reset and on every CLARIFY/SPEAK side-channel emission; obs.last_transcript after CLARIFY equals the stubbed ASR text; obs.last_confidence == 0.82; reward computation is 100% textual: no TTS bytes reach compute_rewards (verified by asserting episode.actions and episode.tool_results contain no bytes objects). |
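
I4's final assertion (no bytes objects inside episode.actions or episode.tool_results) needs a recursive scan, since audio could hide inside nested containers or dataclass fields. One possible helper is sketched below. contains_bytes is a hypothetical name, and the set of container types it walks is an assumption about how episode data is structured.

```python
import dataclasses
from typing import Any


def contains_bytes(obj: Any) -> bool:
    """Return True if obj is, or transitively holds, a bytes-like object."""
    if isinstance(obj, (bytes, bytearray, memoryview)):
        return True
    if isinstance(obj, dict):
        return any(contains_bytes(k) or contains_bytes(v) for k, v in obj.items())
    if isinstance(obj, (list, tuple, set, frozenset)):
        return any(contains_bytes(item) for item in obj)
    # Dataclass instances (not dataclass types) are scanned field by field.
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return any(
            contains_bytes(getattr(obj, f.name)) for f in dataclasses.fields(obj)
        )
    return False
```

With such a helper the I4 check becomes two one-line assertions, for example assert not contains_bytes(episode.actions).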

All integration tests reuse fixtures:

  • goal_airline, goal_restaurant: from drift_injector_tests.md §5.2 (session-scoped GoalSpec instances)
  • airline_v1, airline_v2, payment_v2: from vendors_tests.md §5.1 (per-domain aliases over vendor_states_v{1,2,3}; payment_v2 is the post-auth_scope_bump state)
  • drift_patterns_fixture: from drift_injector_tests.md §5.1 (authoritative 20-pattern catalogue; individual events + compound schedules used by I2/I3 are defined locally in §5 below as drift_event_airline_price_rename_turn3, drift_event_payment_auth_turn9, and schedule_stage3_compound, because drift_injector_tests.md only ships the catalogue, not pre-composed schedules)
  • episode_happy_airline, episode_timeout: from rewards_tests.md §5 (§5.1 and §5.4 respectively)
  • valid_tool_call_action, valid_submit_action, valid_observation_reset: from models_tests.md §5.4 (the factory/instance fixtures used to assemble per-step action sequences)

No integration test touches the network. No test loads a real Kokoro/Whisper model.


4. Coverage target

100% line coverage and ≥ 95% branch coverage on driftcall/env.py under pytest --cov=driftcall.env --cov-branch --cov-report=term-missing.

4.1 Error-mode coverage matrix (every E1–E12 raised at least once)

| Code | Exception | Raised by which test |
|---|---|---|
| E1 | InvalidConfigError | U2 (unknown key), U3 (bad stage), U4 (weights sum), U5 (negative weight), U6 (missing TTS), U7 (forbidden TTS). Also raised from reset if the scripted scheduler produces turn > max_turns; covered by a dedicated test, test_reset_scripted_bad_schedule_raises_e1. |
| E2 | EnvNotReadyError | U30 (state()), plus test_step_before_reset_raises_e2, test_episode_before_reset_raises_e2. |
| E3 | EnvClosedError | test_step_after_close_raises_e3, test_reset_after_close_raises_e3. |
| E4 | InvalidActionError | U25, U26, U27, U28, plus per-ActionType parametrized cases: missing tool_name on TOOL_CALL, message len 0 and len 2001 on SPEAK, NUL byte in message on CLARIFY, missing confidence on SUBMIT, forbidden tool_name on ABORT. |
| E5 | EpisodeAlreadyTerminalError | U34 (double SUBMIT). |
| E6 | EpisodeNotTerminalError | U33. |
| E7 | ConcurrentStepError | test_reentrant_step_raises_e7: stub a vendor dispatch that re-invokes env.step(other_action); assert E7 raised on the inner call; assert outer state unchanged. |
| E8 | UnknownDomainError | test_probe_schema_unknown_domain_raises_e8: PROBE_SCHEMA with tool_name="spaceship". |
| E9 | UnknownToolError | test_tool_call_unknown_tool_raises_e9: tool_name="airline.teleport". |
| E10 | DriftInjectionError | test_drift_fold_error_propagates_e10: scripted scheduler yields event with unknown pattern_id; env must not swallow. |
| E11 | RewardComputationError | test_reward_compute_error_propagates_e11: monkeypatch rewards.compute_rewards to raise; env must surface. |
| E12 | AudioPipelineError | test_audio_pipeline_error_on_clarify: stub ASR that raises on 2nd transcribe; assert E12 surfaces from step(CLARIFY); episode does NOT terminate (env.md §5 E12 note). Second test: test_audio_pipeline_error_on_reset_is_e1_class, with a stub TTS that raises on reset; the env is unready afterwards per env.md §5 E12. |

Total dedicated error-mode tests: 12 exceptions × ≥ 1 = 12 minimum; inventory covers 18 error-mode paths.
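
The E7 case is the least obvious row of the matrix, so here is a toy of the reentrancy pattern it tests: a dispatch callback that calls back into step() while the outer step is still running. The guard flag, the callback signature and ToyEnv itself are assumptions for illustration; env.md may implement the check differently.

```python
from typing import Callable


class ConcurrentStepError(RuntimeError):
    """Local stand-in for E7."""


class ToyEnv:
    def __init__(self, dispatch: Callable[["ToyEnv", str], str]) -> None:
        self._dispatch = dispatch
        self._in_step = False
        self.turn = 0

    def step(self, action: str) -> str:
        if self._in_step:
            raise ConcurrentStepError("step() re-entered while a step is running")
        self._in_step = True
        try:
            result = self._dispatch(self, action)
            self.turn += 1  # only reached when dispatch did not raise
            return result
        finally:
            self._in_step = False  # always release the guard


def reentrant_dispatch(env: ToyEnv, action: str) -> str:
    """Vendor double that misbehaves by stepping the env from inside dispatch."""
    if action == "reenter":
        return env.step("inner")
    return "ok"
```

Releasing the guard in a finally block matters: without it, one reentrancy error would leave the env permanently locked, and the follow-up step in the test would fail for the wrong reason.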

4.2 Line/branch targets

  • DriftCallEnv.__init__: 100% line; 100% branch (both config is None and config is dict branches hit in U1, U2).
  • EnvConfig.from_mapping: 100% line; 100% branch (all 7 raise branches covered by U2–U7 + reset-bad-schedule).
  • reset: 100% line; step 7b audio branch covered by U17 (True) and U10 (False).
  • step: 100% line; all 6 ActionType dispatch branches (TOOL_CALL / SPEAK / CLARIFY / PROBE_SCHEMA / SUBMIT / ABORT) each have ≥ 1 unit test + integration coverage; drift-fold-empty vs non-empty both covered (I1 empty, I2 non-empty); terminal vs non-terminal both covered (U22 TIMEOUT, U23 SUBMIT, I1/I2/I3 mix).
  • state, close, episode, rewards, done: 100% line; all raise/early-return branches covered.
  • _validate_action: 100% line; every row of env.md §3.1 Table is parametrized (per-ActionType forbidden-field matrix).
  • _build_observation: 100% line; last_transcript branches for turn 0 vs mid-episode vs audio-enabled all covered (U17, I4, I1).

Branch coverage < 95% is a hard CI fail.
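
Because this plan sets separate line and branch thresholds, a single combined percentage is not enough to enforce it. One way to gate both is to read the per-file summary from a coverage JSON report and compare each ratio on its own. The sketch below is an assumption about how that gate could look: coverage_gate is a hypothetical helper, and the key names (covered_lines, num_statements, covered_branches, num_branches) are assumed to match the summary section that coverage.py writes when branch measurement is enabled.

```python
from typing import Dict, List


def _percent(covered: int, total: int) -> float:
    # An empty denominator counts as fully covered (nothing to miss).
    return 100.0 if total == 0 else 100.0 * covered / total


def coverage_gate(
    summary: Dict[str, int],
    min_line: float = 100.0,
    min_branch: float = 95.0,
) -> List[str]:
    """Return a list of human-readable failures; empty means the gate passes."""
    line_pct = _percent(summary["covered_lines"], summary["num_statements"])
    branch_pct = _percent(summary["covered_branches"], summary["num_branches"])
    failures = []
    if line_pct < min_line:
        failures.append(f"line coverage {line_pct:.1f}% < {min_line:.1f}%")
    if branch_pct < min_branch:
        failures.append(f"branch coverage {branch_pct:.1f}% < {min_branch:.1f}%")
    return failures
```

A CI step could exit non-zero whenever the returned list is non-empty, printing each entry as the reason.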


5. Fixtures

All fixtures defined in tests/conftest.py under the env_* namespace. Shared with docs/tests/deploy_env_space_tests.md (same names, same content).

| Name | Scope | Purpose | Reuses |
|---|---|---|---|
| env_stage1_airline | function | DriftCallEnv({"curriculum_stage":1}) already reset(seed=42), goal forced to airline via scripted task_generator monkeypatch when hermetic goal needed. Provides (env, obs0) tuple. | goal_airline from drift_injector_tests.md §5.2; airline_v1 from vendors_tests.md §5.1. |
| env_stage2_restaurant_drift | function | Stage-2 env reset(seed=7) with restaurant goal, scripted scheduler that fires restaurant.items_shape_bump at turn 3. Returns (env, obs0, drift_event). | goal_restaurant from drift_injector_tests.md §5.2; drift_event_restaurant_items_shape_bump_turn3 defined below. |
| env_stage3_compound | function | Stage-3 env reset(seed=2026), scripted scheduler with compound drift (airline turn 3 + payment turn 9). Used by I3. | schedule_stage3_compound defined below; reuses drift_patterns_fixture catalogue from drift_injector_tests.md §5.1. |
| env_audio_enabled | function | Stage-1 env with audio_boundary_enabled=True, tts_engine=StubTTS(), asr_engine=StubWhisper(). Stubs are pure Python, CUDA-free, deterministic. Returns (env, tts_stub, asr_stub) for assertions on call counts. | StubTTS, StubWhisper classes defined in tests/stubs/audio_stubs.py. |
| env_config_invalid_key | function | {"curriculum_stage":1, "frobnicate":True}: a single malformed config dict reused across U2 and any critic-requested smoke test. | none |

Stub engine contracts (pinned here for cross-doc consistency with audio_tests.md; signatures match docs/modules/audio.md §2.1 and §2.2 exactly):

from driftcall.audio.asr_whisper import TranscriptResult
from driftcall.audio.tts_kokoro import VoicePack

class StubTTS:
    """In-process TTS double. Matches audio.md §2.1 `TTSEngine.synthesize` signature."""
    def __init__(self) -> None:
        self.calls: list[tuple[str, str, VoicePack | None, int, int]] = []

    def synthesize(
        self,
        text: str,
        language_code: str,
        voice_pack: VoicePack | None = None,
        *,
        seed: int = 0,
        sample_rate_hz: int = 16000,
    ) -> bytes:
        self.calls.append((text, language_code, voice_pack, seed, sample_rate_hz))
        return f"WAV[{text}:{language_code}:{seed}:{sample_rate_hz}]".encode("utf-8")

class StubWhisper:
    """In-process ASR double. Matches audio.md §2.2 `ASREngine.transcribe` signature
    and the 4-field `TranscriptResult` contract (text, language_detected, confidence, duration_s)."""
    def __init__(self, scripted: dict[int, str] | None = None) -> None:
        self.calls: list[bytes] = []
        self._scripted = scripted or {}

    def transcribe(
        self,
        audio_bytes: bytes,
        language_hint: str | None,
        *,
        beam_size: int = 1,
        vad_filter: bool = True,
        max_duration_s: float = 30.0,
    ) -> TranscriptResult:
        self.calls.append(audio_bytes)
        turn = len(self.calls)
        return TranscriptResult(
            text=self._scripted.get(turn, "shaam ko, 7 baje"),
            language_detected="hinglish",
            confidence=0.82,
            duration_s=1.250,
        )

Neither stub exposes a .close() method. audio.md §2.1–2.2 defines no such method on TTSEngine/ASREngine, and the engines are process-global singletons (env.md §9 Q7). U32 asserts that env.close() does NOT invoke anything engine-side, so the stubs must not carry a close() attribute at all; U32's "call count is 0" check is upgraded to not hasattr(stub, "close") to match the real contract.

5.1 Locally-defined drift events and schedules (not shipped by drift_injector_tests.md)

drift_injector_tests.md §5.1 publishes the 20-pattern catalogue (drift_patterns_fixture) but does NOT pre-compose per-test DriftEvent instances or full DriftSchedule objects; those are composed locally here because scheduling is an env-side concern. All four fixtures below are session-scoped, and the three DriftEvent fixtures import drift_patterns_fixture to look up the authoritative pattern record.

import pytest

from driftcall.models import DriftEvent
from driftcall.drift_injector import DriftSchedule

@pytest.fixture(scope="session")
def drift_event_airline_price_rename_turn3(drift_patterns_fixture) -> DriftEvent:
    """Used by I2. Pattern id asserted byte-identical to drift_patterns_fixture entry."""
    pattern = next(p for p in drift_patterns_fixture if p.id == "airline.price_rename")
    return DriftEvent(
        turn=3,
        drift_type=pattern.drift_type,       # "schema"
        domain=pattern.domain,               # "airline"
        description=pattern.description,
        from_version=pattern.from_version,   # "v1"
        to_version=pattern.to_version,       # "v2"
    )

@pytest.fixture(scope="session")
def drift_event_restaurant_items_shape_bump_turn3(drift_patterns_fixture) -> DriftEvent:
    """Used by env_stage2_restaurant_drift. `restaurant.items_shape_bump` is the
    canonical restaurant schema drift per drift_injector.md §4.4 (items gain required `modifiers`)."""
    pattern = next(p for p in drift_patterns_fixture if p.id == "restaurant.items_shape_bump")
    return DriftEvent(
        turn=3,
        drift_type=pattern.drift_type,
        domain=pattern.domain,
        description=pattern.description,
        from_version=pattern.from_version,
        to_version=pattern.to_version,
    )

@pytest.fixture(scope="session")
def drift_event_payment_auth_turn9(drift_patterns_fixture) -> DriftEvent:
    """Used by I3. Pattern id `payment.auth_scope_upgrade` (Auth axis, drift_injector.md §4.4)."""
    pattern = next(p for p in drift_patterns_fixture if p.id == "payment.auth_scope_upgrade")
    return DriftEvent(
        turn=9,
        drift_type=pattern.drift_type,       # "auth"
        domain=pattern.domain,               # "payment"
        description=pattern.description,
        from_version=pattern.from_version,
        to_version=pattern.to_version,
    )

@pytest.fixture(scope="session")
def schedule_stage3_compound(
    drift_event_airline_price_rename_turn3,
    drift_event_payment_auth_turn9,
) -> DriftSchedule:
    """Used by I3. Two drifts, one per domain, matching env.md §8.3 worked example."""
    return DriftSchedule(events=(
        drift_event_airline_price_rename_turn3,
        drift_event_payment_auth_turn9,
    ))

Apart from the env_* fixtures and stub engines in §5, these three DriftEvents plus one DriftSchedule are the only fixtures defined in env_tests.md; everything else is imported from the sibling test plans cited in §3 above.

Fixture immutability rule: if any field of any fixture changes here, the matching fixture in deploy_env_space_tests.md §5 must be updated in the same commit, because they share a single conftest.py definition. CI guards this via a grep-based pre-commit hook (scripts/check_fixture_parity.sh).