
step_10_env: DriftCallEnv

Implements docs/modules/env.md and DESIGN.md §4.

Public surface

| Symbol | Kind | Notes |
| --- | --- | --- |
| DriftCallEnv | class | OpenEnv-compliant RL environment. Single-session, single-episode-at-a-time. |
| EnvConfig | frozen dataclass | Validated config snapshot. Built via EnvConfig.from_mapping(...). |
| Episode | frozen dataclass | Terminal-only snapshot fed to cells.step_08_rewards.compute_rewards. |
| DriftScheduler | Protocol | (stage, seed, goal) -> tuple[DriftEvent, ...]. Default: drift_injector.build_schedule. |
| TTSEngine / ASREngine | Protocols | Audio boundary contracts (env.md §2.1). |
| DriftCallEnvError and 12 subclasses | exceptions | E1..E12 typed taxonomy. |
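
The DriftScheduler contract from the table can be sketched as a callable Protocol. This is a minimal illustration only: the DriftEvent fields, the int types for stage and seed, and both example schedulers are assumptions made for the sketch, not the definitions used by drift_injector.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class DriftEvent:
    # Hypothetical fields; the real DriftEvent is defined elsewhere.
    turn: int
    pattern_id: str


class DriftScheduler(Protocol):
    # Shape taken from the table: (stage, seed, goal) -> tuple[DriftEvent, ...].
    # The parameter types are assumptions.
    def __call__(self, stage: int, seed: int, goal: object) -> tuple[DriftEvent, ...]: ...


def no_drift_scheduler(stage: int, seed: int, goal: object) -> tuple[DriftEvent, ...]:
    """Hypothetical scheduler that never injects drift (useful as a baseline)."""
    return ()


def fixed_scheduler(stage: int, seed: int, goal: object) -> tuple[DriftEvent, ...]:
    """Hypothetical scheduler that always fires one drift on turn 2."""
    return (DriftEvent(turn=2, pattern_id="p_example"),)


# Any plain function with a matching signature satisfies the Protocol.
scheduler: DriftScheduler = fixed_scheduler
schedule = scheduler(1, 42, None)
```

Returning an immutable tuple keeps the schedule safe to store on a frozen state object.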

Wiring

reset(seed)
  └── task_generator.generate(seed, stage, language_weights)
  └── per-domain vendor.initial_state(seed, goal)        # airline, cab, restaurant, hotel, payment
  └── scheduler(stage, seed, goal)                       # default = drift_injector.build_schedule
  └── audio_boundary_enabled? tts_engine.synthesize(seed_utterance, language)
  └── DriftCallObservation(turn=0, ...)

step(action, *, force_drift_pattern=None)
  1a. _validate_action(action)            # pure, raises InvalidActionError BEFORE mutation
  1b. force_drift_pattern resolved        # unknown -> InvalidActionError
  2.  turn += 1                            # via dataclasses.replace
  3.  drift fold:                          # forced pattern OR scheduled pending drifts
        - sort by (turn asc, pattern_id asc)
        - apply via drift_injector.apply_drift
  4.  side-channel emit pass               # vendor.emit_side_channel_if_pending per domain
  5.  dispatch:
        TOOL_CALL    -> vendor.dispatch(...) and merge any pending notice into ToolResult
        SPEAK/CLARIFY-> no state change
        PROBE_SCHEMA -> vendor.describe_schema(state, version), wrapped as ToolResult
        SUBMIT       -> terminate("SUBMIT")
        ABORT        -> terminate("ABORT")
  6.  record action (and ToolResult, if any) via dataclasses.replace
  7.  if turn >= max_turns -> terminate("TIMEOUT")
  8.  if terminal -> build Episode + step_08_rewards.compute_rewards (memoized)
  9.  return DriftCallObservation
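
Steps 2 and 3 above (turn increment via dataclasses.replace, then a drift fold in a fixed order) might look roughly like the following. EnvState, the DriftEvent fields, and the "due when drift.turn <= current turn" rule are assumptions for this sketch; the real fold goes through drift_injector.apply_drift.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DriftEvent:
    # Hypothetical fields for illustration.
    turn: int
    pattern_id: str


@dataclass(frozen=True)
class EnvState:
    # Hypothetical state holder; frozen, so updates go through replace().
    turn: int = 0
    applied: tuple[str, ...] = ()


def fold_drifts(state: EnvState, pending: tuple[DriftEvent, ...]) -> EnvState:
    """Apply every due drift in (turn asc, pattern_id asc) order."""
    # Assumption: a drift is due once its turn is not after the current turn.
    due = [d for d in pending if d.turn <= state.turn]
    for drift in sorted(due, key=lambda d: (d.turn, d.pattern_id)):
        # Stand-in for drift_injector.apply_drift: just record the pattern id.
        state = replace(state, applied=state.applied + (drift.pattern_id,))
    return state


state = replace(EnvState(), turn=1)  # step 2: turn += 1 without mutation
pending = (DriftEvent(1, "b"), DriftEvent(1, "a"), DriftEvent(3, "c"))
state = fold_drifts(state, pending)  # step 3: the turn-3 drift is not yet due
```

Sorting on the (turn, pattern_id) pair makes the fold order independent of the order in which the scheduler emitted the events.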

Termination

terminated_by ∈ {SUBMIT, ABORT, TIMEOUT, ANTI_HACK}. The reward layer reads terminated_by and forces r1=0 for ABORT, TIMEOUT, and ANTI_HACK. Episode and Rewards are write-once; episode() and rewards() return the same memoized objects on every call.
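
A minimal sketch of the write-once behaviour, assuming a one-field Rewards and a hypothetical RewardCache helper (the real Rewards and compute_rewards live in step_08_rewards and carry more than r1):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rewards:
    # Hypothetical single-field stand-in for the real Rewards object.
    r1: float


class RewardCache:
    """Hypothetical helper: computes rewards once, then returns the same object."""

    def __init__(self) -> None:
        self._rewards: Optional[Rewards] = None
        self.calls = 0  # counts how often the computation actually ran

    def rewards(self, terminated_by: str, raw_r1: float) -> Rewards:
        if self._rewards is None:
            self.calls += 1
            # r1 is forced to 0 for every termination cause except SUBMIT.
            r1 = raw_r1 if terminated_by == "SUBMIT" else 0.0
            self._rewards = Rewards(r1=r1)
        return self._rewards


cache = RewardCache()
first = cache.rewards("TIMEOUT", raw_r1=0.9)
second = cache.rewards("TIMEOUT", raw_r1=0.9)
```

Here `first is second` holds and the computation runs once, which is the property the memoization is meant to guarantee.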

Determinism contract

Same (config, seed) ⇒ byte-identical goal, drift_schedule, and initial vendor_states. The only non-deterministic field is episode_id (uuid4), which is purely an audit handle (env.md §9 Q5).
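
One common way to obtain this kind of contract is to derive everything from a local, seeded RNG and keep the uuid4 outside that path. The sketch below assumes a hypothetical make_episode function and dictionary layout; it does not reproduce task_generator or drift_injector.

```python
import random
import uuid

DOMAINS = ["airline", "cab", "restaurant", "hotel", "payment"]


def make_episode(seed: int) -> dict:
    """Hypothetical reset: seeded fields are reproducible, episode_id is not."""
    rng = random.Random(seed)  # local RNG, so no global random state is touched
    return {
        "goal": rng.choice(DOMAINS),
        "drift_turns": tuple(sorted(rng.sample(range(1, 10), 2))),
        "episode_id": str(uuid.uuid4()),  # audit handle only, never seeded
    }


a = make_episode(7)
b = make_episode(7)
same_content = (a["goal"], a["drift_turns"]) == (b["goal"], b["drift_turns"])
same_id = a["episode_id"] == b["episode_id"]
```

With the same seed, `same_content` is true while `same_id` is false, so any comparison of two resets should exclude episode_id.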

Error taxonomy (E1–E12)

All extend DriftCallEnvError(Exception):

| # | Class | When |
| --- | --- | --- |
| E1 | InvalidConfigError | unknown key, bad weights, missing audio engine, etc. |
| E2 | EnvNotReadyError | step/state/episode/rewards before reset |
| E3 | EnvClosedError | reset/step after close |
| E4 | InvalidActionError | per-ActionType field-matrix violation; force_drift_pattern unknown |
| E5 | EpisodeAlreadyTerminalError | step after termination |
| E6 | EpisodeNotTerminalError | episode/rewards before termination |
| E7 | ConcurrentStepError | reentrant step |
| E8 | UnknownDomainError | PROBE_SCHEMA on unregistered domain |
| E9 | UnknownToolError | TOOL_CALL with tool_name not in available_tools |
| E10 | DriftInjectionError | drift fold failure (propagated from drift_injector) |
| E11 | RewardComputationError | compute_rewards failure |
| E12 | AudioPipelineError | TTS/ASR engine raised at boundary |

Validation in _validate_action is strictly pure: it raises before any state mutation, so a rejected action leaves the env valid for a subsequent step().
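
The validate-before-mutate pattern can be illustrated with a toy environment. TinyEnv, the dict-shaped action, and the single validation rule are assumptions for the sketch; only the exception names come from the taxonomy above.

```python
class DriftCallEnvError(Exception):
    """Base class of the typed taxonomy."""


class InvalidActionError(DriftCallEnvError):
    """E4: the action violates the per-ActionType field matrix."""


class TinyEnv:
    """Hypothetical toy env showing that validation precedes mutation."""

    def __init__(self) -> None:
        self.turn = 0

    @staticmethod
    def _validate_action(action: dict) -> None:
        # Pure check: reads the action, touches no env state.
        if action.get("type") == "TOOL_CALL" and not action.get("tool_name"):
            raise InvalidActionError("TOOL_CALL requires tool_name")

    def step(self, action: dict) -> int:
        self._validate_action(action)  # may raise; nothing has changed yet
        self.turn += 1                 # mutation happens only after validation
        return self.turn


env = TinyEnv()
try:
    env.step({"type": "TOOL_CALL"})  # missing tool_name
except InvalidActionError:
    rejected_turn = env.turn         # still 0: the bad action left no trace
next_turn = env.step({"type": "SPEAK"})
```

Because every subclass derives from DriftCallEnvError, a caller can catch the base class to handle the whole taxonomy at once.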

Audio boundary

audio_boundary_enabled=True requires both tts_engine and asr_engine. On reset() the env calls tts_engine.synthesize(goal.seed_utterance, goal.language); the canonical last_transcript remains the textual seed_utterance. The audio pipeline never feeds bytes back into reward computation.
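
A rough sketch of the two rules in this section: both engines are required when the boundary is enabled, and the transcript stays textual even though audio is synthesized. The helper names, the bytes return type of synthesize, and the SilentTTS stub are all assumptions; env.md §2.1 defines the real contracts.

```python
from typing import Optional, Protocol


class DriftCallEnvError(Exception):
    pass


class InvalidConfigError(DriftCallEnvError):
    """E1: raised for a missing audio engine, among other config problems."""


class TTSEngine(Protocol):
    # Assumed signature; the return type is a guess for illustration.
    def synthesize(self, text: str, language: str) -> bytes: ...


class SilentTTS:
    """Hypothetical stub engine that records calls and returns empty audio."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str]] = []

    def synthesize(self, text: str, language: str) -> bytes:
        self.calls.append((text, language))
        return b""


def check_audio_config(enabled: bool, tts: Optional[object], asr: Optional[object]) -> None:
    # Hypothetical validator: both engines must be present when enabled.
    if enabled and (tts is None or asr is None):
        raise InvalidConfigError("audio_boundary_enabled requires tts_engine and asr_engine")


def initial_transcript(seed_utterance: str, language: str,
                       enabled: bool, tts: Optional[TTSEngine]) -> str:
    if enabled and tts is not None:
        tts.synthesize(seed_utterance, language)  # audio is produced, then set aside
    return seed_utterance  # the canonical transcript is always the text


tts = SilentTTS()
transcript = initial_transcript("book a cab", "en", True, tts)
```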

Out of scope

  • LLM judging: never. The env is the judge.
  • Concurrency: single-session by contract; no locks, no asyncio.
  • Disk/network I/O at __init__: strictly forbidden.