driftcall / cells /step_10_env.md

saumilyajj

Upload folder using huggingface_hub

b43d8da verified 30 days ago

step_10_env — DriftCallEnv

Implements docs/modules/env.md and DESIGN.md §4.

Public surface

| Symbol | Kind | Notes |
| --- | --- | --- |
| `DriftCallEnv` | class | OpenEnv-compliant RL environment. Single-session, single-episode-at-a-time. |
| `EnvConfig` | frozen dataclass | Validated config snapshot. Built via `EnvConfig.from_mapping(...)`. |
| `Episode` | frozen dataclass | Terminal-only snapshot fed to `cells.step_08_rewards.compute_rewards`. |
| `DriftScheduler` | Protocol | `(stage, seed, goal) -> tuple[DriftEvent, ...]`. Default: `drift_injector.build_schedule`. |
| `TTSEngine` / `ASREngine` | Protocols | Audio boundary contracts (env.md §2.1). |
| `DriftCallEnvError` and 12 subclasses | exceptions | E1..E12 typed taxonomy. |
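As a rough illustration of two rows in the table above, the sketch below shows one way a frozen config with a validating `from_mapping` constructor and a callable `DriftScheduler` protocol could look. The field names (`max_turns`, `stage`), their types and defaults, the `DriftEvent` fields, and the `no_drift` scheduler are assumptions made for the example; only the constructor name and the scheduler call signature come from the table.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any, Mapping, Protocol


@dataclass(frozen=True)
class DriftEvent:
    # Hypothetical shape; the real fields are defined by drift_injector.
    turn: int
    pattern_id: str


class DriftScheduler(Protocol):
    # Call signature from the table: (stage, seed, goal) -> tuple[DriftEvent, ...]
    def __call__(self, stage: int, seed: int, goal: Any) -> tuple[DriftEvent, ...]: ...


class InvalidConfigError(ValueError):
    """Stand-in for E1; the documented class extends DriftCallEnvError."""


@dataclass(frozen=True)
class EnvConfig:
    # Assumed fields and defaults, chosen only for illustration.
    max_turns: int = 16
    stage: int = 1

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any]) -> "EnvConfig":
        # Reject unknown keys before constructing anything.
        known = {f.name for f in fields(cls)}
        unknown = set(raw) - known
        if unknown:
            raise InvalidConfigError(f"unknown config keys: {sorted(unknown)}")
        cfg = cls(**raw)
        if cfg.max_turns < 1:
            raise InvalidConfigError("max_turns must be >= 1")
        return cfg


def no_drift(stage: int, seed: int, goal: Any) -> tuple[DriftEvent, ...]:
    """A scheduler that never injects drift; it satisfies DriftScheduler structurally."""
    return ()
```

Because `DriftScheduler` is a structural protocol, any plain function with the matching signature could be passed in place of the default, which may be convenient for drift-free baseline runs.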

Wiring

reset(seed)
  ├── task_generator.generate(seed, stage, language_weights)
  ├── per-domain vendor.initial_state(seed, goal)        # airline, cab, restaurant, hotel, payment
  ├── scheduler(stage, seed, goal)                       # default = drift_injector.build_schedule
  ├── audio_boundary_enabled? tts_engine.synthesize(seed_utterance, language)
  └── DriftCallObservation(turn=0, ...)

step(action, *, force_drift_pattern=None)
  1a. _validate_action(action)            # pure, raises InvalidActionError BEFORE mutation
  1b. force_drift_pattern resolved        # unknown -> InvalidActionError
  2.  turn += 1                            # via dataclasses.replace
  3.  drift fold:                          # forced pattern OR scheduled pending drifts
        - sort by (turn asc, pattern_id asc)
        - apply via drift_injector.apply_drift
  4.  side-channel emit pass               # vendor.emit_side_channel_if_pending per domain
  5.  dispatch:
        TOOL_CALL    -> vendor.dispatch(...) and merge any pending notice into ToolResult
        SPEAK/CLARIFY-> no state change
        PROBE_SCHEMA -> vendor.describe_schema(state, version), wrapped as ToolResult
        SUBMIT       -> terminate("SUBMIT")
        ABORT        -> terminate("ABORT")
  6.  record action (and ToolResult, if any) via dataclasses.replace
  7.  if turn >= max_turns -> terminate("TIMEOUT")
  8.  if terminal -> build Episode + step_08_rewards.compute_rewards (memoized)
  9.  return DriftCallObservation
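Steps 2 and 3 of `step()` can be sketched as follows. This is a minimal illustration, not the module's code: `EnvState`, its fields, and the rule that a drift is due once its `turn` is less than or equal to the current turn are assumptions. The `dataclasses.replace` update style and the `(turn asc, pattern_id asc)` ordering are taken from the outline above.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DriftEvent:
    # Hypothetical shape; the real fields are defined by drift_injector.
    turn: int
    pattern_id: str


@dataclass(frozen=True)
class EnvState:
    # Hypothetical state container for this sketch.
    turn: int = 0
    pending: tuple[DriftEvent, ...] = ()
    applied: tuple[str, ...] = ()


def advance(state: EnvState) -> EnvState:
    # Step 2: bump the turn counter on a copy instead of mutating in place.
    state = replace(state, turn=state.turn + 1)
    # Step 3: pick the drifts that are due (assumed rule: event.turn <= current turn)
    # and order them by (turn asc, pattern_id asc).
    due = sorted(
        (e for e in state.pending if e.turn <= state.turn),
        key=lambda e: (e.turn, e.pattern_id),
    )
    remaining = tuple(e for e in state.pending if e.turn > state.turn)
    # A real env would call drift_injector.apply_drift for each due event;
    # this sketch only records the order in which they would be applied.
    return replace(
        state,
        pending=remaining,
        applied=state.applied + tuple(e.pattern_id for e in due),
    )
```

Sorting on the `(turn, pattern_id)` tuple makes the fold order independent of the order in which the schedule was built, which is one way to support the determinism contract.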

Termination

terminated_by ∈ {SUBMIT, ABORT, TIMEOUT, ANTI_HACK}. The reward layer reads terminated_by and forces r1=0 for ABORT, TIMEOUT, and ANTI_HACK. Episode and Rewards are write-once: after termination, episode() and rewards() return the same memoized object on every call.
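The write-once behaviour can be pictured with a small sketch. `TerminalCache`, the single-field `Rewards` class, and the `raw_r1` argument are hypothetical; only the `r1` name, the `terminated_by` values, and the memoization requirement come from the paragraph above.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Rewards:
    r1: float  # only r1 is named in the text; other components are omitted


class TerminalCache:
    """Hypothetical helper: computes rewards once, then returns the same object."""

    def __init__(self) -> None:
        self._rewards: Optional[Rewards] = None

    def rewards(self, terminated_by: str, raw_r1: float) -> Rewards:
        if self._rewards is None:
            result = Rewards(r1=raw_r1)
            # Non-SUBMIT terminations zero out r1, as described above.
            if terminated_by in {"ABORT", "TIMEOUT", "ANTI_HACK"}:
                result = replace(result, r1=0.0)
            self._rewards = result
        # Later calls ignore their arguments and return the cached identity.
        return self._rewards
```

Returning the cached object, rather than an equal copy, lets callers compare results with `is` and rules out a second, divergent reward computation for the same episode.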

Determinism contract

The same (config, seed) pair yields a byte-identical goal, drift_schedule, and initial vendor_states. The only non-deterministic field is episode_id (uuid4), which serves purely as an audit handle (env.md §9 Q5).
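One common way to meet a contract like this is to draw every seeded quantity from a local `random.Random(seed)` and to keep the UUID outside that stream. The sketch below assumes this approach; `ResetResult`, the toy `reset` function, and the way it picks a goal and drift turns are illustrative and do not describe the real generator.

```python
from __future__ import annotations

import random
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ResetResult:
    # Hypothetical container for the reproducible parts plus the audit handle.
    goal: str
    drift_turns: tuple[int, ...]
    episode_id: str


def reset(seed: int, max_turns: int = 8) -> ResetResult:
    # A local RNG avoids any dependence on the global random state.
    rng = random.Random(seed)
    goal = rng.choice(["airline", "cab", "restaurant", "hotel", "payment"])
    drift_turns = tuple(sorted(rng.sample(range(1, max_turns + 1), k=2)))
    # episode_id is deliberately not derived from the seed.
    return ResetResult(goal, drift_turns, episode_id=str(uuid.uuid4()))
```

A determinism test would then compare every field except `episode_id` across two resets with the same seed.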

Error taxonomy (E1–E12)

All extend DriftCallEnvError(Exception):

| # | Class | When |
| --- | --- | --- |
| E1 | `InvalidConfigError` | unknown key, bad weights, missing audio engine, etc. |
| E2 | `EnvNotReadyError` | step/state/episode/rewards before reset |
| E3 | `EnvClosedError` | reset/step after close |
| E4 | `InvalidActionError` | per-`ActionType` field-matrix violation; force_drift_pattern unknown |
| E5 | `EpisodeAlreadyTerminalError` | step after termination |
| E6 | `EpisodeNotTerminalError` | episode/rewards before termination |
| E7 | `ConcurrentStepError` | reentrant step |
| E8 | `UnknownDomainError` | PROBE_SCHEMA on unregistered domain |
| E9 | `UnknownToolError` | TOOL_CALL with tool_name not in available_tools |
| E10 | `DriftInjectionError` | drift fold failure (propagated from drift_injector) |
| E11 | `RewardComputationError` | compute_rewards failure |
| E12 | `AudioPipelineError` | TTS/ASR engine raised at boundary |

Validation in _validate_action is strictly pure: it raises before any state mutation, so the env remains valid for a subsequent step() call.
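The validate-before-mutate rule can be illustrated with a toy environment. Everything here apart from the exception names and the action type names is an assumption: the `Action` fields, the `_REQUIRES_TOOL` matrix, and `TinyEnv` exist only for this sketch, and `TinyEnv` mutates a counter directly for brevity where the real env uses `dataclasses.replace`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


class DriftCallEnvError(Exception):
    """Base class of the typed taxonomy."""


class InvalidActionError(DriftCallEnvError):
    """E4."""


@dataclass(frozen=True)
class Action:
    # Assumed fields; only the action type names come from the document.
    type: str
    tool_name: Optional[str] = None


# Hypothetical field matrix: does this action type require tool_name?
_REQUIRES_TOOL = {
    "TOOL_CALL": True, "SPEAK": False, "CLARIFY": False,
    "PROBE_SCHEMA": False, "SUBMIT": False, "ABORT": False,
}


def validate_action(action: Action) -> None:
    """Pure check: reads the action and touches no env state."""
    if action.type not in _REQUIRES_TOOL:
        raise InvalidActionError(f"unknown action type: {action.type!r}")
    if _REQUIRES_TOOL[action.type] != (action.tool_name is not None):
        raise InvalidActionError(f"{action.type}: tool_name presence mismatch")


class TinyEnv:
    def __init__(self) -> None:
        self.turn = 0

    def step(self, action: Action) -> int:
        validate_action(action)  # raises before the turn counter moves
        self.turn += 1
        return self.turn
```

Since a rejected action leaves `turn` untouched, a caller can catch `InvalidActionError`, repair the action, and call `step()` again without resetting.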

Audio boundary

audio_boundary_enabled=True requires both tts_engine and asr_engine. On reset() the env calls tts_engine.synthesize(goal.seed_utterance, goal.language); the canonical last_transcript remains the textual seed_utterance. The audio pipeline never feeds bytes back into reward computation.
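A minimal sketch of the boundary follows, under several assumptions: the `bytes` return type of `synthesize`, the `transcribe` method on the ASR protocol, the `FakeTTS` and `FakeASR` classes, and the `reset_audio` helper are all hypothetical. The `synthesize(seed_utterance, language)` call, the both-engines requirement, and the rule that the transcript stays textual come from the paragraph above.

```python
from __future__ import annotations

from typing import Optional, Protocol


class TTSEngine(Protocol):
    # Argument order follows the call in the text; the return type is assumed.
    def synthesize(self, text: str, language: str) -> bytes: ...


class ASREngine(Protocol):
    # Hypothetical method name and signature.
    def transcribe(self, audio: bytes, language: str) -> str: ...


class InvalidConfigError(ValueError):
    """Stand-in for E1; the documented class extends DriftCallEnvError."""


class FakeTTS:
    def synthesize(self, text: str, language: str) -> bytes:
        return f"{language}:{text}".encode("utf-8")


class FakeASR:
    def transcribe(self, audio: bytes, language: str) -> str:
        return audio.decode("utf-8")


def reset_audio(
    seed_utterance: str,
    language: str,
    *,
    enabled: bool,
    tts: Optional[TTSEngine] = None,
    asr: Optional[ASREngine] = None,
) -> tuple[str, Optional[bytes]]:
    # Both engines are required whenever the boundary is enabled.
    if enabled and (tts is None or asr is None):
        raise InvalidConfigError("audio boundary needs both tts_engine and asr_engine")
    audio = tts.synthesize(seed_utterance, language) if enabled else None
    # last_transcript stays the textual seed utterance, not an ASR round trip.
    return seed_utterance, audio
```

Keeping the transcript textual means a noisy or swapped speech engine cannot change what the reward layer sees.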

Out of scope

LLM judging — never. The env is the judge.
Concurrency — single-session by contract; no locks, no asyncio.
Disk/network I/O at __init__ — strictly forbidden.