models.md — DriftCall Core Dataclasses

Owner: Person A (Environment)
Implements: DESIGN.md §4.1 (Dataclasses), §4.2 (reset), §4.3 (step), §4.4 (termination), §7.1 (Reward inputs)
Status: DRAFT, pending ≥ 2 fresh critic rounds

1. Purpose

driftcall/models.py is the single home for every immutable data type that crosses a module boundary in the DriftCall environment. It defines the on-the-wire contract between the agent, the env core, the vendor mocks, the drift injector, the reward suite, and the FastAPI surface. Nothing in this module does work — it only declares shape.

The module serves five consumers simultaneously:

- The agent loop: builds DriftCallAction instances and receives DriftCallObservation.
- The env core (env.py): consumes actions, emits observations, owns DriftCallState.
- The vendor mocks (vendors/*.py): return ToolResult.
- The drift injector (drift_injector.py): emits DriftEvent and mutates vendor schemas.
- The reward suite (rewards.py): reads episode trails (actions + tool_results + drift_log) to compute R1–R5.

Because all dataclasses are frozen=True, every state transition produces a new object. This matches the project-wide immutability rule (CLAUDE.md §7) and simplifies deterministic episode replay, since a state object is never altered after it is emitted.

2. Interface

Every declaration below is the exact target signature. No additional fields may be introduced without a DESIGN.md update first.

from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Literal


class ActionType(str, Enum):
    TOOL_CALL = "tool_call"
    SPEAK = "speak"
    CLARIFY = "clarify"
    PROBE_SCHEMA = "probe_schema"
    SUBMIT = "submit"
    ABORT = "abort"


@dataclass(frozen=True)
class DriftCallAction:
    action_type: ActionType
    tool_name: str | None = None
    tool_args: dict[str, Any] | None = None
    message: str | None = None
    confidence: float | None = None
    rationale: str | None = None


@dataclass(frozen=True)
class ToolResult:
    tool_name: str
    status: Literal["ok", "schema_error", "policy_error", "auth_error", "timeout"]
    response: dict[str, Any]
    schema_version: str
    latency_ms: int


@dataclass(frozen=True)
class DriftEvent:
    turn: int
    drift_type: Literal["schema", "policy", "tnc", "pricing", "auth"]
    domain: str
    description: str
    from_version: str
    to_version: str
    pattern_id: str                     # registry key — matches drift_injector catalogue


@dataclass(frozen=True)
class GoalSpec:
    domain: str
    intent: str
    slots: dict[str, Any]
    constraints: dict[str, Any]
    language: Literal["hi", "ta", "kn", "en", "hinglish"]
    seed_utterance: str


@dataclass(frozen=True)
class DriftCallObservation:
    turn: int
    goal: GoalSpec
    last_transcript: str
    last_lang: str
    last_confidence: float
    tool_results: tuple[ToolResult, ...]
    drift_log: tuple[DriftEvent, ...]
    budget_remaining: int
    available_tools: tuple[str, ...]


@dataclass(frozen=True)
class DriftCallState:
    episode_id: str
    goal: GoalSpec
    vendor_states: dict[str, dict[str, Any]]
    schema_versions: dict[str, str]
    drift_schedule: tuple[DriftEvent, ...]
    drift_fired: tuple[DriftEvent, ...]
    turn: int
    max_turns: int
    actions: tuple[DriftCallAction, ...]
    done: bool

Exports (module __all__):

__all__ = [
    "ActionType",
    "DriftCallAction",
    "ToolResult",
    "DriftEvent",
    "GoalSpec",
    "DriftCallObservation",
    "DriftCallState",
]

No factory helpers, no from_dict / to_dict methods — serialization lives in env.py and app.py (see §6). This module is pure shape.

3. Behavior spec

3.1 Immutability guarantees

- Every class is @dataclass(frozen=True). Attribute assignment after construction raises dataclasses.FrozenInstanceError.
- Sequence-typed fields use tuple[...] (not list[...]). This is both semantic ("append means new object") and structural (a tuple is hashable whenever its elements are).
- Dict-typed fields (tool_args, response, slots, constraints, vendor_states, schema_versions) are immutable by convention only: the module does not deep-freeze them, and consumers MUST NOT mutate them in place. The env core always constructs new dicts when state evolves (see §3.3). Consumers that need a defensive copy should use copy.deepcopy at read time.
- The decision to leave dicts unfrozen (vs. using types.MappingProxyType) is pragmatic: JSON round-trips through FastAPI / TRL / openenv tooling produce plain dict, and wrapping them would force an unwrap on every serialization.
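
The difference between the frozen attribute and the unfrozen dict contents can be seen in a small sketch. `Snapshot` is a hypothetical two-field stand-in for a models.py dataclass, not a type from the module.

```python
from __future__ import annotations

import copy
from dataclasses import FrozenInstanceError, dataclass


# Hypothetical two-field stand-in for a models.py dataclass (illustration only).
@dataclass(frozen=True)
class Snapshot:
    turn: int
    schema_versions: dict[str, str]


snap = Snapshot(turn=0, schema_versions={"airline": "v1"})

# Rebinding an attribute is blocked by frozen=True.
try:
    snap.turn = 1  # type: ignore[misc]
    rebind_blocked = False
except FrozenInstanceError:
    rebind_blocked = True

# A defensive reader copies before touching the dict, so the object
# held by `snap` is never modified.
local = copy.deepcopy(snap.schema_versions)
local["airline"] = "v2"

# frozen=True does NOT protect the dict's contents. The line below would
# succeed silently, which is why the rule is enforced by review and tests:
# snap.schema_versions["airline"] = "v2"
```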

3.2 Equality and hashing

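frozen=True combined with the default eq=True makes the dataclass generate a __hash__ that hashes the tuple of field values.
The method is generated for every class here, but calling it raises TypeError whenever a field value is itself unhashable, which is always the case for dict. Concretely: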
- ActionType — hashable (Enum).
- DriftCallAction — not hashable (tool_args is dict).
- ToolResult — not hashable (response is dict).
- DriftEvent — hashable (all fields are primitive strings/ints).
- GoalSpec — not hashable (slots, constraints are dicts).
- DriftCallObservation — not hashable (contains GoalSpec + tuples of unhashable ToolResult).
- DriftCallState — not hashable (contains dicts and unhashable nested types).
Equality comparison still works for all of them (it compares field-by-field, and dict equality is value-based). Callers that need a hash key should compute one explicitly, e.g. via hash(episode_id) or a canonical JSON dump.
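
A minimal sketch of the hashing rules above, using reduced stand-ins (`Event`, `Action`) rather than the real classes. The `canonical_key` helper is hypothetical; it shows one way to derive the "canonical JSON dump" key mentioned above, assuming all field values are JSON-serializable.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any


# Hypothetical reduced stand-ins for DriftEvent and DriftCallAction.
@dataclass(frozen=True)
class Event:
    turn: int
    domain: str


@dataclass(frozen=True)
class Action:
    tool_name: str
    tool_args: dict[str, Any]


# All-primitive fields: usable as a set member or dict key.
seen = {Event(4, "airline"), Event(4, "airline")}

a1 = Action("airline.search", {"from": "HYD", "to": "BLR"})
a2 = Action("airline.search", {"to": "BLR", "from": "HYD"})

# Field-by-field equality works even though hashing does not.
equal = a1 == a2
try:
    hash(a1)
    hashable = True
except TypeError:
    hashable = False


def canonical_key(obj: Any) -> str:
    """Stable digest of a dataclass via a sorted-key JSON dump (hypothetical helper)."""
    payload = json.dumps(asdict(obj), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Sorting keys makes the digest independent of dict insertion order, so `a1` and `a2` map to the same key.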

3.3 State evolution pattern

Mutation is forbidden; the env core always uses dataclasses.replace to emit new states:

new_state = replace(state, turn=state.turn + 1, actions=state.actions + (action,))

For dict fields, the env core builds a fresh dict:

new_versions = {**state.schema_versions, "airline": "v2"}
new_state = replace(state, schema_versions=new_versions)

This is a convention enforced by code review + tests — the models.py module itself cannot prevent an ill-behaved caller from state.vendor_states["airline"].update(...). See §5 for the enforcement strategy.
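
One reason the fresh-dict rule matters is that dataclasses.replace copies field references, not field contents. The sketch below uses `MiniState`, a hypothetical two-field stand-in for DriftCallState, to show the sharing and a safe nested update.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Any


# Hypothetical reduced stand-in for DriftCallState (illustration only).
@dataclass(frozen=True)
class MiniState:
    turn: int
    vendor_states: dict[str, dict[str, Any]]


s0 = MiniState(turn=0, vendor_states={"airline": {"seats_left": 14}})

# replace() is shallow: both states point at the same dict object.
s1 = replace(s0, turn=1)
shared = s1.vendor_states is s0.vendor_states

# Safe update: rebuild the outer dict and the inner dict being changed.
new_airline = {**s0.vendor_states["airline"], "seats_left": 13}
s2 = replace(
    s0,
    turn=1,
    vendor_states={**s0.vendor_states, "airline": new_airline},
)
```

Because `s1` shares its dict with `s0`, an in-place write through either one would silently change both.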

3.4 Serialization semantics

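models.py does not define serializers. env.py uses dataclasses.asdict for outbound JSON (FastAPI boundary), which recurses into nested dataclasses, tuples, and dicts. asdict leaves the ActionType member in place; because ActionType inherits from str, the JSON encoder then writes it as its .value ("tool_call", etc.). Tuples are written as JSON arrays and load back as lists, so a deserializer for tuple-typed fields (tool_results, drift_log, and so on) must convert them back to tuples.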
Deserialization (inbound DriftCallAction from HTTP) happens in app.py with a pydantic model that mirrors the fields — pydantic validates types, then constructs the frozen dataclass. models.py is not involved.
Required invariant for any serializer: the JSON round-trip action → json → action must produce an equal object (== returns True) for every valid action.
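
The round-trip invariant can be sketched with the standard library alone. The real inbound path goes through a pydantic model in app.py; `action_to_json` and `action_from_json` below are hypothetical helpers that only illustrate the equality requirement, and `ActionType` is trimmed to two members for brevity.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any


class ActionType(str, Enum):  # trimmed to two members for the sketch
    TOOL_CALL = "tool_call"
    SUBMIT = "submit"


@dataclass(frozen=True)
class DriftCallAction:
    action_type: ActionType
    tool_name: str | None = None
    tool_args: dict[str, Any] | None = None
    message: str | None = None
    confidence: float | None = None
    rationale: str | None = None


def action_to_json(action: DriftCallAction) -> str:
    # asdict keeps the Enum member; json.dumps writes it as its string value.
    return json.dumps(asdict(action), ensure_ascii=False)


def action_from_json(raw: str) -> DriftCallAction:
    data = json.loads(raw)
    # The wire format carries a plain string, so the Enum is rebuilt by value.
    data["action_type"] = ActionType(data["action_type"])
    return DriftCallAction(**data)


original = DriftCallAction(
    action_type=ActionType.TOOL_CALL,
    tool_name="airline.search",
    tool_args={"from": "HYD", "to": "BLR"},
)
restored = action_from_json(action_to_json(original))
```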

3.5 Field-level invariants

Class	Field	Invariant	Enforced by
`DriftCallAction`	`action_type == TOOL_CALL`	`tool_name` and `tool_args` both non-None	`env.step` validation
`DriftCallAction`	`action_type == SPEAK` or `CLARIFY`	`message` non-None	`env.step` validation
`DriftCallAction`	`action_type == SUBMIT`	`confidence` in `[0.0, 1.0]`	`env.step` validation + reward pipeline
`DriftCallAction`	`action_type == PROBE_SCHEMA`	`tool_name` ∈ `{"airline", "cab", "restaurant", "hotel", "payment"}` (bare domain name — NOT `domain.verb` format); `tool_args` may be `None` or `{}`; `confidence` must be `None` (probe is not a submit)	`env.step` validation
`DriftCallAction`	`rationale`	len ≤ 200 chars if non-None	`env.step` validation (R4 penalizes excess)
`ToolResult`	`status == "ok"`	`response` non-empty dict	vendor mock contract
`ToolResult`	`schema_version`	matches `/^v\d+$/`	vendor mock contract
`ToolResult`	`latency_ms`	non-negative int	vendor mock contract
`DriftEvent`	`turn`	`1 ≤ turn ≤ max_turns - 1` (drifts fire strictly before the final turn so the agent has ≥ 1 post-drift turn to react; consistent with DESIGN.md §6.2 stage-2/3 schedules)	drift injector
`DriftEvent`	`from_version != to_version`	always	drift injector + drift pattern library
`GoalSpec`	`language`	one of the 5 Literals	task generator
`DriftCallObservation`	`budget_remaining`	`max_turns - turn`, `≥ 0`	env observation builder
`DriftCallState`	`len(actions) == turn`	always (one action per turn)	env.step
`DriftCallState`	`drift_fired ⊆ drift_schedule`	always (subset, order preserved)	drift injector
`DriftCallState`	`done`	`done == True` → no further step allowed	`env.step` (raises on terminated state)

models.py itself does NOT enforce any of these — they are contracts validated in env.py, vendors/*.py, and drift_injector.py. models.py is purely declarative.
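
As an illustration of where the DriftCallAction rows of the table could be checked, here is a sketch of a validator in the spirit of `env.step` validation. The names `validate_action` and `is_valid` are hypothetical, and only the SUBMIT error message comes from this document (§7). The rationale length rule is left out because the table routes excess length to an R4 penalty.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any

DOMAINS = frozenset({"airline", "cab", "restaurant", "hotel", "payment"})


class ActionType(str, Enum):
    TOOL_CALL = "tool_call"
    SPEAK = "speak"
    CLARIFY = "clarify"
    PROBE_SCHEMA = "probe_schema"
    SUBMIT = "submit"
    ABORT = "abort"


@dataclass(frozen=True)
class DriftCallAction:
    action_type: ActionType
    tool_name: str | None = None
    tool_args: dict[str, Any] | None = None
    message: str | None = None
    confidence: float | None = None
    rationale: str | None = None


def validate_action(action: DriftCallAction) -> None:
    """Raise ValueError when an action breaks a §3.5 invariant (hypothetical helper)."""
    kind = action.action_type
    if kind is ActionType.TOOL_CALL:
        if action.tool_name is None or action.tool_args is None:
            raise ValueError("TOOL_CALL requires tool_name and tool_args")
    elif kind in (ActionType.SPEAK, ActionType.CLARIFY):
        if action.message is None:
            raise ValueError(f"{kind.value} requires message")
    elif kind is ActionType.SUBMIT:
        if action.confidence is None:
            raise ValueError("SUBMIT requires confidence")
        if not 0.0 <= action.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")
    elif kind is ActionType.PROBE_SCHEMA:
        if action.tool_name not in DOMAINS:
            raise ValueError("PROBE_SCHEMA requires a bare domain name")
        if action.tool_args not in (None, {}):
            raise ValueError("PROBE_SCHEMA tool_args must be None or {}")
        if action.confidence is not None:
            raise ValueError("PROBE_SCHEMA must not carry confidence")


def is_valid(action: DriftCallAction) -> bool:
    """Boolean wrapper around validate_action, convenient in tests."""
    try:
        validate_action(action)
    except ValueError:
        return False
    return True
```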

4. Data structures

Every field is documented: name, type, semantic meaning, constraints, and who writes it.

4.1 `ActionType` (Enum)

str-mixed Enum so instances serialize naturally to JSON strings.

Value	Meaning	Required companion fields on `DriftCallAction`
`TOOL_CALL`	Invoke a mock vendor tool	`tool_name`, `tool_args`
`SPEAK`	Emit a text reply to the user (TTS at deploy boundary)	`message`
`CLARIFY`	Ask a clarifying question back to the user	`message`
`PROBE_SCHEMA`	Ask the env for the current schema snapshot of a domain	`tool_name` = domain (e.g., `"airline"`)
`SUBMIT`	Declare the task complete and attach `confidence`	`confidence`
`ABORT`	Explicit failure — terminate episode with R1=0	none

4.2 `DriftCallAction`

Field	Type	Semantic	Constraint	Writer
`action_type`	`ActionType`	Which action variant is being taken	Required; no default	Agent
`tool_name`	`str \| None`	Fully-qualified tool id (`"airline.search"`) or domain for `PROBE_SCHEMA`	`None` unless `action_type ∈ {TOOL_CALL, PROBE_SCHEMA}`; format `domain.verb` for tool calls	Agent
`tool_args`	`dict[str, Any] \| None`	Keyword arguments for the tool call	`None` unless `action_type == TOOL_CALL`; JSON-serializable; no nested callables	Agent
`message`	`str \| None`	Utterance text (user-language-mirrored)	Non-None when `action_type ∈ {SPEAK, CLARIFY}`; Unicode (Devanagari / Tamil / Kannada welcome); max ~4 KB	Agent
`confidence`	`float \| None`	Self-assessed probability of task success	Required when `action_type == SUBMIT`; `0.0 ≤ c ≤ 1.0`; feeds Brier term in §7.2	Agent
`rationale`	`str \| None`	Optional chain-of-thought / reasoning	Max 200 chars enforced at step-time; excess → R4 penalty	Agent

4.3 `ToolResult`

Field	Type	Semantic	Constraint	Writer
`tool_name`	`str`	Echoes the invoked tool	Must equal the triggering `DriftCallAction.tool_name`	Vendor mock
`status`	Literal (5-value)	Outcome classification — drives R2/R5 detection signals	Exactly one of `"ok"`, `"schema_error"`, `"policy_error"`, `"auth_error"`, `"timeout"`	Vendor mock
`response`	`dict[str, Any]`	Raw response body; shape depends on tool + current schema version	JSON-serializable; on non-ok status, contains `error_code` key	Vendor mock
`schema_version`	`str`	Version stamp of the schema used to serialize `response`	Matches `^v\d+$` — currently `"v1"`, `"v2"`, `"v3"`	Vendor mock
`latency_ms`	`int`	Simulated latency (deterministic per seed)	`≥ 0`; typical 50–400 ms; `timeout` status → 5000+	Vendor mock
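
The constraints in this table are a vendor-mock contract, so a vendor test could check them with a helper along the following lines. `contract_violations` is a hypothetical name; the rules it checks are the ones stated in the table and in §3.5.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass
from typing import Any, Literal

SCHEMA_VERSION_RE = re.compile(r"v\d+")


@dataclass(frozen=True)
class ToolResult:
    tool_name: str
    status: Literal["ok", "schema_error", "policy_error", "auth_error", "timeout"]
    response: dict[str, Any]
    schema_version: str
    latency_ms: int


def contract_violations(result: ToolResult) -> list[str]:
    """Collect vendor-contract violations for one result (hypothetical test helper)."""
    problems: list[str] = []
    # fullmatch avoids accepting a trailing newline, which `$` would allow.
    if not SCHEMA_VERSION_RE.fullmatch(result.schema_version):
        problems.append("schema_version must look like v1, v2, ...")
    if result.latency_ms < 0:
        problems.append("latency_ms must be non-negative")
    if result.status == "ok" and not result.response:
        problems.append("ok status requires a non-empty response")
    if result.status != "ok" and "error_code" not in result.response:
        problems.append("non-ok status requires response['error_code']")
    try:
        json.dumps(result.response)
    except (TypeError, ValueError):
        problems.append("response is not JSON-serializable")
    return problems
```

Returning a list instead of raising on the first problem lets a test report every violation for a vendor response at once.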

4.4 `DriftEvent`

Field	Type	Semantic	Constraint	Writer
`turn`	`int`	Turn at which the drift fires (start-of-turn, before action evaluation, DESIGN.md §6.2)	`1 ≤ turn ≤ max_turns - 1`	Drift injector
`drift_type`	Literal (5-value)	Taxonomy — DESIGN.md §6.1	Exactly one of `"schema"`, `"policy"`, `"tnc"`, `"pricing"`, `"auth"`	Drift injector
`domain`	`str`	Target vendor	One of `"airline"`, `"cab"`, `"restaurant"`, `"hotel"`, `"payment"`	Drift injector
`description`	`str`	Human-readable, used by R2 keyword match	Non-empty; ≤ 256 chars; includes drifted field name where applicable	Drift injector
`from_version`	`str`	Schema version before drift	Matches `^v\d+$`; must differ from `to_version`	Drift injector
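`to_version`	`str`	Schema version after drift	Matches `^v\d+$`	Drift injector
`pattern_id`	`str`	Registry key identifying the drift pattern	Must match a key in the drift_injector catalogue (per the §2 interface comment)	Drift injector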

4.5 `GoalSpec`

Field	Type	Semantic	Constraint	Writer
`domain`	`str`	Primary vendor domain for this goal	One of the 4 consumer domains (`payment` is transversal, not a goal domain)	Task generator
`intent`	`str`	Intent id (e.g., `"book_flight"`, `"order_food"`)	From a closed set defined in `task_generator.md`	Task generator
`slots`	`dict[str, Any]`	Parsed required + optional slots (`{"from": "HYD", "to": "BLR", "when": "2026-04-30"}`)	All values JSON-serializable primitives; keys are string slot names	Task generator
`constraints`	`dict[str, Any]`	Budget / time window / dietary / etc.	Keys drawn from constraint vocabulary documented in `rewards.md`; values JSON primitives	Task generator
`language`	Literal (5-value)	Target language for the brief and expected reply mirror	Exactly one of `"hi"`, `"ta"`, `"kn"`, `"en"`, `"hinglish"`	Task generator
`seed_utterance`	`str`	The raw user utterance (text-form even in training)	Non-empty Unicode; no PII	Task generator

4.6 `DriftCallObservation`

The agent-facing view. Must never leak internal state beyond what DESIGN.md §4.3 defines.

Field	Type	Semantic	Constraint	Writer
`turn`	`int`	Current turn (0 at reset, incremented at step start)	`0 ≤ turn ≤ max_turns`	Env observation builder
`goal`	`GoalSpec`	Immutable goal for the episode — copied by ref from state	Identical `GoalSpec` across all observations in one episode	Env observation builder
`last_transcript`	`str`	Most recent user utterance in text form (post-ASR in deploy, as-authored in training)	Empty string on turn 0 if no prior utterance	Env observation builder
`last_lang`	`str`	Language detected from last utterance	One of the 5 language literals or `""` on turn 0	Env observation builder
`last_confidence`	`float`	ASR confidence for `last_transcript` (1.0 in training)	`0.0 ≤ c ≤ 1.0`	Env observation builder
`tool_results`	`tuple[ToolResult, ...]`	Full history of tool results this episode	Order = chronological; length grows by ≤ 1 per turn	Env observation builder
`drift_log`	`tuple[DriftEvent, ...]`	Drifts that HAVE fired (subset of schedule, order preserved)	Monotonically growing across turns	Env observation builder
`budget_remaining`	`int`	`max_turns - turn`	`≥ 0`	Env observation builder
`available_tools`	`tuple[str, ...]`	Fully-qualified tool ids the agent may call this turn	Stable within an episode (auth-drifted tools still listed but will return `auth_error`)	Env observation builder

4.7 `DriftCallState`

Env-internal authoritative state. The agent never sees this directly.

Field	Type	Semantic	Constraint	Writer
`episode_id`	`str`	Opaque unique id (e.g., `"ep_000123"` or a UUID4)	Unique per session; stable across one episode	Env (at reset)
`goal`	`GoalSpec`	Same object as observation.goal	Never changes within an episode	Env (at reset)
`vendor_states`	`dict[str, dict[str, Any]]`	Mutable mock DBs keyed by domain (`{"airline": {...}, ...}`)	Top-level keys = 5 domains; inner shape defined per-vendor in `vendors.md`	Env + vendor mocks (via `replace`)
`schema_versions`	`dict[str, str]`	Current schema version per domain	Keys = 5 domains; values `^v\d+$`; monotonically advanced by drift injector	Drift injector
`drift_schedule`	`tuple[DriftEvent, ...]`	Pre-computed schedule sampled at reset (DESIGN.md §6.2)	Sorted by `turn` ascending; length 0 / 1 / 2 for curriculum stages 1 / 2 / 3	Drift injector (at reset)
`drift_fired`	`tuple[DriftEvent, ...]`	Drifts that have already fired	Prefix of `drift_schedule` by turn order	Drift injector
`turn`	`int`	Current turn	`0 ≤ turn ≤ max_turns`; starts at 0, increments in `step`	Env
`max_turns`	`int`	Turn budget (8 / 12 / 16 per curriculum stage, DESIGN.md §4.5)	`> 0`; stable across episode	Env (at reset)
`actions`	`tuple[DriftCallAction, ...]`	Full agent action history	`len == turn` invariant	Env
`done`	`bool`	Terminal flag	`False` until SUBMIT / ABORT / timeout / R5 corruption	Env
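
The structural constraints in this table (turn bounds, `len(actions) == turn`, sorted schedule, `drift_fired` as a prefix of `drift_schedule`) can be expressed as one property check. The sketch uses hypothetical reduced stand-ins (`Event`, `MiniState`) and a hypothetical helper name, and represents actions as plain strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:  # reduced stand-in for DriftEvent
    turn: int
    pattern_id: str


@dataclass(frozen=True)
class MiniState:  # reduced stand-in for DriftCallState
    drift_schedule: tuple[Event, ...]
    drift_fired: tuple[Event, ...]
    turn: int
    max_turns: int
    actions: tuple[str, ...]  # placeholders for DriftCallAction instances


def state_violations(state: MiniState) -> list[str]:
    """List broken invariants for one state (hypothetical test helper)."""
    problems: list[str] = []
    if not 0 <= state.turn <= state.max_turns:
        problems.append("turn out of range")
    if len(state.actions) != state.turn:
        problems.append("len(actions) must equal turn")
    turns = [event.turn for event in state.drift_schedule]
    if turns != sorted(turns):
        problems.append("drift_schedule must be sorted by turn")
    if any(not 1 <= t <= state.max_turns - 1 for t in turns):
        problems.append("drift turn must be in [1, max_turns - 1]")
    # Prefix check: fired events are the first len(drift_fired) scheduled ones.
    if state.drift_fired != state.drift_schedule[: len(state.drift_fired)]:
        problems.append("drift_fired must be a prefix of drift_schedule")
    return problems
```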

5. Error modes

models.py itself has effectively zero runtime error surface — it only declares frozen dataclasses. Errors arise in the following situations:

Situation	Exception	Where raised
Caller assigns to a frozen field, e.g. `action.tool_name = "x"`	`dataclasses.FrozenInstanceError`	Python runtime (automatic)
Caller constructs `DriftCallAction` with wrong-typed `action_type` (not an `ActionType`)	No exception at construction (dataclasses do not check annotations at runtime); mypy reports it statically	mypy (static check); not raised by the Python runtime
Caller constructs a dataclass missing a required field	`TypeError: __init__() missing N required positional arguments`	Python runtime (automatic)
Caller passes `language` outside the 5-value Literal	Not enforced at runtime by Python — accepted. mypy `--strict` catches it. `env.step` validates and raises `ValueError` for HTTP callers.	env.py / app.py validation layer
Caller passes a non-JSON-serializable object (e.g., `set`) inside `tool_args`	No error at construction, and `dataclasses.asdict` copies it through unchanged; the later `json.dumps` raises `TypeError`	Serialization layer (env.py / app.py)
Vendor mock returns a non-JSON-serializable value (e.g., `set`, `bytes`, a custom class instance) inside `ToolResult.response`	No error at `ToolResult` construction; `TypeError` raised later at the FastAPI/JSON serialization boundary (`json.dumps` in `env.py` / `app.py`). Mitigation: every vendor test (`tests/test_vendors.py`) must assert JSON round-trip safety (`json.loads(json.dumps(result.response))`) for every `ToolResult.response` the vendor can produce.	Serialization layer (env.py / app.py); detection NOT at construction time
Caller attempts to hash `DriftCallAction` / `ToolResult` / `GoalSpec` / `DriftCallObservation` / `DriftCallState`	`TypeError: unhashable type` (because of nested `dict`)	Python runtime when the hash is attempted
Caller mutates a `dict` field in place (e.g., `state.vendor_states["airline"]["x"] = 1`)	No exception — this is a convention violation. Enforcement is review + tests (a property test snapshots `state.vendor_states` before each step and diffs after; any unintended mutation fails the test).	Enforced by `tests/test_env.py` property tests, not by `models.py`

Partial-data behavior: no dataclass in this module supports "partial" construction. Every required field must be provided, or __init__ raises. Optional fields on DriftCallAction default to None. There is no from_partial_dict helper — callers build actions explicitly.
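
The snapshot-and-diff check described in the in-place mutation row of the table above could look roughly like this. `MiniState`, `mutates_input`, `good_step`, and `bad_step` are all hypothetical names for illustration; the real property test lives in tests/test_env.py and is not shown in this document.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, replace
from typing import Any, Callable


@dataclass(frozen=True)
class MiniState:  # reduced stand-in for DriftCallState
    turn: int
    vendor_states: dict[str, dict[str, Any]]


def mutates_input(step: Callable[[MiniState], MiniState], state: MiniState) -> bool:
    """Return True if `step` changed `state.vendor_states` in place."""
    before = copy.deepcopy(state.vendor_states)
    step(state)
    return state.vendor_states != before


def good_step(state: MiniState) -> MiniState:
    # Builds fresh dicts, leaving the input state untouched.
    airline = {**state.vendor_states["airline"], "seats_left": 13}
    return replace(
        state,
        turn=state.turn + 1,
        vendor_states={**state.vendor_states, "airline": airline},
    )


def bad_step(state: MiniState) -> MiniState:
    # Convention violation: writes through the shared dict.
    state.vendor_states["airline"]["seats_left"] = 13
    return replace(state, turn=state.turn + 1)
```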

6. Dependencies

6.1 Stdlib only

models.py imports only:

- __future__.annotations: PEP 563 deferred-evaluation annotations (mandatory per CLAUDE.md §4.2)
- dataclasses.dataclass, dataclasses.field: for frozen dataclass declarations
- enum.Enum: for ActionType
- typing.Any, typing.Literal: for dict and enumerated-string fields

No third-party imports. No pydantic, no attrs, no msgspec. This keeps models.py importable inside the Unsloth training loop, the FastAPI server, and the reward suite with zero install cost.

6.2 Downstream consumers (who imports `models.py`)

Consumer module	Uses
`driftcall/env.py`	All 7 exported types (the 6 dataclasses + `ActionType`). Central composition point.
`driftcall/rewards.py`	`DriftCallState`, `DriftCallAction`, `DriftEvent`, `ToolResult`, `ActionType`, `GoalSpec` — reads the episode trail to compute R1–R5.
`driftcall/drift_injector.py`	`DriftEvent`, `DriftCallState` — emits events, returns new state.
`driftcall/vendors/*.py` (5 files)	`ToolResult`. Each vendor returns `ToolResult` from its tool handlers.
`driftcall/task_generator.py`	`GoalSpec` — procedurally samples goals.
`driftcall/audio/asr_whisper.py`	None directly — it produces raw transcript+lang+confidence which `env.py` embeds into `DriftCallObservation`.
`driftcall/audio/tts_kokoro.py`	`DriftCallAction` (reads `.message` field for TTS synthesis).
`app.py` (FastAPI)	All 7 classes for request/response (de)serialization via companion pydantic models.
`training/train_grpo.py`	`DriftCallObservation`, `DriftCallAction`, `ActionType` — builds prompts + parses completions.
`training/eval_baseline.py`, `training/eval_final.py`	Same as training.
`demo/app_gradio.py`	`DriftCallObservation`, `DriftCallAction`, `ActionType` — drives the Gradio trace panel.
`tests/test_*.py`	All 7 classes for fixture construction.

6.3 Upstream dependencies of `models.py`

None. models.py is a leaf — the graph flows outward from it. This is deliberate: making it dependency-free means it can be imported in the trainer process without dragging in FastAPI, whisper, or vendor mocks.

7. Edge cases

Numbered edge cases with expected behavior. These are the cases the test plan (docs/tests/models_tests.md) must cover.

1. Empty tool_results / drift_log at turn 0. DriftCallObservation constructed at reset() has tool_results=() and drift_log=(). Both must be empty tuples, not None. Tests must assert isinstance(obs.tool_results, tuple) and len(obs.tool_results) == 0. Downstream code iterating with for r in obs.tool_results: works correctly on an empty tuple.
2. None vs empty string on last_transcript / last_lang. At turn 0, before any user utterance has been processed, last_transcript="" and last_lang="" (empty strings), NOT None. This keeps the field non-nullable (typing simpler for the agent) and makes len(last_transcript) == 0 a clean "no-utterance" check. last_confidence=1.0 at turn 0 (treated as "authored", perfect-ASR placeholder).
3. Unicode slots in GoalSpec.seed_utterance and DriftCallAction.message. Hindi (Devanagari), Tamil, Kannada, and mixed Hinglish strings must round-trip through dataclasses.asdict → json.dumps(ensure_ascii=False) → json.loads → construction unchanged. Tests must cover at least: "मुझे कल दिल्ली जाना है", "{when} அன்று விமானம்", "{when} inda {to} ge", "Bhai Friday ko Bangalore jaana hai".
4. tool_args with nested dicts and lists. Agents may pass {"filters": {"class": ["economy", "premium"], "max_stops": 1}}. Nested structures must survive JSON round-trip. Non-JSON-serializable values (e.g., set, datetime) are rejected by env.step validation, not by the dataclass itself. The point of this edge case is that models.py does NOT validate, so callers must.
5. Large drift_log history. Stage 3 episodes allow up to 2 drifts, so drift_log length is ≤ 2 in practice. However, DriftCallState.actions can grow to max_turns = 16 entries. The observation builder must NOT truncate; full history is always included because R1–R5 need it at submit time. Serializer must handle a 16-action tuple without pathological blowup (sanity check: < 64 KB JSON per typical observation).
6. DriftCallAction with action_type=SUBMIT and confidence=None. Construction succeeds (no runtime check). env.step validation must reject with ValueError("SUBMIT requires confidence"). Documented here because the dataclass's loose default (confidence: float | None = None) could mislead a reader into thinking SUBMIT without confidence is valid.
7. ToolResult.status != "ok" with an empty response dict. Permitted by the dataclass, but a vendor-contract bug. Vendor mocks returning schema_error / policy_error / auth_error / timeout MUST still populate response with at least {"error_code": "<CODE>"} so R2 can keyword-match. An empty response={} is caught in tests/test_vendors.py.
8. DriftEvent.from_version == to_version. The drift injector MUST NOT emit such events; the drift pattern library rejects them at load. models.py does not enforce this (see the §3.5 invariants). Tested by drift_injector tests, not here.
9. Constructing DriftCallState with len(actions) != turn. models.py accepts this silently. It is a severe env-core bug if it happens; tested by a property assertion in tests/test_env.py on every step. Documented here so critics know the invariant's home.
10. available_tools on an auth_error-drifted tool. Still listed. Agents should attempt calls to discover the auth drift; removing the tool would leak the drift. This is a critical design decision, documented in the constraint column of §4.6 and repeated here for visibility.
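
The Unicode round-trip listed among the edge cases above can be sketched as follows, using the four sample utterances from this section. `goal_round_trip` is a hypothetical helper, and the slot and intent values are placeholders.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Any, Literal


@dataclass(frozen=True)
class GoalSpec:
    domain: str
    intent: str
    slots: dict[str, Any]
    constraints: dict[str, Any]
    language: Literal["hi", "ta", "kn", "en", "hinglish"]
    seed_utterance: str


def goal_round_trip(goal: GoalSpec) -> GoalSpec:
    """asdict -> json.dumps(ensure_ascii=False) -> json.loads -> construction."""
    raw = json.dumps(asdict(goal), ensure_ascii=False)
    return GoalSpec(**json.loads(raw))


SAMPLES = (
    ("hi", "मुझे कल दिल्ली जाना है"),
    ("ta", "{when} அன்று விமானம்"),
    ("kn", "{when} inda {to} ge"),
    ("hinglish", "Bhai Friday ko Bangalore jaana hai"),
)

goals = [
    GoalSpec(
        domain="airline",
        intent="book_flight",
        slots={"to": "BLR"},  # placeholder slot values
        constraints={},
        language=lang,  # type: ignore[arg-type]
        seed_utterance=text,
    )
    for lang, text in SAMPLES
]
```

With ensure_ascii=False the serialized payload keeps the original script characters instead of \u escapes, which keeps logs readable; equality after the round-trip holds either way.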

8. Examples

8.1 Constructing a valid `TOOL_CALL` action

from __future__ import annotations

from driftcall.models import ActionType, DriftCallAction

action = DriftCallAction(
    action_type=ActionType.TOOL_CALL,
    tool_name="airline.search",
    tool_args={
        "from": "HYD",
        "to": "BLR",
        "date": "2026-04-25",
        "max_price_inr": 8000,
        "time_window": "evening",
    },
    rationale="User asked for cheapest evening flight under 8000",
)

assert action.action_type is ActionType.TOOL_CALL
assert action.tool_name == "airline.search"
assert action.confidence is None  # not a SUBMIT

8.2 Constructing a `ToolResult` after a successful search

from driftcall.models import ToolResult

result = ToolResult(
    tool_name="airline.search",
    status="ok",
    response={
        "results": [
            {
                "flight_id": "6E-2345",
                "from": "HYD",
                "to": "BLR",
                "depart": "2026-04-25T18:30:00+05:30",
                "price": 7200,
                "currency": "INR",
                "seats_left": 14,
            }
        ]
    },
    schema_version="v1",
    latency_ms=142,
)

assert result.status == "ok"
assert result.response["results"][0]["price"] == 7200

8.3 Constructing a complete `DriftCallObservation` at turn 0 (reset)

from driftcall.models import DriftCallObservation, GoalSpec

goal = GoalSpec(
    domain="airline",
    intent="book_flight",
    slots={"from": "HYD", "to": "BLR", "when": "2026-04-25"},
    constraints={"budget_inr": 8000, "time_window": "evening"},
    language="hinglish",
    seed_utterance="Bhai Friday ko Bangalore jaana hai, 8000 rupees max, 6pm ke baad",
)

obs = DriftCallObservation(
    turn=0,
    goal=goal,
    last_transcript="",
    last_lang="",
    last_confidence=1.0,
    tool_results=(),
    drift_log=(),
    budget_remaining=12,      # Stage 2: max_turns = 12
    available_tools=(
        "airline.search",
        "airline.book",
        "airline.cancel",
        "airline.get_booking",
        "payment.charge",
    ),
)

assert obs.turn == 0
assert obs.goal.language == "hinglish"
assert len(obs.tool_results) == 0

8.4 Constructing a `DriftEvent` and appending it to state via `replace`

from dataclasses import replace

from driftcall.models import DriftCallState, DriftEvent

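drift = DriftEvent(
    turn=4,
    drift_type="schema",
    domain="airline",
    description="field 'price' renamed to 'total_fare_inr'; 'currency' removed",
    from_version="v1",
    to_version="v2",
    pattern_id="airline_price_rename",  # required by §2; this key is a hypothetical placeholder
)

# `state` is assumed to be an existing DriftCallState owned by the env core.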
new_state = replace(
    state,
    drift_fired=state.drift_fired + (drift,),
    schema_versions={**state.schema_versions, "airline": "v2"},
)

assert drift in new_state.drift_fired
assert new_state.schema_versions["airline"] == "v2"
# Original state untouched:
assert drift not in state.drift_fired

9. Open questions

None — spec is complete. Every dataclass, field, type, and invariant is locked against DESIGN.md §4.1. No ambiguity remains that would block downstream modules (env.md, rewards.md, vendors.md, drift_injector.md, task_generator.md) from referencing this doc as a stable contract.
