	# models.md — DriftCall Core Dataclasses

	Owner: Person A (Environment)
	Implements: DESIGN.md §4.1 (Dataclasses), §4.2 (reset), §4.3 (step), §4.4 (termination), §7.1 (Reward inputs)
	Status: DRAFT — pending ≥ 2 fresh critic rounds

	---

	## 1. Purpose

	`driftcall/models.py` is the single home for every immutable data type that crosses a module boundary in the DriftCall environment. It defines the on-the-wire contract between the agent, the env core, the vendor mocks, the drift injector, the reward suite, and the FastAPI surface. Nothing in this module does work — it only declares shape.

	The module serves five consumers simultaneously:

	1. The agent loop — builds `DriftCallAction` instances and receives `DriftCallObservation`.
	2. The env core (`env.py`) — consumes actions, emits observations, owns `DriftCallState`.
	3. The vendor mocks (`vendors/*.py`) — return `ToolResult`.
	4. The drift injector (`drift_injector.py`) — emits `DriftEvent` and mutates vendor schemas.
	5. The reward suite (`rewards.py`) — reads episode trails (actions + tool_results + drift_log) to compute R1–R5.

	Because all dataclasses are `frozen=True`, every state transition produces a new object. This matches the project-wide immutability rule (CLAUDE.md §7) and makes episode replay trivially deterministic.

	---

	## 2. Interface

	Every declaration below is the exact target signature. No additional fields may be introduced without a DESIGN.md update first.

	```python
	from __future__ import annotations

	from dataclasses import dataclass, field
	from enum import Enum
	from typing import Any, Literal


	class ActionType(str, Enum):
	    TOOL_CALL = "tool_call"
	    SPEAK = "speak"
	    CLARIFY = "clarify"
	    PROBE_SCHEMA = "probe_schema"
	    SUBMIT = "submit"
	    ABORT = "abort"


	@dataclass(frozen=True)
	class DriftCallAction:
	    action_type: ActionType
	    tool_name: str | None = None
	    tool_args: dict[str, Any] | None = None
	    message: str | None = None
	    confidence: float | None = None
	    rationale: str | None = None


	@dataclass(frozen=True)
	class ToolResult:
	    tool_name: str
	    status: Literal["ok", "schema_error", "policy_error", "auth_error", "timeout"]
	    response: dict[str, Any]
	    schema_version: str
	    latency_ms: int


	@dataclass(frozen=True)
	class DriftEvent:
	    turn: int
	    drift_type: Literal["schema", "policy", "tnc", "pricing", "auth"]
	    domain: str
	    description: str
	    from_version: str
	    to_version: str
	    pattern_id: str  # registry key; matches drift_injector catalogue


	@dataclass(frozen=True)
	class GoalSpec:
	    domain: str
	    intent: str
	    slots: dict[str, Any]
	    constraints: dict[str, Any]
	    language: Literal["hi", "ta", "kn", "en", "hinglish"]
	    seed_utterance: str


	@dataclass(frozen=True)
	class DriftCallObservation:
	    turn: int
	    goal: GoalSpec
	    last_transcript: str
	    last_lang: str
	    last_confidence: float
	    tool_results: tuple[ToolResult, ...]
	    drift_log: tuple[DriftEvent, ...]
	    budget_remaining: int
	    available_tools: tuple[str, ...]


	@dataclass(frozen=True)
	class DriftCallState:
	    episode_id: str
	    goal: GoalSpec
	    vendor_states: dict[str, dict[str, Any]]
	    schema_versions: dict[str, str]
	    drift_schedule: tuple[DriftEvent, ...]
	    drift_fired: tuple[DriftEvent, ...]
	    turn: int
	    max_turns: int
	    actions: tuple[DriftCallAction, ...]
	    done: bool
	```

	Exports (module `__all__`):

	```python
	__all__ = [
	    "ActionType",
	    "DriftCallAction",
	    "ToolResult",
	    "DriftEvent",
	    "GoalSpec",
	    "DriftCallObservation",
	    "DriftCallState",
	]
	```

	No factory helpers, no `from_dict` / `to_dict` methods — serialization lives in `env.py` and `app.py` (see §6). This module is pure shape.

	---

	## 3. Behavior spec

	### 3.1 Immutability guarantees

	- Every dataclass in the module is declared with `@dataclass(frozen=True)` (`ActionType` is a `str`-mixed `Enum`, not a dataclass). Attribute assignment after construction raises `dataclasses.FrozenInstanceError`.
	- Sequence-typed fields use `tuple[...]` (not `list[...]`). This is both semantic ("append means new object") and structural (tuples hash cleanly).
	- Dict-typed fields (`tool_args`, `response`, `slots`, `constraints`, `vendor_states`, `schema_versions`) are conventionally immutable — the module does not deep-freeze them, and consumers MUST NOT mutate them in place. The env core always constructs new dicts when state evolves (see §3.3). Consumers that need to defensively copy should use `copy.deepcopy` at read time.
	- The decision to leave dicts unfrozen (vs. using `types.MappingProxyType`) is pragmatic: JSON round-trips through FastAPI / TRL / `openenv` tooling produce plain `dict`, and wrapping them would force an unwrap on every serialization.
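
The guarantees above can be illustrated with a minimal sketch. `Action` is a trimmed stand-in for `DriftCallAction`, introduced here only to keep the snippet self-contained; it is not the real class.

```python
from __future__ import annotations

import dataclasses
import json
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any


@dataclass(frozen=True)
class Action:  # trimmed stand-in for DriftCallAction (assumption)
    tool_name: str | None = None
    tool_args: dict[str, Any] | None = None


action = Action(tool_name="airline.search", tool_args={"to": "BLR"})

# Attribute assignment on a frozen dataclass raises FrozenInstanceError.
try:
    action.tool_name = "cab.book"  # type: ignore[misc]
    assignment_blocked = False
except dataclasses.FrozenInstanceError:
    assignment_blocked = True

# The dict field is NOT deep-frozen: in-place mutation succeeds silently,
# which is why the spec forbids it by convention only.
leaky = Action(tool_args={"to": "BLR"})
leaky.tool_args["to"] = "DEL"  # convention violation, no exception

# MappingProxyType blocks writes, but json.dumps rejects it, so every
# serialization would need an unwrap step first.
proxy = MappingProxyType({"to": "BLR"})
try:
    proxy["to"] = "DEL"  # type: ignore[index]
    proxy_blocked = False
except TypeError:
    proxy_blocked = True

try:
    json.dumps(proxy)
    proxy_serializable = True
except TypeError:
    proxy_serializable = False
```

The last two checks show the trade-off named in the final bullet: a read-only mapping view would enforce immutability, at the price of converting back to `dict` before each JSON dump.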

	### 3.2 Equality and hashing

	- `frozen=True` combined with the default `eq=True` means `__hash__` is auto-generated from the tuple of fields.
	- The generated `__hash__` hashes every field value, so an instance that holds a `dict` is not hashable: `hash(obj)` raises `TypeError` at call time because `dict` is not hashable. Concretely:
	  - `ActionType`: hashable (Enum).
	  - `DriftCallAction`: not hashable when `tool_args` holds a `dict`.
	  - `ToolResult`: not hashable (`response` is `dict`).
	  - `DriftEvent`: hashable (all fields are primitive strings/ints).
	  - `GoalSpec`: not hashable (`slots`, `constraints` are dicts).
	  - `DriftCallObservation`: not hashable (contains `GoalSpec` + tuples of unhashable `ToolResult`).
	  - `DriftCallState`: not hashable (contains dicts and unhashable nested types).
	- Equality comparison still works for all of them (it compares field-by-field, and dict equality is value-based). Callers that need a hash key should compute one explicitly, e.g. via `hash(episode_id)` or a canonical JSON dump.
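
A short sketch of these hashing rules follows. `Event` and `Result` are trimmed stand-ins for `DriftEvent` and `ToolResult`, and `canonical_key` is a hypothetical helper showing one way to derive a key from a canonical JSON dump; none of these names exist in `models.py`.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class Event:  # stand-in for DriftEvent: primitive fields only
    turn: int
    domain: str


@dataclass(frozen=True)
class Result:  # stand-in for ToolResult: carries a dict
    tool_name: str
    response: dict[str, Any]


# All-primitive frozen dataclasses hash, so equal instances collapse in a set.
seen = {Event(4, "airline"), Event(4, "airline")}

a = Result("airline.search", {"price": 7200})
b = Result("airline.search", {"price": 7200})

# __hash__ is generated, but calling it fails on the dict field.
try:
    hash(a)
    hashable = True
except TypeError:
    hashable = False


def canonical_key(obj: Any) -> str:
    """Hypothetical helper: stable key from a canonical JSON dump."""
    payload = json.dumps(asdict(obj), sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Two equal `Result` instances compare equal with `==` and produce the same `canonical_key`, even though neither can be passed to `hash()`.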

	### 3.3 State evolution pattern

	Mutation is forbidden; the env core always uses `dataclasses.replace` to emit new states:

	```python
	new_state = replace(state, turn=state.turn + 1, actions=state.actions + (action,))
	```

	For dict fields, the env core builds a fresh dict:

	```python
	new_versions = {**state.schema_versions, "airline": "v2"}
	new_state = replace(state, schema_versions=new_versions)
	```

	This is a convention enforced by code review + tests — the `models.py` module itself cannot prevent an ill-behaved caller from `state.vendor_states["airline"].update(...)`. See §5 for the enforcement strategy.
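
As a self-contained sketch of the pattern, the snippet below uses a trimmed stand-in `State` (an assumption, not the real `DriftCallState`) and two hypothetical helpers, `advance` and `bump_schema`, that mirror the two fragments above.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class State:  # trimmed stand-in for DriftCallState (assumption)
    turn: int
    actions: tuple[str, ...]
    schema_versions: dict[str, str]


def advance(state: State, action: str) -> State:
    """Return a new State with one more action; the input is not touched."""
    return replace(state, turn=state.turn + 1, actions=state.actions + (action,))


def bump_schema(state: State, domain: str, version: str) -> State:
    """Return a new State whose schema_versions is a freshly built dict."""
    return replace(state, schema_versions={**state.schema_versions, domain: version})


s0 = State(turn=0, actions=(), schema_versions={"airline": "v1"})
s1 = advance(s0, "tool_call")
s2 = bump_schema(s1, "airline", "v2")
```

Because `bump_schema` builds a new dict instead of updating the old one, `s0` and `s1` still report `"v1"` after `s2` is created, which is what keeps replay deterministic.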

	### 3.4 Serialization semantics

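	- `models.py` does not define serializers. `env.py` uses `dataclasses.asdict` for outbound JSON (FastAPI boundary), which recurses into nested dataclasses, tuples, and dicts. `asdict` leaves `ActionType` members in place; `json.dumps` then writes them as their `.value` (`"tool_call"`, etc.) because `ActionType` inherits from `str`. Tuples become JSON arrays, so a deserializer must convert them back to `tuple`.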
	- Deserialization (inbound `DriftCallAction` from HTTP) happens in `app.py` with a pydantic model that mirrors the fields — pydantic validates types, then constructs the frozen dataclass. `models.py` is not involved.
	- Required invariant for any serializer: the JSON round-trip `action → json → action` must produce an equal object (`==` returns True) for every valid action.
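
The round-trip invariant can be sketched as follows. This is not the serializer in `env.py` or the pydantic model in `app.py`; `Action`, `to_json`, and `from_json` are hypothetical names, and the enum and dataclass are trimmed copies used only to keep the snippet runnable.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any


class ActionType(str, Enum):  # trimmed copy of the spec's enum
    TOOL_CALL = "tool_call"
    SUBMIT = "submit"


@dataclass(frozen=True)
class Action:  # trimmed stand-in for DriftCallAction (assumption)
    action_type: ActionType
    tool_name: str | None = None
    tool_args: dict[str, Any] | None = None
    confidence: float | None = None


def to_json(action: Action) -> str:
    # asdict keeps the Enum member; json.dumps writes it as its str value.
    return json.dumps(asdict(action), ensure_ascii=False)


def from_json(raw: str) -> Action:
    data = json.loads(raw)
    # The plain string must be turned back into an ActionType member.
    return Action(**{**data, "action_type": ActionType(data["action_type"])})


original = Action(ActionType.TOOL_CALL, "airline.search", {"to": "BLR"})
wire = to_json(original)
restored = from_json(wire)
```

The action dataclass has no tuple fields, so the only conversion needed on the way back is the enum lookup. Types with tuple fields, such as `DriftCallObservation`, would additionally need their JSON lists converted back to tuples for `==` to hold.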

	### 3.5 Field-level invariants

	| Class | Field or condition | Invariant | Enforced by |
	|---|---|---|---|
	| `DriftCallAction` | `action_type == TOOL_CALL` | `tool_name` and `tool_args` both non-None | `env.step` validation |
	| `DriftCallAction` | `action_type == SPEAK` or `CLARIFY` | `message` non-None | `env.step` validation |
	| `DriftCallAction` | `action_type == SUBMIT` | `confidence` in `[0.0, 1.0]` | `env.step` validation + reward pipeline |
	| `DriftCallAction` | `action_type == PROBE_SCHEMA` | `tool_name` ∈ `{"airline", "cab", "restaurant", "hotel", "payment"}` (bare domain name, NOT `domain.verb` format); `tool_args` may be `None` or `{}`; `confidence` must be `None` (probe is not a submit) | `env.step` validation |
	| `DriftCallAction` | `rationale` | len ≤ 200 chars if non-None | `env.step` validation (R4 penalizes excess) |
	| `ToolResult` | `status == "ok"` | `response` non-empty dict | vendor mock contract |
	| `ToolResult` | `schema_version` | matches `^v\d+$` | vendor mock contract |
	| `ToolResult` | `latency_ms` | non-negative int | vendor mock contract |
	| `DriftEvent` | `turn` | `1 ≤ turn ≤ max_turns - 1` (drifts fire strictly before the final turn so the agent has ≥ 1 post-drift turn to react; consistent with DESIGN.md §6.2 stage-2/3 schedules) | drift injector |
	| `DriftEvent` | `from_version != to_version` | always | drift injector + drift pattern library |
	| `GoalSpec` | `language` | one of the 5 Literals | task generator |
	| `DriftCallObservation` | `budget_remaining` | `max_turns - turn`, `≥ 0` | env observation builder |
	| `DriftCallState` | `len(actions) == turn` | always (one action per turn) | env.step |
	| `DriftCallState` | `drift_fired ⊆ drift_schedule` | always (subset, order preserved) | drift injector |
	| `DriftCallState` | `done == True` → no further step allowed | enforced by env | `env.step` raises on terminated state |

	`models.py` itself does NOT enforce any of these — they are contracts validated in `env.py`, `vendors/*.py`, and `drift_injector.py`. `models.py` is purely declarative.
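
Since the checks live outside `models.py`, a sketch of what the `DriftCallAction` rows might look like as code may help. `validate_action` and `is_valid` are hypothetical names, not the actual `env.step` implementation, and the error messages other than `"SUBMIT requires confidence"` are assumptions. The `rationale` length row is left out because the table routes excess length to an R4 penalty.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Any

DOMAINS = frozenset({"airline", "cab", "restaurant", "hotel", "payment"})


class ActionType(str, Enum):
    TOOL_CALL = "tool_call"
    SPEAK = "speak"
    CLARIFY = "clarify"
    PROBE_SCHEMA = "probe_schema"
    SUBMIT = "submit"
    ABORT = "abort"


@dataclass(frozen=True)
class DriftCallAction:  # local copy of the spec's signature
    action_type: ActionType
    tool_name: str | None = None
    tool_args: dict[str, Any] | None = None
    message: str | None = None
    confidence: float | None = None
    rationale: str | None = None


def validate_action(a: DriftCallAction) -> None:
    """Hypothetical validator: raise ValueError on a violated invariant."""
    t = a.action_type
    if t is ActionType.TOOL_CALL and (a.tool_name is None or a.tool_args is None):
        raise ValueError("TOOL_CALL requires tool_name and tool_args")
    if t in (ActionType.SPEAK, ActionType.CLARIFY) and a.message is None:
        raise ValueError(f"{t.value} requires message")
    if t is ActionType.SUBMIT:
        if a.confidence is None:
            raise ValueError("SUBMIT requires confidence")
        if not 0.0 <= a.confidence <= 1.0:
            raise ValueError("confidence must be in [0.0, 1.0]")
    if t is ActionType.PROBE_SCHEMA:
        if a.tool_name not in DOMAINS:
            raise ValueError("PROBE_SCHEMA requires a bare domain name")
        if a.tool_args not in (None, {}):
            raise ValueError("PROBE_SCHEMA takes no tool_args")
        if a.confidence is not None:
            raise ValueError("PROBE_SCHEMA must not carry confidence")


def is_valid(a: DriftCallAction) -> bool:
    """Convenience wrapper: True when validate_action raises nothing."""
    try:
        validate_action(a)
    except ValueError:
        return False
    return True
```

For instance, `DriftCallAction(ActionType.PROBE_SCHEMA, "airline.search")` constructs without error but fails this check, because the probe expects the bare domain `"airline"`.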

	---

	## 4. Data structures

	Every field is documented: name, type, semantic meaning, constraints, and who writes it.

	### 4.1 `ActionType` (Enum)

	`str`-mixed Enum so instances serialize naturally to JSON strings.

	| Value | Meaning | Required companion fields on `DriftCallAction` |
	|---|---|---|
	| `TOOL_CALL` | Invoke a mock vendor tool | `tool_name`, `tool_args` |
	| `SPEAK` | Emit a text reply to the user (TTS at deploy boundary) | `message` |
	| `CLARIFY` | Ask a clarifying question back to the user | `message` |
	| `PROBE_SCHEMA` | Ask the env for the current schema snapshot of a domain | `tool_name` = domain (e.g., `"airline"`) |
	| `SUBMIT` | Declare the task complete and attach `confidence` | `confidence` |
	| `ABORT` | Explicit failure: terminate episode with R1=0 | none |

	### 4.2 `DriftCallAction`

	| Field | Type | Semantic | Constraint | Writer |
	|---|---|---|---|---|
	| `action_type` | `ActionType` | Which action variant is being taken | Required; no default | Agent |
	| `tool_name` | `str \| None` | Fully-qualified tool id (`"airline.search"`) or domain for `PROBE_SCHEMA` | `None` unless `action_type ∈ {TOOL_CALL, PROBE_SCHEMA}`; format `domain.verb` for tool calls | Agent |
	| `tool_args` | `dict[str, Any] \| None` | Keyword arguments for the tool call | `None` unless `action_type == TOOL_CALL`; JSON-serializable; no nested callables | Agent |
	| `message` | `str \| None` | Utterance text (user-language-mirrored) | Non-None when `action_type ∈ {SPEAK, CLARIFY}`; Unicode (Devanagari / Tamil / Kannada welcome); max ~4 KB | Agent |
	| `confidence` | `float \| None` | Self-assessed probability of task success | Required when `action_type == SUBMIT`; `0.0 ≤ c ≤ 1.0`; feeds Brier term in §7.2 | Agent |
	| `rationale` | `str \| None` | Optional chain-of-thought / reasoning | Max 200 chars enforced at step-time; excess → R4 penalty | Agent |

	### 4.3 `ToolResult`

	| Field | Type | Semantic | Constraint | Writer |
	|---|---|---|---|---|
	| `tool_name` | `str` | Echoes the invoked tool | Must equal the triggering `DriftCallAction.tool_name` | Vendor mock |
	| `status` | Literal (5-value) | Outcome classification; drives R2/R5 detection signals | Exactly one of `"ok"`, `"schema_error"`, `"policy_error"`, `"auth_error"`, `"timeout"` | Vendor mock |
	| `response` | `dict[str, Any]` | Raw response body; shape depends on tool + current schema version | JSON-serializable; on non-ok status, contains `error_code` key | Vendor mock |
	| `schema_version` | `str` | Version stamp of the schema used to serialize `response` | Matches `^v\d+$`; currently `"v1"`, `"v2"`, `"v3"` | Vendor mock |
	| `latency_ms` | `int` | Simulated latency (deterministic per seed) | `≥ 0`; typical 50–400 ms; `timeout` status → 5000+ | Vendor mock |
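
The constraint column above could be checked in a vendor test roughly as follows. `contract_violations` is a hypothetical helper, not part of the vendor mocks, and `status` is typed as plain `str` here only to keep the sketch short.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass
from typing import Any

SCHEMA_VERSION_RE = re.compile(r"^v\d+$")
STATUSES = ("ok", "schema_error", "policy_error", "auth_error", "timeout")


@dataclass(frozen=True)
class ToolResult:  # local copy; the spec types status as a 5-value Literal
    tool_name: str
    status: str
    response: dict[str, Any]
    schema_version: str
    latency_ms: int


def contract_violations(r: ToolResult) -> list[str]:
    """Hypothetical checker: list every vendor-contract rule that r breaks."""
    problems: list[str] = []
    if r.status not in STATUSES:
        problems.append("unknown status")
    if r.status == "ok" and not r.response:
        problems.append("ok status requires a non-empty response")
    if r.status != "ok" and "error_code" not in r.response:
        problems.append("non-ok status requires response['error_code']")
    if not SCHEMA_VERSION_RE.fullmatch(r.schema_version):
        problems.append("schema_version must match ^v\\d+$")
    if r.latency_ms < 0:
        problems.append("latency_ms must be non-negative")
    try:
        json.dumps(r.response)
    except (TypeError, ValueError):
        problems.append("response is not JSON-serializable")
    return problems


good = ToolResult("airline.search", "ok", {"results": []}, "v1", 142)
bad = ToolResult("airline.search", "timeout", {}, "2", -1)
```

Returning a list rather than raising on the first failure lets one test report every broken rule for a given result at once.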

	### 4.4 `DriftEvent`

	| Field | Type | Semantic | Constraint | Writer |
	|---|---|---|---|---|
	| `turn` | `int` | Turn at which the drift fires (start-of-turn, before action evaluation, DESIGN.md §6.2) | `1 ≤ turn ≤ max_turns - 1` | Drift injector |
	| `drift_type` | Literal (5-value) | Taxonomy (DESIGN.md §6.1) | Exactly one of `"schema"`, `"policy"`, `"tnc"`, `"pricing"`, `"auth"` | Drift injector |
	| `domain` | `str` | Target vendor | One of `"airline"`, `"cab"`, `"restaurant"`, `"hotel"`, `"payment"` | Drift injector |
	| `description` | `str` | Human-readable, used by R2 keyword match | Non-empty; ≤ 256 chars; includes drifted field name where applicable | Drift injector |
	| `from_version` | `str` | Schema version before drift | Matches `^v\d+$`; must differ from `to_version` | Drift injector |
	| `to_version` | `str` | Schema version after drift | Matches `^v\d+$` | Drift injector |
	| `pattern_id` | `str` | Registry key for the drift pattern | Must match a key in the `drift_injector` catalogue | Drift injector |

	### 4.5 `GoalSpec`

	| Field | Type | Semantic | Constraint | Writer |
	|---|---|---|---|---|
	| `domain` | `str` | Primary vendor domain for this goal | One of the 4 consumer domains (`payment` is transversal, not a goal domain) | Task generator |
	| `intent` | `str` | Intent id (e.g., `"book_flight"`, `"order_food"`) | From a closed set defined in `task_generator.md` | Task generator |
	| `slots` | `dict[str, Any]` | Parsed required + optional slots (`{"from": "HYD", "to": "BLR", "when": "2026-04-30"}`) | All values JSON-serializable primitives; keys are string slot names | Task generator |
	| `constraints` | `dict[str, Any]` | Budget / time window / dietary / etc. | Keys drawn from constraint vocabulary documented in `rewards.md`; values JSON primitives | Task generator |
	| `language` | Literal (5-value) | Target language for the brief and expected reply mirror | Exactly one of `"hi"`, `"ta"`, `"kn"`, `"en"`, `"hinglish"` | Task generator |
	| `seed_utterance` | `str` | The raw user utterance (text-form even in training) | Non-empty Unicode; no PII | Task generator |

	### 4.6 `DriftCallObservation`

	The agent-facing view. Must never leak internal state beyond what DESIGN.md §4.3 defines.

	| Field | Type | Semantic | Constraint | Writer |
	|---|---|---|---|---|
	| `turn` | `int` | Current turn (0 at reset, incremented at step start) | `0 ≤ turn ≤ max_turns` | Env observation builder |
	| `goal` | `GoalSpec` | Immutable goal for the episode, copied by reference from state | Identical `GoalSpec` across all observations in one episode | Env observation builder |
	| `last_transcript` | `str` | Most recent user utterance in text form (post-ASR in deploy, as-authored in training) | Empty string on turn 0 if no prior utterance | Env observation builder |
	| `last_lang` | `str` | Language detected from last utterance | One of the 5 language literals or `""` on turn 0 | Env observation builder |
	| `last_confidence` | `float` | ASR confidence for `last_transcript` (1.0 in training) | `0.0 ≤ c ≤ 1.0` | Env observation builder |
	| `tool_results` | `tuple[ToolResult, ...]` | Full history of tool results this episode | Order = chronological; length grows by ≤ 1 per turn | Env observation builder |
	| `drift_log` | `tuple[DriftEvent, ...]` | Drifts that HAVE fired (subset of schedule, order preserved) | Monotonically growing across turns | Env observation builder |
	| `budget_remaining` | `int` | `max_turns - turn` | `≥ 0` | Env observation builder |
	| `available_tools` | `tuple[str, ...]` | Fully-qualified tool ids the agent may call this turn | Stable within an episode (auth-drifted tools still listed but will return `auth_error`) | Env observation builder |

	### 4.7 `DriftCallState`

	Env-internal authoritative state. The agent never sees this directly.

	| Field | Type | Semantic | Constraint | Writer |
	|---|---|---|---|---|
	| `episode_id` | `str` | Opaque unique id (e.g., `"ep_000123"` or a UUID4) | Unique per session; stable across one episode | Env (at reset) |
	| `goal` | `GoalSpec` | Same object as observation.goal | Never changes within an episode | Env (at reset) |
	| `vendor_states` | `dict[str, dict[str, Any]]` | Mutable mock DBs keyed by domain (`{"airline": {...}, ...}`) | Top-level keys = 5 domains; inner shape defined per-vendor in `vendors.md` | Env + vendor mocks (via `replace`) |
	| `schema_versions` | `dict[str, str]` | Current schema version per domain | Keys = 5 domains; values `^v\d+$`; monotonically advanced by drift injector | Drift injector |
	| `drift_schedule` | `tuple[DriftEvent, ...]` | Pre-computed schedule sampled at reset (DESIGN.md §6.2) | Sorted by `turn` ascending; length 0 / 1 / 2 for curriculum stages 1 / 2 / 3 | Drift injector (at reset) |
	| `drift_fired` | `tuple[DriftEvent, ...]` | Drifts that have already fired | Prefix of `drift_schedule` by turn order | Drift injector |
	| `turn` | `int` | Current turn | `0 ≤ turn ≤ max_turns`; starts at 0, increments in `step` | Env |
	| `max_turns` | `int` | Turn budget (8 / 12 / 16 per curriculum stage, DESIGN.md §4.5) | `> 0`; stable across episode | Env (at reset) |
	| `actions` | `tuple[DriftCallAction, ...]` | Full agent action history | `len == turn` invariant | Env |
	| `done` | `bool` | Terminal flag | `False` until SUBMIT / ABORT / timeout / R5 corruption | Env |
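
Several of the state constraints above are cross-field, so a per-step property assertion is a natural place to check them. The sketch below is one possible shape: `Drift` and `State` are trimmed stand-ins and `state_violations` is a hypothetical helper, not code from `tests/test_env.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:  # trimmed stand-in for DriftEvent (assumption)
    turn: int
    domain: str


@dataclass(frozen=True)
class State:  # trimmed stand-in for DriftCallState (assumption)
    turn: int
    max_turns: int
    actions: tuple[str, ...]
    drift_schedule: tuple[Drift, ...]
    drift_fired: tuple[Drift, ...]


def state_violations(s: State) -> list[str]:
    """Hypothetical checker for the cross-field state constraints."""
    problems: list[str] = []
    if not 0 <= s.turn <= s.max_turns:
        problems.append("turn out of range")
    if len(s.actions) != s.turn:
        problems.append("len(actions) != turn")
    turns = [d.turn for d in s.drift_schedule]
    if turns != sorted(turns):
        problems.append("drift_schedule not sorted by turn")
    if any(not 1 <= t <= s.max_turns - 1 for t in turns):
        problems.append("drift turn outside [1, max_turns - 1]")
    # drift_fired must be a leading slice of the schedule, in order.
    if s.drift_fired != s.drift_schedule[: len(s.drift_fired)]:
        problems.append("drift_fired is not a prefix of drift_schedule")
    return problems


schedule = (Drift(4, "airline"), Drift(7, "cab"))
ok_state = State(2, 12, ("speak", "tool_call"), schedule, ())
bad_state = State(1, 12, (), schedule, (Drift(7, "cab"),))
```

`bad_state` trips two rules at once: it records turn 1 with no actions, and it marks the second scheduled drift as fired before the first.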

	---

	## 5. Error modes

	`models.py` itself has effectively zero runtime error surface — it only declares frozen dataclasses. Errors arise in the following situations:

	| Situation | Exception | Where raised |
	|---|---|---|
	| Caller assigns to a frozen field, e.g. `action.tool_name = "x"` | `dataclasses.FrozenInstanceError` | Python runtime (automatic) |
	| Caller constructs `DriftCallAction` with wrong-typed `action_type` (not an `ActionType`) | No exception at construction (dataclasses do not check field types at runtime); mypy catches it statically | mypy (static analysis) |
	| Caller constructs a dataclass missing a required field | `TypeError: __init__() missing N required positional arguments` | Python runtime (automatic) |
	| Caller passes `language` outside the 5-value Literal | Not enforced at runtime by Python; the value is accepted. mypy `--strict` catches it. `env.step` validates and raises `ValueError` for HTTP callers. | env.py / app.py validation layer |
	| Caller passes a non-JSON-serializable object (e.g., `set`) inside `tool_args` | No error at construction; `dataclasses.asdict` also succeeds (it deep-copies the value); `json.dumps` later raises `TypeError` | Serialization layer (env.py / app.py) |
	| Vendor mock returns a non-JSON-serializable value (e.g., `set`, `bytes`, a custom class instance) inside `ToolResult.response` | No error at `ToolResult` construction; `TypeError` raised later at the FastAPI/JSON serialization boundary (`json.dumps` in `env.py` / `app.py`). Mitigation: every vendor test (`tests/test_vendors.py`) must assert JSON round-trip safety (`json.loads(json.dumps(result.response))`) for every `ToolResult.response` the vendor can produce. | Serialization layer (env.py / app.py); detection NOT at construction time |
	| Caller attempts to hash `DriftCallAction` / `ToolResult` / `GoalSpec` / `DriftCallObservation` / `DriftCallState` | `TypeError: unhashable type` (because of nested `dict`) | Python runtime when the hash is attempted |
	| Caller mutates a `dict` field in place (e.g., `state.vendor_states["airline"]["x"] = 1`) | No exception; this is a convention violation. Enforcement is review + tests (a property test snapshots `state.vendor_states` before each step and diffs after; any unintended mutation fails the test). | Enforced by `tests/test_env.py` property tests, not by `models.py` |

	Partial-data behavior: no dataclass in this module supports "partial" construction. Every required field must be provided, or `__init__` raises. Optional fields on `DriftCallAction` default to `None`. There is no `from_partial_dict` helper — callers build actions explicitly.
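
Two of the rows above are easy to misread, so here is a small sketch of what Python does at runtime. `Goal` is a trimmed stand-in for `GoalSpec`, and `language_is_valid` is a hypothetical check that a validation layer might use; neither is part of `models.py`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal, get_args

Language = Literal["hi", "ta", "kn", "en", "hinglish"]


@dataclass(frozen=True)
class Goal:  # trimmed stand-in for GoalSpec (assumption)
    domain: str
    language: Language


# Literal is a static hint only: construction accepts any value at runtime.
odd = Goal(domain="airline", language="fr")  # type: ignore[arg-type]


def language_is_valid(goal: Goal) -> bool:
    """Hypothetical runtime check derived from the Literal's own values."""
    return goal.language in get_args(Language)


# A missing required field, by contrast, fails immediately with TypeError.
try:
    Goal(domain="airline")  # type: ignore[call-arg]
    missing_field_rejected = False
except TypeError:
    missing_field_rejected = True
```

Deriving the allowed values with `typing.get_args` keeps the runtime check and the type annotation from drifting apart.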

	---

	## 6. Dependencies

	### 6.1 Stdlib only

	`models.py` imports only:

	- `__future__.annotations` — PEP 563 deferred-evaluation annotations (mandatory per CLAUDE.md §4.2)
	- `dataclasses.dataclass`, `dataclasses.field` — for frozen dataclass declarations
	- `enum.Enum` — for `ActionType`
	- `typing.Any`, `typing.Literal` — for dict and enumerated-string fields

	No third-party imports. No pydantic, no attrs, no msgspec. This keeps `models.py` importable inside the Unsloth training loop, the FastAPI server, and the reward suite with zero install cost.

	### 6.2 Downstream consumers (who imports `models.py`)

	| Consumer module | Uses |
	|---|---|
	| `driftcall/env.py` | All 7 exported types (the 6 dataclasses + `ActionType`). Central composition point. |
	| `driftcall/rewards.py` | `DriftCallState`, `DriftCallAction`, `DriftEvent`, `ToolResult`, `ActionType`, `GoalSpec`; reads the episode trail to compute R1–R5. |
	| `driftcall/drift_injector.py` | `DriftEvent`, `DriftCallState`; emits events, returns new state. |
	| `driftcall/vendors/*.py` (5 files) | `ToolResult`. Each vendor returns `ToolResult` from its tool handlers. |
	| `driftcall/task_generator.py` | `GoalSpec`; procedurally samples goals. |
	| `driftcall/audio/asr_whisper.py` | None directly; it produces raw transcript+lang+confidence which `env.py` embeds into `DriftCallObservation`. |
	| `driftcall/audio/tts_kokoro.py` | `DriftCallAction` (reads `.message` field for TTS synthesis). |
	| `app.py` (FastAPI) | All 7 exported types for request/response (de)serialization via companion pydantic models. |
	| `training/train_grpo.py` | `DriftCallObservation`, `DriftCallAction`, `ActionType`; builds prompts + parses completions. |
	| `training/eval_baseline.py`, `training/eval_final.py` | Same as training. |
	| `demo/app_gradio.py` | `DriftCallObservation`, `DriftCallAction`, `ActionType`; drives the Gradio trace panel. |
	| `tests/test_*.py` | All 7 exported types for fixture construction. |

	### 6.3 Upstream dependencies of `models.py`

	None. `models.py` is a leaf — the graph flows outward from it. This is deliberate: making it dependency-free means it can be imported in the trainer process without dragging in FastAPI, whisper, or vendor mocks.

	---

	## 7. Edge cases

	Numbered edge cases with expected behavior. These are the cases the test plan (`docs/tests/models_tests.md`) must cover.

	1. Empty `tool_results` / `drift_log` at turn 0. `DriftCallObservation` constructed at `reset()` has `tool_results=()` and `drift_log=()`. Both must type-check as empty tuples, not `None`. Tests must assert `isinstance(obs.tool_results, tuple)` and `len(obs.tool_results) == 0`. Downstream code iterating with `for r in obs.tool_results:` works correctly on empty.

	2. `None` vs empty string on `last_transcript` / `last_lang`. At turn 0, before any user utterance has been processed, `last_transcript=""` and `last_lang=""` (empty strings), NOT `None`. This keeps the field non-nullable (typing simpler for the agent) and makes `len(last_transcript) == 0` a clean "no-utterance" check. `last_confidence=1.0` at turn 0 (treated as "authored", perfect-ASR placeholder).

	3. Unicode slots in `GoalSpec.seed_utterance` and `DriftCallAction.message`. Hindi (Devanagari), Tamil, Kannada, and mixed Hinglish strings must round-trip through `dataclasses.asdict` → `json.dumps(ensure_ascii=False)` → `json.loads` → construction unchanged. Tests must cover at least: `"मुझे कल दिल्ली जाना है"`, `"{when} அன்று விமானம்"`, `"{when} inda {to} ge"`, `"Bhai Friday ko Bangalore jaana hai"`.

	4. `tool_args` with nested dicts and lists. Agents may pass `{"filters": {"class": ["economy", "premium"], "max_stops": 1}}`. Nested structures must survive JSON round-trip. Non-JSON-serializable values (e.g., `set`, `datetime`) are rejected by `env.step` validation, not by the dataclass itself — this edge case is about documenting that `models.py` does NOT validate, so callers must.

	5. Large `drift_log` and action history. Stage 3 episodes allow up to 2 drifts, so `drift_log` length is `≤ 2` in practice. However, `DriftCallState.actions` can grow to `max_turns = 16` entries. The observation builder must NOT truncate; full history is always included because R1–R5 need it at submit time. The serializer must handle a 16-action tuple without pathological blowup (sanity check: < 64 KB JSON per typical observation).

	6. `DriftCallAction` with `action_type=SUBMIT` and `confidence=None`. Construction succeeds (no runtime check). `env.step` validation must reject with `ValueError("SUBMIT requires confidence")`. Documenting this here because the dataclass's loose default (`confidence: float \| None = None`) could mislead a reader into thinking SUBMIT without confidence is valid.

	7. `ToolResult.status != "ok"` with an empty `response` dict. Accepted by the dataclass, but a vendor-contract violation. Vendor mocks returning `schema_error` / `policy_error` / `auth_error` / `timeout` MUST populate `response` with at least `{"error_code": "<CODE>"}` so R2 can keyword-match. An empty `response={}` therefore constructs without error and is caught as a bug in `tests/test_vendors.py`.

	8. `DriftEvent.from_version == to_version`. The Drift Injector MUST NOT emit such events; the drift pattern library rejects them at load. `models.py` does not enforce — but see §3.5 invariants. Tested by `drift_injector` tests, not here.

	9. Constructing `DriftCallState` with `len(actions) != turn`. `models.py` accepts this silently. It is a severe env-core bug if it happens; tested by a property assertion in `tests/test_env.py` on every step. Documented here so critics know the invariant's home.

	10. `available_tools` on an `auth_error`-drifted tool. Still listed. Agents should attempt calls to discover the auth drift; removing the tool would leak the drift. This is a critical design decision — documented in the constraint column of §4.6 and repeated here for visibility.
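
Edge cases 3 and 4 lend themselves to a direct round-trip check. The sketch below uses a trimmed stand-in `Action` and two hypothetical helpers, `round_trip` and `is_json_safe`; it shows the shape such a test might take and is not the contents of `docs/tests/models_tests.md`.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class Action:  # trimmed stand-in for DriftCallAction (assumption)
    message: str | None = None
    tool_args: dict[str, Any] | None = None


def round_trip(action: Action) -> Action:
    """asdict -> json.dumps(ensure_ascii=False) -> json.loads -> construct."""
    wire = json.dumps(asdict(action), ensure_ascii=False)
    return Action(**json.loads(wire))


SAMPLES = (
    "मुझे कल दिल्ली जाना है",
    "{when} அன்று விமானம்",
    "{when} inda {to} ge",
    "Bhai Friday ko Bangalore jaana hai",
)

# Edge case 3: every sample string survives the round trip unchanged.
unicode_ok = all(round_trip(Action(message=s)) == Action(message=s) for s in SAMPLES)

# Edge case 4: nested dicts and lists survive as well.
nested = Action(tool_args={"filters": {"class": ["economy", "premium"], "max_stops": 1}})
nested_ok = round_trip(nested) == nested


def is_json_safe(value: Any) -> bool:
    """Hypothetical guard: True if json.dumps accepts the value."""
    try:
        json.dumps(value)
    except (TypeError, ValueError):
        return False
    return True
```

A guard like `is_json_safe` is one way a caller could reject `set` or `datetime` values before they reach the serialization boundary, since the dataclass itself accepts them.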

	---

	## 8. Examples

	### 8.1 Constructing a valid `TOOL_CALL` action

	```python
	from __future__ import annotations

	from driftcall.models import ActionType, DriftCallAction

	action = DriftCallAction(
	    action_type=ActionType.TOOL_CALL,
	    tool_name="airline.search",
	    tool_args={
	        "from": "HYD",
	        "to": "BLR",
	        "date": "2026-04-25",
	        "max_price_inr": 8000,
	        "time_window": "evening",
	    },
	    rationale="User asked for cheapest evening flight under 8000",
	)

	assert action.action_type is ActionType.TOOL_CALL
	assert action.tool_name == "airline.search"
	assert action.confidence is None  # not a SUBMIT
	```

	### 8.2 Constructing a `ToolResult` after a successful search

	```python
	from driftcall.models import ToolResult

	result = ToolResult(
	    tool_name="airline.search",
	    status="ok",
	    response={
	        "results": [
	            {
	                "flight_id": "6E-2345",
	                "from": "HYD",
	                "to": "BLR",
	                "depart": "2026-04-25T18:30:00+05:30",
	                "price": 7200,
	                "currency": "INR",
	                "seats_left": 14,
	            }
	        ]
	    },
	    schema_version="v1",
	    latency_ms=142,
	)

	assert result.status == "ok"
	assert result.response["results"][0]["price"] == 7200
	```

	### 8.3 Constructing a complete `DriftCallObservation` at turn 0 (reset)

	```python
	from driftcall.models import DriftCallObservation, GoalSpec

	goal = GoalSpec(
	    domain="airline",
	    intent="book_flight",
	    slots={"from": "HYD", "to": "BLR", "when": "2026-04-25"},
	    constraints={"budget_inr": 8000, "time_window": "evening"},
	    language="hinglish",
	    seed_utterance="Bhai Friday ko Bangalore jaana hai, 8000 rupees max, 6pm ke baad",
	)

	obs = DriftCallObservation(
	    turn=0,
	    goal=goal,
	    last_transcript="",
	    last_lang="",
	    last_confidence=1.0,
	    tool_results=(),
	    drift_log=(),
	    budget_remaining=12,  # Stage 2: max_turns = 12
	    available_tools=(
	        "airline.search",
	        "airline.book",
	        "airline.cancel",
	        "airline.get_booking",
	        "payment.charge",
	    ),
	)

	assert obs.turn == 0
	assert obs.goal.language == "hinglish"
	assert len(obs.tool_results) == 0
	```

	### 8.4 Constructing a `DriftEvent` and appending it to state via `replace`

	```python
	from dataclasses import replace

	from driftcall.models import DriftCallState, DriftEvent

	drift = DriftEvent(
	    turn=4,
	    drift_type="schema",
	    domain="airline",
	    description="field 'price' renamed to 'total_fare_inr'; 'currency' removed",
	    from_version="v1",
	    to_version="v2",
	    # Required field; this value is illustrative, use a real key from the
	    # drift_injector catalogue.
	    pattern_id="airline_price_rename",
	)

	# `state` is the env core's current DriftCallState (e.g. from a test fixture);
	# its construction is omitted here.
	new_state = replace(
	    state,
	    drift_fired=state.drift_fired + (drift,),
	    schema_versions={**state.schema_versions, "airline": "v2"},
	)

	assert drift in new_state.drift_fired
	assert new_state.schema_versions["airline"] == "v2"
	# Original state untouched:
	assert drift not in state.drift_fired
	```

	---

	## 9. Open questions

	None — spec is complete. Every dataclass, field, type, and invariant is locked against DESIGN.md §4.1. No ambiguity remains that would block downstream modules (`env.md`, `rewards.md`, `vendors.md`, `drift_injector.md`, `task_generator.md`) from referencing this doc as a stable contract.