models_tests.md — Test Plan for `driftcall/models.py`

Owner: Person B (Rewards & Tests)
Target module: DRIFTCALL/docs/modules/models.md (sealed)
Implements coverage for: DESIGN.md §4.1 (Dataclasses), §4.2 (reset), §4.3 (step), §4.4 (termination), §7.1 (Reward inputs)
Frameworks: pytest, hypothesis
Status: DRAFT, pending ≥ 1 fresh critic round (test-plan gate is lighter per CLAUDE.md §3.2 Batch D4)

0. Scope & Non-goals

models.py is pure shape — frozen dataclasses + one Enum. There is no runtime behavior to integration-test. This plan therefore concentrates on:

Constructibility — every dataclass can be instantiated with valid inputs and rejects malformed required fields.
Immutability — every frozen guarantee raises FrozenInstanceError on mutation attempt.
Type shape — tuples stay tuples, dicts stay dicts, Optionals default to None, Literals accept documented values.
Hashability boundaries — the classes the doc declares hashable hash; the classes the doc declares unhashable raise TypeError.
Round-trip equality — dataclasses.asdict + JSON encode + decode + reconstruct produces an equal object (the invariant in models.md §3.4).
Invariant witnessing — properties from models.md §3.5 that can be checked by construction are asserted via hypothesis. Invariants enforced elsewhere (e.g. env.step validation) are NOT retested here; they are documented as N/A with a pointer to env_tests.md.

Every test below maps to one numbered clause in docs/modules/models.md. Clause references are embedded in the test docstring as models.md §X.Y / models.md §7 edge N.
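As an illustration of the hashability boundary and the docstring clause convention described above, here is a minimal sketch. The two dataclasses are hypothetical stand-ins (not the real `driftcall.models` classes), and plain `try`/`except` is used in place of `pytest.raises` so the snippet runs with the standard library alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimitiveOnly:
    """Hypothetical stand-in: every field is a hashable primitive."""
    turn: int
    domain: str


@dataclass(frozen=True)
class CarriesDict:
    """Hypothetical stand-in: one field holds a dict."""
    name: str
    args: dict


def test_primitive_only_is_hashable() -> None:
    """models.md §3.2 bullet 1 (clause reference lives in the docstring)."""
    event = PrimitiveOnly(turn=3, domain="airline")
    assert isinstance(hash(event), int)
    assert {event, event} == {event}  # a set de-duplicates equal instances


def test_dict_field_makes_instance_unhashable() -> None:
    """models.md §3.2 bullet 2."""
    action = CarriesDict(name="airline.search", args={"from": "HYD"})
    try:
        hash(action)  # the generated __hash__ hashes the fields, including the dict
    except TypeError as exc:
        assert "unhashable" in str(exc)
    else:
        raise AssertionError("expected TypeError for a dict-bearing dataclass")
```

The frozen dataclass still defines `__hash__` in both cases; the failure only surfaces when the generated method reaches the dict field.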

1. Unit tests

All unit tests live in DRIFTCALL/tests/test_models.py. Import line under test:

from driftcall.models import (
    ActionType,
    DriftCallAction,
    ToolResult,
    DriftEvent,
    GoalSpec,
    DriftCallObservation,
    DriftCallState,
)

Fixtures (§5) are provided by tests/conftest.py; pytest injects them by parameter name, so test files do not import them explicitly.

1.1 `ActionType` (Enum)

#	Test name	Asserts	Maps to
U1	`test_action_type_members_exactly_six`	`set(ActionType) == {TOOL_CALL, SPEAK, CLARIFY, PROBE_SCHEMA, SUBMIT, ABORT}`; `len(ActionType) == 6`.	models.md §4.1
U2	`test_action_type_is_str_subclass`	`isinstance(ActionType.TOOL_CALL, str)`; `ActionType.TOOL_CALL == "tool_call"`; `json.dumps({"t": ActionType.SPEAK}) == '{"t": "speak"}'` (a str-mixin Enum serializes to its `.value`).	models.md §3.4, §4.1
U3	`test_action_type_values_match_spec`	`ActionType.TOOL_CALL.value == "tool_call"` and the other five values equal exactly the lowercase strings listed in the spec table.	models.md §4.1
U4	`test_action_type_is_hashable`	`hash(ActionType.TOOL_CALL)` returns an int; `{ActionType.SPEAK, ActionType.SPEAK} == {ActionType.SPEAK}` (set de-dupes).	models.md §3.2 bullet 1
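The four checks in this table can be sketched against a stand-in enum. The member names come from U1 and the `"tool_call"` and `"speak"` values from U2/U3; the other four values are assumed to follow the same lowercase pattern and should be checked against the spec table.

```python
import json
from enum import Enum


class ActionType(str, Enum):
    """Stand-in for driftcall.models.ActionType; four of the values are assumed."""
    TOOL_CALL = "tool_call"
    SPEAK = "speak"
    CLARIFY = "clarify"            # assumed value
    PROBE_SCHEMA = "probe_schema"  # assumed value
    SUBMIT = "submit"              # assumed value
    ABORT = "abort"                # assumed value


# U1: exactly six members.
assert len(ActionType) == 6

# U2: the str mixin makes members real strings, so json.dumps emits the value.
assert isinstance(ActionType.TOOL_CALL, str)
assert ActionType.TOOL_CALL == "tool_call"
assert json.dumps({"t": ActionType.SPEAK}) == '{"t": "speak"}'

# U3: lookup by value returns the singleton member.
assert ActionType("tool_call") is ActionType.TOOL_CALL

# U4: members are hashable and de-duplicate in a set.
assert {ActionType.SPEAK, ActionType.SPEAK} == {ActionType.SPEAK}
```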

1.2 `DriftCallAction`

#	Test name	Asserts	Maps to
U5	`test_driftcall_action_happy_tool_call`	Build a `TOOL_CALL` action from `valid_tool_call_action()` fixture; assert all six fields carry the values passed in; `action.confidence is None`; `action.message is None`.	models.md §4.2, §8.1
U6	`test_driftcall_action_happy_submit`	Build a `SUBMIT` action with `confidence=0.87`; assert `action.action_type is ActionType.SUBMIT`, `action.confidence == 0.87`, and `tool_name / tool_args / message` are all `None`.	models.md §4.2, §3.5 SUBMIT row
U7	`test_driftcall_action_happy_speak`	Build a `SPEAK` action with Unicode `message="मुझे कल दिल्ली जाना है"`; assert `action.message` round-trips byte-for-byte.	models.md §7 edge 3
U8	`test_driftcall_action_frozen_mutation_raises`	Construct any action; assert `action.tool_name = "x"` raises `dataclasses.FrozenInstanceError`; same for every one of the 6 fields. Parametrized over all 6 field names.	models.md §3.1, §5 row 1
U9	`test_driftcall_action_defaults_are_none`	`DriftCallAction(action_type=ActionType.ABORT)` succeeds; every optional field defaults to `None`.	models.md §2 (dataclass signature), §4.2
U10	`test_driftcall_action_missing_required_raises`	`DriftCallAction()` raises `TypeError` (missing `action_type`).	models.md §5 row 3
U11	`test_driftcall_action_equality_value_based`	Two actions built with identical fields compare `==`; changing any single field makes them `!=`. Covers all 6 fields via parametrize.	models.md §3.2
U12	`test_driftcall_action_unhashable_due_to_dict`	`hash(valid_tool_call_action())` raises `TypeError: unhashable type: 'dict'`.	models.md §3.2 bullet 2, §5 row 7
U13	`test_driftcall_action_confidence_none_vs_zero`	`DriftCallAction(action_type=SUBMIT, confidence=0.0) != DriftCallAction(action_type=SUBMIT, confidence=None)`; `0.0` is a real low confidence, `None` means absent. Proves the `Optional[float]` semantics are preserved by the dataclass (i.e. `None` is a distinguishable value, not coerced to `0.0`).	models.md §4.2, §7 edge 6
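A rough sketch of how U8, U10, and U13 might look follows. `Action` is a hypothetical stand-in whose six field names are taken from this plan; the field types are assumptions. In the real suite the loop would more likely be a `pytest.mark.parametrize` over field names.

```python
from dataclasses import FrozenInstanceError, dataclass, fields
from typing import Any, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical stand-in for DriftCallAction; field types are assumed."""
    action_type: str
    tool_name: Optional[str] = None
    tool_args: Optional[dict] = None
    message: Optional[str] = None
    confidence: Optional[float] = None
    rationale: Optional[str] = None


def count_rejected_assignments(instance: Any) -> int:
    """Attempt to assign every field; return how many attempts were rejected."""
    rejected = 0
    for f in fields(instance):
        try:
            setattr(instance, f.name, "x")
        except FrozenInstanceError:
            rejected += 1
    return rejected


# U8: all six fields refuse assignment.
assert count_rejected_assignments(Action(action_type="abort")) == 6

# U10: omitting the only required field raises TypeError.
try:
    Action()  # type: ignore[call-arg]
except TypeError:
    pass
else:
    raise AssertionError("expected TypeError for missing action_type")

# U13: None (absent) stays distinct from 0.0 (a real, low confidence).
assert Action("submit", confidence=0.0) != Action("submit", confidence=None)
```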

1.3 `ToolResult`

#	Test name	Asserts	Maps to
U14	`test_tool_result_happy_ok`	Build from `valid_tool_result(status="ok")`; fields round-trip; `result.status == "ok"`; `result.response["results"]` is a list.	models.md §4.3, §8.2
U15	`test_tool_result_frozen_mutation_raises`	`result.status = "timeout"` raises `FrozenInstanceError`; parametrized over all 5 fields.	models.md §3.1, §5 row 1
U16	`test_tool_result_accepts_all_five_statuses`	Parametrized over `["ok", "schema_error", "policy_error", "auth_error", "timeout"]`; all construct successfully.	models.md §4.3 status row
U17	`test_tool_result_empty_response_on_non_ok`	`valid_tool_result(status="schema_error", response={})` constructs without error (models.py does not validate; vendor-contract bug detection lives in `test_vendors.py`).	models.md §7 edge 7, §5 row 6
U18	`test_tool_result_unhashable_due_to_dict`	`hash(valid_tool_result())` raises `TypeError`.	models.md §3.2 bullet 2
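U16 and U17 rely on the fact that a dataclass performs no runtime validation. The sketch below assumes `status` is annotated with `typing.Literal` (this plan says Literals accept documented values, but the exact annotation is an assumption) and uses a reduced, hypothetical `Result` class.

```python
from dataclasses import dataclass
from typing import Any, Literal, get_args

# Assumed annotation; the five values are the ones listed in U16.
Status = Literal["ok", "schema_error", "policy_error", "auth_error", "timeout"]


@dataclass(frozen=True)
class Result:
    """Hypothetical two-field stand-in for ToolResult."""
    status: Status
    response: dict[str, Any]


# U16: every documented status constructs.
for status in get_args(Status):
    assert Result(status=status, response={}).status == status

# U17: an empty response on a non-ok status is accepted, because nothing validates it.
assert Result(status="schema_error", response={}).response == {}

# Literal is a static-typing hint only, so even an undocumented value
# constructs at runtime; catching it is the job of mypy or a validating caller.
bogus = Result(status="teapot", response={})  # type: ignore[arg-type]
assert bogus.status == "teapot"
```

This is why the plan routes vendor-contract checks to `test_vendors.py` instead of expecting `models.py` to reject malformed values.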

1.4 `DriftEvent`

#	Test name	Asserts	Maps to
U19	`test_drift_event_happy_schema`	Build from `valid_drift_event(turn=3, domain="airline")`; all six fields match input; `from_version="v1"`, `to_version="v2"`.	models.md §4.4, §8.4
U20	`test_drift_event_frozen_mutation_raises`	`drift.turn = 99` raises `FrozenInstanceError`; parametrized over all 6 fields.	models.md §3.1
U21	`test_drift_event_is_hashable`	`hash(valid_drift_event())` returns an int; `{drift, drift} == {drift}`. Confirms primitive-only fields keep `DriftEvent` hashable (models.md §3.2 bullet 1 asserts this).	models.md §3.2
U22	`test_drift_event_accepts_all_five_drift_types`	Parametrized over `["schema", "policy", "tnc", "pricing", "auth"]`; all construct successfully.	models.md §4.4 drift_type row

1.5 `GoalSpec`

#	Test name	Asserts	Maps to
U23	`test_goal_spec_happy_hinglish`	Build from `valid_goal_spec()`; fields round-trip; `goal.language == "hinglish"`; `goal.slots["from"] == "HYD"`.	models.md §4.5, §8.3
U24	`test_goal_spec_accepts_all_five_languages`	Parametrized over `["hi", "ta", "kn", "en", "hinglish"]`; all construct successfully.	models.md §4.5 language row
U25	`test_goal_spec_frozen_mutation_raises`	`goal.intent = "other"` raises `FrozenInstanceError`.	models.md §3.1
U26	`test_goal_spec_unhashable_due_to_dict`	`hash(valid_goal_spec())` raises `TypeError`.	models.md §3.2 bullet 2
U27	`test_goal_spec_unicode_seed_utterance`	`seed_utterance="{when} அன்று விமானம்"` (Tamil) survives `json.dumps(..., ensure_ascii=False)` + `json.loads` + reconstruction; `reconstructed == original`.	models.md §7 edge 3
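The Unicode round trip in U27 can be sketched as below. `Goal` is a hypothetical three-field stand-in for `GoalSpec`; the Tamil string is the one quoted in the table.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Goal:
    """Hypothetical reduced stand-in for GoalSpec."""
    language: str
    seed_utterance: str
    slots: dict


original = Goal(language="ta", seed_utterance="{when} அன்று விமானம்", slots={"from": "HYD"})

# ensure_ascii=False keeps the Tamil text readable in the payload.
readable = json.dumps(asdict(original), ensure_ascii=False)
assert "அன்று" in readable

# The default (ensure_ascii=True) escapes it to \uXXXX sequences instead.
escaped = json.dumps(asdict(original))
assert "அன்று" not in escaped

# Either encoding decodes back to an equal object.
assert Goal(**json.loads(readable)) == original
assert Goal(**json.loads(escaped)) == original
```

Both encodings round-trip; `ensure_ascii=False` mainly matters for payload size and for humans reading the serialized output.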

1.6 `DriftCallObservation`

#	Test name	Asserts	Maps to
U28	`test_observation_reset_state`	Build turn-0 observation with `tool_results=()`, `drift_log=()`, `last_transcript=""`, `last_lang=""`, `last_confidence=1.0`, `budget_remaining=12`. Assert `isinstance(obs.tool_results, tuple)`, `len(obs.tool_results) == 0`, same for `drift_log`.	models.md §7 edge 1, §7 edge 2, §8.3
U29	`test_observation_tuple_not_list_for_sequences`	Two halves. (a) What Python does: constructing with `tool_results=[valid_tool_result()]` (a list) stores the list unchanged, because dataclasses do not coerce field types; the test asserts `type(obs.tool_results) is list` and flags this input as a contract-violating fixture. (b) What the contract requires: a separate test constructs with `tool_results=(valid_tool_result(),)` and asserts `isinstance(obs.tool_results, tuple)`. Together they cover both halves of models.md §3.1.	models.md §3.1, §7 edge 1
U30	`test_observation_frozen_mutation_raises`	`obs.turn = 1` raises `FrozenInstanceError`; parametrized over all 9 fields.	models.md §3.1
U31	`test_observation_unhashable`	`hash(obs)` raises `TypeError` (observation contains `GoalSpec` which contains dicts).	models.md §3.2 bullet 2
U32	`test_observation_goal_ref_stable`	Two observations built from the same `GoalSpec` instance satisfy `obs_a.goal is obs_b.goal`; i.e. `frozen=True` does not deep-copy. Encodes the invariant that the goal is shared by reference within an episode.	models.md §4.6 goal row
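The behaviors behind U29 and U32 can be seen in a small sketch. `Goal` and `Obs` are hypothetical reduced stand-ins for `GoalSpec` and `DriftCallObservation`, and the string `"r1"` stands in for a `ToolResult`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """Hypothetical reduced stand-in for GoalSpec."""
    intent: str
    slots: dict


@dataclass(frozen=True)
class Obs:
    """Hypothetical reduced stand-in for DriftCallObservation."""
    goal: Goal
    tool_results: tuple = ()


goal = Goal(intent="book_flight", slots={"from": "HYD"})

# U29 (a): a list is stored as-is; the annotation does not coerce it.
as_list = Obs(goal=goal, tool_results=["r1"])  # type: ignore[arg-type]
assert type(as_list.tool_results) is list

# U29 (b): the contract-conforming call passes a tuple.
as_tuple = Obs(goal=goal, tool_results=("r1",))
assert isinstance(as_tuple.tool_results, tuple)

# The two are not even equal, since ["r1"] != ("r1",).
assert as_list != as_tuple

# U32: the goal is shared by reference, not deep-copied.
assert as_list.goal is as_tuple.goal
```

The inequality is one practical reason the tuple contract matters: a list-bearing observation would silently fail equality checks against a correctly built one.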

1.7 `DriftCallState`

#	Test name	Asserts	Maps to
U33	`test_state_happy_turn_zero`	Build state with `turn=0`, `actions=()`, `drift_fired=()`, `done=False`; assert length invariant `len(state.actions) == state.turn` holds at construction; `state.done is False`.	models.md §4.7, §3.5 `len(actions) == turn` row
U34	`test_state_replace_appends_action`	Use `dataclasses.replace(state, turn=state.turn+1, actions=state.actions + (action,))`; assert original `state` is unchanged (`state.turn == 0`, `state.actions == ()`) and new state has `turn == 1`, `actions[-1] is action`. This witnesses the `replace`-don't-mutate pattern from models.md §3.3.	models.md §3.3, §8.4
U35	`test_state_replace_dict_field_builds_new_dict`	Call `replace(state, schema_versions={**state.schema_versions, "airline": "v2"})`; assert `state.schema_versions["airline"] == "v1"` (original untouched) and `new_state.schema_versions["airline"] == "v2"`. Confirms the "always build new dict" convention does not rely on shared references.	models.md §3.3
U36	`test_state_frozen_mutation_raises`	`state.done = True` raises `FrozenInstanceError`; parametrized over all 10 fields.	models.md §3.1
U37	`test_state_unhashable`	`hash(state)` raises `TypeError`.	models.md §3.2 bullet 2
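The replace-don't-mutate pattern from U34 and U35 can be sketched as follows. `MiniState` is a hypothetical four-field stand-in for `DriftCallState`, and a plain string stands in for a `DriftCallAction`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MiniState:
    """Hypothetical reduced stand-in for DriftCallState."""
    turn: int
    actions: tuple
    schema_versions: dict
    done: bool = False


state = MiniState(turn=0, actions=(), schema_versions={"airline": "v1"})
action = "speak"  # placeholder for a DriftCallAction

# U34: replace() builds a new instance; the original is untouched.
stepped = replace(state, turn=state.turn + 1, actions=state.actions + (action,))
assert (state.turn, state.actions) == (0, ())
assert stepped.turn == 1 and stepped.actions[-1] is action
assert len(stepped.actions) == stepped.turn

# U35: build a new dict instead of mutating the shared one.
drifted = replace(state, schema_versions={**state.schema_versions, "airline": "v2"})
assert state.schema_versions["airline"] == "v1"
assert drifted.schema_versions["airline"] == "v2"

# Fields that are not replaced are carried over by reference, which is
# why in-place mutation of the dict would leak into earlier states.
assert stepped.schema_versions is state.schema_versions
```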

1.8 Cross-cutting: JSON round-trip

#	Test name	Asserts	Maps to
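U38	`test_action_json_roundtrip_equality`	For an action with Unicode message, nested-dict `tool_args`, and all optional fields set: parse `d = json.loads(json.dumps(dataclasses.asdict(a1), ensure_ascii=False))`, re-coerce `d["action_type"]` to `ActionType` before construction (a frozen instance cannot be patched afterwards), then build `a2 = DriftCallAction(**d)`; assert `a2 == a1` and `isinstance(a2.action_type, ActionType)` (equality alone would not detect a missing coercion, because a str-mixin member equals its plain-string value).	models.md §3.4 invariant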
U39	`test_tool_result_json_roundtrip_equality`	Same pattern for `ToolResult` with nested list-of-dict `response`.	models.md §3.4, §7 edge 4
U40	`test_observation_json_roundtrip_preserves_tuple_length`	After JSON round-trip (which makes tuples into lists), a reconstructor that wraps the sequence fields back into tuples yields `obs2 == obs1`. Documents the reconstructor contract `app.py` must uphold.	models.md §3.4
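A minimal sketch of the reconstructor contract that U38 and U40 describe is shown below. `Kind`, `Act`, `Obs`, and `obs_from_json` are hypothetical names used for illustration only; the real reconstructor lives wherever `app.py` defines it.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class Kind(str, Enum):
    """Hypothetical stand-in for ActionType."""
    SPEAK = "speak"
    SUBMIT = "submit"


@dataclass(frozen=True)
class Act:
    kind: Kind
    message: Optional[str] = None


@dataclass(frozen=True)
class Obs:
    turn: int
    actions: tuple = ()


def obs_from_json(text: str) -> Obs:
    """Rebuild an Obs: re-coerce enums and wrap sequences back into tuples."""
    raw = json.loads(text)
    actions = tuple(
        Act(kind=Kind(item["kind"]), message=item["message"])
        for item in raw["actions"]
    )
    return Obs(turn=raw["turn"], actions=actions)


obs1 = Obs(turn=1, actions=(Act(Kind.SPEAK, "नमस्ते"),))
text = json.dumps(asdict(obs1), ensure_ascii=False)

# A naive rebuild leaves a list of dicts in place and compares unequal.
assert Obs(**json.loads(text)) != obs1

# The reconstructor restores tuples, nested dataclasses, and enum members.
obs2 = obs_from_json(text)
assert obs2 == obs1
assert isinstance(obs2.actions, tuple)
assert obs2.actions[0].kind is Kind.SPEAK
```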

Unit test count: 40.

2. Property tests (Hypothesis)

All property tests live in DRIFTCALL/tests/test_models_properties.py. Hypothesis settings: deadline=None, max_examples=200 (enough coverage for dataclass shape; no IO so runs are fast).

Shared strategies live in tests/strategies.py:

# strategies.py (reference sketch only; implementing it is outside the scope of this test plan)
import string
from hypothesis import strategies as st
from driftcall.models import ActionType

# [0-9] instead of \d: in a str pattern \d also matches non-ASCII decimal digits.
_versions = st.from_regex(r"v[1-9][0-9]?", fullmatch=True)
_languages = st.sampled_from(["hi", "ta", "kn", "en", "hinglish"])
_domains = st.sampled_from(["airline", "cab", "restaurant", "hotel", "payment"])
_drift_types = st.sampled_from(["schema", "policy", "tnc", "pricing", "auth"])
_statuses = st.sampled_from(["ok", "schema_error", "policy_error", "auth_error", "timeout"])

_json_scalar = st.one_of(st.none(), st.booleans(), st.integers(-1_000_000, 1_000_000),
                         st.floats(allow_nan=False, allow_infinity=False),
                         st.text(alphabet=st.characters(blacklist_categories=("Cs",)), max_size=32))
_json_dict = st.dictionaries(keys=st.text(alphabet=string.ascii_letters, min_size=1, max_size=8),
                             values=_json_scalar, max_size=4)

Each property below names its strategies and states the invariant.

#	Property name	Strategy	Invariant	Maps to
P1	`test_state_turn_matches_len_actions_by_construction`	Generate `turn ∈ [0, 16]`, then build `actions` tuple of exactly `turn` `DriftCallAction` instances; construct `DriftCallState` with those values.	`len(state.actions) == state.turn` holds for every generated example (proves the invariant is witness-able by construction; actual runtime enforcement is in `env_tests.md`). This is the "turn monotone non-decreasing by construction" property at the state-boundary layer.	models.md §3.5 state row, §7 edge 9
P2	`test_drift_fired_is_subset_of_drift_schedule`	Generate a `drift_schedule` tuple of 0..2 `DriftEvent`s with unique ascending `turn`; pick a random prefix length `k ∈ [0, len(schedule)]`; set `drift_fired = drift_schedule[:k]`.	`set(state.drift_fired).issubset(set(state.drift_schedule))` AND `len(state.drift_fired) <= len(state.drift_schedule)` AND `state.drift_fired == state.drift_schedule[:len(state.drift_fired)]` (prefix, order preserved).	models.md §3.5 `drift_fired ⊆ drift_schedule` row, §4.7 drift_fired row
P3	`test_probe_schema_tool_name_is_bare_domain`	Generate `DriftCallAction(action_type=PROBE_SCHEMA, tool_name=domain)` where `domain` comes from `_domains`.	`"." not in action.tool_name` AND `action.tool_name in {"airline","cab","restaurant","hotel","payment"}`. Encodes the "PROBE_SCHEMA carries bare domain, NOT `domain.verb`" invariant from models.md §3.5. (Note: `models.py` itself does not validate this; the property asserts the TEST FIXTURES obey the contract — a regression-canary against a fixture drifting to `"airline.search"` by accident.)	models.md §3.5 PROBE_SCHEMA row
P4	`test_frozen_invariant_universal`	For every dataclass type in `{DriftCallAction, ToolResult, DriftEvent, GoalSpec, DriftCallObservation, DriftCallState}`, generate a valid instance using the type-specific strategy, then for each field name on the class, assert `setattr(instance, field_name, ...)` raises `FrozenInstanceError`.	The frozen guarantee is universal across the module. Stronger than the per-class U8/U15/U20/U25/U30/U36 tests because it is strategy-driven rather than single-example.	models.md §3.1, §5 row 1
P5	`test_json_roundtrip_preserves_equality_for_action`	Generate `DriftCallAction` with random valid fields (Unicode text up to 200 chars for `rationale`, nested JSON dicts for `tool_args`, random `ActionType`). Serialize via `json.dumps(dataclasses.asdict(a), ensure_ascii=False)`, parse, rebuild (re-coerce `action_type` to `ActionType`).	Rebuilt action `== original` for every generated example. Verifies the models.md §3.4 round-trip invariant holds under adversarial shape generation.	models.md §3.4 invariant
P6	`test_observation_budget_non_negative_by_construction`	Generate `max_turns ∈ [1, 16]`, `turn ∈ [0, max_turns]`, then `budget_remaining = max_turns - turn`. Build observation.	`obs.budget_remaining >= 0` always. (models.py does not enforce; the property ensures our fixture arithmetic respects models.md §3.5 budget_remaining row.)	models.md §3.5 budget row, §4.6 budget row

Property test count: 6.
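To make the P2 invariant concrete, here is a sketch that checks it by exhaustive enumeration over small schedules instead of by a hypothesis strategy. This is a deliberate substitution so the snippet needs only the standard library; `Event` and `fired_is_prefix` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Event:
    """Hypothetical reduced stand-in for DriftEvent (hashable: primitives only)."""
    turn: int
    domain: str


def fired_is_prefix(fired: tuple, schedule: tuple) -> bool:
    """The three-part P2 invariant: subset, no longer, and an ordered prefix."""
    return (
        set(fired).issubset(set(schedule))
        and len(fired) <= len(schedule)
        and fired == schedule[: len(fired)]
    )


checked = 0
for size in range(0, 3):  # schedules of 0..2 events, as in P2
    for turns in combinations(range(1, 6), size):  # unique, ascending turns
        schedule = tuple(Event(turn=t, domain="airline") for t in turns)
        for k in range(len(schedule) + 1):  # every prefix length
            assert fired_is_prefix(schedule[:k], schedule)
            checked += 1

# A subset that is not a prefix must be rejected.
two = (Event(2, "airline"), Event(4, "cab"))
assert not fired_is_prefix(two[1:], two)
```

The last assertion shows why the prefix clause is needed on top of the subset clause: `two[1:]` is a subset of the schedule but skips the first scheduled event.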

Notes on properties intentionally not tested here:

"DriftCallState.turn monotone non-decreasing across a step sequence" — that is a runtime property of env.step, tested in env_tests.md. At the models.py layer the property is trivially "by construction a single instance has one turn value", which P1 already witnesses by rebuilding valid states at every turn.
"DriftEvent.from_version != to_version" — enforced by drift injector, not by models.py. Tested in drift_injector_tests.md.
"DriftCallAction.confidence ∈ [0, 1] when SUBMIT" — enforced by env.step, tested in env_tests.md.

Each of these deferrals matches models.md §3.5's "Enforced by" column (not models.py).

3. Integration tests

N/A — defer to env_tests.md.

models.py has no IO, no network, no filesystem, no cross-module composition. Every meaningful interaction (e.g. "step consumes action, emits observation") belongs to env.py and is tested in DRIFTCALL/docs/tests/env_tests.md once that test plan is authored. This section is deliberately empty; the pointer is the deliverable.

4. Coverage target

Line coverage: 100% on driftcall/models.py. Branch coverage: ≥ 95% on driftcall/models.py.

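models.py is effectively branchless: its source consists of class bodies, field declarations, and the __all__ list, all of which execute at import time. The __init__, __repr__, __eq__, and __hash__ methods that the dataclass decorator generates are created at runtime and do not appear as lines in the file, so coverage.py (with branch=True) has no source lines to attribute to them. The unit and property tests exercise those generated methods behaviorally instead: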

`models.py` region	Hit by
`class ActionType(str, Enum): ...` — 6 member lines	U1, U2, U3 (every member is named or sampled).
`@dataclass(frozen=True) class DriftCallAction: ...` — 6 field lines + auto-generated `__init__` / `__eq__` / `__repr__` / `__hash__`	U5–U13, U38, P4, P5.
`@dataclass(frozen=True) class ToolResult: ...` — 5 field lines + auto-methods	U14–U18, U39, P4.
`@dataclass(frozen=True) class DriftEvent: ...` — 6 field lines + auto-methods	U19–U22, P2, P4.
`@dataclass(frozen=True) class GoalSpec: ...` — 6 field lines + auto-methods	U23–U27, P4.
`@dataclass(frozen=True) class DriftCallObservation: ...` — 9 field lines + auto-methods	U28–U32, U40, P4, P6.
`@dataclass(frozen=True) class DriftCallState: ...` — 10 field lines + auto-methods	U33–U37, P1, P2, P4.
`__all__ = [...]` list	Imported in every test module; trivially covered.

Verification command (per CLAUDE.md §6):

python3 -m pytest tests/test_models.py tests/test_models_properties.py \
    --cov=driftcall.models --cov-branch --cov-report=term-missing

Gate: if either line-coverage < 100% or branch-coverage < 95%, the PR is blocked and the missing lines/branches are added as new unit tests before proceeding.

5. Fixtures

All fixtures live in DRIFTCALL/tests/conftest.py; pytest discovers them automatically, so test files request them by parameter name without importing them. Because a pytest fixture cannot be called directly, the factory fixtures return a plain builder callable, which is the form a hypothesis strategy such as st.builds can reuse. Each plain fixture returns a frozen dataclass built with spec-valid defaults; each factory's keyword arguments override specific fields for variation. Naming follows the valid_<thing>[_variant] convention.

5.1 `valid_goal_spec`

import pytest
from driftcall.models import GoalSpec

@pytest.fixture
def valid_goal_spec() -> GoalSpec:
    """A spec-valid Hinglish airline booking goal. Matches models.md §8.3."""
    return GoalSpec(
        domain="airline",
        intent="book_flight",
        slots={"from": "HYD", "to": "BLR", "when": "2026-04-25"},
        constraints={"budget_inr": 8000, "time_window": "evening"},
        language="hinglish",
        seed_utterance="Bhai Friday ko Bangalore jaana hai, 8000 rupees max, 6pm ke baad",
    )

Reused by: U23, U26, U28, U33, P6, and every downstream test file (test_env.py, test_rewards.py, test_vendors.py).

5.2 `valid_drift_event`

import pytest
from driftcall.models import DriftEvent

@pytest.fixture
def valid_drift_event_factory():
    """Factory fixture so tests can override turn/domain. Matches models.md §8.4."""
    def _build(turn: int = 3, domain: str = "airline") -> DriftEvent:
        return DriftEvent(
            turn=turn,
            drift_type="schema",
            domain=domain,
            description="field 'price' renamed to 'total_fare_inr'; 'currency' removed",
            from_version="v1",
            to_version="v2",
        )
    return _build

Also provided as a plain (non-factory) valid_drift_event fixture with defaults turn=3, domain="airline" for convenience.

Reused by: U19, U20, U21, U22, P2, and drift_injector_tests.md / env_tests.md.

5.3 `valid_tool_result`

from typing import Any

import pytest
from driftcall.models import ToolResult

@pytest.fixture
def valid_tool_result_factory():
    """Factory fixture so tests can override status/response. Matches models.md §8.2."""
    def _build(
        status: str = "ok",
        response: dict[str, Any] | None = None,
        tool_name: str = "airline.search",
        schema_version: str = "v1",
        latency_ms: int = 142,
    ) -> ToolResult:
        if response is None:
            response = (
                {
                    "results": [
                        {
                            "flight_id": "6E-2345",
                            "from": "HYD",
                            "to": "BLR",
                            "depart": "2026-04-25T18:30:00+05:30",
                            "price": 7200,
                            "currency": "INR",
                            "seats_left": 14,
                        }
                    ]
                }
                if status == "ok"
                else {"error_code": status.upper()}
            )
        return ToolResult(
            tool_name=tool_name,
            status=status,  # type: ignore[arg-type]
            response=response,
            schema_version=schema_version,
            latency_ms=latency_ms,
        )
    return _build

A plain valid_tool_result fixture with status="ok" defaults is also exposed.

Reused by: U14, U15, U16, U17, U18, U29, U39, and test_vendors.py, test_rewards.py, test_env.py.

5.4 Supporting fixtures (bonus, for the downstream test files)

These are included here because the fixtures module must ship as one unit, and the briefing names them "reusable across other modules' tests":

valid_tool_call_action() — returns the §8.1 DriftCallAction (airline.search with slots). Consumed by U5, U11, U12, U34.
valid_submit_action(confidence: float = 0.87) — builds a SUBMIT action; consumed by U6, U13.
valid_observation_reset(valid_goal_spec) — builds the §8.3 turn-0 observation; consumed by U28, U29, U30, U31, U32, U40, P6.
valid_state_reset(valid_goal_spec) — builds a turn-0 DriftCallState with max_turns=12, empty action/drift tuples, done=False; consumed by U33, U34, U35, U36, U37, P1.

Fixture count (fixtures that must exist in conftest.py): 9.

valid_goal_spec
valid_drift_event (plain)
valid_drift_event_factory
valid_tool_result (plain)
valid_tool_result_factory
valid_tool_call_action
valid_submit_action
valid_observation_reset
valid_state_reset

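(The briefing requires valid_goal_spec, valid_drift_event, and valid_tool_result by name; the latter two ship in both plain and factory form, giving five entries. The remaining four are the minimal set required to write the 40 unit tests above without duplicating construction boilerplate. All 9 ship together.)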

6. Execution plan

1. conftest.py fixtures land first (any test that requests a missing fixture errors at setup).
2. test_models.py is authored top-down: U1 → U40, committed in 4 logical chunks (ActionType + DriftCallAction, ToolResult + DriftEvent, GoalSpec + DriftCallObservation, DriftCallState + JSON round-trip). Each chunk is a separate commit with passing pytest -v + coverage output.
3. test_models_properties.py lands last, after all unit tests are green. Hypothesis shrinking behavior requires the fixtures and strategies to be stable first.
4. Coverage gate: pytest --cov=driftcall.models --cov-branch must show 100% line / ≥ 95% branch before PR.
5. Ruff + mypy --strict run on both test files (CLAUDE.md §6 commands). The test files themselves must obey the same hygiene bar as production code.

7. Traceability summary

models.md clause	Covered by
§2 (Interface)	U5–U37 (every dataclass built with full spec signature)
§3.1 Immutability	U8, U15, U20, U25, U30, U36, P4
§3.2 Equality & hashing	U4, U11, U12, U18, U21, U26, U31, U37
§3.3 `replace` pattern	U34, U35
§3.4 Serialization round-trip	U38, U39, U40, P5
§3.5 Field invariants (constructible-at-boundary subset)	P1, P2, P3, P6
§4.1–4.7 (per-class field tables)	U1–U37 (one class per subsection)
§5 Error modes	U8, U10, U12, U15, U18, U20, U25, U26, U30, U31, U36, U37
§7 Edge 1 (empty tuples at turn 0)	U28, U29
§7 Edge 2 (empty strings vs None)	U28
§7 Edge 3 (Unicode round-trip)	U7, U27, U38
§7 Edge 4 (nested tool_args)	U38, U39
§7 Edge 5 (large history)	N/A at models layer — deferred to `env_tests.md` §integration
§7 Edge 6 (SUBMIT + confidence=None)	U13
§7 Edge 7 (non-ok with empty response)	U17
§7 Edge 8 (from_version == to_version)	N/A at models layer — deferred to `drift_injector_tests.md`
§7 Edge 9 (`len(actions) != turn`)	P1 (witnesses constructibility of the good case; violation-detection deferred to `env_tests.md`)
§7 Edge 10 (auth-drifted tool still listed)	N/A at models layer — deferred to `env_tests.md`
§8.1–8.4 (worked examples)	U5, U14, U19, U28, U33, U34

Every models.md clause is either directly tested here or explicitly deferred to a named downstream test plan with a reason. No clause is silently unaddressed.

8. Open questions

None. The test surface for models.py is bounded by the module's pure-shape nature: 40 unit tests + 6 properties + 9 fixtures + 100%/95% coverage exhaustively cover every field, every frozen guarantee, every hashability rule, every JSON round-trip invariant, and every edge case that can be witnessed at construction time. All edge cases requiring runtime enforcement (env.step validation, drift injector rules) are explicitly routed to their home test plans.
