Spaces:
Sleeping
models_tests.md — Test Plan for driftcall/models.py
Owner: Person B (Rewards & Tests)
Target module: DRIFTCALL/docs/modules/models.md (sealed)
Implements coverage for: DESIGN.md §4.1 (Dataclasses), §4.2 (reset), §4.3 (step), §4.4 (termination), §7.1 (Reward inputs)
Frameworks: pytest, hypothesis
Status: DRAFT — pending ≥ 1 fresh critic round (test-plan gate is lighter per CLAUDE.md §3.2 Batch D4)
0. Scope & Non-goals
models.py is pure shape — frozen dataclasses + one Enum. There is no runtime behavior to integration-test. This plan therefore concentrates on:
- Constructibility — every dataclass can be instantiated with valid inputs and rejects malformed required fields.
- Immutability — every frozen guarantee raises
FrozenInstanceErroron mutation attempt. - Type shape — tuples stay tuples, dicts stay dicts, Optionals default to
None, Literals accept documented values. - Hashability boundaries — the classes the doc declares hashable hash; the classes the doc declares unhashable raise
TypeError. - Round-trip equality —
dataclasses.asdict+ JSON encode + decode + reconstruct produces an equal object (the invariant in models.md §3.4). - Invariant witnessing — properties from models.md §3.5 that can be checked by construction are asserted via hypothesis. Invariants enforced elsewhere (e.g.
env.stepvalidation) are NOT retested here; they are documented as N/A with a pointer toenv_tests.md.
Every test below maps to one numbered clause in docs/modules/models.md. Clause references are embedded in the test docstring as models.md §X.Y / models.md §7 edge N.
1. Unit tests
All unit tests live in DRIFTCALL/tests/test_models.py. Import line under test:
from driftcall.models import (
ActionType,
DriftCallAction,
ToolResult,
DriftEvent,
GoalSpec,
DriftCallObservation,
DriftCallState,
)
Fixtures (§5) are imported from tests/conftest.py.
1.1 ActionType (Enum)
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U1 | test_action_type_members_exactly_six |
set(ActionType) == {TOOL_CALL, SPEAK, CLARIFY, PROBE_SCHEMA, SUBMIT, ABORT}; len(ActionType) == 6. |
models.md §4.1 |
| U2 | test_action_type_is_str_subclass |
isinstance(ActionType.TOOL_CALL, str); ActionType.TOOL_CALL == "tool_call"; json.dumps({"t": ActionType.SPEAK}) == '{"t": "speak"}' (string-mixed Enum serializes to its .value). |
models.md §3.4, §4.1 |
| U3 | test_action_type_values_match_spec |
ActionType.TOOL_CALL.value == "tool_call" and the other five values equal exactly the lowercase strings listed in the spec table. |
models.md §4.1 |
| U4 | test_action_type_is_hashable |
hash(ActionType.TOOL_CALL) returns an int; {ActionType.SPEAK, ActionType.SPEAK} == {ActionType.SPEAK} (set de-dupes). |
models.md §3.2 bullet 1 |
1.2 DriftCallAction
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U5 | test_driftcall_action_happy_tool_call |
Build a TOOL_CALL action from valid_tool_call_action() fixture; assert all six fields carry the values passed in; action.confidence is None; action.message is None. |
models.md §4.2, §8.1 |
| U6 | test_driftcall_action_happy_submit |
Build a SUBMIT action with confidence=0.87; assert action.action_type is ActionType.SUBMIT, action.confidence == 0.87, and tool_name / tool_args / message are all None. |
models.md §4.2, §3.5 SUBMIT row |
| U7 | test_driftcall_action_happy_speak |
Build a SPEAK action with Unicode message="मुझे कल दिल्ली जाना है"; assert action.message round-trips byte-for-byte. |
models.md §7 edge 3 |
| U8 | test_driftcall_action_frozen_mutation_raises |
Construct any action; assert action.tool_name = "x" raises dataclasses.FrozenInstanceError; same for every one of the 6 fields. Parametrized over all 6 field names. |
models.md §3.1, §5 row 1 |
| U9 | test_driftcall_action_defaults_are_none |
DriftCallAction(action_type=ActionType.ABORT) succeeds; every optional field defaults to None. |
models.md §2 (dataclass signature), §4.2 |
| U10 | test_driftcall_action_missing_required_raises |
DriftCallAction() raises TypeError (missing action_type). |
models.md §5 row 3 |
| U11 | test_driftcall_action_equality_value_based |
Two actions built with identical fields compare ==; changing any single field makes them !=. Covers all 6 fields via parametrize. |
models.md §3.2 |
| U12 | test_driftcall_action_unhashable_due_to_dict |
hash(valid_tool_call_action()) raises TypeError: unhashable type: 'dict'. |
models.md §3.2 bullet 2, §5 row 7 |
| U13 | test_driftcall_action_confidence_none_vs_zero |
DriftCallAction(action_type=SUBMIT, confidence=0.0) != DriftCallAction(action_type=SUBMIT, confidence=None); 0.0 is a real low confidence, None means absent. Proves the Optional[float] semantics are preserved by the dataclass (i.e. None is a distinguishable value, not coerced to 0.0). |
models.md §4.2, §7 edge 6 |
1.3 ToolResult
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U14 | test_tool_result_happy_ok |
Build from valid_tool_result(status="ok"); fields round-trip; result.status == "ok"; result.response["results"] is a list. |
models.md §4.3, §8.2 |
| U15 | test_tool_result_frozen_mutation_raises |
result.status = "timeout" raises FrozenInstanceError; parametrized over all 5 fields. |
models.md §3.1, §5 row 1 |
| U16 | test_tool_result_accepts_all_five_statuses |
Parametrized over ["ok", "schema_error", "policy_error", "auth_error", "timeout"]; all construct successfully. |
models.md §4.3 status row |
| U17 | test_tool_result_empty_response_on_non_ok |
valid_tool_result(status="schema_error", response={}) constructs without error (models.py does not validate; vendor-contract bug detection lives in test_vendors.py). |
models.md §7 edge 7, §5 row 6 |
| U18 | test_tool_result_unhashable_due_to_dict |
hash(valid_tool_result()) raises TypeError. |
models.md §3.2 bullet 2 |
1.4 DriftEvent
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U19 | test_drift_event_happy_schema |
Build from valid_drift_event(turn=3, domain="airline"); all six fields match input; from_version="v1", to_version="v2". |
models.md §4.4, §8.4 |
| U20 | test_drift_event_frozen_mutation_raises |
drift.turn = 99 raises FrozenInstanceError; parametrized over all 6 fields. |
models.md §3.1 |
| U21 | test_drift_event_is_hashable |
hash(valid_drift_event()) returns an int; {drift, drift} == {drift}. Confirms primitive-only fields keep DriftEvent hashable (models.md §3.2 bullet 1 asserts this). |
models.md §3.2 |
| U22 | test_drift_event_accepts_all_five_drift_types |
Parametrized over ["schema", "policy", "tnc", "pricing", "auth"]; all construct successfully. |
models.md §4.4 drift_type row |
1.5 GoalSpec
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U23 | test_goal_spec_happy_hinglish |
Build from valid_goal_spec(); fields round-trip; goal.language == "hinglish"; goal.slots["from"] == "HYD". |
models.md §4.5, §8.3 |
| U24 | test_goal_spec_accepts_all_five_languages |
Parametrized over ["hi", "ta", "kn", "en", "hinglish"]; all construct successfully. |
models.md §4.5 language row |
| U25 | test_goal_spec_frozen_mutation_raises |
goal.intent = "other" raises FrozenInstanceError. |
models.md §3.1 |
| U26 | test_goal_spec_unhashable_due_to_dict |
hash(valid_goal_spec()) raises TypeError. |
models.md §3.2 bullet 2 |
| U27 | test_goal_spec_unicode_seed_utterance |
seed_utterance="{when} அன்று விமானம்" (Tamil) survives json.dumps(..., ensure_ascii=False) + json.loads + reconstruction; reconstructed == original. |
models.md §7 edge 3 |
1.6 DriftCallObservation
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U28 | test_observation_reset_state |
Build turn-0 observation with tool_results=(), drift_log=(), last_transcript="", last_lang="", last_confidence=1.0, budget_remaining=12. Assert isinstance(obs.tool_results, tuple), len(obs.tool_results) == 0, same for drift_log. |
models.md §7 edge 1, §7 edge 2, §8.3 |
| U29 | test_observation_tuple_not_list_for_sequences |
Passing tool_results=[valid_tool_result()] (a list) still stores a list on the dataclass (Python does not coerce) — the test asserts type(obs.tool_results) is list and then asserts that our documented contract requires callers to pass tuples by flagging this shape as a contract-violating fixture. The real enforcement is a separate test that constructs with tool_results=(valid_tool_result(),) and asserts isinstance(obs.tool_results, tuple). Covers both the "what Python does" and "what the contract requires" halves of models.md §3.1. |
models.md §3.1, §7 edge 1 |
| U30 | test_observation_frozen_mutation_raises |
obs.turn = 1 raises FrozenInstanceError; parametrized over all 9 fields. |
models.md §3.1 |
| U31 | test_observation_unhashable |
hash(obs) raises TypeError (observation contains GoalSpec which contains dicts). |
models.md §3.2 bullet 2 |
| U32 | test_observation_goal_ref_stable |
Two observations built with the same GoalSpec object yield obs_a.goal is obs_b.goal when passed the same instance; i.e. frozen=True does not deep-copy. Encodes the invariant that goal is copied by reference within an episode. |
models.md §4.6 goal row |
1.7 DriftCallState
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U33 | test_state_happy_turn_zero |
Build state with turn=0, actions=(), drift_fired=(), done=False; assert length invariant len(state.actions) == state.turn holds at construction; state.done is False. |
models.md §4.7, §3.5 len(actions) == turn row |
| U34 | test_state_replace_appends_action |
Use dataclasses.replace(state, turn=state.turn+1, actions=state.actions + (action,)); assert original state is unchanged (state.turn == 0, state.actions == ()) and new state has turn == 1, actions[-1] is action. This witnesses the replace-don't-mutate pattern from models.md §3.3. |
models.md §3.3, §8.4 |
| U35 | test_state_replace_dict_field_builds_new_dict |
Call replace(state, schema_versions={**state.schema_versions, "airline": "v2"}); assert state.schema_versions["airline"] == "v1" (original untouched) and new_state.schema_versions["airline"] == "v2". Confirms the "always build new dict" convention does not rely on shared references. |
models.md §3.3 |
| U36 | test_state_frozen_mutation_raises |
state.done = True raises FrozenInstanceError; parametrized over all 10 fields. |
models.md §3.1 |
| U37 | test_state_unhashable |
hash(state) raises TypeError. |
models.md §3.2 bullet 2 |
1.8 Cross-cutting: JSON round-trip
| # | Test name | Asserts | Maps to |
|---|---|---|---|
| U38 | test_action_json_roundtrip_equality |
For an action with Unicode message, nested-dict tool_args, and all optional fields set: a2 = DriftCallAction(**json.loads(json.dumps(dataclasses.asdict(a1), ensure_ascii=False))) then action_type is re-coerced to ActionType; a2 == a1. |
models.md §3.4 invariant |
| U39 | test_tool_result_json_roundtrip_equality |
Same pattern for ToolResult with nested list-of-dict response. |
models.md §3.4, §7 edge 4 |
| U40 | test_observation_json_roundtrip_preserves_tuple_length |
After JSON round-trip (which makes tuples into lists), a reconstructor that wraps the sequence fields back into tuples yields obs2 == obs1. Documents the reconstructor contract app.py must uphold. |
models.md §3.4 |
Unit test count: 40.
2. Property tests (Hypothesis)
All property tests live in DRIFTCALL/tests/test_models_properties.py. Hypothesis settings: deadline=None, max_examples=200 (enough coverage for dataclass shape; no IO so runs are fast).
Shared strategies live in tests/strategies.py:
# strategies.py (sketch for reference — NOT part of this test-plan doc to implement)
import string
from hypothesis import strategies as st
from driftcall.models import ActionType
_versions = st.from_regex(r"^v[1-9]\d?$", fullmatch=True)
_languages = st.sampled_from(["hi", "ta", "kn", "en", "hinglish"])
_domains = st.sampled_from(["airline", "cab", "restaurant", "hotel", "payment"])
_drift_types = st.sampled_from(["schema", "policy", "tnc", "pricing", "auth"])
_statuses = st.sampled_from(["ok", "schema_error", "policy_error", "auth_error", "timeout"])
_json_scalar = st.one_of(st.none(), st.booleans(), st.integers(-1_000_000, 1_000_000),
st.floats(allow_nan=False, allow_infinity=False),
st.text(alphabet=st.characters(blacklist_categories=("Cs",)), max_size=32))
_json_dict = st.dictionaries(keys=st.text(alphabet=string.ascii_letters, min_size=1, max_size=8),
values=_json_scalar, max_size=4)
Each property below names its strategies and states the invariant.
| # | Property name | Strategy | Invariant | Maps to |
|---|---|---|---|---|
| P1 | test_state_turn_matches_len_actions_by_construction |
Generate turn ∈ [0, 16], then build actions tuple of exactly turn DriftCallAction instances; construct DriftCallState with those values. |
len(state.actions) == state.turn holds for every generated example (proves the invariant is witness-able by construction; actual runtime enforcement is in env_tests.md). This is the "turn monotone non-decreasing by construction" property at the state-boundary layer. |
models.md §3.5 state row, §7 edge 9 |
| P2 | test_drift_fired_is_subset_of_drift_schedule |
Generate a drift_schedule tuple of 0..2 DriftEvents with unique ascending turn; pick a random prefix length k ∈ [0, len(schedule)]; set drift_fired = drift_schedule[:k]. |
set(state.drift_fired).issubset(set(state.drift_schedule)) AND len(state.drift_fired) <= len(state.drift_schedule) AND state.drift_fired == state.drift_schedule[:len(state.drift_fired)] (prefix, order preserved). |
models.md §3.5 drift_fired ⊆ drift_schedule row, §4.7 drift_fired row |
| P3 | test_probe_schema_tool_name_is_bare_domain |
Generate DriftCallAction(action_type=PROBE_SCHEMA, tool_name=domain) where domain comes from _domains. |
"." not in action.tool_name AND action.tool_name in {"airline","cab","restaurant","hotel","payment"}. Encodes the "PROBE_SCHEMA carries bare domain, NOT domain.verb" invariant from models.md §3.5. (Note: models.py itself does not validate this; the property asserts the TEST FIXTURES obey the contract — a regression-canary against a fixture drifting to "airline.search" by accident.) |
models.md §3.5 PROBE_SCHEMA row |
| P4 | test_frozen_invariant_universal |
For every dataclass type in {DriftCallAction, ToolResult, DriftEvent, GoalSpec, DriftCallObservation, DriftCallState}, generate a valid instance using the type-specific strategy, then for each field name on the class, assert setattr(instance, field_name, ...) raises FrozenInstanceError. |
The frozen guarantee is universal across the module. Stronger than the per-class U8/U15/U20/U25/U30/U36 tests because it is strategy-driven rather than single-example. | models.md §3.1, §5 row 1 |
| P5 | test_json_roundtrip_preserves_equality_for_action |
Generate DriftCallAction with random valid fields (Unicode text up to 200 chars for rationale, nested JSON dicts for tool_args, random ActionType). Serialize via json.dumps(dataclasses.asdict(a), ensure_ascii=False), parse, rebuild (re-coerce action_type to ActionType). |
Rebuilt action == original for every generated example. Verifies the models.md §3.4 round-trip invariant holds under adversarial shape generation. |
models.md §3.4 invariant |
| P6 | test_observation_budget_non_negative_by_construction |
Generate max_turns ∈ [1, 16], turn ∈ [0, max_turns], then budget_remaining = max_turns - turn. Build observation. |
obs.budget_remaining >= 0 always. (models.py does not enforce; the property ensures our fixture arithmetic respects models.md §3.5 budget_remaining row.) |
models.md §3.5 budget row, §4.6 budget row |
Property test count: 6.
Notes on properties intentionally not tested here:
- "
DriftCallState.turnmonotone non-decreasing across a step sequence" — that is a runtime property ofenv.step, tested inenv_tests.md. At themodels.pylayer the property is trivially "by construction a single instance has one turn value", which P1 already witnesses by rebuilding valid states at every turn. - "
DriftEvent.from_version != to_version" — enforced by drift injector, not bymodels.py. Tested indrift_injector_tests.md. - "
DriftCallAction.confidence ∈ [0, 1]when SUBMIT" — enforced byenv.step, tested inenv_tests.md.
Each of these deferrals matches models.md §3.5's "Enforced by" column (not models.py).
3. Integration tests
N/A — defer to env_tests.md.
models.py has no IO, no network, no filesystem, no cross-module composition. Every meaningful interaction (e.g. "step consumes action, emits observation") belongs to env.py and is tested in DRIFTCALL/docs/tests/env_tests.md once that test plan is authored. This section is deliberately empty; the pointer is the deliverable.
4. Coverage target
Line coverage: 100% on driftcall/models.py.
Branch coverage: ≥ 95% on driftcall/models.py.
models.py is effectively branchless — the only "branches" are the implicit ones the dataclass decorator generates in __init__, __repr__, __eq__, and __hash__. Python's coverage.py with branch=True will report those auto-generated lines; our unit and property tests hit every one:
models.py region |
Hit by |
|---|---|
class ActionType(str, Enum): ... — 6 member lines |
U1, U2, U3 (every member is named or sampled). |
@dataclass(frozen=True) class DriftCallAction: ... — 6 field lines + auto-generated __init__ / __eq__ / __repr__ / __hash__ |
U5–U13, U38, P4, P5. |
@dataclass(frozen=True) class ToolResult: ... — 5 field lines + auto-methods |
U14–U18, U39, P4. |
@dataclass(frozen=True) class DriftEvent: ... — 6 field lines + auto-methods |
U19–U22, P2, P4. |
@dataclass(frozen=True) class GoalSpec: ... — 6 field lines + auto-methods |
U23–U27, P4. |
@dataclass(frozen=True) class DriftCallObservation: ... — 9 field lines + auto-methods |
U28–U32, U40, P4, P6. |
@dataclass(frozen=True) class DriftCallState: ... — 10 field lines + auto-methods |
U33–U37, P1, P2, P4. |
__all__ = [...] list |
Imported in every test module; trivially covered. |
Verification command (per CLAUDE.md §6):
python3 -m pytest tests/test_models.py tests/test_models_properties.py \
--cov=driftcall.models --cov-branch --cov-report=term-missing
Gate: if either line-coverage < 100% or branch-coverage < 95%, the PR is blocked and the missing lines/branches are added as new unit tests before proceeding.
5. Fixtures
All fixtures live in DRIFTCALL/tests/conftest.py and are imported by every test file that needs models. They are pytest fixtures (not plain functions) so hypothesis strategies can reuse them via st.builds. Each fixture returns a frozen dataclass built with spec-valid defaults; keyword arguments override specific fields for variation. Naming follows valid_<thing>[_variant] convention.
5.1 valid_goal_spec
import pytest
from driftcall.models import GoalSpec
@pytest.fixture
def valid_goal_spec() -> GoalSpec:
"""A spec-valid Hinglish airline booking goal. Matches models.md §8.3."""
return GoalSpec(
domain="airline",
intent="book_flight",
slots={"from": "HYD", "to": "BLR", "when": "2026-04-25"},
constraints={"budget_inr": 8000, "time_window": "evening"},
language="hinglish",
seed_utterance="Bhai Friday ko Bangalore jaana hai, 8000 rupees max, 6pm ke baad",
)
Reused by: U23, U26, U28, U33, P6, and every downstream test file (test_env.py, test_rewards.py, test_vendors.py).
5.2 valid_drift_event
from driftcall.models import DriftEvent
@pytest.fixture
def valid_drift_event_factory():
"""Factory fixture so tests can override turn/domain. Matches models.md §8.4."""
def _build(turn: int = 3, domain: str = "airline") -> DriftEvent:
return DriftEvent(
turn=turn,
drift_type="schema",
domain=domain,
description="field 'price' renamed to 'total_fare_inr'; 'currency' removed",
from_version="v1",
to_version="v2",
)
return _build
Also provided as a plain (non-factory) valid_drift_event fixture with defaults turn=3, domain="airline" for convenience.
Reused by: U19, U20, U21, U22, P2, and drift_injector_tests.md / env_tests.md.
5.3 valid_tool_result
from driftcall.models import ToolResult
from typing import Any
@pytest.fixture
def valid_tool_result_factory():
"""Factory fixture so tests can override status/response. Matches models.md §8.2."""
def _build(
status: str = "ok",
response: dict[str, Any] | None = None,
tool_name: str = "airline.search",
schema_version: str = "v1",
latency_ms: int = 142,
) -> ToolResult:
if response is None:
response = (
{
"results": [
{
"flight_id": "6E-2345",
"from": "HYD",
"to": "BLR",
"depart": "2026-04-25T18:30:00+05:30",
"price": 7200,
"currency": "INR",
"seats_left": 14,
}
]
}
if status == "ok"
else {"error_code": status.upper()}
)
return ToolResult(
tool_name=tool_name,
status=status, # type: ignore[arg-type]
response=response,
schema_version=schema_version,
latency_ms=latency_ms,
)
return _build
A plain valid_tool_result fixture with status="ok" defaults is also exposed.
Reused by: U14, U15, U16, U17, U18, U29, U39, and test_vendors.py, test_rewards.py, test_env.py.
5.4 Supporting fixtures (bonus, for the downstream test files)
These are included here because the fixtures module must ship as one unit, and the briefing names them "reusable across other modules' tests":
valid_tool_call_action()— returns the §8.1DriftCallAction(airline.search with slots). Consumed by U5, U6, U11, U34.valid_submit_action(confidence: float = 0.87)— builds a SUBMIT action; consumed by U6, U13.valid_observation_reset(valid_goal_spec)— builds the §8.3 turn-0 observation; consumed by U28, U29, U30, U31, U32, U40, P6.valid_state_reset(valid_goal_spec)— builds a turn-0DriftCallStatewithmax_turns=12, empty action/drift tuples,done=False; consumed by U33, U34, U35, U36, U37, P1.
Fixture count (fixtures that must exist in conftest.py): 7.
valid_goal_specvalid_drift_event(plain)valid_drift_event_factoryvalid_tool_result(plain)valid_tool_result_factoryvalid_tool_call_actionvalid_submit_actionvalid_observation_resetvalid_state_reset
(The briefing requires the first three by name; the remaining four are the minimal set required to write the 40 unit tests above without duplicating construction boilerplate. All 9 ship together.)
6. Execution plan
conftest.pyfixtures land first (no tests will even collect without them).test_models.pyis authored top-down: U1 → U40, committed in 4 logical chunks (ActionType + DriftCallAction,ToolResult + DriftEvent,GoalSpec + DriftCallObservation,DriftCallState + JSON round-trip). Each chunk is a separate commit with passingpytest -v+ coverage output.test_models_properties.pylands last, after all unit tests are green. Hypothesis shrinking behavior requires the fixtures and strategies to be stable first.- Coverage gate:
pytest --cov=driftcall.models --cov-branchmust show 100% line / ≥ 95% branch before PR. - Ruff + mypy --strict run on both test files (CLAUDE.md §6 commands). The test files themselves must obey the same hygiene bar as production code.
7. Traceability summary
| models.md clause | Covered by |
|---|---|
| §2 (Interface) | U5–U37 (every dataclass built with full spec signature) |
| §3.1 Immutability | U8, U15, U20, U25, U30, U36, P4 |
| §3.2 Equality & hashing | U4, U11, U12, U18, U21, U26, U31, U37 |
§3.3 replace pattern |
U34, U35 |
| §3.4 Serialization round-trip | U38, U39, U40, P5 |
| §3.5 Field invariants (constructible-at-boundary subset) | P1, P2, P3, P6 |
| §4.1–4.7 (per-class field tables) | U1–U37 (one class per subsection) |
| §5 Error modes | U8, U10, U12, U15, U18, U20, U25, U26, U30, U31, U36, U37 |
| §7 Edge 1 (empty tuples at turn 0) | U28, U29 |
| §7 Edge 2 (empty strings vs None) | U28 |
| §7 Edge 3 (Unicode round-trip) | U7, U27, U38 |
| §7 Edge 4 (nested tool_args) | U38, U39 |
| §7 Edge 5 (large history) | N/A at models layer — deferred to env_tests.md §integration |
| §7 Edge 6 (SUBMIT + confidence=None) | U13 |
| §7 Edge 7 (non-ok with empty response) | U17 |
| §7 Edge 8 (from_version == to_version) | N/A at models layer — deferred to drift_injector_tests.md |
§7 Edge 9 (len(actions) != turn) |
P1 (witnesses constructibility of the good case; violation-detection deferred to env_tests.md) |
| §7 Edge 10 (auth-drifted tool still listed) | N/A at models layer — deferred to env_tests.md |
| §8.1–8.4 (worked examples) | U5, U14, U19, U28, U33, U34 |
Every models.md clause is either directly tested here or explicitly deferred to a named downstream test plan with a reason. No clause is silently unaddressed.
8. Open questions
None. The test surface for models.py is bounded by the module's pure-shape nature: 40 unit tests + 6 properties + 9 fixtures + 100%/95% coverage exhaustively cover every field, every frozen guarantee, every hashability rule, every JSON round-trip invariant, and every edge case that can be witnessed at construction time. All edge cases requiring runtime enforcement (env.step validation, drift injector rules) are explicitly routed to their home test plans.