CallMeDaniel Claude Opus 4.6 (1M context) committed on Apr 12

Commit

1923201

1 Parent(s): 46fb80d

docs: add pydantic unification implementation plan


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (1)

docs/superpowers/plans/2026-04-13-pydantic-unification.md +1828 -0


	@@ -0,0 +1,1828 @@

+# Pydantic Unification Implementation Plan
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+**Goal:** Replace all raw dicts, dataclasses, and untyped state passing with Pydantic models across the NeuralCAD codebase.
+**Architecture:** Bottom-up migration: define new models first, convert existing dataclasses, then update function signatures and consumers from core -> agents -> server. Each task produces a working, test-passing codebase.
+**Tech Stack:** Pydantic v2 (BaseModel), pytest, FastAPI
+---
+## File Structure
+| File | Changes |
+|------|---------|
+| `core/cam.py` | Add `ToolConfig` model; update `CAMResult.tool_config`, `CAMPlan.to_tool_config()`, `generate_gcode()` |
+| `core/types.py` | Remove `AgentResponse` and `ChatResult` dataclasses; keep enums + `LLMBackend` ABC |
+| `core/pipeline.py` | Convert `PipelineResult` from dataclass to Pydantic |
+| `agents/definitions.py` | Convert `AgentDef` from dataclass to Pydantic |
+| `agents/agent_flow.py` | Add `PreviewData` and `ChatTurnResponse` models |
+| `agents/gap_analyzer.py` | Change `analyze_gaps()` to accept `list[AgentResponse]` |
+| `agents/design_state.py` | Change `update_from_messages()` / `extract_decisions()` to accept `list[AgentResponse]` |
+| `agents/tools.py` | Change `set/get_design_state()` to use `DesignState`; update `GenerateGcodeTool` |
+| `agents/base.py` | Update `BaseOrchestrator.chat_turn()` signature |
+| `agents/orchestrator.py` | Replace `_format_response()` with `AgentResponse.from_agent()`; update `MockChatBackend` |
+| `agents/crew_orchestrator.py` | Return `ChatTurnResponse` throughout; stop `.model_dump()` serialization of intermediate models |
+| `server/routes.py` | Type `design_state` and `plan` fields in request models; return `ChatTurnResponse` |
+| `server/mcp.py` | Use typed models in MCP tool responses |
+| `tests/test_cam.py` | Update for `ToolConfig` |
+| `tests/test_types.py` | Remove `AgentResponse`/`ChatResult` tests; keep enum + ABC tests |
+| `tests/test_gap_analyzer.py` | Use `AgentResponse` objects instead of dicts |
+| `tests/test_design_state.py` | Use `AgentResponse` objects instead of dicts |
+| `tests/test_mock_orchestrator.py` | Assert on `ChatTurnResponse` attributes instead of dict keys |
+| `tests/test_crew_orchestrator.py` | Assert on `ChatTurnResponse` attributes instead of dict keys |
+| `tests/test_api_routes.py` | Verify JSON shape still matches (routes serialize for HTTP) |
+| `tests/conftest.py` | Update `populated_design_state` fixture to return `DesignState` |
+---
+### Task 1: Add ToolConfig Model to core/cam.py
+**Files:**
+- Modify: `core/cam.py:12-19` (CAMResult), `core/cam.py:32-39` (CAMPlan.to_tool_config), `core/cam.py:45-50` (_get_default_tool_config), `core/cam.py:63-68` (generate_gcode)
+- Modify: `agents/tools.py:99-109` (GenerateGcodeTool._run)
+- Test: `tests/test_cam.py`
+- [ ] **Step 1: Write failing test for ToolConfig**
+In `tests/test_cam.py`, add after `TestCAMPlan`:
+```python
+from core.cam import ToolConfig
+class TestToolConfig:
+    def test_default_values(self):
+        tc = ToolConfig()
+        assert tc.diameter == 6.0
+        assert tc.h_feed == 800
+        assert tc.v_feed == 200
+        assert tc.speed == 18000
+    def test_custom_values(self):
+        tc = ToolConfig(diameter=3.0, h_feed=400, v_feed=100, speed=24000)
+        assert tc.diameter == 3.0
+        assert tc.h_feed == 400
+    def test_model_dump(self):
+        tc = ToolConfig()
+        d = tc.model_dump()
+        assert d == {"diameter": 6.0, "h_feed": 800, "v_feed": 200, "speed": 18000}
+```
+- [ ] **Step 2: Run test to verify it fails**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_cam.py::TestToolConfig -v`
+Expected: FAIL with `ImportError: cannot import name 'ToolConfig'`
+- [ ] **Step 3: Create ToolConfig model and update CAMResult, CAMPlan, generate_gcode**
+In `core/cam.py`, add the `ToolConfig` model after the imports, before `CAMResult`:
+```python
+class ToolConfig(BaseModel):
+    """CNC tool configuration for G-code generation."""
+    diameter: float = 6.0
+    h_feed: float = 800
+    v_feed: float = 200
+    speed: float = 18000
+```
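As a minimal sketch of what the typed model buys over a raw dict, the snippet below redefines `ToolConfig` with the same fields as this step and exercises Pydantic v2's default (lax) validation. It assumes Pydantic v2 is installed; the sample values are illustrative only and not taken from `config.yaml`.

```python
from pydantic import BaseModel, ValidationError


class ToolConfig(BaseModel):
    """CNC tool configuration (same fields as the model in this step)."""
    diameter: float = 6.0
    h_feed: float = 800
    v_feed: float = 200
    speed: float = 18000


# A dict loaded from YAML or JSON can be unpacked straight into the model.
# Numeric strings are coerced to float in lax mode.
raw = {"diameter": "3.175", "h_feed": 400}
tc = ToolConfig(**raw)
print(tc.diameter, tc.h_feed, tc.v_feed)

# Bad input fails at construction time instead of deep inside G-code generation.
try:
    ToolConfig(diameter="six")
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"rejected with {exc.error_count()} error(s)")
```

With a plain dict, the `"six"` value would only surface later, when the tool object is built.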
+Change `CAMResult.tool_config` from `dict` to `ToolConfig`:
+```python
+class CAMResult(BaseModel):
+    """Result of G-code generation from a CadQuery shape."""
+    success: bool
+    gcode: str | None = None
+    operations: list[str] = Field(default_factory=list)
+    tool_config: ToolConfig = Field(default_factory=ToolConfig)
+    post_processor: str = "grbl"
+    error: str | None = None
+```
+Change `CAMPlan.to_tool_config()` to return `ToolConfig`:
+```python
+    def to_tool_config(self) -> ToolConfig:
+        """Convert to ToolConfig for generate_gcode()."""
+        return ToolConfig(
+            diameter=self.tool_diameter,
+            h_feed=self.tool_h_feed,
+            v_feed=self.tool_v_feed,
+            speed=self.tool_speed,
+        )
+```
+Change `_get_default_tool_config()` to return `ToolConfig`:
+```python
+def _get_default_tool_config() -> ToolConfig:
+    """Load default roughing tool config from config.yaml cam section."""
+    roughing = settings.cam.tools.get("roughing")
+    if roughing:
+        return ToolConfig(**roughing.model_dump())
+    return ToolConfig()
+```
+Change `generate_gcode()` signature:
+```python
+def generate_gcode(
+    shape,
+    operations: list[str],
+    tool_config: ToolConfig | None = None,
+    post_processor: str | None = None,
+    stock_offset_mm: float | None = None,
+) -> CAMResult:
+```
+And update the body to access `tool_config.diameter` etc. instead of `.get()`:
+```python
+        tool = Endmill(
+            diameter=tool_config.diameter,
+            h_feed=tool_config.h_feed,
+            v_feed=tool_config.v_feed,
+            speed=tool_config.speed,
+        )
+```
+- [ ] **Step 4: Update GenerateGcodeTool in agents/tools.py**
+Change `GenerateGcodeTool._run()` (lines 99-109):
+```python
+    def _run(self, operations: list[str], tool_diameter: float = 6.0, post_processor: str = "grbl") -> str:
+        from core.cam import generate_gcode, ToolConfig
+        shape = get_last_shape()
+        if shape is None:
+            return json.dumps({"success": False, "error": "No shape available. Run Execute CadQuery Code first."})
+        tool_config = ToolConfig(diameter=tool_diameter, h_feed=800, v_feed=200, speed=18000)
+        result = generate_gcode(
+            shape=shape, operations=operations,
+            tool_config=tool_config, post_processor=post_processor,
+        )
+        return json.dumps(result.model_dump(), indent=2)
+```
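The `json.dumps(result.model_dump(), ...)` call above works as long as every field is JSON-native. The sketch below shows what can happen once a model carries a non-JSON type, and two Pydantic v2 alternatives. `ExportResult` is a hypothetical stand-in, not a class from this codebase.

```python
import json
from pathlib import Path

from pydantic import BaseModel


class ExportResult(BaseModel):
    """Hypothetical stand-in for a result model that carries a non-JSON type."""
    success: bool
    path: Path


r = ExportResult(success=True, path=Path("out/bracket.nc"))

# Python-mode dump keeps the Path object, which json.dumps cannot encode.
try:
    json.dumps(r.model_dump())
    python_mode_ok = True
except TypeError:
    python_mode_ok = False

# JSON-mode dump converts Path to str first, so both of these succeed.
via_dict = json.dumps(r.model_dump(mode="json"), indent=2)
via_model = r.model_dump_json(indent=2)
print(via_model)
```

This matters mainly for models such as `PipelineResult`, whose `exported_files` field holds `Path` values.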
+- [ ] **Step 5: Update existing CAM tests for ToolConfig type**
+In `tests/test_cam.py`, update `TestCAMResult`:
+```python
+class TestCAMResult:
+    def test_success_result(self):
+        r = CAMResult(
+            success=True,
+            gcode="G21 G90\nG00 X0 Y0 Z10\nM30",
+            operations=["pocket", "profile"],
+            tool_config=ToolConfig(diameter=6, h_feed=800),
+            post_processor="grbl",
+        )
+        assert r.success is True
+        assert "G21" in r.gcode
+        assert r.operations == ["pocket", "profile"]
+        assert r.error is None
+    def test_failure_result(self):
+        r = CAMResult(success=False, error="ocp-freecad-cam not available")
+        assert r.success is False
+        assert r.gcode is None
+        assert r.error == "ocp-freecad-cam not available"
+    def test_model_dump(self):
+        r = CAMResult(
+            success=True, gcode="G21 G90\nM30",
+            operations=["pocket"], tool_config=ToolConfig(diameter=6),
+        )
+        d = r.model_dump()
+        assert d["success"] is True
+        assert d["gcode"] == "G21 G90\nM30"
+        assert d["operations"] == ["pocket"]
+        assert d["tool_config"]["diameter"] == 6
+        assert d["post_processor"] == "grbl"
+        assert d["error"] is None
+```
+Update `TestGenerateGcode`:
+```python
+class TestGenerateGcode:
+    def test_returns_failure_when_ocp_not_available(self):
+        mock_shape = MagicMock()
+        result = generate_gcode(
+            shape=mock_shape,
+            operations=["pocket"],
+            tool_config=ToolConfig(diameter=6, h_feed=800, v_feed=200, speed=18000),
+            post_processor="grbl",
+        )
+        assert result.success is False
+        assert "not available" in result.error.lower() or "not installed" in result.error.lower()
+    def test_returns_failure_on_empty_operations(self):
+        mock_shape = MagicMock()
+        result = generate_gcode(shape=mock_shape, operations=[])
+        assert result.success is False
+        assert "no operations" in result.error.lower()
+    def test_uses_default_tool_config_when_none(self):
+        mock_shape = MagicMock()
+        result = generate_gcode(shape=mock_shape, operations=["pocket"], tool_config=None)
+        assert result.tool_config is not None
+        assert result.tool_config.diameter > 0
+    def test_uses_default_post_processor(self):
+        mock_shape = MagicMock()
+        result = generate_gcode(shape=mock_shape, operations=["pocket"])
+        assert result.post_processor == "grbl"
+```
+Update `TestCAMPlan.test_to_tool_config`:
+```python
+    def test_to_tool_config(self):
+        plan = CAMPlan(operations=["pocket"])
+        config = plan.to_tool_config()
+        assert isinstance(config, ToolConfig)
+        assert config.diameter == 6.0
+        assert config.h_feed == 800
+        assert config.v_feed == 200
+        assert config.speed == 18000
+```
+- [ ] **Step 6: Run all CAM and tool tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_cam.py tests/test_tools.py -v`
+Expected: ALL PASS
+- [ ] **Step 7: Commit**
+```bash
+git add core/cam.py agents/tools.py tests/test_cam.py
+git commit -m "refactor: add ToolConfig pydantic model, replace tool_config dicts"
+```
+---
+### Task 2: Convert AgentDef from Dataclass to Pydantic
+**Files:**
+- Modify: `agents/definitions.py`
+- Test: `tests/test_types.py` (verify no breakage from import changes)
+- [ ] **Step 1: Write failing test for AgentDef as Pydantic model**
+In `tests/test_types.py`, add:
+```python
+from agents.definitions import AgentDef
+class TestAgentDefModel:
+    def test_create(self):
+        ad = AgentDef(id="design", name="Design", role="Designer", color="#fff", avatar="D", goal="g", backstory="b")
+        assert ad.id == "design"
+        assert ad.name == "Design"
+    def test_model_dump(self):
+        ad = AgentDef(id="cad", name="CAD", role="Coder", color="#000", avatar="C", goal="g", backstory="b")
+        d = ad.model_dump()
+        assert d["id"] == "cad"
+        assert "role" in d
+```
+- [ ] **Step 2: Run test to verify it fails**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_types.py::TestAgentDefModel -v`
+Expected: FAIL with `AttributeError: 'AgentDef' object has no attribute 'model_dump'`
+- [ ] **Step 3: Convert AgentDef to Pydantic BaseModel**
+In `agents/definitions.py`, replace:
+```python
+from dataclasses import dataclass
+from config.settings import settings
+@dataclass
+class AgentDef:
+    """Definition of a chat agent."""
+    id: str
+    name: str
+    role: str
+    color: str
+    avatar: str
+    goal: str
+    backstory: str
+```
+With:
+```python
+from pydantic import BaseModel
+from config.settings import settings
+class AgentDef(BaseModel):
+    """Definition of a chat agent."""
+    id: str
+    name: str
+    role: str
+    color: str
+    avatar: str
+    goal: str
+    backstory: str
+```
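One thing that may be worth checking before this conversion (an assumption, since the `AgentDef(...)` call sites are not shown here): dataclasses accept positional arguments, while a Pydantic `BaseModel` accepts keyword arguments only. The sketch below uses trimmed two-field stand-ins, `AgentDefOld` and `AgentDefNew`, to show the difference.

```python
from dataclasses import asdict, dataclass

from pydantic import BaseModel


@dataclass
class AgentDefOld:
    """Dataclass version (trimmed to two fields for brevity)."""
    id: str
    name: str


class AgentDefNew(BaseModel):
    """Pydantic version with the same two fields."""
    id: str
    name: str


# Dataclasses accept positional arguments...
old = AgentDefOld("design", "Design")

# ...but BaseModel.__init__ is keyword-only, so the same call raises TypeError.
try:
    AgentDefNew("design", "Design")
    positional_ok = True
except TypeError:
    positional_ok = False

new = AgentDefNew(id="design", name="Design")

# asdict() on the dataclass and model_dump() on the model give the same dict.
print(asdict(old) == new.model_dump(), positional_ok)
```

If any existing definitions are built positionally, they would need keyword arguments after this step.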
+- [ ] **Step 4: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_types.py::TestAgentDefModel tests/test_agent_flow.py -v`
+Expected: ALL PASS
+- [ ] **Step 5: Commit**
+```bash
+git add agents/definitions.py tests/test_types.py
+git commit -m "refactor: convert AgentDef from dataclass to Pydantic BaseModel"
+```
+---
+### Task 3: Convert PipelineResult from Dataclass to Pydantic
+**Files:**
+- Modify: `core/pipeline.py:28-56`
+- Test: `tests/test_pipeline.py`
+- [ ] **Step 1: Write failing test for PipelineResult.model_dump**
+In `tests/test_pipeline.py`, add (or modify existing):
+```python
+from core.pipeline import PipelineResult
+from core.executor import ExecutionResult
+class TestPipelineResultModel:
+    def test_model_dump(self):
+        exec_result = ExecutionResult(success=True, volume=1000.0, bounding_box=(10, 10, 10), face_count=6, edge_count=12)
+        pr = PipelineResult(
+            prompt="test",
+            generated_code="code",
+            execution=exec_result,
+            retry_count=0,
+        )
+        d = pr.model_dump()
+        assert d["prompt"] == "test"
+        assert d["retry_count"] == 0
+    def test_default_exported_files(self):
+        exec_result = ExecutionResult(success=False, error="fail")
+        pr = PipelineResult(prompt="test", generated_code="code", execution=exec_result)
+        assert pr.exported_files == {}
+        assert pr.validation is None
+```
+- [ ] **Step 2: Run test to verify it fails**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_pipeline.py::TestPipelineResultModel -v`
+Expected: FAIL with `AttributeError: 'PipelineResult' object has no attribute 'model_dump'`
+- [ ] **Step 3: Convert PipelineResult to Pydantic**
+In `core/pipeline.py`, replace the dataclass imports:
+```python
+from dataclasses import dataclass
+from pathlib import Path
+from typing import Optional
+```
+With:
+```python
+from pathlib import Path
+from typing import Optional
+from pydantic import BaseModel, Field
+```
+Replace the `PipelineResult` class:
+```python
+class PipelineResult(BaseModel):
+    model_config = {"arbitrary_types_allowed": True}
+    prompt: str
+    generated_code: str
+    execution: ExecutionResult
+    validation: Optional[CNCValidationResult] = None
+    exported_files: dict[str, Path] = Field(default_factory=dict)
+    retry_count: int = 0
+    def summary(self) -> str:
+        lines = [
+            "=" * 60,
+            "TEXT-TO-CNC PIPELINE RESULT",
+            "=" * 60,
+            f"Prompt: {self.prompt}",
+            f"Retries: {self.retry_count}",
+            "",
+            "-- Execution --",
+            self.execution.summary(),
+            "",
+        ]
+        if self.validation:
+            lines += ["-- CNC Validation --", self.validation.summary(), ""]
+        if self.exported_files:
+            lines += ["-- Exported Files --"]
+            for fmt, path in self.exported_files.items():
+                lines.append(f"  {fmt.upper()}: {path}")
+        lines.append("=" * 60)
+        return "\n".join(lines)
+```
+Note: `arbitrary_types_allowed` is needed because `ExecutionResult` contains `cq.Workplane`.
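The note above can be illustrated with a stand-in class. This is a sketch under the assumption that Pydantic v2 is in use; `Workplane` here is a hypothetical empty class standing in for `cq.Workplane`, and `Strict` / `Lenient` are illustrative names only.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError
from pydantic.errors import PydanticSchemaGenerationError


class Workplane:
    """Hypothetical stand-in for cq.Workplane (a plain, non-Pydantic class)."""


# Without the config flag, Pydantic cannot build a schema for the field.
try:
    class Strict(BaseModel):
        shape: Optional[Workplane] = None
    schema_error = False
except PydanticSchemaGenerationError:
    schema_error = True


class Lenient(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    shape: Optional[Workplane] = None


ok = Lenient(shape=Workplane())

# The flag only relaxes schema generation; values are still isinstance-checked.
try:
    Lenient(shape="not a workplane")
    wrong_type_rejected = False
except ValidationError:
    wrong_type_rejected = True

print(schema_error, wrong_type_rejected)
```

`ConfigDict(...)` and the plain dict form used in the plan are both accepted by Pydantic v2; `ConfigDict` just adds type checking of the keys.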
+- [ ] **Step 4: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_pipeline.py -v`
+Expected: ALL PASS
+- [ ] **Step 5: Commit**
+```bash
+git add core/pipeline.py tests/test_pipeline.py
+git commit -m "refactor: convert PipelineResult from dataclass to Pydantic BaseModel"
+```
+---
+### Task 4: Remove Duplicate Dataclasses from core/types.py
+**Files:**
+- Modify: `core/types.py:1-87`
+- Modify: `tests/test_types.py`
+The `AgentResponse` dataclass in `core/types.py` is duplicated by the Pydantic version in `agents/agent_flow.py`. The `ChatResult` dataclass is superseded by `ChatTurnResponse`, which Task 5 adds. No non-test code imports `AgentResponse` or `ChatResult` from `core/types.py` (orchestrators import from `agents/agent_flow.py`); Step 1 verifies this.
+- [ ] **Step 1: Verify no imports of AgentResponse/ChatResult from core/types**
+Run: `cd /home/daniel/NeuralCAD && grep -rn "from core.types import.*AgentResponse\|from core.types import.*ChatResult" --include="*.py" | grep -v test | grep -v __pycache__`
+Expected: No matches (only test files reference them)
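The grep pattern above only matches when the imported name sits on the same line as `from core.types import`. As a hedged alternative, the sketch below uses the standard-library `ast` module to also catch parenthesised multi-line imports. `find_imports` is a hypothetical helper, not part of the codebase, and it does not detect `import core.types` followed by attribute access.

```python
import ast
import tempfile
from pathlib import Path

TARGET_MODULE = "core.types"
TARGET_NAMES = {"AgentResponse", "ChatResult"}


def find_imports(root: Path) -> list[tuple[str, int, str]]:
    """Return (file, line, name) for each matching `from core.types import X`."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom) and node.module == TARGET_MODULE:
                for alias in node.names:
                    if alias.name in TARGET_NAMES:
                        rel = path.relative_to(root).as_posix()
                        hits.append((rel, node.lineno, alias.name))
    return hits


# Demo on a throwaway tree; in the repo you would call find_imports(Path(".")).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "app.py").write_text("from core.types import BackendName\n")
    (root / "legacy.py").write_text(
        "from core.types import (\n    BackendName,\n    ChatResult,\n)\n"
    )
    hits = find_imports(root)
    print(hits)
```

In the demo, `legacy.py` is reported even though `ChatResult` appears two lines below the `from` keyword.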
+- [ ] **Step 2: Remove AgentResponse and ChatResult from core/types.py**
+Replace `core/types.py` with:
+```python
+"""Shared types, enums, and ABCs for NeuralCAD."""
+from __future__ import annotations
+from abc import ABC, abstractmethod
+from enum import Enum
+from pathlib import Path
+class BackendName(str, Enum):
+    MOCK = "mock"
+    ANTHROPIC = "anthropic"
+    OPENAI = "openai"
+    GEMINI = "gemini"
+class AgentId(str, Enum):
+    DESIGN = "design"
+    ENGINEERING = "engineering"
+    CNC = "cnc"
+    CAD = "cad"
+class LLMBackend(ABC):
+    """Abstract base class for LLM code generation backends."""
+    @abstractmethod
+    def generate(self, messages: list[dict]) -> str:
+        """Generate text from a list of messages."""
+        ...
+    def generate_with_image(self, messages: list[dict], image_path: str | Path) -> str:
+        """Generate text from messages that include an image."""
+        raise NotImplementedError(
+            f"{type(self).__name__} does not support image input"
+        )
+    @staticmethod
+    def split_system_message(messages: list[dict]) -> tuple[str, list[dict]]:
+        """Extract system message from a message list."""
+        system_msg = ""
+        user_messages = []
+        for m in messages:
+            if m["role"] == "system":
+                system_msg = m["content"]
+            else:
+                user_messages.append(m)
+        return system_msg, user_messages
+```
+- [ ] **Step 3: Update test_types.py**
+Remove `TestAgentResponse` and `TestChatResult` classes. Keep `TestEnums`, `TestLLMBackendABC`, and the new `TestAgentDefModel`. Update imports:
+```python
+"""Tests for core/types.py — enums and ABC."""
+import pytest
+from core.types import BackendName, AgentId, LLMBackend
+from agents.definitions import AgentDef
+class TestEnums:
+    def test_backend_names(self):
+        assert BackendName.MOCK == "mock"
+        assert BackendName.ANTHROPIC == "anthropic"
+        assert BackendName.OPENAI == "openai"
+        assert BackendName.GEMINI == "gemini"
+    def test_agent_ids(self):
+        assert AgentId.DESIGN == "design"
+        assert AgentId.ENGINEERING == "engineering"
+        assert AgentId.CNC == "cnc"
+        assert AgentId.CAD == "cad"
+    def test_backend_name_is_string(self):
+        assert isinstance(BackendName.MOCK, str)
+        assert BackendName.MOCK in {"mock", "anthropic"}
+class TestAgentDefModel:
+    def test_create(self):
+        ad = AgentDef(id="design", name="Design", role="Designer", color="#fff", avatar="D", goal="g", backstory="b")
+        assert ad.id == "design"
+        assert ad.name == "Design"
+    def test_model_dump(self):
+        ad = AgentDef(id="cad", name="CAD", role="Coder", color="#000", avatar="C", goal="g", backstory="b")
+        d = ad.model_dump()
+        assert d["id"] == "cad"
+        assert "role" in d
+class TestLLMBackendABC:
+    def test_cannot_instantiate(self):
+        with pytest.raises(TypeError):
+            LLMBackend()
+    def test_subclass_must_implement_generate(self):
+        class Incomplete(LLMBackend):
+            pass
+        with pytest.raises(TypeError):
+            Incomplete()
+    def test_subclass_with_generate(self):
+        class Complete(LLMBackend):
+            def generate(self, messages):
+                return "ok"
+        b = Complete()
+        assert b.generate([]) == "ok"
+    def test_split_system_message(self):
+        msgs = [
+            {"role": "system", "content": "You are a bot"},
+            {"role": "user", "content": "hello"},
+        ]
+        system, rest = LLMBackend.split_system_message(msgs)
+        assert system == "You are a bot"
+        assert len(rest) == 1
+        assert rest[0]["role"] == "user"
+    def test_split_system_message_no_system(self):
+        msgs = [{"role": "user", "content": "hello"}]
+        system, rest = LLMBackend.split_system_message(msgs)
+        assert system == ""
+        assert len(rest) == 1
+```
+- [ ] **Step 4: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_types.py -v`
+Expected: ALL PASS
+- [ ] **Step 5: Commit**
+```bash
+git add core/types.py tests/test_types.py
+git commit -m "refactor: remove duplicate AgentResponse/ChatResult dataclasses from core/types"
+```
+---
+### Task 5: Add PreviewData and ChatTurnResponse Models
+**Files:**
+- Modify: `agents/agent_flow.py` (add models after `AgentFlowState`)
+- Test: `tests/test_agent_flow.py`
+- [ ] **Step 1: Write failing tests for PreviewData and ChatTurnResponse**
+In `tests/test_agent_flow.py`, add:
+```python
+from agents.agent_flow import AgentResponse, ChatTurnResponse, PreviewData
+from agents.design_state import DesignState
+from agents.gap_analyzer import QuestionCard
+class TestPreviewData:
+    def test_success_preview(self):
+        p = PreviewData(
+            success=True,
+            part_name="bracket",
+            stl_url="/api/models/bracket.stl",
+            step_url="/api/models/bracket.step",
+            execution={"success": True, "volume_mm3": 1000.0},
+            validation={"machinable": True, "axis_recommendation": "3-axis"},
+        )
+        assert p.success is True
+        assert p.part_name == "bracket"
+    def test_failure_preview(self):
+        p = PreviewData(success=False, error="Execution failed")
+        assert p.success is False
+        assert p.error == "Execution failed"
+    def test_model_dump(self):
+        p = PreviewData(success=True, part_name="gear")
+        d = p.model_dump()
+        assert d["success"] is True
+        assert d["cam"] is None
+        assert d["gcode_url"] is None
+class TestChatTurnResponse:
+    def test_minimal(self):
+        r = ChatTurnResponse(design_state=DesignState())
+        assert r.responses == []
+        assert r.preview is None
+        assert r.question_cards == []
+    def test_full(self):
+        resp = AgentResponse(agent_id="design", agent_name="D", message="hi", color="#fff", avatar="D")
+        preview = PreviewData(success=True, part_name="test")
+        state = DesignState(material="aluminum")
+        card = QuestionCard(category="material", question="What material?", responsible_agent="engineering", agent_name="Eng", agent_color="#00e676")
+        r = ChatTurnResponse(responses=[resp], preview=preview, design_state=state, question_cards=[card])
+        assert len(r.responses) == 1
+        assert r.preview.part_name == "test"
+        assert r.design_state.material == "aluminum"
+        assert len(r.question_cards) == 1
+    def test_model_dump_roundtrip(self):
+        state = DesignState(part_name="bracket", material="steel")
+        r = ChatTurnResponse(design_state=state)
+        d = r.model_dump()
+        assert d["design_state"]["part_name"] == "bracket"
+        assert d["responses"] == []
+        assert d["preview"] is None
+```
+- [ ] **Step 2: Run test to verify it fails**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_agent_flow.py::TestPreviewData -v`
+Expected: FAIL with `ImportError: cannot import name 'PreviewData'`
+- [ ] **Step 3: Add PreviewData and ChatTurnResponse to agents/agent_flow.py**
+After the `AgentFlowState.model_rebuild()` line (line 89), add:
+```python
+from agents.gap_analyzer import QuestionCard
+class PreviewData(BaseModel):
+    """Preview data for a generated CAD model, sent to the frontend."""
+    success: bool
+    part_name: str = ""
+    stl_url: str = ""
+    step_url: str = ""
+    threemf_url: str = ""
+    execution: dict = Field(default_factory=dict)
+    validation: dict = Field(default_factory=dict)
+    cam: dict | None = None
+    gcode_url: str | None = None
+    error: str | None = None
+class ChatTurnResponse(BaseModel):
+    """Unified response envelope from all orchestrator chat_turn() methods."""
+    responses: list[AgentResponse] = Field(default_factory=list)
+    preview: PreviewData | None = None
+    design_state: "DesignState"
+    question_cards: list[QuestionCard] = Field(default_factory=list)
+```
+Add the forward-ref resolution after the class (similar to CAMPlan pattern):
+```python
+from agents.design_state import DesignState  # noqa: E402
+ChatTurnResponse.model_rebuild()
+```
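The forward-reference pattern above can be tried in isolation. In this sketch, `Envelope` and `State` are hypothetical stand-ins for `ChatTurnResponse` and `DesignState`; it assumes Pydantic v2 and also shows a dump-and-validate round trip.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Envelope(BaseModel):
    """Stand-in for ChatTurnResponse: refers to a class defined later."""
    state: "State"
    notes: list[str] = Field(default_factory=list)


class State(BaseModel):
    """Stand-in for DesignState."""
    part_name: Optional[str] = None
    material: Optional[str] = None


# Resolve the "State" forward reference now that the class exists.
Envelope.model_rebuild()

env = Envelope(state=State(part_name="bracket", material="steel"))
dumped = env.model_dump()

# A true round trip: the dumped dict validates back into an equal model.
restored = Envelope.model_validate(dumped)
print(restored == env)
```

The same `model_validate(model_dump())` check could be added to `test_model_dump_roundtrip` if a full round trip is wanted there.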
+- [ ] **Step 4: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_agent_flow.py -v`
+Expected: ALL PASS
+- [ ] **Step 5: Commit**
+```bash
+git add agents/agent_flow.py tests/test_agent_flow.py
+git commit -m "feat: add PreviewData and ChatTurnResponse pydantic models"
+```
+---
+### Task 6: Update analyze_gaps and extract_decisions to Accept AgentResponse
+**Files:**
+- Modify: `agents/gap_analyzer.py:88` (`analyze_gaps` signature)
+- Modify: `agents/design_state.py:136-140` (`update_from_messages` signature), `agents/design_state.py:275-281` (`extract_decisions` signature)
+- Test: `tests/test_gap_analyzer.py`, `tests/test_design_state.py`
+- [ ] **Step 1: Update test_gap_analyzer.py to use AgentResponse objects**
+Replace all dict literals with `AgentResponse` objects. Import at top:
+```python
+from agents.agent_flow import AgentResponse
+```
+Replace every test's response dicts. For example, in `test_no_gaps_when_no_not_ready`:
+```python
+    def test_no_gaps_when_no_not_ready(self):
+        responses = [
+            AgentResponse(agent_id="design", agent_name="Design", message="I suggest an L-bracket design.", color="#7c3aed", avatar="DA"),
+            AgentResponse(agent_id="engineering", agent_name="Engineering", message="Aluminum 6061 would work well.", color="#2979ff", avatar="EA"),
+        ]
+        result = analyze_gaps(responses)
+        assert not result.has_gaps
+        assert result.missing_items == []
+```
+For `test_detects_not_ready_from_cad`:
+```python
+    def test_detects_not_ready_from_cad(self):
+        responses = [
+            AgentResponse(agent_id="cad", agent_name="CAD", message="NOT READY: Need dimensions (width, height) and material selection.", color="#ffab40", avatar="CC"),
+        ]
+        result = analyze_gaps(responses)
+        assert result.has_gaps
+        categories = [item.category for item in result.missing_items]
+        assert "dimension" in categories
+        assert "material" in categories
+```
+Apply the same pattern to all other test methods: `test_detects_not_ready_from_cnc`, `test_detects_not_ready_from_cam`, `test_deduplicates_across_agents`, `test_case_insensitive_not_ready`, `test_no_false_positive_on_regular_message`. Each dict `{"agent_id": ..., "message": ...}` becomes `AgentResponse(agent_id=..., agent_name="X", message=..., color="#000", avatar="X")`.
+- [ ] **Step 2: Update test_design_state.py to use AgentResponse objects**
+Import at top:
+```python
+from agents.agent_flow import AgentResponse
+```
+Replace every response dict in `TestExtractDecisions`. For example:
+```python
+    def test_extracts_material(self):
+        responses = [
+            AgentResponse(agent_id="engineering", agent_name="Engineering", message="I recommend aluminum 6061 for this application.", color="#2979ff", avatar="EA"),
+        ]
+        state = extract_decisions(responses, DesignState())
+        assert "aluminum" in state.material.lower()
+```
+Apply to all: `test_extracts_fastener_features`, `test_extracts_axis_recommendation`, `test_preserves_existing_state`, `test_extracts_decisions_from_agreement`, `test_no_duplicate_features`. Each dict becomes an `AgentResponse(...)`.
+- [ ] **Step 3: Run tests to verify they fail**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_gap_analyzer.py tests/test_design_state.py -v`
+Expected: FAIL — `analyze_gaps` and `extract_decisions` still expect dicts
+- [ ] **Step 4: Update analyze_gaps signature in gap_analyzer.py**
+In `agents/gap_analyzer.py`, change the function signature and body.
+Add a type-only import at the top of the file:
+```python
+from __future__ import annotations
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from agents.agent_flow import AgentResponse
+```
+Keep this import under `TYPE_CHECKING`. After Task 5, `agents/agent_flow.py` imports `QuestionCard` from `agents/gap_analyzer.py`, so a runtime `from agents.agent_flow import AgentResponse` here would create a circular import that can fail depending on which module is loaded first. With `from __future__ import annotations` the annotation is never evaluated at runtime, and the function body only needs attribute access on each response.
+Change `analyze_gaps`:
+```python
+def analyze_gaps(responses: list[AgentResponse]) -> GapAnalysis:
+```
+Update the body — change `response.get("message", "")` to `response.message` and `response.get("agent_id", "")` to `response.agent_id`:
+```python
+    for response in responses:
+        message: str = response.message
+        agent_id: str = response.agent_id
+```
+- [ ] **Step 5: Update update_from_messages and extract_decisions in design_state.py**
+In `agents/design_state.py`, add a type-only import to avoid a circular import (`design_state` is imported by `agent_flow`):
+```python
+from __future__ import annotations
+from typing import TYPE_CHECKING
+if TYPE_CHECKING:
+    from agents.agent_flow import AgentResponse
+```
+Change `update_from_messages` signature:
+```python
+    def update_from_messages(
+        self,
+        agent_responses: list[AgentResponse],
+        user_message: str = "",
+    ) -> DesignState:
+```
+Update body — change `r.get("message", "")` to `r.message`:
+```python
+        all_text = user_message + " " + " ".join(r.message for r in agent_responses)
+```
+And in the decisions extraction loop:
+```python
+        for resp in agent_responses:
+            msg = resp.message
+```
+Change `extract_decisions` wrapper:
+```python
+def extract_decisions(
+    agent_responses: list[AgentResponse],
+    current_state: DesignState,
+    user_message: str = "",
+) -> DesignState:
+```
+Note: `design_state.py` already has `from __future__ import annotations` at line 1, so the TYPE_CHECKING import will work fine for type hints without runtime import.
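A small sketch of why the type-only import is enough at runtime. It uses a quoted annotation instead of `from __future__ import annotations` so the snippet runs standalone; `join_messages` and `FakeResponse` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen only by type checkers; never executed, so no import cycle at runtime.
    # "agents.agent_flow" is the module named in this plan.
    from agents.agent_flow import AgentResponse


def join_messages(agent_responses: "list[AgentResponse]", user_message: str = "") -> str:
    """Mirror the all_text line above using attribute access, not dict .get()."""
    return user_message + " " + " ".join(r.message for r in agent_responses)


@dataclass
class FakeResponse:
    """Hypothetical stand-in; anything with a .message attribute works at runtime."""
    message: str


text = join_messages([FakeResponse("use aluminum"), FakeResponse("add M4 holes")], "bracket")
print(text)

# The annotation is stored as a plain string and never resolved at runtime.
print(join_messages.__annotations__["agent_responses"])
```

Because the annotation stays a string, the module never needs `AgentResponse` to exist when it is imported.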
+- [ ] **Step 6: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_gap_analyzer.py tests/test_design_state.py -v`
+Expected: ALL PASS
+- [ ] **Step 7: Commit**
+```bash
+git add agents/gap_analyzer.py agents/design_state.py tests/test_gap_analyzer.py tests/test_design_state.py
+git commit -m "refactor: type analyze_gaps and extract_decisions with AgentResponse"
+```
+---
+### Task 7: Update ContextVar State to Use DesignState
+**Files:**
+- Modify: `agents/tools.py:29-43` (ContextVar and accessors)
+- Modify: `agents/tools.py:112-178` (QueryDesignStateTool)
+- Test: `tests/test_tools.py`
+- [ ] **Step 1: Update ContextVar and accessors in agents/tools.py**
+Change lines 29-43:
+```python
+from agents.design_state import DesignState
+_last_shape_var: ContextVar[object | None] = ContextVar("last_shape", default=None)
+_design_state_var: ContextVar[DesignState | None] = ContextVar("design_state", default=None)
+def set_last_shape(shape):
+    _last_shape_var.set(shape)
+def get_last_shape():
+    return _last_shape_var.get()
+def set_design_state(state: DesignState):
+    _design_state_var.set(state)
+def get_design_state() -> DesignState | None:
+    return _design_state_var.get()
+```
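The accessor pattern above relies on `ContextVar` keeping a separate value per context. The standard-library sketch below shows that isolation; the `DesignState` dataclass and `handle_request` function are hypothetical stand-ins, not the real classes from `agents/`.

```python
import contextvars
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesignState:
    """Hypothetical stand-in for agents.design_state.DesignState."""
    material: str = ""


_design_state_var: ContextVar[Optional[DesignState]] = ContextVar("design_state", default=None)


def set_design_state(state: DesignState) -> None:
    _design_state_var.set(state)


def get_design_state() -> Optional[DesignState]:
    return _design_state_var.get()


def handle_request(material: str) -> str:
    """Set and read state inside one context, as a single chat turn might."""
    set_design_state(DesignState(material=material))
    return get_design_state().material


# Each copied context gets its own value; the outer context is untouched.
seen = contextvars.copy_context().run(handle_request, "aluminum")
outer = get_design_state()
print(seen, outer)
```

Callers of `get_design_state()` still need the `None` check, as `QueryDesignStateTool._run()` does in the next step.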
+- [ ] **Step 2: Update QueryDesignStateTool to use DesignState directly**
+Change `QueryDesignStateTool._run()` to stop reconstructing from dict:
+```python
+    def _run(self, check: str = "all") -> str:
+        from agents.design_state import compute_score
+        from config.settings import settings
+        if check not in VALID_CHECKS:
+            return json.dumps({"error": f"Invalid check: {check!r}. Valid: {sorted(VALID_CHECKS)}"})
+        state = get_design_state()
+        if state is None:
+            return json.dumps({"error": "No design state available."})
+        score = compute_score(state)
+        threshold = settings.planning.threshold
+        known = {}
+        missing = []
+        # ... rest unchanged — state is already a DesignState ...
+```
+Remove the `from agents.design_state import DesignState` line inside `_run()` (the import is now at module level) and remove the `state = DesignState(**state_dict)` line, since `state` is already a `DesignState`.
+- [ ] **Step 3: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_tools.py -v`
+Expected: ALL PASS
+- [ ] **Step 4: Commit**
+```bash
+git add agents/tools.py
+git commit -m "refactor: type ContextVar design state as DesignState"
+```
+---
+### Task 8: Update BaseOrchestrator and MockChatBackend
+**Files:**
+- Modify: `agents/base.py`
+- Modify: `agents/orchestrator.py`
+- Test: `tests/test_mock_orchestrator.py`, `tests/test_base_orchestrator.py`
+- [ ] **Step 1: Update BaseOrchestrator.chat_turn signature**
+In `agents/base.py`:
+```python
+"""Base orchestrator — abstract interface for all chat orchestrators."""
+from __future__ import annotations
+from abc import ABC, abstractmethod
+from pathlib import Path
+from typing import TYPE_CHECKING
+from config.settings import settings
+if TYPE_CHECKING:
+    from agents.agent_flow import ChatTurnResponse
+    from agents.design_state import DesignState
+class BaseOrchestrator(ABC):
+    """Abstract base for MockChatBackend and CrewOrchestrator."""
+    def __init__(self, output_dir: Path | str | None = None):
+        self.output_dir = Path(output_dir) if output_dir else settings.output_dir
+        self.output_dir.mkdir(parents=True, exist_ok=True)
+    @abstractmethod
+    def chat_turn(
+        self,
+        message: str,
+        history: list[dict],
+        mentions: list[str] | None = None,
+        design_state: DesignState | None = None,
+        plan_context: bool = False,
+    ) -> ChatTurnResponse:
+        """Run one chat turn. Returns ChatTurnResponse."""
+        ...
+```
+- [ ] **Step 2: Update _format_response and MockChatBackend in orchestrator.py**
+In `agents/orchestrator.py`, replace each use of `_format_response()` with `AgentResponse.from_agent()`, then remove the `_format_response` function entirely.
+Add import:
+```python
+from agents.agent_flow import AgentResponse, ChatTurnResponse, PreviewData
+```
+Update `_execute_cad_code` to return `PreviewData | None`:
+```python
+def _execute_cad_code(
+    code: str,
+    prompt: str,
+    output_dir: Path,
+    backend: object | None = None,
+    max_retries: int = 2,
+    cam_plan: "CAMPlan | None" = None,
+) -> PreviewData | None:
+```
+Replace the dict construction with `PreviewData(...)`:
+```python
+    if not exec_result.success:
+        return PreviewData(success=False, error=exec_result.error)
+    # ...
+    preview_data = PreviewData(
+        success=True,
+        part_name=part_name,
+        stl_url=f"/api/models/{part_name}.stl",
+        step_url=f"/api/models/{part_name}.step",
+        execution=exec_result.model_dump(by_alias=True),
+        validation=validation.model_dump(),
+    )
+    if cam_plan:
+        cam_operations = cam_plan.operations
+        cam_tool = cam_plan.to_tool_config()
+        cam_post = cam_plan.post_processor
+        if cam_operations:
+            cam_result = generate_gcode(
+                shape=exec_result.result,
+                operations=cam_operations,
+                tool_config=cam_tool,
+                post_processor=cam_post,
+            )
+            preview_data.cam = cam_result.model_dump()
+            if cam_result.success and cam_result.gcode:
+                gcode_path = output_dir / f"{part_name}.gcode"
+                gcode_path.write_text(cam_result.gcode)
+                preview_data.gcode_url = f"/api/models/{part_name}.gcode"
+    return preview_data
+```
+Update `MockChatBackend.chat_turn()`:
+```python
+    def chat_turn(
+        self,
+        message: str,
+        history: list[dict],
+        mentions: list[str] | None = None,
+        max_history: int = 30,
+        design_state: DesignState | None = None,
+        plan_context: bool = False,
+    ) -> ChatTurnResponse:
+        """Return ChatTurnResponse."""
+        state = design_state if isinstance(design_state, DesignState) else DesignState(**(design_state or {}))
+        lower = message.lower()
+        if mentions:
+            active = mentions
+        else:
+            active = route_agents(message, mentions=[], is_approved_phase=False)
+        responses: list[AgentResponse] = []
+        preview = None
+        if "design" in active:
+            responses.append(AgentResponse.from_agent("design", self._design_response(lower)))
+        if "engineering" in active:
+            responses.append(AgentResponse.from_agent("engineering", self._engineering_response(lower)))
+        if "cnc" in active:
+            responses.append(AgentResponse.from_agent("cnc", self._cnc_response(lower)))
+        if "cad" in active:
+            from core.cadquery_prompts import build_messages
+            mock = MockBackend()
+            code = mock.generate(build_messages(message))
+            responses.append(
+                AgentResponse.from_agent("cad", "Model generated. Click the 3D viewer to inspect it.", code=code)
+            )
+            preview = _execute_cad_code(code, message, self.output_dir)
+        updated_state = extract_decisions(responses, state, message)
+        return ChatTurnResponse(responses=responses, preview=preview, design_state=updated_state)
+```
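One detail of the signatures above is worth a sketch: `MockChatBackend.chat_turn()` adds a `max_history` parameter that the abstract method does not declare, and it sits before `design_state`. The override stays compatible because the extra parameter has a default, but only for callers that pass `design_state` by keyword. The class names below are hypothetical, and a plain dict stands in for the state object.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Bound:
    """Records which argument landed in which parameter."""
    max_history: Any
    design_state: Any


class BaseSketch(ABC):
    @abstractmethod
    def chat_turn(self, message: str, history: list,
                  mentions: Optional[list] = None,
                  design_state: Optional[dict] = None) -> Bound:
        ...


class MockSketch(BaseSketch):
    # Extra max_history parameter, placed before design_state as in the plan.
    def chat_turn(self, message: str, history: list,
                  mentions: Optional[list] = None,
                  max_history: int = 30,
                  design_state: Optional[dict] = None) -> Bound:
        return Bound(max_history=max_history, design_state=design_state)


state = {"material": "steel"}  # stand-in for a DesignState instance
orch: BaseSketch = MockSketch()

# Keyword call: binds the same way on the base and on the subclass.
by_keyword = orch.chat_turn("hi", [], design_state=state)

# Positional call written against the base signature: on the subclass the
# fourth slot is max_history, so the state ends up in the wrong parameter.
by_position = orch.chat_turn("hi", [], None, state)
```

The calls shown in this plan pass `design_state` and `plan_context` by keyword, which avoids the mis-binding in the second call.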
+- [ ] **Step 3: Update test_mock_orchestrator.py**
+Tests now assert on `ChatTurnResponse` attributes:
+```python
+"""Tests for agents/orchestrator.py — MockChatBackend and helpers."""
+from agents.orchestrator import MockChatBackend
+from agents.agent_flow import AgentResponse, ChatTurnResponse
+from agents.definitions import AGENTS
+class TestMockChatBackend:
+    def test_response_shape(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("I need a bracket", history=[])
+        assert isinstance(result, ChatTurnResponse)
+        assert isinstance(result.responses, list)
+        assert len(result.responses) > 0
+        assert isinstance(result.responses[0], AgentResponse)
+    def test_bracket_routes_to_design(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("Design a mounting bracket", history=[])
+        agent_ids = [r.agent_id for r in result.responses]
+        assert "design" in agent_ids
+    def test_mention_overrides_routing(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("What do you think?", history=[], mentions=["cnc"])
+        agent_ids = [r.agent_id for r in result.responses]
+        assert agent_ids == ["cnc"]
+    def test_cad_mention_generates_code(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("Generate a 50mm cube", history=[], mentions=["cad"])
+        agent_ids = [r.agent_id for r in result.responses]
+        assert "cad" in agent_ids
+        cad_resp = next(r for r in result.responses if r.agent_id == "cad")
+        assert cad_resp.code is not None
+        assert "result" in cad_resp.code
+    def test_design_state_updated(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("Make it 60mm wide in aluminum", history=[])
+        assert result.design_state is not None
+    def test_engineering_keywords_trigger_engineering(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("Use M6 bolts with 3mm wall thickness", history=[])
+        agent_ids = [r.agent_id for r in result.responses]
+        assert "engineering" in agent_ids
+    def test_cnc_keywords_trigger_cnc(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("Can this be machined on a CNC mill?", history=[])
+        agent_ids = [r.agent_id for r in result.responses]
+        assert "cnc" in agent_ids
+    def test_generic_message_default_agents(self, tmp_output_dir):
+        mock = MockChatBackend(output_dir=tmp_output_dir)
+        result = mock.chat_turn("Hello there", history=[])
+        agent_ids = [r.agent_id for r in result.responses]
+        assert "design" in agent_ids
+        assert "engineering" in agent_ids
+```
+- [ ] **Step 4: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_mock_orchestrator.py tests/test_base_orchestrator.py -v`
+Expected: ALL PASS
+- [ ] **Step 5: Commit**
+```bash
+git add agents/base.py agents/orchestrator.py tests/test_mock_orchestrator.py
+git commit -m "refactor: update BaseOrchestrator and MockChatBackend to return ChatTurnResponse"
+```
+---
+### Task 9: Update CrewOrchestrator to Return ChatTurnResponse
+**Files:**
+- Modify: `agents/crew_orchestrator.py`
+- Test: `tests/test_crew_orchestrator.py`
+This is the largest change. The crew orchestrator currently builds dicts at multiple points and serializes AgentResponse objects to dicts. We reverse that: keep everything as typed models.
+- [ ] **Step 1: Update CrewOrchestrator imports**
+At the top of `agents/crew_orchestrator.py`, update imports:
+```python
+from agents.agent_flow import AgentResponse, ChatTurnResponse, PreviewData
+```
+- [ ] **Step 2: Update chat_turn method**
+Change the signature so its parameter and return types match the base (this method keeps its extra `max_history` parameter):
+```python
+    def chat_turn(
+        self,
+        message: str,
+        history: list[dict],
+        mentions: list[str] | None = None,
+        max_history: int = 30,
+        design_state: DesignState | None = None,
+        plan_context: bool = False,
+    ) -> ChatTurnResponse:
+```
+Update the plan trigger early return:
+```python
+        state = design_state if isinstance(design_state, DesignState) else DesignState(**(design_state or {}))
+        if state.phase == "exploring" and _is_plan_trigger(message):
+            score = compute_score(state)
+            plan = DesignPlan.from_state(state, confidence_score=score)
+            state.phase = "planning"
+            state.plan = plan
+            return ChatTurnResponse(design_state=state)
+```
+Update error fallback return:
+```python
+                return ChatTurnResponse(
+                    responses=[AgentResponse.from_agent(
+                        "design",
+                        f"Backend error: {exc}. Fallback also failed: {fallback_exc}. "
+                        f"Please check that your API key is set correctly.",
+                    )],
+                    design_state=DesignState(**(design_state.model_dump() if isinstance(design_state, DesignState) else design_state or {})),
+                )
+```
+- [ ] **Step 3: Update _run_crew method**
+Change return type and body:
+```python
+    def _run_crew(
+        self,
+        message: str,
+        history: list[dict],
+        mentions: list[str] | None,
+        max_history: int,
+        design_state: DesignState | None,
+        plan_context: bool = False,
+    ) -> ChatTurnResponse:
+```
+Update `set_design_state` call — pass the DesignState directly (not `.model_dump()`):
+```python
+        set_design_state(state)
+```
+Remove the `responses = [r.model_dump() for r in agent_responses]` line. Keep responses as `list[AgentResponse]`.
+Build `PreviewData` instead of dict for preview:
+```python
+        preview = None
+        if cad_code:
+            from agents.tools import get_last_shape
+            shape = get_last_shape()
+            if shape is not None:
+                from core.executor import export_all
+                from core.validator import validate_for_cnc
+                part_name = derive_part_name(message)
+                base_path = self.output_dir / part_name
+                try:
+                    export_all(shape, base_path)
+                except Exception:
+                    pass
+                execution_data = {"success": True}
+                try:
+                    bb = shape.val().BoundingBox()
+                    execution_data["volume_mm3"] = shape.val().Volume()
+                    execution_data["bounding_box_mm"] = [bb.xlen, bb.ylen, bb.zlen]
+                    execution_data["face_count"] = len(shape.faces().vals())
+                    execution_data["edge_count"] = len(shape.edges().vals())
+                except Exception:
+                    pass
+                validation = validate_for_cnc(shape, part_name=part_name)
+                preview = PreviewData(
+                    success=True,
+                    part_name=part_name,
+                    stl_url=f"/api/models/{part_name}.stl",
+                    step_url=f"/api/models/{part_name}.step",
+                    threemf_url=f"/api/models/{part_name}.3mf",
+                    execution=execution_data,
+                    validation=validation.model_dump(),
+                )
+```
+Update G-code generation to use `preview.part_name` etc.:
+```python
+        if preview and preview.success and cam_plan:
+            from core.cam import generate_gcode
+            from agents.tools import get_last_shape
+            shape = get_last_shape()
+            if shape is not None:
+                cam_result = generate_gcode(
+                    shape=shape,
+                    operations=cam_plan.operations,
+                    tool_config=cam_plan.to_tool_config(),
+                    post_processor=cam_plan.post_processor,
+                )
+                preview.cam = cam_result.model_dump()
+                if cam_result.success and cam_result.gcode:
+                    gcode_path = self.output_dir / f"{preview.part_name}.gcode"
+                    gcode_path.write_text(cam_result.gcode)
+                    preview.gcode_url = f"/api/models/{preview.part_name}.gcode"
+```
+Pass `agent_responses` (list of `AgentResponse`) directly to `extract_decisions` and `analyze_gaps`:
+```python
+        updated_state = extract_decisions(agent_responses, state, message)
+        gap_result = analyze_gaps(agent_responses)
+        question_cards = []
+        if gap_result.has_gaps:
+            question_cards = generate_question_cards(gap_result, updated_state, user_message=message)
+```
+Update the NOT READY check to use `AgentResponse` attributes:
+```python
+        if state.phase == "approved":
+            for r in agent_responses:
+                if r.agent_id == "cad" and r.message.upper().startswith("NOT READY:"):
+                    updated_state.phase = "exploring"
+                    updated_state.plan = None
+                    break
+```
+Return `ChatTurnResponse`:
+```python
+        return ChatTurnResponse(
+            responses=agent_responses,
+            preview=preview,
+            design_state=updated_state,
+            question_cards=question_cards,
+        )
+```
+- [ ] **Step 4: Update _fallback method**
+```python
+    def _fallback(
+        self,
+        message: str,
+        history: list[dict],
+        mentions: list[str] | None,
+        max_history: int,
+        design_state: DesignState | None,
+        plan_context: bool = False,
+    ) -> ChatTurnResponse:
+        """Fall back to MockChatBackend."""
+        from agents.tools import set_design_state
+        from agents.orchestrator import MockChatBackend
+        state = design_state if isinstance(design_state, DesignState) else DesignState(**(design_state or {}))
+        state = state.update_from_messages([], user_message=message)
+        set_design_state(state)
+        mock = MockChatBackend(output_dir=self.output_dir)
+        result = mock.chat_turn(message, history, mentions, design_state=state, plan_context=plan_context)
+        if not result.question_cards:
+            gap_result = analyze_gaps(result.responses)
+            if gap_result.has_gaps:
+                result.question_cards = generate_question_cards(gap_result, state, user_message=message)
+        return result
+```
+- [ ] **Step 5: Update test_crew_orchestrator.py**
+Update tests to assert on `ChatTurnResponse` attributes:
+```python
+class TestCrewOrchestratorFallback:
+    def test_falls_back_when_crewai_unavailable(self, tmp_output_dir):
+        orch = CrewOrchestrator(backend_name="gemini", output_dir=tmp_output_dir)
+        orch._crew_available = False
+        result = orch.chat_turn("test", history=[])
+        assert isinstance(result, ChatTurnResponse)
+        assert result.preview is None or isinstance(result.preview, PreviewData)
+    def test_response_format(self, tmp_output_dir):
+        orch = CrewOrchestrator(backend_name="gemini", output_dir=tmp_output_dir)
+        orch._crew_available = False
+        result = orch.chat_turn("I need a bracket", history=[])
+        assert isinstance(result.responses, list)
+        assert isinstance(result.design_state, DesignState)
+```
+Add import at top:
+```python
+from agents.agent_flow import ChatTurnResponse, PreviewData
+from agents.design_state import DesignState
+```
+Update `TestGapAnalysis`:
+```python
+class TestGapAnalysis:
+    def test_not_ready_produces_question_cards(self):
+        orch = CrewOrchestrator(backend_name="mock")
+        result = orch.chat_turn(message="generate a bracket", history=[], design_state=None)
+        assert isinstance(result.question_cards, list)
+    def test_no_question_cards_when_no_gaps(self):
+        orch = CrewOrchestrator(backend_name="mock")
+        result = orch.chat_turn(
+            message="I need a bracket", history=[],
+            design_state=DesignState(material="aluminum", dimensions={"width": 60}),
+        )
+        assert isinstance(result.question_cards, list)
+    def test_plan_trigger_includes_question_cards_key(self):
+        orch = CrewOrchestrator(backend_name="mock")
+        result = orch.chat_turn(
+            message="show plan", history=[],
+            design_state=DesignState(material="aluminum"),
+        )
+        assert result.question_cards == []
+```
+Update `TestPlanningPhase`:
+```python
+class TestPlanningPhase:
+    def test_manual_plan_trigger(self):
+        orch = CrewOrchestrator(backend_name="mock")
+        state = DesignState(
+            part_name="bracket",
+            material="aluminum 6061",
+            dimensions={"width": 60, "height": 40, "depth": 20},
+            axis_recommendation="3-axis",
+        )
+        result = orch.chat_turn(message="show plan", history=[], design_state=state)
+        assert result.design_state.phase == "planning"
+        assert result.design_state.plan is not None
+        assert result.design_state.plan.material == "aluminum 6061"
+    def test_approved_phase_keeps_approved(self):
+        orch = CrewOrchestrator(backend_name="mock")
+        plan = DesignPlan(
+            part_name="bracket", description="test", material="aluminum",
+            dimensions={"width": 60}, features=[], constraints=[],
+            axis_recommendation="3-axis", machining_notes=[],
+            confidence_score=9.0,
+        )
+        state = DesignState(
+            phase="approved", plan=plan,
+            material="aluminum", dimensions={"width": 60},
+        )
+        result = orch.chat_turn(message="Generate the approved design", history=[], design_state=state)
+        assert isinstance(result.responses, list)
+    def test_planning_phase_resets_on_message(self):
+        orch = CrewOrchestrator(backend_name="mock")
+        plan = DesignPlan(
+            part_name="bracket", description="", material="steel",
+            dimensions={}, features=[], constraints=[],
+            axis_recommendation="", machining_notes=[],
+            confidence_score=5.0,
+        )
+        state = DesignState(phase="planning", plan=plan, material="steel")
+        result = orch.chat_turn(message="actually change the material", history=[], design_state=state)
+        assert result.design_state.phase in ("exploring", "planning")
+```
+- [ ] **Step 6: Run tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_crew_orchestrator.py tests/test_mock_orchestrator.py -v`
+Expected: ALL PASS
+- [ ] **Step 7: Commit**
+```bash
+git add agents/crew_orchestrator.py tests/test_crew_orchestrator.py
+git commit -m "refactor: update CrewOrchestrator to return ChatTurnResponse"
+```
+---
+### Task 10: Update Server Routes to Use Typed Models
+**Files:**
+- Modify: `server/routes.py`
+- Test: `tests/test_api_routes.py`
+The server routes are the HTTP boundary: they receive JSON (dicts) from clients and return JSON. The key changes are typing the request model fields and using `ChatTurnResponse.model_dump()` for JSON serialization.
+- [ ] **Step 1: Update request models in server/routes.py**
+```python
+from agents.design_state import DesignState, DesignPlan
+class ChatRequest(BaseModel):
+    message: str = Field(..., min_length=1)
+    history: list[ChatMessage] = Field(default_factory=list)
+    mentions: list[str] = Field(default_factory=list)
+    backend: str = "gemini"
+    design_state: DesignState = Field(default_factory=DesignState)
+    plan_context: bool = False
+class PlanApproveRequest(BaseModel):
+    plan: DesignPlan
+    design_state: DesignState = Field(default_factory=DesignState)
+class PlanRejectRequest(BaseModel):
+    design_state: DesignState = Field(default_factory=DesignState)
+```
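A reduced sketch of how these typed request fields behave under Pydantic v2. `MiniDesignState` and `MiniChatRequest` are hypothetical cut-down stand-ins, not the real models; the field names are borrowed from the plan for readability only.

```python
from pydantic import BaseModel, Field, ValidationError


class MiniDesignState(BaseModel):
    """Hypothetical, reduced stand-in for DesignState."""
    phase: str = "exploring"
    material: str = ""
    dimensions: dict[str, float] = Field(default_factory=dict)


class MiniChatRequest(BaseModel):
    message: str = Field(..., min_length=1)
    design_state: MiniDesignState = Field(default_factory=MiniDesignState)


# Client omits design_state: the default factory supplies an empty model.
omitted = MiniChatRequest.model_validate({"message": "hi"})

# Client sends a nested JSON object: it is validated into the nested model.
nested = MiniChatRequest.model_validate(
    {"message": "hi", "design_state": {"material": "steel", "dimensions": {"width": 60}}}
)

# Malformed nested data is rejected at the boundary, not deep in an orchestrator.
try:
    MiniChatRequest.model_validate(
        {"message": "hi", "design_state": {"dimensions": {"width": "wide"}}}
    )
    rejected = False
except ValidationError:
    rejected = True
```

In a FastAPI route, the same validation failure is reported to the client as a 422 response before the handler body runs.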
+- [ ] **Step 2: Update chat endpoint**
+```python
+@router.post("/api/chat")
+async def chat(body: ChatRequest):
+    """Multi-agent chat turn."""
+    message = body.message.strip()
+    history = [m.model_dump() for m in body.history]
+    backend_name = body.backend
+    raw_mentions = body.mentions
+    if not raw_mentions:
+        message, raw_mentions = parse_mentions(message)
+    mentions = raw_mentions if raw_mentions else None
+    orchestrator = get_orchestrator(backend_name, output_dir=OUTPUT_DIR)
+    try:
+        result = orchestrator.chat_turn(
+            message=message,
+            history=history,
+            mentions=mentions,
+            design_state=body.design_state,
+            plan_context=body.plan_context,
+        )
+        return JSONResponse(result.model_dump(mode="json"))
+    except Exception as e:
+        import logging
+        logging.exception("Chat turn failed")
+        return JSONResponse(
+            {"error": f"Chat turn failed: {e}"},
+            status_code=500,
+        )
+```
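One thing to watch when handing `model_dump()` output to `JSONResponse`: in its default Python mode, `model_dump()` keeps values such as enums and paths as Python objects, which the standard JSON encoder rejects. The sketch below illustrates the difference with a hypothetical `MiniResponse` model. Whether NeuralCAD's models contain such fields is not shown in this plan, so treat this as an illustration rather than a confirmed problem.

```python
import json
from enum import Enum
from pathlib import Path

from pydantic import BaseModel


class Phase(Enum):
    EXPLORING = "exploring"


class MiniResponse(BaseModel):
    """Hypothetical model with two fields that are not JSON primitives."""
    phase: Phase = Phase.EXPLORING
    gcode_path: Path = Path("part.gcode")


resp = MiniResponse()
python_dump = resp.model_dump()           # keeps Phase and Path objects
json_dump = resp.model_dump(mode="json")  # converts them to JSON-safe values

try:
    json.dumps(python_dump)
    python_dump_serializable = True
except TypeError:
    python_dump_serializable = False

encoded = json.dumps(json_dump)
```

For models made only of strings, numbers, booleans, lists, and dicts, both modes produce the same structure.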
+- [ ] **Step 3: Update plan endpoints**
+```python
+@router.post("/api/plan/approve")
+async def plan_approve(body: PlanApproveRequest):
+    """Approve (possibly edited) design plan, merge into state."""
+    plan = body.plan
+    state = body.design_state
+    state.part_name = plan.part_name
+    state.description = plan.description
+    state.material = plan.material
+    state.dimensions = dict(plan.dimensions)
+    state.features = list(plan.features)
+    state.constraints = list(plan.constraints)
+    state.axis_recommendation = plan.axis_recommendation
+    state.phase = "approved"
+    state.plan = plan
+    return JSONResponse({"design_state": state.model_dump()})
+@router.post("/api/plan/reject")
+async def plan_reject(body: PlanRejectRequest):
+    """Reject plan, reset to exploring."""
+    state = body.design_state
+    state.phase = "exploring"
+    state.plan = None
+    return JSONResponse({"design_state": state.model_dump()})
+```
+- [ ] **Step 4: Update report endpoint to access ChatMessage fields directly**
+```python
+@router.post("/api/report")
+async def report(body: ReportRequest):
+    """Generate a design report from conversation history."""
+    part_name = body.part_name
+    report_sections = [f"# Design Report: {part_name}\n"]
+    design_decisions = []
+    engineering_specs = []
+    cnc_notes = []
+    for msg in body.history:
+        if msg.agent_id == "design":
+            design_decisions.append(msg.content)
+        elif msg.agent_id == "engineering":
+            engineering_specs.append(msg.content)
+        elif msg.agent_id == "cnc":
+            cnc_notes.append(msg.content)
+    # ... rest unchanged ...
+```
+- [ ] **Step 5: Run API tests**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/test_api_routes.py -v`
+Expected: ALL PASS (JSON shape stays the same — Pydantic serializes to the same structure)
+- [ ] **Step 6: Commit**
+```bash
+git add server/routes.py
+git commit -m "refactor: type server route request models with DesignState/DesignPlan"
+```
+---
+### Task 11: Update conftest Fixtures and Run Full Test Suite
+**Files:**
+- Modify: `tests/conftest.py`
+- [ ] **Step 1: Update conftest fixtures**
+```python
+"""Shared fixtures for NeuralCAD tests."""
+import pytest
+from pathlib import Path
+from agents.design_state import DesignState
+@pytest.fixture
+def tmp_output_dir(tmp_path):
+    """Temporary output directory for model files."""
+    out = tmp_path / "output"
+    out.mkdir()
+    return out
+@pytest.fixture
+def sample_history():
+    """A typical multi-turn conversation history."""
+    return [
+        {"role": "user", "content": "I need a servo bracket for an MG996R"},
+        {"role": "agent", "agent_id": "design", "content": "I'd suggest an L-bracket with a servo pocket on the vertical face."},
+        {"role": "agent", "agent_id": "engineering", "content": "3mm wall thickness in aluminum 6061-T6 should handle the load."},
+        {"role": "user", "content": "Make it 60mm wide with M4 base mounting holes"},
+    ]
+@pytest.fixture
+def empty_design_state():
+    """Empty design state."""
+    return DesignState()
+@pytest.fixture
+def populated_design_state():
+    """Design state with some decisions already made."""
+    return DesignState(
+        part_name="servo_bracket",
+        material="aluminum 6061",
+        dimensions={"width": 60.0},
+        features=["4x M4 holes"],
+        decisions=["L-bracket form factor"],
+    )
+class FakeLLMBackend:
+    """A controllable fake LLM backend for testing orchestrators."""
+    def __init__(self, response: str = '{"agents": []}'):
+        self.response = response
+        self.calls: list[list[dict]] = []
+    def generate(self, messages: list[dict]) -> str:
+        self.calls.append(messages)
+        return self.response
+@pytest.fixture
+def fake_backend():
+    """FakeLLMBackend factory — call with desired JSON response."""
+    def _make(response: str = '{"agents": []}'):
+        return FakeLLMBackend(response)
+    return _make
+```
+- [ ] **Step 2: Run the full test suite**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/ -v --tb=short`
+Expected: ALL PASS
+- [ ] **Step 3: Commit**
+```bash
+git add tests/conftest.py
+git commit -m "refactor: update test fixtures to use Pydantic models"
+```
+---
+### Task 12: Update MCP Server to Use Typed Models
+**Files:**
+- Modify: `server/mcp.py`
+The MCP tools return JSON strings, so model usage is internal. No code change is expected in this file; the task confirms that `validate_cnc_model` keeps working after the unification.
+- [ ] **Step 1: Confirm validate_cnc_model config parameter needs no change**
+In `server/mcp.py`, the `validate_cnc_model` function builds a config dict on lines 178-181 and passes it to `validate_for_cnc`:
+```python
+    if exec_result.success:
+        config = {
+            "min_wall_thickness_mm": min_wall_thickness_mm,
+            "max_part_size_mm": max_part_size_mm,
+        }
+        validation = validate_for_cnc(exec_result.result, part_name=part_name, config=config)
+```
+This stays as-is since `validate_for_cnc` already uses `_get_validation_config(overrides)` internally, and the config dict serves as override kwargs. No change needed here.
+- [ ] **Step 2: Run full test suite one final time**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/ -v --tb=short`
+Expected: ALL PASS
+- [ ] **Step 3: Commit**
+```bash
+git commit --allow-empty -m "refactor: verify MCP server compatible with pydantic unification"
+```
+---
+### Task 13: Final Cleanup — Remove Dead Imports and Verify
+**Files:**
+- All modified files
+- [ ] **Step 1: Check for dead imports**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/ -v --tb=short 2>&1 | head -80`
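+Verify that the output shows no `ImportError` or `AttributeError`. This catches imports that broke during the migration; it does not flag imports that are merely unused.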
+- [ ] **Step 2: Verify no remaining dict returns from orchestrators**
+Run: `cd /home/daniel/NeuralCAD && grep -rn -e "-> dict" agents/base.py agents/orchestrator.py agents/crew_orchestrator.py`
+Expected: No matches (all return `ChatTurnResponse` now)
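If a text search proves too coarse, the same check can be sketched with the standard-library `ast` module, which looks only at return annotations. This is an optional alternative, not part of the plan; `dict_returning_functions` and the `SAMPLE` source are hypothetical and exist only for illustration.

```python
import ast

# Hypothetical source text standing in for an orchestrator module.
SAMPLE = '''
def old_turn(message: str) -> dict:
    return {}

def new_turn(message: str) -> "ChatTurnResponse":
    ...

async def helper() -> dict[str, int]:
    return {}
'''


def dict_returning_functions(source: str) -> list[str]:
    """Return names of functions whose return annotation is dict or dict[...]."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_function = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_function and node.returns is not None:
            annotation = ast.unparse(node.returns)
            if annotation == "dict" or annotation.startswith("dict["):
                hits.append(node.name)
    return sorted(hits)


offenders = dict_returning_functions(SAMPLE)
```

To apply it to a real file, pass `Path("agents/base.py").read_text()` as `source`; an empty list corresponds to "no matches".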
+- [ ] **Step 3: Verify no remaining dict parameters for design_state**
+Run: `cd /home/daniel/NeuralCAD && grep -rn "design_state: dict" agents/ server/`
+Expected: No matches
+- [ ] **Step 4: Run full test suite**
+Run: `cd /home/daniel/NeuralCAD && python -m pytest tests/ -v`
+Expected: ALL PASS
+- [ ] **Step 5: Final commit**
+```bash
+git add -A
+git commit -m "refactor: complete pydantic unification — all models typed, no raw dicts"
+```