CallMeDaniel/neuralcad

CallMeDaniel Claude Opus 4.6 (1M context) committed on Apr 12

Commit

052f613

1 Parent(s): 897ba1e

docs: add memory/planning/collab/tools implementation plan

6 tasks: part name utility + dead code cleanup, config, tool migration
to BaseTool, history removal + Memory creation, Flow memory/planning/
collaboration integration, final validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (1)

docs/superpowers/plans/2026-04-13-memory-planning-collab-tools.md +907 -0

docs/superpowers/plans/2026-04-13-memory-planning-collab-tools.md ADDED

	@@ -0,0 +1,907 @@

+# Memory, Planning, Collaboration & Tool Migration Plan
+> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
+**Goal:** Add CrewAI memory (cross-turn recall), planning (step-by-step coordination), collaboration (agent delegation), and migrate tools to BaseTool subclasses with Pydantic schemas.
+**Architecture:** Pre-integration refactoring (dead code, part name utility, config) then layer in memory/planning/collaboration on the existing AgentDispatchFlow. Tools migrate from `@tool` decorators to `BaseTool` subclasses with `args_schema`. Memory instance lives on `CrewOrchestrator`, passed to Flow as instance attr. Planning and collaboration are Crew-level flags from config.
+**Tech Stack:** CrewAI 1.14 (Memory, BaseTool, Crew planning/collaboration), Pydantic BaseModel, google-generativeai embeddings
+**Spec:** `docs/superpowers/specs/2026-04-12-memory-planning-collab-design.md`
+---
+### Task 1: Extract part name utility + clean dead code
+**Files:**
+- Create: `core/utils.py`
+- Create: `tests/test_utils.py`
+- Modify: `agents/prompts.py`
+- Modify: `agents/crew_orchestrator.py`
+- Modify: `agents/orchestrator.py`
+- [ ] **Step 1: Write failing tests for derive_part_name**
+```python
+# tests/test_utils.py
+"""Tests for core/utils.py utilities."""
+from core.utils import derive_part_name
+class TestDerivePartName:
+    def test_basic_text(self):
+        assert derive_part_name("servo bracket") == "servo_bracket"
+    def test_strips_special_chars(self):
+        assert derive_part_name("my part! @#$%") == "my_part_"
+    def test_truncates_to_max_chars(self):
+        result = derive_part_name("a" * 100, max_chars=10)
+        assert len(result) <= 10
+    def test_empty_string_returns_part(self):
+        assert derive_part_name("") == "part"
+    def test_special_chars_only_returns_part(self):
+        assert derive_part_name("@#$%^&*") == "part"
+    def test_lowercases(self):
+        assert derive_part_name("My Bracket") == "my_bracket"
+    def test_preserves_underscores(self):
+        assert derive_part_name("servo_bracket_v2") == "servo_bracket_v2"
+```
+- [ ] **Step 2: Run tests to verify they fail**
+Run: `pytest tests/test_utils.py -v`
+Expected: FAIL — `ModuleNotFoundError: No module named 'core.utils'`
+- [ ] **Step 3: Implement derive_part_name**
+```python
+# core/utils.py
+"""Shared utility functions for NeuralCAD."""
+from __future__ import annotations
+def derive_part_name(text: str, max_chars: int = 40) -> str:
+    """Derive a filesystem-safe part name from text."""
+    name = text[:max_chars].strip().replace(" ", "_").lower()
+    name = "".join(c for c in name if c.isalnum() or c == "_")
+    return name or "part"
+```
+- [ ] **Step 4: Run tests to verify they pass**
+Run: `pytest tests/test_utils.py -v`
+Expected: 7 passed
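A short sketch of two edge cases that the seven tests above do not cover, using the `derive_part_name` body from Step 3 unchanged. Truncation runs before stripping, so an input with more than `max_chars` leading spaces collapses to `"part"`. `str.isalnum()` is Unicode-aware, so non-ASCII letters survive the filter. `derive_part_name_ascii` is a hypothetical stricter variant that is not part of this plan; it is shown only for comparison.

```python
def derive_part_name(text: str, max_chars: int = 40) -> str:
    """Copied from Step 3: truncate first, then sanitize."""
    name = text[:max_chars].strip().replace(" ", "_").lower()
    name = "".join(c for c in name if c.isalnum() or c == "_")
    return name or "part"


def derive_part_name_ascii(text: str, max_chars: int = 40) -> str:
    """Hypothetical variant: strip first, keep ASCII only, truncate last."""
    name = text.strip().replace(" ", "_").lower()
    name = "".join(c for c in name if (c.isascii() and c.isalnum()) or c == "_")
    return name[:max_chars] or "part"


padded = " " * 50 + "bracket"
padded_plan = derive_part_name(padded)          # first 40 chars are all spaces
padded_ascii = derive_part_name_ascii(padded)   # stripped before truncation
unicode_plan = derive_part_name("Größe 5")      # Unicode letters pass isalnum()
unicode_ascii = derive_part_name_ascii("Größe 5")
print(padded_plan, padded_ascii, unicode_plan, unicode_ascii)
```

Whether either behaviour matters depends on where part names end up; if they only ever become file names on a Unicode-capable filesystem, the plan's version may be sufficient as written.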
+- [ ] **Step 5: Replace duplicated logic in crew_orchestrator.py**
+In `agents/crew_orchestrator.py`, add import at top:
+```python
+from core.utils import derive_part_name
+```
+Replace lines 208-209:
+```python
+                part_name = message[:40].strip().replace(" ", "_").lower()
+                part_name = "".join(c for c in part_name if c.isalnum() or c == "_") or "part"
+```
+with:
+```python
+                part_name = derive_part_name(message)
+```
+- [ ] **Step 6: Replace duplicated logic in orchestrator.py**
+In `agents/orchestrator.py`, add import at top:
+```python
+from core.utils import derive_part_name
+```
+Replace lines 100-104:
+```python
+    part_name = prompt[:40].strip().replace(" ", "_").lower()
+    part_name = "".join(c for c in part_name if c.isalnum() or c == "_")
+    if not part_name:
+        part_name = "part"
+```
+with:
+```python
+    part_name = derive_part_name(prompt)
+```
+- [ ] **Step 7: Clean dead code from prompts.py**
+Replace the entire content of `agents/prompts.py` with only `parse_mentions()`:
+```python
+"""Agent prompt utilities — @mention parsing for chat messages."""
+from __future__ import annotations
+import re
+from agents.definitions import AGENTS
+def parse_mentions(message: str) -> tuple[str, list[str]]:
+    """Extract @mentions from a message and return cleaned message + mention list.
+    Returns:
+        (cleaned_message, mentions) where mentions is list of agent IDs.
+    """
+    mentions = []
+    cleaned = message
+    for agent_id in AGENTS:
+        pattern = rf"@{agent_id}\b"
+        if re.search(pattern, message, re.IGNORECASE):
+            mentions.append(agent_id)
+            cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE).strip()
+    return cleaned, mentions
+```
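To make the behaviour of `parse_mentions()` concrete, here is a minimal sketch that reuses its logic with a stand-in `AGENTS` mapping. The agent IDs below are assumptions taken from names used elsewhere in this plan; the real mapping lives in `agents/definitions.py`.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for agents.definitions.AGENTS; only the keys matter here.
AGENTS = {"design": None, "engineering": None, "cad": None, "cam": None}


def parse_mentions(message: str) -> tuple[str, list[str]]:
    """Same logic as the plan's parse_mentions()."""
    mentions = []
    cleaned = message
    for agent_id in AGENTS:
        pattern = rf"@{agent_id}\b"
        if re.search(pattern, message, re.IGNORECASE):
            mentions.append(agent_id)
            cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE).strip()
    return cleaned, mentions


both = parse_mentions("@cad make a bracket @Design")
middle = parse_mentions("hey @cad do it")
no_match = parse_mentions("ask the @cadet")
print(both, middle, no_match)
```

Three things follow from the loop: mentions come back in `AGENTS` order rather than message order, a mention removed from the middle of a sentence leaves a double space behind, and the `\b` anchor keeps `@cadet` from matching `@cad`. The pattern does not call `re.escape`, so it assumes agent IDs contain no regex metacharacters.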
+- [ ] **Step 8: Run full test suite**
+Run: `pytest tests/ -x -q`
+Expected: All pass. The prompts tests for `parse_mentions` still pass. Tests that import the removed functions are not dropped automatically: they fail at collection with `ImportError`, so delete them before this run.
+- [ ] **Step 9: Commit**
+```bash
+git add core/utils.py tests/test_utils.py agents/prompts.py agents/crew_orchestrator.py agents/orchestrator.py
+git commit -m "refactor: extract derive_part_name, remove dead code from prompts.py"
+```
+---
+### Task 2: Add memory/crew config to settings
+**Files:**
+- Modify: `config/settings.py`
+- Modify: `config.yaml`
+- [ ] **Step 1: Add MemoryConfig and CrewConfig to settings.py**
+Add these classes BEFORE the `Settings` class in `config/settings.py`:
+```python
+class MemoryConfig(BaseModel):
+    enabled: bool = True
+    embedder_provider: str = "google-generativeai"
+    embedder_model: str = "gemini-embedding-001"
+    recency_weight: float = 0.4
+    semantic_weight: float = 0.4
+    importance_weight: float = 0.2
+    recency_half_life_days: float = 1.0
+    recall_limit: int = 5
+    recall_depth: str = "shallow"
+class CrewConfig(BaseModel):
+    planning: bool = True
+    collaboration: bool = True
+```
+Add these fields to the `Settings` class (after `routing`):
+```python
+    memory: MemoryConfig = Field(default_factory=MemoryConfig)
+    crew: CrewConfig = Field(default_factory=CrewConfig)
+```
+- [ ] **Step 2: Add memory and crew sections to config.yaml**
+Append after the `fallback_messages` section at the end of `config.yaml`:
+```yaml
+memory:
+  enabled: true
+  embedder_provider: google-generativeai
+  embedder_model: gemini-embedding-001
+  recency_weight: 0.4
+  semantic_weight: 0.4
+  importance_weight: 0.2
+  recency_half_life_days: 1
+  recall_limit: 5
+  recall_depth: shallow
+crew:
+  planning: true
+  collaboration: true
+```
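The three weights and the half-life are easier to reason about with numbers. This plan does not state how CrewAI combines them, so the sketch below assumes a plain weighted sum with exponential recency decay. That formula, the `MemoryWeights` dataclass, and `composite_score` are illustrative assumptions, not the library's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryWeights:
    """Mirrors the weight fields of MemoryConfig (defaults from the plan)."""
    recency_weight: float = 0.4
    semantic_weight: float = 0.4
    importance_weight: float = 0.2
    recency_half_life_days: float = 1.0


def composite_score(w: MemoryWeights, similarity: float, importance: float, age_days: float) -> float:
    """ASSUMED formula: weighted sum where recency halves every half-life."""
    recency = 0.5 ** (age_days / w.recency_half_life_days)
    return (
        w.recency_weight * recency
        + w.semantic_weight * similarity
        + w.importance_weight * importance
    )


weights = MemoryWeights()
fresh = composite_score(weights, similarity=0.8, importance=0.5, age_days=0.0)
day_old = composite_score(weights, similarity=0.8, importance=0.5, age_days=1.0)
print(round(fresh, 2), round(day_old, 2))
```

Under this assumption, a one-day half-life means a memory loses half of its recency contribution after a single day, which would favour the current session heavily.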
+- [ ] **Step 3: Verify config loads**
+Run: `python -c "from config.settings import settings; print(f'memory.enabled={settings.memory.enabled}, crew.planning={settings.crew.planning}, crew.collaboration={settings.crew.collaboration}, embedder={settings.memory.embedder_provider}')"`
+Expected: `memory.enabled=True, crew.planning=True, crew.collaboration=True, embedder=google-generativeai`
+- [ ] **Step 4: Run full test suite**
+Run: `pytest tests/ -x -q`
+Expected: All pass
+- [ ] **Step 5: Commit**
+```bash
+git add config/settings.py config.yaml
+git commit -m "feat: add memory and crew config sections"
+```
+---
+### Task 3: Migrate tools to BaseTool subclasses
+**Files:**
+- Modify: `agents/tools.py`
+- Modify: `agents/agent_flow.py`
+- Modify: `tests/test_agent_flow.py` (update mocks if tool names changed)
+- [ ] **Step 1: Rewrite agents/tools.py with BaseTool classes**
+Replace the entire file content:
+```python
+"""CrewAI tools for CadQuery code execution and CNC validation.
+These tools allow agents to execute code, validate manufacturability,
+generate G-code, and query design state within their reasoning loop.
+Uses BaseTool subclasses with Pydantic args_schema for structured input.
+"""
+from __future__ import annotations
+import json
+import logging
+from contextvars import ContextVar
+from typing import Type
+from pydantic import BaseModel, Field
+logger = logging.getLogger(__name__)
+try:
+    from crewai.tools import BaseTool
+except ImportError:
+    class BaseTool:  # type: ignore[no-redef]
+        name: str = ""
+        description: str = ""
+        args_schema: type | None = None
+        def _run(self, **kwargs) -> str:
+            return ""
+# ── Per-request state (ContextVar — async-safe) ─────────────────────────
+_last_shape_var: ContextVar[object | None] = ContextVar("last_shape", default=None)
+_design_state_var: ContextVar[dict | None] = ContextVar("design_state", default=None)
+def set_last_shape(shape):
+    """Set the last executed CadQuery shape."""
+    _last_shape_var.set(shape)
+def get_last_shape():
+    """Get the last executed CadQuery shape."""
+    return _last_shape_var.get()
+def set_design_state(state_dict: dict):
+    """Set the current design state."""
+    _design_state_var.set(state_dict)
+def get_design_state() -> dict | None:
+    """Get the current design state."""
+    return _design_state_var.get()
+# ── Tool input schemas ──────────────────────────────────────────────────
+class ExecuteCadInput(BaseModel):
+    code: str = Field(..., description="CadQuery Python code. Must assign result to `result` as cq.Workplane. Import cadquery as cq.")
+class ValidateCadInput(BaseModel):
+    check_type: str = Field(default="full", description="Validation type: 'full' for complete CNC manufacturability check.")
+class GenerateGcodeInput(BaseModel):
+    operations: list[str] = Field(..., description="Ordered list of operations: adaptive, pocket, profile, face, drill, surface, waterline")
+    tool_diameter: float = Field(default=6.0, description="Endmill diameter in mm")
+    post_processor: str = Field(default="grbl", description="G-code format: grbl, linuxcnc, fanuc")
+VALID_CHECKS = {"all", "material", "dimensions", "features", "constraints", "axis"}
+class QueryDesignStateInput(BaseModel):
+    check: str = Field(default="all", description="What to check: 'all' for full state, or a specific field (material, dimensions, features, constraints, axis).")
+# ── Tool implementations ────────────────────────────────────────────────
+class ExecuteCadTool(BaseTool):
+    name: str = "Execute CadQuery Code"
+    description: str = "Execute CadQuery Python code and return geometry info: volume, bounding box, face count, edge count."
+    args_schema: Type[BaseModel] = ExecuteCadInput
+    def _run(self, code: str) -> str:
+        from core.executor import execute_cadquery
+        result = execute_cadquery(code)
+        if result.success and result.result is not None:
+            set_last_shape(result.result)
+        return json.dumps(result.model_dump(by_alias=True), indent=2)
+class ValidateCadTool(BaseTool):
+    name: str = "Validate CNC Manufacturability"
+    description: str = "Run CNC manufacturability checks on the last executed shape. Returns machinable status, axis recommendation, and issues list."
+    args_schema: Type[BaseModel] = ValidateCadInput
+    def _run(self, check_type: str = "full") -> str:
+        from core.validator import validate_for_cnc
+        shape = get_last_shape()
+        if shape is None:
+            return json.dumps({"success": False, "error": "No shape available. Run Execute CadQuery Code first."})
+        validation = validate_for_cnc(shape)
+        return json.dumps({"success": True, "validation": validation.model_dump()}, indent=2)
+class GenerateGcodeTool(BaseTool):
+    name: str = "Generate G-code Toolpath"
+    description: str = "Generate CNC G-code toolpath from the last executed CadQuery shape."
+    args_schema: Type[BaseModel] = GenerateGcodeInput
+    def _run(self, operations: list[str], tool_diameter: float = 6.0, post_processor: str = "grbl") -> str:
+        from core.cam import generate_gcode
+        shape = get_last_shape()
+        if shape is None:
+            return json.dumps({"success": False, "error": "No shape available. Run Execute CadQuery Code first."})
+        tool_config = {"diameter": tool_diameter, "h_feed": 800, "v_feed": 200, "speed": 18000}
+        result = generate_gcode(
+            shape=shape, operations=operations,
+            tool_config=tool_config, post_processor=post_processor,
+        )
+        return json.dumps(result.model_dump(), indent=2)
+class QueryDesignStateTool(BaseTool):
+    name: str = "Query Design State"
+    description: str = "Query the orchestrator for current design state and readiness. Call BEFORE saying NOT READY to check what information is already available."
+    args_schema: Type[BaseModel] = QueryDesignStateInput
+    def _run(self, check: str = "all") -> str:
+        from agents.design_state import DesignState, compute_score
+        from config.settings import settings
+        if check not in VALID_CHECKS:
+            return json.dumps({"error": f"Invalid check: {check!r}. Valid: {sorted(VALID_CHECKS)}"})
+        state_dict = get_design_state()
+        if state_dict is None:
+            return json.dumps({"error": "No design state available."})
+        state = DesignState(**state_dict)
+        score = compute_score(state)
+        threshold = settings.planning.threshold
+        known = {}
+        missing = []
+        if state.part_name:
+            known["part_name"] = state.part_name
+        else:
+            missing.append("part_name")
+        if state.material:
+            known["material"] = state.material
+        else:
+            missing.append("material")
+        if state.dimensions:
+            known["dimensions"] = state.dimensions
+        else:
+            missing.append("dimensions")
+        if state.features:
+            known["features"] = state.features
+        else:
+            missing.append("features")
+        if state.constraints:
+            known["constraints"] = state.constraints
+        else:
+            missing.append("constraints")
+        if state.axis_recommendation:
+            known["axis_recommendation"] = state.axis_recommendation
+        else:
+            missing.append("axis_recommendation")
+        if state.description:
+            known["description"] = state.description
+        if state.decisions:
+            known["recent_decisions"] = state.decisions[-5:]
+        result = {
+            "known": known,
+            "missing": missing,
+            "readiness_score": score,
+            "threshold": threshold,
+            "ready": score >= threshold,
+            "phase": state.phase,
+        }
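+        # "axis" is the public check name; the state field is "axis_recommendation".
+        field = "axis_recommendation" if check == "axis" else check
+        if check != "all" and field in known:
+            return json.dumps({"field": check, "value": known[field], "ready": score >= threshold})
+        if check != "all" and field in missing:
+            return json.dumps({"field": check, "value": None, "missing": True, "ready": score >= threshold})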
+        return json.dumps(result, indent=2)
+```
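The per-request state in `agents/tools.py` relies on `ContextVar` isolation. The following is a minimal standard-library sketch of that property: two concurrent asyncio tasks each set a value and read back only their own. `handle_request` and the string "shapes" are hypothetical placeholders for real request handling and CadQuery objects.

```python
import asyncio
from contextvars import ContextVar
from typing import Optional

_last_shape_var: ContextVar[Optional[object]] = ContextVar("last_shape", default=None)


async def handle_request(shape_name: str) -> Optional[object]:
    """Hypothetical request handler: store a shape, yield, read it back."""
    _last_shape_var.set(shape_name)
    await asyncio.sleep(0)  # let the other task run in between
    return _last_shape_var.get()


async def main() -> list:
    # gather() wraps each coroutine in its own Task with a copied context.
    return await asyncio.gather(handle_request("bracket"), handle_request("flange"))


results = asyncio.run(main())
outer_value = _last_shape_var.get()  # the tasks' writes did not leak out
print(results, outer_value)
```

The same isolation cuts both ways: a value set inside one task is not visible to its parent or to sibling tasks. If a tool's `_run` ever executes in a different context from the one that called `set_design_state`, it would see the default `None`, so that call path is worth checking during Task 3.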
+- [ ] **Step 2: Update tool references in agent_flow.py**
+In `agents/agent_flow.py`, change the import in `_build_crew_agent()` from:
+```python
+        from agents.tools import (
+            query_design_state_tool, execute_cad_tool,
+            validate_cad_tool, generate_gcode_tool,
+        )
+```
+to:
+```python
+        from agents.tools import (
+            QueryDesignStateTool, ExecuteCadTool,
+            ValidateCadTool, GenerateGcodeTool,
+        )
+```
+And change tool assignments from function references to instances:
+```python
+        tools = [QueryDesignStateTool()]
+        ...
+        if agent_id == "cad":
+            tools.extend([ExecuteCadTool(), ValidateCadTool()])
+        ...
+        elif agent_id == "cam":
+            tools.append(GenerateGcodeTool())
+```
+- [ ] **Step 3: Run full test suite**
+Run: `pytest tests/ -x -q`
+Expected: All pass
+- [ ] **Step 4: Commit**
+```bash
+git add agents/tools.py agents/agent_flow.py
+git commit -m "refactor: migrate tools to BaseTool subclasses with args_schema"
+```
+---
+### Task 4: Remove raw history + add memory to orchestrator
+**Files:**
+- Modify: `agents/crew_orchestrator.py`
+- [ ] **Step 1: Remove raw history rendering from _build_agent_context**
+Replace the `_build_agent_context` function in `agents/crew_orchestrator.py`:
+```python
+def _build_agent_context(
+    message: str,
+    design_state: DesignState,
+    approved_plan: DesignPlan | None = None,
+) -> str:
+    """Build context string for agents: design spec + user message.
+    Raw history is no longer rendered — memory recall replaces it.
+    """
+    parts = []
+    if approved_plan:
+        parts.append(approved_plan.render_approved())
+    else:
+        spec = design_state.render()
+        if spec:
+            parts.append(f"## Current Design Spec\n{spec}")
+    parts.append(f"## User's latest message\n{message}")
+    return "\n\n".join(parts)
+```
+Update the call site in `_run_crew()` — remove `history` and `max_history` args:
+```python
+        context = _build_agent_context(message, state, approved_plan=approved_plan)
+```
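To show what the slimmed-down context string looks like, here is a self-contained sketch of the function above. `DesignState` and `DesignPlan` are hypothetical stand-ins with just enough behaviour to exercise it; the real classes carry more fields.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DesignState:
    """Hypothetical stand-in; the real class lives in agents/design_state.py."""
    material: str = ""

    def render(self) -> str:
        return f"Material: {self.material}" if self.material else ""


@dataclass
class DesignPlan:
    """Hypothetical stand-in for the approved-plan type."""
    summary: str

    def render_approved(self) -> str:
        return f"## Approved Plan\n{self.summary}"


def _build_agent_context(message: str, design_state: DesignState,
                         approved_plan: DesignPlan | None = None) -> str:
    parts = []
    if approved_plan:
        parts.append(approved_plan.render_approved())
    else:
        spec = design_state.render()
        if spec:  # an empty spec contributes no heading at all
            parts.append(f"## Current Design Spec\n{spec}")
    parts.append(f"## User's latest message\n{message}")
    return "\n\n".join(parts)


empty_ctx = _build_agent_context("hi", DesignState())
spec_ctx = _build_agent_context("hi", DesignState(material="aluminum"))
plan_ctx = _build_agent_context("hi", DesignState(material="aluminum"), DesignPlan("L-bracket"))
print(plan_ctx)
```

An approved plan replaces the design spec rather than being added to it, so the spec never appears alongside a plan.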
+- [ ] **Step 2: Add Memory creation to __init__**
+```python
+    def __init__(self, backend_name: str = "gemini", output_dir=None):
+        super().__init__(output_dir=output_dir or DEFAULT_OUTPUT_DIR)
+        self.backend_name = backend_name
+        self._crew_available = self._check_crewai()
+        self._memory = self._create_memory()
+    def _create_memory(self):
+        """Create CrewAI Memory instance if enabled in config."""
+        if not settings.memory.enabled:
+            return None
+        try:
+            from crewai.memory import Memory
+            return Memory(
+                storage=str(self.output_dir / ".memory"),
+                embedder={
+                    "provider": settings.memory.embedder_provider,
+                    "config": {"model_name": settings.memory.embedder_model},
+                },
+                recency_weight=settings.memory.recency_weight,
+                semantic_weight=settings.memory.semantic_weight,
+                importance_weight=settings.memory.importance_weight,
+                recency_half_life_days=settings.memory.recency_half_life_days,
+            )
+        except Exception as exc:  # ImportError is a subclass of Exception
+            logger.warning("Memory creation failed (%s), continuing without memory", exc)
+            return None
+```
+- [ ] **Step 3: Pass memory to Flow**
+In `_run_crew()`, after creating the flow:
+```python
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message=message,
+            context=context,
+            model_str=_get_crewai_model(self.backend_name),
+            mentions=list(mentions) if mentions else [],
+            is_approved_phase=is_approved,
+        ))
+        flow._memory = self._memory
+        flow.kickoff()
+```
+- [ ] **Step 4: Run full test suite**
+Run: `pytest tests/ -x -q`
+Expected: All pass (fallback path doesn't use memory)
+- [ ] **Step 5: Commit**
+```bash
+git add agents/crew_orchestrator.py
+git commit -m "feat: remove raw history, add Memory to CrewOrchestrator"
+```
+---
+### Task 5: Add memory recall/remember + planning + collaboration to Flow
+**Files:**
+- Modify: `agents/agent_flow.py`
+- Modify: `tests/test_agent_flow.py`
+- [ ] **Step 1: Write failing tests**
+```python
+# tests/test_agent_flow.py — append to file
+from unittest.mock import MagicMock
+class TestMemoryHelpers:
+    def test_recall_returns_empty_when_no_memory(self):
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="bracket design",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = None
+        result = flow._recall_for_agent("design")
+        assert result == ""
+    def test_recall_formats_matches(self):
+        mock_memory = MagicMock()
+        mock_match = MagicMock()
+        mock_match.record.content = "L-bracket with fillets"
+        mock_memory.recall.return_value = [mock_match]
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="bracket",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = mock_memory
+        result = flow._recall_for_agent("design")
+        assert "## Relevant context from prior turns" in result
+        assert "L-bracket with fillets" in result
+        mock_memory.recall.assert_called_once()
+    def test_recall_returns_empty_when_no_matches(self):
+        mock_memory = MagicMock()
+        mock_memory.recall.return_value = []
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="bracket",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = mock_memory
+        result = flow._recall_for_agent("design")
+        assert result == ""
+    def test_remember_stores_with_scope(self):
+        mock_memory = MagicMock()
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="test",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = mock_memory
+        flow._remember_response("engineering", "Use 3mm walls in aluminum.")
+        mock_memory.remember.assert_called_once_with(
+            "Use 3mm walls in aluminum.",
+            scope="/agent/engineering",
+        )
+    def test_remember_noop_when_no_memory(self):
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="test",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = None
+        flow._remember_response("design", "test")  # Should not raise
+class TestCollaborationFlag:
+    def test_advisors_get_delegation(self):
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="test",
+            context="",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = None
+        from crewai import LLM
+        llm = LLM(model="gemini/gemini-2.5-flash", temperature=0.2)
+        agent, task = flow._build_crew_agent("design", llm)
+        assert agent.allow_delegation is True
+    def test_generators_no_delegation(self):
+        flow = AgentDispatchFlow(initial_state=AgentFlowState(
+            message="test",
+            context="",
+            model_str="gemini/gemini-2.5-flash",
+        ))
+        flow._memory = None
+        from crewai import LLM
+        llm = LLM(model="gemini/gemini-2.5-flash", temperature=0.2)
+        agent, task = flow._build_crew_agent("cad", llm)
+        assert agent.allow_delegation is False
+```
+- [ ] **Step 2: Run tests to verify they fail**
+Run: `pytest tests/test_agent_flow.py::TestMemoryHelpers tests/test_agent_flow.py::TestCollaborationFlag -v`
+Expected: FAIL — `_recall_for_agent` not defined
+- [ ] **Step 3: Add memory helpers to AgentDispatchFlow**
+Add these methods to `AgentDispatchFlow` in `agents/agent_flow.py` (in the private helpers section):
+```python
+    _memory = None  # Set by CrewOrchestrator before kickoff
+    def _recall_for_agent(self, agent_id: str) -> str:
+        """Recall relevant memories for this agent, formatted as context."""
+        if self._memory is None:
+            return ""
+        try:
+            matches = self._memory.recall(
+                self.state.message,
+                scope=f"/agent/{agent_id}",
+                limit=settings.memory.recall_limit,
+                depth=settings.memory.recall_depth,
+            )
+        except Exception:
+            return ""
+        if not matches:
+            return ""
+        lines = [f"- {m.record.content}" for m in matches]
+        return "## Relevant context from prior turns\n" + "\n".join(lines)
+    def _remember_response(self, agent_id: str, content: str):
+        """Store an agent's response in its scoped memory."""
+        if self._memory is None:
+            return
+        try:
+            self._memory.remember(content, scope=f"/agent/{agent_id}")
+        except Exception:
+            pass
+```
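The recall and remember helpers can be exercised without CrewAI. The sketch below uses `FakeMemory`, a hypothetical in-process stand-in that matches on substrings and exact scope strings. It is not the CrewAI `Memory` class, and real scope and ranking semantics may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Record:
    content: str


@dataclass
class Match:
    record: Record


class FakeMemory:
    """Hypothetical in-process stand-in; NOT the CrewAI Memory class."""

    def __init__(self) -> None:
        self._items: list[tuple[str, str]] = []  # (scope, content)

    def remember(self, content: str, scope: str) -> None:
        self._items.append((scope, content))

    def recall(self, query: str, scope: str, limit: int = 5, depth: str = "shallow") -> list[Match]:
        # Naive substring matching instead of embeddings; depth is ignored here.
        words = query.lower().split()
        hits = [c for s, c in self._items
                if s == scope and any(w in c.lower() for w in words)]
        return [Match(Record(c)) for c in hits[:limit]]


def recall_for_agent(memory: FakeMemory | None, message: str, agent_id: str) -> str:
    """Same shape as _recall_for_agent, with memory passed in explicitly."""
    if memory is None:
        return ""
    matches = memory.recall(message, scope=f"/agent/{agent_id}", limit=5, depth="shallow")
    if not matches:
        return ""
    lines = [f"- {m.record.content}" for m in matches]
    return "## Relevant context from prior turns\n" + "\n".join(lines)


memory = FakeMemory()
memory.remember("Use 3mm walls in aluminum.", scope="/agent/engineering")
own_scope = recall_for_agent(memory, "aluminum bracket", "engineering")
other_scope = recall_for_agent(memory, "aluminum bracket", "design")
print(own_scope)
```

If scopes really are matched per agent, as assumed here, each agent recalls only what it stored itself. Cross-agent context would then have to come from the task context or from delegation rather than from memory.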
+- [ ] **Step 4: Inject memories into task description**
+In `_build_crew_agent()`, replace the `task_description` block so that recalled memories are appended to the context only when there are any:
+```python
+        memories = self._recall_for_agent(agent_id)
+        context_parts = [self.state.context]
+        if memories:
+            context_parts.append(memories)
+        task_description = "\n\n".join(context_parts) + "\n\n"
+        task_description += (
+            f"As the {agent_def.role}, respond to the user's latest message. "
+            f"Keep your response concise (2-4 sentences). "
+            f"Do NOT repeat anything from the conversation history. "
+            f"Add NEW information from your expertise.\n\n"
+            f"Build on other agents' input — agree, disagree, refine, or add."
+        )
+```
+- [ ] **Step 5: Add remember calls after agent responses**
+In `_run_advisor_crew()`, after appending each response:
+```python
+        for i, agent_id in enumerate(advisor_ids):
+            raw = str(task_outputs[i]) if i < len(task_outputs) else (str(crew_result) if i == 0 else "")
+            if raw.strip():
+                responses.append(AgentResponse.from_agent(agent_id, raw.strip()))
+                self._remember_response(agent_id, raw.strip())
+        return responses
+```
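The indexing expression in the loop above packs three cases into one line. A minimal sketch with plain lists spells them out; `pair_outputs` is a hypothetical helper name and the strings stand in for CrewAI task output objects.

```python
def pair_outputs(advisor_ids, task_outputs, crew_result):
    """Map each advisor to its raw output, mirroring the indexing in Step 5."""
    paired = []
    for i, agent_id in enumerate(advisor_ids):
        if i < len(task_outputs):
            raw = str(task_outputs[i])   # normal case: one output per task
        elif i == 0:
            raw = str(crew_result)       # no per-task outputs: first advisor gets the crew result
        else:
            raw = ""                     # later advisors with no output are skipped
        if raw.strip():
            paired.append((agent_id, raw.strip()))
    return paired


full = pair_outputs(["design", "engineering"], ["Add fillets. ", "Use 3mm walls."], "ignored")
short = pair_outputs(["design", "engineering"], ["Add fillets."], "ignored")
empty = pair_outputs(["design", "engineering"], [], "Combined answer")
print(full, short, empty)
```

Only non-empty responses reach `_remember_response`, so a missing task output never writes a blank memory.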
+In `_run_cad_step()`, after setting cad_response (add at end of method):
+```python
+        if self.state.cad_response is not None:
+            self._remember_response("cad", raw_output)
+```
+In `_run_cam_step()`, after setting cam_response (add at end of method):
+```python
+        if self.state.cam_response is not None:
+            self._remember_response("cam", raw_output)
+```
+- [ ] **Step 6: Enable collaboration on advisors**
+In `_build_crew_agent()`, change `allow_delegation`:
+```python
+        crew_agent = Agent(
+            ...
+            allow_delegation=settings.crew.collaboration and agent_id in ADVISOR_IDS,
+            ...
+        )
+```
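The delegation rule reduces to a small truth table. In this sketch the contents of `ADVISOR_IDS` are an assumption based on the agent names used in the tests above; the real constant is defined in the codebase.

```python
ADVISOR_IDS = {"design", "engineering"}  # hypothetical contents


def allow_delegation(collaboration_enabled: bool, agent_id: str) -> bool:
    """Advisors may delegate only while the collaboration flag is on."""
    return collaboration_enabled and agent_id in ADVISOR_IDS


table = {
    (flag, agent_id): allow_delegation(flag, agent_id)
    for flag in (True, False)
    for agent_id in ("design", "cad")
}
print(table)
```

Generators such as `cad` never delegate regardless of the flag, which matches `test_generators_no_delegation`.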
+- [ ] **Step 7: Enable planning on Crews**
+In `_run_advisor_crew()`:
+```python
+        crew = Crew(
+            agents=[p[0] for p in pairs],
+            tasks=[p[1] for p in pairs],
+            process=Process.sequential,
+            planning=settings.crew.planning,
+            planning_llm=self._build_llm(),
+            verbose=False,
+        )
+```
+In `_run_single_agent_crew()`:
+```python
+        crew = Crew(
+            agents=[crew_agent],
+            tasks=[task],
+            process=Process.sequential,
+            planning=settings.crew.planning,
+            planning_llm=self._build_llm(),
+            verbose=False,
+        )
+```
+- [ ] **Step 8: Run tests to verify they pass**
+Run: `pytest tests/test_agent_flow.py -v`
+Expected: All pass
+- [ ] **Step 9: Run full test suite**
+Run: `pytest tests/ -x -q`
+Expected: All pass
+- [ ] **Step 10: Commit**
+```bash
+git add agents/agent_flow.py tests/test_agent_flow.py
+git commit -m "feat: add memory recall/remember, planning, and collaboration to Flow"
+```
+---
+### Task 6: Final validation
+**Files:**
+- Verify all files
+- [ ] **Step 1: Run full test suite**
+Run: `pytest tests/ -v`
+Expected: All tests pass
+- [ ] **Step 2: Verify dead code removed**
+Run: `python -c "from agents.prompts import parse_mentions; print('parse_mentions OK')"`
+Expected: `parse_mentions OK`
+Run: `python -c "from agents.prompts import build_orchestrator_system_prompt" 2>&1`
+Expected: `ImportError` — function no longer exists
+- [ ] **Step 3: Verify tools are BaseTool subclasses**
+Run: `python -c "from agents.tools import ExecuteCadTool, ValidateCadTool, GenerateGcodeTool, QueryDesignStateTool; print('All BaseTool imports OK')"`
+Expected: `All BaseTool imports OK`
+- [ ] **Step 4: Verify memory config loads**
+Run: `python -c "from config.settings import settings; print(f'memory={settings.memory.enabled}, planning={settings.crew.planning}, collab={settings.crew.collaboration}')"`
+Expected: `memory=True, planning=True, collab=True`
+- [ ] **Step 5: Verify Flow has memory attr**
+Run: `python -c "from agents.agent_flow import AgentDispatchFlow; print(hasattr(AgentDispatchFlow, '_memory'))"`
+Expected: `True`
+- [ ] **Step 6: Check no stale imports**
+Run: `grep -r "from agents.routing" --include="*.py" .`
+Expected: No results
+Run: `grep -rn "@tool" --include="*.py" agents/`
+Expected: No results (all tools migrated to BaseTool; this pattern also catches a bare `@tool` decorator written without parentheses)
+- [ ] **Step 7: Commit**
+```bash
+git add -A
+git commit -m "chore: final validation after memory/planning/collab/tools integration"
+```