CallMeDaniel and Claude Opus 4.6 (1M context) committed
Commit 052f613 · 1 parent: 897ba1e

docs: add memory/planning/collab/tools implementation plan

6 tasks: part name utility + dead code cleanup, config, tool migration
to BaseTool, history removal + Memory creation, Flow memory/planning/
collaboration integration, final validation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

docs/superpowers/plans/2026-04-13-memory-planning-collab-tools.md ADDED
# Memory, Planning, Collaboration & Tool Migration Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add CrewAI memory (cross-turn recall), planning (step-by-step coordination), and collaboration (agent delegation), and migrate tools to BaseTool subclasses with Pydantic schemas.

**Architecture:** Pre-integration refactoring (dead code, part name utility, config) comes first; memory, planning, and collaboration are then layered onto the existing AgentDispatchFlow. Tools migrate from `@tool` decorators to `BaseTool` subclasses with `args_schema`. The Memory instance lives on `CrewOrchestrator` and is passed to the Flow as an instance attribute. Planning and collaboration are Crew-level flags read from config.

**Tech Stack:** CrewAI 1.14 (Memory, BaseTool, Crew planning/collaboration), Pydantic BaseModel, google-generativeai embeddings

**Spec:** `docs/superpowers/specs/2026-04-12-memory-planning-collab-design.md`

---

### Task 1: Extract part name utility + clean dead code

**Files:**
- Create: `core/utils.py`
- Create: `tests/test_utils.py`
- Modify: `agents/prompts.py`
- Modify: `agents/crew_orchestrator.py`
- Modify: `agents/orchestrator.py`

- [ ] **Step 1: Write failing tests for `derive_part_name`**

```python
# tests/test_utils.py
"""Tests for core/utils.py utilities."""

from core.utils import derive_part_name


class TestDerivePartName:
    def test_basic_text(self):
        assert derive_part_name("servo bracket") == "servo_bracket"

    def test_strips_special_chars(self):
        assert derive_part_name("my part! @#$%") == "my_part_"

    def test_truncates_to_max_chars(self):
        result = derive_part_name("a" * 100, max_chars=10)
        assert len(result) <= 10

    def test_empty_string_returns_part(self):
        assert derive_part_name("") == "part"

    def test_special_chars_only_returns_part(self):
        assert derive_part_name("@#$%^&*") == "part"

    def test_lowercases(self):
        assert derive_part_name("My Bracket") == "my_bracket"

    def test_preserves_underscores(self):
        assert derive_part_name("servo_bracket_v2") == "servo_bracket_v2"
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `pytest tests/test_utils.py -v`
Expected: FAIL with `ModuleNotFoundError: No module named 'core.utils'`

- [ ] **Step 3: Implement `derive_part_name`**

```python
# core/utils.py
"""Shared utility functions for NeuralCAD."""

from __future__ import annotations


def derive_part_name(text: str, max_chars: int = 40) -> str:
    """Derive a filesystem-safe part name from text."""
    name = text[:max_chars].strip().replace(" ", "_").lower()
    name = "".join(c for c in name if c.isalnum() or c == "_")
    return name or "part"
```

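One behavior of Step 3 worth deciding on explicitly: `str.isalnum()` is Unicode-aware, so accented letters survive the filter. The sketch below copies the Step 3 logic and places it next to a hypothetical ASCII-only variant, `derive_ascii_part_name`, which also requires `str.isascii()` (Python 3.7+). The plan does not say which behavior is intended; this only makes the difference visible.

```python
"""Compare the plan's derive_part_name with a hypothetical ASCII-only variant."""

from __future__ import annotations


def derive_part_name(text: str, max_chars: int = 40) -> str:
    # Same logic as core/utils.py in Step 3.
    name = text[:max_chars].strip().replace(" ", "_").lower()
    name = "".join(c for c in name if c.isalnum() or c == "_")
    return name or "part"


def derive_ascii_part_name(text: str, max_chars: int = 40) -> str:
    # Hypothetical variant: keep only ASCII letters, ASCII digits, and underscores.
    name = text[:max_chars].strip().replace(" ", "_").lower()
    name = "".join(c for c in name if (c.isascii() and c.isalnum()) or c == "_")
    return name or "part"


if __name__ == "__main__":
    print(derive_part_name("Café Mount"))        # café_mount
    print(derive_ascii_part_name("Café Mount"))  # caf_mount
```

If part names end up in paths or URLs consumed by tools that assume ASCII, the stricter variant avoids surprises; otherwise the Step 3 version is fine as written.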
- [ ] **Step 4: Run tests to verify they pass**

Run: `pytest tests/test_utils.py -v`
Expected: 7 passed

- [ ] **Step 5: Replace duplicated logic in crew_orchestrator.py**

In `agents/crew_orchestrator.py`, add the import at the top:
```python
from core.utils import derive_part_name
```

Replace lines 208-209:
```python
part_name = message[:40].strip().replace(" ", "_").lower()
part_name = "".join(c for c in part_name if c.isalnum() or c == "_") or "part"
```
with:
```python
part_name = derive_part_name(message)
```

- [ ] **Step 6: Replace duplicated logic in orchestrator.py**

In `agents/orchestrator.py`, add the import at the top:
```python
from core.utils import derive_part_name
```

Replace lines 100-104:
```python
part_name = prompt[:40].strip().replace(" ", "_").lower()
part_name = "".join(c for c in part_name if c.isalnum() or c == "_")
if not part_name:
    part_name = "part"
```
with:
```python
part_name = derive_part_name(prompt)
```

- [ ] **Step 7: Clean dead code from prompts.py**

Replace the entire content of `agents/prompts.py` with only `parse_mentions()`:

```python
"""Agent prompt utilities: @mention parsing for chat messages."""

from __future__ import annotations

import re

from agents.definitions import AGENTS


def parse_mentions(message: str) -> tuple[str, list[str]]:
    """Extract @mentions from a message and return cleaned message + mention list.

    Returns:
        (cleaned_message, mentions) where mentions is a list of agent IDs.
    """
    mentions = []
    cleaned = message

    for agent_id in AGENTS:
        # re.escape guards against agent IDs that contain regex metacharacters.
        pattern = rf"@{re.escape(agent_id)}\b"
        if re.search(pattern, message, re.IGNORECASE):
            mentions.append(agent_id)
            cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE).strip()

    return cleaned, mentions
```

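To pin down what the `parse_mentions` tests should expect, here is a standalone copy that takes the agent IDs as a parameter. `agent_ids` stands in for `agents.definitions.AGENTS`, whose contents this plan does not show; the IDs below are illustrative only. Two properties are worth testing: mentions come back in registry order rather than message order, and the `\b` word boundary stops `@cad` from matching inside `@cadence`. Removing a mid-sentence mention also leaves a double space, since only the ends are stripped.

```python
from __future__ import annotations

import re


def parse_mentions(message: str, agent_ids: list[str]) -> tuple[str, list[str]]:
    """Standalone copy of the Step 7 logic, with the agent registry passed in."""
    mentions = []
    cleaned = message
    for agent_id in agent_ids:
        # re.escape is a no-op for plain alphanumeric IDs.
        pattern = rf"@{re.escape(agent_id)}\b"
        if re.search(pattern, message, re.IGNORECASE):
            mentions.append(agent_id)
            cleaned = re.sub(pattern, "", cleaned, flags=re.IGNORECASE).strip()
    return cleaned, mentions


# Illustrative IDs only; the real list lives in agents.definitions.AGENTS.
AGENT_IDS = ["design", "engineering", "cad", "cam"]

print(parse_mentions("@cam then @cad make a bracket", AGENT_IDS))
# ('then  make a bracket', ['cad', 'cam'])  registry order, double space left behind
print(parse_mentions("ask @cadence about it", AGENT_IDS))
# ('ask @cadence about it', [])  \b blocks the partial match
```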
- [ ] **Step 8: Remove tests for the deleted functions, then run the full test suite**

Tests that import any of the functions deleted from `agents/prompts.py` are not skipped automatically: pytest fails at collection with `ImportError`. Delete those tests first, keeping the `parse_mentions` tests.

Run: `pytest tests/ -x -q`
Expected: All pass, including the existing `parse_mentions` tests

- [ ] **Step 9: Commit**

```bash
git add core/utils.py tests/test_utils.py agents/prompts.py agents/crew_orchestrator.py agents/orchestrator.py
git add -u tests/  # also stage the test edits/deletions from Step 8
git commit -m "refactor: extract derive_part_name, remove dead code from prompts.py"
```

---

### Task 2: Add memory/crew config to settings

**Files:**
- Modify: `config/settings.py`
- Modify: `config.yaml`

- [ ] **Step 1: Add MemoryConfig and CrewConfig to settings.py**

Add these classes BEFORE the `Settings` class in `config/settings.py`:

```python
class MemoryConfig(BaseModel):
    enabled: bool = True
    embedder_provider: str = "google-generativeai"
    embedder_model: str = "gemini-embedding-001"
    recency_weight: float = 0.4
    semantic_weight: float = 0.4
    importance_weight: float = 0.2
    recency_half_life_days: float = 1.0
    recall_limit: int = 5
    recall_depth: str = "shallow"


class CrewConfig(BaseModel):
    planning: bool = True
    collaboration: bool = True
```

Add these fields to the `Settings` class (after `routing`), and make sure `Field` is imported from `pydantic` alongside `BaseModel`:

```python
memory: MemoryConfig = Field(default_factory=MemoryConfig)
crew: CrewConfig = Field(default_factory=CrewConfig)
```

- [ ] **Step 2: Add memory and crew sections to config.yaml**

Append after the `fallback_messages` section at the end of `config.yaml`:

```yaml

memory:
  enabled: true
  embedder_provider: google-generativeai
  embedder_model: gemini-embedding-001
  recency_weight: 0.4
  semantic_weight: 0.4
  importance_weight: 0.2
  recency_half_life_days: 1
  recall_limit: 5
  recall_depth: shallow

crew:
  planning: true
  collaboration: true
```

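The three weights and the half-life only make sense together. As a hedged illustration, here is a hypothetical scoring function written for this plan; it is not CrewAI's implementation, whose exact formula this plan does not show. It blends semantic similarity, an exponential recency decay governed by the configured half-life, and an importance value, using the config defaults 0.4/0.4/0.2 (which sum to 1.0).

```python
"""Hypothetical composite memory score, for intuition about the config knobs only."""

from __future__ import annotations


def recency_decay(age_days: float, half_life_days: float = 1.0) -> float:
    # Exponential decay: 1.0 for a fresh record, 0.5 after one half-life.
    return 0.5 ** (age_days / half_life_days)


def composite_score(
    similarity: float,
    age_days: float,
    importance: float,
    recency_weight: float = 0.4,
    semantic_weight: float = 0.4,
    importance_weight: float = 0.2,
    half_life_days: float = 1.0,
) -> float:
    # All inputs are assumed to be normalised to [0, 1].
    return (
        semantic_weight * similarity
        + recency_weight * recency_decay(age_days, half_life_days)
        + importance_weight * importance
    )


if __name__ == "__main__":
    # Same similarity and importance; only the record's age differs.
    for age in (0.0, 1.0, 3.0):
        print(age, round(composite_score(0.8, age_days=age, importance=0.5), 3))
    # 0.0 0.82 / 1.0 0.62 / 3.0 0.47
```

Under this illustrative formula, a one-day half-life means a day-old record loses half its recency contribution, so `recency_half_life_days: 1` strongly favors the current session; raising it lets older turns compete mainly on similarity.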
- [ ] **Step 3: Verify config loads**

Run: `python -c "from config.settings import settings; print(f'memory.enabled={settings.memory.enabled}, crew.planning={settings.crew.planning}, crew.collaboration={settings.crew.collaboration}, embedder={settings.memory.embedder_provider}')"`
Expected: `memory.enabled=True, crew.planning=True, crew.collaboration=True, embedder=google-generativeai`

- [ ] **Step 4: Run full test suite**

Run: `pytest tests/ -x -q`
Expected: All pass

- [ ] **Step 5: Commit**

```bash
git add config/settings.py config.yaml
git commit -m "feat: add memory and crew config sections"
```

---

### Task 3: Migrate tools to BaseTool subclasses

**Files:**
- Modify: `agents/tools.py`
- Modify: `agents/agent_flow.py` (tool imports and instantiation)
- Modify: `tests/test_agent_flow.py` (update mocks if tool names changed)

- [ ] **Step 1: Rewrite agents/tools.py with BaseTool classes**

Replace the entire file content:

```python
"""CrewAI tools for CadQuery code execution and CNC validation.

These tools allow agents to execute code, validate manufacturability,
generate G-code, and query design state within their reasoning loop.
Uses BaseTool subclasses with Pydantic args_schema for structured input.
"""

from __future__ import annotations

import json
import logging
from contextvars import ContextVar
from typing import Type

from pydantic import BaseModel, Field

logger = logging.getLogger(__name__)

try:
    from crewai.tools import BaseTool
except ImportError:
    # Minimal stand-in so this module still imports when CrewAI is absent.
    class BaseTool:  # type: ignore[no-redef]
        name: str = ""
        description: str = ""
        args_schema: type | None = None

        def _run(self, **kwargs) -> str:
            return ""

# ── Per-request state (ContextVar, async-safe) ──────────────────────────

_last_shape_var: ContextVar[object | None] = ContextVar("last_shape", default=None)
_design_state_var: ContextVar[dict | None] = ContextVar("design_state", default=None)


def set_last_shape(shape):
    """Set the last executed CadQuery shape."""
    _last_shape_var.set(shape)


def get_last_shape():
    """Get the last executed CadQuery shape."""
    return _last_shape_var.get()


def set_design_state(state_dict: dict):
    """Set the current design state."""
    _design_state_var.set(state_dict)


def get_design_state() -> dict | None:
    """Get the current design state."""
    return _design_state_var.get()


# ── Tool input schemas ──────────────────────────────────────────────────

class ExecuteCadInput(BaseModel):
    code: str = Field(..., description="CadQuery Python code. Must assign result to `result` as cq.Workplane. Import cadquery as cq.")


class ValidateCadInput(BaseModel):
    check_type: str = Field(default="full", description="Validation type: 'full' for complete CNC manufacturability check.")


class GenerateGcodeInput(BaseModel):
    operations: list[str] = Field(..., description="Ordered list of operations: adaptive, pocket, profile, face, drill, surface, waterline")
    tool_diameter: float = Field(default=6.0, description="Endmill diameter in mm")
    post_processor: str = Field(default="grbl", description="G-code format: grbl, linuxcnc, fanuc")


VALID_CHECKS = {"all", "material", "dimensions", "features", "constraints", "axis"}
# "axis" is the user-facing name; the DesignState field is axis_recommendation.
CHECK_ALIASES = {"axis": "axis_recommendation"}
# Fields reported as known or missing, in display order.
STATE_FIELDS = ("part_name", "material", "dimensions", "features", "constraints", "axis_recommendation")


class QueryDesignStateInput(BaseModel):
    check: str = Field(default="all", description="What to check: 'all' for full state, or a specific field (material, dimensions, features, constraints, axis).")


# ── Tool implementations ────────────────────────────────────────────────

class ExecuteCadTool(BaseTool):
    name: str = "Execute CadQuery Code"
    description: str = "Execute CadQuery Python code and return geometry info: volume, bounding box, face count, edge count."
    args_schema: Type[BaseModel] = ExecuteCadInput

    def _run(self, code: str) -> str:
        from core.executor import execute_cadquery
        result = execute_cadquery(code)
        if result.success and result.result is not None:
            set_last_shape(result.result)
        return json.dumps(result.model_dump(by_alias=True), indent=2)


class ValidateCadTool(BaseTool):
    name: str = "Validate CNC Manufacturability"
    description: str = "Run CNC manufacturability checks on the last executed shape. Returns machinable status, axis recommendation, and issues list."
    args_schema: Type[BaseModel] = ValidateCadInput

    def _run(self, check_type: str = "full") -> str:
        # check_type is currently unused: every call runs the full check.
        from core.validator import validate_for_cnc
        shape = get_last_shape()
        if shape is None:
            return json.dumps({"success": False, "error": "No shape available. Run Execute CadQuery Code first."})
        validation = validate_for_cnc(shape)
        return json.dumps({"success": True, "validation": validation.model_dump()}, indent=2)


class GenerateGcodeTool(BaseTool):
    name: str = "Generate G-code Toolpath"
    description: str = "Generate CNC G-code toolpath from the last executed CadQuery shape."
    args_schema: Type[BaseModel] = GenerateGcodeInput

    def _run(self, operations: list[str], tool_diameter: float = 6.0, post_processor: str = "grbl") -> str:
        from core.cam import generate_gcode
        shape = get_last_shape()
        if shape is None:
            return json.dumps({"success": False, "error": "No shape available. Run Execute CadQuery Code first."})
        # Feeds and spindle speed are fixed defaults; only the diameter comes from the agent.
        tool_config = {"diameter": tool_diameter, "h_feed": 800, "v_feed": 200, "speed": 18000}
        result = generate_gcode(
            shape=shape, operations=operations,
            tool_config=tool_config, post_processor=post_processor,
        )
        return json.dumps(result.model_dump(), indent=2)


class QueryDesignStateTool(BaseTool):
    name: str = "Query Design State"
    description: str = "Query the orchestrator for current design state and readiness. Call BEFORE saying NOT READY to check what information is already available."
    args_schema: Type[BaseModel] = QueryDesignStateInput

    def _run(self, check: str = "all") -> str:
        from agents.design_state import DesignState, compute_score
        from config.settings import settings

        if check not in VALID_CHECKS:
            return json.dumps({"error": f"Invalid check: {check!r}. Valid: {sorted(VALID_CHECKS)}"})

        state_dict = get_design_state()
        if state_dict is None:
            return json.dumps({"error": "No design state available."})

        state = DesignState(**state_dict)
        score = compute_score(state)
        threshold = settings.planning.threshold
        ready = score >= threshold

        # Partition the tracked fields into what is known and what is still missing.
        known = {}
        missing = []
        for field_name in STATE_FIELDS:
            value = getattr(state, field_name)
            if value:
                known[field_name] = value
            else:
                missing.append(field_name)

        # Informational fields: reported when present, never counted as missing.
        if state.description:
            known["description"] = state.description
        if state.decisions:
            known["recent_decisions"] = state.decisions[-5:]

        # Map "axis" to axis_recommendation so a single-field query does not fall through.
        field_name = CHECK_ALIASES.get(check, check)
        if check != "all" and field_name in known:
            return json.dumps({"field": field_name, "value": known[field_name], "ready": ready})
        if check != "all" and field_name in missing:
            return json.dumps({"field": field_name, "value": None, "missing": True, "ready": ready})

        result = {
            "known": known,
            "missing": missing,
            "readiness_score": score,
            "threshold": threshold,
            "ready": ready,
            "phase": state.phase,
        }
        return json.dumps(result, indent=2)
```

- [ ] **Step 2: Update tool references in agent_flow.py**

In `agents/agent_flow.py`, change the import in `_build_crew_agent()` from:

```python
from agents.tools import (
    query_design_state_tool, execute_cad_tool,
    validate_cad_tool, generate_gcode_tool,
)
```

to:

```python
from agents.tools import (
    QueryDesignStateTool, ExecuteCadTool,
    ValidateCadTool, GenerateGcodeTool,
)
```

Then change the tool assignments from function references to instances (BaseTool subclasses must be instantiated):

```python
tools = [QueryDesignStateTool()]
...
if agent_id == "cad":
    tools.extend([ExecuteCadTool(), ValidateCadTool()])
    ...
elif agent_id == "cam":
    tools.append(GenerateGcodeTool())
```

- [ ] **Step 3: Run full test suite**

Run: `pytest tests/ -x -q`
Expected: All pass

- [ ] **Step 4: Commit**

```bash
git add agents/tools.py agents/agent_flow.py tests/test_agent_flow.py
git commit -m "refactor: migrate tools to BaseTool subclasses with args_schema"
```

---

### Task 4: Remove raw history + add memory to orchestrator

**Files:**
- Modify: `agents/crew_orchestrator.py`

- [ ] **Step 1: Remove raw history rendering from `_build_agent_context`**

Replace the `_build_agent_context` function in `agents/crew_orchestrator.py`:

```python
def _build_agent_context(
    message: str,
    design_state: DesignState,
    approved_plan: DesignPlan | None = None,
) -> str:
    """Build context string for agents: design spec + user message.

    Raw history is no longer rendered; memory recall replaces it.
    """
    parts = []

    if approved_plan:
        parts.append(approved_plan.render_approved())
    else:
        spec = design_state.render()
        if spec:
            parts.append(f"## Current Design Spec\n{spec}")

    parts.append(f"## User's latest message\n{message}")
    return "\n\n".join(parts)
```

Update the call site in `_run_crew()`, removing the `history` and `max_history` arguments:

```python
context = _build_agent_context(message, state, approved_plan=approved_plan)
```

- [ ] **Step 2: Add Memory creation to `__init__`**

`settings` and `logger` are assumed to be available at module level in `crew_orchestrator.py`:

```python
def __init__(self, backend_name: str = "gemini", output_dir=None):
    super().__init__(output_dir=output_dir or DEFAULT_OUTPUT_DIR)
    self.backend_name = backend_name
    self._crew_available = self._check_crewai()
    self._memory = self._create_memory()

def _create_memory(self):
    """Create CrewAI Memory instance if enabled in config."""
    if not settings.memory.enabled:
        return None
    try:
        from crewai.memory import Memory
        return Memory(
            storage=str(self.output_dir / ".memory"),  # assumes output_dir is a Path
            embedder={
                "provider": settings.memory.embedder_provider,
                "config": {"model_name": settings.memory.embedder_model},
            },
            recency_weight=settings.memory.recency_weight,
            semantic_weight=settings.memory.semantic_weight,
            importance_weight=settings.memory.importance_weight,
            recency_half_life_days=settings.memory.recency_half_life_days,
        )
    except Exception as exc:  # includes ImportError when CrewAI is absent
        logger.warning("Memory creation failed (%s), continuing without memory", exc)
        return None
```

- [ ] **Step 3: Pass memory to Flow**

In `_run_crew()`, attach the memory right after creating the flow and before `kickoff()`:

```python
flow = AgentDispatchFlow(initial_state=AgentFlowState(
    message=message,
    context=context,
    model_str=_get_crewai_model(self.backend_name),
    mentions=list(mentions) if mentions else [],
    is_approved_phase=is_approved,
))
flow._memory = self._memory
flow.kickoff()
```

- [ ] **Step 4: Run full test suite**

Run: `pytest tests/ -x -q`
Expected: All pass (the fallback path doesn't use memory)

- [ ] **Step 5: Commit**

```bash
git add agents/crew_orchestrator.py
git commit -m "feat: remove raw history, add Memory to CrewOrchestrator"
```

---

### Task 5: Add memory recall/remember + planning + collaboration to Flow

**Files:**
- Modify: `agents/agent_flow.py`
- Modify: `tests/test_agent_flow.py`

- [ ] **Step 1: Write failing tests**

```python
# tests/test_agent_flow.py (append to file; AgentDispatchFlow and AgentFlowState
# are assumed to be imported at the top already)
from unittest.mock import MagicMock


class TestMemoryHelpers:
    def test_recall_returns_empty_when_no_memory(self):
        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="bracket design",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = None
        result = flow._recall_for_agent("design")
        assert result == ""

    def test_recall_formats_matches(self):
        mock_memory = MagicMock()
        mock_match = MagicMock()
        mock_match.record.content = "L-bracket with fillets"
        mock_memory.recall.return_value = [mock_match]

        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="bracket",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = mock_memory
        result = flow._recall_for_agent("design")
        assert "## Relevant context from prior turns" in result
        assert "L-bracket with fillets" in result
        mock_memory.recall.assert_called_once()

    def test_recall_returns_empty_when_no_matches(self):
        mock_memory = MagicMock()
        mock_memory.recall.return_value = []

        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="bracket",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = mock_memory
        result = flow._recall_for_agent("design")
        assert result == ""

    def test_remember_stores_with_scope(self):
        mock_memory = MagicMock()

        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="test",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = mock_memory
        flow._remember_response("engineering", "Use 3mm walls in aluminum.")
        mock_memory.remember.assert_called_once_with(
            "Use 3mm walls in aluminum.",
            scope="/agent/engineering",
        )

    def test_remember_noop_when_no_memory(self):
        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="test",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = None
        flow._remember_response("design", "test")  # Should not raise


class TestCollaborationFlag:
    def test_advisors_get_delegation(self):
        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="test",
            context="",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = None
        from crewai import LLM
        llm = LLM(model="gemini/gemini-2.5-flash", temperature=0.2)
        agent, task = flow._build_crew_agent("design", llm)
        assert agent.allow_delegation is True

    def test_generators_no_delegation(self):
        flow = AgentDispatchFlow(initial_state=AgentFlowState(
            message="test",
            context="",
            model_str="gemini/gemini-2.5-flash",
        ))
        flow._memory = None
        from crewai import LLM
        llm = LLM(model="gemini/gemini-2.5-flash", temperature=0.2)
        agent, task = flow._build_crew_agent("cad", llm)
        assert agent.allow_delegation is False
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `pytest tests/test_agent_flow.py::TestMemoryHelpers tests/test_agent_flow.py::TestCollaborationFlag -v`
Expected: FAIL. The `TestMemoryHelpers` tests raise `AttributeError` because `_recall_for_agent` and `_remember_response` do not exist yet; `test_advisors_get_delegation` should keep failing until Step 6 enables delegation (`test_generators_no_delegation` may already pass).

- [ ] **Step 3: Add memory helpers to AgentDispatchFlow**

Add these methods to `AgentDispatchFlow` in `agents/agent_flow.py` (in the private helpers section). They read `settings.memory`, so `agent_flow.py` needs `from config.settings import settings` if it does not import it already:

```python
_memory = None  # Set by CrewOrchestrator before kickoff

def _recall_for_agent(self, agent_id: str) -> str:
    """Recall relevant memories for this agent, formatted as context."""
    if self._memory is None:
        return ""
    try:
        matches = self._memory.recall(
            self.state.message,
            scope=f"/agent/{agent_id}",
            limit=settings.memory.recall_limit,
            depth=settings.memory.recall_depth,
        )
    except Exception:
        # Memory is best-effort: a failed recall must not break the turn.
        return ""
    if not matches:
        return ""
    lines = [f"- {m.record.content}" for m in matches]
    return "## Relevant context from prior turns\n" + "\n".join(lines)

def _remember_response(self, agent_id: str, content: str):
    """Store an agent's response in its scoped memory."""
    if self._memory is None:
        return
    try:
        self._memory.remember(content, scope=f"/agent/{agent_id}")
    except Exception:
        # Also best-effort: losing one memory write is preferable to failing the turn.
        pass
```

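`MagicMock` accepts any call, so the Step 1 tests cannot catch a scope mix-up between agents. A small in-memory fake could complement them. The sketch below is hypothetical: `FakeMemory` mimics only the two calls the helpers above make (`remember(content, scope=...)` and `recall(query, scope=..., limit=..., depth=...)` returning objects with `.record.content`), ignores `depth`, and does no semantic ranking. `format_recall` copies the formatting in `_recall_for_agent`.

```python
"""Hypothetical in-memory stand-in for the two Memory calls used by the Flow."""

from __future__ import annotations

from dataclasses import dataclass, field
from types import SimpleNamespace


@dataclass
class FakeMemory:
    # scope -> stored contents, oldest first
    store: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, content: str, scope: str = "/") -> None:
        self.store.setdefault(scope, []).append(content)

    def recall(self, query: str, scope: str = "/", limit: int = 5, depth: str = "shallow"):
        # No embeddings: return the newest `limit` records from this exact scope only.
        newest_first = list(reversed(self.store.get(scope, [])))[:limit]
        return [SimpleNamespace(record=SimpleNamespace(content=c)) for c in newest_first]


def format_recall(matches) -> str:
    # Same formatting as AgentDispatchFlow._recall_for_agent.
    if not matches:
        return ""
    lines = [f"- {m.record.content}" for m in matches]
    return "## Relevant context from prior turns\n" + "\n".join(lines)


if __name__ == "__main__":
    memory = FakeMemory()
    memory.remember("Use 3mm walls in aluminum.", scope="/agent/engineering")
    memory.remember("L-bracket with fillets", scope="/agent/design")
    print(format_recall(memory.recall("bracket", scope="/agent/design")))
    print(repr(format_recall(memory.recall("bracket", scope="/agent/cad"))))  # ''
```

In a Flow test, assigning `flow._memory = FakeMemory()`, calling `_remember_response("engineering", ...)`, and then `_recall_for_agent("design")` should return an empty string: the per-agent scope isolation the plan relies on.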
- [ ] **Step 4: Inject memories into task description**

In `_build_crew_agent()`, build the task description from the shared context plus any recalled memories, so that no empty memory section is emitted when recall returns nothing:

```python
memories = self._recall_for_agent(agent_id)

context_parts = [self.state.context]
if memories:
    context_parts.append(memories)

task_description = "\n\n".join(context_parts) + "\n\n"
task_description += (
    f"As the {agent_def.role}, respond to the user's latest message. "
    "Keep your response concise (2-4 sentences). "
    "Do NOT repeat anything from the conversation history. "
    "Add NEW information from your expertise.\n\n"
    "Build on other agents' input: agree, disagree, refine, or add."
)
```

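The assembly logic is easy to pin down in isolation. A sketch under the assumption that it is pulled out into a pure helper; `build_task_description` is a hypothetical name, since the plan keeps this code inline in `_build_crew_agent()`:

```python
def build_task_description(context: str, memories: str, role: str) -> str:
    """Hypothetical pure-function version of the Step 4 assembly."""
    context_parts = [context]
    if memories:  # omit the memory section entirely when recall returned ""
        context_parts.append(memories)

    description = "\n\n".join(context_parts) + "\n\n"
    description += (
        f"As the {role}, respond to the user's latest message. "
        "Keep your response concise (2-4 sentences). "
        "Do NOT repeat anything from the conversation history. "
        "Add NEW information from your expertise.\n\n"
        "Build on other agents' input: agree, disagree, refine, or add."
    )
    return description


if __name__ == "__main__":
    print(build_task_description("## User's latest message\nAdd a fillet", "", "CAD engineer"))
```

Keeping the join in one place means the "no memories" and "some memories" cases differ only by the presence of one block, which is simple to assert on.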
- [ ] **Step 5: Add remember calls after agent responses**

In `_run_advisor_crew()`, store each response right after appending it:

```python
for i, agent_id in enumerate(advisor_ids):
    raw = str(task_outputs[i]) if i < len(task_outputs) else (str(crew_result) if i == 0 else "")
    text = raw.strip()
    if text:
        responses.append(AgentResponse.from_agent(agent_id, text))
        self._remember_response(agent_id, text)
return responses
```

In `_run_cad_step()`, after setting `cad_response` (add at the end of the method; this assumes the agent's raw text is held in a local named `raw_output`):

```python
if self.state.cad_response is not None:
    self._remember_response("cad", raw_output)
```

In `_run_cam_step()`, after setting `cam_response` (add at the end of the method, with the same `raw_output` assumption):

```python
if self.state.cam_response is not None:
    self._remember_response("cam", raw_output)
```

- [ ] **Step 6: Enable collaboration on advisors**

In `_build_crew_agent()`, change `allow_delegation`:

```python
crew_agent = Agent(
    ...
    allow_delegation=settings.crew.collaboration and agent_id in ADVISOR_IDS,
    ...
)
```

- [ ] **Step 7: Enable planning on Crews**

In `_run_advisor_crew()`:

```python
crew = Crew(
    agents=[p[0] for p in pairs],
    tasks=[p[1] for p in pairs],
    process=Process.sequential,
    planning=settings.crew.planning,
    planning_llm=self._build_llm(),
    verbose=False,
)
```

In `_run_single_agent_crew()`:

```python
crew = Crew(
    agents=[crew_agent],
    tasks=[task],
    process=Process.sequential,
    planning=settings.crew.planning,
    planning_llm=self._build_llm(),
    verbose=False,
)
```

- [ ] **Step 8: Run tests to verify they pass**

Run: `pytest tests/test_agent_flow.py -v`
Expected: All pass

- [ ] **Step 9: Run full test suite**

Run: `pytest tests/ -x -q`
Expected: All pass

- [ ] **Step 10: Commit**

```bash
git add agents/agent_flow.py tests/test_agent_flow.py
git commit -m "feat: add memory recall/remember, planning, and collaboration to Flow"
```

---

### Task 6: Final validation

**Files:**
- Verify all files

- [ ] **Step 1: Run full test suite**

Run: `pytest tests/ -v`
Expected: All tests pass

- [ ] **Step 2: Verify dead code removed**

Run: `python -c "from agents.prompts import parse_mentions; print('parse_mentions OK')"`
Expected: `parse_mentions OK`

Run: `python -c "from agents.prompts import build_orchestrator_system_prompt" 2>&1`
Expected: `ImportError`, since the function no longer exists

- [ ] **Step 3: Verify tools are BaseTool subclasses**

An import check alone would also pass with the fallback `BaseTool` stub, so check against CrewAI's class directly:

Run: `python -c "from crewai.tools import BaseTool; from agents.tools import ExecuteCadTool, ValidateCadTool, GenerateGcodeTool, QueryDesignStateTool; print(all(issubclass(t, BaseTool) for t in (ExecuteCadTool, ValidateCadTool, GenerateGcodeTool, QueryDesignStateTool)))"`
Expected: `True`

- [ ] **Step 4: Verify memory config loads**

Run: `python -c "from config.settings import settings; print(f'memory={settings.memory.enabled}, planning={settings.crew.planning}, collab={settings.crew.collaboration}')"`
Expected: `memory=True, planning=True, collab=True`

- [ ] **Step 5: Verify Flow has memory attr**

Run: `python -c "from agents.agent_flow import AgentDispatchFlow; print(hasattr(AgentDispatchFlow, '_memory'))"`
Expected: `True`

- [ ] **Step 6: Check no stale imports**

Run: `grep -r "from agents.routing" --include="*.py" .`
Expected: No results

Run: `grep -r "@tool(" --include="*.py" agents/`
Expected: No results (all tools migrated to BaseTool)

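On machines without `grep` (for example a stock Windows shell), the two Step 6 checks can be approximated with a short stdlib script. This is a hypothetical helper, not part of the plan's file list. It returns `path:line: text` for every `.py` line containing a substring, and skips a few virtualenv and cache directories, which is an assumption about the repository layout.

```python
"""Hypothetical portable stand-in for the Step 6 grep checks."""

from __future__ import annotations

from pathlib import Path

SKIP_DIRS = {".git", ".venv", "venv", "__pycache__"}  # assumed layout; adjust as needed


def find_in_python_files(root: Path, needle: str) -> list[str]:
    """Return 'path:lineno: line' for each .py line under root containing needle."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        if SKIP_DIRS.intersection(path.relative_to(root).parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

Run from the repository root, `find_in_python_files(Path("."), "from agents.routing")` and `find_in_python_files(Path("agents"), "@tool(")` should both return an empty list.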
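- [ ] **Step 7: Commit any fixes**

Commit only if the checks above required changes. Stage specific files rather than running `git add -A`, which would also pick up untracked artifacts such as the Memory store created under the output directory, if that directory sits inside the repository:

```bash
git status                               # review what changed
git add <files fixed during validation>
git commit -m "chore: final validation after memory/planning/collab/tools integration"
```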