VibecoderMcSwaggins commited on
Commit
0b27b1c
·
2 Parent(s): 3753c39 21dd8fe

Merge main into dev: P2 bug fixes + CodeRabbit feedback

Browse files
docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -1,6 +1,6 @@
1
  # Active Bugs
2
 
3
- > Last updated: 2025-12-03
4
  >
5
  > **Note:** Completed bug docs archived to `docs/bugs/archive/`
6
  > **See also:** [ARCHITECTURE.md](../ARCHITECTURE.md) for unified architecture plan
@@ -59,6 +59,17 @@
59
 
60
  ---
61
 
 
 
 
 
 
 
 
 
 
 
 
62
  ## Resolved Bugs (December 2025)
63
 
64
  All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
@@ -81,6 +92,8 @@ All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
81
  - **P1 Simple Mode Removed Breaks Free Tier UX** - FIXED via Accumulator Pattern (PR #117)
82
 
83
  ### P2 Bugs (All FIXED)
 
 
84
  - **P2 7B Model Garbage Output** - SUPERSEDED by P1 Free Tier fix (root cause was premature marker, not model capacity)
85
  - **P2 Advanced Mode Cold Start No Feedback** - FIXED, all phases complete
86
  - **P2 Architectural BYOK Gaps** - FIXED, end-to-end BYOK support in PR #119
 
1
  # Active Bugs
2
 
3
+ > Last updated: 2025-12-04
4
  >
5
  > **Note:** Completed bug docs archived to `docs/bugs/archive/`
6
  > **See also:** [ARCHITECTURE.md](../ARCHITECTURE.md) for unified architecture plan
 
59
 
60
  ---
61
 
62
+ ### P3 - Remove Modal Integration
63
+
64
+ **File:** `docs/future-roadmap/P3_MODAL_INTEGRATION_REMOVAL.md`
65
+ **Status:** OPEN - Tech Debt
66
+
67
+ **Problem:** Modal (cloud functions) is integrated in 9 files but was decided against for this project. Creates dead code paths and confusion.
68
+
69
+ **Fix:** Remove all Modal references from codebase (config, services, agents, tools).
70
+
71
+ ---
72
+
73
  ## Resolved Bugs (December 2025)
74
 
75
  All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
 
92
  - **P1 Simple Mode Removed Breaks Free Tier UX** - FIXED via Accumulator Pattern (PR #117)
93
 
94
  ### P2 Bugs (All FIXED)
95
+ - **P2 Duplicate Report Content** - FIXED in PR fix/p2-double-bug-squash, stateful deduplication in `run()` loop
96
+ - **P2 First Turn Timeout** - FIXED in PR fix/p2-double-bug-squash, reduced results per tool (10→5), increased timeout (5→10 min)
97
  - **P2 7B Model Garbage Output** - SUPERSEDED by P1 Free Tier fix (root cause was premature marker, not model capacity)
98
  - **P2 Advanced Mode Cold Start No Feedback** - FIXED, all phases complete
99
  - **P2 Architectural BYOK Gaps** - FIXED, end-to-end BYOK support in PR #119
docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md CHANGED
@@ -1,7 +1,7 @@
1
  # P2 Bug: Duplicate Report Content in Output
2
 
3
  **Date**: 2025-12-03
4
- **Status**: OPEN
5
  **Severity**: P2 (UX - Duplicate content confuses users)
6
  **Component**: `src/orchestrators/advanced.py`
7
  **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
 
1
  # P2 Bug: Duplicate Report Content in Output
2
 
3
  **Date**: 2025-12-03
4
+ **Status**: FIXED (PR fix/p2-double-bug-squash)
5
  **Severity**: P2 (UX - Duplicate content confuses users)
6
  **Component**: `src/orchestrators/advanced.py`
7
  **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
docs/bugs/P2_FIRST_TURN_TIMEOUT.md CHANGED
@@ -1,7 +1,7 @@
1
  # P2 Bug: First Agent Turn Exceeds Workflow Timeout
2
 
3
  **Date**: 2025-12-03
4
- **Status**: OPEN
5
  **Severity**: P2 (UX - Workflow always times out on complex queries)
6
  **Component**: `src/orchestrators/advanced.py` + `src/agents/search_agent.py`
7
  **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
 
1
  # P2 Bug: First Agent Turn Exceeds Workflow Timeout
2
 
3
  **Date**: 2025-12-03
4
+ **Status**: FIXED (PR fix/p2-double-bug-squash)
5
  **Severity**: P2 (UX - Workflow always times out on complex queries)
6
  **Component**: `src/orchestrators/advanced.py` + `src/agents/search_agent.py`
7
  **Affects**: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)
docs/future-roadmap/P3_MODAL_INTEGRATION_REMOVAL.md ADDED
@@ -0,0 +1,78 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # P3 Tech Debt: Modal Integration Removal
2
+
3
+ **Date**: 2025-12-04
4
+ **Status**: OPEN - Tech Debt (Future Roadmap)
5
+ **Severity**: P3 (Tech Debt - Not blocking functionality)
6
+ **Component**: Multiple files
7
+
8
+ ---
9
+
10
+ ## Executive Summary
11
+
12
+ Modal (cloud function execution platform) is integrated throughout the codebase but was decided against for this project. This creates potential confusion and dead code paths that should be cleaned up when time permits.
13
+
14
+ ---
15
+
16
+ ## Affected Files
17
+
18
+ The following files contain Modal references:
19
+
20
+ | File | Usage |
21
+ |------|-------|
22
+ | `src/utils/config.py` | `MODAL_TOKEN_ID`, `MODAL_TOKEN_SECRET` settings |
23
+ | `src/utils/service_loader.py` | Modal service initialization |
24
+ | `src/services/llamaindex_rag.py` | Modal integration for RAG |
25
+ | `src/agents/code_executor_agent.py` | Modal sandbox execution |
26
+ | `src/utils/exceptions.py` | Modal-related exceptions |
27
+ | `src/tools/code_execution.py` | Modal code execution tool |
28
+ | `src/services/statistical_analyzer.py` | Modal statistical analysis |
29
+ | `src/mcp_tools.py` | Modal MCP tool wrappers |
30
+ | `src/agents/analysis_agent.py` | Modal analysis agent |
31
+
32
+ ---
33
+
34
+ ## Context
35
+
36
+ Modal was originally integrated for:
37
+ 1. **Code Execution Sandbox**: Running untrusted code in isolated containers
38
+ 2. **Statistical Analysis**: Offloading heavy statistical computations
39
+ 3. **LlamaIndex RAG**: Premium embeddings with persistent storage
40
+
41
+ However, the project decided against Modal because:
42
+ - Added infrastructure complexity
43
+ - Free Tier doesn't need cloud functions
44
+ - Paid Tier uses OpenAI directly
45
+
46
+ ---
47
+
48
+ ## Recommended Fix
49
+
50
+ 1. Remove Modal-related code from all affected files
51
+ 2. Remove `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET` from config
52
+ 3. Remove Modal from dependencies in `pyproject.toml`
53
+ 4. Update any documentation referencing Modal
54
+
55
+ ---
56
+
57
+ ## Impact If Not Fixed
58
+
59
+ - Confusion for new contributors
60
+ - Dead code in production
61
+ - Unnecessary dependencies
62
+ - Config settings that do nothing
63
+
64
+ ---
65
+
66
+ ## Test Plan
67
+
68
+ 1. Remove Modal code
69
+ 2. Run `make check` to ensure no breakage
70
+ 3. Verify Free Tier and Paid Tier still work
71
+ 4. Search codebase for any remaining Modal references
72
+
73
+ ---
74
+
75
+ ## Related
76
+
77
+ - `P3_REMOVE_ANTHROPIC_PARTIAL_WIRING.md` - Similar tech debt for Anthropic
78
+ - ARCHITECTURE.md - Current architecture excludes Modal
src/agents/search_agent.py CHANGED
@@ -67,7 +67,7 @@ class SearchAgent(BaseAgent): # type: ignore[misc]
67
  )
68
 
69
  # Execute search
70
- result: SearchResult = await self._handler.execute(query, max_results_per_tool=10)
71
 
72
  # Track what to show in response (initialized to search results as default)
73
  evidence_to_show: list[Evidence] = result.evidence
 
67
  )
68
 
69
  # Execute search
70
+ result: SearchResult = await self._handler.execute(query, max_results_per_tool=5)
71
 
72
  # Track what to show in response (initialized to search results as default)
73
  evidence_to_show: list[Evidence] = result.evidence
src/orchestrators/advanced.py CHANGED
@@ -301,6 +301,7 @@ The final output should be a structured research report."""
301
  # to repr(message) if text is empty. We must reconstruct text from Deltas.
302
  current_message_buffer: str = ""
303
  current_agent_id: str | None = None
 
304
 
305
  try:
306
  async with asyncio.timeout(self._timeout_seconds):
@@ -333,15 +334,21 @@ The final output should be a structured research report."""
333
  yield comp_event
334
  yield prog_event
335
 
 
 
336
  # Clear buffer after consuming
337
  current_message_buffer = ""
338
  continue
339
 
340
- # 3. Handle other events normally
 
 
 
 
 
 
341
  agent_event = self._process_event(event, iteration)
342
  if agent_event:
343
- if agent_event.type == "complete":
344
- final_event_received = True
345
  yield agent_event
346
 
347
  # GUARANTEE: Always emit termination event if stream ends without one
@@ -413,6 +420,40 @@ The final output should be a structured research report."""
413
 
414
  return completion_event, progress_event
415
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
416
  def _extract_text(self, message: Any) -> str:
417
  """
418
  Defensively extract text from a message object.
@@ -526,30 +567,12 @@ The final output should be a structured research report."""
526
  iteration=iteration,
527
  )
528
 
529
- # NOTE: MagenticAgentMessageEvent is handled in run() loop with Accumulator Pattern
530
- # (see lines 322-335) and never reaches this method due to `continue` statement.
531
-
532
- elif isinstance(event, MagenticFinalResultEvent):
533
- text = self._extract_text(event.message) if event.message else "No result"
534
- return AgentEvent(
535
- type="complete",
536
- message=text,
537
- data={"iterations": iteration},
538
- iteration=iteration,
539
- )
540
-
541
- # NOTE: MagenticAgentDeltaEvent is handled in run() loop with Accumulator Pattern
542
- # (see lines 306-320) and never reaches this method due to `continue` statement.
543
-
544
- elif isinstance(event, WorkflowOutputEvent):
545
- if event.data:
546
- # Use _extract_text to properly handle ChatMessage objects
547
- text = self._extract_text(event.data)
548
- return AgentEvent(
549
- type="complete",
550
- message=text if text else "Research complete (no synthesis)",
551
- iteration=iteration,
552
- )
553
 
554
  return None
555
 
 
301
  # to repr(message) if text is empty. We must reconstruct text from Deltas.
302
  current_message_buffer: str = ""
303
  current_agent_id: str | None = None
304
+ last_streamed_length: int = 0 # Track for P2 Duplicate Report Bug Fix
305
 
306
  try:
307
  async with asyncio.timeout(self._timeout_seconds):
 
334
  yield comp_event
335
  yield prog_event
336
 
337
+ # P2 BUG FIX: Save length before clearing
338
+ last_streamed_length = len(current_message_buffer)
339
  # Clear buffer after consuming
340
  current_message_buffer = ""
341
  continue
342
 
343
+ # 3. Handle Final Events Inline (P2 Duplicate Report Fix)
344
+ if isinstance(event, (MagenticFinalResultEvent, WorkflowOutputEvent)):
345
+ final_event_received = True
346
+ yield self._handle_final_event(event, iteration, last_streamed_length)
347
+ continue
348
+
349
+ # 4. Handle other events normally
350
  agent_event = self._process_event(event, iteration)
351
  if agent_event:
 
 
352
  yield agent_event
353
 
354
  # GUARANTEE: Always emit termination event if stream ends without one
 
420
 
421
  return completion_event, progress_event
422
 
423
+ def _handle_final_event(
424
+ self,
425
+ event: MagenticFinalResultEvent | WorkflowOutputEvent,
426
+ iteration: int,
427
+ last_streamed_length: int,
428
+ ) -> AgentEvent:
429
+ """Handle final workflow events with duplicate content suppression (P2 Bug Fix)."""
430
+ # DECISION: Did we stream substantial content?
431
+ if last_streamed_length > 100:
432
+ # YES: Final event is a SIGNAL, not a payload
433
+ return AgentEvent(
434
+ type="complete",
435
+ message="Research complete.",
436
+ data={
437
+ "iterations": iteration,
438
+ "streamed_chars": last_streamed_length,
439
+ },
440
+ iteration=iteration,
441
+ )
442
+
443
+ # NO: Final event must carry the payload (tool-only turn, cache hit)
444
+ text = ""
445
+ if isinstance(event, MagenticFinalResultEvent):
446
+ text = self._extract_text(event.message) if event.message else "No result"
447
+ elif isinstance(event, WorkflowOutputEvent):
448
+ text = self._extract_text(event.data) if event.data else "Research complete"
449
+
450
+ return AgentEvent(
451
+ type="complete",
452
+ message=text,
453
+ data={"iterations": iteration},
454
+ iteration=iteration,
455
+ )
456
+
457
  def _extract_text(self, message: Any) -> str:
458
  """
459
  Defensively extract text from a message object.
 
567
  iteration=iteration,
568
  )
569
 
570
+ # NOTE: The following event types are handled inline in run() loop and never reach
571
+ # this method due to `continue` statements:
572
+ # - MagenticAgentMessageEvent: Accumulator Pattern (lines 322-335)
573
+ # - MagenticAgentDeltaEvent: Accumulator Pattern (lines 306-320)
574
+ # - MagenticFinalResultEvent: P2 Duplicate Fix via _handle_final_event() (lines 343-347)
575
+ # - WorkflowOutputEvent: P2 Duplicate Fix via _handle_final_event() (lines 343-347)
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
576
 
577
  return None
578
 
src/utils/config.py CHANGED
@@ -71,10 +71,10 @@ class Settings(BaseSettings):
71
  description="Max coordination rounds for Advanced mode (default 5 for faster demos)",
72
  )
73
  advanced_timeout: float = Field(
74
- default=300.0,
75
  ge=60.0,
76
  le=900.0,
77
- description="Timeout for Advanced mode in seconds (default 5 min)",
78
  )
79
  search_timeout: int = Field(default=30, description="Seconds to wait for search")
80
  magentic_timeout: int = Field(
 
71
  description="Max coordination rounds for Advanced mode (default 5 for faster demos)",
72
  )
73
  advanced_timeout: float = Field(
74
+ default=600.0,
75
  ge=60.0,
76
  le=900.0,
77
+ description="Timeout for Advanced mode in seconds (default 10 min)",
78
  )
79
  search_timeout: int = Field(default=30, description="Seconds to wait for search")
80
  magentic_timeout: int = Field(
tests/unit/agents/test_search_agent.py CHANGED
@@ -47,7 +47,7 @@ async def test_run_executes_search(mock_handler: AsyncMock) -> None:
47
  response = await agent.run("test query")
48
 
49
  # Check handler called
50
- mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=10)
51
 
52
  # Check store updated
53
  assert len(store["current"]) == 1
@@ -67,7 +67,7 @@ async def test_run_handles_chat_message_input(mock_handler: AsyncMock) -> None:
67
  message = ChatMessage(role=Role.USER, text="test query")
68
  await agent.run(message)
69
 
70
- mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=10)
71
 
72
 
73
  @pytest.mark.asyncio
@@ -81,7 +81,7 @@ async def test_run_handles_list_input(mock_handler: AsyncMock) -> None:
81
  ChatMessage(role=Role.USER, text="test query"),
82
  ]
83
  await agent.run(messages)
84
- mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=10)
85
 
86
 
87
  @pytest.mark.asyncio
 
47
  response = await agent.run("test query")
48
 
49
  # Check handler called
50
+ mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=5)
51
 
52
  # Check store updated
53
  assert len(store["current"]) == 1
 
67
  message = ChatMessage(role=Role.USER, text="test query")
68
  await agent.run(message)
69
 
70
+ mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=5)
71
 
72
 
73
  @pytest.mark.asyncio
 
81
  ChatMessage(role=Role.USER, text="test query"),
82
  ]
83
  await agent.run(messages)
84
+ mock_handler.execute.assert_awaited_once_with("test query", max_results_per_tool=5)
85
 
86
 
87
  @pytest.mark.asyncio
tests/unit/orchestrators/test_advanced_orchestrator.py CHANGED
@@ -28,11 +28,11 @@ class TestAdvancedOrchestratorConfig:
28
  assert orch._max_rounds == 7
29
 
30
  @patch("src.orchestrators.advanced.get_chat_client")
31
- def test_timeout_default_is_five_minutes(self, mock_get_client) -> None:
32
- """Default timeout should be 300s (5 min) from settings."""
33
  mock_get_client.return_value = MagicMock()
34
  orch = AdvancedOrchestrator()
35
- assert orch._timeout_seconds == 300.0
36
 
37
  @patch("src.orchestrators.advanced.get_chat_client")
38
  def test_explicit_timeout_overrides_settings(self, mock_get_client) -> None:
 
28
  assert orch._max_rounds == 7
29
 
30
  @patch("src.orchestrators.advanced.get_chat_client")
31
+ def test_timeout_default_is_ten_minutes(self, mock_get_client: MagicMock) -> None:
32
+ """Test that default timeout is 10 minutes (P2 fix)."""
33
  mock_get_client.return_value = MagicMock()
34
  orch = AdvancedOrchestrator()
35
+ assert orch._timeout_seconds == 600.0
36
 
37
  @patch("src.orchestrators.advanced.get_chat_client")
38
  def test_explicit_timeout_overrides_settings(self, mock_get_client) -> None: