DeepBoner / docs /bugs /P2_FIRST_TURN_TIMEOUT.md
VibecoderMcSwaggins's picture
fix(p2): Resolve duplicate report content and first-turn timeout bugs (#123)
21dd8fe unverified
|
raw
history blame
4.38 kB

P2 Bug: First Agent Turn Exceeds Workflow Timeout

Date: 2025-12-03 Status: FIXED (PR fix/p2-double-bug-squash) Severity: P2 (UX - Workflow always times out on complex queries) Component: src/orchestrators/advanced.py + src/agents/search_agent.py Affects: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)


Executive Summary

The search agent's first turn can exceed the 5-minute workflow timeout, causing:

  1. iterations=0 at timeout (no agent completed a turn)
  2. _handle_timeout() synthesizes from partial evidence
  3. Users get incomplete research results

This is a performance/architecture bug, not a model issue.


Symptom

[warning] Workflow timed out             iterations=0

The workflow times out with iterations=0 - meaning the first agent (search agent) never completed its turn before the 5-minute timeout.


Root Cause

The search agent's first turn is extremely expensive:

Search Agent First Turn:
β”œβ”€β”€ Manager assigns task
β”œβ”€β”€ Search agent starts
β”‚   β”œβ”€β”€ Calls PubMed search tool (10 results)
β”‚   β”œβ”€β”€ Calls ClinicalTrials search tool (10 results)
β”‚   β”œβ”€β”€ Calls EuropePMC search tool (10 results)
β”‚   └── For EACH result (30 total):
β”‚       β”œβ”€β”€ Generate embedding (OpenAI API call)
β”‚       β”œβ”€β”€ Check for duplicates (ChromaDB query)
β”‚       └── Store in ChromaDB
β”‚
β”‚   TOTAL: 30 results Γ— (embedding + dedup + store) = 90+ API/DB operations
β”‚
└── Agent turn completes (if timeout hasn't fired)

The timeout is on the WORKFLOW, not individual agent turns. A single greedy agent can consume the entire timeout budget.


Impact

Aspect Impact
UX Queries always timeout on first turn
Research quality Synthesis happens on partial evidence
Confusion iterations=0 looks like nothing happened

The Fix (Consensus)

Reduce work per turn + increase timeout budget.

Implementation

1. Reduce results per tool (immediate)

src/agents/search_agent.py line 70:

# Change from 10 to 5
result: SearchResult = await self._handler.execute(query, max_results_per_tool=5)

2. Increase workflow timeout (immediate)

src/utils/config.py:

advanced_timeout: float = Field(
    default=600.0,  # Was 300.0 (5 min), now 10 min
    ge=60.0,
    le=900.0,
    description="Timeout for Advanced mode in seconds",
)

Why NOT Per-Turn Timeout

DANGER: The SearchHandler uses asyncio.gather():

# src/tools/search_handler.py line 163-164
results = await asyncio.gather(*tasks, return_exceptions=True)

This is an all-or-nothing operation. If you wrap it with asyncio.timeout() and the timeout fires, you get zero results, not partial results.

# DON'T DO THIS - yields nothing on timeout
async with asyncio.timeout(60):
    result = await self._handler.execute(query)  # Cancelled = zero results

Per-turn timeout requires SearchHandler to support cancellation with partial results. That's a separate architectural change (see Future Work).


Future Work (Streaming Evidence Ingestion)

For proper fix, SearchHandler.execute() should:

  1. Yield results as they arrive (async generator)
  2. Support cancellation with partial results
  3. Allow agent to return "what we have so far" on timeout
# Future architecture
async def execute_streaming(self, query: str) -> AsyncIterator[Evidence]:
    for tool in self.tools:
        async for evidence in tool.search_streaming(query):
            yield evidence  # Can be cancelled at any point

This is out of scope for the immediate fix.


Test Plan

  1. Run query with 10-minute timeout
  2. Verify first agent turn completes before timeout
  3. Verify iterations >= 1 at workflow end

Verification Data

From diagnostic run:

=== RAW FRAMEWORK EVENTS ===
  MagenticAgentDeltaEvent: 284
  MagenticOrchestratorMessageEvent: 3
  ...
  NO MagenticAgentMessageEvent  ← Agent never completed a turn!

[warning] Workflow timed out             iterations=0

Related

  • P2 Duplicate Report Bug (separate issue, happens after successful completion)
  • _handle_timeout() correctly synthesizes, but with partial evidence
  • Not related to model quality - this is infrastructure/performance