Spaces:

VibecoderMcSwaggins
/

DeepBoner

Paused

App Files Files Community

DeepBoner / docs /bugs /P2_FIRST_TURN_TIMEOUT.md

VibecoderMcSwaggins

fix(p2): Resolve duplicate report content and first-turn timeout bugs (#123)

21dd8fe unverified 12 days ago

preview code

raw

history blame

4.38 kB

P2 Bug: First Agent Turn Exceeds Workflow Timeout

Date: 2025-12-03 Status: FIXED (PR fix/p2-double-bug-squash) Severity: P2 (UX - Workflow always times out on complex queries) Component: src/orchestrators/advanced.py + src/agents/search_agent.py Affects: Both Free Tier (HuggingFace) AND Paid Tier (OpenAI)

Executive Summary

The search agent's first turn can exceed the 5-minute workflow timeout, causing:

iterations=0 at timeout (no agent completed a turn)
_handle_timeout() synthesizes from partial evidence
Users get incomplete research results

This is a performance/architecture bug, not a model issue.

Symptom

[warning] Workflow timed out             iterations=0

The workflow times out with iterations=0 - meaning the first agent (search agent) never completed its turn before the 5-minute timeout.

Root Cause

The search agent's first turn is extremely expensive:

Search Agent First Turn:
├── Manager assigns task
├── Search agent starts
│   ├── Calls PubMed search tool (10 results)
│   ├── Calls ClinicalTrials search tool (10 results)
│   ├── Calls EuropePMC search tool (10 results)
│   └── For EACH result (30 total):
│       ├── Generate embedding (OpenAI API call)
│       ├── Check for duplicates (ChromaDB query)
│       └── Store in ChromaDB
│
│   TOTAL: 30 results × (embedding + dedup + store) = 90+ API/DB operations
│
└── Agent turn completes (if timeout hasn't fired)

The timeout is on the WORKFLOW, not individual agent turns. A single greedy agent can consume the entire timeout budget.

Impact

Aspect	Impact
UX	Queries always timeout on first turn
Research quality	Synthesis happens on partial evidence
Confusion	`iterations=0` looks like nothing happened

The Fix (Consensus)

Reduce work per turn + increase timeout budget.

Implementation

1. Reduce results per tool (immediate)

src/agents/search_agent.py line 70:

# Change from 10 to 5
result: SearchResult = await self._handler.execute(query, max_results_per_tool=5)

2. Increase workflow timeout (immediate)

src/utils/config.py:

advanced_timeout: float = Field(
    default=600.0,  # Was 300.0 (5 min), now 10 min
    ge=60.0,
    le=900.0,
    description="Timeout for Advanced mode in seconds",
)

Why NOT Per-Turn Timeout

DANGER: The SearchHandler uses asyncio.gather():

# src/tools/search_handler.py line 163-164
results = await asyncio.gather(*tasks, return_exceptions=True)

This is an all-or-nothing operation. If you wrap it with asyncio.timeout() and the timeout fires, you get zero results, not partial results.

# DON'T DO THIS - yields nothing on timeout
async with asyncio.timeout(60):
    result = await self._handler.execute(query)  # Cancelled = zero results

Per-turn timeout requires SearchHandler to support cancellation with partial results. That's a separate architectural change (see Future Work).

Future Work (Streaming Evidence Ingestion)

For proper fix, SearchHandler.execute() should:

Yield results as they arrive (async generator)
Support cancellation with partial results
Allow agent to return "what we have so far" on timeout

# Future architecture
async def execute_streaming(self, query: str) -> AsyncIterator[Evidence]:
    for tool in self.tools:
        async for evidence in tool.search_streaming(query):
            yield evidence  # Can be cancelled at any point

This is out of scope for the immediate fix.

Test Plan

Run query with 10-minute timeout
Verify first agent turn completes before timeout
Verify iterations >= 1 at workflow end

Verification Data

From diagnostic run:

=== RAW FRAMEWORK EVENTS ===
  MagenticAgentDeltaEvent: 284
  MagenticOrchestratorMessageEvent: 3
  ...
  NO MagenticAgentMessageEvent  ← Agent never completed a turn!

[warning] Workflow timed out             iterations=0

P2 Duplicate Report Bug (separate issue, happens after successful completion)
_handle_timeout() correctly synthesizes, but with partial evidence
Not related to model quality - this is infrastructure/performance