VibecoderMcSwaggins committed on
Commit
c6e9843
·
unverified ·
1 Parent(s): 4337145

fix(P0): Implement Accumulator Pattern to resolve Repr Bug (#117)


* docs: Add P1 bug doc for Simple Mode removal breaking Free Tier UX

SPEC-16 Unified Architecture removed Simple Mode, forcing all users
to Advanced Mode. When no API key is provided, Advanced Mode falls back
to HuggingFace Free Tier which triggers upstream agent-framework repr
bug (#2562).

Options documented:
A) Wait for upstream fix (PR #2566)
B) Restore Simple Mode for free tier
C) Current workaround in _extract_text()

* docs: Update P1 bug doc and SPEC-16 with rollback warning

CRITICAL: Simple Mode was deleted BEFORE verifying Advanced+HF worked.

Problem:
- Upstream agent-framework has repr bug (#2562)
- Advanced Mode + HuggingFace = garbage output
- Simple Mode (the working fallback) was deleted prematurely

Bug doc updates:
- Added "What Went Wrong" timeline
- Added Gradio UI confusion analysis (examples vs chat button)
- Recommendation: Restore Simple Mode as fallback

SPEC-16 updates:
- Status changed to "PARTIALLY IMPLEMENTED - ROLLBACK REQUIRED"
- Added critical warning about premature deletion
- Links to P1 bug doc for action items

* docs: CRITICAL - Simple Mode is NOT being deleted

This commit makes it CRYSTAL CLEAR across all documentation:

⛔ SIMPLE MODE IS NOT BEING DELETED - NON-NEGOTIABLE

What went wrong:
- SPEC-16 was supposed to INTEGRATE Simple Mode, not DELETE it
- simple.py was deleted BEFORE verifying Advanced+HF worked
- Upstream agent-framework has repr bug (#2562)
- Free tier users now have no working fallback

Required actions:
1. RESTORE simple.py from git history or MCP reference
2. KEEP Simple Mode as free-tier fallback indefinitely
3. Use Advanced Mode ONLY for paid API key users
4. Wait for upstream #2566 to merge before reconsidering

Updated files:
- SPEC_16: Status changed to "ON HOLD", added warning
- P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS: Changed to "Patch simple.py"
- ACTIVE_BUGS: Marked Simple Mode issues as OPEN

* docs: DO NOT use MCP reference repo - it's buggy

Updated all docs and GitHub issues to clarify:

1. DO NOT restore from MCP reference repo - has known bugs
2. Git revert in THIS repo MAY be possible - review for bugs first
3. Clean implementation preferred if old code is too buggy
4. Goal is WORKING Simple Mode, not blindly restored buggy code

Files updated:
- ACTIVE_BUGS.md
- SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md
- P1_SIMPLE_MODE_REMOVED_BREAKS_FREE_TIER_UX.md

GitHub issues updated:
- #105: Added warning about reference repo
- #113: Added warning about reference repo

* docs: Clarify UNIFIED architecture with Simple Mode INTEGRATED

- NOT two parallel universes/orchestrators
- ONE codebase handles all tiers (free + paid)
- Simple Mode behavior INTEGRATED, not separate
- Blocked by upstream bug #2562, waiting for PR #2566

* docs: Add architecture documentation for unified system

- Current state: Advanced Mode only, simple.py deleted
- Goal: ONE unified architecture (not parallel universes)
- Simple Mode INTEGRATED via HuggingFaceChatClient
- Blocked by upstream #2562, waiting for PR #2566
- Includes path forward for all scenarios

* docs: Update all bug docs for unified architecture consistency

- ACTIVE_BUGS.md: Consolidated free tier issue as single P0 blocker
- P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md: Simplified - bug fixed by unification
- All docs now consistently say: ONE unified architecture, NOT parallel universes
- Simple Mode behavior INTEGRATED via HuggingFaceChatClient
- simple.py is DELETED, not being restored

* docs: FINAL - Clear terminology, framework integration documented

Architecture:
- No API Key (Free) → HuggingFace backend
- API Key (Paid) → OpenAI backend
- ONE codebase, different backends, no "modes"

Framework Stack:
- Microsoft Agent Framework = orchestration (routes agents)
- Pydantic AI = structured outputs (validates data)
- Both work TOGETHER, not mutually exclusive

Blocked by upstream #2562, waiting for PR #2566.

All docs and GitHub issues now use consistent terminology.

* docs: Fix root-level docs for unified architecture

- CLAUDE.md: Remove simple.py reference, update orchestrator description
- AGENTS.md: Same fix
- GEMINI.md: Same fix
- README.md: "Two Modes" → "Unified Architecture" + Free/Paid Tier

All root docs now consistent with unified architecture:
- ONE orchestrator (advanced.py) for all users
- Auto-selects backend: OpenAI (if key) or HuggingFace (free)
- No more "Simple Mode" vs "Advanced Mode" terminology

* fix(orchestrator): implement Accumulator Pattern to resolve Repr Bug (P0)

Implements SPEC-17 to fix the P0 'Repr Bug' where agent messages displayed raw Python object strings.

Changes:
- Implemented Accumulator Pattern in AdvancedOrchestrator to use streaming deltas as the source of truth for text content.
- Added fallback logic to handle tool-only turns safely without exposing internal object representations.
- Refactored to reduce complexity (PLR0915) by extracting helper methods.
- Added comprehensive unit tests (tests/unit/orchestrators/test_accumulator_pattern.py) verifying the fix against mocked upstream events.
- Updated documentation with SPEC-17 and Root Cause Analysis.
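Stripped of the framework, the Accumulator Pattern this commit implements reduces to: buffer delta text, consume the buffer on the completion signal, and never read the completed message object. A minimal self-contained sketch — the event classes below are stand-ins for the real `MagenticAgent*` events, not the framework's API:

```python
from dataclasses import dataclass


@dataclass
class DeltaEvent:
    """Stand-in for MagenticAgentDeltaEvent: carries streamed text."""
    text: str


@dataclass
class MessageEvent:
    """Stand-in for MagenticAgentMessageEvent: completion signal only."""
    message: object  # intentionally never read (reading it triggers the repr bug)


def accumulate(events) -> list[str]:
    """Return one completed message per MessageEvent, built only from deltas."""
    buffer, completed = "", []
    for event in events:
        if isinstance(event, DeltaEvent):
            buffer += event.text  # deltas are the sole source of truth
        elif isinstance(event, MessageEvent):
            # Tool-only turns produce no deltas; fall back to a safe label
            completed.append(buffer or "Action completed (Tool Call)")
            buffer = ""  # reset for the next agent turn
    return completed
```

For example, `accumulate([DeltaEvent("Hel"), DeltaEvent("lo"), MessageEvent(None)])` yields `["Hello"]`, while a tool-only turn `[MessageEvent(None)]` yields the fallback label instead of a repr string.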

* docs: Add analysis for Gradio Example vs Chat Arrow behavior

- Documented the analysis of user-reported discrepancies between Example Click and Chat Arrow outputs.
- Confirmed that both actions utilize the same code path, with differences attributed to timing rather than divergent code.
- Identified the root cause as an upstream representation issue, linking to related documentation for further context.
- Provided verification steps and next actions regarding the upstream bug fix.

* fix(tests): isolate accumulator pattern tests to prevent module pollution

Refactors tests/unit/orchestrators/test_accumulator_pattern.py to use scoped fixtures for patching sys.modules instead of global module-level patching. This prevents side effects on other tests (like test_advanced_events.py and test_chat_client_factory.py).

Changes:
- Moved mock setup into 'mock_agent_framework' fixture.
- Implemented module reloading logic for 'src.orchestrators.advanced' to ensure it picks up mocks during isolation tests and real modules afterwards.
- Updated MockOrchestratorMessageEvent signature to match real class (added 'message' arg).
- Verified all 20 related tests pass together.
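The scoped-fixture approach amounts to patching `sys.modules` inside a context that is torn down after each test, instead of mutating it at import time. A minimal sketch of the mechanism using `unittest.mock.patch.dict` (the module name here is hypothetical, not the project's real fixture):

```python
import sys
from types import ModuleType
from unittest.mock import patch


def run_with_mocked_module():
    """Patch a fake module into sys.modules only for the duration of the block."""
    fake = ModuleType("agent_framework_fake")
    fake.MARKER = "mocked"

    with patch.dict(sys.modules, {"agent_framework_fake": fake}):
        import agent_framework_fake  # resolves to the mock inside the block
        inside = agent_framework_fake.MARKER

    # After the block, patch.dict restores sys.modules -- later tests see no pollution
    outside = "agent_framework_fake" in sys.modules
    return inside, outside
```

Wrapped in a pytest fixture, the same `patch.dict` context gives each test a clean `sys.modules`, which is why the globally patched version leaked into `test_advanced_events.py` and this scoped version does not.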

* fix: Address CodeRabbit review feedback

- Add `text` language identifier to ASCII diagram code blocks (MD041)
- Fix broken URL typo: togithub.com → github.com
- Remove unreachable dead code for MagenticAgentMessageEvent and
MagenticAgentDeltaEvent handlers in _process_event() (handled by
Accumulator Pattern in run() loop with continue statements)

* fix: Address all CodeRabbit review feedback

- Use synthesis_result.text instead of str() for AgentRunResponse
- Add Literal return type to _get_event_type_for_agent (eliminates type: ignore)
- Add `@pytest.mark.unit` markers to accumulator tests
- Add `text` language identifier to code fence in P0_SIMPLE_MODE doc
- Update P0_REPR_BUG checklist to reflect completed dead code removal
- Fix test mock to return object with .text property (matches AgentRunResponse API)
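The `Literal` return-type change can be shown in isolation: annotating the return value as a closed set of strings lets the type checker verify every call site without a `type: ignore`. A sketch with hypothetical agent names (the real mapping lives in `_get_event_type_for_agent`):

```python
from typing import Literal

# Closed set of event types the orchestrator can emit
EventType = Literal["search", "judge", "report"]


def get_event_type_for_agent(agent_name: str) -> EventType:
    """Map an agent name to one of a fixed set of event types."""
    name = agent_name.lower()
    if "search" in name:
        return "search"
    if "judge" in name:
        return "judge"
    return "report"
```

Because every `return` is a member of the declared `Literal`, mypy/pyright can prove the result is always a valid event type, which is what eliminated the previous `type: ignore`.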

* docs: Fix markdown lint (blank line before code fence)

.gitignore CHANGED

```diff
@@ -50,6 +50,8 @@ reference_repos/pydanticai-research-agent/
 reference_repos/pubmed-mcp-server/
 reference_repos/DeepCritical/
 reference_repos/GradioDemo/
+reference_repos/deepboner-hf-space/
+reference_repos/microsoft-agent-framework/
 
 # Keep the README in reference_repos
 !reference_repos/README.md
```
AGENTS.md CHANGED

```diff
@@ -50,10 +50,13 @@ Research Report with Citations
 
 **Key Components**:
 
-- `src/orchestrators/` - Orchestrator package (simple, advanced, langgraph modes)
-  - `simple.py` - Main search-and-judge loop
-  - `advanced.py` - Multi-agent Magentic mode
-  - `langgraph_orchestrator.py` - LangGraph-based workflow
+- `src/orchestrators/` - Unified orchestrator package
+  - `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
+  - `factory.py` - Auto-selects backend based on API key presence
+  - `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
+- `src/clients/` - LLM backend adapters
+  - `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
+  - `huggingface.py` - HuggingFace adapter for free tier
 - `src/tools/pubmed.py` - PubMed E-utilities search
 - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
 - `src/tools/europepmc.py` - Europe PMC search
```
CLAUDE.md CHANGED

```diff
@@ -50,10 +50,13 @@ Research Report with Citations
 
 **Key Components**:
 
-- `src/orchestrators/` - Orchestrator package (simple, advanced, langgraph modes)
-  - `simple.py` - Main search-and-judge loop
-  - `advanced.py` - Multi-agent Magentic mode
-  - `langgraph_orchestrator.py` - LangGraph-based workflow
+- `src/orchestrators/` - Unified orchestrator package
+  - `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
+  - `factory.py` - Auto-selects backend based on API key presence
+  - `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
+- `src/clients/` - LLM backend adapters
+  - `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
+  - `huggingface.py` - HuggingFace adapter for free tier
 - `src/tools/pubmed.py` - PubMed E-utilities search
 - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
 - `src/tools/europepmc.py` - Europe PMC search
```
GEMINI.md CHANGED

```diff
@@ -50,10 +50,13 @@ The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orches
 
 ## Key Components
 
-- `src/orchestrators/` - Orchestrator package (simple, advanced, langgraph modes)
-  - `simple.py` - Main search-and-judge loop
-  - `advanced.py` - Multi-agent Magentic mode
-  - `langgraph_orchestrator.py` - LangGraph-based workflow
+- `src/orchestrators/` - Unified orchestrator package
+  - `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
+  - `factory.py` - Auto-selects backend based on API key presence
+  - `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
+- `src/clients/` - LLM backend adapters
+  - `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
+  - `huggingface.py` - HuggingFace adapter for free tier
 - `src/tools/pubmed.py` - PubMed E-utilities search
 - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
 - `src/tools/europepmc.py` - Europe PMC search
```
P0_REPR_BUG_ROOT_CAUSE_ANALYSIS.md ADDED

# P0: Event Handling Implementation Spec

**Status**: FIXED
**Priority**: P0
**Source of Truth**: `reference_repos/microsoft-agent-framework/python/samples/autogen-migration/orchestrations/04_magentic_one.py`

---

## Root Cause (One Sentence)

We were extracting content from `MagenticAgentMessageEvent.message` — **the wrong event type** — instead of using `MagenticAgentDeltaEvent.text` as the sole source of streaming content.

---

## The Fix: Correct Event Handling Per Microsoft SSOT

| Event Type | Correct Usage | What We Were Doing (Wrong) |
|------------|---------------|----------------------------|
| `MagenticAgentDeltaEvent` | **Extract `.text`** - This is the ONLY source of content | Partially used, not accumulated |
| `MagenticAgentMessageEvent` | **Signal only** - Agent turn complete. IGNORE `.message` | Extracting `.message.text` (hits repr bug) |
| `MagenticFinalResultEvent` | **Extract `.message.text`** - Final synthesis result | Correct |

---

## Implementation: Accumulator Pattern

From Microsoft's `04_magentic_one.py` (lines 108-138):

```python
# Microsoft's Pattern
async for event in workflow.run_stream(task):
    if isinstance(event, MagenticAgentDeltaEvent):
        # STREAM CONTENT: Accumulate and display
        if event.text:
            print(event.text, end="", flush=True)

    elif isinstance(event, MagenticAgentMessageEvent):
        # SIGNAL ONLY: Agent done. Print newline. DO NOT read .message
        print()

    elif isinstance(event, MagenticFinalResultEvent):
        # FINAL RESULT: Safe to read .message.text
        print(event.message.text)
```

---

## Our Implementation (`src/orchestrators/advanced.py`)

**Status**: ✅ IMPLEMENTED (lines 241-308)

```python
# 1. Accumulate streaming content (ONLY source of truth)
if isinstance(event, MagenticAgentDeltaEvent):
    if event.text:
        current_message_buffer += event.text
        yield AgentEvent(type="streaming", message=event.text, ...)

# 2. Use buffer on completion signal (IGNORE event.message)
if isinstance(event, MagenticAgentMessageEvent):
    text_content = current_message_buffer or "Action completed (Tool Call)"
    yield AgentEvent(message=f"{agent_name}: {text_content[:200]}...", ...)
    current_message_buffer = ""  # Reset for next agent

# 3. Final result - safe to extract
if isinstance(event, MagenticFinalResultEvent):
    text = self._extract_text(event.message)
    yield AgentEvent(type="complete", message=text, ...)
```

---

## Why This Eliminates the Repr Bug

The repr bug occurs at `_magentic.py:1730`:

```python
text = last.text or str(last)  # Falls back to repr() for tool-only messages
```

By **never reading** `MagenticAgentMessageEvent.message.text`, we never hit this code path.

**The repr bug is eliminated by correct implementation — no upstream fix required.**

---

## Verification Checklist

- [x] `MagenticAgentDeltaEvent.text` used as sole content source
- [x] `MagenticAgentMessageEvent` used as signal only (buffer consumed, not `.message`)
- [x] `MagenticFinalResultEvent.message.text` extracted for final result
- [x] Buffer reset on agent switch and completion
- [x] Remove dead code path in `_process_event()` that still calls `_extract_text` on `MagenticAgentMessageEvent`

---

## Remaining Cleanup

✅ **DONE** - Dead code paths for `MagenticAgentMessageEvent` and `MagenticAgentDeltaEvent` have been removed from `_process_event()`. Comments now explain these events are handled by the Accumulator Pattern in `run()`.
README.md CHANGED

```diff
@@ -55,8 +55,9 @@ Sexual health is health. Period. Yet it remains one of the most under-researched
 - 🤖 **MCP Integration**: Use our tools from Claude Desktop or any MCP client
 - 🔒 **Modal Sandbox**: Secure execution of AI-generated statistical analysis
 - 🧠 **Smart Evidence Synthesis**: LLM-powered judge evaluates and synthesizes findings
-- ⚡ **Two Modes**: Simple (fast) or Advanced (multi-agent deep dive)
-- 🆓 **Free Tier Available**: Works without API keys (HuggingFace Inference)
+- ⚡ **Unified Architecture**: Same powerful multi-agent orchestration for everyone
+- 🆓 **Free Tier**: Works without API keys (HuggingFace Inference)
+- 🚀 **Paid Tier**: Unlocks GPT-5 automatically when OpenAI key is provided
 
 ## Example Queries
```
docs/ARCHITECTURE.md ADDED

# DeepBoner Architecture

> **Last Updated**: 2025-12-01

---

## How It Works (Simple Version)

```text
┌──────────────────────────────────────────────────────────────┐
│                     UNIFIED ARCHITECTURE                      │
│                                                               │
│  User provides API key?                                       │
│                                                               │
│  NO (Free Tier)              YES (Paid Tier)                  │
│  ──────────────              ───────────────                  │
│  HuggingFace backend         OpenAI backend                   │
│  Qwen 2.5 72B (free)         GPT-5 (paid)                     │
│                                                               │
│  SAME orchestration logic for both                            │
│  ONE codebase, different LLM backends                         │
└──────────────────────────────────────────────────────────────┘
```

**That's it.** No "modes." Just: do you have an API key or not?

---

## Current Status

**Free Tier is BLOCKED** by upstream bug #2562.

Once [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) merges:
1. Update `agent-framework` dependency
2. Free tier works
3. Done

---

## Framework Stack

DeepBoner uses TWO frameworks that work TOGETHER:

| Framework | What It Does | Where Used |
|-----------|--------------|------------|
| **Microsoft Agent Framework** | Multi-agent orchestration | `src/orchestrators/advanced.py` |
| **Pydantic AI** | Structured outputs, validation | `src/agent_factory/judges.py`, `src/agents/*.py` |

**They are NOT mutually exclusive.** Microsoft AF handles the orchestration (Manager → Search → Judge → Report). Pydantic AI handles structured outputs within those agents.

---

## LLM Backend Selection

Auto-detected by `src/clients/factory.py`:

```python
def get_chat_client():
    if settings.has_openai_key:
        return OpenAIChatClient(...)  # Paid tier
    else:
        return HuggingFaceChatClient(...)  # Free tier
```

| Condition | Backend | Model |
|-----------|---------|-------|
| User provides OpenAI key | OpenAI | GPT-5 |
| No API key provided | HuggingFace | Qwen 2.5 72B (free) |

---

## Key Files

| File | Purpose |
|------|---------|
| `src/orchestrators/advanced.py` | Multi-agent orchestration (Microsoft AF) |
| `src/clients/factory.py` | Auto-selects LLM backend |
| `src/clients/huggingface.py` | HuggingFace adapter for free tier |
| `src/agent_factory/judges.py` | Judge logic (Pydantic AI) |
| `src/agents/*.py` | Individual agents (Pydantic AI) |

---

## What Was Deleted

`simple.py` (778 lines) was a SEPARATE orchestrator that created a "parallel universe." It's gone. Now there's ONE orchestrator with different backends.

---

## Upstream Blocker

**Bug:** Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.

**Fix:** [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) - waiting to merge.

**Tracking:** [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)

---

## References

- [Pydantic AI](https://ai.pydantic.dev/) - Structured outputs framework
- [Microsoft Agent Framework](https://github.com/microsoft/agent-framework) - Multi-agent orchestration
- [AG-UI Protocol](https://www.copilotkit.ai/blog/introducing-pydantic-ai-integration-with-ag-ui) - How they integrate
docs/bugs/ACTIVE_BUGS.md CHANGED

```diff
@@ -1,21 +1,36 @@
 # Active Bugs
 
-> Last updated: 2025-12-01 (16:30 PST)
+> Last updated: 2025-12-01 (21:00 PST)
 >
 > **Note:** Completed bug docs archived to `docs/bugs/archive/`
 > **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)
+> **See also:** [ARCHITECTURE.md](../ARCHITECTURE.md) for unified architecture plan
 
-## P0 - Critical
+## P0 - Critical (BLOCKED)
 
-(No active P0 bugs)
+### Free Tier Broken (Upstream #2562)
+
+**Issue:** [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105), [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
+**Status:** BLOCKED - Waiting for upstream PR #2566
+
+**Problem:** Free tier (Advanced Mode + HuggingFace) shows repr garbage output.
+
+**Cause:** Microsoft Agent Framework upstream bug #2562.
+
+**Fix:** Upstream PR #2566 will fix this. Once merged:
+1. Update `agent-framework` dependency
+2. Verify Advanced + HuggingFace works
+3. Unified architecture complete
+
+**Architecture Note:** We have ONE unified architecture. `simple.py` is deleted.
+Simple Mode behavior is INTEGRATED via `HuggingFaceChatClient`, not a parallel orchestrator.
 
 ---
 
-## P3 - UX Polish
-...
 ## Resolved Bugs
 
 ### ~~P0 - AIFunction Not JSON Serializable~~ FIXED
+
 **File:** `docs/bugs/P0_AIFUNCTION_NOT_JSON_SERIALIZABLE.md`
 **Found:** 2025-12-01
 **Resolved:** 2025-12-01
@@ -27,6 +42,7 @@
 - Result: Free Tier now supports full function calling capabilities with Qwen2.5-72B.
 
 ### ~~P1 - HuggingFace Router 401 Unauthorized~~ FIXED
+
 **File:** `docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md`
 **Found:** 2025-12-01
 **Resolved:** 2025-12-01
@@ -36,18 +52,8 @@
 - Fix: Generated new valid HF_TOKEN, updated `.env` and Spaces secrets
 - Also switched default model to `Qwen/Qwen2.5-72B-Instruct` for better reliability
 
-### ~~P0 - Simple Mode Ignores Forced Synthesis~~ FIXED
-**File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
-**Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
-**PR:** [#115](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/115) (SPEC-16)
-**Found:** 2025-12-01
-**Resolved:** 2025-12-01
-
-- Problem: Simple Mode ignored forced synthesis signals from Judge.
-- Fix: SPEC-16 unified architecture - removed Simple Mode entirely, integrated HuggingFace into Advanced Mode.
-- Simple Mode code deleted, capability preserved via `HuggingFaceChatClient` adapter.
-
 ### ~~P1 - Advanced Mode Exposes Uninterpretable Chain-of-Thought~~ FIXED
+
 **File:** `docs/bugs/P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md`
 **PR:** [#107](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/107)
 **Found:** 2025-12-01
@@ -59,6 +65,7 @@
 - CodeRabbit review addressed: test markers, edge case handling, truncation test coverage.
 
 ### ~~P0 - Advanced Mode Timeout Yields No Synthesis~~ FIXED
+
 **File:** `docs/bugs/P0_ADVANCED_MODE_TIMEOUT_NO_SYNTHESIS.md`
 **Found:** 2025-11-30 (Manual Testing)
 **Resolved:** 2025-12-01
@@ -75,38 +82,35 @@
 - Tests: `tests/unit/orchestrators/test_advanced_timeout.py`
 - Key files: `src/orchestrators/advanced.py`, `src/orchestrators/factory.py`, `src/services/research_memory.py`
 
-### ~~P0 - Free Tier Synthesis Incorrectly Uses Server-Side API Keys~~ FIXED
+### ~~P0 - Free Tier Synthesis Incorrectly Uses Server-Side API Keys~~ FIXED (Historical)
+
 **File:** `docs/bugs/P1_SYNTHESIS_BROKEN_KEY_FALLBACK.md`
 **PR:** [#103](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/103)
 **Found:** 2025-11-30 (Testing)
 **Resolved:** 2025-11-30
-**Verified:** Free Tier now produces full LLM-synthesized research reports ✅
 
-- Problem: Simple Mode crashed with "OpenAIError" on HuggingFace Spaces when user provided no key but admin key was invalid.
-- Root Cause: Synthesis logic bypassed the Free Tier judge and incorrectly used server-side keys via `get_model()`.
-- Fix: Implemented `synthesize()` in `HFInferenceJudgeHandler` to use free HuggingFace Inference, ensuring consistency with the judge phase.
-- Key files: `src/agent_factory/judges.py`, `src/orchestrators/simple.py`
+- Problem: Simple Mode crashed with "OpenAIError" on HuggingFace Spaces.
+- Note: This was in the OLD Simple Mode. Now we use Unified Architecture.
 
-### ~~P0 - Synthesis Fails with OpenAIError in Free Mode~~ FIXED
+### ~~P0 - Synthesis Fails with OpenAIError in Free Mode~~ FIXED (Historical)
+
 **File:** `docs/bugs/P0_SYNTHESIS_PROVIDER_MISMATCH.md`
 **Found:** 2025-11-30 (Code Audit)
 **Resolved:** 2025-11-30
 
 - Problem: "Simple Mode" (Free Tier) crashed with `OpenAIError`.
-- Root Cause: `get_model()` defaulted to OpenAI regardless of available keys.
-- Fix: Implemented auto-detection in `judges.py` (OpenAI > Anthropic > HuggingFace).
-- Added extensive unit tests and regression tests.
+- Note: This was in the OLD Simple Mode. Now we use Unified Architecture.
 
-### ~~P0 - Simple Mode Never Synthesizes~~ FIXED
+### ~~P0 - Simple Mode Never Synthesizes~~ FIXED (Historical)
+
 **PR:** [#71](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/71) (SPEC_06)
 **Commit**: `5cac97d` (2025-11-29)
 
 - Root cause: LLM-as-Judge recommendations were being IGNORED
-- Fix: Code-enforced termination criteria (`_should_synthesize()`)
-- Added combined score thresholds, late-iteration logic, emergency fallback
-- Simple mode now synthesizes instead of spinning forever
+- Note: This was in the OLD Simple Mode. Now we use Unified Architecture.
 
 ### ~~P3 - Magentic Mode Missing Termination Guarantee~~ FIXED
+
 **Commit**: `d36ce3c` (2025-11-29)
 
 - Added `final_event_received` tracking in `orchestrator_magentic.py`
@@ -114,6 +118,7 @@
 - Verified with `test_magentic_termination.py`
 
 ### ~~P0 - Magentic Mode Report Generation~~ FIXED
+
 **Commit**: `9006d69` (2025-11-29)
 
 - Fixed `_extract_text()` to handle various message object formats
@@ -122,6 +127,7 @@
 - Advanced mode now produces full research reports
 
 ### ~~P1 - Streaming Spam + API Key Persistence~~ FIXED
+
 **Commit**: `0c9be4a` (2025-11-29)
 
 - Streaming events now buffered (not token-by-token spam)
@@ -129,6 +135,7 @@
 - Examples use explicit `None` values to avoid overwriting keys
 
 ### ~~P2 - Missing "Thinking" State~~ FIXED
+
 **Commit**: `9006d69` (2025-11-29)
 
 - Added `"thinking"` event type with hourglass icon
@@ -136,6 +143,7 @@
 - Users now see feedback during 2-5 minute initial processing
 
 ### ~~P2 - Gradio Example Not Filling Chat Box~~ FIXED
+
 **Commit**: `2ea01fd` (2025-11-29)
 
 - Third example (HSDD) wasn't populating chat box when clicked
```
docs/bugs/GRADIO_EXAMPLE_VS_CHAT_ARROW_ANALYSIS.md ADDED
@@ -0,0 +1,147 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
+ # Gradio Example Click vs Chat Arrow - Code Path Analysis
+
+ **Status**: ANALYZED - NOT A BUG (Same code path, different timing)
+ **Priority**: N/A (Symptom of upstream repr bug)
+ **Analyzed**: 2025-12-01
+ **Related**: P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md
+
+ ---
+
+ ## Symptom Reported
+
+ User observed two different outputs when:
+ 1. **Clicking an Example** → Shows progress at 10%, "THINKING" message
+ 2. **Clicking Chat Arrow** → Shows full 5 rounds with repr garbage
+
+ User suspected divergent code paths from vestigial Simple Mode deletion.
+
+ ---
+
+ ## Analysis: NO DIVERGENT CODE PATHS
+
+ ### Code Trace
+
+ Both Example Click and Chat Arrow use **the exact same code path**:
+
+ ```text
+ User Action (Example OR Chat Arrow)
+     ↓
+ app.py:research_agent()              ← SAME FUNCTION
+     ↓
+ app.py:configure_orchestrator()      ← SAME FUNCTION (mode="advanced" always)
+     ↓
+ factory.py:create_orchestrator()     ← SAME FUNCTION
+     ↓
+ factory.py:_determine_mode()         ← ALWAYS returns "advanced"
+     ↓
+ AdvancedOrchestrator                 ← SAME CLASS
+     ↓
+ clients/factory.py:get_chat_client() ← SAME FUNCTION
+     ↓
+ HuggingFaceChatClient (no API key) OR OpenAIChatClient (with API key)
+ ```
+
+ ### Evidence from Code
+
+ **app.py:279-325 - ChatInterface Setup:**
+ ```python
+ demo = gr.ChatInterface(
+     fn=research_agent,  # ← SAME FUNCTION FOR BOTH
+     examples=[
+         ["What drugs improve female libido post-menopause?", "sexual_health", None, None],
+         # ...
+     ],
+     # ...
+ )
+ ```
+
+ **factory.py:76-90 - Mode Determination:**
+ ```python
+ def _determine_mode(explicit_mode: str | None) -> str:
+     if explicit_mode == "hierarchical":
+         return "hierarchical"
+     # "simple" is deprecated -> upgrade to "advanced"
+     # "magentic" is alias for "advanced"
+     return "advanced"  # ← ALWAYS ADVANCED
+ ```
+
+ ---
+
+ ## Explanation of Visual Difference
+
+ The difference the user observed is **timing**, not code paths:
+
+ | Screenshot | When Captured | Content |
+ |------------|---------------|---------|
+ | Example Click | Mid-execution | Progress bar at 10%, "THINKING" |
+ | Chat Arrow | After completion | Full 5 rounds with repr garbage |
+
+ **Both show the same process at different stages.**
+
+ The repr garbage (`<agent_framework._types.ChatMessage object at 0x...>`) appears in BOTH:
+ - Example Click: Would show repr garbage if captured after completion
+ - Chat Arrow: Shows repr garbage because it was captured after completion
+
+ ---
+
+ ## The Real Bug: Upstream repr Issue
+
+ The repr garbage is the **upstream Microsoft Agent Framework bug** documented in:
+ - `docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md`
+
+ **Root cause in upstream code:**
+ ```python
+ # agent_framework/_workflows/_magentic.py line ~1799
+ text = last.text or str(last)  # BUG: str(last) gives repr for tool-only messages
+ ```
+
+ **Our workaround in advanced.py:**
+ ```python
+ def _extract_text(self, message: Any) -> str:
+     # Filter out repr strings
+     if isinstance(message, str) and message.startswith("<") and "object at" in message:
+         return ""
+     # ...
+ ```
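In the same defensive spirit as the workaround above, here is a stand-alone sketch of repr-safe text extraction. The attribute names probed (`text`, `content`) are illustrative assumptions, not the actual agent-framework API; the real `_extract_text()` lives in `advanced.py`.

```python
from typing import Any

def extract_text(message: Any) -> str:
    """Best-effort text extraction that never falls back to repr().

    Sketch only: attribute names below are assumptions for illustration.
    """
    if isinstance(message, str):
        # Reject repr-style garbage such as
        # "<agent_framework._types.ChatMessage object at 0x7f...>"
        if message.startswith("<") and "object at" in message:
            return ""
        return message
    # Probe common text-bearing attributes on message objects.
    for attr in ("text", "content"):
        value = getattr(message, attr, None)
        if isinstance(value, str) and value:
            return value
    # Never str(message): that is exactly the upstream repr bug.
    return ""
```

The key design choice is the last line: returning an empty string (which the UI can skip) instead of `str(message)` keeps tool-call-only messages from leaking object reprs into the chat.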
+
+ ---
+
+ ## Verification
+
+ 1. **No vestigial Simple Mode code** - `simple.py` is deleted, not imported anywhere
+ 2. **Factory always returns AdvancedOrchestrator** - verified in `factory.py:66-73`
+ 3. **Same research_agent function** - Gradio routes both Example and Chat Arrow through it
+
+ ---
+
+ ## Conclusion
+
+ **There are NO divergent code paths.** The unified architecture is correctly implemented:
+
+ | Component | Status |
+ |-----------|--------|
+ | Simple Mode | ✅ DELETED (no vestigial code) |
+ | Factory Pattern | ✅ Always returns AdvancedOrchestrator |
+ | Chat Client Factory | ✅ Auto-selects HuggingFace (free) or OpenAI (paid) |
+ | Example Click | ✅ Uses same `research_agent()` function |
+ | Chat Arrow Click | ✅ Uses same `research_agent()` function |
+
+ **The only bug is the upstream repr display issue**, which affects BOTH paths equally.
+
+ ---
+
+ ## Next Steps
+
+ 1. **Wait for upstream fix** - [PR #2566](https://github.com/microsoft/agent-framework/pull/2566)
+ 2. **Once merged**: `uv add agent-framework@latest`
+ 3. **Test**: Verify both Example Click and Chat Arrow work identically
+
+ ---
+
+ ## References
+
+ - `src/app.py` - Lines 134-247 (`research_agent()`)
+ - `src/app.py` - Lines 279-325 (ChatInterface with examples)
+ - `src/orchestrators/factory.py` - Lines 43-73 (`create_orchestrator()`)
+ - `src/clients/factory.py` - Lines 15-76 (`get_chat_client()`)
+ - `docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md` - Upstream repr bug details
docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md CHANGED
@@ -1,219 +1,59 @@
- # P0 BUG: Simple Mode Ignores Forced Synthesis from HF Inference Failures
 
- **Status**: Open → **Fix via SPEC_16 (Integration)**
  **Priority**: P0 (Demo-blocking)
  **Discovered**: 2025-12-01
- **Affected Component**: `src/orchestrators/simple.py`
- **Strategic Fix**: [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md)
  **GitHub Issue**: [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
 
- > **Decision**: Instead of patching Simple Mode, we will **INTEGRATE its capability into Advanced Mode** per SPEC_16.
- >
- > **What this means:**
- > - ✅ Free-tier HuggingFace capability is PRESERVED via `HuggingFaceChatClient`
- > - ✅ Users without API keys still get full functionality (Advanced Mode + HuggingFace backend)
- > - 🗑️ Simple Mode's redundant orchestration CODE is retired (not the capability!)
- > - 🐛 The bug disappears because Advanced Mode's Manager agent handles termination correctly
-
  ---
 
- ## Problem Statement
-
- When HuggingFace Inference API fails 3 consecutive times, the `HFInferenceJudgeHandler` correctly returns a "forced synthesis" assessment with `sufficient=True, recommendation="synthesize"`. However, **Simple Mode's `_should_synthesize()` method ignores this signal** because of overly strict code-enforced thresholds.
-
- ### Observed Behavior
-
- ```
- ✅ JUDGE_COMPLETE: Assessment: synthesize (confidence: 10%)
- 🔄 LOOPING: Gathering more evidence... ← BUG: Should have synthesized!
- ```
-
- The orchestrator loops **10 full iterations** despite the judge repeatedly saying "synthesize" after iteration 4.
 
- ### Expected Behavior
 
- When `HFInferenceJudgeHandler._create_forced_synthesis_assessment()` returns:
- - `sufficient=True`
- - `recommendation="synthesize"`
-
- The orchestrator should **immediately synthesize**, regardless of score thresholds.
 
  ---
 
- ## Root Cause Analysis
-
- ### The Forced Synthesis Assessment (judges.py:514-549)
-
- ```python
- def _create_forced_synthesis_assessment(self, question, evidence):
-     return JudgeAssessment(
-         details=AssessmentDetails(
-             mechanism_score=0,          # ← Problem 1: Score is 0
-             clinical_evidence_score=0,  # ← Problem 2: Score is 0
-             drug_candidates=["AI analysis required..."],
-             key_findings=findings,
-         ),
-         sufficient=True,                # ← Correct: Says sufficient
-         confidence=0.1,                 # ← Problem 3: Too low for emergency
-         recommendation="synthesize",    # ← Correct: Says synthesize
-         ...
-     )
- ```
-
- ### The _should_synthesize Logic (simple.py:159-216)
 
- ```python
- def _should_synthesize(self, assessment, iteration, max_iterations, evidence_count):
-     combined_score = mechanism_score + clinical_evidence_score  # = 0
 
-     # Priority 1: Judge approved - BUT REQUIRES combined_score >= 10!
-     if assessment.sufficient and assessment.recommendation == "synthesize":
-         if combined_score >= 10:  # ← 0 >= 10 is FALSE!
-             return True, "judge_approved"
-
-     # Priority 2-5: All require scores or drug candidates we don't have
-
-     # Priority 6: Emergency synthesis
-     if is_late_iteration and evidence_count >= 30 and confidence >= 0.5:
-         # ↑ 0.1 >= 0.5 is FALSE!
-         return True, "emergency_synthesis"
-
-     return False, "continue_searching"  # ← Always ends up here!
  ```
 
- ### The Bug
-
- 1. **Priority 1 has wrong precondition**: It checks `combined_score >= 10` even when the judge explicitly says "synthesize". The score check should be skipped when it's a forced/error recovery synthesis.
-
- 2. **Priority 6 confidence threshold is too high**: 0.5 confidence is reasonable for "emergency" synthesis, but forced synthesis from API failures uses 0.1 confidence to indicate low quality - this should still trigger synthesis.
-
- ---
-
- ## Impact
-
- - **User sees**: 10 iterations of "Gathering more evidence" with 0% confidence
- - **Final output**: Partial synthesis with "Max iterations reached"
- - **Time wasted**: ~2-3 minutes of useless API calls
- - **UX**: Extremely confusing - user sees "synthesize" but system keeps searching
-
  ---
 
- ## Proposed Fix
 
- ### ~~Option A: Patch Simple Mode~~ (REJECTED)
 
- We considered patching `_should_synthesize()` to respect forced synthesis signals. However, this adds MORE complexity to an already complex system that we plan to delete.
 
- ### ✅ Strategic Fix: SPEC_16 Unification (APPROVED)
-
- **Delete Simple Mode entirely and unify on Advanced Mode.**
-
- See: [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md)
-
- The implementation path:
-
- 1. **Phase 1**: Create `HuggingFaceChatClient` adapter (~150 lines)
-    - Implements `agent_framework.BaseChatClient`
-    - Wraps `huggingface_hub.InferenceClient`
-    - Enables Advanced Mode to work with free tier
-
- 2. **Phase 2**: Delete Simple Mode
-    - Remove `src/orchestrators/simple.py` (~778 lines)
-    - Remove `src/tools/search_handler.py` (~219 lines)
-    - Update factory to always use `AdvancedOrchestrator`
-
- 3. **Why this works**: Advanced Mode uses Microsoft Agent Framework's built-in termination. When JudgeAgent returns "SUFFICIENT EVIDENCE" (per SPEC_15), the Manager agent immediately delegates to ReportAgent. **No custom `_should_synthesize()` thresholds needed.**
-
- ### Why Unification > Patching
-
- | Approach | Lines Changed | Bug Fixed? | Technical Debt |
- |----------|---------------|------------|----------------|
- | Patch Simple Mode | +20 lines | Temporarily | Adds complexity |
- | **SPEC_16 Unification** | **-997 lines** | **Permanently** | **Eliminates 778 lines** |
-
- ---
-
- ## Files to DELETE (via SPEC_16)
-
- | File | Lines | Reason |
- |------|-------|--------|
- | `src/orchestrators/simple.py` | 778 | Contains buggy `_should_synthesize()` - entire file deleted |
- | `src/tools/search_handler.py` | 219 | Manager agent handles orchestration in Advanced Mode |
-
- ## Files to CREATE (via SPEC_16)
-
- | File | Lines | Purpose |
- |------|-------|---------|
- | `src/clients/__init__.py` | ~10 | Package exports |
- | `src/clients/factory.py` | ~50 | `get_chat_client()` factory |
- | `src/clients/huggingface.py` | ~150 | `HuggingFaceChatClient` adapter |
-
- **Net change: -997 lines deleted, +210 lines added = ~787 lines removed**
-
- ---
-
- ## Acceptance Criteria (SPEC_16 Implementation)
-
- - [ ] `HuggingFaceChatClient` implements `agent_framework.BaseChatClient`
- - [ ] `get_chat_client()` returns HuggingFace client when no OpenAI key
- - [ ] `AdvancedOrchestrator` works with HuggingFace backend
- - [ ] `simple.py` is deleted (778 lines removed)
- - [ ] Free tier users get Advanced Mode with HuggingFace
- - [ ] No more "continue_searching" loops when HF fails
- - [ ] Manager agent respects "SUFFICIENT EVIDENCE" signal (SPEC_15)
-
- ---
-
- ## Test Case (SPEC_16 Verification)
-
- ```python
- @pytest.mark.asyncio
- async def test_unified_architecture_handles_hf_failures():
-     """
-     After SPEC_16: Free tier uses Advanced Mode with HuggingFace backend.
-     When HF fails, Manager agent should trigger synthesis via ReportAgent.
-
-     This test replaces the old Simple Mode test because:
-     - simple.py is DELETED
-     - Advanced Mode handles termination via Manager agent signals
-     - No _should_synthesize() thresholds to bypass
-     """
-     from unittest.mock import patch, MagicMock
-     from src.orchestrators.advanced import AdvancedOrchestrator
-     from src.clients.factory import get_chat_client
-
-     # Verify factory returns HuggingFace client when no OpenAI key
-     with patch("src.utils.config.settings") as mock_settings:
-         mock_settings.has_openai_key = False
-         mock_settings.has_gemini_key = False
-         mock_settings.has_huggingface_key = True
-
-         client = get_chat_client()
-         assert "HuggingFace" in type(client).__name__
-
-     # Verify AdvancedOrchestrator accepts HuggingFace client
-     # (The actual termination is handled by Manager agent respecting
-     #  "SUFFICIENT EVIDENCE" signals per SPEC_15)
- ```
 
  ---
 
- ## Related Issues & Specs
 
- | Reference | Type | Relationship |
- |-----------|------|--------------|
- | [SPEC_16](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) | Spec | **THE FIX** - Unified architecture eliminates this bug |
- | [SPEC_15](../specs/SPEC_15_ADVANCED_MODE_PERFORMANCE.md) | Spec | Manager agent termination logic (already implemented) |
- | [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105) | GitHub | Deprecate Simple Mode |
- | [Issue #109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109) | GitHub | Simplify Provider Architecture |
- | [Issue #110](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/110) | GitHub | Remove Anthropic Support |
- | PR #71 (SPEC_06) | PR | Added `_should_synthesize()` - now causes this bug |
- | Commit 5e761eb | Commit | Added `_create_forced_synthesis_assessment()` |
 
  ---
 
- ## References
 
- - `src/orchestrators/simple.py:159-216` - `_should_synthesize()` method
- - `src/agent_factory/judges.py:514-549` - `_create_forced_synthesis_assessment()`
- - `src/agent_factory/judges.py:477-512` - `_create_quota_exhausted_assessment()`
+ # P0 BUG: Simple Mode Synthesis Bypass (WILL BE FIXED BY UNIFIED ARCHITECTURE)
 
+ **Status**: BLOCKED - Waiting for upstream PR #2566
  **Priority**: P0 (Demo-blocking)
  **Discovered**: 2025-12-01
  **GitHub Issue**: [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
 
  ---
 
+ ## Current State
 
+ **`simple.py` is DELETED.** This bug existed in the old Simple Mode code.
 
+ The bug will NOT be fixed by restoring Simple Mode. Instead, it will be **automatically fixed** when we complete the unified architecture (after upstream PR #2566 merges).
 
  ---
 
+ ## The Bug (Historical)
 
+ When the HuggingFace Inference API failed, Simple Mode's `_should_synthesize()` ignored forced synthesis signals due to overly strict thresholds.
 
+ ```text
+ ✅ JUDGE_COMPLETE: Assessment: synthesize (confidence: 10%)
+ 🔄 LOOPING: Gathering more evidence... ← BUG: Should have synthesized!
  ```
 
  ---
 
+ ## Why Unified Architecture Fixes This
 
+ | Architecture | How Termination Works |
+ |--------------|----------------------|
+ | **Old (Simple Mode)** | Custom `_should_synthesize()` with buggy thresholds |
+ | **New (Unified)** | Manager agent respects "SUFFICIENT EVIDENCE" signals |
 
+ The Manager agent in Advanced Mode already works correctly. By completing the unified architecture with HuggingFace support, we inherit that correct behavior.
 
+ **No need to patch `_should_synthesize()` because the code is deleted.**
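The old-vs-new contrast can be sketched in a few lines. This is an illustrative sketch only: the signal phrases and the `should_stop` helper are assumptions for this document, not the actual Manager agent implementation.

```python
# Signal-based termination (new style): stop on an explicit judge signal,
# with no score or confidence thresholds that a degraded assessment can miss.
# Phrase list and helper name are illustrative assumptions.
STOP_SIGNALS = ("SUFFICIENT EVIDENCE", "STOP SEARCHING")

def should_stop(judge_message: str) -> bool:
    """Return True when the judge's natural-language output signals termination."""
    upper = judge_message.upper()
    return any(signal in upper for signal in STOP_SIGNALS)
```

Contrast this with the old threshold logic, where a forced-synthesis assessment carrying `score=0, confidence=0.1` failed every numeric gate even though the judge had already said "synthesize".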
 
  ---
 
+ ## Path Forward
 
+ 1. **Wait** for upstream PR #2566 to merge (fixes repr bug)
+ 2. **Update** the `agent-framework` dependency
+ 3. **Verify** Advanced Mode + HuggingFace works
+ 4. **Done** - This bug is gone (no `_should_synthesize()` thresholds)
 
  ---
 
+ ## Related
 
+ | Reference | Description |
+ |-----------|-------------|
+ | [ARCHITECTURE.md](../ARCHITECTURE.md) | Current state and unified plan |
+ | [SPEC_16](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) | Unified architecture spec |
+ | [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105) | GitHub tracking |
+ | [Upstream #2562](https://github.com/microsoft/agent-framework/issues/2562) | Framework bug |
+ | [Upstream PR #2566](https://github.com/microsoft/agent-framework/pull/2566) | Framework fix |
docs/bugs/P1_SIMPLE_MODE_REMOVED_BREAKS_FREE_TIER_UX.md ADDED
@@ -0,0 +1,61 @@
+ # Free Tier (No API Key) - BLOCKED by Upstream #2562
+
+ **Status**: BLOCKED - Waiting for upstream PR #2566
+ **Priority**: P1
+ **Discovered**: 2025-12-01
+
+ ---
+
+ ## Problem
+
+ Free tier (no API key provided) shows garbage output:
+
+ ```text
+ 📚 **SEARCH_COMPLETE**: searcher: <agent_framework._types.ChatMessage object at 0x7fd3f8617b10>
+ ```
+
+ ## Cause
+
+ **Upstream Bug #2562**: Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.
+
+ ## Architecture
+
+ ```text
+ User provides API key?
+
+ NO (Free Tier)          YES (Paid Tier)
+ ──────────────          ───────────────
+ HuggingFace backend     OpenAI backend
+ Qwen 2.5 72B (free)     GPT-5 (paid)
+
+ SAME orchestration, different backends
+ ONE codebase, not parallel universes
+ ```
+ ```
34
+
35
+ ## Framework Stack
36
+
37
+ | Framework | Role |
38
+ |-----------|------|
39
+ | Microsoft Agent Framework | Multi-agent orchestration |
40
+ | Pydantic AI | Structured outputs & validation |
41
+
42
+ Both work TOGETHER. Not mutually exclusive.
43
+
44
+ ## Fix
45
+
46
+ **Upstream PR #2566** will fix this.
47
+
48
+ Once merged:
49
+ 1. `uv add agent-framework@latest`
50
+ 2. Verify free tier works
51
+ 3. Done
52
+
53
+ ## What Was Deleted
54
+
55
+ `simple.py` (778 lines) was a SEPARATE orchestrator. Created parallel universe. Now deleted. ONE orchestrator with different backends.
56
+
57
+ ## Related
58
+
59
+ - [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105)
60
+ - [Upstream #2562](https://github.com/microsoft/agent-framework/issues/2562)
61
+ - [Upstream PR #2566](https://github.com/microsoft/agent-framework/pull/2566)
docs/specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md CHANGED
@@ -1,350 +1,115 @@
1
- # SPEC_16: Unified Chat Client Architecture
2
 
3
- **Status**: Proposed
4
- **Priority**: P0 (Fixes Critical Bug #113)
5
- **Issue**: Updates [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105), [#109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109), **[#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)** (P0 Bug)
6
  **Created**: 2025-12-01
7
- **Last Updated**: 2025-12-01
8
 
9
  ---
10
 
11
- ## ⚠️ CRITICAL CLARIFICATION: Integration, Not Deletion
12
-
13
- **This spec INTEGRATES Simple Mode's free-tier capability into Advanced Mode.**
14
-
15
- | What We're Doing | What We're NOT Doing |
16
- |------------------|----------------------|
17
- | βœ… Integrating HuggingFace support into Advanced Mode | ❌ Removing free-tier capability |
18
- | βœ… Unifying two parallel implementations into one | ❌ Breaking functionality for users without API keys |
19
- | βœ… Deleting redundant orchestration CODE | ❌ Deleting the CAPABILITY that code provided |
20
- | βœ… Making Advanced Mode work with ANY provider | ❌ Locking users into paid-only tiers |
21
-
22
- **After this spec:**
23
- - Users WITH OpenAI key β†’ Advanced Mode (OpenAI backend) βœ…
24
- - Users WITHOUT any key β†’ Advanced Mode (HuggingFace backend) βœ… **SAME CAPABILITY, UNIFIED ARCHITECTURE**
25
-
26
- ---
27
-
28
- ## Summary
29
-
30
- Unify Simple Mode and Advanced Mode into a **single orchestration system** by:
31
-
32
- 1. **Renaming the namespace**: `OpenAIChatClient` β†’ `BaseChatClient` (neutral protocol)
33
- 2. **Creating an adapter**: `HuggingFaceChatClient` implements `BaseChatClient`
34
- 3. **Retiring parallel code**: Simple Mode's while-loop becomes unnecessary
35
-
36
- The result: **One codebase, multiple providers, zero parallel universes.**
37
-
38
- > **πŸ”₯ P0 Bug Fix**: This also resolves Issue #113. Simple Mode's `_should_synthesize()` has a bug that ignores forced synthesis signals. Advanced Mode's Manager agent handles termination correctly. By integrating, the bug disappears.
39
-
40
- ---
41
-
42
- ## The Integration Concept
43
-
44
- ### Before: Two Parallel Universes (Current)
45
-
46
- ```text
47
- User Query
48
- β”‚
49
- β”œβ”€β”€ Has API Key? ──Yes──→ Advanced Mode (488 lines)
50
- β”‚ └── Microsoft Agent Framework
51
- β”‚ └── OpenAIChatClient (hardcoded) ◄── THE BOTTLENECK
52
- β”‚
53
- └── No API Key? ──────────→ Simple Mode (778 lines)
54
- └── While-loop orchestration (SEPARATE CODE)
55
- └── Pydantic AI + HuggingFace
56
- ```
57
-
58
- **Problem**: Same capability, two implementations, double maintenance, P0 bug in Simple Mode.
59
-
60
- ### After: Unified Architecture (This Spec)
61
 
62
  ```text
63
- User Query
64
- β”‚
65
- └──→ Advanced Mode (unified) ◄── ONE SYSTEM FOR ALL USERS
66
- └── Microsoft Agent Framework
67
- └── get_chat_client() returns: ◄── NAMESPACE NEUTRAL
68
- β”‚
69
- β”œβ”€β”€ OpenAIChatClient (if OpenAI key present)
70
- β”œβ”€β”€ GeminiChatClient (if Gemini key present) [Future]
71
- └── HuggingFaceChatClient (fallback - FREE TIER) ◄── INTEGRATED!
72
- ```
73
-
74
- **Result**: Free-tier users get the SAME Advanced Mode experience, just with HuggingFace as the LLM backend.
75
-
76
- ---
77
-
78
- ## What Gets Integrated vs Retired
79
-
80
- ### βœ… INTEGRATED (Capability Preserved)
81
-
82
- | Simple Mode Component | Integration Target | How |
83
- |-----------------------|-------------------|-----|
84
- | HuggingFace LLM calls | `HuggingFaceChatClient` | New adapter (~150 lines) |
85
- | Free-tier access | `get_chat_client()` factory | Auto-selects HF when no key |
86
- | Search tools (PubMed, etc.) | Already shared | `src/agents/tools.py` |
87
- | Evidence models | Already shared | `src/utils/models.py` |
88
-
89
- ### πŸ—‘οΈ RETIRED (Redundant Code Removed)
90
-
91
- | Simple Mode Component | Why Retired | Replacement in Advanced Mode |
92
- |-----------------------|-------------|------------------------------|
93
- | While-loop orchestration | Redundant | Manager agent orchestrates |
94
- | `_should_synthesize()` thresholds | **BUGGY** (P0 #113) | Manager agent signals |
95
- | `SearchHandler` scatter-gather | Redundant | SearchAgent handles this |
96
- | `JudgeHandler` | Redundant | JudgeAgent handles this |
97
-
98
- **Key insight**: We're not losing functionality. We're consolidating two implementations of the SAME functionality into one.
99
-
100
- ---
101
-
102
- ## Technical Implementation
103
-
104
- ### The Single Change That Enables Unification
105
-
106
- ```python
107
- # BEFORE (hardcoded to OpenAI):
108
- from agent_framework.openai import OpenAIChatClient
109
-
110
- class AdvancedOrchestrator:
111
- def __init__(self, ...):
112
- self._chat_client = OpenAIChatClient(...) # ❌ Only OpenAI works
113
-
114
- # AFTER (neutral - any provider):
115
- from agent_framework import BaseChatClient
116
- from src.clients.factory import get_chat_client
117
-
118
- class AdvancedOrchestrator:
119
- def __init__(self, ...):
120
- self._chat_client = get_chat_client() # βœ… OpenAI, Gemini, OR HuggingFace
121
- ```
122
-
123
- ### HuggingFaceChatClient Adapter
124
-
125
- ```python
126
- # src/clients/huggingface.py
127
- from agent_framework import BaseChatClient, ChatMessage, ChatResponse
128
- from huggingface_hub import InferenceClient
129
-
130
- class HuggingFaceChatClient(BaseChatClient):
131
- """Adapter that makes HuggingFace work with Microsoft Agent Framework."""
132
-
133
- def __init__(self, model_id: str = "meta-llama/Llama-3.1-70B-Instruct"):
134
- self._client = InferenceClient(model=model_id)
135
- self._model_id = model_id
136
-
137
- async def _inner_get_response(
138
- self,
139
- messages: list[ChatMessage],
140
- **kwargs
141
- ) -> ChatResponse:
142
- """Convert HuggingFace response to Agent Framework format."""
143
- # Convert messages to HF format
144
- hf_messages = [{"role": m.role, "content": m.content} for m in messages]
145
-
146
- # Call HuggingFace
147
- response = self._client.chat_completion(messages=hf_messages)
148
-
149
- # Convert back to Agent Framework format
150
- return ChatResponse(
151
- content=response.choices[0].message.content,
152
- # ... other fields
153
- )
154
-
155
- async def _inner_get_streaming_response(self, ...):
156
- """Streaming version."""
157
- ...
158
  ```
159
 
160
- ### ChatClientFactory
161
-
162
- ```python
163
- # src/clients/factory.py
164
- from agent_framework import BaseChatClient
165
- from agent_framework.openai import OpenAIChatClient
166
- from src.utils.config import settings
167
-
168
- def get_chat_client(provider: str | None = None) -> BaseChatClient:
169
- """
170
- Factory that returns the appropriate chat client.
171
-
172
- Priority:
173
- 1. OpenAI (if key available) - Best function calling, GPT-5
174
- 2. Gemini (if key available) - Good alternative [Future]
175
- 3. HuggingFace (always available) - FREE TIER FALLBACK
176
- """
177
- if provider == "openai" or (provider is None and settings.has_openai_key):
178
- return OpenAIChatClient(
179
- model_id=settings.openai_model, # gpt-5
180
- api_key=settings.openai_api_key,
181
- )
182
-
183
- # Future: Gemini support
184
- # if settings.has_gemini_key:
185
- # return GeminiChatClient(...)
186
-
187
- # FREE TIER: HuggingFace (no API key required for public models)
188
- from src.clients.huggingface import HuggingFaceChatClient
189
- return HuggingFaceChatClient(
190
- model_id="meta-llama/Llama-3.1-70B-Instruct",
191
- )
192
- ```
193
 
194
  ---
195
 
196
- ## Why This Fixes P0 Bug #113
197
 
198
- ### The Bug (Simple Mode)
199
 
200
- ```python
201
- # src/orchestrators/simple.py - THE BUG
202
- def _should_synthesize(self, assessment, ...):
203
- # When HF fails, judge returns: score=0, confidence=0.1, recommendation="synthesize"
204
 
205
- if assessment.sufficient and assessment.recommendation == "synthesize":
206
- if combined_score >= 10: # ❌ 0 >= 10 is FALSE
207
- return True
208
 
209
- if confidence >= 0.5: # ❌ 0.1 >= 0.5 is FALSE
210
- return True, "emergency"
211
 
212
- return False, "continue_searching" # ❌ LOOPS FOREVER
213
- ```
214
-
215
- ### The Fix (Advanced Mode - Already Works Correctly)
216
 
217
- ```python
218
- # Advanced Mode doesn't have this bug because:
219
- # 1. JudgeAgent says "SUFFICIENT EVIDENCE" in natural language
220
- # 2. Manager agent understands this and delegates to ReportAgent
221
- # 3. No hardcoded thresholds to bypass
222
-
223
- # The Manager agent prompt (src/orchestrators/advanced.py:152):
224
- """
225
- When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
226
- β†’ IMMEDIATELY delegate to ReportAgent for synthesis
227
- """
228
- ```
229
 
230
- **By integrating Simple Mode's capability into Advanced Mode, the bug disappears** because Advanced Mode's termination logic works correctly.
 
 
 
 
 
 
231
 
232
  ---
233
 
234
- ## Migration Plan
235
-
236
- ### Phase 1: Create HuggingFaceChatClient (Enables Integration)
237
-
238
- - [ ] Create `src/clients/` package
239
- - [ ] Implement `HuggingFaceChatClient` (~150 lines)
240
- - Extends `agent_framework.BaseChatClient`
241
- - Wraps `huggingface_hub.InferenceClient.chat_completion()`
242
- - Implements required abstract methods
243
- - [ ] Implement `get_chat_client()` factory (~50 lines)
244
- - [ ] Add unit tests
245
-
246
- **Exit Criteria**: `get_chat_client()` returns working HuggingFace client when no API key.
247
-
248
- ### Phase 2: Integrate into Advanced Mode (Fixes P0 Bug)
249
 
250
- - [ ] Update `AdvancedOrchestrator` to use `get_chat_client()`
251
- - [ ] Update `magentic_agents.py` type hints: `OpenAIChatClient` β†’ `BaseChatClient`
252
- - [ ] Update `orchestrators/factory.py` to always return `AdvancedOrchestrator`
253
- - [ ] Update `app.py` to remove mode toggle (everyone gets Advanced Mode)
254
- - [ ] Archive `simple.py` to `docs/archive/` (for reference)
255
- - [ ] Migrate Simple Mode tests to Advanced Mode tests
256
 
257
- **Exit Criteria**: Free-tier users get Advanced Mode with HuggingFace backend. P0 bug gone.
258
-
259
- ### Phase 3: Cleanup (Optional)
260
-
261
- - [ ] Remove Anthropic provider code (Issue #110)
262
- - [ ] Add Gemini support (Issue #109)
263
- - [ ] Delete archived files after verification period
264
 
265
  ---
266
 
267
- ## Files Changed
268
-
269
- ### New Files (~200 lines)
270
-
271
- | File | Lines | Purpose |
272
- |------|-------|---------|
273
- | `src/clients/__init__.py` | ~10 | Package exports |
274
- | `src/clients/factory.py` | ~50 | `get_chat_client()` |
275
- | `src/clients/huggingface.py` | ~150 | HuggingFace adapter |
276
 
277
- ### Modified Files
278
 
279
- | File | Change |
280
- |------|--------|
281
- | `src/orchestrators/advanced.py` | Use `get_chat_client()` instead of `OpenAIChatClient` |
282
- | `src/orchestrators/factory.py` | Always return `AdvancedOrchestrator` |
283
- | `src/agents/magentic_agents.py` | Type hints: `OpenAIChatClient` β†’ `BaseChatClient` |
284
- | `src/app.py` | Remove mode toggle, always use Advanced |
285
 
286
- ### Archived Files (NOT deleted from git history)
287
-
288
- | File | Lines | Reason |
289
- |------|-------|--------|
290
- | `src/orchestrators/simple.py` | 778 | Functionality INTEGRATED, code retired |
291
- | `src/tools/search_handler.py` | 219 | Manager agent handles this now |
292
 
293
  ---
294
 
295
- ## Verification Checklist
296
-
297
- ### Technical Prerequisites (Verified βœ…)
298
-
299
- - [x] `agent_framework.BaseChatClient` exists
300
- - [x] Abstract methods: `_inner_get_response`, `_inner_get_streaming_response`
301
- - [x] `huggingface_hub.InferenceClient.chat_completion()` exists
302
- - [x] `chat_completion()` has `tools` parameter (verified in 0.36.0)
303
- - [x] HuggingFace supports Llama 3.1 70B via free inference
304
- - [x] **Dependency pinned**: `huggingface-hub>=0.24.0` in pyproject.toml (required for stable tool calling)
305
-
306
- ### Capability Preservation Checklist
307
 
308
- After implementation, verify:
 
 
 
309
 
310
- - [ ] User with OpenAI key β†’ Gets Advanced Mode with OpenAI (GPT-5)
311
- - [ ] User with NO key β†’ Gets Advanced Mode with HuggingFace (Llama 3.1 70B)
312
- - [ ] Free-tier search works (PubMed, ClinicalTrials, EuropePMC)
313
- - [ ] Free-tier synthesis works (LLM generates report)
314
- - [ ] No more "continue_searching" infinite loops (P0 bug fixed)
315
 
316
  ---
317
 
318
- ## Implementation Notes (From Independent Audit)
319
-
320
- ### Dependency Requirement βœ… FIXED
321
-
322
- The `huggingface-hub` package must be `>=0.24.0` for stable `chat_completion` with tools support.
323
-
324
- ```toml
325
- # pyproject.toml - ALREADY UPDATED
326
- "huggingface-hub>=0.24.0", # Required for stable chat_completion with tools
327
- ```
328
-
329
- ### Llama 3.1 Prompt Considerations ⚠️
330
-
331
- The Manager agent prompt in `AdvancedOrchestrator._create_task_prompt()` was optimized for GPT-5. When using Llama 3.1 70B via HuggingFace, the prompt **may need tuning** to ensure strict adherence to delegation logic.
332
-
333
- **Potential issue**: Llama 3.1 may not immediately delegate to ReportAgent when JudgeAgent says "SUFFICIENT EVIDENCE".
334
-
335
- **Mitigation**: During implementation, test with HuggingFace backend and add reinforcement phrases if needed:
336
- - "You MUST delegate to ReportAgent when you see SUFFICIENT EVIDENCE"
337
- - "Do NOT continue searching after Judge approves"
338
 
339
- This is a **runtime verification** task, not a spec change.
340
 
341
  ---
342
 
343
  ## References
344
 
345
- - Microsoft Agent Framework: `agent_framework.BaseChatClient`
346
- - HuggingFace Inference: `huggingface_hub.InferenceClient`
347
- - Issue #105: Deprecate Simple Mode β†’ **Reframe as "Integrate Simple Mode"**
348
- - Issue #109: Simplify Provider Architecture
349
- - Issue #110: Remove Anthropic Provider Support
350
- - Issue #113: P0 Bug - Simple Mode ignores forced synthesis
 
1
+ # SPEC_16: Unified Architecture
2
 
3
+ **Status**: BLOCKED - Waiting for upstream PR #2566
4
+ **Priority**: P0
5
+ **Issue**: [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105)
6
  **Created**: 2025-12-01
 
7
 
8
  ---
9
 
10
+ ## The Architecture (No Bullshit Version)
11
 
12
  ```text
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
+ β”‚                   UNIFIED ARCHITECTURE                   β”‚
+ β”‚                                                          β”‚
+ β”‚                  User provides API key?                  β”‚
+ β”‚                                                          β”‚
+ β”‚    NO (Free Tier)              YES (Paid Tier)           β”‚
+ β”‚    ──────────────              ───────────────           β”‚
+ β”‚    HuggingFace backend         OpenAI backend            β”‚
+ β”‚    Qwen 2.5 72B (free)         GPT-5 (paid)              β”‚
+ β”‚                                                          β”‚
+ β”‚            SAME orchestration logic for both             β”‚
+ β”‚           ONE codebase, different LLM backends           β”‚
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
  ```
27
 
28
+ **No "modes."** Just: do you have an API key or not?
29
 
30
  ---
31
 
32
+ ## Framework Stack
33
 
34
+ DeepBoner uses TWO frameworks that work TOGETHER:
35
 
36
+ | Framework | Role | Files |
37
+ |-----------|------|-------|
38
+ | **Microsoft Agent Framework** | Multi-agent ORCHESTRATION | `src/orchestrators/advanced.py` |
39
+ | **Pydantic AI** | Structured OUTPUTS & validation | `src/agent_factory/judges.py`, `src/agents/*.py` |
40
 
41
+ ### Why Both?
 
 
42
 
43
+ - **Microsoft AF** handles: Manager β†’ Search β†’ Judge β†’ Report agent coordination
44
+ - **Pydantic AI** handles: Structured responses, type validation, schema enforcement
45
 
46
+ They are **NOT mutually exclusive**. They are **complementary**:
47
+ - Microsoft AF = the highway system (routes agents)
48
+ - Pydantic AI = the cargo containers (structures data)
 
49
 
50
+ ### Current Integration
51
 
52
+ | Component | Framework | Purpose |
53
+ |-----------|-----------|---------|
54
+ | `AdvancedOrchestrator` | Microsoft AF | Coordinates multi-agent workflow |
55
+ | `JudgeAssessment` | Pydantic AI | Structured judge output with validation |
56
+ | `Evidence`, `Citation` | Pydantic | Validated data models |
57
+ | Agent tool calling | Microsoft AF | Function execution |
58
+ | Agent structured output | Pydantic AI | Response validation |
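The division of labor is easiest to see in miniature. Below is a minimal sketch of the Pydantic side, assuming a simplified schema (field names here are illustrative, not the real `JudgeAssessment` definition in `src/agent_factory/judges.py`):

```python
from pydantic import BaseModel, Field, ValidationError

class JudgeAssessment(BaseModel):
    """Illustrative judge schema -- Pydantic enforces types and ranges."""
    sufficient: bool = Field(description="Is the evidence sufficient to report?")
    confidence: float = Field(ge=0.0, le=1.0)
    rationale: str

# A well-formed LLM response validates cleanly...
ok = JudgeAssessment.model_validate(
    {"sufficient": True, "confidence": 0.9, "rationale": "Three RCTs found."}
)

# ...while malformed output raises instead of passing garbage downstream
try:
    JudgeAssessment.model_validate(
        {"sufficient": "maybe", "confidence": 2.0, "rationale": ""}
    )
    errors = []
except ValidationError as exc:
    errors = exc.errors()
```

Microsoft AF never sees these models directly; it only routes the agents whose outputs Pydantic has already validated.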
59
 
60
  ---
61
 
62
+ ## LLM Backend Selection
63
 
64
+ Auto-detected by `src/clients/factory.py`:
65
 
66
+ | Condition | Backend | Model |
67
+ |-----------|---------|-------|
68
+ | User provides OpenAI key | OpenAI | GPT-5 |
69
+ | No API key | HuggingFace | Qwen 2.5 72B (free) |
 
 
 
70
 
71
  ---
72
 
73
+ ## Current Blocker
74
 
75
+ **Upstream Bug #2562**: Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.
76
 
77
+ **Fix**: [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) - waiting for merge.
 
 
 
 
 
78
 
79
+ **Once merged**:
80
+ 1. `uv add agent-framework@latest`
81
+ 2. Verify free tier works
82
+ 3. Done
 
 
83
 
84
  ---
85
 
86
+ ## What Was Deleted
87
 
88
+ `simple.py` (778 lines) was a SEPARATE orchestrator that created a parallel universe:
89
+ - Used Pydantic AI directly for LLM calls
90
+ - Had its own while-loop orchestration
91
+ - Duplicated search/judge logic
92
 
93
+ Now there's ONE orchestrator with different backends.
 
 
 
 
94
 
95
  ---
96
 
97
+ ## Files
98
 
99
+ | File | Framework | Purpose |
100
+ |------|-----------|---------|
101
+ | `src/orchestrators/advanced.py` | Microsoft AF | Multi-agent orchestration |
102
+ | `src/clients/factory.py` | - | Auto-selects LLM backend |
103
+ | `src/clients/huggingface.py` | - | HuggingFace adapter (free tier) |
104
+ | `src/agent_factory/judges.py` | Pydantic AI | Structured judge assessments |
105
+ | `src/agents/report_agent.py` | Pydantic AI | Structured report generation |
106
+ | `src/utils/models.py` | Pydantic | Data models (Evidence, Citation) |
107
 
108
  ---
109
 
110
  ## References
111
 
112
+ - [Microsoft Agent Framework](https://github.com/microsoft/agent-framework) - Multi-agent orchestration
113
+ - [Pydantic AI](https://ai.pydantic.dev/) - Structured outputs framework
114
+ - [Multi-Agent Research System with Pydantic](https://www.analyticsvidhya.com/blog/2025/03/multi-agent-research-assistant-system-using-pydantic/) - Architecture pattern
115
+ - [AG-UI Protocol](https://www.copilotkit.ai/blog/introducing-pydantic-ai-integration-with-ag-ui) - How frameworks integrate
 
 
docs/specs/SPEC_17_ACCUMULATOR_PATTERN.md ADDED
@@ -0,0 +1,62 @@
1
+ # SPEC 17: Accumulator Pattern for Agent Events
2
+
3
+ **Status**: IMPLEMENTED
4
+ **Created**: 2025-12-02
5
+ **Author**: AI Agent
6
+ **Related**: P0_REPR_BUG_ROOT_CAUSE_ANALYSIS.md
7
+
8
+ ## 1. Context
9
+
10
+ The Microsoft Agent Framework event model has a specific intended usage pattern:
11
+ - `MagenticAgentDeltaEvent.text` β†’ **Content Source** (Streaming)
12
+ - `MagenticAgentMessageEvent` β†’ **Completion Signal** (End of Turn)
13
+
14
+ Our previous implementation incorrectly attempted to extract content from `MagenticAgentMessageEvent.message`. This property is not designed for content extraction and can contain internal representation data (repr strings) for tool-only messages. This led to the "repr bug" where users saw raw Python object strings in the UI.
15
+
16
+ The **Accumulator Pattern** aligns our codebase with Microsoft's intended architecture (as demonstrated in their `04_magentic_one.py` sample) and resolves the display issues by using the correct event data source.
17
+
18
+ ## 2. The Solution: Accumulator Pattern
19
+
20
+ Instead of relying on the final message event for content, we adopt the **Accumulator Pattern**: streamed delta events become the sole source of displayed text.
21
+
22
+ ### 2.1 Core Concept
23
+
24
+ 1. **Streaming is Truth**: `MagenticAgentDeltaEvent` is the exclusive source of text content. These events are not affected by the upstream bug.
25
+ 2. **Accumulation**: The orchestrator maintains a stateful buffer (`current_message_buffer`) that appends text from delta events.
26
+ 3. **Signal Processing**: `MagenticAgentMessageEvent` is treated solely as a completion signal ("end of turn"). When received, we consume the buffer to form the final UI message and then clear the buffer.
27
+
28
+ ### 2.2 Logic Flow
29
+
30
+ ```python
+ current_message_buffer = ""
+
+ for event in stream:
+     if isinstance(event, MagenticAgentDeltaEvent):
+         current_message_buffer += event.text
+         emit_streaming_event(event.text)
+
+     elif isinstance(event, MagenticAgentMessageEvent):
+         # IGNORE event.message (it may contain corrupted repr output)
+         final_text = current_message_buffer or "Action completed (Tool Call)"
+         emit_complete_event(final_text)
+         current_message_buffer = ""
+ ```
47
+
48
+ ## 3. Test Plan
49
+
50
+ To verify this pattern ensures correct output regardless of upstream bugs, we define the following test scenarios:
51
+
52
+ ### 3.1 Scenario A: Standard Text Message
53
+ - **Input**: Sequence of `MagenticAgentDeltaEvent` (with text parts) -> `MagenticAgentMessageEvent` (with corrupted repr).
54
+ - **Expected Output**: The `AgentEvent` emitted at the end must contain the concatenated text from the deltas, NOT the repr string.
55
+
56
+ ### 3.2 Scenario B: Tool Call (No Text)
57
+ - **Input**: No text deltas -> `MagenticAgentMessageEvent` (with corrupted repr).
58
+ - **Expected Output**: The `AgentEvent` should contain a fallback message (e.g., "Action completed (Tool Call)"), NOT the repr string.
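The fallback behavior in Scenario B reduces to a small predicate plus a preference order. A sketch of the heuristic (simplified from `_handle_completion_event`; function names here are illustrative):

```python
def looks_like_repr(text: str) -> bool:
    """Default object reprs look like '<ClassName object at 0x7f...>'."""
    return text.startswith("<") and "object at" in text

def resolve_final_text(buffer: str, raw_message_text: str) -> str:
    """Prefer the accumulated delta buffer; only fall back to non-repr message text."""
    if buffer:
        return buffer
    if raw_message_text and not looks_like_repr(raw_message_text):
        return raw_message_text
    return "Action completed (Tool Call)"

# Scenario A: deltas accumulated, message corrupted -> buffer wins
scenario_a = resolve_final_text("Hello World", "<ChatMessage object at 0xDEADBEEF>")
# Scenario B: no deltas, message corrupted -> fallback string
scenario_b = resolve_final_text("", "<ChatMessage object at 0xDEADBEEF>")
```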
59
+
60
+ ## 4. Implementation Details
61
+
62
+ The pattern is implemented in `src/orchestrators/advanced.py` within the `run()` method loop. It bypasses `_process_event` for these specific event types to ensure strict control over data flow.
src/orchestrators/advanced.py CHANGED
@@ -17,7 +17,7 @@ Design Patterns:
17
 
18
  import asyncio
19
  from collections.abc import AsyncGenerator
20
- from typing import TYPE_CHECKING, Any
21
 
22
  import structlog
23
  from agent_framework import (
@@ -181,6 +181,69 @@ The final output should be a structured research report."""
181
 
182
  return f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
183
184
  async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
185
  """
186
  Run the workflow.
@@ -193,18 +256,10 @@ The final output should be a structured research report."""
193
  """
194
  logger.info("Starting Advanced orchestrator", query=query)
195
 
196
- yield AgentEvent(
197
- type="started",
198
- message=f"Starting research (Advanced mode): {query}",
199
- iteration=0,
200
- )
201
 
202
  # Initialize context state
203
- yield AgentEvent(
204
- type="progress",
205
- message="Loading embedding service (LlamaIndex/ChromaDB)...",
206
- iteration=0,
207
- )
208
  embedding_service = self._init_embedding_service()
209
 
210
  yield AgentEvent(
@@ -238,25 +293,52 @@ The final output should be a structured research report."""
238
  iteration = 0
239
  final_event_received = False
240
 
 
 
 
 
 
 
241
  try:
242
  async with asyncio.timeout(self._timeout_seconds):
243
  async for event in workflow.run_stream(task):
244
- agent_event = self._process_event(event, iteration)
245
- if agent_event:
246
- if isinstance(event, MagenticAgentMessageEvent):
247
- iteration += 1
248
- progress_msg = self._get_progress_message(iteration)
249
-
250
- # Yield progress update before the agent action
 
 
251
  yield AgentEvent(
252
- type="progress",
253
- message=progress_msg,
 
254
  iteration=iteration,
255
  )
256
257
  if agent_event.type == "complete":
258
  final_event_received = True
259
-
260
  yield agent_event
261
 
262
  # GUARANTEE: Always emit termination event if stream ends without one
@@ -278,52 +360,8 @@ The final output should be a structured research report."""
278
  )
279
 
280
  except TimeoutError:
281
- logger.warning("Workflow timed out", iterations=iteration)
282
-
283
- # ACTUALLY synthesize from gathered evidence
284
- try:
285
- from src.agents.magentic_agents import create_report_agent
286
- from src.agents.state import get_magentic_state
287
-
288
- state = get_magentic_state()
289
- memory = state.memory
290
-
291
- # Get evidence summary from memory
292
- evidence_summary = await memory.get_context_summary()
293
-
294
- # Create and invoke ReportAgent for synthesis
295
- report_agent = create_report_agent(self._chat_client, domain=self.domain)
296
-
297
- yield AgentEvent(
298
- type="synthesizing",
299
- message="Workflow timed out. Synthesizing available evidence...",
300
- iteration=iteration,
301
- )
302
-
303
- # Invoke ReportAgent directly
304
- # Note: ChatAgent.run() returns the final response string
305
- synthesis_result = await report_agent.run(
306
- "Synthesize research report from this evidence. "
307
- f"If evidence is sparse, say so.\n\n{evidence_summary}"
308
- )
309
-
310
- yield AgentEvent(
311
- type="complete",
312
- message=str(synthesis_result),
313
- data={"reason": "timeout_synthesis", "iterations": iteration},
314
- iteration=iteration,
315
- )
316
- except Exception as synth_error:
317
- logger.error("Timeout synthesis failed", error=str(synth_error))
318
- yield AgentEvent(
319
- type="complete",
320
- message=(
321
- f"Research timed out after {iteration} rounds. "
322
- f"Evidence gathered but synthesis failed: {synth_error}"
323
- ),
324
- data={"reason": "timeout_synthesis_failed", "iterations": iteration},
325
- iteration=iteration,
326
- )
327
 
328
  except Exception as e:
329
  logger.error("Workflow failed", error=str(e))
@@ -333,6 +371,45 @@ The final output should be a structured research report."""
333
  iteration=iteration,
334
  )
335
336
  def _extract_text(self, message: Any) -> str:
337
  """
338
  Defensively extract text from a message object.
@@ -384,7 +461,9 @@ The final output should be a structured research report."""
384
  # The repr is useless for display purposes
385
  return ""
386
 
387
- def _get_event_type_for_agent(self, agent_name: str) -> str:
 
 
388
  """Map agent name to appropriate event type.
389
 
390
  Args:
@@ -444,17 +523,8 @@ The final output should be a structured research report."""
444
  iteration=iteration,
445
  )
446
 
447
- elif isinstance(event, MagenticAgentMessageEvent):
448
- agent_name = event.agent_id or "unknown"
449
- text = self._extract_text(event.message)
450
- event_type = self._get_event_type_for_agent(agent_name)
451
-
452
- # All returned types are valid AgentEvent.type literals
453
- return AgentEvent(
454
- type=event_type, # type: ignore[arg-type]
455
- message=f"{agent_name}: {self._smart_truncate(text)}",
456
- iteration=iteration + 1,
457
- )
458
 
459
  elif isinstance(event, MagenticFinalResultEvent):
460
  text = self._extract_text(event.message) if event.message else "No result"
@@ -465,14 +535,8 @@ The final output should be a structured research report."""
465
  iteration=iteration,
466
  )
467
 
468
- elif isinstance(event, MagenticAgentDeltaEvent):
469
- if event.text:
470
- return AgentEvent(
471
- type="streaming",
472
- message=event.text,
473
- data={"agent_id": event.agent_id},
474
- iteration=iteration,
475
- )
476
 
477
  elif isinstance(event, WorkflowOutputEvent):
478
  if event.data:
 
17
 
18
  import asyncio
19
  from collections.abc import AsyncGenerator
20
+ from typing import TYPE_CHECKING, Any, Literal
21
 
22
  import structlog
23
  from agent_framework import (
 
181
 
182
  return f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
183
 
184
+ async def _init_workflow_events(self, query: str) -> AsyncGenerator[AgentEvent, None]:
185
+ """Yield initialization events."""
186
+ yield AgentEvent(
187
+ type="started",
188
+ message=f"Starting research (Advanced mode): {query}",
189
+ iteration=0,
190
+ )
191
+
192
+ yield AgentEvent(
193
+ type="progress",
194
+ message="Loading embedding service (LlamaIndex/ChromaDB)...",
195
+ iteration=0,
196
+ )
197
+
198
+ async def _handle_timeout(self, iteration: int) -> AsyncGenerator[AgentEvent, None]:
199
+ """Handle workflow timeout by attempting synthesis."""
200
+ logger.warning("Workflow timed out", iterations=iteration)
201
+
202
+ # ACTUALLY synthesize from gathered evidence
203
+ try:
204
+ from src.agents.magentic_agents import create_report_agent
205
+ from src.agents.state import get_magentic_state
206
+
207
+ state = get_magentic_state()
208
+ memory = state.memory
209
+
210
+ # Get evidence summary from memory
211
+ evidence_summary = await memory.get_context_summary()
212
+
213
+ # Create and invoke ReportAgent for synthesis
214
+ report_agent = create_report_agent(self._chat_client, domain=self.domain)
215
+
216
+ yield AgentEvent(
217
+ type="synthesizing",
218
+ message="Workflow timed out. Synthesizing available evidence...",
219
+ iteration=iteration,
220
+ )
221
+
222
+ # Invoke ReportAgent directly
223
+ # Note: ChatAgent.run() returns AgentRunResponse; access text via .text
224
+ synthesis_result = await report_agent.run(
225
+ "Synthesize research report from this evidence. "
226
+ f"If evidence is sparse, say so.\n\n{evidence_summary}"
227
+ )
228
+
229
+ yield AgentEvent(
230
+ type="complete",
231
+ message=synthesis_result.text,
232
+ data={"reason": "timeout_synthesis", "iterations": iteration},
233
+ iteration=iteration,
234
+ )
235
+ except Exception as synth_error:
236
+ logger.error("Timeout synthesis failed", error=str(synth_error))
237
+ yield AgentEvent(
238
+ type="complete",
239
+ message=(
240
+ f"Research timed out after {iteration} rounds. "
241
+ f"Evidence gathered but synthesis failed: {synth_error}"
242
+ ),
243
+ data={"reason": "timeout_synthesis_failed", "iterations": iteration},
244
+ iteration=iteration,
245
+ )
246
+
247
  async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
248
  """
249
  Run the workflow.
 
256
  """
257
  logger.info("Starting Advanced orchestrator", query=query)
258
 
259
+ async for event in self._init_workflow_events(query):
260
+ yield event
 
 
 
261
 
262
  # Initialize context state
263
  embedding_service = self._init_embedding_service()
264
 
265
  yield AgentEvent(
 
293
  iteration = 0
294
  final_event_received = False
295
 
296
+ # ACCUMULATOR PATTERN: Track streaming content to bypass upstream Repr Bug
297
+ # Upstream bug in _magentic.py flattens message.contents and sets message.text
298
+ # to repr(message) if text is empty. We must reconstruct text from Deltas.
299
+ current_message_buffer: str = ""
300
+ current_agent_id: str | None = None
301
+
302
  try:
303
  async with asyncio.timeout(self._timeout_seconds):
304
  async for event in workflow.run_stream(task):
305
+ # 1. Handle Streaming (Source of Truth for Content)
306
+ if isinstance(event, MagenticAgentDeltaEvent):
307
+ # Detect agent switch to clear buffer
308
+ if event.agent_id != current_agent_id:
309
+ current_message_buffer = ""
310
+ current_agent_id = event.agent_id
311
+
312
+ if event.text:
313
+ current_message_buffer += event.text
314
  yield AgentEvent(
315
+ type="streaming",
316
+ message=event.text,
317
+ data={"agent_id": event.agent_id},
318
  iteration=iteration,
319
  )
320
+ continue
321
+
322
+ # 2. Handle Completion Signal
323
+ # We use our accumulated buffer instead of the corrupted event.message
324
+ if isinstance(event, MagenticAgentMessageEvent):
325
+ iteration += 1
326
 
327
+ comp_event, prog_event = self._handle_completion_event(
328
+ event, current_message_buffer, iteration
329
+ )
330
+ yield comp_event
331
+ yield prog_event
332
+
333
+ # Clear buffer after consuming
334
+ current_message_buffer = ""
335
+ continue
336
+
337
+ # 3. Handle other events normally
338
+ agent_event = self._process_event(event, iteration)
339
+ if agent_event:
340
  if agent_event.type == "complete":
341
  final_event_received = True
 
342
  yield agent_event
343
 
344
  # GUARANTEE: Always emit termination event if stream ends without one
 
360
  )
361
 
362
  except TimeoutError:
363
+ async for event in self._handle_timeout(iteration):
364
+ yield event
365
 
366
  except Exception as e:
367
  logger.error("Workflow failed", error=str(e))
 
371
  iteration=iteration,
372
  )
373
 
374
+ def _handle_completion_event(
375
+ self, event: MagenticAgentMessageEvent, buffer: str, iteration: int
376
+ ) -> tuple[AgentEvent, AgentEvent]:
377
+ """Handle an agent completion event using the accumulated buffer."""
378
+ # Use buffer if available, otherwise fall back cautiously
379
+ # (Only fall back if buffer empty, which implies tool-only turn)
380
+ text_content = buffer
381
+ if not text_content:
382
+ # Try extraction but ignore repr strings AND empty strings
383
+ raw_text = self._extract_text(event.message)
384
+ if raw_text and not (raw_text.startswith("<") and "object at" in raw_text):
385
+ text_content = raw_text
386
+ else:
387
+ text_content = "Action completed (Tool Call)"
388
+
389
+ agent_name = event.agent_id or "unknown"
390
+ event_type = self._get_event_type_for_agent(agent_name)
391
+
392
+ completion_event = AgentEvent(
393
+ type=event_type,
394
+ message=f"{agent_name}: {text_content[:200]}...",
395
+ iteration=iteration,
396
+ )
397
+
398
+ # Progress update
399
+ rounds_remaining = max(self._max_rounds - iteration, 0)
400
+ est_seconds = rounds_remaining * 45
401
+ est_display = (
402
+ f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
403
+ )
404
+
405
+ progress_event = AgentEvent(
406
+ type="progress",
407
+ message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
408
+ iteration=iteration,
409
+ )
410
+
411
+ return completion_event, progress_event
412
+
413
  def _extract_text(self, message: Any) -> str:
414
  """
415
  Defensively extract text from a message object.
 
461
  # The repr is useless for display purposes
462
  return ""
463
 
464
+ def _get_event_type_for_agent(
465
+ self, agent_name: str
466
+ ) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
467
  """Map agent name to appropriate event type.
468
 
469
  Args:
 
523
  iteration=iteration,
524
  )
525
 
526
+ # NOTE: MagenticAgentMessageEvent is handled in run() loop with Accumulator Pattern
527
+ # (see lines 322-335) and never reaches this method due to `continue` statement.
528
 
529
  elif isinstance(event, MagenticFinalResultEvent):
530
  text = self._extract_text(event.message) if event.message else "No result"
 
535
  iteration=iteration,
536
  )
537
 
538
+ # NOTE: MagenticAgentDeltaEvent is handled in run() loop with Accumulator Pattern
539
+ # (see lines 306-320) and never reaches this method due to `continue` statement.
540
 
541
  elif isinstance(event, WorkflowOutputEvent):
542
  if event.data:
tests/unit/orchestrators/test_accumulator_pattern.py ADDED
@@ -0,0 +1,294 @@
1
+ """
2
+ Test the Accumulator Pattern for Microsoft Agent Framework event handling.
3
+
4
+ This tests SPEC 17: We use MagenticAgentDeltaEvent.text as the sole source of content,
5
+ and MagenticAgentMessageEvent as a signal only (ignoring .message to avoid repr bug).
6
+ """
7
+
8
+ import importlib
9
+ import sys
10
+ import types
11
+ from unittest.mock import MagicMock, patch
12
+
13
+ import pytest
14
+
15
+
16
+ # --- Create real event classes ---
17
+ class MockDeltaEvent:
18
+ """Simulates MagenticAgentDeltaEvent with streaming text."""
19
+
20
+ def __init__(self, text: str, agent_id: str = "TestAgent"):
21
+ self.text = text
22
+ self.agent_id = agent_id
23
+
24
+
25
+ class MockMessageEvent:
26
+ """Simulates MagenticAgentMessageEvent with potentially corrupted .message."""
27
+
28
+ def __init__(self, message_text: str, agent_id: str = "TestAgent"):
29
+ self.message = MagicMock()
30
+ self.message.text = message_text # This could be repr garbage
31
+ self.agent_id = agent_id
32
+ self.text = None # No top-level .text on MessageEvent
33
+
34
+
35
+ class MockFinalResultEvent:
36
+ """Simulates MagenticFinalResultEvent."""
37
+
38
+ def __init__(self, text: str):
39
+ self.message = MagicMock()
40
+ self.message.text = text
41
+ self.text = None
42
+
43
+
44
+ class MockOrchestratorMessageEvent:
45
+ """Simulates MagenticOrchestratorMessageEvent."""
46
+
47
+ def __init__(self, kind: str = "user_task", message: str = "test"):
48
+ self.kind = kind
49
+ self.message = MagicMock()
50
+ self.message.text = message
51
+
52
+
53
+ class MockWorkflowOutputEvent:
54
+ """Simulates WorkflowOutputEvent."""
55
+
56
+ def __init__(self, data=None):
57
+ self.data = data
58
+
59
+
60
+ # Pass-through decorators
61
+ def mock_use_function_invocation(func=None):
62
+ return func if func else lambda f: f
63
+
64
+
65
+ def mock_use_observability(func=None):
66
+ return func if func else lambda f: f
67
+
68
+
69
+ @pytest.fixture
70
+ def mock_agent_framework():
71
+ """
72
+ Mock the agent_framework module structure in sys.modules.
73
+ """
74
+ # Create the mock module structure
75
+ mock_af = types.ModuleType("agent_framework")
76
+ mock_af_openai = types.ModuleType("agent_framework.openai")
77
+ mock_af_middleware = types.ModuleType("agent_framework._middleware")
78
+ mock_af_tools = types.ModuleType("agent_framework._tools")
79
+ mock_af_types = types.ModuleType("agent_framework._types")
80
+ mock_af_observability = types.ModuleType("agent_framework.observability")
81
+
82
+ # Populate submodules
83
+ mock_af.openai = mock_af_openai
84
+ mock_af._middleware = mock_af_middleware
85
+ mock_af._tools = mock_af_tools
86
+ mock_af._types = mock_af_types
87
+ mock_af.observability = mock_af_observability
88
+
89
+ # Assign our REAL event classes as the module-level types
90
+ mock_af.MagenticAgentDeltaEvent = MockDeltaEvent
91
+ mock_af.MagenticAgentMessageEvent = MockMessageEvent
92
+ mock_af.MagenticFinalResultEvent = MockFinalResultEvent
93
+ mock_af.MagenticOrchestratorMessageEvent = MockOrchestratorMessageEvent
94
+ mock_af.WorkflowOutputEvent = MockWorkflowOutputEvent
95
+
96
+ # Mock other classes
97
+ mock_af.MagenticBuilder = MagicMock
98
+ mock_af.ChatAgent = MagicMock
99
+ mock_af.ai_function = MagicMock
100
+ mock_af.BaseChatClient = MagicMock
101
+ mock_af.ToolProtocol = MagicMock
102
+ mock_af.ChatMessage = MagicMock
103
+ mock_af.ChatResponse = MagicMock
104
+ mock_af.ChatResponseUpdate = MagicMock
105
+ mock_af.ChatOptions = MagicMock
106
+ mock_af.FinishReason = MagicMock
107
+ mock_af.Role = MagicMock
108
+
109
+ # Populate symbols in submodules
110
+ mock_af_openai.OpenAIChatClient = MagicMock
111
+ mock_af_middleware.use_chat_middleware = MagicMock
112
+ mock_af_tools.use_function_invocation = mock_use_function_invocation
113
+ mock_af_types.FunctionCallContent = MagicMock
114
+ mock_af_types.FunctionResultContent = MagicMock
115
+ mock_af_observability.use_observability = mock_use_observability
116
+
117
+ # Patch sys.modules to include our mocks
118
+ with patch.dict(
119
+ sys.modules,
120
+ {
121
+ "agent_framework": mock_af,
122
+ "agent_framework.openai": mock_af_openai,
123
+ "agent_framework._middleware": mock_af_middleware,
124
+ "agent_framework._tools": mock_af_tools,
125
+ "agent_framework._types": mock_af_types,
126
+ "agent_framework.observability": mock_af_observability,
127
+ },
128
+ ):
129
+ yield mock_af
130
+
131
+
132
+ @pytest.fixture(scope="module", autouse=True)
133
+ def cleanup_orchestrator_module():
134
+ """
135
+ Ensure src.orchestrators.advanced is restored to a clean state after tests.
136
+ This prevents 'Mock' classes from leaking into other tests via module globals.
137
+ """
138
+ yield
139
+ # After all tests in this module, reload the orchestrator module
140
+ # This will use the REAL agent_framework (since the mock fixture is teardown)
141
+ import src.orchestrators.advanced
142
+
143
+ importlib.reload(src.orchestrators.advanced)
144
+
145
+
146
+ @pytest.fixture
147
+ def mock_orchestrator(mock_agent_framework):
148
+ """
149
+ Create an AdvancedOrchestrator with all dependencies mocked.
150
+ Relies on reloading the module to pick up the mocked agent_framework.
151
+ """
152
+ # Import locally
153
+ import src.orchestrators.advanced
154
+
155
+ # RELOAD to ensure it picks up the mocked agent_framework from sys.modules
156
+ importlib.reload(src.orchestrators.advanced)
157
+
158
+ from src.orchestrators.advanced import AdvancedOrchestrator
159
+
160
+ with (
161
+ patch("src.orchestrators.advanced.get_chat_client"),
162
+ patch("src.orchestrators.advanced.get_embedding_service_if_available", return_value=None),
163
+ patch("src.orchestrators.advanced.init_magentic_state"),
164
+ patch("src.agents.state.ResearchMemory"),
165
+ patch("src.utils.service_loader.get_embedding_service", return_value=MagicMock()),
166
+ ):
167
+ orch = AdvancedOrchestrator(max_rounds=5)
168
+    yield orch
+
+
+@pytest.mark.unit
+@pytest.mark.asyncio
+async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
+    """
+    Scenario A: Standard Text Message
+    Input: Deltas ("Hello", " World") -> MessageEvent (Poisoned Repr)
+    Expected: AgentEvent with "Hello World", NOT the repr string
+    """
+    events = [
+        MockDeltaEvent("Hello", agent_id="ChatBot"),
+        MockDeltaEvent(" World", agent_id="ChatBot"),
+        MockMessageEvent("<ChatMessage object at 0xDEADBEEF>", agent_id="ChatBot"),
+    ]
+
+    async def mock_stream(*args, **kwargs):
+        for event in events:
+            yield event
+
+    mock_workflow = MagicMock()
+    mock_workflow.run_stream = mock_stream
+
+    with patch.object(mock_orchestrator, "_build_workflow", return_value=mock_workflow):
+        generated_events = []
+        async for event in mock_orchestrator.run("test query"):
+            generated_events.append(event)
+
+        # Find the completion event for ChatBot (non-streaming)
+        chat_events = [
+            e for e in generated_events if "ChatBot" in str(e.message) and e.type != "streaming"
+        ]
+
+        assert len(chat_events) >= 1, (
+            f"Expected ChatBot events, got: {[e.message for e in generated_events]}"
+        )
+        final_event = chat_events[0]
+
+        # CRITICAL: Must contain accumulated text, NOT repr
+        assert "Hello World" in final_event.message or "Hello" in final_event.message
+        assert "<ChatMessage" not in final_event.message, f"Repr bug! Got: {final_event.message}"
+
+
+@pytest.mark.unit
+@pytest.mark.asyncio
+async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
+    """
+    Scenario B: Tool Call (No Text Deltas)
+    Input: No Deltas -> MessageEvent (Poisoned Repr)
+    Expected: AgentEvent with fallback text, NOT the repr string
+    """
+    events = [
+        MockMessageEvent("<ChatMessage object at 0xDEADBEEF>", agent_id="SearchAgent"),
+    ]
+
+    async def mock_stream(*args, **kwargs):
+        for event in events:
+            yield event
+
+    mock_workflow = MagicMock()
+    mock_workflow.run_stream = mock_stream
+
+    with patch.object(mock_orchestrator, "_build_workflow", return_value=mock_workflow):
+        generated_events = []
+        async for event in mock_orchestrator.run("test query"):
+            generated_events.append(event)
+
+        # Find completion events for SearchAgent
+        search_events = [
+            e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
+        ]
+
+        assert len(search_events) >= 1, (
+            f"Expected SearchAgent events, got: {[e.message for e in generated_events]}"
+        )
+        final_event = search_events[0]
+
+        # CRITICAL: Should use fallback, NOT repr
+        assert "<ChatMessage" not in final_event.message, f"Repr bug! Got: {final_event.message}"
+        # Should contain fallback or tool indicator
+        assert "Action completed" in final_event.message or "Tool" in final_event.message
+
+
+@pytest.mark.unit
+@pytest.mark.asyncio
+async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
+    """
+    Verify buffer clears between agents.
+    Agent B should NOT inherit Agent A's accumulated text.
+    """
+    events = [
+        MockDeltaEvent("Agent A says hi", agent_id="AgentA"),
+        MockMessageEvent("irrelevant", agent_id="AgentA"),
+        MockDeltaEvent("Agent B responds", agent_id="AgentB"),
+        MockMessageEvent("irrelevant", agent_id="AgentB"),
+    ]
+
+    async def mock_stream(*args, **kwargs):
+        for event in events:
+            yield event
+
+    mock_workflow = MagicMock()
+    mock_workflow.run_stream = mock_stream
+
+    with patch.object(mock_orchestrator, "_build_workflow", return_value=mock_workflow):
+        generated_events = []
+        async for event in mock_orchestrator.run("test query"):
+            generated_events.append(event)
+
+        # Find non-streaming events for each agent
+        agent_a_events = [
+            e for e in generated_events if "AgentA" in str(e.message) and e.type != "streaming"
+        ]
+        agent_b_events = [
+            e for e in generated_events if "AgentB" in str(e.message) and e.type != "streaming"
+        ]
+
+        # Both should have completion events
+        assert len(agent_a_events) >= 1, f"No AgentA events: {[e.message for e in generated_events]}"
+        assert len(agent_b_events) >= 1, f"No AgentB events: {[e.message for e in generated_events]}"
+
+        # Agent A should have its own text
+        assert "Agent A" in agent_a_events[0].message
+        # Agent B should have its own text, NOT Agent A's
+        assert "Agent B" in agent_b_events[0].message
+        assert "Agent A" not in agent_b_events[0].message, "Buffer not cleared between agents!"
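The three scenarios above all exercise one buffer discipline: accumulate delta text per agent, emit it (or a fallback) on the completion event, and clear the buffer so the next agent starts fresh. A minimal standalone sketch of that pattern, with illustrative names only (`Accumulator`, `on_delta`, `on_message` are not the orchestrator's actual API):

```python
# Minimal sketch of the accumulator pattern these tests exercise.
# Names here are illustrative, not the orchestrator's real API.

class Accumulator:
    """Per-agent text buffer that sidesteps the repr-poisoned MessageEvent."""

    def __init__(self) -> None:
        self._buffers: dict[str, list[str]] = {}

    def on_delta(self, agent_id: str, text: str) -> None:
        # Scenario A: accumulate streaming delta text per agent.
        self._buffers.setdefault(agent_id, []).append(text)

    def on_message(self, agent_id: str) -> str:
        # pop() clears the buffer, so Agent B never inherits Agent A's text.
        parts = self._buffers.pop(agent_id, [])
        # Scenario B: no deltas seen (e.g. a pure tool call) -> fallback
        # text, never the event payload's repr string.
        return "".join(parts) if parts else "Action completed"


acc = Accumulator()
acc.on_delta("ChatBot", "Hello")
acc.on_delta("ChatBot", " World")
print(acc.on_message("ChatBot"))      # Hello World
print(acc.on_message("SearchAgent"))  # Action completed
```

The key design choice is that the completion event's own payload is never trusted: the emitted text always comes from the delta buffer or the fallback.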
tests/unit/orchestrators/test_advanced_timeout.py CHANGED
@@ -41,8 +41,11 @@ async def test_timeout_synthesizes_evidence():
     mock_get_state.return_value = mock_state
 
     # Setup mock ReportAgent
+    # ChatAgent.run() returns AgentRunResponse with .text property
     mock_report_agent = AsyncMock()
-    mock_report_agent.run.return_value = "Final Synthesized Report"
+    mock_response = MagicMock()
+    mock_response.text = "Final Synthesized Report"
+    mock_report_agent.run.return_value = mock_response
     mock_create_agent.return_value = mock_report_agent
 
     events = []
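The mock change above can be reproduced standalone. A hedged sketch of why the bare-string return value broke (`agent_bad`/`agent_ok` are illustrative names; the real `AgentRunResponse` is stood in by `MagicMock`):

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock

# Why the bare-string mock broke: the caller reads response.text,
# and a plain str has no .text attribute.
agent_bad = AsyncMock()
agent_bad.run.return_value = "Final Synthesized Report"

try:
    asyncio.run(agent_bad.run("query")).text
except AttributeError:
    print("bare string mock: AttributeError on .text")

# The fix: wrap the text in a response-like object exposing .text.
mock_response = MagicMock()
mock_response.text = "Final Synthesized Report"

agent_ok = AsyncMock()
agent_ok.run.return_value = mock_response
print(asyncio.run(agent_ok.run("query")).text)  # Final Synthesized Report
```

Note that `AsyncMock` makes `run()` awaitable automatically; only the return value's shape needed fixing.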