fix(P0): Implement Accumulator Pattern to resolve Repr Bug (#117)
* docs: Add P1 bug doc for Simple Mode removal breaking Free Tier UX
SPEC-16 Unified Architecture removed Simple Mode, forcing all users
to Advanced Mode. When no API key is provided, Advanced Mode falls back
to HuggingFace Free Tier which triggers upstream agent-framework repr
bug (#2562).
Options documented:
A) Wait for upstream fix (PR #2566)
B) Restore Simple Mode for free tier
C) Current workaround in _extract_text()
* docs: Update P1 bug doc and SPEC-16 with rollback warning
CRITICAL: Simple Mode was deleted BEFORE verifying Advanced+HF worked.
Problem:
- Upstream agent-framework has repr bug (#2562)
- Advanced Mode + HuggingFace = garbage output
- Simple Mode (the working fallback) was deleted prematurely
Bug doc updates:
- Added "What Went Wrong" timeline
- Added Gradio UI confusion analysis (examples vs chat button)
- Recommendation: Restore Simple Mode as fallback
SPEC-16 updates:
- Status changed to "PARTIALLY IMPLEMENTED - ROLLBACK REQUIRED"
- Added critical warning about premature deletion
- Links to P1 bug doc for action items
* docs: CRITICAL - Simple Mode is NOT being deleted
This commit makes it CRYSTAL CLEAR across all documentation:
SIMPLE MODE IS NOT BEING DELETED - NON-NEGOTIABLE
What went wrong:
- SPEC-16 was supposed to INTEGRATE Simple Mode, not DELETE it
- simple.py was deleted BEFORE verifying Advanced+HF worked
- Upstream agent-framework has repr bug (#2562)
- Free tier users now have no working fallback
Required actions:
1. RESTORE simple.py from git history or MCP reference
2. KEEP Simple Mode as free-tier fallback indefinitely
3. Use Advanced Mode ONLY for paid API key users
4. Wait for upstream #2566 to merge before reconsidering
Updated files:
- SPEC_16: Status changed to "ON HOLD", added warning
- P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS: Changed to "Patch simple.py"
- ACTIVE_BUGS: Marked Simple Mode issues as OPEN
* docs: DO NOT use MCP reference repo - it's buggy
Updated all docs and GitHub issues to clarify:
1. DO NOT restore from MCP reference repo - has known bugs
2. Git revert in THIS repo MAY be possible - review for bugs first
3. Clean implementation preferred if old code is too buggy
4. Goal is WORKING Simple Mode, not blindly restored buggy code
Files updated:
- ACTIVE_BUGS.md
- SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md
- P1_SIMPLE_MODE_REMOVED_BREAKS_FREE_TIER_UX.md
GitHub issues updated:
- #105: Added warning about reference repo
- #113: Added warning about reference repo
* docs: Clarify UNIFIED architecture with Simple Mode INTEGRATED
- NOT two parallel universes/orchestrators
- ONE codebase handles all tiers (free + paid)
- Simple Mode behavior INTEGRATED, not separate
- Blocked by upstream bug #2562, waiting for PR #2566
* docs: Add architecture documentation for unified system
- Current state: Advanced Mode only, simple.py deleted
- Goal: ONE unified architecture (not parallel universes)
- Simple Mode INTEGRATED via HuggingFaceChatClient
- Blocked by upstream #2562, waiting for PR #2566
- Includes path forward for all scenarios
* docs: Update all bug docs for unified architecture consistency
- ACTIVE_BUGS.md: Consolidated free tier issue as single P0 blocker
- P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md: Simplified - bug fixed by unification
- All docs now consistently say: ONE unified architecture, NOT parallel universes
- Simple Mode behavior INTEGRATED via HuggingFaceChatClient
- simple.py is DELETED, not being restored
* docs: FINAL - Clear terminology, framework integration documented
Architecture:
- No API Key (Free) -> HuggingFace backend
- API Key (Paid) -> OpenAI backend
- ONE codebase, different backends, no "modes"
Framework Stack:
- Microsoft Agent Framework = orchestration (routes agents)
- Pydantic AI = structured outputs (validates data)
- Both work TOGETHER, not mutually exclusive
Blocked by upstream #2562, waiting for PR #2566.
All docs and GitHub issues now use consistent terminology.
* docs: Fix root-level docs for unified architecture
- CLAUDE.md: Remove simple.py reference, update orchestrator description
- AGENTS.md: Same fix
- GEMINI.md: Same fix
- README.md: "Two Modes" -> "Unified Architecture" + Free/Paid Tier
All root docs now consistent with unified architecture:
- ONE orchestrator (advanced.py) for all users
- Auto-selects backend: OpenAI (if key) or HuggingFace (free)
- No more "Simple Mode" vs "Advanced Mode" terminology
* fix(orchestrator): implement Accumulator Pattern to resolve Repr Bug (P0)
Implements SPEC-17 to fix the P0 'Repr Bug' where agent messages displayed raw Python object strings.
Changes:
- Implemented Accumulator Pattern in AdvancedOrchestrator to use streaming deltas as the source of truth for text content.
- Added fallback logic to handle tool-only turns safely without exposing internal object representations.
- Refactored to reduce complexity (PLR0915) by extracting three helpers.
- Added comprehensive unit tests (tests/unit/orchestrators/test_accumulator_pattern.py) verifying the fix against mocked upstream events.
- Updated documentation with SPEC-17 and Root Cause Analysis.
* docs: Add analysis for Gradio Example vs Chat Arrow behavior
- Documented the analysis of user-reported discrepancies between Example Click and Chat Arrow outputs.
- Confirmed that both actions utilize the same code path, with differences attributed to timing rather than divergent code.
- Identified the root cause as an upstream representation issue, linking to related documentation for further context.
- Provided verification steps and next actions regarding the upstream bug fix.
* fix(tests): isolate accumulator pattern tests to prevent module pollution
Refactors tests/unit/orchestrators/test_accumulator_pattern.py to use scoped fixtures for patching sys.modules instead of global module-level patching. This prevents side effects on other tests (like test_advanced_events.py and test_chat_client_factory.py).
Changes:
- Moved mock setup into 'mock_agent_framework' fixture.
- Implemented module reloading logic for 'src.orchestrators.advanced' to ensure it picks up mocks during isolation tests and real modules afterwards.
- Updated MockOrchestratorMessageEvent signature to match real class (added 'message' arg).
- Verified all 20 related tests pass together.
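The isolation approach described above (scoped patching of `sys.modules` instead of module-level patching) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual fixture; the module name `hypothetical_agent_framework` and the helper `mocked_module` are assumptions made for the example.

```python
import importlib
import sys
import types
from contextlib import contextmanager
from unittest.mock import patch


@contextmanager
def mocked_module(name: str, **attrs):
    """Temporarily register a fake module under `name` in sys.modules."""
    fake = types.ModuleType(name)
    for key, value in attrs.items():
        setattr(fake, key, value)
    # patch.dict restores sys.modules to its previous contents on exit,
    # so the fake module cannot leak into other tests.
    with patch.dict(sys.modules, {name: fake}):
        yield fake


def demo() -> tuple[bool, bool]:
    name = "hypothetical_agent_framework"  # assumed name, for illustration only
    with mocked_module(name, MagenticAgentDeltaEvent=object):
        # Inside the scope, imports resolve to the fake module.
        inside = importlib.import_module(name).MagenticAgentDeltaEvent is object
    # Outside the scope, the fake module is gone again.
    leaked = name in sys.modules
    return inside, leaked
```

In a pytest suite the same context manager could be wrapped in a fixture that yields inside the `with` block. A module that imported the mocked names at import time would also need `importlib.reload` afterwards, which is the reloading step the commit message describes.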
* fix: Address CodeRabbit review feedback
- Add `text` language identifier to ASCII diagram code blocks (MD040)
- Fix broken URL typo: togithub.com -> github.com
- Remove unreachable dead code for MagenticAgentMessageEvent and
MagenticAgentDeltaEvent handlers in _process_event() (handled by
Accumulator Pattern in run() loop with continue statements)
* fix: Address all CodeRabbit review feedback
- Use synthesis_result.text instead of str() for AgentRunResponse
- Add Literal return type to _get_event_type_for_agent (eliminates type: ignore)
- Add `@pytest.mark.unit` markers to accumulator tests
- Add `text` language identifier to code fence in P0_SIMPLE_MODE doc
- Update P0_REPR_BUG checklist to reflect completed dead code removal
- Fix test mock to return object with .text property (matches AgentRunResponse API)
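The difference between reading `.text` and calling `str()` on a response object is the heart of the repr problem. The sketch below uses a hypothetical `FakeRunResponse` dataclass as a stand-in; it is not the real `AgentRunResponse` class, only an object with a `.text` attribute as the bullet above describes.

```python
from dataclasses import dataclass


@dataclass
class FakeRunResponse:
    """Hypothetical stand-in for a response object exposing `.text`."""

    text: str


def render_wrong(result: FakeRunResponse) -> str:
    # str() on an arbitrary object falls back to its repr, which is
    # what leaks internal object representations into the UI.
    return str(result)


def render_right(result: FakeRunResponse) -> str:
    # Reading the attribute returns only the user-facing content.
    return result.text


response = FakeRunResponse(text="Final report")
```

Here `render_right(response)` yields only the report text, while `render_wrong(response)` yields a string beginning with the class name.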
* docs: Fix markdown lint (blank line before code fence)
- .gitignore +2 -0
- AGENTS.md +7 -4
- CLAUDE.md +7 -4
- GEMINI.md +7 -4
- P0_REPR_BUG_ROOT_CAUSE_ANALYSIS.md +99 -0
- README.md +3 -2
- docs/ARCHITECTURE.md +104 -0
- docs/bugs/ACTIVE_BUGS.md +38 -30
- docs/bugs/GRADIO_EXAMPLE_VS_CHAT_ARROW_ANALYSIS.md +147 -0
- docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md +30 -190
- docs/bugs/P1_SIMPLE_MODE_REMOVED_BREAKS_FREE_TIER_UX.md +61 -0
- docs/specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md +71 -306
- docs/specs/SPEC_17_ACCUMULATOR_PATTERN.md +62 -0
- src/orchestrators/advanced.py +151 -87
- tests/unit/orchestrators/test_accumulator_pattern.py +294 -0
- tests/unit/orchestrators/test_advanced_timeout.py +4 -1
@@ -50,6 +50,8 @@ reference_repos/pydanticai-research-agent/
 reference_repos/pubmed-mcp-server/
 reference_repos/DeepCritical/
 reference_repos/GradioDemo/
+reference_repos/deepboner-hf-space/
+reference_repos/microsoft-agent-framework/
 
 # Keep the README in reference_repos
 !reference_repos/README.md
@@ -50,10 +50,13 @@ Research Report with Citations
 
 **Key Components**:
 
-- `src/orchestrators/` -
-- `
-- `
-- `langgraph_orchestrator.py` - LangGraph-based workflow
+- `src/orchestrators/` - Unified orchestrator package
+- `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
+- `factory.py` - Auto-selects backend based on API key presence
+- `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
+- `src/clients/` - LLM backend adapters
+- `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
+- `huggingface.py` - HuggingFace adapter for free tier
 - `src/tools/pubmed.py` - PubMed E-utilities search
 - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
 - `src/tools/europepmc.py` - Europe PMC search
@@ -50,10 +50,13 @@ Research Report with Citations
 
 **Key Components**:
 
-- `src/orchestrators/` -
-- `
-- `
-- `langgraph_orchestrator.py` - LangGraph-based workflow
+- `src/orchestrators/` - Unified orchestrator package
+- `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
+- `factory.py` - Auto-selects backend based on API key presence
+- `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
+- `src/clients/` - LLM backend adapters
+- `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
+- `huggingface.py` - HuggingFace adapter for free tier
 - `src/tools/pubmed.py` - PubMed E-utilities search
 - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
 - `src/tools/europepmc.py` - Europe PMC search
@@ -50,10 +50,13 @@ The project follows a **Vertical Slice Architecture** (Search -> Judge -> Orches
 
 ## Key Components
 
-- `src/orchestrators/` -
-- `
-- `
-- `langgraph_orchestrator.py` - LangGraph-based workflow
+- `src/orchestrators/` - Unified orchestrator package
+- `advanced.py` - Main orchestrator (handles both Free and Paid tiers)
+- `factory.py` - Auto-selects backend based on API key presence
+- `langgraph_orchestrator.py` - LangGraph-based workflow (experimental)
+- `src/clients/` - LLM backend adapters
+- `factory.py` - Auto-selects: OpenAI (if key) or HuggingFace (free)
+- `huggingface.py` - HuggingFace adapter for free tier
 - `src/tools/pubmed.py` - PubMed E-utilities search
 - `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
 - `src/tools/europepmc.py` - Europe PMC search
@@ -0,0 +1,99 @@
+# P0: Event Handling Implementation Spec
+
+**Status**: FIXED
+**Priority**: P0
+**Source of Truth**: `reference_repos/microsoft-agent-framework/python/samples/autogen-migration/orchestrations/04_magentic_one.py`
+
+---
+
+## Root Cause (One Sentence)
+
+We were extracting content from `MagenticAgentMessageEvent.message` (**the wrong event type**) instead of using `MagenticAgentDeltaEvent.text` as the sole source of streaming content.
+
+---
+
+## The Fix: Correct Event Handling Per Microsoft SSOT
+
+| Event Type | Correct Usage | What We Were Doing (Wrong) |
+|------------|---------------|----------------------------|
+| `MagenticAgentDeltaEvent` | **Extract `.text`** - This is the ONLY source of content | Partially used, not accumulated |
+| `MagenticAgentMessageEvent` | **Signal only** - Agent turn complete. IGNORE `.message` | Extracting `.message.text` (hits repr bug) |
+| `MagenticFinalResultEvent` | **Extract `.message.text`** - Final synthesis result | Correct |
+
+---
+
+## Implementation: Accumulator Pattern
+
+From Microsoft's `04_magentic_one.py` (lines 108-138):
+
+```python
+# Microsoft's Pattern
+async for event in workflow.run_stream(task):
+    if isinstance(event, MagenticAgentDeltaEvent):
+        # STREAM CONTENT: Accumulate and display
+        if event.text:
+            print(event.text, end="", flush=True)
+
+    elif isinstance(event, MagenticAgentMessageEvent):
+        # SIGNAL ONLY: Agent done. Print newline. DO NOT read .message
+        print()
+
+    elif isinstance(event, MagenticFinalResultEvent):
+        # FINAL RESULT: Safe to read .message.text
+        print(event.message.text)
+```
+
+---
+
+## Our Implementation (`src/orchestrators/advanced.py`)
+
+**Status**: IMPLEMENTED (lines 241-308)
+
+```python
+# 1. Accumulate streaming content (ONLY source of truth)
+if isinstance(event, MagenticAgentDeltaEvent):
+    if event.text:
+        current_message_buffer += event.text
+        yield AgentEvent(type="streaming", message=event.text, ...)
+
+# 2. Use buffer on completion signal (IGNORE event.message)
+if isinstance(event, MagenticAgentMessageEvent):
+    text_content = current_message_buffer or "Action completed (Tool Call)"
+    yield AgentEvent(message=f"{agent_name}: {text_content[:200]}...", ...)
+    current_message_buffer = ""  # Reset for next agent
+
+# 3. Final result - safe to extract
+if isinstance(event, MagenticFinalResultEvent):
+    text = self._extract_text(event.message)
+    yield AgentEvent(type="complete", message=text, ...)
+```
+
+---
+
+## Why This Eliminates the Repr Bug
+
+The repr bug occurs at `_magentic.py:1730`:
+
+```python
+text = last.text or str(last)  # Falls back to repr() for tool-only messages
+```
+
+By **never reading** `MagenticAgentMessageEvent.message.text`, we never hit this code path.
+
+**The repr bug is eliminated by correct implementation: no upstream fix required.**
+
+---
+
+## Verification Checklist
+
+- [x] `MagenticAgentDeltaEvent.text` used as sole content source
+- [x] `MagenticAgentMessageEvent` used as signal only (buffer consumed, not `.message`)
+- [x] `MagenticFinalResultEvent.message.text` extracted for final result
+- [x] Buffer reset on agent switch and completion
+- [x] Remove dead code path in `_process_event()` that still calls `_extract_text` on `MagenticAgentMessageEvent`
+
+---
+
+## Remaining Cleanup
+
+**DONE** - Dead code paths for `MagenticAgentMessageEvent` and `MagenticAgentDeltaEvent` have been removed from `_process_event()`. Comments now explain these events are handled by the Accumulator Pattern in `run()`.
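The Accumulator Pattern described in the spec above can be exercised without the real framework. The following is a minimal, self-contained sketch: `DeltaEvent`, `MessageEvent`, and `FinalEvent` are hypothetical stand-ins for `MagenticAgentDeltaEvent`, `MagenticAgentMessageEvent`, and `MagenticFinalResultEvent`, and `accumulate` is an illustrative name, not the method in `advanced.py`.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

FALLBACK = "Action completed (Tool Call)"


@dataclass
class DeltaEvent:
    """Stand-in for a streaming delta: the only source of text."""

    agent: str
    text: str


@dataclass
class MessageEvent:
    """Stand-in for the turn-complete signal; `message` is never read."""

    agent: str
    message: object


@dataclass
class FinalEvent:
    """Stand-in for the final synthesis result."""

    text: str


def accumulate(events: Iterable[object]) -> Iterator[tuple[str, str]]:
    buffer = ""
    for event in events:
        if isinstance(event, DeltaEvent):
            if event.text:
                buffer += event.text  # accumulate streamed text
                yield ("streaming", event.text)
        elif isinstance(event, MessageEvent):
            # Use the buffer, or a safe fallback for tool-only turns.
            yield ("agent_done", f"{event.agent}: {buffer or FALLBACK}")
            buffer = ""  # reset for the next agent
        elif isinstance(event, FinalEvent):
            yield ("complete", event.text)
```

Because `MessageEvent.message` is never touched, a tool-only turn (no deltas before the signal) produces the fallback string rather than an object repr.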
@@ -55,8 +55,9 @@ Sexual health is health. Period. Yet it remains one of the most under-researched
 - 🤖 **MCP Integration**: Use our tools from Claude Desktop or any MCP client
 - **Modal Sandbox**: Secure execution of AI-generated statistical analysis
 - 🧠 **Smart Evidence Synthesis**: LLM-powered judge evaluates and synthesizes findings
-- ⚡ **
-- **Free Tier
+- ⚡ **Unified Architecture**: Same powerful multi-agent orchestration for everyone
+- **Free Tier**: Works without API keys (HuggingFace Inference)
+- **Paid Tier**: Unlocks GPT-5 automatically when OpenAI key is provided
 
 ## Example Queries
 
@@ -0,0 +1,104 @@
+# DeepBoner Architecture
+
+> **Last Updated**: 2025-12-01
+
+---
+
+## How It Works (Simple Version)
+
+```text
+┌─────────────────────────────────────────────────────────────┐
+│                    UNIFIED ARCHITECTURE                     │
+│                                                             │
+│                   User provides API key?                    │
+│                                                             │
+│      NO (Free Tier)                YES (Paid Tier)          │
+│      ──────────────                ───────────────          │
+│      HuggingFace backend           OpenAI backend           │
+│      Qwen 2.5 72B (free)           GPT-5 (paid)             │
+│                                                             │
+│              SAME orchestration logic for both              │
+│            ONE codebase, different LLM backends             │
+└─────────────────────────────────────────────────────────────┘
+```
+
+**That's it.** No "modes." Just: do you have an API key or not?
+
+---
+
+## Current Status
+
+**Free Tier is BLOCKED** by upstream bug #2562.
+
+Once [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) merges:
+1. Update `agent-framework` dependency
+2. Free tier works
+3. Done
+
+---
+
+## Framework Stack
+
+DeepBoner uses TWO frameworks that work TOGETHER:
+
+| Framework | What It Does | Where Used |
+|-----------|--------------|------------|
+| **Microsoft Agent Framework** | Multi-agent orchestration | `src/orchestrators/advanced.py` |
+| **Pydantic AI** | Structured outputs, validation | `src/agent_factory/judges.py`, `src/agents/*.py` |
+
+**They are NOT mutually exclusive.** Microsoft AF handles the orchestration (Manager -> Search -> Judge -> Report). Pydantic AI handles structured outputs within those agents.
+
+---
+
+## LLM Backend Selection
+
+Auto-detected by `src/clients/factory.py`:
+
+```python
+def get_chat_client():
+    if settings.has_openai_key:
+        return OpenAIChatClient(...)  # Paid tier
+    else:
+        return HuggingFaceChatClient(...)  # Free tier
+```
+
+| Condition | Backend | Model |
+|-----------|---------|-------|
+| User provides OpenAI key | OpenAI | GPT-5 |
+| No API key provided | HuggingFace | Qwen 2.5 72B (free) |
+
+---
+
+## Key Files
+
+| File | Purpose |
+|------|---------|
+| `src/orchestrators/advanced.py` | Multi-agent orchestration (Microsoft AF) |
+| `src/clients/factory.py` | Auto-selects LLM backend |
+| `src/clients/huggingface.py` | HuggingFace adapter for free tier |
+| `src/agent_factory/judges.py` | Judge logic (Pydantic AI) |
+| `src/agents/*.py` | Individual agents (Pydantic AI) |
+
+---
+
+## What Was Deleted
+
+`simple.py` (778 lines) was a SEPARATE orchestrator that created a "parallel universe." It's gone. Now there's ONE orchestrator with different backends.
+
+---
+
+## Upstream Blocker
+
+**Bug:** Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.
+
+**Fix:** [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) - waiting to merge.
+
+**Tracking:** [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)
+
+---
+
+## References
+
+- [Pydantic AI](https://ai.pydantic.dev/) - Structured outputs framework
+- [Microsoft Agent Framework](https://github.com/microsoft/agent-framework) - Multi-agent orchestration
+- [AG-UI Protocol](https://www.copilotkit.ai/blog/introducing-pydantic-ai-integration-with-ag-ui) - How they integrate
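The backend selection rule in the architecture document above (OpenAI when a key is present, HuggingFace otherwise) can be illustrated in isolation. This is only a sketch under stated assumptions: `BackendChoice` and `select_backend` are hypothetical names, and the real `get_chat_client()` returns client objects rather than a descriptor. The model strings are taken from the documents in this commit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BackendChoice:
    """Hypothetical descriptor of the selected backend and model."""

    backend: str
    model: str


def select_backend(openai_api_key: Optional[str]) -> BackendChoice:
    # A missing, empty, or whitespace-only key counts as "no key".
    if openai_api_key and openai_api_key.strip():
        return BackendChoice(backend="openai", model="GPT-5")  # paid tier
    # Free tier: fall back to the HuggingFace backend.
    return BackendChoice(backend="huggingface", model="Qwen/Qwen2.5-72B-Instruct")
```

Treating a whitespace-only key as absent is a design choice of this sketch, not something the source specifies.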
@@ -1,21 +1,36 @@
|
|
| 1 |
# Active Bugs
|
| 2 |
|
| 3 |
-
> Last updated: 2025-12-01 (
|
| 4 |
>
|
| 5 |
> **Note:** Completed bug docs archived to `docs/bugs/archive/`
|
| 6 |
> **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)
|
|
|
|
| 7 |
|
| 8 |
-
## P0 - Critical
|
| 9 |
|
| 10 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 11 |
|
| 12 |
---
|
| 13 |
|
| 14 |
-
## P3 - UX Polish
|
| 15 |
-
...
|
| 16 |
## Resolved Bugs
|
| 17 |
|
| 18 |
### ~~P0 - AIFunction Not JSON Serializable~~ FIXED
|
|
|
|
| 19 |
**File:** `docs/bugs/P0_AIFUNCTION_NOT_JSON_SERIALIZABLE.md`
|
| 20 |
**Found:** 2025-12-01
|
| 21 |
**Resolved:** 2025-12-01
|
|
@@ -27,6 +42,7 @@
|
|
| 27 |
- Result: Free Tier now supports full function calling capabilities with Qwen2.5-72B.
|
| 28 |
|
| 29 |
### ~~P1 - HuggingFace Router 401 Unauthorized~~ FIXED
|
|
|
|
| 30 |
**File:** `docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md`
|
| 31 |
**Found:** 2025-12-01
|
| 32 |
**Resolved:** 2025-12-01
|
|
@@ -36,18 +52,8 @@
|
|
| 36 |
- Fix: Generated new valid HF_TOKEN, updated `.env` and Spaces secrets
|
| 37 |
- Also switched default model to `Qwen/Qwen2.5-72B-Instruct` for better reliability
|
| 38 |
|
| 39 |
-
### ~~P0 - Simple Mode Ignores Forced Synthesis~~ FIXED
|
| 40 |
-
**File:** `docs/bugs/P0_SIMPLE_MODE_FORCED_SYNTHESIS_BYPASS.md`
|
| 41 |
-
**Issue:** [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
|
| 42 |
-
**PR:** [#115](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/115) (SPEC-16)
|
| 43 |
-
**Found:** 2025-12-01
|
| 44 |
-
**Resolved:** 2025-12-01
|
| 45 |
-
|
| 46 |
-
- Problem: Simple Mode ignored forced synthesis signals from Judge.
|
| 47 |
-
- Fix: SPEC-16 unified architecture - removed Simple Mode entirely, integrated HuggingFace into Advanced Mode.
|
| 48 |
-
- Simple Mode code deleted, capability preserved via `HuggingFaceChatClient` adapter.
|
| 49 |
-
|
| 50 |
### ~~P1 - Advanced Mode Exposes Uninterpretable Chain-of-Thought~~ FIXED
|
|
|
|
| 51 |
**File:** `docs/bugs/P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md`
|
| 52 |
**PR:** [#107](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/107)
|
| 53 |
**Found:** 2025-12-01
|
|
@@ -59,6 +65,7 @@
|
|
| 59 |
- CodeRabbit review addressed: test markers, edge case handling, truncation test coverage.
|
| 60 |
|
| 61 |
### ~~P0 - Advanced Mode Timeout Yields No Synthesis~~ FIXED
|
|
|
|
| 62 |
**File:** `docs/bugs/P0_ADVANCED_MODE_TIMEOUT_NO_SYNTHESIS.md`
|
| 63 |
**Found:** 2025-11-30 (Manual Testing)
|
| 64 |
**Resolved:** 2025-12-01
|
|
@@ -75,38 +82,35 @@
|
|
| 75 |
- Tests: `tests/unit/orchestrators/test_advanced_timeout.py`
|
| 76 |
- Key files: `src/orchestrators/advanced.py`, `src/orchestrators/factory.py`, `src/services/research_memory.py`
|
| 77 |
|
| 78 |
-
### ~~P0 - Free Tier Synthesis Incorrectly Uses Server-Side API Keys~~ FIXED
|
|
|
|
| 79 |
**File:** `docs/bugs/P1_SYNTHESIS_BROKEN_KEY_FALLBACK.md`
|
| 80 |
**PR:** [#103](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/103)
|
| 81 |
**Found:** 2025-11-30 (Testing)
|
| 82 |
**Resolved:** 2025-11-30
|
| 83 |
-
**Verified:** Free Tier now produces full LLM-synthesized research reports β
|
| 84 |
|
| 85 |
-
- Problem: Simple Mode crashed with "OpenAIError" on HuggingFace Spaces
|
| 86 |
-
-
|
| 87 |
-
|
| 88 |
-
-
|
| 89 |
|
| 90 |
-
### ~~P0 - Synthesis Fails with OpenAIError in Free Mode~~ FIXED
|
| 91 |
**File:** `docs/bugs/P0_SYNTHESIS_PROVIDER_MISMATCH.md`
|
| 92 |
**Found:** 2025-11-30 (Code Audit)
|
| 93 |
**Resolved:** 2025-11-30
|
| 94 |
|
| 95 |
- Problem: "Simple Mode" (Free Tier) crashed with `OpenAIError`.
|
| 96 |
-
-
|
| 97 |
-
|
| 98 |
-
-
|
| 99 |
|
| 100 |
-
### ~~P0 - Simple Mode Never Synthesizes~~ FIXED
|
| 101 |
**PR:** [#71](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/71) (SPEC_06)
|
| 102 |
**Commit**: `5cac97d` (2025-11-29)
|
| 103 |
|
| 104 |
- Root cause: LLM-as-Judge recommendations were being IGNORED
|
| 105 |
-
-
|
| 106 |
-
- Added combined score thresholds, late-iteration logic, emergency fallback
|
| 107 |
-
- Simple mode now synthesizes instead of spinning forever
|
| 108 |
|
| 109 |
### ~~P3 - Magentic Mode Missing Termination Guarantee~~ FIXED
|
|
|
|
| 110 |
**Commit**: `d36ce3c` (2025-11-29)
|
| 111 |
|
| 112 |
- Added `final_event_received` tracking in `orchestrator_magentic.py`
|
|
@@ -114,6 +118,7 @@
|
|
| 114 |
- Verified with `test_magentic_termination.py`
|
| 115 |
|
| 116 |
### ~~P0 - Magentic Mode Report Generation~~ FIXED
|
|
|
|
| 117 |
**Commit**: `9006d69` (2025-11-29)
|
| 118 |
|
| 119 |
- Fixed `_extract_text()` to handle various message object formats
|
|
@@ -122,6 +127,7 @@
|
|
| 122 |
- Advanced mode now produces full research reports
|
| 123 |
|
| 124 |
### ~~P1 - Streaming Spam + API Key Persistence~~ FIXED
|
|
|
|
| 125 |
**Commit**: `0c9be4a` (2025-11-29)
|
| 126 |
|
| 127 |
- Streaming events now buffered (not token-by-token spam)
|
|
@@ -129,6 +135,7 @@
|
|
| 129 |
- Examples use explicit `None` values to avoid overwriting keys
|
| 130 |
|
| 131 |
### ~~P2 - Missing "Thinking" State~~ FIXED
|
|
|
|
| 132 |
**Commit**: `9006d69` (2025-11-29)
|
| 133 |
|
| 134 |
- Added `"thinking"` event type with hourglass icon
|
|
@@ -136,6 +143,7 @@
|
|
| 136 |
- Users now see feedback during 2-5 minute initial processing
|
| 137 |
|
| 138 |
### ~~P2 - Gradio Example Not Filling Chat Box~~ FIXED
|
|
|
|
| 139 |
**Commit**: `2ea01fd` (2025-11-29)
|
| 140 |
|
| 141 |
- Third example (HSDD) wasn't populating chat box when clicked
|
|
|
|
| 1 |
# Active Bugs
|
| 2 |
|
| 3 |
+
> Last updated: 2025-12-01 (21:00 PST)
|
| 4 |
>
|
| 5 |
> **Note:** Completed bug docs archived to `docs/bugs/archive/`
|
| 6 |
> **See also:** [Code Quality Audit Findings (2025-11-30)](AUDIT_FINDINGS_2025_11_30.md)
|
| 7 |
+
> **See also:** [ARCHITECTURE.md](../ARCHITECTURE.md) for unified architecture plan
|
| 8 |
|
| 9 |
+
## P0 - Critical (BLOCKED)
|
| 10 |
|
| 11 |
+
### Free Tier Broken (Upstream #2562)
|
| 12 |
+
|
| 13 |
+
**Issue:** [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105), [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
|
| 14 |
+
**Status:** BLOCKED - Waiting for upstream PR #2566
|
| 15 |
+
|
| 16 |
+
**Problem:** Free tier (Advanced Mode + HuggingFace) shows repr garbage output.
|
| 17 |
+
|
| 18 |
+
**Cause:** Microsoft Agent Framework upstream bug #2562.
|
| 19 |
+
|
| 20 |
+
**Fix:** Upstream PR #2566 will fix this. Once merged:
|
| 21 |
+
1. Update `agent-framework` dependency
|
| 22 |
+
2. Verify Advanced + HuggingFace works
|
| 23 |
+
3. Unified architecture complete
|
| 24 |
+
|
| 25 |
+
**Architecture Note:** We have ONE unified architecture. `simple.py` is deleted.
|
| 26 |
+
Simple Mode behavior is INTEGRATED via `HuggingFaceChatClient`, not a parallel orchestrator.
|
| 27 |
|
| 28 |
---
|
| 29 |
|
|
|
|
|
|
|
| 30 |
## Resolved Bugs
|
| 31 |
|
| 32 |
### ~~P0 - AIFunction Not JSON Serializable~~ FIXED
|
| 33 |
+
|
| 34 |
**File:** `docs/bugs/P0_AIFUNCTION_NOT_JSON_SERIALIZABLE.md`
|
| 35 |
**Found:** 2025-12-01
|
| 36 |
**Resolved:** 2025-12-01
|
|
|
|
| 42 |
- Result: Free Tier now supports full function calling capabilities with Qwen2.5-72B.
|
| 43 |
|
| 44 |
### ~~P1 - HuggingFace Router 401 Unauthorized~~ FIXED
|
| 45 |
+
|
| 46 |
**File:** `docs/bugs/P1_HUGGINGFACE_ROUTER_401_HYPERBOLIC.md`
|
| 47 |
**Found:** 2025-12-01
|
| 48 |
**Resolved:** 2025-12-01
|
|
|
|
| 52 |
- Fix: Generated new valid HF_TOKEN, updated `.env` and Spaces secrets
|
| 53 |
- Also switched default model to `Qwen/Qwen2.5-72B-Instruct` for better reliability
|
| 54 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 55 |
### ~~P1 - Advanced Mode Exposes Uninterpretable Chain-of-Thought~~ FIXED
|
| 56 |
+
|
| 57 |
**File:** `docs/bugs/P1_ADVANCED_MODE_UNINTERPRETABLE_CHAIN_OF_THOUGHT.md`
|
| 58 |
**PR:** [#107](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/107)
|
| 59 |
**Found:** 2025-12-01
|
|
|
|
| 65 |
- CodeRabbit review addressed: test markers, edge case handling, truncation test coverage.
|
| 66 |
|
| 67 |
### ~~P0 - Advanced Mode Timeout Yields No Synthesis~~ FIXED
|
| 68 |
+
|
| 69 |
**File:** `docs/bugs/P0_ADVANCED_MODE_TIMEOUT_NO_SYNTHESIS.md`
|
| 70 |
**Found:** 2025-11-30 (Manual Testing)
|
| 71 |
**Resolved:** 2025-12-01
|
|
|
|
| 82 |
- Tests: `tests/unit/orchestrators/test_advanced_timeout.py`
|
| 83 |
- Key files: `src/orchestrators/advanced.py`, `src/orchestrators/factory.py`, `src/services/research_memory.py`
|
| 84 |
|
| 85 |
+
### ~~P0 - Free Tier Synthesis Incorrectly Uses Server-Side API Keys~~ FIXED (Historical)
|
| 86 |
+
|
| 87 |
**File:** `docs/bugs/P1_SYNTHESIS_BROKEN_KEY_FALLBACK.md`
|
| 88 |
**PR:** [#103](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/103)
|
| 89 |
**Found:** 2025-11-30 (Testing)
|
| 90 |
**Resolved:** 2025-11-30
|
|
|
|
| 91 |
|
| 92 |
+
- Problem: Simple Mode crashed with "OpenAIError" on HuggingFace Spaces.
|
| 93 |
+
- Note: This was in the OLD Simple Mode. Now we use Unified Architecture.
|
| 94 |
+
|
| 95 |
+
### ~~P0 - Synthesis Fails with OpenAIError in Free Mode~~ FIXED (Historical)
|
| 96 |
|
|
|
|
| 97 |
**File:** `docs/bugs/P0_SYNTHESIS_PROVIDER_MISMATCH.md`
|
| 98 |
**Found:** 2025-11-30 (Code Audit)
|
| 99 |
**Resolved:** 2025-11-30
|
| 100 |
|
| 101 |
- Problem: "Simple Mode" (Free Tier) crashed with `OpenAIError`.
|
| 102 |
+
- Note: This was in the OLD Simple Mode. Now we use Unified Architecture.
|
| 103 |
+
|
| 104 |
+
### ~~P0 - Simple Mode Never Synthesizes~~ FIXED (Historical)
|
| 105 |
|
|
|
|
| 106 |
**PR:** [#71](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/pull/71) (SPEC_06)
|
| 107 |
**Commit**: `5cac97d` (2025-11-29)
|
| 108 |
|
| 109 |
- Root cause: LLM-as-Judge recommendations were being IGNORED
|
| 110 |
+
- Note: This was in the OLD Simple Mode. Now we use Unified Architecture.
|
| 111 |
|
| 112 |
### ~~P3 - Magentic Mode Missing Termination Guarantee~~ FIXED
|
| 113 |
+
|
| 114 |
**Commit**: `d36ce3c` (2025-11-29)
|
| 115 |
|
| 116 |
- Added `final_event_received` tracking in `orchestrator_magentic.py`
|
|
|
|
| 118 |
- Verified with `test_magentic_termination.py`
|
| 119 |
|
| 120 |
### ~~P0 - Magentic Mode Report Generation~~ FIXED
|
| 121 |
+
|
| 122 |
**Commit**: `9006d69` (2025-11-29)
|
| 123 |
|
| 124 |
- Fixed `_extract_text()` to handle various message object formats
|
|
|
|
| 127 |
- Advanced mode now produces full research reports
|
| 128 |
|
| 129 |
### ~~P1 - Streaming Spam + API Key Persistence~~ FIXED
|
| 130 |
+
|
| 131 |
**Commit**: `0c9be4a` (2025-11-29)
|
| 132 |
|
| 133 |
- Streaming events now buffered (not token-by-token spam)
|
|
|
|
| 135 |
- Examples use explicit `None` values to avoid overwriting keys
|
| 136 |
|
| 137 |
### ~~P2 - Missing "Thinking" State~~ FIXED
|
| 138 |
+
|
| 139 |
**Commit**: `9006d69` (2025-11-29)
|
| 140 |
|
| 141 |
- Added `"thinking"` event type with hourglass icon
|
|
|
|
| 143 |
- Users now see feedback during 2-5 minute initial processing
|
| 144 |
|
| 145 |
### ~~P2 - Gradio Example Not Filling Chat Box~~ FIXED
|
| 146 |
+
|
| 147 |
**Commit**: `2ea01fd` (2025-11-29)
|
| 148 |
|
| 149 |
- Third example (HSDD) wasn't populating chat box when clicked
|
|
@@ -0,0 +1,147 @@
| 1 |
+
# Gradio Example Click vs Chat Arrow - Code Path Analysis
|
| 2 |
+
|
| 3 |
+
**Status**: ANALYZED - NOT A BUG (Same code path, different timing)
|
| 4 |
+
**Priority**: N/A (Symptom of upstream repr bug)
|
| 5 |
+
**Analyzed**: 2025-12-01
|
| 6 |
+
**Related**: P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Symptom Reported
|
| 11 |
+
|
| 12 |
+
User observed two different outputs when:
|
| 13 |
+
1. **Clicking an Example** → Shows progress at 10%, "THINKING" message
|
| 14 |
+
2. **Clicking Chat Arrow** → Shows full 5 rounds with repr garbage
|
| 15 |
+
|
| 16 |
+
User suspected divergent code paths from vestigial Simple Mode deletion.
|
| 17 |
+
|
| 18 |
+
---
|
| 19 |
+
|
| 20 |
+
## Analysis: NO DIVERGENT CODE PATHS
|
| 21 |
+
|
| 22 |
+
### Code Trace
|
| 23 |
+
|
| 24 |
+
Both Example Click and Chat Arrow use **the exact same code path**:
|
| 25 |
+
|
| 26 |
+
```text
User Action (Example OR Chat Arrow)
         ↓
app.py:research_agent()              ← SAME FUNCTION
         ↓
app.py:configure_orchestrator()      ← SAME FUNCTION (mode="advanced" always)
         ↓
factory.py:create_orchestrator()     ← SAME FUNCTION
         ↓
factory.py:_determine_mode()         ← ALWAYS returns "advanced"
         ↓
AdvancedOrchestrator                 ← SAME CLASS
         ↓
clients/factory.py:get_chat_client() ← SAME FUNCTION
         ↓
HuggingFaceChatClient (no API key) OR OpenAIChatClient (with API key)
```
|
| 43 |
+
|
| 44 |
+
### Evidence from Code
|
| 45 |
+
|
| 46 |
+
**app.py:279-325 - ChatInterface Setup:**
|
| 47 |
+
```python
demo = gr.ChatInterface(
    fn=research_agent,  # ← SAME FUNCTION FOR BOTH
    examples=[
        ["What drugs improve female libido post-menopause?", "sexual_health", None, None],
        # ...
    ],
    # ...
)
```
|
| 57 |
+
|
| 58 |
+
**factory.py:76-90 - Mode Determination:**
|
| 59 |
+
```python
def _determine_mode(explicit_mode: str | None) -> str:
    if explicit_mode == "hierarchical":
        return "hierarchical"
    # "simple" is deprecated -> upgrade to "advanced"
    # "magentic" is alias for "advanced"
    return "advanced"  # ← ALWAYS ADVANCED
```
|
| 67 |
+
|
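The mode resolution above can be exercised in isolation. The following is a minimal sketch, assuming the function has no other inputs than the requested mode; `determine_mode` is a hypothetical standalone copy, not the project's actual `_determine_mode()`.

```python
from typing import Optional


def determine_mode(explicit_mode: Optional[str]) -> str:
    """Resolve a requested mode name to the orchestrator mode actually used.

    Hypothetical standalone copy of the logic quoted above.
    """
    if explicit_mode == "hierarchical":
        return "hierarchical"
    # "simple" (deprecated), "magentic" (alias), None, and any other value
    # all fall through to "advanced".
    return "advanced"


requested = [None, "simple", "magentic", "advanced", "hierarchical"]
resolved = {mode: determine_mode(mode) for mode in requested}
print(resolved)
```

Because every non-hierarchical request collapses to the same value, an Example click and a Chat Arrow submit cannot end up in different orchestrators through this function.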
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
## Explanation of Visual Difference
|
| 71 |
+
|
| 72 |
+
The difference the user observed is **timing**, not code paths:
|
| 73 |
+
|
| 74 |
+
| Screenshot | When Captured | Content |
|
| 75 |
+
|------------|---------------|---------|
|
| 76 |
+
| Example Click | Mid-execution | Progress bar at 10%, "THINKING" |
|
| 77 |
+
| Chat Arrow | After completion | Full 5 rounds with repr garbage |
|
| 78 |
+
|
| 79 |
+
**Both show the same process at different stages.**
|
| 80 |
+
|
| 81 |
+
The repr garbage (`<agent_framework._types.ChatMessage object at 0x...>`) appears in BOTH:
|
| 82 |
+
- Example Click: Would show repr garbage if captured after completion
|
| 83 |
+
- Chat Arrow: Shows repr garbage because it was captured after completion
|
| 84 |
+
|
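The timing explanation can be illustrated with a toy event stream. This is only a sketch under stated assumptions: `research_events` and `snapshot` are hypothetical stand-ins, and the event strings are illustrative rather than the application's real output.

```python
from typing import Iterator, List


def research_events() -> Iterator[str]:
    """Hypothetical stand-in for the events emitted by one research run."""
    yield "THINKING (progress 10%)"
    for round_number in range(1, 6):
        # Illustrative placeholder for the repr garbage seen in each round.
        yield f"ROUND {round_number}: <agent_framework._types.ChatMessage object at 0x...>"


def snapshot(events: Iterator[str], after: int) -> List[str]:
    """Return what a screenshot would show once `after` events have arrived."""
    seen: List[str] = []
    for index, event in enumerate(events, start=1):
        seen.append(event)
        if index == after:
            break
    return seen


mid_run = snapshot(research_events(), after=1)   # like the Example Click capture
finished = snapshot(research_events(), after=6)  # like the Chat Arrow capture
```

Both snapshots come from the same generator; they differ only in how many events had been consumed when the capture was taken.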
| 85 |
+
---
|
| 86 |
+
|
| 87 |
+
## The Real Bug: Upstream repr Issue
|
| 88 |
+
|
| 89 |
+
The repr garbage is the **upstream Microsoft Agent Framework bug** documented in:
|
| 90 |
+
- `docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md`
|
| 91 |
+
|
| 92 |
+
**Root cause in upstream code:**
|
| 93 |
+
```python
|
| 94 |
+
# agent_framework/_workflows/_magentic.py line ~1799
|
| 95 |
+
text = last.text or str(last) # BUG: str(last) gives repr for tool-only messages
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
**Our workaround in advanced.py:**
|
| 99 |
+
```python
def _extract_text(self, message: Any) -> str:
    # Filter out repr strings
    if isinstance(message, str) and message.startswith("<") and "object at" in message:
        return ""
    # ...
```
|
| 106 |
+
|
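A slightly stricter variant of this filter is sketched below. It assumes that message objects expose a `.text` attribute (as the upstream snippet suggests); `looks_like_repr` and `extract_text` are hypothetical helper names, not functions from `advanced.py`.

```python
import re
from typing import Any

# Matches default object reprs such as
# "<agent_framework._types.ChatMessage object at 0x7fd3f8617b10>".
_REPR_PATTERN = re.compile(r"^<[\w.]+ object at 0x[0-9a-fA-F]+>$")


def looks_like_repr(text: str) -> bool:
    """Return True when the whole string is a default Python object repr."""
    return bool(_REPR_PATTERN.match(text.strip()))


def extract_text(message: Any) -> str:
    """Return displayable text, or an empty string for repr garbage."""
    if isinstance(message, str):
        return "" if looks_like_repr(message) else message
    # Assumption: message objects carry their content in a `.text` attribute.
    text = getattr(message, "text", None)
    if isinstance(text, str) and text and not looks_like_repr(text):
        return text
    return ""
```

Anchoring the pattern to the full string avoids dropping legitimate text that merely starts with `<` and happens to contain the words "object at".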
| 107 |
+
---
|
| 108 |
+
|
| 109 |
+
## Verification
|
| 110 |
+
|
| 111 |
+
1. **No vestigial Simple Mode code** - `simple.py` is deleted, not imported anywhere
|
| 112 |
+
2. **Factory always returns AdvancedOrchestrator** - verified in `factory.py:66-73`
|
| 113 |
+
3. **Same research_agent function** - Gradio routes both Example and Chat Arrow through it
|
| 114 |
+
|
| 115 |
+
---
|
| 116 |
+
|
| 117 |
+
## Conclusion
|
| 118 |
+
|
| 119 |
+
**There are NO divergent code paths.** The unified architecture is correctly implemented:
|
| 120 |
+
|
| 121 |
+
| Component | Status |
|-----------|--------|
| Simple Mode | ✅ DELETED (no vestigial code) |
| Factory Pattern | ✅ Always returns AdvancedOrchestrator |
| Chat Client Factory | ✅ Auto-selects HuggingFace (free) or OpenAI (paid) |
| Example Click | ✅ Uses same `research_agent()` function |
| Chat Arrow Click | ✅ Uses same `research_agent()` function |
|
| 128 |
+
|
| 129 |
+
**The only bug is the upstream repr display issue**, which affects BOTH paths equally.
|
| 130 |
+
|
| 131 |
+
---
|
| 132 |
+
|
| 133 |
+
## Next Steps
|
| 134 |
+
|
| 135 |
+
1. **Wait for upstream fix** - [PR #2566](https://github.com/microsoft/agent-framework/pull/2566)
|
| 136 |
+
2. **Once merged**: `uv add agent-framework@latest`
|
| 137 |
+
3. **Test**: Verify both Example Click and Chat Arrow work identically
|
| 138 |
+
|
| 139 |
+
---
|
| 140 |
+
|
| 141 |
+
## References
|
| 142 |
+
|
| 143 |
+
- `src/app.py` - Line 134-247 (`research_agent()`)
|
| 144 |
+
- `src/app.py` - Line 279-325 (ChatInterface with examples)
|
| 145 |
+
- `src/orchestrators/factory.py` - Line 43-73 (`create_orchestrator()`)
|
| 146 |
+
- `src/clients/factory.py` - Line 15-76 (`get_chat_client()`)
|
| 147 |
+
- `docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md` - Upstream repr bug details
|
|
@@ -1,219 +1,59 @@
|
|
| 1 |
-
# P0 BUG: Simple Mode
|
| 2 |
|
| 3 |
-
**Status**:
|
| 4 |
**Priority**: P0 (Demo-blocking)
|
| 5 |
**Discovered**: 2025-12-01
|
| 6 |
-
**Affected Component**: `src/orchestrators/simple.py`
|
| 7 |
-
**Strategic Fix**: [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md)
|
| 8 |
**GitHub Issue**: [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
|
| 9 |
|
| 10 |
-
> **Decision**: Instead of patching Simple Mode, we will **INTEGRATE its capability into Advanced Mode** per SPEC_16.
|
| 11 |
-
>
|
| 12 |
-
> **What this means:**
|
| 13 |
-
> - ✅ Free-tier HuggingFace capability is PRESERVED via `HuggingFaceChatClient`
|
| 14 |
-
> - ✅ Users without API keys still get full functionality (Advanced Mode + HuggingFace backend)
|
| 15 |
-
> - Simple Mode's redundant orchestration CODE is retired (not the capability!)
|
| 16 |
-
> - The bug disappears because Advanced Mode's Manager agent handles termination correctly
|
| 17 |
-
|
| 18 |
---
|
| 19 |
|
| 20 |
-
##
|
| 21 |
-
|
| 22 |
-
When HuggingFace Inference API fails 3 consecutive times, the `HFInferenceJudgeHandler` correctly returns a "forced synthesis" assessment with `sufficient=True, recommendation="synthesize"`. However, **Simple Mode's `_should_synthesize()` method ignores this signal** because of overly strict code-enforced thresholds.
|
| 23 |
-
|
| 24 |
-
### Observed Behavior
|
| 25 |
-
|
| 26 |
-
```
|
| 27 |
-
✅ JUDGE_COMPLETE: Assessment: synthesize (confidence: 10%)
|
| 28 |
-
LOOPING: Gathering more evidence... ← BUG: Should have synthesized!
|
| 29 |
-
```
|
| 30 |
-
|
| 31 |
-
The orchestrator loops **10 full iterations** despite the judge repeatedly saying "synthesize" after iteration 4.
|
| 32 |
|
| 33 |
-
|
| 34 |
|
| 35 |
-
|
| 36 |
-
- `sufficient=True`
|
| 37 |
-
- `recommendation="synthesize"`
|
| 38 |
-
|
| 39 |
-
The orchestrator should **immediately synthesize**, regardless of score thresholds.
|
| 40 |
|
| 41 |
---
|
| 42 |
|
| 43 |
-
##
|
| 44 |
-
|
| 45 |
-
### The Forced Synthesis Assessment (judges.py:514-549)
|
| 46 |
-
|
| 47 |
-
```python
|
| 48 |
-
def _create_forced_synthesis_assessment(self, question, evidence):
|
| 49 |
-
return JudgeAssessment(
|
| 50 |
-
details=AssessmentDetails(
|
| 51 |
-
mechanism_score=0, # ← Problem 1: Score is 0
|
| 52 |
-
clinical_evidence_score=0, # ← Problem 2: Score is 0
|
| 53 |
-
drug_candidates=["AI analysis required..."],
|
| 54 |
-
key_findings=findings,
|
| 55 |
-
),
|
| 56 |
-
sufficient=True, # ← Correct: Says sufficient
|
| 57 |
-
confidence=0.1, # ← Problem 3: Too low for emergency
|
| 58 |
-
recommendation="synthesize", # ← Correct: Says synthesize
|
| 59 |
-
...
|
| 60 |
-
)
|
| 61 |
-
```
|
| 62 |
-
|
| 63 |
-
### The _should_synthesize Logic (simple.py:159-216)
|
| 64 |
|
| 65 |
-
|
| 66 |
-
def _should_synthesize(self, assessment, iteration, max_iterations, evidence_count):
|
| 67 |
-
combined_score = mechanism_score + clinical_evidence_score # = 0
|
| 68 |
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
return True, "judge_approved"
|
| 73 |
-
|
| 74 |
-
# Priority 2-5: All require scores or drug candidates we don't have
|
| 75 |
-
|
| 76 |
-
# Priority 6: Emergency synthesis
|
| 77 |
-
if is_late_iteration and evidence_count >= 30 and confidence >= 0.5:
|
| 78 |
-
# ← 0.1 >= 0.5 is FALSE!
|
| 79 |
-
return True, "emergency_synthesis"
|
| 80 |
-
|
| 81 |
-
return False, "continue_searching" # ← Always ends up here!
|
| 82 |
```
|
| 83 |
|
| 84 |
-
### The Bug
|
| 85 |
-
|
| 86 |
-
1. **Priority 1 has wrong precondition**: It checks `combined_score >= 10` even when the judge explicitly says "synthesize". The score check should be skipped when it's a forced/error recovery synthesis.
|
| 87 |
-
|
| 88 |
-
2. **Priority 6 confidence threshold is too high**: 0.5 confidence is reasonable for "emergency" synthesis, but forced synthesis from API failures uses 0.1 confidence to indicate low quality; this should still trigger synthesis.
|
| 89 |
-
|
| 90 |
-
---
|
| 91 |
-
|
| 92 |
-
## Impact
|
| 93 |
-
|
| 94 |
-
- **User sees**: 10 iterations of "Gathering more evidence" with 0% confidence
|
| 95 |
-
- **Final output**: Partial synthesis with "Max iterations reached"
|
| 96 |
-
- **Time wasted**: ~2-3 minutes of useless API calls
|
| 97 |
-
- **UX**: Extremely confusing - user sees "synthesize" but system keeps searching
|
| 98 |
-
|
| 99 |
---
|
| 100 |
|
| 101 |
-
##
|
| 102 |
|
| 103 |
-
|
| 104 |
|
| 105 |
-
|
| 106 |
|
| 107 |
-
|
| 108 |
-
|
| 109 |
-
**Delete Simple Mode entirely and unify on Advanced Mode.**
|
| 110 |
-
|
| 111 |
-
See: [SPEC_16: Unified Chat Client Architecture](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md)
|
| 112 |
-
|
| 113 |
-
The implementation path:
|
| 114 |
-
|
| 115 |
-
1. **Phase 1**: Create `HuggingFaceChatClient` adapter (~150 lines)
|
| 116 |
-
- Implements `agent_framework.BaseChatClient`
|
| 117 |
-
- Wraps `huggingface_hub.InferenceClient`
|
| 118 |
-
- Enables Advanced Mode to work with free tier
|
| 119 |
-
|
| 120 |
-
2. **Phase 2**: Delete Simple Mode
|
| 121 |
-
- Remove `src/orchestrators/simple.py` (~778 lines)
|
| 122 |
-
- Remove `src/tools/search_handler.py` (~219 lines)
|
| 123 |
-
- Update factory to always use `AdvancedOrchestrator`
|
| 124 |
-
|
| 125 |
-
3. **Why this works**: Advanced Mode uses Microsoft Agent Framework's built-in termination. When JudgeAgent returns "SUFFICIENT EVIDENCE" (per SPEC_15), the Manager agent immediately delegates to ReportAgent. **No custom `_should_synthesize()` thresholds needed.**
|
| 126 |
-
|
| 127 |
-
### Why Unification > Patching
|
| 128 |
-
|
| 129 |
-
| Approach | Lines Changed | Bug Fixed? | Technical Debt |
|
| 130 |
-
|----------|---------------|------------|----------------|
|
| 131 |
-
| Patch Simple Mode | +20 lines | Temporarily | Adds complexity |
|
| 132 |
-
| **SPEC_16 Unification** | **-997 lines** | **Permanently** | **Eliminates 778 lines** |
|
| 133 |
-
|
| 134 |
-
---
|
| 135 |
-
|
| 136 |
-
## Files to DELETE (via SPEC_16)
|
| 137 |
-
|
| 138 |
-
| File | Lines | Reason |
|
| 139 |
-
|------|-------|--------|
|
| 140 |
-
| `src/orchestrators/simple.py` | 778 | Contains buggy `_should_synthesize()` - entire file deleted |
|
| 141 |
-
| `src/tools/search_handler.py` | 219 | Manager agent handles orchestration in Advanced Mode |
|
| 142 |
-
|
| 143 |
-
## Files to CREATE (via SPEC_16)
|
| 144 |
-
|
| 145 |
-
| File | Lines | Purpose |
|
| 146 |
-
|------|-------|---------|
|
| 147 |
-
| `src/clients/__init__.py` | ~10 | Package exports |
|
| 148 |
-
| `src/clients/factory.py` | ~50 | `get_chat_client()` factory |
|
| 149 |
-
| `src/clients/huggingface.py` | ~150 | `HuggingFaceChatClient` adapter |
|
| 150 |
-
|
| 151 |
-
**Net change: -997 lines deleted, +210 lines added = ~787 lines removed**
|
| 152 |
-
|
| 153 |
-
---
|
| 154 |
-
|
| 155 |
-
## Acceptance Criteria (SPEC_16 Implementation)
|
| 156 |
-
|
| 157 |
-
- [ ] `HuggingFaceChatClient` implements `agent_framework.BaseChatClient`
|
| 158 |
-
- [ ] `get_chat_client()` returns HuggingFace client when no OpenAI key
|
| 159 |
-
- [ ] `AdvancedOrchestrator` works with HuggingFace backend
|
| 160 |
-
- [ ] `simple.py` is deleted (778 lines removed)
|
| 161 |
-
- [ ] Free tier users get Advanced Mode with HuggingFace
|
| 162 |
-
- [ ] No more "continue_searching" loops when HF fails
|
| 163 |
-
- [ ] Manager agent respects "SUFFICIENT EVIDENCE" signal (SPEC_15)
|
| 164 |
-
|
| 165 |
-
---
|
| 166 |
-
|
| 167 |
-
## Test Case (SPEC_16 Verification)
|
| 168 |
-
|
| 169 |
-
```python
|
| 170 |
-
@pytest.mark.asyncio
|
| 171 |
-
async def test_unified_architecture_handles_hf_failures():
|
| 172 |
-
"""
|
| 173 |
-
After SPEC_16: Free tier uses Advanced Mode with HuggingFace backend.
|
| 174 |
-
When HF fails, Manager agent should trigger synthesis via ReportAgent.
|
| 175 |
-
|
| 176 |
-
This test replaces the old Simple Mode test because:
|
| 177 |
-
- simple.py is DELETED
|
| 178 |
-
- Advanced Mode handles termination via Manager agent signals
|
| 179 |
-
- No _should_synthesize() thresholds to bypass
|
| 180 |
-
"""
|
| 181 |
-
from unittest.mock import patch, MagicMock
|
| 182 |
-
from src.orchestrators.advanced import AdvancedOrchestrator
|
| 183 |
-
from src.clients.factory import get_chat_client
|
| 184 |
-
|
| 185 |
-
# Verify factory returns HuggingFace client when no OpenAI key
|
| 186 |
-
with patch("src.utils.config.settings") as mock_settings:
|
| 187 |
-
mock_settings.has_openai_key = False
|
| 188 |
-
mock_settings.has_gemini_key = False
|
| 189 |
-
mock_settings.has_huggingface_key = True
|
| 190 |
-
|
| 191 |
-
client = get_chat_client()
|
| 192 |
-
assert "HuggingFace" in type(client).__name__
|
| 193 |
-
|
| 194 |
-
# Verify AdvancedOrchestrator accepts HuggingFace client
|
| 195 |
-
# (The actual termination is handled by Manager agent respecting
|
| 196 |
-
# "SUFFICIENT EVIDENCE" signals per SPEC_15)
|
| 197 |
-
```
|
| 198 |
|
| 199 |
---
|
| 200 |
|
| 201 |
-
##
|
| 202 |
|
| 203 |
-
|
| 204 |
-
|
| 205 |
-
|
| 206 |
-
|
| 207 |
-
| [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105) | GitHub | Deprecate Simple Mode |
|
| 208 |
-
| [Issue #109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109) | GitHub | Simplify Provider Architecture |
|
| 209 |
-
| [Issue #110](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/110) | GitHub | Remove Anthropic Support |
|
| 210 |
-
| PR #71 (SPEC_06) | PR | Added `_should_synthesize()` - now causes this bug |
|
| 211 |
-
| Commit 5e761eb | Commit | Added `_create_forced_synthesis_assessment()` |
|
| 212 |
|
| 213 |
---
|
| 214 |
|
| 215 |
-
##
|
| 216 |
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 1 |
+
# P0 BUG: Simple Mode Synthesis Bypass (WILL BE FIXED BY UNIFIED ARCHITECTURE)
|
| 2 |
|
| 3 |
+
**Status**: BLOCKED - Waiting for upstream PR #2566
|
| 4 |
**Priority**: P0 (Demo-blocking)
|
| 5 |
**Discovered**: 2025-12-01
|
| 6 |
**GitHub Issue**: [#113](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/113)
|
| 7 |
|
| 8 |
---
|
| 9 |
|
| 10 |
+
## Current State
|
| 11 |
|
| 12 |
+
**`simple.py` is DELETED.** This bug existed in the old Simple Mode code.
|
| 13 |
|
| 14 |
+
The bug will NOT be fixed by restoring Simple Mode. Instead, it will be **automatically fixed** when we complete the unified architecture (after upstream PR #2566 merges).
|
| 15 |
|
| 16 |
---
|
| 17 |
|
| 18 |
+
## The Bug (Historical)
|
| 19 |
|
| 20 |
+
When HuggingFace Inference API failed, Simple Mode's `_should_synthesize()` ignored forced synthesis signals due to overly strict thresholds.
|
| 21 |
|
| 22 |
+
```text
|
| 23 |
+
✅ JUDGE_COMPLETE: Assessment: synthesize (confidence: 10%)
|
| 24 |
+
LOOPING: Gathering more evidence... ← BUG: Should have synthesized!
|
| 25 |
```
|
| 26 |
|
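For historical context, the failure mode can be sketched as follows. This is a simplified illustration, not the deleted `simple.py` code: `Assessment` is a hypothetical trimmed-down stand-in for the judge's result, and the score gate of 10 is taken from the earlier version of this bug report.

```python
from dataclasses import dataclass


@dataclass
class Assessment:
    """Hypothetical, trimmed-down stand-in for the judge's assessment."""
    sufficient: bool
    recommendation: str
    confidence: float
    combined_score: int


def should_synthesize_old(a: Assessment) -> bool:
    # Old behaviour (simplified): the score gate is applied even when the
    # judge explicitly asks to synthesize.
    return a.sufficient and a.recommendation == "synthesize" and a.combined_score >= 10


def should_synthesize_signal_first(a: Assessment) -> bool:
    # Signal-first behaviour: an explicit judge signal wins and no score
    # threshold is consulted.
    return a.sufficient and a.recommendation == "synthesize"


# Forced synthesis after repeated API failures: scores are 0, confidence is 0.1.
forced = Assessment(sufficient=True, recommendation="synthesize",
                    confidence=0.1, combined_score=0)
print(should_synthesize_old(forced), should_synthesize_signal_first(forced))
```

The first function rejects the forced assessment and the loop continues, which is the looping shown in the log above; the second honours the signal.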
| 27 |
---
|
| 28 |
|
| 29 |
+
## Why Unified Architecture Fixes This
|
| 30 |
|
| 31 |
+
| Architecture | How Termination Works |
|
| 32 |
+
|--------------|----------------------|
|
| 33 |
+
| **Old (Simple Mode)** | Custom `_should_synthesize()` with buggy thresholds |
|
| 34 |
+
| **New (Unified)** | Manager agent respects "SUFFICIENT EVIDENCE" signals |
|
| 35 |
|
| 36 |
+
The Manager agent in Advanced Mode already works correctly. By completing the unified architecture with HuggingFace support, we inherit that correct behavior.
|
| 37 |
|
| 38 |
+
**No need to patch `_should_synthesize()` because the code is deleted.**
|
| 39 |
|
| 40 |
---
|
| 41 |
|
| 42 |
+
## Path Forward
|
| 43 |
|
| 44 |
+
1. **Wait** for upstream PR #2566 to merge (fixes repr bug)
|
| 45 |
+
2. **Update** `agent-framework` dependency
|
| 46 |
+
3. **Verify** Advanced Mode + HuggingFace works
|
| 47 |
+
4. **Done** - This bug is gone (no `_should_synthesize()` thresholds)
|
| 48 |
|
| 49 |
---
|
| 50 |
|
| 51 |
+
## Related
|
| 52 |
|
| 53 |
+
| Reference | Description |
|
| 54 |
+
|-----------|-------------|
|
| 55 |
+
| [ARCHITECTURE.md](../ARCHITECTURE.md) | Current state and unified plan |
|
| 56 |
+
| [SPEC_16](../specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md) | Unified architecture spec |
|
| 57 |
+
| [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105) | GitHub tracking |
|
| 58 |
+
| [Upstream #2562](https://github.com/microsoft/agent-framework/issues/2562) | Framework bug |
|
| 59 |
+
| [Upstream PR #2566](https://github.com/microsoft/agent-framework/pull/2566) | Framework fix |
|
|
@@ -0,0 +1,61 @@
| 1 |
+
# Free Tier (No API Key) - BLOCKED by Upstream #2562
|
| 2 |
+
|
| 3 |
+
**Status**: BLOCKED - Waiting for upstream PR #2566
|
| 4 |
+
**Priority**: P1
|
| 5 |
+
**Discovered**: 2025-12-01
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Problem
|
| 10 |
+
|
| 11 |
+
Free tier (no API key provided) shows garbage output:
|
| 12 |
+
|
| 13 |
+
```
|
| 14 |
+
**SEARCH_COMPLETE**: searcher: <agent_framework._types.ChatMessage object at 0x7fd3f8617b10>
|
| 15 |
+
```
|
| 16 |
+
|
| 17 |
+
## Cause
|
| 18 |
+
|
| 19 |
+
**Upstream Bug #2562**: Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.
|
| 20 |
+
|
| 21 |
+
## Architecture
|
| 22 |
+
|
| 23 |
+
```
User provides API key?

NO (Free Tier)          YES (Paid Tier)
──────────────          ───────────────
HuggingFace backend     OpenAI backend
Qwen 2.5 72B (free)     GPT-5 (paid)

SAME orchestration, different backends
ONE codebase, not parallel universes
```
|
| 34 |
+
|
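The branching in the diagram can be sketched as a small selection function. This is an illustration only: `ChatBackend` and `select_backend` are hypothetical names and do not reproduce the project's `get_chat_client()`; the model names are the ones mentioned in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatBackend:
    """Hypothetical description of the backend a single orchestrator would use."""
    provider: str
    model: str


def select_backend(api_key: Optional[str]) -> ChatBackend:
    """Pick the backend from key presence; orchestration stays the same."""
    if api_key:
        # Paid tier: the user supplied an API key.
        return ChatBackend(provider="openai", model="gpt-5")
    # Free tier: no key provided, fall back to HuggingFace.
    return ChatBackend(provider="huggingface", model="Qwen/Qwen2.5-72B-Instruct")


free_tier = select_backend(None)
paid_tier = select_backend("sk-example")  # placeholder value, not a real key
```

Only the returned backend differs between tiers, which is the point of the "ONE codebase" note in the diagram.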
| 35 |
+
## Framework Stack
|
| 36 |
+
|
| 37 |
+
| Framework | Role |
|
| 38 |
+
|-----------|------|
|
| 39 |
+
| Microsoft Agent Framework | Multi-agent orchestration |
|
| 40 |
+
| Pydantic AI | Structured outputs & validation |
|
| 41 |
+
|
| 42 |
+
Both work TOGETHER. Not mutually exclusive.
|
| 43 |
+
|
| 44 |
+
## Fix
|
| 45 |
+
|
| 46 |
+
**Upstream PR #2566** will fix this.
|
| 47 |
+
|
| 48 |
+
Once merged:
|
| 49 |
+
1. `uv add agent-framework@latest`
|
| 50 |
+
2. Verify free tier works
|
| 51 |
+
3. Done
|
| 52 |
+
|
| 53 |
+
## What Was Deleted
|
| 54 |
+
|
| 55 |
+
`simple.py` (778 lines) was a SEPARATE orchestrator that created a parallel universe. It is now deleted, leaving ONE orchestrator with different backends.
|
| 56 |
+
|
| 57 |
+
## Related
|
| 58 |
+
|
| 59 |
+
- [Issue #105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105)
|
| 60 |
+
- [Upstream #2562](https://github.com/microsoft/agent-framework/issues/2562)
|
| 61 |
+
- [Upstream PR #2566](https://github.com/microsoft/agent-framework/pull/2566)
|
|
@@ -1,350 +1,115 @@
|
|
| 1 |
-
# SPEC_16: Unified
|
| 2 |
|
| 3 |
-
**Status**:
|
| 4 |
-
**Priority**: P0
|
| 5 |
-
**Issue**:
|
| 6 |
**Created**: 2025-12-01
|
| 7 |
-
**Last Updated**: 2025-12-01
|
| 8 |
|
| 9 |
---
|
| 10 |
|
| 11 |
-
##
|
| 12 |
-
|
| 13 |
-
**This spec INTEGRATES Simple Mode's free-tier capability into Advanced Mode.**
|
| 14 |
-
|
| 15 |
-
| What We're Doing | What We're NOT Doing |
|
| 16 |
-
|------------------|----------------------|
|
| 17 |
-
| ✅ Integrating HuggingFace support into Advanced Mode | ❌ Removing free-tier capability |
|
| 18 |
-
| ✅ Unifying two parallel implementations into one | ❌ Breaking functionality for users without API keys |
|
| 19 |
-
| ✅ Deleting redundant orchestration CODE | ❌ Deleting the CAPABILITY that code provided |
|
| 20 |
-
| ✅ Making Advanced Mode work with ANY provider | ❌ Locking users into paid-only tiers |
|
| 21 |
-
|
| 22 |
-
**After this spec:**
|
| 23 |
-
- Users WITH OpenAI key → Advanced Mode (OpenAI backend) ✅
|
| 24 |
-
- Users WITHOUT any key → Advanced Mode (HuggingFace backend) ✅ **SAME CAPABILITY, UNIFIED ARCHITECTURE**
|
| 25 |
-
|
| 26 |
-
---
|
| 27 |
-
|
| 28 |
-
## Summary
|
| 29 |
-
|
| 30 |
-
Unify Simple Mode and Advanced Mode into a **single orchestration system** by:
|
| 31 |
-
|
| 32 |
-
1. **Renaming the namespace**: `OpenAIChatClient` → `BaseChatClient` (neutral protocol)
|
| 33 |
-
2. **Creating an adapter**: `HuggingFaceChatClient` implements `BaseChatClient`
|
| 34 |
-
3. **Retiring parallel code**: Simple Mode's while-loop becomes unnecessary
|
| 35 |
-
|
| 36 |
-
The result: **One codebase, multiple providers, zero parallel universes.**
|
| 37 |
-
|
| 38 |
-
> **🔥 P0 Bug Fix**: This also resolves Issue #113. Simple Mode's `_should_synthesize()` has a bug that ignores forced synthesis signals. Advanced Mode's Manager agent handles termination correctly. By integrating, the bug disappears.
|
| 39 |
-
|
| 40 |
-
---
|
| 41 |
-
|
| 42 |
-
## The Integration Concept
|
| 43 |
-
|
| 44 |
-
### Before: Two Parallel Universes (Current)
|
| 45 |
-
|
| 46 |
-
```text
|
| 47 |
-
User Query
|
| 48 |
-
    │
|
| 49 |
-
    ├── Has API Key? ──Yes──→ Advanced Mode (488 lines)
|
| 50 |
-
    │                          └── Microsoft Agent Framework
|
| 51 |
-
    │                          └── OpenAIChatClient (hardcoded) ←── THE BOTTLENECK
|
| 52 |
-
    │
|
| 53 |
-
    └── No API Key? ─────────→ Simple Mode (778 lines)
|
| 54 |
-
                               └── While-loop orchestration (SEPARATE CODE)
|
| 55 |
-
                               └── Pydantic AI + HuggingFace
|
| 56 |
-
```
|
| 57 |
-
|
| 58 |
-
**Problem**: Same capability, two implementations, double maintenance, P0 bug in Simple Mode.
|
| 59 |
-
|
| 60 |
-
### After: Unified Architecture (This Spec)
|
| 61 |
|
| 62 |
```text
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
---
|
| 77 |
-
|
| 78 |
-
## What Gets Integrated vs Retired
|
| 79 |
-
|
| 80 |
-
### ✅ INTEGRATED (Capability Preserved)
|
| 81 |
-
|
| 82 |
-
| Simple Mode Component | Integration Target | How |
|
| 83 |
-
|-----------------------|-------------------|-----|
|
| 84 |
-
| HuggingFace LLM calls | `HuggingFaceChatClient` | New adapter (~150 lines) |
|
| 85 |
-
| Free-tier access | `get_chat_client()` factory | Auto-selects HF when no key |
|
| 86 |
-
| Search tools (PubMed, etc.) | Already shared | `src/agents/tools.py` |
|
| 87 |
-
| Evidence models | Already shared | `src/utils/models.py` |
|
| 88 |
-
|
| 89 |
-
### RETIRED (Redundant Code Removed)
|
| 90 |
-
|
| 91 |
-
| Simple Mode Component | Why Retired | Replacement in Advanced Mode |
|
| 92 |
-
|-----------------------|-------------|------------------------------|
|
| 93 |
-
| While-loop orchestration | Redundant | Manager agent orchestrates |
|
| 94 |
-
| `_should_synthesize()` thresholds | **BUGGY** (P0 #113) | Manager agent signals |
|
| 95 |
-
| `SearchHandler` scatter-gather | Redundant | SearchAgent handles this |
|
| 96 |
-
| `JudgeHandler` | Redundant | JudgeAgent handles this |
|
| 97 |
-
|
| 98 |
-
**Key insight**: We're not losing functionality. We're consolidating two implementations of the SAME functionality into one.
|
| 99 |
-
|
| 100 |
-
---
|
| 101 |
-
|
| 102 |
-
## Technical Implementation
|
| 103 |
-
|
| 104 |
-
### The Single Change That Enables Unification
|
| 105 |
-
|
| 106 |
-
```python
|
| 107 |
-
# BEFORE (hardcoded to OpenAI):
|
| 108 |
-
from agent_framework.openai import OpenAIChatClient
|
| 109 |
-
|
| 110 |
-
class AdvancedOrchestrator:
|
| 111 |
-
def __init__(self, ...):
|
| 112 |
-
self._chat_client = OpenAIChatClient(...) # ← Only OpenAI works
|
| 113 |
-
|
| 114 |
-
# AFTER (neutral - any provider):
|
| 115 |
-
from agent_framework import BaseChatClient
|
| 116 |
-
from src.clients.factory import get_chat_client
|
| 117 |
-
|
| 118 |
-
class AdvancedOrchestrator:
|
| 119 |
-
def __init__(self, ...):
|
| 120 |
-
self._chat_client = get_chat_client() # ✅ OpenAI, Gemini, OR HuggingFace
|
| 121 |
-
```
|
| 122 |
-
|
| 123 |
-
### HuggingFaceChatClient Adapter
|
| 124 |
-
|
| 125 |
-
```python
|
| 126 |
-
# src/clients/huggingface.py
|
| 127 |
-
from agent_framework import BaseChatClient, ChatMessage, ChatResponse
|
| 128 |
-
from huggingface_hub import InferenceClient
|
| 129 |
-
|
| 130 |
-
class HuggingFaceChatClient(BaseChatClient):
|
| 131 |
-
"""Adapter that makes HuggingFace work with Microsoft Agent Framework."""
|
| 132 |
-
|
| 133 |
-
def __init__(self, model_id: str = "meta-llama/Llama-3.1-70B-Instruct"):
|
| 134 |
-
self._client = InferenceClient(model=model_id)
|
| 135 |
-
self._model_id = model_id
|
| 136 |
-
|
| 137 |
-
async def _inner_get_response(
|
| 138 |
-
self,
|
| 139 |
-
messages: list[ChatMessage],
|
| 140 |
-
**kwargs
|
| 141 |
-
) -> ChatResponse:
|
| 142 |
-
"""Convert HuggingFace response to Agent Framework format."""
|
| 143 |
-
# Convert messages to HF format
|
| 144 |
-
hf_messages = [{"role": m.role, "content": m.content} for m in messages]
|
| 145 |
-
|
| 146 |
-
# Call HuggingFace
|
| 147 |
-
response = self._client.chat_completion(messages=hf_messages)
|
| 148 |
-
|
| 149 |
-
# Convert back to Agent Framework format
|
| 150 |
-
return ChatResponse(
|
| 151 |
-
content=response.choices[0].message.content,
|
| 152 |
-
# ... other fields
|
| 153 |
-
)
|
| 154 |
-
|
| 155 |
-
async def _inner_get_streaming_response(self, ...):
|
| 156 |
-
"""Streaming version."""
|
| 157 |
-
...
|
| 158 |
```
|
| 159 |
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
```python
|
| 163 |
-
# src/clients/factory.py
|
| 164 |
-
from agent_framework import BaseChatClient
|
| 165 |
-
from agent_framework.openai import OpenAIChatClient
|
| 166 |
-
from src.utils.config import settings
|
| 167 |
-
|
| 168 |
-
def get_chat_client(provider: str | None = None) -> BaseChatClient:
|
| 169 |
-
"""
|
| 170 |
-
Factory that returns the appropriate chat client.
|
| 171 |
-
|
| 172 |
-
Priority:
|
| 173 |
-
1. OpenAI (if key available) - Best function calling, GPT-5
|
| 174 |
-
2. Gemini (if key available) - Good alternative [Future]
|
| 175 |
-
3. HuggingFace (always available) - FREE TIER FALLBACK
|
| 176 |
-
"""
|
| 177 |
-
if provider == "openai" or (provider is None and settings.has_openai_key):
|
| 178 |
-
return OpenAIChatClient(
|
| 179 |
-
model_id=settings.openai_model, # gpt-5
|
| 180 |
-
api_key=settings.openai_api_key,
|
| 181 |
-
)
|
| 182 |
-
|
| 183 |
-
# Future: Gemini support
|
| 184 |
-
# if settings.has_gemini_key:
|
| 185 |
-
# return GeminiChatClient(...)
|
| 186 |
-
|
| 187 |
-
# FREE TIER: HuggingFace (no API key required for public models)
|
| 188 |
-
from src.clients.huggingface import HuggingFaceChatClient
|
| 189 |
-
return HuggingFaceChatClient(
|
| 190 |
-
model_id="meta-llama/Llama-3.1-70B-Instruct",
|
| 191 |
-
)
|
| 192 |
-
```
|
| 193 |
|
| 194 |
---
|
| 195 |
|
| 196 |
-
##
|
| 197 |
|
| 198 |
-
|
| 199 |
|
| 200 |
-
|
| 201 |
-
|
| 202 |
-
|
| 203 |
-
|
| 204 |
|
| 205 |
-
|
| 206 |
-
if combined_score >= 10:  # 0 >= 10 is FALSE
|
| 207 |
-
return True
|
| 208 |
|
| 209 |
-
|
| 210 |
-
|
| 211 |
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
### The Fix (Advanced Mode - Already Works Correctly)
|
| 216 |
|
| 217 |
-
|
| 218 |
-
# Advanced Mode doesn't have this bug because:
|
| 219 |
-
# 1. JudgeAgent says "SUFFICIENT EVIDENCE" in natural language
|
| 220 |
-
# 2. Manager agent understands this and delegates to ReportAgent
|
| 221 |
-
# 3. No hardcoded thresholds to bypass
|
| 222 |
-
|
| 223 |
-
# The Manager agent prompt (src/orchestrators/advanced.py:152):
|
| 224 |
-
"""
|
| 225 |
-
When JudgeAgent says "SUFFICIENT EVIDENCE" or "STOP SEARCHING":
|
| 226 |
-
→ IMMEDIATELY delegate to ReportAgent for synthesis
|
| 227 |
-
"""
|
| 228 |
-
```
|
| 229 |
|
| 230 |
-
|
| 231 |
|
| 232 |
---
|
| 233 |
|
| 234 |
-
##
|
| 235 |
-
|
| 236 |
-
### Phase 1: Create HuggingFaceChatClient (Enables Integration)
|
| 237 |
-
|
| 238 |
-
- [ ] Create `src/clients/` package
|
| 239 |
-
- [ ] Implement `HuggingFaceChatClient` (~150 lines)
|
| 240 |
-
- Extends `agent_framework.BaseChatClient`
|
| 241 |
-
- Wraps `huggingface_hub.InferenceClient.chat_completion()`
|
| 242 |
-
- Implements required abstract methods
|
| 243 |
-
- [ ] Implement `get_chat_client()` factory (~50 lines)
|
| 244 |
-
- [ ] Add unit tests
|
| 245 |
-
|
| 246 |
-
**Exit Criteria**: `get_chat_client()` returns working HuggingFace client when no API key.
|
| 247 |
-
|
| 248 |
-
### Phase 2: Integrate into Advanced Mode (Fixes P0 Bug)
|
| 249 |
|
| 250 |
-
-
|
| 251 |
-
- [ ] Update `magentic_agents.py` type hints: `OpenAIChatClient` → `BaseChatClient`
|
| 252 |
-
- [ ] Update `orchestrators/factory.py` to always return `AdvancedOrchestrator`
|
| 253 |
-
- [ ] Update `app.py` to remove mode toggle (everyone gets Advanced Mode)
|
| 254 |
-
- [ ] Archive `simple.py` to `docs/archive/` (for reference)
|
| 255 |
-
- [ ] Migrate Simple Mode tests to Advanced Mode tests
|
| 256 |
|
| 257 |
-
|
| 258 |
-
|
| 259 |
-
|
| 260 |
-
|
| 261 |
-
- [ ] Remove Anthropic provider code (Issue #110)
|
| 262 |
-
- [ ] Add Gemini support (Issue #109)
|
| 263 |
-
- [ ] Delete archived files after verification period
|
| 264 |
|
| 265 |
---
|
| 266 |
|
| 267 |
-
##
|
| 268 |
-
|
| 269 |
-
### New Files (~200 lines)
|
| 270 |
-
|
| 271 |
-
| File | Lines | Purpose |
|
| 272 |
-
|------|-------|---------|
|
| 273 |
-
| `src/clients/__init__.py` | ~10 | Package exports |
|
| 274 |
-
| `src/clients/factory.py` | ~50 | `get_chat_client()` |
|
| 275 |
-
| `src/clients/huggingface.py` | ~150 | HuggingFace adapter |
|
| 276 |
|
| 277 |
-
|
| 278 |
|
| 279 |
-
|
| 280 |
-
|------|--------|
|
| 281 |
-
| `src/orchestrators/advanced.py` | Use `get_chat_client()` instead of `OpenAIChatClient` |
|
| 282 |
-
| `src/orchestrators/factory.py` | Always return `AdvancedOrchestrator` |
|
| 283 |
-
| `src/agents/magentic_agents.py` | Type hints: `OpenAIChatClient` → `BaseChatClient` |
|
| 284 |
-
| `src/app.py` | Remove mode toggle, always use Advanced |
|
| 285 |
|
| 286 |
-
|
| 287 |
-
|
| 288 |
-
|
| 289 |
-
|
| 290 |
-
| `src/orchestrators/simple.py` | 778 | Functionality INTEGRATED, code retired |
|
| 291 |
-
| `src/tools/search_handler.py` | 219 | Manager agent handles this now |
|
| 292 |
|
| 293 |
---
|
| 294 |
|
| 295 |
-
##
|
| 296 |
-
|
| 297 |
-
### Technical Prerequisites (Verified ✅)
|
| 298 |
-
|
| 299 |
-
- [x] `agent_framework.BaseChatClient` exists
|
| 300 |
-
- [x] Abstract methods: `_inner_get_response`, `_inner_get_streaming_response`
|
| 301 |
-
- [x] `huggingface_hub.InferenceClient.chat_completion()` exists
|
| 302 |
-
- [x] `chat_completion()` has `tools` parameter (verified in 0.36.0)
|
| 303 |
-
- [x] HuggingFace supports Llama 3.1 70B via free inference
|
| 304 |
-
- [x] **Dependency pinned**: `huggingface-hub>=0.24.0` in pyproject.toml (required for stable tool calling)
|
| 305 |
-
|
| 306 |
-
### Capability Preservation Checklist
|
| 307 |
|
| 308 |
-
|
| 309 |
|
| 310 |
-
|
| 311 |
-
- [ ] User with NO key → Gets Advanced Mode with HuggingFace (Llama 3.1 70B)
|
| 312 |
-
- [ ] Free-tier search works (PubMed, ClinicalTrials, EuropePMC)
|
| 313 |
-
- [ ] Free-tier synthesis works (LLM generates report)
|
| 314 |
-
- [ ] No more "continue_searching" infinite loops (P0 bug fixed)
|
| 315 |
|
| 316 |
---
|
| 317 |
|
| 318 |
-
##
|
| 319 |
-
|
| 320 |
-
### Dependency Requirement ✅ FIXED
|
| 321 |
-
|
| 322 |
-
The `huggingface-hub` package must be `>=0.24.0` for stable `chat_completion` with tools support.
|
| 323 |
-
|
| 324 |
-
```toml
|
| 325 |
-
# pyproject.toml - ALREADY UPDATED
|
| 326 |
-
"huggingface-hub>=0.24.0", # Required for stable chat_completion with tools
|
| 327 |
-
```
|
| 328 |
-
|
| 329 |
-
### Llama 3.1 Prompt Considerations ⚠️
|
| 330 |
-
|
| 331 |
-
The Manager agent prompt in `AdvancedOrchestrator._create_task_prompt()` was optimized for GPT-5. When using Llama 3.1 70B via HuggingFace, the prompt **may need tuning** to ensure strict adherence to delegation logic.
|
| 332 |
-
|
| 333 |
-
**Potential issue**: Llama 3.1 may not immediately delegate to ReportAgent when JudgeAgent says "SUFFICIENT EVIDENCE".
|
| 334 |
-
|
| 335 |
-
**Mitigation**: During implementation, test with HuggingFace backend and add reinforcement phrases if needed:
|
| 336 |
-
- "You MUST delegate to ReportAgent when you see SUFFICIENT EVIDENCE"
|
| 337 |
-
- "Do NOT continue searching after Judge approves"
|
| 338 |
|
| 339 |
-
|
| 340 |
|
| 341 |
---
|
| 342 |
|
| 343 |
## References
|
| 344 |
|
| 345 |
-
- Microsoft Agent Framework
|
| 346 |
-
-
|
| 347 |
-
-
|
| 348 |
-
-
|
| 349 |
-
- Issue #110: Remove Anthropic Provider Support
|
| 350 |
-
- Issue #113: P0 Bug - Simple Mode ignores forced synthesis
|
|
|
|
| 1 |
+
# SPEC_16: Unified Architecture
|
| 2 |
|
| 3 |
+
**Status**: BLOCKED - Waiting for upstream PR #2566
|
| 4 |
+
**Priority**: P0
|
| 5 |
+
**Issue**: [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105)
|
| 6 |
**Created**: 2025-12-01
|
|
|
|
| 7 |
|
| 8 |
---
|
| 9 |
|
| 10 |
+
## The Architecture (No Bullshit Version)
|
| 11 |
|
| 12 |
```text
|
| 13 |
+
┌─────────────────────────────────────────────────────────────┐
|
| 14 |
+
│                    UNIFIED ARCHITECTURE                     │
|
| 15 |
+
│                                                             │
|
| 16 |
+
│   User provides API key?                                    │
|
| 17 |
+
│                                                             │
|
| 18 |
+
│   NO (Free Tier)                YES (Paid Tier)             │
|
| 19 |
+
│   ──────────────                ───────────────             │
|
| 20 |
+
│   HuggingFace backend           OpenAI backend              │
|
| 21 |
+
│   Qwen 2.5 72B (free)           GPT-5 (paid)                │
|
| 22 |
+
│                                                             │
|
| 23 |
+
│   SAME orchestration logic for both                         │
|
| 24 |
+
│   ONE codebase, different LLM backends                      │
|
| 25 |
+
└─────────────────────────────────────────────────────────────┘
|
| 26 |
```
|
| 27 |
|
| 28 |
+
**No "modes."** Just: do you have an API key or not?
|
| 29 |
|
| 30 |
---
|
| 31 |
|
| 32 |
+
## Framework Stack
|
| 33 |
|
| 34 |
+
DeepBoner uses TWO frameworks that work TOGETHER:
|
| 35 |
|
| 36 |
+
| Framework | Role | Files |
|
| 37 |
+
|-----------|------|-------|
|
| 38 |
+
| **Microsoft Agent Framework** | Multi-agent ORCHESTRATION | `src/orchestrators/advanced.py` |
|
| 39 |
+
| **Pydantic AI** | Structured OUTPUTS & validation | `src/agent_factory/judges.py`, `src/agents/*.py` |
|
| 40 |
|
| 41 |
+
### Why Both?
|
| 42 |
|
| 43 |
+
- **Microsoft AF** handles: Manager → Search → Judge → Report agent coordination
|
| 44 |
+
- **Pydantic AI** handles: Structured responses, type validation, schema enforcement
|
| 45 |
|
| 46 |
+
They are **NOT mutually exclusive**. They are **complementary**:
|
| 47 |
+
- Microsoft AF = the highway system (routes agents)
|
| 48 |
+
- Pydantic AI = the cargo containers (structures data)
|
|
|
|
| 49 |
|
| 50 |
+
### Current Integration
|
| 51 |
|
| 52 |
+
| Component | Framework | Purpose |
|
| 53 |
+
|-----------|-----------|---------|
|
| 54 |
+
| `AdvancedOrchestrator` | Microsoft AF | Coordinates multi-agent workflow |
|
| 55 |
+
| `JudgeAssessment` | Pydantic AI | Structured judge output with validation |
|
| 56 |
+
| `Evidence`, `Citation` | Pydantic | Validated data models |
|
| 57 |
+
| Agent tool calling | Microsoft AF | Function execution |
|
| 58 |
+
| Agent structured output | Pydantic AI | Response validation |
|
| 59 |
|
| 60 |
---
|
| 61 |
|
| 62 |
+
## LLM Backend Selection
|
| 63 |
|
| 64 |
+
Auto-detected by `src/clients/factory.py`:
|
| 65 |
|
| 66 |
+
| Condition | Backend | Model |
|
| 67 |
+
|-----------|---------|-------|
|
| 68 |
+
| User provides OpenAI key | OpenAI | GPT-5 |
|
| 69 |
+
| No API key | HuggingFace | Qwen 2.5 72B (free) |
|
| 70 |
|
| 71 |
---
|
| 72 |
|
| 73 |
+
## Current Blocker
|
| 74 |
|
| 75 |
+
**Upstream Bug #2562**: Microsoft Agent Framework produces `repr()` garbage for tool-call-only messages.
|
| 76 |
|
| 77 |
+
**Fix**: [PR #2566](https://github.com/microsoft/agent-framework/pull/2566) - waiting for merge.
|
| 78 |
|
| 79 |
+
**Once merged**:
|
| 80 |
+
1. `uv add agent-framework@latest`
|
| 81 |
+
2. Verify free tier works
|
| 82 |
+
3. Done
|
| 83 |
|
| 84 |
---
|
| 85 |
|
| 86 |
+
## What Was Deleted
|
| 87 |
|
| 88 |
+
`simple.py` (778 lines) was a SEPARATE orchestrator that created a parallel universe:
|
| 89 |
+
- Used Pydantic AI directly for LLM calls
|
| 90 |
+
- Had its own while-loop orchestration
|
| 91 |
+
- Duplicated search/judge logic
|
| 92 |
|
| 93 |
+
Now there's ONE orchestrator with different backends.
|
| 94 |
|
| 95 |
---
|
| 96 |
|
| 97 |
+
## Files
|
| 98 |
|
| 99 |
+
| File | Framework | Purpose |
|
| 100 |
+
|------|-----------|---------|
|
| 101 |
+
| `src/orchestrators/advanced.py` | Microsoft AF | Multi-agent orchestration |
|
| 102 |
+
| `src/clients/factory.py` | - | Auto-selects LLM backend |
|
| 103 |
+
| `src/clients/huggingface.py` | - | HuggingFace adapter (free tier) |
|
| 104 |
+
| `src/agent_factory/judges.py` | Pydantic AI | Structured judge assessments |
|
| 105 |
+
| `src/agents/report_agent.py` | Pydantic AI | Structured report generation |
|
| 106 |
+
| `src/utils/models.py` | Pydantic | Data models (Evidence, Citation) |
|
| 107 |
|
| 108 |
---
|
| 109 |
|
| 110 |
## References
|
| 111 |
|
| 112 |
+
- [Microsoft Agent Framework](https://github.com/microsoft/agent-framework) - Multi-agent orchestration
|
| 113 |
+
- [Pydantic AI](https://ai.pydantic.dev/) - Structured outputs framework
|
| 114 |
+
- [Multi-Agent Research System with Pydantic](https://www.analyticsvidhya.com/blog/2025/03/multi-agent-research-assistant-system-using-pydantic/) - Architecture pattern
|
| 115 |
+
- [AG-UI Protocol](https://www.copilotkit.ai/blog/introducing-pydantic-ai-integration-with-ag-ui) - How frameworks integrate
|
@@ -0,0 +1,62 @@
|
| 1 |
+
# SPEC 17: Accumulator Pattern for Agent Events
|
| 2 |
+
|
| 3 |
+
**Status**: IMPLEMENTED
|
| 4 |
+
**Created**: 2025-12-02
|
| 5 |
+
**Author**: AI Agent
|
| 6 |
+
**Related**: P0_REPR_BUG_ROOT_CAUSE_ANALYSIS.md
|
| 7 |
+
|
| 8 |
+
## 1. Context
|
| 9 |
+
|
| 10 |
+
The Microsoft Agent Framework event model has a specific intended usage pattern:
|
| 11 |
+
- `MagenticAgentDeltaEvent.text` → **Content Source** (Streaming)
|
| 12 |
+
- `MagenticAgentMessageEvent` → **Completion Signal** (End of Turn)
|
| 13 |
+
|
| 14 |
+
Our previous implementation incorrectly attempted to extract content from `MagenticAgentMessageEvent.message`. This property is not designed for content extraction and can contain internal representation data (repr strings) for tool-only messages. This led to the "repr bug" where users saw raw Python object strings in the UI.
|
| 15 |
+
|
| 16 |
+
The **Accumulator Pattern** aligns our codebase with Microsoft's intended architecture (as demonstrated in their `04_magentic_one.py` sample) and resolves the display issues by using the correct event data source.
|
| 17 |
+
|
| 18 |
+
## 2. The Solution: Accumulator Pattern
|
| 19 |
+
|
| 20 |
+
Instead of relying on the final message event for content, we adopt the **Accumulator Pattern**: text is collected from the streaming delta events, and the message event only marks the end of a turn.
|
| 21 |
+
|
| 22 |
+
### 2.1 Core Concept
|
| 23 |
+
|
| 24 |
+
1. **Streaming is Truth**: `MagenticAgentDeltaEvent` is the exclusive source of text content. These events are not affected by the upstream bug.
|
| 25 |
+
2. **Accumulation**: The orchestrator maintains a stateful buffer (`current_message_buffer`) that appends text from delta events.
|
| 26 |
+
3. **Signal Processing**: `MagenticAgentMessageEvent` is treated solely as a completion signal ("end of turn"). When received, we consume the buffer to form the final UI message and then clear the buffer.
|
| 27 |
+
|
| 28 |
+
### 2.2 Logic Flow
|
| 29 |
+
|
| 30 |
+
```python
|
| 31 |
+
current_message_buffer = ""
|
| 32 |
+
|
| 33 |
+
for event in stream:
|
| 34 |
+
    if isinstance(event, MagenticAgentDeltaEvent):
|
| 35 |
+
        current_message_buffer += event.text
|
| 36 |
+
        emit_streaming_event(event.text)
|
| 37 |
+
|
| 38 |
+
    elif isinstance(event, MagenticAgentMessageEvent):
|
| 39 |
+
        # IGNORE event.message (it might be corrupted)
|
| 40 |
+
        final_text = current_message_buffer
|
| 41 |
+
        if not final_text:
|
| 42 |
+
            final_text = "Action completed (Tool Call)"
|
| 43 |
+
|
| 44 |
+
        emit_complete_event(final_text)
|
| 45 |
+
        current_message_buffer = ""
|
| 46 |
+
```
|
| 47 |
+
|
| 48 |
+
## 3. Test Plan
|
| 49 |
+
|
| 50 |
+
To verify that this pattern produces correct output regardless of upstream bugs, we define the following test scenarios:
|
| 51 |
+
|
| 52 |
+
### 3.1 Scenario A: Standard Text Message
|
| 53 |
+
- **Input**: Sequence of `MagenticAgentDeltaEvent` (with text parts) -> `MagenticAgentMessageEvent` (with corrupted repr).
|
| 54 |
+
- **Expected Output**: The `AgentEvent` emitted at the end must contain the concatenated text from the deltas, NOT the repr string.
|
| 55 |
+
|
| 56 |
+
### 3.2 Scenario B: Tool Call (No Text)
|
| 57 |
+
- **Input**: No text deltas -> `MagenticAgentMessageEvent` (with corrupted repr).
|
| 58 |
+
- **Expected Output**: The `AgentEvent` should contain a fallback message (e.g., "Action completed (Tool Call)"), NOT the repr string.
|
| 59 |
+
|
| 60 |
+
## 4. Implementation Details
|
| 61 |
+
|
| 62 |
+
The pattern is implemented in `src/orchestrators/advanced.py` within the `run()` method loop. It bypasses `_process_event` for these specific event types to ensure strict control over data flow.
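When the buffer is empty, the completion handler in this change still tries `event.message`, but discards anything that looks like a default Python repr. A minimal sketch of that check follows. `looks_like_repr` and `choose_text` are hypothetical names; the heuristic mirrors the `startswith("<")` and `"object at"` test in `_handle_completion_event`.

```python
FALLBACK = "Action completed (Tool Call)"


def looks_like_repr(text: str) -> bool:
    """Heuristic: default object reprs look like '<pkg.Class object at 0x...>'."""
    return text.startswith("<") and "object at" in text


def choose_text(buffer: str, extracted: str) -> str:
    """Prefer streamed text, then usable extracted text, then the fallback."""
    if buffer:
        return buffer
    if extracted and not looks_like_repr(extracted):
        return extracted
    return FALLBACK


class Dummy:
    """Plain class whose default repr triggers the heuristic."""


print(choose_text("", repr(Dummy())))
print(choose_text("", "Search finished"))
print(choose_text("streamed text", "ignored"))
```

The heuristic is intentionally narrow: legitimate text that starts with `<` and contains "object at" would also be dropped, which is an accepted trade-off for a display-only fallback.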
|
|
@@ -17,7 +17,7 @@ Design Patterns:
|
|
| 17 |
|
| 18 |
import asyncio
|
| 19 |
from collections.abc import AsyncGenerator
|
| 20 |
-
from typing import TYPE_CHECKING, Any
|
| 21 |
|
| 22 |
import structlog
|
| 23 |
from agent_framework import (
|
|
@@ -181,6 +181,69 @@ The final output should be a structured research report."""
|
|
| 181 |
|
| 182 |
return f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
|
| 183 |
|
| 184 |
async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
|
| 185 |
"""
|
| 186 |
Run the workflow.
|
|
@@ -193,18 +256,10 @@ The final output should be a structured research report."""
|
|
| 193 |
"""
|
| 194 |
logger.info("Starting Advanced orchestrator", query=query)
|
| 195 |
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
message=f"Starting research (Advanced mode): {query}",
|
| 199 |
-
iteration=0,
|
| 200 |
-
)
|
| 201 |
|
| 202 |
# Initialize context state
|
| 203 |
-
yield AgentEvent(
|
| 204 |
-
type="progress",
|
| 205 |
-
message="Loading embedding service (LlamaIndex/ChromaDB)...",
|
| 206 |
-
iteration=0,
|
| 207 |
-
)
|
| 208 |
embedding_service = self._init_embedding_service()
|
| 209 |
|
| 210 |
yield AgentEvent(
|
|
@@ -238,25 +293,52 @@ The final output should be a structured research report."""
|
|
| 238 |
iteration = 0
|
| 239 |
final_event_received = False
|
| 240 |
|
| 241 |
try:
|
| 242 |
async with asyncio.timeout(self._timeout_seconds):
|
| 243 |
async for event in workflow.run_stream(task):
|
| 244 |
-
|
| 245 |
-
if
|
| 246 |
-
|
| 247 |
-
|
| 248 |
-
|
| 249 |
-
|
| 250 |
-
|
| 251 |
yield AgentEvent(
|
| 252 |
-
type="
|
| 253 |
-
message=
|
|
|
|
| 254 |
iteration=iteration,
|
| 255 |
)
|
| 256 |
|
| 257 |
if agent_event.type == "complete":
|
| 258 |
final_event_received = True
|
| 259 |
-
|
| 260 |
yield agent_event
|
| 261 |
|
| 262 |
# GUARANTEE: Always emit termination event if stream ends without one
|
|
@@ -278,52 +360,8 @@ The final output should be a structured research report."""
|
|
| 278 |
)
|
| 279 |
|
| 280 |
except TimeoutError:
|
| 281 |
-
|
| 282 |
-
|
| 283 |
-
# ACTUALLY synthesize from gathered evidence
|
| 284 |
-
try:
|
| 285 |
-
from src.agents.magentic_agents import create_report_agent
|
| 286 |
-
from src.agents.state import get_magentic_state
|
| 287 |
-
|
| 288 |
-
state = get_magentic_state()
|
| 289 |
-
memory = state.memory
|
| 290 |
-
|
| 291 |
-
# Get evidence summary from memory
|
| 292 |
-
evidence_summary = await memory.get_context_summary()
|
| 293 |
-
|
| 294 |
-
# Create and invoke ReportAgent for synthesis
|
| 295 |
-
report_agent = create_report_agent(self._chat_client, domain=self.domain)
|
| 296 |
-
|
| 297 |
-
yield AgentEvent(
|
| 298 |
-
type="synthesizing",
|
| 299 |
-
message="Workflow timed out. Synthesizing available evidence...",
|
| 300 |
-
iteration=iteration,
|
| 301 |
-
)
|
| 302 |
-
|
| 303 |
-
# Invoke ReportAgent directly
|
| 304 |
-
# Note: ChatAgent.run() returns the final response string
|
| 305 |
-
synthesis_result = await report_agent.run(
|
| 306 |
-
"Synthesize research report from this evidence. "
|
| 307 |
-
f"If evidence is sparse, say so.\n\n{evidence_summary}"
|
| 308 |
-
)
|
| 309 |
-
|
| 310 |
-
yield AgentEvent(
|
| 311 |
-
type="complete",
|
| 312 |
-
message=str(synthesis_result),
|
| 313 |
-
data={"reason": "timeout_synthesis", "iterations": iteration},
|
| 314 |
-
iteration=iteration,
|
| 315 |
-
)
|
| 316 |
-
except Exception as synth_error:
|
| 317 |
-
logger.error("Timeout synthesis failed", error=str(synth_error))
|
| 318 |
-
yield AgentEvent(
|
| 319 |
-
type="complete",
|
| 320 |
-
message=(
|
| 321 |
-
f"Research timed out after {iteration} rounds. "
|
| 322 |
-
f"Evidence gathered but synthesis failed: {synth_error}"
|
| 323 |
-
),
|
| 324 |
-
data={"reason": "timeout_synthesis_failed", "iterations": iteration},
|
| 325 |
-
iteration=iteration,
|
| 326 |
-
)
|
| 327 |
|
| 328 |
except Exception as e:
|
| 329 |
logger.error("Workflow failed", error=str(e))
|
|
@@ -333,6 +371,45 @@ The final output should be a structured research report."""
|
|
| 333 |
iteration=iteration,
|
| 334 |
)
|
| 335 |
|
| 336 |
def _extract_text(self, message: Any) -> str:
|
| 337 |
"""
|
| 338 |
Defensively extract text from a message object.
|
|
@@ -384,7 +461,9 @@ The final output should be a structured research report."""
|
|
| 384 |
# The repr is useless for display purposes
|
| 385 |
return ""
|
| 386 |
|
| 387 |
-
def _get_event_type_for_agent(
|
|
|
|
|
|
|
| 388 |
"""Map agent name to appropriate event type.
|
| 389 |
|
| 390 |
Args:
|
|
@@ -444,17 +523,8 @@ The final output should be a structured research report."""
|
|
| 444 |
iteration=iteration,
|
| 445 |
)
|
| 446 |
|
| 447 |
-
|
| 448 |
-
|
| 449 |
-
text = self._extract_text(event.message)
|
| 450 |
-
event_type = self._get_event_type_for_agent(agent_name)
|
| 451 |
-
|
| 452 |
-
# All returned types are valid AgentEvent.type literals
|
| 453 |
-
return AgentEvent(
|
| 454 |
-
type=event_type, # type: ignore[arg-type]
|
| 455 |
-
message=f"{agent_name}: {self._smart_truncate(text)}",
|
| 456 |
-
iteration=iteration + 1,
|
| 457 |
-
)
|
| 458 |
|
| 459 |
elif isinstance(event, MagenticFinalResultEvent):
|
| 460 |
text = self._extract_text(event.message) if event.message else "No result"
|
|
@@ -465,14 +535,8 @@ The final output should be a structured research report."""
|
|
| 465 |
iteration=iteration,
|
| 466 |
)
|
| 467 |
|
| 468 |
-
|
| 469 |
-
|
| 470 |
-
return AgentEvent(
|
| 471 |
-
type="streaming",
|
| 472 |
-
message=event.text,
|
| 473 |
-
data={"agent_id": event.agent_id},
|
| 474 |
-
iteration=iteration,
|
| 475 |
-
)
|
| 476 |
|
| 477 |
elif isinstance(event, WorkflowOutputEvent):
|
| 478 |
if event.data:
|
|
|
|
| 17 |
|
| 18 |
import asyncio
|
| 19 |
from collections.abc import AsyncGenerator
|
| 20 |
+
from typing import TYPE_CHECKING, Any, Literal
|
| 21 |
|
| 22 |
import structlog
|
| 23 |
from agent_framework import (
|
|
|
|
| 181 |
|
| 182 |
return f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)"
|
| 183 |
|
| 184 |
+
async def _init_workflow_events(self, query: str) -> AsyncGenerator[AgentEvent, None]:
|
| 185 |
+
"""Yield initialization events."""
|
| 186 |
+
yield AgentEvent(
|
| 187 |
+
type="started",
|
| 188 |
+
message=f"Starting research (Advanced mode): {query}",
|
| 189 |
+
iteration=0,
|
| 190 |
+
)
|
| 191 |
+
|
| 192 |
+
yield AgentEvent(
|
| 193 |
+
type="progress",
|
| 194 |
+
message="Loading embedding service (LlamaIndex/ChromaDB)...",
|
| 195 |
+
iteration=0,
|
| 196 |
+
)
|
| 197 |
+
|
| 198 |
+
async def _handle_timeout(self, iteration: int) -> AsyncGenerator[AgentEvent, None]:
|
| 199 |
+
"""Handle workflow timeout by attempting synthesis."""
|
| 200 |
+
logger.warning("Workflow timed out", iterations=iteration)
|
| 201 |
+
|
| 202 |
+
# ACTUALLY synthesize from gathered evidence
|
| 203 |
+
try:
|
| 204 |
+
from src.agents.magentic_agents import create_report_agent
|
| 205 |
+
from src.agents.state import get_magentic_state
|
| 206 |
+
|
| 207 |
+
state = get_magentic_state()
|
| 208 |
+
memory = state.memory
|
| 209 |
+
|
| 210 |
+
# Get evidence summary from memory
|
| 211 |
+
evidence_summary = await memory.get_context_summary()
|
| 212 |
+
|
| 213 |
+
# Create and invoke ReportAgent for synthesis
|
| 214 |
+
report_agent = create_report_agent(self._chat_client, domain=self.domain)
|
| 215 |
+
|
| 216 |
+
yield AgentEvent(
|
| 217 |
+
type="synthesizing",
|
| 218 |
+
message="Workflow timed out. Synthesizing available evidence...",
|
| 219 |
+
iteration=iteration,
|
| 220 |
+
)
|
| 221 |
+
|
| 222 |
+
# Invoke ReportAgent directly
|
| 223 |
+
# Note: ChatAgent.run() returns AgentRunResponse; access text via .text
|
| 224 |
+
synthesis_result = await report_agent.run(
|
| 225 |
+
"Synthesize research report from this evidence. "
|
| 226 |
+
f"If evidence is sparse, say so.\n\n{evidence_summary}"
|
| 227 |
+
)
|
| 228 |
+
|
| 229 |
+
yield AgentEvent(
|
| 230 |
+
type="complete",
|
| 231 |
+
message=synthesis_result.text,
|
| 232 |
+
data={"reason": "timeout_synthesis", "iterations": iteration},
|
| 233 |
+
iteration=iteration,
|
| 234 |
+
)
|
| 235 |
+
except Exception as synth_error:
|
| 236 |
+
logger.error("Timeout synthesis failed", error=str(synth_error))
|
| 237 |
+
yield AgentEvent(
|
| 238 |
+
type="complete",
|
| 239 |
+
message=(
|
| 240 |
+
f"Research timed out after {iteration} rounds. "
|
| 241 |
+
f"Evidence gathered but synthesis failed: {synth_error}"
|
| 242 |
+
),
|
| 243 |
+
data={"reason": "timeout_synthesis_failed", "iterations": iteration},
|
| 244 |
+
iteration=iteration,
|
| 245 |
+
)
|
| 246 |
+
|
| 247 |
async def run(self, query: str) -> AsyncGenerator[AgentEvent, None]:
|
| 248 |
"""
|
| 249 |
Run the workflow.
|
|
|
|
| 256 |
"""
|
| 257 |
logger.info("Starting Advanced orchestrator", query=query)
|
| 258 |
|
| 259 |
+
async for event in self._init_workflow_events(query):
|
| 260 |
+
yield event
|
| 261 |
|
| 262 |
# Initialize context state
|
| 263 |
embedding_service = self._init_embedding_service()
|
| 264 |
|
| 265 |
yield AgentEvent(
|
|
|
|
| 293 |
iteration = 0
|
| 294 |
final_event_received = False
|
| 295 |
|
| 296 |
+
# ACCUMULATOR PATTERN: Track streaming content to bypass upstream Repr Bug
|
| 297 |
+
# Upstream bug in _magentic.py flattens message.contents and sets message.text
|
| 298 |
+
# to repr(message) if text is empty. We must reconstruct text from Deltas.
|
| 299 |
+
current_message_buffer: str = ""
|
| 300 |
+
current_agent_id: str | None = None
|
| 301 |
+
|
| 302 |
try:
|
| 303 |
async with asyncio.timeout(self._timeout_seconds):
|
| 304 |
async for event in workflow.run_stream(task):
|
| 305 |
+
# 1. Handle Streaming (Source of Truth for Content)
|
| 306 |
+
if isinstance(event, MagenticAgentDeltaEvent):
|
| 307 |
+
# Detect agent switch to clear buffer
|
| 308 |
+
if event.agent_id != current_agent_id:
|
| 309 |
+
current_message_buffer = ""
|
| 310 |
+
current_agent_id = event.agent_id
|
| 311 |
+
|
| 312 |
+
if event.text:
|
| 313 |
+
current_message_buffer += event.text
|
| 314 |
yield AgentEvent(
|
| 315 |
+
type="streaming",
|
| 316 |
+
message=event.text,
|
| 317 |
+
data={"agent_id": event.agent_id},
|
| 318 |
iteration=iteration,
|
| 319 |
)
|
| 320 |
+
continue
|
| 321 |
+
|
| 322 |
+
# 2. Handle Completion Signal
|
| 323 |
+
# We use our accumulated buffer instead of the corrupted event.message
|
| 324 |
+
if isinstance(event, MagenticAgentMessageEvent):
|
| 325 |
+
iteration += 1
|
| 326 |
|
| 327 |
+
comp_event, prog_event = self._handle_completion_event(
|
| 328 |
+
event, current_message_buffer, iteration
|
| 329 |
+
)
|
| 330 |
+
yield comp_event
|
| 331 |
+
yield prog_event
|
| 332 |
+
|
| 333 |
+
# Clear buffer after consuming
|
| 334 |
+
current_message_buffer = ""
|
| 335 |
+
continue
|
| 336 |
+
|
| 337 |
+
# 3. Handle other events normally
|
| 338 |
+
agent_event = self._process_event(event, iteration)
|
| 339 |
+
if agent_event:
|
| 340 |
if agent_event.type == "complete":
|
| 341 |
final_event_received = True
|
|
|
|
| 342 |
yield agent_event
|
| 343 |
|
| 344 |
# GUARANTEE: Always emit termination event if stream ends without one
|
|
|
|
| 360 |
)
|
| 361 |
|
| 362 |
except TimeoutError:
|
| 363 |
+
async for event in self._handle_timeout(iteration):
|
| 364 |
+
yield event
|
| 365 |
|
| 366 |
except Exception as e:
|
| 367 |
logger.error("Workflow failed", error=str(e))
|
|
|
|
| 371 |
iteration=iteration,
|
| 372 |
)
|
| 373 |
|
| 374 |
+
def _handle_completion_event(
|
| 375 |
+
self, event: MagenticAgentMessageEvent, buffer: str, iteration: int
|
| 376 |
+
) -> tuple[AgentEvent, AgentEvent]:
|
| 377 |
+
"""Handle an agent completion event using the accumulated buffer."""
|
| 378 |
+
# Use buffer if available, otherwise fall back cautiously
|
| 379 |
+
# (Only fall back if buffer empty, which implies tool-only turn)
|
| 380 |
+
text_content = buffer
|
| 381 |
+
if not text_content:
|
| 382 |
+
# Try extraction but ignore repr strings AND empty strings
|
| 383 |
+
raw_text = self._extract_text(event.message)
|
| 384 |
+
if raw_text and not (raw_text.startswith("<") and "object at" in raw_text):
|
| 385 |
+
text_content = raw_text
|
| 386 |
+
else:
|
| 387 |
+
text_content = "Action completed (Tool Call)"
|
| 388 |
+
|
| 389 |
+
agent_name = event.agent_id or "unknown"
|
| 390 |
+
event_type = self._get_event_type_for_agent(agent_name)
|
| 391 |
+
|
| 392 |
+
completion_event = AgentEvent(
|
| 393 |
+
type=event_type,
|
| 394 |
+
message=f"{agent_name}: {text_content[:200]}...",
|
| 395 |
+
iteration=iteration,
|
| 396 |
+
)
|
| 397 |
+
|
| 398 |
+
# Progress update
|
| 399 |
+
rounds_remaining = max(self._max_rounds - iteration, 0)
|
| 400 |
+
est_seconds = rounds_remaining * 45
|
| 401 |
+
est_display = (
|
| 402 |
+
f"{est_seconds // 60}m {est_seconds % 60}s" if est_seconds >= 60 else f"{est_seconds}s"
|
| 403 |
+
)
|
| 404 |
+
|
| 405 |
+
progress_event = AgentEvent(
|
| 406 |
+
type="progress",
|
| 407 |
+
message=f"Round {iteration}/{self._max_rounds} (~{est_display} remaining)",
|
| 408 |
+
iteration=iteration,
|
| 409 |
+
)
|
| 410 |
+
|
| 411 |
+
return completion_event, progress_event
|
| 412 |
+
|
| 413 |
def _extract_text(self, message: Any) -> str:
|
| 414 |
"""
|
| 415 |
Defensively extract text from a message object.
|
|
|
|
| 461 |
# The repr is useless for display purposes
|
| 462 |
return ""
|
| 463 |
|
| 464 |
+
def _get_event_type_for_agent(
|
| 465 |
+
self, agent_name: str
|
| 466 |
+
) -> Literal["search_complete", "judge_complete", "hypothesizing", "synthesizing", "judging"]:
|
| 467 |
"""Map agent name to appropriate event type.
|
| 468 |
|
| 469 |
Args:
|
|
|
|
| 523 |
iteration=iteration,
|
| 524 |
)
|
| 525 |
|
| 526 |
+
# NOTE: MagenticAgentMessageEvent is handled in run() loop with Accumulator Pattern
|
| 527 |
+
# (see lines 322-335) and never reaches this method due to `continue` statement.
|
| 528 |
|
| 529 |
elif isinstance(event, MagenticFinalResultEvent):
|
| 530 |
text = self._extract_text(event.message) if event.message else "No result"
|
|
|
|
| 535 |
iteration=iteration,
|
| 536 |
)
|
| 537 |
|
| 538 |
+
# NOTE: MagenticAgentDeltaEvent is handled in run() loop with Accumulator Pattern
|
| 539 |
+
# (see lines 306-320) and never reaches this method due to `continue` statement.
|
| 540 |
|
| 541 |
elif isinstance(event, WorkflowOutputEvent):
|
| 542 |
if event.data:
|
|
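The completion handler in the hunk above picks display text in a fixed order: the accumulated buffer first, then extracted text unless it is empty or looks like a default object repr, then a fixed fallback string. A minimal standalone sketch of that selection and of the progress estimate follows. The function names `looks_like_repr`, `choose_completion_text`, and `format_eta` are hypothetical and do not appear in the commit; only the repr heuristic, the fallback string, and the 45-seconds-per-round estimate are taken from the diff.

```python
SECONDS_PER_ROUND = 45  # estimate used in the diff above


def looks_like_repr(text: str) -> bool:
    """Heuristic from the diff: default reprs look like '<Foo object at 0x...>'."""
    return text.startswith("<") and "object at" in text


def choose_completion_text(buffer: str, extracted: str) -> str:
    """Prefer the buffer; fall back to extracted text only if it is usable."""
    if buffer:
        return buffer
    if extracted and not looks_like_repr(extracted):
        return extracted
    # Empty buffer and no usable text implies a tool-only turn.
    return "Action completed (Tool Call)"


def format_eta(max_rounds: int, iteration: int) -> str:
    """Render the remaining-time estimate as 'Xm Ys' or 'Ys'."""
    est_seconds = max(max_rounds - iteration, 0) * SECONDS_PER_ROUND
    if est_seconds >= 60:
        return f"{est_seconds // 60}m {est_seconds % 60}s"
    return f"{est_seconds}s"


if __name__ == "__main__":
    print(choose_completion_text("", "<ChatMessage object at 0xDEADBEEF>"))
    print(format_eta(max_rounds=5, iteration=2))
```

Keeping the repr check in one small predicate makes it easy to drop once the upstream fix lands, without touching the buffer logic.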
@@ -0,0 +1,294 @@
|
| 1 |
+
"""
|
| 2 |
+
Test the Accumulator Pattern for Microsoft Agent Framework event handling.
|
| 3 |
+
|
| 4 |
+
This tests SPEC 17: We use MagenticAgentDeltaEvent.text as the sole source of content,
|
| 5 |
+
and MagenticAgentMessageEvent as a signal only (ignoring .message to avoid repr bug).
|
| 6 |
+
"""
|
| 7 |
+
|
| 8 |
+
import importlib
|
| 9 |
+
import sys
|
| 10 |
+
import types
|
| 11 |
+
from unittest.mock import MagicMock, patch
|
| 12 |
+
|
| 13 |
+
import pytest
|
| 14 |
+
|
| 15 |
+
|
| 16 |
+
# --- Create real event classes ---
|
| 17 |
+
class MockDeltaEvent:
|
| 18 |
+
"""Simulates MagenticAgentDeltaEvent with streaming text."""
|
| 19 |
+
|
| 20 |
+
def __init__(self, text: str, agent_id: str = "TestAgent"):
|
| 21 |
+
self.text = text
|
| 22 |
+
self.agent_id = agent_id
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
class MockMessageEvent:
|
| 26 |
+
"""Simulates MagenticAgentMessageEvent with potentially corrupted .message."""
|
| 27 |
+
|
| 28 |
+
def __init__(self, message_text: str, agent_id: str = "TestAgent"):
|
| 29 |
+
self.message = MagicMock()
|
| 30 |
+
self.message.text = message_text # This could be repr garbage
|
| 31 |
+
self.agent_id = agent_id
|
| 32 |
+
self.text = None # No top-level .text on MessageEvent
|
| 33 |
+
|
| 34 |
+
|
| 35 |
+
class MockFinalResultEvent:
|
| 36 |
+
"""Simulates MagenticFinalResultEvent."""
|
| 37 |
+
|
| 38 |
+
def __init__(self, text: str):
|
| 39 |
+
self.message = MagicMock()
|
| 40 |
+
self.message.text = text
|
| 41 |
+
self.text = None
|
| 42 |
+
|
| 43 |
+
|
| 44 |
+
class MockOrchestratorMessageEvent:
|
| 45 |
+
"""Simulates MagenticOrchestratorMessageEvent."""
|
| 46 |
+
|
| 47 |
+
def __init__(self, kind: str = "user_task", message: str = "test"):
|
| 48 |
+
self.kind = kind
|
| 49 |
+
self.message = MagicMock()
|
| 50 |
+
self.message.text = message
|
| 51 |
+
|
| 52 |
+
|
| 53 |
+
class MockWorkflowOutputEvent:
|
| 54 |
+
"""Simulates WorkflowOutputEvent."""
|
| 55 |
+
|
| 56 |
+
def __init__(self, data=None):
|
| 57 |
+
self.data = data
|
| 58 |
+
|
| 59 |
+
|
| 60 |
+
# Pass-through decorators
|
| 61 |
+
def mock_use_function_invocation(func=None):
|
| 62 |
+
return func if func else lambda f: f
|
| 63 |
+
|
| 64 |
+
|
| 65 |
+
def mock_use_observability(func=None):
|
| 66 |
+
return func if func else lambda f: f
|
| 67 |
+
|
| 68 |
+
|
| 69 |
+
@pytest.fixture
|
| 70 |
+
def mock_agent_framework():
|
| 71 |
+
"""
|
| 72 |
+
Mock the agent_framework module structure in sys.modules.
|
| 73 |
+
"""
|
| 74 |
+
# Create the mock module structure
|
| 75 |
+
mock_af = types.ModuleType("agent_framework")
|
| 76 |
+
mock_af_openai = types.ModuleType("agent_framework.openai")
|
| 77 |
+
mock_af_middleware = types.ModuleType("agent_framework._middleware")
|
| 78 |
+
mock_af_tools = types.ModuleType("agent_framework._tools")
|
| 79 |
+
mock_af_types = types.ModuleType("agent_framework._types")
|
| 80 |
+
mock_af_observability = types.ModuleType("agent_framework.observability")
|
| 81 |
+
|
| 82 |
+
# Populate submodules
|
| 83 |
+
mock_af.openai = mock_af_openai
|
| 84 |
+
mock_af._middleware = mock_af_middleware
|
| 85 |
+
mock_af._tools = mock_af_tools
|
| 86 |
+
mock_af._types = mock_af_types
|
| 87 |
+
mock_af.observability = mock_af_observability
|
| 88 |
+
|
| 89 |
+
# Assign our REAL event classes as the module-level types
|
| 90 |
+
mock_af.MagenticAgentDeltaEvent = MockDeltaEvent
|
| 91 |
+
mock_af.MagenticAgentMessageEvent = MockMessageEvent
|
| 92 |
+
mock_af.MagenticFinalResultEvent = MockFinalResultEvent
|
| 93 |
+
mock_af.MagenticOrchestratorMessageEvent = MockOrchestratorMessageEvent
|
| 94 |
+
mock_af.WorkflowOutputEvent = MockWorkflowOutputEvent
|
| 95 |
+
|
| 96 |
+
# Mock other classes
|
| 97 |
+
mock_af.MagenticBuilder = MagicMock
|
| 98 |
+
mock_af.ChatAgent = MagicMock
|
| 99 |
+
mock_af.ai_function = MagicMock
|
| 100 |
+
mock_af.BaseChatClient = MagicMock
|
| 101 |
+
mock_af.ToolProtocol = MagicMock
|
| 102 |
+
mock_af.ChatMessage = MagicMock
|
| 103 |
+
mock_af.ChatResponse = MagicMock
|
| 104 |
+
mock_af.ChatResponseUpdate = MagicMock
|
| 105 |
+
mock_af.ChatOptions = MagicMock
|
| 106 |
+
mock_af.FinishReason = MagicMock
|
| 107 |
+
mock_af.Role = MagicMock
|
| 108 |
+
|
| 109 |
+
# Populate symbols in submodules
|
| 110 |
+
mock_af_openai.OpenAIChatClient = MagicMock
|
| 111 |
+
mock_af_middleware.use_chat_middleware = MagicMock
|
| 112 |
+
mock_af_tools.use_function_invocation = mock_use_function_invocation
|
| 113 |
+
mock_af_types.FunctionCallContent = MagicMock
|
| 114 |
+
mock_af_types.FunctionResultContent = MagicMock
|
| 115 |
+
mock_af_observability.use_observability = mock_use_observability
|
| 116 |
+
|
| 117 |
+
# Patch sys.modules to include our mocks
|
| 118 |
+
with patch.dict(
|
| 119 |
+
sys.modules,
|
| 120 |
+
{
|
| 121 |
+
"agent_framework": mock_af,
|
| 122 |
+
"agent_framework.openai": mock_af_openai,
|
| 123 |
+
"agent_framework._middleware": mock_af_middleware,
|
| 124 |
+
"agent_framework._tools": mock_af_tools,
|
| 125 |
+
"agent_framework._types": mock_af_types,
|
| 126 |
+
"agent_framework.observability": mock_af_observability,
|
| 127 |
+
},
|
| 128 |
+
):
|
| 129 |
+
yield mock_af
|
| 130 |
+
|
| 131 |
+
|
| 132 |
+
@pytest.fixture(scope="module", autouse=True)
|
| 133 |
+
def cleanup_orchestrator_module():
|
| 134 |
+
"""
|
| 135 |
+
Ensure src.orchestrators.advanced is restored to a clean state after tests.
|
| 136 |
+
This prevents 'Mock' classes from leaking into other tests via module globals.
|
| 137 |
+
"""
|
| 138 |
+
yield
|
| 139 |
+
# After all tests in this module, reload the orchestrator module
|
| 140 |
+
# This will use the REAL agent_framework (since the mock fixture has been torn down)
|
| 141 |
+
import src.orchestrators.advanced
|
| 142 |
+
|
| 143 |
+
importlib.reload(src.orchestrators.advanced)
|
| 144 |
+
|
| 145 |
+
|
| 146 |
+
@pytest.fixture
|
| 147 |
+
def mock_orchestrator(mock_agent_framework):
|
| 148 |
+
"""
|
| 149 |
+
Create an AdvancedOrchestrator with all dependencies mocked.
|
| 150 |
+
Relies on reloading the module to pick up the mocked agent_framework.
|
| 151 |
+
"""
|
| 152 |
+
# Import locally
|
| 153 |
+
import src.orchestrators.advanced
|
| 154 |
+
|
| 155 |
+
# RELOAD to ensure it picks up the mocked agent_framework from sys.modules
|
| 156 |
+
importlib.reload(src.orchestrators.advanced)
|
| 157 |
+
|
| 158 |
+
from src.orchestrators.advanced import AdvancedOrchestrator
|
| 159 |
+
|
| 160 |
+
with (
|
| 161 |
+
patch("src.orchestrators.advanced.get_chat_client"),
|
| 162 |
+
patch("src.orchestrators.advanced.get_embedding_service_if_available", return_value=None),
|
| 163 |
+
patch("src.orchestrators.advanced.init_magentic_state"),
|
| 164 |
+
patch("src.agents.state.ResearchMemory"),
|
| 165 |
+
patch("src.utils.service_loader.get_embedding_service", return_value=MagicMock()),
|
| 166 |
+
):
|
| 167 |
+
orch = AdvancedOrchestrator(max_rounds=5)
|
| 168 |
+
yield orch
|
| 169 |
+
|
| 170 |
+
|
| 171 |
+
@pytest.mark.unit
|
| 172 |
+
@pytest.mark.asyncio
|
| 173 |
+
async def test_accumulator_pattern_scenario_a_standard_text(mock_orchestrator):
|
| 174 |
+
"""
|
| 175 |
+
Scenario A: Standard Text Message
|
| 176 |
+
Input: Deltas ("Hello", " World") -> MessageEvent (Poisoned Repr)
|
| 177 |
+
Expected: AgentEvent with "Hello World", NOT the repr string
|
| 178 |
+
"""
|
| 179 |
+
events = [
|
| 180 |
+
MockDeltaEvent("Hello", agent_id="ChatBot"),
|
| 181 |
+
MockDeltaEvent(" World", agent_id="ChatBot"),
|
| 182 |
+
MockMessageEvent("<ChatMessage object at 0xDEADBEEF>", agent_id="ChatBot"),
|
| 183 |
+
]
|
| 184 |
+
|
| 185 |
+
async def mock_stream(*args, **kwargs):
|
| 186 |
+
for event in events:
|
| 187 |
+
yield event
|
| 188 |
+
|
| 189 |
+
mock_workflow = MagicMock()
|
| 190 |
+
mock_workflow.run_stream = mock_stream
|
| 191 |
+
|
| 192 |
+
with patch.object(mock_orchestrator, "_build_workflow", return_value=mock_workflow):
|
| 193 |
+
generated_events = []
|
| 194 |
+
async for event in mock_orchestrator.run("test query"):
|
| 195 |
+
generated_events.append(event)
|
| 196 |
+
|
| 197 |
+
# Find the completion event for ChatBot (non-streaming)
|
| 198 |
+
chat_events = [
|
| 199 |
+
e for e in generated_events if "ChatBot" in str(e.message) and e.type != "streaming"
|
| 200 |
+
]
|
| 201 |
+
|
| 202 |
+
assert len(chat_events) >= 1, (
|
| 203 |
+
f"Expected ChatBot events, got: {[e.message for e in generated_events]}"
|
| 204 |
+
)
|
| 205 |
+
final_event = chat_events[0]
|
| 206 |
+
|
| 207 |
+
# CRITICAL: Must contain accumulated text, NOT repr
|
| 208 |
+
assert "Hello World" in final_event.message
|
| 209 |
+
assert "<ChatMessage" not in final_event.message, f"Repr bug! Got: {final_event.message}"
|
| 210 |
+
|
| 211 |
+
|
| 212 |
+
@pytest.mark.unit
|
| 213 |
+
@pytest.mark.asyncio
|
| 214 |
+
async def test_accumulator_pattern_scenario_b_tool_call(mock_orchestrator):
|
| 215 |
+
"""
|
| 216 |
+
Scenario B: Tool Call (No Text Deltas)
|
| 217 |
+
Input: No Deltas -> MessageEvent (Poisoned Repr)
|
| 218 |
+
Expected: AgentEvent with fallback text, NOT the repr string
|
| 219 |
+
"""
|
| 220 |
+
events = [
|
| 221 |
+
MockMessageEvent("<ChatMessage object at 0xDEADBEEF>", agent_id="SearchAgent"),
|
| 222 |
+
]
|
| 223 |
+
|
| 224 |
+
async def mock_stream(*args, **kwargs):
|
| 225 |
+
for event in events:
|
| 226 |
+
yield event
|
| 227 |
+
|
| 228 |
+
mock_workflow = MagicMock()
|
| 229 |
+
mock_workflow.run_stream = mock_stream
|
| 230 |
+
|
| 231 |
+
with patch.object(mock_orchestrator, "_build_workflow", return_value=mock_workflow):
|
| 232 |
+
generated_events = []
|
| 233 |
+
async for event in mock_orchestrator.run("test query"):
|
| 234 |
+
generated_events.append(event)
|
| 235 |
+
|
| 236 |
+
# Find completion events for SearchAgent
|
| 237 |
+
search_events = [
|
| 238 |
+
e for e in generated_events if "SearchAgent" in str(e.message) and e.type != "streaming"
|
| 239 |
+
]
|
| 240 |
+
|
| 241 |
+
assert len(search_events) >= 1, (
|
| 242 |
+
f"Expected SearchAgent events, got: {[e.message for e in generated_events]}"
|
| 243 |
+
)
|
| 244 |
+
final_event = search_events[0]
|
| 245 |
+
|
| 246 |
+
# CRITICAL: Should use fallback, NOT repr
|
| 247 |
+
assert "<ChatMessage" not in final_event.message, f"Repr bug! Got: {final_event.message}"
|
| 248 |
+
# Should contain fallback or tool indicator
|
| 249 |
+
assert "Action completed" in final_event.message or "Tool" in final_event.message
|
| 250 |
+
|
| 251 |
+
|
| 252 |
+
@pytest.mark.unit
|
| 253 |
+
@pytest.mark.asyncio
|
| 254 |
+
async def test_accumulator_pattern_buffer_clearing(mock_orchestrator):
|
| 255 |
+
"""
|
| 256 |
+
Verify buffer clears between agents.
|
| 257 |
+
Agent B should NOT inherit Agent A's accumulated text.
|
| 258 |
+
"""
|
| 259 |
+
events = [
|
| 260 |
+
MockDeltaEvent("Agent A says hi", agent_id="AgentA"),
|
| 261 |
+
MockMessageEvent("irrelevant", agent_id="AgentA"),
|
| 262 |
+
MockDeltaEvent("Agent B responds", agent_id="AgentB"),
|
| 263 |
+
MockMessageEvent("irrelevant", agent_id="AgentB"),
|
| 264 |
+
]
|
| 265 |
+
|
| 266 |
+
async def mock_stream(*args, **kwargs):
|
| 267 |
+
for event in events:
|
| 268 |
+
yield event
|
| 269 |
+
|
| 270 |
+
mock_workflow = MagicMock()
|
| 271 |
+
mock_workflow.run_stream = mock_stream
|
| 272 |
+
|
| 273 |
+
with patch.object(mock_orchestrator, "_build_workflow", return_value=mock_workflow):
|
| 274 |
+
generated_events = []
|
| 275 |
+
async for event in mock_orchestrator.run("test query"):
|
| 276 |
+
generated_events.append(event)
|
| 277 |
+
|
| 278 |
+
# Find non-streaming events for each agent
|
| 279 |
+
agent_a_events = [
|
| 280 |
+
e for e in generated_events if "AgentA" in str(e.message) and e.type != "streaming"
|
| 281 |
+
]
|
| 282 |
+
agent_b_events = [
|
| 283 |
+
e for e in generated_events if "AgentB" in str(e.message) and e.type != "streaming"
|
| 284 |
+
]
|
| 285 |
+
|
| 286 |
+
# Both should have completion events
|
| 287 |
+
assert len(agent_a_events) >= 1, f"No AgentA events: {[e.message for e in generated_events]}"
|
| 288 |
+
assert len(agent_b_events) >= 1, f"No AgentB events: {[e.message for e in generated_events]}"
|
| 289 |
+
|
| 290 |
+
# Agent A should have its own text
|
| 291 |
+
assert "Agent A" in agent_a_events[0].message
|
| 292 |
+
# Agent B should have its own text, NOT Agent A's
|
| 293 |
+
assert "Agent B" in agent_b_events[0].message
|
| 294 |
+
assert "Agent A" not in agent_b_events[0].message, "Buffer not cleared between agents!"
|
|
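The three tests above exercise one idea: delta events are the only source of text, and the message event is treated purely as an end-of-turn signal whose payload is ignored. A minimal sketch of that loop, independent of the real framework, might look like the following. The `Delta` and `Message` classes and the `accumulate` function are hypothetical stand-ins, not names from the commit or from agent-framework.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Union


@dataclass
class Delta:
    """Stand-in for a streaming delta event carrying a text chunk."""
    text: str
    agent_id: str


@dataclass
class Message:
    """Stand-in for a completion event; its payload is deliberately ignored."""
    agent_id: str
    message: object = None


def accumulate(events: Iterable[Union[Delta, Message]]) -> Iterator[tuple[str, str]]:
    """Yield (agent_id, text) once per completed turn."""
    buffer = ""
    for event in events:
        if isinstance(event, Delta):
            buffer += event.text  # collect streamed chunks
            continue
        # Message event: emit what was accumulated, never event.message.
        completed = buffer or "Action completed (Tool Call)"
        buffer = ""  # reset so the next agent starts clean
        yield event.agent_id, completed


if __name__ == "__main__":
    stream = [
        Delta("Hello", "ChatBot"),
        Delta(" World", "ChatBot"),
        Message("ChatBot", "<ChatMessage object at 0xDEADBEEF>"),
        Message("SearchAgent"),
    ]
    for agent_id, text in accumulate(stream):
        print(f"{agent_id}: {text}")
```

Resetting the buffer at the moment the turn is emitted is what the buffer-clearing test checks: the second agent's output cannot contain the first agent's text.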
@@ -41,8 +41,11 @@ async def test_timeout_synthesizes_evidence():
|
|
| 41 |
mock_get_state.return_value = mock_state
|
| 42 |
|
| 43 |
# Setup mock ReportAgent
|
|
|
|
| 44 |
mock_report_agent = AsyncMock()
|
| 45 |
-
|
|
|
|
|
|
|
| 46 |
mock_create_agent.return_value = mock_report_agent
|
| 47 |
|
| 48 |
events = []
|
|
|
|
| 41 |
mock_get_state.return_value = mock_state
|
| 42 |
|
| 43 |
# Setup mock ReportAgent
|
| 44 |
+
# ChatAgent.run() returns AgentRunResponse with .text property
|
| 45 |
mock_report_agent = AsyncMock()
|
| 46 |
+
mock_response = MagicMock()
|
| 47 |
+
mock_response.text = "Final Synthesized Report"
|
| 48 |
+
mock_report_agent.run.return_value = mock_response
|
| 49 |
mock_create_agent.return_value = mock_report_agent
|
| 50 |
|
| 51 |
events = []
|