Merge pull request #62 from The-Obstacle-Is-The-Way/dev
fix: resolve all P0-P3 bugs (termination, streaming, thinking state)
- docs/bugs/ACTIVE_BUGS.md +35 -20
- docs/bugs/FIX_PLAN_CRITICAL_BUGS.md +0 -36
- docs/bugs/FIX_PLAN_MAGENTIC_MODE.md +0 -227
- docs/bugs/FIX_UI_SIMPLIFICATION.md +0 -314
- docs/bugs/INVESTIGATION_INVALID_MODELS.md +0 -31
- docs/bugs/INVESTIGATION_QUOTA_BLOCKER.md +0 -49
- docs/bugs/P0_CRITICAL_BUGS.md +0 -43
- docs/bugs/P1_GRADIO_SETTINGS_CLEANUP.md +0 -81
- docs/bugs/P1_MAGENTIC_STREAMING_AND_KEY_PERSISTENCE.md +0 -181
- docs/bugs/P3_MAGENTIC_NO_TERMINATION_EVENT.md +177 -0
- docs/bugs/SENIOR_AGENT_AUDIT_PROMPT.md +0 -247
- docs/bugs/SENIOR_AUDIT_RESULTS.md +0 -84
- src/app.py +5 -1
- src/orchestrator_magentic.py +24 -0
- tests/unit/test_magentic_termination.py +111 -0
docs/bugs/ACTIVE_BUGS.md
CHANGED
# Active Bugs

> Last updated: 2025-11-29

## P3 - Edge Case

*(None)*

---

## Resolved Bugs

### ~~P3 - Magentic Mode Missing Termination Guarantee~~ FIXED
**Commit**: `d36ce3c` (2025-11-29)

- Added `final_event_received` tracking in `orchestrator_magentic.py`
- Added fallback yield for "max iterations reached" scenario
- Verified with `test_magentic_termination.py`

### ~~P0 - Magentic Mode Report Generation~~ FIXED
**Commit**: `9006d69` (2025-11-29)

- Fixed `_extract_text()` to handle various message object formats
- Increased `max_rounds=10` (was 3)
- Added `temperature=1.0` for reasoning model compatibility
- Advanced mode now produces full research reports

### ~~P1 - Streaming Spam + API Key Persistence~~ FIXED
**Commit**: `0c9be4a` (2025-11-29)

- Streaming events now buffered (not token-by-token spam)
- API key persists across example clicks via `gr.State`
- Examples use explicit `None` values to avoid overwriting keys

### ~~P2 - Missing "Thinking" State~~ FIXED
**Commit**: `9006d69` (2025-11-29)

- Added `"thinking"` event type with hourglass icon
- Yields "Multi-agent reasoning in progress..." before the blocking workflow call
- Users now see feedback during the 2-5 minute initial processing

### ~~P1 - Gradio Settings Accordion~~ WONTFIX

Decision: Removed nested Blocks, using ChatInterface directly.
Accordion behavior is default Gradio - acceptable for demo.

---

## How to Report Bugs

1. Create `docs/bugs/P{N}_{SHORT_NAME}.md`
2. Include: Symptom, Root Cause, Fix Plan, Test Plan
3. Update this index
4. Priority: P0=blocker, P1=important, P2=UX, P3=edge case
docs/bugs/FIX_PLAN_CRITICAL_BUGS.md
DELETED
# Fix Plan: Critical Bugs (P0)

**Date**: 2025-11-28
**Status**: COMPLETED (2025-11-29)
**Based on**: `docs/bugs/SENIOR_AUDIT_RESULTS.md`

---

## Summary of Fixes

### 1. Fixed Data Leak (Bug 4 & 2)
- **Action**: Removed singleton `_embedding_service` in `src/services/embeddings.py`.
- **Action**: Updated `EmbeddingService.__init__` to use a unique collection name (`evidence_{uuid}`) for complete isolation per instance.
- **Action**: Refactored `SentenceTransformer` loading to a shared global to maintain performance while isolating state.
- **Verified**: Unit tests passed, including new isolation verification.

### 2. Fixed Advanced Mode BYOK (Bug 3)
- **Action**: Updated `create_orchestrator` in `src/orchestrator_factory.py` to accept `api_key`.
- **Action**: Updated `MagenticOrchestrator` to accept and use the `api_key` for the manager and agents.
- **Action**: Updated `src/app.py` to pass the user's API key during orchestrator configuration.
- **Verified**: `test_dual_mode_e2e.py` passed.

### 3. Fixed Free Tier Experience (Bug 1)
- **Action**: Updated `HFInferenceJudgeHandler` in `src/agent_factory/judges.py` to catch 402 (Payment Required) errors.
- **Action**: Added logic to return a "synthesize" assessment with a clear error message when quota is exhausted, stopping the infinite loop.
- **Verified**: Unit tests passed.

---

## Verification

All changes have been verified with:
- `make check` (lint, typecheck, test) - ALL PASSED
- Custom reproduction script for isolation - PASSED

The system is now stable for the hackathon demo.
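The isolation fix in item 1 combines two ideas: share the expensive model load across instances, but give each instance its own collection name so no evidence leaks between sessions. A minimal sketch of that shape (the class body and `_SHARED_MODEL` placeholder are illustrative, not the real `EmbeddingService`):

```python
import uuid

# Loaded once at module scope and shared across instances; stands in for
# the shared SentenceTransformer in the real code (stateless, safe to share).
_SHARED_MODEL = object()


class EmbeddingService:
    """Each instance owns a unique collection, so stored evidence is
    isolated per session even though the model itself is shared."""

    def __init__(self) -> None:
        self.model = _SHARED_MODEL  # shared: avoids reloading weights
        self.collection_name = f"evidence_{uuid.uuid4().hex}"  # isolated state
```

Two instances then share the model object but can never read each other's collections, which is the property the new isolation unit test would assert.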
docs/bugs/FIX_PLAN_MAGENTIC_MODE.md
DELETED
# Fix Plan: Magentic Mode Report Generation

**Related Bug**: `P0_MAGENTIC_MODE_BROKEN.md`
**Approach**: Test-Driven Development (TDD)
**Estimated Scope**: 4 tasks, ~2-3 hours

---

## Problem Summary

Magentic mode runs but fails to produce readable reports due to:

1. **Primary Bug**: `MagenticFinalResultEvent.message` returns a `ChatMessage` object, not text
2. **Secondary Bug**: Max rounds (3) reached before ReportAgent completes
3. **Tertiary Issues**: Stale "bioRxiv" references in prompts

---

## Fix Order (TDD)

### Phase 1: Write Failing Tests

**Task 1.1**: Create test for ChatMessage text extraction

```python
# tests/unit/test_orchestrator_magentic.py

def test_process_event_extracts_text_from_chat_message():
    """Final result event should extract text from ChatMessage object."""
    # Arrange: Mock ChatMessage with .content attribute
    # Act: Call _process_event with MagenticFinalResultEvent
    # Assert: Returned AgentEvent.message is a string, not object repr
```

**Task 1.2**: Create test for max rounds configuration

```python
def test_orchestrator_uses_configured_max_rounds():
    """MagenticOrchestrator should use max_rounds from constructor."""
    # Arrange: Create orchestrator with max_rounds=10
    # Act: Build workflow
    # Assert: Workflow has max_round_count=10
```

**Task 1.3**: Create test for bioRxiv reference removal

```python
def test_task_prompt_references_europe_pmc():
    """Task prompt should reference Europe PMC, not bioRxiv."""
    # Arrange: Create orchestrator
    # Act: Check task string in run()
    # Assert: Contains "Europe PMC", not "bioRxiv"
```

---

### Phase 2: Fix ChatMessage Text Extraction

**File**: `src/orchestrator_magentic.py`
**Lines**: 192-199

**Current Code**:
```python
elif isinstance(event, MagenticFinalResultEvent):
    text = event.message.text if event.message else "No result"
```

**Fixed Code**:
```python
elif isinstance(event, MagenticFinalResultEvent):
    if event.message:
        # ChatMessage may have .content or .text depending on version
        if hasattr(event.message, "content") and event.message.content:
            text = str(event.message.content)
        elif hasattr(event.message, "text") and event.message.text:
            text = str(event.message.text)
        else:
            # Fallback: convert entire message to string
            text = str(event.message)
    else:
        text = "No result generated"
```

**Why**: The `agent_framework.ChatMessage` object structure may vary between versions, so we need defensive extraction.

---

### Phase 3: Fix Max Rounds Configuration

**File**: `src/orchestrator_magentic.py`
**Lines**: 97-99

**Current Code**:
```python
.with_standard_manager(
    chat_client=manager_client,
    max_round_count=self._max_rounds,  # Already uses config
    max_stall_count=3,
    max_reset_count=2,
)
```

**Issue**: Default `max_rounds` in `__init__` is 10, but the workflow may need more for complex queries.

**Fix**: Verify the value flows through correctly. Add logging.

```python
logger.info(
    "Building Magentic workflow",
    max_rounds=self._max_rounds,
    max_stall=3,
    max_reset=2,
)
```

**Also check**: `src/orchestrator_factory.py` passes config correctly:
```python
return MagenticOrchestrator(
    max_rounds=config.max_iterations if config else 10,
)
```

---

### Phase 4: Fix Stale bioRxiv References

**Files to update**:

| File | Line | Change |
|------|------|--------|
| `src/orchestrator_magentic.py` | 131 | "bioRxiv" → "Europe PMC" |
| `src/agents/magentic_agents.py` | 32-33 | "bioRxiv" → "Europe PMC" |
| `src/app.py` | 202-203 | "bioRxiv" → "Europe PMC" |

**Search command to verify**:
```bash
grep -rn "bioRxiv\|biorxiv" src/
```

---

## Implementation Checklist

```
[ ] Phase 1: Write failing tests
    [ ] 1.1 Test ChatMessage text extraction
    [ ] 1.2 Test max rounds configuration
    [ ] 1.3 Test Europe PMC references

[ ] Phase 2: Fix ChatMessage extraction
    [ ] Update _process_event() in orchestrator_magentic.py
    [ ] Run test 1.1 - should pass

[ ] Phase 3: Fix max rounds
    [ ] Add logging to _build_workflow()
    [ ] Verify factory passes config correctly
    [ ] Run test 1.2 - should pass

[ ] Phase 4: Fix bioRxiv references
    [ ] Update orchestrator_magentic.py task prompt
    [ ] Update magentic_agents.py descriptions
    [ ] Update app.py UI text
    [ ] Run test 1.3 - should pass
    [ ] Run grep to verify no remaining refs

[ ] Final Verification
    [ ] make check passes
    [ ] All tests pass (108+)
    [ ] Manual test: run_magentic.py produces readable report
```

---

## Test Commands

```bash
# Run specific test file
uv run pytest tests/unit/test_orchestrator_magentic.py -v

# Run all tests
uv run pytest tests/unit/ -v

# Full check
make check

# Manual integration test
set -a && source .env && set +a
uv run python examples/orchestrator_demo/run_magentic.py "metformin alzheimer"
```

---

## Success Criteria

1. `run_magentic.py` outputs a readable research report (not `<ChatMessage object>`)
2. Report includes: Executive Summary, Key Findings, Drug Candidates, References
3. No "Max round count reached" error with default settings
4. No "bioRxiv" references anywhere in codebase
5. All 108+ tests pass
6. `make check` passes

---

## Files Modified

```
src/
├── orchestrator_magentic.py       # ChatMessage fix, logging
├── agents/magentic_agents.py      # bioRxiv → Europe PMC
└── app.py                         # bioRxiv → Europe PMC

tests/unit/
└── test_orchestrator_magentic.py  # NEW: 3 tests
```

---

## Notes for AI Agent

When implementing this fix plan:

1. **DO NOT** create mock data or fake responses
2. **DO** write real tests that verify actual behavior
3. **DO** run `make check` after each phase
4. **DO** test with a real OpenAI API key via `.env`
5. **DO** preserve existing functionality - simple mode must still work
6. **DO NOT** over-engineer - minimal changes to fix the specific bugs
docs/bugs/FIX_UI_SIMPLIFICATION.md
DELETED
# UI Simplification: Remove API Provider Dropdown

**Issues**: #52, #53
**Priority**: P1 - UX improvement for hackathon demo
**Estimated Time**: 30 minutes
**Senior Review**: ✅ Approved with changes (incorporated below)

---

## Problem

The current UI has confusing BYOK (Bring Your Own Key) settings:

1. **Provider dropdown is misleading** - Shows "openai" but actually uses free tier when no key
2. **Examples table shows useless columns** - API Key (empty), Provider (ignored)
3. **Anthropic doesn't work with Advanced mode** - Only OpenAI has `agent-framework` support

## Solution

Remove the `api_provider` dropdown entirely. Auto-detect the provider from the key prefix.

**Functionality preserved:**
- Simple mode: Free tier, OpenAI, OR Anthropic (all work)
- Advanced mode: OpenAI only (Magentic multi-agent requires `OpenAIChatClient`)

---

## Implementation

### File: `src/app.py`

#### Change 1: Update `configure_orchestrator()` signature (lines 23-28)

```python
# BEFORE
def configure_orchestrator(
    use_mock: bool = False,
    mode: str = "simple",
    user_api_key: str | None = None,
    api_provider: str = "openai",  # ← REMOVE
) -> tuple[Any, str]:

# AFTER
def configure_orchestrator(
    use_mock: bool = False,
    mode: str = "simple",
    user_api_key: str | None = None,
) -> tuple[Any, str]:
```

#### Change 2: Update docstring (lines 29-40)

```python
# AFTER
"""
Create an orchestrator instance.

Args:
    use_mock: If True, use MockJudgeHandler (no API key needed)
    mode: Orchestrator mode ("simple" or "advanced")
    user_api_key: Optional user-provided API key (BYOK) - auto-detects provider

Returns:
    Tuple of (Orchestrator instance, backend_name)
"""
```

#### Change 3: Replace provider logic with auto-detection (lines 62-88)

```python
# BEFORE (lines 62-88) - complex provider checking with api_provider param

# AFTER - auto-detect from key prefix
# 2. Paid API Key (User provided or Env)
elif user_api_key and user_api_key.strip():
    # Auto-detect provider from key prefix
    model: AnthropicModel | OpenAIModel
    if user_api_key.startswith("sk-ant-"):
        # Anthropic key
        anthropic_provider = AnthropicProvider(api_key=user_api_key)
        model = AnthropicModel(settings.anthropic_model, provider=anthropic_provider)
        backend_info = "Paid API (Anthropic)"
    elif user_api_key.startswith("sk-"):
        # OpenAI key
        openai_provider = OpenAIProvider(api_key=user_api_key)
        model = OpenAIModel(settings.openai_model, provider=openai_provider)
        backend_info = "Paid API (OpenAI)"
    else:
        raise ValueError(
            "Invalid API key format. Expected sk-... (OpenAI) or sk-ant-... (Anthropic)"
        )
    judge_handler = JudgeHandler(model=model)

# 3. Environment API Keys (fallback)
elif os.getenv("OPENAI_API_KEY"):
    judge_handler = JudgeHandler(model=None)  # Uses env key
    backend_info = "Paid API (OpenAI from env)"

elif os.getenv("ANTHROPIC_API_KEY"):
    judge_handler = JudgeHandler(model=None)  # Uses env key
    backend_info = "Paid API (Anthropic from env)"

# 4. Free Tier (HuggingFace Inference)
else:
    judge_handler = HFInferenceJudgeHandler()
    backend_info = "Free Tier (Llama 3.1 / Mistral)"
```

#### Change 4: Update `research_agent()` signature (lines 105-111)

```python
# BEFORE
async def research_agent(
    message: str,
    history: list[dict[str, Any]],
    mode: str = "simple",
    api_key: str = "",
    api_provider: str = "openai",  # ← REMOVE
) -> AsyncGenerator[str, None]:

# AFTER
async def research_agent(
    message: str,
    history: list[dict[str, Any]],
    mode: str = "simple",
    api_key: str = "",
) -> AsyncGenerator[str, None]:
```

#### Change 5: Update docstring (lines 112-124)

```python
# AFTER
"""
Gradio chat function that runs the research agent.

Args:
    message: User's research question
    history: Chat history (Gradio format)
    mode: Orchestrator mode ("simple" or "advanced")
    api_key: Optional user-provided API key (BYOK - auto-detects provider)

Yields:
    Markdown-formatted responses for streaming
"""
```

#### Change 6: Fix Advanced mode check (line 139)

```python
# BEFORE
if mode == "advanced" and not (has_openai or (has_user_key and api_provider == "openai")):

# AFTER - auto-detect OpenAI key from prefix
is_openai_user_key = user_api_key and user_api_key.startswith("sk-") and not user_api_key.startswith("sk-ant-")
if mode == "advanced" and not (has_openai or is_openai_user_key):
    yield (
        "⚠️ **Advanced mode requires OpenAI API key.** "
        "Anthropic keys only work in Simple mode. Falling back to Simple.\n\n"
    )
    mode = "simple"
```

#### Change 7: Remove premature "Using your key" message (lines 146-151)

```python
# BEFORE - uses api_provider which no longer exists
if has_user_key:
    yield (
        f"🔑 **Using your {api_provider.upper()} API key** - "
        "Your key is used only for this session and is never stored.\n\n"
    )

# AFTER - remove this block entirely
# The backend_name from configure_orchestrator already shows "Paid API (OpenAI)" or "Paid API (Anthropic)"
# No need for duplicate messaging
```

#### Change 8: Update configure_orchestrator call (lines 165-170)

```python
# BEFORE
orchestrator, backend_name = configure_orchestrator(
    use_mock=False,
    mode=mode,
    user_api_key=user_api_key,
    api_provider=api_provider,  # ← REMOVE
)

# AFTER
orchestrator, backend_name = configure_orchestrator(
    use_mock=False,
    mode=mode,
    user_api_key=user_api_key,
)
```

#### Change 9: Simplify examples (lines 210-229)

```python
# BEFORE - 4 items per example
examples=[
    ["What drugs improve female libido post-menopause?", "simple", "", "openai"],
    ["Clinical trials for erectile dysfunction alternatives to PDE5 inhibitors?", "simple", "", "openai"],
    ["Evidence for testosterone therapy in women with HSDD?", "simple", "", "openai"],
],

# AFTER - 2 items per example (query, mode) - API key always empty in examples
examples=[
    ["What drugs improve female libido post-menopause?", "simple"],
    ["Clinical trials for ED alternatives to PDE5 inhibitors?", "simple"],
    ["Evidence for testosterone therapy in women with HSDD?", "simple"],
],
```

#### Change 10: Update additional_inputs (lines 231-252)

```python
# BEFORE - 3 inputs (mode, api_key, api_provider)
additional_inputs=[
    gr.Radio(
        choices=["simple", "advanced"],
        value="simple",
        label="Orchestrator Mode",
        info="Simple: Linear (Free Tier Friendly) | Advanced: Multi-Agent (Requires OpenAI)",
    ),
    gr.Textbox(
        label="🔑 API Key (Optional - BYOK)",
        placeholder="sk-... or sk-ant-...",
        type="password",
        info="Enter your own API key. Never stored.",
    ),
    gr.Radio(  # ← REMOVE THIS ENTIRE BLOCK
        choices=["openai", "anthropic"],
        value="openai",
        label="API Provider",
        info="Select the provider for your API key",
    ),
],

# AFTER - 2 inputs (mode, api_key)
additional_inputs=[
    gr.Radio(
        choices=["simple", "advanced"],
        value="simple",
        label="Orchestrator Mode",
        info="Simple: Works with any key or free tier | Advanced: Requires OpenAI key",
    ),
    gr.Textbox(
        label="🔑 API Key (Optional)",
        placeholder="sk-... (OpenAI) or sk-ant-... (Anthropic)",
        type="password",
        info="Leave empty for free tier. Auto-detects provider from key prefix.",
    ),
],
```

#### Change 11: Update accordion label (line 230)

```python
# BEFORE
additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),

# AFTER
additional_inputs_accordion=gr.Accordion(label="⚙️ Settings (Free tier works without API key)", open=False),
```

---

## Testing Checklist

### Manual Tests
- [ ] **No key**: Shows "Free Tier (Llama 3.1 / Mistral)" in backend
- [ ] **OpenAI key (sk-...)**: Shows "Paid API (OpenAI)" in backend
- [ ] **Anthropic key (sk-ant-...)**: Shows "Paid API (Anthropic)" in backend
- [ ] **Invalid key format**: Shows error message
- [ ] **Anthropic key + Advanced mode**: Falls back to Simple with warning
- [ ] **OpenAI key + Advanced mode**: Uses full Magentic multi-agent
- [ ] **Examples table**: Shows only 2 columns (query, mode)
- [ ] **MCP server**: Still accessible at `/gradio_api/mcp/`

### Unit Test Updates
- [ ] `tests/unit/test_app_smoke.py` - may need update if checking input count

---

## Definition of Done

- [ ] `api_provider` parameter removed from `configure_orchestrator()`
- [ ] `api_provider` parameter removed from `research_agent()`
- [ ] Auto-detection logic works for `sk-` and `sk-ant-` prefixes
- [ ] Advanced mode check uses auto-detection (not removed param)
- [ ] "Using your X key" message removed (backend_name handles this)
- [ ] Examples table shows 2 columns
- [ ] Accordion label updated
- [ ] Placeholder text shows both key formats
- [ ] All existing tests pass
- [ ] MCP server still works

---

## Mode Compatibility Matrix (Unchanged)

| Mode | No Key | OpenAI Key | Anthropic Key |
|------|--------|------------|---------------|
| **Simple** | ✅ Free tier | ✅ GPT-5.1 | ✅ Claude Sonnet 4.5 |
| **Advanced** | ⚠️ Falls back | ✅ Full Magentic | ⚠️ Falls back to Simple |

---

## Related
- Issue #52: UI Polish - Examples table confusion
- Issue #53: API Provider Simplification
- Senior Review: Approved 2025-11-28
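The prefix-detection logic from Changes 3 and 6 can be isolated into one small helper, which makes the ordering subtlety explicit: `sk-ant-` keys also start with `sk-`, so Anthropic must be checked first. The function name `detect_provider` is illustrative; the plan above inlines this logic in `src/app.py` rather than extracting it:

```python
def detect_provider(api_key: str) -> str:
    """Auto-detect the API provider from the key prefix.

    Order matters: "sk-ant-" also starts with "sk-", so the Anthropic
    check must come before the OpenAI check.
    """
    key = api_key.strip()
    if key.startswith("sk-ant-"):
        return "anthropic"
    if key.startswith("sk-"):
        return "openai"
    raise ValueError(
        "Invalid API key format. Expected sk-... (OpenAI) or sk-ant-... (Anthropic)"
    )
```

Swapping the two checks would silently classify every Anthropic key as OpenAI, which is exactly the class of bug the manual test "Anthropic key (sk-ant-...): Shows Paid API (Anthropic)" is meant to catch.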
docs/bugs/INVESTIGATION_INVALID_MODELS.md
DELETED
# Bug Investigation: Invalid Default LLM Models

## Status
- **Date:** 2025-11-29
- **Reporter:** CLI User
- **Component:** `src/utils/config.py`
- **Priority:** High (Magentic Mode Blocker)
- **Resolution:** FIXED

## Issue Description
The user encountered a 403 error when running in Magentic mode:
`Error code: 403 - {'error': {'message': 'Project ... does not have access to model gpt-5', ... 'code': 'model_not_found'}}`

## Root Cause Analysis
OpenAI deprecated the base `gpt-5` model. Tier 5 accounts now have access to:
- `gpt-5.1` (current flagship)
- `gpt-5-mini`
- `gpt-5-nano`
- `gpt-4.1`, `gpt-4.1-mini`, `gpt-4.1-nano`
- `o3`, `o4-mini`

The base `gpt-5` is NO LONGER available via the API.

## Solution Implemented
Updated `src/utils/config.py` to use:
- `openai_model`: `gpt-5.1` (the actual current model)
- `anthropic_model`: `claude-sonnet-4-5-20250929` (unchanged)

## Verification
- `tests/unit/agent_factory/test_judges_factory.py` updated and passed.
- User confirmed Tier 5 access to `gpt-5.1` via the OpenAI dashboard.
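The updated defaults might look like the following. This is a sketch only: the field names match the doc, but the actual `src/utils/config.py` settings class (and whether it uses a plain dataclass or a settings library) is not shown in this PR:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Illustrative settings holder with the corrected model defaults."""
    # Base "gpt-5" was removed from the API; "gpt-5.1" is the current flagship.
    openai_model: str = "gpt-5.1"
    # Unchanged by this fix.
    anthropic_model: str = "claude-sonnet-4-5-20250929"


settings = Settings()
```

Pinning the defaults here is what the updated `test_judges_factory.py` assertions would guard against regressing.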
docs/bugs/INVESTIGATION_QUOTA_BLOCKER.md
DELETED
@@ -1,49 +0,0 @@
-# Bug Investigation: HF Free Tier Quota Exhaustion
-
-## Status
-- **Date:** 2025-11-29
-- **Reporter:** CLI User
-- **Component:** `HFInferenceJudgeHandler`
-- **Priority:** High (UX Blocker for Free Tier)
-- **Resolution:** FIXED
-
-## Issue Description
-On a fresh run with a simple query ("What drugs improve female libido post-menopause?"), the system retrieved 20 valid sources but failed during the Judge/Analysis phase with:
-`⚠️ Free Tier Quota Exceeded ⚠️`
-
-This results in a "Synthesis" step that has 0 candidates and 0 findings, rendering the application useless for free users once the (very low) limit is hit, despite having valid search results.
-
-## Evidence
-Output provided:
-```text
-### Citations (20 sources)
-...
-### Reasoning
-⚠️ **Free Tier Quota Exceeded** ⚠️
-```
-
-## Root Cause Analysis
-1. **Search Success:** `SearchAgent` correctly found 20 documents (PubMed/EuropePMC).
-2. **Judge Failure:** `HFInferenceJudgeHandler` called the HF Inference API.
-3. **Quota Trap:** The API returned a 402 (Payment Required) or quota error.
-4. **Previous Handling:** The handler caught this error and returned a `JudgeAssessment` with `sufficient=True` (to stop the loop) and *empty* fields.
-5. **Data Loss:** The 20 valid search results were effectively discarded from the "Analysis" perspective.
-
-## The "Deep Blocker"
-The system had a "hard failure" mode for quota exhaustion, assuming that if the LLM can't judge, we have *no* useful information. This "bricked" the UX for free users immediately upon hitting the limit.
-
-## Solution Implemented
-Modified `HFInferenceJudgeHandler._create_quota_exhausted_assessment` to:
-1. Accept the `evidence` list as an argument.
-2. Perform basic heuristic extraction (borrowed from `MockJudgeHandler` logic):
-   - Use titles as "Key Findings" (first 5 sources).
-   - Add a clear message in "Drug Candidates" telling the user to upgrade.
-3. Return this "Partial" assessment instead of an empty one.
-
-## Verification
-- Created `tests/unit/agent_factory/test_judges_hf_quota.py` to verify that:
-  - 402 errors are caught.
-  - `sufficient` is set to `True` (stops loop).
-  - `key_findings` are populated from search result titles.
-  - `reasoning` contains the warning message.
-- Ran existing tests `tests/unit/agent_factory/test_judges_hf.py` - all passed.
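The partial-assessment fallback described in the deleted document above can be sketched as follows. The `Evidence` and `JudgeAssessment` shapes are simplified assumptions; the real models live in `src/utils/models.py`:

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    title: str


@dataclass
class JudgeAssessment:
    sufficient: bool
    key_findings: list = field(default_factory=list)
    drug_candidates: list = field(default_factory=list)
    reasoning: str = ""


def create_quota_exhausted_assessment(evidence: list) -> JudgeAssessment:
    """Return a partial assessment on HTTP 402 instead of an empty one."""
    return JudgeAssessment(
        sufficient=True,  # stop the loop; retrying would hit the quota again
        key_findings=[e.title for e in evidence[:5]],  # titles as findings
        drug_candidates=["Upgrade to a paid API key for LLM-ranked candidates."],
        reasoning="⚠️ Free Tier Quota Exceeded ⚠️ Showing raw search results.",
    )


assessment = create_quota_exhausted_assessment(
    [Evidence(f"Paper {i}") for i in range(20)]
)
```

The point of the design is that the 20 search results survive into the report even when no LLM call is possible.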
docs/bugs/P0_CRITICAL_BUGS.md
DELETED
@@ -1,43 +0,0 @@
-# P0 Critical Bugs - DeepBoner Demo Broken
-
-**Date**: 2025-11-28
-**Status**: RESOLVED (2025-11-29)
-**Priority**: P0 - Blocking hackathon submission
-
----
-
-## Summary
-
-The Gradio demo was non-functional due to 4 critical bugs. All have been fixed and verified.
-
----
-
-## Bug 1: Free Tier LLM Quota Exhausted (P0) - FIXED
-
-**Resolution**:
-- Implemented `QuotaExhaustedError` detection in `HFInferenceJudgeHandler`.
-- The agent now gracefully stops and displays a clear "Free Tier Quota Exceeded" message instead of looping infinitely.
-
-## Bug 2: Evidence Counter Shows 0 After Dedup (P1) - FIXED
-
-**Resolution**:
-- Fixed by resolving Bug 4 (Data Leak). Deduplication now works correctly on isolated per-request collections.
-
-## Bug 3: API Key Not Passed to Advanced Mode (P0) - FIXED
-
-**Resolution**:
-- Plumbed `api_key` from the UI through `configure_orchestrator` -> `create_orchestrator` -> `MagenticOrchestrator`.
-- Magentic agents now correctly use the user-provided OpenAI key.
-
-## Bug 4: Singleton EmbeddingService Causes Cross-Session Pollution (P0) - FIXED
-
-**Resolution**:
-- Removed the singleton pattern for `EmbeddingService`.
-- Each request now gets a fresh `EmbeddingService` with a unique, isolated ChromaDB collection (`evidence_{uuid}`).
-- `SentenceTransformer` model is lazily cached globally to maintain performance.
-
----
-
-## Verification
-
-Run `make check` to verify all tests pass.
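The Bug 4 resolution above - per-request isolation with a shared model cache - can be sketched like this. The class and function names are illustrative stand-ins, not the actual `src/services/embeddings.py` API:

```python
import uuid
from functools import lru_cache


@lru_cache(maxsize=1)
def _get_model() -> object:
    # Stand-in for the lazily cached SentenceTransformer load: expensive,
    # read-only, and therefore safe to share across requests.
    return object()


class EmbeddingService:
    """Fresh per request; no module-level singleton to leak state between users."""

    def __init__(self) -> None:
        self.model = _get_model()
        # A UUID-suffixed collection name isolates each request's evidence,
        # so earlier sessions can no longer mark new sources as duplicates.
        self.collection_name = f"evidence_{uuid.uuid4().hex}"


a, b = EmbeddingService(), EmbeddingService()
```

Two constructions share the cached model but never share a collection, which is exactly the split that fixes Bugs 2 and 4 together.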
docs/bugs/P1_GRADIO_SETTINGS_CLEANUP.md
DELETED
@@ -1,81 +0,0 @@
-# P1 Bug: Gradio Settings Accordion Not Collapsing
-
-**Priority**: P1 (UX Bug)
-**Status**: OPEN
-**Date**: 2025-11-27
-**Target Component**: `src/app.py`
-
----
-
-## 1. Problem Description
-
-The "Settings" accordion in the Gradio UI (containing Orchestrator Mode, API Key, Provider) fails to collapse, even when configured with `open=False`. It remains permanently expanded, cluttering the interface and obscuring the chat history.
-
-### Symptoms
-- Accordion arrow toggles visually, but content remains visible.
-- Occurs in both local development (`uv run src/app.py`) and HuggingFace Spaces.
-
----
-
-## 2. Root Cause Analysis
-
-**Definitive Cause**: Nested `Blocks` Context Bug.
-`gr.ChatInterface` is itself a high-level abstraction that creates a `gr.Blocks` context. Wrapping `gr.ChatInterface` inside an external `with gr.Blocks():` context causes event listener conflicts, specifically breaking the JavaScript state management for `additional_inputs_accordion`.
-
-**Reference**: [Gradio Issue #8861](https://github.com/gradio-app/gradio/issues/8861) confirms that `additional_inputs_accordion` malfunctions when `ChatInterface` is not the top-level block.
-
----
-
-## 3. Solution Strategy: "The Unwrap Fix"
-
-We will remove the redundant `gr.Blocks` wrapper. This restores the native behavior of `ChatInterface`, ensuring the accordion respects `open=False`.
-
-### Implementation Plan
-
-**Refactor `src/app.py` / `create_demo()`**:
-
-1. **Remove** the `with gr.Blocks() as demo:` context manager.
-2. **Instantiate** `gr.ChatInterface` directly as the `demo` object.
-3. **Migrate UI Elements**:
-   * **Header**: Move the H1/Title text into the `title` parameter of `ChatInterface`.
-   * **Footer**: Move the footer text ("MCP Server Active...") into the `description` parameter. `ChatInterface` supports Markdown in `description`, making it the ideal place for static info below the title but above the chat.
-
-### Before (Buggy)
-```python
-def create_demo():
-    with gr.Blocks() as demo:  # <--- CAUSE OF BUG
-        gr.Markdown("# Title")
-        gr.ChatInterface(..., additional_inputs_accordion=gr.Accordion(open=False))
-        gr.Markdown("Footer")
-    return demo
-```
-
-### After (Correct)
-```python
-def create_demo():
-    return gr.ChatInterface(  # <--- FIX: Top-level component
-        ...,
-        title="🧬 DeepBoner",
-        description="*AI-Powered Drug Repurposing Agent...*\n\n---\n**MCP Server Active**...",
-        additional_inputs_accordion=gr.Accordion(label="⚙️ Settings", open=False),
-    )
-```
-
----
-
-## 4. Validation
-
-1. **Run**: `uv run python src/app.py`
-2. **Check**: Open `http://localhost:7860`
-3. **Verify**:
-   * Settings accordion starts **COLLAPSED**.
-   * Header title ("DeepBoner") is visible.
-   * Footer text ("MCP Server Active") is visible in the description area.
-   * Chat functionality works (Magentic/Simple modes).
-
----
-
-## 5. Constraints & Notes
-
-- **Layout**: We lose the ability to place arbitrary elements *below* the chat box (the footer will move to the top, under the title), but this is an acceptable trade-off for a working UI.
-- **CSS**: `ChatInterface` handles its own CSS; any custom class styling from the previous footer will be standardized to the description text style.
docs/bugs/P1_MAGENTIC_STREAMING_AND_KEY_PERSISTENCE.md
DELETED
@@ -1,181 +0,0 @@
-# Bug Report: Magentic Mode Integration Issues
-
-## Status
-- **Date:** 2025-11-29
-- **Reporter:** CLI User
-- **Priority:** P1 (UX Degradation + Deprecation Warnings)
-- **Component:** `src/app.py`, `src/orchestrator_magentic.py`, `src/utils/llm_factory.py`
-- **Status:** ✅ FIXED (Bug 1 & Bug 2) - 2025-11-29
-- **Tests:** 138 passing (136 original + 2 new validation tests)
-
----
-
-## Bug 1: Token-by-Token Streaming Spam ✅ FIXED
-
-### Symptoms
-When running Magentic (Advanced) mode, the UI shows hundreds of individual lines like:
-```text
-📡 STREAMING: Below
-📡 STREAMING: is
-📡 STREAMING: a
-📡 STREAMING: curated
-📡 STREAMING: list
-...
-```
-
-Each token is displayed as a separate streaming event, creating visual spam and making it impossible to read the output until completion.
-
-### Root Cause (VALIDATED)
-**File:** `src/orchestrator_magentic.py:247-254`
-
-```python
-elif isinstance(event, MagenticAgentDeltaEvent):
-    if event.text:
-        return AgentEvent(
-            type="streaming",
-            message=event.text,  # Single token!
-            data={"agent_id": event.agent_id},
-            iteration=iteration,
-        )
-```
-
-Every LLM token emits a `MagenticAgentDeltaEvent`, which creates an `AgentEvent(type="streaming")`.
-
-**File:** `src/app.py:171-192` (BEFORE FIX)
-
-```python
-async for event in orchestrator.run(message):
-    event_md = event.to_markdown()
-    response_parts.append(event_md)  # Appends EVERY token
-
-    if event.type == "complete":
-        yield event.message
-    else:
-        yield "\n\n".join(response_parts)  # Yields ALL accumulated tokens
-```
-
-For N tokens, this yields N times, each time showing all previous tokens. This is O(N²) string operations and creates massive visual spam.
-
-### Fix Applied
-**File:** `src/app.py:175-204`
-
-Implemented streaming token buffering with live updates:
-1. Added `streaming_buffer = ""` to accumulate tokens
-2. For each streaming event: append to buffer, yield immediately (for live typing UX)
-3. **Key fix**: Don't append streaming events to `response_parts` (prevents O(N²) list growth)
-4. Each yield has only ONE `📡 STREAMING:` line (the accumulated buffer)
-5. Flush buffer to `response_parts` only when a non-streaming event occurs
-
-**Result**: Live typing feel preserved, but no visual spam (each update replaces, not accumulates)
-
-### Proposed Fix Options
-
-**Option A: Buffer streaming tokens (recommended)**
-```python
-# In app.py - accumulate streaming tokens, yield periodically
-streaming_buffer = ""
-last_yield_time = time.time()
-
-async for event in orchestrator.run(message):
-    if event.type == "streaming":
-        streaming_buffer += event.message
-        # Only yield every 500ms or on newline
-        if time.time() - last_yield_time > 0.5 or "\n" in event.message:
-            yield f"📡 {streaming_buffer}"
-            last_yield_time = time.time()
-    elif event.type == "complete":
-        yield event.message
-    else:
-        # Non-streaming events
-        response_parts.append(event.to_markdown())
-        yield "\n\n".join(response_parts)
-```
-
-**Option B: Don't yield streaming events at all**
-```python
-# In app.py - only yield meaningful events
-async for event in orchestrator.run(message):
-    if event.type == "streaming":
-        continue  # Skip token-by-token spam
-    # ... rest of logic
-```
-
-**Option C: Fix at orchestrator level**
-Don't emit `AgentEvent` for every delta - buffer in `_process_event`.
-
----
-
-## Bug 2: API Key Does Not Persist in Textbox ✅ FIXED
-
-### Symptoms
-1. User opens the "Mode & API Key" accordion
-2. User pastes their API key into the password textbox
-3. User clicks an example OR clicks elsewhere
-4. The API key textbox is now empty - value lost
-
-### Root Cause (VALIDATED)
-**File:** `src/app.py:255-267` (BEFORE FIX)
-
-```python
-additional_inputs_accordion=additional_inputs_accordion,
-additional_inputs=[
-    gr.Radio(...),
-    gr.Textbox(
-        label="🔑 API Key (Optional)",
-        type="password",
-        # No `value` parameter - defaults to empty
-        # No state persistence mechanism
-    ),
-],
-```
-
-Gradio's `ChatInterface` with `additional_inputs` has known issues:
-1. Clicking examples resets additional inputs to defaults
-2. The accordion state and input values may not persist correctly
-3. No explicit state management for the API key
-
-### Fix Applied
-**Files Modified:**
-1. `src/app.py`
-2. `src/utils/llm_factory.py`
-
-**Bug 1 (Streaming Spam):**
-- Accumulate tokens in `streaming_buffer`
-- Yield updates immediately for live typing UX
-- **Key**: Don't append to `response_parts` until the stream segment is complete
-- Each yield has ONE `📡 STREAMING:` line (not N accumulated lines)
-
-**Bug 2 (API Key Persistence):**
-- **Strategy:** Partial example list (relies on Gradio behavior)
-- Examples have only 2 elements `[message, mode]` instead of 4
-- Gradio only updates inputs with corresponding example values
-- Remaining inputs (api_key textbox) are left unchanged
-- `api_key_state` parameter exists as fallback but may be redundant
-- **Note:** This is a workaround relying on undocumented Gradio behavior
-
-**Bug 3 (OpenAIModel Deprecation):** ✅ FIXED
-- Replaced all `OpenAIModel` imports with `OpenAIChatModel` in `src/app.py` and `src/utils/llm_factory.py`.
-
-### Test Results
-```bash
-uv run pytest tests/ -q
-============================= 138 passed in 20.60s =============================
-```
-
-**Status:** ✅ All tests passing
-
-### Why This Fix Works
-
-**Bug 1 (Streaming Spam):**
-- **Before:** Every token → `append()` to list → `yield` → List grew to size N → O(N²) complexity.
-- **After:** Every token → `yield` dynamically constructed string (buffer + history) → List stays size K (number of *events*).
-- **Impact:** Smooth streaming, no visual spam, no browser freeze.
-
-**Bug 2 (API Key):**
-- **Before:** Example click → Overwrote API Key textbox with `""`.
-- **After:** Example click → Updates only `message` and `mode` → API Key textbox untouched.
-- **Impact:** User input persists naturally.
-
-### Remaining Work
-- **Bug 4 (Asyncio GC errors):** Monitoring only - likely Gradio/HF Spaces issue
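The "partial example list" workaround from Bug 2 above relies on Gradio updating only the inputs for which an example supplies a value. Since that behavior is undocumented, a plain-Python simulation of the assumed semantics makes the mechanism concrete (`apply_example` is a hypothetical stand-in, not a Gradio API):

```python
def apply_example(current: dict, example: list, input_order: list) -> dict:
    """Assumed Gradio semantics: inputs paired with an example value are
    updated; inputs beyond the example's length keep their current value."""
    updated = dict(current)
    for name, value in zip(input_order, example):
        updated[name] = value
    return updated


inputs = {"message": "", "mode": "simple", "api_key": "sk-user-key"}
# The fix ships 2-element examples [message, mode] - no slot for api_key,
# so an example click leaves the password textbox untouched.
inputs = apply_example(
    inputs,
    ["What drugs improve libido?", "advanced"],
    ["message", "mode", "api_key"],
)
```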
docs/bugs/P3_MAGENTIC_NO_TERMINATION_EVENT.md
ADDED
@@ -0,0 +1,177 @@
+# P3 Bug Report: Advanced Mode Missing Termination Guarantee
+
+## Status
+- **Date:** 2025-11-29
+- **Priority:** P3 (Edge case, but confusing UX)
+- **Component:** `src/orchestrator_magentic.py`
+- **Resolution:** Fixed (Guarantee termination event)
+
+---
+
+## Symptoms
+
+In **Advanced (Magentic) mode** with an OpenAI API key:
+
+1. Workflow runs for many iterations (up to 10 rounds)
+2. Agents search, judge, hypothesize repeatedly
+3. Eventually... **nothing happens**
+   - No "complete" event
+   - No error message
+   - UI just stops updating
+
+**User perception:** "Did it finish? Did it crash? What happened?"
+
+### Observed Behavior
+
+When the workflow hits `max_round_count=10`:
+- The `workflow.run_stream(task)` iterator ends
+- NO `MagenticFinalResultEvent` is emitted by agent-framework
+- Our code yields nothing after the loop
+- The user is left hanging
+
+---
+
+## Root Cause Analysis
+
+### Code Path (`src/orchestrator_magentic.py:170-186`)
+
+```python
+iteration = 0
+try:
+    async for event in workflow.run_stream(task):
+        agent_event = self._process_event(event, iteration)
+        if agent_event:
+            if isinstance(event, MagenticAgentMessageEvent):
+                iteration += 1
+            yield agent_event
+    # BUG: NO FALLBACK HERE!
+    # If loop ends without FinalResultEvent, user sees nothing
+
+except Exception as e:
+    logger.error("Magentic workflow failed", error=str(e))
+    yield AgentEvent(
+        type="error",
+        message=f"Workflow error: {e!s}",
+        iteration=iteration,
+    )
+# BUG: NO FINALLY BLOCK TO GUARANTEE TERMINATION EVENT
+```
+
+### Workflow Configuration (`src/orchestrator_magentic.py:110-116`)
+
+```python
+.with_standard_manager(
+    chat_client=manager_client,
+    max_round_count=self._max_rounds,  # 10 - can hit this limit
+    max_stall_count=3,  # If agents repeat 3x
+    max_reset_count=2,  # Workflow reset limit
+)
+```
+
+### Failure Modes
+
+| Scenario | What Happens | User Sees |
+|----------|--------------|-----------|
+| `MagenticFinalResultEvent` emitted | `_process_event` yields "complete" | Final report |
+| Max rounds (10) reached, no final event | Loop ends silently | **Nothing** |
+| `max_stall_count` triggered | Workflow ends | **Nothing** |
+| `max_reset_count` triggered | Workflow ends | **Nothing** |
+| OpenAI API error | Exception caught | Error message |
+
+---
+
+## The Fix
+
+Add a guaranteed termination event after the loop:
+
+```python
+iteration = 0
+final_event_received = False
+
+try:
+    async for event in workflow.run_stream(task):
+        agent_event = self._process_event(event, iteration)
+        if agent_event:
+            if isinstance(event, MagenticAgentMessageEvent):
+                iteration += 1
+            if agent_event.type == "complete":
+                final_event_received = True
+            yield agent_event
+
+except Exception as e:
+    logger.error("Magentic workflow failed", error=str(e))
+    yield AgentEvent(
+        type="error",
+        message=f"Workflow error: {e!s}",
+        iteration=iteration,
+    )
+    final_event_received = True  # Error is a form of termination
+
+finally:
+    # GUARANTEE: Always emit termination event
+    if not final_event_received:
+        logger.warning(
+            "Workflow ended without final event",
+            iterations=iteration,
+        )
+        yield AgentEvent(
+            type="complete",
+            message=(
+                f"Research completed after {iteration} agent rounds. "
+                "Max iterations reached - results may be partial. "
+                "Try a more specific query for better results."
+            ),
+            data={"iterations": iteration, "reason": "max_rounds_reached"},
+            iteration=iteration,
+        )
+```
+
+---
+
+## Alternative: Increase Max Rounds
+
+The default `max_rounds=10` might be too low for complex queries.
+
+In `src/orchestrator_factory.py:52-53`:
+```python
+return orchestrator_cls(
+    max_rounds=config.max_iterations if config else 10,  # Could increase to 15-20
+    api_key=api_key,
+)
+```
+
+**Trade-off:** More rounds = more API cost, but better chance of complete results.
+
+---
+
+## Test Plan
+
+- [ ] Add fallback yield after async for loop
+- [ ] Add `final_event_received` flag tracking
+- [ ] Log warning when fallback is used
+- [ ] Test with `max_rounds=2` to force hitting limit
+- [ ] Verify user always sees termination event
+- [ ] `make check` passes
+
+---
+
+## Related Files
+
+- `src/orchestrator_magentic.py` - Main fix location
+- `src/orchestrator_factory.py` - Max rounds configuration
+- `src/utils/models.py` - AgentEvent types
+- `docs/bugs/P2_MAGENTIC_THINKING_STATE.md` - Related UX issue (implemented)
+
+---
+
+## Priority Justification
+
+**P3** because:
+- Advanced mode is working for most queries
+- Only hits an edge case when max rounds are reached without synthesis
+- The user CAN retry with a different query
+- Not blocking the hackathon demo (free tier Simple mode works)
+
+Would be P2 if:
+- This happened frequently
+- No workaround existed
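The test plan in the file above can be exercised without the real workflow by driving the same try/finally shape with a stub stream that ends without a final event. This is a minimal sketch in the spirit of `tests/unit/test_magentic_termination.py`; the `AgentEvent` shape and function names are simplified assumptions:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentEvent:
    type: str
    message: str = ""
    iteration: int = 0
    data: dict = field(default_factory=dict)


async def run(stream):
    """Mirror the fix: re-yield events, then emit a fallback 'complete'
    from the finally block if the stream ended without one."""
    final_event_received = False
    iteration = 0
    try:
        async for event in stream:
            if event.type == "complete":
                final_event_received = True
            iteration += 1
            yield event
    finally:
        if not final_event_received:
            yield AgentEvent(
                type="complete",
                message=f"Research completed after {iteration} agent rounds.",
                data={"iterations": iteration, "reason": "max_rounds_reached"},
                iteration=iteration,
            )


async def exhausted_stream():
    """Simulates max_round_count exhaustion: no final result event."""
    yield AgentEvent(type="streaming", message="token")


async def collect():
    return [e async for e in run(exhausted_stream())]


events = asyncio.run(collect())
```

The invariant under test is simply that the last event is always a `"complete"`, whatever the stream did.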
docs/bugs/SENIOR_AGENT_AUDIT_PROMPT.md
DELETED
@@ -1,247 -0,0 @@
-# Senior Agent Audit Request: DeepBoner Codebase Bug Hunt
-
-**Date**: 2025-11-28
-**Requesting Agent**: Claude (Opus)
-**Purpose**: Comprehensive bug audit and verification of P0_CRITICAL_BUGS.md
-
----
-
-## Your Mission
-
-You are a senior software engineer performing a comprehensive audit of the DeepBoner codebase. Your goals:
-
-1. **VERIFY** the 4 bugs documented in `docs/bugs/P0_CRITICAL_BUGS.md` are accurately described
-2. **FIND** any additional bugs (P0-P4) that could affect the demo
-3. **TRACE** the complete code paths for Simple and Advanced modes
-4. **IDENTIFY** any silent failures, race conditions, or edge cases
-
----
-
-## Context: What DeepBoner Does
-
-DeepBoner is a Gradio-based biomedical research agent that:
-1. Takes a research question from the user
-2. Searches PubMed, ClinicalTrials.gov, Europe PMC
-3. Uses an LLM "judge" to evaluate if evidence is sufficient
-4. Either loops for more evidence or synthesizes a final report
-
-**Two Modes**:
-- **Simple**: Linear orchestrator with search → judge → report loop
-- **Advanced**: Magentic multi-agent with SearchAgent, JudgeAgent, HypothesisAgent, ReportAgent
-
-**Three Backend Options**:
-- Free tier: HuggingFace Inference API (Llama/Mistral)
-- OpenAI: User-provided or env var key
-- Anthropic: User-provided or env var key (Simple mode only)
-
----
-
-## Files to Audit (Priority Order)
-
-### Critical Path Files:
-1. `src/app.py` - Gradio UI, entry point, key routing
-2. `src/orchestrator.py` - Simple mode main loop
-3. `src/orchestrator_factory.py` - Mode selection and orchestrator creation
-4. `src/orchestrator_magentic.py` - Advanced mode implementation
-5. `src/services/embeddings.py` - Deduplication singleton (KNOWN BUG)
-6. `src/agent_factory/judges.py` - LLM judge handlers (HF, OpenAI, Anthropic)
-
-### Supporting Files:
-7. `src/tools/search_handler.py` - Parallel search orchestration
-8. `src/tools/pubmed.py` - PubMed API integration
-9. `src/tools/clinicaltrials.py` - ClinicalTrials.gov API
-10. `src/tools/europepmc.py` - Europe PMC API
-11. `src/agents/magentic_agents.py` - Agent factories (KNOWN BUG: hardcoded env key)
-12. `src/utils/config.py` - Settings and configuration
-13. `src/utils/models.py` - Data models (Evidence, Citation, etc.)
-
----
-
-## Known Bugs to Verify
-
-### Bug 1: Free Tier LLM Quota Exhausted
-**Claim**: HuggingFace Inference returns 402, all 3 fallback models fail
-**Verify**:
-- Check `src/agent_factory/judges.py` class `HFInferenceJudgeHandler`
-- Trace the fallback chain: Llama → Mistral → Zephyr
-- Confirm what happens when ALL fail (does it return default "continue"?)
-- Check if the error message reaches the user or is swallowed
-
-### Bug 2: Evidence Counter Shows 0 After Dedup
-**Claim**: `_deduplicate_and_rank()` can return an empty list, losing all evidence
-**Verify**:
-- Check `src/orchestrator.py` lines 97-114 and 219
-- Trace what happens if `embeddings.deduplicate()` returns `[]`
-- Is there defensive handling? Does the exception handler catch this?
-- Could this be a race condition in async code?
-
-### Bug 3: API Key Not Passed to Advanced Mode
-**Claim**: The user's API key from Gradio is never passed to MagenticOrchestrator
-**Verify**:
-- Trace: `app.py:research_agent()` → `configure_orchestrator()` → `orchestrator_factory.py`
-- Check if `user_api_key` is passed to `create_orchestrator()`
-- Check if `MagenticOrchestrator.__init__()` receives a key
-- Check `src/agents/magentic_agents.py` - do agents use `settings.openai_api_key`?
-
-### Bug 4: Singleton EmbeddingService Cross-Session Pollution
-**Claim**: ChromaDB collection persists across requests, causing false duplicates
-**Verify**:
-- Check `src/services/embeddings.py` singleton pattern
-- Is `_embedding_service` ever reset?
-- What happens to the ChromaDB collection between Gradio requests?
-- Could this cause "Found 20 new sources (0 total)"?
-
----
-
-## Additional Bug Categories to Search For
-
-### A. Error Handling Gaps
-- [ ] Silent `except: pass` blocks
-- [ ] Exceptions logged but not re-raised
-- [ ] Missing error messages to user
-- [ ] Swallowed API errors
-
-### B. Async/Concurrency Issues
-- [ ] Race conditions in parallel searches
-- [ ] Shared mutable state across async calls
-- [ ] Missing `await` keywords
-- [ ] Event loop blocking (sync code in async context)
-
-### C. API Integration Bugs
-- [ ] Missing rate limiting
-- [ ] Hardcoded timeouts that are too short
-- [ ] XML/JSON parsing failures not handled
-- [ ] Empty response handling
-
-### D. State Management Issues
-- [ ] Global singletons that should be session-scoped
-- [ ] Gradio state not properly isolated between users
-- [ ] Memory leaks from accumulated data
-
-### E. Configuration Bugs
-- [ ] Missing env var defaults
-- [ ] Type mismatches in settings
-- [ ] Hardcoded values that should be configurable
-
-### F. UI/UX Bugs
-- [ ] Streaming not working properly
-- [ ] Progress messages misleading
-- [ ] Examples not matching actual functionality
-- [ ] Error messages not user-friendly
-
----
-
-## Output Format
-
-Please produce a report with:
-
-### 1. Verification of Known Bugs
-For each of the 4 bugs in P0_CRITICAL_BUGS.md:
-- **CONFIRMED** or **INCORRECT** or **PARTIALLY CORRECT**
-- Exact file:line references
-- Any corrections or additional details
-
-### 2. New Bugs Found
-For each new bug:
-```
-## Bug N: [Title]
-**Priority**: P0/P1/P2/P3/P4
-**File**: path/to/file.py:line
-**Symptoms**: What the user sees
-**Root Cause**: Technical explanation
-**Code**:
-```python
-# The buggy code
-```
-**Fix**:
-```python
-# The corrected code
-```
-```
-
-### 3. Code Quality Concerns
-Any patterns that aren't bugs but could cause issues:
-- Technical debt
-- Missing tests for critical paths
-- Unclear error handling
-
-### 4. Recommended Fix Order
-Prioritized list of what to fix first for a working demo.
-
----
-
-## Commands to Help Your Investigation
-
-```bash
-# Run the tests
-make check
-
-# Test search works
-uv run python -c "
|
| 181 |
-
import asyncio
|
| 182 |
-
from src.tools.pubmed import PubMedTool
|
| 183 |
-
async def test():
|
| 184 |
-
tool = PubMedTool()
|
| 185 |
-
results = await tool.search('female libido', 5)
|
| 186 |
-
print(f'Found {len(results)} results')
|
| 187 |
-
asyncio.run(test())
|
| 188 |
-
"
|
| 189 |
-
|
| 190 |
-
# Test HF inference (will show 402 if quota exhausted)
|
| 191 |
-
uv run python -c "
|
| 192 |
-
from huggingface_hub import InferenceClient
|
| 193 |
-
client = InferenceClient()
|
| 194 |
-
try:
|
| 195 |
-
resp = client.chat_completion(
|
| 196 |
-
messages=[{'role': 'user', 'content': 'Hi'}],
|
| 197 |
-
model='meta-llama/Llama-3.1-8B-Instruct',
|
| 198 |
-
max_tokens=10
|
| 199 |
-
)
|
| 200 |
-
print(resp)
|
| 201 |
-
except Exception as e:
|
| 202 |
-
print(f'Error: {e}')
|
| 203 |
-
"
|
| 204 |
-
|
| 205 |
-
# Test full orchestrator (simple mode)
|
| 206 |
-
uv run python -c "
|
| 207 |
-
import asyncio
|
| 208 |
-
from src.app import configure_orchestrator
|
| 209 |
-
async def test():
|
| 210 |
-
orch, backend = configure_orchestrator(use_mock=True, mode='simple')
|
| 211 |
-
print(f'Backend: {backend}')
|
| 212 |
-
async for event in orch.run('test query'):
|
| 213 |
-
print(f'{event.type}: {event.message[:50] if event.message else \"\"}'[:60])
|
| 214 |
-
asyncio.run(test())
|
| 215 |
-
"
|
| 216 |
-
|
| 217 |
-
# Check for hardcoded API keys (security)
|
| 218 |
-
grep -r "sk-" src/ --include="*.py" | grep -v "sk-..." | grep -v "sk-ant-..."
|
| 219 |
-
|
| 220 |
-
# Find all singletons
|
| 221 |
-
grep -r "_.*: .* | None = None" src/ --include="*.py"
|
| 222 |
-
|
| 223 |
-
# Find all except blocks
|
| 224 |
-
grep -rn "except.*:" src/ --include="*.py" | head -50
|
| 225 |
-
```
|
| 226 |
-
|
| 227 |
-
---
|
| 228 |
-
|
| 229 |
-
## Important Notes
|
| 230 |
-
|
| 231 |
-
1. **DO NOT fix bugs** - just document them
|
| 232 |
-
2. **Be thorough** - check edge cases and error paths
|
| 233 |
-
3. **Be specific** - include file:line references
|
| 234 |
-
4. **Be skeptical** - verify claims in P0_CRITICAL_BUGS.md independently
|
| 235 |
-
5. **Think like a user** - what would break the demo experience?
|
| 236 |
-
|
| 237 |
-
The hackathon deadline is approaching. We need a working demo. Your audit will determine what gets fixed first.
|
| 238 |
-
|
| 239 |
-
---
|
| 240 |
-
|
| 241 |
-
## Deliverable
|
| 242 |
-
|
| 243 |
-
A comprehensive markdown report that:
|
| 244 |
-
1. Confirms or corrects the 4 known bugs
|
| 245 |
-
2. Lists any new bugs found (with priority)
|
| 246 |
-
3. Recommends the optimal fix order
|
| 247 |
-
4. Can be saved as `docs/bugs/SENIOR_AUDIT_RESULTS.md`
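The "silent failure" patterns the checklist above asks the auditor to hunt for (logged-but-swallowed errors that leave the user with "Found 0 sources") can be contrasted in a minimal sketch. The function names here are illustrative, not from the codebase:

```python
import logging

logger = logging.getLogger(__name__)


def fetch_swallowed(url: str) -> list:
    """Anti-pattern: the error is logged, but the caller just sees an empty list."""
    try:
        raise ConnectionError(f"cannot reach {url}")  # stand-in for a real API call
    except Exception as e:
        logger.warning("search failed: %s", e)
        return []  # user sees "Found 0 sources" with no explanation


def fetch_surfaced(url: str) -> tuple[list, list[str]]:
    """Preferred: return errors alongside results so the UI layer can report them."""
    results: list = []
    errors: list[str] = []
    try:
        raise ConnectionError(f"cannot reach {url}")  # same stand-in failure
    except Exception as e:
        errors.append(f"{type(e).__name__}: {e}")
    return results, errors


results, errors = fetch_surfaced("https://example.invalid")
print(errors)  # the caller can now yield an error event instead of staying silent
```

The audit categories A ("exceptions logged but not re-raised") and F ("error messages not user-friendly") both reduce to the first shape; the fix is the second.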
docs/bugs/SENIOR_AUDIT_RESULTS.md
DELETED
@@ -1,84 +0,0 @@

# Senior Agent Audit Results: DeepBoner Codebase

**Date**: 2025-11-28
**Auditor**: Claude (Senior Software Engineer)
**Status**: COMPLETE

---

## Executive Summary

The DeepBoner codebase has **4 critical defects** that render the demo non-functional for most users. The most severe is a **data leak** where the vector database persists across user sessions, causing search result corruption and potential privacy issues. Additionally, the "Advanced" mode ignores user-provided API keys, and the "Free Tier" mode fails silently when quotas are exhausted.

**Recommendation**: Immediate remediation of P0 bugs is required before hackathon submission.

---

## 1. Verification of Known Bugs (P0_CRITICAL_BUGS.md)

| Bug | Claim | Verification Status | Notes |
| :--- | :--- | :--- | :--- |
| **Bug 1** | Free Tier LLM Quota Exhausted | **CONFIRMED** | `HFInferenceJudgeHandler` catches errors but returns a fallback assessment with `recommendation="continue"`. This causes the orchestrator to loop uselessly until `max_iterations` is reached. The user sees no error message. |
| **Bug 2** | Evidence Counter Shows 0 | **CONFIRMED** | Directly caused by Bug 4. Deduplication logic works correctly *in isolation*, but fails because the underlying ChromaDB collection is polluted with stale data from previous sessions. |
| **Bug 3** | API Key Not Passed to Advanced | **CONFIRMED** | `create_orchestrator` in `orchestrator_factory.py` ignores the user's API key. `MagenticOrchestrator` and its agents fall back to `settings.openai_api_key` (env var), which is empty for BYOK users. |
| **Bug 4** | Singleton EmbeddingService | **CONFIRMED** | `EmbeddingService` is a global singleton with an in-memory ChromaDB. The collection is never cleared. Data leaks between sessions, causing valid new results to be marked as duplicates of old results. |

---

## 2. New Bugs Found

### Bug 5: Search Error Swallowing (P2)
**File**: `src/orchestrator.py` / `src/tools/search_handler.py`
**Symptoms**: If all search tools fail (e.g., network issue, API limit), the UI shows "Found 0 sources" without explaining why.
**Root Cause**: `SearchHandler` captures exceptions and returns them in an `errors` list, but `Orchestrator` only logs them to the console (`logger.warning`) and proceeds with empty evidence.
**Fix**: Yield an `AgentEvent(type="error")` or include errors in the `search_complete` event message.

### Bug 6: Hardcoded Model Names (P3)
**File**: `src/agent_factory/judges.py`
**Symptoms**: Maintenance burden.
**Root Cause**: Model names like `meta-llama/Llama-3.1-8B-Instruct` are hardcoded in the class `HFInferenceJudgeHandler` rather than pulled from `config.py`.
**Fix**: Move to `Settings`.

---

## 3. Code Quality Concerns

1. **Singleton Abuse**: The `_embedding_service` global in `src/services/embeddings.py` is a major architectural flaw for a multi-user web app (even a demo). It should be scoped to the `Orchestrator` instance.
2. **Inconsistent Factory Signatures**: `create_orchestrator` does not accept `api_key`, forcing hacks or reliance on global env vars.
3. **Silent Failures**: The pervasive use of `try...except Exception` with only logging (no user feedback) makes debugging difficult for end-users.

---

## 4. Recommended Fix Order

### Step 1: Fix the Data Leak (Bug 4 & 2)
**Why**: Prevents result corruption and cross-user data leakage.
**Plan**:
1. Remove singleton pattern from `src/services/embeddings.py`.
2. Make `EmbeddingService` an instance variable of `Orchestrator`.
3. Initialize a fresh `EmbeddingService` (and ChromaDB collection) for each `run()`.

### Step 2: Fix Advanced Mode BYOK (Bug 3)
**Why**: Enables the core "Advanced" feature for judges/users.
**Plan**:
1. Update `create_orchestrator` signature to accept `api_key`.
2. Update `MagenticOrchestrator` to accept `api_key`.
3. Update `configure_orchestrator` in `app.py` to pass the key.
4. Ensure `MagenticOrchestrator` constructs `OpenAIChatClient` with the user's key.

### Step 3: Fix Free Tier Experience (Bug 1)
**Why**: Ensures a usable fallback for those without keys.
**Plan**:
1. In `HFInferenceJudgeHandler`, detect 402/429 errors.
2. If caught, return a `JudgeAssessment` that triggers a "Complete" event with a clear error message, rather than "Continue".
3. Add `HF_TOKEN` to the deployment environment if possible.

---

## Verification Plan

After applying fixes, run:
1. **Unit Tests**: `make check`
2. **Manual Test (Simple)**: Run without key, verify 402 error is handled OR works if token added.
3. **Manual Test (Advanced)**: Run with OpenAI key, verify it proceeds past initialization.
4. **Manual Test (Dedup)**: Run same query twice. Second run should find same number of results (not 0).
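The per-run scoping recommended in Step 1 can be sketched without ChromaDB by using a stand-in duplicate cache. The class and method names below are illustrative assumptions, not the project's real API:

```python
class EmbeddingService:
    """Stand-in for the real service; holds a per-instance duplicate cache."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, doc_id: str) -> bool:
        if doc_id in self._seen:
            return True
        self._seen.add(doc_id)
        return False


class Orchestrator:
    """Creates a fresh service for each run, so sessions cannot pollute each other."""

    def run(self, doc_ids: list[str]) -> int:
        service = EmbeddingService()  # per-run instance, not a module-level singleton
        return sum(1 for d in doc_ids if not service.is_duplicate(d))


orch = Orchestrator()
first = orch.run(["a", "b"])   # 2 new sources
second = orch.run(["a", "b"])  # still 2 -- no cross-run "0 new sources" bug
```

With a module-level singleton, the second run would report 0 new sources, which is exactly the Bug 2 symptom the audit describes.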
src/app.py
CHANGED

```diff
@@ -173,7 +173,11 @@ async def research_agent(
         user_api_key=user_api_key,
     )
 
-
+    # Immediate backend info + loading feedback so user knows something is happening
+    yield (
+        f"🧠 **Backend**: {backend_name}\n\n"
+        "⏳ **Processing...** Searching PubMed, ClinicalTrials.gov, Europe PMC...\n"
+    )
 
     # Immediate loading feedback so user knows something is happening
     yield (
```
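The added code relies on Gradio's generator-based streaming: each successive `yield` from the event handler replaces the displayed output, so yielding a status line before the long search gives immediate feedback. A stripped-down sketch of that flow, without Gradio itself (the handler name and messages here are simplified):

```python
from typing import Iterator


def research_agent(query: str) -> Iterator[str]:
    # Gradio would call this as a generator and render each yielded value in turn.
    yield "🧠 **Backend**: mock\n\n⏳ **Processing...**\n"  # immediate feedback
    yield f"✅ Done: results for {query!r}"                 # final output


updates = list(research_agent("test"))
print(updates[0].splitlines()[0])
```

The real handler is an async generator yielding tuples of UI values, but the principle is the same: the first yield must happen before any slow work starts.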
src/orchestrator_magentic.py
CHANGED

```diff
@@ -168,14 +168,38 @@ The final output should be a structured research report."""
         )
 
         iteration = 0
+        final_event_received = False
+
         try:
             async for event in workflow.run_stream(task):
                 agent_event = self._process_event(event, iteration)
                 if agent_event:
                     if isinstance(event, MagenticAgentMessageEvent):
                         iteration += 1
+
+                    if agent_event.type == "complete":
+                        final_event_received = True
+
                     yield agent_event
 
+            # GUARANTEE: Always emit termination event if stream ends without one
+            # (e.g., max rounds reached)
+            if not final_event_received:
+                logger.warning(
+                    "Workflow ended without final event",
+                    iterations=iteration,
+                )
+                yield AgentEvent(
+                    type="complete",
+                    message=(
+                        f"Research completed after {iteration} agent rounds. "
+                        "Max iterations reached - results may be partial. "
+                        "Try a more specific query for better results."
+                    ),
+                    data={"iterations": iteration, "reason": "max_rounds_reached"},
+                    iteration=iteration,
+                )
+
         except Exception as e:
             logger.error("Magentic workflow failed", error=str(e))
             yield AgentEvent(
```
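The guarantee added above is a general pattern for async event streams: track whether a terminal event was seen, and synthesize one if the stream ends without it. A self-contained sketch, with the event class and field names simplified from the diff:

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentEvent:
    type: str
    message: str = ""
    data: dict = field(default_factory=dict)


async def upstream():
    # Simulates a workflow stream that hits max rounds without a final event.
    yield AgentEvent("thinking", "SearchAgent working...")
    yield AgentEvent("thinking", "JudgeAgent working...")


async def run_with_guarantee():
    final_seen = False
    async for ev in upstream():
        if ev.type == "complete":
            final_seen = True
        yield ev
    # GUARANTEE: synthesize exactly one terminal event if the stream ended silently.
    if not final_seen:
        yield AgentEvent(
            "complete",
            "Max iterations reached - results may be partial.",
            {"reason": "max_rounds_reached"},
        )


async def collect():
    return [ev async for ev in run_with_guarantee()]


events = asyncio.run(collect())
print(events[-1].type)  # complete
```

Because the fallback sits after the `async for` loop inside the same generator, a stream that does emit its own `complete` event sets the flag and skips the fallback, which is exactly what the second unit test below verifies.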
tests/unit/test_magentic_termination.py
ADDED
@@ -0,0 +1,111 @@

```python
"""Tests for Magentic Orchestrator termination guarantee."""

from unittest.mock import MagicMock, patch

import pytest
from agent_framework import MagenticAgentMessageEvent

from src.orchestrator_magentic import MagenticOrchestrator
from src.utils.models import AgentEvent

# Skip tests if agent_framework is not installed
pytest.importorskip("agent_framework")


class MockChatMessage:
    def __init__(self, content):
        self.content = content
        self.role = "assistant"

    @property
    def text(self):
        return self.content


@pytest.fixture
def mock_magentic_requirements():
    """Mock requirements check."""
    with patch("src.orchestrator_magentic.check_magentic_requirements"):
        yield


@pytest.mark.asyncio
async def test_termination_event_emitted_on_stream_end(mock_magentic_requirements):
    """
    Verify that a termination event is emitted when the workflow stream ends
    without a MagenticFinalResultEvent (e.g. max rounds reached).
    """
    orchestrator = MagenticOrchestrator(max_rounds=2)

    # Use real event class
    mock_message = MockChatMessage("Thinking...")
    mock_agent_event = MagenticAgentMessageEvent(agent_id="SearchAgent", message=mock_message)

    # Mock the workflow and its run_stream method
    mock_workflow = MagicMock()

    # Create an async generator for run_stream
    async def mock_stream(task):
        # Yield the real message event
        yield mock_agent_event
        # STOP HERE - No FinalResultEvent

    mock_workflow.run_stream = mock_stream

    # Mock _build_workflow to return our mock workflow
    with patch.object(orchestrator, "_build_workflow", return_value=mock_workflow):
        events = []
        async for event in orchestrator.run("Research query"):
            events.append(event)

    for i, e in enumerate(events):
        print(f"Event {i}: {e.type} - {e.message}")

    assert len(events) >= 2
    assert events[0].type == "started"

    # Verify the message event was processed.
    # Depending on _process_event logic, MagenticAgentMessageEvent might map to
    # different types; we just check the message is present somewhere.
    assert any("Thinking..." in e.message for e in events)

    # THE CRITICAL CHECK: Did we get the fallback termination event?
    last_event = events[-1]
    assert last_event.type == "complete"
    assert "Max iterations reached" in last_event.message
    assert last_event.data.get("reason") == "max_rounds_reached"


@pytest.mark.asyncio
async def test_no_double_termination_event(mock_magentic_requirements):
    """
    Verify that we DO NOT emit a fallback event if the workflow finished normally.
    """
    orchestrator = MagenticOrchestrator()

    mock_workflow = MagicMock()

    with patch.object(orchestrator, "_build_workflow", return_value=mock_workflow):
        # Mock _process_event to simulate a natural completion event
        with patch.object(orchestrator, "_process_event") as mock_process:
            mock_process.side_effect = [
                AgentEvent(type="thinking", message="Working...", iteration=1),
                AgentEvent(type="complete", message="Done!", iteration=2),
            ]

            async def mock_stream_with_yields(task):
                yield "raw_event_1"
                yield "raw_event_2"

            mock_workflow.run_stream = mock_stream_with_yields

            events = []
            async for event in orchestrator.run("Research query"):
                events.append(event)

            assert events[-1].message == "Done!"
            assert events[-1].type == "complete"

            # Verify we didn't get a SECOND "Max iterations reached" event
            fallback_events = [e for e in events if "Max iterations reached" in e.message]
            assert len(fallback_events) == 0
```