VibecoderMcSwaggins committed
Commit 37b8559 · Parent: 87feaa7

docs: add P0 critical bugs report - demo is broken
4 critical bugs identified that make the demo non-functional:

1. **Free Tier LLM Quota Exhausted** (P0)
- HuggingFace Inference API returns 402 Payment Required
- All 3 fallback models fail (Llama, Mistral, Zephyr)
- Judge always returns 0% confidence, loops until max iterations

2. **Evidence Counter Shows 0 After Dedup** (P1)
- Deduplication may return empty list silently
- Need defensive check to keep original if dedup fails

3. **API Key Not Passed to Advanced Mode** (P0)
- User's API key from Gradio form is NEVER passed to MagenticOrchestrator
- Agents use settings.openai_api_key (env var) instead

4. **Singleton EmbeddingService Causes Cross-Session Pollution** (P0)
- ChromaDB collection persists across ALL requests
- Second query marks everything as "duplicate" of first query
- Results in "Found 20 new sources (0 total)"

Files changed (1): docs/bugs/P0_CRITICAL_BUGS.md (added, +215 lines)
# P0 Critical Bugs - DeepBoner Demo Broken

**Date**: 2025-11-28
**Status**: ACTIVE - Demo is non-functional
**Priority**: P0 - Blocking hackathon submission

---

## Summary

The Gradio demo is completely non-functional. Both Simple and Advanced modes fail to produce results.

---

## Bug 1: Free Tier LLM Quota Exhausted (P0)

**Symptoms**:
- "Found 20 new sources (0 total)" in UI
- Judge returns 0% confidence
- Loops until max iterations
- Final report shows "Found 0 sources"

**Root Cause**:
The HuggingFace Inference API free tier quota is exhausted:
```
402 Client Error: Payment Required
You have exceeded your monthly included credits for Inference Providers
```

All 3 fallback models fail:
1. `meta-llama/Llama-3.1-8B-Instruct` - 402
2. `mistralai/Mistral-7B-Instruct-v0.3` - 402
3. `HuggingFaceH4/zephyr-7b-beta` - 402

**Impact**:
- Free tier users cannot use the demo AT ALL
- Judge always returns "continue" with 0% confidence
- Evidence IS found but never synthesized

**Fix Options**:
1. **Upgrade HF account to PRO** (~$9/month) - immediate fix
2. **Add HF_TOKEN env var** in HF Spaces secrets
3. **Fall back to mock judge** when all LLMs fail (not great UX)
4. **Show a clear error message** instead of a fake "0 sources"
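Options 3 and 4 share the same detection step. A minimal sketch of fail-fast 402 handling, assuming the HF client surfaces the HTTP status in the exception text (the `is_quota_error` helper and `QuotaExhaustedError` names are hypothetical, not the project's actual API):

```python
class QuotaExhaustedError(RuntimeError):
    """Every fallback model hit a 402 quota error."""


def is_quota_error(exc: Exception) -> bool:
    # Assumption: the 402 status or its reason phrase appears in the message
    msg = str(exc)
    return "402" in msg or "Payment Required" in msg


def call_with_fallbacks(models, call_model):
    """Try each model in order; fail fast with a clear message if all hit 402."""
    quota_failures = []
    for model in models:
        try:
            return call_model(model)
        except Exception as exc:
            if not is_quota_error(exc):
                raise  # a non-quota failure is a real bug, surface it
            quota_failures.append(model)
    raise QuotaExhaustedError(
        "Free tier exhausted for all fallback models "
        f"({', '.join(quota_failures)}); please add an API key."
    )
```

The judge could catch `QuotaExhaustedError` and render its message in the UI instead of looping to max iterations with 0% confidence.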

---

## Bug 2: Evidence Counter Shows 0 After Dedup (P1)

**Symptoms**:
- "Found 20 new sources (0 total)"
- Evidence is found, but the total is 0

**Root Cause**:
On HuggingFace Spaces, the embeddings service may be failing silently.
`_deduplicate_and_rank` then returns an empty list instead of the original evidence.

**Code Location**: `src/orchestrator.py:219`
```python
all_evidence = await self._deduplicate_and_rank(all_evidence, query)
```

If this returns `[]`, we lose all evidence.

**Fix**:
```python
# Defensive check: never let a failed dedup wipe out the evidence list
deduped = await self._deduplicate_and_rank(all_evidence, query)
if not deduped and all_evidence:
    logger.warning("Deduplication returned empty, keeping original evidence")
    # all_evidence is left unchanged
else:
    all_evidence = deduped
```

---

## Bug 3: API Key Not Passed to Advanced Mode (P0)

**Symptoms**:
- User enters OpenAI API key
- Selects Advanced mode
- Gets an error, or the wrong key (or none) is used

**Root Cause**: CONFIRMED
The user-provided API key is **NEVER passed** to MagenticOrchestrator!

**Code Flow**:
1. `research_agent()` receives `api_key` from Gradio ✅
2. `configure_orchestrator(user_api_key=api_key)` is called ✅
3. Simple mode: `JudgeHandler(model=OpenAIModel(..., api_key=user_api_key))` ✅
4. Advanced mode: `MagenticOrchestrator(max_rounds=...)` - **NO API KEY PASSED** ❌

**Bug Location 1**: `src/orchestrator_factory.py:48-52`
```python
if effective_mode == "advanced":
    orchestrator_cls = _get_magentic_orchestrator_class()
    return orchestrator_cls(
        max_rounds=config.max_iterations if config else 10,
        # MISSING: api_key or chat_client parameter!
    )
```

**Bug Location 2**: `src/agents/magentic_agents.py:24-27`
```python
client = chat_client or OpenAIChatClient(
    model_id=settings.openai_model,
    api_key=settings.openai_api_key,  # READS FROM ENV, NOT USER INPUT!
)
```

**Fix Required**:
1. Pass `user_api_key` to `create_orchestrator()`
2. Create the `OpenAIChatClient` with the user's key
3. Pass the `chat_client` to `MagenticOrchestrator`
4. Propagate it to all agent factories

---

## Bug 4: Singleton EmbeddingService Causes Cross-Session Pollution (P0)

**Symptoms**:
- First query: "Found 20 new sources (20 total)" ✅
- Second query: "Found 20 new sources (0 total)" ❌
- Same query twice: 0 sources the second time

**Root Cause**: CONFIRMED
The EmbeddingService is a **SINGLETON** that persists across ALL Gradio requests!

**Code Location**: `src/services/embeddings.py:164-172`
```python
_embedding_service: EmbeddingService | None = None  # SINGLETON - NEVER RESET!


def get_embedding_service() -> EmbeddingService:
    global _embedding_service
    if _embedding_service is None:
        _embedding_service = EmbeddingService()  # Created ONCE per process
    return _embedding_service
```

**What Happens**:
1. Query 1: finds 20 articles → adds to ChromaDB → `unique = 20`
2. Query 2: finds 20 articles → `search_similar()` matches Query 1's data → `is_duplicate=True` → `unique = 0`
3. The evidence list becomes empty after deduplication!

**The Real Bug**: `_deduplicate_and_rank()` returns an empty list and REPLACES `all_evidence`:
```python
all_evidence = await self._deduplicate_and_rank(all_evidence, query)  # Returns []!
```

**Fix Options**:
1. **Clear collection per session**: add a `clear()` method and call it at the start of each `run()`
2. **Use session-scoped collections**: create a unique collection name per Gradio session
3. **Don't use a singleton**: create a fresh EmbeddingService per orchestrator run
4. **Defensive check**: if dedup returns empty but the input wasn't, keep the original
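Options 2 and 3 combine naturally: build a fresh service with a unique collection name per run. A toy sketch, where the in-memory `_seen` set stands in for the real ChromaDB collection and `add_unique` for the real dedup path:

```python
import uuid


class EmbeddingService:
    """Toy stand-in: remembers which texts its collection has seen."""

    def __init__(self, collection_name: str):
        self.collection_name = collection_name
        self._seen: set[str] = set()  # stands in for the ChromaDB collection

    def add_unique(self, texts: list[str]) -> list[str]:
        # Only texts this collection has never seen count as unique
        new = [t for t in texts if t not in self._seen]
        self._seen.update(new)
        return new


def get_embedding_service() -> EmbeddingService:
    # One fresh, uniquely named collection per orchestrator run, so a second
    # query can never be marked "duplicate" against the first query's data
    return EmbeddingService(collection_name=f"evidence-{uuid.uuid4().hex}")
```

Because each `run()` gets its own collection, the "Found 20 new sources (0 total)" pollution cannot occur across requests.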

---

## Verification Commands

```bash
# Test that search works
uv run python -c "
import asyncio
from src.tools.pubmed import PubMedTool

async def test():
    tool = PubMedTool()
    results = await tool.search('female libido', 5)
    print(f'Found {len(results)} results')

asyncio.run(test())
"

# Test HF inference (will fail with 402 if the quota is exhausted)
uv run python -c "
from huggingface_hub import InferenceClient

client = InferenceClient()
resp = client.chat_completion(
    messages=[{'role': 'user', 'content': 'Hi'}],
    model='meta-llama/Llama-3.1-8B-Instruct',
    max_tokens=10,
)
print(resp)
"
```

---

## Immediate Actions

### Option A: Add HF Pro Account (Recommended)
1. Upgrade the HF account to PRO: https://huggingface.co/pricing
2. Generate an access token with the "inference" scope
3. Add the `HF_TOKEN` secret to HF Spaces
4. Verify in HFInferenceJudgeHandler

### Option B: Require Paid API Key
1. Remove the "Free Tier" option from the UI
2. Make the API key required
3. Update the messaging

### Option C: Better Error Handling
1. Detect 402 errors specifically
2. Show a user-friendly message: "Free tier exhausted, please add API key"
3. Don't loop - fail fast with a clear explanation

---

## Definition of Done

- [ ] Demo works with free tier OR shows a clear error
- [ ] Demo works with an OpenAI key (Simple + Advanced)
- [ ] Demo works with an Anthropic key (Simple only)
- [ ] Evidence is correctly accumulated
- [ ] Final report shows the actual sources found
- [ ] No silent failures