VibecoderMcSwaggins committed on
Commit
8abea0c
·
1 Parent(s): ed76153

docs: Add P2 duplicate report content bug

Root cause identified: MagenticFinalResultEvent and WorkflowOutputEvent
emit full report content that was already streamed. No deduplication exists.

This is a STACK BUG, not a model limitation. The fix involves tracking
streamed content length and emitting a minimal completion message.

Also moves archived P1/P2 bugs to archive folder.

docs/bugs/ACTIVE_BUGS.md CHANGED
@@ -9,6 +9,19 @@
 
  ## Currently Active Bugs
 
+ ### P2 - Duplicate Report Content in Output
+
+ **File:** `docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md`
+ **Status:** OPEN - UX Bug
+
+ **Problem:** The final research report appears twice in the UI - once as streaming content, then again as a complete event. This is a **stack bug**, not a model issue.
+
+ **Root Cause:** Both `MagenticFinalResultEvent` and `WorkflowOutputEvent` emit the full report content that was already streamed. No deduplication exists.
+
+ **Recommended Fix:** Track streamed content length in orchestrator; emit minimal "Research complete." message instead of repeating content.
+
+ ---
+
  ### P3 - Progress Bar Positioning in ChatInterface
 
  **File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
docs/bugs/P2_DUPLICATE_REPORT_CONTENT.md ADDED
@@ -0,0 +1,221 @@
+ # P2 Bug: Duplicate Report Content in Output
+
+ **Date**: 2025-12-03
+ **Status**: OPEN
+ **Severity**: P2 (UX - Duplicate content confuses users)
+ **Component**: `src/orchestrators/advanced.py` + `src/app.py`
+
+ ---
+
+ ## Symptom
+
+ The final research report appears **twice** in the UI output:
+ 1. First as streaming content (with `📡 **STREAMING**:` prefix)
+ 2. Then again as a complete event (without prefix)
+
+ Example:
+ ```
+ 📡 **STREAMING**:
+ ### Summary of Drugs and Mechanisms of Action
+ ...
+ ### Conclusion
+ Post-menopausal women experiencing libido issues can benefit from...
+ ### Recommendations
+ - Estrogen Therapy: Effective in enhancing...
+
+ Based on the information gathered, we have identified... <-- DUPLICATE STARTS
+ ### Summary of Drugs and Mechanisms of Action
+ ...
+ ### Conclusion
+ Post-menopausal women experiencing libido issues can benefit from...
+ ### Recommendations
+ - Estrogen Therapy: Effective in enhancing...
+ ```
+
+ ---
+
+ ## Root Cause Analysis
+
+ ### Event Flow (Current - Buggy)
+
+ ```
+ 1. Reporter Agent streams content
+    └─ MagenticAgentDeltaEvent × N
+       └─ Each yields AgentEvent(type="streaming", message=delta)
+          └─ app.py: streaming_buffer += event.message
+             └─ User sees: "📡 **STREAMING**: [content building up]"
+
+ 2. Reporter Agent completes
+    └─ MagenticAgentMessageEvent
+       └─ Yields truncated completion: "reporter: [first 200 chars]..."
+          └─ app.py: flushes streaming_buffer to response_parts
+
+ 3. Workflow ends
+    └─ MagenticFinalResultEvent OR WorkflowOutputEvent
+       └─ Contains FULL report content (same as streaming)
+          └─ Yields AgentEvent(type="complete", message=FULL_CONTENT)
+             └─ app.py: appends event.message to response_parts
+                └─ User sees: [SAME CONTENT AGAIN]
+ ```
+
+ ### Bug Location
+
+ **`src/orchestrators/advanced.py` lines 532-552:**
+ ```python
+ elif isinstance(event, MagenticFinalResultEvent):
+     text = self._extract_text(event.message) if event.message else "No result"
+     return AgentEvent(
+         type="complete",
+         message=text,  # <-- FULL content, already streamed
+         ...
+     )
+
+ elif isinstance(event, WorkflowOutputEvent):
+     if event.data:
+         text = self._extract_text(event.data)
+         return AgentEvent(
+             type="complete",
+             message=text,  # <-- FULL content, already streamed
+             ...
+         )
+ ```
+
+ **`src/app.py` lines 229-232:**
+ ```python
+ if event.type == "complete":
+     response_parts.append(event.message)  # <-- Appends duplicate
+     yield "\n\n".join(response_parts)
+ ```
+
+ ### Why It Happens
+
+ 1. **Streaming events** deliver the full report incrementally, one delta at a time
+ 2. **Final events** (`MagenticFinalResultEvent`, `WorkflowOutputEvent`) contain the same full content
+ 3. **No deduplication** exists between streamed content and final event content
+ 4. **app.py appends both** to the output
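
The four points above can be reproduced in isolation. The sketch below is a simplified stand-in, not the project's code: `AgentEvent` is a hypothetical two-field dataclass, and `render` is an assumed approximation of the accumulation logic in `src/app.py` (it flushes the buffer on any non-streaming event).

```python
from dataclasses import dataclass

STREAM_PREFIX = "📡 **STREAMING**: "


@dataclass
class AgentEvent:
    """Hypothetical stand-in for the project's AgentEvent."""
    type: str
    message: str


def render(events):
    """Mimic the buggy accumulation and return the final joined output."""
    response_parts = []
    streaming_buffer = ""
    for event in events:
        if event.type == "streaming":
            streaming_buffer += event.message
            continue
        if streaming_buffer:
            # Flush streamed text once a non-streaming event arrives.
            response_parts.append(f"{STREAM_PREFIX}{streaming_buffer}")
            streaming_buffer = ""
        # Bug: the "complete" event carries the full report again
        # and is appended without any deduplication.
        response_parts.append(event.message)
    return "\n\n".join(response_parts)


report = "### Summary\nDrug A acts on receptor B."
events = [AgentEvent("streaming", chunk) for chunk in (report[:12], report[12:])]
events.append(AgentEvent("complete", report))

output = render(events)
print(output.count("Drug A acts on receptor B."))  # 2: the report body appears twice
```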
+
+ ---
+
+ ## Impact
+
+ | Aspect | Impact |
+ |--------|--------|
+ | UX | Report appears twice, looks buggy |
+ | Token usage | Renders same content twice |
+ | Trust | Users may think system is broken |
+
+ ---
+
+ ## Proposed Fix Options
+
+ ### Option 1: Skip Complete Event if Content Matches Streaming
+
+ **Location**: `src/app.py` lines 229-232
+
+ ```python
+ if event.type == "complete":
+     # Skip if content matches what we already streamed
+     streaming_content = next(
+         (p.replace("📡 **STREAMING**: ", "") for p in response_parts if p.startswith("📡 **STREAMING**:")),
+         None
+     )
+     if streaming_content and event.message.strip() == streaming_content.strip():
+         continue  # Skip duplicate
+     response_parts.append(event.message)
+     yield "\n\n".join(response_parts)
+ ```
+
+ **Pros**: Simple, targets exact issue
+ **Cons**: String comparison may be fragile
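
One way the exact comparison could break is if the final event differs from the streamed text in whitespace or carries an extra leading sentence. The sketch below shows a more tolerant check under that assumption. `is_duplicate` and its `min_len` guard are hypothetical names and values, not part of the codebase.

```python
import re


def _normalize(text: str) -> str:
    """Collapse runs of whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip()


def is_duplicate(final_text: str, streamed_text: str, min_len: int = 50) -> bool:
    """Return True if the final text largely repeats what was streamed.

    min_len is an arbitrary guard so that very short streamed fragments
    are never treated as a match.
    """
    final_norm = _normalize(final_text)
    streamed_norm = _normalize(streamed_text)
    if not final_norm or len(streamed_norm) < min_len:
        return False
    # Containment in either direction tolerates an extra intro or outro.
    return streamed_norm in final_norm or final_norm in streamed_norm


streamed = (
    "### Summary\n\nEstrogen therapy is effective in some patients.\n\n"
    "### Conclusion\nSee above."
)
final = (
    "Based on the information gathered:  ### Summary\n"
    "Estrogen therapy is effective in some patients.\n### Conclusion\nSee above."
)
print(is_duplicate(final, streamed))                 # True
print(is_duplicate("Research complete.", streamed))  # False
```

Containment is still a heuristic: a final message that rephrases the report would not be caught.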
+
+ ### Option 2: Track Streamed Content Hash
+
+ **Location**: `src/app.py`
+
+ ```python
+ streaming_hash = None
+ ...
+ if streaming_buffer:
+     streaming_hash = hash(streaming_buffer.strip())
+     response_parts.append(f"📡 **STREAMING**: {streaming_buffer}")
+     streaming_buffer = ""
+ ...
+ if event.type == "complete":
+     # Compare against None explicitly: a hash value of 0 is falsy
+     if streaming_hash is not None and hash(event.message.strip()) == streaming_hash:
+         continue  # Skip duplicate
+     response_parts.append(event.message)
+ ```
+
+ **Pros**: More robust comparison
+ **Cons**: Hash collision possible (unlikely)
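
Python's built-in `hash()` for strings is salted per process by default, so its values are only comparable within a single interpreter run. That is sufficient for the in-memory check above. If the fingerprint ever needed to be logged or compared across processes, a digest would be stable. A minimal sketch, where `content_digest` is a hypothetical helper name:

```python
import hashlib


def content_digest(text: str) -> str:
    """Return a SHA-256 hex digest of the whitespace-stripped text."""
    return hashlib.sha256(text.strip().encode("utf-8")).hexdigest()


streamed_digest = content_digest("### Summary\nReport body\n")
print(content_digest("  ### Summary\nReport body") == streamed_digest)  # True
print(content_digest("Research complete.") == streamed_digest)          # False
```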
+
+ ### Option 3: Don't Emit Complete Event Content from Orchestrator
+
+ **Location**: `src/orchestrators/advanced.py` lines 532-552
+
+ Replace full content with summary:
+ ```python
+ elif isinstance(event, MagenticFinalResultEvent):
+     return AgentEvent(
+         type="complete",
+         message="Research complete.",  # Don't repeat content
+         data={"iterations": iteration},
+         iteration=iteration,
+     )
+ ```
+
+ **Pros**: Clean separation of streaming vs completion
+ **Cons**: Loses fallback if streaming failed
+
+ ### Option 4: Flag-Based Deduplication in Orchestrator
+
+ **Location**: `src/orchestrators/advanced.py`
+
+ Track if substantial streaming occurred:
+ ```python
+ has_substantial_streaming = len(current_message_buffer) > 100
+
+ # In _process_event for final events:
+ if has_substantial_streaming:
+     return AgentEvent(
+         type="complete",
+         message="Research complete.",  # Don't repeat
+         ...
+     )
+ ```
+
+ ---
+
+ ## Recommended Fix
+
+ **Option 3** is cleanest - the orchestrator should not re-emit content that was already streamed. To keep the fallback that Option 3 alone would lose, pair it with a streamed-length check in the spirit of Option 4.
+
+ **Implementation**:
+ 1. Track `streamed_report_length` in the run loop
+ 2. If substantial content was streamed (>500 chars), emit a minimal complete message
+ 3. If no streaming occurred, emit the full content as a fallback
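
The three implementation steps could look roughly like the sketch below. It is an illustration under stated assumptions, not the orchestrator's actual code: `AgentEvent` is a hypothetical stand-in, `build_complete_event` is an invented helper, and treating anything at or below the threshold as "emit full content" is an assumption the steps above leave open.

```python
from dataclasses import dataclass, field
from typing import Any

# Threshold taken from step 2 above (>500 chars).
STREAMED_THRESHOLD = 500


@dataclass
class AgentEvent:
    """Hypothetical stand-in for the project's AgentEvent."""
    type: str
    message: str
    data: dict[str, Any] = field(default_factory=dict)
    iteration: int = 0


def build_complete_event(
    final_text: str, streamed_report_length: int, iteration: int
) -> AgentEvent:
    """Choose the completion message based on how much was already streamed."""
    if streamed_report_length > STREAMED_THRESHOLD:
        # Report already reached the user via streaming: do not repeat it.
        message = "Research complete."
    else:
        # Fallback: little or nothing was streamed, so emit the full content.
        message = final_text or "No result"
    return AgentEvent(
        type="complete",
        message=message,
        data={"iterations": iteration},
        iteration=iteration,
    )


# Simulated run loop: accumulate the length of each streamed delta.
streamed_report_length = 0
for delta in ["x" * 300, "y" * 300]:
    streamed_report_length += len(delta)

print(build_complete_event("full report", streamed_report_length, 3).message)
print(build_complete_event("full report", 0, 3).message)
```

The first call prints the minimal message because 600 characters were streamed. The second prints the full report because nothing was.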
+
+ ---
+
+ ## Files Involved
+
+ | File | Role |
+ |------|------|
+ | `src/orchestrators/advanced.py:532-552` | Emits duplicate complete events |
+ | `src/app.py:229-232` | Appends duplicate to output |
+
+ ---
+
+ ## Test Plan
+
+ 1. Run Free Tier query: "What drugs improve female libido post-menopause?"
+ 2. Verify report appears ONCE (with streaming prefix)
+ 3. Verify `complete` event does NOT repeat content
+ 4. Verify fallback works if streaming fails
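
Steps 2 to 4 could also be checked automatically. The sketch below uses a simplified, hypothetical `render` pipeline with the recommended fix applied (events are plain `(type, message)` tuples), so it illustrates the shape of such tests rather than testing the real `src/app.py`.

```python
STREAM_PREFIX = "📡 **STREAMING**: "


def render(events, threshold=500):
    """Simplified pipeline with the length-based fix applied."""
    parts, buffer, streamed_len = [], "", 0
    for kind, message in events:
        if kind == "streaming":
            buffer += message
            streamed_len += len(message)
            continue
        if buffer:
            parts.append(f"{STREAM_PREFIX}{buffer}")
            buffer = ""
        if kind == "complete" and streamed_len > threshold:
            # Enough was streamed already: replace the repeated report.
            message = "Research complete."
        parts.append(message)
    return "\n\n".join(parts)


def test_report_appears_once_when_streamed():
    report = "Estrogen therapy. " * 40  # 720 chars, above the threshold
    output = render([("streaming", report), ("complete", report)])
    assert output.startswith(STREAM_PREFIX)
    assert output.count(report) == 1
    assert output.endswith("Research complete.")


def test_fallback_when_streaming_fails():
    report = "Estrogen therapy. " * 40
    output = render([("complete", report)])
    assert output == report
```

Both functions follow pytest naming conventions, so they would be collected by `pytest` if saved in a test module.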
+
+ ---
+
+ ## Related
+
+ - **Not related to model quality** - This is a stack bug, not a model limitation
+ - P1 Free Tier fix (PR fix/P1-free-tier) enabled streaming, exposing this bug