VibecoderMcSwaggins commited on
Commit
2ac49c3
Β·
1 Parent(s): e6f0fda

docs: Add tech debt specs and architecture documentation

Browse files

Adds comprehensive documentation for phased bug fixes:
- SPEC-20: PubMed JSON Parsing Fix (P2)
- SPEC-21: Middleware Architecture Refactor (P2)
- SPEC-22: Progress Bar Removal (P3)

Also adds:
- docs/specs/README.md - Master index with implementation order
- docs/bugs/p2-hardening-issues.md - P2 issue tracker
- docs/bugs/p3-ms-framework-gaps.md - MS framework gap analysis
- docs/architecture/adr-001-middleware-refactor.md - ADR for middleware
- docs/architecture/ms-framework-usage-report.md - What we use from MS

docs/architecture/adr-001-middleware-refactor.md ADDED
@@ -0,0 +1,54 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # ADR-001: Middleware Architecture Refactor
2
+
3
+ **Status:** ACCEPTED
4
+ **Date:** 2025-12-06
5
+ **Decision Makers:** Development Team
6
+
7
+ ---
8
+
9
+ ## Context
10
+
11
+ The current `src/middleware/` folder is misleadingly named. It contains `SubIterationMiddleware`, which implements a workflow pattern (team→judge loop), not the interceptor middleware pattern used by Microsoft Agent Framework.
12
+
13
+ Additionally, we're missing proper middleware implementations for:
14
+ - Retry logic on transient errors (429, 500, 502, 503, 504)
15
+ - Token usage tracking for cost monitoring
16
+
17
+ ---
18
+
19
+ ## Decision
20
+
21
+ 1. **Rename `src/middleware/` to `src/workflows/`** to accurately reflect what it contains
22
+ 2. **Create new `src/middleware/` with proper MS-pattern middleware:**
23
+ - `RetryMiddleware(ChatMiddleware)` - exponential backoff retry
24
+ - `TokenTrackingMiddleware(ChatMiddleware)` - token usage logging
25
+
26
+ ---
27
+
28
+ ## Consequences
29
+
30
+ ### Positive
31
+ - Clearer codebase organization
32
+ - Proper use of MS Agent Framework patterns
33
+ - HuggingFace 429 crashes will be handled gracefully
34
+ - Token usage will be visible for cost monitoring
35
+
36
+ ### Negative
37
+ - Requires updating imports in `src/orchestrators/hierarchical.py`
38
+ - One-time migration effort
39
+
40
+ ### Neutral
41
+ - Aligns with Microsoft Agent Framework conventions
42
+
43
+ ---
44
+
45
+ ## Implementation
46
+
47
+ See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md) for detailed implementation steps.
48
+
49
+ ---
50
+
51
+ ## References
52
+
53
+ - Microsoft Agent Framework `_middleware.py`
54
+ - [P2 Hardening Issues](../bugs/p2-hardening-issues.md) Issue 2 & 3
docs/architecture/ms-framework-usage-report.md ADDED
@@ -0,0 +1,68 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Microsoft Agent Framework Usage Report
2
+
3
+ **Date:** 2025-12-06
4
+ **Framework Version:** agent-framework-core==1.0.0b251204
5
+
6
+ ---
7
+
8
+ ## What We Use From MS Framework (pip installed)
9
+
10
+ ### Core Classes
11
+ - `BaseChatClient` - Base class for chat clients
12
+ - `ChatMessage`, `ChatRole` - Message types
13
+ - `ChatOptions` - Request configuration
14
+ - `ChatAgent` - Agent base class
15
+
16
+ ### Decorators (Applied to HuggingFaceChatClient)
17
+ - `@use_function_invocation` - Enables tool/function calling
18
+ - `@use_observability` - Adds OTEL tracing hooks
19
+ - `@use_chat_middleware` - Enables middleware pipeline
20
+
21
+ ### Middleware Base Classes (Available but NOT yet used)
22
+ - `ChatMiddleware` - Intercepts chat client requests
23
+ - `AgentMiddleware` - Intercepts agent invocations
24
+ - `FunctionMiddleware` - Intercepts tool calls
25
+
26
+ ---
27
+
28
+ ## What We Hand-Roll (Custom Implementation)
29
+
30
+ ### Orchestration
31
+ - `AdvancedOrchestrator` - Main research workflow
32
+ - `HierarchicalOrchestrator` - Team-based orchestration
33
+ - `SubIterationMiddleware` - Team→judge loop (workflow, not middleware)
34
+
35
+ ### Clients
36
+ - `HuggingFaceChatClient` - Adapter for HuggingFace Inference API
37
+ - Client factory with auto-detection
38
+
39
+ ### Tools
40
+ - `PubMedTool`, `ClinicalTrialsTool`, `EuropePMCTool`
41
+ - `SearchHandler` - Scatter-gather orchestration
42
+
43
+ ### Services
44
+ - `EmbeddingService` - Local sentence-transformers
45
+ - `LlamaIndexRAG` - Premium OpenAI embeddings
46
+ - `ResearchMemory` - Research state management
47
+
48
+ ---
49
+
50
+ ## Gap Analysis
51
+
52
+ | Component | MS Framework Has | DeepBoner Has | Status |
53
+ |-----------|------------------|---------------|--------|
54
+ | Chat Middleware | `ChatMiddleware` base | Uses decorator only | SPEC-21 |
55
+ | Retry Logic | N/A (left to user) | None | SPEC-21 |
56
+ | Token Tracking | OTEL histograms | None | SPEC-21 |
57
+ | Thread State | `AgentThread` serialization | `ResearchMemory` (no serialization) | P3 |
58
+ | Observability | Full OTEL | structlog only | P3 |
59
+
60
+ ---
61
+
62
+ ## Recommendations
63
+
64
+ 1. **Implement `RetryMiddleware`** using MS `ChatMiddleware` base class
65
+ 2. **Implement `TokenTrackingMiddleware`** for cost visibility
66
+ 3. **Rename `src/middleware/`** to avoid confusion with MS patterns
67
+
68
+ See [SPEC-21](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md) for implementation details.
docs/bugs/p2-hardening-issues.md ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # P2: Hardening Issues
2
+
3
+ **Date:** 2025-12-06
4
+ **Priority:** P2 (Should fix soon)
5
+
6
+ ---
7
+
8
+ ## Issue 1: PubMed JSON Parsing Crash
9
+
10
+ **Status:** SPEC-20 CREATED
11
+ **File:** `src/tools/pubmed.py:88`
12
+
13
+ The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens outside the try/except block.
14
+
15
+ **Resolution:** See [SPEC-20: PubMed JSON Parsing Fix](../specs/SPEC-20-PUBMED-JSON-FIX.md)
16
+
17
+ ---
18
+
19
+ ## Issue 2: HuggingFace Client Missing Retry Logic
20
+
21
+ **Status:** SPEC-21 CREATED
22
+ **File:** `src/clients/huggingface.py`
23
+
24
+ The HuggingFaceChatClient has no retry logic for transient errors (429 rate limits, 500 server errors). When the API returns a 429, the entire research workflow crashes.
25
+
26
+ **Resolution:** See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
27
+
28
+ ---
29
+
30
+ ## Issue 3: Misleading Middleware Folder Name
31
+
32
+ **Status:** SPEC-21 CREATED
33
+ **File:** `src/middleware/`
34
+
35
+ The `src/middleware/` folder contains `SubIterationMiddleware`, which is actually a workflow pattern (team→judge loop), not interceptor middleware. This is confusing and misleading.
36
+
37
+ **Resolution:** See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
38
+
39
+ ---
40
+
41
+ ## Related Documentation
42
+
43
+ - [SPEC-20: PubMed JSON Parsing Fix](../specs/SPEC-20-PUBMED-JSON-FIX.md)
44
+ - [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
45
+ - [P3: MS Framework Gaps](./p3-ms-framework-gaps.md)
docs/bugs/p3-ms-framework-gaps.md ADDED
@@ -0,0 +1,289 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # P3: Microsoft Agent Framework Gaps Analysis
2
+
3
+ **Date:** 2025-12-06
4
+ **Priority:** P3 (Nice-to-Have)
5
+ **Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)
6
+
7
+ ## Executive Summary
8
+
9
+ Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness.
10
+
11
+ ---
12
+
13
+ ## Gap 1: OpenTelemetry Observability (HIGH VALUE)
14
+
15
+ **What MS Framework Has:**
16
+ ```python
17
+ # observability.py - 1706 lines of comprehensive OTEL integration
18
+ from opentelemetry.trace import get_tracer, Span
19
+ from opentelemetry.metrics import get_meter, Histogram
20
+
21
+ @use_observability # Decorator for ChatClient
22
+ @use_agent_observability # Decorator for Agent
23
+
24
+ # Token usage histograms with bucket boundaries
25
+ TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)
26
+
27
+ # Operation duration histograms
28
+ OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)
29
+
30
+ # 80+ semantic span attributes (OtelAttr enum)
31
+ OtelAttr.GEN_AI_OPERATION_NAME
32
+ OtelAttr.GEN_AI_REQUEST_MODEL
33
+ OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
34
+ OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
35
+ ```
36
+
37
+ **What DeepBoner Has:**
38
+ - `structlog` for logging only
39
+ - No distributed tracing
40
+ - No metrics collection
41
+ - No token usage tracking
42
+
43
+ **Gap Impact:**
44
+ - Cannot trace requests across agents
45
+ - No token cost monitoring
46
+ - No performance profiling in production
47
+
48
+ **Recommended Fix:**
49
+ ```python
50
+ # Add optional OTEL support to orchestrator
51
+ # src/observability/__init__.py
52
+ from opentelemetry import trace
53
+ from opentelemetry.metrics import get_meter
54
+
55
+ def setup_observability():
56
+ """One-time setup for OpenTelemetry."""
57
+ ...
58
+
59
+ @contextmanager
60
+ def trace_agent_operation(name: str, attributes: dict):
61
+ """Context manager for tracing agent operations."""
62
+ ...
63
+ ```
64
+
65
+ ---
66
+
67
+ ## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001
68
+
69
+ > **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)
70
+
71
+ **What MS Framework Has:**
72
+ ```python
73
+ # _middleware.py - Three types of middleware
74
+
75
+ class AgentMiddleware(ABC):
76
+ """Intercepts agent invocations."""
77
+ async def process(self, context: AgentRunContext, next): ...
78
+
79
+ class FunctionMiddleware(ABC):
80
+ """Intercepts tool/function calls."""
81
+ async def process(self, context: FunctionInvocationContext, next): ...
82
+
83
+ class ChatMiddleware(ABC):
84
+ """Intercepts chat client requests."""
85
+ async def process(self, context: ChatContext, next): ...
86
+
87
+ # Decorators for easy middleware creation
88
+ @agent_middleware
89
+ async def logging_middleware(context: AgentRunContext, next):
90
+ print(f"Before: {context.agent.name}")
91
+ await next(context)
92
+ print(f"After: {context.result}")
93
+
94
+ # Pipeline execution with terminate support
95
+ context.terminate = True # Stops pipeline early
96
+ ```
97
+
98
+ **What DeepBoner Has:**
99
+ - Uses MS decorators (`@use_chat_middleware`, `@use_observability`) βœ“
100
+ - BUT: No custom `ChatMiddleware` class implementations βœ—
101
+ - `src/middleware/` folder contains a workflow, not actual middleware βœ—
102
+
103
+ **ADR-001 Solution:**
104
+ 1. Rename `src/middleware/` β†’ `src/workflows/` (fix misleading name)
105
+ 2. Create proper `src/middleware/` with MS-pattern implementations:
106
+ - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
107
+ - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
108
+ - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
109
+
110
+ ---
111
+
112
+ ## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)
113
+
114
+ **What MS Framework Has:**
115
+ ```python
116
+ # _threads.py
117
+ class AgentThread:
118
+ """Maintains conversation state with serialization support."""
119
+
120
+ def __init__(self, service_thread_id=None, message_store=None):
121
+ ...
122
+
123
+ async def serialize(self) -> dict[str, Any]:
124
+ """Persist thread state."""
125
+ ...
126
+
127
+ @classmethod
128
+ async def deserialize(cls, state: dict) -> "AgentThread":
129
+ """Restore thread from persisted state."""
130
+ ...
131
+
132
+ class ChatMessageStoreProtocol(Protocol):
133
+ """Protocol for message storage backends."""
134
+ async def list_messages(self) -> list[ChatMessage]: ...
135
+ async def add_messages(self, messages: Sequence[ChatMessage]): ...
136
+ ```
137
+
138
+ **What DeepBoner Has:**
139
+ - `ResearchMemory` for research state only
140
+ - No conversation persistence
141
+ - No serialization/deserialization
142
+
143
+ **Gap Impact:**
144
+ - Cannot resume interrupted research sessions
145
+ - Cannot persist conversation history
146
+ - Cannot implement checkpointing
147
+
148
+ ---
149
+
150
+ ## Gap 4: Function/Tool Configuration (MEDIUM VALUE)
151
+
152
+ **What MS Framework Has:**
153
+ ```python
154
+ # _tools.py
155
+ class FunctionInvocationConfiguration:
156
+ """Configuration for function invocation in chat clients."""
157
+
158
+ enabled: bool = True
159
+ max_iterations: int = 40 # Maximum tool loop iterations
160
+ max_consecutive_errors_per_request: int = 3
161
+ terminate_on_unknown_calls: bool = False
162
+ include_detailed_errors: bool = False
163
+
164
+ class AIFunction:
165
+ """Wraps Python function for AI model calling."""
166
+ approval_mode: Literal["always_require", "never_require"]
167
+ max_invocations: int # Per-function invocation limit
168
+ max_invocation_exceptions: int # Per-function error limit
169
+ invocation_count: int # Tracks usage
170
+ ```
171
+
172
+ **What DeepBoner Has:**
173
+ - `max_iterations` in Settings
174
+ - Basic tool execution
175
+ - No per-tool configuration
176
+ - No approval mode
177
+
178
+ **Gap Impact:**
179
+ - Cannot limit individual tool usage
180
+ - No human-in-the-loop for dangerous tools
181
+ - No per-tool error budgets
182
+
183
+ ---
184
+
185
+ ## Gap 5: Context Provider Lifecycle (LOW VALUE)
186
+
187
+ **What MS Framework Has:**
188
+ ```python
189
+ # _memory.py
190
+ class ContextProvider(ABC):
191
+ """Abstract pattern for injecting context into agent invocations."""
192
+
193
+ async def invoking(self, agent, thread) -> str | None:
194
+ """Called before agent invocation. Returns context to inject."""
195
+ ...
196
+
197
+ async def invoked(self, agent, thread, result):
198
+ """Called after agent invocation."""
199
+ ...
200
+
201
+ async def thread_created(self, thread):
202
+ """Called when new thread is created."""
203
+ ...
204
+
205
+ class AggregateContextProvider(ContextProvider):
206
+ """Combines multiple context providers."""
207
+ ...
208
+ ```
209
+
210
+ **What DeepBoner Has:**
211
+ - `ResearchMemory` as simple state container
212
+ - No lifecycle hooks
213
+ - No provider aggregation
214
+
215
+ ---
216
+
217
+ ## Gap 6: Exception Granularity (LOW VALUE)
218
+
219
+ **What MS Framework Has:**
220
+ ```
221
+ AgentFrameworkException (base)
222
+ β”œβ”€β”€ AgentException
223
+ β”‚ β”œβ”€β”€ AgentExecutionException
224
+ β”‚ β”œβ”€β”€ AgentInitializationError
225
+ β”‚ └── AgentThreadException
226
+ β”œβ”€β”€ ChatClientException
227
+ β”‚ └── ChatClientInitializationError
228
+ β”œβ”€β”€ ServiceException
229
+ β”‚ β”œβ”€β”€ ServiceInitializationError
230
+ β”‚ β”œβ”€β”€ ServiceResponseException
231
+ β”‚ β”‚ β”œβ”€β”€ ServiceContentFilterException
232
+ β”‚ β”‚ β”œβ”€β”€ ServiceInvalidExecutionSettingsError
233
+ β”‚ β”‚ └── ServiceInvalidResponseError
234
+ β”‚ └── ServiceInvalidAuthError
235
+ β”œβ”€β”€ ToolException
236
+ β”‚ └── ToolExecutionException
237
+ β”œβ”€β”€ MiddlewareException
238
+ └── ContentError
239
+ ```
240
+
241
+ **What DeepBoner Has:**
242
+ ```
243
+ DeepBonerError (base)
244
+ β”œβ”€β”€ SearchError
245
+ β”‚ └── RateLimitError
246
+ β”œβ”€β”€ JudgeError
247
+ β”œβ”€β”€ ConfigurationError
248
+ └── EmbeddingError
249
+ ```
250
+
251
+ **Gap Impact:**
252
+ - Less precise error handling
253
+ - Harder to distinguish error sources
254
+ - Less informative error messages for users
255
+
256
+ ---
257
+
258
+ ## Prioritized Implementation Roadmap
259
+
260
+ ### Phase 1: Quick Wins (1-2 days)
261
+ 1. Add token tracking to orchestrator (no OTEL yet, just counters)
262
+ 2. Add `max_consecutive_errors` to tool execution
263
+
264
+ ### Phase 2: Medium Effort (3-5 days)
265
+ 1. Add basic middleware pattern to orchestrator
266
+ 2. Implement thread serialization for `ResearchMemory`
267
+
268
+ ### Phase 3: Full Production (1-2 weeks)
269
+ 1. Full OpenTelemetry integration
270
+ 2. Complete middleware pipeline
271
+ 3. Context provider lifecycle hooks
272
+
273
+ ---
274
+
275
+ ## Related Issues
276
+
277
+ - **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md`
278
+ - **MS Framework Reference:** `reference_repos/microsoft-agent-framework/`
279
+
280
+ ---
281
+
282
+ ## Notes
283
+
284
+ These gaps are P3 because:
285
+ 1. DeepBoner is functional without them
286
+ 2. They're architectural improvements, not bug fixes
287
+ 3. User-facing functionality is not impacted
288
+
289
+ However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.
docs/specs/README.md ADDED
@@ -0,0 +1,146 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Tech Debt & Bug Fix Specs
2
+
3
+ **Status:** AWAITING SENIOR REVIEW
4
+ **Created:** 2025-12-06
5
+
6
+ ---
7
+
8
+ ## Overview
9
+
10
+ These specs consolidate all identified bugs, tech debt, and architectural issues into phased, implementable work packages. Each spec is designed to be a single PR with TDD, SOLID, DRY, Gang of Four principles.
11
+
12
+ **Implementation Order:** SPEC-20 β†’ SPEC-21 β†’ SPEC-22
13
+
14
+ ---
15
+
16
+ ## Spec Index
17
+
18
+ | Spec | Title | Priority | Effort | Status |
19
+ |------|-------|----------|--------|--------|
20
+ | [SPEC-20](./SPEC-20-PUBMED-JSON-FIX.md) | PubMed JSON Parsing Fix | P2 | 15 min | READY |
21
+ | [SPEC-21](./SPEC-21-MIDDLEWARE-ARCHITECTURE.md) | Middleware Architecture Refactor | P2 | 2 hours | READY |
22
+ | [SPEC-22](./SPEC-22-PROGRESS-BAR-REMOVAL.md) | Progress Bar Removal | P3 | 15 min | READY |
23
+
24
+ **Total Effort:** ~2.5 hours
25
+
26
+ ---
27
+
28
+ ## Why This Order?
29
+
30
+ ### SPEC-20 First (15 min)
31
+ - Quickest win
32
+ - Fixes a real crash bug
33
+ - Builds confidence before larger refactor
34
+ - Single file, single PR
35
+
36
+ ### SPEC-21 Second (2 hours)
37
+ - The big architectural fix
38
+ - Renames confusing folder
39
+ - Implements proper MS framework patterns
40
+ - Fixes HuggingFace retry bug THE RIGHT WAY
41
+ - Adds token tracking
42
+
43
+ ### SPEC-22 Last (15 min)
44
+ - Cosmetic only
45
+ - Can be deferred if needed
46
+ - Easy cleanup
47
+
48
+ ---
49
+
50
+ ## What These Specs Consolidate
51
+
52
+ These specs replace the scattered documentation in:
53
+
54
+ | Old Location | Now Covered By |
55
+ |--------------|----------------|
56
+ | `docs/bugs/p2-hardening-issues.md` Issue 1 | SPEC-20 |
57
+ | `docs/bugs/p2-hardening-issues.md` Issue 2 | SPEC-21 |
58
+ | `docs/architecture/adr-001-middleware-refactor.md` | SPEC-21 |
59
+ | `docs/bugs/p3-progress-bar-positioning.md` | SPEC-22 |
60
+
61
+ ---
62
+
63
+ ## What's NOT In These Specs (Deferred P3)
64
+
65
+ The following are documented but deferred for later:
66
+
67
+ 1. **OpenTelemetry observability** - Nice to have, not blocking
68
+ 2. **Thread state serialization** - Nice to have, not blocking
69
+ 3. **ResearchMemory locks** - Not a bug today (sequential execution)
70
+ 4. **Error path cleanup** - Minor resource leakage, GC handles it
71
+ 5. **Per-tool configuration** - Nice to have
72
+ 6. **Context provider lifecycle** - Nice to have
73
+
74
+ These remain documented in `docs/bugs/p3-ms-framework-gaps.md` for future work.
75
+
76
+ ---
77
+
78
+ ## Implementation Protocol
79
+
80
+ For each spec:
81
+
82
+ 1. **Read the spec** - Understand the problem and solution
83
+ 2. **TDD** - Write failing test first
84
+ 3. **Implement** - Minimal code to pass tests
85
+ 4. **Run `make check`** - Lint + typecheck + test
86
+ 5. **Commit** - Single commit per spec
87
+ 6. **PR** - One PR per spec with spec number in title
88
+
89
+ ---
90
+
91
+ ## Commit Message Format
92
+
93
+ ```
94
+ fix: PubMed JSON parsing (SPEC-20)
95
+
96
+ Moves JSON parsing inside try/except block to handle API
97
+ maintenance pages gracefully. Adds JSONDecodeError handling.
98
+
99
+ Fixes: production crash on PubMed maintenance pages
100
+ ```
101
+
102
+ ```
103
+ refactor: middleware architecture (SPEC-21)
104
+
105
+ - Renames src/middleware β†’ src/workflows (accurate naming)
106
+ - Creates proper src/middleware with ChatMiddleware implementations
107
+ - Implements RetryMiddleware (fixes HuggingFace 429 crashes)
108
+ - Implements TokenTrackingMiddleware (enables cost monitoring)
109
+ ```
110
+
111
+ ```
112
+ fix: remove progress bar overlap (SPEC-22)
113
+
114
+ Removes gr.Progress() from research_agent function.
115
+ Gradio's Progress is incompatible with ChatInterface.
116
+ Emoji status messages in chat output are retained.
117
+ ```
118
+
119
+ ---
120
+
121
+ ## Senior Review Checklist
122
+
123
+ Before implementation, please verify:
124
+
125
+ - [ ] SPEC-20: Fix approach is correct (move into try/except)
126
+ - [ ] SPEC-21: MS middleware pattern is used correctly
127
+ - [ ] SPEC-21: RetryMiddleware implementation follows framework conventions
128
+ - [ ] SPEC-21: Folder rename won't break anything else
129
+ - [ ] SPEC-22: Removing gr.Progress() is the right fix (vs CSS hack)
130
+ - [ ] Order of implementation makes sense
131
+ - [ ] Nothing critical is missing from these specs
132
+
133
+ ---
134
+
135
+ ## After Implementation
136
+
137
+ Once all specs are implemented:
138
+
139
+ 1. Archive old docs:
140
+ - `docs/bugs/p2-hardening-issues.md` β†’ Mark as RESOLVED
141
+ - `docs/architecture/adr-001-middleware-refactor.md` β†’ Delete or archive
142
+ - `docs/bugs/p3-progress-bar-positioning.md` β†’ Mark as RESOLVED
143
+
144
+ 2. Update `docs/bugs/active-bugs.md` to reflect completed fixes
145
+
146
+ 3. Consider v0.2.0 release with these fixes
docs/specs/SPEC-20-PUBMED-JSON-FIX.md ADDED
@@ -0,0 +1,129 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # SPEC-20: PubMed JSON Parsing Fix
2
+
3
+ **Status:** READY FOR IMPLEMENTATION
4
+ **Priority:** P2 (Critical - causes crashes)
5
+ **Effort:** 15 minutes
6
+ **PR Scope:** Single file fix
7
+
8
+ ---
9
+
10
+ ## Problem Statement
11
+
12
+ The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens **outside** the try/except block.
13
+
14
+ **File:** `src/tools/pubmed.py:88`
15
+ **Crash Type:** `json.JSONDecodeError`
16
+ **User Impact:** Entire research workflow crashes
17
+
18
+ ---
19
+
20
+ ## Current Code (BROKEN)
21
+
22
+ ```python
23
+ # src/tools/pubmed.py - Lines ~80-95
24
+ try:
25
+ search_resp = await client.get(
26
+ f"{NCBI_BASE_URL}/esearch.fcgi",
27
+ params=search_params,
28
+ )
29
+ search_resp.raise_for_status()
30
+ except httpx.HTTPStatusError as e:
31
+ logger.warning("PubMed search failed", status=e.response.status_code)
32
+ return []
33
+
34
+ # ↓↓↓ THIS IS OUTSIDE THE TRY BLOCK ↓↓↓
35
+ search_data = search_resp.json() # CRASHES HERE on maintenance pages
36
+ pmids = search_data.get("esearchresult", {}).get("idlist", [])
37
+ ```
38
+
39
+ ---
40
+
41
+ ## Required Fix
42
+
43
+ Move JSON parsing inside try/except and add `JSONDecodeError` handling:
44
+
45
+ ```python
46
+ # src/tools/pubmed.py - Fixed version
47
+ try:
48
+ search_resp = await client.get(
49
+ f"{NCBI_BASE_URL}/esearch.fcgi",
50
+ params=search_params,
51
+ )
52
+ search_resp.raise_for_status()
53
+ search_data = search_resp.json() # ← MOVED INSIDE TRY
54
+ except httpx.HTTPStatusError as e:
55
+ logger.warning("PubMed search failed", status=e.response.status_code)
56
+ return []
57
+ except json.JSONDecodeError as e:
58
+ logger.warning(
59
+ "PubMed returned invalid JSON (possible maintenance page)",
60
+ error=str(e),
61
+ response_preview=search_resp.text[:200] if search_resp else "N/A",
62
+ )
63
+ return []
64
+
65
+ pmids = search_data.get("esearchresult", {}).get("idlist", [])
66
+ ```
67
+
68
+ ---
69
+
70
+ ## Implementation Checklist
71
+
72
+ - [ ] Add `import json` at top of file (if not present)
73
+ - [ ] Move `search_resp.json()` inside try block (line ~88)
74
+ - [ ] Add `except json.JSONDecodeError` handler
75
+ - [ ] Log warning with response preview for debugging
76
+ - [ ] Return empty list (graceful degradation)
77
+ - [ ] Write unit test: mock response with HTML content
78
+ - [ ] Run `make check` (lint + typecheck + test)
79
+
80
+ ---
81
+
82
+ ## Test Case
83
+
84
+ ```python
85
+ # tests/unit/tools/test_pubmed.py
86
+ import pytest
87
+ from unittest.mock import AsyncMock, MagicMock
88
+ from src.tools.pubmed import search_pubmed
89
+
90
+ @pytest.mark.asyncio
91
+ async def test_pubmed_handles_maintenance_page():
92
+ """PubMed should gracefully handle non-JSON responses."""
93
+ # Mock httpx client returning HTML maintenance page
94
+ mock_response = MagicMock()
95
+ mock_response.status_code = 200
96
+ mock_response.text = "<html><body>Service Temporarily Unavailable</body></html>"
97
+ mock_response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0)
98
+ mock_response.raise_for_status = MagicMock()
99
+
100
+ mock_client = AsyncMock()
101
+ mock_client.get.return_value = mock_response
102
+
103
+ # Should return empty list, not crash
104
+ result = await search_pubmed("test query", client=mock_client)
105
+ assert result == []
106
+ ```
107
+
108
+ ---
109
+
110
+ ## Acceptance Criteria
111
+
112
+ 1. `search_pubmed()` returns `[]` when API returns HTML
113
+ 2. Warning logged with response preview
114
+ 3. No `JSONDecodeError` propagates to caller
115
+ 4. All existing tests pass
116
+ 5. `make check` passes
117
+
118
+ ---
119
+
120
+ ## Dependencies
121
+
122
+ None. This is a standalone fix.
123
+
124
+ ---
125
+
126
+ ## Notes
127
+
128
+ - This same pattern may exist in `clinicaltrials.py` and `europepmc.py` - check after this fix
129
+ - Do NOT over-engineer. Single fix, single PR.
docs/specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md ADDED
@@ -0,0 +1,445 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # SPEC-21: Middleware Architecture Refactor
2
+
3
+ **Status:** READY FOR IMPLEMENTATION
4
+ **Priority:** P2 (Architectural hygiene + fixes HuggingFace retry bug)
5
+ **Effort:** 2 hours
6
+ **PR Scope:** Folder rename + new middleware implementations
7
+
8
+ ---
9
+
10
+ ## Problem Statement
11
+
12
+ 1. **Misleading folder name:** `src/middleware/` contains a workflow (`SubIterationMiddleware`), not interceptor middleware
13
+ 2. **Missing retry logic:** `HuggingFaceChatClient` has no retry on 429/transient errors (P2 bug)
14
+ 3. **No token tracking:** Cannot monitor API costs
15
+ 4. **Not using MS framework patterns:** We use decorators but not `ChatMiddleware` base classes
16
+
17
+ ---
18
+
19
+ ## Current State (WRONG)
20
+
21
+ ```
22
+ src/
23
+ β”œβ”€β”€ middleware/ ← MISLEADING: contains workflow
24
+ β”‚ β”œβ”€β”€ __init__.py
25
+ β”‚ β”œβ”€β”€ sub_iteration.py ← This is a WORKFLOW, not middleware
26
+ β”‚ └── .gitkeep
27
+ ```
28
+
29
+ **HuggingFace client has no retry:**
30
+ ```python
31
+ # src/clients/huggingface.py:263-265
32
+ except Exception as e:
33
+ logger.error("HuggingFace API error", error=str(e))
34
+ raise # ← No retry, crashes on 429
35
+ ```
36
+
37
+ ---
38
+
39
+ ## Target State (CORRECT)
40
+
41
+ ```
42
+ src/
43
+ β”œβ”€β”€ workflows/ ← RENAMED: now accurate
44
+ β”‚ β”œβ”€β”€ __init__.py
45
+ β”‚ └── sub_iteration.py
46
+ β”‚
47
+ β”œβ”€β”€ middleware/ ← NEW: actual MS-pattern middleware
48
+ β”‚ β”œβ”€β”€ __init__.py
49
+ β”‚ β”œβ”€β”€ retry.py ← RetryMiddleware(ChatMiddleware)
50
+ β”‚ └── token_tracking.py ← TokenTrackingMiddleware(ChatMiddleware)
51
+ ```
52
+
53
+ ---
54
+
55
+ ## Implementation Steps
56
+
57
+ ### Step 1: Rename Folder (5 min)
58
+
59
+ ```bash
60
+ # Rename middleware β†’ workflows
61
+ git mv src/middleware src/workflows
62
+ ```
63
+
64
+ ### Step 2: Update Import (5 min)
65
+
66
+ ```python
67
+ # src/orchestrators/hierarchical.py - Line ~15
68
+ # BEFORE:
69
+ from src.middleware.sub_iteration import SubIterationMiddleware, SubIterationTeam
70
+
71
+ # AFTER:
72
+ from src.workflows.sub_iteration import SubIterationMiddleware, SubIterationTeam
73
+ ```
74
+
75
+ ### Step 3: Create New Middleware Package (10 min)
76
+
77
+ ```python
78
+ # src/middleware/__init__.py
79
+ """Microsoft Agent Framework middleware implementations.
80
+
81
+ These are interceptor-pattern middleware that wrap chat client calls.
82
+ They are NOT workflows - see src/workflows/ for orchestration patterns.
83
+ """
84
+
85
+ from src.middleware.retry import RetryMiddleware
86
+ from src.middleware.token_tracking import TokenTrackingMiddleware
87
+
88
+ __all__ = ["RetryMiddleware", "TokenTrackingMiddleware"]
89
+ ```
90
+
91
+ ### Step 4: Implement RetryMiddleware (30 min)
92
+
93
+ ```python
94
+ # src/middleware/retry.py
95
+ """Retry middleware for chat clients with exponential backoff."""
96
+
97
+ import asyncio
98
+ from typing import Any
99
+
100
+ import structlog
101
+ from agent_framework._middleware import ChatContext, ChatMiddleware
102
+
103
+ logger = structlog.get_logger()
104
+
105
+
106
+ class RetryMiddleware(ChatMiddleware):
107
+ """Retries failed chat requests with exponential backoff.
108
+
109
+ This middleware intercepts chat client calls and retries on transient
110
+ errors (rate limits, timeouts, server errors).
111
+
112
+ Attributes:
113
+ max_attempts: Maximum number of attempts (default: 3)
114
+ min_wait: Minimum wait between retries in seconds (default: 1.0)
115
+ max_wait: Maximum wait between retries in seconds (default: 10.0)
116
+ retryable_status_codes: HTTP status codes to retry (default: 429, 500, 502, 503, 504)
117
+ """
118
+
119
+ def __init__(
120
+ self,
121
+ max_attempts: int = 3,
122
+ min_wait: float = 1.0,
123
+ max_wait: float = 10.0,
124
+ retryable_status_codes: tuple[int, ...] = (429, 500, 502, 503, 504),
125
+ ) -> None:
126
+ self.max_attempts = max_attempts
127
+ self.min_wait = min_wait
128
+ self.max_wait = max_wait
129
+ self.retryable_status_codes = retryable_status_codes
130
+
131
+ def _is_retryable(self, error: Exception) -> bool:
132
+ """Check if error is retryable."""
133
+ # Check for httpx status errors
134
+ if hasattr(error, "response") and hasattr(error.response, "status_code"):
135
+ return error.response.status_code in self.retryable_status_codes
136
+
137
+ # Check for timeout errors
138
+ error_name = type(error).__name__.lower()
139
+ if "timeout" in error_name:
140
+ return True
141
+
142
+ # Check for connection errors
143
+ if "connection" in error_name:
144
+ return True
145
+
146
+ return False
147
+
148
+ def _calculate_wait(self, attempt: int) -> float:
149
+ """Calculate wait time with exponential backoff."""
150
+ wait = self.min_wait * (2 ** attempt)
151
+ return min(wait, self.max_wait)
152
+
153
+ async def process(self, context: ChatContext, next: Any) -> None:
154
+ """Process the chat request with retry logic."""
155
+ last_error: Exception | None = None
156
+
157
+ for attempt in range(self.max_attempts):
158
+ try:
159
+ await next(context)
160
+ return # Success - exit retry loop
161
+
162
+ except Exception as e:
163
+ last_error = e
164
+
165
+ if not self._is_retryable(e):
166
+ logger.warning(
167
+ "Non-retryable error",
168
+ error=str(e),
169
+ error_type=type(e).__name__,
170
+ )
171
+ raise # Don't retry non-retryable errors
172
+
173
+ if attempt < self.max_attempts - 1:
174
+ wait_time = self._calculate_wait(attempt)
175
+ logger.info(
176
+ "Retrying after error",
177
+ attempt=attempt + 1,
178
+ max_attempts=self.max_attempts,
179
+ wait_seconds=wait_time,
180
+ error=str(e),
181
+ )
182
+ await asyncio.sleep(wait_time)
183
+
184
+ # All retries exhausted
185
+ logger.error(
186
+ "All retry attempts failed",
187
+ max_attempts=self.max_attempts,
188
+ last_error=str(last_error),
189
+ )
190
+ if last_error:
191
+ raise last_error
192
+ ```
193
+
194
+ ### Step 5: Implement TokenTrackingMiddleware (20 min)
195
+
196
+ ```python
197
+ # src/middleware/token_tracking.py
198
+ """Token tracking middleware for monitoring API usage."""
199
+
200
+ from contextvars import ContextVar
201
+ from typing import Any
202
+
203
+ import structlog
204
+ from agent_framework._middleware import ChatContext, ChatMiddleware
205
+
206
+ logger = structlog.get_logger()
207
+
208
+ # ContextVar for per-request token tracking
209
+ _request_tokens: ContextVar[dict[str, int]] = ContextVar(
210
+ "request_tokens",
211
+ default={"input": 0, "output": 0},
212
+ )
213
+
214
+
215
+ class TokenTrackingMiddleware(ChatMiddleware):
216
+ """Tracks token usage across chat requests.
217
+
218
+ This middleware logs token usage after each chat completion
219
+ and maintains running totals for the session.
220
+
221
+ Usage metrics are logged via structlog for observability.
222
+ """
223
+
224
+ def __init__(self) -> None:
225
+ self.total_input_tokens = 0
226
+ self.total_output_tokens = 0
227
+ self.request_count = 0
228
+
229
+ async def process(self, context: ChatContext, next: Any) -> None:
230
+ """Process request and track token usage."""
231
+ await next(context)
232
+
233
+ # Extract usage from response if available
234
+ if context.result is None:
235
+ return
236
+
237
+ usage = None
238
+
239
+ # Try to get usage from response
240
+ if hasattr(context.result, "usage"):
241
+ usage = context.result.usage
242
+ elif hasattr(context.result, "messages") and context.result.messages:
243
+ # Check first message for usage metadata
244
+ msg = context.result.messages[0]
245
+ if hasattr(msg, "metadata") and msg.metadata:
246
+ usage = msg.metadata.get("usage")
247
+
248
+ if usage:
249
+ input_tokens = usage.get("input_tokens", 0) or usage.get("prompt_tokens", 0)
250
+ output_tokens = usage.get("output_tokens", 0) or usage.get("completion_tokens", 0)
251
+
252
+ self.total_input_tokens += input_tokens
253
+ self.total_output_tokens += output_tokens
254
+ self.request_count += 1
255
+
256
+ logger.info(
257
+ "Token usage",
258
+ request_input=input_tokens,
259
+ request_output=output_tokens,
260
+ total_input=self.total_input_tokens,
261
+ total_output=self.total_output_tokens,
262
+ total_requests=self.request_count,
263
+ )
264
+
265
+
266
+ def get_token_stats() -> dict[str, int]:
267
+ """Get current request's token usage."""
268
+ return _request_tokens.get().copy()
269
+ ```
270
+
271
+ ### Step 6: Apply Middleware to HuggingFaceChatClient (15 min)
272
+
273
+ ```python
274
+ # src/clients/huggingface.py - Update __init__
275
+
276
+ from src.middleware.retry import RetryMiddleware
277
+ from src.middleware.token_tracking import TokenTrackingMiddleware
278
+
279
+ @use_function_invocation
280
+ @use_observability
281
+ @use_chat_middleware
282
+ class HuggingFaceChatClient(BaseChatClient):
283
+ def __init__(
284
+ self,
285
+ model_id: str | None = None,
286
+ api_key: str | None = None,
287
+ **kwargs: Any,
288
+ ) -> None:
289
+ # Create middleware instances
290
+ middleware = [
291
+ RetryMiddleware(max_attempts=3, min_wait=1.0, max_wait=10.0),
292
+ TokenTrackingMiddleware(),
293
+ ]
294
+
295
+ super().__init__(middleware=middleware, **kwargs)
296
+ # ... rest of __init__
297
+ ```
298
+
299
+ ### Step 7: Update Tests (20 min)
300
+
301
+ ```python
302
+ # tests/unit/middleware/test_retry.py
303
+ import pytest
304
+ from unittest.mock import AsyncMock, MagicMock
305
+ from src.middleware.retry import RetryMiddleware
306
+
307
+
308
+ @pytest.mark.asyncio
309
+ async def test_retry_middleware_succeeds_first_try():
310
+ """RetryMiddleware should pass through on success."""
311
+ middleware = RetryMiddleware(max_attempts=3)
312
+ context = MagicMock()
313
+ next_fn = AsyncMock()
314
+
315
+ await middleware.process(context, next_fn)
316
+
317
+ next_fn.assert_called_once_with(context)
318
+
319
+
320
+ @pytest.mark.asyncio
321
+ async def test_retry_middleware_retries_on_429():
322
+ """RetryMiddleware should retry on 429 rate limit."""
323
+ middleware = RetryMiddleware(max_attempts=3, min_wait=0.01)
324
+ context = MagicMock()
325
+
326
+ # First two calls fail with 429, third succeeds
327
+ call_count = 0
328
+
329
+ async def mock_next(ctx):
330
+ nonlocal call_count
331
+ call_count += 1
332
+ if call_count < 3:
333
+ error = Exception("Rate limited")
334
+ error.response = MagicMock(status_code=429)
335
+ raise error
336
+
337
+ await middleware.process(context, mock_next)
338
+ assert call_count == 3
339
+
340
+
341
+ @pytest.mark.asyncio
342
+ async def test_retry_middleware_raises_after_max_attempts():
343
+ """RetryMiddleware should raise after max attempts exhausted."""
344
+ middleware = RetryMiddleware(max_attempts=2, min_wait=0.01)
345
+ context = MagicMock()
346
+
347
+ async def always_fails(ctx):
348
+ error = Exception("Always fails")
349
+ error.response = MagicMock(status_code=500)
350
+ raise error
351
+
352
+ with pytest.raises(Exception, match="Always fails"):
353
+ await middleware.process(context, always_fails)
354
+ ```
355
+
356
+ ---
357
+
358
+ ## Implementation Checklist
359
+
360
+ ### Phase 1: Folder Rename
361
+ - [ ] `git mv src/middleware src/workflows`
362
+ - [ ] Update import in `src/orchestrators/hierarchical.py`
363
+ - [ ] Update `src/workflows/__init__.py` docstring
364
+ - [ ] Run `make check` - verify no import errors
365
+
366
+ ### Phase 2: Create Middleware Package
367
+ - [ ] Create `src/middleware/__init__.py`
368
+ - [ ] Create `src/middleware/retry.py` with `RetryMiddleware`
369
+ - [ ] Create `src/middleware/token_tracking.py` with `TokenTrackingMiddleware`
370
+ - [ ] Run `make check` - verify no syntax errors
371
+
372
+ ### Phase 3: Apply to Client
373
+ - [ ] Update `src/clients/huggingface.py` to use middleware
374
+ - [ ] Test manually: `uv run python -c "from src.clients.huggingface import HuggingFaceChatClient; print('OK')"`
375
+ - [ ] Run `make check`
376
+
377
+ ### Phase 4: Tests
378
+ - [ ] Create `tests/unit/middleware/__init__.py`
379
+ - [ ] Create `tests/unit/middleware/test_retry.py`
380
+ - [ ] Create `tests/unit/middleware/test_token_tracking.py`
381
+ - [ ] Run `make test` - all tests pass
382
+
383
+ ### Phase 5: Cleanup
384
+ - [ ] Remove `.gitkeep` from `src/workflows/` if present
385
+ - [ ] Run full `make check`
386
+ - [ ] Commit with message: "refactor: implement proper middleware architecture (SPEC-21)"
387
+
388
+ ---
389
+
390
+ ## Acceptance Criteria
391
+
392
+ 1. `src/middleware/` folder contains actual `ChatMiddleware` implementations
393
+ 2. `src/workflows/` folder contains `SubIterationMiddleware` (renamed from middleware)
394
+ 3. `HuggingFaceChatClient` uses `RetryMiddleware` - no more crashes on 429
395
+ 4. Token usage is logged via `TokenTrackingMiddleware`
396
+ 5. All existing tests pass
397
+ 6. `make check` passes
398
+ 7. No import errors anywhere in codebase
399
+
400
+ ---
401
+
402
+ ## Dependencies
403
+
404
+ - **SPEC-20** should be done first (simpler, builds confidence)
405
+ - Requires `agent-framework-core` package (already installed)
406
+
407
+ ---
408
+
409
+ ## Gotchas & Nuances
410
+
411
+ 1. **MS middleware signature:** The `process` method takes `(context, next)` where `next` is a callable
412
+ 2. **Middleware order matters:** Retry should be FIRST so it wraps everything
413
+ 3. **ContextVar for token tracking:** Use ContextVar for per-request isolation
414
+ 4. **Don't break HierarchicalOrchestrator:** It uses `SubIterationMiddleware` - update import path
415
+ 5. **BaseChatClient constructor:** Check if it accepts `middleware=` parameter - may need to register differently
416
+
417
+ ---
418
+
419
+ ## Testing the Fix
420
+
421
+ After implementation, verify 429 handling:
422
+
423
+ ```python
424
+ # Manual test
425
+ import asyncio
426
+ from src.clients.huggingface import HuggingFaceChatClient
427
+ from agent_framework import ChatMessage, ChatOptions
428
+
429
+ async def test():
430
+ client = HuggingFaceChatClient()
431
+ # Make rapid requests to trigger rate limit
432
+ for i in range(10):
433
+ try:
434
+ resp = await client.get_response(
435
+ messages=[ChatMessage(role="user", text="Hello")],
436
+ chat_options=ChatOptions(),
437
+ )
438
+ print(f"Request {i}: OK")
439
+ except Exception as e:
440
+ print(f"Request {i}: {e}")
441
+
442
+ asyncio.run(test())
443
+ ```
444
+
445
+ Should see retry logs instead of immediate crashes.
docs/specs/SPEC-22-PROGRESS-BAR-REMOVAL.md ADDED
@@ -0,0 +1,127 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # SPEC-22: Progress Bar Removal
2
+
3
+ **Status:** READY FOR IMPLEMENTATION
4
+ **Priority:** P3 (Cosmetic UX fix)
5
+ **Effort:** 15 minutes
6
+ **PR Scope:** Single file fix
7
+
8
+ ---
9
+
10
+ ## Problem Statement
11
+
12
+ The `gr.Progress()` bar conflicts with Gradio's `ChatInterface`, causing visual glitches:
13
+ - Progress bar "floats" in the middle of chat output
14
+ - Text overlaps with progress bar
15
+ - Looks unprofessional
16
+
17
+ **Root Cause:** `gr.Progress()` is designed for `gr.Interface`, not `ChatInterface`. It's a known Gradio limitation.
18
+
19
+ ---
20
+
21
+ ## Current Code (BROKEN)
22
+
23
+ ```python
24
+ # src/app.py - research_agent function
25
+ async def research_agent(
26
+ message: str,
27
+ history: list[dict[str, Any]],
28
+ domain: str = "sexual_health",
29
+ api_key: str = "",
30
+ api_key_state: str = "",
31
+ progress: gr.Progress = gr.Progress(), # ← PROBLEM: Causes visual glitches
32
+ ) -> AsyncGenerator[str, None]:
33
+ ...
34
+ if event.type == "started":
35
+ progress(0, desc="Starting research...") # ← These cause overlap
36
+ elif event.type == "progress":
37
+ progress(p, desc=event.message)
38
+ ```
39
+
40
+ ---
41
+
42
+ ## Required Fix
43
+
44
+ Remove `gr.Progress()` entirely. We already have emoji status messages in chat output.
45
+
46
+ ```python
47
+ # src/app.py - Fixed version
48
+ async def research_agent(
49
+ message: str,
50
+ history: list[dict[str, Any]],
51
+ domain: str = "sexual_health",
52
+ api_key: str = "",
53
+ api_key_state: str = "",
54
+ # REMOVED: progress: gr.Progress = gr.Progress(),
55
+ ) -> AsyncGenerator[str, None]:
56
+ ...
57
+ # REMOVED: All progress(...) calls
58
+
59
+ # KEEP: Emoji status messages are already being yielded
60
+ # These work great with ChatInterface:
61
+ # yield "⏱️ **PROGRESS**: Round 1/5 (~3m 0s remaining)"
62
+ ```
63
+
64
+ ---
65
+
66
+ ## Implementation Checklist
67
+
68
+ - [ ] Open `src/app.py`
69
+ - [ ] Remove `progress: gr.Progress = gr.Progress()` from `research_agent` signature
70
+ - [ ] Remove all `progress(...)` calls inside `research_agent`
71
+ - [ ] Verify emoji status yields are still present (they should be)
72
+ - [ ] Run `uv run python -c "from src.app import create_demo; print('OK')"`
73
+ - [ ] Run `make check`
74
+ - [ ] Test locally: `uv run python src/app.py` and verify no floating progress bar
75
+
76
+ ---
77
+
78
+ ## What We Keep
79
+
80
+ The emoji status messages in chat output:
81
+
82
+ ```
83
+ ⏱️ **PROGRESS**: Round 1/5 (~3m 0s remaining)
84
+ πŸ”¬ **Step 2: SearchAgent** - Searching for evidence...
85
+ βœ… **COMPLETE**: Research finished in 45 seconds
86
+ ```
87
+
88
+ These are yielded directly to chat and work perfectly with `ChatInterface`.
89
+
90
+ ---
91
+
92
+ ## Acceptance Criteria
93
+
94
+ 1. No `gr.Progress()` in `research_agent` function signature
95
+ 2. No `progress(...)` calls in `research_agent` function body
96
+ 3. Emoji status messages still appear in chat output
97
+ 4. No floating/overlapping progress bar in UI
98
+ 5. `make check` passes
99
+
100
+ ---
101
+
102
+ ## Dependencies
103
+
104
+ None. This is a standalone cosmetic fix.
105
+
106
+ ---
107
+
108
+ ## Testing
109
+
110
+ ```bash
111
+ # Start local server
112
+ uv run python src/app.py
113
+
114
+ # In browser:
115
+ # 1. Submit a research query
116
+ # 2. Verify NO floating progress bar appears
117
+ # 3. Verify emoji status messages DO appear in chat
118
+ # 4. Verify chat messages don't have visual glitches
119
+ ```
120
+
121
+ ---
122
+
123
+ ## Notes
124
+
125
+ - This is the recommended fix from Gradio's own documentation
126
+ - `ChatInterface.show_progress="minimal"` (default) still shows a spinner, which is fine
127
+ - If we need a real progress bar later, we'd need to refactor to `gr.Blocks` wrapper