Commit
Β·
2ac49c3
1
Parent(s):
e6f0fda
docs: Add tech debt specs and architecture documentation
Browse filesAdds comprehensive documentation for phased bug fixes:
- SPEC-20: PubMed JSON Parsing Fix (P2)
- SPEC-21: Middleware Architecture Refactor (P2)
- SPEC-22: Progress Bar Removal (P3)
Also adds:
- docs/specs/README.md - Master index with implementation order
- docs/bugs/p2-hardening-issues.md - P2 issue tracker
- docs/bugs/p3-ms-framework-gaps.md - MS framework gap analysis
- docs/architecture/adr-001-middleware-refactor.md - ADR for middleware
- docs/architecture/ms-framework-usage-report.md - What we use from MS
- docs/architecture/adr-001-middleware-refactor.md +54 -0
- docs/architecture/ms-framework-usage-report.md +68 -0
- docs/bugs/p2-hardening-issues.md +45 -0
- docs/bugs/p3-ms-framework-gaps.md +289 -0
- docs/specs/README.md +146 -0
- docs/specs/SPEC-20-PUBMED-JSON-FIX.md +129 -0
- docs/specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md +445 -0
- docs/specs/SPEC-22-PROGRESS-BAR-REMOVAL.md +127 -0
docs/architecture/adr-001-middleware-refactor.md
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# ADR-001: Middleware Architecture Refactor
|
| 2 |
+
|
| 3 |
+
**Status:** ACCEPTED
|
| 4 |
+
**Date:** 2025-12-06
|
| 5 |
+
**Decision Makers:** Development Team
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## Context
|
| 10 |
+
|
| 11 |
+
The current `src/middleware/` folder is misleadingly named. It contains `SubIterationMiddleware`, which implements a workflow pattern (teamβjudge loop), not the interceptor middleware pattern used by Microsoft Agent Framework.
|
| 12 |
+
|
| 13 |
+
Additionally, we're missing proper middleware implementations for:
|
| 14 |
+
- Retry logic on transient errors (429, 500, 502, 503, 504)
|
| 15 |
+
- Token usage tracking for cost monitoring
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## Decision
|
| 20 |
+
|
| 21 |
+
1. **Rename `src/middleware/` to `src/workflows/`** to accurately reflect what it contains
|
| 22 |
+
2. **Create new `src/middleware/` with proper MS-pattern middleware:**
|
| 23 |
+
- `RetryMiddleware(ChatMiddleware)` - exponential backoff retry
|
| 24 |
+
- `TokenTrackingMiddleware(ChatMiddleware)` - token usage logging
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
## Consequences
|
| 29 |
+
|
| 30 |
+
### Positive
|
| 31 |
+
- Clearer codebase organization
|
| 32 |
+
- Proper use of MS Agent Framework patterns
|
| 33 |
+
- HuggingFace 429 crashes will be handled gracefully
|
| 34 |
+
- Token usage will be visible for cost monitoring
|
| 35 |
+
|
| 36 |
+
### Negative
|
| 37 |
+
- Requires updating imports in `src/orchestrators/hierarchical.py`
|
| 38 |
+
- One-time migration effort
|
| 39 |
+
|
| 40 |
+
### Neutral
|
| 41 |
+
- Aligns with Microsoft Agent Framework conventions
|
| 42 |
+
|
| 43 |
+
---
|
| 44 |
+
|
| 45 |
+
## Implementation
|
| 46 |
+
|
| 47 |
+
See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md) for detailed implementation steps.
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
## References
|
| 52 |
+
|
| 53 |
+
- Microsoft Agent Framework `_middleware.py`
|
| 54 |
+
- [P2 Hardening Issues](../bugs/p2-hardening-issues.md) Issue 2 & 3
|
docs/architecture/ms-framework-usage-report.md
ADDED
|
@@ -0,0 +1,68 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Microsoft Agent Framework Usage Report
|
| 2 |
+
|
| 3 |
+
**Date:** 2025-12-06
|
| 4 |
+
**Framework Version:** agent-framework-core==1.0.0b251204
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## What We Use From MS Framework (pip installed)
|
| 9 |
+
|
| 10 |
+
### Core Classes
|
| 11 |
+
- `BaseChatClient` - Base class for chat clients
|
| 12 |
+
- `ChatMessage`, `ChatRole` - Message types
|
| 13 |
+
- `ChatOptions` - Request configuration
|
| 14 |
+
- `ChatAgent` - Agent base class
|
| 15 |
+
|
| 16 |
+
### Decorators (Applied to HuggingFaceChatClient)
|
| 17 |
+
- `@use_function_invocation` - Enables tool/function calling
|
| 18 |
+
- `@use_observability` - Adds OTEL tracing hooks
|
| 19 |
+
- `@use_chat_middleware` - Enables middleware pipeline
|
| 20 |
+
|
| 21 |
+
### Middleware Base Classes (Available but NOT yet used)
|
| 22 |
+
- `ChatMiddleware` - Intercepts chat client requests
|
| 23 |
+
- `AgentMiddleware` - Intercepts agent invocations
|
| 24 |
+
- `FunctionMiddleware` - Intercepts tool calls
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
## What We Hand-Roll (Custom Implementation)
|
| 29 |
+
|
| 30 |
+
### Orchestration
|
| 31 |
+
- `AdvancedOrchestrator` - Main research workflow
|
| 32 |
+
- `HierarchicalOrchestrator` - Team-based orchestration
|
| 33 |
+
- `SubIterationMiddleware` - Teamβjudge loop (workflow, not middleware)
|
| 34 |
+
|
| 35 |
+
### Clients
|
| 36 |
+
- `HuggingFaceChatClient` - Adapter for HuggingFace Inference API
|
| 37 |
+
- Client factory with auto-detection
|
| 38 |
+
|
| 39 |
+
### Tools
|
| 40 |
+
- `PubMedTool`, `ClinicalTrialsTool`, `EuropePMCTool`
|
| 41 |
+
- `SearchHandler` - Scatter-gather orchestration
|
| 42 |
+
|
| 43 |
+
### Services
|
| 44 |
+
- `EmbeddingService` - Local sentence-transformers
|
| 45 |
+
- `LlamaIndexRAG` - Premium OpenAI embeddings
|
| 46 |
+
- `ResearchMemory` - Research state management
|
| 47 |
+
|
| 48 |
+
---
|
| 49 |
+
|
| 50 |
+
## Gap Analysis
|
| 51 |
+
|
| 52 |
+
| Component | MS Framework Has | DeepBoner Has | Status |
|
| 53 |
+
|-----------|------------------|---------------|--------|
|
| 54 |
+
| Chat Middleware | `ChatMiddleware` base | Uses decorator only | SPEC-21 |
|
| 55 |
+
| Retry Logic | N/A (left to user) | None | SPEC-21 |
|
| 56 |
+
| Token Tracking | OTEL histograms | None | SPEC-21 |
|
| 57 |
+
| Thread State | `AgentThread` serialization | `ResearchMemory` (no serialization) | P3 |
|
| 58 |
+
| Observability | Full OTEL | structlog only | P3 |
|
| 59 |
+
|
| 60 |
+
---
|
| 61 |
+
|
| 62 |
+
## Recommendations
|
| 63 |
+
|
| 64 |
+
1. **Implement `RetryMiddleware`** using MS `ChatMiddleware` base class
|
| 65 |
+
2. **Implement `TokenTrackingMiddleware`** for cost visibility
|
| 66 |
+
3. **Rename `src/middleware/`** to avoid confusion with MS patterns
|
| 67 |
+
|
| 68 |
+
See [SPEC-21](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md) for implementation details.
|
docs/bugs/p2-hardening-issues.md
ADDED
|
@@ -0,0 +1,45 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# P2: Hardening Issues
|
| 2 |
+
|
| 3 |
+
**Date:** 2025-12-06
|
| 4 |
+
**Priority:** P2 (Should fix soon)
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## Issue 1: PubMed JSON Parsing Crash
|
| 9 |
+
|
| 10 |
+
**Status:** SPEC-20 CREATED
|
| 11 |
+
**File:** `src/tools/pubmed.py:88`
|
| 12 |
+
|
| 13 |
+
The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens outside the try/except block.
|
| 14 |
+
|
| 15 |
+
**Resolution:** See [SPEC-20: PubMed JSON Parsing Fix](../specs/SPEC-20-PUBMED-JSON-FIX.md)
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## Issue 2: HuggingFace Client Missing Retry Logic
|
| 20 |
+
|
| 21 |
+
**Status:** SPEC-21 CREATED
|
| 22 |
+
**File:** `src/clients/huggingface.py`
|
| 23 |
+
|
| 24 |
+
The HuggingFaceChatClient has no retry logic for transient errors (429 rate limits, 500 server errors). When the API returns a 429, the entire research workflow crashes.
|
| 25 |
+
|
| 26 |
+
**Resolution:** See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
|
| 27 |
+
|
| 28 |
+
---
|
| 29 |
+
|
| 30 |
+
## Issue 3: Misleading Middleware Folder Name
|
| 31 |
+
|
| 32 |
+
**Status:** SPEC-21 CREATED
|
| 33 |
+
**File:** `src/middleware/`
|
| 34 |
+
|
| 35 |
+
The `src/middleware/` folder contains `SubIterationMiddleware`, which is actually a workflow pattern (teamβjudge loop), not interceptor middleware. This is confusing and misleading.
|
| 36 |
+
|
| 37 |
+
**Resolution:** See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## Related Documentation
|
| 42 |
+
|
| 43 |
+
- [SPEC-20: PubMed JSON Parsing Fix](../specs/SPEC-20-PUBMED-JSON-FIX.md)
|
| 44 |
+
- [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
|
| 45 |
+
- [P3: MS Framework Gaps](./p3-ms-framework-gaps.md)
|
docs/bugs/p3-ms-framework-gaps.md
ADDED
|
@@ -0,0 +1,289 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# P3: Microsoft Agent Framework Gaps Analysis
|
| 2 |
+
|
| 3 |
+
**Date:** 2025-12-06
|
| 4 |
+
**Priority:** P3 (Nice-to-Have)
|
| 5 |
+
**Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)
|
| 6 |
+
|
| 7 |
+
## Executive Summary
|
| 8 |
+
|
| 9 |
+
Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness.
|
| 10 |
+
|
| 11 |
+
---
|
| 12 |
+
|
| 13 |
+
## Gap 1: OpenTelemetry Observability (HIGH VALUE)
|
| 14 |
+
|
| 15 |
+
**What MS Framework Has:**
|
| 16 |
+
```python
|
| 17 |
+
# observability.py - 1706 lines of comprehensive OTEL integration
|
| 18 |
+
from opentelemetry.trace import get_tracer, Span
|
| 19 |
+
from opentelemetry.metrics import get_meter, Histogram
|
| 20 |
+
|
| 21 |
+
@use_observability # Decorator for ChatClient
|
| 22 |
+
@use_agent_observability # Decorator for Agent
|
| 23 |
+
|
| 24 |
+
# Token usage histograms with bucket boundaries
|
| 25 |
+
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)
|
| 26 |
+
|
| 27 |
+
# Operation duration histograms
|
| 28 |
+
OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)
|
| 29 |
+
|
| 30 |
+
# 80+ semantic span attributes (OtelAttr enum)
|
| 31 |
+
OtelAttr.GEN_AI_OPERATION_NAME
|
| 32 |
+
OtelAttr.GEN_AI_REQUEST_MODEL
|
| 33 |
+
OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
|
| 34 |
+
OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
**What DeepBoner Has:**
|
| 38 |
+
- `structlog` for logging only
|
| 39 |
+
- No distributed tracing
|
| 40 |
+
- No metrics collection
|
| 41 |
+
- No token usage tracking
|
| 42 |
+
|
| 43 |
+
**Gap Impact:**
|
| 44 |
+
- Cannot trace requests across agents
|
| 45 |
+
- No token cost monitoring
|
| 46 |
+
- No performance profiling in production
|
| 47 |
+
|
| 48 |
+
**Recommended Fix:**
|
| 49 |
+
```python
|
| 50 |
+
# Add optional OTEL support to orchestrator
|
| 51 |
+
# src/observability/__init__.py
|
| 52 |
+
from opentelemetry import trace
|
| 53 |
+
from opentelemetry.metrics import get_meter
|
| 54 |
+
|
| 55 |
+
def setup_observability():
|
| 56 |
+
"""One-time setup for OpenTelemetry."""
|
| 57 |
+
...
|
| 58 |
+
|
| 59 |
+
@contextmanager
|
| 60 |
+
def trace_agent_operation(name: str, attributes: dict):
|
| 61 |
+
"""Context manager for tracing agent operations."""
|
| 62 |
+
...
|
| 63 |
+
```
|
| 64 |
+
|
| 65 |
+
---
|
| 66 |
+
|
| 67 |
+
## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001
|
| 68 |
+
|
| 69 |
+
> **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)
|
| 70 |
+
|
| 71 |
+
**What MS Framework Has:**
|
| 72 |
+
```python
|
| 73 |
+
# _middleware.py - Three types of middleware
|
| 74 |
+
|
| 75 |
+
class AgentMiddleware(ABC):
|
| 76 |
+
"""Intercepts agent invocations."""
|
| 77 |
+
async def process(self, context: AgentRunContext, next): ...
|
| 78 |
+
|
| 79 |
+
class FunctionMiddleware(ABC):
|
| 80 |
+
"""Intercepts tool/function calls."""
|
| 81 |
+
async def process(self, context: FunctionInvocationContext, next): ...
|
| 82 |
+
|
| 83 |
+
class ChatMiddleware(ABC):
|
| 84 |
+
"""Intercepts chat client requests."""
|
| 85 |
+
async def process(self, context: ChatContext, next): ...
|
| 86 |
+
|
| 87 |
+
# Decorators for easy middleware creation
|
| 88 |
+
@agent_middleware
|
| 89 |
+
async def logging_middleware(context: AgentRunContext, next):
|
| 90 |
+
print(f"Before: {context.agent.name}")
|
| 91 |
+
await next(context)
|
| 92 |
+
print(f"After: {context.result}")
|
| 93 |
+
|
| 94 |
+
# Pipeline execution with terminate support
|
| 95 |
+
context.terminate = True # Stops pipeline early
|
| 96 |
+
```
|
| 97 |
+
|
| 98 |
+
**What DeepBoner Has:**
|
| 99 |
+
- Uses MS decorators (`@use_chat_middleware`, `@use_observability`) β
|
| 100 |
+
- BUT: No custom `ChatMiddleware` class implementations β
|
| 101 |
+
- `src/middleware/` folder contains a workflow, not actual middleware β
|
| 102 |
+
|
| 103 |
+
**ADR-001 Solution:**
|
| 104 |
+
1. Rename `src/middleware/` β `src/workflows/` (fix misleading name)
|
| 105 |
+
2. Create proper `src/middleware/` with MS-pattern implementations:
|
| 106 |
+
- `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
|
| 107 |
+
- `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
|
| 108 |
+
- `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
|
| 109 |
+
|
| 110 |
+
---
|
| 111 |
+
|
| 112 |
+
## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)
|
| 113 |
+
|
| 114 |
+
**What MS Framework Has:**
|
| 115 |
+
```python
|
| 116 |
+
# _threads.py
|
| 117 |
+
class AgentThread:
|
| 118 |
+
"""Maintains conversation state with serialization support."""
|
| 119 |
+
|
| 120 |
+
def __init__(self, service_thread_id=None, message_store=None):
|
| 121 |
+
...
|
| 122 |
+
|
| 123 |
+
async def serialize(self) -> dict[str, Any]:
|
| 124 |
+
"""Persist thread state."""
|
| 125 |
+
...
|
| 126 |
+
|
| 127 |
+
@classmethod
|
| 128 |
+
async def deserialize(cls, state: dict) -> "AgentThread":
|
| 129 |
+
"""Restore thread from persisted state."""
|
| 130 |
+
...
|
| 131 |
+
|
| 132 |
+
class ChatMessageStoreProtocol(Protocol):
|
| 133 |
+
"""Protocol for message storage backends."""
|
| 134 |
+
async def list_messages(self) -> list[ChatMessage]: ...
|
| 135 |
+
async def add_messages(self, messages: Sequence[ChatMessage]): ...
|
| 136 |
+
```
|
| 137 |
+
|
| 138 |
+
**What DeepBoner Has:**
|
| 139 |
+
- `ResearchMemory` for research state only
|
| 140 |
+
- No conversation persistence
|
| 141 |
+
- No serialization/deserialization
|
| 142 |
+
|
| 143 |
+
**Gap Impact:**
|
| 144 |
+
- Cannot resume interrupted research sessions
|
| 145 |
+
- Cannot persist conversation history
|
| 146 |
+
- Cannot implement checkpointing
|
| 147 |
+
|
| 148 |
+
---
|
| 149 |
+
|
| 150 |
+
## Gap 4: Function/Tool Configuration (MEDIUM VALUE)
|
| 151 |
+
|
| 152 |
+
**What MS Framework Has:**
|
| 153 |
+
```python
|
| 154 |
+
# _tools.py
|
| 155 |
+
class FunctionInvocationConfiguration:
|
| 156 |
+
"""Configuration for function invocation in chat clients."""
|
| 157 |
+
|
| 158 |
+
enabled: bool = True
|
| 159 |
+
max_iterations: int = 40 # Maximum tool loop iterations
|
| 160 |
+
max_consecutive_errors_per_request: int = 3
|
| 161 |
+
terminate_on_unknown_calls: bool = False
|
| 162 |
+
include_detailed_errors: bool = False
|
| 163 |
+
|
| 164 |
+
class AIFunction:
|
| 165 |
+
"""Wraps Python function for AI model calling."""
|
| 166 |
+
approval_mode: Literal["always_require", "never_require"]
|
| 167 |
+
max_invocations: int # Per-function invocation limit
|
| 168 |
+
max_invocation_exceptions: int # Per-function error limit
|
| 169 |
+
invocation_count: int # Tracks usage
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
**What DeepBoner Has:**
|
| 173 |
+
- `max_iterations` in Settings
|
| 174 |
+
- Basic tool execution
|
| 175 |
+
- No per-tool configuration
|
| 176 |
+
- No approval mode
|
| 177 |
+
|
| 178 |
+
**Gap Impact:**
|
| 179 |
+
- Cannot limit individual tool usage
|
| 180 |
+
- No human-in-the-loop for dangerous tools
|
| 181 |
+
- No per-tool error budgets
|
| 182 |
+
|
| 183 |
+
---
|
| 184 |
+
|
| 185 |
+
## Gap 5: Context Provider Lifecycle (LOW VALUE)
|
| 186 |
+
|
| 187 |
+
**What MS Framework Has:**
|
| 188 |
+
```python
|
| 189 |
+
# _memory.py
|
| 190 |
+
class ContextProvider(ABC):
|
| 191 |
+
"""Abstract pattern for injecting context into agent invocations."""
|
| 192 |
+
|
| 193 |
+
async def invoking(self, agent, thread) -> str | None:
|
| 194 |
+
"""Called before agent invocation. Returns context to inject."""
|
| 195 |
+
...
|
| 196 |
+
|
| 197 |
+
async def invoked(self, agent, thread, result):
|
| 198 |
+
"""Called after agent invocation."""
|
| 199 |
+
...
|
| 200 |
+
|
| 201 |
+
async def thread_created(self, thread):
|
| 202 |
+
"""Called when new thread is created."""
|
| 203 |
+
...
|
| 204 |
+
|
| 205 |
+
class AggregateContextProvider(ContextProvider):
|
| 206 |
+
"""Combines multiple context providers."""
|
| 207 |
+
...
|
| 208 |
+
```
|
| 209 |
+
|
| 210 |
+
**What DeepBoner Has:**
|
| 211 |
+
- `ResearchMemory` as simple state container
|
| 212 |
+
- No lifecycle hooks
|
| 213 |
+
- No provider aggregation
|
| 214 |
+
|
| 215 |
+
---
|
| 216 |
+
|
| 217 |
+
## Gap 6: Exception Granularity (LOW VALUE)
|
| 218 |
+
|
| 219 |
+
**What MS Framework Has:**
|
| 220 |
+
```
|
| 221 |
+
AgentFrameworkException (base)
|
| 222 |
+
βββ AgentException
|
| 223 |
+
β βββ AgentExecutionException
|
| 224 |
+
β βββ AgentInitializationError
|
| 225 |
+
β βββ AgentThreadException
|
| 226 |
+
βββ ChatClientException
|
| 227 |
+
β βββ ChatClientInitializationError
|
| 228 |
+
βββ ServiceException
|
| 229 |
+
β βββ ServiceInitializationError
|
| 230 |
+
β βββ ServiceResponseException
|
| 231 |
+
β β βββ ServiceContentFilterException
|
| 232 |
+
β β βββ ServiceInvalidExecutionSettingsError
|
| 233 |
+
β β βββ ServiceInvalidResponseError
|
| 234 |
+
β βββ ServiceInvalidAuthError
|
| 235 |
+
βββ ToolException
|
| 236 |
+
β βββ ToolExecutionException
|
| 237 |
+
βββ MiddlewareException
|
| 238 |
+
βββ ContentError
|
| 239 |
+
```
|
| 240 |
+
|
| 241 |
+
**What DeepBoner Has:**
|
| 242 |
+
```
|
| 243 |
+
DeepBonerError (base)
|
| 244 |
+
βββ SearchError
|
| 245 |
+
β βββ RateLimitError
|
| 246 |
+
βββ JudgeError
|
| 247 |
+
βββ ConfigurationError
|
| 248 |
+
βββ EmbeddingError
|
| 249 |
+
```
|
| 250 |
+
|
| 251 |
+
**Gap Impact:**
|
| 252 |
+
- Less precise error handling
|
| 253 |
+
- Harder to distinguish error sources
|
| 254 |
+
- Less informative error messages for users
|
| 255 |
+
|
| 256 |
+
---
|
| 257 |
+
|
| 258 |
+
## Prioritized Implementation Roadmap
|
| 259 |
+
|
| 260 |
+
### Phase 1: Quick Wins (1-2 days)
|
| 261 |
+
1. Add token tracking to orchestrator (no OTEL yet, just counters)
|
| 262 |
+
2. Add `max_consecutive_errors` to tool execution
|
| 263 |
+
|
| 264 |
+
### Phase 2: Medium Effort (3-5 days)
|
| 265 |
+
1. Add basic middleware pattern to orchestrator
|
| 266 |
+
2. Implement thread serialization for `ResearchMemory`
|
| 267 |
+
|
| 268 |
+
### Phase 3: Full Production (1-2 weeks)
|
| 269 |
+
1. Full OpenTelemetry integration
|
| 270 |
+
2. Complete middleware pipeline
|
| 271 |
+
3. Context provider lifecycle hooks
|
| 272 |
+
|
| 273 |
+
---
|
| 274 |
+
|
| 275 |
+
## Related Issues
|
| 276 |
+
|
| 277 |
+
- **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md`
|
| 278 |
+
- **MS Framework Reference:** `reference_repos/microsoft-agent-framework/`
|
| 279 |
+
|
| 280 |
+
---
|
| 281 |
+
|
| 282 |
+
## Notes
|
| 283 |
+
|
| 284 |
+
These gaps are P3 because:
|
| 285 |
+
1. DeepBoner is functional without them
|
| 286 |
+
2. They're architectural improvements, not bug fixes
|
| 287 |
+
3. User-facing functionality is not impacted
|
| 288 |
+
|
| 289 |
+
However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.
|
docs/specs/README.md
ADDED
|
@@ -0,0 +1,146 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# Tech Debt & Bug Fix Specs
|
| 2 |
+
|
| 3 |
+
**Status:** AWAITING SENIOR REVIEW
|
| 4 |
+
**Created:** 2025-12-06
|
| 5 |
+
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
## Overview
|
| 9 |
+
|
| 10 |
+
These specs consolidate all identified bugs, tech debt, and architectural issues into phased, implementable work packages. Each spec is designed to be a single PR with TDD, SOLID, DRY, Gang of Four principles.
|
| 11 |
+
|
| 12 |
+
**Implementation Order:** SPEC-20 β SPEC-21 β SPEC-22
|
| 13 |
+
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
## Spec Index
|
| 17 |
+
|
| 18 |
+
| Spec | Title | Priority | Effort | Status |
|
| 19 |
+
|------|-------|----------|--------|--------|
|
| 20 |
+
| [SPEC-20](./SPEC-20-PUBMED-JSON-FIX.md) | PubMed JSON Parsing Fix | P2 | 15 min | READY |
|
| 21 |
+
| [SPEC-21](./SPEC-21-MIDDLEWARE-ARCHITECTURE.md) | Middleware Architecture Refactor | P2 | 2 hours | READY |
|
| 22 |
+
| [SPEC-22](./SPEC-22-PROGRESS-BAR-REMOVAL.md) | Progress Bar Removal | P3 | 15 min | READY |
|
| 23 |
+
|
| 24 |
+
**Total Effort:** ~2.5 hours
|
| 25 |
+
|
| 26 |
+
---
|
| 27 |
+
|
| 28 |
+
## Why This Order?
|
| 29 |
+
|
| 30 |
+
### SPEC-20 First (15 min)
|
| 31 |
+
- Quickest win
|
| 32 |
+
- Fixes a real crash bug
|
| 33 |
+
- Builds confidence before larger refactor
|
| 34 |
+
- Single file, single PR
|
| 35 |
+
|
| 36 |
+
### SPEC-21 Second (2 hours)
|
| 37 |
+
- The big architectural fix
|
| 38 |
+
- Renames confusing folder
|
| 39 |
+
- Implements proper MS framework patterns
|
| 40 |
+
- Fixes HuggingFace retry bug THE RIGHT WAY
|
| 41 |
+
- Adds token tracking
|
| 42 |
+
|
| 43 |
+
### SPEC-22 Last (15 min)
|
| 44 |
+
- Cosmetic only
|
| 45 |
+
- Can be deferred if needed
|
| 46 |
+
- Easy cleanup
|
| 47 |
+
|
| 48 |
+
---
|
| 49 |
+
|
| 50 |
+
## What These Specs Consolidate
|
| 51 |
+
|
| 52 |
+
These specs replace the scattered documentation in:
|
| 53 |
+
|
| 54 |
+
| Old Location | Now Covered By |
|
| 55 |
+
|--------------|----------------|
|
| 56 |
+
| `docs/bugs/p2-hardening-issues.md` Issue 1 | SPEC-20 |
|
| 57 |
+
| `docs/bugs/p2-hardening-issues.md` Issue 2 | SPEC-21 |
|
| 58 |
+
| `docs/architecture/adr-001-middleware-refactor.md` | SPEC-21 |
|
| 59 |
+
| `docs/bugs/p3-progress-bar-positioning.md` | SPEC-22 |
|
| 60 |
+
|
| 61 |
+
---
|
| 62 |
+
|
| 63 |
+
## What's NOT In These Specs (Deferred P3)
|
| 64 |
+
|
| 65 |
+
The following are documented but deferred for later:
|
| 66 |
+
|
| 67 |
+
1. **OpenTelemetry observability** - Nice to have, not blocking
|
| 68 |
+
2. **Thread state serialization** - Nice to have, not blocking
|
| 69 |
+
3. **ResearchMemory locks** - Not a bug today (sequential execution)
|
| 70 |
+
4. **Error path cleanup** - Minor resource leakage, GC handles it
|
| 71 |
+
5. **Per-tool configuration** - Nice to have
|
| 72 |
+
6. **Context provider lifecycle** - Nice to have
|
| 73 |
+
|
| 74 |
+
These remain documented in `docs/bugs/p3-ms-framework-gaps.md` for future work.
|
| 75 |
+
|
| 76 |
+
---
|
| 77 |
+
|
| 78 |
+
## Implementation Protocol
|
| 79 |
+
|
| 80 |
+
For each spec:
|
| 81 |
+
|
| 82 |
+
1. **Read the spec** - Understand the problem and solution
|
| 83 |
+
2. **TDD** - Write failing test first
|
| 84 |
+
3. **Implement** - Minimal code to pass tests
|
| 85 |
+
4. **Run `make check`** - Lint + typecheck + test
|
| 86 |
+
5. **Commit** - Single commit per spec
|
| 87 |
+
6. **PR** - One PR per spec with spec number in title
|
| 88 |
+
|
| 89 |
+
---
|
| 90 |
+
|
| 91 |
+
## Commit Message Format
|
| 92 |
+
|
| 93 |
+
```
|
| 94 |
+
fix: PubMed JSON parsing (SPEC-20)
|
| 95 |
+
|
| 96 |
+
Moves JSON parsing inside try/except block to handle API
|
| 97 |
+
maintenance pages gracefully. Adds JSONDecodeError handling.
|
| 98 |
+
|
| 99 |
+
Fixes: production crash on PubMed maintenance pages
|
| 100 |
+
```
|
| 101 |
+
|
| 102 |
+
```
|
| 103 |
+
refactor: middleware architecture (SPEC-21)
|
| 104 |
+
|
| 105 |
+
- Renames src/middleware β src/workflows (accurate naming)
|
| 106 |
+
- Creates proper src/middleware with ChatMiddleware implementations
|
| 107 |
+
- Implements RetryMiddleware (fixes HuggingFace 429 crashes)
|
| 108 |
+
- Implements TokenTrackingMiddleware (enables cost monitoring)
|
| 109 |
+
```
|
| 110 |
+
|
| 111 |
+
```
|
| 112 |
+
fix: remove progress bar overlap (SPEC-22)
|
| 113 |
+
|
| 114 |
+
Removes gr.Progress() from research_agent function.
|
| 115 |
+
Gradio's Progress is incompatible with ChatInterface.
|
| 116 |
+
Emoji status messages in chat output are retained.
|
| 117 |
+
```
|
| 118 |
+
|
| 119 |
+
---
|
| 120 |
+
|
| 121 |
+
## Senior Review Checklist
|
| 122 |
+
|
| 123 |
+
Before implementation, please verify:
|
| 124 |
+
|
| 125 |
+
- [ ] SPEC-20: Fix approach is correct (move into try/except)
|
| 126 |
+
- [ ] SPEC-21: MS middleware pattern is used correctly
|
| 127 |
+
- [ ] SPEC-21: RetryMiddleware implementation follows framework conventions
|
| 128 |
+
- [ ] SPEC-21: Folder rename won't break anything else
|
| 129 |
+
- [ ] SPEC-22: Removing gr.Progress() is the right fix (vs CSS hack)
|
| 130 |
+
- [ ] Order of implementation makes sense
|
| 131 |
+
- [ ] Nothing critical is missing from these specs
|
| 132 |
+
|
| 133 |
+
---
|
| 134 |
+
|
| 135 |
+
## After Implementation
|
| 136 |
+
|
| 137 |
+
Once all specs are implemented:
|
| 138 |
+
|
| 139 |
+
1. Archive old docs:
|
| 140 |
+
- `docs/bugs/p2-hardening-issues.md` β Mark as RESOLVED
|
| 141 |
+
- `docs/architecture/adr-001-middleware-refactor.md` β Delete or archive
|
| 142 |
+
- `docs/bugs/p3-progress-bar-positioning.md` β Mark as RESOLVED
|
| 143 |
+
|
| 144 |
+
2. Update `docs/bugs/active-bugs.md` to reflect completed fixes
|
| 145 |
+
|
| 146 |
+
3. Consider v0.2.0 release with these fixes
|
docs/specs/SPEC-20-PUBMED-JSON-FIX.md
ADDED
|
@@ -0,0 +1,129 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SPEC-20: PubMed JSON Parsing Fix
|
| 2 |
+
|
| 3 |
+
**Status:** READY FOR IMPLEMENTATION
|
| 4 |
+
**Priority:** P2 (Critical - causes crashes)
|
| 5 |
+
**Effort:** 15 minutes
|
| 6 |
+
**PR Scope:** Single file fix
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Problem Statement
|
| 11 |
+
|
| 12 |
+
The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens **outside** the try/except block.
|
| 13 |
+
|
| 14 |
+
**File:** `src/tools/pubmed.py:88`
|
| 15 |
+
**Crash Type:** `json.JSONDecodeError`
|
| 16 |
+
**User Impact:** Entire research workflow crashes
|
| 17 |
+
|
| 18 |
+
---
|
| 19 |
+
|
| 20 |
+
## Current Code (BROKEN)
|
| 21 |
+
|
| 22 |
+
```python
|
| 23 |
+
# src/tools/pubmed.py - Lines ~80-95
|
| 24 |
+
try:
|
| 25 |
+
search_resp = await client.get(
|
| 26 |
+
f"{NCBI_BASE_URL}/esearch.fcgi",
|
| 27 |
+
params=search_params,
|
| 28 |
+
)
|
| 29 |
+
search_resp.raise_for_status()
|
| 30 |
+
except httpx.HTTPStatusError as e:
|
| 31 |
+
logger.warning("PubMed search failed", status=e.response.status_code)
|
| 32 |
+
return []
|
| 33 |
+
|
| 34 |
+
# βββ THIS IS OUTSIDE THE TRY BLOCK βββ
|
| 35 |
+
search_data = search_resp.json() # CRASHES HERE on maintenance pages
|
| 36 |
+
pmids = search_data.get("esearchresult", {}).get("idlist", [])
|
| 37 |
+
```
|
| 38 |
+
|
| 39 |
+
---
|
| 40 |
+
|
| 41 |
+
## Required Fix
|
| 42 |
+
|
| 43 |
+
Move JSON parsing inside try/except and add `JSONDecodeError` handling:
|
| 44 |
+
|
| 45 |
+
```python
|
| 46 |
+
# src/tools/pubmed.py - Fixed version
|
| 47 |
+
try:
|
| 48 |
+
search_resp = await client.get(
|
| 49 |
+
f"{NCBI_BASE_URL}/esearch.fcgi",
|
| 50 |
+
params=search_params,
|
| 51 |
+
)
|
| 52 |
+
search_resp.raise_for_status()
|
| 53 |
+
search_data = search_resp.json() # β MOVED INSIDE TRY
|
| 54 |
+
except httpx.HTTPStatusError as e:
|
| 55 |
+
logger.warning("PubMed search failed", status=e.response.status_code)
|
| 56 |
+
return []
|
| 57 |
+
except json.JSONDecodeError as e:
|
| 58 |
+
logger.warning(
|
| 59 |
+
"PubMed returned invalid JSON (possible maintenance page)",
|
| 60 |
+
error=str(e),
|
| 61 |
+
response_preview=search_resp.text[:200] if search_resp else "N/A",
|
| 62 |
+
)
|
| 63 |
+
return []
|
| 64 |
+
|
| 65 |
+
pmids = search_data.get("esearchresult", {}).get("idlist", [])
|
| 66 |
+
```
|
| 67 |
+
|
| 68 |
+
---
|
| 69 |
+
|
| 70 |
+
## Implementation Checklist
|
| 71 |
+
|
| 72 |
+
- [ ] Add `import json` at top of file (if not present)
|
| 73 |
+
- [ ] Move `search_resp.json()` inside try block (line ~88)
|
| 74 |
+
- [ ] Add `except json.JSONDecodeError` handler
|
| 75 |
+
- [ ] Log warning with response preview for debugging
|
| 76 |
+
- [ ] Return empty list (graceful degradation)
|
| 77 |
+
- [ ] Write unit test: mock response with HTML content
|
| 78 |
+
- [ ] Run `make check` (lint + typecheck + test)
|
| 79 |
+
|
| 80 |
+
---
|
| 81 |
+
|
| 82 |
+
## Test Case
|
| 83 |
+
|
| 84 |
+
```python
|
| 85 |
+
# tests/unit/tools/test_pubmed.py
|
| 86 |
+
import pytest
|
| 87 |
+
from unittest.mock import AsyncMock, MagicMock
|
| 88 |
+
from src.tools.pubmed import search_pubmed
|
| 89 |
+
|
| 90 |
+
@pytest.mark.asyncio
|
| 91 |
+
async def test_pubmed_handles_maintenance_page():
|
| 92 |
+
"""PubMed should gracefully handle non-JSON responses."""
|
| 93 |
+
# Mock httpx client returning HTML maintenance page
|
| 94 |
+
mock_response = MagicMock()
|
| 95 |
+
mock_response.status_code = 200
|
| 96 |
+
mock_response.text = "<html><body>Service Temporarily Unavailable</body></html>"
|
| 97 |
+
mock_response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0)
|
| 98 |
+
mock_response.raise_for_status = MagicMock()
|
| 99 |
+
|
| 100 |
+
mock_client = AsyncMock()
|
| 101 |
+
mock_client.get.return_value = mock_response
|
| 102 |
+
|
| 103 |
+
# Should return empty list, not crash
|
| 104 |
+
result = await search_pubmed("test query", client=mock_client)
|
| 105 |
+
assert result == []
|
| 106 |
+
```
|
| 107 |
+
|
| 108 |
+
---
|
| 109 |
+
|
| 110 |
+
## Acceptance Criteria
|
| 111 |
+
|
| 112 |
+
1. `search_pubmed()` returns `[]` when API returns HTML
|
| 113 |
+
2. Warning logged with response preview
|
| 114 |
+
3. No `JSONDecodeError` propagates to caller
|
| 115 |
+
4. All existing tests pass
|
| 116 |
+
5. `make check` passes
|
| 117 |
+
|
| 118 |
+
---
|
| 119 |
+
|
| 120 |
+
## Dependencies
|
| 121 |
+
|
| 122 |
+
None. This is a standalone fix.
|
| 123 |
+
|
| 124 |
+
---
|
| 125 |
+
|
| 126 |
+
## Notes
|
| 127 |
+
|
| 128 |
+
- This same pattern may exist in `clinicaltrials.py` and `europepmc.py` - check after this fix
|
| 129 |
+
- Do NOT over-engineer. Single fix, single PR.
|
docs/specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md
ADDED
|
@@ -0,0 +1,445 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SPEC-21: Middleware Architecture Refactor
|
| 2 |
+
|
| 3 |
+
**Status:** READY FOR IMPLEMENTATION
|
| 4 |
+
**Priority:** P2 (Architectural hygiene + fixes HuggingFace retry bug)
|
| 5 |
+
**Effort:** 2 hours
|
| 6 |
+
**PR Scope:** Folder rename + new middleware implementations
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Problem Statement
|
| 11 |
+
|
| 12 |
+
1. **Misleading folder name:** `src/middleware/` contains a workflow (`SubIterationMiddleware`), not interceptor middleware
|
| 13 |
+
2. **Missing retry logic:** `HuggingFaceChatClient` has no retry on 429/transient errors (P2 bug)
|
| 14 |
+
3. **No token tracking:** Cannot monitor API costs
|
| 15 |
+
4. **Not using MS framework patterns:** We use decorators but not `ChatMiddleware` base classes
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## Current State (WRONG)
|
| 20 |
+
|
| 21 |
+
```
|
| 22 |
+
src/
|
| 23 |
+
βββ middleware/ β MISLEADING: contains workflow
|
| 24 |
+
β βββ __init__.py
|
| 25 |
+
β βββ sub_iteration.py β This is a WORKFLOW, not middleware
|
| 26 |
+
β βββ .gitkeep
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
**HuggingFace client has no retry:**
|
| 30 |
+
```python
|
| 31 |
+
# src/clients/huggingface.py:263-265
|
| 32 |
+
except Exception as e:
|
| 33 |
+
logger.error("HuggingFace API error", error=str(e))
|
| 34 |
+
raise # β No retry, crashes on 429
|
| 35 |
+
```
|
| 36 |
+
|
| 37 |
+
---
|
| 38 |
+
|
| 39 |
+
## Target State (CORRECT)
|
| 40 |
+
|
| 41 |
+
```
|
| 42 |
+
src/
|
| 43 |
+
βββ workflows/ β RENAMED: now accurate
|
| 44 |
+
β βββ __init__.py
|
| 45 |
+
β βββ sub_iteration.py
|
| 46 |
+
β
|
| 47 |
+
βββ middleware/ β NEW: actual MS-pattern middleware
|
| 48 |
+
β βββ __init__.py
|
| 49 |
+
β βββ retry.py β RetryMiddleware(ChatMiddleware)
|
| 50 |
+
β βββ token_tracking.py β TokenTrackingMiddleware(ChatMiddleware)
|
| 51 |
+
```
|
| 52 |
+
|
| 53 |
+
---
|
| 54 |
+
|
| 55 |
+
## Implementation Steps
|
| 56 |
+
|
| 57 |
+
### Step 1: Rename Folder (5 min)
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
# Rename middleware β workflows
|
| 61 |
+
git mv src/middleware src/workflows
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
### Step 2: Update Import (5 min)
|
| 65 |
+
|
| 66 |
+
```python
|
| 67 |
+
# src/orchestrators/hierarchical.py - Line ~15
|
| 68 |
+
# BEFORE:
|
| 69 |
+
from src.middleware.sub_iteration import SubIterationMiddleware, SubIterationTeam
|
| 70 |
+
|
| 71 |
+
# AFTER:
|
| 72 |
+
from src.workflows.sub_iteration import SubIterationMiddleware, SubIterationTeam
|
| 73 |
+
```
|
| 74 |
+
|
| 75 |
+
### Step 3: Create New Middleware Package (10 min)
|
| 76 |
+
|
| 77 |
+
```python
|
| 78 |
+
# src/middleware/__init__.py
|
| 79 |
+
"""Microsoft Agent Framework middleware implementations.
|
| 80 |
+
|
| 81 |
+
These are interceptor-pattern middleware that wrap chat client calls.
|
| 82 |
+
They are NOT workflows - see src/workflows/ for orchestration patterns.
|
| 83 |
+
"""
|
| 84 |
+
|
| 85 |
+
from src.middleware.retry import RetryMiddleware
|
| 86 |
+
from src.middleware.token_tracking import TokenTrackingMiddleware
|
| 87 |
+
|
| 88 |
+
__all__ = ["RetryMiddleware", "TokenTrackingMiddleware"]
|
| 89 |
+
```
|
| 90 |
+
|
| 91 |
+
### Step 4: Implement RetryMiddleware (30 min)
|
| 92 |
+
|
| 93 |
+
```python
|
| 94 |
+
# src/middleware/retry.py
|
| 95 |
+
"""Retry middleware for chat clients with exponential backoff."""
|
| 96 |
+
|
| 97 |
+
import asyncio
|
| 98 |
+
from typing import Any
|
| 99 |
+
|
| 100 |
+
import structlog
|
| 101 |
+
from agent_framework._middleware import ChatContext, ChatMiddleware
|
| 102 |
+
|
| 103 |
+
logger = structlog.get_logger()
|
| 104 |
+
|
| 105 |
+
|
| 106 |
+
class RetryMiddleware(ChatMiddleware):
|
| 107 |
+
"""Retries failed chat requests with exponential backoff.
|
| 108 |
+
|
| 109 |
+
This middleware intercepts chat client calls and retries on transient
|
| 110 |
+
errors (rate limits, timeouts, server errors).
|
| 111 |
+
|
| 112 |
+
Attributes:
|
| 113 |
+
max_attempts: Maximum number of attempts (default: 3)
|
| 114 |
+
min_wait: Minimum wait between retries in seconds (default: 1.0)
|
| 115 |
+
max_wait: Maximum wait between retries in seconds (default: 10.0)
|
| 116 |
+
retryable_status_codes: HTTP status codes to retry (default: 429, 500, 502, 503, 504)
|
| 117 |
+
"""
|
| 118 |
+
|
| 119 |
+
def __init__(
|
| 120 |
+
self,
|
| 121 |
+
max_attempts: int = 3,
|
| 122 |
+
min_wait: float = 1.0,
|
| 123 |
+
max_wait: float = 10.0,
|
| 124 |
+
retryable_status_codes: tuple[int, ...] = (429, 500, 502, 503, 504),
|
| 125 |
+
) -> None:
|
| 126 |
+
self.max_attempts = max_attempts
|
| 127 |
+
self.min_wait = min_wait
|
| 128 |
+
self.max_wait = max_wait
|
| 129 |
+
self.retryable_status_codes = retryable_status_codes
|
| 130 |
+
|
| 131 |
+
def _is_retryable(self, error: Exception) -> bool:
|
| 132 |
+
"""Check if error is retryable."""
|
| 133 |
+
# Check for httpx status errors
|
| 134 |
+
if hasattr(error, "response") and hasattr(error.response, "status_code"):
|
| 135 |
+
return error.response.status_code in self.retryable_status_codes
|
| 136 |
+
|
| 137 |
+
# Check for timeout errors
|
| 138 |
+
error_name = type(error).__name__.lower()
|
| 139 |
+
if "timeout" in error_name:
|
| 140 |
+
return True
|
| 141 |
+
|
| 142 |
+
# Check for connection errors
|
| 143 |
+
if "connection" in error_name:
|
| 144 |
+
return True
|
| 145 |
+
|
| 146 |
+
return False
|
| 147 |
+
|
| 148 |
+
def _calculate_wait(self, attempt: int) -> float:
|
| 149 |
+
"""Calculate wait time with exponential backoff."""
|
| 150 |
+
wait = self.min_wait * (2 ** attempt)
|
| 151 |
+
return min(wait, self.max_wait)
|
| 152 |
+
|
| 153 |
+
async def process(self, context: ChatContext, next: Any) -> None:
|
| 154 |
+
"""Process the chat request with retry logic."""
|
| 155 |
+
last_error: Exception | None = None
|
| 156 |
+
|
| 157 |
+
for attempt in range(self.max_attempts):
|
| 158 |
+
try:
|
| 159 |
+
await next(context)
|
| 160 |
+
return # Success - exit retry loop
|
| 161 |
+
|
| 162 |
+
except Exception as e:
|
| 163 |
+
last_error = e
|
| 164 |
+
|
| 165 |
+
if not self._is_retryable(e):
|
| 166 |
+
logger.warning(
|
| 167 |
+
"Non-retryable error",
|
| 168 |
+
error=str(e),
|
| 169 |
+
error_type=type(e).__name__,
|
| 170 |
+
)
|
| 171 |
+
raise # Don't retry non-retryable errors
|
| 172 |
+
|
| 173 |
+
if attempt < self.max_attempts - 1:
|
| 174 |
+
wait_time = self._calculate_wait(attempt)
|
| 175 |
+
logger.info(
|
| 176 |
+
"Retrying after error",
|
| 177 |
+
attempt=attempt + 1,
|
| 178 |
+
max_attempts=self.max_attempts,
|
| 179 |
+
wait_seconds=wait_time,
|
| 180 |
+
error=str(e),
|
| 181 |
+
)
|
| 182 |
+
await asyncio.sleep(wait_time)
|
| 183 |
+
|
| 184 |
+
# All retries exhausted
|
| 185 |
+
logger.error(
|
| 186 |
+
"All retry attempts failed",
|
| 187 |
+
max_attempts=self.max_attempts,
|
| 188 |
+
last_error=str(last_error),
|
| 189 |
+
)
|
| 190 |
+
if last_error:
|
| 191 |
+
raise last_error
|
| 192 |
+
```
|
| 193 |
+
|
| 194 |
+
### Step 5: Implement TokenTrackingMiddleware (20 min)
|
| 195 |
+
|
| 196 |
+
```python
|
| 197 |
+
# src/middleware/token_tracking.py
|
| 198 |
+
"""Token tracking middleware for monitoring API usage."""
|
| 199 |
+
|
| 200 |
+
from contextvars import ContextVar
|
| 201 |
+
from typing import Any
|
| 202 |
+
|
| 203 |
+
import structlog
|
| 204 |
+
from agent_framework._middleware import ChatContext, ChatMiddleware
|
| 205 |
+
|
| 206 |
+
logger = structlog.get_logger()
|
| 207 |
+
|
| 208 |
+
# ContextVar for per-request token tracking
|
| 209 |
+
_request_tokens: ContextVar[dict[str, int]] = ContextVar(
|
| 210 |
+
"request_tokens",
|
| 211 |
+
default={"input": 0, "output": 0},
|
| 212 |
+
)
|
| 213 |
+
|
| 214 |
+
|
| 215 |
+
class TokenTrackingMiddleware(ChatMiddleware):
|
| 216 |
+
"""Tracks token usage across chat requests.
|
| 217 |
+
|
| 218 |
+
This middleware logs token usage after each chat completion
|
| 219 |
+
and maintains running totals for the session.
|
| 220 |
+
|
| 221 |
+
Usage metrics are logged via structlog for observability.
|
| 222 |
+
"""
|
| 223 |
+
|
| 224 |
+
def __init__(self) -> None:
|
| 225 |
+
self.total_input_tokens = 0
|
| 226 |
+
self.total_output_tokens = 0
|
| 227 |
+
self.request_count = 0
|
| 228 |
+
|
| 229 |
+
async def process(self, context: ChatContext, next: Any) -> None:
|
| 230 |
+
"""Process request and track token usage."""
|
| 231 |
+
await next(context)
|
| 232 |
+
|
| 233 |
+
# Extract usage from response if available
|
| 234 |
+
if context.result is None:
|
| 235 |
+
return
|
| 236 |
+
|
| 237 |
+
usage = None
|
| 238 |
+
|
| 239 |
+
# Try to get usage from response
|
| 240 |
+
if hasattr(context.result, "usage"):
|
| 241 |
+
usage = context.result.usage
|
| 242 |
+
elif hasattr(context.result, "messages") and context.result.messages:
|
| 243 |
+
# Check first message for usage metadata
|
| 244 |
+
msg = context.result.messages[0]
|
| 245 |
+
if hasattr(msg, "metadata") and msg.metadata:
|
| 246 |
+
usage = msg.metadata.get("usage")
|
| 247 |
+
|
| 248 |
+
if usage:
|
| 249 |
+
input_tokens = usage.get("input_tokens", 0) or usage.get("prompt_tokens", 0)
|
| 250 |
+
output_tokens = usage.get("output_tokens", 0) or usage.get("completion_tokens", 0)
|
| 251 |
+
|
| 252 |
+
self.total_input_tokens += input_tokens
|
| 253 |
+
self.total_output_tokens += output_tokens
|
| 254 |
+
self.request_count += 1
|
| 255 |
+
|
| 256 |
+
logger.info(
|
| 257 |
+
"Token usage",
|
| 258 |
+
request_input=input_tokens,
|
| 259 |
+
request_output=output_tokens,
|
| 260 |
+
total_input=self.total_input_tokens,
|
| 261 |
+
total_output=self.total_output_tokens,
|
| 262 |
+
total_requests=self.request_count,
|
| 263 |
+
)
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
def get_token_stats() -> dict[str, int]:
|
| 267 |
+
"""Get current request's token usage."""
|
| 268 |
+
return _request_tokens.get().copy()
|
| 269 |
+
```
|
| 270 |
+
|
| 271 |
+
### Step 6: Apply Middleware to HuggingFaceChatClient (15 min)
|
| 272 |
+
|
| 273 |
+
```python
|
| 274 |
+
# src/clients/huggingface.py - Update __init__
|
| 275 |
+
|
| 276 |
+
from src.middleware.retry import RetryMiddleware
|
| 277 |
+
from src.middleware.token_tracking import TokenTrackingMiddleware
|
| 278 |
+
|
| 279 |
+
@use_function_invocation
|
| 280 |
+
@use_observability
|
| 281 |
+
@use_chat_middleware
|
| 282 |
+
class HuggingFaceChatClient(BaseChatClient):
|
| 283 |
+
def __init__(
|
| 284 |
+
self,
|
| 285 |
+
model_id: str | None = None,
|
| 286 |
+
api_key: str | None = None,
|
| 287 |
+
**kwargs: Any,
|
| 288 |
+
) -> None:
|
| 289 |
+
# Create middleware instances
|
| 290 |
+
middleware = [
|
| 291 |
+
RetryMiddleware(max_attempts=3, min_wait=1.0, max_wait=10.0),
|
| 292 |
+
TokenTrackingMiddleware(),
|
| 293 |
+
]
|
| 294 |
+
|
| 295 |
+
super().__init__(middleware=middleware, **kwargs)
|
| 296 |
+
# ... rest of __init__
|
| 297 |
+
```
|
| 298 |
+
|
| 299 |
+
### Step 7: Update Tests (20 min)
|
| 300 |
+
|
| 301 |
+
```python
|
| 302 |
+
# tests/unit/middleware/test_retry.py
|
| 303 |
+
import pytest
|
| 304 |
+
from unittest.mock import AsyncMock, MagicMock
|
| 305 |
+
from src.middleware.retry import RetryMiddleware
|
| 306 |
+
|
| 307 |
+
|
| 308 |
+
@pytest.mark.asyncio
|
| 309 |
+
async def test_retry_middleware_succeeds_first_try():
|
| 310 |
+
"""RetryMiddleware should pass through on success."""
|
| 311 |
+
middleware = RetryMiddleware(max_attempts=3)
|
| 312 |
+
context = MagicMock()
|
| 313 |
+
next_fn = AsyncMock()
|
| 314 |
+
|
| 315 |
+
await middleware.process(context, next_fn)
|
| 316 |
+
|
| 317 |
+
next_fn.assert_called_once_with(context)
|
| 318 |
+
|
| 319 |
+
|
| 320 |
+
@pytest.mark.asyncio
|
| 321 |
+
async def test_retry_middleware_retries_on_429():
|
| 322 |
+
"""RetryMiddleware should retry on 429 rate limit."""
|
| 323 |
+
middleware = RetryMiddleware(max_attempts=3, min_wait=0.01)
|
| 324 |
+
context = MagicMock()
|
| 325 |
+
|
| 326 |
+
# First two calls fail with 429, third succeeds
|
| 327 |
+
call_count = 0
|
| 328 |
+
|
| 329 |
+
async def mock_next(ctx):
|
| 330 |
+
nonlocal call_count
|
| 331 |
+
call_count += 1
|
| 332 |
+
if call_count < 3:
|
| 333 |
+
error = Exception("Rate limited")
|
| 334 |
+
error.response = MagicMock(status_code=429)
|
| 335 |
+
raise error
|
| 336 |
+
|
| 337 |
+
await middleware.process(context, mock_next)
|
| 338 |
+
assert call_count == 3
|
| 339 |
+
|
| 340 |
+
|
| 341 |
+
@pytest.mark.asyncio
|
| 342 |
+
async def test_retry_middleware_raises_after_max_attempts():
|
| 343 |
+
"""RetryMiddleware should raise after max attempts exhausted."""
|
| 344 |
+
middleware = RetryMiddleware(max_attempts=2, min_wait=0.01)
|
| 345 |
+
context = MagicMock()
|
| 346 |
+
|
| 347 |
+
async def always_fails(ctx):
|
| 348 |
+
error = Exception("Always fails")
|
| 349 |
+
error.response = MagicMock(status_code=500)
|
| 350 |
+
raise error
|
| 351 |
+
|
| 352 |
+
with pytest.raises(Exception, match="Always fails"):
|
| 353 |
+
await middleware.process(context, always_fails)
|
| 354 |
+
```
|
| 355 |
+
|
| 356 |
+
---
|
| 357 |
+
|
| 358 |
+
## Implementation Checklist
|
| 359 |
+
|
| 360 |
+
### Phase 1: Folder Rename
|
| 361 |
+
- [ ] `git mv src/middleware src/workflows`
|
| 362 |
+
- [ ] Update import in `src/orchestrators/hierarchical.py`
|
| 363 |
+
- [ ] Update `src/workflows/__init__.py` docstring
|
| 364 |
+
- [ ] Run `make check` - verify no import errors
|
| 365 |
+
|
| 366 |
+
### Phase 2: Create Middleware Package
|
| 367 |
+
- [ ] Create `src/middleware/__init__.py`
|
| 368 |
+
- [ ] Create `src/middleware/retry.py` with `RetryMiddleware`
|
| 369 |
+
- [ ] Create `src/middleware/token_tracking.py` with `TokenTrackingMiddleware`
|
| 370 |
+
- [ ] Run `make check` - verify no syntax errors
|
| 371 |
+
|
| 372 |
+
### Phase 3: Apply to Client
|
| 373 |
+
- [ ] Update `src/clients/huggingface.py` to use middleware
|
| 374 |
+
- [ ] Test manually: `uv run python -c "from src.clients.huggingface import HuggingFaceChatClient; print('OK')"`
|
| 375 |
+
- [ ] Run `make check`
|
| 376 |
+
|
| 377 |
+
### Phase 4: Tests
|
| 378 |
+
- [ ] Create `tests/unit/middleware/__init__.py`
|
| 379 |
+
- [ ] Create `tests/unit/middleware/test_retry.py`
|
| 380 |
+
- [ ] Create `tests/unit/middleware/test_token_tracking.py`
|
| 381 |
+
- [ ] Run `make test` - all tests pass
|
| 382 |
+
|
| 383 |
+
### Phase 5: Cleanup
|
| 384 |
+
- [ ] Remove `.gitkeep` from `src/workflows/` if present
|
| 385 |
+
- [ ] Run full `make check`
|
| 386 |
+
- [ ] Commit with message: "refactor: implement proper middleware architecture (SPEC-21)"
|
| 387 |
+
|
| 388 |
+
---
|
| 389 |
+
|
| 390 |
+
## Acceptance Criteria
|
| 391 |
+
|
| 392 |
+
1. `src/middleware/` folder contains actual `ChatMiddleware` implementations
|
| 393 |
+
2. `src/workflows/` folder contains `SubIterationMiddleware` (renamed from middleware)
|
| 394 |
+
3. `HuggingFaceChatClient` uses `RetryMiddleware` - no more crashes on 429
|
| 395 |
+
4. Token usage is logged via `TokenTrackingMiddleware`
|
| 396 |
+
5. All existing tests pass
|
| 397 |
+
6. `make check` passes
|
| 398 |
+
7. No import errors anywhere in codebase
|
| 399 |
+
|
| 400 |
+
---
|
| 401 |
+
|
| 402 |
+
## Dependencies
|
| 403 |
+
|
| 404 |
+
- **SPEC-20** should be done first (simpler, builds confidence)
|
| 405 |
+
- Requires `agent-framework-core` package (already installed)
|
| 406 |
+
|
| 407 |
+
---
|
| 408 |
+
|
| 409 |
+
## Gotchas & Nuances
|
| 410 |
+
|
| 411 |
+
1. **MS middleware signature:** The `process` method takes `(context, next)` where `next` is a callable
|
| 412 |
+
2. **Middleware order matters:** Retry should be FIRST so it wraps everything
|
| 413 |
+
3. **ContextVar for token tracking:** Use ContextVar for per-request isolation
|
| 414 |
+
4. **Don't break HierarchicalOrchestrator:** It uses `SubIterationMiddleware` - update import path
|
| 415 |
+
5. **BaseChatClient constructor:** Check if it accepts `middleware=` parameter - may need to register differently
|
| 416 |
+
|
| 417 |
+
---
|
| 418 |
+
|
| 419 |
+
## Testing the Fix
|
| 420 |
+
|
| 421 |
+
After implementation, verify 429 handling:
|
| 422 |
+
|
| 423 |
+
```python
|
| 424 |
+
# Manual test
|
| 425 |
+
import asyncio
|
| 426 |
+
from src.clients.huggingface import HuggingFaceChatClient
|
| 427 |
+
from agent_framework import ChatMessage, ChatOptions
|
| 428 |
+
|
| 429 |
+
async def test():
|
| 430 |
+
client = HuggingFaceChatClient()
|
| 431 |
+
# Make rapid requests to trigger rate limit
|
| 432 |
+
for i in range(10):
|
| 433 |
+
try:
|
| 434 |
+
resp = await client.get_response(
|
| 435 |
+
messages=[ChatMessage(role="user", text="Hello")],
|
| 436 |
+
chat_options=ChatOptions(),
|
| 437 |
+
)
|
| 438 |
+
print(f"Request {i}: OK")
|
| 439 |
+
except Exception as e:
|
| 440 |
+
print(f"Request {i}: {e}")
|
| 441 |
+
|
| 442 |
+
asyncio.run(test())
|
| 443 |
+
```
|
| 444 |
+
|
| 445 |
+
Should see retry logs instead of immediate crashes.
|
docs/specs/SPEC-22-PROGRESS-BAR-REMOVAL.md
ADDED
|
@@ -0,0 +1,127 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# SPEC-22: Progress Bar Removal
|
| 2 |
+
|
| 3 |
+
**Status:** READY FOR IMPLEMENTATION
|
| 4 |
+
**Priority:** P3 (Cosmetic UX fix)
|
| 5 |
+
**Effort:** 15 minutes
|
| 6 |
+
**PR Scope:** Single file fix
|
| 7 |
+
|
| 8 |
+
---
|
| 9 |
+
|
| 10 |
+
## Problem Statement
|
| 11 |
+
|
| 12 |
+
The `gr.Progress()` bar conflicts with Gradio's `ChatInterface`, causing visual glitches:
|
| 13 |
+
- Progress bar "floats" in the middle of chat output
|
| 14 |
+
- Text overlaps with progress bar
|
| 15 |
+
- Looks unprofessional
|
| 16 |
+
|
| 17 |
+
**Root Cause:** `gr.Progress()` is designed for `gr.Interface`, not `ChatInterface`. It's a known Gradio limitation.
|
| 18 |
+
|
| 19 |
+
---
|
| 20 |
+
|
| 21 |
+
## Current Code (BROKEN)
|
| 22 |
+
|
| 23 |
+
```python
|
| 24 |
+
# src/app.py - research_agent function
|
| 25 |
+
async def research_agent(
|
| 26 |
+
message: str,
|
| 27 |
+
history: list[dict[str, Any]],
|
| 28 |
+
domain: str = "sexual_health",
|
| 29 |
+
api_key: str = "",
|
| 30 |
+
api_key_state: str = "",
|
| 31 |
+
progress: gr.Progress = gr.Progress(), # β PROBLEM: Causes visual glitches
|
| 32 |
+
) -> AsyncGenerator[str, None]:
|
| 33 |
+
...
|
| 34 |
+
if event.type == "started":
|
| 35 |
+
progress(0, desc="Starting research...") # β These cause overlap
|
| 36 |
+
elif event.type == "progress":
|
| 37 |
+
progress(p, desc=event.message)
|
| 38 |
+
```
|
| 39 |
+
|
| 40 |
+
---
|
| 41 |
+
|
| 42 |
+
## Required Fix
|
| 43 |
+
|
| 44 |
+
Remove `gr.Progress()` entirely. We already have emoji status messages in chat output.
|
| 45 |
+
|
| 46 |
+
```python
|
| 47 |
+
# src/app.py - Fixed version
|
| 48 |
+
async def research_agent(
|
| 49 |
+
message: str,
|
| 50 |
+
history: list[dict[str, Any]],
|
| 51 |
+
domain: str = "sexual_health",
|
| 52 |
+
api_key: str = "",
|
| 53 |
+
api_key_state: str = "",
|
| 54 |
+
# REMOVED: progress: gr.Progress = gr.Progress(),
|
| 55 |
+
) -> AsyncGenerator[str, None]:
|
| 56 |
+
...
|
| 57 |
+
# REMOVED: All progress(...) calls
|
| 58 |
+
|
| 59 |
+
# KEEP: Emoji status messages are already being yielded
|
| 60 |
+
# These work great with ChatInterface:
|
| 61 |
+
# yield "β±οΈ **PROGRESS**: Round 1/5 (~3m 0s remaining)"
|
| 62 |
+
```
|
| 63 |
+
|
| 64 |
+
---
|
| 65 |
+
|
| 66 |
+
## Implementation Checklist
|
| 67 |
+
|
| 68 |
+
- [ ] Open `src/app.py`
|
| 69 |
+
- [ ] Remove `progress: gr.Progress = gr.Progress()` from `research_agent` signature
|
| 70 |
+
- [ ] Remove all `progress(...)` calls inside `research_agent`
|
| 71 |
+
- [ ] Verify emoji status yields are still present (they should be)
|
| 72 |
+
- [ ] Run `uv run python -c "from src.app import create_demo; print('OK')"`
|
| 73 |
+
- [ ] Run `make check`
|
| 74 |
+
- [ ] Test locally: `uv run python src/app.py` and verify no floating progress bar
|
| 75 |
+
|
| 76 |
+
---
|
| 77 |
+
|
| 78 |
+
## What We Keep
|
| 79 |
+
|
| 80 |
+
The emoji status messages in chat output:
|
| 81 |
+
|
| 82 |
+
```
|
| 83 |
+
β±οΈ **PROGRESS**: Round 1/5 (~3m 0s remaining)
|
| 84 |
+
π¬ **Step 2: SearchAgent** - Searching for evidence...
|
| 85 |
+
β
**COMPLETE**: Research finished in 45 seconds
|
| 86 |
+
```
|
| 87 |
+
|
| 88 |
+
These are yielded directly to chat and work perfectly with `ChatInterface`.
|
| 89 |
+
|
| 90 |
+
---
|
| 91 |
+
|
| 92 |
+
## Acceptance Criteria
|
| 93 |
+
|
| 94 |
+
1. No `gr.Progress()` in `research_agent` function signature
|
| 95 |
+
2. No `progress(...)` calls in `research_agent` function body
|
| 96 |
+
3. Emoji status messages still appear in chat output
|
| 97 |
+
4. No floating/overlapping progress bar in UI
|
| 98 |
+
5. `make check` passes
|
| 99 |
+
|
| 100 |
+
---
|
| 101 |
+
|
| 102 |
+
## Dependencies
|
| 103 |
+
|
| 104 |
+
None. This is a standalone cosmetic fix.
|
| 105 |
+
|
| 106 |
+
---
|
| 107 |
+
|
| 108 |
+
## Testing
|
| 109 |
+
|
| 110 |
+
```bash
|
| 111 |
+
# Start local server
|
| 112 |
+
uv run python src/app.py
|
| 113 |
+
|
| 114 |
+
# In browser:
|
| 115 |
+
# 1. Submit a research query
|
| 116 |
+
# 2. Verify NO floating progress bar appears
|
| 117 |
+
# 3. Verify emoji status messages DO appear in chat
|
| 118 |
+
# 4. Verify chat messages don't have visual glitches
|
| 119 |
+
```
|
| 120 |
+
|
| 121 |
+
---
|
| 122 |
+
|
| 123 |
+
## Notes
|
| 124 |
+
|
| 125 |
+
- This is the recommended fix from Gradio's own documentation
|
| 126 |
+
- `ChatInterface.show_progress="minimal"` (default) still shows a spinner, which is fine
|
| 127 |
+
- If we need a real progress bar later, we'd need to refactor to `gr.Blocks` wrapper
|