Spaces: VibecoderMcSwaggins/DeepBoner

VibecoderMcSwaggins committed on Dec 7, 2025

Commit 2ac49c3 (1 parent: e6f0fda)

docs: Add tech debt specs and architecture documentation

Adds comprehensive documentation for phased bug fixes:
- SPEC-20: PubMed JSON Parsing Fix (P2)
- SPEC-21: Middleware Architecture Refactor (P2)
- SPEC-22: Progress Bar Removal (P3)

Also adds:
- docs/specs/README.md - Master index with implementation order
- docs/bugs/p2-hardening-issues.md - P2 issue tracker
- docs/bugs/p3-ms-framework-gaps.md - MS framework gap analysis
- docs/architecture/adr-001-middleware-refactor.md - ADR for middleware
- docs/architecture/ms-framework-usage-report.md - What we use from MS

Files changed (8)

docs/architecture/adr-001-middleware-refactor.md +54 -0
docs/architecture/ms-framework-usage-report.md +68 -0
docs/bugs/p2-hardening-issues.md +45 -0
docs/bugs/p3-ms-framework-gaps.md +289 -0
docs/specs/README.md +146 -0
docs/specs/SPEC-20-PUBMED-JSON-FIX.md +129 -0
docs/specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md +445 -0
docs/specs/SPEC-22-PROGRESS-BAR-REMOVAL.md +127 -0

docs/architecture/adr-001-middleware-refactor.md ADDED

	@@ -0,0 +1,54 @@

+# ADR-001: Middleware Architecture Refactor
+**Status:** ACCEPTED
+**Date:** 2025-12-06
+**Decision Makers:** Development Team
+---
+## Context
+The current `src/middleware/` folder is misleadingly named. It contains `SubIterationMiddleware`, which implements a workflow pattern (team→judge loop), not the interceptor middleware pattern used by Microsoft Agent Framework.
+Additionally, we're missing proper middleware implementations for:
+- Retry logic on transient errors (429, 500, 502, 503, 504)
+- Token usage tracking for cost monitoring
+---
+## Decision
+1. **Rename `src/middleware/` to `src/workflows/`** to accurately reflect what it contains
+2. **Create new `src/middleware/` with proper MS-pattern middleware:**
+   - `RetryMiddleware(ChatMiddleware)` - exponential backoff retry
+   - `TokenTrackingMiddleware(ChatMiddleware)` - token usage logging
+---
+## Consequences
+### Positive
+- Clearer codebase organization
+- Proper use of MS Agent Framework patterns
+- HuggingFace 429 crashes will be handled gracefully
+- Token usage will be visible for cost monitoring
+### Negative
+- Requires updating imports in `src/orchestrators/hierarchical.py`
+- One-time migration effort
+### Neutral
+- Aligns with Microsoft Agent Framework conventions
+---
+## Implementation
+See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md) for detailed implementation steps.
+---
+## References
+- Microsoft Agent Framework `_middleware.py`
+- [P2 Hardening Issues](../bugs/p2-hardening-issues.md) Issue 2 & 3

docs/architecture/ms-framework-usage-report.md ADDED

	@@ -0,0 +1,68 @@

+# Microsoft Agent Framework Usage Report
+**Date:** 2025-12-06
+**Framework Version:** agent-framework-core==1.0.0b251204
+---
+## What We Use From MS Framework (pip installed)
+### Core Classes
+- `BaseChatClient` - Base class for chat clients
+- `ChatMessage`, `ChatRole` - Message types
+- `ChatOptions` - Request configuration
+- `ChatAgent` - Agent base class
+### Decorators (Applied to HuggingFaceChatClient)
+- `@use_function_invocation` - Enables tool/function calling
+- `@use_observability` - Adds OTEL tracing hooks
+- `@use_chat_middleware` - Enables middleware pipeline
+### Middleware Base Classes (Available but NOT yet used)
+- `ChatMiddleware` - Intercepts chat client requests
+- `AgentMiddleware` - Intercepts agent invocations
+- `FunctionMiddleware` - Intercepts tool calls
+---
+## What We Hand-Roll (Custom Implementation)
+### Orchestration
+- `AdvancedOrchestrator` - Main research workflow
+- `HierarchicalOrchestrator` - Team-based orchestration
+- `SubIterationMiddleware` - Team→judge loop (workflow, not middleware)
+### Clients
+- `HuggingFaceChatClient` - Adapter for HuggingFace Inference API
+- Client factory with auto-detection
+### Tools
+- `PubMedTool`, `ClinicalTrialsTool`, `EuropePMCTool`
+- `SearchHandler` - Scatter-gather orchestration
+### Services
+- `EmbeddingService` - Local sentence-transformers
+- `LlamaIndexRAG` - Premium OpenAI embeddings
+- `ResearchMemory` - Research state management
+---
+## Gap Analysis
+| Component | MS Framework Has | DeepBoner Has | Status |
+|-----------|------------------|---------------|--------|
+| Chat Middleware | `ChatMiddleware` base | Uses decorator only | SPEC-21 |
+| Retry Logic | N/A (left to user) | None | SPEC-21 |
+| Token Tracking | OTEL histograms | None | SPEC-21 |
+| Thread State | `AgentThread` serialization | `ResearchMemory` (no serialization) | P3 |
+| Observability | Full OTEL | structlog only | P3 |
+---
+## Recommendations
+1. **Implement `RetryMiddleware`** using MS `ChatMiddleware` base class
+2. **Implement `TokenTrackingMiddleware`** for cost visibility
+3. **Rename `src/middleware/`** to avoid confusion with MS patterns
+See [SPEC-21](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md) for implementation details.

docs/bugs/p2-hardening-issues.md ADDED

	@@ -0,0 +1,45 @@

+# P2: Hardening Issues
+**Date:** 2025-12-06
+**Priority:** P2 (Should fix soon)
+---
+## Issue 1: PubMed JSON Parsing Crash
+**Status:** SPEC-20 CREATED
+**File:** `src/tools/pubmed.py:88`
+The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens outside the try/except block.
+**Resolution:** See [SPEC-20: PubMed JSON Parsing Fix](../specs/SPEC-20-PUBMED-JSON-FIX.md)
+---
+## Issue 2: HuggingFace Client Missing Retry Logic
+**Status:** SPEC-21 CREATED
+**File:** `src/clients/huggingface.py`
+The HuggingFaceChatClient has no retry logic for transient errors (429 rate limits, 500 server errors). When the API returns a 429, the entire research workflow crashes.
+**Resolution:** See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
+---
+## Issue 3: Misleading Middleware Folder Name
+**Status:** SPEC-21 CREATED
+**File:** `src/middleware/`
+The `src/middleware/` folder contains `SubIterationMiddleware`, which is actually a workflow pattern (team→judge loop), not interceptor middleware. This is confusing and misleading.
+**Resolution:** See [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
+---
+## Related Documentation
+- [SPEC-20: PubMed JSON Parsing Fix](../specs/SPEC-20-PUBMED-JSON-FIX.md)
+- [SPEC-21: Middleware Architecture Refactor](../specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md)
+- [P3: MS Framework Gaps](./p3-ms-framework-gaps.md)

docs/bugs/p3-ms-framework-gaps.md ADDED

	@@ -0,0 +1,289 @@

+# P3: Microsoft Agent Framework Gaps Analysis
+**Date:** 2025-12-06
+**Priority:** P3 (Nice-to-Have)
+**Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)
+## Executive Summary
+Comparing DeepBoner's implementation against the Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production readiness.
+---
+## Gap 1: OpenTelemetry Observability (HIGH VALUE)
+**What MS Framework Has:**
+```python
+# observability.py - 1706 lines of comprehensive OTEL integration
+from opentelemetry.trace import get_tracer, Span
+from opentelemetry.metrics import get_meter, Histogram
+@use_observability   # Decorator for ChatClient
+@use_agent_observability  # Decorator for Agent
+# Token usage histograms with bucket boundaries
+TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)
+# Operation duration histograms
+OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)
+# 80+ semantic span attributes (OtelAttr enum)
+OtelAttr.GEN_AI_OPERATION_NAME
+OtelAttr.GEN_AI_REQUEST_MODEL
+OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
+OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
+```
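The token-usage boundaries listed above are successive powers of four (4^0 through 4^10). The following is a minimal stdlib sketch of how a token count maps to a histogram bucket, assuming upper-inclusive bucket bounds. `bucket_index` is a hypothetical helper for illustration, not a framework or OpenTelemetry API.

```python
from bisect import bisect_left

# Boundaries copied from the listing above: 4**0 .. 4**10.
TOKEN_USAGE_BUCKET_BOUNDARIES = tuple(4**k for k in range(11))


def bucket_index(token_count: int) -> int:
    """Return the index of the bucket holding `token_count` (hypothetical helper).

    Assumes bucket i covers (boundary[i-1], boundary[i]]; counts above the
    last boundary land in the overflow bucket at index len(boundaries).
    """
    return bisect_left(TOKEN_USAGE_BUCKET_BOUNDARIES, token_count)
```

Under that assumption, a 5-token request falls in the `(4, 16]` bucket, and anything above 1,048,576 tokens falls in the overflow bucket.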
+**What DeepBoner Has:**
+- `structlog` for logging only
+- No distributed tracing
+- No metrics collection
+- No token usage tracking
+**Gap Impact:**
+- Cannot trace requests across agents
+- No token cost monitoring
+- No performance profiling in production
+**Recommended Fix:**
+```python
+# Add optional OTEL support to orchestrator
+# src/observability/__init__.py
+from opentelemetry import trace
+from opentelemetry.metrics import get_meter
+def setup_observability():
+    """One-time setup for OpenTelemetry."""
+    ...
+@contextmanager
+def trace_agent_operation(name: str, attributes: dict):
+    """Context manager for tracing agent operations."""
+    ...
+```
+---
+## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001
+> **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)
+**What MS Framework Has:**
+```python
+# _middleware.py - Three types of middleware
+class AgentMiddleware(ABC):
+    """Intercepts agent invocations."""
+    async def process(self, context: AgentRunContext, next): ...
+class FunctionMiddleware(ABC):
+    """Intercepts tool/function calls."""
+    async def process(self, context: FunctionInvocationContext, next): ...
+class ChatMiddleware(ABC):
+    """Intercepts chat client requests."""
+    async def process(self, context: ChatContext, next): ...
+# Decorators for easy middleware creation
+@agent_middleware
+async def logging_middleware(context: AgentRunContext, next):
+    print(f"Before: {context.agent.name}")
+    await next(context)
+    print(f"After: {context.result}")
+# Pipeline execution with terminate support
+context.terminate = True  # Stops pipeline early
+```
+**What DeepBoner Has:**
+- Uses MS decorators (`@use_chat_middleware`, `@use_observability`) ✓
+- BUT: No custom `ChatMiddleware` class implementations ✗
+- `src/middleware/` folder contains a workflow, not actual middleware ✗
+**ADR-001 Solution:**
+1. Rename `src/middleware/` → `src/workflows/` (fix misleading name)
+2. Create proper `src/middleware/` with MS-pattern implementations:
+   - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
+   - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
+   - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
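To make the interceptor pattern concrete, here is a framework-free sketch of how a chain of `process(context, next)` callables can be composed. `Context`, `build_pipeline`, and the two demo middleware are hypothetical stand-ins written for this illustration; they are not the Microsoft Agent Framework types, and the real pipeline may be built differently.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class Context:
    """Hypothetical stand-in for a chat context object."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.result: Optional[str] = None


Next = Callable[[Context], Awaitable[None]]
Middleware = Callable[[Context, Next], Awaitable[None]]


def build_pipeline(middlewares: List[Middleware], handler: Next) -> Next:
    """Wrap `handler` so each middleware runs around the ones after it."""
    pipeline = handler
    for mw in reversed(middlewares):
        # Bind mw and the current pipeline as defaults to avoid late binding.
        async def step(ctx: Context, mw: Middleware = mw, nxt: Next = pipeline) -> None:
            await mw(ctx, nxt)

        pipeline = step
    return pipeline


async def outer(ctx: Context, nxt: Next) -> None:
    ctx.log.append("outer:before")
    await nxt(ctx)
    ctx.log.append("outer:after")


async def inner(ctx: Context, nxt: Next) -> None:
    ctx.log.append("inner:before")
    await nxt(ctx)
    ctx.log.append("inner:after")


async def handler(ctx: Context) -> None:
    # The innermost call, standing in for the actual chat request.
    ctx.log.append("handler")
    ctx.result = "response"


def run_demo() -> Context:
    ctx = Context()
    asyncio.run(build_pipeline([outer, inner], handler)(ctx))
    return ctx
```

The first middleware in the list is the outermost wrapper, which is why a retry middleware placed first would re-run everything registered after it.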
+---
+## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)
+**What MS Framework Has:**
+```python
+# _threads.py
+class AgentThread:
+    """Maintains conversation state with serialization support."""
+    def __init__(self, service_thread_id=None, message_store=None):
+        ...
+    async def serialize(self) -> dict[str, Any]:
+        """Persist thread state."""
+        ...
+    @classmethod
+    async def deserialize(cls, state: dict) -> "AgentThread":
+        """Restore thread from persisted state."""
+        ...
+class ChatMessageStoreProtocol(Protocol):
+    """Protocol for message storage backends."""
+    async def list_messages(self) -> list[ChatMessage]: ...
+    async def add_messages(self, messages: Sequence[ChatMessage]): ...
+```
+**What DeepBoner Has:**
+- `ResearchMemory` for research state only
+- No conversation persistence
+- No serialization/deserialization
+**Gap Impact:**
+- Cannot resume interrupted research sessions
+- Cannot persist conversation history
+- Cannot implement checkpointing
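A rough sketch of what closing this gap could look like: a serializable snapshot that round-trips through JSON. The class name and every field (`query`, `iteration`, `evidence_ids`) are assumptions made for illustration; the actual shape of `ResearchMemory` is not shown in this document.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ResearchMemorySnapshot:
    """Hypothetical serializable view of research state (field names are assumptions)."""

    query: str
    iteration: int = 0
    evidence_ids: List[str] = field(default_factory=list)

    def serialize(self) -> str:
        """Encode the snapshot as a JSON string suitable for a checkpoint file."""
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, payload: str) -> "ResearchMemorySnapshot":
        """Rebuild a snapshot from a string produced by `serialize`."""
        return cls(**json.loads(payload))
```

Writing the serialized string after each iteration would be one simple way to allow an interrupted session to resume from its last checkpoint.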
+---
+## Gap 4: Function/Tool Configuration (MEDIUM VALUE)
+**What MS Framework Has:**
+```python
+# _tools.py
+class FunctionInvocationConfiguration:
+    """Configuration for function invocation in chat clients."""
+    enabled: bool = True
+    max_iterations: int = 40  # Maximum tool loop iterations
+    max_consecutive_errors_per_request: int = 3
+    terminate_on_unknown_calls: bool = False
+    include_detailed_errors: bool = False
+class AIFunction:
+    """Wraps Python function for AI model calling."""
+    approval_mode: Literal["always_require", "never_require"]
+    max_invocations: int  # Per-function invocation limit
+    max_invocation_exceptions: int  # Per-function error limit
+    invocation_count: int  # Tracks usage
+```
+**What DeepBoner Has:**
+- `max_iterations` in Settings
+- Basic tool execution
+- No per-tool configuration
+- No approval mode
+**Gap Impact:**
+- Cannot limit individual tool usage
+- No human-in-the-loop for dangerous tools
+- No per-tool error budgets
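As an illustration of per-tool limits, the sketch below wraps a callable with an invocation cap and an error budget. `LimitedTool` and `ToolBudgetExceeded` are hypothetical names invented here; they only mirror the idea behind `max_invocations` and `max_invocation_exceptions`, not the framework's `AIFunction` behavior.

```python
from typing import Any, Callable


class ToolBudgetExceeded(RuntimeError):
    """Raised when a tool has used up its invocation or error budget."""


class LimitedTool:
    """Hypothetical wrapper enforcing per-tool call and error limits."""

    def __init__(self, func: Callable[..., Any], max_invocations: int, max_errors: int = 3) -> None:
        self.func = func
        self.max_invocations = max_invocations
        self.max_errors = max_errors
        self.invocation_count = 0
        self.error_count = 0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.invocation_count >= self.max_invocations:
            raise ToolBudgetExceeded("invocation limit reached")
        if self.error_count >= self.max_errors:
            raise ToolBudgetExceeded("error budget exhausted")
        self.invocation_count += 1
        try:
            return self.func(*args, **kwargs)
        except Exception:
            self.error_count += 1  # count the failure, then let the caller see it
            raise
```

Usage would look like `search = LimitedTool(some_search_function, max_invocations=5)`, where `some_search_function` is any existing callable.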
+---
+## Gap 5: Context Provider Lifecycle (LOW VALUE)
+**What MS Framework Has:**
+```python
+# _memory.py
+class ContextProvider(ABC):
+    """Abstract pattern for injecting context into agent invocations."""
+    async def invoking(self, agent, thread) -> str | None:
+        """Called before agent invocation. Returns context to inject."""
+        ...
+    async def invoked(self, agent, thread, result):
+        """Called after agent invocation."""
+        ...
+    async def thread_created(self, thread):
+        """Called when new thread is created."""
+        ...
+class AggregateContextProvider(ContextProvider):
+    """Combines multiple context providers."""
+    ...
+```
+**What DeepBoner Has:**
+- `ResearchMemory` as simple state container
+- No lifecycle hooks
+- No provider aggregation
+---
+## Gap 6: Exception Granularity (LOW VALUE)
+**What MS Framework Has:**
+```
+AgentFrameworkException (base)
+├── AgentException
+│   ├── AgentExecutionException
+│   ├── AgentInitializationError
+│   └── AgentThreadException
+├── ChatClientException
+│   └── ChatClientInitializationError
+├── ServiceException
+│   ├── ServiceInitializationError
+│   ├── ServiceResponseException
+│   │   ├── ServiceContentFilterException
+│   │   ├── ServiceInvalidExecutionSettingsError
+│   │   └── ServiceInvalidResponseError
+│   └── ServiceInvalidAuthError
+├── ToolException
+│   └── ToolExecutionException
+├── MiddlewareException
+└── ContentError
+```
+**What DeepBoner Has:**
+```
+DeepBonerError (base)
+├── SearchError
+│   └── RateLimitError
+├── JudgeError
+├── ConfigurationError
+└── EmbeddingError
+```
+**Gap Impact:**
+- Less precise error handling
+- Harder to distinguish error sources
+- Less informative error messages for users
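The DeepBoner tree above can be written out as classes to show what handling by granularity looks like in practice. The class names come from the tree; the `classify` helper and its return labels are hypothetical and exist only to demonstrate most-specific-first matching.

```python
class DeepBonerError(Exception):
    """Base class, as in the tree above."""


class SearchError(DeepBonerError):
    pass


class RateLimitError(SearchError):
    pass


class JudgeError(DeepBonerError):
    pass


class ConfigurationError(DeepBonerError):
    pass


class EmbeddingError(DeepBonerError):
    pass


def classify(error: Exception) -> str:
    """Map an error to a handling label, checking the most specific type first."""
    if isinstance(error, RateLimitError):
        return "retry-later"
    if isinstance(error, SearchError):
        return "search-failed"
    if isinstance(error, DeepBonerError):
        return "internal"
    return "unknown"
```

With only one level under most branches, a caller cannot tell, for example, an invalid response from an auth failure inside `SearchError`, which is the precision the deeper MS hierarchy provides.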
+---
+## Prioritized Implementation Roadmap
+### Phase 1: Quick Wins (1-2 days)
+1. Add token tracking to orchestrator (no OTEL yet, just counters)
+2. Add `max_consecutive_errors` to tool execution
+### Phase 2: Medium Effort (3-5 days)
+1. Add basic middleware pattern to orchestrator
+2. Implement thread serialization for `ResearchMemory`
+### Phase 3: Full Production (1-2 weeks)
+1. Full OpenTelemetry integration
+2. Complete middleware pipeline
+3. Context provider lifecycle hooks
+---
+## Related Issues
+- **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md`
+- **MS Framework Reference:** `reference_repos/microsoft-agent-framework/`
+---
+## Notes
+These gaps are P3 because:
+1. DeepBoner is functional without them
+2. They're architectural improvements, not bug fixes
+3. User-facing functionality is not impacted
+However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.

docs/specs/README.md ADDED

	@@ -0,0 +1,146 @@

+# Tech Debt & Bug Fix Specs
+**Status:** AWAITING SENIOR REVIEW
+**Created:** 2025-12-06
+---
+## Overview
+These specs consolidate all identified bugs, tech debt, and architectural issues into phased, implementable work packages. Each spec is scoped to a single PR and follows TDD, SOLID, DRY, and Gang of Four principles.
+**Implementation Order:** SPEC-20 → SPEC-21 → SPEC-22
+---
+## Spec Index
+| Spec | Title | Priority | Effort | Status |
+|------|-------|----------|--------|--------|
+| [SPEC-20](./SPEC-20-PUBMED-JSON-FIX.md) | PubMed JSON Parsing Fix | P2 | 15 min | READY |
+| [SPEC-21](./SPEC-21-MIDDLEWARE-ARCHITECTURE.md) | Middleware Architecture Refactor | P2 | 2 hours | READY |
+| [SPEC-22](./SPEC-22-PROGRESS-BAR-REMOVAL.md) | Progress Bar Removal | P3 | 15 min | READY |
+**Total Effort:** ~2.5 hours
+---
+## Why This Order?
+### SPEC-20 First (15 min)
+- Quickest win
+- Fixes a real crash bug
+- Builds confidence before larger refactor
+- Single file, single PR
+### SPEC-21 Second (2 hours)
+- The big architectural fix
+- Renames confusing folder
+- Implements proper MS framework patterns
+- Fixes HuggingFace retry bug THE RIGHT WAY
+- Adds token tracking
+### SPEC-22 Last (15 min)
+- Cosmetic only
+- Can be deferred if needed
+- Easy cleanup
+---
+## What These Specs Consolidate
+These specs replace the scattered documentation in:
+| Old Location | Now Covered By |
+|--------------|----------------|
+| `docs/bugs/p2-hardening-issues.md` Issue 1 | SPEC-20 |
+| `docs/bugs/p2-hardening-issues.md` Issue 2 | SPEC-21 |
+| `docs/bugs/p2-hardening-issues.md` Issue 3 | SPEC-21 |
+| `docs/architecture/adr-001-middleware-refactor.md` | SPEC-21 |
+| `docs/bugs/p3-progress-bar-positioning.md` | SPEC-22 |
+---
+## What's NOT In These Specs (Deferred P3)
+The following are documented but deferred for later:
+1. **OpenTelemetry observability** - Nice to have, not blocking
+2. **Thread state serialization** - Nice to have, not blocking
+3. **ResearchMemory locks** - Not a bug today (sequential execution)
+4. **Error path cleanup** - Minor resource leakage, GC handles it
+5. **Per-tool configuration** - Nice to have
+6. **Context provider lifecycle** - Nice to have
+These remain documented in `docs/bugs/p3-ms-framework-gaps.md` for future work.
+---
+## Implementation Protocol
+For each spec:
+1. **Read the spec** - Understand the problem and solution
+2. **TDD** - Write failing test first
+3. **Implement** - Minimal code to pass tests
+4. **Run `make check`** - Lint + typecheck + test
+5. **Commit** - Single commit per spec
+6. **PR** - One PR per spec with spec number in title
+---
+## Commit Message Format
+```
+fix: PubMed JSON parsing (SPEC-20)
+Moves JSON parsing inside try/except block to handle API
+maintenance pages gracefully. Adds JSONDecodeError handling.
+Fixes: production crash on PubMed maintenance pages
+```
+```
+refactor: middleware architecture (SPEC-21)
+- Renames src/middleware → src/workflows (accurate naming)
+- Creates proper src/middleware with ChatMiddleware implementations
+- Implements RetryMiddleware (fixes HuggingFace 429 crashes)
+- Implements TokenTrackingMiddleware (enables cost monitoring)
+```
+```
+fix: remove progress bar overlap (SPEC-22)
+Removes gr.Progress() from research_agent function.
+Gradio's Progress is incompatible with ChatInterface.
+Emoji status messages in chat output are retained.
+```
+---
+## Senior Review Checklist
+Before implementation, please verify:
+- [ ] SPEC-20: Fix approach is correct (move into try/except)
+- [ ] SPEC-21: MS middleware pattern is used correctly
+- [ ] SPEC-21: RetryMiddleware implementation follows framework conventions
+- [ ] SPEC-21: Folder rename won't break anything else
+- [ ] SPEC-22: Removing gr.Progress() is the right fix (vs CSS hack)
+- [ ] Order of implementation makes sense
+- [ ] Nothing critical is missing from these specs
+---
+## After Implementation
+Once all specs are implemented:
+1. Archive old docs:
+   - `docs/bugs/p2-hardening-issues.md` → Mark as RESOLVED
+   - `docs/architecture/adr-001-middleware-refactor.md` → Delete or archive
+   - `docs/bugs/p3-progress-bar-positioning.md` → Mark as RESOLVED
+2. Update `docs/bugs/active-bugs.md` to reflect completed fixes
+3. Consider v0.2.0 release with these fixes

docs/specs/SPEC-20-PUBMED-JSON-FIX.md ADDED

	@@ -0,0 +1,129 @@

+# SPEC-20: PubMed JSON Parsing Fix
+**Status:** READY FOR IMPLEMENTATION
+**Priority:** P2 (Critical - causes crashes)
+**Effort:** 15 minutes
+**PR Scope:** Single file fix
+---
+## Problem Statement
+The PubMed search tool crashes when the API returns non-JSON responses (maintenance pages, error pages). The JSON parsing happens **outside** the try/except block.
+**File:** `src/tools/pubmed.py:88`
+**Crash Type:** `json.JSONDecodeError`
+**User Impact:** Entire research workflow crashes
+---
+## Current Code (BROKEN)
+```python
+# src/tools/pubmed.py - Lines ~80-95
+try:
+    search_resp = await client.get(
+        f"{NCBI_BASE_URL}/esearch.fcgi",
+        params=search_params,
+    )
+    search_resp.raise_for_status()
+except httpx.HTTPStatusError as e:
+    logger.warning("PubMed search failed", status=e.response.status_code)
+    return []
+# ↓↓↓ THIS IS OUTSIDE THE TRY BLOCK ↓↓↓
+search_data = search_resp.json()  # CRASHES HERE on maintenance pages
+pmids = search_data.get("esearchresult", {}).get("idlist", [])
+```
+---
+## Required Fix
+Move JSON parsing inside try/except and add `JSONDecodeError` handling:
+```python
+# src/tools/pubmed.py - Fixed version
+try:
+    search_resp = await client.get(
+        f"{NCBI_BASE_URL}/esearch.fcgi",
+        params=search_params,
+    )
+    search_resp.raise_for_status()
+    search_data = search_resp.json()  # ← MOVED INSIDE TRY
+except httpx.HTTPStatusError as e:
+    logger.warning("PubMed search failed", status=e.response.status_code)
+    return []
+except json.JSONDecodeError as e:
+    logger.warning(
+        "PubMed returned invalid JSON (possible maintenance page)",
+        error=str(e),
+        response_preview=search_resp.text[:200] if search_resp else "N/A",
+    )
+    return []
+pmids = search_data.get("esearchresult", {}).get("idlist", [])
+```
+---
+## Implementation Checklist
+- [ ] Add `import json` at top of file (if not present)
+- [ ] Move `search_resp.json()` inside try block (line ~88)
+- [ ] Add `except json.JSONDecodeError` handler
+- [ ] Log warning with response preview for debugging
+- [ ] Return empty list (graceful degradation)
+- [ ] Write unit test: mock response with HTML content
+- [ ] Run `make check` (lint + typecheck + test)
+---
+## Test Case
+```python
+# tests/unit/tools/test_pubmed.py
+import json
+import pytest
+from unittest.mock import AsyncMock, MagicMock
+from src.tools.pubmed import search_pubmed
+@pytest.mark.asyncio
+async def test_pubmed_handles_maintenance_page():
+    """PubMed should gracefully handle non-JSON responses."""
+    # Mock httpx client returning HTML maintenance page
+    mock_response = MagicMock()
+    mock_response.status_code = 200
+    mock_response.text = "<html><body>Service Temporarily Unavailable</body></html>"
+    mock_response.json.side_effect = json.JSONDecodeError("Expecting value", "", 0)
+    mock_response.raise_for_status = MagicMock()
+    mock_client = AsyncMock()
+    mock_client.get.return_value = mock_response
+    # Should return empty list, not crash
+    result = await search_pubmed("test query", client=mock_client)
+    assert result == []
+```
+---
+## Acceptance Criteria
+1. `search_pubmed()` returns `[]` when API returns HTML
+2. Warning logged with response preview
+3. No `JSONDecodeError` propagates to caller
+4. All existing tests pass
+5. `make check` passes
+---
+## Dependencies
+None. This is a standalone fix.
+---
+## Notes
+- This same pattern may exist in `clinicaltrials.py` and `europepmc.py` - check after this fix
+- Do NOT over-engineer. Single fix, single PR.
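If the same pattern does turn up in the other tools, one option is a small shared helper rather than three copies of the `except` clause. This is a sketch only: `parse_json_or_none` is a hypothetical name, and `FakeResponse` is a stand-in that exists purely so the example runs without an HTTP client.

```python
import json
from typing import Any, Optional


def parse_json_or_none(response: Any) -> Optional[Any]:
    """Return the decoded body, or None when the body is not valid JSON."""
    try:
        return response.json()
    except json.JSONDecodeError:
        return None


class FakeResponse:
    """Stand-in for an HTTP response object, for demonstration only."""

    def __init__(self, text: str) -> None:
        self.text = text

    def json(self) -> Any:
        return json.loads(self.text)
```

A caller would then treat `None` the same way the fix above treats `JSONDecodeError`: log a warning with a response preview and return an empty list.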

docs/specs/SPEC-21-MIDDLEWARE-ARCHITECTURE.md ADDED

	@@ -0,0 +1,445 @@

+# SPEC-21: Middleware Architecture Refactor
+**Status:** READY FOR IMPLEMENTATION
+**Priority:** P2 (Architectural hygiene + fixes HuggingFace retry bug)
+**Effort:** 2 hours
+**PR Scope:** Folder rename + new middleware implementations
+---
+## Problem Statement
+1. **Misleading folder name:** `src/middleware/` contains a workflow (`SubIterationMiddleware`), not interceptor middleware
+2. **Missing retry logic:** `HuggingFaceChatClient` has no retry on 429/transient errors (P2 bug)
+3. **No token tracking:** Cannot monitor API costs
+4. **Not using MS framework patterns:** We use decorators but not `ChatMiddleware` base classes
+---
+## Current State (WRONG)
+```
+src/
+├── middleware/                      ← MISLEADING: contains workflow
+│   ├── __init__.py
+│   ├── sub_iteration.py             ← This is a WORKFLOW, not middleware
+│   └── .gitkeep
+```
+**HuggingFace client has no retry:**
+```python
+# src/clients/huggingface.py:263-265
+except Exception as e:
+    logger.error("HuggingFace API error", error=str(e))
+    raise  # ← No retry, crashes on 429
+```
+---
+## Target State (CORRECT)
+```
+src/
+├── workflows/                       ← RENAMED: now accurate
+│   ├── __init__.py
+│   └── sub_iteration.py
+│
+├── middleware/                      ← NEW: actual MS-pattern middleware
+│   ├── __init__.py
+│   ├── retry.py                     ← RetryMiddleware(ChatMiddleware)
+│   └── token_tracking.py            ← TokenTrackingMiddleware(ChatMiddleware)
+```
+---
+## Implementation Steps
+### Step 1: Rename Folder (5 min)
+```bash
+# Rename middleware → workflows
+git mv src/middleware src/workflows
+```
+### Step 2: Update Import (5 min)
+```python
+# src/orchestrators/hierarchical.py - Line ~15
+# BEFORE:
+from src.middleware.sub_iteration import SubIterationMiddleware, SubIterationTeam
+# AFTER:
+from src.workflows.sub_iteration import SubIterationMiddleware, SubIterationTeam
+```
+### Step 3: Create New Middleware Package (10 min)
+```python
+# src/middleware/__init__.py
+"""Microsoft Agent Framework middleware implementations.
+These are interceptor-pattern middleware that wrap chat client calls.
+They are NOT workflows - see src/workflows/ for orchestration patterns.
+"""
+from src.middleware.retry import RetryMiddleware
+from src.middleware.token_tracking import TokenTrackingMiddleware
+__all__ = ["RetryMiddleware", "TokenTrackingMiddleware"]
+```
+### Step 4: Implement RetryMiddleware (30 min)
+```python
+# src/middleware/retry.py
+"""Retry middleware for chat clients with exponential backoff."""
+import asyncio
+from typing import Any
+import structlog
+from agent_framework._middleware import ChatContext, ChatMiddleware
+logger = structlog.get_logger()
+class RetryMiddleware(ChatMiddleware):
+    """Retries failed chat requests with exponential backoff.
+    This middleware intercepts chat client calls and retries on transient
+    errors (rate limits, timeouts, server errors).
+    Attributes:
+        max_attempts: Maximum number of attempts (default: 3)
+        min_wait: Minimum wait between retries in seconds (default: 1.0)
+        max_wait: Maximum wait between retries in seconds (default: 10.0)
+        retryable_status_codes: HTTP status codes to retry (default: 429, 500, 502, 503, 504)
+    """
+    def __init__(
+        self,
+        max_attempts: int = 3,
+        min_wait: float = 1.0,
+        max_wait: float = 10.0,
+        retryable_status_codes: tuple[int, ...] = (429, 500, 502, 503, 504),
+    ) -> None:
+        self.max_attempts = max_attempts
+        self.min_wait = min_wait
+        self.max_wait = max_wait
+        self.retryable_status_codes = retryable_status_codes
+    def _is_retryable(self, error: Exception) -> bool:
+        """Check if error is retryable."""
+        # Check for httpx status errors
+        if hasattr(error, "response") and hasattr(error.response, "status_code"):
+            return error.response.status_code in self.retryable_status_codes
+        # Check for timeout errors
+        error_name = type(error).__name__.lower()
+        if "timeout" in error_name:
+            return True
+        # Check for connection errors
+        if "connection" in error_name:
+            return True
+        return False
+    def _calculate_wait(self, attempt: int) -> float:
+        """Calculate wait time with exponential backoff."""
+        wait = self.min_wait * (2 ** attempt)
+        return min(wait, self.max_wait)
+    async def process(self, context: ChatContext, next: Any) -> None:
+        """Process the chat request with retry logic."""
+        last_error: Exception | None = None
+        for attempt in range(self.max_attempts):
+            try:
+                await next(context)
+                return  # Success - exit retry loop
+            except Exception as e:
+                last_error = e
+                if not self._is_retryable(e):
+                    logger.warning(
+                        "Non-retryable error",
+                        error=str(e),
+                        error_type=type(e).__name__,
+                    )
+                    raise  # Don't retry non-retryable errors
+                if attempt < self.max_attempts - 1:
+                    wait_time = self._calculate_wait(attempt)
+                    logger.info(
+                        "Retrying after error",
+                        attempt=attempt + 1,
+                        max_attempts=self.max_attempts,
+                        wait_seconds=wait_time,
+                        error=str(e),
+                    )
+                    await asyncio.sleep(wait_time)
+        # All retries exhausted
+        logger.error(
+            "All retry attempts failed",
+            max_attempts=self.max_attempts,
+            last_error=str(last_error),
+        )
+        if last_error:
+            raise last_error
+```
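To see what the backoff parameters mean in wall-clock terms, the standalone sketch below reproduces the `min_wait * 2 ** attempt` formula, capped at `max_wait`, from `_calculate_wait`. `backoff_schedule` is a hypothetical helper written for this illustration. It lists only the sleeps between attempts, since the loop above does not sleep after the final failure.

```python
from typing import List


def backoff_schedule(max_attempts: int = 3, min_wait: float = 1.0, max_wait: float = 10.0) -> List[float]:
    """List the waits between attempts, mirroring the formula in _calculate_wait."""
    # There are max_attempts - 1 sleeps: none is taken after the last attempt.
    return [min(min_wait * (2 ** attempt), max_wait) for attempt in range(max_attempts - 1)]
```

With the defaults the middleware sleeps 1 s and then 2 s, so a request that fails three times adds about 3 s of waiting. The 10 s cap only takes effect from the fifth attempt onward.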
+### Step 5: Implement TokenTrackingMiddleware (20 min)
+```python
+# src/middleware/token_tracking.py
+"""Token tracking middleware for monitoring API usage."""
+from contextvars import ContextVar
+from typing import Any
+import structlog
+from agent_framework._middleware import ChatContext, ChatMiddleware
+logger = structlog.get_logger()
+# ContextVar for per-request token tracking
+_request_tokens: ContextVar[dict[str, int]] = ContextVar(
+    "request_tokens",
+    default={"input": 0, "output": 0},
+)
+class TokenTrackingMiddleware(ChatMiddleware):
+    """Tracks token usage across chat requests.
+    This middleware logs token usage after each chat completion
+    and maintains running totals for the session.
+    Usage metrics are logged via structlog for observability.
+    """
+    def __init__(self) -> None:
+        self.total_input_tokens = 0
+        self.total_output_tokens = 0
+        self.request_count = 0
+    async def process(self, context: ChatContext, next: Any) -> None:
+        """Process request and track token usage."""
+        await next(context)
+        # Extract usage from response if available
+        if context.result is None:
+            return
+        usage = None
+        # Try to get usage from response
+        if hasattr(context.result, "usage"):
+            usage = context.result.usage
+        elif hasattr(context.result, "messages") and context.result.messages:
+            # Check first message for usage metadata
+            msg = context.result.messages[0]
+            if hasattr(msg, "metadata") and msg.metadata:
+                usage = msg.metadata.get("usage")
+        if usage:
+            input_tokens = usage.get("input_tokens", 0) or usage.get("prompt_tokens", 0)
+            output_tokens = usage.get("output_tokens", 0) or usage.get("completion_tokens", 0)
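+            self.total_input_tokens += input_tokens
+            self.total_output_tokens += output_tokens
+            self.request_count += 1
+            # Publish this request's usage so get_token_stats() does not always return the zeroed default
+            _request_tokens.set({"input": input_tokens, "output": output_tokens})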
+            logger.info(
+                "Token usage",
+                request_input=input_tokens,
+                request_output=output_tokens,
+                total_input=self.total_input_tokens,
+                total_output=self.total_output_tokens,
+                total_requests=self.request_count,
+            )
+def get_token_stats() -> dict[str, int]:
+    """Get current request's token usage."""
+    return _request_tokens.get().copy()
+```
+### Step 6: Apply Middleware to HuggingFaceChatClient (15 min)
+```python
+# src/clients/huggingface.py - Update __init__
+from src.middleware.retry import RetryMiddleware
+from src.middleware.token_tracking import TokenTrackingMiddleware
+@use_function_invocation
+@use_observability
+@use_chat_middleware
+class HuggingFaceChatClient(BaseChatClient):
+    def __init__(
+        self,
+        model_id: str | None = None,
+        api_key: str | None = None,
+        **kwargs: Any,
+    ) -> None:
+        # Create middleware instances
+        middleware = [
+            RetryMiddleware(max_attempts=3, min_wait=1.0, max_wait=10.0),
+            TokenTrackingMiddleware(),
+        ]
+        super().__init__(middleware=middleware, **kwargs)
+        # ... rest of __init__
+```
+### Step 7: Update Tests (20 min)
+```python
+# tests/unit/middleware/test_retry.py
+import pytest
+from unittest.mock import AsyncMock, MagicMock
+from src.middleware.retry import RetryMiddleware
+@pytest.mark.asyncio
+async def test_retry_middleware_succeeds_first_try():
+    """RetryMiddleware should pass through on success."""
+    middleware = RetryMiddleware(max_attempts=3)
+    context = MagicMock()
+    next_fn = AsyncMock()
+    await middleware.process(context, next_fn)
+    next_fn.assert_called_once_with(context)
+@pytest.mark.asyncio
+async def test_retry_middleware_retries_on_429():
+    """RetryMiddleware should retry on 429 rate limit."""
+    middleware = RetryMiddleware(max_attempts=3, min_wait=0.01)
+    context = MagicMock()
+    # First two calls fail with 429, third succeeds
+    call_count = 0
+    async def mock_next(ctx):
+        nonlocal call_count
+        call_count += 1
+        if call_count < 3:
+            error = Exception("Rate limited")
+            error.response = MagicMock(status_code=429)
+            raise error
+    await middleware.process(context, mock_next)
+    assert call_count == 3
+@pytest.mark.asyncio
+async def test_retry_middleware_raises_after_max_attempts():
+    """RetryMiddleware should raise after max attempts exhausted."""
+    middleware = RetryMiddleware(max_attempts=2, min_wait=0.01)
+    context = MagicMock()
+    async def always_fails(ctx):
+        error = Exception("Always fails")
+        error.response = MagicMock(status_code=500)
+        raise error
+    with pytest.raises(Exception, match="Always fails"):
+        await middleware.process(context, always_fails)
+```
+---
+## Implementation Checklist
+### Phase 1: Folder Rename
+- [ ] `git mv src/middleware src/workflows`
+- [ ] Update import in `src/orchestrators/hierarchical.py`
+- [ ] Update `src/workflows/__init__.py` docstring
+- [ ] Run `make check` - verify no import errors
+### Phase 2: Create Middleware Package
+- [ ] Create `src/middleware/__init__.py`
+- [ ] Create `src/middleware/retry.py` with `RetryMiddleware`
+- [ ] Create `src/middleware/token_tracking.py` with `TokenTrackingMiddleware`
+- [ ] Run `make check` - verify no syntax errors
+### Phase 3: Apply to Client
+- [ ] Update `src/clients/huggingface.py` to use middleware
+- [ ] Test manually: `uv run python -c "from src.clients.huggingface import HuggingFaceChatClient; print('OK')"`
+- [ ] Run `make check`
+### Phase 4: Tests
+- [ ] Create `tests/unit/middleware/__init__.py`
+- [ ] Create `tests/unit/middleware/test_retry.py`
+- [ ] Create `tests/unit/middleware/test_token_tracking.py`
+- [ ] Run `make test` - all tests pass
+### Phase 5: Cleanup
+- [ ] Remove `.gitkeep` from `src/workflows/` if present
+- [ ] Run full `make check`
+- [ ] Commit with message: "refactor: implement proper middleware architecture (SPEC-21)"
+---
+## Acceptance Criteria
+1. `src/middleware/` folder contains actual `ChatMiddleware` implementations
+2. `src/workflows/` folder contains `SubIterationMiddleware` (moved from the old `src/middleware/`)
+3. `HuggingFaceChatClient` uses `RetryMiddleware`, so a 429 response triggers a retry instead of a crash
+4. Token usage is logged via `TokenTrackingMiddleware`
+5. All existing tests pass
+6. `make check` passes
+7. No import errors anywhere in codebase
+---
+## Dependencies
+- **SPEC-20** should be done first (simpler, builds confidence)
+- Requires `agent-framework-core` package (already installed)
+---
+## Gotchas & Nuances
+1. **MS middleware signature:** The `process` method takes `(context, next)` where `next` is a callable
+2. **Middleware order matters:** Retry should be FIRST so it wraps everything
+3. **ContextVar for token tracking:** Use ContextVar for per-request isolation
+4. **Don't break HierarchicalOrchestrator:** It uses `SubIterationMiddleware` - update import path
+5. **BaseChatClient constructor:** Check if it accepts `middleware=` parameter - may need to register differently
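The ordering gotcha can be made concrete with a toy chain. This is a minimal sketch that does not use the Agent Framework: `ToyRetry`, `ToyCounter`, `build_chain`, and `make_flaky` are hypothetical stand-ins, and it assumes the first middleware in the list is the outermost wrapper. Whether the real framework composes its list the same way should be checked against its documentation.

```python
import asyncio
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class ToyRetry:
    """Retries the rest of the chain on RuntimeError (no backoff, for brevity)."""

    def __init__(self, max_attempts: int) -> None:
        self.max_attempts = max_attempts

    async def process(self, context: dict, next_handler: Handler) -> None:
        for attempt in range(1, self.max_attempts + 1):
            try:
                await next_handler(context)
                return
            except RuntimeError:
                if attempt == self.max_attempts:
                    raise


class ToyCounter:
    """Counts how many times the chain passes through it."""

    def __init__(self) -> None:
        self.calls = 0

    async def process(self, context: dict, next_handler: Handler) -> None:
        self.calls += 1
        await next_handler(context)


def build_chain(middleware: list, terminal: Handler) -> Handler:
    """Wrap `terminal` so that middleware[0] ends up outermost."""
    handler = terminal
    for mw in reversed(middleware):
        handler = (lambda m, nxt: (lambda ctx: m.process(ctx, nxt)))(mw, handler)
    return handler


def make_flaky(failures: int) -> Handler:
    """Terminal handler that fails `failures` times, then succeeds."""
    remaining = failures

    async def terminal(context: dict) -> None:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise RuntimeError("transient")
        context["result"] = "ok"

    return terminal


async def count_calls(retry_first: bool) -> int:
    retry, counter = ToyRetry(max_attempts=3), ToyCounter()
    order = [retry, counter] if retry_first else [counter, retry]
    await build_chain(order, make_flaky(failures=2))({})
    return counter.calls
```

With retry first, every attempt passes through the inner middleware again, so the counter sees three calls. With retry last, the counter sees one call and the retries happen underneath it. Which behaviour is wanted for token tracking is a design choice; the sketch only shows that the order changes the result.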
+---
+## Testing the Fix
+After implementation, verify 429 handling:
+```python
+# Manual test
+import asyncio
+from src.clients.huggingface import HuggingFaceChatClient
+from agent_framework import ChatMessage, ChatOptions
+async def main() -> None:
+    client = HuggingFaceChatClient()
+    # Make rapid requests to trigger rate limit
+    for i in range(10):
+        try:
+            await client.get_response(
+                messages=[ChatMessage(role="user", text="Hello")],
+                chat_options=ChatOptions(),
+            )
+            print(f"Request {i}: OK")
+        except Exception as e:
+            print(f"Request {i}: {e}")
+asyncio.run(main())
+```
+You should see retry logs in the output rather than immediate crashes.

docs/specs/SPEC-22-PROGRESS-BAR-REMOVAL.md ADDED

	@@ -0,0 +1,127 @@

+# SPEC-22: Progress Bar Removal
+**Status:** READY FOR IMPLEMENTATION
+**Priority:** P3 (Cosmetic UX fix)
+**Effort:** 15 minutes
+**PR Scope:** Single file fix
+---
+## Problem Statement
+The `gr.Progress()` bar conflicts with Gradio's `ChatInterface`, causing visual glitches:
+- Progress bar "floats" in the middle of chat output
+- Text overlaps with progress bar
+- Looks unprofessional
+**Root Cause:** `gr.Progress()` is designed for `gr.Interface`, not `ChatInterface`. It's a known Gradio limitation.
+---
+## Current Code (BROKEN)
+```python
+# src/app.py - research_agent function
+async def research_agent(
+    message: str,
+    history: list[dict[str, Any]],
+    domain: str = "sexual_health",
+    api_key: str = "",
+    api_key_state: str = "",
+    progress: gr.Progress = gr.Progress(),  # ← PROBLEM: Causes visual glitches
+) -> AsyncGenerator[str, None]:
+    ...
+    if event.type == "started":
+        progress(0, desc="Starting research...")  # ← These cause overlap
+    elif event.type == "progress":
+        progress(p, desc=event.message)
+```
+---
+## Required Fix
+Remove `gr.Progress()` entirely. We already have emoji status messages in chat output.
+```python
+# src/app.py - Fixed version
+async def research_agent(
+    message: str,
+    history: list[dict[str, Any]],
+    domain: str = "sexual_health",
+    api_key: str = "",
+    api_key_state: str = "",
+    # REMOVED: progress: gr.Progress = gr.Progress(),
+) -> AsyncGenerator[str, None]:
+    ...
+    # REMOVED: All progress(...) calls
+    # KEEP: Emoji status messages are already being yielded
+    # These work great with ChatInterface:
+    # yield "⏱️ **PROGRESS**: Round 1/5 (~3m 0s remaining)"
+```
+---
+## Implementation Checklist
+- [ ] Open `src/app.py`
+- [ ] Remove `progress: gr.Progress = gr.Progress()` from `research_agent` signature
+- [ ] Remove all `progress(...)` calls inside `research_agent`
+- [ ] Verify emoji status yields are still present (they should be)
+- [ ] Run `uv run python -c "from src.app import create_demo; print('OK')"`
+- [ ] Run `make check`
+- [ ] Test locally: `uv run python src/app.py` and verify no floating progress bar
+---
+## What We Keep
+The emoji status messages in chat output:
+```
+⏱️ **PROGRESS**: Round 1/5 (~3m 0s remaining)
+🔬 **Step 2: SearchAgent** - Searching for evidence...
+✅ **COMPLETE**: Research finished in 45 seconds
+```
+These are yielded directly to chat and work perfectly with `ChatInterface`.
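The status-message approach can be sketched as an async generator with no `gr.Progress` parameter. The `Event` class, `fake_events`, and the `PREFIXES` mapping are hypothetical stand-ins for the app's real event stream. The sketch also assumes each yield should carry the full text so far, on the understanding that `ChatInterface` replaces the in-progress reply with every yielded value.

```python
import asyncio
from collections.abc import AsyncGenerator
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for the orchestrator's event objects."""

    type: str
    message: str


async def fake_events() -> AsyncGenerator[Event, None]:
    """Hypothetical event source used only for this sketch."""
    for event in [
        Event("progress", "Round 1/5 (~3m 0s remaining)"),
        Event("step", "Searching for evidence..."),
        Event("complete", "Research finished in 45 seconds"),
    ]:
        await asyncio.sleep(0)
        yield event


# Hypothetical mapping from event type to the emoji prefix shown in chat.
PREFIXES = {
    "progress": "⏱️ **PROGRESS**",
    "step": "🔬 **Step 2: SearchAgent**",
    "complete": "✅ **COMPLETE**",
}


async def research_agent(
    message: str, history: list[dict]
) -> AsyncGenerator[str, None]:
    """Stream status lines as chat text; no progress bar involved."""
    lines: list[str] = []
    async for event in fake_events():
        lines.append(f"{PREFIXES[event.type]}: {event.message}")
        # Yield the accumulated text so earlier status lines stay visible.
        yield "\n\n".join(lines)


async def collect(message: str) -> list[str]:
    """Drain the generator, as a chat UI would, and return every update."""
    return [chunk async for chunk in research_agent(message, [])]
```

A function with this shape could be passed as `fn` to `gr.ChatInterface`. The sketch leaves Gradio out so it runs without the UI.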
+---
+## Acceptance Criteria
+1. No `gr.Progress()` in `research_agent` function signature
+2. No `progress(...)` calls in `research_agent` function body
+3. Emoji status messages still appear in chat output
+4. No floating/overlapping progress bar in UI
+5. `make check` passes
+---
+## Dependencies
+None. This is a standalone cosmetic fix.
+---
+## Testing
+```bash
+# Start local server
+uv run python src/app.py
+# In browser:
+# 1. Submit a research query
+# 2. Verify NO floating progress bar appears
+# 3. Verify emoji status messages DO appear in chat
+# 4. Verify chat messages don't have visual glitches
+```
+---
+## Notes
+- This is the recommended fix from Gradio's own documentation
+- `ChatInterface.show_progress="minimal"` (default) still shows a spinner, which is fine
+- If we need a real progress bar later, we'd need to refactor to a `gr.Blocks` wrapper