DeepBoner / docs /bugs /p3-ms-framework-gaps.md
VibecoderMcSwaggins's picture
docs: Address CodeRabbit review findings
f9576ce
# P3: Microsoft Agent Framework Gaps Analysis
**Date:** 2025-12-06
**Priority:** P3 (Nice-to-Have)
**Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)
## Executive Summary
Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness.
---
## Gap 1: OpenTelemetry Observability (HIGH VALUE)
**What MS Framework Has:**
```python
# observability.py - 1706 lines of comprehensive OTEL integration
from opentelemetry.trace import get_tracer, Span
from opentelemetry.metrics import get_meter, Histogram
@use_observability # Decorator for ChatClient
@use_agent_observability # Decorator for Agent
# Token usage histograms with bucket boundaries
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)
# Operation duration histograms
OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)
# 80+ semantic span attributes (OtelAttr enum)
OtelAttr.GEN_AI_OPERATION_NAME
OtelAttr.GEN_AI_REQUEST_MODEL
OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
```
**What DeepBoner Has:**
- `structlog` for logging only
- No distributed tracing
- No metrics collection
- No token usage tracking
**Gap Impact:**
- Cannot trace requests across agents
- No token cost monitoring
- No performance profiling in production
**Recommended Fix:**
```python
# Add optional OTEL support to orchestrator
# src/observability/__init__.py
from opentelemetry import trace
from opentelemetry.metrics import get_meter
def setup_observability():
"""One-time setup for OpenTelemetry."""
...
@contextmanager
def trace_agent_operation(name: str, attributes: dict):
"""Context manager for tracing agent operations."""
...
```
---
## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001
> **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)
**What MS Framework Has:**
```python
# _middleware.py - Three types of middleware
class AgentMiddleware(ABC):
"""Intercepts agent invocations."""
async def process(self, context: AgentRunContext, next): ...
class FunctionMiddleware(ABC):
"""Intercepts tool/function calls."""
async def process(self, context: FunctionInvocationContext, next): ...
class ChatMiddleware(ABC):
"""Intercepts chat client requests."""
async def process(self, context: ChatContext, next): ...
# Decorators for easy middleware creation
@agent_middleware
async def logging_middleware(context: AgentRunContext, next):
print(f"Before: {context.agent.name}")
await next(context)
print(f"After: {context.result}")
# Pipeline execution with terminate support
context.terminate = True # Stops pipeline early
```
**What DeepBoner Has:**
- Uses MS decorators (`@use_chat_middleware`, `@use_observability`) βœ“
- BUT: No custom `ChatMiddleware` class implementations βœ—
- `src/middleware/` folder contains a workflow, not actual middleware βœ—
**ADR-001 Solution:**
1. Rename `src/middleware/` β†’ `src/workflows/` (fix misleading name)
2. Create proper `src/middleware/` with MS-pattern implementations:
- `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
- `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
- `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
---
## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)
**What MS Framework Has:**
```python
# _threads.py
class AgentThread:
"""Maintains conversation state with serialization support."""
def __init__(self, service_thread_id=None, message_store=None):
...
async def serialize(self) -> dict[str, Any]:
"""Persist thread state."""
...
@classmethod
async def deserialize(cls, state: dict) -> "AgentThread":
"""Restore thread from persisted state."""
...
class ChatMessageStoreProtocol(Protocol):
"""Protocol for message storage backends."""
async def list_messages(self) -> list[ChatMessage]: ...
async def add_messages(self, messages: Sequence[ChatMessage]): ...
```
**What DeepBoner Has:**
- `ResearchMemory` for research state only
- No conversation persistence
- No serialization/deserialization
**Gap Impact:**
- Cannot resume interrupted research sessions
- Cannot persist conversation history
- Cannot implement checkpointing
---
## Gap 4: Function/Tool Configuration (MEDIUM VALUE)
**What MS Framework Has:**
```python
# _tools.py
class FunctionInvocationConfiguration:
"""Configuration for function invocation in chat clients."""
enabled: bool = True
max_iterations: int = 40 # Maximum tool loop iterations
max_consecutive_errors_per_request: int = 3
terminate_on_unknown_calls: bool = False
include_detailed_errors: bool = False
class AIFunction:
"""Wraps Python function for AI model calling."""
approval_mode: Literal["always_require", "never_require"]
max_invocations: int # Per-function invocation limit
max_invocation_exceptions: int # Per-function error limit
invocation_count: int # Tracks usage
```
**What DeepBoner Has:**
- `max_iterations` in Settings
- Basic tool execution
- No per-tool configuration
- No approval mode
**Gap Impact:**
- Cannot limit individual tool usage
- No human-in-the-loop for dangerous tools
- No per-tool error budgets
---
## Gap 5: Context Provider Lifecycle (LOW VALUE)
**What MS Framework Has:**
```python
# _memory.py
class ContextProvider(ABC):
"""Abstract pattern for injecting context into agent invocations."""
async def invoking(self, agent, thread) -> str | None:
"""Called before agent invocation. Returns context to inject."""
...
async def invoked(self, agent, thread, result):
"""Called after agent invocation."""
...
async def thread_created(self, thread):
"""Called when new thread is created."""
...
class AggregateContextProvider(ContextProvider):
"""Combines multiple context providers."""
...
```
**What DeepBoner Has:**
- `ResearchMemory` as simple state container
- No lifecycle hooks
- No provider aggregation
---
## Gap 6: Exception Granularity (LOW VALUE)
**What MS Framework Has:**
```text
AgentFrameworkException (base)
β”œβ”€β”€ AgentException
β”‚ β”œβ”€β”€ AgentExecutionException
β”‚ β”œβ”€β”€ AgentInitializationError
β”‚ └── AgentThreadException
β”œβ”€β”€ ChatClientException
β”‚ └── ChatClientInitializationError
β”œβ”€β”€ ServiceException
β”‚ β”œβ”€β”€ ServiceInitializationError
β”‚ β”œβ”€β”€ ServiceResponseException
β”‚ β”‚ β”œβ”€β”€ ServiceContentFilterException
β”‚ β”‚ β”œβ”€β”€ ServiceInvalidExecutionSettingsError
β”‚ β”‚ └── ServiceInvalidResponseError
β”‚ └── ServiceInvalidAuthError
β”œβ”€β”€ ToolException
β”‚ └── ToolExecutionException
β”œβ”€β”€ MiddlewareException
└── ContentError
```
**What DeepBoner Has:**
```text
DeepBonerError (base)
β”œβ”€β”€ SearchError
β”‚ └── RateLimitError
β”œβ”€β”€ JudgeError
β”œβ”€β”€ ConfigurationError
└── EmbeddingError
```
**Gap Impact:**
- Less precise error handling
- Harder to distinguish error sources
- Less informative error messages for users
---
## Prioritized Implementation Roadmap
### Phase 1: Quick Wins (1-2 days)
1. Add token tracking to orchestrator (no OTEL yet, just counters)
2. Add `max_consecutive_errors` to tool execution
### Phase 2: Medium Effort (3-5 days)
1. Add basic middleware pattern to orchestrator
2. Implement thread serialization for `ResearchMemory`
### Phase 3: Full Production (1-2 weeks)
1. Full OpenTelemetry integration
2. Complete middleware pipeline
3. Context provider lifecycle hooks
---
## Related Issues
- **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md`
- **MS Framework Reference:** `reference_repos/microsoft-agent-framework/`
---
## Notes
These gaps are P3 because:
1. DeepBoner is functional without them
2. They're architectural improvements, not bug fixes
3. User-facing functionality is not impacted
However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.