# P3: Microsoft Agent Framework Gaps Analysis **Date:** 2025-12-06 **Priority:** P3 (Nice-to-Have) **Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e) ## Executive Summary Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness. --- ## Gap 1: OpenTelemetry Observability (HIGH VALUE) **What MS Framework Has:** ```python # observability.py - 1706 lines of comprehensive OTEL integration from opentelemetry.trace import get_tracer, Span from opentelemetry.metrics import get_meter, Histogram @use_observability # Decorator for ChatClient @use_agent_observability # Decorator for Agent # Token usage histograms with bucket boundaries TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576) # Operation duration histograms OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...) # 80+ semantic span attributes (OtelAttr enum) OtelAttr.GEN_AI_OPERATION_NAME OtelAttr.GEN_AI_REQUEST_MODEL OtelAttr.GEN_AI_USAGE_INPUT_TOKENS OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS ``` **What DeepBoner Has:** - `structlog` for logging only - No distributed tracing - No metrics collection - No token usage tracking **Gap Impact:** - Cannot trace requests across agents - No token cost monitoring - No performance profiling in production **Recommended Fix:** ```python # Add optional OTEL support to orchestrator # src/observability/__init__.py from opentelemetry import trace from opentelemetry.metrics import get_meter def setup_observability(): """One-time setup for OpenTelemetry.""" ... @contextmanager def trace_agent_operation(name: str, attributes: dict): """Context manager for tracing agent operations.""" ... ``` --- ## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001 > **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md) **What MS Framework Has:** ```python # _middleware.py - Three types of middleware class AgentMiddleware(ABC): """Intercepts agent invocations.""" async def process(self, context: AgentRunContext, next): ... class FunctionMiddleware(ABC): """Intercepts tool/function calls.""" async def process(self, context: FunctionInvocationContext, next): ... class ChatMiddleware(ABC): """Intercepts chat client requests.""" async def process(self, context: ChatContext, next): ... # Decorators for easy middleware creation @agent_middleware async def logging_middleware(context: AgentRunContext, next): print(f"Before: {context.agent.name}") await next(context) print(f"After: {context.result}") # Pipeline execution with terminate support context.terminate = True # Stops pipeline early ``` **What DeepBoner Has:** - Uses MS decorators (`@use_chat_middleware`, `@use_observability`) ✓ - BUT: No custom `ChatMiddleware` class implementations ✗ - `src/middleware/` folder contains a workflow, not actual middleware ✗ **ADR-001 Solution:** 1. Rename `src/middleware/` → `src/workflows/` (fix misleading name) 2. Create proper `src/middleware/` with MS-pattern implementations: - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs --- ## Gap 3: Thread/Conversation State Management (MEDIUM VALUE) **What MS Framework Has:** ```python # _threads.py class AgentThread: """Maintains conversation state with serialization support.""" def __init__(self, service_thread_id=None, message_store=None): ... async def serialize(self) -> dict[str, Any]: """Persist thread state.""" ... @classmethod async def deserialize(cls, state: dict) -> "AgentThread": """Restore thread from persisted state.""" ... class ChatMessageStoreProtocol(Protocol): """Protocol for message storage backends.""" async def list_messages(self) -> list[ChatMessage]: ... async def add_messages(self, messages: Sequence[ChatMessage]): ... ``` **What DeepBoner Has:** - `ResearchMemory` for research state only - No conversation persistence - No serialization/deserialization **Gap Impact:** - Cannot resume interrupted research sessions - Cannot persist conversation history - Cannot implement checkpointing --- ## Gap 4: Function/Tool Configuration (MEDIUM VALUE) **What MS Framework Has:** ```python # _tools.py class FunctionInvocationConfiguration: """Configuration for function invocation in chat clients.""" enabled: bool = True max_iterations: int = 40 # Maximum tool loop iterations max_consecutive_errors_per_request: int = 3 terminate_on_unknown_calls: bool = False include_detailed_errors: bool = False class AIFunction: """Wraps Python function for AI model calling.""" approval_mode: Literal["always_require", "never_require"] max_invocations: int # Per-function invocation limit max_invocation_exceptions: int # Per-function error limit invocation_count: int # Tracks usage ``` **What DeepBoner Has:** - `max_iterations` in Settings - Basic tool execution - No per-tool configuration - No approval mode **Gap Impact:** - Cannot limit individual tool usage - No human-in-the-loop for dangerous tools - No per-tool error budgets --- ## Gap 5: Context Provider Lifecycle (LOW VALUE) **What MS Framework Has:** ```python # _memory.py class ContextProvider(ABC): """Abstract pattern for injecting context into agent invocations.""" async def invoking(self, agent, thread) -> str | None: """Called before agent invocation. Returns context to inject.""" ... async def invoked(self, agent, thread, result): """Called after agent invocation.""" ... async def thread_created(self, thread): """Called when new thread is created.""" ... class AggregateContextProvider(ContextProvider): """Combines multiple context providers.""" ... ``` **What DeepBoner Has:** - `ResearchMemory` as simple state container - No lifecycle hooks - No provider aggregation --- ## Gap 6: Exception Granularity (LOW VALUE) **What MS Framework Has:** ```text AgentFrameworkException (base) ├── AgentException │ ├── AgentExecutionException │ ├── AgentInitializationError │ └── AgentThreadException ├── ChatClientException │ └── ChatClientInitializationError ├── ServiceException │ ├── ServiceInitializationError │ ├── ServiceResponseException │ │ ├── ServiceContentFilterException │ │ ├── ServiceInvalidExecutionSettingsError │ │ └── ServiceInvalidResponseError │ └── ServiceInvalidAuthError ├── ToolException │ └── ToolExecutionException ├── MiddlewareException └── ContentError ``` **What DeepBoner Has:** ```text DeepBonerError (base) ├── SearchError │ └── RateLimitError ├── JudgeError ├── ConfigurationError └── EmbeddingError ``` **Gap Impact:** - Less precise error handling - Harder to distinguish error sources - Less informative error messages for users --- ## Prioritized Implementation Roadmap ### Phase 1: Quick Wins (1-2 days) 1. Add token tracking to orchestrator (no OTEL yet, just counters) 2. Add `max_consecutive_errors` to tool execution ### Phase 2: Medium Effort (3-5 days) 1. Add basic middleware pattern to orchestrator 2. Implement thread serialization for `ResearchMemory` ### Phase 3: Full Production (1-2 weeks) 1. Full OpenTelemetry integration 2. Complete middleware pipeline 3. Context provider lifecycle hooks --- ## Related Issues - **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md` - **MS Framework Reference:** `reference_repos/microsoft-agent-framework/` --- ## Notes These gaps are P3 because: 1. DeepBoner is functional without them 2. They're architectural improvements, not bug fixes 3. User-facing functionality is not impacted However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.