| # P3: Microsoft Agent Framework Gaps Analysis | |
| **Date:** 2025-12-06 | |
| **Priority:** P3 (Nice-to-Have) | |
| **Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e) | |
| ## Executive Summary | |
| Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness. | |
| --- | |
| ## Gap 1: OpenTelemetry Observability (HIGH VALUE) | |
| **What MS Framework Has:** | |
| ```python | |
| # observability.py - 1706 lines of comprehensive OTEL integration | |
| from opentelemetry.trace import get_tracer, Span | |
| from opentelemetry.metrics import get_meter, Histogram | |
| @use_observability # Decorator for ChatClient | |
| @use_agent_observability # Decorator for Agent | |
| # Token usage histograms with bucket boundaries | |
| TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576) | |
| # Operation duration histograms | |
| OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...) | |
| # 80+ semantic span attributes (OtelAttr enum) | |
| OtelAttr.GEN_AI_OPERATION_NAME | |
| OtelAttr.GEN_AI_REQUEST_MODEL | |
| OtelAttr.GEN_AI_USAGE_INPUT_TOKENS | |
| OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS | |
| ``` | |
| **What DeepBoner Has:** | |
| - `structlog` for logging only | |
| - No distributed tracing | |
| - No metrics collection | |
| - No token usage tracking | |
| **Gap Impact:** | |
| - Cannot trace requests across agents | |
| - No token cost monitoring | |
| - No performance profiling in production | |
| **Recommended Fix:** | |
| ```python | |
| # Add optional OTEL support to orchestrator | |
| # src/observability/__init__.py | |
| from opentelemetry import trace | |
| from opentelemetry.metrics import get_meter | |
| def setup_observability(): | |
| """One-time setup for OpenTelemetry.""" | |
| ... | |
| @contextmanager | |
| def trace_agent_operation(name: str, attributes: dict): | |
| """Context manager for tracing agent operations.""" | |
| ... | |
| ``` | |
| --- | |
| ## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001 | |
| > **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md) | |
| **What MS Framework Has:** | |
| ```python | |
| # _middleware.py - Three types of middleware | |
| class AgentMiddleware(ABC): | |
| """Intercepts agent invocations.""" | |
| async def process(self, context: AgentRunContext, next): ... | |
| class FunctionMiddleware(ABC): | |
| """Intercepts tool/function calls.""" | |
| async def process(self, context: FunctionInvocationContext, next): ... | |
| class ChatMiddleware(ABC): | |
| """Intercepts chat client requests.""" | |
| async def process(self, context: ChatContext, next): ... | |
| # Decorators for easy middleware creation | |
| @agent_middleware | |
| async def logging_middleware(context: AgentRunContext, next): | |
| print(f"Before: {context.agent.name}") | |
| await next(context) | |
| print(f"After: {context.result}") | |
| # Pipeline execution with terminate support | |
| context.terminate = True # Stops pipeline early | |
| ``` | |
| **What DeepBoner Has:** | |
| - Uses MS decorators (`@use_chat_middleware`, `@use_observability`) β | |
| - BUT: No custom `ChatMiddleware` class implementations β | |
| - `src/middleware/` folder contains a workflow, not actual middleware β | |
| **ADR-001 Solution:** | |
| 1. Rename `src/middleware/` β `src/workflows/` (fix misleading name) | |
| 2. Create proper `src/middleware/` with MS-pattern implementations: | |
| - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug | |
| - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring | |
| - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs | |
| --- | |
| ## Gap 3: Thread/Conversation State Management (MEDIUM VALUE) | |
| **What MS Framework Has:** | |
| ```python | |
| # _threads.py | |
| class AgentThread: | |
| """Maintains conversation state with serialization support.""" | |
| def __init__(self, service_thread_id=None, message_store=None): | |
| ... | |
| async def serialize(self) -> dict[str, Any]: | |
| """Persist thread state.""" | |
| ... | |
| @classmethod | |
| async def deserialize(cls, state: dict) -> "AgentThread": | |
| """Restore thread from persisted state.""" | |
| ... | |
| class ChatMessageStoreProtocol(Protocol): | |
| """Protocol for message storage backends.""" | |
| async def list_messages(self) -> list[ChatMessage]: ... | |
| async def add_messages(self, messages: Sequence[ChatMessage]): ... | |
| ``` | |
| **What DeepBoner Has:** | |
| - `ResearchMemory` for research state only | |
| - No conversation persistence | |
| - No serialization/deserialization | |
| **Gap Impact:** | |
| - Cannot resume interrupted research sessions | |
| - Cannot persist conversation history | |
| - Cannot implement checkpointing | |
| --- | |
| ## Gap 4: Function/Tool Configuration (MEDIUM VALUE) | |
| **What MS Framework Has:** | |
| ```python | |
| # _tools.py | |
| class FunctionInvocationConfiguration: | |
| """Configuration for function invocation in chat clients.""" | |
| enabled: bool = True | |
| max_iterations: int = 40 # Maximum tool loop iterations | |
| max_consecutive_errors_per_request: int = 3 | |
| terminate_on_unknown_calls: bool = False | |
| include_detailed_errors: bool = False | |
| class AIFunction: | |
| """Wraps Python function for AI model calling.""" | |
| approval_mode: Literal["always_require", "never_require"] | |
| max_invocations: int # Per-function invocation limit | |
| max_invocation_exceptions: int # Per-function error limit | |
| invocation_count: int # Tracks usage | |
| ``` | |
| **What DeepBoner Has:** | |
| - `max_iterations` in Settings | |
| - Basic tool execution | |
| - No per-tool configuration | |
| - No approval mode | |
| **Gap Impact:** | |
| - Cannot limit individual tool usage | |
| - No human-in-the-loop for dangerous tools | |
| - No per-tool error budgets | |
| --- | |
| ## Gap 5: Context Provider Lifecycle (LOW VALUE) | |
| **What MS Framework Has:** | |
| ```python | |
| # _memory.py | |
| class ContextProvider(ABC): | |
| """Abstract pattern for injecting context into agent invocations.""" | |
| async def invoking(self, agent, thread) -> str | None: | |
| """Called before agent invocation. Returns context to inject.""" | |
| ... | |
| async def invoked(self, agent, thread, result): | |
| """Called after agent invocation.""" | |
| ... | |
| async def thread_created(self, thread): | |
| """Called when new thread is created.""" | |
| ... | |
| class AggregateContextProvider(ContextProvider): | |
| """Combines multiple context providers.""" | |
| ... | |
| ``` | |
| **What DeepBoner Has:** | |
| - `ResearchMemory` as simple state container | |
| - No lifecycle hooks | |
| - No provider aggregation | |
| --- | |
| ## Gap 6: Exception Granularity (LOW VALUE) | |
| **What MS Framework Has:** | |
| ```text | |
| AgentFrameworkException (base) | |
| βββ AgentException | |
| β βββ AgentExecutionException | |
| β βββ AgentInitializationError | |
| β βββ AgentThreadException | |
| βββ ChatClientException | |
| β βββ ChatClientInitializationError | |
| βββ ServiceException | |
| β βββ ServiceInitializationError | |
| β βββ ServiceResponseException | |
| β β βββ ServiceContentFilterException | |
| β β βββ ServiceInvalidExecutionSettingsError | |
| β β βββ ServiceInvalidResponseError | |
| β βββ ServiceInvalidAuthError | |
| βββ ToolException | |
| β βββ ToolExecutionException | |
| βββ MiddlewareException | |
| βββ ContentError | |
| ``` | |
| **What DeepBoner Has:** | |
| ```text | |
| DeepBonerError (base) | |
| βββ SearchError | |
| β βββ RateLimitError | |
| βββ JudgeError | |
| βββ ConfigurationError | |
| βββ EmbeddingError | |
| ``` | |
| **Gap Impact:** | |
| - Less precise error handling | |
| - Harder to distinguish error sources | |
| - Less informative error messages for users | |
| --- | |
| ## Prioritized Implementation Roadmap | |
| ### Phase 1: Quick Wins (1-2 days) | |
| 1. Add token tracking to orchestrator (no OTEL yet, just counters) | |
| 2. Add `max_consecutive_errors` to tool execution | |
| ### Phase 2: Medium Effort (3-5 days) | |
| 1. Add basic middleware pattern to orchestrator | |
| 2. Implement thread serialization for `ResearchMemory` | |
| ### Phase 3: Full Production (1-2 weeks) | |
| 1. Full OpenTelemetry integration | |
| 2. Complete middleware pipeline | |
| 3. Context provider lifecycle hooks | |
| --- | |
| ## Related Issues | |
| - **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md` | |
| - **MS Framework Reference:** `reference_repos/microsoft-agent-framework/` | |
| --- | |
| ## Notes | |
| These gaps are P3 because: | |
| 1. DeepBoner is functional without them | |
| 2. They're architectural improvements, not bug fixes | |
| 3. User-facing functionality is not impacted | |
| However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2. | |