A newer version of the Gradio SDK is available:
6.1.0
P3: Microsoft Agent Framework Gaps Analysis
Date: 2025-12-06 Priority: P3 (Nice-to-Have) Source: Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)
Executive Summary
Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness.
Gap 1: OpenTelemetry Observability (HIGH VALUE)
What MS Framework Has:
# observability.py - 1706 lines of comprehensive OTEL integration
from opentelemetry.trace import get_tracer, Span
from opentelemetry.metrics import get_meter, Histogram
@use_observability # Decorator for ChatClient
@use_agent_observability # Decorator for Agent
# Token usage histograms with bucket boundaries
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)
# Operation duration histograms
OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)
# 80+ semantic span attributes (OtelAttr enum)
OtelAttr.GEN_AI_OPERATION_NAME
OtelAttr.GEN_AI_REQUEST_MODEL
OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
What DeepBoner Has:
structlogfor logging only- No distributed tracing
- No metrics collection
- No token usage tracking
Gap Impact:
- Cannot trace requests across agents
- No token cost monitoring
- No performance profiling in production
Recommended Fix:
# Add optional OTEL support to orchestrator
# src/observability/__init__.py
from opentelemetry import trace
from opentelemetry.metrics import get_meter
def setup_observability():
"""One-time setup for OpenTelemetry."""
...
@contextmanager
def trace_agent_operation(name: str, attributes: dict):
"""Context manager for tracing agent operations."""
...
Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001
NOTE: This gap is being addressed in ADR-001: Middleware Architecture Refactor
What MS Framework Has:
# _middleware.py - Three types of middleware
class AgentMiddleware(ABC):
"""Intercepts agent invocations."""
async def process(self, context: AgentRunContext, next): ...
class FunctionMiddleware(ABC):
"""Intercepts tool/function calls."""
async def process(self, context: FunctionInvocationContext, next): ...
class ChatMiddleware(ABC):
"""Intercepts chat client requests."""
async def process(self, context: ChatContext, next): ...
# Decorators for easy middleware creation
@agent_middleware
async def logging_middleware(context: AgentRunContext, next):
print(f"Before: {context.agent.name}")
await next(context)
print(f"After: {context.result}")
# Pipeline execution with terminate support
context.terminate = True # Stops pipeline early
What DeepBoner Has:
- Uses MS decorators (
@use_chat_middleware,@use_observability) β - BUT: No custom
ChatMiddlewareclass implementations β src/middleware/folder contains a workflow, not actual middleware β
ADR-001 Solution:
- Rename
src/middleware/βsrc/workflows/(fix misleading name) - Create proper
src/middleware/with MS-pattern implementations:RetryMiddleware(ChatMiddleware)- fixes HuggingFace retry bugTokenTrackingMiddleware(ChatMiddleware)- enables cost monitoringLoggingMiddleware(ChatMiddleware)- structured request/response logs
Gap 3: Thread/Conversation State Management (MEDIUM VALUE)
What MS Framework Has:
# _threads.py
class AgentThread:
"""Maintains conversation state with serialization support."""
def __init__(self, service_thread_id=None, message_store=None):
...
async def serialize(self) -> dict[str, Any]:
"""Persist thread state."""
...
@classmethod
async def deserialize(cls, state: dict) -> "AgentThread":
"""Restore thread from persisted state."""
...
class ChatMessageStoreProtocol(Protocol):
"""Protocol for message storage backends."""
async def list_messages(self) -> list[ChatMessage]: ...
async def add_messages(self, messages: Sequence[ChatMessage]): ...
What DeepBoner Has:
ResearchMemoryfor research state only- No conversation persistence
- No serialization/deserialization
Gap Impact:
- Cannot resume interrupted research sessions
- Cannot persist conversation history
- Cannot implement checkpointing
Gap 4: Function/Tool Configuration (MEDIUM VALUE)
What MS Framework Has:
# _tools.py
class FunctionInvocationConfiguration:
"""Configuration for function invocation in chat clients."""
enabled: bool = True
max_iterations: int = 40 # Maximum tool loop iterations
max_consecutive_errors_per_request: int = 3
terminate_on_unknown_calls: bool = False
include_detailed_errors: bool = False
class AIFunction:
"""Wraps Python function for AI model calling."""
approval_mode: Literal["always_require", "never_require"]
max_invocations: int # Per-function invocation limit
max_invocation_exceptions: int # Per-function error limit
invocation_count: int # Tracks usage
What DeepBoner Has:
max_iterationsin Settings- Basic tool execution
- No per-tool configuration
- No approval mode
Gap Impact:
- Cannot limit individual tool usage
- No human-in-the-loop for dangerous tools
- No per-tool error budgets
Gap 5: Context Provider Lifecycle (LOW VALUE)
What MS Framework Has:
# _memory.py
class ContextProvider(ABC):
"""Abstract pattern for injecting context into agent invocations."""
async def invoking(self, agent, thread) -> str | None:
"""Called before agent invocation. Returns context to inject."""
...
async def invoked(self, agent, thread, result):
"""Called after agent invocation."""
...
async def thread_created(self, thread):
"""Called when new thread is created."""
...
class AggregateContextProvider(ContextProvider):
"""Combines multiple context providers."""
...
What DeepBoner Has:
ResearchMemoryas simple state container- No lifecycle hooks
- No provider aggregation
Gap 6: Exception Granularity (LOW VALUE)
What MS Framework Has:
AgentFrameworkException (base)
βββ AgentException
β βββ AgentExecutionException
β βββ AgentInitializationError
β βββ AgentThreadException
βββ ChatClientException
β βββ ChatClientInitializationError
βββ ServiceException
β βββ ServiceInitializationError
β βββ ServiceResponseException
β β βββ ServiceContentFilterException
β β βββ ServiceInvalidExecutionSettingsError
β β βββ ServiceInvalidResponseError
β βββ ServiceInvalidAuthError
βββ ToolException
β βββ ToolExecutionException
βββ MiddlewareException
βββ ContentError
What DeepBoner Has:
DeepBonerError (base)
βββ SearchError
β βββ RateLimitError
βββ JudgeError
βββ ConfigurationError
βββ EmbeddingError
Gap Impact:
- Less precise error handling
- Harder to distinguish error sources
- Less informative error messages for users
Prioritized Implementation Roadmap
Phase 1: Quick Wins (1-2 days)
- Add token tracking to orchestrator (no OTEL yet, just counters)
- Add
max_consecutive_errorsto tool execution
Phase 2: Medium Effort (3-5 days)
- Add basic middleware pattern to orchestrator
- Implement thread serialization for
ResearchMemory
Phase 3: Full Production (1-2 weeks)
- Full OpenTelemetry integration
- Complete middleware pipeline
- Context provider lifecycle hooks
Related Issues
- P2 Hardening Issues:
docs/bugs/p2-hardening-issues.md - MS Framework Reference:
reference_repos/microsoft-agent-framework/
Notes
These gaps are P3 because:
- DeepBoner is functional without them
- They're architectural improvements, not bug fixes
- User-facing functionality is not impacted
However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.