# P3: Microsoft Agent Framework Gaps Analysis
**Date:** 2025-12-06
**Priority:** P3 (Nice-to-Have)
**Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)
## Executive Summary
Comparing DeepBoner's implementation against the Microsoft Agent Framework reveals six architectural patterns that DeepBoner lacks or only partly implements. These are not bugs but opportunities for hardening and production readiness.

---
## Gap 1: OpenTelemetry Observability (HIGH VALUE)
**What MS Framework Has:**
```python
# observability.py - 1706 lines of comprehensive OTEL integration
from opentelemetry.trace import get_tracer, Span
from opentelemetry.metrics import get_meter, Histogram

# Class decorators provided by the framework:
#   @use_observability        - decorator for ChatClient
#   @use_agent_observability  - decorator for Agent

# Token usage histograms with bucket boundaries
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)

# Operation duration histograms
OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)

# 80+ semantic span attributes (OtelAttr enum)
OtelAttr.GEN_AI_OPERATION_NAME
OtelAttr.GEN_AI_REQUEST_MODEL
OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
```
**What DeepBoner Has:**
- `structlog` for logging only
- No distributed tracing
- No metrics collection
- No token usage tracking
**Gap Impact:**
- Cannot trace requests across agents
- No token cost monitoring
- No performance profiling in production
**Recommended Fix:**
```python
# Add optional OTEL support to orchestrator
# src/observability/__init__.py
from contextlib import contextmanager

from opentelemetry import trace
from opentelemetry.metrics import get_meter


def setup_observability():
    """One-time setup for OpenTelemetry."""
    ...


@contextmanager
def trace_agent_operation(name: str, attributes: dict):
    """Context manager for tracing agent operations."""
    ...
```
---
## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001
> **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)

**What MS Framework Has:**
```python
# _middleware.py - Three types of middleware
class AgentMiddleware(ABC):
    """Intercepts agent invocations."""
    async def process(self, context: AgentRunContext, next): ...


class FunctionMiddleware(ABC):
    """Intercepts tool/function calls."""
    async def process(self, context: FunctionInvocationContext, next): ...


class ChatMiddleware(ABC):
    """Intercepts chat client requests."""
    async def process(self, context: ChatContext, next): ...


# Decorators for easy middleware creation
@agent_middleware
async def logging_middleware(context: AgentRunContext, next):
    print(f"Before: {context.agent.name}")
    await next(context)
    print(f"After: {context.result}")


# Pipeline execution with terminate support
context.terminate = True  # Stops pipeline early
```
**What DeepBoner Has:**
- Uses MS decorators (`@use_chat_middleware`, `@use_observability`) ✅
- BUT: No custom `ChatMiddleware` class implementations ❌
- `src/middleware/` folder contains a workflow, not actual middleware ❌
**ADR-001 Solution:**
1. Rename `src/middleware/` → `src/workflows/` (fix misleading name)
2. Create proper `src/middleware/` with MS-pattern implementations:
   - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
   - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
   - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
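
The middleware classes listed above could be chained roughly as follows. This is a standalone sketch, not the MS framework's real `ChatMiddleware` API: `ChatContext`, `run_pipeline`, the `max_attempts` parameter, and the whitespace-based token estimate are all hypothetical and chosen only to illustrate the wrapping order.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


# Hypothetical stand-in; NOT the MS framework's ChatContext.
@dataclass
class ChatContext:
    prompt: str
    result: Any = None
    attempts: int = 0


Handler = Callable[[ChatContext], Awaitable[None]]


class RetryMiddleware:
    """Re-runs the rest of the pipeline on ConnectionError, up to max_attempts."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts

    async def process(self, context: ChatContext, next: Handler) -> None:
        for attempt in range(1, self.max_attempts + 1):
            context.attempts = attempt
            try:
                await next(context)
                return
            except ConnectionError:
                if attempt == self.max_attempts:
                    raise  # budget exhausted: surface the last error


class TokenTrackingMiddleware:
    """Accumulates a rough token count (whitespace split, an assumption)."""

    def __init__(self) -> None:
        self.total_tokens = 0

    async def process(self, context: ChatContext, next: Handler) -> None:
        await next(context)  # only counts calls that succeeded
        self.total_tokens += len(context.prompt.split())
        self.total_tokens += len(str(context.result).split())


def _bind(middleware: Any, nxt: Handler) -> Handler:
    """Turn one middleware plus its successor into a plain handler."""
    async def call(ctx: ChatContext) -> None:
        await middleware.process(ctx, nxt)
    return call


async def run_pipeline(middlewares: list, context: ChatContext, final: Handler) -> None:
    """Wrap `final` so that the first middleware in the list runs outermost."""
    handler = final
    for middleware in reversed(middlewares):
        handler = _bind(middleware, handler)
    await handler(context)
```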
---
## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)
**What MS Framework Has:**
```python
# _threads.py
class AgentThread:
    """Maintains conversation state with serialization support."""

    def __init__(self, service_thread_id=None, message_store=None):
        ...

    async def serialize(self) -> dict[str, Any]:
        """Persist thread state."""
        ...

    @classmethod
    async def deserialize(cls, state: dict) -> "AgentThread":
        """Restore thread from persisted state."""
        ...


class ChatMessageStoreProtocol(Protocol):
    """Protocol for message storage backends."""

    async def list_messages(self) -> list[ChatMessage]: ...
    async def add_messages(self, messages: Sequence[ChatMessage]): ...
```
**What DeepBoner Has:**
- `ResearchMemory` for research state only
- No conversation persistence
- No serialization/deserialization
**Gap Impact:**
- Cannot resume interrupted research sessions
- Cannot persist conversation history
- Cannot implement checkpointing
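
Checkpointing of the kind described above could start with a JSON round-trip. The sketch below assumes a hypothetical `ResearchMemorySnapshot` with made-up fields (`question`, `findings`, `iteration`); the real `ResearchMemory` attributes are not shown in this document, so treat the shape as illustrative only.

```python
import json
from dataclasses import asdict, dataclass, field

SNAPSHOT_VERSION = 1  # hypothetical schema version, bumped on breaking changes


# Hypothetical shape; the real ResearchMemory fields may differ.
@dataclass
class ResearchMemorySnapshot:
    question: str
    findings: list[str] = field(default_factory=list)
    iteration: int = 0

    def serialize(self) -> str:
        """Return a JSON string suitable for writing to disk or a database."""
        return json.dumps({"version": SNAPSHOT_VERSION, "state": asdict(self)})

    @classmethod
    def deserialize(cls, payload: str) -> "ResearchMemorySnapshot":
        """Rebuild a snapshot, rejecting payloads with an unknown version."""
        data = json.loads(payload)
        if data.get("version") != SNAPSHOT_VERSION:
            raise ValueError(f"unsupported snapshot version: {data.get('version')!r}")
        return cls(**data["state"])


# Usage: persist after each iteration, restore when a session resumes.
checkpoint = ResearchMemorySnapshot("example question", ["finding A"], 1).serialize()
resumed = ResearchMemorySnapshot.deserialize(checkpoint)
```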
---
## Gap 4: Function/Tool Configuration (MEDIUM VALUE)
**What MS Framework Has:**
```python
# _tools.py
class FunctionInvocationConfiguration:
    """Configuration for function invocation in chat clients."""

    enabled: bool = True
    max_iterations: int = 40  # Maximum tool loop iterations
    max_consecutive_errors_per_request: int = 3
    terminate_on_unknown_calls: bool = False
    include_detailed_errors: bool = False


class AIFunction:
    """Wraps Python function for AI model calling."""

    approval_mode: Literal["always_require", "never_require"]
    max_invocations: int  # Per-function invocation limit
    max_invocation_exceptions: int  # Per-function error limit
    invocation_count: int  # Tracks usage
```
**What DeepBoner Has:**
- `max_iterations` in Settings
- Basic tool execution
- No per-tool configuration
- No approval mode
**Gap Impact:**
- Cannot limit individual tool usage
- No human-in-the-loop for dangerous tools
- No per-tool error budgets
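
The three missing capabilities above (usage limits, human approval, error budgets) fit in a thin wrapper around a tool function. This is a minimal sketch under assumptions: `GuardedTool`, `ToolLimitExceeded`, `ApprovalRequired`, and the `approved` keyword are hypothetical names. Only the field names are borrowed from the `AIFunction` excerpt; the behavior is not claimed to match the MS framework.

```python
from dataclasses import dataclass
from typing import Any, Callable, Literal, Optional


class ToolLimitExceeded(RuntimeError):
    """Raised when a tool has used up its invocation or error budget."""


class ApprovalRequired(RuntimeError):
    """Raised when a tool needs human sign-off before it may run."""


# Hypothetical wrapper; not the MS framework's AIFunction.
@dataclass
class GuardedTool:
    func: Callable[..., Any]
    approval_mode: Literal["always_require", "never_require"] = "never_require"
    max_invocations: Optional[int] = None
    max_invocation_exceptions: Optional[int] = None
    invocation_count: int = 0
    exception_count: int = 0

    def __call__(self, *args: Any, approved: bool = False, **kwargs: Any) -> Any:
        name = self.func.__name__
        if self.approval_mode == "always_require" and not approved:
            raise ApprovalRequired(name)
        if self.max_invocations is not None and self.invocation_count >= self.max_invocations:
            raise ToolLimitExceeded(f"{name}: invocation limit reached")
        if (self.max_invocation_exceptions is not None
                and self.exception_count >= self.max_invocation_exceptions):
            raise ToolLimitExceeded(f"{name}: error budget exhausted")
        self.invocation_count += 1
        try:
            return self.func(*args, **kwargs)
        except Exception:
            self.exception_count += 1  # charge the failure to this tool only
            raise
```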
---
## Gap 5: Context Provider Lifecycle (LOW VALUE)
**What MS Framework Has:**
```python
# _memory.py
class ContextProvider(ABC):
    """Abstract pattern for injecting context into agent invocations."""

    async def invoking(self, agent, thread) -> str | None:
        """Called before agent invocation. Returns context to inject."""
        ...

    async def invoked(self, agent, thread, result):
        """Called after agent invocation."""
        ...

    async def thread_created(self, thread):
        """Called when new thread is created."""
        ...


class AggregateContextProvider(ContextProvider):
    """Combines multiple context providers."""
    ...
```
**What DeepBoner Has:**
- `ResearchMemory` as simple state container
- No lifecycle hooks
- No provider aggregation
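
To make the missing hooks concrete, the sketch below shows how a `ResearchMemory`-style container could gain `invoking`/`invoked` hooks and be aggregated with other providers. The classes are redefined locally so the example is self-contained; `ResearchMemoryProvider` and its `findings` list are hypothetical, and the method signatures only mirror the excerpt above rather than the framework's actual implementation.

```python
import asyncio
from abc import ABC
from typing import Any, Iterable, Optional


class ContextProvider(ABC):
    """Local stand-in with no-op defaults for both hooks."""

    async def invoking(self, agent: Any, thread: Any) -> Optional[str]:
        return None

    async def invoked(self, agent: Any, thread: Any, result: Any) -> None:
        return None


class ResearchMemoryProvider(ContextProvider):
    """Hypothetical provider: injects earlier findings, records new ones."""

    def __init__(self) -> None:
        self.findings: list[str] = []

    async def invoking(self, agent: Any, thread: Any) -> Optional[str]:
        if not self.findings:
            return None
        return "Known findings:\n" + "\n".join(f"- {f}" for f in self.findings)

    async def invoked(self, agent: Any, thread: Any, result: Any) -> None:
        self.findings.append(str(result))


class AggregateContextProvider(ContextProvider):
    """Fans each hook out to every child and joins the non-empty contexts."""

    def __init__(self, providers: Iterable[ContextProvider]) -> None:
        self.providers = list(providers)

    async def invoking(self, agent: Any, thread: Any) -> Optional[str]:
        parts = [await p.invoking(agent, thread) for p in self.providers]
        return "\n\n".join(part for part in parts if part) or None

    async def invoked(self, agent: Any, thread: Any, result: Any) -> None:
        for provider in self.providers:
            await provider.invoked(agent, thread, result)
```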
---
## Gap 6: Exception Granularity (LOW VALUE)
**What MS Framework Has:**
```text
AgentFrameworkException (base)
├── AgentException
│   ├── AgentExecutionException
│   ├── AgentInitializationError
│   └── AgentThreadException
├── ChatClientException
│   └── ChatClientInitializationError
├── ServiceException
│   ├── ServiceInitializationError
│   ├── ServiceResponseException
│   │   ├── ServiceContentFilterException
│   │   ├── ServiceInvalidExecutionSettingsError
│   │   └── ServiceInvalidResponseError
│   └── ServiceInvalidAuthError
├── ToolException
│   └── ToolExecutionException
├── MiddlewareException
└── ContentError
```
**What DeepBoner Has:**
```text
DeepBonerError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
├── ConfigurationError
└── EmbeddingError
```
**Gap Impact:**
- Less precise error handling
- Harder to distinguish error sources
- Less informative error messages for users
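
A finer-grained hierarchy lets callers branch on the error source instead of parsing messages. The sketch below reproduces the existing DeepBoner classes and adds four hypothetical ones (`ToolError`, `ToolExecutionError`, `ServiceError`, `ServiceAuthError`). The `user_message` helper and its wording are likewise assumptions for illustration.

```python
class DeepBonerError(Exception):
    """Existing base class."""


class SearchError(DeepBonerError): ...
class RateLimitError(SearchError): ...
class JudgeError(DeepBonerError): ...
class ConfigurationError(DeepBonerError): ...
class EmbeddingError(DeepBonerError): ...


# Hypothetical additions, loosely modeled on the MS hierarchy.
class ToolError(DeepBonerError): ...
class ToolExecutionError(ToolError): ...
class ServiceError(DeepBonerError): ...
class ServiceAuthError(ServiceError): ...


def user_message(exc: Exception) -> str:
    """Map an exception to a user-facing hint, most specific class first."""
    if isinstance(exc, RateLimitError):
        return "Search provider is rate limiting; retry shortly."
    if isinstance(exc, ServiceAuthError):
        return "Check the API key for the model provider."
    if isinstance(exc, ToolError):
        return "A tool failed; the run can continue without it."
    if isinstance(exc, DeepBonerError):
        return "Research failed; see logs for details."
    return "Unexpected error."
```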
---
## Prioritized Implementation Roadmap
### Phase 1: Quick Wins (1-2 days)
1. Add token tracking to orchestrator (no OTEL yet, just counters)
2. Add `max_consecutive_errors` to tool execution
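
Both quick wins need only plain counters. The sketch below is one possible shape, assuming hypothetical `TokenCounter` and `ConsecutiveErrorBudget` helpers that the orchestrator would call after each model response and tool result. How token counts are obtained from the provider is left out.

```python
from dataclasses import dataclass


@dataclass
class TokenCounter:
    """Running totals for one research session (hypothetical helper)."""

    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


class ConsecutiveErrorBudget:
    """Signals a stop after N tool errors in a row; any success resets it."""

    def __init__(self, max_consecutive_errors: int = 3) -> None:
        self.max_consecutive_errors = max_consecutive_errors
        self._streak = 0

    def record_success(self) -> None:
        self._streak = 0

    def record_error(self) -> bool:
        """Return True once the streak reaches the configured maximum."""
        self._streak += 1
        return self._streak >= self.max_consecutive_errors
```

Keeping these as plain objects means they can later be fed into OTEL histograms in Phase 3 without changing the call sites.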
### Phase 2: Medium Effort (3-5 days)
1. Add basic middleware pattern to orchestrator
2. Implement thread serialization for `ResearchMemory`
### Phase 3: Full Production (1-2 weeks)
1. Full OpenTelemetry integration
2. Complete middleware pipeline
3. Context provider lifecycle hooks
---
## Related Issues
- **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md`
- **MS Framework Reference:** `reference_repos/microsoft-agent-framework/`
---
## Notes
These gaps are P3 because:
1. DeepBoner is functional without them
2. They're architectural improvements, not bug fixes
3. User-facing functionality is not impacted
However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.