# P3: Microsoft Agent Framework Gaps Analysis

**Date:** 2025-12-06
**Priority:** P3 (Nice-to-Have)
**Source:** Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)

## Executive Summary

Comparing DeepBoner's implementation with the Microsoft Agent Framework reveals six architectural patterns that DeepBoner currently lacks. None of them is a bug; each is an opportunity to harden DeepBoner and make it production-ready.

---

## Gap 1: OpenTelemetry Observability (HIGH VALUE)

**What MS Framework Has:**
```python
# observability.py - 1706 lines of comprehensive OTEL integration
from opentelemetry.trace import get_tracer, Span
from opentelemetry.metrics import get_meter, Histogram

# Class decorators (shown without their targets):
#   @use_observability         -> applied to ChatClient classes
#   @use_agent_observability   -> applied to Agent classes

# Token usage histograms with bucket boundaries
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)

# Operation duration histograms
OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)

# 80+ semantic span attributes (OtelAttr enum)
OtelAttr.GEN_AI_OPERATION_NAME
OtelAttr.GEN_AI_REQUEST_MODEL
OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
```
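
The token-usage boundaries in this excerpt are consecutive powers of four (4^0 through 4^10), and the listed duration boundaries double from 0.01 s (the tail of that tuple is elided in the excerpt). The stdlib-only sketch below checks the first property and shows how an explicit-bucket histogram assigns a value to a bucket, assuming OpenTelemetry's convention that each bucket's upper bound is inclusive. It does not use OpenTelemetry itself, and `bucket_index` is a hypothetical helper.

```python
from bisect import bisect_left

# Copied from the excerpt above (tokens per request).
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)

# Each boundary is four times the previous one: 4**0 .. 4**10.
assert TOKEN_USAGE_BUCKET_BOUNDARIES == tuple(4**k for k in range(11))


def bucket_index(value: float, boundaries: tuple[float, ...]) -> int:
    """Return the histogram bucket a value falls into.

    Bucket 0 is (-inf, b[0]], bucket i is (b[i-1], b[i]], and the last bucket
    (index len(boundaries)) is (b[-1], +inf). Because upper bounds are
    inclusive, bisect_left yields exactly this index.
    """
    return bisect_left(boundaries, value)


# A 1,500-token request lands in the (1024, 4096] bucket.
idx = bucket_index(1500, TOKEN_USAGE_BUCKET_BOUNDARIES)
print(idx, TOKEN_USAGE_BUCKET_BOUNDARIES[idx - 1], TOKEN_USAGE_BUCKET_BOUNDARIES[idx])  # prints: 6 1024 4096
```

Geometric boundaries keep the bucket count small (12 buckets) while still resolving both tiny and very large requests.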

**What DeepBoner Has:**
- `structlog` for logging only
- No distributed tracing
- No metrics collection
- No token usage tracking

**Gap Impact:**
- Cannot trace requests across agents
- No token cost monitoring
- No performance profiling in production

**Recommended Fix:**
```python
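# Add optional OTEL support to orchestrator
# src/observability/__init__.py
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Any

from opentelemetry import trace
from opentelemetry.metrics import get_meter

def setup_observability() -> None:
    """One-time setup for OpenTelemetry."""
    # e.g. configure TracerProvider/MeterProvider and exporters via opentelemetry-sdk
    ...

@contextmanager
def trace_agent_operation(name: str, attributes: dict[str, Any] | None = None) -> Iterator[trace.Span]:
    """Context manager for tracing agent operations."""
    # Without setup_observability(), the API falls back to a no-op tracer.
    tracer = trace.get_tracer(__name__)
    with tracer.start_as_current_span(name, attributes=attributes) as span:
        yield span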
```
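
As a stepping stone before full OTEL, the roadmap's Phase 1 calls for plain token counters. A minimal in-process sketch follows; `TokenUsageTracker` and the agent names are hypothetical, and it assumes the orchestrator can read input and output token counts from each model response (where those counts come from depends on the chat client and is not covered here).

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenUsageTracker:
    """Hypothetical in-process token counters, keyed by agent name."""

    input_tokens: dict[str, int] = field(default_factory=lambda: defaultdict(int))
    output_tokens: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> None:
        # Accumulate per-agent totals; negative counts indicate a caller bug.
        if input_tokens < 0 or output_tokens < 0:
            raise ValueError("token counts must be non-negative")
        self.input_tokens[agent] += input_tokens
        self.output_tokens[agent] += output_tokens

    def total(self) -> int:
        # Input plus output tokens across all agents.
        return sum(self.input_tokens.values()) + sum(self.output_tokens.values())


tracker = TokenUsageTracker()
tracker.record("search", input_tokens=1200, output_tokens=300)
tracker.record("judge", input_tokens=800, output_tokens=150)
tracker.record("search", input_tokens=400, output_tokens=100)
print(tracker.input_tokens["search"], tracker.total())  # prints: 1600 2950
```

Keeping input and output separate matters because providers usually price them differently; the same per-agent keys can later become span or metric attributes once OTEL is added.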

---

## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001

> **NOTE:** This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)

**What MS Framework Has:**
```python
# _middleware.py - Three types of middleware

class AgentMiddleware(ABC):
    """Intercepts agent invocations."""
    async def process(self, context: AgentRunContext, next): ...

class FunctionMiddleware(ABC):
    """Intercepts tool/function calls."""
    async def process(self, context: FunctionInvocationContext, next): ...

class ChatMiddleware(ABC):
    """Intercepts chat client requests."""
    async def process(self, context: ChatContext, next): ...

# Decorators for easy middleware creation
@agent_middleware
async def logging_middleware(context: AgentRunContext, next):
    print(f"Before: {context.agent.name}")
    await next(context)
    print(f"After: {context.result}")

# Pipeline execution with terminate support
context.terminate = True  # Stops pipeline early
```
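
The `process(context, next)` shape in this excerpt is a chain of responsibility: each middleware can act before and after awaiting `next`, and a terminate flag cuts the chain short. The stdlib-only sketch below reproduces that control flow under stated assumptions; `Context`, `run_pipeline`, and the middleware functions are hypothetical names rather than the framework's API, and the sketch assumes the flag is checked before each downstream step.

```python
import asyncio
from collections.abc import Awaitable, Callable
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Context:
    """Hypothetical invocation context passed down the pipeline."""

    request: str
    result: Any = None
    terminate: bool = False
    log: list[str] = field(default_factory=list)


Next = Callable[[Context], Awaitable[None]]
Middleware = Callable[[Context, Next], Awaitable[None]]


async def run_pipeline(middlewares: list[Middleware], handler: Next, context: Context) -> Context:
    """Run middlewares in order, then the handler, unless one sets terminate."""

    async def call(index: int, ctx: Context) -> None:
        if ctx.terminate:
            return  # a middleware asked to stop: skip everything downstream
        if index == len(middlewares):
            await handler(ctx)
        else:
            # Each middleware receives a `next` that invokes the rest of the chain.
            await middlewares[index](ctx, lambda c: call(index + 1, c))

    await call(0, context)
    return context


async def logging_middleware(ctx: Context, next: Next) -> None:
    ctx.log.append(f"before: {ctx.request}")
    await next(ctx)
    ctx.log.append(f"after: {ctx.result}")


async def reject_empty_middleware(ctx: Context, next: Next) -> None:
    if not ctx.request.strip():
        ctx.result = "rejected"
        ctx.terminate = True  # downstream steps, including the handler, are skipped
    await next(ctx)


async def agent(ctx: Context) -> None:
    ctx.result = f"answer:{ctx.request}"  # stand-in for the real agent call


ctx = asyncio.run(run_pipeline([logging_middleware, reject_empty_middleware], agent, Context("q1")))
print(ctx.log)  # prints: ['before: q1', 'after: answer:q1']
```

Note that outer middlewares still run their "after" code when an inner one terminates, which is what lets logging record rejected requests.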

**What DeepBoner Has:**
- Uses MS decorators (`@use_chat_middleware`, `@use_observability`) ✓
- BUT: No custom `ChatMiddleware` class implementations ✗
- `src/middleware/` folder contains a workflow, not actual middleware ✗

**ADR-001 Solution:**
1. Rename `src/middleware/` → `src/workflows/` (fix misleading name)
2. Create proper `src/middleware/` with MS-pattern implementations:
   - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
   - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
   - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
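
This document does not spell out how `RetryMiddleware` should behave or what the HuggingFace retry bug is. One plausible shape, assuming retries with exponential backoff around the downstream chat call, is sketched below. To stay self-contained it does not subclass the framework's `ChatMiddleware`, uses a plain dict as the context, and the names `RetryMiddleware`, `TransientError`, `max_attempts`, and `base_delay` are hypothetical; which exceptions count as retryable is a decision for ADR-001.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying (e.g. rate limits, HTTP 5xx)."""


class RetryMiddleware:
    """Retry the downstream call with exponential backoff (sketch)."""

    def __init__(self, max_attempts: int = 3, base_delay: float = 0.5) -> None:
        if max_attempts < 1:
            raise ValueError("max_attempts must be >= 1")
        self.max_attempts = max_attempts
        self.base_delay = base_delay

    async def process(
        self,
        context: dict[str, Any],
        next: Callable[[dict[str, Any]], Awaitable[None]],
    ) -> None:
        for attempt in range(1, self.max_attempts + 1):
            try:
                await next(context)
                return
            except TransientError:
                if attempt == self.max_attempts:
                    raise  # out of attempts: surface the last error
                # Delays grow as base, 2*base, 4*base, ...
                await asyncio.sleep(self.base_delay * 2 ** (attempt - 1))


async def flaky_chat(context: dict[str, Any]) -> None:
    # Fails twice, then succeeds: stands in for an unreliable inference endpoint.
    context["calls"] = context.get("calls", 0) + 1
    if context["calls"] < 3:
        raise TransientError("HTTP 503")
    context["response"] = "ok"


ctx: dict[str, Any] = {}
asyncio.run(RetryMiddleware(max_attempts=3, base_delay=0.01).process(ctx, flaky_chat))
print(ctx)  # prints: {'calls': 3, 'response': 'ok'}
```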

---

## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)

**What MS Framework Has:**
```python
# _threads.py
class AgentThread:
    """Maintains conversation state with serialization support."""

    def __init__(self, service_thread_id=None, message_store=None):
        ...

    async def serialize(self) -> dict[str, Any]:
        """Persist thread state."""
        ...

    @classmethod
    async def deserialize(cls, state: dict) -> "AgentThread":
        """Restore thread from persisted state."""
        ...

class ChatMessageStoreProtocol(Protocol):
    """Protocol for message storage backends."""
    async def list_messages(self) -> list[ChatMessage]: ...
    async def add_messages(self, messages: Sequence[ChatMessage]): ...
```

**What DeepBoner Has:**
- `ResearchMemory` for research state only
- No conversation persistence
- No serialization/deserialization

**Gap Impact:**
- Cannot resume interrupted research sessions
- Cannot persist conversation history
- Cannot implement checkpointing
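
Resuming an interrupted session comes down to the serialize/deserialize pair shown for `AgentThread`: reduce the thread to a plain dict, persist it, and rebuild it later. A minimal synchronous sketch for DeepBoner follows, assuming messages can be reduced to `role`/`text` pairs and that JSON is an acceptable storage format. `Message`, `ResearchThread`, and the `version` field are hypothetical and are not part of `ResearchMemory` or the framework.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Message:
    role: str  # e.g. "user" or "assistant"
    text: str


@dataclass
class ResearchThread:
    """Hypothetical conversation thread with JSON-friendly persistence."""

    thread_id: str
    messages: list[Message] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        self.messages.append(Message(role, text))

    def serialize(self) -> dict[str, Any]:
        # A version tag lets future code migrate or reject older snapshots.
        return {
            "version": 1,
            "thread_id": self.thread_id,
            "messages": [asdict(m) for m in self.messages],
        }

    @classmethod
    def deserialize(cls, state: dict[str, Any]) -> "ResearchThread":
        if state.get("version") != 1:
            raise ValueError(f"unsupported thread state version: {state.get('version')!r}")
        return cls(state["thread_id"], [Message(**m) for m in state["messages"]])


thread = ResearchThread("t-1")
thread.add("user", "Summarize recent trials")
saved = json.dumps(thread.serialize())  # persist anywhere: file, database, cache
restored = ResearchThread.deserialize(json.loads(saved))
print(restored == thread)  # prints: True
```

The same snapshot written after each agent step would also serve as a checkpoint.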

---

## Gap 4: Function/Tool Configuration (MEDIUM VALUE)

**What MS Framework Has:**
```python
# _tools.py
class FunctionInvocationConfiguration:
    """Configuration for function invocation in chat clients."""

    enabled: bool = True
    max_iterations: int = 40  # Maximum tool loop iterations
    max_consecutive_errors_per_request: int = 3
    terminate_on_unknown_calls: bool = False
    include_detailed_errors: bool = False

class AIFunction:
    """Wraps Python function for AI model calling."""
    approval_mode: Literal["always_require", "never_require"]
    max_invocations: int  # Per-function invocation limit
    max_invocation_exceptions: int  # Per-function error limit
    invocation_count: int  # Tracks usage
```
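
The loop-level settings in `FunctionInvocationConfiguration` amount to two guards: a hard cap on iterations and a budget of consecutive failures that resets after any success. The sketch below is a simplification under stated assumptions: `run_tool_calls` and `TooManyToolErrors` are hypothetical, it treats each tool call as one iteration (the framework counts model round-trips), and it assumes the loop stops once the consecutive-error count reaches the limit.

```python
from collections.abc import Callable, Sequence
from typing import Any


class TooManyToolErrors(RuntimeError):
    """Raised when consecutive tool failures reach the configured limit."""


def run_tool_calls(
    calls: Sequence[Callable[[], Any]],
    max_iterations: int = 40,
    max_consecutive_errors: int = 3,
) -> list[Any]:
    """Execute tool calls in order, tolerating isolated failures.

    Failures are kept as exception objects (so a model could see them), and
    the consecutive-error count resets after any successful call.
    """
    results: list[Any] = []
    consecutive_errors = 0
    for call in list(calls)[:max_iterations]:  # hard cap on loop length
        try:
            results.append(call())
            consecutive_errors = 0
        except Exception as exc:
            results.append(exc)
            consecutive_errors += 1
            if consecutive_errors >= max_consecutive_errors:
                raise TooManyToolErrors(
                    f"{consecutive_errors} consecutive tool errors; last: {exc!r}"
                ) from exc
    return results


def ok() -> str:
    return "ok"


def boom() -> str:
    raise TimeoutError("upstream timeout")


print(len(run_tool_calls([boom, boom, ok, boom, ok])))  # prints: 5
```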

**What DeepBoner Has:**
- `max_iterations` in Settings
- Basic tool execution
- No per-tool configuration
- No approval mode

**Gap Impact:**
- Cannot limit individual tool usage
- No human-in-the-loop for dangerous tools
- No per-tool error budgets
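
The per-function fields on `AIFunction` (invocation limit, error limit, approval mode) can be approximated by a small wrapper around each tool. In this sketch `LimitedTool`, `ToolLimitExceeded`, and `search_pubmed` are hypothetical, and approval is modeled as a callback returning `True` or `False`, an assumption standing in for whatever human-in-the-loop flow DeepBoner would use.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass
from typing import Any, Literal


class ToolLimitExceeded(RuntimeError):
    """Raised when a tool has exhausted its call or error budget."""


@dataclass
class LimitedTool:
    """Hypothetical per-tool wrapper mirroring the AIFunction fields above."""

    func: Callable[..., Any]
    max_invocations: int | None = None  # None means unlimited
    max_invocation_exceptions: int | None = None
    approval_mode: Literal["always_require", "never_require"] = "never_require"
    approve: Callable[[str, dict[str, Any]], bool] | None = None  # approval hook
    invocation_count: int = 0
    exception_count: int = 0

    def __call__(self, **kwargs: Any) -> Any:
        name = getattr(self.func, "__name__", "tool")
        if self.max_invocations is not None and self.invocation_count >= self.max_invocations:
            raise ToolLimitExceeded(f"{name}: invocation limit reached")
        if (self.max_invocation_exceptions is not None
                and self.exception_count >= self.max_invocation_exceptions):
            raise ToolLimitExceeded(f"{name}: error budget exhausted")
        if self.approval_mode == "always_require":
            # Deny by default when no approver is configured.
            if self.approve is None or not self.approve(name, kwargs):
                raise PermissionError(f"{name}: call not approved")
        self.invocation_count += 1
        try:
            return self.func(**kwargs)
        except Exception:
            self.exception_count += 1  # counts toward the per-tool error budget
            raise


def search_pubmed(query: str) -> str:  # hypothetical tool
    return f"results for {query}"


tool = LimitedTool(search_pubmed, max_invocations=2)
tool(query="a")
tool(query="b")
try:
    tool(query="c")
except ToolLimitExceeded as exc:
    print(exc)  # prints: search_pubmed: invocation limit reached
```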

---

## Gap 5: Context Provider Lifecycle (LOW VALUE)

**What MS Framework Has:**
```python
# _memory.py
class ContextProvider(ABC):
    """Abstract pattern for injecting context into agent invocations."""

    async def invoking(self, agent, thread) -> str | None:
        """Called before agent invocation. Returns context to inject."""
        ...

    async def invoked(self, agent, thread, result):
        """Called after agent invocation."""
        ...

    async def thread_created(self, thread):
        """Called when new thread is created."""
        ...

class AggregateContextProvider(ContextProvider):
    """Combines multiple context providers."""
    ...
```

**What DeepBoner Has:**
- `ResearchMemory` as simple state container
- No lifecycle hooks
- No provider aggregation
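
A lifecycle-hook version of this pattern needs only two hooks around each agent call plus an aggregator that fans out to several providers. The sketch below simplifies the excerpt (no `thread` argument, no `thread_created`) and uses hypothetical names throughout: `StaticHints`, `ResultLog`, and `run_agent` stand in for real providers and a real agent call.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Any


class ContextProvider(ABC):
    """Hypothetical hooks around an agent call (simplified from the excerpt)."""

    @abstractmethod
    async def invoking(self, agent: str) -> str | None:
        """Return extra context to inject before the call, or None."""

    async def invoked(self, agent: str, result: Any) -> None:
        """Observe the result after the call (no-op by default)."""


class AggregateContextProvider(ContextProvider):
    """Fan hooks out to several providers and join their context strings."""

    def __init__(self, providers: list[ContextProvider]) -> None:
        self.providers = providers

    async def invoking(self, agent: str) -> str | None:
        parts = await asyncio.gather(*(p.invoking(agent) for p in self.providers))
        joined = "\n".join(part for part in parts if part)
        return joined or None

    async def invoked(self, agent: str, result: Any) -> None:
        await asyncio.gather(*(p.invoked(agent, result) for p in self.providers))


class StaticHints(ContextProvider):
    def __init__(self, text: str) -> None:
        self.text = text

    async def invoking(self, agent: str) -> str | None:
        return self.text


class ResultLog(ContextProvider):
    """Remembers prior results and offers them as context on later calls."""

    def __init__(self) -> None:
        self.results: list[str] = []

    async def invoking(self, agent: str) -> str | None:
        return f"Previously found: {'; '.join(self.results)}" if self.results else None

    async def invoked(self, agent: str, result: Any) -> None:
        self.results.append(str(result))


async def run_agent(provider: ContextProvider, agent: str, prompt: str) -> str:
    """Stand-in for an agent call: returns the prompt that would be sent."""
    extra = await provider.invoking(agent)
    sent = f"{extra}\n{prompt}" if extra else prompt
    await provider.invoked(agent, f"result of {prompt}")
    return sent


log = ResultLog()
providers = AggregateContextProvider([StaticHints("Cite sources."), log])
first = asyncio.run(run_agent(providers, "search", "q1"))
second = asyncio.run(run_agent(providers, "search", "q2"))
print(second)
```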

---

## Gap 6: Exception Granularity (LOW VALUE)

**What MS Framework Has:**
```text
AgentFrameworkException (base)
├── AgentException
│   ├── AgentExecutionException
│   ├── AgentInitializationError
│   └── AgentThreadException
├── ChatClientException
│   └── ChatClientInitializationError
├── ServiceException
│   ├── ServiceInitializationError
│   ├── ServiceResponseException
│   │   ├── ServiceContentFilterException
│   │   ├── ServiceInvalidExecutionSettingsError
│   │   └── ServiceInvalidResponseError
│   └── ServiceInvalidAuthError
├── ToolException
│   └── ToolExecutionException
├── MiddlewareException
└── ContentError
```

**What DeepBoner Has:**
```text
DeepBonerError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
├── ConfigurationError
└── EmbeddingError
```

**Gap Impact:**
- Less precise error handling
- Harder to distinguish error sources
- Less informative error messages for users
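
Finer granularity pays off mainly at `except` sites, where callers can branch on the error's source and carry structured details to the user. The sketch mirrors DeepBoner's existing tree with simplified class bodies (the real classes may take extra arguments) and adds one hypothetical subclass, `ToolExecutionError`, modeled loosely on the framework's `ToolExecutionException`; `user_message` is likewise hypothetical.

```python
class DeepBonerError(Exception):
    """Base class; structure mirrors the tree above, bodies are simplified."""


class SearchError(DeepBonerError): ...
class RateLimitError(SearchError): ...
class JudgeError(DeepBonerError): ...
class ConfigurationError(DeepBonerError): ...
class EmbeddingError(DeepBonerError): ...


class ToolExecutionError(DeepBonerError):
    """Hypothetical addition: a tool call failed, and we record which tool."""

    def __init__(self, tool_name: str, message: str) -> None:
        super().__init__(f"{tool_name}: {message}")
        self.tool_name = tool_name


def user_message(exc: DeepBonerError) -> str:
    # Check the most specific classes first: RateLimitError must precede SearchError.
    if isinstance(exc, RateLimitError):
        return "Search is rate-limited; retrying later may help."
    if isinstance(exc, SearchError):
        return "A literature search failed."
    if isinstance(exc, ToolExecutionError):
        return f"The tool {exc.tool_name!r} failed."
    if isinstance(exc, ConfigurationError):
        return "DeepBoner is misconfigured; check your settings."
    return "An internal error occurred."


print(user_message(ToolExecutionError("pubmed_search", "HTTP 500")))  # prints: The tool 'pubmed_search' failed.
```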

---

## Prioritized Implementation Roadmap

### Phase 1: Quick Wins (1-2 days)
1. Add token tracking to orchestrator (no OTEL yet, just counters)
2. Add `max_consecutive_errors` to tool execution

### Phase 2: Medium Effort (3-5 days)
1. Add basic middleware pattern to orchestrator
2. Implement thread serialization for `ResearchMemory`

### Phase 3: Full Production (1-2 weeks)
1. Full OpenTelemetry integration
2. Complete middleware pipeline
3. Context provider lifecycle hooks

---

## Related Issues

- **P2 Hardening Issues:** `docs/bugs/p2-hardening-issues.md`
- **MS Framework Reference:** `reference_repos/microsoft-agent-framework/`

---

## Notes

These gaps are P3 because:
1. DeepBoner is functional without them
2. They're architectural improvements, not bug fixes
3. User-facing functionality is not impacted

However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.