	# P3: Microsoft Agent Framework Gaps Analysis

	Date: 2025-12-06
	Priority: P3 (Nice-to-Have)
	Source: Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)

	## Executive Summary

	Comparing DeepBoner's implementation against Microsoft Agent Framework reveals six architectural patterns that DeepBoner lacks or only partially implements. These are not bugs but opportunities for hardening and production readiness.

	---

	## Gap 1: OpenTelemetry Observability (HIGH VALUE)

	What MS Framework Has:
	```python
	# observability.py - 1706 lines of comprehensive OTEL integration
	from opentelemetry.trace import get_tracer, Span
	from opentelemetry.metrics import get_meter, Histogram

	@use_observability # Decorator for ChatClient
	@use_agent_observability # Decorator for Agent

	# Token usage histograms with bucket boundaries
	TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)

	# Operation duration histograms
	OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)

	# 80+ semantic span attributes (OtelAttr enum)
	OtelAttr.GEN_AI_OPERATION_NAME
	OtelAttr.GEN_AI_REQUEST_MODEL
	OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
	OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS
	```

	What DeepBoner Has:
	- `structlog` for logging only
	- No distributed tracing
	- No metrics collection
	- No token usage tracking

	Gap Impact:
	- Cannot trace requests across agents
	- No token cost monitoring
	- No performance profiling in production

	Recommended Fix:
	```python
	# Add optional OTEL support to orchestrator
	# src/observability/__init__.py
	from contextlib import contextmanager

	from opentelemetry import trace
	from opentelemetry.metrics import get_meter

	def setup_observability():
	    """One-time setup for OpenTelemetry."""
	    ...

	@contextmanager
	def trace_agent_operation(name: str, attributes: dict):
	    """Context manager for tracing agent operations."""
	    ...
	```
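One way the `trace_agent_operation` stub could be filled in is sketched below. It assumes OpenTelemetry should stay optional: if the `opentelemetry` package is not installed, the context manager does nothing. The tracer name `deepboner.orchestrator` and the attribute keys in the usage note are illustrative assumptions, not names taken from the codebase.

```python
from contextlib import contextmanager

try:
    # Only the OpenTelemetry API package is needed here. Without a configured
    # SDK the tracer is a no-op, so this is safe to call in any environment.
    from opentelemetry import trace

    _tracer = trace.get_tracer("deepboner.orchestrator")  # hypothetical tracer name
except ImportError:
    _tracer = None


@contextmanager
def trace_agent_operation(name, attributes=None):
    """Wrap a block in a span if OpenTelemetry is installed, else do nothing."""
    if _tracer is None:
        yield None
        return
    with _tracer.start_as_current_span(name, attributes=dict(attributes or {})) as span:
        yield span
```

A call site would look like `with trace_agent_operation("search_agent.run", {"agent.name": "search"}): ...`. Exceptions raised inside the block propagate unchanged in both branches.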

	---

	## Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001

	> NOTE: This gap is being addressed in [ADR-001: Middleware Architecture Refactor](../architecture/adr-001-middleware-refactor.md)

	What MS Framework Has:
	```python
	# _middleware.py - Three types of middleware

	class AgentMiddleware(ABC):
	    """Intercepts agent invocations."""
	    async def process(self, context: AgentRunContext, next): ...

	class FunctionMiddleware(ABC):
	    """Intercepts tool/function calls."""
	    async def process(self, context: FunctionInvocationContext, next): ...

	class ChatMiddleware(ABC):
	    """Intercepts chat client requests."""
	    async def process(self, context: ChatContext, next): ...

	# Decorators for easy middleware creation
	@agent_middleware
	async def logging_middleware(context: AgentRunContext, next):
	    print(f"Before: {context.agent.name}")
	    await next(context)
	    print(f"After: {context.result}")

	# Pipeline execution with terminate support
	context.terminate = True  # Stops pipeline early
	```

	What DeepBoner Has:
	- Uses MS decorators (`@use_chat_middleware`, `@use_observability`) ✓
	- BUT: No custom `ChatMiddleware` class implementations ✗
	- `src/middleware/` folder contains a workflow, not actual middleware ✗

	ADR-001 Solution:
	1. Rename `src/middleware/` → `src/workflows/` (fix misleading name)
	2. Create proper `src/middleware/` with MS-pattern implementations:
	   - `RetryMiddleware(ChatMiddleware)` - fixes HuggingFace retry bug
	   - `TokenTrackingMiddleware(ChatMiddleware)` - enables cost monitoring
	   - `LoggingMiddleware(ChatMiddleware)` - structured request/response logs
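To make the pipeline idea concrete, here is a minimal, framework-free sketch of the `next`-chaining pattern with early termination. It does not use the MS framework's classes: `PipelineContext`, `run_pipeline`, `empty_prompt_guard`, and `fake_chat_handler` are hypothetical names introduced only for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PipelineContext:
    """Hypothetical stand-in for a framework context object."""
    prompt: str
    result: Optional[str] = None
    terminate: bool = False
    events: List[str] = field(default_factory=list)


async def run_pipeline(middlewares, handler, context):
    """Run each middleware in order, then the handler, unless terminated."""

    async def call(index, ctx):
        if ctx.terminate:
            return  # a middleware asked to stop the pipeline early
        if index == len(middlewares):
            await handler(ctx)
            return

        async def next_(c):
            await call(index + 1, c)

        await middlewares[index](ctx, next_)

    await call(0, context)
    return context


async def logging_middleware(ctx, next_):
    ctx.events.append("before")
    await next_(ctx)
    ctx.events.append("after")


async def empty_prompt_guard(ctx, next_):
    if not ctx.prompt.strip():
        ctx.terminate = True  # skip the model call entirely
        return
    await next_(ctx)


async def fake_chat_handler(ctx):
    # Stands in for the real chat client call.
    ctx.events.append("handled")
    ctx.result = f"echo: {ctx.prompt}"
```

Each middleware decides whether to call `next_`, which is what lets a retry or token-tracking middleware wrap the underlying call without the handler knowing about it.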

	---

	## Gap 3: Thread/Conversation State Management (MEDIUM VALUE)

	What MS Framework Has:
	```python
	# _threads.py
	class AgentThread:
	    """Maintains conversation state with serialization support."""

	    def __init__(self, service_thread_id=None, message_store=None):
	        ...

	    async def serialize(self) -> dict[str, Any]:
	        """Persist thread state."""
	        ...

	    @classmethod
	    async def deserialize(cls, state: dict) -> "AgentThread":
	        """Restore thread from persisted state."""
	        ...

	class ChatMessageStoreProtocol(Protocol):
	    """Protocol for message storage backends."""
	    async def list_messages(self) -> list[ChatMessage]: ...
	    async def add_messages(self, messages: Sequence[ChatMessage]): ...
	```

	What DeepBoner Has:
	- `ResearchMemory` for research state only
	- No conversation persistence
	- No serialization/deserialization

	Gap Impact:
	- Cannot resume interrupted research sessions
	- Cannot persist conversation history
	- Cannot implement checkpointing
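A rough sketch of what closing this gap might involve: a JSON-safe snapshot with a version tag, so a session can be written to disk and restored later. `SessionState` and its fields are hypothetical. The actual attributes of `ResearchMemory` are not described here, so the real shape would differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class SessionState:
    """Hypothetical snapshot of a research session; field names are illustrative."""
    query: str
    iteration: int = 0
    messages: List[Dict[str, str]] = field(default_factory=list)

    def serialize(self) -> Dict[str, Any]:
        # The version tag lets future code reject or migrate old snapshots.
        return {"version": 1, **asdict(self)}

    @classmethod
    def deserialize(cls, state: Dict[str, Any]) -> "SessionState":
        if state.get("version") != 1:
            raise ValueError(f"unsupported state version: {state.get('version')!r}")
        return cls(
            query=state["query"],
            iteration=state["iteration"],
            messages=list(state["messages"]),
        )
```

Round-tripping through `json.dumps(state.serialize())` and `SessionState.deserialize(json.loads(...))` is enough for simple checkpointing, as long as every field stays JSON-serializable.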

	---

	## Gap 4: Function/Tool Configuration (MEDIUM VALUE)

	What MS Framework Has:
	```python
	# _tools.py
	class FunctionInvocationConfiguration:
	    """Configuration for function invocation in chat clients."""

	    enabled: bool = True
	    max_iterations: int = 40  # Maximum tool loop iterations
	    max_consecutive_errors_per_request: int = 3
	    terminate_on_unknown_calls: bool = False
	    include_detailed_errors: bool = False

	class AIFunction:
	    """Wraps Python function for AI model calling."""
	    approval_mode: Literal["always_require", "never_require"]
	    max_invocations: int  # Per-function invocation limit
	    max_invocation_exceptions: int  # Per-function error limit
	    invocation_count: int  # Tracks usage
	```

	What DeepBoner Has:
	- `max_iterations` in Settings
	- Basic tool execution
	- No per-tool configuration
	- No approval mode

	Gap Impact:
	- Cannot limit individual tool usage
	- No human-in-the-loop for dangerous tools
	- No per-tool error budgets
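As an illustration of per-tool limits, the sketch below wraps a plain callable with an invocation cap and a consecutive-error budget. `LimitedTool` and `ToolLimitExceeded` are hypothetical names, the default limits are arbitrary, and approval mode is deliberately left out.

```python
from dataclasses import dataclass
from typing import Any, Callable


class ToolLimitExceeded(RuntimeError):
    """Raised when a tool has used up its invocation or error budget."""


@dataclass
class LimitedTool:
    func: Callable[..., Any]
    max_invocations: int = 5          # arbitrary default for the sketch
    max_consecutive_errors: int = 3   # arbitrary default for the sketch
    invocation_count: int = 0
    consecutive_errors: int = 0

    def __call__(self, *args, **kwargs):
        if self.invocation_count >= self.max_invocations:
            raise ToolLimitExceeded(f"{self.func.__name__}: invocation limit reached")
        if self.consecutive_errors >= self.max_consecutive_errors:
            raise ToolLimitExceeded(f"{self.func.__name__}: error budget exhausted")
        self.invocation_count += 1
        try:
            result = self.func(*args, **kwargs)
        except Exception:
            self.consecutive_errors += 1
            raise
        self.consecutive_errors = 0  # any success resets the streak
        return result
```

Raising a dedicated exception once a budget is spent gives the orchestrator a clear signal to stop calling that tool, instead of looping on the same failure.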

	---

	## Gap 5: Context Provider Lifecycle (LOW VALUE)

	What MS Framework Has:
	```python
	# _memory.py
	class ContextProvider(ABC):
	    """Abstract pattern for injecting context into agent invocations."""

	    async def invoking(self, agent, thread) -> str | None:
	        """Called before agent invocation. Returns context to inject."""
	        ...

	    async def invoked(self, agent, thread, result):
	        """Called after agent invocation."""
	        ...

	    async def thread_created(self, thread):
	        """Called when new thread is created."""
	        ...

	class AggregateContextProvider(ContextProvider):
	    """Combines multiple context providers."""
	    ...
	```

	What DeepBoner Has:
	- `ResearchMemory` as simple state container
	- No lifecycle hooks
	- No provider aggregation
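For reference, lifecycle hooks plus aggregation can be sketched in a few lines. This is a simplified illustration, not the MS framework's implementation: the hook signatures are reduced to a query string and a result string, and `HistoryProvider` and `AggregateProvider` are hypothetical.

```python
import asyncio
from typing import List, Optional


class ContextProvider:
    """Hypothetical base class; both hooks default to no-ops."""

    async def invoking(self, query: str) -> Optional[str]:
        return None

    async def invoked(self, query: str, result: str) -> None:
        return None


class HistoryProvider(ContextProvider):
    """Remembers past results and offers them as context for the next call."""

    def __init__(self):
        self.history: List[str] = []

    async def invoking(self, query):
        if not self.history:
            return None
        return "Previous findings: " + "; ".join(self.history)

    async def invoked(self, query, result):
        self.history.append(result)


class AggregateProvider(ContextProvider):
    """Fans each hook out to several providers and merges their context."""

    def __init__(self, providers: List[ContextProvider]):
        self.providers = providers

    async def invoking(self, query):
        parts = [await p.invoking(query) for p in self.providers]
        joined = "\n".join(part for part in parts if part)
        return joined or None

    async def invoked(self, query, result):
        for p in self.providers:
            await p.invoked(query, result)
```

The orchestrator would call `invoking` before each agent run, prepend any returned text to the prompt, and call `invoked` afterwards with the result.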

	---

	## Gap 6: Exception Granularity (LOW VALUE)

	What MS Framework Has:
	```text
	AgentFrameworkException (base)
	├── AgentException
	│   ├── AgentExecutionException
	│   ├── AgentInitializationError
	│   └── AgentThreadException
	├── ChatClientException
	│   └── ChatClientInitializationError
	├── ServiceException
	│   ├── ServiceInitializationError
	│   ├── ServiceResponseException
	│   │   ├── ServiceContentFilterException
	│   │   ├── ServiceInvalidExecutionSettingsError
	│   │   └── ServiceInvalidResponseError
	│   └── ServiceInvalidAuthError
	├── ToolException
	│   └── ToolExecutionException
	├── MiddlewareException
	└── ContentError
	```

	What DeepBoner Has:
	```text
	DeepBonerError (base)
	├── SearchError
	│   └── RateLimitError
	├── JudgeError
	├── ConfigurationError
	└── EmbeddingError
	```

	Gap Impact:
	- Less precise error handling
	- Harder to distinguish error sources
	- Less informative error messages for users
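The practical effect of granularity shows up at the call site: the more specific the class, the more precise the recovery decision. The sketch below redeclares the current DeepBoner hierarchy from the tree above so that it runs standalone. The `recovery_action` function and its strategy names are hypothetical.

```python
class DeepBonerError(Exception):
    """Base class, mirroring the hierarchy listed above."""


class SearchError(DeepBonerError):
    pass


class RateLimitError(SearchError):
    pass


class JudgeError(DeepBonerError):
    pass


class ConfigurationError(DeepBonerError):
    pass


class EmbeddingError(DeepBonerError):
    pass


def recovery_action(exc: BaseException) -> str:
    """Map an exception to a recovery strategy, checking the most specific class first."""
    if isinstance(exc, RateLimitError):
        return "retry_with_backoff"
    if isinstance(exc, SearchError):
        return "skip_source"
    if isinstance(exc, ConfigurationError):
        return "abort"
    if isinstance(exc, DeepBonerError):
        return "report_to_user"
    return "reraise"
```

With the current hierarchy, a `JudgeError` or `EmbeddingError` can only fall through to the generic branch. Finer-grained subclasses would let those failures get their own strategies too.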

	---

	## Prioritized Implementation Roadmap

	### Phase 1: Quick Wins (1-2 days)
	1. Add token tracking to orchestrator (no OTEL yet, just counters)
	2. Add `max_consecutive_errors` to tool execution
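Item 1 could start as plain in-process counters, with no OTEL dependency. The sketch below is one possible shape. `TokenTracker` and `TokenUsage` are hypothetical names, and it assumes the chat client reports input and output token counts for each call.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


class TokenTracker:
    """Accumulates token counts per agent for the lifetime of one run."""

    def __init__(self):
        self._by_agent = defaultdict(TokenUsage)

    def record(self, agent: str, input_tokens: int, output_tokens: int) -> None:
        usage = self._by_agent[agent]
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens

    def usage_for(self, agent: str) -> TokenUsage:
        # .get avoids creating an empty entry for agents that never ran.
        return self._by_agent.get(agent, TokenUsage())

    def total(self) -> int:
        return sum(usage.total for usage in self._by_agent.values())
```

Because the counters are keyed by agent name, the same data could later feed OTEL histograms in Phase 3 without changing call sites.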

	### Phase 2: Medium Effort (3-5 days)
	1. Add basic middleware pattern to orchestrator
	2. Implement thread serialization for `ResearchMemory`

	### Phase 3: Full Production (1-2 weeks)
	1. Full OpenTelemetry integration
	2. Complete middleware pipeline
	3. Context provider lifecycle hooks

	---

	## Related Issues

	- P2 Hardening Issues: `docs/bugs/p2-hardening-issues.md`
	- MS Framework Reference: `reference_repos/microsoft-agent-framework/`

	---

	## Notes

	These gaps are P3 because:
	1. DeepBoner is functional without them
	2. They're architectural improvements, not bug fixes
	3. User-facing functionality is not impacted

	However, for a production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.