Spaces:

VibecoderMcSwaggins
/

DeepBoner

Paused

App Files Files Community

DeepBoner / docs /bugs /p3-ms-framework-gaps.md

VibecoderMcSwaggins

docs: Address CodeRabbit review findings

f9576ce 7 days ago

preview code

raw

history blame contribute delete

8.35 kB

A newer version of the Gradio SDK is available: 6.1.0

Upgrade

P3: Microsoft Agent Framework Gaps Analysis

Date: 2025-12-06 Priority: P3 (Nice-to-Have) Source: Comparison with Microsoft Agent Framework v1.0.0b251204 (commit 8c6b12e)

Executive Summary

Comparison of DeepBoner's implementation against Microsoft Agent Framework reveals several architectural patterns we're missing. These are not bugs but opportunities for hardening and production-readiness.

Gap 1: OpenTelemetry Observability (HIGH VALUE)

What MS Framework Has:

# observability.py - 1706 lines of comprehensive OTEL integration
from opentelemetry.trace import get_tracer, Span
from opentelemetry.metrics import get_meter, Histogram

@use_observability   # Decorator for ChatClient
@use_agent_observability  # Decorator for Agent

# Token usage histograms with bucket boundaries
TOKEN_USAGE_BUCKET_BOUNDARIES = (1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576)

# Operation duration histograms
OPERATION_DURATION_BUCKET_BOUNDARIES = (0.01, 0.02, 0.04, 0.08, 0.16, ...)

# 80+ semantic span attributes (OtelAttr enum)
OtelAttr.GEN_AI_OPERATION_NAME
OtelAttr.GEN_AI_REQUEST_MODEL
OtelAttr.GEN_AI_USAGE_INPUT_TOKENS
OtelAttr.GEN_AI_USAGE_OUTPUT_TOKENS

What DeepBoner Has:

structlog for logging only
No distributed tracing
No metrics collection
No token usage tracking

Gap Impact:

Cannot trace requests across agents
No token cost monitoring
No performance profiling in production

Recommended Fix:

# Add optional OTEL support to orchestrator
# src/observability/__init__.py
from opentelemetry import trace
from opentelemetry.metrics import get_meter

def setup_observability():
    """One-time setup for OpenTelemetry."""
    ...

@contextmanager
def trace_agent_operation(name: str, attributes: dict):
    """Context manager for tracing agent operations."""
    ...

Gap 2: Middleware Pipelines (MEDIUM VALUE) - ADDRESSED IN ADR-001

NOTE: This gap is being addressed in ADR-001: Middleware Architecture Refactor

What MS Framework Has:

# _middleware.py - Three types of middleware

class AgentMiddleware(ABC):
    """Intercepts agent invocations."""
    async def process(self, context: AgentRunContext, next): ...

class FunctionMiddleware(ABC):
    """Intercepts tool/function calls."""
    async def process(self, context: FunctionInvocationContext, next): ...

class ChatMiddleware(ABC):
    """Intercepts chat client requests."""
    async def process(self, context: ChatContext, next): ...

# Decorators for easy middleware creation
@agent_middleware
async def logging_middleware(context: AgentRunContext, next):
    print(f"Before: {context.agent.name}")
    await next(context)
    print(f"After: {context.result}")

# Pipeline execution with terminate support
context.terminate = True  # Stops pipeline early

What DeepBoner Has:

Uses MS decorators (@use_chat_middleware, @use_observability) ✓
BUT: No custom ChatMiddleware class implementations ✗
src/middleware/ folder contains a workflow, not actual middleware ✗

ADR-001 Solution:

Rename src/middleware/ → src/workflows/ (fix misleading name)
Create proper src/middleware/ with MS-pattern implementations:
- RetryMiddleware(ChatMiddleware) - fixes HuggingFace retry bug
- TokenTrackingMiddleware(ChatMiddleware) - enables cost monitoring
- LoggingMiddleware(ChatMiddleware) - structured request/response logs

Gap 3: Thread/Conversation State Management (MEDIUM VALUE)

What MS Framework Has:

# _threads.py
class AgentThread:
    """Maintains conversation state with serialization support."""

    def __init__(self, service_thread_id=None, message_store=None):
        ...

    async def serialize(self) -> dict[str, Any]:
        """Persist thread state."""
        ...

    @classmethod
    async def deserialize(cls, state: dict) -> "AgentThread":
        """Restore thread from persisted state."""
        ...

class ChatMessageStoreProtocol(Protocol):
    """Protocol for message storage backends."""
    async def list_messages(self) -> list[ChatMessage]: ...
    async def add_messages(self, messages: Sequence[ChatMessage]): ...

What DeepBoner Has:

ResearchMemory for research state only
No conversation persistence
No serialization/deserialization

Gap Impact:

Cannot resume interrupted research sessions
Cannot persist conversation history
Cannot implement checkpointing

Gap 4: Function/Tool Configuration (MEDIUM VALUE)

What MS Framework Has:

# _tools.py
class FunctionInvocationConfiguration:
    """Configuration for function invocation in chat clients."""

    enabled: bool = True
    max_iterations: int = 40  # Maximum tool loop iterations
    max_consecutive_errors_per_request: int = 3
    terminate_on_unknown_calls: bool = False
    include_detailed_errors: bool = False

class AIFunction:
    """Wraps Python function for AI model calling."""
    approval_mode: Literal["always_require", "never_require"]
    max_invocations: int  # Per-function invocation limit
    max_invocation_exceptions: int  # Per-function error limit
    invocation_count: int  # Tracks usage

What DeepBoner Has:

max_iterations in Settings
Basic tool execution
No per-tool configuration
No approval mode

Gap Impact:

Cannot limit individual tool usage
No human-in-the-loop for dangerous tools
No per-tool error budgets

Gap 5: Context Provider Lifecycle (LOW VALUE)

What MS Framework Has:

# _memory.py
class ContextProvider(ABC):
    """Abstract pattern for injecting context into agent invocations."""

    async def invoking(self, agent, thread) -> str | None:
        """Called before agent invocation. Returns context to inject."""
        ...

    async def invoked(self, agent, thread, result):
        """Called after agent invocation."""
        ...

    async def thread_created(self, thread):
        """Called when new thread is created."""
        ...

class AggregateContextProvider(ContextProvider):
    """Combines multiple context providers."""
    ...

What DeepBoner Has:

ResearchMemory as simple state container
No lifecycle hooks
No provider aggregation

Gap 6: Exception Granularity (LOW VALUE)

What MS Framework Has:

AgentFrameworkException (base)
├── AgentException
│   ├── AgentExecutionException
│   ├── AgentInitializationError
│   └── AgentThreadException
├── ChatClientException
│   └── ChatClientInitializationError
├── ServiceException
│   ├── ServiceInitializationError
│   ├── ServiceResponseException
│   │   ├── ServiceContentFilterException
│   │   ├── ServiceInvalidExecutionSettingsError
│   │   └── ServiceInvalidResponseError
│   └── ServiceInvalidAuthError
├── ToolException
│   └── ToolExecutionException
├── MiddlewareException
└── ContentError

What DeepBoner Has:

DeepBonerError (base)
├── SearchError
│   └── RateLimitError
├── JudgeError
├── ConfigurationError
└── EmbeddingError

Gap Impact:

Less precise error handling
Harder to distinguish error sources
Less informative error messages for users

Prioritized Implementation Roadmap

Phase 1: Quick Wins (1-2 days)

Add token tracking to orchestrator (no OTEL yet, just counters)
Add max_consecutive_errors to tool execution

Phase 2: Medium Effort (3-5 days)

Add basic middleware pattern to orchestrator
Implement thread serialization for ResearchMemory

Phase 3: Full Production (1-2 weeks)

Full OpenTelemetry integration
Complete middleware pipeline
Context provider lifecycle hooks

Related Issues

P2 Hardening Issues: docs/bugs/p2-hardening-issues.md
MS Framework Reference: reference_repos/microsoft-agent-framework/

Notes

These gaps are P3 because:

DeepBoner is functional without them
They're architectural improvements, not bug fixes
User-facing functionality is not impacted

However, for production deployment serving multiple users, Gaps 1 (Observability) and 3 (Thread State) become P1/P2.