Spaces: VibecoderMcSwaggins / DeepBoner

VibecoderMcSwaggins committed 23 days ago

Commit

809ad60

1 Parent(s): 4450782

fix(P0): Complete HuggingFace tool calling integration and document framework display bug

Files changed (2)

docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md +267 -0
src/clients/huggingface.py +54 -11

docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md ADDED

	@@ -0,0 +1,267 @@

+# P0 Bug: HuggingFace Free Tier Tool Calling Broken
+**Severity**: P0 (Critical) - Free Tier cannot perform multi-turn tool-based research
+**Status**: IN_PROGRESS - Root causes identified, fixes pending
+**Discovered**: 2025-12-01
+**Investigator**: Claude Code (Systematic First-Principles Analysis)
+## Executive Summary
+The HuggingFace Free Tier fails to execute tools end-to-end. While the API calls themselves are valid, the **integration** with the Microsoft Agent Framework is missing a critical middleware component (`@use_function_invocation`), and the conversation history serialization is incomplete.
+## Root Causes
+### 1. Missing Tool Execution Middleware (The "Silent Failure")
+**Mechanism**:
+- The `OpenAIChatClient` uses the `@use_function_invocation` decorator, which creates an internal loop:
+  1. LLM proposes tools.
+  2. Middleware executes tools.
+  3. Middleware feeds results back to LLM.
+  4. LLM generates final answer.
+- The `HuggingFaceChatClient` **lacked this decorator**.
+- Result: The client returned raw tool calls to the `ChatAgent`. The `ChatAgent` passed them to the `MagenticAgentExecutor`.
+- **Cascade Failure**: The `MagenticAgentExecutor` (in the framework) has a bug/limitation where it handles tool-call-only messages by converting them to their string representation (`repr()`) because they lack text content. This led to the observed `<ChatMessage object ...>` corruption in the logs and history.
+### 2. Framework Message Corruption (P1 - HIGH, External Bug)
+**Mechanism**:
+- When the `agent_framework` emits a `MagenticAgentMessageEvent` (which carries agent responses), the `.text` attribute of the `ChatMessage` it contains (both on `event.message` and on its nested `TextContent`) often holds a Python `repr` string (e.g., `<agent_framework._types.ChatMessage object at 0x...>`) instead of the human-readable message.
+- DeepBoner's `_extract_text` method correctly identifies these `repr` strings and filters them out.
+- Result: The human-readable agent response is lost at the framework level before DeepBoner can process it for display, leading to empty or uninformative messages in the UI/logs (e.g., `searcher: ...`).
+**Impact**: Display/Logging only. Does not prevent tool execution or core logic, but severely degrades user experience and debugging visibility.
+**Root Cause**: This is an internal issue within the `agent_framework`'s event messaging mechanism, specifically how `ChatMessage` objects are constructed and passed through the `MagenticAgentMessageEvent`. DeepBoner cannot reliably recover the original message text when it has been replaced by a `repr` string by the framework itself.
+**Fix**: Requires an upstream fix or alternative message extraction strategy within the `agent_framework`. Until then, DeepBoner's UI/logs will display truncated or empty messages for these specific events.
+## Solution Plan
+1.  **Fix History Serialization**: Update `_convert_messages` in `src/clients/huggingface.py` to correctly serialize `tool_calls` (Assistant role) and `tool_call_id` (Tool role) to the HuggingFace / OpenAI format.
+2.  **Enable Middleware**: Decorate `HuggingFaceChatClient` with `@use_function_invocation` (and `@use_chat_middleware`, `@use_observability` for parity).
+3.  **Display Fix**: Update `AdvancedOrchestrator._extract_text` to gracefully handle any remaining object representations, just in case.
+## Verification
+- **Reproduction Script**: `reproduce_bugs.py` confirms the serialization failure.
+- **End-to-End Test**: `verify_p0_fix.py` (or similar) will be used to confirm the agent effectively uses tools and synthesizes an answer.
+## Verified Findings
+### What WORKS (Confirmed via Testing)
+1. **Tool Serialization**: `_convert_tools()` correctly converts `AIFunction` → OpenAI JSON format ✅
+2. **First API Call**: HuggingFace returns tool calls on the first request ✅
+3. **Tool Call Parsing**: `_parse_tool_calls()` correctly extracts `FunctionCallContent` ✅
+4. **Function Invoking Marker**: `__function_invoking_chat_client__ = True` is present ✅
+5. **Original P0 (JSON serialization)**: Fixed - no longer crashes with TypeError ✅
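+### What is BROKEN (Root Causes)
+Two bugs remain: BUG #1 (conversation history serialization) and BUG #2 (framework message corruption).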
+---
+## BUG #1: Conversation History Serialization (P0 - CRITICAL)
+### Symptom
+Multi-turn conversations fail with `BadRequestError` from HuggingFace API.
+### Root Cause
+`_convert_messages()` in `src/clients/huggingface.py` only extracts `role` and `content` from messages:
+```python
+def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
+    hf_messages: list[dict[str, Any]] = []
+    for msg in messages:
+        content = msg.text or ""
+        # ... role extraction ...
+        hf_messages.append({"role": role_str, "content": content})  # MISSING tool_calls and tool_call_id!
+    return hf_messages
+```
+### What HuggingFace API Expects
+```jsonc
+[
+  {"role": "user", "content": "Search for testosterone"},
+  {
+    "role": "assistant",
+    "content": null,
+    "tool_calls": [  // REQUIRED when assistant called a tool
+      {
+        "id": "call_123",
+        "type": "function",
+        "function": {"name": "search_pubmed", "arguments": "{\"query\": \"testosterone\"}"}
+      }
+    ]
+  },
+  {
+    "role": "tool",
+    "content": "Found 10 papers...",
+    "tool_call_id": "call_123"  // REQUIRED - must match the tool call id
+  }
+]
+```
+### What We Send
+```jsonc
+[
+  {"role": "user", "content": "Search for testosterone"},
+  {"role": "assistant", "content": ""},  // MISSING tool_calls!
+  {"role": "tool", "content": "Found 10 papers..."}  // MISSING tool_call_id!
+]
+```
+### Impact
+- First LLM call works (tools called)
+- Second LLM call fails (API rejects malformed history)
+- Research loop never completes
+### Fix Required
+Update `_convert_messages()` to:
+1. Extract `tool_calls` from `ChatMessage.contents` (list of `FunctionCallContent`)
+2. Add `tool_call_id` to tool messages (requires tracking call IDs)
+---
+## BUG #2: Framework Message Corruption (P1 - HIGH)
+### Symptom
+`MagenticAgentMessageEvent.message.text` contains the repr string of a ChatMessage object:
+```
+'<agent_framework._types.ChatMessage object at 0x10c394210>'
+```
+### Verified Behavior
+```python
+# Observed during workflow event inspection (comparisons, not assignments):
+event.message.text == '<agent_framework._types.ChatMessage object at 0x...>'
+# contents[0] is a TextContent carrying the same repr string
+event.message.contents[0].text == '<agent_framework._types.ChatMessage object at 0x...>'
+```
+### Root Cause Hypothesis
+Somewhere in the Microsoft Agent Framework's workflow orchestration, when converting tool call responses from our `HuggingFaceChatClient`, the framework is:
+1. Taking our `ChatMessage` response
+2. Calling `str()` on it (which falls back to the default `repr()` when a class defines no `__str__`)
+3. Creating a NEW `ChatMessage` with the repr as text content
+This may be due to:
+- Missing or incompatible `raw_representation` field
+- Framework expecting a specific message structure we don't provide
+- Type coercion issue in the workflow layer
+### Impact
+- UI shows `<ChatMessage object at 0x...>` instead of actual content
+- Users cannot see what the agent found/did
+- Debugging is difficult
+### Fix Required
+Investigate `agent_framework`'s `ChatAgent` and `MagenticBuilder` to understand:
+1. How they process `ChatResponse` from the client
+2. What structure they expect in `raw_representation`
+3. Whether there's a required serialization method we're not implementing
+---
+## Verification Matrix
+| Component | Status | Test Command |
+|-----------|--------|--------------|
+| Tool Serialization | ✅ WORKS | `client._convert_tools([search_pubmed])` |
+| First Tool Call | ✅ WORKS | Single-turn API call returns `FunctionCallContent` |
+| Multi-turn History | ❌ BROKEN | BadRequestError on second call |
+| Event Display | ❌ BROKEN | Shows repr instead of content |
+| End-to-End Research | ❌ BROKEN | Max rounds reached, no synthesis |
+## Reproduction Steps
+### BUG #1: History Serialization
+```python
+import asyncio
+from src.clients.huggingface import HuggingFaceChatClient
+from src.agents.tools import search_pubmed
+from agent_framework import ChatMessage, ChatOptions
+from agent_framework._types import Role, ToolMode, FunctionCallContent
+async def test():
+    client = HuggingFaceChatClient()
+    # Round 1: Get tool call
+    messages_r1 = [
+        ChatMessage(role=Role.USER, text='Search for testosterone'),
+    ]
+    response_r1 = await client._inner_get_response(
+        messages=messages_r1,
+        chat_options=ChatOptions(tools=[search_pubmed], tool_choice=ToolMode.AUTO),
+    )
+    # Round 2: Include tool history (FAILS)
+    messages_r2 = [
+        ChatMessage(role=Role.USER, text='Search for testosterone'),
+        response_r1.messages[0],  # Assistant with tool call
+        ChatMessage(role=Role.TOOL, text='Found 10 papers...'),
+        ChatMessage(role=Role.USER, text='Now search for libido'),
+    ]
+    # This will throw BadRequestError
+    response_r2 = await client._inner_get_response(
+        messages=messages_r2,
+        chat_options=ChatOptions(tools=[search_pubmed], tool_choice=ToolMode.AUTO),
+    )
+asyncio.run(test())
+```
+### BUG #2: Event Display
+```python
+import asyncio
+from src.orchestrators.advanced import AdvancedOrchestrator
+from agent_framework import MagenticAgentMessageEvent
+async def test():
+    orch = AdvancedOrchestrator(max_rounds=1)
+    async for event in orch._build_workflow().run_stream('Search for testosterone'):
+        if isinstance(event, MagenticAgentMessageEvent):
+            print(f"message.text = {event.message.text}")  # Shows repr string
+            break
+asyncio.run(test())
+```
+## Prior Fixes (Verified Working)
+The following fixes from the `fix/p0-aifunction-serialization` branch ARE working:
+1. **`_convert_tools()`**: Converts `AIFunction` objects to OpenAI-compatible JSON
+2. **`_parse_tool_calls()`**: Converts HF response tool calls to `FunctionCallContent`
+3. **Streaming accumulator**: Handles partial tool call deltas in streaming mode
+4. **Function invoking marker**: `__function_invoking_chat_client__ = True`
+These fixes solved the original P0 crash but revealed deeper issues.
+## Files Requiring Changes
+### Priority 1 (BUG #1)
+- `src/clients/huggingface.py`
+  - `_convert_messages()` - Add tool_calls and tool_call_id serialization
+### Priority 2 (BUG #2)
+- Investigation needed into `agent_framework` behavior
+- May require changes to `ChatResponse` structure
+- May require implementing `raw_representation` field
+## Risk Assessment
+| Risk | Mitigation |
+|------|------------|
+| Breaking existing OpenAI flow | Test with OpenAI after changes |
+| Framework incompatibility | Check agent_framework source/docs |
+| Regression in serialization | Add unit tests for all message types |
+## Timeline
+- **BUG #1** can likely be fixed in 1-2 hours with proper test coverage
+- **BUG #2** requires investigation of framework internals (unknown scope)
+## References
+- [HuggingFace Chat Completion API - Tool Use](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion)
+- [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
+- Microsoft Agent Framework source code (internal)
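
The `str()`-to-`repr` fallback hypothesized under BUG #2 can be reproduced without the framework. The following is a minimal sketch, not the framework's code: `Message` is a hypothetical stand-in for any class that defines neither `__str__` nor `__repr__`, and `looks_like_default_repr` is an illustrative filter in the spirit of `_extract_text`, whose actual implementation is not shown in this commit.

```python
import re


class Message:
    """Hypothetical stand-in: defines neither __str__ nor __repr__."""

    def __init__(self, text: str) -> None:
        self.text = text


# Matches Python's default object repr, e.g.
# "<agent_framework._types.ChatMessage object at 0x10c394210>".
_DEFAULT_REPR = re.compile(r"^<.+ object at 0x[0-9a-fA-F]+>$")


def looks_like_default_repr(value: str) -> bool:
    """Return True if the string looks like a default object repr."""
    return bool(_DEFAULT_REPR.match(value.strip()))


original = Message("Found 10 papers on testosterone")

# str() on an object without __str__ falls back to object.__repr__,
# so the human-readable text is lost at this point.
coerced = str(original)

# Wrapping the coerced string in a new message reproduces the symptom.
corrupted = Message(coerced)

print(looks_like_default_repr(corrupted.text))  # the repr string is detected
print(looks_like_default_repr(original.text))   # real text is not flagged
```

A filter like this can only hide the corrupted string. The original text is unrecoverable once the coercion has happened, which is consistent with the report's conclusion that the fix has to come from upstream.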

src/clients/huggingface.py CHANGED

@@ -6,6 +6,7 @@ an OpenAI API key.
 """
 import asyncio
+import json
 from collections.abc import AsyncIterable, MutableSequence
 from functools import partial
 from typing import Any, cast
@@ -17,8 +18,13 @@ from agent_framework import (
     ChatOptions,
     ChatResponse,
     ChatResponseUpdate,
+    FinishReason,
+    Role,
 )
-from agent_framework._types import FunctionCallContent
+from agent_framework._middleware import use_chat_middleware
+from agent_framework._tools import use_function_invocation
+from agent_framework._types import FunctionCallContent, FunctionResultContent
+from agent_framework.observability import use_observability
 from huggingface_hub import InferenceClient
 from src.utils.config import settings
@@ -26,6 +32,9 @@ from src.utils.config import settings
 logger = structlog.get_logger()
+@use_function_invocation
+@use_observability
+@use_chat_middleware
 class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
     """Adapter for HuggingFace Inference API with full function calling support."""
@@ -63,15 +72,52 @@ class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
         """Convert framework messages to HuggingFace format."""
         hf_messages: list[dict[str, Any]] = []
         for msg in messages:
-            # Basic conversion - extend as needed for multi-modal
-            content = msg.text or ""
             # msg.role can be string or enum - extract .value for enums
-            # str(Role.USER) -> "Role.USER" (wrong), Role.USER.value -> "user" (correct)
             if hasattr(msg.role, "value"):
                 role_str = str(msg.role.value)
             else:
                 role_str = str(msg.role)
-            hf_messages.append({"role": role_str, "content": content})
+            content_str = msg.text or ""
+            tool_calls = []
+            tool_call_id = None
+            # Process contents for tool calls and results
+            if msg.contents:
+                for item in msg.contents:
+                    if isinstance(item, FunctionCallContent):
+                        # This is an assistant message invoking a tool
+                        tool_calls.append(
+                            {
+                                "id": item.call_id,
+                                "type": "function",
+                                "function": {
+                                    "name": item.name,
+                                    "arguments": (
+                                        item.arguments
+                                        if isinstance(item.arguments, str)
+                                        else json.dumps(item.arguments)
+                                    ),
+                                },
+                            }
+                        )
+                    elif isinstance(item, FunctionResultContent):
+                        # This is a tool result message
+                        role_str = "tool"
+                        tool_call_id = item.call_id
+                        # For tool results, the content is the result string
+                        content_str = str(item.result) if item.result is not None else ""
+            message_dict: dict[str, Any] = {"role": role_str, "content": content_str}
+            if tool_calls:
+                message_dict["tool_calls"] = tool_calls
+            if tool_call_id:
+                message_dict["tool_call_id"] = tool_call_id
+            hf_messages.append(message_dict)
         return hf_messages
     def _convert_tools(self, tools: list[Any] | None) -> list[dict[str, Any]] | None:
@@ -112,12 +158,7 @@ class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
         return json_tools if json_tools else None
     def _parse_tool_calls(self, message: Any) -> list[FunctionCallContent]:
-        """Parse HuggingFace tool_calls into framework FunctionCallContent.
-        HF returns tool_calls as:
-            [ChatCompletionOutputToolCall(id='...', function=ChatCompletionOutputFunctionDefinition(
-                name='...', arguments='{"key": "value"}'), type='function')]
-        """
+        """Parse HuggingFace tool_calls into framework FunctionCallContent."""
         contents: list[FunctionCallContent] = []
         if not hasattr(message, "tool_calls") or not message.tool_calls:
@@ -303,6 +344,8 @@ class HuggingFaceChatClient(BaseChatClient):  # type: ignore[misc]
                 if contents:
                     yield ChatResponseUpdate(
                         contents=contents,
+                        role=Role.ASSISTANT,
+                        finish_reason=FinishReason.TOOL_CALLS,
                     )
         except Exception as e:
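
One case the new `_convert_messages` does not obviously cover: it keeps a single `tool_call_id` and `content` per message, so if one framework message carried several `FunctionResultContent` items, only the last would be serialized. Whether the framework ever produces such messages is an assumption here. Below is a hedged sketch of emitting one `tool` message per result. `CallItem`, `ResultItem`, `Msg`, and `to_wire` are hypothetical stand-ins for illustration, not the real `agent_framework` types or the committed method.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class CallItem:
    """Hypothetical stand-in for FunctionCallContent."""
    call_id: str
    name: str
    arguments: Any


@dataclass
class ResultItem:
    """Hypothetical stand-in for FunctionResultContent."""
    call_id: str
    result: Any


@dataclass
class Msg:
    """Hypothetical stand-in for ChatMessage."""
    role: str
    text: str = ""
    contents: list = field(default_factory=list)


def to_wire(messages: list) -> list:
    """Serialize messages to OpenAI-style dicts, one tool message per result."""
    wire = []
    for msg in messages:
        results = [c for c in msg.contents if isinstance(c, ResultItem)]
        if results:
            # Each result becomes its own tool message, paired by call id.
            for item in results:
                wire.append({
                    "role": "tool",
                    "content": "" if item.result is None else str(item.result),
                    "tool_call_id": item.call_id,
                })
            continue
        entry = {"role": msg.role, "content": msg.text}
        calls = [c for c in msg.contents if isinstance(c, CallItem)]
        if calls:
            entry["tool_calls"] = [
                {
                    "id": c.call_id,
                    "type": "function",
                    "function": {
                        "name": c.name,
                        # Arguments travel as a JSON string on the wire.
                        "arguments": c.arguments
                        if isinstance(c.arguments, str)
                        else json.dumps(c.arguments),
                    },
                }
                for c in calls
            ]
        wire.append(entry)
    return wire


history = [
    Msg("user", "Search for testosterone and libido"),
    Msg("assistant", contents=[
        CallItem("call_1", "search_pubmed", {"query": "testosterone"}),
        CallItem("call_2", "search_pubmed", {"query": "libido"}),
    ]),
    Msg("tool", contents=[
        ResultItem("call_1", "Found 10 papers..."),
        ResultItem("call_2", "Found 4 papers..."),
    ]),
]
wire = to_wire(history)
print(json.dumps(wire, indent=2))
```

Splitting results this way keeps every `tool_call_id` in the history matched to an entry in the preceding assistant `tool_calls` list, which is the pairing the bug report identifies as required.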