Space: VibecoderMcSwaggins/DeepBoner

VibecoderMcSwaggins committed 22 days ago

Commit f815b05 (1 parent: 809ad60)

docs: Update P0 bug doc - Bug #1 fixed, Bug #2 upstream issue filed

- Bug #1 (History Serialization): FIXED in commit 809ad60
- Bug #2 (Repr String): Filed upstream issue microsoft/agent-framework#2562
- Added root cause analysis for _magentic.py line 1799
- Added workaround code (not implemented)
- Updated verification matrix and next steps

Files changed (1)

docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md +112 -206

@@ -1,267 +1,173 @@
 # P0 Bug: HuggingFace Free Tier Tool Calling Broken
 **Severity**: P0 (Critical) - Free Tier cannot perform multi-turn tool-based research
-**Status**: IN_PROGRESS - Root causes identified, fixes pending
 **Discovered**: 2025-12-01
 **Investigator**: Claude Code (Systematic First-Principles Analysis)
 ## Executive Summary
-The HuggingFace Free Tier fails to execute tools end-to-end. While the API calls themselves are valid, the **integration** with the Microsoft Agent Framework is missing a critical middleware component (`@use_function_invocation`), and the conversation history serialization is incomplete.
-## Root Causes
-### 1. Missing Tool Execution Middleware (The "Silent Failure")
-**Mechanism**:
-- The `OpenAIChatClient` uses the `@use_function_invocation` decorator, which creates an internal loop:
-  1. LLM proposes tools.
-  2. Middleware executes tools.
-  3. Middleware feeds results back to LLM.
-  4. LLM generates final answer.
-- The `HuggingFaceChatClient` **lacked this decorator**.
-- Result: The client returned raw tool calls to the `ChatAgent`. The `ChatAgent` passed them to the `MagenticAgentExecutor`.
-- **Cascade Failure**: The `MagenticAgentExecutor` (in the framework) has a bug/limitation where it handles tool-call-only messages by converting them to their string representation (`repr()`) because they lack text content. This led to the observed `<ChatMessage object ...>` corruption in the logs and history.
-### 2. Framework Message Corruption (P1 - HIGH, External Bug)
-**Mechanism**:
-- When `MagenticAgentMessageEvent` (which carries agent responses) is generated by the `agent_framework`, the `ChatMessage` object it contains (specifically in `event.message` and its nested `TextContent`) often has its `.text` attribute populated with a Python object's `repr` string (e.g., `<agent_framework._types.ChatMessage object at 0x...>`) instead of the actual human-readable message.
-- DeepBoner's `_extract_text` method correctly identifies these `repr` strings and filters them out.
-- Result: The human-readable agent response is lost at the framework level before DeepBoner can process it for display, leading to empty or uninformative messages in the UI/logs (e.g., `searcher: ...`).
-**Impact**: Display/Logging only. Does not prevent tool execution or core logic, but severely degrades user experience and debugging visibility.
-**Root Cause**: This is an internal issue within the `agent_framework`'s event messaging mechanism, specifically how `ChatMessage` objects are constructed and passed through the `MagenticAgentMessageEvent`. DeepBoner cannot reliably recover the original message text when it has been replaced by a `repr` string by the framework itself.
-**Fix**: Requires an upstream fix or alternative message extraction strategy within the `agent_framework`. Until then, DeepBoner's UI/logs will display truncated or empty messages for these specific events.
-## Solution Plan
-1.  **Fix History Serialization**: Update `_convert_messages` in `src/clients/huggingface.py` to correctly serialize `tool_calls` (Assistant role) and `tool_call_id` (Tool role) to the HuggingFace / OpenAI format.
-2.  **Enable Middleware**: Decorate `HuggingFaceChatClient` with `@use_function_invocation` (and `@use_chat_middleware`, `@use_observability` for parity).
-3.  **Display Fix**: Update `AdvancedOrchestrator._extract_text` to gracefully handle any remaining object representations, just in case.
-## Verification
-- **Reproduction Script**: `reproduce_bugs.py` confirms the serialization failure.
-- **End-to-End Test**: `verify_p0_fix.py` (or similar) will be used to confirm the agent effectively uses tools and synthesizes an answer.
-## Verified Findings
-### What WORKS (Confirmed via Testing)
-1. **Tool Serialization**: `_convert_tools()` correctly converts `AIFunction` → OpenAI JSON format ✅
-2. **First API Call**: HuggingFace returns tool calls on the first request ✅
-3. **Tool Call Parsing**: `_parse_tool_calls()` correctly extracts `FunctionCallContent` ✅
-4. **Function Invoking Marker**: `__function_invoking_chat_client__ = True` is present ✅
-5. **Original P0 (JSON serialization)**: Fixed - no longer crashes with TypeError ✅
-### What is BROKEN (Root Causes)
 ---
-## BUG #1: Conversation History Serialization (P0 - CRITICAL)
-### Symptom
-Multi-turn conversations fail with `BadRequestError` from HuggingFace API.
-### Root Cause
-`_convert_messages()` in `src/clients/huggingface.py` only extracts `role` and `content` from messages:
 ```python
-def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
-    hf_messages: list[dict[str, Any]] = []
-    for msg in messages:
-        content = msg.text or ""
-        # ... role extraction ...
-        hf_messages.append({"role": role_str, "content": content})  # MISSING tool_calls and tool_call_id!
-    return hf_messages
-```
-### What HuggingFace API Expects
-```json
-[
-  {"role": "user", "content": "Search for testosterone"},
-  {
     "role": "assistant",
-    "content": null,
-    "tool_calls": [  // REQUIRED when assistant called a tool
-      {
-        "id": "call_123",
-        "type": "function",
-        "function": {"name": "search_pubmed", "arguments": "{\"query\": \"testosterone\"}"}
-      }
-    ]
-  },
-  {
-    "role": "tool",
-    "content": "Found 10 papers...",
-    "tool_call_id": "call_123"  // REQUIRED - must match the tool call id
-  }
-]
 ```
-### What We Send
-```json
-[
-  {"role": "user", "content": "Search for testosterone"},
-  {"role": "assistant", "content": ""},  // MISSING tool_calls!
-  {"role": "tool", "content": "Found 10 papers..."}  // MISSING tool_call_id!
-]
-```
-### Impact
-- First LLM call works (tools called)
-- Second LLM call fails (API rejects malformed history)
-- Research loop never completes
-### Fix Required
-Update `_convert_messages()` to:
-1. Extract `tool_calls` from `ChatMessage.contents` (list of `FunctionCallContent`)
-2. Add `tool_call_id` to tool messages (requires tracking call IDs)
 ---
-## BUG #2: Framework Message Corruption (P1 - HIGH)
 ### Symptom
-`MagenticAgentMessageEvent.message.text` contains the repr string of a ChatMessage object:
 ```
 '<agent_framework._types.ChatMessage object at 0x10c394210>'
 ```
-### Verified Behavior
 ```python
-# From workflow event inspection:
-event.message.text = '<agent_framework._types.ChatMessage object at 0x...>'
-event.message.contents[0] = TextContent(text='<agent_framework._types.ChatMessage object at 0x..>')
 ```
-### Root Cause Hypothesis
-Somewhere in the Microsoft Agent Framework's workflow orchestration, when converting tool call responses from our `HuggingFaceChatClient`, the framework is:
-1. Taking our `ChatMessage` response
-2. Calling `str()` on it (which gives repr)
-3. Creating a NEW `ChatMessage` with the repr as text content
-This may be due to:
-- Missing or incompatible `raw_representation` field
-- Framework expecting a specific message structure we don't provide
-- Type coercion issue in the workflow layer
-### Impact
-- UI shows `<ChatMessage object at 0x...>` instead of actual content
-- Users cannot see what the agent found/did
-- Debugging is difficult
-### Fix Required
-Investigate `agent_framework`'s `ChatAgent` and `MagenticBuilder` to understand:
-1. How they process `ChatResponse` from the client
-2. What structure they expect in `raw_representation`
-3. Whether there's a required serialization method we're not implementing
----
-## Verification Matrix
-| Component | Status | Test Command |
-|-----------|--------|--------------|
-| Tool Serialization | ✅ WORKS | `client._convert_tools([search_pubmed])` |
-| First Tool Call | ✅ WORKS | Single-turn API call returns `FunctionCallContent` |
-| Multi-turn History | ❌ BROKEN | BadRequestError on second call |
-| Event Display | ❌ BROKEN | Shows repr instead of content |
-| End-to-End Research | ❌ BROKEN | Max rounds reached, no synthesis |
-## Reproduction Steps
-### BUG #1: History Serialization
 ```python
-import asyncio
-from src.clients.huggingface import HuggingFaceChatClient
-from src.agents.tools import search_pubmed
-from agent_framework import ChatMessage, ChatOptions
-from agent_framework._types import Role, ToolMode, FunctionCallContent
-async def test():
-    client = HuggingFaceChatClient()
-    # Round 1: Get tool call
-    messages_r1 = [
-        ChatMessage(role=Role.USER, text='Search for testosterone'),
-    ]
-    response_r1 = await client._inner_get_response(
-        messages=messages_r1,
-        chat_options=ChatOptions(tools=[search_pubmed], tool_choice=ToolMode.AUTO),
-    )
-    # Round 2: Include tool history (FAILS)
-    messages_r2 = [
-        ChatMessage(role=Role.USER, text='Search for testosterone'),
-        response_r1.messages[0],  # Assistant with tool call
-        ChatMessage(role=Role.TOOL, text='Found 10 papers...'),
-        ChatMessage(role=Role.USER, text='Now search for libido'),
-    ]
-    # This will throw BadRequestError
-    response_r2 = await client._inner_get_response(
-        messages=messages_r2,
-        chat_options=ChatOptions(tools=[search_pubmed], tool_choice=ToolMode.AUTO),
-    )
-asyncio.run(test())
 ```
-### BUG #2: Event Display
-```python
-import asyncio
-from src.orchestrators.advanced import AdvancedOrchestrator
-from agent_framework import MagenticAgentMessageEvent
-async def test():
-    orch = AdvancedOrchestrator(max_rounds=1)
-    async for event in orch._build_workflow().run_stream('Search for testosterone'):
-        if isinstance(event, MagenticAgentMessageEvent):
-            print(f"message.text = {event.message.text}")  # Shows repr string
-            break
-asyncio.run(test())
-```
-## Prior Fixes (Verified Working)
-The following fixes from the `fix/p0-aifunction-serialization` branch ARE working:
-1. **`_convert_tools()`**: Converts `AIFunction` objects to OpenAI-compatible JSON
-2. **`_parse_tool_calls()`**: Converts HF response tool calls to `FunctionCallContent`
-3. **Streaming accumulator**: Handles partial tool call deltas in streaming mode
-4. **Function invoking marker**: `__function_invoking_chat_client__ = True`
-These fixes solved the original P0 crash but revealed deeper issues.
-## Files Requiring Changes
-### Priority 1 (BUG #1)
 - `src/clients/huggingface.py`
-  - `_convert_messages()` - Add tool_calls and tool_call_id serialization
-### Priority 2 (BUG #2)
-- Investigation needed into `agent_framework` behavior
-- May require changes to `ChatResponse` structure
-- May require implementing `raw_representation` field
-## Risk Assessment
-| Risk | Mitigation |
-|------|------------|
-| Breaking existing OpenAI flow | Test with OpenAI after changes |
-| Framework incompatibility | Check agent_framework source/docs |
-| Regression in serialization | Add unit tests for all message types |
-## Timeline
-- **BUG #1** can likely be fixed in 1-2 hours with proper test coverage
-- **BUG #2** requires investigation of framework internals (unknown scope)
 ## References
 - [HuggingFace Chat Completion API - Tool Use](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion)
 - [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
-- Microsoft Agent Framework source code (internal)

 # P0 Bug: HuggingFace Free Tier Tool Calling Broken
 **Severity**: P0 (Critical) - Free Tier cannot perform multi-turn tool-based research
+**Status**: PARTIALLY RESOLVED - Bug #1 FIXED, Bug #2 requires upstream fix
 **Discovered**: 2025-12-01
 **Investigator**: Claude Code (Systematic First-Principles Analysis)
+**Last Updated**: 2025-12-01
 ## Executive Summary
+The HuggingFace Free Tier had two critical bugs preventing end-to-end tool-based research:
+1. **Bug #1 (FIXED)**: Conversation history serialization was missing `tool_calls` and `tool_call_id`
+2. **Bug #2 (UPSTREAM)**: Microsoft Agent Framework produces repr strings instead of message text
+## Current Status
+| Bug | Status | Location | Fix |
+|-----|--------|----------|-----|
+| #1 History Serialization | ✅ **FIXED** | `src/clients/huggingface.py` | Commit `809ad60` |
+| #2 Framework Repr Bug | ⏳ **UPSTREAM** | `agent_framework/_workflows/_magentic.py` | [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562) |
 ---
+## BUG #1: Conversation History Serialization ✅ FIXED
+### What Was Wrong
+`_convert_messages()` didn't serialize `tool_calls` (for assistant messages) or `tool_call_id` (for tool messages).
+### The Fix (Commit `809ad60`)
+Updated `_convert_messages()` in `src/clients/huggingface.py:71-121` to:
+1. Extract `FunctionCallContent` from `msg.contents` → `tool_calls` array
+2. Extract `FunctionResultContent` from `msg.contents` → `tool_call_id`
+3. Emit both fields in the OpenAI-compatible message format that the HuggingFace API expects
+### Verification
 ```python
+# Before fix: BadRequestError on multi-turn
+# After fix: Multi-turn conversations work
+# The message format is now correct:
+{
     "role": "assistant",
+    "content": "",
+    "tool_calls": [{"id": "call_123", "type": "function", "function": {...}}]
+}
 ```
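To make the fix above concrete, here is a minimal sketch of the serialization it describes. It does not reproduce the code in `src/clients/huggingface.py`. `Message`, `FunctionCall`, `FunctionResult` and `convert_messages` are hypothetical stand-ins for the framework's `ChatMessage`, `FunctionCallContent`, `FunctionResultContent` and the client's `_convert_messages()`, and their attribute names are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FunctionCall:  # hypothetical stand-in for FunctionCallContent
    call_id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class FunctionResult:  # hypothetical stand-in for FunctionResultContent
    call_id: str
    result: str


@dataclass
class Message:  # hypothetical stand-in for ChatMessage
    role: str
    text: str = ""
    contents: list[Any] = field(default_factory=list)


def convert_messages(messages: list[Message]) -> list[dict[str, Any]]:
    """Serialize messages to the OpenAI-style chat format, keeping tool metadata."""
    out: list[dict[str, Any]] = []
    for msg in messages:
        calls = [c for c in msg.contents if isinstance(c, FunctionCall)]
        results = [c for c in msg.contents if isinstance(c, FunctionResult)]
        if msg.role == "tool" and results:
            # One tool message per result, each tied back to its originating call.
            for r in results:
                out.append({"role": "tool", "content": r.result, "tool_call_id": r.call_id})
            continue
        entry: dict[str, Any] = {"role": msg.role, "content": msg.text}
        if calls:
            entry["tool_calls"] = [
                {
                    "id": c.call_id,
                    "type": "function",
                    # The OpenAI format carries arguments as a JSON-encoded string.
                    "function": {"name": c.name, "arguments": json.dumps(c.arguments)},
                }
                for c in calls
            ]
        out.append(entry)
    return out


history = [
    Message(role="user", text="Search for testosterone"),
    Message(
        role="assistant",
        contents=[FunctionCall("call_123", "search_pubmed", {"query": "testosterone"})],
    ),
    Message(role="tool", contents=[FunctionResult("call_123", "Found 10 papers...")]),
]
serialized = convert_messages(history)
```

The property that matters for multi-turn requests is that the `tool_call_id` on each tool message matches the `id` of an earlier assistant `tool_calls` entry.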
 ---
+## BUG #2: Framework Message Corruption ⏳ UPSTREAM
 ### Symptom
+`MagenticAgentMessageEvent.message.text` contains:
 ```
 '<agent_framework._types.ChatMessage object at 0x10c394210>'
 ```
+### Root Cause (CONFIRMED)
+**File**: `agent_framework/_workflows/_magentic.py` line ~1799
 ```python
+async def _invoke_agent(self, ctx, ...) -> ChatMessage:
+    # ...
+    if messages and len(messages) > 0:
+        last: ChatMessage = messages[-1]
+        text = last.text or str(last)  # <-- BUG: str(last) gives repr!
+        msg = ChatMessage(role=role, text=text, author_name=author)
 ```
+**Why it happens**:
+1. `ChatMessage.text` property only extracts `TextContent` items
+2. Tool-call-only messages have empty `.text` (returns `""`)
+3. `"" or str(last)` evaluates to `str(last)`
+4. `ChatMessage` defines no `__str__`, so `str(last)` falls back to Python's default object repr
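The four steps above can be reproduced without the framework. The sketch below uses hypothetical stand-in classes (`TextPart`, `ToolCallPart`, `Message`), not the real `agent_framework` types, and assumes only what the list states: `.text` joins text items, and the class defines no `__str__`.

```python
class TextPart:
    """Hypothetical stand-in for TextContent."""

    def __init__(self, text: str) -> None:
        self.text = text


class ToolCallPart:
    """Hypothetical stand-in for FunctionCallContent."""

    def __init__(self, name: str) -> None:
        self.name = name


class Message:
    """Hypothetical stand-in for ChatMessage; deliberately defines no __str__ or __repr__."""

    def __init__(self, contents: list) -> None:
        self.contents = contents

    @property
    def text(self) -> str:
        # Only text items contribute, so a tool-call-only message yields "".
        return "".join(c.text for c in self.contents if isinstance(c, TextPart))


tool_only = Message([ToolCallPart("search_pubmed")])

# Same expression as the framework excerpt: "" is falsy, so str() runs.
buggy = tool_only.text or str(tool_only)

# The minimal fix keeps the empty string instead of stringifying the object.
safe = tool_only.text or ""

print(buggy)       # a default object repr: "<...Message object at 0x...>"
print(repr(safe))  # an empty string
```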
+### Impact Assessment
+| Aspect | Impact | Critical? |
+|--------|--------|-----------|
+| UI Display | Shows garbage instead of agent output | YES for UX |
+| Logging | Can't debug what agents did | YES for debugging |
+| Tool Execution | Tools ARE being called (middleware works) | NO - Works |
+| Research Completion | Manager may not track progress properly | MAYBE - Unclear |
+**Observed behavior**: Research loops often reach max rounds without synthesis. The Manager keeps reporting "no progress" even though tools ARE being called. Possible causes:
+1. The repr bug affecting Manager's understanding
+2. Qwen 72B not handling tool message format well
+3. Unrelated orchestration issue
+### Upstream Issue Filed
+**GitHub Issue**: https://github.com/microsoft/agent-framework/issues/2562
+**Suggested fixes in issue**:
+1. **Minimal**: `text = last.text or ""`
+2. **Better UX**: Format tool calls for display
+3. **Best**: Add `__str__` to `ChatMessage` class
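As a rough illustration of how fixes 2 and 3 could combine, the sketch below gives a message class a `__str__` that returns its text when present and a short tool-call summary otherwise. The classes are hypothetical stand-ins, and the `[Tool: name]` format is an assumption borrowed from the workaround in this document, not something the upstream issue prescribes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TextPart:  # hypothetical stand-in for TextContent
    text: str


@dataclass
class ToolCallPart:  # hypothetical stand-in for FunctionCallContent
    name: str


@dataclass
class Message:  # hypothetical stand-in for ChatMessage
    contents: list[Any] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "".join(c.text for c in self.contents if isinstance(c, TextPart))

    def __str__(self) -> str:
        # Prefer real text; otherwise summarize tool calls for display.
        if self.text:
            return self.text
        calls = [f"[Tool: {c.name}]" for c in self.contents if isinstance(c, ToolCallPart)]
        return " ".join(calls)


tool_only = Message([ToolCallPart("search_pubmed")])
# With __str__ defined, the framework's existing expression becomes harmless.
display = tool_only.text or str(tool_only)
```

With this in place, the `last.text or str(last)` expression would not need to change, which may be why the issue ranks it as the best option.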
+### Workaround (Not Implemented)
+We COULD modify `_extract_text()` in `advanced.py` to extract tool call names from `.contents` when text is empty/repr:
 ```python
+def _extract_text(self, message: Any) -> str:
+    # ... existing logic ...
+    # Workaround: Extract tool call info when text is repr/empty
+    if hasattr(message, "contents") and message.contents:
+        tool_names = [
+            f"[Tool: {c.name}]"
+            for c in message.contents
+            if hasattr(c, "name")  # FunctionCallContent
+        ]
+        if tool_names:
+            return " ".join(tool_names)
+    return ""
 ```
+**Decision**: Not implementing until we confirm whether Bug #2 affects research completion or just display.
+---
+## Verification Matrix (Updated)
+| Component | Status | Notes |
+|-----------|--------|-------|
+| Tool Serialization | ✅ WORKS | `_convert_tools()` |
+| Tool Call Parsing | ✅ WORKS | `_parse_tool_calls()` |
+| History Serialization | ✅ **FIXED** | `_convert_messages()` |
+| Middleware Decorators | ✅ **FIXED** | `@use_function_invocation` etc. |
+| Event Display | ❌ UPSTREAM | Shows repr - framework bug |
+| End-to-End Research | ⚠️ UNCLEAR | Needs testing after upstream fix |
+---
+## Files Changed
+### Fixed (Commit `809ad60`)
 - `src/clients/huggingface.py`
+  - `_convert_messages()` - Now serializes `tool_calls` and `tool_call_id`
+  - Added `@use_function_invocation`, `@use_observability`, `@use_chat_middleware` decorators
+  - Added `__function_invoking_chat_client__ = True` marker
+### No Changes Needed
+- `src/orchestrators/advanced.py` - `_extract_text()` already filters repr strings
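For reference, a repr filter of the kind mentioned above could look like the sketch below. This is an assumption about the general approach, not the actual `_extract_text()` logic. `looks_like_repr` and `clean_text` are hypothetical helper names.

```python
import re

# Matches default object reprs such as
# "<agent_framework._types.ChatMessage object at 0x10c394210>".
_REPR_PATTERN = re.compile(r"^<[\w.]+ object at 0x[0-9a-fA-F]+>$")


def looks_like_repr(text: str) -> bool:
    """Return True when the whole string is a default Python object repr."""
    return bool(_REPR_PATTERN.match(text.strip()))


def clean_text(text: str) -> str:
    """Drop repr strings so they never reach the UI or logs."""
    return "" if looks_like_repr(text) else text
```

Anchoring the pattern at both ends keeps ordinary text that merely mentions an object repr from being discarded.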
+---
+## Related Upstream Issues
+| Issue | Title | Status | Relevance |
+|-------|-------|--------|-----------|
+| [#2562](https://github.com/microsoft/agent-framework/issues/2562) | Repr string bug (OUR ISSUE) | OPEN | Direct cause |
+| [#1366](https://github.com/microsoft/agent-framework/issues/1366) | Thread corruption - unexecuted tool calls | OPEN | Same area |
+| [#2410](https://github.com/microsoft/agent-framework/issues/2410) | OpenAI client splits content/tool_calls | OPEN | Related bug |
+---
+## Next Steps
+1. **Monitor**: Watch for response to [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)
+2. **Test**: Run end-to-end research tests to see if Bug #2 actually blocks completion
+3. **Optional**: Implement workaround in `_extract_text()` if display is critical
+4. **Contribute**: Consider submitting a PR to fix `_magentic.py` (line ~1799)
+---
 ## References
 - [HuggingFace Chat Completion API - Tool Use](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion)
 - [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
+- [Microsoft Agent Framework Repository](https://github.com/microsoft/agent-framework)
+- [Our Upstream Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)