VibecoderMcSwaggins committed on
Commit
f815b05
·
1 Parent(s): 809ad60

docs: Update P0 bug doc - Bug #1 fixed, Bug #2 upstream issue filed

- Bug #1 (History Serialization): FIXED in commit 809ad60
- Bug #2 (Repr String): Filed upstream issue microsoft/agent-framework#2562
- Added root cause analysis for _magentic.py line 1799
- Added workaround code (not implemented)
- Updated verification matrix and next steps

docs/bugs/P0_HUGGINGFACE_TOOL_CALLING_BROKEN.md CHANGED
@@ -1,267 +1,173 @@
1
  # P0 Bug: HuggingFace Free Tier Tool Calling Broken
2
 
3
  **Severity**: P0 (Critical) - Free Tier cannot perform multi-turn tool-based research
4
- **Status**: IN_PROGRESS - Root causes identified, fixes pending
5
  **Discovered**: 2025-12-01
6
  **Investigator**: Claude Code (Systematic First-Principles Analysis)
 
7
 
8
  ## Executive Summary
9
- The HuggingFace Free Tier fails to execute tools end-to-end. While the API calls themselves are valid, the **integration** with the Microsoft Agent Framework is missing a critical middleware component (`@use_function_invocation`), and the conversation history serialization is incomplete.
10
 
11
- ## Root Causes
12
 
13
- ### 1. Missing Tool Execution Middleware (The "Silent Failure")
14
- **Mechanism**:
15
- - The `OpenAIChatClient` uses the `@use_function_invocation` decorator, which creates an internal loop:
16
- 1. LLM proposes tools.
17
- 2. Middleware executes tools.
18
- 3. Middleware feeds results back to LLM.
19
- 4. LLM generates final answer.
20
- - The `HuggingFaceChatClient` **lacked this decorator**.
21
- - Result: The client returned raw tool calls to the `ChatAgent`. The `ChatAgent` passed them to the `MagenticAgentExecutor`.
22
- - **Cascade Failure**: The `MagenticAgentExecutor` (in the framework) has a bug/limitation where it handles tool-call-only messages by converting them to their string representation (`repr()`) because they lack text content. This led to the observed `<ChatMessage object ...>` corruption in the logs and history.
23
 
24
- ### 2. Framework Message Corruption (P1 - HIGH, External Bug)
25
- **Mechanism**:
26
- - When `MagenticAgentMessageEvent` (which carries agent responses) is generated by the `agent_framework`, the `ChatMessage` object it contains (specifically in `event.message` and its nested `TextContent`) often has its `.text` attribute populated with a Python object's `repr` string (e.g., `<agent_framework._types.ChatMessage object at 0x...>`) instead of the actual human-readable message.
27
- - DeepBoner's `_extract_text` method correctly identifies these `repr` strings and filters them out.
28
- - Result: The human-readable agent response is lost at the framework level before DeepBoner can process it for display, leading to empty or uninformative messages in the UI/logs (e.g., `searcher: ...`).
29
- **Impact**: Display/Logging only. Does not prevent tool execution or core logic, but severely degrades user experience and debugging visibility.
30
- **Root Cause**: This is an internal issue within the `agent_framework`'s event messaging mechanism, specifically how `ChatMessage` objects are constructed and passed through the `MagenticAgentMessageEvent`. DeepBoner cannot reliably recover the original message text when it has been replaced by a `repr` string by the framework itself.
31
- **Fix**: Requires an upstream fix or alternative message extraction strategy within the `agent_framework`. Until then, DeepBoner's UI/logs will display truncated or empty messages for these specific events.
32
 
33
- ## Solution Plan
34
-
35
- 1. **Fix History Serialization**: Update `_convert_messages` in `src/clients/huggingface.py` to correctly serialize `tool_calls` (Assistant role) and `tool_call_id` (Tool role) to the HuggingFace / OpenAI format.
36
- 2. **Enable Middleware**: Decorate `HuggingFaceChatClient` with `@use_function_invocation` (and `@use_chat_middleware`, `@use_observability` for parity).
37
- 3. **Display Fix**: Update `AdvancedOrchestrator._extract_text` to gracefully handle any remaining object representations, just in case.
38
-
39
- ## Verification
40
- - **Reproduction Script**: `reproduce_bugs.py` confirms the serialization failure.
41
- - **End-to-End Test**: `verify_p0_fix.py` (or similar) will be used to confirm the agent effectively uses tools and synthesizes an answer.
42
-
43
- ## Verified Findings
44
-
45
- ### What WORKS (Confirmed via Testing)
46
-
47
- 1. **Tool Serialization**: `_convert_tools()` correctly converts `AIFunction` → OpenAI JSON format ✅
- 2. **First API Call**: HuggingFace returns tool calls on the first request ✅
- 3. **Tool Call Parsing**: `_parse_tool_calls()` correctly extracts `FunctionCallContent` ✅
- 4. **Function Invoking Marker**: `__function_invoking_chat_client__ = True` is present ✅
- 5. **Original P0 (JSON serialization)**: Fixed - no longer crashes with TypeError ✅
52
-
53
- ### What is BROKEN (Root Causes)
54
 
55
  ---
56
 
57
- ## BUG #1: Conversation History Serialization (P0 - CRITICAL)
58
 
59
- ### Symptom
60
- Multi-turn conversations fail with `BadRequestError` from HuggingFace API.
61
 
62
- ### Root Cause
63
- `_convert_messages()` in `src/clients/huggingface.py` only extracts `role` and `content` from messages:
 
 
 
64
 
 
65
  ```python
- def _convert_messages(self, messages: MutableSequence[ChatMessage]) -> list[dict[str, Any]]:
-     hf_messages: list[dict[str, Any]] = []
-     for msg in messages:
-         content = msg.text or ""
-         # ... role extraction ...
-         hf_messages.append({"role": role_str, "content": content})  # MISSING tool_calls and tool_call_id!
-     return hf_messages
- ```
74
-
75
- ### What HuggingFace API Expects
76
 
77
- ```json
- [
-   {"role": "user", "content": "Search for testosterone"},
-   {
      "role": "assistant",
-     "content": null,
-     "tool_calls": [  // REQUIRED when assistant called a tool
-       {
-         "id": "call_123",
-         "type": "function",
-         "function": {"name": "search_pubmed", "arguments": "{\"query\": \"testosterone\"}"}
-       }
-     ]
-   },
-   {
-     "role": "tool",
-     "content": "Found 10 papers...",
-     "tool_call_id": "call_123"  // REQUIRED - must match the tool call id
-   }
- ]
  ```
98
 
99
- ### What We Send
100
-
101
- ```json
- [
-   {"role": "user", "content": "Search for testosterone"},
-   {"role": "assistant", "content": ""},  // MISSING tool_calls!
-   {"role": "tool", "content": "Found 10 papers..."}  // MISSING tool_call_id!
- ]
- ```
108
-
109
- ### Impact
110
- - First LLM call works (tools called)
111
- - Second LLM call fails (API rejects malformed history)
112
- - Research loop never completes
113
-
114
- ### Fix Required
115
- Update `_convert_messages()` to:
116
- 1. Extract `tool_calls` from `ChatMessage.contents` (list of `FunctionCallContent`)
117
- 2. Add `tool_call_id` to tool messages (requires tracking call IDs)
118
-
119
  ---
120
 
121
- ## BUG #2: Framework Message Corruption (P1 - HIGH)
122
 
123
  ### Symptom
124
- `MagenticAgentMessageEvent.message.text` contains the repr string of a ChatMessage object:
125
  ```
126
  '<agent_framework._types.ChatMessage object at 0x10c394210>'
127
  ```
128
 
129
- ### Verified Behavior
 
130
 
131
  ```python
132
- # From workflow event inspection:
133
- event.message.text = '<agent_framework._types.ChatMessage object at 0x...>'
134
- event.message.contents[0] = TextContent(text='<agent_framework._types.ChatMessage object at 0x..>')
 
 
 
135
  ```
136
 
137
- ### Root Cause Hypothesis
138
- Somewhere in the Microsoft Agent Framework's workflow orchestration, when converting tool call responses from our `HuggingFaceChatClient`, the framework is:
139
- 1. Taking our `ChatMessage` response
140
- 2. Calling `str()` on it (which gives repr)
141
- 3. Creating a NEW `ChatMessage` with the repr as text content
142
 
143
- This may be due to:
144
- - Missing or incompatible `raw_representation` field
145
- - Framework expecting a specific message structure we don't provide
146
- - Type coercion issue in the workflow layer
147
 
148
- ### Impact
149
- - UI shows `<ChatMessage object at 0x...>` instead of actual content
150
- - Users cannot see what the agent found/did
151
- - Debugging is difficult
 
 
152
 
153
- ### Fix Required
154
- Investigate `agent_framework`'s `ChatAgent` and `MagenticBuilder` to understand:
155
- 1. How they process `ChatResponse` from the client
156
- 2. What structure they expect in `raw_representation`
157
- 3. Whether there's a required serialization method we're not implementing
158
 
159
- ---
160
-
161
- ## Verification Matrix
162
 
163
- | Component | Status | Test Command |
- |-----------|--------|--------------|
- | Tool Serialization | ✅ WORKS | `client._convert_tools([search_pubmed])` |
- | First Tool Call | ✅ WORKS | Single-turn API call returns `FunctionCallContent` |
- | Multi-turn History | ❌ BROKEN | BadRequestError on second call |
- | Event Display | ❌ BROKEN | Shows repr instead of content |
- | End-to-End Research | ❌ BROKEN | Max rounds reached, no synthesis |
170
 
171
- ## Reproduction Steps
172
-
173
- ### BUG #1: History Serialization
174
 
175
  ```python
- import asyncio
- from src.clients.huggingface import HuggingFaceChatClient
- from src.agents.tools import search_pubmed
- from agent_framework import ChatMessage, ChatOptions
- from agent_framework._types import Role, ToolMode, FunctionCallContent
-
- async def test():
-     client = HuggingFaceChatClient()
-
-     # Round 1: Get tool call
-     messages_r1 = [
-         ChatMessage(role=Role.USER, text='Search for testosterone'),
-     ]
-     response_r1 = await client._inner_get_response(
-         messages=messages_r1,
-         chat_options=ChatOptions(tools=[search_pubmed], tool_choice=ToolMode.AUTO),
-     )
-
-     # Round 2: Include tool history (FAILS)
-     messages_r2 = [
-         ChatMessage(role=Role.USER, text='Search for testosterone'),
-         response_r1.messages[0],  # Assistant with tool call
-         ChatMessage(role=Role.TOOL, text='Found 10 papers...'),
-         ChatMessage(role=Role.USER, text='Now search for libido'),
-     ]
-
-     # This will throw BadRequestError
-     response_r2 = await client._inner_get_response(
-         messages=messages_r2,
-         chat_options=ChatOptions(tools=[search_pubmed], tool_choice=ToolMode.AUTO),
-     )
-
- asyncio.run(test())
  ```
210
 
211
- ### BUG #2: Event Display
212
 
213
- ```python
- import asyncio
- from src.orchestrators.advanced import AdvancedOrchestrator
- from agent_framework import MagenticAgentMessageEvent
-
- async def test():
-     orch = AdvancedOrchestrator(max_rounds=1)
-     async for event in orch._build_workflow().run_stream('Search for testosterone'):
-         if isinstance(event, MagenticAgentMessageEvent):
-             print(f"message.text = {event.message.text}")  # Shows repr string
-             break
-
- asyncio.run(test())
- ```
227
-
228
- ## Prior Fixes (Verified Working)
229
 
230
- The following fixes from the `fix/p0-aifunction-serialization` branch ARE working:
231
 
232
- 1. **`_convert_tools()`**: Converts `AIFunction` objects to OpenAI-compatible JSON
233
- 2. **`_parse_tool_calls()`**: Converts HF response tool calls to `FunctionCallContent`
234
- 3. **Streaming accumulator**: Handles partial tool call deltas in streaming mode
235
- 4. **Function invoking marker**: `__function_invoking_chat_client__ = True`
 
 
 
 
236
 
237
- These fixes solved the original P0 crash but revealed deeper issues.
238
 
239
- ## Files Requiring Changes
240
 
241
- ### Priority 1 (BUG #1)
242
  - `src/clients/huggingface.py`
243
- - `_convert_messages()` - Add tool_calls and tool_call_id serialization
 
 
 
 
 
244
 
245
- ### Priority 2 (BUG #2)
246
- - Investigation needed into `agent_framework` behavior
247
- - May require changes to `ChatResponse` structure
248
- - May require implementing `raw_representation` field
249
 
250
- ## Risk Assessment
251
 
252
- | Risk | Mitigation |
253
- |------|------------|
254
- | Breaking existing OpenAI flow | Test with OpenAI after changes |
255
- | Framework incompatibility | Check agent_framework source/docs |
256
- | Regression in serialization | Add unit tests for all message types |
257
 
258
- ## Timeline
259
 
260
- - **BUG #1** can likely be fixed in 1-2 hours with proper test coverage
261
- - **BUG #2** requires investigation of framework internals (unknown scope)
 
 
 
 
 
 
262
 
263
  ## References
264
 
265
  - [HuggingFace Chat Completion API - Tool Use](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion)
266
  - [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
267
- - Microsoft Agent Framework source code (internal)
 
 
1
  # P0 Bug: HuggingFace Free Tier Tool Calling Broken
2
 
3
  **Severity**: P0 (Critical) - Free Tier cannot perform multi-turn tool-based research
4
+ **Status**: PARTIALLY RESOLVED - Bug #1 FIXED, Bug #2 requires upstream fix
5
  **Discovered**: 2025-12-01
6
  **Investigator**: Claude Code (Systematic First-Principles Analysis)
7
+ **Last Updated**: 2025-12-01
8
 
9
  ## Executive Summary
 
10
 
11
+ The HuggingFace Free Tier had two critical bugs preventing end-to-end tool-based research:
12
 
13
+ 1. **Bug #1 (FIXED)**: Conversation history serialization missing `tool_calls` and `tool_call_id`
+ 2. **Bug #2 (UPSTREAM)**: Microsoft Agent Framework produces repr strings instead of message text
 
 
 
 
 
 
 
 
15
 
16
+ ## Current Status
 
 
 
 
 
 
 
17
 
18
+ | Bug | Status | Location | Fix |
+ |-----|--------|----------|-----|
+ | #1 History Serialization | ✅ **FIXED** | `src/clients/huggingface.py` | Commit `809ad60` |
+ | #2 Framework Repr Bug | ⏳ **UPSTREAM** | `agent_framework/_workflows/_magentic.py` | [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562) |
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
22
 
23
  ---
24
 
25
+ ## BUG #1: Conversation History Serialization ✅ FIXED
26
 
27
+ ### What Was Wrong
28
+ `_convert_messages()` didn't serialize `tool_calls` (for assistant messages) or `tool_call_id` (for tool messages).
29
 
30
+ ### The Fix (Commit `809ad60`)
31
+ Updated `_convert_messages()` in `src/clients/huggingface.py:71-121` to:
32
+ 1. Extract `FunctionCallContent` from `msg.contents` → `tool_calls` array
+ 2. Extract `FunctionResultContent` from `msg.contents` → `tool_call_id`
+ 3. Properly format for HuggingFace/OpenAI API
35
 
36
+ ### Verification
37
  ```python
+ # Before fix: BadRequestError on multi-turn
+ # After fix: Multi-turn conversations work
  
+ # The message format is now correct:
+ {
      "role": "assistant",
+     "content": "",
+     "tool_calls": [{"id": "call_123", "type": "function", "function": {...}}]
+ }
  ```
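
The serialization described under "The Fix" can be sketched roughly as follows. This is a minimal illustration and not the code from commit `809ad60`: `TextPart`, `ToolCallPart`, `ToolResultPart`, and `Msg` are hypothetical stand-ins for the framework's content and message types, and their attribute names (`call_id`, `name`, `arguments`, `result`) are assumptions.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class TextPart:  # hypothetical stand-in for a text content item
    text: str


@dataclass
class ToolCallPart:  # hypothetical stand-in for FunctionCallContent
    call_id: str
    name: str
    arguments: dict[str, Any]


@dataclass
class ToolResultPart:  # hypothetical stand-in for FunctionResultContent
    call_id: str
    result: Any


@dataclass
class Msg:  # hypothetical stand-in for ChatMessage
    role: str
    contents: list[Any] = field(default_factory=list)


def convert_messages(messages: list[Msg]) -> list[dict[str, Any]]:
    """Serialize messages into OpenAI-style chat dicts, keeping tool metadata."""
    out: list[dict[str, Any]] = []
    for msg in messages:
        text = "".join(p.text for p in msg.contents if isinstance(p, TextPart))
        calls = [p for p in msg.contents if isinstance(p, ToolCallPart)]
        results = [p for p in msg.contents if isinstance(p, ToolResultPart)]

        if results:
            # One "tool" message per result, each tied to its originating call id.
            for r in results:
                out.append({"role": "tool", "content": str(r.result), "tool_call_id": r.call_id})
            continue

        entry: dict[str, Any] = {"role": msg.role, "content": text}
        if calls:
            entry["tool_calls"] = [
                {
                    "id": c.call_id,
                    "type": "function",
                    # The OpenAI-style format carries arguments as a JSON-encoded string.
                    "function": {"name": c.name, "arguments": json.dumps(c.arguments)},
                }
                for c in calls
            ]
        out.append(entry)
    return out


history = [
    Msg("user", [TextPart("Search for testosterone")]),
    Msg("assistant", [ToolCallPart("call_123", "search_pubmed", {"query": "testosterone"})]),
    Msg("tool", [ToolResultPart("call_123", "Found 10 papers...")]),
]
converted = convert_messages(history)
```

The key property is that every `tool` message carries the `tool_call_id` of the assistant `tool_calls` entry it answers, which is the pairing the API rejected as missing before the fix.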
48
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
49
  ---
50
 
51
+ ## BUG #2: Framework Message Corruption ⏳ UPSTREAM
52
 
53
  ### Symptom
54
+ `MagenticAgentMessageEvent.message.text` contains:
55
  ```
56
  '<agent_framework._types.ChatMessage object at 0x10c394210>'
57
  ```
58
 
59
+ ### Root Cause (CONFIRMED)
60
+ **File**: `agent_framework/_workflows/_magentic.py` line ~1799
61
 
62
  ```python
+ async def _invoke_agent(self, ctx, ...) -> ChatMessage:
+     # ...
+     if messages and len(messages) > 0:
+         last: ChatMessage = messages[-1]
+         text = last.text or str(last)  # <-- BUG: str(last) gives repr!
+         msg = ChatMessage(role=role, text=text, author_name=author)
  ```
70
 
71
+ **Why it happens**:
72
+ 1. The `ChatMessage.text` property only extracts `TextContent` items
+ 2. Tool-call-only messages therefore have an empty `.text` (returns `""`)
+ 3. `"" or str(last)` evaluates to `str(last)` because an empty string is falsy
+ 4. `ChatMessage` has no `__str__` method, so `str(last)` falls back to the default Python repr
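
The fallback chain above can be reproduced in isolation. The sketch below uses a hypothetical `Message` class as a stand-in for `ChatMessage`; it only assumes that the real class exposes `.text` and defines neither `__str__` nor `__repr__`.

```python
class Message:
    """Hypothetical stand-in: exposes .text but defines no __str__ or __repr__."""

    def __init__(self, text: str = "") -> None:
        self.text = text


tool_call_only = Message(text="")  # no text content, like a tool-call-only message

# Buggy pattern: an empty string is falsy, so `or` falls through to str(obj),
# which uses the default object repr.
buggy = tool_call_only.text or str(tool_call_only)

# Falling back to an empty string avoids leaking the repr into the message text.
fixed = tool_call_only.text or ""

# Messages that do carry text are unaffected by either variant.
with_text = Message(text="Found 10 papers...")
unchanged = with_text.text or str(with_text)

print(buggy)  # a default repr of the form "<...Message object at 0x...>"
```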
76
 
77
+ ### Impact Assessment
 
 
 
78
 
79
+ | Aspect | Impact | Critical? |
+ |--------|--------|-----------|
+ | UI Display | Shows garbage instead of agent output | YES for UX |
+ | Logging | Can't debug what agents did | YES for debugging |
+ | Tool Execution | Tools ARE being called (middleware works) | NO - Works |
+ | Research Completion | Manager may not track progress properly | MAYBE - Unclear |
85
 
86
+ **Observed behavior**: Research loops often reach max rounds without synthesis. The Manager keeps saying "no progress" even though tools ARE being called. This COULD be:
87
+ 1. The repr bug affecting Manager's understanding
+ 2. Qwen 72B not handling tool message format well
+ 3. Unrelated orchestration issue
 
90
 
91
+ ### Upstream Issue Filed
92
+ **GitHub Issue**: https://github.com/microsoft/agent-framework/issues/2562
 
93
 
94
+ **Suggested fixes in issue**:
95
+ 1. **Minimal**: `text = last.text or ""`
+ 2. **Better UX**: Format tool calls for display
+ 3. **Best**: Add `__str__` to `ChatMessage` class
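
To illustrate how the second and third suggestions could combine, here is a rough sketch of a message type whose `__str__` prefers real text and otherwise summarizes tool calls. `Message` and `ToolCall` are hypothetical stand-ins, not the framework's classes, and the `[Tool: name]` format is only an assumed display convention.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:  # hypothetical stand-in for a function-call content item
    name: str


@dataclass(repr=False)
class Message:  # hypothetical stand-in for ChatMessage
    text: str = ""
    tool_calls: list[ToolCall] = field(default_factory=list)

    def __str__(self) -> str:
        # Prefer real text; otherwise summarize tool calls instead of a raw repr.
        if self.text:
            return self.text
        return " ".join(f"[Tool: {c.name}]" for c in self.tool_calls)


msg = Message(tool_calls=[ToolCall("search_pubmed")])
display = msg.text or str(msg)  # the existing `or str(...)` pattern now yields readable text
```

With a `__str__` like this, the `last.text or str(last)` expression would no longer need to change to produce readable output for tool-call-only messages.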
 
 
 
98
 
99
+ ### Workaround (Not Implemented)
100
+ We COULD modify `_extract_text()` in `advanced.py` to extract tool call names from `.contents` when text is empty/repr:
 
101
 
102
  ```python
+ def _extract_text(self, message: Any) -> str:
+     # ... existing logic ...
+
+     # Workaround: Extract tool call info when text is repr/empty
+     if hasattr(message, "contents") and message.contents:
+         tool_names = [
+             f"[Tool: {c.name}]"
+             for c in message.contents
+             if hasattr(c, "name")  # FunctionCallContent
+         ]
+         if tool_names:
+             return " ".join(tool_names)
+
+     return ""
  ```
118
 
119
+ **Decision**: Not implementing until we confirm whether Bug #2 affects research completion or just display.
120
 
121
+ ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
122
 
123
+ ## Verification Matrix (Updated)
124
 
125
+ | Component | Status | Notes |
+ |-----------|--------|-------|
+ | Tool Serialization | ✅ WORKS | `_convert_tools()` |
+ | Tool Call Parsing | ✅ WORKS | `_parse_tool_calls()` |
+ | History Serialization | ✅ **FIXED** | `_convert_messages()` |
+ | Middleware Decorators | ✅ **FIXED** | `@use_function_invocation` etc. |
+ | Event Display | ❌ UPSTREAM | Shows repr - framework bug |
+ | End-to-End Research | ⚠️ UNCLEAR | Needs testing after upstream fix |
133
 
134
+ ---
135
 
136
+ ## Files Changed
137
 
138
+ ### Fixed (Commit `809ad60`)
139
  - `src/clients/huggingface.py`
+   - `_convert_messages()` - Now serializes `tool_calls` and `tool_call_id`
+   - Added `@use_function_invocation`, `@use_observability`, `@use_chat_middleware` decorators
+   - Added `__function_invoking_chat_client__ = True` marker
143
+
144
+ ### No Changes Needed
145
+ - `src/orchestrators/advanced.py` - `_extract_text()` already filters repr strings
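
For readers unfamiliar with what filtering repr strings involves, one possible approach is a pattern match on the default object repr shape. This is an assumed sketch, not the actual `_extract_text()` implementation; `is_default_repr` and `clean_text` are hypothetical helper names.

```python
import re

# Matches default object reprs such as "<pkg.module.ClassName object at 0x10c394210>".
_DEFAULT_REPR = re.compile(r"^<[\w.]+ object at 0x[0-9a-fA-F]+>$")


def is_default_repr(text: str) -> bool:
    """Return True when text looks like Python's default object repr."""
    return bool(_DEFAULT_REPR.match(text.strip()))


def clean_text(text: str) -> str:
    """Drop repr-shaped strings and pass real message text through unchanged."""
    return "" if is_default_repr(text) else text


leaked = "<agent_framework._types.ChatMessage object at 0x10c394210>"
print(repr(clean_text(leaked)))
print(repr(clean_text("Found 10 papers...")))
```

A filter like this can only hide the corrupted text. It cannot recover the original content, which is why the display problem still depends on the upstream fix.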
146
 
147
+ ---
 
 
 
148
 
149
+ ## Related Upstream Issues
150
 
151
+ | Issue | Title | Status | Relevance |
+ |-------|-------|--------|-----------|
+ | [#2562](https://github.com/microsoft/agent-framework/issues/2562) | Repr string bug (OUR ISSUE) | OPEN | Direct cause |
+ | [#1366](https://github.com/microsoft/agent-framework/issues/1366) | Thread corruption - unexecuted tool calls | OPEN | Same area |
+ | [#2410](https://github.com/microsoft/agent-framework/issues/2410) | OpenAI client splits content/tool_calls | OPEN | Related bug |
156
 
157
+ ---
158
 
159
+ ## Next Steps
160
+
161
+ 1. **Monitor**: Watch for response to [Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)
+ 2. **Test**: Run end-to-end research tests to see if Bug #2 actually blocks completion
+ 3. **Optional**: Implement workaround in `_extract_text()` if display is critical
+ 4. **Contribute**: Consider submitting PR to fix `_magentic.py` line 1799
165
+
166
+ ---
167
 
168
  ## References
169
 
170
  - [HuggingFace Chat Completion API - Tool Use](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion)
171
  - [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
172
+ - [Microsoft Agent Framework Repository](https://github.com/microsoft/agent-framework)
173
+ - [Our Upstream Issue #2562](https://github.com/microsoft/agent-framework/issues/2562)