fix(huggingface): P1 Free Tier tool execution - Remove premature marker (#121)
Browse files## Summary
Fixes P1 bug where Free Tier tool calls were never executed because `@use_function_invocation` decorator was skipped.
## Root Cause
`HuggingFaceChatClient` had `__function_invoking_chat_client__ = True` in class body, causing decorator early return.
## Changes
- Remove premature marker from `src/clients/huggingface.py`
- Add `docs/architecture/system_registry.md` as canonical SSOT for wiring
- Document P1 root cause analysis
- Address all CodeRabbit review findings
## Impact
- Free Tier tool execution now works correctly
- P2 7B garbage output superseded (was symptom, not cause)
- docs/architecture/system_registry.md +137 -0
- docs/bugs/ACTIVE_BUGS.md +3 -40
- docs/bugs/P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md +319 -0
- docs/bugs/P2_7B_MODEL_GARBAGE_OUTPUT.md +47 -5
- docs/bugs/{P1_GRADIO_EXAMPLE_CLICK_AUTO_SUBMIT.md β archive/P1_GRADIO_EXAMPLE_CLICK_AUTO_SUBMIT.md} +1 -1
- src/clients/huggingface.py +0 -4
docs/architecture/system_registry.md
ADDED
|
@@ -0,0 +1,137 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# System Registry & Wiring Architecture
|
| 2 |
+
**Status**: Active / Canonical
|
| 3 |
+
**Last Updated**: 2025-12-03
|
| 4 |
+
|
| 5 |
+
This document serves as the **Source of Truth** for the architectural wiring of the agent framework. It defines the strict rules for decorators, protocol markers, and the tool registry to prevent regression and ensure correct system behavior.
|
| 6 |
+
|
| 7 |
+
---
|
| 8 |
+
|
| 9 |
+
## 1. Decorator Registry
|
| 10 |
+
|
| 11 |
+
The agent framework relies on a strict decorator stack to inject functionality into `ChatClient` implementations. The **order of application** is critical for correct behavior.
|
| 12 |
+
|
| 13 |
+
### Standard Stack (Bottom-Up Order)
|
| 14 |
+
|
| 15 |
+
| Order | Decorator | Purpose | Source | Critical Notes |
|
| 16 |
+
|:--|:---|:---|:---|:---|
|
| 17 |
+
| **1 (Inner)** | `@use_chat_middleware` | Handles request/response middleware processing (e.g. logging, filtering). | `agent_framework._middleware` | Must be closest to the class. |
|
| 18 |
+
| **2** | `@use_observability` | Injects tracing and metrics (OpenTelemetry/logging). | `agent_framework.observability` | Wraps the middleware-enhanced client. |
|
| 19 |
+
| **3 (Outer)** | `@use_function_invocation` | **CRITICAL**: Intercepts `FunctionCallContent` in responses, **executes the Python function**, and recursively calls the LLM with the result. | `agent_framework._tools` | **MUST NOT** be used if `__function_invoking_chat_client__ = True` is set (see Markers). |
|
| 20 |
+
|
| 21 |
+
### Correct Usage Example
|
| 22 |
+
|
| 23 |
+
```python
|
| 24 |
+
@use_function_invocation # <--- 3. Handles tool execution loop
|
| 25 |
+
@use_observability # <--- 2. Adds tracing
|
| 26 |
+
@use_chat_middleware # <--- 1. Adds middleware support
|
| 27 |
+
class MyChatClient(BaseChatClient):
|
| 28 |
+
...
|
| 29 |
+
```
|
| 30 |
+
|
| 31 |
+
---
|
| 32 |
+
|
| 33 |
+
## 2. Protocol Markers
|
| 34 |
+
|
| 35 |
+
Special class attributes (dunder methods/variables) that control framework behavior.
|
| 36 |
+
|
| 37 |
+
| Marker | Value | Purpose | Set By | Read By | Impact of Misuse |
|
| 38 |
+
|:---|:---|:---|:---|:---|:---|
|
| 39 |
+
| `__function_invoking_chat_client__` | `bool` | Signals that this client **natively handles** the tool execution loop internally. | `ChatClient` Class Body | `@use_function_invocation` | **CRITICAL BUG**: If set to `True` but the client *doesn't* execute tools, tool calls will be generated by the LLM but **never executed**. The agent will hang or hallucinate results. |
|
| 40 |
+
|
| 41 |
+
### Wiring Rules
|
| 42 |
+
* **Default Clients (OpenAI/HuggingFace):** Should generally **NOT** set this marker. Rely on `@use_function_invocation` to handle execution.
|
| 43 |
+
* **Special Clients:** Only set to `True` if you are implementing a custom loop that executes tools and feeds results back without the framework's help.
|
| 44 |
+
|
| 45 |
+
### Setting Responsibility
|
| 46 |
+
* **Default:** Do not set `__function_invoking_chat_client__` in the class body. The `@use_function_invocation` decorator sets it automatically after wrapping.
|
| 47 |
+
* **Custom Loop:** Only set to `True` if you have implemented a custom tool execution loop that does not rely on the framework's decorator.
|
| 48 |
+
|
| 49 |
+
---
|
| 50 |
+
|
| 51 |
+
## 3. Tool Inventory
|
| 52 |
+
|
| 53 |
+
### 3.1 AI Functions (Agent-Callable Tools)
|
| 54 |
+
|
| 55 |
+
These are the `@ai_function` decorated functions that agents can invoke. The framework executes these via `@use_function_invocation`.
|
| 56 |
+
|
| 57 |
+
| Function Name | File Path | Description |
|
| 58 |
+
|:---|:---|:---|
|
| 59 |
+
| `search_pubmed` | `src/agents/tools.py:21` | Searches PubMed for biomedical literature |
|
| 60 |
+
| `search_clinical_trials` | `src/agents/tools.py:81` | Searches ClinicalTrials.gov for clinical studies |
|
| 61 |
+
| `search_preprints` | `src/agents/tools.py:121` | Searches Europe PMC for preprints and papers |
|
| 62 |
+
| `get_bibliography` | `src/agents/tools.py:161` | Returns collected references for final report |
|
| 63 |
+
| `execute_python_code` | `src/agents/code_executor_agent.py:16` | Executes Python code in Modal sandbox |
|
| 64 |
+
| `search_web` | `src/agents/retrieval_agent.py:17` | Searches the web for additional context |
|
| 65 |
+
|
| 66 |
+
### 3.2 Tool Classes (Internal Wrappers)
|
| 67 |
+
|
| 68 |
+
These are **internal implementation wrappers** used by the AI Functions. They are NOT directly callable by agents.
|
| 69 |
+
|
| 70 |
+
| Class | File Path | Used By |
|
| 71 |
+
|:---|:---|:---|
|
| 72 |
+
| `PubMedTool` | `src/tools/pubmed.py` | `search_pubmed` |
|
| 73 |
+
| `ClinicalTrialsTool` | `src/tools/clinicaltrials.py` | `search_clinical_trials` |
|
| 74 |
+
| `EuropePMCTool` | `src/tools/europepmc.py` | `search_preprints` |
|
| 75 |
+
| `ModalCodeExecutor` | `src/tools/code_execution.py:44` | `execute_python_code` (via `get_code_executor()`) |
|
| 76 |
+
| `OpenAlexTool` | `src/tools/openalex.py` | (Reserved for future use) |
|
| 77 |
+
| `WebSearchTool` | `src/tools/web_search.py` | `search_web` |
|
| 78 |
+
| `SearchHandler` | `src/tools/search_handler.py` | Orchestrates parallel searches |
|
| 79 |
+
|
| 80 |
+
---
|
| 81 |
+
|
| 82 |
+
## 4. Client Implementation Guide
|
| 83 |
+
|
| 84 |
+
When adding a new LLM provider, follow this strict pattern:
|
| 85 |
+
|
| 86 |
+
### A. The "Native Execution" Fallacy
|
| 87 |
+
Do not assume that because an API supports "function calling" (parsing JSON), the client supports "function execution" (running Python code).
|
| 88 |
+
* **Function Calling:** LLM -> JSON (Client responsibility)
|
| 89 |
+
* **Function Execution:** JSON -> Python Result -> LLM (Framework responsibility via `@use_function_invocation`)
|
| 90 |
+
|
| 91 |
+
### B. Reference Implementation
|
| 92 |
+
|
| 93 |
+
```python
|
| 94 |
+
from agent_framework import BaseChatClient
|
| 95 |
+
from agent_framework._tools import use_function_invocation
|
| 96 |
+
from agent_framework.observability import use_observability
|
| 97 |
+
from agent_framework._middleware import use_chat_middleware
|
| 98 |
+
|
| 99 |
+
# 1. Apply decorators in this EXACT order
|
| 100 |
+
@use_function_invocation
|
| 101 |
+
@use_observability
|
| 102 |
+
@use_chat_middleware
|
| 103 |
+
class NewProviderChatClient(BaseChatClient):
|
| 104 |
+
|
| 105 |
+
# 2. DO NOT set this unless you know what you are doing
|
| 106 |
+
# __function_invoking_chat_client__ = True <-- DELETE THIS
|
| 107 |
+
|
| 108 |
+
async def _inner_get_response(self, ...):
|
| 109 |
+
# 3. Parse API response -> FunctionCallContent
|
| 110 |
+
# 4. Return ChatResponse with contents=[FunctionCallContent(...)]
|
| 111 |
+
pass
|
| 112 |
+
|
| 113 |
+
async def _inner_get_streaming_response(self, ...):
|
| 114 |
+
# 5. Yield FunctionCallContent when tool calls are detected
|
| 115 |
+
pass
|
| 116 |
+
```
|
| 117 |
+
|
| 118 |
+
---
|
| 119 |
+
|
| 120 |
+
## 5. Known Issues & Gotchas
|
| 121 |
+
|
| 122 |
+
* **~~P1 Bug - Premature Marker Setting~~ (FIXED):** The `HuggingFaceChatClient` previously set `__function_invoking_chat_client__ = True` in the class body, which caused `@use_function_invocation` to skip wrapping. **Resolution:** Marker removed; decorator now sets it correctly. See `docs/bugs/P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md`.
|
| 123 |
+
* **HuggingFace Provider Routing:** Qwen2.5-7B-Instruct routes to Together.ai (not native HF). Tool call parsing may be inconsistent with complex multi-agent prompts.
|
| 124 |
+
* **Model Hallucination:** If tool execution fails (due to incorrect wiring), models like Qwen2.5-7B will often **hallucinate** fake tool results as text. Always verify `AgentRunResponse` contains actual `FunctionResultContent`.
|
| 125 |
+
|
| 126 |
+
---
|
| 127 |
+
|
| 128 |
+
## 6. Verification Checklist
|
| 129 |
+
|
| 130 |
+
When adding or modifying a ChatClient:
|
| 131 |
+
|
| 132 |
+
- [ ] Decorators applied in correct order: `@use_function_invocation` β `@use_observability` β `@use_chat_middleware`
|
| 133 |
+
- [ ] `__function_invoking_chat_client__` is NOT set in class body (unless implementing custom execution loop)
|
| 134 |
+
- [ ] Verify `@use_function_invocation` decorator actually wraps methods (check `__wrapped__` attribute at runtime)
|
| 135 |
+
- [ ] Tool calls parsed into `FunctionCallContent` objects
|
| 136 |
+
- [ ] Streaming yields `FunctionCallContent` at end of stream
|
| 137 |
+
- [ ] Run `make check` to verify all tests pass
|
docs/bugs/ACTIVE_BUGS.md
CHANGED
|
@@ -9,46 +9,6 @@
|
|
| 9 |
|
| 10 |
## Currently Active Bugs
|
| 11 |
|
| 12 |
-
### P1 - Gradio Example Click Auto-Submits Instead of Loading
|
| 13 |
-
|
| 14 |
-
**File:** `docs/bugs/P1_GRADIO_EXAMPLE_CLICK_AUTO_SUBMIT.md`
|
| 15 |
-
**Status:** OPEN - Simple Fix Available
|
| 16 |
-
|
| 17 |
-
**Problem:** Clicking on example questions immediately starts the research agent instead of loading the text into the input field. This breaks the BYOK (Bring Your Own Key) flow because:
|
| 18 |
-
1. User clicks example β chat starts with Free Tier
|
| 19 |
-
2. User then tries to enter API key β already too late
|
| 20 |
-
3. Session state becomes confused
|
| 21 |
-
|
| 22 |
-
**Root Cause:**
|
| 23 |
-
1. Missing `run_examples_on_click=False` in ChatInterface
|
| 24 |
-
2. HuggingFace Spaces defaults `cache_examples=True`, which overrides `run_examples_on_click`
|
| 25 |
-
3. Examples pass `None` for api_key, overwriting user settings
|
| 26 |
-
|
| 27 |
-
**Fix:** Add two parameters to `gr.ChatInterface()` in `src/app.py`:
|
| 28 |
-
```python
|
| 29 |
-
cache_examples=False,
|
| 30 |
-
run_examples_on_click=False,
|
| 31 |
-
```
|
| 32 |
-
|
| 33 |
-
---
|
| 34 |
-
|
| 35 |
-
### P2 - 7B Model Produces Garbage Streaming Output
|
| 36 |
-
|
| 37 |
-
**File:** `docs/bugs/P2_7B_MODEL_GARBAGE_OUTPUT.md`
|
| 38 |
-
**Status:** OPEN - Investigating
|
| 39 |
-
|
| 40 |
-
**Problem:** When running Free Tier (Qwen2.5-7B-Instruct), the streaming output shows garbage tokens like "yarg", "PostalCodes", "FunctionFlags" instead of coherent agent reasoning.
|
| 41 |
-
|
| 42 |
-
**Root Cause:** The 7B model has insufficient reasoning capacity for the complex multi-agent framework prompts.
|
| 43 |
-
|
| 44 |
-
**Potential Fixes:**
|
| 45 |
-
1. Switch to a better small model (Mistral-7B, Phi-3, Gemma-2-9B, Qwen2.5-14B)
|
| 46 |
-
2. Simplify Free Tier architecture to single-agent mode
|
| 47 |
-
3. Add output filtering/validation
|
| 48 |
-
4. Prompt engineering specifically for 7B models
|
| 49 |
-
|
| 50 |
-
---
|
| 51 |
-
|
| 52 |
### P3 - Progress Bar Positioning in ChatInterface
|
| 53 |
|
| 54 |
**File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
|
|
@@ -86,6 +46,8 @@ All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
|
|
| 86 |
- **P0 Advanced Mode Timeout No Synthesis** - FIXED, actual synthesis on timeout
|
| 87 |
|
| 88 |
### P1 Bugs (All FIXED)
|
|
|
|
|
|
|
| 89 |
- **P1 HuggingFace Router 401 Hyperbolic** - FIXED, invalid token was root cause
|
| 90 |
- **P1 HuggingFace Novita 500 Error** - SUPERSEDED, switched to 7B model
|
| 91 |
- **P1 Advanced Mode Uninterpretable Chain-of-Thought** - FIXED in PR #107
|
|
@@ -93,6 +55,7 @@ All resolved bugs have been moved to `docs/bugs/archive/`. Summary:
|
|
| 93 |
- **P1 Simple Mode Removed Breaks Free Tier UX** - FIXED via Accumulator Pattern (PR #117)
|
| 94 |
|
| 95 |
### P2 Bugs (All FIXED)
|
|
|
|
| 96 |
- **P2 Advanced Mode Cold Start No Feedback** - FIXED, all phases complete
|
| 97 |
- **P2 Architectural BYOK Gaps** - FIXED, end-to-end BYOK support in PR #119
|
| 98 |
|
|
|
|
| 9 |
|
| 10 |
## Currently Active Bugs
|
| 11 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 12 |
### P3 - Progress Bar Positioning in ChatInterface
|
| 13 |
|
| 14 |
**File:** `docs/bugs/P3_PROGRESS_BAR_POSITIONING.md`
|
|
|
|
| 46 |
- **P0 Advanced Mode Timeout No Synthesis** - FIXED, actual synthesis on timeout
|
| 47 |
|
| 48 |
### P1 Bugs (All FIXED)
|
| 49 |
+
- **P1 Free Tier Tool Execution Failure** - FIXED in PR fix/P1-free-tier-tool-execution, removed premature marker
|
| 50 |
+
- **P1 Gradio Example Click Auto-Submits** - FIXED in PR #120, prevents auto-submit on example click
|
| 51 |
- **P1 HuggingFace Router 401 Hyperbolic** - FIXED, invalid token was root cause
|
| 52 |
- **P1 HuggingFace Novita 500 Error** - SUPERSEDED, switched to 7B model
|
| 53 |
- **P1 Advanced Mode Uninterpretable Chain-of-Thought** - FIXED in PR #107
|
|
|
|
| 55 |
- **P1 Simple Mode Removed Breaks Free Tier UX** - FIXED via Accumulator Pattern (PR #117)
|
| 56 |
|
| 57 |
### P2 Bugs (All FIXED)
|
| 58 |
+
- **P2 7B Model Garbage Output** - SUPERSEDED by P1 Free Tier fix (root cause was premature marker, not model capacity)
|
| 59 |
- **P2 Advanced Mode Cold Start No Feedback** - FIXED, all phases complete
|
| 60 |
- **P2 Architectural BYOK Gaps** - FIXED, end-to-end BYOK support in PR #119
|
| 61 |
|
docs/bugs/P1_FREE_TIER_TOOL_EXECUTION_FAILURE.md
ADDED
|
@@ -0,0 +1,319 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# P1 Bug: Free Tier Tool Execution Failure
|
| 2 |
+
|
| 3 |
+
**Date**: 2025-12-03
|
| 4 |
+
**Status**: FIXED (PR fix/P1-free-tier-tool-execution)
|
| 5 |
+
**Severity**: P1 (Critical - Free Tier Completely Broken)
|
| 6 |
+
**Component**: HuggingFaceChatClient + Together.ai Routing + Tool Calling
|
| 7 |
+
**Resolution**: Removed premature `__function_invoking_chat_client__ = True` marker from class body
|
| 8 |
+
|
| 9 |
+
---
|
| 10 |
+
|
| 11 |
+
## Executive Summary
|
| 12 |
+
|
| 13 |
+
The Free Tier (HuggingFace) is fundamentally broken due to **multiple interacting issues** that cause tool calls to fail, resulting in garbage output, hallucinated results, and raw JSON appearing in the UI.
|
| 14 |
+
|
| 15 |
+
**This is NOT a simple 7B model issue** - it's a chain of infrastructure and code problems.
|
| 16 |
+
|
| 17 |
+
---
|
| 18 |
+
|
| 19 |
+
## Symptoms
|
| 20 |
+
|
| 21 |
+
Users on Free Tier see:
|
| 22 |
+
|
| 23 |
+
1. **Garbage tokens**: "oleon", "UrlParser", "MemoryWarning", "PostalCodes"
|
| 24 |
+
2. **Raw tool call XML tags**: `<tool_call>`, `</tool_call>` appearing as text
|
| 25 |
+
3. **Raw JSON tool calls**: `{"name": "search_pubmed", "arguments": {...}}`
|
| 26 |
+
4. **Hallucinated tool results**: Fake JSON responses that were never returned by actual tools:
|
| 27 |
+
```json
|
| 28 |
+
{"response": "[{'title': 'Effect of Flibanserin...', ...}]"}
|
| 29 |
+
```
|
| 30 |
+
5. **No actual database searches**: PubMed, ClinicalTrials.gov never queried
|
| 31 |
+
|
| 32 |
+
---
|
| 33 |
+
|
| 34 |
+
## Root Cause Analysis
|
| 35 |
+
|
| 36 |
+
### Cause 1: Model Routed to Third-Party Provider (Together.ai)
|
| 37 |
+
|
| 38 |
+
**Discovery**: Qwen2.5-7B-Instruct is NOT served by native HuggingFace infrastructure.
|
| 39 |
+
|
| 40 |
+
```python
|
| 41 |
+
# API response from HuggingFace:
|
| 42 |
+
{
|
| 43 |
+
"inferenceProviderMapping": {
|
| 44 |
+
"together": {
|
| 45 |
+
"status": "live",
|
| 46 |
+
"providerId": "Qwen/Qwen2.5-7B-Instruct-Turbo" # <-- TURBO variant!
|
| 47 |
+
},
|
| 48 |
+
"featherless-ai": {
|
| 49 |
+
"status": "live",
|
| 50 |
+
"providerId": "Qwen/Qwen2.5-7B-Instruct"
|
| 51 |
+
}
|
| 52 |
+
}
|
| 53 |
+
}
|
| 54 |
+
```
|
| 55 |
+
|
| 56 |
+
**Impact**:
|
| 57 |
+
- Native HF-inference returns 404 for this model
|
| 58 |
+
- All requests route through Together.ai
|
| 59 |
+
- Together serves a "Turbo" variant, not the original
|
| 60 |
+
- We cannot control how Together handles tool calling
|
| 61 |
+
|
| 62 |
+
### Cause 2: Qwen2.5 Uses XML-Style Tool Calling Format
|
| 63 |
+
|
| 64 |
+
**Discovery**: The model's chat template instructs it to output tool calls in XML format:
|
| 65 |
+
|
| 66 |
+
```jinja
|
| 67 |
+
For each function call, return a json object with function name and arguments
|
| 68 |
+
within <tool_call></tool_call> XML tags:
|
| 69 |
+
<tool_call>
|
| 70 |
+
{"name": <function-name>, "arguments": <args-json-object>}
|
| 71 |
+
</tool_call>
|
| 72 |
+
```
|
| 73 |
+
|
| 74 |
+
**Impact**:
|
| 75 |
+
- Model outputs `<tool_call>{"name":...}</tool_call>` as **text**
|
| 76 |
+
- This text appears in `delta.content` (not `delta.tool_calls`)
|
| 77 |
+
- Our streaming code yields this as visible text to the UI
|
| 78 |
+
- When tool calling works correctly, the API parses this internally
|
| 79 |
+
- When it fails, raw XML appears in output
|
| 80 |
+
|
| 81 |
+
### Cause 3: Together.ai Turbo Inconsistent Tool Call Parsing
|
| 82 |
+
|
| 83 |
+
**Discovery**: Together's serving of the Turbo model has inconsistent behavior:
|
| 84 |
+
|
| 85 |
+
| Test Scenario | Tool Call Behavior |
|
| 86 |
+
|---------------|-------------------|
|
| 87 |
+
| Simple query, single tool | β
Parsed correctly to `tool_calls` |
|
| 88 |
+
| Complex multi-agent prompt | β Mixed: some parsed, some as text |
|
| 89 |
+
| Multi-turn with tool results | β Model hallucinates fake results |
|
| 90 |
+
|
| 91 |
+
**Evidence from testing**:
|
| 92 |
+
```python
|
| 93 |
+
# Simple test - WORKS:
|
| 94 |
+
finish_reason: tool_calls
|
| 95 |
+
content: None
|
| 96 |
+
tool_calls: [ChatCompletionOutputToolCall(function=..., name='search_pubmed')]
|
| 97 |
+
|
| 98 |
+
# Complex prompt - FAILS:
|
| 99 |
+
TEXT[49]: 'ε»Ίζ‘£η«ζ ' # Chinese garbage between tool calls
|
| 100 |
+
TEXT[X]: '{"name": "search_preprints", ...}' # Raw JSON as text
|
| 101 |
+
```
|
| 102 |
+
|
| 103 |
+
### Cause 4: Potential Code Bug - Premature Marker Setting
|
| 104 |
+
|
| 105 |
+
**Discovery**: In `HuggingFaceChatClient`, we set a marker that may prevent tool execution wrapping:
|
| 106 |
+
|
| 107 |
+
```python
|
| 108 |
+
@use_function_invocation # Decorator checks marker BEFORE wrapping
|
| 109 |
+
@use_observability
|
| 110 |
+
@use_chat_middleware
|
| 111 |
+
class HuggingFaceChatClient(BaseChatClient):
|
| 112 |
+
# This marker causes decorator to return early!
|
| 113 |
+
__function_invoking_chat_client__ = True # <-- BUG?
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
The `@use_function_invocation` decorator source:
|
| 117 |
+
```python
|
| 118 |
+
def use_function_invocation(chat_client):
|
| 119 |
+
if getattr(chat_client, FUNCTION_INVOKING_CHAT_CLIENT_MARKER, False):
|
| 120 |
+
return chat_client # EARLY RETURN - doesn't wrap methods!
|
| 121 |
+
# ... wrapping code never runs ...
|
| 122 |
+
```
|
| 123 |
+
|
| 124 |
+
**Impact**: The decorator sees the marker as `True` and returns early without wrapping `get_response` and `get_streaming_response` with the function invocation handler.
|
| 125 |
+
|
| 126 |
+
**Status**: NEEDS VERIFICATION - Testing shows methods have `__wrapped__` attribute, suggesting some decoration occurred. May be from other decorators.
|
| 127 |
+
|
| 128 |
+
### Cause 5: Model Hallucination Under Complexity
|
| 129 |
+
|
| 130 |
+
**Discovery**: When the model fails to make proper API tool calls, it **simulates** tool use by outputting fake results:
|
| 131 |
+
|
| 132 |
+
```
|
| 133 |
+
{"response": "[{'title': 'Effect of Flibanserin...'}]"}
|
| 134 |
+
```
|
| 135 |
+
|
| 136 |
+
This is pure hallucination - no actual API calls were made. The model is trained to produce tool-like outputs, so when the API tool calling fails, it falls back to text-based simulation.
|
| 137 |
+
|
| 138 |
+
---
|
| 139 |
+
|
| 140 |
+
## Verification Steps
|
| 141 |
+
|
| 142 |
+
### Test 1: Direct InferenceClient (PASSES)
|
| 143 |
+
|
| 144 |
+
```python
|
| 145 |
+
from huggingface_hub import InferenceClient
|
| 146 |
+
|
| 147 |
+
client = InferenceClient(model='Qwen/Qwen2.5-7B-Instruct')
|
| 148 |
+
response = client.chat_completion(
|
| 149 |
+
messages=[{'role': 'user', 'content': 'What is the weather?'}],
|
| 150 |
+
tools=[weather_tool],
|
| 151 |
+
tool_choice='auto',
|
| 152 |
+
)
|
| 153 |
+
# Result: tool_calls properly parsed, content=None
|
| 154 |
+
```
|
| 155 |
+
|
| 156 |
+
### Test 2: Complex Multi-Agent Prompt (FAILS)
|
| 157 |
+
|
| 158 |
+
```python
|
| 159 |
+
# With our SearchAgent-style prompts:
|
| 160 |
+
stream = client.chat_completion(
|
| 161 |
+
messages=[system_prompt, user_query],
|
| 162 |
+
tools=multiple_tools,
|
| 163 |
+
...
|
| 164 |
+
)
|
| 165 |
+
# Result: Mix of text content AND tool_calls, garbage tokens appear
|
| 166 |
+
```
|
| 167 |
+
|
| 168 |
+
### Test 3: ChatAgent Single Tool (PARTIAL)
|
| 169 |
+
|
| 170 |
+
```python
|
| 171 |
+
agent = ChatAgent(
|
| 172 |
+
chat_client=HuggingFaceChatClient(),
|
| 173 |
+
tools=[search_pubmed],
|
| 174 |
+
...
|
| 175 |
+
)
|
| 176 |
+
result = await agent.run('Search for libido drugs')
|
| 177 |
+
# Result: Tool call request made but function NOT executed (tool_calls=0)
|
| 178 |
+
```
|
| 179 |
+
|
| 180 |
+
---
|
| 181 |
+
|
| 182 |
+
## Impact Assessment
|
| 183 |
+
|
| 184 |
+
| Aspect | Impact |
|
| 185 |
+
|--------|--------|
|
| 186 |
+
| Free Tier Users | **100% broken** - Cannot get any useful results |
|
| 187 |
+
| Demo Quality | **Unprofessional** - Shows garbage/hallucinations |
|
| 188 |
+
| User Trust | **Critical** - Appears completely broken |
|
| 189 |
+
| Tool Execution | **Not working** - Tools never actually called |
|
| 190 |
+
|
| 191 |
+
---
|
| 192 |
+
|
| 193 |
+
## Fix Options
|
| 194 |
+
|
| 195 |
+
### Option 1: Remove Premature Marker (QUICK - Test First)
|
| 196 |
+
|
| 197 |
+
**Location**: `src/clients/huggingface.py:43`
|
| 198 |
+
|
| 199 |
+
```python
|
| 200 |
+
# REMOVE THIS LINE:
|
| 201 |
+
__function_invoking_chat_client__ = True
|
| 202 |
+
```
|
| 203 |
+
|
| 204 |
+
Let the `@use_function_invocation` decorator set the marker AFTER wrapping.
|
| 205 |
+
|
| 206 |
+
**Risk**: Unknown - need to test if this actually enables tool execution.
|
| 207 |
+
|
| 208 |
+
### Option 2: Switch to Model with Native HF Support
|
| 209 |
+
|
| 210 |
+
Find a model that runs on native HuggingFace infrastructure (not routed to third parties):
|
| 211 |
+
|
| 212 |
+
| Model | Size | Native HF? | Tool Calling |
|
| 213 |
+
|-------|------|------------|--------------|
|
| 214 |
+
| `Qwen/Qwen2.5-3B-Instruct` | 3B | β Test | β |
|
| 215 |
+
| `mistralai/Mistral-7B-Instruct-v0.3` | 7B | β Test | β
|
|
| 216 |
+
| `microsoft/Phi-3-mini-4k-instruct` | 3.8B | β Test | Limited |
|
| 217 |
+
|
| 218 |
+
### Option 3: Simplify Free Tier to Single-Agent
|
| 219 |
+
|
| 220 |
+
Remove multi-agent complexity for Free Tier:
|
| 221 |
+
- Single ChatAgent with simpler prompt
|
| 222 |
+
- Direct tool calls instead of MagenticBuilder workflow
|
| 223 |
+
- Reduced prompt complexity
|
| 224 |
+
|
| 225 |
+
### Option 4: Streaming Content Filter (BAND-AID)
|
| 226 |
+
|
| 227 |
+
Filter garbage from streaming output:
|
| 228 |
+
|
| 229 |
+
```python
|
| 230 |
+
def should_stream_content(text: str) -> bool:
|
| 231 |
+
"""Filter garbage from streaming."""
|
| 232 |
+
if text.strip().startswith('{"name":'):
|
| 233 |
+
return False # Raw tool call JSON
|
| 234 |
+
if '</tool_call>' in text or '<tool_call>' in text:
|
| 235 |
+
return False # XML tags
|
| 236 |
+
garbage = ["oleon", "UrlParser", "MemoryWarning", "ε»Ίζ‘£η«ζ "]
|
| 237 |
+
if any(g in text for g in garbage):
|
| 238 |
+
return False
|
| 239 |
+
return True
|
| 240 |
+
```
|
| 241 |
+
|
| 242 |
+
**Note**: This hides symptoms but doesn't fix the underlying tool execution failure.
|
| 243 |
+
|
| 244 |
+
### Option 5: Use Together.ai Directly with Their SDK
|
| 245 |
+
|
| 246 |
+
Bypass HuggingFace routing entirely:
|
| 247 |
+
- Use Together's official SDK
|
| 248 |
+
- May have better tool calling support
|
| 249 |
+
- Requires new client implementation
|
| 250 |
+
|
| 251 |
+
---
|
| 252 |
+
|
| 253 |
+
## Files Involved
|
| 254 |
+
|
| 255 |
+
| File | Role |
|
| 256 |
+
|------|------|
|
| 257 |
+
| `src/clients/huggingface.py` | Main HF client - has premature marker |
|
| 258 |
+
| `src/clients/factory.py` | Client selection logic |
|
| 259 |
+
| `src/agents/magentic_agents.py` | Agent definitions with tools |
|
| 260 |
+
| `src/orchestrators/advanced.py` | Multi-agent workflow |
|
| 261 |
+
| `src/agents/tools.py` | Tool function definitions |
|
| 262 |
+
|
| 263 |
+
---
|
| 264 |
+
|
| 265 |
+
## Recommended Action Plan
|
| 266 |
+
|
| 267 |
+
### Phase 1: Verify Code Bug (Immediate)
|
| 268 |
+
|
| 269 |
+
1. Remove `__function_invoking_chat_client__ = True` from HuggingFaceChatClient
|
| 270 |
+
2. Test if tool execution now works
|
| 271 |
+
3. If yes, verify no regressions with full test suite
|
| 272 |
+
|
| 273 |
+
### Phase 2: Provider Testing
|
| 274 |
+
|
| 275 |
+
1. Test which small models have native HF support
|
| 276 |
+
2. Evaluate Together.ai direct integration
|
| 277 |
+
3. Document provider routing for all candidate models
|
| 278 |
+
|
| 279 |
+
### Phase 3: Architecture Decision
|
| 280 |
+
|
| 281 |
+
Based on Phase 1-2 results:
|
| 282 |
+
- If code fix works: Deploy and monitor
|
| 283 |
+
- If provider issues persist: Implement simplified single-agent mode
|
| 284 |
+
- Consider hybrid: Simple mode for free, advanced for paid
|
| 285 |
+
|
| 286 |
+
---
|
| 287 |
+
|
| 288 |
+
## Relation to P2_7B_MODEL_GARBAGE_OUTPUT
|
| 289 |
+
|
| 290 |
+
This P1 bug **supersedes** the P2 bug. The P2 doc incorrectly blamed the model capacity. The real issues are:
|
| 291 |
+
|
| 292 |
+
1. **Provider routing** (Together.ai Turbo, not native HF)
|
| 293 |
+
2. **Tool execution failure** (possible code bug)
|
| 294 |
+
3. **Model hallucination** (consequence of #2, not root cause)
|
| 295 |
+
|
| 296 |
+
The P2 symptoms are downstream effects of this P1 root cause.
|
| 297 |
+
|
| 298 |
+
---
|
| 299 |
+
|
| 300 |
+
## Investigation Timeline
|
| 301 |
+
|
| 302 |
+
| Time | Finding |
|
| 303 |
+
|------|---------|
|
| 304 |
+
| 16:00 | Started deep investigation per user request |
|
| 305 |
+
| 16:10 | Found Qwen chat template uses XML-style tool_call |
|
| 306 |
+
| 16:20 | Confirmed HF API parses tool calls correctly |
|
| 307 |
+
| 16:30 | Discovered model routed to Together.ai, not native HF |
|
| 308 |
+
| 16:35 | Found premature marker in HuggingFaceChatClient |
|
| 309 |
+
| 16:40 | Verified ChatAgent makes tool requests but doesn't execute |
|
| 310 |
+
| 16:45 | Documented complete root cause chain |
|
| 311 |
+
|
| 312 |
+
---
|
| 313 |
+
|
| 314 |
+
## References
|
| 315 |
+
|
| 316 |
+
- [HuggingFace Inference Providers](https://huggingface.co/docs/inference-providers/index)
|
| 317 |
+
- [Together.ai Function Calling](https://docs.together.ai/docs/function-calling)
|
| 318 |
+
- [Qwen Function Calling Docs](https://qwen.readthedocs.io/en/latest/framework/function_call.html)
|
| 319 |
+
- [TGI Tool Calling Issue #2375](https://github.com/huggingface/text-generation-inference/issues/2375)
|
docs/bugs/P2_7B_MODEL_GARBAGE_OUTPUT.md
CHANGED
|
@@ -9,19 +9,37 @@
|
|
| 9 |
|
| 10 |
## Symptoms
|
| 11 |
|
| 12 |
-
When running a research query on Free Tier (Qwen2.5-7B-Instruct), the streaming output shows **garbage tokens** instead of coherent agent reasoning:
|
| 13 |
|
| 14 |
-
|
|
|
|
| 15 |
π‘ **STREAMING**: yarg
|
| 16 |
π‘ **STREAMING**: PostalCodes
|
| 17 |
-
π‘ **STREAMING**: PostalCodes
|
| 18 |
π‘ **STREAMING**: FunctionFlags
|
| 19 |
-
π‘ **STREAMING**: search_pubmed
|
| 20 |
-
π‘ **STREAMING**: search_clinical_trials
|
| 21 |
π‘ **STREAMING**: system
|
| 22 |
π‘ **STREAMING**: Transferred to searcher, adopt the persona immediately.
|
| 23 |
```
|
| 24 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 25 |
The model outputs random tokens like "yarg", "PostalCodes", "FunctionFlags" instead of actual research reasoning.
|
| 26 |
|
| 27 |
---
|
|
@@ -167,6 +185,30 @@ Significantly simplify the agent prompts for 7B compatibility:
|
|
| 167 |
- Remove abstract concepts
|
| 168 |
- Use few-shot examples
|
| 169 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 170 |
---
|
| 171 |
|
| 172 |
## Recommended Action Plan
|
|
|
|
| 9 |
|
| 10 |
## Symptoms
|
| 11 |
|
| 12 |
+
When running a research query on Free Tier (Qwen2.5-7B-Instruct), the streaming output shows **garbage tokens** and **malformed tool calls** instead of coherent agent reasoning:
|
| 13 |
|
| 14 |
+
### Symptom A: Random Garbage Tokens
|
| 15 |
+
```text
|
| 16 |
π‘ **STREAMING**: yarg
|
| 17 |
π‘ **STREAMING**: PostalCodes
|
|
|
|
| 18 |
π‘ **STREAMING**: FunctionFlags
|
|
|
|
|
|
|
| 19 |
π‘ **STREAMING**: system
|
| 20 |
π‘ **STREAMING**: Transferred to searcher, adopt the persona immediately.
|
| 21 |
```
|
| 22 |
|
| 23 |
+
### Symptom B: Raw Tool Call JSON in Text (NEW - 2025-12-03)
|
| 24 |
+
```text
|
| 25 |
+
π‘ **STREAMING**:
|
| 26 |
+
oleon
|
| 27 |
+
{"name": "search_preprints", "arguments": {"query": "female libido post-menopause drug", "max_results": 10}}
|
| 28 |
+
</tool_call>
|
| 29 |
+
system
|
| 30 |
+
|
| 31 |
+
UrlParser
|
| 32 |
+
{"name": "search_clinical_trials", "arguments": {"query": "female libido post-menopause drug", "max_results": 10}}
|
| 33 |
+
```
|
| 34 |
+
|
| 35 |
+
The model is outputting:
|
| 36 |
+
1. **Garbage tokens**: "oleon", "UrlParser" - meaningless fragments
|
| 37 |
+
2. **Raw JSON tool calls**: `{"name": "search_preprints", ...}` - intended tool calls output as TEXT
|
| 38 |
+
3. **XML-style tags**: `</tool_call>` - model trying to use wrong tool calling format
|
| 39 |
+
4. **"system" keyword**: Model confusing role markers with content
|
| 40 |
+
|
| 41 |
+
**Root Cause of Symptom B**: The 7B model is attempting to make tool calls but outputting them as **text content** instead of using the HuggingFace API's native `tool_calls` structure. The model may have been trained on a different tool calling format (XML-style like Claude's `<tool_call>` tags) and doesn't properly use the OpenAI-compatible JSON format.
|
| 42 |
+
|
| 43 |
The model outputs random tokens like "yarg", "PostalCodes", "FunctionFlags" instead of actual research reasoning.
|
| 44 |
|
| 45 |
---
|
|
|
|
| 185 |
- Remove abstract concepts
|
| 186 |
- Use few-shot examples
|
| 187 |
|
| 188 |
+
### Option 6: Streaming Content Filter (For Symptom B)
|
| 189 |
+
|
| 190 |
+
Filter raw tool call JSON from streaming output:
|
| 191 |
+
|
| 192 |
+
```python
|
| 193 |
+
def should_stream_content(text: str) -> bool:
|
| 194 |
+
"""Filter garbage and raw tool calls from streaming."""
|
| 195 |
+
# Don't stream raw JSON tool calls
|
| 196 |
+
if text.strip().startswith('{"name":'):
|
| 197 |
+
return False
|
| 198 |
+
# Don't stream XML-style tool tags
|
| 199 |
+
if '</tool_call>' in text or '<tool_call>' in text:
|
| 200 |
+
return False
|
| 201 |
+
# Don't stream garbage tokens (extend as needed)
|
| 202 |
+
garbage = ["oleon", "UrlParser", "yarg", "PostalCodes", "FunctionFlags"]
|
| 203 |
+
if any(g in text for g in garbage):
|
| 204 |
+
return False
|
| 205 |
+
return True
|
| 206 |
+
```
|
| 207 |
+
|
| 208 |
+
**Location**: `src/orchestrators/advanced.py` lines 315-322
|
| 209 |
+
|
| 210 |
+
This would prevent the raw tool call JSON from being shown to users, even if the model produces it.
|
| 211 |
+
|
| 212 |
---
|
| 213 |
|
| 214 |
## Recommended Action Plan
|
docs/bugs/{P1_GRADIO_EXAMPLE_CLICK_AUTO_SUBMIT.md β archive/P1_GRADIO_EXAMPLE_CLICK_AUTO_SUBMIT.md}
RENAMED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
# P1: Gradio Example Click Auto-Submits Instead of Loading
|
| 2 |
|
| 3 |
-
**Status:**
|
| 4 |
**Priority:** P1 (High - UX breaks BYOK flow)
|
| 5 |
**Discovered:** 2025-12-03
|
| 6 |
**Component:** `src/app.py` (Gradio UI)
|
|
|
|
| 1 |
# P1: Gradio Example Click Auto-Submits Instead of Loading
|
| 2 |
|
| 3 |
+
**Status:** FIXED (PR #120, merged 2025-12-03)
|
| 4 |
**Priority:** P1 (High - UX breaks BYOK flow)
|
| 5 |
**Discovered:** 2025-12-03
|
| 6 |
**Component:** `src/app.py` (Gradio UI)
|
src/clients/huggingface.py
CHANGED
|
@@ -38,10 +38,6 @@ logger = structlog.get_logger()
|
|
| 38 |
class HuggingFaceChatClient(BaseChatClient): # type: ignore[misc]
|
| 39 |
"""Adapter for HuggingFace Inference API with full function calling support."""
|
| 40 |
|
| 41 |
-
# Marker to tell agent_framework that this client supports function calling
|
| 42 |
-
# Without this, the framework warns and ignores tools
|
| 43 |
-
__function_invoking_chat_client__ = True
|
| 44 |
-
|
| 45 |
def __init__(
|
| 46 |
self,
|
| 47 |
model_id: str | None = None,
|
|
|
|
| 38 |
class HuggingFaceChatClient(BaseChatClient): # type: ignore[misc]
|
| 39 |
"""Adapter for HuggingFace Inference API with full function calling support."""
|
| 40 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
def __init__(
|
| 42 |
self,
|
| 43 |
model_id: str | None = None,
|