GOSHUNCLE
/

tool_call_validator_zh

@@ -22,14 +22,16 @@ tags:
 # tool_call_validator_zh
-> LoRA fine-tune of Qwen2.5-3B-Instruct
 > Traditional Chinese tool-call validator (guardrail) — LoRA fine-tune of Qwen2.5-3B-Instruct
 ---
 ## 中文說明
-本模型是針對 **Tool Call Validation** 場景微調的繁體中文模型。基於 [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) 用 LoRA 訓練，能夠：
 1. 讀取使用者請求（user prompt）與多個候選工具的 description
 2. 透過語意比對選出最適合的工具，或在無合適工具時拒絕匹配
@@ -170,6 +172,49 @@ Invalid 時 fallback：`{signal: "abstain", confidence: "low", selected_tool: nu
 訓練樣本 reasoning 風格偏向「翻譯式書面語」（如 memory_2 IC Firewall），對極口語化的輸入可能略顯生硬。
 ### Disclaimer
 訓練資料中的工具名稱（web_search 等 8 個）為**合成虛構**，用於 demonstrate 方法論。所有股票標的、人物、地點等 slot pool 內容皆為公開資訊範例，無暗示任何商業關係。
@@ -216,6 +261,45 @@ The base Qwen2.5-3B-Instruct achieves 57% tool accuracy and 48% confidence accur
 | Training time | ~4.4 hours |
 | Best eval_loss | 0.0051 |
 ### Methodology Inheritance
 This model inherits the methodology from [GOSHUNCLE/ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh) (IC design industry content firewall):

 # tool_call_validator_zh
+> 中文 (繁體) Tool Call 驗證 / Guardrail 模型 · LoRA fine-tune of Qwen2.5-3B-Instruct
 > Traditional Chinese tool-call validator (guardrail) — LoRA fine-tune of Qwen2.5-3B-Instruct
+**🚀 [Try the live demo →](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo)** · **📦 [Methodology lineage: ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh)**
 ---
 ## 中文說明
+本模型是針對 **Tool Call Validation / Guardrail** 場景微調的繁體中文模型。基於 [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) 用 LoRA 訓練，能夠：
 1. 讀取使用者請求（user prompt）與多個候選工具的 description
 2. 透過語意比對選出最適合的工具，或在無合適工具時拒絕匹配
 訓練樣本 reasoning 風格偏向「翻譯式書面語」（如 memory_2 IC Firewall），對極口語化的輸入可能略顯生硬。
+### Deployment Notes（部署注意事項）
+#### Gradio + huggingface_hub 相容性 shim
+若要將本模型整合進 **Gradio app**（包括 HF Space），請在 `import gradio` 之前加入以下 monkey-patch，避免 `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`：
+```python
+# === Compat shim：huggingface_hub >= 1.0 移除了 HfFolder，但 gradio (4.x 與 5.x) 還在用 ===
+import huggingface_hub as _hf_hub
+if not hasattr(_hf_hub, "HfFolder"):
+    class _HfFolderShim:
+        @staticmethod
+        def get_token():
+            try: return _hf_hub.get_token()
+            except Exception: return None
+        @staticmethod
+        def save_token(token):
+            try: _hf_hub.login(token=token)
+            except Exception: pass
+        @staticmethod
+        def delete_token():
+            try: _hf_hub.logout()
+            except Exception: pass
+    _hf_hub.HfFolder = _HfFolderShim
+import gradio as gr  # safe now
+```
+完整實例見 [Demo Space app.py](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo/blob/main/app.py)。
+#### 部署平台建議
+| 平台 | 推論時間/筆 | 適用 |
+|---|---|---|
+| HF 免費 CPU Space (2 vCPU, 16 GB) | 90-180 秒 | Demo / 驗證 |
+| HF T4 GPU Space (~$0.40/hr) | 1-3 秒 | Light production |
+| 本機 NVIDIA GPU (RTX 3060+) | 1-2 秒 | Self-host |
+| 本機 CPU (Intel Core Ultra 7+) | 30-60 秒 | Offline batch |
+#### GGUF 量化（未實作，v2 backlog）
+如需更快 CPU 推論，可考慮 merge LoRA 後轉 GGUF Q4，預估 CPU 推論可降至 ~5-10 秒/筆。
 ### Disclaimer
 訓練資料中的工具名稱（web_search 等 8 個）為**合成虛構**，用於 demonstrate 方法論。所有股票標的、人物、地點等 slot pool 內容皆為公開資訊範例，無暗示任何商業關係。
 | Training time | ~4.4 hours |
 | Best eval_loss | 0.0051 |
+### Deployment Notes
+#### Gradio compatibility shim
+If you integrate this model into a **Gradio app** (including HF Spaces), add this monkey-patch before `import gradio` to avoid `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`:
+```python
+# Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still imports it
+import huggingface_hub as _hf_hub
+if not hasattr(_hf_hub, "HfFolder"):
+    class _HfFolderShim:
+        @staticmethod
+        def get_token():
+            try: return _hf_hub.get_token()
+            except Exception: return None
+        @staticmethod
+        def save_token(token):
+            try: _hf_hub.login(token=token)
+            except Exception: pass
+        @staticmethod
+        def delete_token():
+            try: _hf_hub.logout()
+            except Exception: pass
+    _hf_hub.HfFolder = _HfFolderShim
+import gradio as gr  # safe now
+```
+See full example in [Demo Space app.py](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo/blob/main/app.py).
+#### Inference latency by platform
+| Platform | Latency / sample | Use case |
+|---|---|---|
+| HF free CPU Space (2 vCPU, 16 GB) | 90-180 s | Demo / validation |
+| HF T4 GPU Space (~$0.40/hr) | 1-3 s | Light production |
+| Local NVIDIA GPU (RTX 3060+) | 1-2 s | Self-host |
+| Local CPU (Intel Core Ultra 7+) | 30-60 s | Offline batch |
 ### Methodology Inheritance
 This model inherits the methodology from [GOSHUNCLE/ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh) (IC design industry content firewall):