GOSHUNCLE committed on
Commit a8226ae · verified · 1 Parent(s): 68fee66

Upload README.md

  1. README.md +86 -2
README.md CHANGED
@@ -22,14 +22,16 @@ tags:
22
 
23
  # tool_call_validator_zh
24
 
25
- > LoRA fine-tune of Qwen2.5-3B-Instruct
26
  > Traditional Chinese tool-call validator (guardrail) — LoRA fine-tune of Qwen2.5-3B-Instruct
27
 
28
  ---
29
 
30
  ## 中文說明 (Chinese Description)
31
 
32
- This Traditional Chinese model is fine-tuned for **Tool Call Validation**. It is trained with LoRA on top of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) and can:
33
 
34
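  1. Read the user request (user prompt) and the descriptions of multiple candidate tools
35
  2. Select the best-fitting tool through semantic comparison, or decline to match when no tool fits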
@@ -170,6 +172,49 @@ Fallback when invalid: `{signal: "abstain", confidence: "low", selected_tool: nu
170
 
171
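  The reasoning style of the training samples leans toward translated, formal written language (as in memory_2 IC Firewall), so the output may read somewhat stiff on highly colloquial input.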
172
 
173
  ### Disclaimer
174
 
175
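  The tool names in the training data (8 in total, e.g. web_search) are **synthetic and fictional** and serve only to demonstrate the methodology. All slot pool content, such as stock tickers, people, and places, is drawn from publicly available examples and implies no commercial relationship.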
@@ -216,6 +261,45 @@ The base Qwen2.5-3B-Instruct achieves 57% tool accuracy and 48% confidence accur
216
  | Training time | ~4.4 hours |
217
  | Best eval_loss | 0.0051 |
218
 
219
  ### Methodology Inheritance
220
 
221
  This model inherits the methodology from [GOSHUNCLE/ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh) (IC design industry content firewall):
 
22
 
23
  # tool_call_validator_zh
24
 
25
+ > Chinese (Traditional) Tool Call validation / Guardrail model · LoRA fine-tune of Qwen2.5-3B-Instruct
26
  > Traditional Chinese tool-call validator (guardrail) — LoRA fine-tune of Qwen2.5-3B-Instruct
27
 
28
+ **🚀 [Try the live demo →](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo)** · **📦 [Methodology lineage: ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh)**
29
+
30
  ---
31
 
32
  ## 中文說明 (Chinese Description)
33
 
34
+ This Traditional Chinese model is fine-tuned for **Tool Call Validation / Guardrail** use. It is trained with LoRA on top of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) and can:
35
 
36
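  1. Read the user request (user prompt) and the descriptions of multiple candidate tools
37
  2. Select the best-fitting tool through semantic comparison, or decline to match when no tool fits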
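
A minimal sketch of how a caller might consume the validator's decision, assuming the model returns a JSON object. The field names `signal`, `confidence`, and `selected_tool` come from the fallback quoted in this README's hunk context; the `None` value for `selected_tool` (the source text is truncated there), the helper name `parse_validator_output`, the example values `"match"` and `"high"`, and the candidate name `calculator` are assumptions for illustration, not the author's implementation.

```python
import json

# Fallback used when the output is invalid. The field names follow the README;
# setting selected_tool to None is an assumption (the source text is truncated).
FALLBACK = {"signal": "abstain", "confidence": "low", "selected_tool": None}
REQUIRED_KEYS = ("signal", "confidence", "selected_tool")


def parse_validator_output(raw, candidate_tools):
    """Return the parsed decision, or a copy of FALLBACK if it is invalid."""
    try:
        decision = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return dict(FALLBACK)
    if not isinstance(decision, dict):
        return dict(FALLBACK)
    if any(key not in decision for key in REQUIRED_KEYS):
        return dict(FALLBACK)
    tool = decision["selected_tool"]
    # A selected tool must be one of the candidates; None means "no match".
    if tool is not None and tool not in candidate_tools:
        return dict(FALLBACK)
    return decision


tools = ["web_search", "calculator"]  # "calculator" is a hypothetical name
ok = parse_validator_output(
    '{"signal": "match", "confidence": "high", "selected_tool": "web_search"}', tools
)
bad = parse_validator_output("not json", tools)
```

Rejecting a tool name that is not among the candidates keeps a malformed or hallucinated selection from reaching the tool dispatcher; whether the real pipeline performs this check is not stated in the source.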
 
172
 
173
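  The reasoning style of the training samples leans toward translated, formal written language (as in memory_2 IC Firewall), so the output may read somewhat stiff on highly colloquial input.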
174
 
175
+ ### Deployment Notes
176
+
177
+ #### Gradio + huggingface_hub compatibility shim
178
+
179
+ To integrate this model into a **Gradio app** (including an HF Space), add the following monkey-patch before `import gradio` to avoid `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`:
180
+
181
+ ```python
+ # === Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still uses it ===
+ import huggingface_hub as _hf_hub
+ if not hasattr(_hf_hub, "HfFolder"):
+     class _HfFolderShim:
+         @staticmethod
+         def get_token():
+             try: return _hf_hub.get_token()
+             except Exception: return None
+         @staticmethod
+         def save_token(token):
+             try: _hf_hub.login(token=token)
+             except Exception: pass
+         @staticmethod
+         def delete_token():
+             try: _hf_hub.logout()
+             except Exception: pass
+     _hf_hub.HfFolder = _HfFolderShim
+
+ import gradio as gr  # safe now
+ ```
202
+
203
+ For a complete example, see [Demo Space app.py](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo/blob/main/app.py).
204
+
205
+ #### Recommended deployment platforms
206
+
207
+ | Platform | Inference time per sample | Suitable for |
+ |---|---|---|
+ | HF free CPU Space (2 vCPU, 16 GB) | 90-180 s | Demo / validation |
+ | HF T4 GPU Space (~$0.40/hr) | 1-3 s | Light production |
+ | Local NVIDIA GPU (RTX 3060+) | 1-2 s | Self-host |
+ | Local CPU (Intel Core Ultra 7+) | 30-60 s | Offline batch |
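
As a rough worked example, the T4 row can be turned into a cost per 1000 requests. This sketch assumes requests run one after another on a fully utilized Space with no idle time and no batching; the helper name `cost_per_1000` is hypothetical, and the price and latency figures are the ones from the table.

```python
def cost_per_1000(price_per_hour, seconds_per_sample):
    """Cost in USD of 1000 sequential requests at a given hourly price."""
    samples_per_hour = 3600 / seconds_per_sample
    return price_per_hour / samples_per_hour * 1000


best = cost_per_1000(0.40, 1)   # 1 s per sample
worst = cost_per_1000(0.40, 3)  # 3 s per sample
print(f"T4 Space: ${best:.3f} to ${worst:.3f} per 1000 samples")
```

Under these assumptions the T4 tier works out to roughly $0.11 to $0.33 per 1000 samples; real costs would be higher whenever the Space sits idle.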
213
+
214
+ #### GGUF quantization (not implemented, v2 backlog)
215
+
216
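+ For faster CPU inference, consider merging the LoRA adapter into the base model and converting the result to GGUF Q4; CPU inference is estimated to drop to ~5-10 s per sample.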
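
The merge step could look like the sketch below, which uses `peft` and `transformers`. It is not part of this repository: the adapter repo id `GOSHUNCLE/tool_call_validator_zh`, the function name `merge_lora`, and the output directory are assumptions, and the README itself marks this path as not yet implemented.

```python
def merge_lora(base_id="Qwen/Qwen2.5-3B-Instruct",
               adapter_id="GOSHUNCLE/tool_call_validator_zh",  # assumed repo id
               out_dir="merged_model"):
    """Merge the LoRA adapter into the base weights and save a plain checkpoint."""
    # Imported lazily so that defining the function does not need the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # merge_and_unload folds the LoRA deltas into the base weights.
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(out_dir)
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
    return out_dir
```

The saved directory is a regular Hugging Face checkpoint. Converting it to GGUF and quantizing to Q4 would then be done with external tooling such as llama.cpp, which is not shown here.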
217
+
218
  ### Disclaimer
219
 
220
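  The tool names in the training data (8 in total, e.g. web_search) are **synthetic and fictional** and serve only to demonstrate the methodology. All slot pool content, such as stock tickers, people, and places, is drawn from publicly available examples and implies no commercial relationship.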
 
261
  | Training time | ~4.4 hours |
262
  | Best eval_loss | 0.0051 |
263
 
264
+ ### Deployment Notes
265
+
266
+ #### Gradio compatibility shim
267
+
268
+ If you integrate this model into a **Gradio app** (including HF Spaces), add this monkey-patch before `import gradio` to avoid `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`:
269
+
270
+ ```python
+ # Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still imports it
+ import huggingface_hub as _hf_hub
+ if not hasattr(_hf_hub, "HfFolder"):
+     class _HfFolderShim:
+         @staticmethod
+         def get_token():
+             try: return _hf_hub.get_token()
+             except Exception: return None
+         @staticmethod
+         def save_token(token):
+             try: _hf_hub.login(token=token)
+             except Exception: pass
+         @staticmethod
+         def delete_token():
+             try: _hf_hub.logout()
+             except Exception: pass
+     _hf_hub.HfFolder = _HfFolderShim
+
+ import gradio as gr  # safe now
+ ```
291
+
292
+ See full example in [Demo Space app.py](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo/blob/main/app.py).
293
+
294
+ #### Inference latency by platform
295
+
296
+ | Platform | Latency / sample | Use case |
+ |---|---|---|
+ | HF free CPU Space (2 vCPU, 16 GB) | 90-180 s | Demo / validation |
+ | HF T4 GPU Space (~$0.40/hr) | 1-3 s | Light production |
+ | Local NVIDIA GPU (RTX 3060+) | 1-2 s | Self-host |
+ | Local CPU (Intel Core Ultra 7+) | 30-60 s | Offline batch |
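
Per-sample figures like these can be checked on your own hardware with a small timing harness. The sketch below is not how the numbers above were measured (the source does not say); `measure_latency` is a hypothetical helper and `run_inference` is a placeholder to be replaced with a real call into the model.

```python
import statistics
import time


def measure_latency(fn, inputs, warmup=1):
    """Return per-call latencies in seconds for fn over inputs."""
    for item in inputs[:warmup]:
        fn(item)  # warm-up calls are not timed
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        fn(item)
        latencies.append(time.perf_counter() - start)
    return latencies


def run_inference(prompt):
    # Hypothetical placeholder; replace with a real call into the model.
    time.sleep(0.01)
    return prompt


latencies = measure_latency(run_inference, ["a", "b", "c"])
print(f"median {statistics.median(latencies):.3f} s, max {max(latencies):.3f} s")
```

The untimed warm-up call matters most on GPU, where the first request often includes one-time initialization that would otherwise inflate the reported latency.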
302
+
303
  ### Methodology Inheritance
304
 
305
  This model inherits the methodology from [GOSHUNCLE/ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh) (IC design industry content firewall):