---
license: apache-2.0
language:
- zh
- en
base_model: Qwen/Qwen2.5-3B-Instruct
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- peft
- tool-selection
- tool-call
- guardrail
- chinese
- traditional-chinese
- fine-tuned
- qwen2
---

![banner](./banner.svg)

# tool_call_validator_zh

> Traditional Chinese tool-call validator (guardrail) — LoRA fine-tune of Qwen2.5-3B-Instruct

**🚀 [Try the live demo →](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo)**

---

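## Model Description

This model is fine-tuned for **Tool Call Validation** in Traditional Chinese. It is trained with LoRA on top of [Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct) and can:

1. Read a user request (user prompt) together with the descriptions of several candidate tools
2. Pick the best-fitting tool by semantic matching, or decline to match when no candidate is suitable
3. Emit structured reasoning alongside the decision (intent identification, key signals, conclusion)

It is designed to run as **an independent validator in parallel with the serving model**: whenever the serving model makes a tool-call decision, the guardrail produces its own independent judgment at the same time, which the downstream decision mechanism (human review or arbitration logic) can use as a reference.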
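
To make the parallel-validator idea concrete, here is a minimal sketch of one possible arbitration rule. The function name `arbitrate` and the action labels `allow` / `review` / `block` are hypothetical and do not come from this repository; only the `signal`, `selected_tool`, and `confidence` fields are taken from the model's output format.

```python
from typing import Optional


def arbitrate(serving_tool: Optional[str], guardrail: dict) -> str:
    """Compare the serving model's tool choice with the guardrail verdict.

    serving_tool: the tool name chosen by the serving model, or None if it
                  made no tool call.
    guardrail:    the parsed guardrail output (signal / selected_tool / confidence).
    Returns one of the hypothetical actions "allow", "review", "block".
    """
    signal = guardrail.get("signal")
    selected = guardrail.get("selected_tool")
    confidence = guardrail.get("confidence")

    # Agreement: both picked the same tool, or both decided no tool fits.
    agree = (
        (signal == "commit" and selected is not None and selected == serving_tool)
        or (signal == "abstain" and serving_tool is None)
    )
    if agree:
        return "allow"
    # Disagreement: escalate harder when the guardrail is confident.
    if confidence == "high":
        return "block"
    return "review"


verdict = {"signal": "commit", "selected_tool": "web_search", "confidence": "high"}
print(arbitrate("web_search", verdict))   # agreement
print(arbitrate("calculator", verdict))   # confident disagreement
```

How strictly to treat a disagreement is a deployment decision; a human-review queue could replace the `block` branch entirely.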

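### Task Output Format

```json
{
  "reasoning": {
    "intent_summary": "<30-60 characters: identify the user's intent>",
    "key_signals": "<20-40 characters: the keywords and semantic signals in the user's request>",
    "conclusion": "<30-60 characters: why X was selected, or why matching was declined>"
  },
  "selected_tool": "<candidate tool name, or null when declining to match>",
  "signal": "commit | abstain",
  "confidence": "high | medium | low"
}
```

| Field | Description |
|---|---|
| `selected_tool` | Must be one of the candidates on commit; `null` on abstain |
| `signal` | `commit` (a tool is definitively selected) / `abstain` (no suitable tool in the candidate list) |
| `confidence` | `high` / `medium` / `low`, reflecting the strength of the model's self-assessment |
| `reasoning.intent_summary` | A concise description of the user's intent |
| `reasoning.key_signals` | The keywords / semantic signals that triggered the decision |
| `reasoning.conclusion` | The specific reason for selecting (or declining) a tool |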

### Performance (Three-Tier Evaluation)

The evaluation compares three tiers:

| Metric | L1 base | L2 adapter | L3 +Filter |
|---|---:|---:|---:|
| Format Validity | 100.0% | 100.0% | 100.0% |
| **Tool Accuracy** | 57.0% | **100.0%** | **100.0%** |
| **Signal Accuracy** | 73.0% | **100.0%** | **100.0%** |
| **Confidence Accuracy** | 48.0% | **99.0%** | **99.0%** |
| False Alarm Rate | 0.0% | 0.0% | 0.0% |
| Miss Rate | 40.9% | 0.0% | 0.0% |

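- **L1 base**: base Qwen2.5-3B-Instruct (no fine-tuning, no Filter)
- **L2 adapter**: LoRA adapter applied, no Filter
- **L3 adapter + Filter**: LoRA adapter plus Schema validation and Provenance check

#### Three Key Findings

1. **Fine-tuning contributes +27 to +51 percentage points** (L1 → L2): the base model is overly conservative (miss rate 40.9%, meaning it abstains when it should commit), and its confidence labels are close to guessing (48%). Fine-tuning corrects both problems.
2. **The Filter contributes nothing** (L2 ≡ L3): the same phenomenon seen with the memory_2 IC Firewall. After fine-tuning, outputs already contain no format errors and `selected_tool` is always one of the candidates. The Filter is retained as a safety net for OOD inputs.
3. **Confidence is the dimension where fine-tuning contributes most** (+51 percentage points): the base model shows no ability to calibrate high/medium/low, while the fine-tuned model reaches 99%.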
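
The signal-level metrics in the table can be sketched as follows. The definitions are assumptions: the card only describes a miss as "should commit but abstained", so False Alarm Rate is taken here to be the mirror case (should abstain but committed). The 66/34 commit/abstain split in the usage example is inferred because it reproduces the L1 numbers on a 100-sample holdout; it is not stated in the source. The function name `signal_metrics` is hypothetical.

```python
def signal_metrics(gold: list, pred: list) -> dict:
    """Compute signal accuracy, miss rate, and false alarm rate.

    gold / pred are equal-length lists of "commit" or "abstain" labels.
    Assumed definitions:
      miss rate        = gold commit predicted as abstain / all gold commits
      false alarm rate = gold abstain predicted as commit / all gold abstains
    """
    pairs = list(zip(gold, pred))
    n_commit = sum(1 for g, _ in pairs if g == "commit")
    n_abstain = len(pairs) - n_commit
    misses = sum(1 for g, p in pairs if g == "commit" and p == "abstain")
    false_alarms = sum(1 for g, p in pairs if g == "abstain" and p == "commit")
    correct = sum(1 for g, p in pairs if g == p)
    return {
        "signal_accuracy": correct / len(pairs),
        "miss_rate": misses / n_commit if n_commit else 0.0,
        "false_alarm_rate": false_alarms / n_abstain if n_abstain else 0.0,
    }


# Inferred (not reported) split: 66 commit cases, 34 abstain cases, 27 misses.
gold = ["commit"] * 66 + ["abstain"] * 34
pred = ["abstain"] * 27 + ["commit"] * 39 + ["abstain"] * 34
metrics = signal_metrics(gold, pred)
print({name: round(value * 100, 1) for name, value in metrics.items()})
# {'signal_accuracy': 73.0, 'miss_rate': 40.9, 'false_alarm_rate': 0.0}
```

Under these assumed definitions, the L1 column is internally consistent: 27 signal errors out of 100 samples, all of them misses, gives 27/66 ≈ 40.9%.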

### Quick Start

```python
import json
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-3B-Instruct"
adapter = "GOSHUNCLE/tool_call_validator_zh"

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter)
model.eval()

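# The full system prompt lives in inference.py; only its first line is shown here.
# It stays in Chinese because it is model input, not documentation.
SYSTEM_PROMPT = """你是工具選擇守門員（Tool Selection Guardrail）。
（完整 system prompt 見 inference.py）"""

def detect(user_prompt: str, tools: list[dict]) -> dict:
    """Run the guardrail on one request and return the parsed JSON verdict."""
    # Render the candidates as a numbered "name: description" list.
    tools_block = "\n".join(
        f"{i + 1}. {t['name']}: {t['description']}" for i, t in enumerate(tools)
    )
    # The Chinese labels (使用者請求 = user request, 候選工具 = candidate tools)
    # are part of the prompt and are kept verbatim.
    user_msg = f"使用者請求：\n{user_prompt}\n\n候選工具：\n{tools_block}"
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_msg},
    ]
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        # Greedy decoding keeps the verdict deterministic.
        outputs = model.generate(**inputs, max_new_tokens=384, do_sample=False,
                                  pad_token_id=tokenizer.pad_token_id)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    text = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
    # Take the outermost {...} span and fail loudly if the model emitted no JSON object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"No JSON object found in model output: {text!r}")
    return json.loads(text[start:end + 1])

# Example: a request for today's PM2.5 air-quality index in Taipei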
result = detect(
    user_prompt="請幫我查一下今天台北的 PM2.5 空氣品質指數。",
    tools=[
        {"name": "web_search", "description": "透過搜尋引擎即時取得網路上最新資訊"},
        {"name": "calendar_view", "description": "查看使用者的行事曆"},
        {"name": "calculator", "description": "進行數值與數學運算"},
    ],
)
print(json.dumps(result, ensure_ascii=False, indent=2))
```

### Inference Safeguards

Although L2 ≡ L3 shows that the Filter never fires on in-distribution inputs, we recommend **keeping the following safety layers in production deployments**:

#### Filter 1: Schema Validation

Checks that the model's JSON output matches the expected structure:

- `signal` must be `commit` or `abstain`
- `confidence` must be `high`, `medium`, or `low`
- `reasoning` must contain all three parts (intent_summary, key_signals, conclusion)
- `selected_tool` must not be null when the signal is commit

On invalid output, fall back to `{signal: "abstain", confidence: "low", selected_tool: null}`.
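
The four schema rules and the fallback can be sketched in a few lines. This is an illustrative reimplementation under the rules listed here, not the code shipped in `inference.py`; the names `is_valid_schema`, `apply_schema_filter`, and `FALLBACK` are hypothetical.

```python
VALID_SIGNALS = {"commit", "abstain"}
VALID_CONFIDENCE = {"high", "medium", "low"}
REASONING_KEYS = ("intent_summary", "key_signals", "conclusion")

# Fallback verdict described in the card for invalid outputs.
FALLBACK = {"signal": "abstain", "confidence": "low", "selected_tool": None}


def _is_one_of(value, allowed: set) -> bool:
    # Guard against non-string values before the membership test.
    return isinstance(value, str) and value in allowed


def is_valid_schema(output) -> bool:
    """Return True only if the output satisfies all four schema rules."""
    if not isinstance(output, dict):
        return False
    if not _is_one_of(output.get("signal"), VALID_SIGNALS):
        return False
    if not _is_one_of(output.get("confidence"), VALID_CONFIDENCE):
        return False
    reasoning = output.get("reasoning")
    if not isinstance(reasoning, dict):
        return False
    if any(key not in reasoning for key in REASONING_KEYS):
        return False
    # A commit must name a tool.
    if output["signal"] == "commit" and output.get("selected_tool") is None:
        return False
    return True


def apply_schema_filter(output) -> dict:
    """Pass valid outputs through unchanged; replace invalid ones with the fallback."""
    return output if is_valid_schema(output) else dict(FALLBACK)
```

Returning a copy of `FALLBACK` keeps callers from mutating the shared constant.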

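#### Filter 2: Provenance Check

Checks that, on commit, `selected_tool` appears verbatim in the input candidate list. If it does not, fall back to abstain. This layer keeps the model from hallucinating a non-existent tool name on OOD inputs.

See [inference.py](./inference.py) for the full implementation.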
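
As a rough illustration of the provenance rule, the sketch below does an exact, case-sensitive match against the candidate names. It is not the implementation in `inference.py`; the function name `apply_provenance_filter` is hypothetical, and the `tools` argument is assumed to use the same `name` / `description` dictionaries as the Quick Start example.

```python
def apply_provenance_filter(output: dict, tools: list) -> dict:
    """Force an abstain when a committed tool name is not among the candidates."""
    # Abstain verdicts carry no tool name, so there is nothing to check.
    if output.get("signal") != "commit":
        return output
    candidate_names = {tool["name"] for tool in tools}
    selected = output.get("selected_tool")
    # Verbatim match only: no case folding, trimming, or fuzzy matching.
    if isinstance(selected, str) and selected in candidate_names:
        return output
    return {"signal": "abstain", "confidence": "low", "selected_tool": None}


tools = [
    {"name": "web_search", "description": "search the web"},
    {"name": "calculator", "description": "do arithmetic"},
]
verdict = {"signal": "commit", "selected_tool": "Web_Search", "confidence": "high"}
print(apply_provenance_filter(verdict, tools)["signal"])  # abstain
```

The strict match is deliberate: a near-miss such as `Web_Search` is treated the same as an invented tool.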

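### Limitations

#### Limitation A: The Holdout Is In-Distribution

The training data and the holdout share the same templates and slot pool. The 100% accuracy **reflects in-distribution performance only**; generalization to real-world colloquial (OOD) requests **has not been measured**. In practice, rely on the confidence signal plus the Filter as a safety net.

#### Limitation B: Restricted to 8 Tools

The training data is limited to 8 synthetic, fictional tools (web_search / knowledge_qa / news_lookup / fact_check / translator / calculator / calendar_view / summarizer), and scenarios outside these 8 tools are unvalidated. By design, though, the model should be able to match semantically against any tool description, because descriptions are filled into the prompt dynamically during training.

#### Limitation C: Reasoning Reads as Formal Written Chinese

The reasoning in the training samples leans toward a "translated, written" register, so it may sound somewhat stiff on highly colloquial input.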

### Deployment Notes

#### Gradio + huggingface_hub Compatibility Shim

To integrate this model into a **Gradio app** (including an HF Space), add the following monkey-patch before `import gradio` to avoid `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`:

```python
# === Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still uses it ===
import huggingface_hub as _hf_hub
if not hasattr(_hf_hub, "HfFolder"):
    class _HfFolderShim:
        @staticmethod
        def get_token():
            try: return _hf_hub.get_token()
            except Exception: return None
        @staticmethod
        def save_token(token):
            try: _hf_hub.login(token=token)
            except Exception: pass
        @staticmethod
        def delete_token():
            try: _hf_hub.logout()
            except Exception: pass
    _hf_hub.HfFolder = _HfFolderShim

import gradio as gr  # safe now
```

See the full example in [Demo Space app.py](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo/blob/main/app.py).

#### Recommended Deployment Platforms

| Platform | Inference time / sample | Use case |
|---|---|---|
| HF free CPU Space (2 vCPU, 16 GB) | 90-180 s | Demo / validation |
| HF T4 GPU Space (~$0.40/hr) | 1-3 s | Light production |
| Local NVIDIA GPU (RTX 3060+) | 1-2 s | Self-host |
| Local CPU (Intel Core Ultra 7+) | 30-60 s | Offline batch |

#### GGUF Quantization (not implemented, v2 backlog)

For faster CPU inference, consider merging the LoRA adapter and converting to GGUF Q4; CPU latency is estimated to drop to roughly 5-10 seconds per sample.
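
The merge step that would precede a GGUF conversion might look like the sketch below. It has not been run against this adapter, the function name `merge_adapter` and the output directory are hypothetical, and the GGUF conversion itself is omitted because the card does not specify a toolchain. Imports are deferred into the function so the definition can be loaded without the heavy dependencies.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Fold the LoRA weights into the base model and save a standalone checkpoint."""
    # Deferred imports: torch, peft, and transformers are only needed when merging.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    # merge_and_unload() returns the base architecture with the adapter deltas applied.
    merged = PeftModel.from_pretrained(base, adapter_id).merge_and_unload()
    merged.save_pretrained(out_dir)
    # Save the tokenizer alongside the weights so the directory is self-contained.
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)


# Usage (downloads the 3B base model, so it is left commented out):
# merge_adapter("Qwen/Qwen2.5-3B-Instruct", "GOSHUNCLE/tool_call_validator_zh", "./merged")
```

The resulting directory is an ordinary Transformers checkpoint, which is the usual input format for GGUF converters. The 5-10 second estimate remains unverified until that conversion is done.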

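### Disclaimer

The tool names in the training data (web_search and the other 7) are **synthetic and fictional**, used to demonstrate the methodology. All slot-pool content such as stock tickers, people, and places consists of examples drawn from public information and implies no commercial relationship.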

---

## English

This is a **LoRA fine-tune of Qwen2.5-3B-Instruct** for Traditional Chinese tool-call validation (guardrail). The model:

1. Reads a user prompt and a list of candidate tools (with descriptions)
2. Selects the most appropriate tool via semantic matching, or abstains if none is suitable
3. Outputs structured reasoning (intent summary, key signals, conclusion)

It is designed to run **as an independent validator in parallel with a serving LLM** that produces actual tool calls. The guardrail's output serves as a reference for downstream arbitration (human review or programmatic logic).

### Performance Summary

| Metric | L1 base | L2 adapter | L3 +Filter |
|---|---:|---:|---:|
| Format Validity | 100.0% | 100.0% | 100.0% |
| Tool Accuracy | 57.0% | **100.0%** | 100.0% |
| Signal Accuracy | 73.0% | **100.0%** | 100.0% |
| Confidence Accuracy | 48.0% | **99.0%** | 99.0% |
| False Alarm Rate | 0.0% | 0.0% | 0.0% |
| Miss Rate | 40.9% | 0.0% | 0.0% |

The base Qwen2.5-3B-Instruct achieves 57% tool accuracy and 48% confidence accuracy. After LoRA fine-tuning on 600 synthetic samples (Traditional Chinese), the model reaches 100% tool accuracy and 99% confidence accuracy on the in-distribution holdout. The two-layer post-processing filter (Schema + Provenance) is retained as a safety net for out-of-distribution inputs.

### Training Details

| Item | Value |
|---|---|
| Base model | Qwen/Qwen2.5-3B-Instruct |
| Method | LoRA (r=16, alpha=32, dropout=0.05) |
| Target modules | q_proj, k_proj, v_proj, o_proj |
| Training data | 600 synthetic samples (Traditional Chinese) |
| Validation data | 100 in-distribution holdout samples |
| Epochs | 3 |
| Batch size | 2 × grad_accum 4 (effective 8) |
| Learning rate | 2e-4 (cosine schedule, warmup 5%) |
| Max length | 1024 |
| Hardware | Google Colab T4 (15 GB VRAM, fp16) |
| Training time | ~4.4 hours |
| Best eval_loss | 0.0051 |
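
The hyperparameters in the table imply a fairly short optimization schedule. The sketch below works through the arithmetic, assuming a single device, that all 600 samples are used every epoch, and that warmup steps are rounded up. The `task_type` entry is also an assumption, since the table does not state it. The remaining `lora_kwargs` keys are chosen to match `peft.LoraConfig` field names.

```python
import math

# LoRA settings copied from the Training Details table.
lora_kwargs = {
    "r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the table
}
# These could be passed on as, for example, peft.LoraConfig(**lora_kwargs).

train_samples = 600
per_device_batch = 2
grad_accum = 4
epochs = 3
warmup_ratio = 0.05

effective_batch = per_device_batch * grad_accum                 # 2 x 4 = 8
steps_per_epoch = math.ceil(train_samples / effective_batch)    # 600 / 8 = 75
total_steps = steps_per_epoch * epochs                          # 75 x 3 = 225
warmup_steps = math.ceil(total_steps * warmup_ratio)            # ceil(11.25) = 12

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)  # 8 75 225 12
```

At roughly 225 optimizer steps in about 4.4 hours, each step takes on the order of a minute on the T4, if evaluation time is ignored.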

### Deployment Notes

#### Gradio compatibility shim

If you integrate this model into a **Gradio app** (including HF Spaces), add this monkey-patch before `import gradio` to avoid `ImportError: cannot import name 'HfFolder' from 'huggingface_hub'`:

```python
# Compat shim: huggingface_hub >= 1.0 removed HfFolder, but gradio (4.x and 5.x) still imports it
import huggingface_hub as _hf_hub
if not hasattr(_hf_hub, "HfFolder"):
    class _HfFolderShim:
        @staticmethod
        def get_token():
            try: return _hf_hub.get_token()
            except Exception: return None
        @staticmethod
        def save_token(token):
            try: _hf_hub.login(token=token)
            except Exception: pass
        @staticmethod
        def delete_token():
            try: _hf_hub.logout()
            except Exception: pass
    _hf_hub.HfFolder = _HfFolderShim

import gradio as gr  # safe now
```

See full example in [Demo Space app.py](https://huggingface.co/spaces/GOSHUNCLE/tool_call_validator_zh_demo/blob/main/app.py).

#### Inference latency by platform

| Platform | Latency / sample | Use case |
|---|---|---|
| HF free CPU Space (2 vCPU, 16 GB) | 90-180 s | Demo / validation |
| HF T4 GPU Space (~$0.40/hr) | 1-3 s | Light production |
| Local NVIDIA GPU (RTX 3060+) | 1-2 s | Self-host |
| Local CPU (Intel Core Ultra 7+) | 30-60 s | Offline batch |

### Methodology Inheritance

This model inherits the methodology from [GOSHUNCLE/ic_content_firewall_zh](https://huggingface.co/GOSHUNCLE/ic_content_firewall_zh) (IC design industry content firewall):

- Dual-track data synthesis (handwritten seed + template-based expansion)
- Three-tier evaluation design (base / adapter / adapter+filter)
- Filter philosophy (Schema validation + Provenance check as a healthy minimal set)
- Open-source minimal disclosure strategy

### License

Apache 2.0. See [LICENSE](./LICENSE).

### Citation

If this model contributes to your research or product, please cite:

```bibtex
@misc{tool_call_validator_zh_2026,
  author = {GOSHUNCLE},
  title  = {tool_call_validator_zh: Traditional Chinese Tool Call Validator (LoRA fine-tune of Qwen2.5-3B)},
  year   = {2026},
  url    = {https://huggingface.co/GOSHUNCLE/tool_call_validator_zh},
}
```