---
license: apache-2.0
language:
- en
datasets:
- intuit/tool-optimizer-dataset
base_model:
- Qwen/Qwen3-4B-Instruct-2507
pipeline_tag: text-generation
library_name: transformers
tags:
- agents
- tool-use
- sft
- documentation
- text-generation
---

# Agent Tool Optimizer (`intuit/agent-tool-optimizer`)
`intuit/agent-tool-optimizer` is a **supervised fine-tuned (SFT)** model that rewrites **tool / API descriptions** to be more usable by **LLM agents**. Given a tool name, a parameter schema, and a baseline (often human-written) description, the model produces an improved description that helps an agent:

- decide **when to use vs. not use** the tool
- generate **valid parameters** (required vs. optional, constraints, defaults)
- avoid common mistakes and likely validation failures

This model is trained to work in a **trace-free** setting at inference time (i.e., **no tool execution traces are required**).

For the accompanying codebase (inference + training), see: [Agent Tool Interface Optimizer](https://github.com/intuit-ai-research/tool-optimizer).

---
## What problem does this solve?

Tool interfaces (descriptions + parameter schemas) are the “contract” between agents and tools, but they are typically written for humans. When descriptions under-specify **required parameters**, omit **constraints**, or fail to explain **tool boundaries**, agent performance can plateau and can degrade as the number of available tools increases.

We study tool interface improvement as a scalable complement to agent fine-tuning, and propose **Trace-Free+**: a curriculum-learning approach that transfers knowledge learned from trace-rich training to trace-free inference for unseen tools.

---
## Paper (arXiv)

This model is released alongside the preprint:

- **Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use**
  Ruocheng Guo, Kaiwen Dong, Xiang Gao, Kamalika Das
  arXiv:2602.20426 (2026) — [paper](https://arxiv.org/abs/2602.20426)

### Citation
```bibtex
@misc{guo2026learningrewritetooldescriptions,
  title={Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use},
  author={Ruocheng Guo and Kaiwen Dong and Xiang Gao and Kamalika Das},
  year={2026},
  eprint={2602.20426},
  archivePrefix={arXiv},
  primaryClass={cs.AI},
  url={https://arxiv.org/abs/2602.20426},
}
```

---
## Recommended prompt (trace-free)

This is the **canonical inference prompt** used for trace-free tool description generation (also available as `tool_prompt.txt` in the `tool-optimizer` repo).
```
You are an API documentation specialist.

Rewrite the API description so an AI agent can:
1) Decide when to use this API
2) Generate valid parameters

Inputs:
- API name: {tool_name}
- Parameter schema: {parameter_json}
- Baseline description: {original_description}

Infer (do not output):
- When to use vs not use this API
- Required vs optional parameters
- Parameter meanings and constraints
- Cross-parameter dependencies or exclusions
- Common parameter mistakes
- No examples are provided; infer from the schema and baseline description only

Write a clear API description that:
- States when to use and NOT use the API
- Does not invent or reference non-provided APIs
- Explains each parameter's meaning, type, required/optional status, constraints, and defaults
- Describes likely validation failures and how to avoid them
- Abstracts patterns into general rules
- Does not restate the full schema verbatim
- Does not mention whether examples were provided

You may replace the baseline description entirely.

Output ONLY valid JSON (no markdown, no code blocks):
{{"description": "<your improved API description here>"}}
```
### Inputs

- **`tool_name`**: the tool/API name
- **`parameter_json`**: a JSON string describing the parameter schema (treat this as authoritative)
- **`original_description`**: the baseline description you want to improve

### Output

The model is trained to output **only valid JSON** with a single field:

- **`description`**: the improved tool description (string)
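The template's input slots are plain Python `str.format` placeholders, and the doubled braces in the final line (`{{"description": ...}}`) are the standard escape that survives formatting as literal braces. A minimal sketch of filling the three inputs, using an abbreviated copy of the template and a hypothetical `get_weather` tool purely for illustration:

```python
import json

# Abbreviated template; in practice, load the full canonical prompt text.
# The doubled braces {{...}} come out of .format() as single literal braces.
TEMPLATE = (
    "Inputs:\n"
    "- API name: {tool_name}\n"
    "- Parameter schema: {parameter_json}\n"
    "- Baseline description: {original_description}\n\n"
    "Output ONLY valid JSON (no markdown, no code blocks):\n"
    '{{"description": "<your improved API description here>"}}'
)

# Hypothetical parameter schema used only for this example.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["C", "F"]},
    },
    "required": ["city"],
}

prompt = TEMPLATE.format(
    tool_name="get_weather",
    parameter_json=json.dumps(schema),  # the schema slot expects a JSON string
    original_description="Gets weather for a city.",
)
print(prompt)
```

Serializing the schema with `json.dumps` (rather than passing the dict's `repr`) keeps the `parameter_json` slot a well-formed JSON string, matching how the input is described above.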
---

## Prompt variation guidance (SFT-sensitive)

Because this model is fine-tuned (SFT) to follow a specific prompt and output contract, it can be sensitive to prompt changes. The safest strategy is to treat the prompt as a template and apply only **minimal, well-scoped edits**.

### Prompt invariants (do not change)

- Keep the three input slots exactly: `{tool_name}`, `{parameter_json}`, `{original_description}`
- Keep: **“Output ONLY valid JSON (no markdown, no code blocks)”**
- Keep the output schema exactly: `{"description": "..."}` (same key name; no extra keys)
### Safe, minimal edits (usually OK)

- Add 1–3 bullets under **“Infer (do not output)”** to clarify what to pay attention to
- Add constraints under **“Write a clear API description that:”** as additional bullets
- Add brief reminders about schema authority, parameter-name exactness, or concision

### Risky edits (often break JSON / behavior)

- Reordering or removing the output-format lines
- Asking for examples, multi-part outputs, markdown, or extra keys
- Changing placeholder names or introducing new “inputs” not present during training
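Because a prompt variant can silently violate the invariants, a lightweight pre-flight check can catch broken templates before any inference runs. A minimal sketch (the `check_prompt_invariants` helper is ours, not part of the library) that verifies the three input slots, the JSON-only instruction, and the output schema line:

```python
REQUIRED_SLOTS = ("{tool_name}", "{parameter_json}", "{original_description}")
CONTRACT_LINE = "Output ONLY valid JSON (no markdown, no code blocks)"

def check_prompt_invariants(template: str) -> list[str]:
    """Return a list of violated invariants; an empty list means the template is safe."""
    problems = []
    for slot in REQUIRED_SLOTS:
        if slot not in template:
            problems.append(f"missing input slot {slot}")
    if CONTRACT_LINE not in template:
        problems.append("missing JSON-only output instruction")
    # The unformatted template carries the schema line with doubled (escaped) braces.
    if '{{"description"' not in template:
        problems.append('missing {"description": ...} output schema')
    return problems

# A variant that drops the output-schema line should be flagged.
bad = (
    "API name: {tool_name}\n"
    "Schema: {parameter_json}\n"
    "Baseline: {original_description}\n" + CONTRACT_LINE
)
print(check_prompt_invariants(bad))
```

Running the check on every edited template before batching prompts keeps "risky edits" from reaching the model unnoticed.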
### Concrete example: minimal diff that still tends to work

The prompt below is a conservative variation. It adds clarifications without changing the core structure or output contract:
```diff
 Infer (do not output):
+- Preserve key lexical tokens from the baseline description that may match user queries
+- Clarify boundaries if this API could be confused with similar tools

 Write a clear API description that:
+- Treats the parameter schema as authoritative and does not introduce fields, types, or requirements not defined in it
+- Explains each parameter's meaning ... while keeping parameter names exactly as defined in the schema
+- Lists REQUIRED parameters before optional ones
+- Uses enumerated or candidate values exactly as defined in the schema when applicable
+- Describes likely validation failures strictly based on schema-defined constraints ...
+- Keeps the description concise and avoids unnecessary verbosity
```
---

## Inference

### Option A: Use the `tool-optimizer` library (recommended)

The open-source repo includes a working CLI that runs this model with either **vLLM** or **Hugging Face Transformers**:
```bash
git clone https://github.com/intuit-ai-research/tool-optimizer
cd tool-optimizer

# Install (one option)
python -m pip install -e .

# Run inference (vLLM default)
python src/agent_tool_optimizer/inference_main.py \
  --model_name intuit/agent-tool-optimizer \
  --dataset_id intuit/tool-optimizer-dataset
```

Notes:

- `--inference_engine vllm` (default) or `--inference_engine hf`
- The dataset is expected to have a `test` split with a `prompt` field.
### Option B: Transformers (direct)

```python
import json

import torch
from transformers import pipeline

model_id = "intuit/agent-tool-optimizer"
gen = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = """<prompt above>"""

out = gen(
    [{"role": "user", "content": prompt}],
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    return_full_text=False,
)
result = out[0]["generated_text"]
print(result)

# Optional: validate JSON
json.loads(result)
```
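Although the model is trained to emit bare JSON, downstream code is more robust if it also tolerates an occasional stray markdown fence around the output. A minimal parsing sketch (the `parse_description` helper is ours, not part of the library) that strips an optional fence, validates the object, and returns the single `description` field:

```python
import json
import re

def parse_description(raw: str) -> str:
    """Extract the `description` field from model output, tolerating code fences."""
    text = raw.strip()
    # Remove a ```...``` or ```json ...``` fence the model should not emit, but might.
    fence = re.match(r"^```(?:json)?\s*(.*?)\s*```$", text, flags=re.DOTALL)
    if fence:
        text = fence.group(1)
    obj = json.loads(text)  # raises json.JSONDecodeError on malformed output
    if not isinstance(obj, dict) or "description" not in obj:
        raise ValueError("expected a JSON object with a 'description' key")
    return obj["description"]

# Both the contract-conforming and the fenced form parse to the same string.
clean = '{"description": "Use this API to look up weather by city."}'
fenced = "```json\n" + clean + "\n```"
assert parse_description(clean) == parse_description(fenced)
print(parse_description(clean))
```

Raising on malformed output (rather than returning a default) makes it easy to route failures to a retry with the unmodified canonical prompt.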

---

## Example (Before vs After)

![Before vs After examples](images/before_after_examples.png)