kevindong-intuit committed on
Commit a7aef48 · verified · 1 Parent(s): 3dddcac

Update README.md

Files changed (1):
  1. README.md +145 -4

README.md CHANGED
@@ -6,14 +6,42 @@ datasets:
  - intuit/tool-optimizer-dataset
  base_model:
  - Qwen/Qwen3-4B-Instruct-2507
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - agents
+ - tool-use
+ - sft
+ - documentation
+ - text-generation
  ---

- Use this model to improve API tool descriptions for LLM Agents.
-
- For information on how to do inference or training on this model, go to the [Agent Tool Interface Optimizer](https://github.com/intuit-ai-research/tool-optimizer).
-
- SFT prompt (Trace-free), you can use for inference without execution traces.
+ # Agent Tool Optimizer (`intuit/agent-tool-optimizer`)
+
+ `intuit/agent-tool-optimizer` is a **supervised fine-tuned (SFT)** model that rewrites **tool / API descriptions** to be more usable by **LLM agents**. Given a tool name, a parameter schema, and a baseline (often human-written) description, the model produces an improved description that helps an agent:
+
+ - decide **when to use vs. not use** the tool
+ - generate **valid parameters** (required vs. optional, constraints, defaults)
+ - avoid common mistakes and likely validation failures
+
+ This model is trained to work in a **trace-free** setting at inference time (i.e., **no tool execution traces are required**).
+
+ For the accompanying codebase (inference + training), see: [Agent Tool Interface Optimizer](https://github.com/intuit-ai-research/tool-optimizer).
+
+ ---
+
+ ## What problem does this solve?
+
+ Tool interfaces (descriptions + parameter schemas) are the “contract” between agents and tools, but they are typically written for humans. When descriptions under-specify **required parameters**, omit **constraints**, or fail to explain **tool boundaries**, agent performance can plateau, and it can even degrade as the number of available tools grows.
+
+ We study tool interface improvement as a scalable complement to agent fine-tuning and propose **Trace-Free+**: a curriculum-learning approach that transfers knowledge learned from trace-rich training to trace-free inference on unseen tools.
+
+ ---
+
+ ## Recommended prompt (trace-free)
+
+ This is the **canonical inference prompt** used for trace-free tool description generation (also available as `tool_prompt.txt` in the `tool-optimizer` repo).

  ```
  You are an API documentation specialist.

@@ -49,10 +77,123 @@ Output ONLY valid JSON (no markdown, no code blocks):
  {{"description": "<your improved API description here>"}}
  ```
+ ### Inputs
+
+ - **`tool_name`**: the tool/API name
+ - **`parameter_json`**: a JSON string describing the parameter schema (treat this as authoritative)
+ - **`original_description`**: the baseline description you want to improve
+
+ ### Output
+
+ The model is trained to output **only valid JSON** with a single field:
+
+ - **`description`**: the improved tool description (string)
+
+ ---
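The input slots and output contract above can be wired up with a few lines of plain Python. This is a minimal sketch, not part of the `tool-optimizer` library: `PROMPT_TEMPLATE` is an abbreviated stand-in for the full prompt (in practice, load the exact text of `tool_prompt.txt`), and `build_prompt` / `parse_output` are illustrative helpers.

```python
import json

# Abbreviated stand-in for the canonical prompt; load tool_prompt.txt in
# practice. The doubled braces {{...}} survive str.format() as literal
# JSON braces in the output-contract line.
PROMPT_TEMPLATE = (
    "You are an API documentation specialist.\n"
    "Tool name: {tool_name}\n"
    "Parameter schema: {parameter_json}\n"
    "Original description: {original_description}\n"
    "Output ONLY valid JSON (no markdown, no code blocks):\n"
    '{{"description": "<your improved API description here>"}}'
)

def build_prompt(tool_name: str, parameter_json: str, original_description: str) -> str:
    """Fill the three input slots the model was trained on."""
    return PROMPT_TEMPLATE.format(
        tool_name=tool_name,
        parameter_json=parameter_json,
        original_description=original_description,
    )

def parse_output(raw: str) -> str:
    """Enforce the output contract: one JSON object, one string field."""
    obj = json.loads(raw)
    if set(obj) != {"description"} or not isinstance(obj["description"], str):
        raise ValueError(f"unexpected model output: {raw!r}")
    return obj["description"]
```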
+ ## Prompt variation guidance (SFT-sensitive)
+
+ Because this model is supervised fine-tuned to follow a specific prompt and output contract, it can be sensitive to prompt changes. The safest strategy is to treat the prompt as a template and apply only **minimal, well-scoped edits**.
+
+ ### Prompt invariants (do not change)
+
+ - Keep the three input slots exactly: `{tool_name}`, `{parameter_json}`, `{original_description}`
+ - Keep: **“Output ONLY valid JSON (no markdown, no code blocks)”**
+ - Keep the output schema exactly: `{"description": "..."}` (same key name; no extra keys)
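The invariants above can be checked mechanically before deploying a prompt variant. A minimal sketch (`check_invariants` is a hypothetical helper, not part of the `tool-optimizer` library; it only does substring checks, so it catches deletions and renames, not reorderings):

```python
# Snippets that must survive any prompt variation, per the invariants above.
REQUIRED_SNIPPETS = (
    "{tool_name}",
    "{parameter_json}",
    "{original_description}",
    "Output ONLY valid JSON (no markdown, no code blocks)",
)

def check_invariants(prompt_template: str) -> list[str]:
    """Return the invariants a variant violates (empty list = passes)."""
    return [s for s in REQUIRED_SNIPPETS if s not in prompt_template]
```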
+ ### Safe, minimal edits (usually OK)
+
+ - Add 1–3 bullets under **“Infer (do not output)”** to clarify what to pay attention to
+ - Add constraints under **“Write a clear API description that:”** as additional bullets
+ - Add brief reminders about schema authority, parameter-name exactness, or concision
+
+ ### Risky edits (often break JSON / behavior)
+
+ - Reordering or removing the output-format lines
+ - Asking for examples, multi-part outputs, markdown, or extra keys
+ - Changing placeholder names or introducing new “inputs” not present during training
+
+ ### Concrete example: minimal diff that still tends to work
+
+ The prompt below is a conservative variation. It adds clarifications without changing the core structure or output contract:
+
+ ```diff
+ Infer (do not output):
+ - Preserve key lexical tokens from the baseline description that may match user queries
+ - Clarify boundaries if this API could be confused with similar tools
+
+ Write a clear API description that:
+ - Treats the parameter schema as authoritative and does not introduce fields, types, or requirements not defined in it
+ - Explains each parameter's meaning ... while keeping parameter names exactly as defined in the schema
+ - Lists REQUIRED parameters before optional ones
+ - Uses enumerated or candidate values exactly as defined in the schema when applicable
+ - Describes likely validation failures strictly based on schema-defined constraints ...
+ - Keeps the description concise and avoids unnecessary verbosity
+ ```
+ ---
+
+ ## Inference
+
+ ### Option A: Use the `tool-optimizer` library (recommended)
+
+ The open-source repo includes a working CLI that runs this model with either **vLLM** or **Hugging Face Transformers**:
+
+ ```bash
+ git clone https://github.com/intuit-ai-research/tool-optimizer
+ cd tool-optimizer
+
+ # Install (one option)
+ python -m pip install -e .
+
+ # Run inference (vLLM default)
+ python src/agent_tool_optimizer/inference_main.py \
+     --model_name intuit/agent-tool-optimizer \
+     --dataset_id intuit/tool-optimizer-dataset
+ ```
+
+ Notes:
+
+ - `--inference_engine vllm` (default) or `--inference_engine hf`
+ - The dataset is expected to have a `test` split with a `prompt` field.
+ ### Option B: Transformers (direct)
+
+ ```python
+ import json
+
+ import torch
+ from transformers import pipeline
+
+ model_id = "intuit/agent-tool-optimizer"
+ gen = pipeline(
+     "text-generation",
+     model=model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
+     trust_remote_code=True,
+ )
+
+ prompt = """<prompt above>"""
+
+ out = gen(
+     [{"role": "user", "content": prompt}],
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.95,
+     top_k=40,
+     return_full_text=False,
+ )
+ result = out[0]["generated_text"]
+ print(result)
+
+ # Optional: validate that the output satisfies the JSON contract
+ json.loads(result)
+ ```
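The model is trained to emit bare JSON, but sampled generations can occasionally wrap the object in a markdown code fence or stray whitespace despite the instructions. A small defensive parser can make downstream use more robust; `extract_description` below is an illustrative helper under that assumption, not part of any released API:

```python
import json
import re

def extract_description(raw: str) -> str:
    """Best-effort recovery of the `description` field from model output.

    Tries a strict parse first, then falls back to the first JSON
    object found in the text (e.g. if the model wrapped the object in
    a code fence despite the prompt's instructions).
    """
    text = raw.strip()
    try:
        return json.loads(text)["description"]
    except (json.JSONDecodeError, KeyError, TypeError):
        pass
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        raise ValueError(f"no JSON object in model output: {raw!r}")
    return json.loads(match.group(0))["description"]
```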
+
+ ---

- Example (Before vs After)
+ ## Example (Before vs After)

  ![Screenshot 2026-02-20 at 5.23.36 PM](https://cdn-uploads.huggingface.co/production/uploads/65dcb410bda21d181b38321b/dFj0XgXancXD51iyGxC83.png)