# SimpleTool Skill: Real-Time AI Application Development
> **This is a skill file.** Feed it to any AI coding assistant (Claude, Gemini, GPT, Cursor, etc.) as context, then describe the app you want. The AI will generate a working SimpleTool-powered application.
>
> Example prompt: *"Read the attached SimpleTool skill, then build me a Pong game where AI controls one paddle in real-time."*
---
## 1. What is SimpleTool?
SimpleTool is a **multi-head parallel decoding** server for real-time LLM function calling. It runs on vLLM and decodes function name + arguments simultaneously instead of sequentially.
```
Traditional: function → arg1 → arg2 → ...  (sequential, ~200-500ms)
SimpleTool:  [function, arg1, arg2, ...]    (parallel, ~25-60ms)
```
**Application domains**: game AI, robotic arm control, digital human animation, IoT automation, and anything else that needs < 100ms LLM decision-making.
## 2. Server API
Server default: `http://localhost:8899`
### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check, returns `{status, version, model}` |
| POST | `/v1/function_call` | Multi-head parallel function call |
### Request Format (v2)
```javascript
{
  messages: [{role: 'user', content: 'your query'}],
  tools: [...],                 // OpenAI-format tool definitions
  system: "domain prompt",      // Domain-specific system prompt (v2)
  environment: [...],           // Current state info (string array, optional)
  history: [...],               // Action history (string array, max 6)
  include_content_head: false   // Whether to generate <content> head
}
```
The `system` field lets you inject a domain-specific system prompt (e.g., "You are a robotic arm controller"). If omitted, the server uses a generic default. The `environment` field is optional context folded into the user message.
### Response Format
```javascript
{
  success: true,
  function: "move",
  args: {direction: "up", speed: "fast"},  // Named args (param names from tool def)
  heads: {                                 // Raw per-head output
    function: "move",
    arg1: "up",
    arg2: "fast",
    arg3: "<|null|>"
  },
  content: null,                           // Only if include_content_head was true
  latency_ms: 35.2
}
```
## 3. Dynamic Head Count (Critical for Latency!)
**The server automatically prunes unused heads.** If your tools have at most 2 parameters, only 3 heads are spawned (`<function>`, `<arg1>`, `<arg2>`), not 8. This saves ~40% latency.
```
Active heads = [<function>] + [<arg1>...<argN>]
where N = max parameter count across all tool definitions
```
**Design tip**: Keep your tools to 1–3 parameters when possible. Fewer params = fewer heads = lower latency.
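To check a tool set before sending it, the head-count rule above can be computed on the client. This is a minimal sketch that only restates the formula from this section; `countActiveHeads` and `exampleTools` are hypothetical names, and the server's actual pruning logic may differ in detail.

```javascript
// Sketch: estimate how many heads a tool set will activate.
// Assumes the rule stated above: one <function> head plus one head per
// argument slot, where the slot count is the largest parameter count
// found in any tool definition.
function countActiveHeads(tools) {
  let maxParams = 0;
  for (const tool of tools) {
    const props = tool.function?.parameters?.properties ?? {};
    maxParams = Math.max(maxParams, Object.keys(props).length);
  }
  return 1 + maxParams;
}

// Hypothetical tool set: "move" has 2 params, "pass" has none.
const exampleTools = [
  {type: "function", function: {name: "move", parameters: {type: "object", properties: {unit: {type: "string"}, target: {type: "string"}}}}},
  {type: "function", function: {name: "pass", parameters: {type: "object", properties: {}}}}
];

console.log(countActiveHeads(exampleTools)); // 3
```

Logging this number at startup makes it easier to notice when a newly added tool with many parameters raises the head count for every call.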
## 4. Tool Definition
### Constraints
- Maximum **6 arguments** per function (arg1–arg6)
- Arguments map to `arg1, arg2, ...` in the order defined in `properties`
- Server auto-converts types: numeric strings → int/float, otherwise lowercase string
- Use `enum` to constrain options; this dramatically improves accuracy
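The positional mapping and type conversion described above can be mirrored on the client, for example to rebuild named arguments from the raw `heads` object. The sketch below is an approximation under stated assumptions, not the server's code: `headsToArgs`, `coerceValue`, and the numeric-string pattern are hypothetical, and the `<|null|>` placeholder is taken from the response example.

```javascript
const NULL_TOKEN = "<|null|>"; // placeholder shown for unused heads in the response example
const MAX_ARGS = 6;            // documented per-function argument limit

// Assumed approximation of the documented conversion:
// numeric strings become numbers, everything else a lowercase string.
function coerceValue(raw) {
  const text = String(raw).trim();
  if (/^-?\d+(\.\d+)?$/.test(text)) return Number(text);
  return text.toLowerCase();
}

// Map positional heads (arg1, arg2, ...) to named args, using the
// key order of the tool's `properties`.
function headsToArgs(tool, heads) {
  const names = Object.keys(tool.function.parameters.properties);
  if (names.length > MAX_ARGS) {
    throw new Error(`${tool.function.name}: ${names.length} params exceeds ${MAX_ARGS}`);
  }
  const args = {};
  names.forEach((name, i) => {
    const raw = heads[`arg${i + 1}`];
    if (raw === undefined || raw === null || raw === NULL_TOKEN) return; // unused head
    args[name] = coerceValue(raw);
  });
  return args;
}

// Hypothetical tool and head output for illustration.
const moveTool = {type: "function", function: {name: "move", parameters: {type: "object", properties: {direction: {type: "string"}, speed: {type: "number"}}}}};
const namedArgs = headsToArgs(moveTool, {function: "move", arg1: "UP", arg2: "3.5", arg3: "<|null|>"});
console.log(namedArgs); // { direction: 'up', speed: 3.5 }
```

One caveat with this approach: JavaScript orders integer-like keys (such as `"1"`) before other keys, so parameter names should be ordinary identifiers if the definition order matters.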
### Template
```javascript
const TOOLS = [{
  type: "function",
  function: {
    name: "action_name",
    description: "Clear, concise: what this action does and when to use it",
    parameters: {
      type: "object",
      properties: {
        param1: {
          type: "string",
          enum: ["opt_a", "opt_b", "opt_c"],  // Constrain! Improves accuracy
          description: "What this param controls"
        },
        param2: {
          type: "number",
          description: "Numeric value with unit, e.g. 'Force in Newtons'"
        }
      },
      required: ["param1"]
    }
  }
}];
```
### Multi-Tool Example (Game)
```javascript
const TOOLS = [
  {type:"function", function:{name:"move", description:"Move unit to position", parameters:{type:"object", properties:{unit:{type:"string"}, target:{type:"string", enum:["north","south","east","west"]}}}}},
  {type:"function", function:{name:"attack", description:"Attack enemy", parameters:{type:"object", properties:{unit:{type:"string"}, target:{type:"string"}}}}},
  {type:"function", function:{name:"retreat", description:"Pull back unit", parameters:{type:"object", properties:{unit:{type:"string"}}}}},
  {type:"function", function:{name:"pass", description:"Do nothing this turn", parameters:{type:"object", properties:{}}}}
];
// Max params = 2 → only 3 heads spawned
```
## 5. Query Design
### Principles
1. **Be imperative**: tell the model what to decide, not just describe state
2. **Include decision context**: "Ball is BELOW paddle, intercept it" not "Ball y=250"
3. **List valid options**: "Choose: up/down/stay"
4. **Keep it short**: shorter query = faster prefill
### Good vs Bad
```
✅ "Ball 50px BELOW paddle, approaching fast. Move DOWN to intercept. Choose: up/down/stay"
❌ "Ball position: 250, Paddle position: 200. What should I do?"

✅ "Red gear at (300,150,50). Move arm there slowly for pickup."
❌ "There is a gear somewhere on the table. The arm needs to go to it."

✅ "Stream starting, viewers saying hello. Greet them warmly."
❌ "Viewers are in the chat. Do something appropriate."
```
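In practice the imperative query is usually generated from game state rather than written by hand. The following sketch shows one way to apply the principles above for Pong; `buildPongQuery` and `DEAD_ZONE_PX` are hypothetical, and it assumes canvas-style coordinates where y grows downward.

```javascript
// Assumed tolerance (in pixels) below which the paddle should hold position.
// This is an illustrative value, not a SimpleTool setting.
const DEAD_ZONE_PX = 10;

// Turn raw state into a short, imperative query that states the decision
// context and lists the valid options.
function buildPongQuery(ballY, paddleY, approaching) {
  const gap = Math.round(ballY - paddleY); // positive = ball is lower on screen
  const options = "Choose: up/down/stay";
  if (!approaching || Math.abs(gap) <= DEAD_ZONE_PX) {
    return `Ball is level with paddle or moving away. Stay. ${options}`;
  }
  const where = gap > 0 ? "BELOW" : "ABOVE";
  const move = gap > 0 ? "DOWN" : "UP";
  return `Ball ${Math.abs(gap)}px ${where} paddle, approaching. Move ${move} to intercept. ${options}`;
}

console.log(buildPongQuery(250, 200, true));
// Ball 50px BELOW paddle, approaching. Move DOWN to intercept. Choose: up/down/stay
```

Doing the comparison in code and handing the model a pre-digested hint keeps the query short, which matches the "keep it short" principle.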
### Environment & History
```javascript
// Environment: current state as key=value strings
const env = [
  `ball_y=${ballY}`,
  `paddle_y=${paddleY}`,
  `gap=${gap}`,
  `approaching=true`
];

// History: recent actions (max 6, server trims automatically)
const history = [
  "move(up)", "move(up)", "stay()"
];
```
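Both arrays are easy to maintain with two small helpers. This is a sketch with hypothetical names (`formatEnv`, `pushHistory`); trimming on the client to the most recent 6 entries is an assumption made here to keep requests small, since the server is described as trimming on its own.

```javascript
const MAX_HISTORY = 6; // matches the documented history limit

// Turn a flat state object into the key=value string array.
function formatEnv(state) {
  return Object.entries(state).map(([key, value]) => `${key}=${value}`);
}

// Append an action such as "move(up)" and keep only the most recent
// entries. Returns a new array; the input is not modified.
function pushHistory(history, fn, args = []) {
  const next = [...history, `${fn}(${args.join(",")})`];
  return next.slice(-MAX_HISTORY);
}

console.log(formatEnv({ball_y: 300, paddle_y: 200, approaching: true}));
// [ 'ball_y=300', 'paddle_y=200', 'approaching=true' ]

let actionLog = [];
actionLog = pushHistory(actionLog, "move", ["up"]);
actionLog = pushHistory(actionLog, "stay");
console.log(actionLog); // [ 'move(up)', 'stay()' ]
```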
### Domain System Prompts (v2)
For a v2 server, set a domain-specific system prompt:
```javascript
// Alternatives: use the one that matches your domain.
// Game AI
const SYSTEM_GAME = "You are the AI controller for a Pong game. Move the paddle to intercept the ball. React quickly.";
// Robotic arm
const SYSTEM_ARM = "You are the voice controller for a 6-axis robotic arm. Convert commands to precise function calls. Coordinates in mm.";
// Digital human
const SYSTEM_AVATAR = "You are the animation controller for a virtual streamer. Convert director instructions to expression and speech calls.";
```
## 6. Frontend Code Standards
### Required: Type-Safe Value Extraction
```javascript
// Values in args may be int, not string, so always coerce
function safeStr(v) {
  if (v === null || v === undefined) return '';
  return String(v).trim().toLowerCase();
}

// Extract with args (named) first, heads (positional) as fallback
let direction = safeStr(d.args?.direction) || safeStr(d.heads?.arg1);
```
### Required: Validate Return Values
```javascript
const VALID = ['up', 'down', 'stay'];
if (!VALID.includes(direction)) {
  console.warn(`Invalid: "${direction}", fallback to stay`);
  direction = 'stay';
}
```
### Required: Error Handling with Fallback
```javascript
async function callAI() {
  try {
    const r = await fetch(SERVER_URL + '/v1/function_call', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify(request)
    });
    // fetch() does not reject on HTTP error statuses, so check explicitly
    if (!r.ok) throw new Error(`HTTP ${r.status}`);
    const data = await r.json();
    if (!data.success) throw new Error(data.error);
    applyAction(data);
  } catch (e) {
    console.error('[AI] Failed:', e);
    applyFallbackAI(); // MUST have fallback: never freeze the app
  }
}
```
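The fallback itself is left to the application. For a Pong paddle, one possible fallback is a deterministic chase rule, sketched below. `fallbackDirection` and its `deadZone` default are hypothetical, and the sketch assumes canvas coordinates where y grows downward.

```javascript
// Hypothetical rule-based fallback: follow the ball when the AI call fails.
// Returns one of the same values the AI path is expected to produce.
function fallbackDirection(ballY, paddleY, deadZone = 10) {
  const gap = ballY - paddleY; // positive = ball is lower on screen
  if (Math.abs(gap) <= deadZone) return 'stay';
  return gap > 0 ? 'down' : 'up';
}

console.log(fallbackDirection(300, 200)); // down
console.log(fallbackDirection(120, 200)); // up
console.log(fallbackDirection(204, 200)); // stay
```

Because the fallback returns the same vocabulary as the tool's `enum`, the rest of the app can treat both paths identically.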
### Required: Logging
```javascript
console.log(`[Game] Query: ${query}`);
console.log(`[Game] → ${data.function}(${JSON.stringify(data.args)}) ${data.latency_ms.toFixed(0)}ms`);
```
### Recommended: Debug UI Overlay
Show in a corner of your app: current query, raw response, latency (current + rolling average).
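For the latency part of the overlay, a small rolling-window tracker is enough. This is a sketch, not part of SimpleTool: `LatencyTracker` and its default window of 30 samples are assumptions.

```javascript
// Rolling average over the most recent `size` latency samples.
class LatencyTracker {
  constructor(size = 30) { // window size is an assumed default
    this.size = size;
    this.samples = [];
  }

  add(ms) {
    this.samples.push(ms);
    if (this.samples.length > this.size) this.samples.shift(); // drop the oldest sample
  }

  get current() {
    return this.samples.length ? this.samples[this.samples.length - 1] : 0;
  }

  get average() {
    if (this.samples.length === 0) return 0;
    const sum = this.samples.reduce((a, b) => a + b, 0);
    return sum / this.samples.length;
  }

  // One-line text suitable for a debug overlay element
  summary() {
    return `latency ${this.current.toFixed(0)}ms (avg ${this.average.toFixed(0)}ms)`;
  }
}

const tracker = new LatencyTracker(3);
[10, 20, 30, 40].forEach(ms => tracker.add(ms));
console.log(tracker.summary()); // latency 40ms (avg 30ms)
```

In a browser, the summary string can be written to an overlay element's `textContent` after each response, using `latency_ms` from the server or a wall-clock measurement.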
## 7. Game Loop Pattern
**Decouple AI from rendering.** The AI loop runs at 10–16 Hz; the render loop runs at 60 fps.
```javascript
const AI_INTERVAL = 100; // 100ms = 10 Hz
let aiPending = false;

// Render loop (60fps): never blocks on AI
function gameLoop() {
  update();
  render();
  requestAnimationFrame(gameLoop);
}

// AI loop (async, non-blocking)
async function aiLoop() {
  if (aiPending) return; // skip this tick while the previous call is in flight
  aiPending = true;
  try {
    await callAI();
  } finally {
    aiPending = false; // always release the guard, even if callAI() throws
  }
}

setInterval(aiLoop, AI_INTERVAL);
gameLoop();
```
## 8. FCClient Template
Drop-in client class for any HTML/JS application:
```javascript
class FCClient {
  constructor(url = 'http://localhost:8899') {
    this.url = url.replace(/\/$/, ''); // strip a trailing slash
  }

  async health() {
    try {
      const r = await fetch(`${this.url}/health`, {signal: AbortSignal.timeout(3000)});
      const d = await r.json();
      return {ok: d.loaded === true || d.status === 'ok', version: d.version};
    } catch (e) {
      return {ok: false};
    }
  }

  async call({query, tools, system, env, history, includeContent = false}) {
    const t0 = performance.now();
    try {
      const r = await fetch(`${this.url}/v1/function_call`, {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({
          messages: [{role: 'user', content: query}],
          tools,
          system,            // v2: domain system prompt
          environment: env,
          history,
          include_content_head: includeContent
        })
      });
      const d = await r.json();
      return {...d, wall_ms: performance.now() - t0};
    } catch (e) {
      return {success: false, error: e.message, wall_ms: performance.now() - t0};
    }
  }
}
```
Usage:
```javascript
const ai = new FCClient('http://localhost:8899');
const result = await ai.call({
  query: "Ball is BELOW. Move down. Choose: up/down/stay",
  tools: TOOLS,
  system: "You are a Pong AI. Move paddle to intercept ball.",
  env: ["ball_y=300", "paddle_y=200", "gap=100"],
  history: ["move(down)", "move(down)"]
});
if (result.success) {
  console.log(`${result.function}(${JSON.stringify(result.args)}) in ${result.latency_ms}ms`);
}
```
## 9. Troubleshooting
| Symptom | Cause | Fix |
|---------|-------|-----|
| AI stuck / no movement | Query too vague | Add decision hints: "Move DOWN to intercept" |
| `.trim is not a function` | `args` values may be int | Use `String(v)` before `.trim()` |
| High latency (>100ms) | Too many heads / long query | Reduce tool params, shorten query/env |
| Wrong function called | Ambiguous tool descriptions | Add `enum`, improve `description` fields |
| `<\|null\|>` in all args | Model confused | Check tool param order matches expectations |
---
**Skill Version**: 2.0 (supports v1/v2 server; multi-domain: game, robotics, avatar)
**Last Updated**: 2026-03