# SimpleTool Skill: Real-Time AI Application Development

> **This is a skill file.** Feed it to any AI coding assistant (Claude, Gemini, GPT, Cursor, etc.) as context, then describe the app you want. The AI will generate a working SimpleTool-powered application.
>
> Example prompt: *"Read the attached SimpleTool skill, then build me a Pong game where AI controls one paddle in real-time."*

---

## 1. What is SimpleTool?

SimpleTool is a **multi-head parallel decoding** server for real-time LLM function calling. It runs on vLLM and decodes function name + arguments simultaneously instead of sequentially.

```
Traditional:  function → arg1 → arg2 → ...   (sequential, ~200-500ms)
SimpleTool:   [function, arg1, arg2, ...]    (parallel,   ~25-60ms)
```

**Application domains**: game AI, robotic arm control, digital human animation, IoT automation, and anything else that needs LLM decisions in under 100 ms.

## 2. Server API

Server default: `http://localhost:8899`

### Endpoints
| Method | Path | Description |
|--------|------|-------------|
| GET | `/health` | Health check, returns `{status, version, model}` |
| POST | `/v1/function_call` | Multi-head parallel function call |

### Request Format (v2)
```javascript
{
  messages: [{role: 'user', content: 'your query'}],
  tools: [...],                // OpenAI-format tool definitions
  system: "domain prompt",     // Domain-specific system prompt (v2)
  environment: [...],          // Current state info (string array, optional)
  history: [...],              // Action history (string array, max 6)
  include_content_head: false  // Whether to generate <content> head
}
```

The `system` field lets you inject a domain-specific system prompt (e.g., "You are a robotic arm controller"). If omitted, the server uses a generic default. The `environment` field is optional context folded into the user message.
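
As a rough illustration of the request format above, the sketch below assembles a request body from loose inputs. `buildRequest` and `MAX_HISTORY` are hypothetical names, not part of SimpleTool. It assumes the server accepts a body in which the optional fields are simply absent; the source confirms this only for `system` and `environment`.

```javascript
// Hypothetical helper (not part of SimpleTool): builds a /v1/function_call body.
const MAX_HISTORY = 6; // history cap stated in the request format

function buildRequest({ query, tools, system, environment, history, includeContent = false }) {
  const body = {
    messages: [{ role: 'user', content: query }],
    tools,
    include_content_head: includeContent
  };
  // Optional fields are left out when not provided (assumption: the server
  // then falls back to its own defaults).
  if (system) body.system = system;
  if (Array.isArray(environment) && environment.length > 0) {
    body.environment = environment.map(String);
  }
  if (Array.isArray(history) && history.length > 0) {
    // Keep only the most recent entries.
    body.history = history.slice(-MAX_HISTORY).map(String);
  }
  return body;
}

const exampleBody = buildRequest({
  query: 'Ball is BELOW paddle. Choose: up/down/stay',
  tools: [],
  history: ['a', 'b', 'c', 'd', 'e', 'f', 'g']
});
console.log(JSON.stringify(exampleBody));
```

Trimming `history` on the client keeps the payload small, which should help prefill time a little even if the server trims it again.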

### Response Format
```javascript
{
  success: true,
  function: "move",
  args: {direction: "up", speed: "fast"},   // Named args (param names from tool def)
  heads: {                                   // Raw per-head output
    function: "move",
    arg1: "up",
    arg2: "fast",
    arg3: "<|null|>"
  },
  content: null,       // Only if include_content_head was true
  latency_ms: 35.2
}
```

## 3. Dynamic Head Count (Critical for Latency!)

**The server automatically prunes unused heads.** If your tools have at most 2 parameters, only 3 heads are spawned (`<function>`, `<arg1>`, `<arg2>`), not 8. This saves ~40% latency.

```
Active heads = [<function>] + [<arg1>...<argN>]
where N = max parameter count across all tool definitions
```

**Design tip**: Keep your tools to 1–3 parameters when possible. Fewer params = fewer heads = lower latency.
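
To check a tool list against the formula above before sending it, something like the following sketch may help. `countActiveHeads` is a hypothetical client-side helper; it only mirrors the stated formula and does not count a `<content>` head, so treat its result as an estimate rather than what the server actually spawns.

```javascript
// Hypothetical estimate: 1 function head + N argument heads,
// where N is the largest parameter count across all tools.
function countActiveHeads(tools) {
  const maxParams = tools.reduce((max, tool) => {
    const props = tool?.function?.parameters?.properties ?? {};
    return Math.max(max, Object.keys(props).length);
  }, 0);
  return { maxParams, heads: 1 + maxParams };
}

const demoTools = [
  { type: 'function', function: { name: 'move', parameters: { type: 'object', properties: { unit: { type: 'string' }, target: { type: 'string' } } } } },
  { type: 'function', function: { name: 'pass', parameters: { type: 'object', properties: {} } } }
];
const headEstimate = countActiveHeads(demoTools);
console.log(headEstimate); // { maxParams: 2, heads: 3 }
```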

## 4. Tool Definition

### Constraints
- Maximum **6 arguments** per function (arg1–arg6)
- Arguments map to `arg1, arg2, ...` in the order defined in `properties`
- Server auto-converts types: numeric strings → int/float, otherwise lowercase string
- Use `enum` to constrain options; this dramatically improves accuracy
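
The positional mapping and type conversion in the constraints above can be approximated on the client, for instance to rebuild named arguments from the raw `heads` object. This is a sketch under assumptions: `coerceValue` and `headsToArgs` are hypothetical names, and the exact server-side conversion rules may differ from this regular-expression check.

```javascript
const NULL_TOKEN = '<|null|>'; // placeholder seen in the raw heads output

// Approximates the documented conversion: numeric strings become numbers,
// everything else becomes a lowercase string.
function coerceValue(raw) {
  const text = String(raw).trim();
  if (/^[+-]?\d+(\.\d+)?$/.test(text)) return Number(text);
  return text.toLowerCase();
}

// Maps arg1, arg2, ... onto parameter names in `properties` order.
function headsToArgs(tool, heads) {
  const names = Object.keys(tool.function.parameters.properties);
  const args = {};
  names.forEach((name, i) => {
    const raw = heads[`arg${i + 1}`];
    if (raw === undefined || raw === null || raw === NULL_TOKEN) return; // unused head
    args[name] = coerceValue(raw);
  });
  return args;
}

const moveTool = {
  type: 'function',
  function: {
    name: 'move',
    parameters: { type: 'object', properties: { direction: { type: 'string' }, speed: { type: 'number' } } }
  }
};
const parsedArgs = headsToArgs(moveTool, { function: 'move', arg1: 'UP', arg2: '3', arg3: '<|null|>' });
console.log(parsedArgs); // { direction: 'up', speed: 3 }
```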

### Template
```javascript
const TOOLS = [{
  type: "function",
  function: {
    name: "action_name",
    description: "Clear, concise: what this action does and when to use it",
    parameters: {
      type: "object",
      properties: {
        param1: {
          type: "string",
          enum: ["opt_a", "opt_b", "opt_c"],  // Constrain! Improves accuracy
          description: "What this param controls"
        },
        param2: {
          type: "number",
          description: "Numeric value with unit, e.g. 'Force in Newtons'"
        }
      },
      required: ["param1"]
    }
  }
}];
```

### Multi-Tool Example (Game)
```javascript
const TOOLS = [
  {type:"function", function:{name:"move",    description:"Move unit to position", parameters:{type:"object", properties:{unit:{type:"string"}, target:{type:"string", enum:["north","south","east","west"]}}}}},
  {type:"function", function:{name:"attack",  description:"Attack enemy",          parameters:{type:"object", properties:{unit:{type:"string"}, target:{type:"string"}}}}},
  {type:"function", function:{name:"retreat",  description:"Pull back unit",        parameters:{type:"object", properties:{unit:{type:"string"}}}}},
  {type:"function", function:{name:"pass",     description:"Do nothing this turn",  parameters:{type:"object", properties:{}}}}
];
// Max params = 2 → only 3 heads spawned
```

## 5. Query Design

### Principles
1. **Be imperative**: tell the model what to decide; don't just describe state
2. **Include decision context**: "Ball is BELOW paddle, intercept it", not "Ball y=250"
3. **List valid options**: "Choose: up/down/stay"
4. **Keep it short**: shorter query = faster prefill

### Good vs Bad
```
✅ "Ball 50px BELOW paddle, approaching fast. Move DOWN to intercept. Choose: up/down/stay"
❌ "Ball position: 250, Paddle position: 200. What should I do?"

✅ "Red gear at (300,150,50). Move arm there slowly for pickup."
❌ "There is a gear somewhere on the table. The arm needs to go to it."

✅ "Stream starting, viewers saying hello. Greet them warmly."
❌ "Viewers are in the chat. Do something appropriate."
```
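
One way to apply these principles in code is to derive the imperative sentence from raw state instead of sending the numbers alone. The sketch below assumes a Pong-style game where y grows downward; `buildPongQuery` and its `deadZone` parameter are hypothetical.

```javascript
// Hypothetical query builder: turns raw positions into an imperative,
// context-rich query that lists the valid options.
function buildPongQuery(ballY, paddleY, deadZone = 10) {
  const gap = Math.round(ballY - paddleY);
  if (Math.abs(gap) <= deadZone) {
    return 'Ball is LEVEL with paddle. Hold position. Choose: up/down/stay';
  }
  const where = gap > 0 ? 'BELOW' : 'ABOVE';
  const action = gap > 0 ? 'DOWN' : 'UP';
  return `Ball ${Math.abs(gap)}px ${where} paddle. Move ${action} to intercept. Choose: up/down/stay`;
}

console.log(buildPongQuery(250, 200));
// Ball 50px BELOW paddle. Move DOWN to intercept. Choose: up/down/stay
```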

### Environment & History
```javascript
// Environment: current state as key=value strings
const env = [
  `ball_y=${ballY}`,
  `paddle_y=${paddleY}`,
  `gap=${gap}`,
  `approaching=true`
];

// History: recent actions (max 6, server trims automatically)
const history = [
  "move(up)", "move(up)", "stay()"
];
```
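
Keeping the history array up to date is easy to get wrong in a fast loop, so a small helper can do it in one place. This is a sketch only: `formatAction`, `pushHistory`, and `HISTORY_LIMIT` are hypothetical names, and the `"fn(arg,...)"` entry format is inferred from the example strings above rather than required by the server.

```javascript
const HISTORY_LIMIT = 6; // max entries stated above

// Formats one action like the example entries, e.g. "move(up)" or "stay()".
function formatAction(fn, args = {}) {
  return `${fn}(${Object.values(args).join(',')})`;
}

// Returns a new array with the action appended and the oldest entries dropped.
function pushHistory(log, fn, args) {
  return [...log, formatAction(fn, args)].slice(-HISTORY_LIMIT);
}

let actionLog = [];
for (let i = 0; i < 8; i++) {
  actionLog = pushHistory(actionLog, 'move', { direction: i % 2 ? 'up' : 'down' });
}
actionLog = pushHistory(actionLog, 'stay');
console.log(actionLog);
```

Call `pushHistory` right after applying each successful response so the next request sees the action just taken.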

### Domain System Prompts (v2)
For a v2 server, set a domain-specific system prompt:
```javascript
// Pick one per application; distinct names let all three live in the same file.
// Game AI
const SYSTEM_GAME = "You are the AI controller for a Pong game. Move the paddle to intercept the ball. React quickly.";

// Robotic arm
const SYSTEM_ARM = "You are the voice controller for a 6-axis robotic arm. Convert commands to precise function calls. Coordinates in mm.";

// Digital human
const SYSTEM_AVATAR = "You are the animation controller for a virtual streamer. Convert director instructions to expression and speech calls.";
```

## 6. Frontend Code Standards

### Required: Type-Safe Value Extraction
```javascript
// Values in args may be int, not string, so always coerce
function safeStr(v) {
  if (v === null || v === undefined) return '';
  return String(v).trim().toLowerCase();
}

// Extract with args (named) first, heads (positional) as fallback
let direction = safeStr(d.args?.direction) || safeStr(d.heads?.arg1);
```

### Required: Validate Return Values
```javascript
const VALID = ['up', 'down', 'stay'];
if (!VALID.includes(direction)) {
  console.warn(`Invalid: "${direction}", fallback to stay`);
  direction = 'stay';
}
```

### Required: Error Handling with Fallback
```javascript
async function callAI() {
  try {
    const r = await fetch(SERVER_URL + '/v1/function_call', {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify(request)
    });
    if (!r.ok) throw new Error(`HTTP ${r.status}`);  // non-2xx bodies may not be valid JSON
    const data = await r.json();
    if (!data.success) throw new Error(data.error);
    applyAction(data);
  } catch (e) {
    console.error('[AI] Failed:', e);
    applyFallbackAI();  // MUST have fallback: never freeze the app
  }
}
```

### Required: Logging
```javascript
console.log(`[Game] Query: ${query}`);
console.log(`[Game] → ${data.function}(${JSON.stringify(data.args)}) ${data.latency_ms.toFixed(0)}ms`);
```

### Recommended: Debug UI Overlay
Show in a corner of your app: current query, raw response, latency (current + rolling average).
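
The rolling-average part of such an overlay can be sketched as a small tracker. `LatencyTracker` is a hypothetical class and the window size is arbitrary. Rendering is left to the app; the `label()` text could, for example, be assigned to the `textContent` of a fixed-position element.

```javascript
// Keeps the last `size` latency samples and reports current + rolling average.
class LatencyTracker {
  constructor(size = 30) {
    this.size = size;
    this.samples = [];
  }
  add(ms) {
    this.samples.push(ms);
    if (this.samples.length > this.size) this.samples.shift(); // drop oldest
  }
  get current() {
    return this.samples.length ? this.samples[this.samples.length - 1] : 0;
  }
  get average() {
    if (this.samples.length === 0) return 0;
    return this.samples.reduce((sum, v) => sum + v, 0) / this.samples.length;
  }
  // Text for the overlay.
  label() {
    return `AI ${this.current.toFixed(0)}ms (avg ${this.average.toFixed(0)}ms)`;
  }
}

const tracker = new LatencyTracker(3);
[30, 40, 50, 60].forEach((ms) => tracker.add(ms));
console.log(tracker.label()); // AI 60ms (avg 50ms)
```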

## 7. Game Loop Pattern

**Decouple AI from rendering.** The AI loop runs at 10–16 Hz; the render loop runs at 60 fps.

```javascript
const AI_INTERVAL = 100;  // 100ms = 10 Hz
let aiPending = false;

// Render loop (60fps): never blocks on AI
function gameLoop() {
  update();
  render();
  requestAnimationFrame(gameLoop);
}

// AI loop (async, non-blocking)
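async function aiLoop() {
  if (aiPending) return;  // skip this tick while a request is still in flight
  aiPending = true;
  try {
    await callAI();
  } finally {
    aiPending = false;  // always release the flag, even if callAI throws
  }
}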

setInterval(aiLoop, AI_INTERVAL);
gameLoop();
```

## 8. FCClient Template

Drop-in client class for any HTML/JS application:

```javascript
class FCClient {
  constructor(url = 'http://localhost:8899') {
    this.url = url.replace(/\/$/, '');
  }

  async health() {
    try {
      const r = await fetch(`${this.url}/health`, {signal: AbortSignal.timeout(3000)});
      const d = await r.json();
      return {ok: d.loaded === true || d.status === 'ok', version: d.version};
    } catch (e) {
      return {ok: false};
    }
  }

  async call({query, tools, system, env, history, includeContent = false}) {
    const t0 = performance.now();
    try {
      const r = await fetch(`${this.url}/v1/function_call`, {
        method: 'POST',
        headers: {'Content-Type': 'application/json'},
        body: JSON.stringify({
          messages: [{role: 'user', content: query}],
          tools,
          system,                              // v2: domain system prompt
          environment: env,
          history,
          include_content_head: includeContent
        })
      });
      const d = await r.json();
      return {...d, wall_ms: performance.now() - t0};
    } catch (e) {
      return {success: false, error: e.message, wall_ms: performance.now() - t0};
    }
  }
}
```

Usage:
```javascript
const ai = new FCClient('http://localhost:8899');

const result = await ai.call({
  query: "Ball is BELOW. Move down. Choose: up/down/stay",
  tools: TOOLS,
  system: "You are a Pong AI. Move paddle to intercept ball.",
  env: ["ball_y=300", "paddle_y=200", "gap=100"],
  history: ["move(down)", "move(down)"]
});

if (result.success) {
  console.log(`${result.function}(${JSON.stringify(result.args)}) in ${result.latency_ms}ms`);
}
```

## 9. Troubleshooting

| Symptom | Cause | Fix |
|---------|-------|-----|
| AI stuck / no movement | Query too vague | Add decision hints: "Move DOWN to intercept" |
| `.trim is not a function` | `args` values may be int | Use `String(v)` before `.trim()` |
| High latency (>100ms) | Too many heads / long query | Reduce tool params, shorten query/env |
| Wrong function called | Ambiguous tool descriptions | Add `enum`, improve `description` fields |
| `<\|null\|>` in all args | Model confused | Check tool param order matches expectations |
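
Some of the symptoms in the table can be detected automatically and surfaced in logs. The sketch below is one possible check: `diagnose` and its 100 ms default budget are assumptions for illustration. A response with every argument head set to `<|null|>` is also legitimate for zero-parameter tools, so treat that result as a hint rather than an error.

```javascript
const NULL_HEAD = '<|null|>';

// Returns a short diagnosis string, or null when the response looks usable.
function diagnose(response, latencyBudgetMs = 100) {
  if (!response || response.success !== true) return 'request failed';
  const argHeads = Object.entries(response.heads ?? {})
    .filter(([name]) => name.startsWith('arg'));
  if (argHeads.length > 0 && argHeads.every(([, value]) => value === NULL_HEAD)) {
    return 'all argument heads are null';
  }
  if (typeof response.latency_ms === 'number' && response.latency_ms > latencyBudgetMs) {
    return 'latency over budget';
  }
  return null;
}

console.log(diagnose({
  success: true,
  heads: { function: 'move', arg1: '<|null|>', arg2: '<|null|>' },
  latency_ms: 30
})); // all argument heads are null
```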

---

**Skill Version**: 2.0. Supports v1/v2 server, multi-domain (game, robotics, avatar)  
**Last Updated**: 2026-03