SimpleTool Skill: Real-Time AI Application Development
This is a skill file. Feed it to any AI coding assistant (Claude, Gemini, GPT, Cursor, etc.) as context, then describe the app you want. The AI will generate a working SimpleTool-powered application.
Example prompt: "Read the attached SimpleTool skill, then build me a Pong game where AI controls one paddle in real-time."
1. What is SimpleTool?
SimpleTool is a multi-head parallel decoding server for real-time LLM function calling. It runs on vLLM and decodes function name + arguments simultaneously instead of sequentially.
Traditional: function → arg1 → arg2 → ... (sequential, ~200-500ms)
SimpleTool: [function, arg1, arg2, ...] (parallel, ~25-60ms)
Application domains: game AI, robotic arm control, digital human animation, IoT automation, or anything else that needs < 100ms LLM decision-making.
2. Server API
Server default: http://localhost:8899
Endpoints
| Method | Path | Description |
|---|---|---|
| GET | `/health` | Health check, returns `{status, version, model}` |
| POST | `/v1/function_call` | Multi-head parallel function call |
Request Format (v2)
{
messages: [{role: 'user', content: 'your query'}],
tools: [...], // OpenAI-format tool definitions
system: "domain prompt", // Domain-specific system prompt (v2)
environment: [...], // Current state info (string array, optional)
history: [...], // Action history (string array, max 6)
include_content_head: false // Whether to generate <content> head
}
The system field lets you inject a domain-specific system prompt (e.g., "You are a robotic arm controller"). If omitted, the server uses a generic default. The environment field is optional context folded into the user message.
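As a rough illustration of the request format above, the body can be assembled by a small helper. `buildRequest` is a hypothetical name, not part of SimpleTool. The sketch assumes that leaving out optional fields is acceptable to the server, and it trims `history` on the client to the documented maximum of 6.

```javascript
// Hypothetical helper (not part of SimpleTool): builds a v2 request body.
function buildRequest({ query, tools, system, environment, history, includeContent = false }) {
  const body = {
    messages: [{ role: 'user', content: query }],
    tools,
    include_content_head: includeContent,
  };
  // Optional fields are omitted when not provided; per the text above,
  // the server then falls back to its generic system prompt.
  if (system) body.system = system;
  if (environment && environment.length > 0) body.environment = environment;
  // Keep only the 6 most recent actions (the documented history limit).
  if (history && history.length > 0) body.history = history.slice(-6);
  return body;
}
```

The result can be passed straight to `JSON.stringify` as the POST body for `/v1/function_call`.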
Response Format
{
success: true,
function: "move",
args: {direction: "up", speed: "fast"}, // Named args (param names from tool def)
heads: { // Raw per-head output
function: "move",
arg1: "up",
arg2: "fast",
arg3: "<|null|>"
},
content: null, // Only if include_content_head was true
latency_ms: 35.2
}
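When reading `heads` directly, unused heads show up as the `<|null|>` sentinel, as in the sample response above. A minimal sketch for filtering them out follows; `usedHeads` is a hypothetical helper name, and it assumes the sentinel string is exactly `<|null|>`.

```javascript
// Sentinel seen in the sample response for heads that produced no value.
const NULL_TOKEN = '<|null|>';

// Hypothetical helper: keeps only the argN heads that carry a real value.
function usedHeads(heads) {
  const used = {};
  for (const [name, value] of Object.entries(heads ?? {})) {
    if (name === 'function') continue; // the function head is reported separately
    if (value === null || value === undefined || value === NULL_TOKEN) continue;
    used[name] = value;
  }
  return used;
}
```

Head names are preserved rather than compacted, so `arg2` stays `arg2` even if `arg1` is empty.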
3. Dynamic Head Count (Critical for Latency!)
The server automatically prunes unused heads. If your tools have at most 2 parameters, only 3 heads are spawned (<function>, <arg1>, <arg2>), not 8. This saves ~40% latency.
Active heads = [<function>] + [<arg1>...<argN>]
where N = max parameter count across all tool definitions
Design tip: Keep your tools to 1–3 parameters when possible. Fewer params = fewer heads = lower latency.
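The head-count rule above can be estimated on the client before sending a request. This is only a sketch of the formula as stated: `activeHeads` is a hypothetical helper, it ignores the optional `<content>` head, and the server's actual pruning logic may differ.

```javascript
// Hypothetical helper: estimates which heads the server would spawn,
// following "Active heads = [<function>] + [<arg1>...<argN>]".
function activeHeads(tools) {
  let maxParams = 0;
  for (const tool of tools) {
    const props = tool.function?.parameters?.properties ?? {};
    maxParams = Math.max(maxParams, Object.keys(props).length);
  }
  const heads = ['<function>'];
  for (let i = 1; i <= maxParams; i++) heads.push(`<arg${i}>`);
  return heads;
}
```

Running this over a tool list during development makes it easy to spot the one tool whose extra parameter adds a head for every call.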
4. Tool Definition
Constraints
- Maximum 6 arguments per function (`arg1`–`arg6`)
- Arguments map to `arg1, arg2, ...` in the order defined in `properties`
- Server auto-converts types: numeric strings → int/float, otherwise lowercase string
- Use `enum` to constrain options; this dramatically improves accuracy
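To make the positional mapping and type conversion concrete, here is a client-side sketch that mirrors the constraints listed above. It is an assumption about the behavior, not the server's code: `coerce` and `namedArgs` are hypothetical names, and the numeric check (plain decimal strings only) is a simplification.

```javascript
// Hypothetical mirror of the documented conversion:
// numeric strings become numbers, everything else a lowercase string.
function coerce(raw) {
  const s = String(raw).trim();
  if (/^-?\d+(\.\d+)?$/.test(s)) return Number(s);
  return s.toLowerCase();
}

// Hypothetical helper: maps arg1, arg2, ... onto parameter names
// in the order they appear in the tool's `properties`.
function namedArgs(tool, heads) {
  const names = Object.keys(tool.function.parameters.properties);
  if (names.length > 6) throw new Error('at most 6 arguments per function');
  const args = {};
  names.forEach((name, i) => {
    const raw = heads[`arg${i + 1}`];
    if (raw === undefined || raw === null || raw === '<|null|>') return;
    args[name] = coerce(raw);
  });
  return args;
}
```

Because the mapping depends on property order, reordering `properties` in a tool definition changes which head feeds which parameter.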
Template
const TOOLS = [{
type: "function",
function: {
name: "action_name",
description: "Clear, concise: what this action does and when to use it",
parameters: {
type: "object",
properties: {
param1: {
type: "string",
enum: ["opt_a", "opt_b", "opt_c"], // Constrain! Improves accuracy
description: "What this param controls"
},
param2: {
type: "number",
description: "Numeric value with unit, e.g. 'Force in Newtons'"
}
},
required: ["param1"]
}
}
}];
Multi-Tool Example (Game)
const TOOLS = [
{type:"function", function:{name:"move", description:"Move unit to position", parameters:{type:"object", properties:{unit:{type:"string"}, target:{type:"string", enum:["north","south","east","west"]}}}}},
{type:"function", function:{name:"attack", description:"Attack enemy", parameters:{type:"object", properties:{unit:{type:"string"}, target:{type:"string"}}}}},
{type:"function", function:{name:"retreat", description:"Pull back unit", parameters:{type:"object", properties:{unit:{type:"string"}}}}},
{type:"function", function:{name:"pass", description:"Do nothing this turn", parameters:{type:"object", properties:{}}}}
];
// Max params = 2 → only 3 heads spawned
5. Query Design
Principles
- Be imperative: tell the model what to decide, not just describe state
- Include decision context: "Ball is BELOW paddle, intercept it" not "Ball y=250"
- List valid options: "Choose: up/down/stay"
- Keep it short: shorter query = faster prefill
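The principles above can be sketched as a query builder for the Pong case. `pongQuery` and its `deadZone` parameter are hypothetical, and the sketch assumes screen coordinates where a larger y means lower on screen.

```javascript
// Hypothetical query builder: turns raw state into a short, imperative query
// that carries the decision context and lists the valid options.
function pongQuery(ballY, paddleY, deadZone = 10) {
  const gap = Math.round(ballY - paddleY);
  if (Math.abs(gap) <= deadZone) {
    return 'Ball is LEVEL with paddle. Hold position. Choose: up/down/stay';
  }
  // Assumption: larger y is lower on screen.
  const where = gap > 0 ? 'BELOW' : 'ABOVE';
  const move = gap > 0 ? 'DOWN' : 'UP';
  return `Ball ${Math.abs(gap)}px ${where} paddle. Move ${move} to intercept. Choose: up/down/stay`;
}
```

Precomputing the relation (BELOW/ABOVE) in code keeps the arithmetic out of the model's job and keeps the query short.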
Good vs Bad
✅ "Ball 50px BELOW paddle, approaching fast. Move DOWN to intercept. Choose: up/down/stay"
❌ "Ball position: 250, Paddle position: 200. What should I do?"
✅ "Red gear at (300,150,50). Move arm there slowly for pickup."
❌ "There is a gear somewhere on the table. The arm needs to go to it."
✅ "Stream starting, viewers saying hello. Greet them warmly."
❌ "Viewers are in the chat. Do something appropriate."
Environment & History
// Environment: current state as key=value strings
const env = [
`ball_y=${ballY}`,
`paddle_y=${paddleY}`,
`gap=${gap}`,
`approaching=true`
];
// History: recent actions (max 6, server trims automatically)
const history = [
"move(up)", "move(up)", "stay()"
];
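A possible way to maintain the history array between calls is sketched below. `recordAction` is a hypothetical helper, and the `name(value,...)` string format simply follows the examples above. The server is said to trim automatically, so client-side trimming here only keeps the payload small.

```javascript
const MAX_HISTORY = 6; // documented history limit

// Hypothetical helper: appends one action and returns a new, bounded array.
function recordAction(history, fn, args = {}) {
  const values = Object.values(args).join(',');
  const next = [...history, `${fn}(${values})`];
  return next.slice(-MAX_HISTORY);
}
```

Usage: after each successful response, `history = recordAction(history, data.function, data.args);`.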
Domain System Prompts (v2)
For v2 server, set a domain-specific system prompt:
// Pick the one prompt that matches your domain and pass it as the system field.
// Game AI
const SYSTEM_GAME = "You are the AI controller for a Pong game. Move the paddle to intercept the ball. React quickly.";
// Robotic arm
const SYSTEM_ARM = "You are the voice controller for a 6-axis robotic arm. Convert commands to precise function calls. Coordinates in mm.";
// Digital human
const SYSTEM_AVATAR = "You are the animation controller for a virtual streamer. Convert director instructions to expression and speech calls.";
6. Frontend Code Standards
Required: Type-Safe Value Extraction
// Values in args may be int, not string; always coerce
function safeStr(v) {
if (v === null || v === undefined) return '';
return String(v).trim().toLowerCase();
}
// Extract with args (named) first, heads (positional) as fallback
let direction = safeStr(d.args?.direction) || safeStr(d.heads?.arg1);
Required: Validate Return Values
const VALID = ['up', 'down', 'stay'];
if (!VALID.includes(direction)) {
console.warn(`Invalid: "${direction}", fallback to stay`);
direction = 'stay';
}
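Instead of hard-coding the `VALID` list, the allowed values can be read from the tool definition's `enum`. The following is a sketch under that assumption; `isAllowed` is a hypothetical helper and treats parameters without an `enum` as unconstrained.

```javascript
// Hypothetical helper: checks a returned value against the tool definition.
function isAllowed(tools, fnName, paramName, value) {
  const tool = tools.find((t) => t.function.name === fnName);
  if (!tool) return false; // unknown function name
  const spec = tool.function.parameters?.properties?.[paramName];
  if (!spec) return false; // parameter not declared for this tool
  if (!Array.isArray(spec.enum)) return true; // no enum: unconstrained
  return spec.enum.includes(value);
}
```

This keeps the validation list and the tool definition from drifting apart when options are added or renamed.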
Required: Error Handling with Fallback
async function callAI() {
try {
const r = await fetch(SERVER_URL + '/v1/function_call', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify(request)
});
const data = await r.json();
if (!data.success) throw new Error(data.error);
applyAction(data);
} catch (e) {
console.error('[AI] Failed:', e);
applyFallbackAI(); // MUST have fallback; never freeze the app
}
}
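What the fallback does is application-specific. For the Pong example, a minimal rule-based sketch could look like this. `fallbackDirection` and `deadZone` are hypothetical names, and the sketch assumes a larger y means lower on screen. An `applyFallbackAI()` implementation might call it with the current game state.

```javascript
// Hypothetical rule-based fallback: follow the ball, with a small dead zone
// so the paddle does not jitter when it is already roughly aligned.
function fallbackDirection(ballY, paddleY, deadZone = 10) {
  const gap = ballY - paddleY; // assumption: larger y is lower on screen
  if (gap > deadZone) return 'down';
  if (gap < -deadZone) return 'up';
  return 'stay';
}
```

The fallback returns the same values the model is asked to choose from, so the rest of the game code does not need to know which path produced the decision.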
Required: Logging
console.log(`[Game] Query: ${query}`);
console.log(`[Game] → ${data.function}(${JSON.stringify(data.args)}) ${data.latency_ms.toFixed(0)}ms`);
Recommended: Debug UI Overlay
Show in a corner of your app: current query, raw response, latency (current + rolling average).
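The rolling average for the overlay can be kept with a small fixed-size window. This is one possible sketch: `LatencyTracker` and the default window of 20 samples are assumptions, not part of SimpleTool.

```javascript
// Hypothetical helper: tracks the latest latency and a rolling average.
class LatencyTracker {
  constructor(windowSize = 20) {
    this.windowSize = windowSize;
    this.samples = [];
  }

  // Record one measurement in milliseconds, dropping the oldest when full.
  add(ms) {
    this.samples.push(ms);
    if (this.samples.length > this.windowSize) this.samples.shift();
  }

  // Most recent measurement, or null before the first sample.
  get current() {
    return this.samples.length ? this.samples[this.samples.length - 1] : null;
  }

  // Mean over the current window, or null before the first sample.
  get average() {
    if (this.samples.length === 0) return null;
    return this.samples.reduce((a, b) => a + b, 0) / this.samples.length;
  }
}
```

Usage: call `tracker.add(data.latency_ms)` after each response and render `tracker.current` and `tracker.average` in the overlay.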
7. Game Loop Pattern
Decouple AI from rendering. The AI loop runs at 10–16 Hz; the render loop runs at 60 fps.
const AI_INTERVAL = 100; // 100ms = 10 Hz
let aiPending = false;
// Render loop (60fps): never blocks on AI
function gameLoop() {
update();
render();
requestAnimationFrame(gameLoop);
}
// AI loop (async, non-blocking)
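async function aiLoop() {
if (aiPending) return; // skip this tick while the previous call is still in flight
aiPending = true;
try {
await callAI();
} finally {
aiPending = false; // always release the guard, even if callAI throws
}
}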
setInterval(aiLoop, AI_INTERVAL);
gameLoop();
8. FCClient Template
Drop-in client class for any HTML/JS application:
class FCClient {
constructor(url = 'http://localhost:8899') {
this.url = url.replace(/\/$/, '');
}
async health() {
try {
const r = await fetch(`${this.url}/health`, {signal: AbortSignal.timeout(3000)});
const d = await r.json();
return {ok: d.loaded === true || d.status === 'ok', version: d.version};
} catch (e) {
return {ok: false};
}
}
async call({query, tools, system, env, history, includeContent = false}) {
const t0 = performance.now();
try {
const r = await fetch(`${this.url}/v1/function_call`, {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({
messages: [{role: 'user', content: query}],
tools,
system, // v2: domain system prompt
environment: env,
history,
include_content_head: includeContent
})
});
const d = await r.json();
return {...d, wall_ms: performance.now() - t0};
} catch (e) {
return {success: false, error: e.message, wall_ms: performance.now() - t0};
}
}
}
Usage:
const ai = new FCClient('http://localhost:8899');
const result = await ai.call({
query: "Ball is BELOW. Move down. Choose: up/down/stay",
tools: TOOLS,
system: "You are a Pong AI. Move paddle to intercept ball.",
env: ["ball_y=300", "paddle_y=200", "gap=100"],
history: ["move(down)", "move(down)"]
});
if (result.success) {
console.log(`${result.function}(${JSON.stringify(result.args)}) in ${result.latency_ms}ms`);
}
9. Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| AI stuck / no movement | Query too vague | Add decision hints: "Move DOWN to intercept" |
| `.trim is not a function` | `args` values may be int | Use `String(v)` before `.trim()` |
| High latency (>100ms) | Too many heads / long query | Reduce tool params, shorten query/env |
| Wrong function called | Ambiguous tool descriptions | Add `enum`, improve `description` fields |
| `<\|null\|>` in all args | | |
Skill Version: 2.0. Supports v1/v2 server, multi-domain (game, robotics, avatar)
Last Updated: 2026-03