FunctionGemma 270M — Physical AI (v10, Octopus v2)

A fine-tune of google/functiongemma-270m-it for voice-controlled physical-AI / household-IoT actions on a Synaptics SL2619 "Coral" edge board (Google I/O 2026 demo).

Current revision: functiongemma-physical-ai-v10-Q5_K_M.gguf — 6 tools, ~248 MB Q5_K_M, ~0.48 s cold prefill on the 2-core Cortex-A55, 97.9 % mean token accuracy on eval.

Schema ships as tools.json. Token-to-tool mapping is in token_map.json.

Tool surface (6 tools)

| Token | Name | Args | Purpose |
|---|---|---|---|
| `<tool_0>` | `set_lights` | `color?`, `effect?`, `state?` | Drive whatever lights are connected: HAT 3-LED indicators or a WLED-driven addressable strip / ring. All three args are optional; the model emits only what the user implied. |
| `<tool_1>` | `play_buzzer` | `pattern` | Named pattern on the piezo buzzer: `beep`, `double_beep`, `chirp`, `siren`, `alarm`, `success`, `error`. |
| `<tool_2>` | `set_alarm` | `duration` or `time`, `label?` | Schedule an alarm. Fires the buzzer plus a visible flash. |
| `<tool_3>` | `cancel_alarm` | `label?` | Cancel one alarm by label, or all alarms if no label is given. |
| `<tool_4>` | `get_system_status` | `metric` | `cpu`, `memory`, `temperature`, `npu`, or `all`. |
| `<tool_5>` | `respond` | `message` | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |

The model is hardware-agnostic for lighting: it parses user intent into semantic args (color, effect, state) and leaves the dispatcher to map those onto whatever LED hardware is detected at launch — the HAT's three indicator LEDs, a WLED-driven strip, or a Neopixel ring. The user vocabulary is hardware-agnostic too: "lights", "LEDs", "strip", "indicators" all refer to whatever is wired up.
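The split between model and dispatcher described above can be sketched roughly as follows. This is a minimal illustration, not the project's dispatcher: the backend classes `HatLeds` and `WledStrip`, the `make_dispatcher` helper, and the `ALLOWED_ARGS` dict are hypothetical names, and the allowed argument names are simply copied from the tool table (the real validation contract is `tools.json`).

```python
from typing import Callable

# Allowed argument names per tool, copied from the tool table.
# Stand-in for the real tools.json contract (assumption).
ALLOWED_ARGS: dict[str, set[str]] = {
    "set_lights": {"color", "effect", "state"},
    "play_buzzer": {"pattern"},
    "set_alarm": {"duration", "time", "label"},
    "cancel_alarm": {"label"},
    "get_system_status": {"metric"},
    "respond": {"message"},
}


class HatLeds:
    """Hypothetical backend for the HAT's three indicator LEDs."""

    def apply(self, **kwargs: str) -> str:
        return f"HAT LEDs <- {sorted(kwargs.items())}"


class WledStrip:
    """Hypothetical backend for a WLED-driven strip or ring."""

    def apply(self, **kwargs: str) -> str:
        return f"WLED strip <- {sorted(kwargs.items())}"


def make_dispatcher(light_backend) -> Callable[[str, dict[str, str]], str]:
    """Bind set_lights to whichever lighting backend was detected at launch."""

    def dispatch(tool: str, kwargs: dict[str, str]) -> str:
        allowed = ALLOWED_ARGS.get(tool)
        if allowed is None:
            raise ValueError(f"unknown tool: {tool}")
        unexpected = set(kwargs) - allowed
        if unexpected:
            raise ValueError(f"unexpected args for {tool}: {sorted(unexpected)}")
        if tool == "set_lights":
            # The model only supplies semantic args; the backend decides
            # how to render them on the attached hardware.
            return light_backend.apply(**kwargs)
        # Other tools are left as placeholders in this sketch.
        return f"{tool}({kwargs})"

    return dispatch


dispatch = make_dispatcher(WledStrip())
print(dispatch("set_lights", {"color": "red", "effect": "sparkle"}))
# WLED strip <- [('color', 'red'), ('effect', 'sparkle')]
```

Because the backend is chosen once at launch, the same model output drives either kind of hardware without any change to the prompt or the weights.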

Prompt format

The v10 model is trained Octopus v2 style: no schema, no tools list, just a bare user turn.

<start_of_turn>user
{user_text}<end_of_turn>
<start_of_turn>model

Tool semantics live in the model weights (via the special functional tokens <tool_0> … <tool_5> plus <end>), not in the prompt. The tools.json schema in this repo is the dispatcher's arg-validation contract and is embedded in the GGUF metadata for schema-drift checks, but it is not loaded into the inference prompt. Typical prompts are ~13 tokens.

Output format — functional tokens, named args

Tool calls emit as functional tokens with named arguments, per the Mercedes-Benz Octopus v2 convention (arXiv 2501.02342). Each tool name compiles to a single special-vocabulary token (<tool_0> … <tool_5>); arguments are written as name="value" pairs; a single <end> token terminates the call. The model emits only the args the user implied — absent args are simply not present.

Examples:

| User says | Model emits | Resolves to |
|---|---|---|
| `turn the lights red` | `<tool_0>(color="red")<end>` | `set_lights(color="red")` |
| `rainbow on the strip` | `<tool_0>(effect="rainbow")<end>` | `set_lights(effect="rainbow")` |
| `lights off` | `<tool_0>(state="off")<end>` | `set_lights(state="off")` |
| `red sparkle` | `<tool_0>(color="red", effect="sparkle")<end>` | `set_lights(color="red", effect="sparkle")` |
| `set an alarm in 5 minutes` | `<tool_2>(duration="5 minutes")<end>` | `set_alarm(duration="5 minutes")` |
| `cancel all alarms` | `<tool_3>()<end>` | `cancel_alarm()` |
| `what's the cpu` | `<tool_4>(metric="cpu")<end>` | `get_system_status(metric="cpu")` |
| `good morning` | `<tool_5>(message="Good morning. ...")<end>` | `respond(message="...")` |

A complete call decodes in roughly 8–20 output tokens, well inside the sub-second voice-UX budget on a 2-core Cortex-A55.

⚠️ Inference servers MUST stop generation on <end_of_turn> (or <eos>), NOT on <end>. The model can emit multi-tool sequences <tool_A>(args)<end><tool_B>(args)<end>, so stopping at the first <end> truncates legitimate multi-tool output.
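Since one response may carry several calls, a client that reads up to `<end_of_turn>` needs to split the text on every `<end>`. The sketch below is one way to do that with the standard library; `parse_calls`, `CALL_RE`, and `ARG_RE` are illustrative names, and the function returns raw functional tokens rather than tool names so that it does not depend on `token_map.json`.

```python
import re

# One call looks like <tool_N>(name="value", ...)<end>.
# DOTALL lets a quoted value span several lines.
CALL_RE = re.compile(r"(<tool_\d+>)\((.*?)\)<end>", re.DOTALL)
# name="value" pairs, allowing backslash-escaped characters in the value.
ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_calls(raw: str) -> list[tuple[str, dict[str, str]]]:
    """Return every (token, kwargs) pair found in the model output, in order."""
    return [
        (m.group(1), dict(ARG_RE.findall(m.group(2))))
        for m in CALL_RE.finditer(raw)
    ]


out = '<tool_0>(color="red")<end><tool_1>(pattern="beep")<end>'
print(parse_calls(out))
# [('<tool_0>', {'color': 'red'}), ('<tool_1>', {'pattern': 'beep'})]
```

An empty list signals that nothing parsed, which a caller can treat as a failed turn.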

Quick start (Ollama)

hf download BrinqAI/functiongemma-270m-physical-ai \
  functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
  --local-dir ./fg-physical-ai

cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile

The shipped Modelfile bakes in the stop tokens (<end_of_turn>, <eos>) and decode parameters (temperature=0, num_ctx=1024, num_predict=80).

Calling the model

Send a bare user turn — no schema, no tools list. With Ollama, use raw=true:

import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

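# Functional-token -> tool-name map shipped in the repo.
with open("token_map.json", encoding="utf-8") as f:
    reverse_token_map = json.load(f)["reverse"]

# Matches name="value" pairs; the value may contain backslash-escaped characters.
NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')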


def build_prompt(user_text: str) -> str:
    return (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def call_model(user_text: str) -> str:
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.0,
            "top_p": 1.0,
            "num_predict": 80,
            "stop": ["<end_of_turn>", "<eos>"],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


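def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs) for the first call in raw.

    tool_name is None if nothing parses or the token is not in the map.
    Any further calls in a multi-tool sequence are ignored here.
    """
    # DOTALL lets a respond() message span several lines.
    m = re.match(r"\s*(<tool_\d+>)\((.*?)\)<end>", raw, re.DOTALL)
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = dict(NAMED_ARG_RE.findall(body))
    return reverse_token_map.get(tok), kwargs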


raw = call_model("turn the lights red")
print(raw)               # e.g. '<tool_0>(color="red")<end>'
print(parse_call(raw))   # ('set_lights', {'color': 'red'})

For llama-cpp-python directly, use detokenize(..., special=True) so the <tool_N> and <end> tokens render in the output instead of being stripped.
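A rough sketch of that llama-cpp-python path is shown below, assuming the low-level `tokenize` / `generate` / `detokenize` methods of `llama_cpp.Llama` and assuming `<end_of_turn>` maps to a single token ID in this GGUF. The helper names `generate_raw` and `load_model` are illustrative, and the decode settings mirror the ones listed for the Modelfile rather than anything verified against the shipped file.

```python
def generate_raw(llm, user_text: str, max_tokens: int = 80) -> str:
    """Greedy-decode one turn and keep functional tokens in the text.

    `llm` is expected to be a llama_cpp.Llama instance.
    """
    prompt = (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )
    # special=True makes the tokenizer map control strings to their own IDs.
    prompt_ids = llm.tokenize(prompt.encode("utf-8"), add_bos=True, special=True)

    eot_ids = llm.tokenize(b"<end_of_turn>", add_bos=False, special=True)
    if len(eot_ids) != 1:
        # Assumption violated: the stop marker is not a single token.
        raise ValueError("<end_of_turn> did not tokenize to one ID")
    stop_ids = {eot_ids[0], llm.token_eos()}

    out_ids: list[int] = []
    for tok in llm.generate(prompt_ids, temp=0.0):
        # Stop on <end_of_turn> / EOS, never on <end>.
        if tok in stop_ids or len(out_ids) >= max_tokens:
            break
        out_ids.append(tok)

    # special=True keeps <tool_N> and <end> in the decoded string.
    return llm.detokenize(out_ids, special=True).decode("utf-8", errors="replace")


def load_model(model_path: str):
    """Load the GGUF with the context size and thread count used in this README."""
    from llama_cpp import Llama  # imported lazily so generate_raw stays testable

    return Llama(model_path=model_path, n_ctx=1024, n_threads=2, verbose=False)
```

A call such as `generate_raw(load_model("functiongemma-physical-ai-v10-Q5_K_M.gguf"), "turn the lights red")` should then return the raw functional-token string, ready for parsing.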

Training data

Training data was generated from Haiku-authored phrasing templates crossed with deterministic entity pools, then lightly augmented with Moonshine-flavored ASR noise (dropped function words, lowercased traces, filler-word prepends). Each record is a flat {input, output} pair — no tools / messages array, no chat template.


| Property | Value |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for `set_lights`; pattern pool for `play_buzzer`; duration / time pools for `set_alarm`; metric pool for `get_system_status` |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None (single-tool emphasis; multi-tool routines are composed at dispatch time) |

The set_lights tool also gets explicit failure-mode rows that route bare ambiguous prompts to respond() — e.g. "rainbow" alone ("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone (prompts the user toward play_buzzer), and bare "on" / "off" (asks what the user wants to act on).
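The generation recipe (templates crossed with entity pools, then noise on a fraction of rows) might look something like the sketch below. Everything concrete in it is hypothetical: the two templates, the colour and filler pools, the drop and prepend probabilities, and the `noise_fraction` default are placeholders, since the real templates and pools are not published in this README.

```python
import itertools
import random

# Hypothetical phrasing templates and entity pools (assumptions).
TEMPLATES = ["turn the lights {color}", "make the strip {color} please"]
COLORS = ["red", "green", "blue"]
FILLERS = ["um", "uh", "okay"]
DROPPABLE = {"the", "a", "an", "please"}


def asr_noise(text: str, rng: random.Random) -> str:
    """Imitate ASR artifacts: lowercase, drop function words, prepend a filler."""
    words = [
        w for w in text.lower().split()
        if w not in DROPPABLE or rng.random() > 0.5
    ]
    if rng.random() < 0.3:
        words.insert(0, rng.choice(FILLERS))
    return " ".join(words)


def build_records(noise_fraction: float = 0.2, seed: int = 0) -> list[dict[str, str]]:
    """Cross templates with the colour pool into flat {input, output} rows."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    records = []
    for template, color in itertools.product(TEMPLATES, COLORS):
        text = template.format(color=color)
        if rng.random() < noise_fraction:
            text = asr_noise(text, rng)
        records.append({"input": text, "output": f'<tool_0>(color="{color}")<end>'})
    return records


for row in build_records(noise_fraction=0.0):
    print(row)
```

With two templates and three colours this yields six rows; the target string is built from the entity rather than parsed from the noisy input, so noise never corrupts the label.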

Methodology

Full bf16 fine-tune (no LoRA).
Functional tokens: <tool_0> … <tool_5> + <end> added as additional_special_tokens; new embeddings mean-initialized from the existing input-embedding matrix (random init under-converges on small datasets at this scale).
Completion-only loss mask: hand-rolled — labels before <start_of_turn>model\n are masked to -100. The model learns only from the assistant turn, not the user prompt.
5 epochs, lr 3e-5, cosine schedule, 0.1 warmup, weight decay 0.01.
Effective batch = 16 (per_device_train_batch_size=8 × gradient_accumulation_steps=2).
max_length=256 — the trained prompt format is ~13 tokens and the assistant turn fits comfortably under 64 tokens, including respond() messages.
bf16, gradient checkpointing, adamw_torch_fused, metric_for_best_model="eval_loss" + load_best_model_at_end=True.
Training wallclock: 5 min on a single H100 (~15–20 min on a 4090).
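The completion-only loss mask in the list above can be illustrated with plain token-ID lists. This is a sketch rather than the training script: `mask_prompt_labels` is a hypothetical helper, it assumes the marker tokens themselves are masked along with the user turn, and the IDs in the usage lines are made up.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy ignores by default


def mask_prompt_labels(input_ids: list[int], marker_ids: list[int]) -> list[int]:
    """Copy input_ids to labels, masking everything up to and including the marker.

    marker_ids is the token-ID sequence for "<start_of_turn>model\n".
    If the marker is missing, the whole row is masked so it adds no loss.
    """
    n = len(marker_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == marker_ids:
            cut = start + n
            return [IGNORE_INDEX] * cut + input_ids[cut:]
    return [IGNORE_INDEX] * len(input_ids)


# Made-up IDs: [prompt..., marker (7, 8), completion...]
labels = mask_prompt_labels([5, 6, 7, 8, 40, 41, 42], marker_ids=[7, 8])
print(labels)  # [-100, -100, -100, -100, 40, 41, 42]
```

Only the unmasked positions contribute to the loss, which is what restricts learning to the assistant turn.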

Citation

@article{chen2024octopusv2,
  title   = {Octopus v2: On-device language model for super agent},
  author  = {Chen, Wei and Li, Zhiyuan},
  journal = {arXiv preprint arXiv:2404.01744},
  year    = {2024},
  url     = {https://arxiv.org/abs/2404.01744}
}

@article{merc2025octopusv2,
  title   = {Optimizing Small Language Models for In-Vehicle Function-Calling},
  journal = {arXiv preprint arXiv:2501.02342},
  year    = {2025},
  url     = {https://arxiv.org/abs/2501.02342}
}

Results

Training metrics (final epoch)


| Metric | Value |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | 0.046 |
| Mean token accuracy (eval) | 97.9 % |

Held-out smoke test (post-train, 36 prompts spanning all 6 tools)


| Metric | Value |
|---|---|
| Smoke-test routing accuracy | 35 / 36 (97.2 %) |

The 36-prompt suite covers single-tool happy paths for every tool plus failure modes the model is expected to deflect: ambiguous color words without a target ("make it red"), effect names without a target ("rainbow"), unsupported features ("play a tone at 2000 hz"), and out-of-scope appliances. Failure-mode prompts all route to respond() with a helpful clarification message.
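A routing check of this kind can be scored with a few lines of Python. The sketch below is not the project's test suite: `CASES` holds only a handful of prompts quoted in this README, `routing_accuracy` is an illustrative helper, and `stub_route` is a placeholder that would be replaced by a call to the model plus a parser.

```python
from typing import Callable, Optional

# (prompt, expected tool name). The failure-mode prompts are quoted from the
# paragraph above; the happy-path prompt comes from the examples earlier on.
CASES = [
    ("turn the lights red", "set_lights"),
    ("make it red", "respond"),
    ("rainbow", "respond"),
    ("play a tone at 2000 hz", "respond"),
]


def routing_accuracy(
    cases: list[tuple[str, str]],
    route: Callable[[str], Optional[str]],
) -> tuple[int, int]:
    """Count prompts whose routed tool name matches the expected one."""
    hits = sum(1 for prompt, expected in cases if route(prompt) == expected)
    return hits, len(cases)


def stub_route(prompt: str) -> Optional[str]:
    """Placeholder router; swap in a model call plus output parsing."""
    return "set_lights" if "lights" in prompt else "respond"


hits, total = routing_accuracy(CASES, stub_route)
print(f"{hits} / {total} ({hits / total:.1%})")  # 4 / 4 (100.0%)
```

Scoring on the tool name alone measures routing; checking the arguments as well would need a stricter comparison.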

On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)

Measured with llama-cpp-python 0.3.16, n_ctx=1024, n_threads=2, CPU governor performance, 8 representative prompts spanning all 6 tools.


| Metric | Value |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| Cold prefill (turn 1) | 0.48 s |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | ~9.7 tok/s |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, `respond()` (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |
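The decode-time and end-to-end rows follow from the load, prefill, and decode-rate figures by simple arithmetic, as the sketch below shows. The constants are the measured values from the table; the 7-token call used for the first-turn estimate is an assumption chosen from within the 3–8 token range.

```python
MODEL_LOAD_S = 2.23       # model load, seconds
COLD_PREFILL_S = 0.48     # cold prefill on turn 1, seconds
DECODE_TOK_PER_S = 9.7    # approximate decode rate


def decode_seconds(n_tokens: int) -> float:
    """Time to emit n_tokens at the measured decode rate."""
    return n_tokens / DECODE_TOK_PER_S


def first_turn_seconds(n_tokens: int) -> float:
    """Model load + cold prefill + decode for the first request."""
    return MODEL_LOAD_S + COLD_PREFILL_S + decode_seconds(n_tokens)


for n in (3, 8, 25):
    print(n, round(decode_seconds(n), 2))
# 3 0.31
# 8 0.82
# 25 2.58

print(round(first_turn_seconds(7), 2))  # 3.43
```

Model load dominates the first turn; on later turns only prefill and decode remain.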

Files

functiongemma-physical-ai-v10-Q5_K_M.gguf  # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile                                  # Ollama Modelfile (functional-token format)
tools.json                                 # 6-tool schema, canonical mobile-actions format
token_map.json                             # functional-token <-> tool-name map
README.md                                  # this file

Earlier checkpoint GGUFs from the project's development history (functiongemma-physical-ai-v9-Q5_K_M.gguf, functiongemma-physical-ai-v7-Q5_K_M.gguf, functiongemma-physical-ai-v6-Q5_K_M.gguf, functiongemma-physical-ai-Q4_K_M.gguf) remain in the repo for reproducibility. They use different tool surfaces and (for v7 and earlier) a different inference-prompt format; new deployments should use the v10 file above.

License

Released under the Gemma Terms of Use. By using this model you agree to those terms. Base model: google/functiongemma-270m-it.
