FunctionGemma 270M: Physical AI (v10, Octopus v2)
A fine-tune of google/functiongemma-270m-it for voice-controlled
physical-AI and household-IoT actions on a Synaptics SL2619 "Coral"
edge board (Google I/O 2026 demo).
Current revision: functiongemma-physical-ai-v10-Q5_K_M.gguf
(6 tools, ~248 MB Q5_K_M, ~0.48 s cold prefill on the 2-core
Cortex-A55, 97.9 % mean token accuracy on eval).
The schema ships as tools.json; the token-to-tool mapping is in
token_map.json.
Tool surface (6 tools)
| Token | Name | Args | Purpose |
|---|---|---|---|
| <tool_0> | set_lights | color?, effect?, state? | Drive whatever lights are connected: HAT 3-LED indicators or a WLED-driven addressable strip / ring. All three args optional; the model emits only what the user implied. |
| <tool_1> | play_buzzer | pattern | Named pattern on the piezo buzzer: beep, double_beep, chirp, siren, alarm, success, error. |
| <tool_2> | set_alarm | duration or time, label? | Schedule an alarm. Fires the buzzer plus a visible flash. |
| <tool_3> | cancel_alarm | label? | Cancel one alarm by label, or all if no label given. |
| <tool_4> | get_system_status | metric | cpu, memory, temperature, npu, or all. |
| <tool_5> | respond | message | Natural-language reply when no physical-action tool fits, or when the request is ambiguous and the model needs to ask for clarification. |
The model is hardware-agnostic for lighting: it parses user intent
into semantic args (color, effect, state) and leaves the dispatcher
to map those onto whatever LED hardware is detected at launch — the
HAT's three indicator LEDs, a WLED-driven strip, or a Neopixel ring. The
user vocabulary is hardware-agnostic too: "lights", "LEDs", "strip",
"indicators" all refer to whatever is wired up.
Prompt format
The v10 model is trained Octopus v2 style: no schema, no tools list, just a bare user turn.
<start_of_turn>user
{user_text}<end_of_turn>
<start_of_turn>model
Tool semantics live in the model weights (via the special functional
tokens <tool_0> … <tool_5> plus <end>), not in the prompt. The
tools.json schema in this repo is the dispatcher's arg-validation
contract and is embedded in the GGUF metadata for schema-drift checks,
but it is not loaded into the inference prompt. Typical prompts are
~13 tokens.
Output format — functional tokens, named args
Tool calls emit as functional tokens with named arguments, per the
Mercedes-Benz Octopus v2 convention
(arXiv 2501.02342). Each tool name
compiles to a single special-vocabulary token (<tool_0> … <tool_5>);
arguments are written as name="value" pairs; a single <end> token
terminates the call. The model emits only the args the user implied
— absent args are simply not present.
Examples:
| User says | Model emits | Resolves to |
|---|---|---|
| turn the lights red | <tool_0>(color="red")<end> | set_lights(color="red") |
| rainbow on the strip | <tool_0>(effect="rainbow")<end> | set_lights(effect="rainbow") |
| lights off | <tool_0>(state="off")<end> | set_lights(state="off") |
| red sparkle | <tool_0>(color="red", effect="sparkle")<end> | set_lights(color="red", effect="sparkle") |
| set an alarm in 5 minutes | <tool_2>(duration="5 minutes")<end> | set_alarm(duration="5 minutes") |
| cancel all alarms | <tool_3>()<end> | cancel_alarm() |
| what's the cpu | <tool_4>(metric="cpu")<end> | get_system_status(metric="cpu") |
| good morning | <tool_5>(message="Good morning. ...")<end> | respond(message="...") |
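A complete tool call is short: the on-device benchmark under Results measures 3–8 output tokens for a typical call, so decode time stays under a second (0.3–0.8 s at ~9.7 tok/s) on a 2-core Cortex-A55. respond() messages run longer (~25 tokens, ~2.6 s).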
⚠️ Inference servers MUST stop generation on <end_of_turn> (or <eos>), NOT on <end>. The model can emit multi-tool sequences <tool_A>(args)<end><tool_B>(args)<end>, so stopping at the first <end> truncates legitimate multi-tool output.
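Because one generation may carry several calls, a client has to split the output on every <end> rather than read only the first call. The sketch below is one way to do that, assuming the output follows the functional-token format described above. The inline TOKEN_TO_TOOL dict is a stand-in for the mapping in token_map.json, and parse_calls is a hypothetical helper name.

```python
import re

# One call: functional token, parenthesised args, terminating <end>.
CALL_RE = re.compile(r'(<tool_\d+>)\((.*?)\)<end>', re.DOTALL)
# One argument: name="value", allowing backslash-escaped characters in the value.
ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')

# Assumption: written inline for illustration; in practice load token_map.json.
TOKEN_TO_TOOL = {
    "<tool_0>": "set_lights",
    "<tool_1>": "play_buzzer",
    "<tool_2>": "set_alarm",
    "<tool_3>": "cancel_alarm",
    "<tool_4>": "get_system_status",
    "<tool_5>": "respond",
}


def parse_calls(raw: str) -> list:
    """Return every (tool_name, kwargs) pair found in raw, in emission order."""
    calls = []
    for token, body in CALL_RE.findall(raw):
        # Unknown tokens map to None so the caller can reject them explicitly.
        calls.append((TOKEN_TO_TOOL.get(token), dict(ARG_RE.findall(body))))
    return calls


if __name__ == "__main__":
    print(parse_calls('<tool_0>(color="red")<end><tool_1>(pattern="beep")<end>'))
```

An empty list means nothing parseable was emitted, which a caller can treat the same way as a failed single-call parse.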
Quick start (Ollama)
hf download BrinqAI/functiongemma-270m-physical-ai \
functiongemma-physical-ai-v10-Q5_K_M.gguf Modelfile tools.json token_map.json \
--local-dir ./fg-physical-ai
cd fg-physical-ai
ollama create functiongemma-physical-ai -f Modelfile
The shipped Modelfile bakes in the stop tokens (<end_of_turn>,
<eos>) and decode parameters (temperature=0, num_ctx=1024,
num_predict=80).
Calling the model
Send a bare user turn — no schema, no tools list. With Ollama, use
raw=true:
import json
import re
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL = "functiongemma-physical-ai"

with open("token_map.json") as f:
    reverse_token_map = json.load(f)["reverse"]

NAMED_ARG_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def build_prompt(user_text: str) -> str:
    return (
        f"<start_of_turn>user\n{user_text}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )


def call_model(user_text: str) -> str:
    body = json.dumps({
        "model": MODEL,
        "prompt": build_prompt(user_text),
        "raw": True,
        "stream": False,
        "options": {
            "temperature": 0.0,
            "top_p": 1.0,
            "num_predict": 80,
            "stop": ["<end_of_turn>", "<eos>"],
        },
    }).encode()
    req = urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())["response"]


def parse_call(raw: str) -> tuple[str | None, dict[str, str]]:
    """Return (tool_name, kwargs). tool_name is None on parse fail."""
    m = re.match(r"\s*(<tool_\d+>)\((.*?)\)<end>", raw)
    if not m:
        return None, {}
    tok, body = m.group(1), m.group(2)
    kwargs = {k: v for k, v in NAMED_ARG_RE.findall(body)}
    return reverse_token_map.get(tok), kwargs


raw = call_model("turn the lights red")
print(raw)              # e.g. '<tool_0>(color="red")<end>'
print(parse_call(raw))  # ('set_lights', {'color': 'red'})
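Once a call is parsed into a tool name and kwargs, something still has to execute it. The sketch below shows one way a dispatcher could route set_lights onto whichever lighting backend was detected at launch. The backend classes, their return strings, and the dispatch function are hypothetical and are not taken from the repository's integration code.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class LightCommand:
    """Semantic lighting intent; every field is optional, as in the tool schema."""
    color: Optional[str] = None
    effect: Optional[str] = None
    state: Optional[str] = None


class LightBackend(Protocol):
    def apply(self, cmd: LightCommand) -> str: ...


class HatLeds:
    """Hypothetical backend for the HAT's three indicator LEDs."""

    def apply(self, cmd: LightCommand) -> str:
        if cmd.state == "off":
            return "hat: leds off"
        # Assumption: the indicator LEDs cannot render effects, so effect is ignored.
        return f"hat: leds on color={cmd.color or 'white'}"


class WledStrip:
    """Hypothetical backend for a WLED-driven addressable strip."""

    def apply(self, cmd: LightCommand) -> str:
        if cmd.state == "off":
            return "wled: strip off"
        return f"wled: color={cmd.color or 'keep'} effect={cmd.effect or 'solid'}"


def dispatch(tool_name: Optional[str], kwargs: dict, lights: LightBackend) -> str:
    """Validate the args for set_lights and forward them to the active backend."""
    if tool_name != "set_lights":
        raise NotImplementedError(f"no handler sketched for {tool_name!r}")
    unknown = set(kwargs) - {"color", "effect", "state"}
    if unknown:
        raise ValueError(f"unexpected set_lights args: {sorted(unknown)}")
    return lights.apply(LightCommand(**kwargs))


if __name__ == "__main__":
    print(dispatch("set_lights", {"color": "red", "effect": "sparkle"}, WledStrip()))
```

Keeping the hardware choice behind a single apply method means the same parsed call works unchanged whether the HAT LEDs or a strip is attached.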
For llama-cpp-python directly, use detokenize(..., special=True) so
the <tool_N> and <end> tokens render in the output instead of being
stripped.
Training data
Training data was generated from Haiku-authored phrasing templates
crossed with deterministic entity pools, then lightly augmented with
Moonshine-flavored ASR noise (dropped function words, lowercased traces,
filler-word prepends). Each record is a flat {input, output} pair —
no tools / messages array, no chat template.
| Item | Value |
|---|---|
| Train rows | 5,222 |
| Eval rows | 920 |
| Tools | 6 |
| Per-template entity expansion | color × effect × state pools for set_lights; pattern pool for play_buzzer; duration / time pools for set_alarm; metric pool for get_system_status |
| ASR-style augmentation | Moonshine-sim noise on a fraction of records (dropped articles, lowercased traces, filler prepends) |
| Multi-tool fraction | None; single-tool emphasis, with multi-tool routines composed at dispatch time |
The set_lights tool also gets explicit failure-mode rows that
route bare ambiguous prompts to respond() — e.g. "rainbow" alone
("Did you mean the lights? Try 'rainbow on the lights'."), "siren" alone
(prompts the user toward play_buzzer), and bare "on" / "off"
(asks what the user wants to act on).
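The template-times-pool generation and the ASR-style noise described in this section can be sketched as follows. This is an illustration only: the templates, pools, filler words, and probabilities are invented for the example and are not the ones used to build the dataset.

```python
import itertools
import random

# Hypothetical phrasing templates and entity pool for set_lights.
TEMPLATES = ["turn the lights {color}", "make the lights {color}"]
COLORS = ["red", "green", "blue"]
FILLERS = ["um", "uh"]
ARTICLES = {"the", "a", "an"}


def expand_templates(templates, colors):
    """Cross every template with every color into flat {input, output} records."""
    return [
        {"input": t.format(color=c), "output": f'<tool_0>(color="{c}")<end>'}
        for t, c in itertools.product(templates, colors)
    ]


def add_asr_noise(text, rng, drop_p=0.5, filler_p=0.3):
    """Lowercase the text, randomly drop articles, and maybe prepend a filler word."""
    words = [
        w for w in text.lower().split()
        if not (w in ARTICLES and rng.random() < drop_p)
    ]
    if rng.random() < filler_p:
        words.insert(0, rng.choice(FILLERS))
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so a run is repeatable
    for record in expand_templates(TEMPLATES, COLORS):
        print(add_asr_noise(record["input"], rng), "->", record["output"])
```

Only the input side is perturbed; the target string stays clean so the model learns to map noisy transcripts onto exact calls.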
Methodology
- Full bf16 fine-tune (no LoRA).
- Functional tokens: <tool_0> … <tool_5> + <end> added as additional_special_tokens; new embeddings mean-initialized from the existing input-embedding matrix (random init under-converges on small datasets at this scale).
- Completion-only loss mask: hand-rolled; labels before <start_of_turn>model\n are masked to -100. The model learns only from the assistant turn, not the user prompt.
- 5 epochs, lr 3e-5, cosine schedule, 0.1 warmup, weight decay 0.01.
- Effective batch = 16 (per_device_train_batch_size=8 × gradient_accumulation_steps=2).
- max_length=256: the trained prompt format is ~13 tokens and the assistant turn fits comfortably under 64 tokens, including respond() messages.
- bf16, gradient checkpointing, adamw_torch_fused, metric_for_best_model="eval_loss" + load_best_model_at_end=True.
- Training wallclock: 5 min on a single H100 (~15–20 min on a 4090).
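Two of the steps above, the completion-only loss mask and the mean initialization of new token embeddings, are easy to get subtly wrong, so here is a minimal sketch of each. It uses plain Python lists instead of tensors to stay self-contained; the function names and the token ids in the usage lines are made up, and the actual training script may differ.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def mask_prompt_labels(input_ids, marker_ids):
    """Copy input_ids into labels, masking everything up to and including the marker.

    marker_ids stands for the tokenized "<start_of_turn>model\\n" sequence.
    """
    labels = list(input_ids)
    n = len(marker_ids)
    for start in range(len(input_ids) - n + 1):
        if input_ids[start:start + n] == marker_ids:
            for i in range(start + n):
                labels[i] = IGNORE_INDEX
            return labels
    # Marker not found: mask the whole row so it contributes no loss.
    return [IGNORE_INDEX] * len(input_ids)


def mean_init_rows(embeddings, num_new):
    """Append num_new rows, each set to the per-dimension mean of the existing rows."""
    dim = len(embeddings[0])
    mean = [sum(row[d] for row in embeddings) / len(embeddings) for d in range(dim)]
    return embeddings + [list(mean) for _ in range(num_new)]


if __name__ == "__main__":
    # Hypothetical ids: 7, 8 play the role of the marker tokens.
    print(mask_prompt_labels([5, 6, 7, 8, 9, 10], [7, 8]))
    print(mean_init_rows([[1.0, 2.0], [3.0, 4.0]], num_new=2))
```

Starting new rows at the mean places the functional tokens inside the region of embedding space the model already uses, which is the motivation the methodology gives for avoiding random init.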
Citation
@article{chen2024octopusv2,
title = {Octopus v2: On-device language model for super agent},
author = {Chen, Wei and Li, Zhiyuan},
journal = {arXiv preprint arXiv:2404.01744},
year = {2024},
url = {https://arxiv.org/abs/2404.01744}
}
@article{merc2025octopusv2,
title = {Octopus v2 named-arg function calling},
journal = {arXiv preprint arXiv:2501.02342},
year = {2025},
url = {https://arxiv.org/abs/2501.02342}
}
Results
Training metrics (final epoch)
| Metric | Value |
|---|---|
| Final train loss | 0.493 |
| Final eval loss | 0.046 |
| Mean token accuracy (eval) | 97.9 % |
Held-out smoke test (post-train, 36 prompts spanning all 6 tools)
| Metric | Value |
|---|---|
| Smoke-test routing accuracy | 35 / 36 (97.2 %) |
The 36-prompt suite covers single-tool happy paths for every tool plus
failure modes the model is expected to deflect: ambiguous color words
without a target ("make it red"), effect names without a target
("rainbow"), unsupported features ("play a tone at 2000 hz"), and
out-of-scope appliances. Failure-mode prompts all route to respond()
with a helpful clarification message.
On-device benchmark (Coralboard, 2-core Cortex-A55 @ 2 GHz, Q5_K_M GGUF)
Measured with llama-cpp-python 0.3.16, n_ctx=1024, n_threads=2,
CPU governor performance, 8 representative prompts spanning all 6
tools.
| Metric | Value |
|---|---|
| Model load | 2.23 s |
| Prompt tokens | 11–16 (mean ~13) |
| Cold prefill (turn 1) | 0.48 s |
| Warm prefill (turn 2+, avg) | 0.47 s |
| Decode rate | ~9.7 tok/s |
| Decode time, typical tool call (3–8 output tokens) | 0.3–0.8 s |
| Decode time, respond() (~25 output tokens) | ~2.6 s |
| End-to-end first turn (model load + prefill + decode) | ~3.4 s |
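The benchmark figures combine into a rough per-turn latency estimate. The sketch below assumes latency is simply load time plus prefill plus output tokens divided by decode rate, and ignores tokenization, audio capture, and dispatch overhead. The function name is illustrative, and the defaults are copied from the table above.

```python
def estimate_latency_s(
    output_tokens: int,
    prefill_s: float = 0.48,         # cold prefill from the benchmark table
    decode_tok_per_s: float = 9.7,   # measured decode rate
    load_s: float = 0.0,             # set to 2.23 to include model load on the first turn
) -> float:
    """Additive latency model: load + prefill + decode time for output_tokens."""
    if decode_tok_per_s <= 0:
        raise ValueError("decode_tok_per_s must be positive")
    return load_s + prefill_s + output_tokens / decode_tok_per_s


if __name__ == "__main__":
    for tokens in (3, 8, 25):
        warm = estimate_latency_s(tokens)
        first = estimate_latency_s(tokens, load_s=2.23)
        print(f"{tokens:>2} tokens: warm {warm:.2f} s, first turn {first:.2f} s")
```

The estimate makes the trade-off visible: short tool calls are dominated by prefill, while respond() messages are dominated by decode.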
Files
functiongemma-physical-ai-v10-Q5_K_M.gguf # ~248 MB, Q5_K_M weights (Ollama / llama.cpp)
Modelfile # Ollama Modelfile (functional-token format)
tools.json # 6-tool schema, canonical mobile-actions format
token_map.json # functional-token <-> tool-name map
README.md # this file
Earlier checkpoint GGUFs from the project's development history
(functiongemma-physical-ai-v9-Q5_K_M.gguf,
functiongemma-physical-ai-v7-Q5_K_M.gguf,
functiongemma-physical-ai-v6-Q5_K_M.gguf,
functiongemma-physical-ai-Q4_K_M.gguf) remain in the repo for
reproducibility. They use different tool surfaces and (for v7 and
earlier) a different inference-prompt format; new deployments should use
the v10 file above.
License
Released under the Gemma Terms of Use.
By using this model you agree to those terms. Base model:
google/functiongemma-270m-it.
Links
- Base model: https://huggingface.co/google/functiongemma-270m-it
- Octopus v2 paper: https://arxiv.org/abs/2404.01744
- Mercedes-Benz Octopus v2 (named-arg variant): https://arxiv.org/abs/2501.02342
- Hardware demo + integration code (Synaptics Coralboard, Grinn HAT,
WLED-over-USB-CDC, full PyQt UI):
https://github.com/synaptics-astra-demos/sl2610-examples →
Function_calling/