Instructions to use NexaAI/octo-planner-2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NexaAI/octo-planner-2b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NexaAI/octo-planner-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NexaAI/octo-planner-2b")
model = AutoModelForCausalLM.from_pretrained("NexaAI/octo-planner-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use NexaAI/octo-planner-2b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "NexaAI/octo-planner-2b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NexaAI/octo-planner-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/NexaAI/octo-planner-2b
```
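The same OpenAI-compatible endpoint can also be called from Python with only the standard library. This is a minimal sketch assuming the vLLM server above is running on `localhost:8000`; the request is built locally and only sent when `post_chat` is invoked against a live server (`build_chat_request` and `post_chat` are illustrative helper names, not part of vLLM):

```python
import json
import urllib.request


def build_chat_request(model, user_content, base_url="http://localhost:8000"):
    """Build an OpenAI-compatible chat completion request for a local vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_chat(req):
    """Send the request; requires the server from the steps above to be running."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


req = build_chat_request("NexaAI/octo-planner-2b", "What is the capital of France?")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```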
- SGLang
How to use NexaAI/octo-planner-2b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NexaAI/octo-planner-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NexaAI/octo-planner-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "NexaAI/octo-planner-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NexaAI/octo-planner-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use NexaAI/octo-planner-2b with Docker Model Runner:
```shell
docker model run hf.co/NexaAI/octo-planner-2b
```
Octo-planner: On-device Language Model for Planner-Action Agents Framework
We're thrilled to introduce Octo-planner, the latest breakthrough in on-device language models from Nexa AI. Developed for the Planner-Action Agents Framework, Octo-planner enables rapid and efficient planning without the need for cloud connectivity. Together with Octopus-V2, it runs locally on edge devices to support AI agent use cases.
Key Features of Octo-planner:
- Efficient Planning: Uses a planning model fine-tuned from Gemma-2b (2.51 billion parameters) for high efficiency and low power consumption.
- Agent Framework: Separates planning and action, allowing for specialized optimization and improved scalability.
- Enhanced Accuracy: Achieves a planning success rate of 98.1% on our benchmark dataset, providing reliable and effective performance.
- On-device Operation: Designed for edge devices, ensuring fast response times and enhanced privacy by processing data locally.
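The planner-action separation above works in two stages: the planner decomposes a user query into sub-steps, and a separate action model (Octopus-V2 in our framework) carries out each step. The sketch below shows only this control flow; `plan` and `act` are hypothetical stand-ins for the two models, with a hard-coded plan in place of real model output:

```python
def plan(query):
    """Stand-in for the Octo-planner model: decompose a query into sub-steps.
    A real deployment would generate these steps with the fine-tuned planner."""
    return [
        "Find the presentation for tomorrow's meeting",
        "Connect to the projector via Bluetooth",
        "Increase the screen brightness",
    ]


def act(step):
    """Stand-in for the action model (e.g. Octopus-V2) that maps one
    sub-step to a concrete function call on the device."""
    return f"executed: {step}"


def run_agent(query):
    """Planner-action loop: plan once, then act on each step in order."""
    return [act(step) for step in plan(query)]


for result in run_agent("Prepare the conference room for tomorrow's meeting"):
    print(result)
```

Keeping the two stages in separate models is what allows each to be optimized (and updated) independently.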
Example Usage
Below is a demo of Octo-planner:
Run the code below to use Octopus Planner for a given question:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NexaAIDev/octo-planner-2b"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants"

# Wrap the question in the model's chat template
prompt = f"<|user|>{question}<|end|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    max_length=1024,
    do_sample=False,
)
res = tokenizer.decode(outputs[0])
print(f"=== inference result ===\n{res}")
```
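The decoded result is a plan expressed as a sequence of sub-steps. The exact output layout is not specified here, so the helper below assumes a simple one-step-per-line convention (a hypothetical format, to be adjusted to the model's actual output) to show how downstream code might split the plan:

```python
def split_plan(text):
    """Split a decoded planner response into individual steps.
    Assumes one step per non-empty line, optionally prefixed with an
    index like '1.' or '2)' (hypothetical format)."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Strip a leading "1." / "2)" style index if present
        head, _, rest = line.partition(" ")
        if head.rstrip(".):").isdigit():
            line = rest.strip()
        steps.append(line)
    return steps


sample = "1. Find the presentation\n2. Connect to the projector\n3. Take a screenshot"
print(split_plan(sample))
# ['Find the presentation', 'Connect to the projector', 'Take a screenshot']
```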
Training Data
We wrote 10 Android API descriptions used to train the model; see this file for details. Below is one example Android API description:
```python
def send_email(recipient, title, content):
    """
    Sends an email to a specified recipient with a given title and content.

    Parameters:
    - recipient (str): The email address of the recipient.
    - title (str): The subject line of the email. This is a brief summary or title of the email's purpose or content.
    - content (str): The main body text of the email. It contains the primary message, information, or content that is intended to be communicated to the recipient.
    """
```
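A plan step referencing such an API ultimately has to be turned into a real call. As an illustration (not part of the training pipeline), a function-call string can be parsed safely with Python's `ast` module before dispatching, without executing any model-generated code:

```python
import ast


def parse_call(call_str):
    """Parse a function-call string such as "send_email(recipient='a@b.com', ...)"
    into (function_name, kwargs) without executing anything."""
    node = ast.parse(call_str, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    name = node.func.id
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


name, kwargs = parse_call(
    "send_email(recipient='alice@example.com', title='Summary', content='See slide.')"
)
print(name, kwargs)
# send_email {'recipient': 'alice@example.com', 'title': 'Summary', 'content': 'See slide.'}
```

Validating the parsed name against a whitelist of known API descriptions before dispatch is a sensible safeguard when the call string comes from a model.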
Contact Us
For support or to provide feedback, please contact us.
License and Citation
Refer to our license page for usage details. Please cite our work using the reference below for any academic or research purposes.
@article{chen2024octoplannerondevicelanguagemodel,
title={Octo-planner: On-device Language Model for Planner-Action Agents},
author={Wei Chen and Zhiyuan Li and Zhen Guo and Yikang Shen},
year={2024},
eprint={2406.18082},
url={https://arxiv.org/abs/2406.18082},
}
We thank the Google Gemma team for their amazing models!
@misc{gemma-2023-open-models,
author = {{Gemma Team, Google DeepMind}},
title = {Gemma: Open Models Based on Gemini Research and Technology},
url = {https://goo.gle/GemmaReport},
year = {2023},
}