Instructions to use NexaAI/octo-planner-2b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use NexaAI/octo-planner-2b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NexaAI/octo-planner-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NexaAI/octo-planner-2b")
model = AutoModelForCausalLM.from_pretrained("NexaAI/octo-planner-2b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use NexaAI/octo-planner-2b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "NexaAI/octo-planner-2b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NexaAI/octo-planner-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker

```shell
docker model run hf.co/NexaAI/octo-planner-2b
```
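The same OpenAI-compatible endpoint can also be called from Python with only the standard library. This is a minimal sketch assuming the vLLM server above is running on `localhost:8000`; the request is built locally and only sent when `post_chat` is invoked against a live server (`build_chat_request` and `post_chat` are illustrative helper names, not part of vLLM):

```python
import json
import urllib.request


def build_chat_request(model, user_content, base_url="http://localhost:8000"):
    """Build an OpenAI-compatible chat completion request for a local vLLM server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_chat(req):
    """Send the request; requires the server from the steps above to be running."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]


req = build_chat_request("NexaAI/octo-planner-2b", "What is the capital of France?")
print(req.full_url)  # http://localhost:8000/v1/chat/completions
```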
- SGLang
How to use NexaAI/octo-planner-2b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NexaAI/octo-planner-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NexaAI/octo-planner-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
    --model-path "NexaAI/octo-planner-2b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "NexaAI/octo-planner-2b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

- Docker Model Runner
How to use NexaAI/octo-planner-2b with Docker Model Runner:
```shell
docker model run hf.co/NexaAI/octo-planner-2b
```
Octo-planner: On-device Language Model for Planner-Action Agents Framework
We're thrilled to introduce Octo-planner, the latest breakthrough in on-device language models from Nexa AI. Developed for the Planner-Action Agents Framework, Octo-planner enables rapid and efficient planning without the need for cloud connectivity. Together with Octopus-V2, it runs locally on edge devices to support AI agent use cases.
Key Features of Octo-planner:
- Efficient Planning: Uses a planning model fine-tuned from Gemma-2b (2.51 billion parameters) for high efficiency and low power consumption.
- Agent Framework: Separates planning and action, allowing for specialized optimization and improved scalability.
- Enhanced Accuracy: Achieves a planning success rate of 98.1% on our benchmark dataset, providing reliable and effective performance.
- On-device Operation: Designed for edge devices, ensuring fast response times and enhanced privacy by processing data locally.
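The planner-action separation above works in two stages: the planner decomposes a user query into sub-steps, and a separate action model (Octopus-V2 in our framework) carries out each step. The sketch below shows only this control flow; `plan` and `act` are hypothetical stand-ins for the two models, with a hard-coded plan in place of real model output:

```python
def plan(query):
    """Stand-in for the Octo-planner model: decompose a query into sub-steps.
    A real deployment would generate these steps with the fine-tuned planner."""
    return [
        "Find the presentation for tomorrow's meeting",
        "Connect to the projector via Bluetooth",
        "Increase the screen brightness",
    ]


def act(step):
    """Stand-in for the action model (e.g. Octopus-V2) that maps one
    sub-step to a concrete function call on the device."""
    return f"executed: {step}"


def run_agent(query):
    """Planner-action loop: plan once, then act on each step in order."""
    return [act(step) for step in plan(query)]


for result in run_agent("Prepare the conference room for tomorrow's meeting"):
    print(result)
```

Keeping the two stages in separate models is what allows each to be optimized (and updated) independently.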
Example Usage
Below is a demo of Octo-planner:
Run the code below to use Octopus Planner for a given question:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NexaAIDev/octo-planner-2b"
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "Find my presentation for tomorrow's meeting, connect to the conference room projector via Bluetooth, increase the screen brightness, take a screenshot of the final summary slide, and email it to all participants"

# Wrap the question in the model's chat template
prompt = f"<|user|>{question}<|end|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    input_ids=inputs["input_ids"],
    max_length=1024,
    do_sample=False,
)
res = tokenizer.decode(outputs[0])
print(f"=== inference result ===\n{res}")
```
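The decoded result is a plan expressed as a sequence of sub-steps. The exact output layout is not specified here, so the helper below assumes a simple one-step-per-line convention (a hypothetical format, to be adjusted to the model's actual output) to show how downstream code might split the plan:

```python
def split_plan(text):
    """Split a decoded planner response into individual steps.
    Assumes one step per non-empty line, optionally prefixed with an
    index like '1.' or '2)' (hypothetical format)."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Strip a leading "1." / "2)" style index if present
        head, _, rest = line.partition(" ")
        if head.rstrip(".):").isdigit():
            line = rest.strip()
        steps.append(line)
    return steps


sample = "1. Find the presentation\n2. Connect to the projector\n3. Take a screenshot"
print(split_plan(sample))
# ['Find the presentation', 'Connect to the projector', 'Take a screenshot']
```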
Training Data
We wrote 10 Android API descriptions used to train the model; see this file for details. Below is one example Android API description:
```python
def send_email(recipient, title, content):
    """
    Sends an email to a specified recipient with a given title and content.

    Parameters:
    - recipient (str): The email address of the recipient.
    - title (str): The subject line of the email. This is a brief summary or title of the email's purpose or content.
    - content (str): The main body text of the email. It contains the primary message, information, or content that is intended to be communicated to the recipient.
    """
```
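A plan step referencing such an API ultimately has to be turned into a real call. As an illustration (not part of the training pipeline), a function-call string can be parsed safely with Python's `ast` module before dispatching, without executing any model-generated code:

```python
import ast


def parse_call(call_str):
    """Parse a function-call string such as "send_email(recipient='a@b.com', ...)"
    into (function_name, kwargs) without executing anything."""
    node = ast.parse(call_str, mode="eval").body
    if not isinstance(node, ast.Call):
        raise ValueError("not a function call")
    name = node.func.id
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return name, kwargs


name, kwargs = parse_call(
    "send_email(recipient='alice@example.com', title='Summary', content='See slide.')"
)
print(name, kwargs)
# send_email {'recipient': 'alice@example.com', 'title': 'Summary', 'content': 'See slide.'}
```

Validating the parsed name against a whitelist of known API descriptions before dispatch is a sensible safeguard when the call string comes from a model.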
Contact Us
For support or to provide feedback, please contact us.
License and Citation
Refer to our license page for usage details. Please cite our work using the reference below for any academic or research purposes.
@article{chen2024octoplannerondevicelanguagemodel,
title={Octo-planner: On-device Language Model for Planner-Action Agents},
author={Wei Chen and Zhiyuan Li and Zhen Guo and Yikang Shen},
year={2024},
eprint={2406.18082},
url={https://arxiv.org/abs/2406.18082},
}
We thank the Google Gemma team for their amazing models!
@misc{gemma-2023-open-models,
author = {{Gemma Team, Google DeepMind}},
title = {Gemma: Open Models Based on Gemini Research and Technology},
url = {https://goo.gle/GemmaReport},
year = {2023},
}