Instructions to use NexaAI/Octopus-v2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use NexaAI/Octopus-v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="NexaAI/Octopus-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("NexaAI/Octopus-v2")
model = AutoModelForCausalLM.from_pretrained("NexaAI/Octopus-v2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Inference
Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use NexaAI/Octopus-v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "NexaAI/Octopus-v2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NexaAI/Octopus-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/NexaAI/Octopus-v2

SGLang

How to use NexaAI/Octopus-v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "NexaAI/Octopus-v2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NexaAI/Octopus-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "NexaAI/Octopus-v2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "NexaAI/Octopus-v2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use NexaAI/Octopus-v2 with Docker Model Runner:
```
docker model run hf.co/NexaAI/Octopus-v2
```

nexa4ai commited on Apr 18, 2024

Commit

77bd3f5

verified ·

1 Parent(s): 0f44101

Update README.md

Browse files

Files changed (1) hide show

README.md +13 -2

README.md CHANGED Viewed

@@ -16,8 +16,19 @@ language:
 ---
 # Octopus V2: On-device language model for super agent
-**Octopus v3 is out now! Refer to our [latest twitter](https://twitter.com/nexa4ai/status/1780783383737676236)**
 We are a very small team with many work. Please give us more time to prepare the code, and we will **open source** it. We hope Octopus v2 model will be helpful for you. Let's democratize AI agents for everyone. We've received many requests from car industry, health care, financial system etc. Octopus model is able to be applied to **any function**, and you can start to think about it now.
 <p align="center">
 - <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Product</a>
@@ -29,7 +40,7 @@ We are a very small team with many work. Please give us more time to prepare the
   <a><img src="Octopus-logo.jpeg" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
-## Introducing Octopus-V2-2B
 Octopus-V2-2B, an advanced open-source language model with 2 billion parameters, represents Nexa AI's research breakthrough in the application of large language models (LLMs) for function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments—sometimes needing up to tens of thousands of input tokens—Octopus-V2-2B introduces a unique **functional token** strategy for both its training and inference stages. This approach not only allows it to achieve performance levels comparable to GPT-4 but also significantly enhances its inference speed beyond that of RAG-based methods, making it especially beneficial for edge computing devices.

 ---
 # Octopus V2: On-device language model for super agent
+## Octopus V3 Release
+We are excited to announce that Octopus v3 is now available! check our [technical report](https://lnkd.in/dxk54W5r) and [Octopus V3 tweet](https://twitter.com/nexa4ai/status/1780783383737676236)!
+Key Features of Octopus v3:
+- **Efficiency**: **Sub-billion** parameters, making it less than half the size of its predecessor, Octopus v2.
+- **Multi-Modal Capabilities**: Proceed both text and images inputs.
+- **Speed and Accuracy**: Incorporate our **patented** functional token technology, achieving function calling accuracy on par with GPT-4V and GPT-4.
+- **Multilingual Support**: Simultaneous support for English and Mandarin.
+Check the Octopus V3 demo video for [Android and iOS](https://octopus3.nexa4ai.com/).
+## Octopus V2
 We are a very small team with many work. Please give us more time to prepare the code, and we will **open source** it. We hope Octopus v2 model will be helpful for you. Let's democratize AI agents for everyone. We've received many requests from car industry, health care, financial system etc. Octopus model is able to be applied to **any function**, and you can start to think about it now.
 <p align="center">
 - <a href="https://www.nexa4ai.com/" target="_blank">Nexa AI Product</a>
   <a><img src="Octopus-logo.jpeg" alt="nexa-octopus" style="width: 40%; min-width: 300px; display: block; margin: auto;"></a>
 </p>
+## Introduction
 Octopus-V2-2B, an advanced open-source language model with 2 billion parameters, represents Nexa AI's research breakthrough in the application of large language models (LLMs) for function calling, specifically tailored for Android APIs. Unlike Retrieval-Augmented Generation (RAG) methods, which require detailed descriptions of potential function arguments—sometimes needing up to tens of thousands of input tokens—Octopus-V2-2B introduces a unique **functional token** strategy for both its training and inference stages. This approach not only allows it to achieve performance levels comparable to GPT-4 but also significantly enhances its inference speed beyond that of RAG-based methods, making it especially beneficial for edge computing devices.