Instructions for using Open-Orca/OpenOrcaxOpenChat-Preview2-13B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Open-Orca/OpenOrcaxOpenChat-Preview2-13B")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")
model = AutoModelForCausalLM.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")
```
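As a minimal generation sketch, the pipeline above can be called with a prompt string. The "User:"/"Assistant:" turn format with `<|end_of_turn|>` separators is an assumption taken from the System Prompt discussion further down, not an official chat template:

```python
# Minimal sketch: generate with the pipeline created above.
# The prompt format is assumed from the System Prompt thread below.
prompt = "User: Explain why the sky is blue.<|end_of_turn|>\nAssistant:"
output = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.5)
print(output[0]["generated_text"])
```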
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with vLLM:
Install from pip and serve the model
```sh
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
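The same completions endpoint can also be called from Python with the official `openai` client. A sketch, assuming `pip install openai` and the server above running on port 8000:

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (OpenAI-compatible API).
# The api_key value is a placeholder; vLLM does not check it by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```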
- SGLang
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with SGLang:
Install from pip and serve the model
```sh
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Open-Orca/OpenOrcaxOpenChat-Preview2-13B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```sh
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Open-Orca/OpenOrcaxOpenChat-Preview2-13B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
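A quick Python sketch for calling the same OpenAI-compatible endpoint with `requests`, assuming one of the SGLang servers above is running on port 30000:

```python
import requests

# Query the local SGLang server's OpenAI-compatible completions endpoint.
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```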
- Docker Model Runner
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with Docker Model Runner:
```sh
docker model run hf.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B
```
System Prompt
Since Orca relies on different system prompts, I was wondering why there is no system prompt in your given prompt template.
Is that an error? For the webui you mention a "context"; where should that be placed in the prompt format?
You can give the context immediately preceding the first " User: " section. It is analogous to a system prompt.
Hi @bleysg, thanks for your answer.
I suppose there should be two line breaks after the context, like in other similar prompt templates?
It would be nice if you could add an example (string and/or tokenization) for this, since you write yourself that "The model is heavily conditioned to work using this format only", and I have also experienced degradation just from missing or wrong separators.
Many thanks!
Edit: I found this in the example Gradio Space. I guess this is the correct format then, with two line breaks as expected? It would make sense to add it to the model card, imho:
```python
messages = BASE_SYSTEM_MESSAGE + system_message.strip() + "\n" + \
    "\n".join(["\n".join(["User: " + item[0] + "<|end_of_turn|>",
                          "Assistant: " + item[1] + "<|end_of_turn|>"])
               for item in history])
```
Edit 2: actually this is just one line break, with no blank line after the system prompt; I misread the two loops... So it deviates from the prompt format used by many other models.
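For illustration, a sketch of the string the Gradio Space snippet above assembles. The system message, user turn, and assistant reply here are hypothetical placeholders, not values from the Space:

```python
# Hypothetical inputs, only to show the assembled prompt format.
BASE_SYSTEM_MESSAGE = "You are an assistant."  # placeholder system message
system_message = ""                            # optional extra context
history = [("Hello, who are you?", "I am an assistant trained on OpenOrca.")]

messages = BASE_SYSTEM_MESSAGE + system_message.strip() + "\n" + \
    "\n".join(["\n".join(["User: " + item[0] + "<|end_of_turn|>",
                          "Assistant: " + item[1] + "<|end_of_turn|>"])
               for item in history])
print(messages)
# You are an assistant.
# User: Hello, who are you?<|end_of_turn|>
# Assistant: I am an assistant trained on OpenOrca.<|end_of_turn|>
```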