Instructions to use microsoft/Orca-2-13b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/Orca-2-13b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/Orca-2-13b")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Orca-2-13b")
model = AutoModelForCausalLM.from_pretrained("microsoft/Orca-2-13b")
```
- Inference
- Local Apps
- vLLM
How to use microsoft/Orca-2-13b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "microsoft/Orca-2-13b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/Orca-2-13b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/microsoft/Orca-2-13b
```
- SGLang
How to use microsoft/Orca-2-13b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "microsoft/Orca-2-13b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/Orca-2-13b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "microsoft/Orca-2-13b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "microsoft/Orca-2-13b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use microsoft/Orca-2-13b with Docker Model Runner:
```shell
docker model run hf.co/microsoft/Orca-2-13b
```
Corrected readme
The readme claims this model is open source, but the attached license clearly contradicts the open source definition. I'm assuming that this is a mistake, though it's one that's important to correct promptly before people use the model in violation of the actual license.
Note that I'm assuming that the license is authoritative, but an alternative solution would be to change the license to an open source license such as Apache 2.0.
Also would be nice to set the license in metadata so it shows directly on the repo header (pet peeve of mine sorry) cc @osanseviero
Done!
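For anyone wondering what "set the license in metadata" looks like in practice: on the Hub, the license shown in the repo header is read from the YAML front matter at the top of README.md. Below is a minimal sketch of adding those fields to a readme string. The helper `add_license_metadata` is hypothetical, and the values passed in (`other`, `example-research-license`, `LICENSE`) are placeholders for illustration, not the metadata actually committed to this repo.

```python
# Hypothetical helper: put license fields into a model card's YAML front matter.
# The keys (license, license_name, license_link) are Hub model card metadata keys;
# the values used in the example call are assumptions, not this repo's real metadata.

LICENSE_KEYS = ("license:", "license_name:", "license_link:")


def add_license_metadata(readme_text, license_id, license_name=None, license_link=None):
    """Return readme_text with license keys set in a leading front matter block."""
    new_fields = [f"license: {license_id}"]
    if license_name:
        new_fields.append(f"license_name: {license_name}")
    if license_link:
        new_fields.append(f"license_link: {license_link}")

    lines = readme_text.splitlines()
    existing, body = [], lines
    # If the readme already starts with a front matter block, split it off.
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                existing, body = lines[1:i], lines[i + 1:]
                break

    # Drop any old license keys, keep every other metadata line as-is.
    kept = [ln for ln in existing if not ln.startswith(LICENSE_KEYS)]
    return "\n".join(["---", *kept, *new_fields, "---", *body]) + "\n"


if __name__ == "__main__":
    card = "# Orca 2\n\nModel description.\n"
    print(add_license_metadata(card, "other", "example-research-license", "LICENSE"))
```

For a license that is not on the Hub's predefined list, `license: other` combined with `license_name` and `license_link` is the usual pattern; a standard license such as Apache 2.0 would only need the single `license` key.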