Instructions for using Open-Orca/OpenOrcaxOpenChat-Preview2-13B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Open-Orca/OpenOrcaxOpenChat-Preview2-13B")

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")
model = AutoModelForCausalLM.from_pretrained("Open-Orca/OpenOrcaxOpenChat-Preview2-13B")
```
- Notebooks
- Google Colab
- Kaggle
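As a quick illustration of prompting the model through the Transformers pipeline above: OpenChat-derived models generally expect a turn-based template that ends with an assistant cue. The sketch below assumes an OpenChat-style `User: ...<|end_of_turn|>Assistant:` template (verify the exact format against the model card); `format_prompt` is an illustrative helper, not part of any library.

```python
def format_prompt(user_message: str) -> str:
    # OpenChat-style single-turn template (assumption: check the model card
    # for the exact roles and separator token before relying on this).
    return f"User: {user_message}<|end_of_turn|>Assistant:"


print(format_prompt("What is the capital of France?"))
# The formatted string can then be passed to the pipeline, e.g.:
# from transformers import pipeline
# pipe = pipeline("text-generation", model="Open-Orca/OpenOrcaxOpenChat-Preview2-13B")
# print(pipe(format_prompt("What is the capital of France?"), max_new_tokens=64)[0]["generated_text"])
```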
- Local Apps
- vLLM
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Open-Orca/OpenOrcaxOpenChat-Preview2-13B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:

```shell
docker model run hf.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B
```
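Because the server exposes an OpenAI-compatible API, the curl call above can also be issued from Python with just the standard library. This is a sketch; `build_completion_request` and `query_vllm` are illustrative helper names, and the URL assumes the default local port from the serve command above.

```python
import json
import urllib.request


def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> dict:
    # Mirrors the JSON body of the curl example.
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def query_vllm(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # Requires a running vLLM server (see the serve command above).
    payload = build_completion_request(
        "Open-Orca/OpenOrcaxOpenChat-Preview2-13B", prompt)
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]


# Inspect the request body without needing a server:
print(json.dumps(build_completion_request(
    "Open-Orca/OpenOrcaxOpenChat-Preview2-13B", "Once upon a time,"), indent=2))
```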
- SGLang
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Open-Orca/OpenOrcaxOpenChat-Preview2-13B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Open-Orca/OpenOrcaxOpenChat-Preview2-13B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/OpenOrcaxOpenChat-Preview2-13B",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use Open-Orca/OpenOrcaxOpenChat-Preview2-13B with Docker Model Runner:
```shell
docker model run hf.co/Open-Orca/OpenOrcaxOpenChat-Preview2-13B
```
Can you explain how we can train for multi-turn conversation?
I am quite surprised that it's pretty good for chat. Very few models have this capability.
Can you provide more information about training multi-turn conversation? The datasets contain only question-answer pairs, so I am curious how we can fine-tune the model for conversational use.
The model hasn't been trained on multi-turn chat, so this is surprising. To train it further on conversations, I'd recommend using the 6K ShareGPT GPT-4 conversations from OpenChat; you can follow the instructions at https://github.com/imoneoi/openchat/
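One common way to adapt question-answer fine-tuning code to conversations like the ShareGPT set is to flatten each conversation into a single training sequence and compute the loss only on the assistant's spans. This is a sketch of that general technique, not necessarily OpenChat's exact recipe: `build_training_example` is an illustrative helper, and the `<|end_of_turn|>` separator is assumed from OpenChat's format.

```python
def build_training_example(conversation):
    """Flatten a multi-turn conversation into one training string.

    conversation: list of (role, message) tuples, role in {"user", "assistant"}.
    Returns (text, spans), where spans are (start, end) character ranges
    covering assistant replies; the loss would be computed only on the
    tokens inside those spans (user turns serve as context, not targets).
    """
    text = ""
    spans = []
    for role, msg in conversation:
        text += "User: " if role == "user" else "Assistant: "
        start = len(text)
        text += msg
        if role == "assistant":
            # Real implementations usually also train on the end-of-turn
            # token so the model learns to stop; omitted here for clarity.
            spans.append((start, len(text)))
        text += "<|end_of_turn|>"
    return text, spans


text, spans = build_training_example(
    [("user", "Hi"), ("assistant", "Hello!"), ("user", "How are you?")])
print(text)
print([text[s:e] for s, e in spans])
```

After tokenization, these character spans would be mapped to token positions and all other labels set to the ignore index (e.g. -100 in PyTorch cross-entropy).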
Thanks for your answer; very helpful. One follow-up: why can the model handle conversational tasks when it wasn't trained on multi-turn chat? Does it treat the previous history as context?
It may be an emergent capability arising from the combination of the focus on step-by-step reasoning and the format of the task training. The model demonstrates fairly robust theory of mind: it can clearly interpret requests to interact as multiple separate agents in diverse ways within a single prompt. We haven't tested this exhaustively, though.
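Whatever the mechanism, at inference time multi-turn chat is simply the client concatenating the previous turns into each new prompt, so the history is literally context. A minimal sketch, with the turn format assumed from OpenChat and `build_chat_prompt` as an illustrative helper name:

```python
def build_chat_prompt(history, user_message):
    """Concatenate prior (user, assistant) turns plus the new user message.

    The resulting string ends with the assistant cue, so the model's
    completion is the next reply. Turn format is an assumption; verify
    against the model card.
    """
    parts = []
    for user_turn, assistant_turn in history:
        parts.append(f"User: {user_turn}<|end_of_turn|>"
                     f"Assistant: {assistant_turn}<|end_of_turn|>")
    parts.append(f"User: {user_message}<|end_of_turn|>Assistant:")
    return "".join(parts)


history = [("What is 2+2?", "4")]
print(build_chat_prompt(history, "And multiplied by 3?"))
```

Each generated reply is appended to `history` before building the next prompt, which is all a "chat loop" amounts to for a completion-style model.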