Instructions for using lightblue/openorca_stx with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use lightblue/openorca_stx with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="lightblue/openorca_stx")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lightblue/openorca_stx")
model = AutoModelForCausalLM.from_pretrained("lightblue/openorca_stx")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use lightblue/openorca_stx with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "lightblue/openorca_stx"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lightblue/openorca_stx",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
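A hedged sketch of serving the same model with vLLM's official Docker image; the vllm/vllm-openai image name, tag, port, and cache mount are assumptions from standard vLLM Docker usage, not from this model card, so check the vLLM docs for the exact invocation:

```shell
# Sketch only: image name, tag, port, and mounts are standard vLLM defaults,
# not taken from this model card; verify against the vLLM documentation.
docker run --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model "lightblue/openorca_stx"
```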
- SGLang
How to use lightblue/openorca_stx with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "lightblue/openorca_stx" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lightblue/openorca_stx",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "lightblue/openorca_stx" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "lightblue/openorca_stx",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use lightblue/openorca_stx with Docker Model Runner:
```shell
docker model run hf.co/lightblue/openorca_stx
```
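Both the vLLM and SGLang servers above expose OpenAI-compatible APIs, so you can also call them from Python instead of curl. Below is a minimal sketch against the vLLM server from earlier; the openai client package, base URL, and placeholder API key are assumptions (change the port to 30000 for SGLang):

```python
# Sketch: query the OpenAI-compatible vLLM server started above.
# Requires `pip install openai`; the placeholder API key is a common
# convention for local servers that don't check credentials.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
completion = client.completions.create(
    model="lightblue/openorca_stx",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```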
Chatting and prompt
Can we use this model for chatting?
If yes, how can we do it?
Do we need to add a stop string like <|endoftext|>, as follows?
Where do we put it?
f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:
{response}<|endoftext|>
### Input:
{input}
### Response:
"""
It hasn't been trained for chatting, but this might work! (A rough sketch of applying that template with a stop string is at the end of this reply.) That said, I'd suggest that you use one of our newer chat models (Karasu or Qarasu) instead, as they have been trained explicitly for chatting and perform a lot better than this model.
Here is our 14B parameter model:
https://huggingface.co/lightblue/qarasu-14B-chat-plus-unleashed
And our 7B parameter model:
https://huggingface.co/lightblue/karasu-7B-chat-plus-unleashed
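For completeness, here's a rough sketch of single-turn generation with the template above in Transformers. The instruction/input strings are illustrative, and it assumes <|endoftext|> is this tokenizer's EOS token, so generation stops there automatically (verify with tokenizer.eos_token). You only append the stop string yourself when stitching previous turns into a multi-turn prompt:

```python
# Sketch: single-turn generation with the Alpaca-style template above.
# Assumes <|endoftext|> is the tokenizer's EOS token; verify with tokenizer.eos_token.
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("lightblue/openorca_stx")
model = AutoModelForCausalLM.from_pretrained("lightblue/openorca_stx")

instruction = "Summarize the following text."  # illustrative
user_input = "Large language models are ..."  # illustrative

prompt = f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{user_input}

### Response:
"""

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    # Generation stops when the model emits its EOS token (<|endoftext|> here),
    # so the stop string is not appended at inference time; it marks the end of
    # each completed response when building multi-turn prompts.
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```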
We'll try that. Thank you!