Instructions to use OrionStarAI/Orion-14B-Chat-RAG with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use OrionStarAI/Orion-14B-Chat-RAG with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="OrionStarAI/Orion-14B-Chat-RAG", trust_remote_code=True)
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("OrionStarAI/Orion-14B-Chat-RAG", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use OrionStarAI/Orion-14B-Chat-RAG with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OrionStarAI/Orion-14B-Chat-RAG"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OrionStarAI/Orion-14B-Chat-RAG",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker
docker model run hf.co/OrionStarAI/Orion-14B-Chat-RAG
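The curl call to the vLLM server above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the server is listening on http://localhost:8000 as in the example above. The helper names `build_completion_request` and `send` are hypothetical and are not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "OrionStarAI/Orion-14B-Chat-RAG"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request mirroring the curl example."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and decode the JSON reply (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only inspect the request here; call send(req) once the server is up.
    req = build_completion_request("Once upon a time,")
    print(req.full_url)
    print(req.data.decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to check the payload without a running server. Passing a different `base_url` points the same request at another OpenAI-compatible endpoint.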
- SGLang
How to use OrionStarAI/Orion-14B-Chat-RAG with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "OrionStarAI/Orion-14B-Chat-RAG" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OrionStarAI/Orion-14B-Chat-RAG",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "OrionStarAI/Orion-14B-Chat-RAG" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OrionStarAI/Orion-14B-Chat-RAG",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
- Docker Model Runner
How to use OrionStarAI/Orion-14B-Chat-RAG with Docker Model Runner:
docker model run hf.co/OrionStarAI/Orion-14B-Chat-RAG
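The curl calls above return a JSON body in the OpenAI completions format, where the generated text sits under `choices[0].text`. A minimal sketch of reading that field is shown here. The function name `extract_completion_text` is hypothetical, and `SAMPLE_RESPONSE` is a hand-written illustration of the response shape, not real output from this model.

```python
import json


def extract_completion_text(response_body):
    """Return the text of the first choice in an OpenAI-style completions reply."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0].get("text", "")


# Illustrative payload only: the field values are made up for this sketch.
SAMPLE_RESPONSE = json.dumps({
    "object": "text_completion",
    "model": "OrionStarAI/Orion-14B-Chat-RAG",
    "choices": [
        {"index": 0, "text": " there was a placeholder.", "finish_reason": "stop"}
    ],
})

if __name__ == "__main__":
    print(extract_completion_text(SAMPLE_RESPONSE))
```

Raising on an empty `choices` list surfaces server-side errors early, instead of silently returning an empty string.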
Commit History
Update README_ja.md 5829290 verified
Update README_zh.md 78494f0 verified
Update README.md 4c0fa33 verified
Update Wechat group QR code. 6cb8d2c verified
Update tokenization_orion.py f9cd51e verified
Update README_zh.md 7439953 verified
Update README.md 87ffe08 verified
Update README_ko.md 66ff7b2 verified
Update README_ja.md 7b33ad4 verified
Update README_zh.md c0dd85a verified
Update README.md 2a5b79b verified
Upload 2 files cdbd457 verified
Update README.md 3883823 verified
Update README_ja.md c1efd8b verified
Update README_ko.md 9e82f6a verified
Update README_zh.md 25ff310 verified
Update README_zh.md c77bd16 verified
Update README_ko.md c71e05b verified
Update README_ja.md b04c9f9 verified
Update README.md c32f362 verified
Upload 2 files 8dd6f61 verified
Update README_zh.md eba2e20 verified
Update README_ko.md ea4e761 verified
Update README_ja.md 6ec088a verified
Update README.md 0e37fc1 verified
Update README_ko.md 984a1d3 verified
Update README.md 4f29245 verified
Upload 3 files f6df5bc verified
Update README.md f3f9616 verified
Delete README_cn.md dd21e4b verified
update readme 97f2b3b (committed by Du Chen)
update readme 48fb4f3 (committed by Du Chen)
update readme c82293b (committed by Du Chen)
update readme 23cd9dc (committed by Du Chen)
update readme cbd5ffb (committed by Du Chen)
update special_tokens_map.json 2e76651 (committed by Du Chen)
update readme ebd5318 (committed by Du Chen)
initial commit 260bb10 (committed by Du Chen)