Text Generation
Tags: Transformers · PyTorch · English · llama · upstage · instruct · instruction · text-generation-inference
Instructions to use upstage/llama-30b-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use upstage/llama-30b-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="upstage/llama-30b-instruct")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("upstage/llama-30b-instruct")
model = AutoModelForCausalLM.from_pretrained("upstage/llama-30b-instruct")
```

- Notebooks
- Google Colab
- Kaggle
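Instruction-tuned LLaMA variants usually expect prompts in a specific template rather than raw text. The `### System:` / `### User:` / `### Assistant:` markers below are an assumption based on the style Upstage uses for its instruct models; verify the exact format against the model card before relying on it. A minimal sketch of building such a prompt for the pipeline shown above:

```python
# Sketch: build an instruction prompt for upstage/llama-30b-instruct.
# ASSUMPTION: the "### System / ### User / ### Assistant" template below is
# inferred from Upstage's instruct-model convention; check the model card.

def build_prompt(user_message: str, system: str = "") -> str:
    """Assemble an instruction prompt in the assumed Upstage template."""
    parts = []
    if system:
        parts.append(f"### System:\n{system}\n")
    parts.append(f"### User:\n{user_message}\n")
    parts.append("### Assistant:\n")  # the model completes from here
    return "\n".join(parts)

prompt = build_prompt("Summarize what a language model is in one sentence.")

# With the pipeline from the snippet above (the ~30B weights must be
# downloaded first, so this part is left commented out):
# pipe = pipeline("text-generation", model="upstage/llama-30b-instruct")
# print(pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
```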
- Local Apps
- vLLM
How to use upstage/llama-30b-instruct with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "upstage/llama-30b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "upstage/llama-30b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/upstage/llama-30b-instruct
```
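The same completions endpoint that curl hits above can be called from Python. A minimal sketch using only the standard library (`urllib`), assuming the vLLM server is running on its default port 8000:

```python
import json
import urllib.request

def completion_request(prompt: str,
                       model: str = "upstage/llama-30b-instruct",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = completion_request("Once upon a time,")

# With the server running (left commented out, since it needs a live server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```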
- SGLang
How to use upstage/llama-30b-instruct with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "upstage/llama-30b-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "upstage/llama-30b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "upstage/llama-30b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "upstage/llama-30b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use upstage/llama-30b-instruct with Docker Model Runner:
```shell
docker model run hf.co/upstage/llama-30b-instruct
```
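Both the vLLM and SGLang servers above answer in the OpenAI completions schema, so the responses from the curl calls can be parsed the same way. A sketch of pulling the generated text out of a response body (the JSON values here are illustrative, not real model output):

```python
import json

# Illustrative response body in the OpenAI completions schema; the values
# are made up, only the field layout matches what the servers return.
raw = json.dumps({
    "id": "cmpl-example",
    "object": "text_completion",
    "model": "upstage/llama-30b-instruct",
    "choices": [{"index": 0, "text": " there was a model.", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10},
})

def first_completion(body: str) -> str:
    """Extract the first generated text from an OpenAI-style completions response."""
    return json.loads(body)["choices"][0]["text"]

print(first_completion(raw))  # prints: " there was a model." (without quotes)
```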
Commit History
Update README.md 23d935d
Update README.md 595c81f
Update README.md a0294c6
Update README.md db05153
Update README.md b848b9f
Update README.md f254d62
Update README.md 99f955b
Update README.md 0303f81
Update README.md 07f0654
Update README.md bf5e8e3
Update README.md a777934
Update config.json 7eb0c58
Update README.md 05c0776
Update README.md 381049f
Update README.md 2c170b3
Update README.md f03e6dd
Update README.md c942338
Update README.md fea4312
Update README.md a78f6f6
Update README.md 750ed19
Update README.md 4e608bd
Update README.md e30a89d
Update README.md adbb941
Update README.md 5bd201c
Update README.md f6b8ea8
Update README.md 63524b6
Update README.md fcf155c
Update README.md da9ceb6
Update README.md 11d7912
Update README.md 00a74a6
Update README.md dfdd22e
Update README.md eda6b10
Update README.md eef8b90
Update README.md ee66058
Update README.md b0943f0
Update README.md a8e7703
Update README.md 92cc156
Update README.md 9b32e55
Wonho Song committed on
Create README.md ddacc12
Wonho Song committed on
Upload folder using huggingface_hub (#1) 4bc08c5
Wonho Song committed on