Tags: Text Generation, Transformers, PyTorch, English, llama, upstage, instruct, instruction, text-generation-inference
Instructions for using upstage/llama-65b-instruct with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use upstage/llama-65b-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="upstage/llama-65b-instruct")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("upstage/llama-65b-instruct")
model = AutoModelForCausalLM.from_pretrained("upstage/llama-65b-instruct")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use upstage/llama-65b-instruct with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "upstage/llama-65b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "upstage/llama-65b-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker
docker model run hf.co/upstage/llama-65b-instruct
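The curl call above can also be made from Python. The following is a minimal sketch using only the standard library, assuming a `vllm serve` server on the default `http://localhost:8000` and the OpenAI-style completions response shape (`choices[0].text`). The helper names `build_completion_request`, `extract_text`, and `complete` are hypothetical, chosen for illustration only.

```python
import json
import urllib.request

# Assumption: `vllm serve` is listening on its default port, as in the curl example.
BASE_URL = "http://localhost:8000"
MODEL = "upstage/llama-65b-instruct"


def build_completion_request(prompt: str, max_tokens: int = 512,
                             temperature: float = 0.5,
                             base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /v1/completions request that the curl command sends."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body: dict) -> str:
    """Return the generated text from an OpenAI-style completions response."""
    return response_body["choices"][0]["text"]


def complete(prompt: str, **kwargs) -> str:
    """Send the request and return the completion (needs a running server)."""
    request = build_completion_request(prompt, **kwargs)
    # Generous timeout: a 65B model may take a while to produce 512 tokens.
    with urllib.request.urlopen(request, timeout=600) as response:
        return extract_text(json.load(response))
```

With the server running, `print(complete("Once upon a time,"))` sends the same request as the curl example. For the SGLang server below, which exposes the same API on port 30000, pass `base_url="http://localhost:30000"`.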
- SGLang
How to use upstage/llama-65b-instruct with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "upstage/llama-65b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "upstage/llama-65b-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "upstage/llama-65b-instruct" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "upstage/llama-65b-instruct",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```
- Docker Model Runner
How to use upstage/llama-65b-instruct with Docker Model Runner:
docker model run hf.co/upstage/llama-65b-instruct
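Loading 65B parameters can take several minutes, so a request sent right after starting the vLLM (port 8000) or SGLang (port 30000) server may fail. A minimal sketch of a readiness check follows, assuming both servers answer the OpenAI-compatible `GET /v1/models` route and list the model under its repository id. `wait_until_ready` and `fetch_json` are hypothetical helpers; the `fetch` and `sleep` parameters are injectable only so the loop can be exercised without a live server.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_json(url: str, timeout: float = 5.0) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def wait_until_ready(base_url: str,
                     model: str = "upstage/llama-65b-instruct",
                     timeout: float = 1800.0,
                     interval: float = 10.0,
                     fetch: Callable[[str], dict] = fetch_json,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll {base_url}/v1/models until `model` is listed or `timeout` seconds pass.

    Assumption: the server reports served models as {"data": [{"id": ...}, ...]}.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            listing = fetch(f"{base_url}/v1/models")
            served = {entry.get("id") for entry in listing.get("data", [])}
            if model in served:
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and HTTP errors (URLError);
            # ValueError covers a body that is not valid JSON.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_until_ready("http://localhost:30000")` blocks until the SGLang server lists the model, returning `False` if it has not appeared within 30 minutes.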
Commit History
Update README.md f70a986
Update README.md 7b09b8a
Update README.md 30f6a01
Update README.md c89854f
Update README.md 63f82b2
Update README.md 0dcf44f
Update README.md e62bdde
Update README.md c37d841
Update README.md debb870
Update README.md d4b0c81
Update README.md a36a7e4
Update README.md bd6f760
Update README.md 2e41dd1
Update README.md b956688
Update README.md 294d3ff
Update config.json bfae95b
Update README.md 88f2f4f
Update README.md 6380f5a
Update README.md 0b47d42
Update README.md 1aadd7b
Update README.md 9d66f79
Update README.md 45ea6a1
Update README.md 79ec416
Upload folder using huggingface_hub (#1) 818def1
Wonho Song committed on
Create README.md 521a03d
Wonho Song committed on