Text Generation
Tags: Transformers · PyTorch · English · llama · upstage · instruct · instruction · text-generation-inference
Instructions to use upstage/llama-30b-instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use upstage/llama-30b-instruct with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="upstage/llama-30b-instruct")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("upstage/llama-30b-instruct")
model = AutoModelForCausalLM.from_pretrained("upstage/llama-30b-instruct")
```

- Notebooks
- Google Colab
- Kaggle
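Instruction-tuned LLaMA variants usually expect prompts in a specific template rather than raw text. The `### System:` / `### User:` / `### Assistant:` markers below are an assumption based on the style Upstage uses for its instruct models; verify the exact format against the model card before relying on it. A minimal sketch of building such a prompt for the pipeline shown above:

```python
# Sketch: build an instruction prompt for upstage/llama-30b-instruct.
# ASSUMPTION: the "### System / ### User / ### Assistant" template below is
# inferred from Upstage's instruct-model convention; check the model card.

def build_prompt(user_message: str, system: str = "") -> str:
    """Assemble an instruction prompt in the assumed Upstage template."""
    parts = []
    if system:
        parts.append(f"### System:\n{system}\n")
    parts.append(f"### User:\n{user_message}\n")
    parts.append("### Assistant:\n")  # the model completes from here
    return "\n".join(parts)

prompt = build_prompt("Summarize what a language model is in one sentence.")

# With the pipeline from the snippet above (the ~30B weights must be
# downloaded first, so this part is left commented out):
# pipe = pipeline("text-generation", model="upstage/llama-30b-instruct")
# print(pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)[0]["generated_text"])
```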
- Local Apps
- vLLM
How to use upstage/llama-30b-instruct with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "upstage/llama-30b-instruct"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "upstage/llama-30b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/upstage/llama-30b-instruct
```
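The same completions endpoint that curl hits above can be called from Python. A minimal sketch using only the standard library (`urllib`), assuming the vLLM server is running on its default port 8000:

```python
import json
import urllib.request

def completion_request(prompt: str,
                       model: str = "upstage/llama-30b-instruct",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = completion_request("Once upon a time,")

# With the server running (left commented out, since it needs a live server):
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```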
- SGLang
How to use upstage/llama-30b-instruct with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "upstage/llama-30b-instruct" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "upstage/llama-30b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "upstage/llama-30b-instruct" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "upstage/llama-30b-instruct",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use upstage/llama-30b-instruct with Docker Model Runner:
```shell
docker model run hf.co/upstage/llama-30b-instruct
```
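Both the vLLM and SGLang servers above answer in the OpenAI completions schema, so the responses from the curl calls can be parsed the same way. A sketch of pulling the generated text out of a response body (the JSON values here are illustrative, not real model output):

```python
import json

# Illustrative response body in the OpenAI completions schema; the values
# are made up, only the field layout matches what the servers return.
raw = json.dumps({
    "id": "cmpl-example",
    "object": "text_completion",
    "model": "upstage/llama-30b-instruct",
    "choices": [{"index": 0, "text": " there was a model.", "finish_reason": "stop"}],
    "usage": {"prompt_tokens": 5, "completion_tokens": 5, "total_tokens": 10},
})

def first_completion(body: str) -> str:
    """Extract the first generated text from an OpenAI-style completions response."""
    return json.loads(body)["choices"][0]["text"]

print(first_completion(raw))  # prints: " there was a model." (without quotes)
```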
Commit History
Update README.md 23d935d
Update README.md 595c81f
Update README.md a0294c6
Update README.md db05153
Update README.md b848b9f
Update README.md f254d62
Update README.md 99f955b
Update README.md 0303f81
Update README.md 07f0654
Update README.md bf5e8e3
Update README.md a777934
Update config.json 7eb0c58
Update README.md 05c0776
Update README.md 381049f
Update README.md 2c170b3
Update README.md f03e6dd
Update README.md c942338
Update README.md fea4312
Update README.md a78f6f6
Update README.md 750ed19
Update README.md 4e608bd
Update README.md e30a89d
Update README.md adbb941
Update README.md 5bd201c
Update README.md f6b8ea8
Update README.md 63524b6
Update README.md fcf155c
Update README.md da9ceb6
Update README.md 11d7912
Update README.md 00a74a6
Update README.md dfdd22e
Update README.md eda6b10
Update README.md eef8b90
Update README.md ee66058
Update README.md b0943f0
Update README.md a8e7703
Update README.md 92cc156
Update README.md 9b32e55
Wonho Song committed on
Create README.md ddacc12
Wonho Song committed on
Upload folder using huggingface_hub (#1) 4bc08c5
Wonho Song committed on