Instructions for using princeton-nlp/Sheared-LLaMA-2.7B with libraries and local apps.

Libraries

How to use princeton-nlp/Sheared-LLaMA-2.7B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="princeton-nlp/Sheared-LLaMA-2.7B")
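The pipeline above is created but never called. A minimal sketch of a call follows, assuming sampling with illustrative values for `max_new_tokens` and `temperature`; the helper names `generation_settings` and `run_pipeline` are hypothetical and not part of Transformers.

```python
from typing import Any

MODEL_ID = "princeton-nlp/Sheared-LLaMA-2.7B"


def generation_settings(max_new_tokens: int = 64, temperature: float = 0.5) -> dict[str, Any]:
    """Keyword arguments forwarded to the pipeline call (illustrative values)."""
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": True,  # sampling has to be enabled for temperature to take effect
        "temperature": temperature,
    }


def run_pipeline(prompt: str) -> str:
    """Hypothetical helper: build the pipeline and return the generated text."""
    # Imported lazily so generation_settings() works without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-generation", model=MODEL_ID)
    outputs = pipe(prompt, **generation_settings())
    # Each result is a dict; "generated_text" holds the prompt plus its continuation.
    return outputs[0]["generated_text"]


if __name__ == "__main__":
    print(run_pipeline("Once upon a time,"))
```

The first call downloads the model weights, so it may take a while on a fresh machine.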

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("princeton-nlp/Sheared-LLaMA-2.7B")
model = AutoModelForCausalLM.from_pretrained("princeton-nlp/Sheared-LLaMA-2.7B")
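Loading the tokenizer and model directly leaves tokenization, generation, and decoding to the caller. The sketch below shows one way to wire those steps together, assuming CPU inference and default decoding; `strip_prompt` and `generate` are hypothetical helper names introduced here for illustration.

```python
MODEL_ID = "princeton-nlp/Sheared-LLaMA-2.7B"


def strip_prompt(full_text: str, prompt: str) -> str:
    """Return only the continuation when the decoded text starts with the prompt."""
    return full_text[len(prompt):] if full_text.startswith(prompt) else full_text


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Hypothetical helper: tokenize, generate, and decode a continuation."""
    # Imported lazily so strip_prompt() can be used without torch or transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return strip_prompt(text, prompt)


if __name__ == "__main__":
    print(generate("Once upon a time,"))
```

The decoded output of a causal LM normally includes the prompt, which is why the sketch trims it before returning.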

Local Apps

vLLM

How to use princeton-nlp/Sheared-LLaMA-2.7B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "princeton-nlp/Sheared-LLaMA-2.7B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "princeton-nlp/Sheared-LLaMA-2.7B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
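The same request can be sent from Python. This is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000` and returns the usual OpenAI-style completions response; `build_payload` and `complete` are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumption: default port from the command above
MODEL_ID = "princeton-nlp/Sheared-LLaMA-2.7B"


def build_payload(prompt: str, max_tokens: int = 512, temperature: float = 0.5) -> dict:
    """Mirror the JSON body used in the curl example."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt: str, base_url: str = BASE_URL) -> str:
    """POST the payload to the completions endpoint and return the generated text."""
    body = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    # OpenAI-style completions responses carry the text under choices[0].text.
    return result["choices"][0]["text"]


if __name__ == "__main__":
    print(complete("Once upon a time,"))
```

Passing a different `base_url`, for example `http://localhost:30000/v1`, should point the same helper at another OpenAI-compatible server.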

Use Docker

docker model run hf.co/princeton-nlp/Sheared-LLaMA-2.7B

SGLang

How to use princeton-nlp/Sheared-LLaMA-2.7B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "princeton-nlp/Sheared-LLaMA-2.7B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "princeton-nlp/Sheared-LLaMA-2.7B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
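A request sent before the server is listening fails with a connection error. The sketch below polls the TCP port until it accepts connections, assuming the host and port from the launch command above; `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means a process is accepting connections on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet, retry shortly
    return False


if __name__ == "__main__":
    if wait_for_port("localhost", 30000, timeout=300.0):
        print("Port 30000 is accepting connections.")
    else:
        print("Timed out waiting for port 30000.")
```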

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "princeton-nlp/Sheared-LLaMA-2.7B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "princeton-nlp/Sheared-LLaMA-2.7B",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner

How to use princeton-nlp/Sheared-LLaMA-2.7B with Docker Model Runner:

```
docker model run hf.co/princeton-nlp/Sheared-LLaMA-2.7B
```

Sheared-LLaMA-2.7B

Commit History

Update README.md

2f157a0
verified

princeton-nlp committed on Jan 23, 2024

Update README.md

92e2b6f

princeton-nlp committed on Dec 4, 2023

Update README.md

e34c414

princeton-nlp committed on Nov 22, 2023

Update config.json

f4fe028

princeton-nlp committed on Nov 20, 2023

Update README.md

53acc81

princeton-nlp committed on Nov 1, 2023

Update README.md

506f950

princeton-nlp committed on Nov 1, 2023

Update README.md

1ec8e05

princeton-nlp committed on Oct 26, 2023

Update README.md

ca6cded

princeton-nlp committed on Oct 11, 2023

Update README.md

e591e7b

princeton-nlp committed on Oct 11, 2023

Update README.md

b4f554d

princeton-nlp committed on Oct 11, 2023

Merge branch 'main' of https://huggingface.co/princeton-nlp/Sheared-LLaMA-2.7B

1634702

xiamengzhou committed on Oct 10, 2023

Update README.md

5760137

princeton-nlp committed on Oct 10, 2023

update

c38d2e6

xiamengzhou committed on Oct 10, 2023

initial commit

9bea1d3

princeton-nlp committed on Oct 10, 2023
