Instructions for using stanfordnlp/backpack-gpt2 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use stanfordnlp/backpack-gpt2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="stanfordnlp/backpack-gpt2", trust_remote_code=True)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("stanfordnlp/backpack-gpt2", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("stanfordnlp/backpack-gpt2", trust_remote_code=True)

Local Apps

vLLM

How to use stanfordnlp/backpack-gpt2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "stanfordnlp/backpack-gpt2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stanfordnlp/backpack-gpt2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
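For scripted use, the same request can be sent from Python. Below is a minimal sketch that uses only the standard library. It assumes the vLLM server above is listening on localhost:8000. build_completion_request and send are hypothetical helper names, and the payload simply mirrors the curl call.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST to an OpenAI-compatible /v1/completions endpoint (same body as the curl call)."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Perform the network call (needs a running server) and decode the JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, send(build_completion_request("http://localhost:8000", "stanfordnlp/backpack-gpt2", "Once upon a time,")) returns the parsed response. Pointing base_url at http://localhost:30000 targets the SGLang server below instead. Keeping request construction separate from sending lets the payload be checked without a server.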

Use Docker

docker model run hf.co/stanfordnlp/backpack-gpt2

SGLang

How to use stanfordnlp/backpack-gpt2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "stanfordnlp/backpack-gpt2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stanfordnlp/backpack-gpt2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
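The reply from an OpenAI-compatible completions endpoint puts the generated text under choices[i]["text"]. Its finish_reason is "length" when max_tokens cut the output short. Here is a small parsing sketch that assumes that response shape. The helper names are hypothetical.

```python
from typing import Any, Dict, List


def completion_texts(response: Dict[str, Any]) -> List[str]:
    """Return the generated text of every choice, ordered by the choice's index."""
    choices = sorted(response.get("choices", []), key=lambda c: c.get("index", 0))
    return [choice["text"] for choice in choices]


def truncated_choices(response: Dict[str, Any]) -> List[int]:
    """Indices of choices that stopped because they reached max_tokens."""
    return [
        c.get("index", 0)
        for c in response.get("choices", [])
        if c.get("finish_reason") == "length"
    ]
```

A non-empty truncated_choices result suggests raising max_tokens if complete continuations are needed.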

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "stanfordnlp/backpack-gpt2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "stanfordnlp/backpack-gpt2",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
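Loading the model in a fresh container can take a while, so scripts usually wait for the server before sending requests. The polling sketch below assumes the server answers the OpenAI-compatible GET /v1/models route once it is ready. wait_until_ready and http_ok are hypothetical helpers. The probe parameter exists only so the loop can be exercised without a running server.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts all derive from OSError.
        return False


def wait_until_ready(
    url: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll url with probe until it succeeds (True) or timeout_s elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, wait_until_ready("http://localhost:30000/v1/models") blocks until the SGLang container above is serving, or gives up after ten minutes.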

Docker Model Runner
How to use stanfordnlp/backpack-gpt2 with Docker Model Runner:
```
docker model run hf.co/stanfordnlp/backpack-gpt2
```

support_generation

by shashwat1002 - opened Aug 7, 2023

base: refs/heads/main ← from: refs/pr/3

Files changed: +42 -2

Commit 742a8e3f: support generation on backpacks by overloading prepare_inputs_for_generation

shashwat1002

Aug 7, 2023

No description provided.

shashwat1002

Aug 7, 2023

This adds support for the generate function on the LM head model, and thereby also for the generation pipeline.
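Why does overriding one hook enable generate()? In transformers releases from around the time of this PR, PreTrainedModel.can_generate() returned False unless the model class overrode prepare_inputs_for_generation. The pure-Python sketch below imitates that check and a greedy decoding loop, assuming the check works roughly that way. Every class here is hypothetical, and the real check and generate() are considerably more involved.

```python
from typing import Any, Dict, List


class ToyGenerationMixin:
    """Hypothetical stand-in for a library generation mixin (not the transformers class)."""

    def prepare_inputs_for_generation(self, input_ids: List[int], **kwargs: Any) -> Dict[str, Any]:
        # Default hook: a subclass must override it to opt in to generation.
        raise NotImplementedError

    @classmethod
    def can_generate(cls) -> bool:
        # Generation counts as supported only if a subclass replaced the default hook.
        return cls.prepare_inputs_for_generation is not ToyGenerationMixin.prepare_inputs_for_generation

    def generate(self, input_ids: List[int], max_new_tokens: int, **model_kwargs: Any) -> List[int]:
        if not self.can_generate():
            raise TypeError(f"{type(self).__name__} does not support generation")
        ids = list(input_ids)
        for _ in range(max_new_tokens):
            # Each decoding step asks the model to assemble its own forward() inputs.
            inputs = self.prepare_inputs_for_generation(ids, **model_kwargs)
            logits = self.forward(**inputs)
            # Greedy decoding: append the highest-scoring token id.
            ids.append(max(range(len(logits)), key=logits.__getitem__))
        return ids


class ToyModelWithoutHook(ToyGenerationMixin):
    def forward(self, input_ids: List[int]) -> List[float]:
        return [0.0] * 5


class ToyBackpackLM(ToyGenerationMixin):
    vocab_size = 5

    def prepare_inputs_for_generation(self, input_ids: List[int], **kwargs: Any) -> Dict[str, Any]:
        # Overriding the hook is what flips can_generate() to True.
        return {"input_ids": input_ids}

    def forward(self, input_ids: List[int]) -> List[float]:
        # Toy scoring rule: the next token is (last token + 1) mod vocab_size.
        nxt = (input_ids[-1] + 1) % self.vocab_size
        return [1.0 if tok == nxt else 0.0 for tok in range(self.vocab_size)]
```

ToyBackpackLM().generate([0], max_new_tokens=3) returns [0, 1, 2, 3]. Calling generate on ToyModelWithoutHook raises TypeError, roughly the situation this PR addresses.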

shashwat1002 changed pull request status to open Aug 7, 2023

johnhew

Stanford NLP org Aug 14, 2023

Thanks for working on this! As far as I can tell, all the kwargs stuff that gets built in prepare_inputs_for_generation doesn't actually get used by the Backpack anywhere. I believe some changes need to be made for the kwargs to actually get passed by the Backpack down to the underlying Transformer.
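The reviewer's point can be reproduced in a few lines. Building attention_mask in prepare_inputs_for_generation has no effect unless forward() hands it on to the underlying Transformer. The pure-Python sketch below uses hypothetical classes (RecordingTransformer, DroppingBackpack, ForwardingBackpack). It is not the Backpack source, and the real classes work on tensors.

```python
from typing import Any, Dict, List, Optional


class RecordingTransformer:
    """Hypothetical underlying Transformer that records the mask it receives."""

    def __init__(self) -> None:
        self.last_attention_mask: Optional[List[int]] = None

    def __call__(self, input_ids: List[int], attention_mask: Optional[List[int]] = None) -> List[int]:
        self.last_attention_mask = attention_mask
        return input_ids  # stand-in for hidden states


class DroppingBackpack:
    """Builds the kwargs in the hook but never forwards them."""

    def __init__(self) -> None:
        self.transformer = RecordingTransformer()

    def prepare_inputs_for_generation(self, input_ids: List[int], **kwargs: Any) -> Dict[str, Any]:
        return {"input_ids": input_ids, "attention_mask": kwargs.get("attention_mask")}

    def forward(self, input_ids: List[int], **kwargs: Any) -> List[int]:
        return self.transformer(input_ids)  # kwargs are silently discarded here


class ForwardingBackpack(DroppingBackpack):
    """Same hook, but forward() threads attention_mask through to the Transformer."""

    def forward(self, input_ids: List[int], attention_mask: Optional[List[int]] = None, **kwargs: Any) -> List[int]:
        return self.transformer(input_ids, attention_mask=attention_mask)


def one_step(model: DroppingBackpack, input_ids: List[int], attention_mask: List[int]) -> Optional[List[int]]:
    """Run one generation step and report which mask the Transformer actually saw."""
    inputs = model.prepare_inputs_for_generation(input_ids, attention_mask=attention_mask)
    model.forward(**inputs)
    return model.transformer.last_attention_mask
```

With DroppingBackpack the Transformer sees None, even though the hook built a mask. With ForwardingBackpack it sees the mask that was passed in.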

shashwat1002

Aug 16, 2023

Hi @johnhew

While that is true, the function has to be overridden for Hugging Face to consider generation supported.
So the code on this branch is equivalent in capability to what is there now, except that it also supports the generation pipeline.

(This would also fix the demo.)

I intend to implement actually passing the attention masks to the underlying model later.
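Once the attention mask is actually passed down, the model also has to cope with left-padded batches. GPT2LMHeadModel.prepare_inputs_for_generation derives position ids from the cumulative sum of the attention mask, and the generation loop appends a 1 to the mask after each new token. The pure-Python sketch below assumes the Backpack would follow the same approach. The function names are hypothetical, and the real code operates on tensors.

```python
from typing import List


def extend_attention_mask(mask: List[int]) -> List[int]:
    """After each generated token, mark the new position as attendable (append a 1)."""
    return mask + [1]


def position_ids_from_mask(mask: List[int]) -> List[int]:
    """Running count of real tokens minus one; padded slots get a dummy id of 1."""
    position_ids: List[int] = []
    running = 0
    for m in mask:
        running += m
        position_ids.append(running - 1 if m else 1)
    return position_ids
```

For a left-padded row with mask [0, 0, 1, 1, 1], the first real token gets position 0, just as it would without padding.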


Ready to merge

This branch is ready to get merged automatically.
