Instructions for using openai-community/gpt2 with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use openai-community/gpt2 with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openai-community/gpt2")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")
model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openai-community/gpt2 with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openai-community/gpt2"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openai-community/gpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker
```shell
docker model run hf.co/openai-community/gpt2
```
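The curl call above can also be made from Python. Below is a minimal sketch using only the standard library; the URL and payload mirror the curl example, and `post_completion` is a hypothetical helper name, not part of vLLM. The server must be running before it is called.

```python
# Sketch: call the vLLM server's OpenAI-compatible completions endpoint
# from Python using only the standard library.
import json
import urllib.request

def build_completion_request(url, model, prompt, max_tokens=512, temperature=0.5):
    # Build a POST request whose JSON body mirrors the curl example above.
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def post_completion(req):
    # Performs the actual HTTP call; requires the vLLM server to be up.
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

req = build_completion_request(
    "http://localhost:8000/v1/completions",
    "openai-community/gpt2",
    "Once upon a time,",
)
print(req.full_url, req.get_method())
```

Building the request is separated from sending it so the payload can be inspected without a running server.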
- SGLang
How to use openai-community/gpt2 with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openai-community/gpt2" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openai-community/gpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openai-community/gpt2" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openai-community/gpt2",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
- Docker Model Runner
How to use openai-community/gpt2 with Docker Model Runner:
```shell
docker model run hf.co/openai-community/gpt2
```
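Both the vLLM and SGLang endpoints above return OpenAI-style completions JSON. The sketch below extracts the generated text from such a response; the sample payload is illustrative, not captured from a real server.

```python
import json

# Illustrative sample of an OpenAI-style completions response (not real output).
sample = json.loads("""
{
  "id": "cmpl-123",
  "object": "text_completion",
  "model": "openai-community/gpt2",
  "choices": [
    {"index": 0, "text": " there was a model that told stories.", "finish_reason": "length"}
  ]
}
""")

def completion_text(response):
    # The generated continuation lives in choices[0]["text"].
    return response["choices"][0]["text"]

print(completion_text(sample))
```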
How to use this ONNX model
How do I use this ONNX model, and in particular how do I configure the tokenizer based on these JSON files?
I have not used ONNX with this model before, but I have a function that works with other models.
The tokenizer is set up outside the function via `tokenizer = AutoTokenizer.from_pretrained(model_name)`.
You might need to adjust the function below based on the model and your needs.
```python
# Function to perform inference with ONNX.
# ort_session is an onnxruntime.InferenceSession created outside this function,
# e.g. ort_session = onnxruntime.InferenceSession("model.onnx").
def onnx_inference(question, answer):
    inputs = tokenizer(question, answer, return_tensors="pt")
    input_names = ort_session.get_inputs()
    inputs_onnx = {
        input_name.name: inputs[input_name.name].numpy() for input_name in input_names
    }
    outputs = ort_session.run(None, inputs_onnx)
    return outputs[0][0]
```
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

# Load the ONNX weights explicitly from the 'onnx' subfolder of the repo
model = ORTModelForCausalLM.from_pretrained("openai-community/gpt2", subfolder="onnx")

# Load the tokenizer (must match the original model)
tokenizer = AutoTokenizer.from_pretrained("openai-community/gpt2")

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = pipe("He never went out without a book under his arm")
print(result[0]["generated_text"])
```
```text
Device set to use cpu
Setting pad_token_id to eos_token_id:50256 for open-end generation.
He never went out without a book under his arm. He never left his books unattended so that I, as well as my siblings, could read them. He could even sit in the armchair listening to those who would listen and write. H
```
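The generic `onnx_inference` helper above returns raw model outputs, which for a causal LM are unnormalized logits; the `optimum` pipeline handles decoding for you. If you want to inspect greedy next-token selection from logits yourself, here is a minimal NumPy sketch (illustrative only, with toy numbers rather than real GPT-2 logits):

```python
import numpy as np

def greedy_next_token(logits):
    # logits: (seq_len, vocab_size) array of unnormalized scores.
    # Greedy decoding picks the argmax over the last position's logits.
    return int(np.argmax(logits[-1]))

# Toy example: 2 positions, vocabulary of 4 tokens.
toy_logits = np.array([[0.1, 2.0, 0.3, 0.0],
                       [1.5, 0.2, 3.1, 0.4]])
print(greedy_next_token(toy_logits))  # index of the highest score at the last position
```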
