Instructions for using Sanrove/gpt2-GPTQ-4b with libraries, inference providers, notebooks, and local apps.
- Libraries
- Transformers
How to use Sanrove/gpt2-GPTQ-4b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Sanrove/gpt2-GPTQ-4b")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Sanrove/gpt2-GPTQ-4b")
model = AutoModelForCausalLM.from_pretrained("Sanrove/gpt2-GPTQ-4b")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Sanrove/gpt2-GPTQ-4b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Sanrove/gpt2-GPTQ-4b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Sanrove/gpt2-GPTQ-4b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker

```shell
docker model run hf.co/Sanrove/gpt2-GPTQ-4b
```
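The same request can also be sent from Python. The sketch below uses only the standard library (json and urllib) rather than a dedicated client. The helper names build_completion_request, extract_text and complete are hypothetical, and the response parsing assumes the OpenAI-style completions format, where the generated text is found in choices[0].text.

```python
import json
import urllib.request

MODEL_ID = "Sanrove/gpt2-GPTQ-4b"
# Same address as the curl example; the SGLang server uses port 30000 instead.
DEFAULT_BASE_URL = "http://localhost:8000"


def build_completion_request(prompt, base_url=DEFAULT_BASE_URL,
                             max_tokens=512, temperature=0.5):
    """Build (but do not send) a POST request for the /v1/completions endpoint."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json):
    """Return the text of the first choice (assumes an OpenAI-style response body)."""
    return response_json["choices"][0]["text"]


def complete(prompt, **kwargs):
    """Send the request to a running server and return the generated text."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_text(json.load(response))
```

With a server running, complete("Once upon a time,") returns the generated continuation; for the SGLang server described below, pass base_url="http://localhost:30000". Keeping request construction separate from sending makes the payload easy to inspect without a server.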
- SGLang
How to use Sanrove/gpt2-GPTQ-4b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Sanrove/gpt2-GPTQ-4b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Sanrove/gpt2-GPTQ-4b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

Use Docker images

```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Sanrove/gpt2-GPTQ-4b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "Sanrove/gpt2-GPTQ-4b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5
    }'
```

- Docker Model Runner
How to use Sanrove/gpt2-GPTQ-4b with Docker Model Runner:
```shell
docker model run hf.co/Sanrove/gpt2-GPTQ-4b
```
This model was created by quantizing a GPT-2 model to 4 bits with AutoGPTQ.
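To give a rough idea of what 4-bit group-wise quantization stores, here is a minimal pure-Python sketch of round-to-nearest quantization with one scale and one zero point per group. This is not GPTQ itself: GPTQ quantizes weights column by column and uses approximate second-order (Hessian) information to adjust the not-yet-quantized weights so that rounding errors are compensated. The group size of 128 is an assumption read from the gptq_model-4bit-128g file name used below, and the function names are hypothetical.

```python
def quantize_group(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one group of floats.

    Returns (codes, scale, zero) such that w is approximately scale * (code - zero).
    """
    qmax = (1 << bits) - 1  # 15 for 4-bit codes
    # Force the range to include 0 so that the zero point lands inside [0, qmax].
    lo = min(min(weights), 0.0)
    hi = max(max(weights), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = min(qmax, max(0, round(-lo / scale)))
    codes = [min(qmax, max(0, round(w / scale) + zero)) for w in weights]
    return codes, scale, zero


def dequantize_group(codes, scale, zero):
    """Map integer codes back to approximate float weights."""
    return [scale * (q - zero) for q in codes]


def quantize_row(row, bits=4, group_size=128):
    """Split one weight row into groups of group_size and quantize each independently."""
    return [
        quantize_group(row[start:start + group_size], bits)
        for start in range(0, len(row), group_size)
    ]
```

Because each group has its own scale, a large outlier only coarsens the quantization step of its own 128 weights rather than the whole row, and every reconstructed weight is within half a step (scale / 2) of the original.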
You can load this model with the AutoGPTQ library, installed with the following command:
```shell
pip install auto-gptq
```
You can then download the model from the Hugging Face Hub and load it with the following code:
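```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_name = "Sanrove/gpt2-GPTQ-4b"

# Load the tokenizer stored in the same repository.
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Read the quantization settings (bits, group size, ...) saved with the model.
quantize_config = BaseQuantizeConfig.from_pretrained(model_name)

# Load the 4-bit weights onto the first GPU.
# model_basename is the weights file name without its extension;
# use_triton=True selects the Triton kernels and requires the triton package.
model = AutoGPTQForCausalLM.from_quantized(
    model_name,
    model_basename="gptq_model-4bit-128g",
    device="cuda:0",
    use_triton=True,
    use_safetensors=True,
    quantize_config=quantize_config,
)
```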
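The 4bit-128g in the basename is commonly read as 4-bit weights quantized in groups of 128. A back-of-the-envelope storage estimate follows, assuming one fp16 scale and one zero point stored at the weight bit width per group, and ignoring biases and index tensors. The function names are hypothetical, and the 768-by-3072 shape is the MLP projection of GPT-2 small, assumed here to be the base model.

```python
def bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=None):
    """Effective storage cost per weight, including per-group metadata."""
    if zero_bits is None:
        zero_bits = bits  # assume the zero point is packed at the weight bit width
    # Each weight costs `bits`; each group of `group_size` weights also shares
    # one scale and one zero point.
    return bits + (scale_bits + zero_bits) / group_size


def layer_bytes(in_features, out_features, **kwargs):
    """Approximate bytes for one quantized linear layer (no bias, no index tensors)."""
    return in_features * out_features * bits_per_weight(**kwargs) / 8


# Example: a 768 -> 3072 MLP projection, as in GPT-2 small.
fp16_bytes = 768 * 3072 * 2
gptq_bytes = layer_bytes(768, 3072)
print(f"fp16: {fp16_bytes:,} bytes, 4-bit/128g: {gptq_bytes:,.0f} bytes, "
      f"ratio: {fp16_bytes / gptq_bytes:.2f}x")
# fp16: 4,718,592 bytes, 4-bit/128g: 1,225,728 bytes, ratio: 3.85x
```

The per-group metadata adds about 0.16 bits per weight at a group size of 128, so the saving over fp16 is slightly below the ideal 4x; smaller groups improve accuracy at the cost of more metadata.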
This model works with the standard Transformers text-generation pipeline.
Example of generation with the input text "I have a dream":
I have a dream." – William Shakespeare
With this opening line, one can see how Shakespeare was very influenced by the story of the great poet. The great poet was the first true English poet, as well as the son of a English noble