How to use MaziyarPanahi/WizardLM-2-7B-AWQ with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="MaziyarPanahi/WizardLM-2-7B-AWQ")
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("MaziyarPanahi/WizardLM-2-7B-AWQ")
model = AutoModelForCausalLM.from_pretrained("MaziyarPanahi/WizardLM-2-7B-AWQ")
How to use MaziyarPanahi/WizardLM-2-7B-AWQ with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MaziyarPanahi/WizardLM-2-7B-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MaziyarPanahi/WizardLM-2-7B-AWQ",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, run the model with Docker:
docker model run hf.co/MaziyarPanahi/WizardLM-2-7B-AWQ
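The curl call above can also be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_completion_request` and `complete` are hypothetical, and it assumes a server is already listening at the given base URL (port 8000 for vLLM as shown above, or port 30000 for the SGLang commands in this document).

```python
import json
import urllib.request

MODEL_ID = "MaziyarPanahi/WizardLM-2-7B-AWQ"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint.

    Hypothetical helper: it mirrors the JSON body of the curl example.
    """
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text of the first choice."""
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-style completion responses carry the text under choices[0].text
    return body["choices"][0]["text"]
```

With a server running, `complete("Once upon a time,")` should return the continuation as a string; pass `base_url="http://localhost:30000"` to target an SGLang server instead.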
How to use MaziyarPanahi/WizardLM-2-7B-AWQ with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
--model-path "MaziyarPanahi/WizardLM-2-7B-AWQ" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MaziyarPanahi/WizardLM-2-7B-AWQ",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
--shm-size 32g \
-p 30000:30000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
--env "HF_TOKEN=<secret>" \
--ipc=host \
lmsysorg/sglang:latest \
python3 -m sglang.launch_server \
--model-path "MaziyarPanahi/WizardLM-2-7B-AWQ" \
--host 0.0.0.0 \
--port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
-H "Content-Type: application/json" \
--data '{
"model": "MaziyarPanahi/WizardLM-2-7B-AWQ",
"prompt": "Once upon a time,",
"max_tokens": 512,
"temperature": 0.5
}'
How to use MaziyarPanahi/WizardLM-2-7B-AWQ with Docker Model Runner:
docker model run hf.co/MaziyarPanahi/WizardLM-2-7B-AWQ
MaziyarPanahi/WizardLM-2-7B-AWQ is an AWQ-quantized version of microsoft/WizardLM-2-7B.
Install the dependencies:
pip install --upgrade accelerate autoawq transformers
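# Load the quantized model and its tokenizer, then generate on the first GPU
from transformers import AutoTokenizer, AutoModelForCausalLM
model_id = "MaziyarPanahi/WizardLM-2-7B-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# .to(0) moves the model to CUDA device 0
model = AutoModelForCausalLM.from_pretrained(model_id).to(0)
# Prompt in the User/Assistant layout
text = "User:\nHello can you provide me with top-3 cool places to visit in Paris?\n\nAssistant:\n"
# Tokenize and move the input tensors to the same device as the model
inputs = tokenizer(text, return_tensors="pt").to(0)
out = model.generate(**inputs, max_new_tokens=300)
# The decoded output contains the prompt followed by the generated reply
print(tokenizer.decode(out[0], skip_special_tokens=True))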
Results:
User:
Hello can you provide me with top-3 cool places to visit in Paris?
Assistant:
Absolutely, here are my top-3 recommendations for must-see places in Paris:
1. The Eiffel Tower: An icon of Paris, this wrought-iron lattice tower is a global cultural icon of France and is among the most recognizable structures in the world. Climbing up to the top offers breathtaking views of the city.
2. The Louvre Museum: Home to thousands of works of art, the Louvre is the world's largest art museum and a historic monument in Paris. Must-see pieces include the Mona Lisa, the Winged Victory of Samothrace, and the Venus de Milo.
3. Notre-Dame Cathedral: This cathedral is a masterpiece of French Gothic architecture and is famous for its intricate stone carvings, beautiful stained glass, and its iconic twin towers. Be sure to spend some time exploring its history and learning about the fascinating restoration efforts post the 2019 fire.
I hope you find these recommendations helpful and that they make for an enjoyable and memorable trip to Paris. Safe travels!
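As the results show, the decoded output repeats the prompt before the reply. The sketch below builds a prompt in the same User/Assistant layout and keeps only the reply. The helper names `build_prompt` and `extract_reply` are hypothetical, and the layout is copied from the example above rather than from a confirmed chat template for this model.

```python
def build_prompt(user_message: str) -> str:
    """Wrap a user message in the User/Assistant layout used in the example."""
    return f"User:\n{user_message}\n\nAssistant:\n"


def extract_reply(decoded: str) -> str:
    """Return the text after the first "Assistant:" marker.

    Falls back to the whole (stripped) text if the marker is missing.
    Assumption: the user message itself does not contain "Assistant:".
    """
    _, marker, reply = decoded.partition("Assistant:")
    return reply.strip() if marker else decoded.strip()
```

In the example above, `text` would become `build_prompt(...)`, and the final line would print `extract_reply(tokenizer.decode(out[0], skip_special_tokens=True))`.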
Base model: microsoft/WizardLM-2-7B