Issue with deploying on SageMaker
Hi, I am trying to deploy on SageMaker and am running into some issues that I don't get with other models.
from sagemaker.huggingface import HuggingFaceModel
import boto3
from sagemaker import Session
# Replace with your access key and secret key
access_key = "key"
secret_key = "key"
# Create a boto3 session with the specified access key and secret key
boto3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name="us-east-1"
)
# Use the boto3 session to create the IAM client
iam_client = boto3_session.client('iam')
# Create a SageMaker session with the custom boto3 session
sagemaker_session = Session(boto_session=boto3_session)
role = iam_client.get_role(RoleName='ROLE')['Role']['Arn']
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'TheBloke/WizardLM-7B-uncensored-GPTQ',
    'HF_TASK': 'text-generation'
}
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session  # Pass the custom SageMaker session
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,  # number of instances
    instance_type='ml.g4dn.2xlarge'  # ec2 instance type
)
I am getting the following error trying to query the endpoint after deployment:
{
    "code": 400,
    "type": "InternalServerException",
    "message": "\u0027llama\u0027"
}
Is this a library that it doesn't import? Do I need to set this up with a custom configuration instead of just deploying to SageMaker? The Hugging Face inference export doesn't work for the same reason, which is probably worth bringing to your attention.
Thank you.
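For anyone reading the error body above, here is a minimal sketch of how it decodes. The `\u0027` sequences are JSON escapes for an apostrophe, so the message is just `'llama'` in quotes. That happens to match the string form of a Python `KeyError` for the key `llama`, which would be consistent with a failed lookup by model type, though the endpoint response alone does not confirm where the error was raised. The variable names are illustrative only.

```python
import json

# Error body as returned by the endpoint (copied from the post above).
raw = r'{"code": 400, "type": "InternalServerException", "message": "\u0027llama\u0027"}'

error = json.loads(raw)
message = error["message"]  # JSON \u0027 decodes to an apostrophe

# str() of a KeyError is the repr of the missing key, quotes included.
looks_like_key_error = message == str(KeyError("llama"))

print(message)
print(looks_like_key_error)
```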
I am getting the same error. Any suggestions? I'm simply using the SageMaker deployment code listed above.
I'm afraid I don't know anything about SageMaker, but I'm happy to take pull requests if anyone figures out what's wrong.
I figured out the error, but unfortunately I don't see an immediate way to deploy this as a SageMaker endpoint. The SageMaker environment only supports HF transformers versions up to 4.7 or something, and this model is a fine-tuned LLaMA model, which was done on 4.28: https://huggingface.co/decapoda-research/llama-7b-hf/discussions/39
Not sure when support will be available.
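The version mismatch described above can be checked with a small sketch like the following. It assumes that LLaMA model support first shipped in transformers 4.28.0, and it only handles plain numeric version strings. The helper names `parse_version` and `supports_llama` are hypothetical and are not part of any library.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '4.17.0' into (4, 17, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split(".")[:3])


# Assumption: LLaMA model classes first shipped in transformers 4.28.0.
LLAMA_MIN_VERSION = "4.28.0"


def supports_llama(transformers_version: str) -> bool:
    """Return True if the given transformers version is at least the assumed minimum."""
    return parse_version(transformers_version) >= parse_version(LLAMA_MIN_VERSION)


# transformers_version passed to HuggingFaceModel in the deployment script above
container_version = "4.17.0"
print(supports_llama(container_version))
```

Comparing parsed tuples rather than raw strings matters here, because as plain strings "4.7" sorts after "4.28" even though it is the older release.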