Issue with deploy on SageMaker

by scribematic - opened May 9, 2023

Discussion

scribematic

May 9, 2023 • edited May 11, 2023 by julien-c

Hi, I am trying to deploy on SageMaker and am running into some issues I don't get with other models:

from sagemaker.huggingface import HuggingFaceModel
import boto3
from sagemaker import Session

# Replace with your access key and secret key
access_key = "key"
secret_key = "key"

# Create a boto3 session with the specified access key and secret key
boto3_session = boto3.Session(
    aws_access_key_id=access_key,
    aws_secret_access_key=secret_key,
    region_name="us-east-1"
)

# Use the boto3 session to create the IAM client
iam_client = boto3_session.client('iam')

# Create a SageMaker session with the custom boto3 session
sagemaker_session = Session(boto_session=boto3_session)

role = iam_client.get_role(RoleName='ROLE')['Role']['Arn']
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID':'TheBloke/WizardLM-7B-uncensored-GPTQ',
    'HF_TASK':'text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    env=hub,
    role=role,
    sagemaker_session=sagemaker_session  # Pass the custom SageMaker session
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1, # number of instances
    instance_type='ml.g4dn.2xlarge' # ec2 instance type
)

I am getting the following error trying to query the endpoint after deployment:

{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}

Is this a library that it doesn't import? Do I need to set this up with a custom configuration instead of just deploying to SageMaker? The Hugging Face inference export doesn't work for the same reason, so it's probably worth bringing to your attention.

Thank you
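
For anyone reading the error body above: the `message` field is just a JSON-escaped string. A minimal sketch of decoding it is below. The variable names are illustrative, and the reading that it looks like a `KeyError` for the key `llama` is an interpretation of the message text only, not something confirmed by endpoint logs.

```python
import json

# Error body copied from the post above (raw string keeps the \u0027 escapes).
raw_body = r'''{
  "code": 400,
  "type": "InternalServerException",
  "message": "\u0027llama\u0027"
}'''

error = json.loads(raw_body)

# \u0027 is the JSON escape for a single quote, so the message is 'llama'.
decoded_message = error["message"]
print(decoded_message)

# Python renders a KeyError with one argument as the repr of that argument,
# which has the same shape as the decoded message (assumption: the server
# simply forwarded str(exception)).
looks_like_key_error = str(KeyError("llama")) == decoded_message
print(looks_like_key_error)
```

If that reading is right, the container failed while looking up the name `llama` rather than while importing a missing library.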

magicsquares137

May 11, 2023

I am getting the same error, any suggestions? I'm simply using the SageMaker deployment code listed above.

ehartford

Quixi AI org, May 11, 2023 • edited May 11, 2023

I'm afraid I don't know anything about SageMaker, but I'm happy to take pull requests if anyone figures out what's wrong.

magicsquares137

May 18, 2023

I figured out the error, but unfortunately I don't see an immediate way to deploy this as a SageMaker endpoint. The SageMaker environment only supports HF Transformers versions up to 4.7 or something, and this model is a fine-tuned LLaMA model, where the fine-tuning was done on Transformers 4.28: https://huggingface.co/decapoda-research/llama-7b-hf/discussions/39

Not sure when support will be available.
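
A rough sketch of the version comparison described above, using only the standard library. The 4.28 threshold is taken from this thread, and the helper names (`parse_major_minor`, `supports_llama`) are hypothetical, not part of Transformers or the SageMaker SDK.

```python
import re
from typing import Tuple

# Assumption from the thread: LLaMA models need Transformers 4.28 or newer.
LLAMA_MIN_VERSION = (4, 28)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '4.17.0'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_llama(transformers_version: str) -> bool:
    """Return True if the given Transformers version meets the threshold."""
    return parse_major_minor(transformers_version) >= LLAMA_MIN_VERSION


# '4.17.0' is the transformers_version used in the deployment code above.
for candidate in ("4.17.0", "4.28.0"):
    print(candidate, supports_llama(candidate))
```

Comparing integer tuples avoids the string-comparison pitfall where "4.7" would sort after "4.28".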
