Instructions to use WizardLMTeam/WizardLM-70B-V1.0 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use WizardLMTeam/WizardLM-70B-V1.0 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="WizardLMTeam/WizardLM-70B-V1.0")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("WizardLMTeam/WizardLM-70B-V1.0")
model = AutoModelForCausalLM.from_pretrained("WizardLMTeam/WizardLM-70B-V1.0")

Local Apps

vLLM

How to use WizardLMTeam/WizardLM-70B-V1.0 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WizardLMTeam/WizardLM-70B-V1.0"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WizardLMTeam/WizardLM-70B-V1.0",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker

docker model run hf.co/WizardLMTeam/WizardLM-70B-V1.0

SGLang

How to use WizardLMTeam/WizardLM-70B-V1.0 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "WizardLMTeam/WizardLM-70B-V1.0" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WizardLMTeam/WizardLM-70B-V1.0",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "WizardLMTeam/WizardLM-70B-V1.0" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "WizardLMTeam/WizardLM-70B-V1.0",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use WizardLMTeam/WizardLM-70B-V1.0 with Docker Model Runner:
```
docker model run hf.co/WizardLMTeam/WizardLM-70B-V1.0
```

Prompt Format

by philschmid - opened Aug 9, 2023

Discussion

philschmid

Aug 9, 2023

In the README you say:

WizardLM adopts the prompt format from Vicuna and supports multi-turn conversation. The prompt should be as following:
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: hello, who are you? ASSISTANT: 

Are there \n missing between the roles, and did you use a </s> after the ASSISTANT turn? I ask because that is what the official Vicuna format does [REF].

Here is a sample:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>
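
For reference, the newline-separated layout in the sample above can be produced with a small helper. This is only a sketch of the layout being asked about, not a confirmed description of how the model was trained; the function name `vicuna_newline_prompt` and the `(user, assistant)` tuple convention are made up for illustration.

```python
# Sketch only: reproduces the newline-separated sample above.
# The helper name and the turn representation are illustrative assumptions.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def vicuna_newline_prompt(turns, system=SYSTEM):
    """Render (user, assistant) turns, one role per line.

    Pass None as the assistant reply of the last turn to leave the
    prompt open for the model to complete.
    """
    lines = [system, ""]  # blank line between system text and first turn
    for user, assistant in turns:
        lines.append(f"USER: {user}")
        if assistant is None:
            lines.append("ASSISTANT:")
        else:
            # Each finished assistant turn is closed with the </s> marker.
            lines.append(f"ASSISTANT: {assistant}</s>")
    return "\n".join(lines)


print(vicuna_newline_prompt([("Hello!", "Hello!"), ("How are you?", "I am good.")]))
```

Note that `</s>` is written here as literal text; whether a given tokenizer maps that string to the end-of-sequence token is something to verify separately.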

ehartford

Aug 9, 2023 • edited Aug 9, 2023

Also note that, according to the config.json, this model was trained on top of Llama-2-70b-chat-hf rather than Llama-2-70b-hf.

Llama-2-70b-chat-hf has a prompt format like:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{prompt} [/INST]

To continue a conversation:

[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe.  Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{prompt} [/INST] {model_reply} [INST] {prompt} [/INST]

So this model was trained to follow two different prompt formats, and I imagine its personality changes dramatically depending on which prompt format you use.
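
The Llama-2 chat layout quoted above could be assembled roughly as follows. This is a sketch that mirrors only the text shown in this post; the helper name `llama2_chat_prompt` is hypothetical, and any special begin/end-of-sequence tokens are assumed to be handled by the tokenizer rather than written into the string.

```python
# Sketch only: mirrors the [INST] / <<SYS>> layout quoted in this thread.
# Special tokens are NOT added here; that is left to the tokenizer (assumption).
def llama2_chat_prompt(system, turns):
    """Build a Llama-2 chat prompt.

    turns is a list of (prompt, model_reply) pairs; use None as the reply
    of the last pair to leave the prompt open for generation.
    """
    out = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n"
    for index, (prompt, reply) in enumerate(turns):
        if index > 0:
            # Later user turns open a new [INST] block after the previous reply.
            out += " [INST] "
        out += f"{prompt} [/INST]"
        if reply is not None:
            out += f" {reply}"
    return out


example = llama2_chat_prompt(
    "You are a helpful assistant.",
    [("Hi", "Hello."), ("Who are you?", None)],
)
print(example)
```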

WizardLM

WizardLM Team org Aug 14, 2023

WizardLM adopts the prompt format from Vicuna and supports multi-turn conversation. The prompt should be as following:

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi ASSISTANT: Hello.</s>USER: Who are you? ASSISTANT: I am WizardLM.</s>......
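
Taken literally, the example above puts everything on one line, separates roles with single spaces, and places `</s>` directly before the next `USER:`. A minimal sketch of that reading, with the hypothetical helper name `wizardlm_prompt`:

```python
# Sketch only: a literal reading of the single-line example above.
# The helper name and the turn representation are illustrative assumptions.
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def wizardlm_prompt(turns, system=SYSTEM):
    """Render (user, assistant) turns on a single line.

    Use None as the assistant reply of the last turn to leave the prompt
    open for generation.
    """
    prompt = system + " "
    for user, assistant in turns:
        prompt += f"USER: {user} ASSISTANT:"
        if assistant is not None:
            # No space or newline after </s>, matching the example above.
            prompt += f" {assistant}</s>"
    return prompt


print(wizardlm_prompt([("Hi", "Hello."), ("Who are you?", "I am WizardLM.")]))
```

The README example ends with a trailing space after `ASSISTANT:`; this sketch omits it, and whether that difference matters is not settled in this thread.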

philschmid

Aug 14, 2023

Hey @WizardLM,

Thank you for the response! I see in your comment that you add </s> after the ASSISTANT turn. Could you also confirm whether there should be a \n between the turns, as in Vicuna? In other words, does the prompt look like the one below?

A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions.

USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>

MaziyarPanahi

Oct 21, 2023

I am trying to use the WizardLM model in chat-conversational-react-description, and the prompt schema inside ChatPrompt has a big impact on the result, especially in conversation. I tried both USER/ASSISTANT with </s> and the usual Llama-2 style, but I am not sure which prompting style is best when it comes to marking the beginning and end of the system, user, and assistant roles.
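
One way to compare the styles empirically is to send the same question in each layout to the same server and read the completions side by side. The sketch below assumes an OpenAI-compatible completions endpoint such as the vLLM one at `http://localhost:8000/v1/completions`; the helper names, the stop strings, the temperature of 0.0, and the reuse of the Vicuna system text inside the Llama-2 layout are all assumptions made for illustration.

```python
import json
import urllib.request

MODEL = "WizardLMTeam/WizardLM-70B-V1.0"
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)


def candidate_prompts(question):
    """Return the same question rendered in the three layouts from this thread."""
    return {
        "vicuna_single_line": f"{SYSTEM} USER: {question} ASSISTANT:",
        "vicuna_newlines": f"{SYSTEM}\n\nUSER: {question}\nASSISTANT:",
        # Assumption: the Vicuna system text is reused here for a like-for-like test.
        "llama2_chat": f"[INST] <<SYS>>\n{SYSTEM}\n<</SYS>>\n\n{question} [/INST]",
    }


def build_payload(prompt, max_tokens=256):
    """Build a completions request body; stop strings are an illustrative choice."""
    return {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # low randomness makes the comparison easier to read
        "stop": ["</s>", "USER:"],
    }


def complete(prompt, url="http://localhost:8000/v1/completions"):
    """POST the payload and return the generated text (requires a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["choices"][0]["text"]
```

With a server running, looping over `candidate_prompts("Who are you?").items()` and printing `complete(prompt)` for each gives a quick, informal comparison; it does not establish which format the model was actually trained on.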
