Cannot load model post agreement to new terms and using access token

#104

by CTJP - opened Apr 23, 2024

Discussion

CTJP

Apr 23, 2024

When trying to load any of the Mistral 7B models, I get:
"Cannot access gated repo for url https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2/resolve/main/config.json."
I can reproduce the issue with the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import os
hf_token = "[hf_token]"
os.environ["HF_TOKEN"] = hf_token
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-instruct-v0.2")

I can confirm 100% that the access token is working (the above code runs for Gemma, for example), and I have 100% accepted the agreement for Mistral 7B.
Interestingly, I do not get the issue for Mixtral-8x22B-Instruct-v0.1, only for the set of Mistral 7B models.

GustavoBMG

Apr 23, 2024

Hey ... try also passing the token to AutoTokenizer, not only to AutoModelForCausalLM:

AutoTokenizer.from_pretrained(model_id, token = '<your token>')

AutoModelForCausalLM.from_pretrained(model_id, token = '<your token>')
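A minimal sketch of the suggestion above, extended to `AutoConfig` because the failing request in the original post is for `config.json`. It assumes a recent `transformers` release that accepts the `token` keyword (older releases used `use_auth_token`). `read_token` and `load_with_token` are hypothetical helper names, and reading the token from the `HF_TOKEN` environment variable is one option, not a requirement.

```python
import os

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"


def read_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in env_var, or raise if it is missing."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} to a Hugging Face access token first.")
    return token


def load_with_token(model_id: str, token: str):
    """Pass the same token explicitly to every from_pretrained call."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

    config = AutoConfig.from_pretrained(model_id, token=token)
    tokenizer = AutoTokenizer.from_pretrained(model_id, token=token)
    model = AutoModelForCausalLM.from_pretrained(model_id, token=token)
    return config, tokenizer, model


# Example call (downloads the weights; needs network access and an account
# that has been granted access to the gated repo):
# config, tokenizer, model = load_with_token(MODEL_ID, read_token())
```

Passing the token explicitly rules out the case where one loader picks up cached credentials and another does not.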

timing418

Apr 24, 2024

In bash:
$ huggingface-cli login
The prompt will then ask you to enter your token (input will not be visible).
Paste your token from Settings => Access Tokens.

This worked for me.
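Where an interactive prompt is awkward (a notebook or a container, for instance), the same login can be done from Python. This is a sketch that assumes the `huggingface_hub` package is installed. `login_from_env` is a hypothetical helper, and the `login_fn` parameter exists only so the logic can be exercised without contacting the Hub.

```python
import os
from typing import Callable, Optional


def login_from_env(
    env_var: str = "HF_TOKEN",
    login_fn: Optional[Callable[..., None]] = None,
) -> bool:
    """Log in with the token stored in env_var; return False if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        return False
    if login_fn is None:
        # Imported lazily; huggingface_hub.login stores the token locally so
        # later downloads can reuse it.
        from huggingface_hub import login as login_fn
    login_fn(token=token)
    return True


# Example call (contacts the Hub to validate the token):
# login_from_env()
```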

CTJP

Apr 24, 2024

Hey, thanks for the advice. Sadly, neither of the above worked for me.
huggingface-cli whoami
returns the correct user name, so the token is definitely working, and I am able to load the tokenizer but not the model or config.
I have replicated this in several environments: one using vLLM in Vertex AI, a local notebook, and a Colab Enterprise notebook.
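A valid token and granted access to a gated repo are separate things, so one way to narrow this down is to request the exact URL from the error message and look at the response. The sketch below uses only the standard library. `config_url`, `explain`, and `probe` are hypothetical helpers, and the mapping from status codes and the `X-Error-Code` response header to causes is an assumption about how the Hub responds, not documented behaviour confirmed in this thread.

```python
import urllib.error
import urllib.request
from typing import Optional


def config_url(repo_id: str) -> str:
    """Build the config.json URL that appears in the error message."""
    return f"https://huggingface.co/{repo_id}/resolve/main/config.json"


def explain(status: int, error_code: Optional[str]) -> str:
    """Turn an HTTP status and X-Error-Code value into a rough diagnosis."""
    if status == 200:
        return "ok: this token can read the file"
    if error_code == "GatedRepo":
        return "gated: access to this repo has not been granted for this token's account"
    if status == 401:
        return "unauthorized: the token is missing, invalid, or was not sent"
    if status == 403:
        return "forbidden: the token may lack permission for this repo"
    if status == 404:
        return "not found: check the repo id and file name"
    return f"unexpected HTTP {status}"


def probe(repo_id: str, token: str) -> str:
    """Send an authenticated HEAD request for config.json and explain the result."""
    request = urllib.request.Request(
        config_url(repo_id),
        headers={"Authorization": f"Bearer {token}"},
        method="HEAD",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return explain(response.status, response.headers.get("X-Error-Code"))
    except urllib.error.HTTPError as err:
        return explain(err.code, err.headers.get("X-Error-Code"))


# Example call (needs network access):
# print(probe("mistralai/Mistral-7B-Instruct-v0.2", "<your token>"))
```

Running the probe with the same token in each environment would show whether the token reaches the Hub at all, and whether the refusal is about gating or about the token itself.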

GustavoBMG

Apr 24, 2024

ok, weird ... can you post the error message?

mav1814

Apr 24, 2024

@GustavoBMG where do I pass it?

AutoTokenizer.from_pretrained(model_id, token = '')

AutoModelForCausalLM.from_pretrained(model_id, token = '')

I'm trying to do it in the privateGPT repository via Docker.

GustavoBMG

Apr 24, 2024

After the token param:

AutoTokenizer.from_pretrained(model_id, token = 'bfksdhbfksdbgkdsbgfhd')

AutoModelForCausalLM.from_pretrained(model_id, token = 'fdsbfndksjfdsnfkds')

something like this

mav1814

Apr 24, 2024

Sorry, I worded that wrong.
I know where to put the token; I just don't know where to put the AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained calls. When I say "where", I'm talking about the file.

GustavoBMG

Apr 25, 2024

@mav1814 ... hey I'm not following what you are saying ...

What I mean is: copy the code snippet example from the model card and put the parameters there.
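As a sketch of what that could look like in practice, the calls can live in any standalone script (a hypothetical `load_mistral.py`, say) that is run directly with Python. The example uses the `pipeline` helper with its `token` argument and assumes a `transformers` release recent enough to accept chat-style message lists. `make_generator` and `ask` are hypothetical names, and `pipeline_fn` is there only so the wiring can be checked without downloading the model. It does not cover where such code belongs inside the privateGPT Docker setup, which this thread does not describe.

```python
from typing import Any, Callable, Optional

MODEL_ID = "mistralai/Mistral-7B-Instruct-v0.2"


def make_generator(token: str, pipeline_fn: Optional[Callable[..., Any]] = None) -> Any:
    """Create a text-generation pipeline, passing the access token explicitly."""
    if pipeline_fn is None:
        # Imported lazily so the script can be imported without transformers.
        from transformers import pipeline as pipeline_fn
    return pipeline_fn("text-generation", model=MODEL_ID, token=token)


def ask(generator: Any, question: str) -> Any:
    """Send a single user message to the generator and return its raw output."""
    messages = [{"role": "user", "content": question}]
    return generator(messages)


# Example call (downloads the weights; needs an account with access granted):
# generator = make_generator("<your token>")
# print(ask(generator, "Who are you?"))
```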
