Possibly the provided prompt format is wrong.

by vevi33 - opened Sep 17, 2024

Discussion

vevi33

Sep 17, 2024

•

edited Sep 18, 2024

Hi!
Thanks for the very quick quants. This model is really great; however, there is apparently a big misunderstanding around the new Mistral prompt format. (It also differs from the official Mistral description.)

Here is my Reddit post about it:

https://www.reddit.com/r/LocalLLaMA/comments/1fjb4i5/mistralsmallinstruct2409_is_actually_really/

Marinara also confirmed my theory a few weeks ago. (You can find it in the model description)
https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B-GGUF

The correct one should be:

<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]

Another source:
https://community.aws/content/2dFNOnLVQRhyrOrMsloofnW0ckZ/how-to-prompt-mistral-ai-models-and-why

I tested both your version and ours. Nemo and this model are far more coherent and "clever" with the suggested format.
With yours, the output was broken in many of my tests. (More details in the Reddit post.)
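
As a minimal sketch (not code from anyone in this thread), the format shown above can be assembled in Python as follows. The name `format_mistral_prompt` and the strict user/assistant alternation check are assumptions made for illustration. The function only builds a plain string; whether `<s>`, `[INST]`, and `</s>` are then parsed as special tokens depends on the inference engine you feed it to.

```python
def format_mistral_prompt(messages, bos="<s>", eos="</s>"):
    """Render alternating user/assistant turns into one prompt string.

    `messages` is a list of {"role": ..., "content": ...} dicts that must
    start with a user turn and alternate strictly (an assumption of this
    sketch; system messages are not handled here).
    """
    parts = [bos]
    for index, message in enumerate(messages):
        expected_role = "user" if index % 2 == 0 else "assistant"
        if message["role"] != expected_role:
            raise ValueError(
                f"turn {index}: expected {expected_role!r}, got {message['role']!r}"
            )
        if message["role"] == "user":
            # Space after [INST], no space before [/INST].
            parts.append(f"[INST] {message['content']}[/INST]")
        else:
            # Assistant text is preceded by one space and closed with EOS.
            parts.append(f" {message['content']}{eos}")
    return "".join(parts)


example = format_mistral_prompt(
    [
        {"role": "user", "content": "user message"},
        {"role": "assistant", "content": "assistant message"},
        {"role": "user", "content": "new user message"},
    ]
)
print(example)
```

Ending on a user turn leaves the string open after `[/INST]`, which is where the model is expected to continue.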

asdfsdfssddf

Sep 17, 2024

•

edited Sep 17, 2024

I can confirm this with the older Mistral Nemo-based models (still downloading this one; presumably it will be the same).

ddh0

Sep 18, 2024

God, I wish Mistral used a better prompt format

bartowski

Owner Sep 18, 2024

•

edited Sep 18, 2024

I just throw in what the actual tokenizer chat template compiles to, hence the <s> at the start, and I assume the Jinja will handle the rest properly, which it looks like it will?

I can't speak to whether the system prompt should get its own response; that feels like just multi-turn prompting and suggests that a system message just isn't supported.

Otherwise I see no difference between the chat template provided and the one in the AWS link.

Suparious

Sep 18, 2024

•

edited Sep 18, 2024

> God, I wish Mistral used a better prompt format

You don't need a better prompt format if you just use the model's original tokenizer.
Not sure how GGUF people handle this issue, but I was able to write a quick Python script using the transformers library to instantiate the tokenizer from here:

If you want to use the v3 tokenizer, you can use the same JSON, but with this model file instead:

https://huggingface.co/mistralai/Mistral-Small-Instruct-2409/blob/main/tokenizer.model.v3

That will allow you to never care about the prompt format.

Also, with a good inference engine you usually get both a completions endpoint (no chat template is applied, so you have to define the prompt format yourself) and a chat/completions endpoint (which applies the tokenizer's chat template, so you do not need to specify the prompt format).
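
To make the difference between the two endpoints concrete, here is a small sketch that only builds the two request bodies and does not send anything. It assumes an OpenAI-compatible server; `BASE_URL`, `MODEL`, and both helper names are placeholders invented for this example, so adjust them to whatever your engine exposes.

```python
import json

# Assumptions for illustration only: a local OpenAI-compatible server
# and a placeholder model name.
BASE_URL = "http://localhost:8000/v1"
MODEL = "Mistral-Small-Instruct-2409"


def completion_request(user_message):
    """Raw completions: the caller writes the prompt format by hand."""
    # Some servers prepend the BOS token themselves; drop "<s>" in that
    # case to avoid sending it twice.
    prompt = f"<s>[INST] {user_message}[/INST]"
    body = {"model": MODEL, "prompt": prompt, "max_tokens": 128}
    return f"{BASE_URL}/completions", body


def chat_request(user_message):
    """Chat completions: the server applies the chat template itself."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": 128,
    }
    return f"{BASE_URL}/chat/completions", body


for url, body in (completion_request("Hi"), chat_request("Hi")):
    print(url)
    print(json.dumps(body, indent=2))
```

With the chat endpoint the `[INST]` markers never appear in the request, which is why a wrong hand-written format only matters on the raw completions path.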

TouchNight

Sep 18, 2024

•

edited Sep 18, 2024

I made a Jinja2 prompt template to support sequences that do not strictly alternate user/assistant/user/assistant, by gluing consecutive messages together.

{{- '<s>' }}
{%- for message in messages %}
    {%- set prev_message = messages[loop.index0 - 1] if not loop.first else None %}
    {%- set next_message = messages[loop.index] if not loop.last else None %}

    {%- if message['role'] != 'assistant' %}
        {%- if not prev_message or prev_message['role'] == 'assistant' %}
            {{- '[INST] ' }}
        {%- endif %}
        {{- message['content'] }}
        {%- if not next_message or next_message['role'] == 'assistant' %}
            {{- '[/INST]' }}
        {%- elif message['role'] == 'system' %}
            {{- '\n\n' }}
        {%- else %}
            {{- '\n' }}
        {%- endif %}
        
    {%- elif message['role'] == 'assistant' %}
        {%- if loop.first %}
            {{- '[INST] [/INST]' }}
        {%- endif %}
        {{- ' ' + message['content'] }}
        {%- if next_message and next_message['role'] != 'assistant' %}
            {{- '</s>' }}
        {%- else %}
            {{- '</s>[INST] [/INST]' }}
        {%- endif %}
    {%- endif %}
{%- endfor %}
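
For readers who find the Jinja hard to follow, here is a plain-Python sketch that mirrors the template above branch by branch. `render_glued` is a hypothetical name, and this is one reading of the template rather than code from its author; it assumes every message is a non-empty dict with `role` and `content` keys.

```python
def render_glued(messages):
    """Python mirror of the Jinja template: consecutive non-assistant
    messages are glued into a single [INST] ... [/INST] block."""
    out = ["<s>"]
    last = len(messages) - 1
    for i, message in enumerate(messages):
        prev_message = messages[i - 1] if i > 0 else None
        next_message = messages[i + 1] if i < last else None

        if message["role"] != "assistant":
            # Open a block only at the start of a run of non-assistant turns.
            if prev_message is None or prev_message["role"] == "assistant":
                out.append("[INST] ")
            out.append(message["content"])
            if next_message is None or next_message["role"] == "assistant":
                out.append("[/INST]")   # the run ends here
            elif message["role"] == "system":
                out.append("\n\n")      # blank line after a system message
            else:
                out.append("\n")        # newline between glued user messages
        else:
            if i == 0:
                # Conversation opens with the assistant: emit an empty user turn.
                out.append("[INST] [/INST]")
            out.append(" " + message["content"])
            if next_message is not None and next_message["role"] != "assistant":
                out.append("</s>")
            else:
                # Next turn is also an assistant (or there is none):
                # insert an empty user turn, as the template does.
                out.append("</s>[INST] [/INST]")
    return "".join(out)


print(render_glued([
    {"role": "system", "content": "S"},
    {"role": "user", "content": "A"},
    {"role": "user", "content": "B"},
    {"role": "assistant", "content": "C"},
    {"role": "user", "content": "D"},
]))
```

For a strictly alternating conversation this produces the same string as the plain `[INST] ...[/INST]` format discussed in this thread.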

pandora-s

Sep 18, 2024

@vevi33
Hi there! Actually, the v3 should look more like:
<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]
For a deeper explanation: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md

vevi33

Sep 18, 2024

@pandora-s
Thank you for the clarification!

I proposed basically this, if I'm not wrong, but I corrected my post according to your link so that it is exactly the same and doesn't confuse anyone!
Thanks to everyone for being helpful and making this topic finally clear in the community!

mirek190

Sep 18, 2024

•

edited Sep 18, 2024

<s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]

For llama.cpp, the prompt template will look like this:

--in-prefix "</s>[INST] " --in-suffix "[/INST] " -p "<s>[INST] You are a helpful assistant.[/INST]"

ddh0

Sep 18, 2024

> Hi there! Actually, the v3 should look more like:
> <s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]
> For more deep explanations: https://github.com/mistralai/cookbook/blob/main/concept-deep-dive/tokenization/chat_templates.md

@pandora-s Just to clarify: what you've written here is the format one should use for Mistral-Small-Instruct-2409, right?

Danioken

Sep 18, 2024

> Hi!
> Thanks for the very quick quants. This model is really great, however apparently there is a big misunderstanding around the new Mistral prompt format. (Also it is differ from the official Mistral description as well)
>
> Here is my reddit post about it:
>
> https://www.reddit.com/r/LocalLLaMA/comments/1fjb4i5/mistralsmallinstruct2409_is_actually_really/
>
> Marinara also confirmed my theory a few weeks ago. (You can find it in the model description)
> https://huggingface.co/MarinaraSpaghetti/NemoMix-Unleashed-12B-GGUF
>
> The correct one should be:
>
> <s>[INST] user message[/INST] assistant message</s>[INST] new user message[/INST]
>
> Another source:
> https://community.aws/content/2dFNOnLVQRhyrOrMsloofnW0ckZ/how-to-prompt-mistral-ai-models-and-why
>
> I tested it with your and our version as well. Nemo and this model is way more coherent and "clever" with the suggested format.
> With yours it was broken in many of my tests. (More details in the reddit post).

I used https://huggingface.co/MarinaraSpaghetti/SillyTavern-Settings

Awesome! Thanks! It really does contribute a lot... in everything, logic, prose, immersion... incredible.

asdfsdfssddf

Sep 18, 2024

I'm using Marinara's presets too, and they make a world of difference as far as RP is concerned with Mistral models.

pandora-s

Sep 18, 2024

•

edited Sep 18, 2024

> Just to clarify: what you've written here is the format one should use for Mistral-Small-Instruct-2409, right?

@ddh0 yes, the original Small repo was fixed a few hours ago with the correct template, sorry for the trouble!
