Update README.md
README.md (CHANGED)
@@ -13,15 +13,98 @@ Mistral-Small-Instruct-2409 is an instruct fine-tuned version with the following
- Supports function calling (see the tool-calling sketch after the server example below)
- 128k sequence length

## Usage Examples

### vLLM (recommended)

We recommend using Mistral-Small-Instruct-2409 with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.

**_Installation_**

Make sure you install `vLLM >= v0.6.1.post1`:

```
pip install --upgrade vllm
```

Also make sure you have `mistral_common >= 1.4.1` installed:

```
pip install --upgrade mistral_common
```

You can also make use of a ready-to-go [docker image](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39?context=explore).

**_Offline Example_**

```py
from vllm import LLM
from vllm.sampling_params import SamplingParams

model_name = "mistralai/Mistral-Small-Instruct-2409"

# Allow long generations; the model supports a 128k context window.
sampling_params = SamplingParams(max_tokens=8192)

# Load the model using Mistral's native tokenizer, config, and weight formats.
llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")

prompt = "How often does the letter 'r' occur in 'Mistral'?"

messages = [
    {
        "role": "user",
        "content": prompt
    },
]

outputs = llm.chat(messages, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
```
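
Roughly speaking, the `tokenizer_mode="mistral"`, `config_format="mistral"`, and `load_format="mistral"` flags tell vLLM to use Mistral's own tokenizer and consolidated checkpoint format rather than the Hugging Face equivalents.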

**_Server_**

You can also use Mistral Small in a server/client setting.

1. Spin up a server:

```
vllm serve mistralai/Mistral-Small-Instruct-2409 --tokenizer_mode mistral --config_format mistral --load_format mistral
```

2. Send a request from a client:

```
curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer token' \
--data '{
    "model": "mistralai/Mistral-Small-Instruct-2409",
    "messages": [
      {
        "role": "user",
        "content": "How often does the letter r occur in Mistral?"
      }
    ]
  }'
```
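
The server speaks the OpenAI-compatible API, so function calling (listed in the features above) can be exercised over the same endpoint. The sketch below is not part of the original card: it assumes the `openai` Python package is installed, the `get_current_weather` tool is a hypothetical illustration, and depending on your vLLM version the server may need to be started with tool-call parsing enabled (e.g. `--enable-auto-tool-choice --tool-call-parser mistral`).

```py
# Hedged sketch: assumes the vLLM server from step 1 is running at
# http://<your-node-url>:8000 and accepts the dummy "token" API key.
from openai import OpenAI

client = OpenAI(base_url="http://<your-node-url>:8000/v1", api_key="token")

# Hypothetical tool definition, purely for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "Name of the city"}
                },
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-Instruct-2409",
    messages=[{"role": "user", "content": "What is the weather like in Paris today?"}],
    tools=tools,
)

# If the model chooses to call the tool, its name and JSON arguments
# are returned here instead of plain text.
print(response.choices[0].message.tool_calls)
```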

### Mistral-inference

We recommend using [mistral-inference](https://github.com/mistralai/mistral-inference) to quickly try out / "vibe-check" the model.

**_Install_**

Make sure to have `mistral_inference >= 1.4.1` installed.

```
pip install mistral_inference --upgrade
```

**_Download_**

```py
from huggingface_hub import snapshot_download