patrickvonplaten committed
Commit 7372631 · verified · 1 Parent(s): f5b0986

Update README.md

Files changed (1):
  1. README.md +87 -4
README.md CHANGED
@@ -13,15 +13,98 @@ Mistral-Small-Instruct-2409 is an instruct fine-tuned version with the following
 - Supports function calling
 - 128k sequence length
 
-## Installation
 
-It is recommended to use `mistralai/Mistral-Small-Instruct-2409` with [mistral-inference](https://github.com/mistralai/mistral-inference). For HF transformers code snippets, please keep scrolling.
 
 ```
-pip install mistral_inference
 ```
 
-## Download
 
 ```py
 from huggingface_hub import snapshot_download
 
 - Supports function calling
 - 128k sequence length
 
+
+## Usage Examples
 
+### vLLM (recommended)
+
+We recommend using Mistral-Small-Instruct-2409 with the [vLLM library](https://github.com/vllm-project/vllm)
+to implement production-ready inference pipelines.
+
+**_Installation_**
+
+Make sure you install `vLLM >= v0.6.1.post1`:
+
+```
+pip install --upgrade vllm
+```
+
+Also make sure you have `mistral_common >= 1.4.1` installed:
+
+```
+pip install --upgrade mistral_common
+```
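
If you are unsure whether your environment already meets these two minimum versions, one quick way to check (a suggestion, not part of the original card) is:

```
pip show vllm mistral_common
```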
+
+You can also make use of a ready-to-go [docker image](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39?context=explore).
+
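
The card does not show how to launch that image. As a rough sketch, the `vllm/vllm-openai` image starts the same OpenAI-compatible server as `vllm serve` below; the port mapping, cache mount, and token placeholder are assumptions to adjust to your setup:

```
docker run --gpus all -p 8000:8000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -e "HUGGING_FACE_HUB_TOKEN=<your-hf-token>" \
    vllm/vllm-openai:latest \
    --model mistralai/Mistral-Small-Instruct-2409 \
    --tokenizer_mode mistral --config_format mistral --load_format mistral
```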
+
+**_Offline Example_**
+
+```py
+from vllm import LLM
+from vllm.sampling_params import SamplingParams
+
+model_name = "mistralai/Mistral-Small-Instruct-2409"
+
+sampling_params = SamplingParams(max_tokens=8192)
+
+llm = LLM(model=model_name, tokenizer_mode="mistral", config_format="mistral", load_format="mistral")
+
+prompt = "How often does the letter 'r' occur in 'Mistral'?"
+
+messages = [
+    {
+        "role": "user",
+        "content": prompt
+    },
+]
+
+outputs = llm.chat(messages, sampling_params=sampling_params)
+
+print(outputs[0].outputs[0].text)
+```
+
+**_Server_**
+
+You can also use Mistral Small in a server/client setting.
+
+1. Spin up a server:
+
+```
+vllm serve mistralai/Mistral-Small-Instruct-2409 --tokenizer_mode mistral --config_format mistral --load_format mistral
+```
+
+2. Then ping it from a client:
+
+```
+curl --location 'http://<your-node-url>:8000/v1/chat/completions' \
+--header 'Content-Type: application/json' \
+--header 'Authorization: Bearer token' \
+--data '{
+    "model": "mistralai/Mistral-Small-Instruct-2409",
+    "messages": [
+      {
+        "role": "user",
+        "content": "How often does the letter r occur in Mistral?"
+      }
+    ]
+  }'
+```
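
Since the server exposes the OpenAI-compatible API, the curl call above can also be reproduced from Python. This is a minimal sketch, not from the original card: it assumes the server from step 1 is reachable at `http://<your-node-url>:8000` and uses the `openai` package (`pip install openai`); the API key is a dummy value, mirroring the placeholder bearer token above (vLLM does not check it by default).

```py
from openai import OpenAI

# Point the client at the vLLM server started in step 1.
client = OpenAI(base_url="http://<your-node-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Mistral-Small-Instruct-2409",
    messages=[{"role": "user", "content": "How often does the letter r occur in Mistral?"}],
)

print(response.choices[0].message.content)
```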
+
+### Mistral-inference
+
+We recommend using [mistral-inference](https://github.com/mistralai/mistral-inference) to quickly try out / "vibe-check" the model.
+
+
+**_Install_**
+
+Make sure to have `mistral_inference >= 1.4.1` installed.
 
 ```
+pip install mistral_inference --upgrade
 ```
 
+**_Download_**
 
 ```py
 from huggingface_hub import snapshot_download