Update README.md

README.md
@@ -16,7 +16,7 @@ Leanstral consists of the following architectural choices:

- MoE: 128 experts, with 4 active per token (sketched below).
- 119B total parameters, with 6.5B activated per token.
- 200k Context Length.
- Multimodal Input: Accepts both text and image input, with text output.
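For intuition, the sketch below (hypothetical code, not the model's actual implementation) shows what top-4-of-128 routing means: a router scores all experts for each token, and only the four highest-scoring expert FFNs are executed, which is how a 119B-parameter model activates only ~6.5B parameters per token.

```py
import numpy as np

NUM_EXPERTS = 128  # total experts in each MoE layer
TOP_K = 4          # experts activated per token

def route(hidden, router_w):
    """Score every expert for one token and keep only the top-k."""
    logits = router_w @ hidden             # (128,) one router score per expert
    top = np.argsort(logits)[-TOP_K:]      # indices of the 4 highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()               # softmax over the selected experts only
    return list(zip(top.tolist(), weights.tolist()))

# Only the 4 selected expert FFNs run for this token; the other 124 are skipped.
rng = np.random.default_rng(0)
print(route(rng.standard_normal(64), rng.standard_normal((NUM_EXPERTS, 64))))
```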

Leanstral offers the following capabilities:

@@ -29,6 +29,399 @@ Leanstral offers the following capabilities:

- **Apache 2.0 License**: Open-source license allowing usage and modification for both commercial and non-commercial purposes.
- **Large Context Window**: Supports a 256k context window.

## Usage

### Scaffolding

We recommend using `Leanstral 119B A6B` with [Mistral Vibe](https://github.com/mistralai/mistral-vibe).

... TODO

### Local Deployment

The model can also be deployed with the following libraries. If local serving turns out to be subpar, we advise using the Mistral AI API instead:

- [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm-recommended)
- [`transformers`](https://github.com/huggingface/transformers): WIP ⏳ - follow updates on this PR: ....

#### vLLM (recommended)

<details>
<summary>Expand</summary>

We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
to implement production-ready inference pipelines.

**_Installation_**

Please make sure to install the latest vLLM:

```
uv pip install -U vllm
```

Alternatively, you can directly use the latest docker image [vllm/vllm-openai:latest](https://hub.docker.com/layers/vllm/vllm-openai/latest/):

```
docker pull vllm/vllm-openai:latest
docker run -it vllm/vllm-openai:latest
```
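In practice you will usually also want to pass GPUs through, publish the API port, and point the container at the model. A sketch using standard Docker and vLLM options (the exact flags here are assumptions, adjust to your hardware):

```
docker run --gpus all -p 8000:8000 \
    vllm/vllm-openai:latest \
    --model mistralai/Leanstral-2603 --tensor-parallel-size 4
```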

Also make sure that [`mistral_common >= 1.8.6`](https://github.com/mistralai/mistral-common/releases/tag/v1.8.6) is installed.
To check:

```
python -c "import mistral_common; print(mistral_common.__version__)"
```
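If the import fails or the version is too old, upgrading should be a one-liner (assuming the `mistral-common` package name on PyPI):

```
uv pip install -U "mistral-common>=1.8.6"
```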

**_Launch server_**

We recommend that you use Leanstral in a server/client setting.

1. Spin up a server:

```
vllm serve mistralai/Leanstral-2603 \
    --max-model-len 200000 \
    --tensor-parallel-size 4 \
    --attention-backend FLASH_ATTN_MLA \
    --tool-call-parser mistral \
    --enable-auto-tool-choice \
    --reasoning-parser mistral
```
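Once the server reports it is ready, a quick sanity check is to list the served models through the OpenAI-compatible endpoint (assuming vLLM's default port 8000):

```
curl http://localhost:8000/v1/models
```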

2. To query the server, you can use a simple Python snippet:

```py
import requests
import json


url = "http://<your-hostname>/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Leanstral-2603"

prompt = """Define the transition rules as an inductive proposition.

This choice provides better support for proving properties about valid transitions and is generally more natural for modeling state machines in Lean, where you want to express logical rules rather than just computing a yes/no value for each possible transition."""

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": prompt,
            },
        ],
    },
]

# or switch to 'reasoning_effort': "none" for faster answers
data = {"model": model, "messages": messages, "temperature": 1.0, "reasoning_effort": "high"}

# Leanstral supports tool calling. If you want to use tools, follow this:
# tools = [  # Define tools for vLLM
#     {
#         "type": "function",
#         "function": {
#             "name": "git_clone",
#             "description": "Clone a git repository",
#             "parameters": {
#                 "type": "object",
#                 "properties": {
#                     "url": {
#                         "type": "string",
#                         "description": "The url of the git repository",
#                     },
#                 },
#                 "required": ["url"],
#             },
#         },
#     }
# ]
# data = {"model": model, "messages": messages, "temperature": 0.15, "tools": tools}  # Pass tools to payload.

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```
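If you do pass `tools`, the returned assistant message may carry `tool_calls` instead of plain `content`. A minimal sketch for handling both cases, reusing `response` and `json` from the snippet above and assuming the standard OpenAI-compatible response schema:

```py
message = response.json()["choices"][0]["message"]

if message.get("tool_calls"):
    # The model requested a tool invocation; arguments arrive as a JSON string.
    for call in message["tool_calls"]:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])
        print(f"tool call requested: {name}({args})")
else:
    print(message["content"])
```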
</details>

#### SGLang

<details>
<summary>Expand</summary>

To use this model with [SGLang](https://github.com/sgl-project/sglang) to implement a production-ready inference pipeline (OpenAI-compatible API server),
see the following sections.

**_Installation_**

Install SGLang from source (tracking the latest `main` locally):

```
git clone https://github.com/sgl-project/sglang.git
cd sglang
uv pip install -e python
uv pip install transformers==5.0.0rc  # required
```

**_Launch server_**

We recommend that you use Devstral in a server/client setting.

1. Spin up a server:

```
python -m sglang.launch_server --model-path mistralai/Devstral-2-123B-Instruct-2512 --host 0.0.0.0 --port 30000 --tp 8 --tool-call-parser mistral
```

2. To query the server, you can use a simple Python snippet:

```py
import requests
import json
from huggingface_hub import hf_hub_download


url = "http://<your-server-url>:30000/v1/chat/completions"
headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}

model = "mistralai/Devstral-2-123B-Instruct-2512"

def load_system_prompt(repo_id: str, filename: str) -> str:
    file_path = hf_hub_download(repo_id=repo_id, filename=filename)
    with open(file_path, "r") as file:
        system_prompt = file.read()
    return system_prompt

SYSTEM_PROMPT = load_system_prompt(model, "CHAT_SYSTEM_PROMPT.txt")

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "<your-command>",
            },
        ],
    },
]

data = {"model": model, "messages": messages, "temperature": 0.15}

# Devstral 2 supports tool calling. If you want to use tools, follow this:
# tools = [  # Define tools (OpenAI-compatible)
#     {
#         "type": "function",
#         "function": {
#             "name": "git_clone",
#             "description": "Clone a git repository",
#             "parameters": {
#                 "type": "object",
#                 "properties": {
#                     "url": {
#                         "type": "string",
#                         "description": "The url of the git repository",
#                     },
#                 },
#                 "required": ["url"],
#             },
#         },
#     }
# ]
# data = {"model": model, "messages": messages, "temperature": 0.15, "tools": tools}  # Pass tools to payload.

response = requests.post(url, headers=headers, data=json.dumps(data))
print(response.json()["choices"][0]["message"]["content"])
```
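The same request can also be issued straight from the shell for a quick smoke test (same host and port assumptions as above, without the downloaded system prompt):

```
curl http://<your-server-url>:30000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer token" \
    -d '{"model": "mistralai/Devstral-2-123B-Instruct-2512", "messages": [{"role": "user", "content": "<your-command>"}], "temperature": 0.15}'
```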
</details>

#### Transformers

<details>
<summary>Expand</summary>

Make sure to install `transformers` from main:

```sh
uv pip install git+https://github.com/huggingface/transformers
```

And run the following code snippet:

````python
from transformers import (
    MistralForCausalLM,
    MistralCommonBackend,
)

model_id = "mistralai/Devstral-2-123B-Instruct-2512"

tokenizer = MistralCommonBackend.from_pretrained(model_id)
model = MistralForCausalLM.from_pretrained(model_id, device_map="auto")

SP = """You are operating as and within Mistral Vibe, a CLI coding-agent built by Mistral AI and powered by default by the Devstral family of models. It wraps Mistral's Devstral models to enable natural language interaction with a local codebase. Use the available tools when helpful.

You can:

- Receive user prompts, project context, and files.
- Send responses and emit function calls (e.g., shell commands, code edits).
- Apply patches, run commands, based on user approvals.

Answer the user's request using the relevant tool(s), if they are available. Check that all the required parameters for each tool call are provided or can reasonably be inferred from context. IF there are no relevant tools or there are missing values for required parameters, ask the user to supply these values; otherwise proceed with the tool calls. If the user provides a specific value for a parameter (for example provided in quotes), make sure to use that value EXACTLY. DO NOT make up values for or ask about optional parameters. Carefully analyze descriptive terms in the request as they may indicate required parameter values that should be included even if not explicitly quoted.

Always try your hardest to use the tools to answer the user's request. If you can't use the tools, explain why and ask the user for more information.

Act as an agentic assistant: if a user asks for a long task, break it down and do it step by step.

When you want to commit changes, you will always use the 'git commit' bash command. It will always be suffixed with a line telling it was generated by Mistral Vibe with the appropriate co-authoring information. The format you will always use is the following heredoc.

```bash
git commit -m "<Commit message here>

Generated by Mistral Vibe.
Co-Authored-By: Mistral Vibe <vibe@mistral.ai>"
```"""

payload = {
    "messages": [
        {
            "role": "system",
            "content": SP,
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Can you implement in Python a method to compute the Fibonacci sequence at the `n`th element with `n` a parameter passed to the function? You should start the sequence from 1, previous values are invalid.\nThen run the Python code for the function for n=5 and give the answer.",
                }
            ],
        },
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "add_number",
                "description": "Add two numbers.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "a": {"type": "string", "description": "The first number."},
                        "b": {"type": "string", "description": "The second number."},
                    },
                    "required": ["a", "b"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "multiply_number",
                "description": "Multiply two numbers.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "a": {"type": "string", "description": "The first number."},
                        "b": {"type": "string", "description": "The second number."},
                    },
                    "required": ["a", "b"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "substract_number",
                "description": "Subtract two numbers.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "a": {"type": "string", "description": "The first number."},
                        "b": {"type": "string", "description": "The second number."},
                    },
                    "required": ["a", "b"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "write_a_story",
                "description": "Write a story about science fiction and people with badass laser sabers.",
                "parameters": {},
            },
        },
        {
            "type": "function",
            "function": {
                "name": "terminal",
                "description": "Perform operations from the terminal.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "command": {
                            "type": "string",
                            "description": "The command you wish to launch, e.g `ls`, `rm`, ...",
                        },
                        "args": {
                            "type": "string",
                            "description": "The arguments to pass to the command.",
                        },
                    },
                    "required": ["command"],
                },
            },
        },
        {
            "type": "function",
            "function": {
                "name": "python",
                "description": "Call a Python interpreter with some Python code that will be run.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "code": {
                            "type": "string",
                            "description": "The Python code to run",
                        },
                        "result_variable": {
                            "type": "string",
                            "description": "Variable containing the result you'd like to retrieve from the execution.",
                        },
                    },
                    "required": ["code", "result_variable"],
                },
            },
        },
    ],
}

tokenized = tokenizer.apply_chat_template(
    conversation=payload["messages"],
    tools=payload["tools"],
    return_tensors="pt",
    return_dict=True,
)

input_ids = tokenized["input_ids"].to(device="cuda")

output = model.generate(
    input_ids,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.15,
)[0]

# Decode only the newly generated tokens (skip the prompt).
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
````

</details>