Update README.md

[...]
- sr
- sv
- tr
tags:
- vLLM
---

# Simsema Small 4 119B A6B

Simsema Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three model families, **Instruct**, **Reasoning** (previously called Magistral), and **Devstral**, into a single model.

With its multimodal capabilities, efficient architecture, and flexible mode switching, it is a powerful general-purpose model for a wide range of tasks. In a latency-optimized setup it achieves a **40% reduction in end-to-end completion time**, and in a throughput-optimized setup it handles **3x more requests per second** compared to Mistral Small 3.

[...]

## Key Features

Simsema Small 4 includes the following architectural choices:

- **MoE**: 128 experts, 4 active.
- **119B parameters**, with **6.5B activated per token**.
- **Multimodal input**: Accepts both text and image input, with text output.
- **Instruct and Reasoning functionalities** with function calls (reasoning effort configurable per request).
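
The MoE figures above mean each token is routed to only 4 of the 128 experts, which is why just ~6.5B of the 119B parameters are active per token. The routing idea can be sketched as follows; this is an illustrative top-k gating example, not the model's actual implementation:

```python
import math
import random

NUM_EXPERTS = 128  # total experts, from the model card
TOP_K = 4          # experts activated per token

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, top_k=TOP_K):
    """Select the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}  # expert index -> gate weight

# Toy router logits for a single token:
random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = route(token_logits)
assert len(gates) == TOP_K
assert abs(sum(gates.values()) - 1.0) < 1e-9
```

The token's output is then the gate-weighted sum of the four selected experts' outputs, so compute per token scales with the active parameters, not the total.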

Simsema Small 4 offers the following capabilities:

- **Reasoning Mode**: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
- **Vision**: Analyzes images and provides insights based on visual content, in addition to text.

[...]

## Use Cases

Simsema Small 4 is designed for general chat assistants, coding, agentic tasks, and reasoning tasks (with reasoning mode toggled). Its multimodal capabilities also enable document and image understanding for data extraction and analysis.

Its capabilities are ideal for:
- Developers interested in coding and agentic capabilities for SWE automation and codebase exploration.
- Enterprises seeking general chat assistants, agents, and document understanding.
- Researchers leveraging its math and research capabilities.

Simsema Small 4 is also well-suited for customization and fine-tuning for more specialized tasks.

### Examples
- General chat assistant

[...]

### Comparison with other models

Simsema Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across all three benchmarks while generating significantly shorter outputs. On AA LCR, it scores **0.72** with just **1.6K characters**, whereas Qwen models require **3.5-4x more output** (5.8-6.1K characters) for comparable performance. On LiveCodeBench, it outperforms GPT-OSS 120B while producing **20% less output**. This efficiency reduces latency and inference costs, and improves user experience.
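
The 3.5-4x figure follows directly from the output lengths reported above:

```python
# Output lengths on AA LCR, taken from the comparison above
small4_chars = 1_600           # Simsema/Mistral Small 4
qwen_chars = (5_800, 6_100)    # reported Qwen output range

ratios = [q / small4_chars for q in qwen_chars]  # 3.625 and 3.8125
assert all(3.5 <= r <= 4.0 for r in ratios)
```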

[...]

<details>
<summary>Instruction Following</summary>

Simsema Small 4 can follow your instructions to the letter.

```python