Simsema committed
Commit 69b9f5c · verified · 1 Parent(s): 1b8e9c6

Update README.md

Files changed (1)
  1. README.md +8 -12
README.md CHANGED
@@ -21,17 +21,13 @@ language:
  - sr
  - sv
  - tr
- - uk
- - vi
- - hi
- - bn
  tags:
  - vLLM
  ---

- # Mistral Small 4 119B A6B

- Mistral Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families—**Instruct**, **Reasoning** (previously called Magistral), and **Devstral**—into a single, unified model.

  With its multimodal capabilities, efficient architecture, and flexible mode switching, it is a powerful general-purpose model for any task. In a latency-optimized setup, Mistral Small 4 achieves a **40% reduction in end-to-end completion time**, and in a throughput-optimized setup, it handles **3x more requests per second** compared to Mistral Small 3.
@@ -41,7 +37,7 @@ To further improve efficiency you can either take advantages of:

  ## Key Features

- Mistral Small 4 includes the following architectural choices:

  - **MoE**: 128 experts, 4 active.
  - **119B parameters**, with **6.5B activated per token**.
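The MoE figures quoted in the hunk above are easy to sanity-check; a quick sketch (numbers taken from the diff, the helper function is ours, not part of the model card):

```python
# Sanity check of the MoE figures quoted in the README diff:
# 119B total parameters, 6.5B activated per token, 128 experts with 4 active.
def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters activated per token."""
    return active_b / total_b

frac = active_fraction(119.0, 6.5)
print(f"{frac:.1%} of parameters active per token")  # roughly 5.5%

expert_frac = 4 / 128
print(f"{expert_frac:.1%} of experts active per token")  # 3.1%
```

So per token the model runs only about a twentieth of its weights, which is what makes the dense-quality/sparse-cost trade-off of the architecture work.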
@@ -49,7 +45,7 @@ Mistral Small 4 includes the following architectural choices:
  - **Multimodal input**: Accepts both text and image input, with text output.
  - **Instruct and Reasoning functionalities** with function calls (reasoning effort configurable per request).

- Mistral Small 4 offers the following capabilities:

  - **Reasoning Mode**: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
  - **Vision**: Analyzes images and provides insights based on visual content, in addition to text.
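The per-request reasoning toggle mentioned above would be exercised through the vLLM OpenAI-compatible server that the model card's `vllm serve` command starts. A minimal sketch, assuming the server listens on localhost:8000 and that the toggle is passed through `chat_template_kwargs`; the key name `enable_thinking` is illustrative and not confirmed by this diff:

```python
# Sketch: building a /v1/chat/completions payload for a vLLM server
# (as started by `vllm serve mistralai/Mistral-Small-4-119B-2603 ...`).
# The reasoning switch below is an assumption -- the exact per-request
# toggle mechanism is not documented in this diff.
def build_chat_request(prompt: str, reasoning: bool) -> dict:
    """Assemble a chat-completions payload with a hypothetical reasoning flag."""
    return {
        "model": "mistralai/Mistral-Small-4-119B-2603",
        "messages": [{"role": "user", "content": prompt}],
        # vLLM forwards chat_template_kwargs to the chat template; the key
        # "enable_thinking" here is illustrative, not confirmed for this model.
        "chat_template_kwargs": {"enable_thinking": reasoning},
    }

payload = build_chat_request("Plan a refactor of this module.", reasoning=True)
# To send: POST the payload to http://localhost:8000/v1/chat/completions
```

Flipping `reasoning` back to `False` would request the fast instant-reply mode for latency-sensitive calls.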
@@ -70,14 +66,14 @@ Mistral Small 4 offers the following capabilities:

  ## Use Cases

- Mistral Small 4 is designed for general chat assistants, coding, agentic tasks, and reasoning tasks (with reasoning mode toggled). Its multimodal capabilities also enable document and image understanding for data extraction and analysis.

  Its capabilities are ideal for:
  - Developers interested in coding and agentic capabilities for SWE automation and codebase exploration.
  - Enterprises seeking general chat assistants, agents, and document understanding.
  - Researchers leveraging its math and research capabilities.

- Mistral Small 4 is also well-suited for customization and fine-tuning for more specialized tasks.

  ### Examples
  - General chat assistant
@@ -104,7 +100,7 @@ Depending on your tasks you can trigger reasoning thanks to the support of the *

  ### Comparison with other models

- Mistral Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across all three benchmarks while generating significantly
  shorter outputs. On AA LCR, Mistral Small 4 scores **0.72** with just **1.6K characters**, whereas Qwen models require **3.5-4x more output** (5.8-6.1K)
  for comparable performance. On LiveCodeBench, Mistral Small 4 outperforms GPT-OSS 120B while producing **20% less output**.
  This efficiency reduces latency and inference costs, and improves user experience.
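The output-length ratios quoted in this hunk can be cross-checked against the raw character counts (1.6K versus 5.8-6.1K); a quick sketch using only the numbers from the diff:

```python
# Cross-check the quoted "3.5-4x more output" claim against the raw counts:
# Mistral Small 4 emits ~1.6K characters on AA LCR, Qwen models ~5.8-6.1K.
ours = 1.6
qwen_low, qwen_high = 5.8, 6.1

ratios = (qwen_low / ours, qwen_high / ours)
print(f"Qwen output is {ratios[0]:.2f}x-{ratios[1]:.2f}x larger")  # ~3.6x-3.8x
```

The computed 3.6x-3.8x range is consistent with the rounded "3.5-4x" figure in the text.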
@@ -185,7 +181,7 @@ vllm serve mistralai/Mistral-Small-4-119B-2603 --max-model-len 262144 --tensor-p
  <details>
  <summary>Instruction Following</summary>

- Mistral Small 4 can follow your instructions to the letter.


  ```python
 
  - sr
  - sv
  - tr
  tags:
  - vLLM
  ---

+ # Simsema Small 4 119B A6B

+ Simsema Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three different model families—**Instruct**, **Reasoning** (previously called Magistral), and **Devstral**—into a single, unified model.

  With its multimodal capabilities, efficient architecture, and flexible mode switching, it is a powerful general-purpose model for any task. In a latency-optimized setup, Mistral Small 4 achieves a **40% reduction in end-to-end completion time**, and in a throughput-optimized setup, it handles **3x more requests per second** compared to Mistral Small 3.

  ## Key Features

+ Simsema Small 4 includes the following architectural choices:

  - **MoE**: 128 experts, 4 active.
  - **119B parameters**, with **6.5B activated per token**.

  - **Multimodal input**: Accepts both text and image input, with text output.
  - **Instruct and Reasoning functionalities** with function calls (reasoning effort configurable per request).

+ Simsema Small 4 offers the following capabilities:

  - **Reasoning Mode**: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
  - **Vision**: Analyzes images and provides insights based on visual content, in addition to text.

  ## Use Cases

+ Simsema Small 4 is designed for general chat assistants, coding, agentic tasks, and reasoning tasks (with reasoning mode toggled). Its multimodal capabilities also enable document and image understanding for data extraction and analysis.

  Its capabilities are ideal for:
  - Developers interested in coding and agentic capabilities for SWE automation and codebase exploration.
  - Enterprises seeking general chat assistants, agents, and document understanding.
  - Researchers leveraging its math and research capabilities.

+ Simsema Small 4 is also well-suited for customization and fine-tuning for more specialized tasks.

  ### Examples
  - General chat assistant

  ### Comparison with other models

+ Simsema Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across all three benchmarks while generating significantly
  shorter outputs. On AA LCR, Mistral Small 4 scores **0.72** with just **1.6K characters**, whereas Qwen models require **3.5-4x more output** (5.8-6.1K)
  for comparable performance. On LiveCodeBench, Mistral Small 4 outperforms GPT-OSS 120B while producing **20% less output**.
  This efficiency reduces latency, inference costs, and improves user experience.

  <details>
  <summary>Instruction Following</summary>

+ Simsema Small 4 can follow your instructions to the letter.


  ```python