Update README.md

[...]
- sr
- sv
- tr
tags:
- vLLM
---

# Simsema Small 4 119B A6B

Simsema Small 4 is a powerful hybrid model capable of acting as both a general instruction model and a reasoning model. It unifies the capabilities of three model families, **Instruct**, **Reasoning** (previously called Magistral), and **Devstral**, into a single model.

With its multimodal capabilities, efficient architecture, and flexible mode switching, it is a powerful general-purpose model for a wide range of tasks. In a latency-optimized setup it achieves a **40% reduction in end-to-end completion time**, and in a throughput-optimized setup it handles **3x more requests per second** compared to Mistral Small 3.

[...]

## Key Features

Simsema Small 4 includes the following architectural choices:

- **MoE**: 128 experts, 4 active.
- **119B parameters**, with **6.5B activated per token**.
- **Multimodal input**: Accepts both text and image input, with text output.
- **Instruct and Reasoning functionalities** with function calls (reasoning effort configurable per request).
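
The MoE figures above mean each token is routed to only 4 of the 128 experts, which is why just ~6.5B of the 119B parameters are active per token. The routing idea can be sketched as follows; this is an illustrative top-k gating example, not the model's actual implementation:

```python
import math
import random

NUM_EXPERTS = 128  # total experts, from the model card
TOP_K = 4          # experts activated per token

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, top_k=TOP_K):
    """Select the top-k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return {i: probs[i] / norm for i in top}  # expert index -> gate weight

# Toy router logits for a single token:
random.seed(0)
token_logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
gates = route(token_logits)
assert len(gates) == TOP_K
assert abs(sum(gates.values()) - 1.0) < 1e-9
```

The token's output is then the gate-weighted sum of the four selected experts' outputs, so compute per token scales with the active parameters, not the total.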

Simsema Small 4 offers the following capabilities:

- **Reasoning Mode**: Toggle between fast instant reply mode and reasoning mode, boosting performance with test-time compute when requested.
- **Vision**: Analyzes images and provides insights based on visual content, in addition to text.

[...]

## Use Cases

Simsema Small 4 is designed for general chat assistants, coding, agentic tasks, and reasoning tasks (with reasoning mode toggled). Its multimodal capabilities also enable document and image understanding for data extraction and analysis.

Its capabilities are ideal for:
- Developers interested in coding and agentic capabilities for SWE automation and codebase exploration.
- Enterprises seeking general chat assistants, agents, and document understanding.
- Researchers leveraging its math and research capabilities.

Simsema Small 4 is also well-suited for customization and fine-tuning for more specialized tasks.

### Examples
- General chat assistant

[...]

### Comparison with other models

Simsema Small 4 with reasoning achieves competitive scores, matching or surpassing GPT-OSS 120B across all three benchmarks while generating significantly shorter outputs. On AA LCR, it scores **0.72** with just **1.6K characters**, whereas Qwen models require **3.5-4x more output** (5.8-6.1K characters) for comparable performance. On LiveCodeBench, it outperforms GPT-OSS 120B while producing **20% less output**. This efficiency reduces latency and inference costs, and improves user experience.
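
The 3.5-4x figure follows directly from the output lengths reported above:

```python
# Output lengths on AA LCR, taken from the comparison above
small4_chars = 1_600           # Simsema/Mistral Small 4
qwen_chars = (5_800, 6_100)    # reported Qwen output range

ratios = [q / small4_chars for q in qwen_chars]  # 3.625 and 3.8125
assert all(3.5 <= r <= 4.0 for r in ratios)
```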

[...]

<details>
<summary>Instruction Following</summary>

Simsema Small 4 can follow your instructions to the letter.

```python