arXiv:2601.08584

Ministral 3

Published on Jan 13

Submitted by taesiri on Jan 14

Mistral AI

Authors: Sandeep Subramanian, Victor Jouault, Adrien Sadé, Alan Jeffares, Alexandre Sablayrolles, Amos You, Andy Ehrenberg, Antonia Calvi, Avinash Sooriyarachchi, Baptiste Bout, Clémence Lanfranchi

Abstract

AI-generated summary: The Ministral 3 series comprises parameter-efficient dense language models in three sizes (3B, 8B, 14B), each released in three variants and trained with Cascade Distillation for compute-constrained applications.

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, a technique that iterates pruning and continued training with distillation. Each model comes with image understanding capabilities, and all are released under the Apache 2.0 license.
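
The abstract names the two ingredients of Cascade Distillation (pruning, then continued training with distillation) without giving details. The following is a minimal sketch of those ideas in plain Python, not the authors' recipe. It assumes a standard forward-KL distillation loss on per-token logits and assumes that each smaller model is pruned from the one before it. The function names (`distillation_loss`, `cascade_plan`), the temperature parameter, and the `"parent"` label for the starting model are all hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_total = math.log(sum(math.exp(x - m) for x in scaled))
    return [x - m - log_total for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token position.

    Assumption: a generic distillation objective, not necessarily
    the loss used for Ministral 3.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def cascade_plan(start, target_sizes):
    """Return (source, pruned) pairs for an iterative cascade.

    Assumption: each target is pruned from the previously produced
    model, then trained further with distillation before the next step.
    """
    plan = []
    source = start
    for target in target_sizes:
        plan.append((source, target))
        source = target
    return plan


# Hypothetical starting model label; the sizes are the ones in the abstract.
steps = cascade_plan("parent", ["14B", "8B", "3B"])
for source, target in steps:
    print(f"prune {source} -> {target}, then continue training with distillation")
```

The loss is zero when student and teacher logits agree and grows as the student's distribution drifts from the teacher's, which is the signal continued training would minimize after each pruning step.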