Mistral-Small-3.2-24B-Instruct-2506-MLX-8bit

An 8-bit MLX quantization of Mistral Small 3.2 24B Instruct 2506, for running the model on Apple silicon with mlx-lm.

Quantization

Method: Q8 (8-bit integer quantization)
Bits per weight: 8
Details: Uniform 8-bit integer quantization with group size 64.
Converted with: mlx-lm
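
To make "8-bit with group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is an illustration under stated assumptions, not the mlx-lm implementation: the function names are hypothetical, the weights are a flat list of floats, and each group is assumed to keep its own scale and bias (minimum value).

```python
import math


def quantize_group(weights, bits=8):
    """Map one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1                      # 255 for 8 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels or 1.0       # constant group: avoid dividing by zero
    codes = [round((w - w_min) / scale) for w in weights]
    return codes, scale, w_min                    # w_min acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Recover approximate floats from the integer codes."""
    return [c * scale + bias for c in codes]


def quantize(weights, group_size=64, bits=8):
    """Quantize a flat list of weights group by group (hypothetical helper)."""
    if len(weights) % group_size:
        raise ValueError("number of weights must be a multiple of group_size")
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]


# Toy data standing in for one row of a weight matrix (assumption).
toy_weights = [math.sin(i) for i in range(128)]
toy_groups = quantize(toy_weights)                # two groups of 64 weights
```

Because every group has its own scale, the rounding error of any weight is bounded by half of that group's scale, which is why smaller groups track outliers better at the cost of more stored scales and biases.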

Usage

pip install mlx-lm

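from mlx_lm import load, generate

# Download (or reuse the cached copy of) the quantized weights and tokenizer.
model, tokenizer = load("alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-8bit")

# Wrap the user message in the model's chat template and return it as text.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)

# Generate up to 512 new tokens and print the reply.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)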

Base Model

Model: Mistral Small 3.2 24B Instruct 2506
Parameters: 24B
Architecture: Mistral Small 3.2
License: Apache 2.0
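
As a rough guide to how much memory the weights need, the arithmetic below combines the 24B parameter count with the 8-bit, group-size-64 settings listed under Quantization. It is a back-of-the-envelope sketch: the helper names are hypothetical, it assumes each group of 64 weights also stores a 16-bit scale and a 16-bit bias, it treats 24B as an exact count, and it ignores any unquantized layers, the KV cache, and other runtime overhead.

```python
def effective_bits_per_weight(bits=8, group_size=64, scale_bits=16, bias_bits=16):
    """Bits per weight once per-group scale and bias are amortized (assumed layout)."""
    return bits + (scale_bits + bias_bits) / group_size


def weight_memory_gb(num_params, bits_per_weight):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


bpw = effective_bits_per_weight()                 # 8 + 32/64 = 8.5
print(f"{bpw} bits per weight, about {weight_memory_gb(24e9, bpw):.1f} GB of weights")
```

Under these assumptions the weights alone come to roughly 25.5 GB, so the machine needs noticeably more unified memory than that to leave room for activations and the KV cache.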

Model tree

Base model: mistralai/Mistral-Small-3.1-24B-Base-2503
Finetuned: mistralai/Mistral-Small-3.2-24B-Instruct-2506
Quantized (64): this model