# Apertus-8B-Instruct-2509-bnb-8bit

This is an INT8 dynamically quantized version of swiss-ai/Apertus-8B-Instruct-2509, produced with llm-compressor.

The fineweb-edu-score-2 dataset was used for calibration.

## Quantization Details

- **Quantization Scheme:** W8A8
- **Method:** Dynamic quantization of weights and activations to INT8 (W8A8) format
- **Targets:** All Linear layers
- **Ignored Layers:** lm_head (kept in higher precision for better output quality)
- **Tool:** llm-compressor
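The core idea behind the W8A8 scheme above can be illustrated with the standard symmetric INT8 mapping: each tensor is scaled so its largest absolute value lands at 127, rounded to integers, and later rescaled back. The sketch below is illustrative only; the function names are made up for this example and are not the llm-compressor API.

```python
# Hedged sketch of symmetric per-tensor INT8 quantization, the basic
# operation underlying W8A8 schemes. Not the llm-compressor API.

def int8_quantize(values):
    """Map floats to INT8 range [-127, 127] with a symmetric scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def int8_dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = int8_quantize(weights)
recon = int8_dequantize(q, scale)
# Rounding error per element is bounded by half a quantization step.
max_err = max(abs(a - b) for a, b in zip(weights, recon))
```

"Dynamic" in the method description means the activation scales are computed on the fly per input at inference time, rather than fixed ahead of time from the calibration set.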