MiniCPM-o 4.5 — INT8 (bitsandbytes)

8-bit bitsandbytes quantization of openbmb/MiniCPM-o-4_5, with selective module skipping to preserve audio and vision quality.

Highlights

  • Full multimodal capability preserved: text, vision, audio input, TTS output, voice cloning
  • Only LLM transformer layers are quantized — audio encoder (Whisper), vision encoder (SigLIP), TTS decoder, and projection layers remain in bf16
  • TTS weight_norm layers are explicitly skipped (they crash under bitsandbytes quantization)
  • Tested with 293 unit + integration tests — all passing
  • Benchmark-validated: text quality identical to bf16, audio quality within natural variation
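Why `weight_norm` layers trip bitsandbytes: the reparametrization splits a layer's weight into magnitude (`weight_g`) and direction (`weight_v`) parameters, so the layer no longer exposes a plain `weight` Parameter for the 8-bit linear replacement to quantize. A minimal illustration in plain PyTorch (not the actual TTS module):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

# weight_norm reparametrizes .weight as weight_g (magnitude) times the
# normalized weight_v (direction); .weight becomes a derived tensor
# recomputed each forward pass, not an nn.Parameter.
layer = weight_norm(nn.Linear(8, 8))

print(isinstance(layer.weight_g, nn.Parameter))  # True
print(isinstance(layer.weight, nn.Parameter))    # False: derived tensor
```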

VRAM Requirements

| Precision | VRAM (loaded) | Peak VRAM | Load time |
|---|---|---|---|
| bf16 (baseline) | 21.0 GB | 21.3 GB | 16.3 s |
| 8-bit (this repo) | 14.5 GB | 14.8 GB | 20.5 s |

Benchmarked on NVIDIA RTX PRO 6000 Blackwell (96 GB). Your load times will vary.
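The table implies a saving of roughly a third of the loaded-model footprint; a quick check with the numbers copied from the table above:

```python
# VRAM figures from the table above (GB, loaded model).
bf16_gb = 21.0
int8_gb = 14.5

savings_gb = bf16_gb - int8_gb
savings_pct = 100.0 * savings_gb / bf16_gb
print(f"saved {savings_gb:.1f} GB ({savings_pct:.0f}%)")  # saved 6.5 GB (31%)
```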

Quick Start

```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "ericleigh007/MiniCPM-o-4_5-BNB-Int8"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# The 8-bit quantization config ships with the repo (config.json), so no
# BitsAndBytesConfig needs to be passed at load time.
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    init_vision=True,
    init_audio=True,
    init_tts=True,
)
model.eval()
model.init_tts()  # initialize the TTS decoder for speech output
```

Quantization Details

Method: bitsandbytes 8-bit (LLM.int8()) quantization of linear layers

Modules kept in bf16 (not quantized):

| Module | Reason |
|---|---|
| lm_head | Output projection; standard practice to keep unquantized |
| apm | Whisper audio encoder; small, quality-sensitive |
| tts | TTS decoder; uses weight_norm, incompatible with bitsandbytes |
| vpm | SigLIP vision encoder; small, quality-sensitive |
| resampler | Vision resampler; small |
| audio_projection_layer | Audio-to-LLM projector; small |
| audio_avg_pooler | Audio pooling layer; small |
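For reference, a sketch of how this selective quantization could be reproduced when quantizing the bf16 base model yourself, using the `llm_int8_skip_modules` option of `BitsAndBytesConfig`. The module names are taken from the table above; that they match the checkpoint's top-level module paths is an assumption:

```python
from transformers import BitsAndBytesConfig

# Modules to keep in bf16 (names from the table above; assumed to match
# the top-level module names in the MiniCPM-o checkpoint).
SKIP_MODULES = [
    "lm_head", "apm", "tts", "vpm",
    "resampler", "audio_projection_layer", "audio_avg_pooler",
]

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,                   # LLM.int8() linear quantization
    llm_int8_skip_modules=SKIP_MODULES,  # leave these modules unquantized
)
```

Passing `quantization_config=bnb_config` to `AutoModel.from_pretrained` on the base `openbmb/MiniCPM-o-4_5` checkpoint would then quantize only the remaining (LLM) linear layers.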

Benchmark Results

Text quality is identical to bf16 (echo prompts return exact matches, math returns correct answers). Audio quality shows minor spectral variation within the range of normal run-to-run differences.

Full benchmark report with spectrograms: OmniChat Quantization Benchmarks

License

Same as the base model: MiniCPM Model License

Exported on 2026-03-11

Safetensors: 9B params; tensor types BF16, F32, I8