MiniCPM-o 4.5 — INT8 (bitsandbytes)

8-bit bitsandbytes quantization of openbmb/MiniCPM-o-4_5, with selective module skipping to preserve audio and vision quality.

Highlights

  • Full multimodal capability preserved: text, vision, audio input, TTS output, voice cloning
  • Only LLM transformer layers are quantized — audio encoder (Whisper), vision encoder (SigLIP), TTS decoder, and projection layers remain in bf16
  • TTS weight_norm layers are explicitly skipped (they crash under bitsandbytes quantization)
  • Tested with 293 unit + integration tests — all passing
  • Benchmark-validated: text quality identical to bf16, audio quality within natural variation
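Why `weight_norm` layers trip bitsandbytes: the reparametrization splits a layer's weight into magnitude (`weight_g`) and direction (`weight_v`) parameters, so the layer no longer exposes a plain `weight` Parameter for the 8-bit linear replacement to quantize. A minimal illustration in plain PyTorch (not the actual TTS module):

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

# weight_norm reparametrizes .weight as weight_g (magnitude) times the
# normalized weight_v (direction); .weight becomes a derived tensor
# recomputed each forward pass, not an nn.Parameter.
layer = weight_norm(nn.Linear(8, 8))

print(isinstance(layer.weight_g, nn.Parameter))  # True
print(isinstance(layer.weight, nn.Parameter))    # False: derived tensor
```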

VRAM Requirements

| Precision | VRAM (loaded) | Peak VRAM | Load time |
|---|---|---|---|
| bf16 (baseline) | 21.0 GB | 21.3 GB | 16.3 s |
| 8-bit (this repo) | 14.5 GB | 14.8 GB | 20.5 s |

Benchmarked on NVIDIA RTX PRO 6000 Blackwell (96 GB). Your load times will vary.
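The table implies a saving of roughly a third of the loaded-model footprint; a quick check with the numbers copied from the table above:

```python
# VRAM figures from the table above (GB, loaded model).
bf16_gb = 21.0
int8_gb = 14.5

savings_gb = bf16_gb - int8_gb
savings_pct = 100.0 * savings_gb / bf16_gb
print(f"saved {savings_gb:.1f} GB ({savings_pct:.0f}%)")  # saved 6.5 GB (31%)
```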

Quick Start

```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "ericleigh007/MiniCPM-o-4_5-BNB-Int8"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# The 8-bit quantization config ships with the repo (config.json), so no
# BitsAndBytesConfig needs to be passed at load time.
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    init_vision=True,
    init_audio=True,
    init_tts=True,
)
model.eval()
model.init_tts()  # initialize the TTS decoder for speech output
```

Quantization Details

Method: bitsandbytes 8-bit (LLM.int8()) quantization of linear layers

Modules kept in bf16 (not quantized):

| Module | Reason |
|---|---|
| lm_head | Output projection; standard practice to keep unquantized |
| apm | Whisper audio encoder; small, quality-sensitive |
| tts | TTS decoder; uses weight_norm, incompatible with bitsandbytes |
| vpm | SigLIP vision encoder; small, quality-sensitive |
| resampler | Vision resampler; small |
| audio_projection_layer | Audio-to-LLM projector; small |
| audio_avg_pooler | Audio pooling layer; small |
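For reference, a sketch of how this selective quantization could be reproduced when quantizing the bf16 base model yourself, using the `llm_int8_skip_modules` option of `BitsAndBytesConfig`. The module names are taken from the table above; that they match the checkpoint's top-level module paths is an assumption:

```python
from transformers import BitsAndBytesConfig

# Modules to keep in bf16 (names from the table above; assumed to match
# the top-level module names in the MiniCPM-o checkpoint).
SKIP_MODULES = [
    "lm_head", "apm", "tts", "vpm",
    "resampler", "audio_projection_layer", "audio_avg_pooler",
]

bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,                   # LLM.int8() linear quantization
    llm_int8_skip_modules=SKIP_MODULES,  # leave these modules unquantized
)
```

Passing `quantization_config=bnb_config` to `AutoModel.from_pretrained` on the base `openbmb/MiniCPM-o-4_5` checkpoint would then quantize only the remaining (LLM) linear layers.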

Benchmark Results

Text quality is identical to bf16 (echo prompts return exact matches, math returns correct answers). Audio quality shows minor spectral variation within the range of normal run-to-run differences.

Full benchmark report with spectrograms: OmniChat Quantization Benchmarks

License

Same as the base model: MiniCPM Model License

Exported on 2026-03-11

Safetensors: 9B params; tensor types BF16, F32, I8