# MiniCPM-o 4.5 — INT8 (bitsandbytes)
8-bit bitsandbytes quantization of openbmb/MiniCPM-o-4_5 with selective module skipping for audio/vision quality preservation.
## Highlights
- Full multimodal capability preserved: text, vision, audio input, TTS output, voice cloning
- Only LLM transformer layers are quantized — audio encoder (Whisper), vision encoder (SigLIP), TTS decoder, and projection layers remain in bf16
- TTS `weight_norm` layers are explicitly skipped (they crash under bitsandbytes quantization)
- Tested with 293 unit and integration tests — all passing
- Benchmark-validated: text quality identical to bf16, audio quality within natural variation
## VRAM Requirements
| Precision | VRAM (loaded) | Peak VRAM | Load Time |
|---|---|---|---|
| bf16 (baseline) | 21.0 GB | 21.3 GB | 16.3s |
| 8-bit (this repo) | 14.5 GB | 14.8 GB | 20.5s |
Benchmarked on NVIDIA RTX PRO 6000 Blackwell (96 GB). Your load times will vary.
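As a sanity check on the table above, the measured numbers imply how much of the model's byte footprint stays in bf16. A minimal arithmetic sketch — it assumes 8-bit halves the bytes of quantized weights and ignores bitsandbytes bookkeeping overhead:

```python
# If a fraction f of the bf16 footprint stays in bf16 and the rest is
# halved (2 bytes/param -> 1 byte/param), then:
#   quantized = baseline * (f + (1 - f) / 2)
# Solving for f from the measured VRAM figures:

baseline_gb = 21.0   # bf16 VRAM (loaded), from the table above
quantized_gb = 14.5  # 8-bit VRAM (loaded), from the table above

f = 2 * quantized_gb / baseline_gb - 1
print(f"~{f:.0%} of the bf16 footprint stays unquantized")  # ~38%
```

This is consistent with the skip list below: the audio/vision encoders, TTS decoder, and projection layers are a nontrivial share of the model, which is why the savings are ~31% rather than the full 50%.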
## Quick Start
```python
from transformers import AutoModel, AutoTokenizer
import torch

model_name = "ericleigh007/MiniCPM-o-4_5-BNB-Int8"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_name,
    trust_remote_code=True,
    attn_implementation="sdpa",
    torch_dtype=torch.bfloat16,
    device_map="auto",
    init_vision=True,
    init_audio=True,
    init_tts=True,
)
model.eval()
model.init_tts()
```
## Quantization Details

**Method:** bitsandbytes 8-bit linear quantization

**Modules kept in bf16 (not quantized):**
| Module | Reason |
|---|---|
| `lm_head` | Output projection — standard practice |
| `apm` | Whisper audio encoder — small, quality-sensitive |
| `tts` | TTS decoder — uses `weight_norm`, incompatible with bitsandbytes |
| `vpm` | SigLIP vision encoder — small, quality-sensitive |
| `resampler` | Vision resampler — small |
| `audio_projection_layer` | Audio-to-LLM projector — small |
| `audio_avg_pooler` | Audio pooling layer — small |
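For reference, a skip list like the one above maps onto the standard `llm_int8_skip_modules` parameter of `BitsAndBytesConfig`. A sketch of such a config — the exact quantization script for this repo is not shown here, so treat this as an illustration rather than the verified recipe:

```python
from transformers import BitsAndBytesConfig

# 8-bit quantization that leaves the quality-sensitive multimodal
# modules in their original dtype (names per the table above).
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=[
        "lm_head",
        "apm",                      # Whisper audio encoder
        "tts",                      # TTS decoder (weight_norm)
        "vpm",                      # SigLIP vision encoder
        "resampler",
        "audio_projection_layer",
        "audio_avg_pooler",
    ],
)
```

Passing such a config via `quantization_config=` to `from_pretrained` is how bitsandbytes applies selective quantization; the pre-quantized weights in this repo already have it baked in, so the Quick Start above does not need it.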
## Benchmark Results
Text quality is identical to bf16 (echo prompts return exact matches, math returns correct answers). Audio quality shows minor spectral variation within the range of normal run-to-run differences.
Full benchmark report with spectrograms: OmniChat Quantization Benchmarks
## License
Same as the base model: MiniCPM Model License
## Credits
- Base model by OpenBMB
- Quantization and testing by ericleigh007
- Part of the OmniChat project