MiniCPM-o 4.5 - Sculpt Throughput (keep_frac=0.82)

18% compression - moderate quality tradeoff

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned; vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.

Quality (Downstream Probe - 250 questions)

Metric             Baseline  This Model  Retention
Weighted Accuracy  0.6756    0.5709      84.5%
MMLU               0.6700    0.5400      80.6%
HellaSwag          0.7625    0.6750      88.5%
ARC-Challenge      0.6000    0.5286      88.1%
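The retention column is simply the pruned-model score divided by the baseline score. A quick check, with the values copied from the table:

```python
# Retention = pruned score / baseline score, as a percentage.
# (baseline, pruned) pairs copied from the table above.
scores = {
    "Weighted Accuracy": (0.6756, 0.5709),
    "MMLU": (0.6700, 0.5400),
    "HellaSwag": (0.7625, 0.6750),
    "ARC-Challenge": (0.6000, 0.5286),
}
for metric, (baseline, pruned) in scores.items():
    retention = pruned / baseline * 100
    print(f"{metric}: {retention:.1f}%")
```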

Compression Details

  • keep_frac: 0.82 (18% of MLP intermediate neurons removed)
  • Method: Structural pruning with live teacher distillation (alpha=0.5)
  • Repair: Full repair pass with workload-matched training data
  • Architecture: All multimodal modules preserved; only LLM MLP layers compressed
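As a rough illustration of keep_frac-style structural pruning, the sketch below drops the lowest-salience intermediate neurons from a toy up/down MLP pair. The layer sizes are toy values, and scoring neurons by weight norm is an assumption for illustration; Sculpt's actual saliency criterion and the gated Qwen3 MLP layout are not reproduced here.

```python
import torch
import torch.nn as nn

def prune_mlp(up: nn.Linear, down: nn.Linear, keep_frac: float = 0.82):
    """Drop the lowest-salience intermediate neurons of an up/down MLP pair.

    Saliency = product of each neuron's input- and output-weight norms
    (an illustrative choice, not necessarily what Dystrio Sculpt uses).
    """
    n_keep = int(up.out_features * keep_frac)
    saliency = up.weight.norm(dim=1) * down.weight.norm(dim=0)
    keep = saliency.topk(n_keep).indices.sort().values

    new_up = nn.Linear(up.in_features, n_keep, bias=up.bias is not None)
    new_down = nn.Linear(n_keep, down.out_features, bias=down.bias is not None)
    with torch.no_grad():
        new_up.weight.copy_(up.weight[keep])
        if up.bias is not None:
            new_up.bias.copy_(up.bias[keep])
        new_down.weight.copy_(down.weight[:, keep])
        if down.bias is not None:
            new_down.bias.copy_(down.bias)
    return new_up, new_down

# Toy example: a 512 -> 2048 -> 512 MLP shrinks to 512 -> 1679 -> 512,
# since int(2048 * 0.82) == 1679.
up, down = nn.Linear(512, 2048), nn.Linear(2048, 512)
new_up, new_down = prune_mlp(up, down, keep_frac=0.82)
print(new_up.out_features)
```

After a cut like this, the remaining weights are what the repair pass (distillation plus workload-matched training) then fine-tunes.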

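"Live teacher distillation (alpha=0.5)" presumably blends a distillation term against the unpruned teacher with an ordinary hard-label loss. A generic sketch of such a blended objective follows; the exact formulation Sculpt uses is an assumption, this is just the standard alpha-weighted KD + cross-entropy form.

```python
import torch
import torch.nn.functional as F

def repair_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0):
    """Blend hard-label cross-entropy with KL distillation from the teacher.

    alpha weights the distillation term. This generic form is illustrative,
    not necessarily Sculpt's exact training objective.
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (T * T)
    return alpha * kd + (1 - alpha) * ce

# Toy batch: 4 tokens over a 10-way vocabulary.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = repair_loss(student, teacher, labels)
print(loss.item())
```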
Intended Use

Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:

  • LoRA fine-tuning on memory-constrained GPUs
  • File description and indexing workloads
  • Multimodal inference with lower VRAM requirements

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Throughput",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Throughput",
    trust_remote_code=True,
)