MiniCPM-o 4.5 - Sculpt Throughput (keep_frac=0.82)

18% compression - moderate quality tradeoff

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned; vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.

Quality (Downstream Probe - 250 questions)

Metric             Baseline  This Model  Retention
Weighted Accuracy  0.6756    0.5709      84.5%
MMLU               0.6700    0.5400      80.6%
HellaSwag          0.7625    0.6750      88.5%
ARC-Challenge      0.6000    0.5286      88.1%
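The retention column is simply the pruned-model score divided by the baseline score. A quick check, with the values copied from the table:

```python
# Retention = pruned score / baseline score, as a percentage.
# (baseline, pruned) pairs copied from the table above.
scores = {
    "Weighted Accuracy": (0.6756, 0.5709),
    "MMLU": (0.6700, 0.5400),
    "HellaSwag": (0.7625, 0.6750),
    "ARC-Challenge": (0.6000, 0.5286),
}
for metric, (baseline, pruned) in scores.items():
    retention = pruned / baseline * 100
    print(f"{metric}: {retention:.1f}%")
```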

Compression Details

  • keep_frac: 0.82 (18% of MLP intermediate neurons removed)
  • Method: Structural pruning with live teacher distillation (alpha=0.5)
  • Repair: Full repair pass with workload-matched training data
  • Architecture: All multimodal modules preserved; only LLM MLP layers compressed
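As a rough illustration of keep_frac-style structural pruning, the sketch below drops the lowest-salience intermediate neurons from a toy up/down MLP pair. The layer sizes are toy values, and scoring neurons by weight norm is an assumption for illustration; Sculpt's actual saliency criterion and the gated Qwen3 MLP layout are not reproduced here.

```python
import torch
import torch.nn as nn

def prune_mlp(up: nn.Linear, down: nn.Linear, keep_frac: float = 0.82):
    """Drop the lowest-salience intermediate neurons of an up/down MLP pair.

    Saliency = product of each neuron's input- and output-weight norms
    (an illustrative choice, not necessarily what Dystrio Sculpt uses).
    """
    n_keep = int(up.out_features * keep_frac)
    saliency = up.weight.norm(dim=1) * down.weight.norm(dim=0)
    keep = saliency.topk(n_keep).indices.sort().values

    new_up = nn.Linear(up.in_features, n_keep, bias=up.bias is not None)
    new_down = nn.Linear(n_keep, down.out_features, bias=down.bias is not None)
    with torch.no_grad():
        new_up.weight.copy_(up.weight[keep])
        if up.bias is not None:
            new_up.bias.copy_(up.bias[keep])
        new_down.weight.copy_(down.weight[:, keep])
        if down.bias is not None:
            new_down.bias.copy_(down.bias)
    return new_up, new_down

# Toy example: a 512 -> 2048 -> 512 MLP shrinks to 512 -> 1679 -> 512,
# since int(2048 * 0.82) == 1679.
up, down = nn.Linear(512, 2048), nn.Linear(2048, 512)
new_up, new_down = prune_mlp(up, down, keep_frac=0.82)
print(new_up.out_features)
```

After a cut like this, the remaining weights are what the repair pass (distillation plus workload-matched training) then fine-tunes.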

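"Live teacher distillation (alpha=0.5)" presumably blends a distillation term against the unpruned teacher with an ordinary hard-label loss. A generic sketch of such a blended objective follows; the exact formulation Sculpt uses is an assumption, this is just the standard alpha-weighted KD + cross-entropy form.

```python
import torch
import torch.nn.functional as F

def repair_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0):
    """Blend hard-label cross-entropy with KL distillation from the teacher.

    alpha weights the distillation term. This generic form is illustrative,
    not necessarily Sculpt's exact training objective.
    """
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (T * T)
    return alpha * kd + (1 - alpha) * ce

# Toy batch: 4 tokens over a 10-way vocabulary.
student = torch.randn(4, 10)
teacher = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
loss = repair_loss(student, teacher, labels)
print(loss.item())
```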
Intended Use

Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:

  • LoRA fine-tuning on memory-constrained GPUs
  • File description and indexing workloads
  • Multimodal inference with lower VRAM requirements

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Throughput",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Throughput",
    trust_remote_code=True,
)