# MiniCPM-o 4.5 – Sculpt Throughput (keep_frac=0.82)

18% compression, moderate quality tradeoff.

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned; the vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.
## Quality (Downstream Probe – 250 questions)
| Metric | Baseline | This Model | Retention |
|---|---|---|---|
| Weighted Accuracy | 0.6756 | 0.5709 | 84.5% |
| MMLU | 0.6700 | 0.5400 | 80.6% |
| HellaSwag | 0.7625 | 0.6750 | 88.5% |
| ARC-Challenge | 0.6000 | 0.5286 | 88.1% |
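The retention column above is simply the pruned model's score divided by the baseline's, e.g.:

```python
# Retention = pruned score / baseline score, as a percentage
# (values taken from the table above).
def retention(baseline: float, pruned: float) -> float:
    return round(100 * pruned / baseline, 1)

print(retention(0.6756, 0.5709))  # weighted accuracy -> 84.5
print(retention(0.6700, 0.5400))  # MMLU              -> 80.6
print(retention(0.7625, 0.6750))  # HellaSwag         -> 88.5
print(retention(0.6000, 0.5286))  # ARC-Challenge     -> 88.1
```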
## Compression Details
- keep_frac: 0.82 (18% of MLP intermediate neurons removed)
- Method: Structural pruning with live teacher distillation (alpha=0.5)
- Repair: Full repair pass with workload-matched training data
- Architecture: All multimodal modules preserved; only LLM MLP layers compressed
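To make "structural pruning of MLP intermediate neurons" concrete, here is a minimal sketch in PyTorch for one gated-MLP block. The magnitude-based scoring criterion and function name are illustrative assumptions, not the actual Sculpt algorithm:

```python
import torch
import torch.nn as nn

def prune_mlp_intermediate(gate: nn.Linear, up: nn.Linear, down: nn.Linear,
                           keep_frac: float = 0.82):
    """Keep the top keep_frac of intermediate neurons, scored here by the
    L2 norm of their down-projection columns (a hypothetical criterion)."""
    inter = down.in_features
    keep = max(1, round(inter * keep_frac))
    scores = down.weight.norm(dim=0)                      # one score per neuron
    idx = torch.argsort(scores, descending=True)[:keep].sort().values
    # Rebuild smaller projections, copying only the surviving rows/columns.
    new_gate = nn.Linear(gate.in_features, keep, bias=False)
    new_up = nn.Linear(up.in_features, keep, bias=False)
    new_down = nn.Linear(keep, down.out_features, bias=False)
    with torch.no_grad():
        new_gate.weight.copy_(gate.weight[idx])           # rows of gate proj
        new_up.weight.copy_(up.weight[idx])               # rows of up proj
        new_down.weight.copy_(down.weight[:, idx])        # columns of down proj
    return new_gate, new_up, new_down
```

With keep_frac=0.82, a layer's intermediate dimension shrinks to 82% of its original size while input and output dimensions stay unchanged, which is what makes the result a drop-in replacement.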
## Intended Use
Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:
- LoRA fine-tuning on memory-constrained GPUs
- File description and indexing workloads
- Multimodal inference with lower VRAM requirements
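For a rough sense of the footprint reduction, the arithmetic below uses Qwen3-8B's commonly cited shapes (hidden size 4096, intermediate size 12288, 36 layers, about 8.2B total parameters). These figures are assumptions for illustration, not values read from this repository's config:

```python
# Back-of-envelope parameter saving from pruning 18% of MLP intermediate
# neurons. All model shapes below are assumed, not taken from this repo.
hidden, intermediate, layers = 4096, 12288, 36
total_params = 8.2e9  # approximate backbone size

mlp_params = 3 * hidden * intermediate * layers   # gate + up + down projections
removed = 0.18 * mlp_params

print(f"MLP params: {mlp_params / 1e9:.2f}B")     # ~5.44B
print(f"Removed:    {removed / 1e9:.2f}B "        # ~0.98B
      f"({100 * removed / total_params:.0f}% of total)")
```

Under these assumptions, an 18% cut to MLP intermediate width removes roughly 1B parameters, on the order of 12% of the backbone.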
## How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Throughput",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Throughput",
    trust_remote_code=True,
)
```