Bernini Renderer — Mixed INT8+INT4 Quantized Weights

Distribution format: 13 GB per model (about 84% smaller than the 79 GB FP32 originals)

Files

File	Size	Description
`bernini_renderer_high.mixed-int8-int4p.safetensors`	13 GB	Quantized weights, high variant
`bernini_renderer_low.mixed-int8-int4p.safetensors`	13 GB	Quantized weights, low variant
`config.json`	—	Model architecture config
`model_high.safetensors.index.json`	—	Weight map for high variant
`model_low.safetensors.index.json`	—	Weight map for low variant
`load_mixed.py`	—	Python loader with dequantization
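
A `.safetensors` file begins with an 8-byte little-endian header length followed by a JSON header that describes every tensor, so the stored dtypes and shapes can be checked without reading 13 GB of weight data. The following is a minimal sketch using only the standard library. The helper names `read_safetensors_header` and `summarize_dtypes` are hypothetical and not part of `load_mixed.py`, and how this checkpoint names its scale and zero-point tensors, or which storage dtype it uses for packed INT4, is an open assumption here.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    header.pop("__metadata__", None)
    return header


def summarize_dtypes(header):
    """Count tensors per stored dtype string (for example "F32", "I8", "U8")."""
    counts = {}
    for info in header.values():
        counts[info["dtype"]] = counts.get(info["dtype"], 0) + 1
    return counts
```

Calling `summarize_dtypes(read_safetensors_header("bernini_renderer_high.mixed-int8-int4p.safetensors"))` should give a quick view of how many tensors are stored at each precision, which can be compared against the table in Quantization Strategy.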

Quantization Strategy

Mixed-precision per-channel asymmetric quantization:

Component	Format	Rationale
T5 text encoder weights	INT8 (per-channel)	Preserves prompt encoding fidelity
Diffusion transformer attention/FFN	Packed INT4 (per-channel)	Bulk compression — 75% size reduction on largest tensors
Embedding tables (T5 + diffusion)	Packed INT4 (per-channel)	Lookup tables tolerate aggressive quantization
Layer norms, biases, scale_shift_tables	FP32 (unchanged)	Small tensors where precision matters
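
To make the table concrete, here is a minimal pure-Python sketch of per-channel asymmetric quantization and 4-bit packing. It is an illustration under stated assumptions, not the code in `load_mixed.py`: the function names are hypothetical, the zero point is stored as the row minimum in floating point, and the nibble order (low nibble first) is an assumed layout that may differ from the one used in these files. A real loader would use vectorized tensor operations instead of Python lists.

```python
def quantize_row(row, bits):
    """Asymmetric quantization of one output channel to unsigned `bits`-bit codes."""
    qmax = (1 << bits) - 1
    lo, hi = min(row), max(row)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for constant rows
    codes = [min(qmax, max(0, round((x - lo) / scale))) for x in row]
    return codes, scale, lo


def dequantize_row(codes, scale, lo):
    """Map integer codes back to approximate floating-point weights."""
    return [q * scale + lo for q in codes]


def pack_int4(codes):
    """Pack 4-bit codes two per byte, low nibble first (assumed layout)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad odd-length rows with a zero code
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


def unpack_int4(packed, n):
    """Recover the first `n` 4-bit codes from a packed byte string."""
    out = []
    for b in packed:
        out.append(b & 0x0F)
        out.append(b >> 4)
    return out[:n]


# Each row is one output channel with its own scale and offset.
weights = [[-1.0, -0.5, 0.0, 0.5, 1.0], [0.0, 0.1, 0.2, 0.3, 3.0]]
for row in weights:
    codes, scale, lo = quantize_row(row, bits=4)
    packed = pack_int4(codes)
    restored = dequantize_row(unpack_int4(packed, len(row)), scale, lo)
    # Rounding error is bounded by half a quantization step.
    max_err = max(abs(a - b) for a, b in zip(row, restored))
    assert max_err <= scale / 2 + 1e-9
```

Because every row gets its own `scale` and `lo`, a row with one large outlier (the second row above) loses resolution only within that row. The same `quantize_row` with `bits=8` gives the INT8 case, where each code occupies a full byte and no packing is needed.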

Quality

Tested at 10 and 40 generation steps: outputs are visually indistinguishable from the FP32 originals on standard prompts. Sharp details are preserved, and no artifacts or other quality degradation were observed.

Usage

import torch
from load_mixed import load_bernini_mixed

# Load the high variant; weights are dequantized in memory to the requested dtype
state_dict = load_bernini_mixed(
    "bernini_renderer_high.mixed-int8-int4p.safetensors",
    torch_dtype=torch.float16,  # or torch.float32
)

Original Model

ByteDance/Bernini: https://huggingface.co/ByteDance/Bernini

Quantization performed by ultimo-intento, June 2026.
