Bernini Renderer: Mixed INT8+INT4 Quantized Weights

Distribution format: 13 GB per model (about 84% smaller than the 79 GB FP32 originals)

Files

File | Size | Description
bernini_renderer_high.mixed-int8-int4p.safetensors | 13 GB | High model
bernini_renderer_low.mixed-int8-int4p.safetensors | 13 GB | Low model
config.json | - | Model architecture config
model_high.safetensors.index.json | - | Weight map for high variant
model_low.safetensors.index.json | - | Weight map for low variant
load_mixed.py | - | Python loader with dequantization
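
To check which tensors are stored at which precision without loading a 13 GB file, the safetensors header can be read on its own. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The helper names read_safetensors_header and dtype_summary are illustrative and are not part of load_mixed.py. How the packed INT4 tensors are labeled in the header is not stated in this card, so inspect the output rather than assume a particular dtype string.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def dtype_summary(header):
    """Map each stored dtype string to the total number of bytes it occupies."""
    totals = Counter()
    for info in header.values():
        begin, end = info["data_offsets"]
        totals[info["dtype"]] += end - begin
    return dict(totals)
```

For example, dtype_summary(read_safetensors_header("bernini_renderer_high.mixed-int8-int4p.safetensors")) returns a mapping from dtype string to total bytes stored in that dtype.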

Quantization Strategy

Mixed-precision per-channel asymmetric quantization:

Component | Format | Rationale
T5 text encoder weights | INT8 (per-channel) | Preserves prompt encoding fidelity
Diffusion transformer attention/FFN | Packed INT4 (per-channel) | Bulk compression: 75% size reduction on largest tensors
Embedding tables (T5 + diffusion) | Packed INT4 (per-channel) | Lookup tables tolerate aggressive quantization
Layer norms, biases, scale_shift_tables | FP32 (unchanged) | Small tensors where precision matters
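
As an illustration of what per-channel asymmetric quantization with packed INT4 storage can look like, here is a minimal pure-Python sketch. It is not the code in load_mixed.py. The function names, the use of unsigned codes with a zero point, and the low-nibble-first packing order are all assumptions made for the example. A real loader would vectorize this with torch or NumPy.

```python
def quantize_row(row, bits):
    """Asymmetric quantization of one channel (row) to unsigned `bits`-bit codes."""
    qmax = (1 << bits) - 1
    lo, hi = min(row), max(row)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a constant row
    zero_point = round(-lo / scale)
    codes = [min(qmax, max(0, round(x / scale) + zero_point)) for x in row]
    return codes, scale, zero_point


def dequantize_row(codes, scale, zero_point):
    """Invert quantize_row: x is approximately (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


def pack_int4(codes):
    """Pack 4-bit codes two per byte, low nibble first (assumed order)."""
    if len(codes) % 2:
        codes = codes + [0]  # pad to an even number of codes
    return bytes(low | (high << 4) for low, high in zip(codes[0::2], codes[1::2]))


def unpack_int4(packed, count):
    """Unpack bytes produced by pack_int4 back into `count` 4-bit codes."""
    codes = []
    for byte in packed:
        codes.append(byte & 0x0F)
        codes.append(byte >> 4)
    return codes[:count]


def quantize_per_channel(matrix, bits):
    """Quantize each row with its own scale and zero point."""
    return [quantize_row(row, bits) for row in matrix]
```

Each channel keeps its own scale and zero point, so a channel with a narrow value range is not forced onto the grid of a channel with a wide one. Calling quantize_per_channel(weights, 8) gives the INT8 case, and quantize_per_channel(weights, 4) followed by pack_int4 gives the packed INT4 case.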

Quality

Tested at 10 and 40 generation steps: outputs were visually indistinguishable from the FP32 originals on standard prompts. Sharp details were preserved, and no artifacts or quality degradation were observed.

Usage

import torch

from load_mixed import load_bernini_mixed

# Load the high variant; weights are dequantized in memory to torch_dtype
state_dict = load_bernini_mixed(
    "bernini_renderer_high.mixed-int8-int4p.safetensors",
    torch_dtype=torch.float16,  # or torch.float32
)
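
Because the loader dequantizes in memory, the resident size after loading is likely closer to the unquantized model than to the 13 GB file. The sketch below is a rough estimate only. It assumes every tensor is materialized at once in the requested dtype and scales the 79 GB FP32 figure from this card by element size. The function name is illustrative, and actual usage depends on how load_mixed.py handles the tensors kept in FP32.

```python
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}

FP32_SIZE_GB = 79  # size of the FP32 originals, as stated in this card


def dequantized_size_gb(fp32_size_gb, target_dtype):
    """Estimate in-memory size if all weights are held in target_dtype."""
    ratio = BYTES_PER_ELEMENT[target_dtype] / BYTES_PER_ELEMENT["float32"]
    return fp32_size_gb * ratio


if __name__ == "__main__":
    for dtype in ("float32", "float16"):
        size = dequantized_size_gb(FP32_SIZE_GB, dtype)
        print(f"{dtype}: about {size:.1f} GB per model after dequantization")
```

Under these assumptions, passing torch_dtype=torch.float16 roughly halves the in-memory footprint relative to torch.float32.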

Original Model

ByteDance/Bernini: https://huggingface.co/ByteDance/Bernini

Quantization performed by ultimo-intento, June 2026.
