Bernini Renderer: Mixed INT8+INT4 Quantized Weights
Distribution format: 13 GB per model (about 84% smaller than the 79 GB FP32 originals)
Files
| File | Size | Description |
|---|---|---|
| bernini_renderer_high.mixed-int8-int4p.safetensors | 13 GB | high model |
| bernini_renderer_low.mixed-int8-int4p.safetensors | 13 GB | low model |
| config.json | - | Model architecture config |
| model_high.safetensors.index.json | - | Weight map for high variant |
| model_low.safetensors.index.json | - | Weight map for low variant |
| load_mixed.py | - | Python loader with dequantization |
Quantization Strategy
Mixed-precision per-channel asymmetric quantization:
| Component | Format | Rationale |
|---|---|---|
| T5 text encoder weights | INT8 (per-channel) | Preserves prompt encoding fidelity |
| Diffusion transformer attention/FFN | Packed INT4 (per-channel) | Bulk compression: 75% size reduction on largest tensors |
| Embedding tables (T5 + diffusion) | Packed INT4 (per-channel) | Lookup tables tolerate aggressive quantization |
| Layer norms, biases, scale_shift_tables | FP32 (unchanged) | Small tensors where precision matters |
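To make the table concrete, here is a minimal sketch of per-channel asymmetric quantization followed by INT4 nibble packing. It is written in plain Python to keep the arithmetic visible and is not taken from this repository's quantization script. The function names, the min-offset formulation (`x ≈ lo + q * scale`), and the low-nibble-first packing order are all assumptions made for illustration; the files in this repository may use a different layout.

```python
from typing import List, Tuple


def quantize_row(row: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Quantize one output channel to unsigned `bits`-bit codes.

    Asymmetric: the channel's own [min, max] range is mapped onto
    [0, 2**bits - 1], so each channel stores its own scale and offset.
    """
    qmax = (1 << bits) - 1                          # 255 for INT8, 15 for INT4
    lo, hi = min(row), max(row)
    scale = (hi - lo) / qmax if hi > lo else 1.0    # guard constant rows
    codes = [min(qmax, max(0, round((x - lo) / scale))) for x in row]
    return codes, scale, lo


def dequantize_row(codes: List[int], scale: float, lo: float) -> List[float]:
    """Invert quantize_row up to rounding error (at most scale / 2)."""
    return [lo + q * scale for q in codes]


def pack_int4(codes: List[int]) -> bytes:
    """Pack 4-bit codes two per byte, low nibble first (assumed order)."""
    if len(codes) % 2:
        codes = codes + [0]                         # pad odd-length rows
    return bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))


# Hypothetical 4-element channels, one quantized per format from the table.
row4 = [-1.0, 0.0, 0.4, 2.0]
codes4, scale4, lo4 = quantize_row(row4, bits=4)    # codes: [0, 5, 7, 15]
packed = pack_int4(codes4)                          # 2 bytes instead of 16 (FP32)

row8 = [0.1, 0.2, 0.3, 0.4]
codes8, scale8, lo8 = quantize_row(row8, bits=8)    # one byte per weight

print(codes4, packed.hex(), dequantize_row(codes4, scale4, lo4))
```

Because each channel carries its own scale and offset, a channel with a narrow value range is not forced onto the coarse grid that a wide-range channel would need, which is the usual argument for per-channel over per-tensor quantization.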
Quality
Tested at 10 and 40 generation steps: visually indistinguishable from FP32 originals on standard prompts. Sharp details preserved, no artifacts or quality degradation observed.
Usage
```python
import torch
from load_mixed import load_bernini_mixed

# Load the high variant; weights are dequantized in memory to torch_dtype
state_dict = load_bernini_mixed(
    "bernini_renderer_high.mixed-int8-int4p.safetensors",
    torch_dtype=torch.float16,  # or torch.float32
)
```
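For readers curious what dequantization of a packed INT4 tensor involves, the following is a rough sketch in plain Python. It is not the code in `load_mixed.py`. The helper names, the low-nibble-first byte layout, and the per-row `scale`/`offset` storage are assumptions for illustration only, and a real loader would vectorize this with torch.

```python
from typing import List


def unpack_int4(packed: bytes, count: int) -> List[int]:
    """Split each byte into two 4-bit codes, low nibble first (assumed order)."""
    codes: List[int] = []
    for byte in packed:
        codes.append(byte & 0x0F)    # low nibble
        codes.append(byte >> 4)      # high nibble
    return codes[:count]             # drop the padding nibble of odd-length rows


def dequantize_rows(
    packed_rows: List[bytes],
    scales: List[float],
    offsets: List[float],
    cols: int,
) -> List[List[float]]:
    """Rebuild float weights channel by channel: w = offset + code * scale."""
    return [
        [off + q * s for q in unpack_int4(p, cols)]
        for p, s, off in zip(packed_rows, scales, offsets)
    ]


# Hypothetical 2 x 3 weight matrix stored as packed INT4 with per-row parameters.
packed_rows = [bytes([0x50, 0xF7]), bytes([0x21, 0x03])]
scales = [0.2, 0.5]
offsets = [-1.0, 0.0]

weights = dequantize_rows(packed_rows, scales, offsets, cols=3)
print(weights)
```

The dequantized tensors occupy full floating-point size in memory, so the 13 GB figure describes the files on disk rather than the footprint after loading.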
Original Model
ByteDance/Bernini: https://huggingface.co/ByteDance/Bernini
Quantization performed by ultimo-intento, June 2026.