How to use from the Diffusers library
```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("WaveCut/Lens-Turbo-SDNQ-uint4-static", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```

Lens-Turbo SDNQ uint4 static

This is a corrected SDNQ static UINT4 quantized variant of microsoft/Lens-Turbo.

The first all-linear UINT4 attempt produced periodic grid artifacts and badly degraded text. An ablation found the culprit: quantizing the transformer block modulation linears (img_mod and txt_mod) damages Lens-Turbo disproportionately. This revision keeps those modulation layers in bfloat16 and quantizes the rest of the denoising transformer with SDNQ UINT4.
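
The exclusion rule can be illustrated with a small sketch. It assumes shell-style glob matching through Python's `fnmatch`, which may differ from how SDNQ resolves `modules_to_not_convert` internally, and the module names are hypothetical stand-ins rather than names read from the checkpoint.

```python
from fnmatch import fnmatchcase

# Exclusion patterns taken from this model's quantization recipe.
EXCLUDED_PATTERNS = ["*.img_mod.*", "*.txt_mod.*"]

# Hypothetical module names, for illustration only.
module_names = [
    "transformer_blocks.0.img_mod.1",
    "transformer_blocks.0.txt_mod.1",
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.img_mlp.net.2",
]


def stays_bf16(name: str) -> bool:
    """Return True if the module name matches any exclusion pattern."""
    return any(fnmatchcase(name, pattern) for pattern in EXCLUDED_PATTERNS)


kept_bf16 = [name for name in module_names if stays_bf16(name)]
quantized = [name for name in module_names if not stays_bf16(name)]

print("kept in bfloat16:", kept_bf16)
print("quantized to uint4:", quantized)
```

Under this matching assumption only the two modulation entries are kept in bfloat16, while the attention and MLP linears are quantized.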

Visual Comparison

Full-size comparison grid: the image below is assembled from native 1024x1024 samples, with no resampling of the individual image cells, and saved as WebP at quality 98. Raw file: assets/comparison/comparison_grid_1to1_q98.webp.

Original vs fixed SDNQ comparison grid

Quantization Recipe

| Field | Value |
| --- | --- |
| Method | SDNQ uint4 static |
| Quantized component | transformer / LensTransformer2DModel |
| Excluded transformer layers | `*.img_mod.*`, `*.txt_mod.*` |
| Reason for exclusion | UINT4 quantization of modulation linears caused periodic grid artifacts and severe text degradation |
| Weight dtype | uint4 |
| Quantized matmul | enabled |
| Quantized matmul dtype | int8 |
| Dynamic quantization | disabled |
| SVDQuant | disabled |
| Hadamard rotation | disabled |
| Text encoder | unchanged from source checkpoint |
| VAE | unchanged from source checkpoint |
| Compute dtype | torch.bfloat16 |
| Quantization time | 0.178 s |

```json
{
  "weights_dtype": "uint4",
  "quantized_matmul_dtype": "int8",
  "group_size": 0,
  "use_static_quantization": true,
  "use_dynamic_quantization": false,
  "use_quantized_matmul": true,
  "use_svd": false,
  "use_hadamard": false,
  "quant_conv": false,
  "quant_embedding": false,
  "dequantize_fp32": false,
  "modules_to_not_convert": [
    "*.img_mod.*",
    "*.txt_mod.*"
  ],
  "modules_to_not_use_matmul": [],
  "quantization_device": "cuda",
  "return_device": "cuda"
}
```
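
To give a rough sense of what mapping weights to `uint4` involves, here is a minimal sketch of generic asymmetric 4-bit quantization in plain Python. It is not SDNQ's implementation: the per-row scale and zero point are an assumption chosen for simplicity, the sample weights are made up, and the sketch does not model SDNQ's grouping, static quantization path, or int8 matmul.

```python
def quantize_row_uint4(row):
    """Map floats to integer codes in [0, 15] with a per-row scale and zero point."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 15 or 1.0  # fall back to 1.0 for a constant row
    codes = [min(15, max(0, round((w - lo) / scale))) for w in row]
    return codes, scale, lo


def dequantize_row_uint4(codes, scale, zero):
    """Reconstruct approximate floats from 4-bit codes."""
    return [c * scale + zero for c in codes]


# Made-up weights, for illustration only.
weights = [-0.31, -0.12, 0.0, 0.07, 0.2, 0.44]

codes, scale, zero = quantize_row_uint4(weights)
restored = dequantize_row_uint4(codes, scale, zero)
max_error = max(abs(a - b) for a, b in zip(weights, restored))

print("codes:", codes)
print("max reconstruction error:", max_error)
```

With only 16 levels per row, the worst-case rounding error is about half a quantization step. The recipe above keeps the modulation layers in bfloat16 because, according to the ablation, they tolerate this error poorly.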

Usage

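```python
import torch
from huggingface_hub import snapshot_download
from lens import LensPipeline, LensTransformer2DModel
from sdnq import load_sdnq_model

# Download the checkpoint, or reuse the local Hugging Face cache.
model_dir = snapshot_download("WaveCut/Lens-Turbo-SDNQ-uint4-static")

# Load the SDNQ-quantized denoising transformer on its own.
transformer = load_sdnq_model(
    model_dir + "/transformer",
    model_cls=LensTransformer2DModel,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=False,
    use_quantized_matmul=True,
)

# Build the pipeline around the quantized transformer; the remaining
# components are loaded from the same checkpoint directory.
pipe = LensPipeline.from_pretrained(
    model_dir,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
```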

Benchmark

Hardware: RunPod NVIDIA H100 80GB HBM3, PyTorch 2.8.0 CUDA 12.8 container, local container disk only. Benchmark date: 2026-05-24.

| Metric | Original Lens-Turbo | SDNQ uint4 static fixed |
| --- | --- | --- |
| Load time, seconds | 19.272 | 13.461 |
| Load peak allocated VRAM, GB | 20.807 | 17.179 |
| Load peak reserved VRAM, GB | 20.928 | 17.244 |
| Transformer tensor storage footprint, GB | 16.417 | 4.301 |
| Transformer storage reduction vs original | baseline | 73.8% smaller |
| Average prompt runtime, seconds | 1.728 | 3.663 |

Transformer-only footprint is computed from safetensors tensor storage for the denoising transformer parameter tensors only; it excludes allocator overhead and non-transformer components. The original transformer tensors are F32; the corrected SDNQ transformer stores quantized tensors as U8 plus the excluded modulation layers as BF16.
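
One way to compute such a footprint is to read the safetensors header and sum the byte ranges per dtype. The sketch below uses only the standard library and relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function name and the tiny in-memory file are illustrative assumptions, not the script used for this benchmark.

```python
import io
import json
import struct
from collections import defaultdict


def tensor_bytes_by_dtype(stream):
    """Sum tensor storage in bytes per dtype from a safetensors header."""
    (header_len,) = struct.unpack("<Q", stream.read(8))
    header = json.loads(stream.read(header_len))
    totals = defaultdict(int)
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = info["data_offsets"]
        totals[info["dtype"]] += end - begin
    return dict(totals)


# Tiny in-memory file with made-up tensors: 8 bytes of U8 and 4 BF16 values.
header = {
    "blocks.0.attn.weight": {"dtype": "U8", "shape": [8], "data_offsets": [0, 8]},
    "blocks.0.img_mod.weight": {"dtype": "BF16", "shape": [4], "data_offsets": [8, 16]},
}
header_bytes = json.dumps(header).encode("utf-8")
blob = struct.pack("<Q", len(header_bytes)) + header_bytes + bytes(16)

totals = tensor_bytes_by_dtype(io.BytesIO(blob))
print(totals)
```

For a real checkpoint the same function can be applied to each transformer shard opened with `open(path, "rb")`, then summed across shards.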

Model CPU Offload Benchmark

Same hardware and 10 prompts, using pipe.enable_model_cpu_offload(). The reported load time uses a warm local Hugging Face cache on the container disk, so model download time is excluded. Each model was measured in a fresh Python process. Cold generation is P01, the first generation immediately after load/offload setup; warm generation aggregates P02-P10.
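
The cold/warm split described above can be sketched as follows. The helper name is hypothetical and the timings are placeholders, not values from model_cpu_offload_benchmark.json.

```python
from statistics import mean, median


def summarize_run(timings):
    """Treat the first timing as the cold run and aggregate the rest as warm."""
    cold, warm = timings[0], timings[1:]
    return {
        "cold": cold,
        "warm_mean": mean(warm),
        "warm_median": median(warm),
    }


# Placeholder timings in seconds, NOT measured values from this benchmark.
example_timings = [8.0, 4.0, 5.0, 3.0, 6.0]

summary = summarize_run(example_timings)
print(summary)
```

Reporting the median alongside the mean helps when a few warm prompts are much slower than the rest, as the gap between the average and median rows in the table below suggests.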

| Metric | Original Lens-Turbo | SDNQ uint4 static fixed |
| --- | --- | --- |
| Offload setup/load time, seconds | 15.411 | 12.371 |
| Offload setup peak allocated VRAM, GB | 12.582 | 12.582 |
| Offload setup peak reserved VRAM, GB | 13.881 | 13.881 |
| Cold generation time, seconds | 8.434 | 8.440 |
| Cold generation peak allocated VRAM, GB | 18.945 | 15.085 |
| Cold generation peak reserved VRAM, GB | 19.262 | 15.238 |
| Warm generation average time, seconds | 5.731 | 4.976 |
| Warm generation median time, seconds | 5.141 | 3.855 |
| Warm generation average peak allocated VRAM, GB | 18.945 | 15.084 |
| Warm generation average peak reserved VRAM, GB | 19.267 | 15.249 |
| Warm generation max peak allocated VRAM, GB | 18.968 | 15.104 |
| Warm generation max peak reserved VRAM, GB | 19.290 | 15.280 |

Raw offload benchmark data: model_cpu_offload_benchmark.json.

In model_cpu_offload mode the setup/load VRAM peak is dominated by non-transformer components, so the load peak is effectively unchanged. During generation, where the denoising transformer is active, the SDNQ variant saves about 3.861 GB peak allocated VRAM on the warm prompts, a 20.4% reduction versus the original model.
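
The saving quoted above follows directly from the warm-generation average peak allocated VRAM row of the offload table. A short check, using only those two table values:

```python
# Warm generation average peak allocated VRAM, in GB, from the offload table.
original_gb = 18.945
quantized_gb = 15.084

saved_gb = original_gb - quantized_gb
saved_pct = saved_gb / original_gb * 100

print(f"saved {saved_gb:.3f} GB ({saved_pct:.1f}%)")
```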

10-Prompt Matrix

| ID | Scenario | Seed | Original time, s | Quant time, s | Delta | Original peak allocated VRAM, GB | Quant peak allocated VRAM, GB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| P01 | Orbital Night Market | 101 | 1.579 | 2.268 | +43.6% | 23.245 | 19.585 |
| P02 | Arctic Research Desk | 102 | 1.370 | 4.307 | +214.4% | 23.245 | 19.585 |
| P03 | Victorian Automaton Repair | 103 | 3.190 | 4.111 | +28.9% | 23.244 | 19.585 |
| P04 | Mars Greenhouse Control Room | 104 | 1.191 | 4.094 | +243.7% | 23.242 | 19.582 |
| P05 | Lost Railway Poster Wall | 105 | 1.195 | 3.672 | +207.3% | 23.242 | 19.582 |
| P06 | Miniature Courtroom Diorama | 106 | 1.188 | 3.577 | +201.1% | 23.244 | 19.584 |
| P07 | Rainy Seoul Book Cafe | 107 | 1.190 | 3.597 | +202.3% | 23.244 | 19.585 |
| P08 | Oceanographic Expedition Map | 108 | 1.184 | 3.695 | +212.1% | 23.244 | 19.584 |
| P09 | Renaissance Lab Notebook | 109 | 1.197 | 3.648 | +204.8% | 23.242 | 19.582 |
| P10 | Russian Provincial Print Shop | 110 | 3.993 | 3.664 | -8.2% | 23.252 | 19.593 |
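
The Delta column and the average prompt runtimes in the Benchmark section can be recomputed from the per-prompt timings. The sketch below assumes Delta is the relative change of the quantized time against the original time, which is consistent with the published values.

```python
from statistics import mean

# (prompt id, original seconds, quantized seconds), copied from the matrix.
rows = [
    ("P01", 1.579, 2.268),
    ("P02", 1.370, 4.307),
    ("P03", 3.190, 4.111),
    ("P04", 1.191, 4.094),
    ("P05", 1.195, 3.672),
    ("P06", 1.188, 3.577),
    ("P07", 1.190, 3.597),
    ("P08", 1.184, 3.695),
    ("P09", 1.197, 3.648),
    ("P10", 3.993, 3.664),
]

# Relative change of quantized runtime versus original, in percent.
deltas = {pid: (quant / orig - 1) * 100 for pid, orig, quant in rows}
avg_original = mean([orig for _, orig, _ in rows])
avg_quant = mean([quant for _, _, quant in rows])

for pid, delta in deltas.items():
    print(f"{pid}: {delta:+.1f}%")
print(f"average runtime: {avg_original:.3f} s original, {avg_quant:.3f} s quantized")
```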

Full Prompts

P01 - Orbital Night Market

A dense cinematic night market inside a transparent orbital habitat, with Earth curving below the glass floor, vendors selling glowing algae noodles and tiny repair drones, rain droplets floating in zero gravity, reflections on wet metal, and at least six readable signs in different places: a vertical neon sign saying "ORBITAL TEA HOUSE", a handwritten chalk menu saying "NO GRAVITY REFUNDS", a yellow safety placard saying "MAG BOOTS REQUIRED", a small receipt printer label saying "BAY 12 PICKUP", a red banner saying "FRESH SYNTH-MANGO", and a blue customs notice saying "DECLARE ALL MOON ROCKS". Ultra detailed, wide angle, layered crowd, realistic lens flare, crisp small typography.

P02 - Arctic Research Desk

A top-down documentary photo of an Arctic climate research desk inside a weather station during a blizzard, with ice crystals on the window, a rugged laptop displaying a complex map, three paper field notebooks, sample vials, a steaming enamel mug, and long English text on multiple objects: the notebook cover reads "FIELD LOG: STATION NORD, WEEK 17", a whiteboard in the background reads "CORE DEPTH 42.8m / TEMP -31C / WIND 62 km/h", a red tag on a sample tube reads "DO NOT THAW", and a printed memo reads "CALIBRATE SENSORS BEFORE SUNRISE". Natural cold light, precise shadows, photorealistic texture, no blurry text.

P03 - Victorian Automaton Repair

A richly detailed Victorian workshop where a brass clockwork automaton is being repaired under green banker lamps, with tiny gears, pearl inlays, oiled leather belts, smoke from a soldering iron, magnifying glass distortion, and handwritten labels everywhere. The main blueprint title must read "AUTOMATON HAND ASSEMBLY REV. C", a drawer label says "SPRINGS / EYES / MEMORY CAMS", a dangling tag says "CLIENT: LADY ADA", and a note pinned to the wall says "DO NOT WIND PAST MIDNIGHT". Moody chiaroscuro, shallow depth of field, extremely fine mechanical detail.

P04 - Mars Greenhouse Control Room

A believable Mars greenhouse control room at dawn, red dust outside the curved windows, rows of tomatoes and dwarf wheat under violet grow lights, condensation on transparent tubes, a tired botanist reflected in a touchscreen, and several readable UI panels in English: "OXYGEN LOOP STABLE", "WATER RECOVERY 98.4%", "SECTOR C: POLLINATION DRONES ACTIVE", and a sticky note saying "Tell Earth the basil survived". Technical but warm, high resolution, realistic sci-fi, detailed glass and plant textures.

P05 - Lost Railway Poster Wall

An abandoned underground railway platform turned into an accidental archive of travel posters, peeling ceramic tiles, puddles reflecting amber emergency lights, old suitcases, vines growing through cracked concrete, and five large posters with distinct readable titles: "THE NORTHERN COMET EXPRESS", "SLEEPER TO ISTANBUL", "MIDNIGHT PLATFORM 7", "COASTAL ROUTE REOPENING SOON", and "KEEP YOUR TICKET VISIBLE". Cinematic composition, wet surfaces, layered typography, realistic grime, strong perspective down the tracks.

P06 - Miniature Courtroom Diorama

A hyperreal macro photograph of a miniature courtroom diorama built inside an antique wooden drawer, with tiny judge bench, brass lamps, dust motes, paper exhibits smaller than postage stamps, a mouse-sized witness chair, and readable text on tiny documents: a case file labeled "CASE 1842-B: THE MISSING ORRERY", an evidence tag saying "EXHIBIT C", a court calendar reading "HEARING AT 9:30", and a placard on the judge bench saying "TRUTH IN SMALL THINGS". Macro lens, tactile materials, careful scale cues.

P07 - Rainy Seoul Book Cafe

A cozy but complex rainy evening scene in a narrow Seoul book cafe, viewed through a window covered in raindrops, shelves packed with art books, two students annotating a map, a barista steaming milk, warm tungsten light, street reflections, and multiple readable English text elements: a chalkboard says "TONIGHT: QUIET READING CLUB", a receipt says "OAT LATTE / CINNAMON BUN", a book spine says "ARCHITECTURE OF DREAMS", and a window sticker says "OPEN UNTIL THE LAST TRAIN". Photorealistic, cinematic, intricate reflections.

P08 - Oceanographic Expedition Map

A dramatic captain's table aboard a storm-tossed oceanographic research vessel, with a wet nautical chart, brass dividers, sonar printouts, bioluminescent plankton glowing in a glass jar, a cracked tablet, and readable labels distributed across the image: "TRENCH SURVEY LINE B", "DEPTH 10,928m", "ROV SIGNAL WEAK", "SAMPLE: BLUE VENT WATER", and a torn note saying "If the lights pulse twice, turn back". High detail, realistic water droplets, dark blue-green atmosphere, sharp text.

P09 - Renaissance Lab Notebook

An alternate-history Renaissance laboratory where an astronomer-painter is combining oil pigments with early electrical apparatus, with celestial globes, copper coils, stained glass sunlight, anatomical sketches, a half-finished portrait, and Latin-English notebook text visible on several pages: "LIGHT STUDY: BLUE VERDITER", "GALVANIC TEST NO. 8", "VENUS RISES BEFORE DAWN", and a folded letter sealed in wax reading "FOR THE WORKSHOP MASTER ONLY". Painterly realism, ornate detail, coherent objects, readable calligraphy.

P10 - Russian Provincial Print Shop

Сложная фотореалистичная сцена в старой провинциальной типографии поздним зимним вечером: за большим деревянным столом лежат металлические литеры, корректурные листы, линейки, чашка крепкого чая, заснеженное окно, тусклая лампа и следы типографской краски на пальцах наборщика. На разных элементах изображения должен быть длинный и хорошо читаемый русский текст: на вывеске над дверью написано "ТИПОГРАФИЯ СЕВЕРНЫЙ ЛИСТОК", на корректуре заголовок "СРОЧНО В НОМЕР: ГОРОДСКОЙ СОВЕТ ОТКРЫВАЕТ НОВУЮ БИБЛИОТЕКУ", на маленькой записке фраза "Проверить букву Ё во втором абзаце", а на календаре дата "Пятница, 24 января". Много бытовых деталей, глубокие тени, реалистичная кириллица, никакой размытой каши вместо текста.

Notes

This checkpoint is intended for research and evaluation. It inherits the upstream Lens limitations and responsible AI considerations from the source model. Text rendering remains challenging, but the corrected recipe removes the obvious grid/printed texture failure seen in the all-linear UINT4 attempt.
