Lens-Turbo SDNQ uint4 static
This is a corrected SDNQ static UINT4 quantized variant of microsoft/Lens-Turbo.
The first all-linear UINT4 attempt produced periodic grid artifacts and badly degraded text. An ablation found the culprit: quantizing the transformer block modulation linears (img_mod and txt_mod) damages Lens-Turbo disproportionately. This revision keeps those modulation layers in bfloat16 and quantizes the rest of the denoising transformer with SDNQ UINT4.
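To make the weight format concrete, the following is a minimal sketch of asymmetric 4-bit quantization applied to one row of weights. It is a generic illustration and not SDNQ's actual implementation: the function names `quantize_uint4` and `dequantize_uint4`, the per-row scale and offset, and the sample values are all assumptions made for this example.

```python
def quantize_uint4(row):
    """Map floats to integer codes in [0, 15] using one scale and offset per row."""
    lo, hi = min(row), max(row)
    scale = (hi - lo) / 15
    if scale == 0:
        scale = 1.0  # constant row: any non-zero scale works, every code is 0
    codes = [min(15, max(0, round((w - lo) / scale))) for w in row]
    return codes, scale, lo


def dequantize_uint4(codes, scale, lo):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [lo + c * scale for c in codes]


row = [-0.30, -0.12, 0.00, 0.07, 0.45]  # hypothetical weights
codes, scale, lo = quantize_uint4(row)
restored = dequantize_uint4(codes, scale, lo)
max_error = max(abs(a - b) for a, b in zip(row, restored))
print("codes:", codes)
print("max abs error:", round(max_error, 4), "step:", round(scale, 4))
```

With only 16 levels per row, every weight can move by up to half a quantization step. The ablation described above indicates that the `img_mod` and `txt_mod` linears do not tolerate that rounding, which is why they stay in bfloat16.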
Visual Comparison
Full-size comparison grid: the image below is built from native 1024x1024 samples without resampling the image cells and saved as WebP quality 98. Raw file: assets/comparison/comparison_grid_1to1_q98.webp.
Quantization Recipe
| Field | Value |
|---|---|
| Method | SDNQ uint4 static |
| Quantized component | transformer / LensTransformer2DModel |
| Excluded transformer layers | *.img_mod.*, *.txt_mod.* |
| Reason for exclusion | UINT4 quantization of modulation linears caused periodic grid artifacts and severe text degradation |
| Weight dtype | uint4 |
| Quantized matmul | enabled |
| Quantized matmul dtype | int8 |
| Dynamic quantization | disabled |
| SVDQuant | disabled |
| Hadamard rotation | disabled |
| Text encoder | unchanged from source checkpoint |
| VAE | unchanged from source checkpoint |
| Compute dtype | torch.bfloat16 |
| Quantization time | 0.178 s |
```json
{
  "weights_dtype": "uint4",
  "quantized_matmul_dtype": "int8",
  "group_size": 0,
  "use_static_quantization": true,
  "use_dynamic_quantization": false,
  "use_quantized_matmul": true,
  "use_svd": false,
  "use_hadamard": false,
  "quant_conv": false,
  "quant_embedding": false,
  "dequantize_fp32": false,
  "modules_to_not_convert": [
    "*.img_mod.*",
    "*.txt_mod.*"
  ],
  "modules_to_not_use_matmul": [],
  "quantization_device": "cuda",
  "return_device": "cuda"
}
```
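The `modules_to_not_convert` entries are wildcard patterns. As a rough sketch of how such patterns select layers, the snippet below applies them with Python's `fnmatch`. Two things here are assumptions: that SDNQ matches names with glob-like semantics, and the module names themselves, which are hypothetical and only illustrate the shape of a dotted module path.

```python
from fnmatch import fnmatchcase

# Patterns copied from the config above.
MODULES_TO_NOT_CONVERT = ["*.img_mod.*", "*.txt_mod.*"]

# Hypothetical module names, for illustration only.
module_names = [
    "transformer_blocks.0.img_mod.1",
    "transformer_blocks.0.txt_mod.1",
    "transformer_blocks.0.attn.to_q",
    "transformer_blocks.0.img_mlp.net.2",
]


def is_excluded(name, patterns=MODULES_TO_NOT_CONVERT):
    """Return True if the name matches any exclusion pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


excluded = [name for name in module_names if is_excluded(name)]
quantized = [name for name in module_names if not is_excluded(name)]
print("kept in bfloat16:", excluded)
print("quantized to uint4:", quantized)
```

Under these glob semantics a pattern such as `*.img_mod.*` needs a dotted component on both sides of `img_mod`, so it selects the submodules inside a modulation block rather than a module named exactly `img_mod`.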
Usage
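```python
import torch
from huggingface_hub import snapshot_download
from lens import LensPipeline, LensTransformer2DModel
from sdnq import load_sdnq_model

# Download the checkpoint (or reuse the local Hugging Face cache).
model_dir = snapshot_download("WaveCut/Lens-Turbo-SDNQ-uint4-static")

# Load the SDNQ-quantized denoising transformer on its own.
transformer = load_sdnq_model(
    model_dir + "/transformer",
    model_cls=LensTransformer2DModel,
    dtype=torch.bfloat16,
    device=torch.device("cuda"),
    dequantize_fp32=False,
    use_quantized_matmul=True,
)

# Build the pipeline around the quantized transformer; the text encoder
# and VAE are loaded unchanged from the checkpoint.
pipe = LensPipeline.from_pretrained(
    model_dir,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
).to("cuda")
```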
Benchmark
Hardware: RunPod NVIDIA H100 80GB HBM3, PyTorch 2.8.0 CUDA 12.8 container, local container disk only. Benchmark date: 2026-05-24.
| Metric | Original Lens-Turbo | SDNQ uint4 static fixed |
|---|---|---|
| Load time, seconds | 19.272 | 13.461 |
| Load peak allocated VRAM, GB | 20.807 | 17.179 |
| Load peak reserved VRAM, GB | 20.928 | 17.244 |
| Transformer tensor storage footprint, GB | 16.417 | 4.301 |
| Transformer storage reduction vs original | baseline | 73.8% smaller |
| Average prompt runtime, seconds | 1.728 | 3.663 |
Transformer-only footprint is computed from safetensors tensor storage for the denoising transformer parameter tensors only; it excludes allocator overhead and non-transformer components. The original transformer tensors are F32; the corrected SDNQ transformer stores quantized tensors as U8 plus the excluded modulation layers as BF16.
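A 4-bit value has no native tensor dtype in safetensors, which is consistent with the quantized tensors showing up as U8. One common way to get there is to pack two 4-bit codes into each byte. The sketch below shows that idea only; the nibble order (low nibble first) and the helper names `pack_uint4` and `unpack_uint4` are assumptions, and SDNQ's real storage layout may differ.

```python
def pack_uint4(codes):
    """Pack 4-bit values (0..15) into bytes, two per byte, low nibble first."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad to an even count
    return bytes(lo | (hi << 4) for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_uint4(packed, count):
    """Recover `count` 4-bit values from packed bytes."""
    out = []
    for byte in packed:
        out.append(byte & 0x0F)  # low nibble
        out.append(byte >> 4)    # high nibble
    return out[:count]


codes = [0, 4, 6, 7, 15]
packed = pack_uint4(codes)
roundtrip = unpack_uint4(packed, len(codes))
print(len(codes), "values stored in", len(packed), "bytes")
```

Packing gives half a byte per quantized weight before scales and other metadata are counted, against four bytes per weight for the original F32 tensors.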
Model CPU Offload Benchmark
Same hardware and the same 10 prompts, with pipe.enable_model_cpu_offload() enabled. The reported load time uses a warm local Hugging Face cache on the container disk, so model download time is excluded. Each model was measured in a fresh Python process. Cold generation is P01, the first generation immediately after load/offload setup; warm generation aggregates P02-P10.
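The cold/warm split described above can be sketched as a small timing harness. This is not the script behind the published numbers: `time_prompts` and `fake_generate` are hypothetical names, and the stand-in function only sleeps so that the example runs without a GPU.

```python
import statistics
import time


def time_prompts(generate, prompts):
    """Time each call; report the first as cold and aggregate the rest as warm."""
    if len(prompts) < 2:
        raise ValueError("need at least one cold and one warm prompt")
    durations = []
    for prompt in prompts:
        start = time.perf_counter()
        generate(prompt)
        durations.append(time.perf_counter() - start)
    warm = durations[1:]
    return {
        "cold_s": durations[0],
        "warm_mean_s": statistics.mean(warm),
        "warm_median_s": statistics.median(warm),
    }


def fake_generate(prompt):
    """Stand-in for a pipeline call; swap in the real generation call."""
    time.sleep(0.001)


prompt_ids = [f"P{i:02d}" for i in range(1, 11)]  # P01 .. P10
stats = time_prompts(fake_generate, prompt_ids)
print(stats)
```

When timing real CUDA work, the wall clock should only be read after the GPU has finished, for example by calling `torch.cuda.synchronize()` before each `perf_counter()` reading. Otherwise asynchronous kernel launches can make a run look faster than it was.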
| Metric | Original Lens-Turbo | SDNQ uint4 static fixed |
|---|---|---|
| Offload setup/load time, seconds | 15.411 | 12.371 |
| Offload setup peak allocated VRAM, GB | 12.582 | 12.582 |
| Offload setup peak reserved VRAM, GB | 13.881 | 13.881 |
| Cold generation time, seconds | 8.434 | 8.440 |
| Cold generation peak allocated VRAM, GB | 18.945 | 15.085 |
| Cold generation peak reserved VRAM, GB | 19.262 | 15.238 |
| Warm generation average time, seconds | 5.731 | 4.976 |
| Warm generation median time, seconds | 5.141 | 3.855 |
| Warm generation average peak allocated VRAM, GB | 18.945 | 15.084 |
| Warm generation average peak reserved VRAM, GB | 19.267 | 15.249 |
| Warm generation max peak allocated VRAM, GB | 18.968 | 15.104 |
| Warm generation max peak reserved VRAM, GB | 19.290 | 15.280 |
Raw offload benchmark data: model_cpu_offload_benchmark.json.
In model_cpu_offload mode the setup/load VRAM peak is dominated by non-transformer components, so the load peak is effectively unchanged. During generation, where the denoising transformer is active, the SDNQ variant saves about 3.861 GB peak allocated VRAM on the warm prompts, a 20.4% reduction versus the original model.
10-Prompt Matrix
| ID | Scenario | Seed | Original time, s | Quant time, s | Delta | Original peak allocated VRAM, GB | Quant peak allocated VRAM, GB |
|---|---|---|---|---|---|---|---|
| P01 | Orbital Night Market | 101 | 1.579 | 2.268 | +43.6% | 23.245 | 19.585 |
| P02 | Arctic Research Desk | 102 | 1.370 | 4.307 | +214.4% | 23.245 | 19.585 |
| P03 | Victorian Automaton Repair | 103 | 3.190 | 4.111 | +28.9% | 23.244 | 19.585 |
| P04 | Mars Greenhouse Control Room | 104 | 1.191 | 4.094 | +243.7% | 23.242 | 19.582 |
| P05 | Lost Railway Poster Wall | 105 | 1.195 | 3.672 | +207.3% | 23.242 | 19.582 |
| P06 | Miniature Courtroom Diorama | 106 | 1.188 | 3.577 | +201.1% | 23.244 | 19.584 |
| P07 | Rainy Seoul Book Cafe | 107 | 1.190 | 3.597 | +202.3% | 23.244 | 19.585 |
| P08 | Oceanographic Expedition Map | 108 | 1.184 | 3.695 | +212.1% | 23.244 | 19.584 |
| P09 | Renaissance Lab Notebook | 109 | 1.197 | 3.648 | +204.8% | 23.242 | 19.582 |
| P10 | Russian Provincial Print Shop | 110 | 3.993 | 3.664 | -8.2% | 23.252 | 19.593 |
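The Delta column and the average runtimes can be recomputed from the per-prompt timings. The sketch below does that with the values copied from the matrix; the variable and function names are illustrative.

```python
import statistics

# (original_s, quant_s) per prompt, copied from the matrix above.
timings = {
    "P01": (1.579, 2.268),
    "P02": (1.370, 4.307),
    "P03": (3.190, 4.111),
    "P04": (1.191, 4.094),
    "P05": (1.195, 3.672),
    "P06": (1.188, 3.577),
    "P07": (1.190, 3.597),
    "P08": (1.184, 3.695),
    "P09": (1.197, 3.648),
    "P10": (3.993, 3.664),
}


def delta_percent(original, quant):
    """Relative change of the quantized time versus the original, in percent."""
    return (quant - original) / original * 100.0


deltas = {pid: round(delta_percent(o, q), 1) for pid, (o, q) in timings.items()}
original_times = [o for o, _ in timings.values()]
quant_times = [q for _, q in timings.values()]

mean_original = round(statistics.mean(original_times), 3)
mean_quant = round(statistics.mean(quant_times), 3)
median_original = round(statistics.median(original_times), 3)
median_quant = round(statistics.median(quant_times), 3)

print("delta P01:", deltas["P01"], "delta P10:", deltas["P10"])
print("mean:", mean_original, "vs", mean_quant)
print("median:", median_original, "vs", median_quant)
```

The means reproduce the average prompt runtimes in the Benchmark table (1.728 s and 3.663 s). The medians are 1.196 s and 3.668 s, so the two slow original runs (P03 and P10) raise the original model's mean noticeably, while the quantized timings are more evenly spread.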
Full Prompts
P01 - Orbital Night Market
A dense cinematic night market inside a transparent orbital habitat, with Earth curving below the glass floor, vendors selling glowing algae noodles and tiny repair drones, rain droplets floating in zero gravity, reflections on wet metal, and at least six readable signs in different places: a vertical neon sign saying "ORBITAL TEA HOUSE", a handwritten chalk menu saying "NO GRAVITY REFUNDS", a yellow safety placard saying "MAG BOOTS REQUIRED", a small receipt printer label saying "BAY 12 PICKUP", a red banner saying "FRESH SYNTH-MANGO", and a blue customs notice saying "DECLARE ALL MOON ROCKS". Ultra detailed, wide angle, layered crowd, realistic lens flare, crisp small typography.
P02 - Arctic Research Desk
A top-down documentary photo of an Arctic climate research desk inside a weather station during a blizzard, with ice crystals on the window, a rugged laptop displaying a complex map, three paper field notebooks, sample vials, a steaming enamel mug, and long English text on multiple objects: the notebook cover reads "FIELD LOG: STATION NORD, WEEK 17", a whiteboard in the background reads "CORE DEPTH 42.8m / TEMP -31C / WIND 62 km/h", a red tag on a sample tube reads "DO NOT THAW", and a printed memo reads "CALIBRATE SENSORS BEFORE SUNRISE". Natural cold light, precise shadows, photorealistic texture, no blurry text.
P03 - Victorian Automaton Repair
A richly detailed Victorian workshop where a brass clockwork automaton is being repaired under green banker lamps, with tiny gears, pearl inlays, oiled leather belts, smoke from a soldering iron, magnifying glass distortion, and handwritten labels everywhere. The main blueprint title must read "AUTOMATON HAND ASSEMBLY REV. C", a drawer label says "SPRINGS / EYES / MEMORY CAMS", a dangling tag says "CLIENT: LADY ADA", and a note pinned to the wall says "DO NOT WIND PAST MIDNIGHT". Moody chiaroscuro, shallow depth of field, extremely fine mechanical detail.
P04 - Mars Greenhouse Control Room
A believable Mars greenhouse control room at dawn, red dust outside the curved windows, rows of tomatoes and dwarf wheat under violet grow lights, condensation on transparent tubes, a tired botanist reflected in a touchscreen, and several readable UI panels in English: "OXYGEN LOOP STABLE", "WATER RECOVERY 98.4%", "SECTOR C: POLLINATION DRONES ACTIVE", and a sticky note saying "Tell Earth the basil survived". Technical but warm, high resolution, realistic sci-fi, detailed glass and plant textures.
P05 - Lost Railway Poster Wall
An abandoned underground railway platform turned into an accidental archive of travel posters, peeling ceramic tiles, puddles reflecting amber emergency lights, old suitcases, vines growing through cracked concrete, and five large posters with distinct readable titles: "THE NORTHERN COMET EXPRESS", "SLEEPER TO ISTANBUL", "MIDNIGHT PLATFORM 7", "COASTAL ROUTE REOPENING SOON", and "KEEP YOUR TICKET VISIBLE". Cinematic composition, wet surfaces, layered typography, realistic grime, strong perspective down the tracks.
P06 - Miniature Courtroom Diorama
A hyperreal macro photograph of a miniature courtroom diorama built inside an antique wooden drawer, with tiny judge bench, brass lamps, dust motes, paper exhibits smaller than postage stamps, a mouse-sized witness chair, and readable text on tiny documents: a case file labeled "CASE 1842-B: THE MISSING ORRERY", an evidence tag saying "EXHIBIT C", a court calendar reading "HEARING AT 9:30", and a placard on the judge bench saying "TRUTH IN SMALL THINGS". Macro lens, tactile materials, careful scale cues.
P07 - Rainy Seoul Book Cafe
A cozy but complex rainy evening scene in a narrow Seoul book cafe, viewed through a window covered in raindrops, shelves packed with art books, two students annotating a map, a barista steaming milk, warm tungsten light, street reflections, and multiple readable English text elements: a chalkboard says "TONIGHT: QUIET READING CLUB", a receipt says "OAT LATTE / CINNAMON BUN", a book spine says "ARCHITECTURE OF DREAMS", and a window sticker says "OPEN UNTIL THE LAST TRAIN". Photorealistic, cinematic, intricate reflections.
P08 - Oceanographic Expedition Map
A dramatic captain's table aboard a storm-tossed oceanographic research vessel, with a wet nautical chart, brass dividers, sonar printouts, bioluminescent plankton glowing in a glass jar, a cracked tablet, and readable labels distributed across the image: "TRENCH SURVEY LINE B", "DEPTH 10,928m", "ROV SIGNAL WEAK", "SAMPLE: BLUE VENT WATER", and a torn note saying "If the lights pulse twice, turn back". High detail, realistic water droplets, dark blue-green atmosphere, sharp text.
P09 - Renaissance Lab Notebook
An alternate-history Renaissance laboratory where an astronomer-painter is combining oil pigments with early electrical apparatus, with celestial globes, copper coils, stained glass sunlight, anatomical sketches, a half-finished portrait, and Latin-English notebook text visible on several pages: "LIGHT STUDY: BLUE VERDITER", "GALVANIC TEST NO. 8", "VENUS RISES BEFORE DAWN", and a folded letter sealed in wax reading "FOR THE WORKSHOP MASTER ONLY". Painterly realism, ornate detail, coherent objects, readable calligraphy.
P10 - Russian Provincial Print Shop
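Сложная фотореалистичная сцена в старой провинциальной типографии поздним зимним вечером: за большим деревянным столом лежат металлические литеры, корректурные листы, линейки, чашка крепкого чая, заснеженное окно, тусклая лампа и следы типографской краски на пальцах наборщика. На разных элементах изображения должен быть длинный и хорошо читаемый русский текст: на вывеске над дверью написано "ТИПОГРАФИЯ СЕВЕРНЫЙ ЛИСТОК", на корректуре заголовок "СРОЧНО В НОМЕР: ГОРОДСКОЙ СОВЕТ ОТКРЫВАЕТ НОВУЮ БИБЛИОТЕКУ", на маленькой записке фраза "Проверить букву Ё во втором абзаце", а на календаре дата "Пятница, 24 января". Много бытовых деталей, глубокие тени, реалистичная кириллица, никакой размытой каши вместо текста.
English translation, for reference (the Russian text above is the prompt as run): A complex photorealistic scene in an old provincial print shop late on a winter evening: at a large wooden table lie metal type, proof sheets, rulers, a cup of strong tea, a snow-covered window, a dim lamp, and traces of printing ink on the typesetter's fingers. Different elements of the image must carry long, clearly readable Russian text: the sign above the door reads "ТИПОГРАФИЯ СЕВЕРНЫЙ ЛИСТОК" ("Northern Leaflet Print Shop"), the proof sheet carries the headline "СРОЧНО В НОМЕР: ГОРОДСКОЙ СОВЕТ ОТКРЫВАЕТ НОВУЮ БИБЛИОТЕКУ" ("Urgent for this issue: city council opens a new library"), a small note says "Проверить букву Ё во втором абзаце" ("Check the letter Ё in the second paragraph"), and the calendar shows the date "Пятница, 24 января" ("Friday, 24 January"). Many everyday details, deep shadows, realistic Cyrillic, no blurry mush in place of text.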
Notes
This checkpoint is intended for research and evaluation. It inherits the upstream Lens limitations and responsible AI considerations from the source model. Text rendering remains challenging, but the corrected recipe removes the obvious grid/printed texture failure seen in the all-linear UINT4 attempt.