SDXS-2B SDNQ T4-TEbf16
Best visual fidelity in this run: Cosmos transformer uint4, text encoder bf16, VAE bf16.
Base model: AiArtLab/sdxs-2b. Quantization engine: Disty0/sdnq.
Recommendation
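Recommended when image quality matters most and you can accept a ~3.0 GiB serialized model size and ~3.0 GiB of allocated VRAM after loading. Peak VRAM during generation is considerably higher (see the snapshot below).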
Quantization Recipe
- Transformer architecture: CosmosTransformer3DModel.
- Text encoder architecture: Qwen3_5ForConditionalGeneration.
- VAE: AutoencoderKLQwenImage, left bf16 in all variants.
- Quantized matmul: int8, use_quantized_matmul=True.
- Group size: SDNQ auto (group_size=0, resolved per layer).
- SVDQuant: disabled for this run.
- Embeddings/convs: not quantized; SDNQ skip keys preserve fragile input/output/projection paths.
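The sketch below makes the uint4 and group-size settings above more concrete. It is minimal, pure Python, and shows generic asymmetric 4-bit quantization with one scale and one offset per group. It is an illustration under stated assumptions, not SDNQ's actual algorithm or API. The function names and the fixed group_size argument are hypothetical; SDNQ's auto mode (group_size=0) resolves group sizes per layer instead.

```python
from typing import List, Tuple

# Hypothetical illustration of asymmetric uint4 quantization; NOT SDNQ's implementation.
QMAX = 15  # uint4 codes span 0..15


def quantize_group(weights: List[float]) -> Tuple[List[int], float, float]:
    """Map one group of floats to uint4 codes plus a per-group scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / QMAX if hi > lo else 1.0  # avoid division by zero for constant groups
    codes = [min(QMAX, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes: List[int], scale: float, offset: float) -> List[float]:
    """Reconstruct approximate floats from uint4 codes."""
    return [c * scale + offset for c in codes]


def quantize_grouped(weights: List[float], group_size: int):
    """Split a flat weight list into fixed-size groups (hypothetical fixed size) and quantize each."""
    return [quantize_group(weights[i:i + group_size]) for i in range(0, len(weights), group_size)]


if __name__ == "__main__":
    w = [0.12, -0.40, 0.33, 0.05, -0.21, 0.48, -0.07, 0.29]
    groups = quantize_grouped(w, group_size=4)
    restored = [x for codes, s, o in groups for x in dequantize_group(codes, s, o)]
    for original, approx in zip(w, restored):
        print(f"{original:+.3f} -> {approx:+.3f}")
```

Each weight is reconstructed to within half a quantization step of its own group. Smaller groups tighten that bound, at the cost of storing more scales and offsets.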
Components:
- transformer: target uint4, class CosmosTransformer3DModel, SDNQ layers {'uint4': 442}
Full machine-readable settings are in sdnq_recipe.json and quantization_summary.json.
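The two JSON files can be inspected directly. The sketch below assumes only that they are ordinary JSON files at the repository root. Their schema is not documented here, so it just lists each top-level key with the type of its value. hf_hub_download is the huggingface_hub download helper; summarize_json and fetch_and_summarize are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_json(path: str) -> Dict[str, str]:
    """Return each top-level key with the type name of its value (schema-agnostic)."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        return {"<root>": type(data).__name__}
    return {key: type(value).__name__ for key, value in data.items()}


def fetch_and_summarize(filename: str, repo_id: str = "WaveCut/sdxs-2b-sdnq-t4-tebf16") -> Dict[str, str]:
    """Download one file from the Hub (needs network and huggingface_hub) and summarize it."""
    from huggingface_hub import hf_hub_download  # lazy import so summarize_json works offline

    # Assumption: the file sits at the repository root under this exact name.
    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return summarize_json(local_path)
```

For example, fetch_and_summarize("sdnq_recipe.json") and fetch_and_summarize("quantization_summary.json") give a quick overview before you read the full settings.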
Size And Runtime Snapshot
Measured on an RTX 3090 with PyTorch 2.11.0+cu128, Diffusers 0.37.1, Transformers 5.8.1, SDNQ 0.1.8.
- Serialized model folder: 3.0147 GiB.
- Allocated VRAM after load: 3.04 GiB.
- Peak VRAM during 768x1152 / 40-step comparison: 8.589 GiB.
- Warmup generation for Triton/SDNQ kernels: 2.308 s.
- Card prompt generation time: 21.463 s.
- Card-style prompt generation time: 18.3 s.
The first inference after loading can include kernel compile/warmup; compare steady-state numbers after one warmup pass.
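The timing helper below follows that advice: it discards warmup passes before measuring. It is a minimal sketch, not the script used for the numbers above. time_steady_state is a hypothetical helper. The optional sync callback is assumed to block until queued GPU work finishes, so asynchronous kernels are counted.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def time_steady_state(
    run: Callable[[], object],
    warmup: int = 1,
    repeats: int = 3,
    sync: Optional[Callable[[], None]] = None,
) -> Dict[str, float]:
    """Time `run` after discarding `warmup` passes (which may include kernel compilation)."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")

    def timed_call() -> float:
        if sync is not None:
            sync()  # drain previously queued GPU work before starting the clock
        start = time.perf_counter()
        run()
        if sync is not None:
            sync()  # e.g. torch.cuda.synchronize, so asynchronous kernels are included
        return time.perf_counter() - start

    warmup_times = [timed_call() for _ in range(warmup)]
    steady = [timed_call() for _ in range(repeats)]
    return {
        "first_warmup_s": warmup_times[0] if warmup_times else float("nan"),
        "steady_median_s": statistics.median(steady),
        "steady_min_s": min(steady),
    }
```

With the pipeline from the Usage section, one could pass run=lambda: pipe(prompt=prompt, negative_prompt=negative_prompt, num_inference_steps=40) and sync=torch.cuda.synchronize. For peak VRAM, call torch.cuda.reset_peak_memory_stats() beforehand and read torch.cuda.max_memory_allocated() / 2**30 afterwards to get GiB.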
Usage
import torch
import sdnq  # registers SDNQ quantization support before loading
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    'WaveCut/sdxs-2b-sdnq-t4-tebf16',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # required: the base pipeline is custom
).to('cuda')

prompt = 'A blonde-haired Red Eyes girl with a hair ribbon, half-updo, and tsurime stands solo in a flower field holding a bouquet with a serene smile, wearing green overalls, a white shirt, rolled-up sleeves, and a straw hat with a flower while looking at the viewer under volumetric and natural lighting with a Dutch angle.'
negative_prompt = 'worst quality, low quality, loli, low details, blurry, jpeg artifacts, unfinished, sketch, sepia, missing limb, text, bad anatomy, bad proportions, bad hands, missing fingers'

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1152,  # default 768x1152 (width x height) from the base card
    width=768,
    num_inference_steps=40,
    guidance_scale=4.0,
    seed=0,
).images[0]
image.save('sample.png')
Prompting Notes From The Base Card
The upstream card recommends long, already-refined prompts rather than short labels. The model focuses on art, illustration, anime, and photo styles. It was trained at resolutions from 576 to 1152 pixels in 64-pixel steps, with 768x1152 as the default. Use a detailed positive prompt together with the negative prompt above.
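The base card gives a 576-1152 training range in 64-pixel steps, so a small helper can snap requested sizes onto that grid before calling the pipeline. This is a hypothetical convenience function, not part of the pipeline. It assumes that both height and width should stay within 576-1152 and be multiples of 64.

```python
from typing import Tuple

# Assumed per-side limits, taken from the base card's training range.
MIN_SIDE, MAX_SIDE, STEP = 576, 1152, 64


def snap_side(value: int) -> int:
    """Round one dimension to the nearest multiple of STEP, clamped to [MIN_SIDE, MAX_SIDE]."""
    snapped = STEP * round(value / STEP)  # Python rounds exact halves to even
    return max(MIN_SIDE, min(MAX_SIDE, snapped))


def snap_resolution(height: int, width: int) -> Tuple[int, int]:
    """Snap a (height, width) request onto the assumed 64-pixel training grid."""
    return snap_side(height), snap_side(width)
```

For example, snap_resolution(1100, 700) returns (1088, 704), and the default snap_resolution(1152, 768) is left unchanged.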
Visual Comparison
The grids below compare the bf16 baseline against all generated variants. Every image uses the same prompt, negative prompt, and seed, with 768x1152 output, 40 steps, and guidance scale 4.0.
Caveats
- This is an alpha-model quantization experiment, not an upstream release.
- trust_remote_code=True is required because the base pipeline is custom.
