SDXS-2B SDNQ T4-TEbf16

Best visual fidelity in this run: Cosmos transformer uint4, text encoder bf16, VAE bf16.

Base model: AiArtLab/sdxs-2b. Quantization engine: Disty0/sdnq.

Recommendation

Recommended when image quality matters most and roughly 3.0 GiB on disk plus roughly 3.0 GiB of VRAM after load is acceptable. Peak VRAM during generation is higher; see the snapshot below.

Quantization Recipe

  • Transformer architecture: CosmosTransformer3DModel.
  • Text encoder architecture: Qwen3_5ForConditionalGeneration.
  • VAE: AutoencoderKLQwenImage, left bf16 in all variants.
  • Quantized matmul: int8, use_quantized_matmul=True.
  • Group size: SDNQ auto (group_size=0, resolves per layer).
  • SVDQuant: disabled for this run.
  • Embeddings/convs: not quantized; SDNQ skip keys preserve fragile input/output/projection paths.

Components:

  • transformer: target uint4, class CosmosTransformer3DModel, SDNQ layers {'uint4': 442}

Full machine-readable settings are in sdnq_recipe.json and quantization_summary.json.
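
To inspect those files without cloning the repository, something like the following should work. It is a minimal sketch: it assumes huggingface_hub is installed and that both JSON files sit at the repository root, and it assumes nothing about the keys inside them, so it only pretty-prints whatever it finds. The helper names load_settings, fetch_settings and print_settings are illustrative, not part of any library.

```python
import json
from pathlib import Path


def load_settings(path):
    """Parse one JSON settings file from disk and return its contents."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def fetch_settings(repo_id, filename):
    """Download one file from the Hub and parse it as JSON.

    Assumes the file lives at the repository root.
    """
    # Imported lazily so load_settings() works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return load_settings(hf_hub_download(repo_id=repo_id, filename=filename))


def print_settings(repo_id, filenames=("sdnq_recipe.json", "quantization_summary.json")):
    """Pretty-print each settings file; no particular keys are assumed."""
    for name in filenames:
        print(f"--- {name} ---")
        print(json.dumps(fetch_settings(repo_id, name), indent=2))
```

Calling print_settings('WaveCut/sdxs-2b-sdnq-t4-tebf16') would then dump both files to the terminal.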

Size And Runtime Snapshot

Measured on an RTX 3090 with PyTorch 2.11.0+cu128, Diffusers 0.37.1, Transformers 5.8.1, SDNQ 0.1.8.

  • Serialized model folder: 3.0147 GiB.
  • Loaded allocated VRAM after load: 3.04 GiB.
  • Peak VRAM during 768x1152 / 40-step comparison: 8.589 GiB.
  • Warmup generation for Triton/SDNQ kernels: 2.308 s.
  • Card prompt generation time: 21.463 s.
  • Card-style prompt generation time: 18.3 s.
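
The card does not say how the folder size was measured. One plausible way to reproduce a figure like the serialized size above is to sum file sizes under a local copy of the model, as in this sketch. The function name folder_size_gib is illustrative, and the choice to skip symlinks is an assumption made here to avoid double counting.

```python
import os

GIB = 1024 ** 3  # bytes per GiB (binary unit, matching the figures above)


def folder_size_gib(root):
    """Sum the sizes of all regular files under root and return the total in GiB."""
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks so a file reachable through a link is not counted twice.
            if not os.path.islink(path):
                total_bytes += os.path.getsize(path)
    return total_bytes / GIB
```

Point it at a folder of real files, for example one written by save_pretrained. For the VRAM figures, PyTorch exposes torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated(), which report current and peak allocated memory; whether the numbers above came from those calls is not stated.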

The first inference after loading can include kernel compile/warmup; compare steady-state numbers after one warmup pass.
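
A warmup-then-measure loop along these lines keeps the one-off compile cost out of the reported number. This is a generic sketch rather than the script used for the figures above; time_steady_state is a hypothetical helper name, and the defaults of one warmup call and three timed calls are arbitrary.

```python
import time


def time_steady_state(fn, warmup=1, runs=3):
    """Call fn `warmup` times untimed, then return the mean wall time of `runs` timed calls."""
    for _ in range(warmup):
        fn()  # absorbs one-off costs such as kernel compilation
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)
```

With the pipeline from the Usage section loaded, time_steady_state(lambda: pipe(prompt=prompt, negative_prompt=negative_prompt)) would give a steady-state average in seconds. Because the pipeline returns a decoded image, each call should block until the GPU work is done; when timing raw GPU operations instead, call torch.cuda.synchronize() before reading the clock.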

Usage

import torch
import sdnq  # registers SDNQ quantization support before loading
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    'WaveCut/sdxs-2b-sdnq-t4-tebf16',
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to('cuda')

prompt = 'A blonde-haired Red Eyes girl with a hair ribbon, half-updo, and tsurime stands solo in a flower field holding a bouquet with a serene smile, wearing green overalls, a white shirt, rolled-up sleeves, and a straw hat with a flower while looking at the viewer under volumetric and natural lighting with a Dutch angle.'
negative_prompt = 'worst quality, low quality, loli, low details, blurry, jpeg artifacts, unfinished, sketch, sepia, missing limb, text, bad anatomy, bad proportions, bad hands, missing fingers'

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=1152,
    width=768,
    num_inference_steps=40,
    guidance_scale=4.0,
    seed=0,
).images[0]
image.save('sample.png')

Prompting Notes From The Base Card

The upstream card recommends long, already-refined prompts rather than short tag-like labels. The model targets art, illustration, anime, and photo styles. It was trained at resolutions from 576 to 1152 pixels in 64-pixel steps, and the default output is 768x1152. Use a detailed positive prompt together with the negative prompt above.
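
To keep a requested size inside the trained range, a small helper can clamp each dimension and round it to the 64-pixel grid. This is a sketch under assumptions: the card does not say whether the pipeline rejects other sizes, the name snap_to_training_grid is hypothetical, and the defaults simply restate the 576 to 1152 range and 64-pixel step quoted above.

```python
def snap_to_training_grid(value, low=576, high=1152, step=64):
    """Clamp an integer pixel size to [low, high] and round it to the nearest grid point."""
    clamped = max(low, min(high, value))
    # Round to the nearest multiple of `step` measured from `low`; exact ties round up.
    offset = (clamped - low + step // 2) // step * step
    return min(high, low + offset)


print(snap_to_training_grid(1000))  # 1024
print(snap_to_training_grid(400))   # 576
```

The results can be passed as height and width in the Usage call, for example height=snap_to_training_grid(1100), which yields 1088.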

Visual Comparison

The grids below compare the bf16 baseline against all generated variants, using the same prompt, negative prompt, and seed, at 768x1152 output, 40 steps, and guidance scale 4.0.

Card sample comparison

Card-style prompt comparison

Caveats

  • This is an alpha-model quantization experiment, not an upstream release.
  • trust_remote_code=True is required because the base pipeline is custom.