ovedrive committed on
Commit eeeeb81 · verified · 1 Parent(s): a84ae71

Update README.md

Files changed (1)
  1. README.md +72 -2
README.md CHANGED
@@ -1,11 +1,81 @@
  ---
- license: apache-2.0
  language:
  - en
  - zh
  library_name: diffusers
  pipeline_tag: text-to-image
  ---
  <p align="center">
  <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_logo.png" width="400"/>
  <p>
@@ -225,4 +295,4 @@ If Qwen-Image-2512 proves helpful in your research, we’d greatly appreciate yo
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.02324},
  }
- ```
 
  ---
+ license: cc-by-nc-sa-4.0
  language:
  - en
  - zh
+ quantized_by: Abhishek Dujari
  library_name: diffusers
  pipeline_tag: text-to-image
+ base_model:
+ - Qwen/Qwen-Image-2512
+ base_model_relation: quantized
  ---
+
+ This is an NF4-quantized version of Qwen-Image-2512, so it can run on GPUs with 20GB of VRAM; it also works on lower-VRAM cards such as 16GB.
+ Other NF4 models made the mistake of blindly quantizing every layer in the transformer. This one does not:
+ we retain selected layers at full precision to ensure quality output.
+
+ You can use the original Qwen-Image parameters as-is, though I recommend at least 20 inference steps.
+
+ This model is available for inference, modification, and commercial use; contact support AT justlab.ai.
+
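For readers unfamiliar with NF4, here is a minimal, self-contained sketch of what 4-bit NormalFloat quantization does to a weight tensor. This is only an illustration of the idea, not the bitsandbytes kernel used for this checkpoint: weights are split into blocks, each block is scaled by its absolute maximum, and every value is snapped to the nearest of 16 fixed levels (so each weight stores only a 4-bit index plus a per-block scale).

```python
import torch

# The 16 NF4 levels (quantiles of a standard normal, normalized to [-1, 1]),
# as published with the QLoRA paper.
NF4_LEVELS = torch.tensor([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_roundtrip(weights: torch.Tensor, block_size: int = 64) -> torch.Tensor:
    """Quantize a 1-D weight tensor to NF4 and dequantize it again."""
    blocks = weights.reshape(-1, block_size)
    absmax = blocks.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)  # per-block scale
    normalized = blocks / absmax                                      # into [-1, 1]
    # Nearest NF4 level; `idx` is the 4-bit code that would actually be stored.
    idx = (normalized.unsqueeze(-1) - NF4_LEVELS).abs().argmin(dim=-1)
    return (NF4_LEVELS[idx] * absmax).reshape(weights.shape)          # dequantize

w = torch.randn(256)
w_hat = nf4_roundtrip(w)
print(f"max abs reconstruction error: {(w - w_hat).abs().max():.4f}")
```

Layers whose weights are poorly served by this 16-level grid are exactly the ones worth keeping at full precision, which is the motivation for the selective quantization described above.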
+ ```python
+ from diffusers import DiffusionPipeline
+ import torch
+
+ model_name = "ovedrive/qwen-image-2512-4bit"
+
+ # Load the pipeline on GPU if available, otherwise fall back to CPU
+ if torch.cuda.is_available():
+     torch_dtype = torch.bfloat16
+     device = "cuda"
+ else:
+     torch_dtype = torch.float32
+     device = "cpu"
+
+ pipe = DiffusionPipeline.from_pretrained(model_name, torch_dtype=torch_dtype)
+ pipe = pipe.to(device)
+
+ positive_magic = {
+     "en": "Ultra HD, 4K, cinematic composition.",  # for English prompts
+     "zh": "超清,4K,电影级构图",  # for Chinese prompts
+ }
+
+ # Generate image
+ prompt = '''A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197". Ultra HD, 4K, cinematic composition'''
+
+ negative_prompt = " "  # use an empty prompt if you have no specific concept to remove
+
+ # Generate with different aspect ratios
+ aspect_ratios = {
+     "1:1": (1328, 1328),
+     "16:9": (1664, 928),
+     "9:16": (928, 1664),
+     "4:3": (1472, 1140),
+     "3:4": (1140, 1472),
+     "3:2": (1584, 1056),
+     "2:3": (1056, 1584),
+ }
+
+ width, height = aspect_ratios["16:9"]
+
+ image = pipe(
+     prompt=prompt + positive_magic["en"],
+     negative_prompt=negative_prompt,
+     width=width,
+     height=height,
+     num_inference_steps=20,
+     true_cfg_scale=4.0,
+     generator=torch.Generator(device=device).manual_seed(42)
+ ).images[0]
+
+ image.save("example.png")
+ ```
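If you are near the VRAM limit (for example on a 16GB card), diffusers' CPU-offload helper can trade speed for memory. A minimal sketch, assuming the same checkpoint as above (this fragment downloads the model and requires `accelerate` to be installed, so it is shown as a configuration snippet rather than a complete script):

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "ovedrive/qwen-image-2512-4bit", torch_dtype=torch.bfloat16
)
# Moves each submodule to the GPU only while it is executing and back to
# CPU RAM afterwards, so peak VRAM stays well below the full model footprint.
# Do not also call pipe.to("cuda") when using offloading.
pipe.enable_model_cpu_offload()
```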
+
+ The original Qwen-Image attributions are included verbatim below.
+
  <p align="center">
  <img src="https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/qwen_image_logo.png" width="400"/>
  <p>
 
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.02324},
  }
+ ```