---
license: apache-2.0
base_model:
  - Qwen/Qwen-Image-2512
  - Qwen/Qwen-Image-Edit-2511
  - Qwen/Qwen-Image
tags:
  - image-generation
  - qwen
  - mmdit
  - abliterated
  - quantized
  - rocm
language:
  - en
library_name: diffusers
pipeline_tag: text-to-image
---

Qwen-Image-1.9

A merged, abliterated, and quantized derivative of the Qwen-Image 20B MMDiT family.

Run ID: prod-20260407
Created: 2026-04-07T18:59:37+00:00

Architecture

| Property | Value |
| --- | --- |
| Base family | Qwen-Image (MMDiT 20B) |
| Text encoder | Qwen2.5-VL |
| VAE | RGB-VAE |
| RoPE | 2D |
| Backbone parameters | ~20B |
| License | Apache-2.0 |

Source Models

| Alias | Model | Role | License |
| --- | --- | --- | --- |
| qwen-image-2512 | Qwen/Qwen-Image-2512 | foundation | Apache-2.0 |
| qwen-image-base | Qwen/Qwen-Image | ancestry-base | Apache-2.0 |
| qwen-image-edit-2511 | Qwen/Qwen-Image-Edit-2511 | edit-donor | Apache-2.0 |
| qwen-image-layered | Qwen/Qwen-Image-Layered | layer-logic-donor | Apache-2.0 |

Research Method

1. Delta-Edit Merge

The edit capability is transferred to the foundation model via a controlled delta injection:

```
edit_delta = Qwen-Image-Edit-2511 − Qwen-Image (delta base)
merged     = Qwen-Image-2512 + 0.35 × edit_delta
```

Only MMDiT backbone tensors are blended. Text encoder, VAE, and RoPE components are passed through from the foundation checkpoint unchanged.

  • Strategy: slerp
  • Blend coefficient: 0.35
  • Foundation: Qwen/Qwen-Image-2512
  • Excluded subsystems: text_encoder, vae, rope
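
The delta-injection step can be sketched with plain tensor arithmetic. This is a minimal illustration of the linear formula above, not the actual merge tooling; the key-prefix filter for excluded subsystems is an assumption:

```python
import torch

def delta_edit_merge(foundation: dict, edit: dict, delta_base: dict,
                     alpha: float = 0.35) -> dict:
    """merged = foundation + alpha * (edit - delta_base).

    Applied only to MMDiT backbone tensors; text encoder, VAE, and RoPE
    keys pass through from the foundation checkpoint unchanged.
    """
    excluded = ("text_encoder.", "vae.", "rope.")  # assumed key prefixes
    merged = {}
    for name, w in foundation.items():
        if name.startswith(excluded) or name not in edit or name not in delta_base:
            merged[name] = w.clone()  # pass through unchanged
        else:
            edit_delta = edit[name] - delta_base[name]
            merged[name] = w + alpha * edit_delta
    return merged
```

Note that a true slerp strategy would interpolate along the hypersphere between weight tensors; the sketch above implements only the linear delta addition shown in the formula.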

2. Abliteration (Refusal-Direction Removal)

Refusal-direction vectors are identified in the residual stream and projected out of target weight matrices using a norm-preserving orthogonal projection:

```
W′ = W − scale × (W @ r̂) ⊗ r̂    (norm-preserving variant)
```
  • Target layers: 18+ (attention o_proj + MLP down_proj)
  • Scale: 1.0
  • Mode: norm-preserving (preserves weight magnitude distribution)
  • Recipe: stage-3-abliteration.yaml
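
The projection can be sketched as follows. This is illustrative only: `r_hat` stands for a unit refusal-direction vector in the input space of `W`, and restoring per-row L2 norms is one plausible reading of "norm-preserving":

```python
import torch

def abliterate(W: torch.Tensor, r_hat: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Project the refusal direction r_hat out of weight matrix W.

    W' = W - scale * outer(W @ r_hat, r_hat), then rescale each row so
    its L2 norm matches the original (the "norm-preserving" variant).
    """
    r_hat = r_hat / r_hat.norm()  # ensure unit length
    W_prime = W - scale * torch.outer(W @ r_hat, r_hat)
    # norm-preserving: restore the original per-row magnitudes
    old_norms = W.norm(dim=1, keepdim=True)
    new_norms = W_prime.norm(dim=1, keepdim=True).clamp_min(1e-12)
    return W_prime * (old_norms / new_norms)
```

With `scale = 1.0` the projected weights map the refusal direction to zero (`W' @ r̂ = 0`), and the per-row rescale preserves that property because each row is only multiplied by a scalar.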

3. Quantization

| Kind | Path |
| --- | --- |
| quant_config | quant-config.json |
  • GGUF targets: Q4_K_M, IQ4_XS (with importance-matrix)
  • EXL2 target: 4.0 bpw
  • Runtime: vLLM-Omni (ROCm), ExLlamaV2
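
The idea behind the 4-bit group-quantized targets can be shown in miniature. This is a toy sketch of symmetric per-block quantization with one scale per group, not the actual Q4_K_M or IQ4_XS formats (which use nested scales and, for the IQ variants, an importance matrix):

```python
import torch

def quantize_q4_blocks(x: torch.Tensor, block: int = 32):
    """Symmetric 4-bit quantization: one fp scale per block of `block` values."""
    x = x.reshape(-1, block)
    scale = x.abs().amax(dim=1, keepdim=True) / 7.0  # map to int range [-7, 7]
    scale = scale.clamp_min(1e-12)
    q = torch.clamp(torch.round(x / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximation of the original tensor."""
    return (q.to(torch.float32) * scale).reshape(-1)
```

Each block stores 32 four-bit integers plus one scale, and the reconstruction error per element is bounded by half the block's scale.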

Hardware

  • GPU: AMD Instinct MI300X — 192 GB HBM3 VRAM
  • ROCm: 7.2.0
  • Precision: bf16 (merge + abliterate), quantized (deployment)

Usage

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "ThirdMiddle/Qwen-Image-1.9",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
pipe = pipe.to("cuda")  # ROCm builds of PyTorch also expose the "cuda" device

image = pipe(
    "a photorealistic portrait of an astronaut on Mars at sunrise",
    num_inference_steps=30,
    guidance_scale=4.0,
).images[0]
image.save("output.png")
```

License

Apache-2.0 — inherited from all source models.