Z-Image-Turbo × Fuliji — LoRA Adapter

A PEFT LoRA adapter trained on top of Tongyi-MAI/Z-Image-Turbo to learn the visual identity of 8 Chinese anime/illustration artists from the Fuliji dataset.

Looking for the ready-to-run merged model? Use DownFlow/Z-Image-Turbo-Fuli — the LoRA weights have been baked into the base model and can be served directly.


Adapter Details

Property Value
Base model Tongyi-MAI/Z-Image-Turbo
LoRA rank 32
LoRA alpha 32
Target modules to_q, to_k, to_v, w1, w2, w3
Trainable params ~39 M
Adapter size ~271 MB
Training steps 5 000 (3 000 at lr=1e-4 + 2 000 continued at lr=5e-5, EMA)
Training resolution 512 × 512
Dataset DownFlow/fuliji (8 artists, ~200 images)

Quick Start (Python + Diffusers + PEFT)

1 — Install dependencies

pip install diffusers transformers peft accelerate safetensors

2 — Generate with artist trigger token

import torch
from diffusers import DiffusionPipeline
from peft import PeftModel

DEVICE = "cuda"
BASE_MODEL = "Tongyi-MAI/Z-Image-Turbo"
ADAPTER = "DownFlow/Z-Image-Turbo-Fuli-LoRA"

# Load base pipeline
pipe = DiffusionPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
).to(DEVICE)

# Attach LoRA adapter to the transformer
pipe.transformer = PeftModel.from_pretrained(pipe.transformer, ADAPTER)

# Generate — prepend the artist's trigger token
# Trained artists: 萌芽儿o0, 年年, 封疆疆v, 焖焖碳, 星之迟迟, 蠢沫沫, 雨波HaneAme, 清水由乃
image = pipe(
    prompt="by 蠢沫沫, 1girl, solo, smile, looking at viewer, soft lighting",
    num_inference_steps=8,
    guidance_scale=0.0,   # Z-Image Turbo uses CFG=0
    height=512,
    width=512,
).images[0]

image.save("output.png")

3 — Adjust LoRA influence at runtime

PEFT exposes a scaling multiplier per adapter. Increase it to push the style harder:

# After PeftModel.from_pretrained ...
for module in pipe.transformer.modules():
    if hasattr(module, "scaling"):
        module.scaling = {k: v * 3.0 for k, v in module.scaling.items()}

Recommended value: 3.0 (step-5000 EMA, strong identity with no colour artefacts on 8-step inference). Lighter alternative: 1.2. Values above 5 may saturate style.
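The loop above can be wrapped in a small reusable helper. This is an illustrative sketch, not part of PEFT's public API: the `scale_lora` name is an assumption, and the duck-typed check for a `scaling` dict mirrors how PEFT's LoRA layers store their per-adapter multiplier.

```python
from typing import Iterable


def scale_lora(modules: Iterable, factor: float) -> int:
    """Multiply every LoRA `scaling` entry found in `modules` by `factor`.

    Returns how many modules were rescaled, so callers can sanity-check
    that the adapter was actually attached before generating.
    """
    touched = 0
    for module in modules:
        scaling = getattr(module, "scaling", None)
        if isinstance(scaling, dict):
            module.scaling = {k: v * factor for k, v in scaling.items()}
            touched += 1
    return touched


# Usage with the pipeline from step 2 (after PeftModel.from_pretrained):
# n = scale_lora(pipe.transformer.modules(), 3.0)
# assert n > 0, "no LoRA layers found -- was the adapter loaded?"
```

Returning the count makes a silent no-op (e.g. the adapter was never attached) easy to catch.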


Merge and Unload (for maximum inference speed)

Baking the LoRA into the base weights eliminates PEFT overhead entirely:

import torch
from diffusers import DiffusionPipeline
from peft import PeftModel

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)

pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    "DownFlow/Z-Image-Turbo-Fuli-LoRA",
)
pipe.transformer = pipe.transformer.merge_and_unload()

pipe.to("cuda")

image = pipe(
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]

Serving with vLLM

vLLM (≥ 0.8) supports serving diffusion pipelines via an OpenAI-compatible /v1/images/generations endpoint.

Recommended flow for vLLM: use the pre-merged model so no PEFT dependency is needed at serve time.

Option A — Serve the merged model (recommended)

pip install "vllm>=0.8.0"

vllm serve DownFlow/Z-Image-Turbo-Fuli \
    --task generate \
    --dtype bfloat16 \
    --max-model-len 512 \
    --port 8000

Then call the endpoint:

curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DownFlow/Z-Image-Turbo-Fuli",
    "prompt": "by 蠢沫沫, 1girl, smile, soft watercolour style",
    "n": 1,
    "size": "512x512"
  }'

Or via the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.images.generate(
    model="DownFlow/Z-Image-Turbo-Fuli",
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    n=1,
    size="512x512",
)
print(response.data[0].url)

Option B — Serve with dynamic LoRA (experimental)

vLLM supports dynamic LoRA module loading for LLMs; diffusion pipeline LoRA support is still experimental. If your vLLM build supports --enable-lora for image models:

vllm serve Tongyi-MAI/Z-Image-Turbo \
    --task generate \
    --dtype bfloat16 \
    --enable-lora \
    --lora-modules "fuliji=DownFlow/Z-Image-Turbo-Fuli-LoRA" \
    --port 8000

Request with the LoRA active:

curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fuliji",
    "prompt": "by 雨波HaneAme, 1girl, beach, summer",
    "n": 1,
    "size": "512x512"
  }'

Trained Artist Trigger Tokens

Prepend "by <artist>, " to the start of your prompt.

Token Approx. images in training set
萌芽儿o0 30
年年 26
封疆疆v 26
焖焖碳 26
星之迟迟 25
蠢沫沫 23
雨波HaneAme 23
清水由乃 21
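The trigger convention can be enforced with a tiny helper so typos in artist tokens fail fast instead of silently producing off-style images. `TRIGGERS` and `build_prompt` are illustrative names, not something shipped with the adapter:

```python
# Trigger tokens from the table above.
TRIGGERS = {
    "萌芽儿o0", "年年", "封疆疆v", "焖焖碳",
    "星之迟迟", "蠢沫沫", "雨波HaneAme", "清水由乃",
}


def build_prompt(artist: str, tags: str) -> str:
    """Prefix `tags` with the artist trigger token the adapter was trained on."""
    if artist not in TRIGGERS:
        raise ValueError(f"unknown trigger token: {artist!r}")
    return f"by {artist}, {tags}"


# build_prompt("年年", "1girl, white dress")  ->  "by 年年, 1girl, white dress"
```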

Training Details

  • Base model: Tongyi-MAI/Z-Image-Turbo (8-step flow matching, CFG-free)
  • Method: PEFT LoRA, rank=32, alpha=32, dropout=0.05
  • Dataset: DownFlow/fuliji filtered to artists with ≥ 21 images
  • Steps: 5 000 — 3 000 initial (lr=1e-4) + 2 000 continuation (lr=5e-5, resumed from step 3000 EMA)
  • Optimizer: AdamW, lr=1e-4→5e-5, warmup=100 steps each phase
  • Batch: 1 × 4 gradient accumulation = effective batch 4
  • Augmentation: horizontal flip, caption dropout 5%, timestep bias 1.2
  • Regularisation: 25% of batches sample from a 277-image generic dataset
  • Hardware: AMD MI300X, ROCm 6.2, bf16
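For reproduction, the hyperparameters above translate to roughly the following PEFT configuration. This is a hedged reconstruction: rank, alpha, dropout, and target modules are taken from the list above, but the training script itself is not published, so treat everything else (left at PEFT defaults here) as an assumption.

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above; all other
# fields are PEFT defaults and may differ from the actual run.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["to_q", "to_k", "to_v", "w1", "w2", "w3"],
)
```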
