Z-Image-Turbo × Fuliji — LoRA Adapter

A PEFT LoRA adapter trained on top of Tongyi-MAI/Z-Image-Turbo to learn the visual identity of 8 Chinese anime/illustration artists from the Fuliji dataset.

Looking for the ready-to-run merged model? Use DownFlow/Z-Image-Turbo-Fuli — the LoRA weights have been baked into the base model and can be served directly.


Adapter Details

Property Value
Base model Tongyi-MAI/Z-Image-Turbo
LoRA rank 32
LoRA alpha 32
Target modules to_q, to_k, to_v, w1, w2, w3
Trainable params ~39 M
Adapter size ~271 MB
Training steps 5 000 (3 000 at lr=1e-4 + 2 000 continued at lr=5e-5, EMA)
Training resolution 512 × 512
Dataset DownFlow/fuliji (8 artists, ~200 images)

Quick Start (Python + Diffusers + PEFT)

1 — Install dependencies

pip install diffusers transformers peft accelerate safetensors

2 — Generate with artist trigger token

import torch
from diffusers import DiffusionPipeline
from peft import PeftModel

DEVICE = "cuda"
BASE_MODEL = "Tongyi-MAI/Z-Image-Turbo"
ADAPTER = "DownFlow/Z-Image-Turbo-Fuli-LoRA"

# Load base pipeline
pipe = DiffusionPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
).to(DEVICE)

# Attach LoRA adapter to the transformer
pipe.transformer = PeftModel.from_pretrained(pipe.transformer, ADAPTER)

# Generate — prepend the artist's trigger token
# Trained artists: 萌芽儿o0, 年年, 封疆疆v, 焖焖碳, 星之迟迟, 蠢沫沫, 雨波HaneAme, 清水由乃
image = pipe(
    prompt="by 蠢沫沫, 1girl, solo, smile, looking at viewer, soft lighting",
    num_inference_steps=8,
    guidance_scale=0.0,   # Z-Image Turbo uses CFG=0
    height=512,
    width=512,
).images[0]

image.save("output.png")

3 — Adjust LoRA influence at runtime

PEFT exposes a scaling multiplier per adapter. Increase it to push the style harder:

# After PeftModel.from_pretrained ...
for module in pipe.transformer.modules():
    if hasattr(module, "scaling"):
        module.scaling = {k: v * 3.0 for k, v in module.scaling.items()}

Recommended value: 3.0 (step-5000 EMA, strong identity with no colour artefacts on 8-step inference). Lighter alternative: 1.2. Values above 5 may saturate style.
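The loop above can be wrapped in a small reusable helper. This is an illustrative sketch, not part of PEFT's public API: the `scale_lora` name is an assumption, and the duck-typed check for a `scaling` dict mirrors how PEFT's LoRA layers store their per-adapter multiplier.

```python
from typing import Iterable


def scale_lora(modules: Iterable, factor: float) -> int:
    """Multiply every LoRA `scaling` entry found in `modules` by `factor`.

    Returns how many modules were rescaled, so callers can sanity-check
    that the adapter was actually attached before generating.
    """
    touched = 0
    for module in modules:
        scaling = getattr(module, "scaling", None)
        if isinstance(scaling, dict):
            module.scaling = {k: v * factor for k, v in scaling.items()}
            touched += 1
    return touched


# Usage with the pipeline from step 2 (after PeftModel.from_pretrained):
# n = scale_lora(pipe.transformer.modules(), 3.0)
# assert n > 0, "no LoRA layers found -- was the adapter loaded?"
```

Returning the count makes a silent no-op (e.g. the adapter was never attached) easy to catch.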


Merge and Unload (for maximum inference speed)

Baking the LoRA into the base weights eliminates PEFT overhead entirely:

import torch
from diffusers import DiffusionPipeline
from peft import PeftModel

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)

pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    "DownFlow/Z-Image-Turbo-Fuli-LoRA",
)
pipe.transformer = pipe.transformer.merge_and_unload()

pipe.to("cuda")

image = pipe(
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]

Serving with vLLM

vLLM (≥ 0.8) supports serving diffusion pipelines via an OpenAI-compatible /v1/images/generations endpoint.

Recommended flow for vLLM: use the pre-merged model so no PEFT dependency is needed at serve time.

Option A — Serve the merged model (recommended)

pip install "vllm>=0.8.0"

vllm serve DownFlow/Z-Image-Turbo-Fuli \
    --task generate \
    --dtype bfloat16 \
    --max-model-len 512 \
    --port 8000

Then call the endpoint:

curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DownFlow/Z-Image-Turbo-Fuli",
    "prompt": "by 蠢沫沫, 1girl, smile, soft watercolour style",
    "n": 1,
    "size": "512x512"
  }'

Or via the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.images.generate(
    model="DownFlow/Z-Image-Turbo-Fuli",
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    n=1,
    size="512x512",
)
print(response.data[0].url)

Option B — Serve with dynamic LoRA (experimental)

vLLM supports dynamic LoRA module loading for LLMs; diffusion pipeline LoRA support is still experimental. If your vLLM build supports --enable-lora for image models:

vllm serve Tongyi-MAI/Z-Image-Turbo \
    --task generate \
    --dtype bfloat16 \
    --enable-lora \
    --lora-modules "fuliji=DownFlow/Z-Image-Turbo-Fuli-LoRA" \
    --port 8000

Request with the LoRA active:

curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fuliji",
    "prompt": "by 雨波HaneAme, 1girl, beach, summer",
    "n": 1,
    "size": "512x512"
  }'

Trained Artist Trigger Tokens

Prepend "by <artist>, " to the start of your prompt.

Token Approx. images in training set
萌芽儿o0 30
年年 26
封疆疆v 26
焖焖碳 26
星之迟迟 25
蠢沫沫 23
雨波HaneAme 23
清水由乃 21
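The trigger convention can be enforced with a tiny helper so typos in artist tokens fail fast instead of silently producing off-style images. `TRIGGERS` and `build_prompt` are illustrative names, not something shipped with the adapter:

```python
# Trigger tokens from the table above.
TRIGGERS = {
    "萌芽儿o0", "年年", "封疆疆v", "焖焖碳",
    "星之迟迟", "蠢沫沫", "雨波HaneAme", "清水由乃",
}


def build_prompt(artist: str, tags: str) -> str:
    """Prefix `tags` with the artist trigger token the adapter was trained on."""
    if artist not in TRIGGERS:
        raise ValueError(f"unknown trigger token: {artist!r}")
    return f"by {artist}, {tags}"


# build_prompt("年年", "1girl, white dress")  ->  "by 年年, 1girl, white dress"
```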

Training Details

  • Base model: Tongyi-MAI/Z-Image-Turbo (8-step flow matching, CFG-free)
  • Method: PEFT LoRA, rank=32, alpha=32, dropout=0.05
  • Dataset: DownFlow/fuliji filtered to artists with ≥ 21 images
  • Steps: 5 000 — 3 000 initial (lr=1e-4) + 2 000 continuation (lr=5e-5, resumed from step 3000 EMA)
  • Optimizer: AdamW, lr=1e-4→5e-5, warmup=100 steps each phase
  • Batch: 1 × 4 gradient accumulation = effective batch 4
  • Augmentation: horizontal flip, caption dropout 5%, timestep bias 1.2
  • Regularisation: 25% of batches sample from a 277-image generic dataset
  • Hardware: AMD MI300X, ROCm 6.2, bf16
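For reproduction, the hyperparameters above translate to roughly the following PEFT configuration. This is a hedged reconstruction: rank, alpha, dropout, and target modules are taken from the list above, but the training script itself is not published, so treat everything else (left at PEFT defaults here) as an assumption.

```python
from peft import LoraConfig

# Reconstructed from the hyperparameters listed above; all other
# fields are PEFT defaults and may differ from the actual run.
lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["to_q", "to_k", "to_v", "w1", "w2", "w3"],
)
```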
