# Z-Image-Turbo × Fuliji — LoRA Adapter
A PEFT LoRA adapter trained on top of Tongyi-MAI/Z-Image-Turbo to learn the visual identity of 8 Chinese anime/illustration artists from the Fuliji dataset.
Looking for the ready-to-run merged model? Use DownFlow/Z-Image-Turbo-Fuli — the LoRA weights have been baked into the base model and can be served directly.
## Adapter Details

| Property | Value |
|---|---|
| Base model | `Tongyi-MAI/Z-Image-Turbo` |
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Target modules | `to_q`, `to_k`, `to_v`, `w1`, `w2`, `w3` |
| Trainable params | ~39 M |
| Adapter size | ~271 MB |
| Training steps | 5 000 (3 000 at lr=1e-4 + 2 000 continued at lr=5e-5, EMA) |
| Training resolution | 512 × 512 |
| Dataset | `DownFlow/fuliji` (8 artists, ~200 images) |
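Because alpha equals the rank here, the adapter's base LoRA scaling factor (alpha / r) works out to exactly 1.0, i.e. the low-rank update is applied at unit strength before any runtime multiplier. A one-line sanity check:

```python
# LoRA applies W' = W + (alpha / r) * (B @ A); the values below are
# taken from the table above, so the base scale is exactly 1.0.
lora_rank = 32
lora_alpha = 32
base_scale = lora_alpha / lora_rank
print(base_scale)  # 1.0
```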
## Quick Start (Python + Diffusers + PEFT)
### 1 — Install dependencies

```bash
pip install diffusers transformers peft accelerate safetensors
```
### 2 — Generate with artist trigger token

```python
import torch
from diffusers import DiffusionPipeline
from peft import PeftModel

DEVICE = "cuda"
BASE_MODEL = "Tongyi-MAI/Z-Image-Turbo"
ADAPTER = "DownFlow/Z-Image-Turbo-Fuli-LoRA"

# Load the base pipeline
pipe = DiffusionPipeline.from_pretrained(
    BASE_MODEL,
    torch_dtype=torch.bfloat16,
).to(DEVICE)

# Attach the LoRA adapter to the transformer
pipe.transformer = PeftModel.from_pretrained(pipe.transformer, ADAPTER)

# Generate — prepend the artist's trigger token.
# Trained artists: 萌芽儿o0, 年年, 封疆疆v, 焖焖碳, 星之迟迟, 蠢沫沫, 雨波HaneAme, 清水由乃
image = pipe(
    prompt="by 蠢沫沫, 1girl, solo, smile, looking at viewer, soft lighting",
    num_inference_steps=8,
    guidance_scale=0.0,  # Z-Image Turbo is CFG-free; keep guidance at 0
    height=512,
    width=512,
).images[0]
image.save("output.png")
```
### 3 — Adjust LoRA influence at runtime

PEFT exposes a per-adapter scaling multiplier. Increase it to push the style harder:

```python
# After PeftModel.from_pretrained ...
# Note: this multiplies in place, so run it only once per load.
for module in pipe.transformer.modules():
    if hasattr(module, "scaling"):
        module.scaling = {k: v * 3.0 for k, v in module.scaling.items()}
```

Recommended multiplier: 3.0 (the step-5000 EMA checkpoint gives a strong identity with no colour artefacts at 8-step inference). A lighter alternative is 1.2; values above 5 tend to saturate the style.
## Merge and Unload (for maximum inference speed)

Baking the LoRA into the base weights eliminates the PEFT overhead entirely:
```python
import torch
from diffusers import DiffusionPipeline
from peft import PeftModel

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
)
pipe.transformer = PeftModel.from_pretrained(
    pipe.transformer,
    "DownFlow/Z-Image-Turbo-Fuli-LoRA",
)

# Fold the LoRA deltas into the base weights and drop the PEFT wrappers
pipe.transformer = pipe.transformer.merge_and_unload()
pipe.to("cuda")

image = pipe(
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    num_inference_steps=8,
    guidance_scale=0.0,
).images[0]
```
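Numerically, `merge_and_unload` folds the low-rank update into the frozen weight, W' = W + (alpha / r) · B @ A, after which the adapter modules can be discarded. A toy sketch of that fold in plain Python, with made-up 2×2 / rank-1 matrices:

```python
def matmul(A, B):
    """Naive matrix multiply for small nested-list matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Frozen base weight W (2x2) and a rank-1 LoRA pair B (2x1), A (1x2)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.25]]
A = [[2.0, 4.0]]
scale = 32 / 32  # alpha / r, as configured for this adapter

BA = matmul(B, A)
W_merged = [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]
print(W_merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

Once merged, a forward pass through `W_merged` is a single matmul, which is exactly why the merged repo is faster to serve than the adapter.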
## Serving with vLLM

vLLM (≥ 0.8) supports serving diffusion pipelines via an OpenAI-compatible `/v1/images/generations` endpoint. The recommended flow for vLLM is to serve the pre-merged model, so no PEFT dependency is needed at serve time.
### Option A — Serve the merged model (recommended)

```bash
pip install "vllm>=0.8.0"

vllm serve DownFlow/Z-Image-Turbo-Fuli \
  --task generate \
  --dtype bfloat16 \
  --max-model-len 512 \
  --port 8000
```
Then call the endpoint:

```bash
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DownFlow/Z-Image-Turbo-Fuli",
    "prompt": "by 蠢沫沫, 1girl, smile, soft watercolour style",
    "n": 1,
    "size": "512x512"
  }'
```
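Depending on server configuration, an OpenAI-compatible images endpoint may return either a `url` or a base64-encoded `b64_json` field per image. A small sketch for the base64 case — the `payload` below is a made-up stand-in for a real response body:

```python
import base64


def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 image payload and write it to disk; returns bytes written."""
    raw = base64.b64decode(b64_data)
    with open(path, "wb") as f:
        f.write(raw)
    return len(raw)


# Stand-in payload shaped like a /v1/images/generations response
payload = {"data": [{"b64_json": base64.b64encode(b"\x89PNG...").decode()}]}
n = save_b64_image(payload["data"][0]["b64_json"], "output.png")
print(n)  # 7
```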
Or via the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.images.generate(
    model="DownFlow/Z-Image-Turbo-Fuli",
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    n=1,
    size="512x512",
)
print(response.data[0].url)
```
### Option B — Serve with dynamic LoRA (experimental)

vLLM supports dynamic LoRA module loading for LLMs; LoRA support for diffusion pipelines is still experimental. If your vLLM build supports `--enable-lora` for image models:

```bash
vllm serve Tongyi-MAI/Z-Image-Turbo \
  --task generate \
  --dtype bfloat16 \
  --enable-lora \
  --lora-modules "fuliji=DownFlow/Z-Image-Turbo-Fuli-LoRA" \
  --port 8000
```
Request with the LoRA active:

```bash
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "fuliji",
    "prompt": "by 雨波HaneAme, 1girl, beach, summer",
    "n": 1,
    "size": "512x512"
  }'
```
## Trained Artist Trigger Tokens

Prepend `by <artist>, ` to the start of your prompt.
| Token | Approx. images in training set |
|---|---|
| 萌芽儿o0 | 30 |
| 年年 | 26 |
| 封疆疆v | 26 |
| 焖焖碳 | 26 |
| 星之迟迟 | 25 |
| 蠢沫沫 | 23 |
| 雨波HaneAme | 23 |
| 清水由乃 | 21 |
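Since every trained prompt follows the same `by <artist>, <tags>` pattern, a tiny helper (hypothetical, not part of any library) keeps the trigger token in the expected leading position and catches typos in artist names early:

```python
TRAINED_ARTISTS = {
    "萌芽儿o0", "年年", "封疆疆v", "焖焖碳",
    "星之迟迟", "蠢沫沫", "雨波HaneAme", "清水由乃",
}


def artist_prompt(artist: str, tags: str) -> str:
    """Build a prompt with the trigger token first, as used during training."""
    if artist not in TRAINED_ARTISTS:
        raise ValueError(f"unknown trigger token: {artist}")
    return f"by {artist}, {tags}"


print(artist_prompt("蠢沫沫", "1girl, solo, smile"))  # by 蠢沫沫, 1girl, solo, smile
```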
## Training Details

- Base model: `Tongyi-MAI/Z-Image-Turbo` (8-step flow matching, CFG-free)
- Method: PEFT LoRA, rank=32, alpha=32, dropout=0.05
- Dataset: `DownFlow/fuliji`, filtered to artists with ≥ 21 images
- Steps: 5 000 — 3 000 initial (lr=1e-4) + 2 000 continuation (lr=5e-5, resumed from the step-3000 EMA)
- Optimizer: AdamW, lr=1e-4 → 5e-5, warmup=100 steps in each phase
- Batch: 1 × 4 gradient accumulation = effective batch 4
- Augmentation: horizontal flip, caption dropout 5%, timestep bias 1.2
- Regularisation: 25% of batches sample from a 277-image generic dataset
- Hardware: AMD MI300X, ROCm 6.2, bf16
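For intuition on data coverage, the hyperparameters above imply roughly 75 passes over the artist images once the 25% regularisation batches are discounted (a back-of-the-envelope estimate, assuming uniform sampling):

```python
steps = 5_000
effective_batch = 1 * 4     # batch size × gradient accumulation
artist_fraction = 0.75      # 25% of batches come from the generic set
dataset_size = 200          # ~200 artist images in the filtered set

images_seen = steps * effective_batch * artist_fraction
epochs = images_seen / dataset_size
print(round(epochs))  # 75
```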
## Related

- `DownFlow/Z-Image-Turbo-Fuli` — merged model (LoRA baked in, ready for `vllm serve`)
- `DownFlow/fuliji` — training dataset
- `Tongyi-MAI/Z-Image-Turbo` — base model