---
base_model: Tongyi-MAI/Z-Image-Turbo
library_name: diffusers
tags:
  - diffusers
  - text-to-image
  - anime
  - art-style
  - z-image
  - fuliji
  - lora-merged
license: apache-2.0
language:
  - zh
  - en
---

# Z-Image-Turbo × Fuliji — Merged Model

Z-Image Turbo with the Fuliji artist LoRA baked in. The LoRA weights have been permanently merged into the base transformer via `merge_and_unload()`, so no PEFT dependency is needed at inference time.
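For intuition, merging a LoRA just folds the low-rank update into the original matrix, W′ = W + (α/r)·B·A, so the merged model computes exactly what base-plus-adapter would. A minimal pure-PyTorch sketch with toy shapes (not the real transformer):

```python
import torch

torch.manual_seed(0)
d, r, alpha = 8, 4, 4           # toy sizes; the real LoRA uses rank=32, alpha=32
scale = alpha / r

W = torch.randn(d, d)           # base weight, e.g. a to_q projection
A = torch.randn(r, d) * 0.01    # LoRA down-projection
B = torch.randn(d, r) * 0.01    # LoRA up-projection

x = torch.randn(1, d)
y_adapter = x @ W.T + scale * (x @ A.T) @ B.T  # adapter applied at runtime
W_merged = W + scale * (B @ A)                 # what merge_and_unload() bakes in
y_merged = x @ W_merged.T

# The merged weight reproduces the runtime adapter output exactly (up to fp error)
assert torch.allclose(y_adapter, y_merged, atol=1e-5)
```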

Want the standalone LoRA adapter instead? Use DownFlow/Z-Image-Turbo-Fuli-LoRA to apply the adapter on top of any Z-Image-Turbo checkpoint.


## What This Is

This model is Tongyi-MAI/Z-Image-Turbo (an 8-step flow-matching image generation model) fine-tuned with a LoRA trained on art from 8 Chinese anime/illustration artists in the DownFlow/fuliji dataset.
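A flow-matching sampler integrates a learned velocity field over a handful of Euler steps; the 8-step schedule corresponds to 8 such updates. A toy illustration with a dummy velocity function (not the actual model):

```python
def euler_sample(x, velocity, steps=8):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with `steps` Euler steps."""
    dt = 1.0 / steps
    t = 0.0
    for _ in range(steps):
        x = x + dt * velocity(x, t)
        t += dt
    return x

# A constant velocity of 2.0 moves x by exactly 2 over t in [0, 1]:
print(euler_sample(0.0, lambda x, t: 2.0))  # → 2.0
```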

Trigger an artist's style by prepending `by <artist>, ` to your prompt.


## Quick Start (Python)

```bash
pip install diffusers transformers accelerate safetensors
```

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "DownFlow/Z-Image-Turbo-Fuli",
    torch_dtype=torch.bfloat16,
).to("cuda")

image = pipe(
    prompt="by 蠢沫沫, 1girl, solo, smile, soft lighting",
    num_inference_steps=8,
    guidance_scale=0.0,   # Z-Image Turbo uses CFG=0
    height=512,
    width=512,
).images[0]

image.save("output.png")
```

## Serving with vLLM

vLLM (≥ 0.8) can serve this model via an OpenAI-compatible `/v1/images/generations` endpoint.

### 1 — Start the server

```bash
pip install "vllm>=0.8.0"

vllm serve DownFlow/Z-Image-Turbo-Fuli \
    --task generate \
    --dtype bfloat16 \
    --max-model-len 512 \
    --port 8000
```

### 2 — Generate via curl

```bash
curl http://localhost:8000/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "model": "DownFlow/Z-Image-Turbo-Fuli",
    "prompt": "by 蠢沫沫, 1girl, smile, soft watercolour style",
    "n": 1,
    "size": "512x512"
  }'
```

### 3 — Generate via OpenAI Python SDK

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.images.generate(
    model="DownFlow/Z-Image-Turbo-Fuli",
    prompt="by 年年, 1girl, white dress, cherry blossoms",
    n=1,
    size="512x512",
)
print(response.data[0].url)
```
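Depending on server configuration, the response may carry a URL or an inline base64 payload (`b64_json` in the OpenAI response schema). A small helper for the base64 case (the dict fallback below is just for illustration):

```python
import base64

def save_b64_image(data_item, path="output.png"):
    """Write one images.generate result to disk if it carries b64_json."""
    b64 = getattr(data_item, "b64_json", None)
    if b64 is None and isinstance(data_item, dict):
        b64 = data_item.get("b64_json")
    if not b64:
        raise ValueError("no b64_json payload; download data_item.url instead")
    with open(path, "wb") as f:
        f.write(base64.b64decode(b64))
    return path

# Works with a plain dict standing in for the SDK response object:
save_b64_image({"b64_json": base64.b64encode(b"\x89PNG...").decode()}, "demo.png")
```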

## Artist Trigger Tokens

Prepend `by <artist>, ` at the start of your prompt.

| Token | Training images |
|---|---|
| 萌芽儿o0 | 30 |
| 年年 | 26 |
| 封疆疆v | 26 |
| 焖焖碳 | 26 |
| 星之迟迟 | 25 |
| 蠢沫沫 | 23 |
| 雨波HaneAme | 23 |
| 清水由乃 | 21 |
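A tiny helper to build prompts with these triggers (the function name and validation are illustrative, not part of the model):

```python
ARTISTS = ["萌芽儿o0", "年年", "封疆疆v", "焖焖碳",
           "星之迟迟", "蠢沫沫", "雨波HaneAme", "清水由乃"]

def build_prompt(artist: str, tags: str) -> str:
    """Prepend the 'by <artist>, ' trigger this model was trained on."""
    if artist not in ARTISTS:
        raise ValueError(f"unknown artist token: {artist}")
    return f"by {artist}, {tags}"

print(build_prompt("蠢沫沫", "1girl, solo, smile"))  # → by 蠢沫沫, 1girl, solo, smile
```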

## Model Details

| Property | Value |
|---|---|
| Base model | Tongyi-MAI/Z-Image-Turbo |
| Fine-tuning method | LoRA (rank=32, alpha=32), merged into the weights |
| Target modules | `to_q`, `to_k`, `to_v`, `w1`, `w2`, `w3` |
| Training steps | 5,000 (3,000 at lr=1e-4, then 2,000 at lr=5e-5 with EMA decay=0.9999) |
| Training resolution | 512 × 512 |
| Inference steps | 8 |
| CFG scale | 0.0 (CFG-free) |
| Precision | bfloat16 |
| Dataset | DownFlow/fuliji (8 artists, ~200 images) |
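The EMA in the training schedule keeps a shadow copy of the weights, updated each step as θ_ema ← decay·θ_ema + (1 − decay)·θ; with decay=0.9999 the shadow tracks a long-horizon average of the trained weights. A scalar toy sketch (not the actual training code):

```python
def ema_update(ema, weights, decay=0.9999):
    """One EMA step over a flat list of (scalar) parameters."""
    return [decay * e + (1 - decay) * w for e, w in zip(ema, weights)]

# Starting from 0 and repeatedly averaging toward a fixed weight of 1.0,
# the EMA after n steps is 1 - decay**n:
ema = [0.0]
for _ in range(10):
    ema = ema_update(ema, [1.0])
print(ema[0])
```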

## Related