	---
	base_model: Tongyi-MAI/Z-Image-Turbo
	library_name: diffusers
	tags:
	- diffusers
	- text-to-image
	- anime
	- art-style
	- z-image
	- fuliji
	- lora-merged
	license: apache-2.0
	language:
	- zh
	- en
	---

	# Z-Image-Turbo × Fuliji — Merged Model

	Z-Image-Turbo with the Fuliji artist LoRA baked in. The LoRA weights have been permanently merged into the base transformer via PEFT's `merge_and_unload()`, so inference needs no PEFT dependency.

	> Want the standalone LoRA adapter instead?
	> Use [DownFlow/Z-Image-Turbo-Fuli-LoRA](https://huggingface.co/DownFlow/Z-Image-Turbo-Fuli-LoRA) to apply the adapter on top of any Z-Image-Turbo checkpoint.
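
For intuition, merging folds the scaled low-rank update into each target weight, `W' = W + (alpha / rank) * B @ A`, so the merged layer gives the same output as the base layer plus the adapter. The pure-Python sketch below checks this on toy matrices. The matrices, the helper names, and the toy `rank=1` are illustrative assumptions, not the model's real weights or code.

```python
# Toy illustration of LoRA merging: W' = W + (alpha / rank) * B @ A.
# All matrices here are made-up values, not real model weights.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b, scale=1.0):
    """Return a + scale * b, element-wise."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

rank, alpha = 1, 1.0           # toy values, chosen only to keep the example small
scaling = alpha / rank         # 1.0 here; rank=32, alpha=32 also gives 1.0

W = [[1.0, 2.0], [3.0, 4.0]]   # frozen base weight (2x2)
A = [[0.5, -1.0]]              # LoRA down-projection (rank x in_features)
B = [[2.0], [1.0]]             # LoRA up-projection (out_features x rank)
x = [[1.0], [2.0]]             # input column vector

# Adapter path: base output plus the scaled low-rank correction.
y_adapter = add(matmul(W, x), matmul(B, matmul(A, x)), scaling)

# Merged path: fold the correction into the weight once, then apply it.
W_merged = add(W, matmul(B, A), scaling)
y_merged = matmul(W_merged, x)

print(y_adapter, y_merged)
```

Both paths produce the same vector, which is why the merged checkpoint can be loaded like any ordinary Z-Image-Turbo checkpoint.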

	---

	## What This Is

	This model is [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) (an 8-step flow-matching image generation model) fine-tuned with a LoRA trained on art from 8 Chinese anime/illustration artists in the [DownFlow/fuliji](https://huggingface.co/datasets/DownFlow/fuliji) dataset.

	Trigger the artist style by prepending `by <artist>,` to your prompt.

	---

	## Quick Start (Python)

	```bash
	pip install diffusers transformers accelerate safetensors
	```

	```python
	import torch
	from diffusers import DiffusionPipeline

	pipe = DiffusionPipeline.from_pretrained(
	    "DownFlow/Z-Image-Turbo-Fuli",
	    torch_dtype=torch.bfloat16,
	).to("cuda")

	image = pipe(
	    prompt="by 蠢沫沫, 1girl, solo, smile, soft lighting",
	    num_inference_steps=8,
	    guidance_scale=0.0,  # Z-Image Turbo uses CFG=0
	    height=512,
	    width=512,
	).images[0]

	image.save("output.png")
	```

	---

	## Serving with vLLM

	vLLM (≥ 0.8) can serve this model via an OpenAI-compatible `/v1/images/generations` endpoint.

	### 1 — Start the server

	```bash
	pip install "vllm>=0.8.0"

	vllm serve DownFlow/Z-Image-Turbo-Fuli \
	  --task generate \
	  --dtype bfloat16 \
	  --max-model-len 512 \
	  --port 8000
	```
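
Model loading can take a while, so a client may want to wait until the server answers before sending requests. The sketch below polls the OpenAI-compatible `/v1/models` route using only the standard library. It assumes the server started above listens on `http://localhost:8000`; the function name `wait_for_server` and its default timeouts are hypothetical choices, not part of vLLM.

```python
import time
import urllib.request

def wait_for_server(base_url="http://localhost:8000", timeout_s=300.0, poll_s=2.0):
    """Poll GET {base_url}/v1/models until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=poll_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Connection refused, timed out, or HTTP error: not ready yet.
            pass
        time.sleep(poll_s)
    return False
```

Call `wait_for_server()` once after launching the server and proceed only if it returns `True`.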

	### 2 — Generate via curl

	```bash
	curl http://localhost:8000/v1/images/generations \
	  -H "Content-Type: application/json" \
	  -d '{
	    "model": "DownFlow/Z-Image-Turbo-Fuli",
	    "prompt": "by 蠢沫沫, 1girl, smile, soft watercolour style",
	    "n": 1,
	    "size": "512x512"
	  }'
	```

	### 3 — Generate via OpenAI Python SDK

	```python
	from openai import OpenAI

	client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

	response = client.images.generate(
	    model="DownFlow/Z-Image-Turbo-Fuli",
	    prompt="by 年年, 1girl, white dress, cherry blossoms",
	    n=1,
	    size="512x512",
	)
	print(response.data[0].url)
	```
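
In the OpenAI images response format, each entry in `response.data` can carry either a `url` or a base64 payload in `b64_json`. Which one this server returns is not stated here, so the sketch below handles both. The helper name `image_bytes` is hypothetical.

```python
import base64
import urllib.request

def image_bytes(item):
    """Return raw image bytes from one entry of `response.data`."""
    b64 = getattr(item, "b64_json", None)
    if b64:
        # Inline payload: decode the base64 string directly.
        return base64.b64decode(b64)
    url = getattr(item, "url", None)
    if url:
        # Hosted payload: download the image from the returned URL.
        with urllib.request.urlopen(url) as resp:
            return resp.read()
    raise ValueError("response entry has neither b64_json nor url")
```

With the response above, `open("output.png", "wb").write(image_bytes(response.data[0]))` would save the first image to disk.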

	---

	## Artist Trigger Tokens

	Prepend `by <artist>, ` to your prompt, replacing `<artist>` with one of the tokens below.

	| Token | Training images |
	|---|---|
	| `萌芽儿o0` | 30 |
	| `年年` | 26 |
	| `封疆疆v` | 26 |
	| `焖焖碳` | 26 |
	| `星之迟迟` | 25 |
	| `蠢沫沫` | 23 |
	| `雨波HaneAme` | 23 |
	| `清水由乃` | 21 |
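
A small helper can guard against typos in the trigger token, since a misspelled artist name would presumably just be treated as ordinary prompt text. The sketch below copies the tokens and image counts from the table above; the names `ARTIST_IMAGE_COUNTS` and `build_prompt` are hypothetical.

```python
# Trigger tokens and training-image counts copied from the table above.
ARTIST_IMAGE_COUNTS = {
    "萌芽儿o0": 30,
    "年年": 26,
    "封疆疆v": 26,
    "焖焖碳": 26,
    "星之迟迟": 25,
    "蠢沫沫": 23,
    "雨波HaneAme": 23,
    "清水由乃": 21,
}

def build_prompt(artist, tags):
    """Return 'by <artist>, <tags>' after checking the artist is a known token."""
    if artist not in ARTIST_IMAGE_COUNTS:
        raise ValueError(f"unknown artist token: {artist!r}")
    return f"by {artist}, {tags.strip()}"
```

For example, `build_prompt("蠢沫沫", "1girl, solo, smile, soft lighting")` reproduces the prompt used in the Quick Start.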

	---

	## Model Details

	| Property | Value |
	|---|---|
	| Base model | `Tongyi-MAI/Z-Image-Turbo` |
	| Fine-tuning method | LoRA rank=32, alpha=32 — merged into weights |
	| Target modules | `to_q`, `to_k`, `to_v`, `w1`, `w2`, `w3` |
	| Training steps | 5 000 (3 000 at lr=1e-4 + 2 000 continued at lr=5e-5, EMA decay=0.9999) |
	| Training resolution | 512 × 512 |
	| Inference steps | 8 |
	| CFG scale | 0.0 (CFG-free) |
	| Precision | bfloat16 |
	| Dataset | [DownFlow/fuliji](https://huggingface.co/datasets/DownFlow/fuliji) (8 artists, ~200 images) |
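
The two-phase schedule and the EMA decay in the table can be read as the sketch below. It assumes the learning rate is constant within each phase (no warmup or decay is stated) and uses the standard EMA update rule; whether EMA covered the whole run or only the continuation is not stated, and the function names are hypothetical rather than taken from the training code.

```python
def learning_rate(step):
    """Piecewise-constant LR: 1e-4 for steps 0..2999, 5e-5 for steps 3000..4999."""
    if not 0 <= step < 5000:
        raise ValueError("step must be in [0, 5000)")
    # Assumption: no warmup or decay inside either phase.
    return 1e-4 if step < 3000 else 5e-5

def ema_update(ema_value, new_value, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * new."""
    return decay * ema_value + (1.0 - decay) * new_value
```

In a training loop, `ema_update` would be applied to every trainable parameter after each optimizer step, and the EMA copy would be the one merged and exported.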

	---

	## Related

	- [DownFlow/Z-Image-Turbo-Fuli-LoRA](https://huggingface.co/DownFlow/Z-Image-Turbo-Fuli-LoRA) — standalone LoRA adapter
	- [DownFlow/fuliji](https://huggingface.co/datasets/DownFlow/fuliji) — training dataset
	- [Tongyi-MAI/Z-Image-Turbo](https://huggingface.co/Tongyi-MAI/Z-Image-Turbo) — base model