jc-builds committed `5158439` (verified) · 1 parent: `3e5d06b`

docs: add prompting guide + simple-vs-ideal prompt table (Qwen3 LLM encoder favours long natural-language)

Files changed (1): `README.md` (+22 -0), hunk `@@ -83,6 +83,28 @@ let image = try await engine.generate(.init(`

That's the whole pipeline. See the [Mirage README](https://github.com/haplollc/Mirage) for the full SwiftUI example.

## Prompting guide

Z-Image-Turbo conditions on the **Qwen3-4B-Instruct** text encoder, so it reads prompts the way an instruction-tuned LLM does: **long, natural-language descriptions outperform short tag lists**. The official Tongyi-MAI examples are essentially short paragraphs that describe subject, pose, attributes, environment, and lighting in flowing prose.
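One way to keep prompts in this paragraph shape is to assemble them from the same components the official examples cover. Below is a minimal Swift sketch. `PromptParts` and `naturalLanguagePrompt` are hypothetical helpers for illustration, not part of Z-Image or Mirage; the result is just a `String` you would pass as the prompt to your own generation call.

```swift
/// Hypothetical container for the pieces of a Z-Image prompt.
/// Illustrative only; not part of Z-Image or Mirage.
struct PromptParts {
    var medium: String            // e.g. "A photograph of" (leads the prompt)
    var subject: String           // e.g. "a tabby cat"
    var pose: String? = nil       // e.g. "sitting on a black metal fire escape"
    var setting: String? = nil    // e.g. "in lower Manhattan at dusk"
    var details: [String] = []    // other things in frame
    var lighting: String? = nil   // e.g. "warm yellow light spilling from windows"
    var style: [String] = []      // trailing style cues, e.g. "35mm film aesthetic"
}

/// Builds one flowing description: medium first, then subject, pose, and
/// setting as a single clause, followed by comma-separated detail clauses.
func naturalLanguagePrompt(_ parts: PromptParts) -> String {
    var lead = "\(parts.medium) \(parts.subject)"
    if let pose = parts.pose { lead += " \(pose)" }
    if let setting = parts.setting { lead += " \(setting)" }

    var clauses = parts.details
    if let lighting = parts.lighting { clauses.append(lighting) }
    clauses += parts.style
    return ([lead] + clauses).joined(separator: ", ")
}

let cat = PromptParts(
    medium: "A photograph of",
    subject: "a tabby cat",
    pose: "sitting on a black metal fire escape",
    setting: "in lower Manhattan at dusk",
    details: ["neon shop signs glowing across the street"],
    lighting: "warm yellow light spilling from windows behind it",
    style: ["shallow depth of field", "35mm film aesthetic"]
)
print(naturalLanguagePrompt(cat))
```

With these inputs it reproduces the "A cat in a city" row of the table below. Keeping each component in its own field makes it easy to vary one detail (say, the lighting) while holding the rest of the description fixed.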

**Common pitfall: fusion concepts.** Prompts of the form *"X as Y"* (e.g. "The Statue of Liberty as a dog") tend to fail because the text encoder parses `Statue of Liberty` and `dog` as two separate noun phrases, and the diffusion model dutifully paints both side by side. To get a single **fused** subject, write the fused entity as **one noun**, then describe the pose, attributes, and setting that pull in the second concept.
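A minimal sketch of that recipe, assuming you build prompts in Swift: the fused subject goes first as a single noun phrase, and the second concept enters only through a "posed exactly like" clause and the traits it lends. `fusionPrompt` is a hypothetical helper, not a Z-Image or Mirage API.

```swift
/// Hypothetical helper (not a Z-Image or Mirage API) that applies the
/// fusion recipe: ONE noun phrase for the fused subject, with the second
/// concept brought in only through pose, borrowed traits, and context.
func fusionPrompt(fusedSubject: String,
                  posedLike reference: String,
                  borrowedTraits: [String],
                  context: [String] = []) -> String {
    // The fused subject leads, so there is a single noun phrase to paint.
    var prompt = "\(fusedSubject), posed exactly like \(reference)"
    // Traits borrowed from the reference concept, listed after a colon.
    if !borrowedTraits.isEmpty {
        prompt += ": " + borrowedTraits.joined(separator: ", ")
    }
    // Material, lighting, and background cues.
    if !context.isEmpty {
        prompt += ", " + context.joined(separator: ", ")
    }
    return prompt
}

let libertyDog = fusionPrompt(
    fusedSubject: "A bronze statue of a golden retriever standing on a stone pedestal on Liberty Island",
    posedLike: "the Statue of Liberty",
    borrowedTraits: [
        "right paw raised holding a flaming torch",
        "left paw clutching an engraved stone tablet",
        "a seven-pointed crown on its head",
    ],
    context: [
        "weathered green-bronze patina",
        "photographed at golden hour with the New York harbor in the background",
        "photorealistic",
    ]
)
print(libertyDog)
```

With these arguments it produces the Statue of Liberty prompt in the first row of the table below. The fixed "posed exactly like" wording is an assumption taken from that one example; other connecting phrases such as "in the pose of" are equally plausible.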

| Simple prompt (underspecified) | Prompt that works on Z-Image-Turbo |
|---|---|
| "The Statue of Liberty as a dog" | A bronze statue of a golden retriever standing on a stone pedestal on Liberty Island, posed exactly like the Statue of Liberty: right paw raised holding a flaming torch, left paw clutching an engraved stone tablet, a seven-pointed crown on its head, weathered green-bronze patina, photographed at golden hour with the New York harbor in the background, photorealistic |
| "A cat in a city" | A photograph of a tabby cat sitting on a black metal fire escape in lower Manhattan at dusk, neon shop signs glowing across the street, warm yellow light spilling from windows behind it, shallow depth of field, 35mm film aesthetic |
| "A robot drinking coffee" | A close-up photograph of a humanoid robot with brushed-aluminum face plates sitting at a cafe table, both hands wrapped carefully around a ceramic latte cup, steam rising past glowing blue eye sensors, warm bokeh of cafe lights in the background, late afternoon light, photorealistic |
| "Sunset over the ocean" | A wide-angle photograph of the Pacific Ocean at sunset viewed from a basalt cliff in Big Sur, sun a deep orange disk just touching the horizon, sky transitioning from violet at zenith to peach and gold at the horizon, foreground tide pools mirroring the sky, dramatic |
| "A wizard in a forest" | A digital painting of an elderly wizard in dark blue robes embroidered with silver constellations, leaning on a gnarled oak staff with a glowing crystal at its tip, standing in a misty old-growth redwood forest at dawn, soft shafts of light cutting through the trees, painterly style |

**Heuristics that work well on Z-Image:**

- **Describe it as you would to a person.** Write full sentences. Qwen3 is an instruction-tuned LLM, so it picks up intent from phrasing rather than treating the prompt as a bag of keywords.
- **Lead with the medium.** "A photograph of...", "A digital painting of...", or "A studio portrait of..." anchors the style early.
- **Be specific about what's in frame.** Name the lens, lighting direction, time of day, and background. The model has plenty of capacity for detail, and vague prompts tend to produce vague images.
- **English and Chinese both work.** Z-Image was trained bilingually.
- **For fusion concepts**: pick one noun for the fused subject, then *attribute* the other concept through pose, clothing, or context. "A golden retriever statue *in the pose of* the Statue of Liberty" works; "A golden retriever *and* the Statue of Liberty" doesn't.
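The heuristics above can be turned into a rough pre-flight check that runs before a prompt is sent to the model. This is a minimal sketch under stated assumptions: `lint`, `PromptWarning`, the 15-word threshold, and the list of medium nouns are all hypothetical and chosen for illustration, not values published by Tongyi-MAI or used by Mirage. It splits words on spaces, so it only makes sense for English prompts; Chinese text would need a different word count.

```swift
/// Hypothetical warnings for a pre-flight prompt check; not a Z-Image or Mirage API.
enum PromptWarning: Equatable {
    case tooShort(words: Int)  // under the (assumed) word threshold
    case noLeadingMedium       // no medium noun among the first four words
    case tagList               // only short comma-separated fragments
    case possibleFusion        // short prompt linking two concepts with "as"/"and"
}

// Illustrative list; extend it with whatever media you actually use.
let mediumNouns: Set<String> = ["photograph", "painting", "portrait",
                                "illustration", "render", "sketch"]

/// Applies the heuristics as crude string rules. English only: words
/// are split on spaces, commas, and periods.
func lint(_ prompt: String, minWords: Int = 15) -> [PromptWarning] {
    let lower = prompt.lowercased()
    let words = lower
        .split(whereSeparator: { $0 == " " || $0 == "," || $0 == "." })
        .map { String($0) }
    var warnings: [PromptWarning] = []

    // "Be specific": very short prompts leave most of the frame unspecified.
    if words.count < minWords {
        warnings.append(.tooShort(words: words.count))
    }
    // "Lead with the medium": look for a medium noun near the start.
    if !words.prefix(4).contains(where: { mediumNouns.contains($0) }) {
        warnings.append(.noLeadingMedium)
    }
    // "Full sentences": three or more comma-separated chunks of at most three words.
    let chunkSizes = lower.split(separator: ",").map { $0.split(separator: " ").count }
    if chunkSizes.count >= 3 && chunkSizes.allSatisfy({ $0 <= 3 }) {
        warnings.append(.tagList)
    }
    // Fusion pitfall: a short prompt joining concepts with "as" or "and".
    if words.count < minWords && (words.contains("as") || words.contains("and")) {
        warnings.append(.possibleFusion)
    }
    return warnings
}

for prompt in ["The Statue of Liberty as a dog", "cat, city, neon, 4k"] {
    print(prompt, "->", lint(prompt))
}
```

The rules are deliberately crude (a legitimate short caption would also be flagged), so treating the warnings as hints rather than hard errors is probably the safer design.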

## Performance (measured via Mirage)

| Device | 1024² @ 9 steps | 512² @ 9 steps |