jc-builds committed efcb855 (verified) · 1 parent: 1dbb980

docs: add prompting guide + simple-vs-ideal prompt table (Mistral3 encoder; quote-text for in-image rendering)

Files changed (1): README.md (+24, -0)
@@ -80,6 +80,30 @@ let image = try await engine.generate(.init(
))
```

## Prompting guide

ERNIE-Image-Turbo conditions on a **Mistral3-3B** text encoder. The upstream Baidu README shows two prompt styles that work well: short, dense, comma-separated phrases (`"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"`) and long, photograph-style narration (`"This is a photograph depicting an urban street scene. Shot at eye level..."`). Long prompts win when the scene has multiple subjects, a structured layout, or rendered text.
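A minimal Swift sketch of the two registers, assuming you assemble the prompt as a plain `String` before handing it to `engine.generate(.init(...))` as in the example above. `PromptStyle` and `prefersNarrative(subjectCount:hasStructuredLayout:hasRenderedText:)` are hypothetical helpers, not part of the engine's API; the rule inside `prefersNarrative` simply encodes the guideline in the previous paragraph.

```swift
// Hypothetical helpers: nothing here is part of the engine's API.
// They only build the prompt String you would hand to the generator.

/// The two prompt registers the upstream Baidu README shows working well.
enum PromptStyle {
    /// Short, dense, comma-separated phrases.
    case tags([String])
    /// Long, photograph-style narration written as full sentences.
    case narrative(String)

    /// The prompt text for this style.
    var text: String {
        switch self {
        case .tags(let phrases):
            return phrases.joined(separator: ", ")
        case .narrative(let description):
            return description
        }
    }
}

/// Encodes the guideline above: prefer long narration once the scene has
/// several subjects, a structured layout, or text that must be rendered.
func prefersNarrative(subjectCount: Int, hasStructuredLayout: Bool, hasRenderedText: Bool) -> Bool {
    subjectCount > 1 || hasStructuredLayout || hasRenderedText
}

let tagPrompt = PromptStyle.tags([
    "Astronaut in a jungle", "cold color palette", "muted colors", "detailed", "8k",
])
let narrativePrompt = PromptStyle.narrative(
    "This is a photograph depicting an urban street scene. Shot at eye level..."
)

print(tagPrompt.text) // Astronaut in a jungle, cold color palette, muted colors, detailed, 8k
print(prefersNarrative(subjectCount: 3, hasStructuredLayout: false, hasRenderedText: true)) // true
```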

**ERNIE's particular strength is text inside the image.** Put the exact text you want rendered in **quotes** in the prompt, and the model reproduces it on a sign, poster, label, or UI mock with very high fidelity for an open-weight model.
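One way to apply the quoting rule consistently is a tiny helper that always wraps the literal text in double quotes. This is a hypothetical sketch: `quoted(_:)` and `surface(_:reading:)` are illustrative names, and the `that reads "..."` phrasing mirrors the prompts in the table below rather than any documented syntax.

```swift
// Hypothetical helpers for in-image text; not part of the engine's API.

/// Wraps the exact string to be rendered in straight double quotes.
func quoted(_ text: String) -> String {
    "\"\(text)\""
}

/// Describes a surface (sign, poster title, label) plus the literal text on it,
/// e.g. a neon sign that reads "BREW & CO.".
func surface(_ description: String, reading text: String) -> String {
    "\(description) that reads \(quoted(text))"
}

let signPhrase = surface("a vintage glowing red neon sign", reading: "BREW & CO.")
print(signPhrase) // a vintage glowing red neon sign that reads "BREW & CO."
```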

**Common pitfall: fusion concepts.** Prompts of the form *"X as Y"* (e.g. "The Statue of Liberty as a dog") fail because the text encoder parses `Statue of Liberty` and `dog` as two separate noun phrases, and the diffusion model then paints both side by side. To get a single **fused** subject, write the fused entity as **one noun** and describe the pose, attributes, and setting that pull in the second concept.
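The fix can be expressed as a prompt template that names exactly one subject and brings in the second concept only through pose and attributes. A hypothetical Swift sketch (`fusedSubjectPrompt` and its parameter names are illustrative, not part of any API) that rebuilds a shortened version of the Statue of Liberty row from the table below:

```swift
// Hypothetical template for "X as Y" requests; not part of the engine's API.
// It names ONE subject (the fused entity) and pulls in the second concept only
// through pose and attributes, instead of listing two separate nouns.
func fusedSubjectPrompt(
    medium: String,
    subject: String,              // the single fused noun phrase
    poseOf reference: String,     // the concept contributed only via pose
    attributes: [String],
    setting: String
) -> String {
    var clauses = ["\(medium) \(subject) in the exact pose of \(reference)"]
    clauses.append(contentsOf: attributes)
    clauses.append(setting)
    return clauses.joined(separator: ", ")
}

let fusedPrompt = fusedSubjectPrompt(
    medium: "This is a photograph of",
    subject: "a colossal bronze statue depicting a golden retriever",
    poseOf: "the Statue of Liberty",
    attributes: [
        "right paw holding a flaming torch raised toward the sky",
        "weathered green-bronze patina",
    ],
    setting: "New York harbor in the background"
)
print(fusedPrompt)
```

The dog is the only subject noun; "the Statue of Liberty" appears only inside the pose clause, which is the structure the guide recommends.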

| Simple prompt (underperforms) | Prompt that works on ERNIE-Image-Turbo |
|---|---|
| "The Statue of Liberty as a dog" | This is a photograph of a colossal bronze statue depicting a golden retriever in the exact pose of the Statue of Liberty: standing upright on its hind legs atop a stone pedestal, right paw holding a flaming torch raised toward the sky, left paw clutching a stone tablet engraved with the date "JULY IV MDCCLXXVI", a seven-pointed crown on its head, weathered green-bronze patina, New York harbor in the background, golden hour lighting, photorealistic |
| "A coffee shop sign" | A photograph of a coffee shop storefront on a brick-fronted street, a vintage glowing red neon sign mounted above the door that reads "BREW & CO." with the words "Coffee · Pastries · Open Late" in smaller letters below, early morning warm light, a few people visible through the window, photorealistic, 35mm lens |
| "Movie poster of a detective" | A vintage 1940s-style movie poster, a hard-boiled detective in a trench coat and fedora silhouetted against a foggy alley, dramatic chiaroscuro lighting, large red art-deco title reading "THE LAST CASE" at the top, smaller credits text at the bottom reading "Starring · Directed by · Music by", noir aesthetic, paper texture |
| "A child's bedroom" | A wide-angle photograph of a child's bedroom in a suburban home, twin bed with a galaxy-print comforter, a wall map of the solar system, a small wooden desk with a half-finished science fair poster reading "WHY THE SKY IS BLUE" in marker, late afternoon sunlight slanting through the window, photorealistic |
| "An infographic on healthy eating" | A clean infographic illustration about healthy eating, top banner text reading "EAT THE RAINBOW", below it a circular chart divided into colored sections labeled "VEGETABLES", "FRUITS", "WHOLE GRAINS", "PROTEIN", "DAIRY", each section illustrated with simple flat-vector food icons, soft pastel palette, modern editorial design |

**Heuristics that work well on ERNIE-Image-Turbo:**

- **Quote any text you want rendered.** `the sign reads "BREW & CO."` performs much better than `a sign saying brew and co`.
- **Open with the medium.** Starting with `"This is a photograph of..."`, `"A movie poster showing..."`, or `"An infographic about..."` anchors the composition early; ERNIE is particularly good at structured layouts (posters, comics, UI mocks).
- **Short prompts work too, in a tag-style register.** `"Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"` comes from the upstream Baidu examples. Use this style when you want atmosphere without committing to a specific scene.
- **For fusion concepts, pick one noun for the fused subject**, then *attribute* the other concept via pose, clothing, or context. "A golden retriever statue *in the pose of* the Statue of Liberty" works; "A golden retriever *and* the Statue of Liberty" doesn't.
- **Describe the scene in natural language.** The bundle ships with the Mistral3 text encoder, which is instruction-tuned, so prompts written as natural-language scene descriptions generally outperform pure keyword salad.
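The heuristics above can be combined into a small builder. This is a minimal sketch under stated assumptions: `ERNIEPrompt` and its fields are hypothetical (not the engine's request type), and joining clauses with commas is one reasonable choice rather than a format the model requires. It rebuilds a shortened version of the detective-poster row from the table.

```swift
// Hypothetical prompt builder; ERNIEPrompt and its fields are illustrative,
// not the engine's request type.
struct ERNIEPrompt {
    /// Opening phrase that names the medium, e.g. "A vintage 1940s-style movie poster".
    var medium: String
    /// The main subject, named once.
    var subject: String
    /// Where text appears in the image, paired with the exact text to render.
    var renderedText: [(element: String, text: String)] = []
    /// Natural-language details: lighting, palette, style, texture.
    var details: [String] = []

    /// Medium first, then subject, then every quoted text element, then details.
    var text: String {
        var clauses = ["\(medium), \(subject)"]
        for item in renderedText {
            clauses.append("\(item.element) reading \"\(item.text)\"")
        }
        clauses.append(contentsOf: details)
        return clauses.joined(separator: ", ")
    }
}

var poster = ERNIEPrompt(
    medium: "A vintage 1940s-style movie poster",
    subject: "a hard-boiled detective in a trench coat and fedora silhouetted against a foggy alley"
)
poster.renderedText.append((element: "large red art-deco title at the top", text: "THE LAST CASE"))
poster.details += ["dramatic chiaroscuro lighting", "noir aesthetic", "paper texture"]
print(poster.text)
```

The resulting `poster.text` is an ordinary `String`; pass it to whichever field of your `engine.generate(.init(...))` request carries the prompt.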

## Why ERNIE-Image-Turbo

If you need **text inside images** that actually renders correctly (signs, labels, captions, UI mocks), this is currently the strongest open-weight option. Mid-2025 evaluations showed ERNIE-Image meeting or beating GPT-Image-1 on text rendering despite being one-tenth the size.