Upload folder using huggingface_hub
- app.py +242 -441
- templates/captivating_prompt.jinja +55 -0
- templates/cool_lifestyle_prompt.jinja +8 -0
- templates/emotional_lifestyle_prompt.jinja +99 -0
- templates/extreme_sports_prompt.jinja +92 -0
- templates/fashion_prompt.jinja +90 -0
- templates/general_prompt.jinja +56 -0
- templates/image_replication_prompt.jinja +9 -0
- templates/modern_product_prompt.jinja +8 -0
- templates/replicate_reference_image_prompt.jinja +6 -0
app.py
CHANGED
|
@@ -1,401 +1,113 @@
|
|
| 1 |
import base64
|
|
|
|
| 2 |
from io import BytesIO
|
| 3 |
-
from
|
|
|
|
| 4 |
|
| 5 |
import gradio as gr
|
| 6 |
import jinja2
|
| 7 |
from openai import OpenAI
|
|
|
|
| 8 |
|
| 9 |
client = OpenAI()
|
| 10 |
|
|
|
|
|
|
|
| 11 |
|
| 12 |
-
|
| 13 |
-
|
| 14 |
-
|
| 15 |
-
|
| 16 |
-
|
| 17 |
-
|
| 18 |
-
|
| 19 |
-
|
| 20 |
-
|
| 21 |
-
|
| 22 |
-
|
| 23 |
-
|
| 24 |
-
|
| 25 |
-
|
| 26 |
-
|
| 27 |
-
|
| 28 |
-
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
|
| 32 |
-
|
| 33 |
-
|
| 34 |
-
|
| 35 |
-
|
| 36 |
-
|
| 37 |
-
|
| 38 |
-
|
| 39 |
-
|
| 40 |
-
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 48 |
-
|
| 49 |
-
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
|
| 53 |
-
|
| 54 |
-
|
| 55 |
-
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
-
|
| 68 |
-
|
| 69 |
-
|
| 70 |
-
|
| 71 |
-
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
|
| 76 |
-
|
| 77 |
-
|
| 78 |
-
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
| 83 |
-
|
| 84 |
-
|
| 85 |
-
|
| 86 |
-
|
| 87 |
-
|
| 88 |
-
|
| 89 |
-
|
| 90 |
-
|
| 91 |
-
|
| 92 |
-
|
| 93 |
-
|
| 94 |
-
|
| 95 |
-
|
| 96 |
-
|
| 97 |
-
|
| 98 |
-
|
| 99 |
-
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
|
| 103 |
-
|
| 104 |
-
|
| 105 |
-
|
| 106 |
-
- Specific lighting directions (soft but focused, shadow contours) shape the mood.
|
| 107 |
-
- Specifies high fashion elements
|
| 108 |
-
|
| 109 |
-
Input: A picture of me sprinting
|
| 110 |
-
Input photo: Black man
|
| 111 |
-
Output prompt:
|
| 112 |
-
A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
|
| 113 |
-
down a sunlit urban street from an elevated angle. The camera looks slightly down at the
|
| 114 |
-
scene, placing the runner in the center-right of the frame, mid-stride with one leg extended
|
| 115 |
-
behind and arms pumping forward. He wears a reflective silver windbreaker, black running
|
| 116 |
-
shorts, white socks, and sleek performance shoes. A pair of dark sunglasses adds attitude
|
| 117 |
-
and edge to his motion.
|
| 118 |
-
The runner is in motion blur, especially on limbs and head, with only parts of the torso and
|
| 119 |
-
upper back lightly frozen by a directional rear-curtain sync flash. His movement arcs forward
|
| 120 |
-
across the frame, and the reflective jacket catches intense flashes of light, bouncing subtle
|
| 121 |
-
highlights across the scene. Below the asphalt road, a strip of green grass borders the
|
| 122 |
-
street at the bottom edge of the image, adding a clean contrasting base to the composition.
|
| 123 |
-
The background is dark asphalt, textured with faint painted lines and subtle shadows. The
|
| 124 |
-
elevated camera position allows for a sense of depth and rhythm as the runner cuts across
|
| 125 |
-
the frame from left to right, motion trailing behind. Warm natural light streaks or golden
|
| 126 |
-
ambient flares may bleed across the top of the image for added cinematic tension.
|
| 127 |
-
|
| 128 |
-
Why the output is good:
|
| 129 |
-
- Creates dynamic motion through specific technical directions
|
| 130 |
-
- It has composition instructions such as the elevated angle and the center-right placement
|
| 131 |
-
- It adds interesting and relevant environmental elements such as the grass strip and the asphalt texture
|
| 132 |
-
|
| 133 |
-
Prompt 3 (Track Athlete):
|
| 134 |
-
|
| 135 |
-
Input: A picture of me jumping off the starting blocks
|
| 136 |
-
Input photo: Black woman
|
| 137 |
-
Output prompt:
|
| 138 |
-
A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
|
| 139 |
-
female track runner launching from the starting blocks at night. The image is taken from a
|
| 140 |
-
low, side angle, tightly framed at ground level, with her silver sprinting spikes clearly
|
| 141 |
-
visible — one foot pushing forcefully into the rear block, the other caught mid-air in dramatic
|
| 142 |
-
motion. She wears white ankle-high performance socks, and her defined, muscular
|
| 143 |
-
calves are frozen in the peak of exertion.
|
| 144 |
-
The flash lighting from the front-left casts sharp highlights on her skin and the metallic
|
| 145 |
-
texture of the shoes, while the surrounding track surface — deep blue and textured —
|
| 146 |
-
catches scattered moisture droplets that shimmer in the light. The starting blocks behind
|
| 147 |
-
her blur slightly, and her trailing leg dissolves into motion streaks, captured using a slow
|
| 148 |
-
shutter speed with rear-curtain sync to enhance the sense of explosive movement.
|
| 149 |
-
The background is minimal and moody: abstract light streaks from stadium lighting stretch
|
| 150 |
-
diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
|
| 151 |
-
raw, and cinematic — focused on power, speed, and launch precision.
|
| 152 |
-
|
| 153 |
-
Why the output is good:
|
| 154 |
-
- Uses extreme close-up composition to transform a sports moment into fashion art
|
| 155 |
-
- Creates tension between static and dynamic elements
|
| 156 |
-
- Uses technical specifications such as flash from front-left, slow shutter, rear-curtain sync
|
| 157 |
-
- Adds texture details such as moisture droplets and metallic shoes
|
| 158 |
-
|
| 159 |
-
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 160 |
-
{{ user_prompt }}
|
| 161 |
-
""")
|
| 162 |
-
|
| 163 |
-
|
| 164 |
-
EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for emotional lifestyle image generation.
|
| 165 |
-
|
| 166 |
-
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 167 |
-
|
| 168 |
-
Focus on creating a vivid lifestyle portrait that captures an authentic emotional moment or state within a visually compelling environment. Be attentive to portraying the subject in a way that reveals character, mood, or narrative through their expression, posture, and interaction with their surroundings. Use very cool contrasting colors to elevate the subject and utilize naturalistic lighting approaches (e.g., window light, ambient environmental lighting, soft golden hour) or stylized lighting that enhances the emotional tone. Incorporate environmental details that contribute to storytelling and provide context. The final image should feel intimate yet visually striking—balancing raw emotional authenticity with aesthetic sophistication through thoughtful composition, color treatment, and atmospheric elements.
|
| 169 |
-
|
| 170 |
-
Tips for Success:
|
| 171 |
-
- Make the prompt short
|
| 172 |
-
- Importantly highlight the user prompt request (i.e. if the user asks to be seen roller blading, the roller blades should be seen)
|
| 173 |
-
- Define the emotional state or moment clearly (vulnerability, joy, contemplation)
|
| 174 |
-
- Specify the lighting and how it enhances the mood (soft window light, dramatic shadows)
|
| 175 |
-
- Include meaningful props or elements that tell the subject's story
|
| 176 |
-
- Describe subtle details in expression or posture that convey emotion
|
| 177 |
-
- Consider color treatment that reinforces the emotional tone
|
| 178 |
-
- Add atmospheric elements that enhance the mood (water droplets, steam, fabric texture)
|
| 179 |
-
|
| 180 |
-
Examples:
|
| 181 |
-
Input: A picture of me crying on the phone
|
| 182 |
-
Input photo: A black woman
|
| 183 |
-
Output prompt:
|
| 184 |
-
A hyperrealistic editorial-style fashion photograph in vertical format (1080x1350), heavily
|
| 185 |
-
stylized with retro lighting, saturated color, and cinematic imperfection. A Black woman sits
|
| 186 |
-
facing the camera in a white wicker chair, wearing a glossy hot pink satin robe with sharp
|
| 187 |
-
lapels and a vintage brooch pinned to her chest. Her hair is styled in soft, voluminous waves.
|
| 188 |
-
|
| 189 |
-
She holds a red corded landline receiver in one hand, and in the other presses a tissue to
|
| 190 |
-
her cheek, caught mid-tear with a melodramatic, frozen expression. Her eyelids shimmer
|
| 191 |
-
with green eyeshadow, slightly smudged, evoking a stylized soap opera mood.
|
| 192 |
-
Resting on her lap is a bright blue tissue box with bold white cloud graphics — and
|
| 193 |
-
prominently across the front, the word "ohneis" is printed in large, clean white letters in a
|
| 194 |
-
stylized, editorial font. A single tissue protrudes, loosely folded over the edge. To her left, a
|
| 195 |
-
small round table holds the red phone base, crumpled tissues, and a decorative vase with
|
| 196 |
-
pink plastic flowers. The background is a matte lavender surface, creating smooth contrast
|
| 197 |
-
with the glossy fabrics and vibrant tones.
|
| 198 |
-
|
| 199 |
-
The entire image is treated with subtle analog effects: soft bloom on the highlights, visible
|
| 200 |
-
film grain, faint vertical scratches, floating dust particles, and a few light leaks that enhance
|
| 201 |
-
the stylized, nostalgic mood. The scene feels like a surreal still from a hyper-aestheticized
|
| 202 |
-
1980s commercial.
|
| 203 |
-
|
| 204 |
-
Why the output is good:
|
| 205 |
-
- The prompt uses props to tell the story of the subject (red phone, tissue box, tissues)
|
| 206 |
-
- Defines details that describe the emotional moment (mid-tear)
|
| 207 |
-
- Adds interesting and bold colors (purple robe, hot pink robe, red phone, blue tissue box)
|
| 208 |
-
|
| 209 |
-
What can be better:
|
| 210 |
-
- the prompt is too long
|
| 211 |
-
- does not define the emotion clearly enough
|
| 212 |
-
|
| 213 |
-
Input: A picture of water pouring on me
|
| 214 |
-
Input photo: Blue eyed white man
|
| 215 |
-
Output prompt:
|
| 216 |
-
A hyperrealistic flash photograph taken at eye level, capturing a half-body, front-facing
|
| 217 |
-
portrait of a young man standing shirtless against a sleek, modern white wall. A column of
|
| 218 |
-
water strikes him directly in the face at the moment of impact, caught mid-air in razor-sharp
|
| 219 |
-
detail — droplets frozen as they burst and scatter across his features. His right shoulder is
|
| 220 |
-
slightly raised and tensed, muscles subtly defined under the harsh lighting. His eyes are
|
| 221 |
-
half-closed in reaction, mouth neutral, giving the scene a raw, involuntary intensity. Around
|
| 222 |
-
his neck hangs a thin turquoise necklace, glinting faintly in the sun, its color vividly
|
| 223 |
-
contrasting with his sun-warmed skin. In the blurred background, a lone palm tree arcs
|
| 224 |
-
gently from the left edge of the frame, with the deep blue sea stretching toward a soft, hazy
|
| 225 |
-
horizon. The flash adds hard highlights to the water, the necklace, and the tension lines on
|
| 226 |
-
his body, while subtle analog textures — faint vertical lens scratches, fine grain, and
|
| 227 |
-
scattered dust — bring a tactile, editorial edge to the image.
|
| 228 |
-
|
| 229 |
-
Why the output is good:
|
| 230 |
-
- The prompt is relevant to the user's input as it describes exactly where the water is pouring
|
| 231 |
-
- The physical reaction (raised shoulder, half-closed eyes) describes the emotion
|
| 232 |
-
- Environmental hints (palm tree, sea) establish location context without overwhelming the portrait.
|
| 233 |
-
|
| 234 |
-
What can be better:
|
| 235 |
-
- His emotion is not clearly defined
|
| 236 |
-
|
| 237 |
-
Input: A picture of me in a helmet
|
| 238 |
-
Input photo: A white man with blue eyes
|
| 239 |
-
Output prompt:
|
| 240 |
-
A hyperrealistic macro flash photograph taken from a low frontal angle, capturing the
|
| 241 |
-
intense, close-up portrait of a tanned male model wearing a high-impact helmet with a
|
| 242 |
-
closed chin guard, resembling the design of a rugby or Formula 1 helmet. The camera is
|
| 243 |
-
positioned slightly beneath eye level, making the face appear dominant and imposing within
|
| 244 |
-
the vertical frame. His expression is calm but intense, with piercing clear blue eyes staring
|
| 245 |
-
directly into the lens, framed by the slightly open visor. The skin is bronzed and smooth, yet
|
| 246 |
-
visibly roughed by activity — fine scratches and a reddish abrasion across the nose give the
|
| 247 |
-
face a raw, lived-in quality. His sculpted features and symmetrical bone structure remain
|
| 248 |
-
visible beneath the helmet's padding. A small red carabiner is clipped casually to one of the
|
| 249 |
-
chin straps, functioning more like a fashion detail than gear. The flash harshly illuminates the
|
| 250 |
-
facial textures and helmet surface, producing sharp highlights and crisp shadows along the
|
| 251 |
-
cheeks and neck. The background is black and indistinct, fading away entirely. Fine analog
|
| 252 |
-
imperfections — vertical lens scratches, dust particles suspended midair in the flash cone,
|
| 253 |
-
and faint grain — lend the image a gritty, stylized realism.
|
| 254 |
-
|
| 255 |
-
Why the output is good:
|
| 256 |
-
- The prompt is relevant to the user's input as it describes exactly where the helmet is being worn
|
| 257 |
-
- The prompt defines his expression (His expression is calm but intense), and specifically adds details describing his expression (with piercing clear blue eyes staring
|
| 258 |
-
directly into the lens)
|
| 259 |
-
- the prompt is a good length (although it could be shorter)
|
| 260 |
-
|
| 261 |
-
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 262 |
-
{{ user_prompt }}
|
| 263 |
-
""")
|
| 264 |
-
|
| 265 |
-
|
| 266 |
-
EXTREME_SPORTS_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for extreme sports image generation.
|
| 267 |
-
|
| 268 |
-
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 269 |
-
|
| 270 |
-
Focus on creating a dynamic, high-impact photograph capturing an adventure sport athlete in
|
| 271 |
-
mid-action. Utilize dramatic camera angles (e.g., low angle, fisheye, aerial) and specialized
|
| 272 |
-
lighting techniques (e.g., backlit silhouettes, flash freezing motion, golden hour glow) to
|
| 273 |
-
emphasize the intensity and athleticism of the moment. The picture is Black and White (and gray) ONLY.
|
| 274 |
-
Focus on capturing peak action – the apex of a jump, the spray of water/dirt/snow,
|
| 275 |
-
or the tension in the athlete's body, to highlight the users prompt request. Incorporate
|
| 276 |
-
environmental elements that enhance the narrative and mood, whether natural (mountains,
|
| 277 |
-
waves, desert) or urban (concrete, structures, cityscape). The image should balance raw
|
| 278 |
-
athleticism with cinematic drama through specific details in the subject's gear, expression,
|
| 279 |
-
environment, and the physical forces at play. Make sure that the prompt is short, to the point,
|
| 280 |
-
and relevant to the users prompt request.
|
| 281 |
-
|
| 282 |
-
Tips for Success:
|
| 283 |
-
● Capture peak action moments and dynamic motion
|
| 284 |
-
● Include specialized sports equipment and gear
|
| 285 |
-
● Detail the extreme environment and conditions
|
| 286 |
-
● Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
|
| 287 |
-
● Include elements that convey danger and excitement
|
| 288 |
-
● Focus on athletic poses and expressions of intensity
|
| 289 |
-
● Add compositional directions that emphasize scale and drama
|
| 290 |
-
|
| 291 |
-
Tips for Success:
|
| 292 |
-
- Make the prompt short
|
| 293 |
-
- Importantly highlight the user prompt request (i.e. if the user asks to be seen roller blading, the roller blades should be seen)
|
| 294 |
-
- Capture peak action moments and dynamic motion
|
| 295 |
-
- Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
|
| 296 |
-
- Include elements that convey danger and excitement
|
| 297 |
-
- Focus on athletic poses and expressions of intensity
|
| 298 |
-
- Include VERY cool gear
|
| 299 |
-
|
| 300 |
-
Examples:
|
| 301 |
-
Input: A picture of me as a desert biker
|
| 302 |
-
Input photo: White man
|
| 303 |
-
Output prompt:
|
| 304 |
-
A moody black-and-white portrait captures the silhouette of a dirt biker standing against a
|
| 305 |
-
hazy, light-splintered desert backdrop. The composition is tightly framed in portrait format,
|
| 306 |
-
showing the rider from just above the waist upward, centered in the shot and facing directly
|
| 307 |
-
into the camera. His posture is calm and unshaken — radiating confidence and defiance
|
| 308 |
-
beneath the helmet.
|
| 309 |
-
He wears a loose, oversized white T-shirt with visible holes and stains, heavily worn from
|
| 310 |
-
heat and dust, with the word "OHNEIS" boldly printed across the chest in cracked, industrial
|
| 311 |
-
lettering. The shirt hangs slightly off his shoulder, catching the soft ambient wind. Over his
|
| 312 |
-
face, a matte motocross helmet obscures his expression, but the eyes are just barely visible
|
| 313 |
-
through a clear, dust-specked motocross goggle. Across the top edge of the goggle lens, the
|
| 314 |
-
name "OHNEIS" is printed again — slightly curved with the lens contour, framed between
|
| 315 |
-
scattered reflections and dirt smudges.
|
| 316 |
-
Behind him, a cloud of lifted dust floats faintly in the air, and light from a high sun cuts
|
| 317 |
-
through the haze in harsh diagonal streaks, creating layered contrast and adding a cinematic
|
| 318 |
-
edge. Grain is prominent, especially in the midtones and background haze, while slight
|
| 319 |
-
motion blur in the particles gives the scene a sense of environmental motion despite the still
|
| 320 |
-
pose of the subject. The rider's dark gear stands in stark contrast to the pale light behind
|
| 321 |
-
him, with the overall tone raw, minimal, and visually arresting — a moment suspended in
|
| 322 |
-
dust and silence.
|
| 323 |
-
|
| 324 |
-
Why the output is good:
|
| 325 |
-
- This prompt creates powerful contrast between stillness (the posed rider) and subtle motion (dust in air).
|
| 326 |
-
- Clearly states Black and White stillness
|
| 327 |
-
|
| 328 |
-
What can be better:
|
| 329 |
-
- The prompt is too long
|
| 330 |
-
|
| 331 |
-
Input: A picture of a drifting Porsche
|
| 332 |
-
Input photo: A car
|
| 333 |
-
Output prompt:
|
| 334 |
-
A high-contrast black-and-white photograph capturing an extreme close-up of the rear half of
|
| 335 |
-
a vintage Porsche 911 Carrera mid-drift through a desert curve. Shot tightly from a low
|
| 336 |
-
rear-three-quarter angle in portrait orientation, the frame focuses solely on the car's back
|
| 337 |
-
quarter panel, rear wheel, and the explosion of dust and smoke billowing behind it. The
|
| 338 |
-
vehicle's iconic curves, chrome bumper, and the number "911 OHNEIS" in Porsche's
|
| 339 |
-
signature font are clearly visible, slightly catching the harsh desert sunlight.
|
| 340 |
-
The composition centers on the raw chaos of the drift: the rear tire is kicking out violently to
|
| 341 |
-
the left, slicing into the sandy ground and throwing up a massive, high-reaching dust plume
|
| 342 |
-
that fills most of the upper half of the frame. This dust cloud appears dense, layered, and
|
| 343 |
-
almost sculptural — with illuminated outer edges catching bright light rays that streak
|
| 344 |
-
diagonally across the frame from the top right corner.
|
| 345 |
-
The motion blur is used selectively: the car's rear and wheel arch are mostly crisp, while the
|
| 346 |
-
tire and dust cloud blur dynamically to emphasize speed and torque. Grain is heavy
|
| 347 |
-
throughout the image, especially within the dust textures and darker shadows. The ground is
|
| 348 |
-
streaked with tire marks and disturbed sand, adding detail and context.
|
| 349 |
-
Shot with a shutter speed of approximately 1/40s using a panning technique, the image
|
| 350 |
-
retains key visual clarity while enhancing the sense of movement and kinetic energy. This
|
| 351 |
-
close-cropped perspective creates an intense, almost abstract portrait of the moment —
|
| 352 |
-
pure mechanical force meeting loose terrain in a visual blast of contrast and grit.
|
| 353 |
-
|
| 354 |
-
Why the output is good:
|
| 355 |
-
- This prompt excels in capturing dynamic motion through selective blur and focus.
|
| 356 |
-
- The prompt is relevant to the user's input as it describes exactly where the car is being driven
|
| 357 |
-
- Technical details like shutter speed and panning technique guide the AI toward realistic motion effects.
|
| 358 |
-
|
| 359 |
-
What can be better:
|
| 360 |
-
- The prompt is too long
|
| 361 |
-
- Does not specifically say Black and White
|
| 362 |
-
|
| 363 |
-
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 364 |
-
{{ user_prompt }}
|
| 365 |
-
""")
|
| 366 |
-
'''
|
| 367 |
-
CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
|
| 368 |
-
|
| 369 |
-
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 370 |
-
|
| 371 |
-
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 372 |
-
|
| 373 |
-
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 374 |
-
{{ user_prompt }}
|
| 375 |
-
""")
|
| 376 |
-
|
| 377 |
-
MODERN_PRODUCT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
|
| 378 |
-
|
| 379 |
-
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 380 |
-
|
| 381 |
-
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 382 |
-
|
| 383 |
-
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 384 |
-
{{ user_prompt }}
|
| 385 |
-
""")
|
| 386 |
-
|
| 387 |
-
CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
|
| 388 |
-
|
| 389 |
-
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 390 |
-
|
| 391 |
-
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 392 |
-
|
| 393 |
-
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 394 |
-
{{ user_prompt }}
|
| 395 |
-
""")
|
| 396 |
-
'''
|
| 397 |
-
|
| 398 |
-
def process_prompt(image, image2, target_label, user_prompt, style):
|
| 399 |
image_url = None
|
| 400 |
image_url2 = None
|
| 401 |
|
|
@@ -411,73 +123,162 @@ def process_prompt(image, image2, target_label, user_prompt, style):
|
|
| 411 |
b64_image2 = base64.b64encode(buffer.getvalue()).decode("utf-8")
|
| 412 |
image_url2 = f"data:image/jpeg;base64,{b64_image2}"
|
| 413 |
|
| 414 |
-
|
| 415 |
-
|
| 416 |
-
|
|
|
|
| 417 |
|
| 418 |
-
|
| 419 |
-
system_content = "You are expert prompt engineer"
|
| 420 |
-
user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 421 |
|
| 422 |
-
|
| 423 |
-
system_content = "You are expert prompt engineer"
|
| 424 |
-
user_content = EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 425 |
|
| 426 |
-
|
| 427 |
-
|
| 428 |
-
|
|
|
|
| 429 |
|
| 430 |
response = client.responses.create(
|
| 431 |
model="gpt-5",
|
| 432 |
reasoning={"effort": "low"},
|
| 433 |
input=[
|
| 434 |
{
|
| 435 |
-
"role": "system",
|
| 436 |
-
"content":
|
| 437 |
},
|
| 438 |
{
|
| 439 |
-
"role": "user",
|
| 440 |
-
"content":
|
| 441 |
-
{"type": "input_text", "text": user_content},
|
| 442 |
-
{"type": "input_image", "image_url": image_url},
|
| 443 |
-
{"type": "input_image", "image_url": image_url2}
|
| 444 |
-
]
|
| 445 |
}
|
| 446 |
],
|
| 447 |
)
|
| 448 |
return f"{response.output_text} {target_label.strip()}"
|
| 449 |
|
| 450 |
-
demo = gr.Interface(
|
| 451 |
-
fn=process_prompt,
|
| 452 |
-
inputs=[
|
| 453 |
-
|
| 454 |
-
gr.Image(
|
| 455 |
-
label="Upload reference image",
|
| 456 |
-
type="pil"
|
| 457 |
-
),
|
| 458 |
-
gr.Image(
|
| 459 |
-
label="Upload 2nd reference image",
|
| 460 |
-
type="pil"
|
| 461 |
-
),
|
| 462 |
-
gr.Textbox(
|
| 463 |
-
label="Enter target label",
|
| 464 |
-
placeholder="SMRA",
|
| 465 |
-
),
|
| 466 |
-
gr.Textbox(
|
| 467 |
-
label="Enter your prompt",
|
| 468 |
-
placeholder="picture of me while sitting in a chair in the ocean",
|
| 469 |
-
),
|
| 470 |
-
gr.Dropdown(
|
| 471 |
-
choices=["General", "Fashion", "Emotional Lifestyle", "Extreme Sports"],
|
| 472 |
-
#choices=["Chromatic Cinematic", "Neon Noir", "General"],
|
| 473 |
-
label="Style Selection",
|
| 474 |
-
info="Choose the visual style for your enhanced prompt"
|
| 475 |
-
),
|
| 476 |
-
],
|
| 477 |
-
outputs=gr.Textbox(
|
| 478 |
-
label="Style Prompt",
|
| 479 |
-
lines=20,
|
| 480 |
-
),
|
| 481 |
-
)
|
| 482 |
|
| 483 |
-
| 1 |
import base64
|
| 2 |
+
from dataclasses import dataclass
|
| 3 |
from io import BytesIO
|
| 4 |
+
from pathlib import Path
|
| 5 |
+
from typing import Literal, cast
|
| 6 |
|
| 7 |
import gradio as gr
|
| 8 |
import jinja2
|
| 9 |
from openai import OpenAI
|
| 10 |
+
from pydantic import BaseModel
|
| 11 |
|
| 12 |
client = OpenAI()
|
| 13 |
|
| 14 |
+
TEMPLATES_DIR = Path(__file__).resolve().parent / "templates"
|
| 15 |
+
jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader(str(TEMPLATES_DIR)))
|
| 16 |
|
| 17 |
+
SYSTEM_PROMPT = "You are expert prompt engineer"
|
| 18 |
+
|
| 19 |
+
StyleName = Literal[
|
| 20 |
+
"General",
|
| 21 |
+
"Fashion",
|
| 22 |
+
"Emotional Lifestyle",
|
| 23 |
+
"Extreme Sports",
|
| 24 |
+
"Modern Product",
|
| 25 |
+
"Captivating",
|
| 26 |
+
"Cool Lifestyle",
|
| 27 |
+
"Image Replication",
|
| 28 |
+
]
|
| 29 |
+
|
| 30 |
+
|
| 31 |
+
@dataclass(frozen=True)
|
| 32 |
+
class StyleDefinition:
|
| 33 |
+
name: StyleName
|
| 34 |
+
template_filename: str
|
| 35 |
+
info: str
|
| 36 |
+
|
| 37 |
+
|
| 38 |
+
STYLE_DEFINITIONS: dict[StyleName, StyleDefinition] = {
|
| 39 |
+
"General": StyleDefinition(
|
| 40 |
+
name="General",
|
| 41 |
+
template_filename="general_prompt.jinja",
|
| 42 |
+
info="Versatile, balanced storytelling with cinematic detail for most scenarios.",
|
| 43 |
+
),
|
| 44 |
+
"Fashion": StyleDefinition(
|
| 45 |
+
name="Fashion",
|
| 46 |
+
template_filename="fashion_prompt.jinja",
|
| 47 |
+
info="Editorial fashion aesthetic highlighting garments, styling, and runway polish.",
|
| 48 |
+
),
|
| 49 |
+
"Emotional Lifestyle": StyleDefinition(
|
| 50 |
+
name="Emotional Lifestyle",
|
| 51 |
+
template_filename="emotional_lifestyle_prompt.jinja",
|
| 52 |
+
info="Warm, candid lifestyle imagery that focuses on mood, relationships, and feelings.",
|
| 53 |
+
),
|
| 54 |
+
"Extreme Sports": StyleDefinition(
|
| 55 |
+
name="Extreme Sports",
|
| 56 |
+
template_filename="extreme_sports_prompt.jinja",
|
| 57 |
+
info="High-adrenaline action shots that emphasize energy, motion, and athletic feats.",
|
| 58 |
+
),
|
| 59 |
+
"Modern Product": StyleDefinition(
|
| 60 |
+
name="Modern Product",
|
| 61 |
+
template_filename="modern_product_prompt.jinja",
|
| 62 |
+
info="Crisp product visuals with contemporary lighting and minimalistic staging.",
|
| 63 |
+
),
|
| 64 |
+
"Captivating": StyleDefinition(
|
| 65 |
+
name="Captivating",
|
| 66 |
+
template_filename="captivating_prompt.jinja",
|
| 67 |
+
info="Visually striking compositions with dramatic flair and memorable storytelling.",
|
| 68 |
+
),
|
| 69 |
+
"Cool Lifestyle": StyleDefinition(
|
| 70 |
+
name="Cool Lifestyle",
|
| 71 |
+
template_filename="cool_lifestyle_prompt.jinja",
|
| 72 |
+
info="Casual yet stylish lifestyle scenes with an effortlessly cool atmosphere.",
|
| 73 |
+
),
|
| 74 |
+
"Image Replication": StyleDefinition(
|
| 75 |
+
name="Image Replication",
|
| 76 |
+
template_filename="image_replication_prompt.jinja",
|
| 77 |
+
info=(
|
| 78 |
+
"Mimic the reference image's composition, lighting, and styling exactly while"
|
| 79 |
+
" inserting the user or their face in place of the original subject."
|
| 80 |
+
),
|
| 81 |
+
),
|
| 82 |
+
}
|
| 83 |
+
|
| 84 |
+
PROMPT_TEMPLATES = {
|
| 85 |
+
style: jinja_env.get_template(config.template_filename)
|
| 86 |
+
for style, config in STYLE_DEFINITIONS.items()
|
| 87 |
+
}
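A minimal sketch of the file-based template loading introduced above, assuming a throwaway directory and a hypothetical `demo_prompt.jinja` file in place of the real `templates/` folder. The helper name `load_templates` is illustrative and not part of the commit.

```python
import tempfile
from pathlib import Path

import jinja2


def load_templates(templates_dir: Path, filenames: dict[str, str]) -> dict[str, jinja2.Template]:
    """Eagerly load one template per style from a directory (hypothetical helper)."""
    env = jinja2.Environment(loader=jinja2.FileSystemLoader(str(templates_dir)))
    # get_template raises jinja2.TemplateNotFound if a file is missing,
    # so a bad filename fails here rather than at render time.
    return {style: env.get_template(name) for style, name in filenames.items()}


with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    # Stand-in template file; the real templates are much longer.
    (demo_dir / "demo_prompt.jinja").write_text(
        "Enhance this prompt: {{ user_prompt }}", encoding="utf-8"
    )
    templates = load_templates(demo_dir, {"Demo": "demo_prompt.jinja"})
    rendered = templates["Demo"].render(user_prompt="a picture of me sprinting")

print(rendered)
```

Loading every template at import time, as the dict comprehension in the diff does, means a missing or misnamed file surfaces when the app starts instead of on the first request for that style.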
|
| 88 |
+
|
| 89 |
+
DEFAULT_STYLE: StyleName = "General"
|
| 90 |
+
STYLE_CHOICES: tuple[StyleName, ...] = tuple(STYLE_DEFINITIONS.keys())
|
| 91 |
+
|
| 92 |
+
STYLE_INFORMATION_BLOCK = "\n".join(
|
| 93 |
+
f"- {style}: {config.info}" for style, config in STYLE_DEFINITIONS.items()
|
| 94 |
+
)
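To illustrate how the style guide text is assembled from the registry, here is a small standalone sketch. `StyleInfo`, `DEMO_STYLES`, and `build_style_guide` are hypothetical stand-ins for `StyleDefinition`, `STYLE_DEFINITIONS`, and the `"\n".join(...)` expression above, and the descriptions are shortened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleInfo:
    """Reduced stand-in for StyleDefinition (no template filename)."""
    name: str
    info: str


DEMO_STYLES = {
    "General": StyleInfo("General", "Versatile, balanced storytelling."),
    "Fashion": StyleInfo("Fashion", "Editorial fashion aesthetic."),
}


def build_style_guide(styles: dict[str, StyleInfo]) -> str:
    # One bullet per style, in dict insertion order.
    return "\n".join(f"- {name}: {cfg.info}" for name, cfg in styles.items())


style_guide = build_style_guide(DEMO_STYLES)
print(style_guide)
```

Because the block is derived from the same dict that drives the dropdown choices, adding a style in one place keeps the guide and the UI in sync.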
|
| 95 |
+
|
| 96 |
+
|
| 97 |
+
class StyleSelectionResponse(BaseModel):
|
| 98 |
+
style: StyleName
|
| 99 |
+
rationale: str
|
| 100 |
+
|
| 101 |
+
|
| 102 |
+
AUTO_STYLE_SYSTEM_PROMPT = (
|
| 103 |
+
"You are an art director who must pick the most fitting style name for a user's prompt. "
|
| 104 |
+
"Consider the available styles and choose the single best option.\n\n"
|
| 105 |
+
f"Style Guide:\n{STYLE_INFORMATION_BLOCK}\n\n"
|
| 106 |
+
"Return JSON that matches the schema exactly."
|
| 107 |
+
)
|
| 108 |
+
|
| 109 |
+
|
| 110 |
+
def process_prompt(image, image2, target_label: str, user_prompt: str, style: StyleName) -> str:
|
| 111 |
image_url = None
|
| 112 |
image_url2 = None
|
| 113 |
|
| 123 |
b64_image2 = base64.b64encode(buffer.getvalue()).decode("utf-8")
|
| 124 |
image_url2 = f"data:image/jpeg;base64,{b64_image2}"
|
| 125 |
|
| 126 |
+
try:
|
| 127 |
+
template = PROMPT_TEMPLATES[style]
|
| 128 |
+
except KeyError as error:
|
| 129 |
+
raise ValueError(f"Unsupported style: {style}") from error
|
| 130 |
|
| 131 |
+
user_content = template.render(user_prompt=user_prompt)
|
| 132 |
|
| 133 |
+
content = [{"type": "input_text", "text": user_content}]
|
| 134 |
|
| 135 |
+
if image_url is not None:
|
| 136 |
+
content.append({"type": "input_image", "image_url": image_url})
|
| 137 |
+
if image_url2 is not None:
|
| 138 |
+
content.append({"type": "input_image", "image_url": image_url2})
|
| 139 |
|
| 140 |
response = client.responses.create(
|
| 141 |
model="gpt-5",
|
| 142 |
reasoning={"effort": "low"},
|
| 143 |
input=[
|
| 144 |
{
|
| 145 |
+
"role": "system",
|
| 146 |
+
"content": SYSTEM_PROMPT,
|
| 147 |
},
|
| 148 |
{
|
| 149 |
+
"role": "user",
|
| 150 |
+
"content": content,
|
| 151 |
}
|
| 152 |
],
|
| 153 |
)
|
| 154 |
return f"{response.output_text} {target_label.strip()}"
|
| 155 |
|
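Review note: `process_prompt` turns each uploaded image into a base64 data URL and appends it to the user message only when it is present. A minimal sketch of that step, assuming the image has already been serialized to JPEG bytes; `to_data_url` and `build_content` are hypothetical helper names, not functions from this commit:

```python
import base64
from typing import Optional


def to_data_url(jpeg_bytes: bytes) -> str:
    """Encode raw JPEG bytes as a data URL, as the diff does for both images."""
    encoded = base64.b64encode(jpeg_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"


def build_content(text: str, *images: Optional[bytes]) -> list:
    """Start with the rendered prompt text, then add one entry per supplied image."""
    content = [{"type": "input_text", "text": text}]
    for raw in images:
        if raw is not None:  # missing uploads are skipped, not sent as empty entries
            content.append({"type": "input_image", "image_url": to_data_url(raw)})
    return content
```

Skipping absent images keeps the request valid when the optional reference image is left empty.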
| 156 |
|
| 157 |
+
def recommend_style(user_prompt: str) -> StyleSelectionResponse:
|
| 158 |
+
completion = client.chat.completions.parse(
|
| 159 |
+
model="gpt-5",
|
| 160 |
+
messages=[
|
| 161 |
+
{"role": "system", "content": AUTO_STYLE_SYSTEM_PROMPT},
|
| 162 |
+
{"role": "user", "content": user_prompt},
|
| 163 |
+
],
|
| 164 |
+
response_format=StyleSelectionResponse,
|
| 165 |
+
)
|
| 166 |
+
return completion.choices[0].message.parsed
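Review note: `recommend_style` relies on the SDK to validate the reply against `StyleSelectionResponse`. For comparison, here is a sketch of the same check in plain Python, assuming the reply arrives as a JSON string. The fallback to the default style is an assumption added for illustration; the commit itself has no such fallback, and `ALLOWED_STYLES` lists only a subset of the registered styles.

```python
import json

# Illustrative subset of the style names registered in the diff.
ALLOWED_STYLES = ("General", "Cool Lifestyle", "Image Replication")
DEFAULT_STYLE = "General"


def parse_style_selection(raw_json: str) -> tuple:
    """Return (style, rationale); fall back to DEFAULT_STYLE on any invalid reply."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return DEFAULT_STYLE, "unparseable response"
    if not isinstance(payload, dict):
        return DEFAULT_STYLE, "unexpected response shape"
    style = payload.get("style")
    if style not in ALLOWED_STYLES:
        return DEFAULT_STYLE, "unknown style"
    return style, str(payload.get("rationale", ""))
```

A guard of this kind would also cover the case where the model returns no parsed object at all, which the annotated return type of `recommend_style` does not account for.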
|
| 167 |
+
|
| 168 |
+
|
| 169 |
+
def auto_select_style(user_prompt: str):
|
| 170 |
+
if not user_prompt or not user_prompt.strip():
|
| 171 |
+
raise gr.Error("Enter your prompt before selecting a style automatically.")
|
| 172 |
+
|
| 173 |
+
selection = recommend_style(user_prompt)
|
| 174 |
+
|
| 175 |
+
return (
|
| 176 |
+
selection.style,
|
| 177 |
+
gr.update(value=selection.style, interactive=False),
|
| 178 |
+
)
|
| 179 |
+
|
| 180 |
+
|
| 181 |
+
def prepare_manual_style(current_style: str | None) -> tuple[StyleName, dict[str, object]]:
|
| 182 |
+
resolved_style = cast(StyleName, current_style) if current_style in STYLE_CHOICES else DEFAULT_STYLE
|
| 183 |
+
return (
|
| 184 |
+
resolved_style,
|
| 185 |
+
gr.update(value=resolved_style, interactive=True),
|
| 186 |
+
)
|
| 187 |
+
|
| 188 |
+
|
| 189 |
+
def prepare_style_selection(
|
| 190 |
+
user_prompt: str,
|
| 191 |
+
current_style: str | None,
|
| 192 |
+
auto_style_enabled: bool,
|
| 193 |
+
) -> tuple[StyleName, dict[str, object]]:
|
| 194 |
+
if auto_style_enabled:
|
| 195 |
+
selected_style, dropdown_update = auto_select_style(user_prompt)
|
| 196 |
+
return selected_style, dropdown_update
|
| 197 |
+
return prepare_manual_style(current_style)
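Review note: the three helpers above amount to one decision: use the recommended style and lock the dropdown when auto-selection is on, otherwise keep the current style (or the default if it is not a known choice) and leave the dropdown editable. A sketch of that decision as a pure function, with the recommender passed in as a callable so it runs without Gradio or the API. `resolve_style` is a hypothetical name, `STYLE_CHOICES` is an illustrative subset, and `ValueError` stands in for `gr.Error`:

```python
from typing import Callable, Optional

STYLE_CHOICES = ("General", "Cool Lifestyle", "Image Replication")  # illustrative subset
DEFAULT_STYLE = "General"


def resolve_style(
    user_prompt: str,
    current_style: Optional[str],
    auto_enabled: bool,
    recommend: Callable[[str], str],
) -> tuple:
    """Return (style, dropdown_interactive)."""
    if auto_enabled:
        if not user_prompt or not user_prompt.strip():
            # The diff raises gr.Error here; ValueError keeps the sketch dependency-free.
            raise ValueError("Enter your prompt before selecting a style automatically.")
        return recommend(user_prompt), False
    style = current_style if current_style in STYLE_CHOICES else DEFAULT_STYLE
    return style, True
```

Keeping the decision free of UI objects makes it straightforward to unit-test the fallback and the empty-prompt check.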
|
| 198 |
+
|
| 199 |
+
|
| 200 |
+
def handle_auto_style_toggle(auto_enabled: bool) -> dict[str, object]:
|
| 201 |
+
return gr.update(interactive=not auto_enabled)
|
| 202 |
+
|
| 203 |
+
|
| 204 |
+
def generate_prompt_handler(
|
| 205 |
+
image,
|
| 206 |
+
image2,
|
| 207 |
+
target_label: str,
|
| 208 |
+
user_prompt: str,
|
| 209 |
+
current_style: str | None,
|
| 210 |
+
auto_style_enabled: bool,
|
| 211 |
+
):
|
| 212 |
+
resolved_style, dropdown_update = prepare_style_selection(
|
| 213 |
+
user_prompt=user_prompt,
|
| 214 |
+
current_style=current_style,
|
| 215 |
+
auto_style_enabled=auto_style_enabled,
|
| 216 |
+
)
|
| 217 |
+
prompt_text = process_prompt(
|
| 218 |
+
image=image,
|
| 219 |
+
image2=image2,
|
| 220 |
+
target_label=target_label,
|
| 221 |
+
user_prompt=user_prompt,
|
| 222 |
+
style=resolved_style,
|
| 223 |
+
)
|
| 224 |
+
display_text = f"Selected style: {resolved_style}\n\n{prompt_text}"
|
| 225 |
+
return display_text, dropdown_update
|
| 226 |
+
|
| 227 |
+
|
| 228 |
+
with gr.Blocks() as demo:
|
| 229 |
+
with gr.Row():
|
| 230 |
+
with gr.Column():
|
| 231 |
+
user_image = gr.Image(
|
| 232 |
+
label="Upload user photo",
|
| 233 |
+
type="pil"
|
| 234 |
+
)
|
| 235 |
+
reference_image = gr.Image(
|
| 236 |
+
label="Optional: Upload reference image (e.g. movie poster, music album cover, etc.)",
|
| 237 |
+
type="pil",
|
| 238 |
+
)
|
| 239 |
+
target_label = gr.Textbox(
|
| 240 |
+
label="Enter target label",
|
| 241 |
+
placeholder="SMRA",
|
| 242 |
+
)
|
| 243 |
+
user_prompt = gr.Textbox(
|
| 244 |
+
label="Enter your prompt",
|
| 245 |
+
placeholder="picture of me while sitting in a chair in the ocean",
|
| 246 |
+
lines=4,
|
| 247 |
+
)
|
| 248 |
+
style_dropdown = gr.Dropdown(
|
| 249 |
+
choices=list(STYLE_CHOICES),
|
| 250 |
+
value=DEFAULT_STYLE,
|
| 251 |
+
label="Style Selection",
|
| 252 |
+
info="Choose the visual style for your enhanced prompt",
|
| 253 |
+
interactive=True,
|
| 254 |
+
)
|
| 255 |
+
auto_style_checkbox = gr.Checkbox(
|
| 256 |
+
label="Auto-select best style",
|
| 257 |
+
value=False,
|
| 258 |
+
)
|
| 259 |
+
generate_button = gr.Button("Generate Prompt")
|
| 260 |
+
with gr.Column():
|
| 261 |
+
prompt_output = gr.Textbox(
|
| 262 |
+
label="Style Prompt",
|
| 263 |
+
lines=20,
|
| 264 |
+
)
|
| 265 |
+
|
| 266 |
+
generate_button.click(
|
| 267 |
+
generate_prompt_handler,
|
| 268 |
+
inputs=[
|
| 269 |
+
user_image,
|
| 270 |
+
reference_image,
|
| 271 |
+
target_label,
|
| 272 |
+
user_prompt,
|
| 273 |
+
style_dropdown,
|
| 274 |
+
auto_style_checkbox,
|
| 275 |
+
],
|
| 276 |
+
outputs=[prompt_output, style_dropdown],
|
| 277 |
+
)
|
| 278 |
+
auto_style_checkbox.change(
|
| 279 |
+
handle_auto_style_toggle,
|
| 280 |
+
inputs=[auto_style_checkbox],
|
| 281 |
+
outputs=[style_dropdown],
|
| 282 |
+
)
|
| 283 |
+
|
| 284 |
+
demo.launch()
|
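Review note: each template file below ends with a `{{ user_prompt }}` placeholder that `template.render(user_prompt=...)` fills in after the style lookup. The sketch below imitates that flow with a tiny regex substitution so that it runs without Jinja installed. It is a stand-in, not Jinja itself: it handles only plain `{{ name }}` variables and raises `KeyError` on a missing value, whereas Jinja by default renders undefined variables as empty strings. `TEMPLATES` and `render_for_style` are hypothetical names.

```python
import re

# One illustrative template body; the closing lines mirror the committed templates.
TEMPLATES = {
    "Cool Lifestyle": (
        "You need to enhance the following prompt according to the guide above. "
        "Only output the prompt, no other text.\n"
        "{{ user_prompt }}"
    ),
}

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **values: str) -> str:
    """Replace each {{ name }} with the matching keyword argument."""
    return _PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


def render_for_style(style: str, user_prompt: str) -> str:
    """Look up the template for a style, mirroring the diff's KeyError handling."""
    try:
        template = TEMPLATES[style]
    except KeyError as error:
        raise ValueError(f"Unsupported style: {style}") from error
    return render(template, user_prompt=user_prompt)
```

Converting the `KeyError` into a `ValueError` with the style name, as the commit does, gives the caller a readable message instead of a bare dictionary miss.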
templates/captivating_prompt.jinja
ADDED
|
@@ -0,0 +1,55 @@
|
| 1 |
+
You are an expert prompt engineer for captivating image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Generate a photorealistic photograph capturing the subject in an unusual, character-rich, or
|
| 6 |
+
dynamically posed situation. The subject's face should be centered in the frame, and should be up close and personal to the camera.
|
| 7 |
+
Employ specific camera techniques (e.g., extreme close-up, fisheye distortion, shallow depth of field)
|
| 8 |
+
to specifically distort the face in a cool perspective way.
|
| 9 |
+
Specifically focus on weird perspectives and camera lenses.
|
| 10 |
+
Emphasize physical details like eye expression, subtle movements (or stillness), mouth expression, and the subject's
|
| 11 |
+
interaction with its environment or other subjects. The background should be intricately described
|
| 12 |
+
to display the distortion of the camera lens. The final image should feel cinematic, expressive, and potentially humorous or surreal.
|
| 13 |
+
|
| 14 |
+
Tips for Success:
|
| 15 |
+
- Make the prompt short
|
| 16 |
+
- Relevant to the user's prompt request
|
| 17 |
+
- Specific so that the image generator does not hallucinate impossible features
|
| 18 |
+
- Does not add too many difficult details that the image generator cannot handle and thus hallucinates (i.e. Her left hand gestures toward the lens, fingers elongated and warped by the fisheye; the foreground fingertips streak with slight motion blur.)
|
| 19 |
+
- Make the pose dramatic and exaggerated
|
| 20 |
+
- Clearly define the action (aggressively typing, clinging, sitting, sleeping, jumping).
|
| 21 |
+
- Use lens types (fisheye, macro, wide-angle) explicitly for specific perspectives.
|
| 22 |
+
- Describe the background and the props in the background
|
| 23 |
+
|
| 24 |
+
Examples:
|
| 25 |
+
|
| 26 |
+
Input: A picture of me on the phone
|
| 27 |
+
Input photo: A black woman
|
| 28 |
+
Output prompt:
|
| 29 |
+
A hyperrealistic 1:1 portrait of a Black woman captured with an extreme fisheye lens, distorting her face and body
|
| 30 |
+
— her eyes and nose appear oversized, head curves toward the camera, and the chair and surroundings warp dramatically.
|
| 31 |
+
She sits in a white wicker chair with her face in the center of the frame,
|
| 32 |
+
wearing a glossy hot pink satin robe with sharp lapels and a vintage brooch.
|
| 33 |
+
One hand presses a tissue to her cheek mid-tear, the other holds a red corded phone.
|
| 34 |
+
Motion blur trails her hand and head movement, emphasizing action.
|
| 35 |
+
Lighting is cinematic and retro, with saturated colors, deep shadows, soft bloom highlights, and subtle light leaks.
|
| 36 |
+
Matte lavender background contrasts with glossy fabrics and vibrant props.
|
| 37 |
+
Film grain, faint scratches, floating dust, and slight RGB fringing add tactile, analog texture.
|
| 38 |
+
The composition, angle, and lens create a surreal, intense, and hyper-stylized editorial mood.
|
| 39 |
+
Resolution 1024x1024. Use a Photorealistic Style.
|
| 40 |
+
|
| 41 |
+
Why the output is good:
|
| 42 |
+
- The prompt is relevant to the user's prompt request
|
| 43 |
+
- The subject face is centered
|
| 44 |
+
- This prompt uses a dramatic lens and defines it (extreme fisheye lens)
|
| 45 |
+
- This prompt describes the lens with specific details (her eyes and nose appear oversized, head curves toward the camera, and the chair and surroundings warp dramatically)
|
| 46 |
+
- The prompt has compelling coloring and interesting elements (glossy hot pink satin robe, red corded phone, matte lavender background)
|
| 47 |
+
|
| 48 |
+
What can be better:
|
| 49 |
+
- prompt is not short
|
| 50 |
+
- The background can be more intricately described
|
| 51 |
+
- the face is not up close and personal to the camera
|
| 52 |
+
|
| 53 |
+
|
| 54 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 55 |
+
{{ user_prompt }}
|
templates/cool_lifestyle_prompt.jinja
ADDED
|
@@ -0,0 +1,8 @@
|
| 1 |
+
You are an expert prompt engineer for cool image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 6 |
+
|
| 7 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 8 |
+
{{ user_prompt }}
|
templates/emotional_lifestyle_prompt.jinja
ADDED
|
@@ -0,0 +1,99 @@
|
| 1 |
+
You are an expert prompt engineer for emotional lifestyle image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Focus on creating a vivid lifestyle portrait that captures an authentic emotional moment or state within a visually compelling environment. Be attentive to portraying the subject in a way that reveals character, mood, or narrative through their expression, posture, and interaction with their surroundings. Use very cool contrasting colors to elevate the subject and utilize naturalistic lighting approaches (e.g., window light, ambient environmental lighting, soft golden hour) or stylized lighting that enhances the emotional tone. Incorporate environmental details that contribute to storytelling and provide context. The final image should feel intimate yet visually striking—balancing raw emotional authenticity with aesthetic sophistication through thoughtful composition, color treatment, and atmospheric elements.
|
| 6 |
+
|
| 7 |
+
Tips for Success:
|
| 8 |
+
- Make the prompt short
|
| 9 |
+
- Importantly highlight the user prompt request (i.e. if the user asks to be seen roller blading, the roller blades should be seen)
|
| 10 |
+
- Define the emotional state or moment clearly (vulnerability, joy, contemplation)
|
| 11 |
+
- Specify the lighting and how it enhances the mood (soft window light, dramatic shadows)
|
| 12 |
+
- Include meaningful props or elements that tell the subject's story
|
| 13 |
+
- Describe subtle details in expression or posture that convey emotion
|
| 14 |
+
- Consider color treatment that reinforces the emotional tone
|
| 15 |
+
- Add atmospheric elements that enhance the mood (water droplets, steam, fabric texture)
|
| 16 |
+
|
| 17 |
+
Examples:
|
| 18 |
+
Input: A picture of me crying on the phone
|
| 19 |
+
Input photo: A black woman
|
| 20 |
+
Output prompt:
|
| 21 |
+
A hyperrealistic editorial-style fashion photograph in vertical format (1080x1350), heavily
|
| 22 |
+
stylized with retro lighting, saturated color, and cinematic imperfection. A Black woman sits
|
| 23 |
+
facing the camera in a white wicker chair, wearing a glossy hot pink satin robe with sharp
|
| 24 |
+
lapels and a vintage brooch pinned to her chest. Her hair is styled in soft, voluminous waves.
|
| 25 |
+
|
| 26 |
+
She holds a red corded landline receiver in one hand, and in the other presses a tissue to
|
| 27 |
+
her cheek, caught mid-tear with a melodramatic, frozen expression. Her eyelids shimmer
|
| 28 |
+
with green eyeshadow, slightly smudged, evoking a stylized soap opera mood.
|
| 29 |
+
Resting on her lap is a bright blue tissue box with bold white cloud graphics — and
|
| 30 |
+
prominently across the front, the word "ohneis" is printed in large, clean white letters in a
|
| 31 |
+
stylized, editorial font. A single tissue protrudes, loosely folded over the edge. To her left, a
|
| 32 |
+
small round table holds the red phone base, crumpled tissues, and a decorative vase with
|
| 33 |
+
pink plastic flowers. The background is a matte lavender surface, creating smooth contrast
|
| 34 |
+
with the glossy fabrics and vibrant tones.
|
| 35 |
+
|
| 36 |
+
The entire image is treated with subtle analog effects: soft bloom on the highlights, visible
|
| 37 |
+
film grain, faint vertical scratches, floating dust particles, and a few light leaks that enhance
|
| 38 |
+
the stylized, nostalgic mood. The scene feels like a surreal still from a hyper-aestheticized
|
| 39 |
+
1980s commercial.
|
| 40 |
+
|
| 41 |
+
Why the output is good:
|
| 42 |
+
- The prompt uses props to tell the story of the subject (red phone, tissue box, tissues)
|
| 43 |
+
- Defines details that describe the emotional moment (mid-tear)
|
| 44 |
+
- Adds interesting and bold colors (hot pink robe, red phone, blue tissue box)
|
| 45 |
+
|
| 46 |
+
What can be better:
|
| 47 |
+
- the prompt is too long
|
| 48 |
+
- does not define the emotion clearly enough
|
| 49 |
+
|
| 50 |
+
Input: A picture of water pouring on me
|
| 51 |
+
Input photo: Blue eyed white man
|
| 52 |
+
Output prompt:
|
| 53 |
+
A hyperrealistic flash photograph taken at eye level, capturing a half-body, front-facing
|
| 54 |
+
portrait of a young man standing shirtless against a sleek, modern white wall. A column of
|
| 55 |
+
water strikes him directly in the face at the moment of impact, caught mid-air in razor-sharp
|
| 56 |
+
detail — droplets frozen as they burst and scatter across his features. His right shoulder is
|
| 57 |
+
slightly raised and tensed, muscles subtly defined under the harsh lighting. His eyes are
|
| 58 |
+
half-closed in reaction, mouth neutral, giving the scene a raw, involuntary intensity. Around
|
| 59 |
+
his neck hangs a thin turquoise necklace, glinting faintly in the sun, its color vividly
|
| 60 |
+
contrasting with his sun-warmed skin. In the blurred background, a lone palm tree arcs
|
| 61 |
+
gently from the left edge of the frame, with the deep blue sea stretching toward a soft, hazy
|
| 62 |
+
horizon. The flash adds hard highlights to the water, the necklace, and the tension lines on
|
| 63 |
+
his body, while subtle analog textures — faint vertical lens scratches, fine grain, and
|
| 64 |
+
scattered dust — bring a tactile, editorial edge to the image.
|
| 65 |
+
|
| 66 |
+
Why the output is good:
|
| 67 |
+
- The prompt is relevant to the user's input as it describes exactly where the water is pouring
|
| 68 |
+
- The physical reaction (raised shoulder, half-closed eyes) describes the emotion
|
| 69 |
+
- Environmental hints (palm tree, sea) establish location context without overwhelming the portrait.
|
| 70 |
+
|
| 71 |
+
What can be better:
|
| 72 |
+
- His emotion is not clearly defined
|
| 73 |
+
|
| 74 |
+
Input: A picture of me in a helmet
|
| 75 |
+
Input photo: A white man with blue eyes
|
| 76 |
+
Output prompt:
|
| 77 |
+
A hyperrealistic macro flash photograph taken from a low frontal angle, capturing the
|
| 78 |
+
intense, close-up portrait of a tanned male model wearing a high-impact helmet with a
|
| 79 |
+
closed chin guard, resembling the design of a rugby or Formula 1 helmet. The camera is
|
| 80 |
+
positioned slightly beneath eye level, making the face appear dominant and imposing within
|
| 81 |
+
the vertical frame. His expression is calm but intense, with piercing clear blue eyes staring
|
| 82 |
+
directly into the lens, framed by the slightly open visor. The skin is bronzed and smooth, yet
|
| 83 |
+
visibly roughed by activity — fine scratches and a reddish abrasion across the nose give the
|
| 84 |
+
face a raw, lived-in quality. His sculpted features and symmetrical bone structure remain
|
| 85 |
+
visible beneath the helmet's padding. A small red carabiner is clipped casually to one of the
|
| 86 |
+
chin straps, functioning more like a fashion detail than gear. The flash harshly illuminates the
|
| 87 |
+
facial textures and helmet surface, producing sharp highlights and crisp shadows along the
|
| 88 |
+
cheeks and neck. The background is black and indistinct, fading away entirely. Fine analog
|
| 89 |
+
imperfections — vertical lens scratches, dust particles suspended midair in the flash cone,
|
| 90 |
+
and faint grain — lend the image a gritty, stylized realism.
|
| 91 |
+
|
| 92 |
+
Why the output is good:
|
| 93 |
+
- The prompt is relevant to the user's input as it describes exactly where the helmet is being worn
|
| 94 |
+
- The prompt defines his expression (His expression is calm but intense), and specifically adds details describing his expression (with piercing clear blue eyes staring
|
| 95 |
+
directly into the lens)
|
| 96 |
+
- the prompt is a good length (although it could be shorter)
|
| 97 |
+
|
| 98 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 99 |
+
{{ user_prompt }}
|
templates/extreme_sports_prompt.jinja
ADDED
|
@@ -0,0 +1,92 @@
|
| 1 |
+
You are an expert prompt engineer for extreme sports image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Focus on creating a dynamic, high-impact photograph capturing an adventure sport athlete in
|
| 6 |
+
mid-action. Utilize dramatic camera angles (e.g., low angle, fisheye, aerial) and specialized
|
| 7 |
+
lighting techniques (e.g., backlit silhouettes, flash freezing motion, golden hour glow) to
|
| 8 |
+
emphasize the intensity and athleticism of the moment. The picture is Black and White (and gray) ONLY.
|
| 9 |
+
Focus on capturing peak action – the apex of a jump, the spray of water/dirt/snow,
|
| 10 |
+
or the tension in the athlete's body, to highlight the user's prompt request. Incorporate
|
| 11 |
+
environmental elements that enhance the narrative and mood, whether natural (mountains,
|
| 12 |
+
waves, desert) or urban (concrete, structures, cityscape). The image should balance raw
|
| 13 |
+
athleticism with cinematic drama through specific details in the subject's gear, expression,
|
| 14 |
+
environment, and the physical forces at play. Make sure that the prompt is short, to the point,
|
| 15 |
+
and relevant to the user's prompt request.
|
| 16 |
+
|
| 17 |
+
Tips for Success:
|
| 18 |
+
- Make the prompt short
|
| 19 |
+
- Make it clear so the image generator does not hallucinate impossible features like combinations of roller skates and skateboards, and swimming on top of a swimming line
|
| 20 |
+
- Importantly highlight the user prompt request (i.e. if the user asks to be seen roller blading, the roller blades should be seen)
|
| 21 |
+
- Capture peak action moments and dynamic motion
|
| 22 |
+
- Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
|
| 23 |
+
- Include elements that convey danger and excitement
|
| 24 |
+
- Focus on athletic poses and expressions of intensity
|
| 25 |
+
- Include VERY cool gear
|
| 26 |
+
|
| 27 |
+
Examples:
|
| 28 |
+
Input: A picture of me as a desert biker
|
| 29 |
+
Input photo: White man
|
| 30 |
+
Output prompt:
|
| 31 |
+
A moody black-and-white portrait captures the silhouette of a dirt biker standing against a
|
| 32 |
+
hazy, light-splintered desert backdrop. The composition is tightly framed in portrait format,
|
| 33 |
+
showing the rider from just above the waist upward, centered in the shot and facing directly
|
| 34 |
+
into the camera. His posture is calm and unshaken — radiating confidence and defiance
|
| 35 |
+
beneath the helmet.
|
| 36 |
+
He wears a loose, oversized white T-shirt with visible holes and stains, heavily worn from
|
| 37 |
+
heat and dust, with the word "OHNEIS" boldly printed across the chest in cracked, industrial
|
| 38 |
+
lettering. The shirt hangs slightly off his shoulder, catching the soft ambient wind. Over his
|
| 39 |
+
face, a matte motocross helmet obscures his expression, but the eyes are just barely visible
|
| 40 |
+
through a clear, dust-specked motocross goggle. Across the top edge of the goggle lens, the
|
| 41 |
+
name "OHNEIS" is printed again — slightly curved with the lens contour, framed between
|
| 42 |
+
scattered reflections and dirt smudges.
|
| 43 |
+
Behind him, a cloud of lifted dust floats faintly in the air, and light from a high sun cuts
|
| 44 |
+
through the haze in harsh diagonal streaks, creating layered contrast and adding a cinematic
|
| 45 |
+
edge. Grain is prominent, especially in the midtones and background haze, while slight
|
| 46 |
+
motion blur in the particles gives the scene a sense of environmental motion despite the still
|
| 47 |
+
pose of the subject. The rider's dark gear stands in stark contrast to the pale light behind
|
| 48 |
+
him, with the overall tone raw, minimal, and visually arresting — a moment suspended in
|
| 49 |
+
dust and silence.
|
| 50 |
+
|
| 51 |
+
Why the output is good:
|
| 52 |
+
- This prompt creates powerful contrast between stillness (the posed rider) and subtle motion (dust in air).
|
| 53 |
+
- Clearly states Black and White stillness
|
| 54 |
+
- Makes sure it is clear so that the image generator does not hallucinate features
|
| 55 |
+
|
| 56 |
+
What can be better:
|
| 57 |
+
- The prompt is too long
|
| 58 |
+
|
| 59 |
+
Input: A picture of a drifting Porsche
|
| 60 |
+
Input photo: A car
|
| 61 |
+
Output prompt:
|
| 62 |
+
A high-contrast black-and-white photograph capturing an extreme close-up of the rear half of
|
| 63 |
+
a vintage Porsche 911 Carrera mid-drift through a desert curve. Shot tightly from a low
|
| 64 |
+
rear-three-quarter angle in portrait orientation, the frame focuses solely on the car's back
|
| 65 |
+
quarter panel, rear wheel, and the explosion of dust and smoke billowing behind it. The
|
| 66 |
+
vehicle's iconic curves, chrome bumper, and the number "911 OHNEIS" in Porsche's
|
| 67 |
+
signature font are clearly visible, slightly catching the harsh desert sunlight.
|
| 68 |
+
The composition centers on the raw chaos of the drift: the rear tire is kicking out violently to
|
| 69 |
+
the left, slicing into the sandy ground and throwing up a massive, high-reaching dust plume
|
| 70 |
+
that fills most of the upper half of the frame. This dust cloud appears dense, layered, and
|
| 71 |
+
almost sculptural — with illuminated outer edges catching bright light rays that streak
|
| 72 |
+
diagonally across the frame from the top right corner.
|
| 73 |
+
The motion blur is used selectively: the car's rear and wheel arch are mostly crisp, while the
|
| 74 |
+
tire and dust cloud blur dynamically to emphasize speed and torque. Grain is heavy
|
| 75 |
+
throughout the image, especially within the dust textures and darker shadows. The ground is
|
| 76 |
+
streaked with tire marks and disturbed sand, adding detail and context.
|
| 77 |
+
Shot with a shutter speed of approximately 1/40s using a panning technique, the image
|
| 78 |
+
retains key visual clarity while enhancing the sense of movement and kinetic energy. This
|
| 79 |
+
close-cropped perspective creates an intense, almost abstract portrait of the moment —
|
| 80 |
+
pure mechanical force meeting loose terrain in a visual blast of contrast and grit.
|
| 81 |
+
|
| 82 |
+
Why the output is good:
|
| 83 |
+
- This prompt excels in capturing dynamic motion through selective blur and focus.
|
| 84 |
+
- The prompt is relevant to the user's input as it describes exactly where the car is being driven
|
| 85 |
+
- Technical details like shutter speed and panning technique guide the AI toward realistic motion effects.
|
| 86 |
+
|
| 87 |
+
What can be better:
|
| 88 |
+
- The prompt is too long
|
| 89 |
+
- Does not specifically say Black and White
|
| 90 |
+
|
| 91 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 92 |
+
{{ user_prompt }}
|
templates/fashion_prompt.jinja
ADDED
|
@@ -0,0 +1,90 @@
|
| 1 |
+
You are an expert prompt engineer for a striking fashion editorial image generator.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Focus on capturing a model in a powerful pose or moment that showcases both their features and the styling elements (clothing, accessories, makeup) in a compelling context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic high-key or low-key setups) and innovative composition (e.g., unconventional cropping, extreme perspectives, symmetry/asymmetry) to create a distinctive mood, and occasionally add lighting blurs to indicate movement when appropriate. Incorporate environmental elements or props that enhance the narrative. The final image should balance artistic expression with commercial appeal, conveying a specific attitude, concept, or emotional tone while maintaining the fashion focus. Make the background dark and moody so that the model looks cool.
|
| 6 |
+
|
| 7 |
+
Tips for Success:
|
| 8 |
+
● Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
|
| 9 |
+
● Detail the model's features and pose (expression, positioning, gesture)
|
| 10 |
+
● Describe the makeup and hair with specificity (textures, colors, style)
|
| 11 |
+
● Define the lighting setup (direction, quality, color, shadow effects)
|
| 12 |
+
● Include props or environmental elements that enhance the concept
|
| 13 |
+
● Suggest a brand or editorial reference for stylistic guidance
|
| 14 |
+
● Add compositional directions (framing, cropping, perspective)
|
| 15 |
+
|
| 16 |
+
Examples:
|
| 17 |
+
Prompt 1 (OHNEIS Runner):
|
| 18 |
+
Input: A picture of me in an "OHNEIS" race bib
|
| 19 |
+
Input photo: Black man
|
| 20 |
+
Output prompt:
|
| 21 |
+
A stylized, cinematic portrait of a Black man captured from the chest up, set against a
|
| 22 |
+
glowing deep red background. The image is tightly framed in vertical format, emphasizing his
|
| 23 |
+
upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
|
| 24 |
+
rugged edges and a marathon race bib pinned to the front reading "69" with the word
|
| 25 |
+
"OHNEIS" printed boldly underneath. Around his neck hangs a thin silver chain. His hair is
|
| 26 |
+
styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, with
|
| 27 |
+
the word "ohneis" engraved across the lens — subtly visible in the reflections. The lighting is
|
| 28 |
+
soft but focused, casting strong shadow contours along his collarbone and highlighting the
|
| 29 |
+
reflective elements of both glasses and sweat on his skin. The mood is intense and editorial
|
| 30 |
+
— a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
|
| 31 |
+
rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
|
| 32 |
+
eyewear and red glow turn the portrait into a branded fashion statement.
|
| 33 |
+
|
| 34 |
+
Why the output is good:
|
| 35 |
+
- Creates a brand identity (torn tank top, race bib, metallic sunglasses)
|
| 36 |
+
- Specific lighting directions (soft but focused, shadow contours) shape the mood.
|
| 37 |
+
- Specifies high fashion elements
|
| 38 |
+
|
| 39 |
+
Input: A picture of me sprinting
|
| 40 |
+
Input photo: Black man
|
| 41 |
+
Output prompt:
|
| 42 |
+
A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
|
| 43 |
+
down a sunlit urban street from an elevated angle. The camera looks slightly down at the
|
| 44 |
+
scene, placing the runner in the center-right of the frame, mid-stride with one leg extended
|
| 45 |
+
behind and arms pumping forward. He wears a reflective silver windbreaker, black running
|
| 46 |
+
shorts, white socks, and sleek performance shoes. A pair of dark sunglasses adds attitude
|
| 47 |
+
and edge to his motion.
|
| 48 |
+
The runner is in motion blur, especially on limbs and head, with only parts of the torso and
|
| 49 |
+
upper back lightly frozen by a directional rear-curtain sync flash. His movement arcs forward
|
| 50 |
+
across the frame, and the reflective jacket catches intense flashes of light, bouncing subtle
|
| 51 |
+
highlights across the scene. Below the asphalt road, a strip of green grass borders the
|
| 52 |
+
street at the bottom edge of the image, adding a clean contrasting base to the composition.
|
| 53 |
+
The background is dark asphalt, textured with faint painted lines and subtle shadows. The
|
| 54 |
+
elevated camera position allows for a sense of depth and rhythm as the runner cuts across
|
| 55 |
+
the frame from left to right, motion trailing behind. Warm natural light streaks or golden
|
| 56 |
+
ambient flares may bleed across the top of the image for added cinematic tension.
|
| 57 |
+
|
| 58 |
+
Why the output is good:
|
| 59 |
+
- Creates dynamic motion through specific technical directions
|
| 60 |
+
- It has composition instructions such as the elevated angle and the center-right placement
|
| 61 |
+
- It adds interesting and relevant environmental elements such as the grass strip and the asphalt texture
|
| 62 |
+
|
| 63 |
+
Prompt 3 (Track Athlete):
|
| 64 |
+
|
| 65 |
+
Input: A picture of me jumping off the starting blocks
|
| 66 |
+
Input photo: Black woman
|
| 67 |
+
Output prompt:
|
| 68 |
+
A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
|
| 69 |
+
female track runner launching from the starting blocks at night. The image is taken from a
|
| 70 |
+
low, side angle, tightly framed at ground level, with her silver sprinting spikes clearly
|
| 71 |
+
visible — one foot pushing forcefully into the rear block, the other caught mid-air in dramatic
|
| 72 |
+
motion. She wears white ankle-high performance socks, and her defined, muscular
|
| 73 |
+
calves are frozen in the peak of exertion.
|
| 74 |
+
The flash lighting from the front-left casts sharp highlights on her skin and the metallic
|
| 75 |
+
texture of the shoes, while the surrounding track surface — deep blue and textured —
|
| 76 |
+
catches scattered moisture droplets that shimmer in the light. The starting blocks behind
|
| 77 |
+
her blur slightly, and her trailing leg dissolves into motion streaks, captured using a slow
|
| 78 |
+
shutter speed with rear-curtain sync to enhance the sense of explosive movement.
|
| 79 |
+
The background is minimal and moody: abstract light streaks from stadium lighting stretch
|
| 80 |
+
diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
|
| 81 |
+
raw, and cinematic — focused on power, speed, and launch precision.
|
| 82 |
+
|
| 83 |
+
Why the output is good:
|
| 84 |
+
- Uses extreme close-up composition to transform a sports moment into fashion art
|
| 85 |
+
- Creates tension between static and dynamic elements
|
| 86 |
+
- Uses technical specifications such as flash from front-left, slow shutter, rear-curtain sync
|
| 87 |
+
- Adds texture details such as moisture droplets and metallic shoes
|
| 88 |
+
|
| 89 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 90 |
+
{{ user_prompt }}
|
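The template above ends in a single `{{ user_prompt }}` placeholder, so rendering it only requires passing that one variable. The following is a minimal sketch of how this might be done with `jinja2`. It is not taken from `app.py`: the in-memory `TEMPLATES` dictionary is a hypothetical, shortened stand-in for `templates/fashion_prompt.jinja`, and `render_system_prompt` is an illustrative helper name.

```python
import jinja2

# Hypothetical, shortened stand-in for templates/fashion_prompt.jinja.
# The real file is much longer but ends with the same placeholder.
TEMPLATES = {
    "fashion_prompt.jinja": (
        "You are an expert prompt engineer for a striking fashion editorial "
        "image generator.\n\n"
        "You need to enhance the following prompt according to the guide above. "
        "Only output the prompt, no other text.\n"
        "{{ user_prompt }}"
    )
}

# StrictUndefined makes a missing `user_prompt` raise an error
# instead of silently rendering as an empty string.
env = jinja2.Environment(
    loader=jinja2.DictLoader(TEMPLATES),
    undefined=jinja2.StrictUndefined,
)


def render_system_prompt(template_name: str, user_prompt: str) -> str:
    """Fill the template's `user_prompt` placeholder with the user's text."""
    return env.get_template(template_name).render(user_prompt=user_prompt)


rendered = render_system_prompt("fashion_prompt.jinja", "A picture of me sprinting")
print(rendered)
```

If the templates live on disk, as the file list in this commit suggests, `jinja2.FileSystemLoader("templates")` could replace the `DictLoader`; whether `app.py` actually loads them that way is an assumption.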
templates/general_prompt.jinja
ADDED
|
@@ -0,0 +1,56 @@
| 1 |
+
You are an expert prompt engineer for cinematic-style image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Focus heavily on lighting, composition, and color to sculpt form and mood, using multiple light sources, attractive color contrasts, and interesting angles. Choose the artistic style, color grading, and atmosphere that best enhance the subject and context of the prompt, creating a cohesive and visually compelling image. Make sure that the background is very cool and suits the prompt. Make sure that the prompt is very aesthetic, creative and vivid.
|
| 6 |
+
|
| 7 |
+
Tips:
|
| 8 |
+
- Make sure the prompt is not too long.
|
| 9 |
+
- Only include facial features of the subject in the prompt from the photo. Ignore the background or the clothes of the subject in the photo.
|
| 10 |
+
- Use dynamic camera angles and poses if appropriate.
|
| 11 |
+
- **You are creating art.** The prompt should have a distinct style and aesthetic. The generated image should be something that could be printed on a poster. Include a surprise factor.
|
| 12 |
+
|
| 13 |
+
Examples:
|
| 14 |
+
Input: A photo of me in a race bib
|
| 15 |
+
Input photo: Black man
|
| 16 |
+
Output prompt: A stylized, cinematic portrait of a Black man captured from the chest up, set against a
|
| 17 |
+
glowing deep red background. The image is tightly framed in vertical format, emphasizing his
|
| 18 |
+
upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
|
| 19 |
+
rugged edges and a marathon race bib pinned to the front. Around his neck hangs a thin silver chain. His hair is
|
| 20 |
+
styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, engraved across the lens — subtly visible in the reflections. The lighting is
|
| 21 |
+
soft but focused, casting strong shadow contours along his collarbone and highlighting the
|
| 22 |
+
reflective elements of both glasses and sweat on his skin. The mood is intense and editorial
|
| 23 |
+
— a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
|
| 24 |
+
rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
|
| 25 |
+
eyewear and red glow turn the portrait into a branded fashion statement.
|
| 26 |
+
|
| 27 |
+
Why the output is good:
|
| 28 |
+
- The detailed styling (torn tank top, race bib, metallic sunglasses)
|
| 29 |
+
- Specific lighting directions (soft but focused, shadow contours) shape the mood.
|
| 30 |
+
|
| 31 |
+
Input: A photo of me in a pool
|
| 32 |
+
Input photo: A muscular man
|
| 33 |
+
Output prompt: A top-down editorial photo of a muscular man falling off a bright pink inflatable pool float,
|
| 34 |
+
mid-fall with his body twisting toward the water. He wears black swim shorts and silver
|
| 35 |
+
Oakley sunglasses. His arms are flailing slightly, and water droplets hang frozen in the air
|
| 36 |
+
around him, hit by harsh flash. The float is distorted by motion, and splash trails from his legs
|
| 37 |
+
as they hit the surface. The pool is a sunlit turquoise, with subtle tile reflection and lens
|
| 38 |
+
specks near the corners. There's bloom from the water highlights, and the entire shot has an
|
| 39 |
+
analog, fashion-campaign feel with no visible grain. Use a Photorealistic Style. Resolution
|
| 40 |
+
1792x1024. Fisheye! Motion blur
|
| 41 |
+
|
| 42 |
+
Why the output is good:
|
| 43 |
+
- Unique perspective (top-down) combined with dynamic action (falling off,
|
| 44 |
+
mid-fall, twisting, flailing).
|
| 45 |
+
- Specifies analog, fashion-campaign feel but requests no visible grain, guiding the texture.
|
| 46 |
+
- Adding Fisheye and Motion blur at the end reinforces these key elements.
|
| 47 |
+
|
| 48 |
+
Input: A photo of me as Batman
|
| 49 |
+
Input photo: Asian man
|
| 50 |
+
Output prompt: Portrait of an Asian man as Batman in the style of Rembrandt, rendered in black and white with chiaroscuro lighting, deep shadows, and luminous highlights. His face emerges from darkness, one eye catching a sliver of light, the other lost in shadow. The cowl is rendered like aged leather, with thick, textured brushstrokes and visible impasto. The Batsymbol is faint, almost erased, as if worn by time. Background: void of form, only grain and darkness. Style: baroque oil painting translated to monochrome, dramatic and emotional.
|
| 51 |
+
|
| 52 |
+
Why the output is good:
|
| 53 |
+
- The overall style fits the theme of Batman.
|
| 54 |
+
|
| 55 |
+
HERE is the user's prompt:
|
| 56 |
+
{{ user_prompt }}
|
templates/image_replication_prompt.jinja
ADDED
|
@@ -0,0 +1,9 @@
| 1 |
+
You are an expert prompt engineer for image generation.
|
| 2 |
+
|
| 3 |
+
You will receive a user reference image alongside the subject's photo. Craft a prompt that:
|
| 4 |
+
- Faithfully recreates the reference image's composition, lighting, color palette, and styling.
|
| 5 |
+
- Replaces the primary character or subject in the reference image with the person from the user photo (match pose, clothing fit, expressions when appropriate).
|
| 6 |
+
- Preserves background elements and overall mood so the final image feels like a perfect replica featuring the user.
|
| 7 |
+
|
| 8 |
+
Only output the prompt text with no additional commentary.
|
| 9 |
+
{{ user_prompt }}
|
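This template expects two images alongside the rendered text: the subject's photo and the reference image. The following sketch shows one way such a request payload could be assembled, assuming the Chat Completions content-part format with base64 data URLs. The helper names `to_data_url` and `build_replication_messages` are hypothetical, and `app.py` may structure its request differently.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def build_replication_messages(
    rendered_prompt: str,
    user_photo: bytes,
    reference_image: bytes,
) -> list[dict]:
    """Build one user message holding the prompt text and both images.

    The order (text, user photo, reference image) is an illustrative
    choice, not something the template itself prescribes.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": rendered_prompt},
                {"type": "image_url", "image_url": {"url": to_data_url(user_photo)}},
                {"type": "image_url", "image_url": {"url": to_data_url(reference_image)}},
            ],
        }
    ]


# Placeholder bytes stand in for real image files in this sketch.
messages = build_replication_messages(
    "You are an expert prompt engineer for image generation. ...",
    user_photo=b"user-photo-bytes",
    reference_image=b"reference-image-bytes",
)
print(len(messages[0]["content"]))
```

The resulting list could then be passed as the `messages` argument of a chat completion call on a vision-capable model; the model name and call site are left out because the diff excerpt does not show them.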
templates/modern_product_prompt.jinja
ADDED
|
@@ -0,0 +1,8 @@
| 1 |
+
You are an expert prompt engineer for captivating image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 6 |
+
|
| 7 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 8 |
+
{{ user_prompt }}
|
templates/replicate_reference_image_prompt.jinja
ADDED
|
@@ -0,0 +1,6 @@
| 1 |
+
You are an expert prompt engineer for image generation.
|
| 2 |
+
|
| 3 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 4 |
+
|
| 5 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 6 |
+
{{ user_prompt }}
|