ShinnosukeU committed · Commit 731ab6a · verified · 1 Parent(s): 6a9502c

Upload folder using huggingface_hub

app.py CHANGED
@@ -1,401 +1,113 @@
1
  import base64
 
2
  from io import BytesIO
3
- from textwrap import dedent
 
4
 
5
  import gradio as gr
6
  import jinja2
7
  from openai import OpenAI
 
8
 
9
  client = OpenAI()
10
 
 
 
11
 
12
- GENERAL_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for cinematic-style image generation.
13
-
14
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
15
-
16
- Focus heavily on lighting, composition, and color to sculpt form and mood, using multiple light sources, attractive color contrasts, and interesting angles. Choose the artistic style, color grading, and atmosphere that best enhance the subject and context of the prompt, creating a cohesive and visually compelling image. Make sure that the background is very cool and suits the prompt. Make sure that the prompt is very aesthetic, creative and vivid.
17
-
18
- Tips:
19
- - Make sure the prompt is not too long.
20
- - Only include facial features of the subject in the prompt from the photo. Ignore the background or the clothes of the subject in the photo.
21
- - Use dynamic camera angles and poses if appropriate.
22
- - **You are creating art** There should be a distinct style and aesthetic to the prompt. The generated image should be something that could be printed on a poster. Have a surprise factor.
23
-
24
- Examples:
25
- Input: A photo of me in a race bib
26
- Input photo: Black man
27
- Output prompt: A stylized, cinematic portrait of a Black man captured from the chest up, set against a
28
- glowing deep red background. The image is tightly framed in vertical format, emphasizing his
29
- upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
30
- rugged edges and a marathon race bib pinned to the front. Around his neck hangs a thin silver chain. His hair is
31
- styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, engraved across the lens — subtly visible in the reflections. The lighting is
32
- soft but focused, casting strong shadow contours along his collarbone and highlighting the
33
- reflective elements of both glasses and sweat on his skin. The mood is intense and editorial,
34
- a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
35
- rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
36
- eyewear and red glow turn the portrait into a branded fashion statement.
37
-
38
- Why the output is good:
39
- - The detailed styling (torn tank top, race bib, metallic sunglasses)
40
- - Specific lighting directions (soft but focused, shadow contours) shape the mood.
41
-
42
- Input: A photo of me in a pool
43
- Input photo: A muscular man
44
- Output prompt: A top-down editorial photo of a muscular man falling off a bright pink inflatable pool float,
45
- mid-fall with his body twisting toward the water. He wears black swim shorts and silver
46
- Oakley sunglasses. His arms are flailing slightly, and water droplets hang frozen in the air
47
- around him, hit by harsh flash. The float is distorted by motion, and splash trails from his legs
48
- as they hit the surface. The pool is a sunlit turquoise, with subtle tile reflection and lens
49
- specks near the corners. There's bloom from the water highlights, and the entire shot has an
50
- analog, fashion-campaign feel with no visible grain. Use a Photorealistic Style. Resolution
51
- 1792x1024. Fisheye! Motion blur
52
-
53
- Why the output is good:
54
- - Unique perspective (top-down) combined with dynamic action (falling off,
55
- mid-fall, twisting, flailing).
56
- - Specifies analog, fashion-campaign feel but requests no visible grain, guiding the texture.
57
- - Adding Fisheye and Motion blur at the end reinforces these key elements.
58
-
59
- Input: A photo of me as Batman
60
- Input photo: Asian man
61
- Portrait of asian man as Batman in the style of Rembrandt black and white, chiaroscuro lighting, deep shadows, and luminous highlights. His face emerges from darkness, one eye catching a sliver of light, the other lost in shadow. The cowl is rendered like aged leather, with thick, textured brushstrokes and visible impasto. The Batsymbol is faint, almost erased, as if worn by time. Background: void of form, only grain and darkness. Style: baroque oil painting translated to monochrome — dramatic, emotional
62
-
63
- Why the output is good:
64
- - The overall style fits the theme of the Batman.
65
-
66
- HERE is the user's prompt:
67
- {{ user_prompt }}
68
- """)
69
-
70
-
71
- FASHION_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for a striking fashion editorial image generator.
72
-
73
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
74
-
75
- Focus on capturing a model in a powerful pose or moment that showcases both their features and the styling elements (clothing, accessories, makeup) in a compelling context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic high-key or low-key setups) and innovative composition (e.g., unconventional cropping, extreme perspectives, symmetry/asymmetry) to create a distinctive mood, and occasionally add lighting blurs to indicate movement when appropriate. Incorporate environmental elements or props that enhance the narrative. The final image should balance artistic expression with commercial appeal, conveying a specific attitude, concept, or emotional tone while maintaining the fashion focus. Make the background dark and moody so that the model looks cool.
76
-
77
- Tips for Success:
78
- ● Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
79
- ● Detail the model's features and pose (expression, positioning, gesture)
80
- ● Describe the makeup and hair with specificity (textures, colors, style)
81
- ● Define the lighting setup (direction, quality, color, shadow effects)
82
- ● Include props or environmental elements that enhance the concept
83
- ● Suggest a brand or editorial reference for stylistic guidance
84
- ● Add compositional directions (framing, cropping, perspective)
85
-
86
- Examples:
87
- Prompt 1 (OHNEIS Runner):
88
- Input: A picture of me in a "OHNEIS" race bib
89
- Input photo: Black man
90
- Output prompt:
91
- A stylized, cinematic portrait of a Black man captured from the chest up, set against a
92
- glowing deep red background. The image is tightly framed in vertical format, emphasizing his
93
- upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
94
- rugged edges and a marathon race bib pinned to the front reading "69" with the word
95
- "OHNEIS" printed boldly underneath. Around his neck hangs a thin silver chain. His hair is
96
- styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, with
97
- the word "ohneis" engraved across the lens — subtly visible in the reflections. The lighting is
98
- soft but focused, casting strong shadow contours along his collarbone and highlighting the
99
- reflective elements of both glasses and sweat on his skin. The mood is intense and editorial,
100
- a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
101
- rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
102
- eyewear and red glow turn the portrait into a branded fashion statement.
103
-
104
- Why the output is good:
105
- - Creates a brand identity (torn tank top, race bib, metallic sunglasses)
106
- - Specific lighting directions (soft but focused, shadow contours) shape the mood.
107
- - Specifies high fashion elements
108
-
109
- Input: A picture of me sprinting
110
- Input photo: Black man
111
- Output prompt:
112
- A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
113
- down a sunlit urban street from an elevated angle. The camera looks slightly down at the
114
- scene, placing the runner in the center-right of the frame, mid-stride with one leg extended
115
- behind and arms pumping forward. He wears a reflective silver windbreaker, black running
116
- shorts, white socks, and sleek performance shoes. A pair of dark sunglasses adds attitude
117
- and edge to his motion.
118
- The runner is in motion blur, especially on limbs and head, with only parts of the torso and
119
- upper back lightly frozen by a directional rear-curtain sync flash. His movement arcs forward
120
- across the frame, and the reflective jacket catches intense flashes of light, bouncing subtle
121
- highlights across the scene. Below the asphalt road, a strip of green grass borders the
122
- street at the bottom edge of the image, adding a clean contrasting base to the composition.
123
- The background is dark asphalt, textured with faint painted lines and subtle shadows. The
124
- elevated camera position allows for a sense of depth and rhythm as the runner cuts across
125
- the frame from left to right, motion trailing behind. Warm natural light streaks or golden
126
- ambient flares may bleed across the top of the image for added cinematic tension.
127
-
128
- Why the output is good:
129
- - Creates dynamic motion through specific technical directions
130
- - It has composition instructions such as the elevated angle and the center-right placement
131
- - It adds interesting and relevant environmental elements such as the grass strip and the asphalt texture
132
-
133
- Prompt 3 (Track Athlete):
134
-
135
- Input: A picture of me jumping off the starting blocks
136
- Input photo: Black woman
137
- Output prompt:
138
- A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
139
- female track runner launching from the starting blocks at night. The image is taken from a
140
- low, side angle, tightly framed at ground level, with her silver sprinting spikes clearly
141
- visible — one foot pushing forcefully into the rear block, the other caught mid-air in dramatic
142
- motion. She wears white ankle-high performance socks, and her defined, muscular
143
- calves are frozen in the peak of exertion.
144
- The flash lighting from the front-left casts sharp highlights on her skin and the metallic
145
- texture of the shoes, while the surrounding track surface — deep blue and textured —
146
- catches scattered moisture droplets that shimmer in the light. The starting blocks behind
147
- her blur slightly, and her trailing leg dissolves into motion streaks, captured using a slow
148
- shutter speed with rear-curtain sync to enhance the sense of explosive movement.
149
- The background is minimal and moody: abstract light streaks from stadium lighting stretch
150
- diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
151
- raw, and cinematic — focused on power, speed, and launch precision.
152
-
153
- Why the output is good:
154
- - Uses extreme close-up composition to transform a sports moment into fashion art
155
- - Creates tension between static and dynamic elements
156
- - Uses technical specifications such as flash from front-left, slow shutter, rear-curtain sync
157
- - Adds texture details such as moisture droplets and metallic shoes
158
-
159
- You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
160
- {{ user_prompt }}
161
- """)
162
-
163
-
164
- EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for emotional lifestyle image generation.
165
-
166
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
167
-
168
- Focus on creating a vivid lifestyle portrait that captures an authentic emotional moment or state within a visually compelling environment. Be attentive to portraying the subject in a way that reveals character, mood, or narrative through their expression, posture, and interaction with their surroundings. Use very cool contrasting colors to elevate the subject and utilize naturalistic lighting approaches (e.g., window light, ambient environmental lighting, soft golden hour) or stylized lighting that enhances the emotional tone. Incorporate environmental details that contribute to storytelling and provide context. The final image should feel intimate yet visually striking—balancing raw emotional authenticity with aesthetic sophistication through thoughtful composition, color treatment, and atmospheric elements.
169
-
170
- Tips for Success:
171
- - Make the prompt short
172
- - Importantly, highlight the user's request (e.g., if the user asks to be seen rollerblading, the rollerblades should be visible)
173
- - Define the emotional state or moment clearly (vulnerability, joy, contemplation)
174
- - Specify the lighting and how it enhances the mood (soft window light, dramatic shadows)
175
- - Include meaningful props or elements that tell the subject's story
176
- - Describe subtle details in expression or posture that convey emotion
177
- - Consider color treatment that reinforces the emotional tone
178
- - Add atmospheric elements that enhance the mood (water droplets, steam, fabric texture)
179
-
180
- Examples:
181
- Input: A picture of me crying on the phone
182
- Input photo: A black woman
183
- Output prompt:
184
- A hyperrealistic editorial-style fashion photograph in vertical format (1080x1350), heavily
185
- stylized with retro lighting, saturated color, and cinematic imperfection. A Black woman sits
186
- facing the camera in a white wicker chair, wearing a glossy hot pink satin robe with sharp
187
- lapels and a vintage brooch pinned to her chest. Her hair is styled in soft, voluminous waves.
188
-
189
- She holds a red corded landline receiver in one hand, and in the other presses a tissue to
190
- her cheek, caught mid-tear with a melodramatic, frozen expression. Her eyelids shimmer
191
- with green eyeshadow, slightly smudged, evoking a stylized soap opera mood.
192
- Resting on her lap is a bright blue tissue box with bold white cloud graphics — and
193
- prominently across the front, the word "ohneis" is printed in large, clean white letters in a
194
- stylized, editorial font. A single tissue protrudes, loosely folded over the edge. To her left, a
195
- small round table holds the red phone base, crumpled tissues, and a decorative vase with
196
- pink plastic flowers. The background is a matte lavender surface, creating smooth contrast
197
- with the glossy fabrics and vibrant tones.
198
-
199
- The entire image is treated with subtle analog effects: soft bloom on the highlights, visible
200
- film grain, faint vertical scratches, floating dust particles, and a few light leaks that enhance
201
- the stylized, nostalgic mood. The scene feels like a surreal still from a hyper-aestheticized
202
- 1980s commercial.
203
-
204
- Why the output is good:
205
- - The prompt uses props to tell the story of the subject (red phone, tissue box, tissues)
206
- - Defines details that describe the emotional moment (mid-tear)
207
- - Adds interesting and bold colors (purple robe, hot pink robe, red phone, blue tissue box)
208
-
209
- What can be better:
210
- - the prompt is too long
211
- - does not define the emotion clearly enough
212
-
213
- Input: A picture of water pouring on me
214
- Input photo: Blue eyed white man
215
- Output prompt:
216
- A hyperrealistic flash photograph taken at eye level, capturing a half-body, front-facing
217
- portrait of a young man standing shirtless against a sleek, modern white wall. A column of
218
- water strikes him directly in the face at the moment of impact, caught mid-air in razor-sharp
219
- detail — droplets frozen as they burst and scatter across his features. His right shoulder is
220
- slightly raised and tensed, muscles subtly defined under the harsh lighting. His eyes are
221
- half-closed in reaction, mouth neutral, giving the scene a raw, involuntary intensity. Around
222
- his neck hangs a thin turquoise necklace, glinting faintly in the sun, its color vividly
223
- contrasting with his sun-warmed skin. In the blurred background, a lone palm tree arcs
224
- gently from the left edge of the frame, with the deep blue sea stretching toward a soft, hazy
225
- horizon. The flash adds hard highlights to the water, the necklace, and the tension lines on
226
- his body, while subtle analog textures — faint vertical lens scratches, fine grain, and
227
- scattered dust — bring a tactile, editorial edge to the image.
228
-
229
- Why the output is good:
230
- - The prompt is relevant to the user's input as it describes exactly where the water is pouring
231
- - The physical reaction (raised shoulder, half-closed eyes) describes the emotion
232
- - Environmental hints (palm tree, sea) establish location context without overwhelming the portrait.
233
-
234
- What can be better:
235
- - His emotion is not clearly defined
236
-
237
- Input: A picture of me in a helmet
238
- Input photo: A white man with blue eyes
239
- Output prompt:
240
- A hyperrealistic macro flash photograph taken from a low frontal angle, capturing the
241
- intense, close-up portrait of a tanned male model wearing a high-impact helmet with a
242
- closed chin guard, resembling the design of a rugby or Formula 1 helmet. The camera is
243
- positioned slightly beneath eye level, making the face appear dominant and imposing within
244
- the vertical frame. His expression is calm but intense, with piercing clear blue eyes staring
245
- directly into the lens, framed by the slightly open visor. The skin is bronzed and smooth, yet
246
- visibly roughed by activity — fine scratches and a reddish abrasion across the nose give the
247
- face a raw, lived-in quality. His sculpted features and symmetrical bone structure remain
248
- visible beneath the helmet's padding. A small red carabiner is clipped casually to one of the
249
- chin straps, functioning more like a fashion detail than gear. The flash harshly illuminates the
250
- facial textures and helmet surface, producing sharp highlights and crisp shadows along the
251
- cheeks and neck. The background is black and indistinct, fading away entirely. Fine analog
252
- imperfections — vertical lens scratches, dust particles suspended midair in the flash cone,
253
- and faint grain — lend the image a gritty, stylized realism.
254
-
255
- Why the output is good:
256
- - The prompt is relevant to the user's input as it describes exactly where the helmet is being worn
257
- - The prompt defines his expression (His expression is calm but intense), and specifically adds details describing his expression (with piercing clear blue eyes staring
258
- directly into the lens)
259
- - the prompt is a good length (although it could be shorter)
260
-
261
- You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
262
- {{ user_prompt }}
263
- """)
264
-
265
-
266
- EXTREME_SPORTS_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for extreme sports image generation.
267
-
268
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
269
-
270
- Focus on creating a dynamic, high-impact photograph capturing an adventure sport athlete in
271
- mid-action. Utilize dramatic camera angles (e.g., low angle, fisheye, aerial) and specialized
272
- lighting techniques (e.g., backlit silhouettes, flash freezing motion, golden hour glow) to
273
- emphasize the intensity and athleticism of the moment. The picture is Black and White (and gray) ONLY.
274
- Focus on capturing peak action – the apex of a jump, the spray of water/dirt/snow,
275
- or the tension in the athlete's body, to highlight the user's prompt request. Incorporate
276
- environmental elements that enhance the narrative and mood, whether natural (mountains,
277
- waves, desert) or urban (concrete, structures, cityscape). The image should balance raw
278
- athleticism with cinematic drama through specific details in the subject's gear, expression,
279
- environment, and the physical forces at play. Make sure that the prompt is short, to the point,
280
- and relevant to the user's prompt request.
281
-
282
- Tips for Success:
283
- ● Capture peak action moments and dynamic motion
284
- ● Include specialized sports equipment and gear
285
- ● Detail the extreme environment and conditions
286
- ● Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
287
- ● Include elements that convey danger and excitement
288
- ● Focus on athletic poses and expressions of intensity
289
- ● Add compositional directions that emphasize scale and drama
290
-
291
- Tips for Success:
292
- - Make the prompt short
293
- - Importantly, highlight the user's request (e.g., if the user asks to be seen rollerblading, the rollerblades should be visible)
294
- - Capture peak action moments and dynamic motion
295
- - Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
296
- - Include elements that convey danger and excitement
297
- - Focus on athletic poses and expressions of intensity
298
- - Include VERY cool gear
299
-
300
- Examples:
301
- Input: A picture of me as a desert biker
302
- Input photo: White man
303
- Output prompt:
304
- A moody black-and-white portrait captures the silhouette of a dirt biker standing against a
305
- hazy, light-splintered desert backdrop. The composition is tightly framed in portrait format,
306
- showing the rider from just above the waist upward, centered in the shot and facing directly
307
- into the camera. His posture is calm and unshaken — radiating confidence and defiance
308
- beneath the helmet.
309
- He wears a loose, oversized white T-shirt with visible holes and stains, heavily worn from
310
- heat and dust, with the word "OHNEIS" boldly printed across the chest in cracked, industrial
311
- lettering. The shirt hangs slightly off his shoulder, catching the soft ambient wind. Over his
312
- face, a matte motocross helmet obscures his expression, but the eyes are just barely visible
313
- through a clear, dust-specked motocross goggle. Across the top edge of the goggle lens, the
314
- name "OHNEIS" is printed again — slightly curved with the lens contour, framed between
315
- scattered reflections and dirt smudges.
316
- Behind him, a cloud of lifted dust floats faintly in the air, and light from a high sun cuts
317
- through the haze in harsh diagonal streaks, creating layered contrast and adding a cinematic
318
- edge. Grain is prominent, especially in the midtones and background haze, while slight
319
- motion blur in the particles gives the scene a sense of environmental motion despite the still
320
- pose of the subject. The rider's dark gear stands in stark contrast to the pale light behind
321
- him, with the overall tone raw, minimal, and visually arresting — a moment suspended in
322
- dust and silence.
323
-
324
- Why the output is good:
325
- - This prompt creates powerful contrast between stillness (the posed rider) and subtle motion (dust in air).
326
- - Clearly states Black and White stillness
327
-
328
- What can be better:
329
- - The prompt is too long
330
-
331
- Input: A picture of a drifting Porsche
332
- Input photo: A car
333
- Output prompt:
334
- A high-contrast black-and-white photograph capturing an extreme close-up of the rear half of
335
- a vintage Porsche 911 Carrera mid-drift through a desert curve. Shot tightly from a low
336
- rear-three-quarter angle in portrait orientation, the frame focuses solely on the car's back
337
- quarter panel, rear wheel, and the explosion of dust and smoke billowing behind it. The
338
- vehicle's iconic curves, chrome bumper, and the number "911 OHNEIS" in Porsche's
339
- signature font are clearly visible, slightly catching the harsh desert sunlight.
340
- The composition centers on the raw chaos of the drift: the rear tire is kicking out violently to
341
- the left, slicing into the sandy ground and throwing up a massive, high-reaching dust plume
342
- that fills most of the upper half of the frame. This dust cloud appears dense, layered, and
343
- almost sculptural — with illuminated outer edges catching bright light rays that streak
344
- diagonally across the frame from the top right corner.
345
- The motion blur is used selectively: the car's rear and wheel arch are mostly crisp, while the
346
- tire and dust cloud blur dynamically to emphasize speed and torque. Grain is heavy
347
- throughout the image, especially within the dust textures and darker shadows. The ground is
348
- streaked with tire marks and disturbed sand, adding detail and context.
349
- Shot with a shutter speed of approximately 1/40s using a panning technique, the image
350
- retains key visual clarity while enhancing the sense of movement and kinetic energy. This
351
- close-cropped perspective creates an intense, almost abstract portrait of the moment —
352
- pure mechanical force meeting loose terrain in a visual blast of contrast and grit.
353
-
354
- Why the output is good:
355
- - This prompt excels in capturing dynamic motion through selective blur and focus.
356
- - The prompt is relevant to the user's input as it describes exactly where the car is being driven
357
- - Technical details like shutter speed and panning technique guide the AI toward realistic motion effects.
358
-
359
- What can be better:
360
- - The prompt is too long
361
- - Does not specifically say Black and White
362
-
363
- You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
364
- {{ user_prompt }}
365
- """)
366
- '''
367
- CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
368
-
369
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
370
-
371
- Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
372
-
373
- You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
374
- {{ user_prompt }}
375
- """)
376
-
377
- MODERN_PRODUCT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
378
-
379
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
380
-
381
- Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
382
-
383
- You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
384
- {{ user_prompt }}
385
- """)
386
-
387
- CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
388
-
389
- Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
390
-
391
- Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
392
-
393
- You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
394
- {{ user_prompt }}
395
- """)
396
- '''
397
-
398
- def process_prompt(image, image2, target_label, user_prompt, style):
399
  image_url = None
400
  image_url2 = None
401
 
@@ -411,73 +123,162 @@ def process_prompt(image, image2, target_label, user_prompt, style):
411
  b64_image2 = base64.b64encode(buffer.getvalue()).decode("utf-8")
412
  image_url2 = f"data:image/jpeg;base64,{b64_image2}"
413
 
414
- if style == "General":
415
- system_content = "You are an expert prompt engineer"
416
- user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
 
417
 
418
- elif style == "Fashion":
419
- system_content = "You are an expert prompt engineer"
420
- user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
421
 
422
- elif style == "Emotional Lifestyle":
423
- system_content = "You are an expert prompt engineer"
424
- user_content = EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
425
 
426
- elif style == "Extreme Sports":
427
- system_content = "You are an expert prompt engineer"
428
- user_content = EXTREME_SPORTS_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
 
429
 
430
  response = client.responses.create(
431
  model="gpt-5",
432
  reasoning={"effort": "low"},
433
  input=[
434
  {
435
- "role": "system",
436
- "content": system_content
437
  },
438
  {
439
- "role": "user",
440
- "content": [
441
- {"type": "input_text", "text": user_content},
442
- {"type": "input_image", "image_url": image_url},
443
- {"type": "input_image", "image_url": image_url2}
444
- ]
445
  }
446
  ],
447
  )
448
  return f"{response.output_text} {target_label.strip()}"
449
 
450
- demo = gr.Interface(
451
- fn=process_prompt,
452
- inputs=[
453
-
454
- gr.Image(
455
- label="Upload reference image",
456
- type="pil"
457
- ),
458
- gr.Image(
459
- label="Upload 2nd reference image",
460
- type="pil"
461
- ),
462
- gr.Textbox(
463
- label="Enter target label",
464
- placeholder="SMRA",
465
- ),
466
- gr.Textbox(
467
- label="Enter your prompt",
468
- placeholder="picture of me while sitting in a chair in the ocean",
469
- ),
470
- gr.Dropdown(
471
- choices=["General", "Fashion", "Emotional Lifestyle", "Extreme Sports"],
472
- #choices=["Chromatic Cinematic", "Neon Noir", "General"],
473
- label="Style Selection",
474
- info="Choose the visual style for your enhanced prompt"
475
- ),
476
- ],
477
- outputs=gr.Textbox(
478
- label="Style Prompt",
479
- lines=20,
480
- ),
481
- )
482
 
483
- demo.launch()
 
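Note on the refactor: the removed side above picks a template through an if/elif chain on the style string, while the added side that follows introduces a `StyleName` literal, a frozen `StyleDefinition` dataclass, and a `STYLE_DEFINITIONS` registry. The sketch below shows one way such a registry could replace the chain. It is an illustration only, not the committed implementation: the helper names `dropdown_choices` and `resolve_style` are hypothetical, only two of the eight styles are listed, and the `info` strings are abbreviated.

```python
from dataclasses import dataclass
from typing import Literal, cast, get_args

# Reduced to two styles for brevity; the diff declares eight.
StyleName = Literal["General", "Fashion"]


@dataclass(frozen=True)
class StyleDefinition:
    name: StyleName
    template_filename: str
    info: str


STYLE_DEFINITIONS: dict[StyleName, StyleDefinition] = {
    "General": StyleDefinition(
        name="General",
        template_filename="general_prompt.jinja",
        info="Versatile, balanced storytelling.",  # abbreviated for the sketch
    ),
    "Fashion": StyleDefinition(
        name="Fashion",
        template_filename="fashion_prompt.jinja",
        info="Editorial fashion aesthetic.",  # abbreviated for the sketch
    ),
}


def dropdown_choices() -> list[str]:
    """Hypothetical helper: derive UI choices from the Literal, in declaration order."""
    return list(get_args(StyleName))


def resolve_style(raw: str) -> StyleDefinition:
    """Hypothetical helper: validate a raw dropdown value and return its definition."""
    if raw not in STYLE_DEFINITIONS:
        raise ValueError(f"Unknown style: {raw!r}")
    # cast() only informs the type checker; the membership test above is the real check.
    return STYLE_DEFINITIONS[cast(StyleName, raw)]


if __name__ == "__main__":
    print(dropdown_choices())
    print(resolve_style("Fashion").template_filename)
```

With a lookup like this, adding a style means adding one registry entry and one template file rather than another elif branch, and an unknown style fails loudly instead of leaving `user_content` unassigned.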
1
  import base64
2
+ from dataclasses import dataclass
3
  from io import BytesIO
4
+ from pathlib import Path
5
+ from typing import Literal, cast
6
 
7
  import gradio as gr
8
  import jinja2
9
  from openai import OpenAI
10
+ from pydantic import BaseModel
11
 
12
  client = OpenAI()
13
 
14
+ TEMPLATES_DIR = Path(__file__).resolve().parent / "templates"
15
+ jinja_env = jinja2.Environment(loader=jinja2.FileSystemLoader(str(TEMPLATES_DIR)))
16
 
17
+ SYSTEM_PROMPT = "You are an expert prompt engineer"
18
+
19
+ StyleName = Literal[
20
+ "General",
21
+ "Fashion",
22
+ "Emotional Lifestyle",
23
+ "Extreme Sports",
24
+ "Modern Product",
25
+ "Captivating",
26
+ "Cool Lifestyle",
27
+ "Image Replication",
28
+ ]
29
+
30
+
31
+ @dataclass(frozen=True)
32
+ class StyleDefinition:
33
+ name: StyleName
34
+ template_filename: str
35
+ info: str
36
+
37
+
38
+ STYLE_DEFINITIONS: dict[StyleName, StyleDefinition] = {
39
+ "General": StyleDefinition(
40
+ name="General",
41
+ template_filename="general_prompt.jinja",
42
+ info="Versatile, balanced storytelling with cinematic detail for most scenarios.",
43
+ ),
44
+ "Fashion": StyleDefinition(
45
+ name="Fashion",
46
+ template_filename="fashion_prompt.jinja",
47
+ info="Editorial fashion aesthetic highlighting garments, styling, and runway polish.",
48
+ ),
49
+ "Emotional Lifestyle": StyleDefinition(
50
+ name="Emotional Lifestyle",
51
+ template_filename="emotional_lifestyle_prompt.jinja",
52
+ info="Warm, candid lifestyle imagery that focuses on mood, relationships, and feelings.",
53
+ ),
54
+ "Extreme Sports": StyleDefinition(
55
+ name="Extreme Sports",
56
+ template_filename="extreme_sports_prompt.jinja",
57
+ info="High-adrenaline action shots that emphasize energy, motion, and athletic feats.",
58
+ ),
59
+ "Modern Product": StyleDefinition(
60
+ name="Modern Product",
61
+ template_filename="modern_product_prompt.jinja",
62
+ info="Crisp product visuals with contemporary lighting and minimalistic staging.",
63
+ ),
64
+ "Captivating": StyleDefinition(
65
+ name="Captivating",
66
+ template_filename="captivating_prompt.jinja",
67
+ info="Visually striking compositions with dramatic flair and memorable storytelling.",
68
+ ),
69
+ "Cool Lifestyle": StyleDefinition(
70
+ name="Cool Lifestyle",
71
+ template_filename="cool_lifestyle_prompt.jinja",
72
+ info="Casual yet stylish lifestyle scenes with an effortlessly cool atmosphere.",
73
+ ),
74
+ "Image Replication": StyleDefinition(
75
+ name="Image Replication",
76
+ template_filename="image_replication_prompt.jinja",
77
+ info=(
78
+ "Mimic the reference image's composition, lighting, and styling exactly while"
79
+ " inserting the user or their face in place of the original subject."
80
+ ),
81
+ ),
82
+ }
83
+
84
+ PROMPT_TEMPLATES = {
85
+ style: jinja_env.get_template(config.template_filename)
86
+ for style, config in STYLE_DEFINITIONS.items()
87
+ }
88
+
89
+ DEFAULT_STYLE: StyleName = "General"
90
+ STYLE_CHOICES: tuple[StyleName, ...] = tuple(STYLE_DEFINITIONS.keys())
91
+
92
+ STYLE_INFORMATION_BLOCK = "\n".join(
93
+ f"- {style}: {config.info}" for style, config in STYLE_DEFINITIONS.items()
94
+ )
95
+
96
+
97
+ class StyleSelectionResponse(BaseModel):
98
+ style: StyleName
99
+ rationale: str
100
+
101
+
102
+ AUTO_STYLE_SYSTEM_PROMPT = (
103
+ "You are an art director who must pick the most fitting style name for a user's prompt. "
104
+ "Consider the available styles and choose the single best option.\n\n"
105
+ f"Style Guide:\n{STYLE_INFORMATION_BLOCK}\n\n"
106
+ "Return JSON that matches the schema exactly."
107
+ )
108
+
109
+
110
+ def process_prompt(image, image2, target_label: str, user_prompt: str, style: StyleName) -> str:
111
  image_url = None
112
  image_url2 = None
113
 
 
123
  b64_image2 = base64.b64encode(buffer.getvalue()).decode("utf-8")
124
  image_url2 = f"data:image/jpeg;base64,{b64_image2}"
125
 
126
+ try:
127
+ template = PROMPT_TEMPLATES[style]
128
+ except KeyError as error:
129
+ raise ValueError(f"Unsupported style: {style}") from error
130
 
131
+ user_content = template.render(user_prompt=user_prompt)
 
 
132
 
133
+ content = [{"type": "input_text", "text": user_content}]
 
 
134
 
135
+ if image_url is not None:
136
+ content.append({"type": "input_image", "image_url": image_url})
137
+ if image_url2 is not None:
138
+ content.append({"type": "input_image", "image_url": image_url2})
139
 
140
  response = client.responses.create(
141
  model="gpt-5",
142
  reasoning={"effort": "low"},
143
  input=[
144
  {
145
+ "role": "system",
146
+ "content": SYSTEM_PROMPT,
147
  },
148
  {
149
+ "role": "user",
150
+ "content": content,
 
 
 
 
151
  }
152
  ],
153
  )
154
  return f"{response.output_text} {target_label.strip()}"
155
 
156
 
157
+ def recommend_style(user_prompt: str) -> StyleSelectionResponse:
158
+ completion = client.chat.completions.parse(
159
+ model="gpt-5",
160
+ messages=[
161
+ {"role": "system", "content": AUTO_STYLE_SYSTEM_PROMPT},
162
+ {"role": "user", "content": user_prompt},
163
+ ],
164
+ response_format=StyleSelectionResponse,
165
+ )
166
+ return completion.choices[0].message.parsed
167
+
168
+
169
+ def auto_select_style(user_prompt: str):
170
+ if not user_prompt or not user_prompt.strip():
171
+ raise gr.Error("Enter your prompt before selecting a style automatically.")
172
+
173
+ selection = recommend_style(user_prompt)
174
+
175
+ return (
176
+ selection.style,
177
+ gr.update(value=selection.style, interactive=False),
178
+ )
179
+
180
+
181
+ def prepare_manual_style(current_style: str | None) -> tuple[StyleName, dict[str, object]]:
182
+ resolved_style = cast(StyleName, current_style) if current_style in STYLE_CHOICES else DEFAULT_STYLE
183
+ return (
184
+ resolved_style,
185
+ gr.update(value=resolved_style, interactive=True),
186
+ )
187
+
188
+
189
+ def prepare_style_selection(
190
+ user_prompt: str,
191
+ current_style: str | None,
192
+ auto_style_enabled: bool,
193
+ ) -> tuple[StyleName, dict[str, object]]:
194
+ if auto_style_enabled:
195
+ selected_style, dropdown_update = auto_select_style(user_prompt)
196
+ return selected_style, dropdown_update
197
+ return prepare_manual_style(current_style)
198
+
199
+
200
+ def handle_auto_style_toggle(auto_enabled: bool) -> dict[str, object]:
201
+ return gr.update(interactive=not auto_enabled)
202
+
203
+
204
+ def generate_prompt_handler(
205
+ image,
206
+ image2,
207
+ target_label: str,
208
+ user_prompt: str,
209
+ current_style: str | None,
210
+ auto_style_enabled: bool,
211
+ ):
212
+ resolved_style, dropdown_update = prepare_style_selection(
213
+ user_prompt=user_prompt,
214
+ current_style=current_style,
215
+ auto_style_enabled=auto_style_enabled,
216
+ )
217
+ prompt_text = process_prompt(
218
+ image=image,
219
+ image2=image2,
220
+ target_label=target_label,
221
+ user_prompt=user_prompt,
222
+ style=resolved_style,
223
+ )
224
+ display_text = f"Selected style: {resolved_style}\n\n{prompt_text}"
225
+ return display_text, dropdown_update
226
+
227
+
228
+ with gr.Blocks() as demo:
229
+ with gr.Row():
230
+ with gr.Column():
231
+ user_image = gr.Image(
232
+ label="Upload user photo",
233
+ type="pil"
234
+ )
235
+ reference_image = gr.Image(
236
+ label="Optional: Upload reference image (e.g. movie poster, music album cover, etc.)",
237
+ type="pil",
238
+ )
239
+ target_label = gr.Textbox(
240
+ label="Enter target label",
241
+ placeholder="SMRA",
242
+ )
243
+ user_prompt = gr.Textbox(
244
+ label="Enter your prompt",
245
+ placeholder="picture of me while sitting in a chair in the ocean",
246
+ lines=4,
247
+ )
248
+ style_dropdown = gr.Dropdown(
249
+ choices=list(STYLE_CHOICES),
250
+ value=DEFAULT_STYLE,
251
+ label="Style Selection",
252
+ info="Choose the visual style for your enhanced prompt",
253
+ interactive=True,
254
+ )
255
+ auto_style_checkbox = gr.Checkbox(
256
+ label="Auto-select best style",
257
+ value=False,
258
+ )
259
+ generate_button = gr.Button("Generate Prompt")
260
+ with gr.Column():
261
+ prompt_output = gr.Textbox(
262
+ label="Style Prompt",
263
+ lines=20,
264
+ )
265
+
266
+ generate_button.click(
267
+ generate_prompt_handler,
268
+ inputs=[
269
+ user_image,
270
+ reference_image,
271
+ target_label,
272
+ user_prompt,
273
+ style_dropdown,
274
+ auto_style_checkbox,
275
+ ],
276
+ outputs=[prompt_output, style_dropdown],
277
+ )
278
+ auto_style_checkbox.change(
279
+ handle_auto_style_toggle,
280
+ inputs=[auto_style_checkbox],
281
+ outputs=[style_dropdown],
282
+ )
283
+
284
+ demo.launch()
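The request assembly and manual style fallback in `app.py` above can be sketched in isolation as follows. This is a minimal, standard-library-only illustration, not the commit's code: `to_data_url`, `build_content`, and `resolve_style` are hypothetical helper names, the three-entry `STYLE_CHOICES` tuple is a shortened stand-in for the full style list, and raw bytes replace the PIL image that the app encodes.

```python
import base64
from typing import Optional

# Hypothetical, shortened stand-ins for the constants defined in app.py.
STYLE_CHOICES = ("General", "Fashion", "Captivating")
DEFAULT_STYLE = "General"


def to_data_url(raw: bytes, mime: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(raw).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


def build_content(text: str, *image_urls: Optional[str]) -> list[dict[str, str]]:
    """Start with the text part and append only the images that were provided."""
    content = [{"type": "input_text", "text": text}]
    for url in image_urls:
        if url is not None:  # skip missing uploads instead of sending null URLs
            content.append({"type": "input_image", "image_url": url})
    return content


def resolve_style(current: Optional[str]) -> str:
    """Fall back to the default when the dropdown value is missing or unknown."""
    return current if current in STYLE_CHOICES else DEFAULT_STYLE


if __name__ == "__main__":
    user_photo = to_data_url(b"abc")
    # The second (reference) image is optional, so None is simply dropped.
    print(build_content("enhanced prompt text", user_photo, None))
    print(resolve_style("Unknown Style"))
```

Building the list incrementally means a missing reference image produces a shorter `content` list rather than an `input_image` entry with a null URL, which is the behavior the `if image_url is not None` checks in the diff appear to be after.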
templates/captivating_prompt.jinja ADDED
@@ -0,0 +1,55 @@
1
+ You are an expert prompt engineer for captivating image generation.
2
+
3
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
4
+
5
+ Generate a photorealistic photograph capturing the subject in an unusual, character-rich, or
6
+ dynamically posed situation. The subject's face should be centered in the frame, and should be up close and personal to the camera.
7
+ Employ specific camera techniques (e.g., extreme close-up, fisheye distortion, shallow depth of field)
8
+ to specifically distort the face in a cool perspective way.
9
+ Specifically focus on weird perspectives and camera lenses.
10
+ Emphasize physical details like eye expression, subtle movements (or stillness), mouth expression, and the subject's
11
+ interaction with its environment or other subjects. The background should be intricately described
12
+ to display the distortion of the camera lens. The final image should feel cinematic, expressive, and potentially humorous or surreal.
13
+
14
+ Tips for Success:
15
+ - Make the prompt short
16
+ - Relevant to the user's prompt request
17
+ - Specific so that the image generator does not hallucinate impossible features
18
+ - Does not add too many difficult details that the image generator cannot handle and thus hallucinates (e.g. Her left hand gestures toward the lens, fingers elongated and warped by the fisheye; the foreground fingertips streak with slight motion blur.)
19
+ - Make the pose dramatic and exaggerated
20
+ - Clearly define the action (aggressively typing, clinging, sitting, sleeping, jumping).
21
+ - Use lens types (fisheye, macro, wide-angle) explicitly for specific perspectives.
22
+ - Describe the background and the props in the background
23
+
24
+ Examples:
25
+
26
+ Input: A picture of me on the phone
27
+ Input photo: A black woman
28
+ Output prompt:
29
+ A hyperrealistic 1:1 portrait of a Black woman captured with an extreme fisheye lens, distorting her face and body
30
+ — her eyes and nose appear oversized, head curves toward the camera, and the chair and surroundings warp dramatically.
31
+ She sits in a white wicker chair with her face in the center of the frame,
32
+ wearing a glossy hot pink satin robe with sharp lapels and a vintage brooch.
33
+ One hand presses a tissue to her cheek mid-tear, the other holds a red corded phone.
34
+ Motion blur trails her hand and head movement, emphasizing action.
35
+ Lighting is cinematic and retro, with saturated colors, deep shadows, soft bloom highlights, and subtle light leaks.
36
+ Matte lavender background contrasts with glossy fabrics and vibrant props.
37
+ Film grain, faint scratches, floating dust, and slight RGB fringing add tactile, analog texture.
38
+ The composition, angle, and lens create a surreal, intense, and hyper-stylized editorial mood.
39
+ Resolution 1024x1024. Use a Photorealistic Style.
40
+
41
+ Why the output is good:
42
+ - The prompt is relevant to the user's prompt request
43
+ - The subject face is centered
44
+ - This prompt uses a dramatic lens and defines it (extreme fisheye lens)
45
+ - This prompt describes the lens with specific details (her eyes and nose appear oversized, head curves toward the camera, and the chair and surroundings warp dramatically)
46
+ - The prompt has compelling coloring and interesting elements (glossy hot pink satin robe, red corded phone, matte lavender background)
47
+
48
+ What can be better:
49
+ - prompt is not short
50
+ - The background can be more intricately described
51
+ - the face is not up close and personal to the camera
52
+
53
+
54
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
55
+ {{ user_prompt }}
templates/cool_lifestyle_prompt.jinja ADDED
@@ -0,0 +1,8 @@
1
+ You are an expert prompt engineer for cool image generation.
2
+
3
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
4
+
5
+ Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
6
+
7
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
8
+ {{ user_prompt }}
templates/emotional_lifestyle_prompt.jinja ADDED
@@ -0,0 +1,99 @@
1
+ You are an expert prompt engineer for emotional lifestyle image generation.
2
+
3
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
4
+
5
+ Focus on creating a vivid lifestyle portrait that captures an authentic emotional moment or state within a visually compelling environment. Be attentive to portraying the subject in a way that reveals character, mood, or narrative through their expression, posture, and interaction with their surroundings. Use very cool contrasting colors to elevate the subject and utilize naturalistic lighting approaches (e.g., window light, ambient environmental lighting, soft golden hour) or stylized lighting that enhances the emotional tone. Incorporate environmental details that contribute to storytelling and provide context. The final image should feel intimate yet visually striking—balancing raw emotional authenticity with aesthetic sophistication through thoughtful composition, color treatment, and atmospheric elements.
6
+
7
+ Tips for Success:
8
+ - Make the prompt short
9
+ - Importantly, highlight the user's prompt request (e.g. if the user asks to be seen rollerblading, the rollerblades should be visible)
10
+ - Define the emotional state or moment clearly (vulnerability, joy, contemplation)
11
+ - Specify the lighting and how it enhances the mood (soft window light, dramatic shadows)
12
+ - Include meaningful props or elements that tell the subject's story
13
+ - Describe subtle details in expression or posture that convey emotion
14
+ - Consider color treatment that reinforces the emotional tone
15
+ - Add atmospheric elements that enhance the mood (water droplets, steam, fabric texture)
16
+
17
+ Examples:
18
+ Input: A picture of me crying on the phone
19
+ Input photo: A black woman
20
+ Output prompt:
21
+ A hyperrealistic editorial-style fashion photograph in vertical format (1080x1350), heavily
22
+ stylized with retro lighting, saturated color, and cinematic imperfection. A Black woman sits
23
+ facing the camera in a white wicker chair, wearing a glossy hot pink satin robe with sharp
24
+ lapels and a vintage brooch pinned to her chest. Her hair is styled in soft, voluminous waves.
25
+
26
+ She holds a red corded landline receiver in one hand, and in the other presses a tissue to
27
+ her cheek, caught mid-tear with a melodramatic, frozen expression. Her eyelids shimmer
28
+ with green eyeshadow, slightly smudged, evoking a stylized soap opera mood.
29
+ Resting on her lap is a bright blue tissue box with bold white cloud graphics — and
30
+ prominently across the front, the word "ohneis" is printed in large, clean white letters in a
31
+ stylized, editorial font. A single tissue protrudes, loosely folded over the edge. To her left, a
32
+ small round table holds the red phone base, crumpled tissues, and a decorative vase with
33
+ pink plastic flowers. The background is a matte lavender surface, creating smooth contrast
34
+ with the glossy fabrics and vibrant tones.
35
+
36
+ The entire image is treated with subtle analog effects: soft bloom on the highlights, visible
37
+ film grain, faint vertical scratches, floating dust particles, and a few light leaks that enhance
38
+ the stylized, nostalgic mood. The scene feels like a surreal still from a hyper-aestheticized
39
+ 1980s commercial.
40
+
41
+ Why the output is good:
42
+ - The prompt uses props to tell the story of the subject (red phone, tissue box, tissues)
43
+ - Defines details that describe the emotional moment (mid-tear)
44
+ - Adds interesting and bold colors (hot pink robe, red phone, blue tissue box, lavender background)
45
+
46
+ What can be better:
47
+ - the prompt is too long
48
+ - does not define the emotion clearly enough
49
+
50
+ Input: A picture of water pouring on me
51
+ Input photo: Blue eyed white man
52
+ Output prompt:
53
+ A hyperrealistic flash photograph taken at eye level, capturing a half-body, front-facing
54
+ portrait of a young man standing shirtless against a sleek, modern white wall. A column of
55
+ water strikes him directly in the face at the moment of impact, caught mid-air in razor-sharp
56
+ detail — droplets frozen as they burst and scatter across his features. His right shoulder is
57
+ slightly raised and tensed, muscles subtly defined under the harsh lighting. His eyes are
58
+ half-closed in reaction, mouth neutral, giving the scene a raw, involuntary intensity. Around
59
+ his neck hangs a thin turquoise necklace, glinting faintly in the sun, its color vividly
60
+ contrasting with his sun-warmed skin. In the blurred background, a lone palm tree arcs
61
+ gently from the left edge of the frame, with the deep blue sea stretching toward a soft, hazy
62
+ horizon. The flash adds hard highlights to the water, the necklace, and the tension lines on
63
+ his body, while subtle analog textures — faint vertical lens scratches, fine grain, and
64
+ scattered dust — bring a tactile, editorial edge to the image.
65
+
66
+ Why the output is good:
67
+ - The prompt is relevant to the user's input as it describes exactly where the water is pouring
68
+ - The physical reaction (raised shoulder, half-closed eyes) describes the emotion
69
+ - Environmental hints (palm tree, sea) establish location context without overwhelming the portrait.
70
+
71
+ What can be better:
72
+ - His emotion is not clearly defined
73
+
74
+ Input: A picture of me in a helmet
75
+ Input photo: A white man with blue eyes
76
+ Output prompt:
77
+ A hyperrealistic macro flash photograph taken from a low frontal angle, capturing the
78
+ intense, close-up portrait of a tanned male model wearing a high-impact helmet with a
79
+ closed chin guard, resembling the design of a rugby or Formula 1 helmet. The camera is
80
+ positioned slightly beneath eye level, making the face appear dominant and imposing within
81
+ the vertical frame. His expression is calm but intense, with piercing clear blue eyes staring
82
+ directly into the lens, framed by the slightly open visor. The skin is bronzed and smooth, yet
83
+ visibly roughed by activity — fine scratches and a reddish abrasion across the nose give the
84
+ face a raw, lived-in quality. His sculpted features and symmetrical bone structure remain
85
+ visible beneath the helmet's padding. A small red carabiner is clipped casually to one of the
86
+ chin straps, functioning more like a fashion detail than gear. The flash harshly illuminates the
87
+ facial textures and helmet surface, producing sharp highlights and crisp shadows along the
88
+ cheeks and neck. The background is black and indistinct, fading away entirely. Fine analog
89
+ imperfections — vertical lens scratches, dust particles suspended midair in the flash cone,
90
+ and faint grain — lend the image a gritty, stylized realism.
91
+
92
+ Why the output is good:
93
+ - The prompt is relevant to the user's input as it describes exactly where the helmet is being worn
94
+ - The prompt defines his expression (His expression is calm but intense), and specifically adds details describing his expression (with piercing clear blue eyes staring
95
+ directly into the lens)
96
+ - the prompt is a good length (although it could be shorter)
97
+
98
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
99
+ {{ user_prompt }}
templates/extreme_sports_prompt.jinja ADDED
@@ -0,0 +1,92 @@
1
+ You are an expert prompt engineer for extreme sports image generation.
2
+
3
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
4
+
5
+ Focus on creating a dynamic, high-impact photograph capturing an adventure sport athlete in
6
+ mid-action. Utilize dramatic camera angles (e.g., low angle, fisheye, aerial) and specialized
7
+ lighting techniques (e.g., backlit silhouettes, flash freezing motion, golden hour glow) to
8
+ emphasize the intensity and athleticism of the moment. The picture is Black and White (and gray) ONLY.
9
+ Focus on capturing peak action – the apex of a jump, the spray of water/dirt/snow,
10
+ or the tension in the athlete's body, to highlight the user's prompt request. Incorporate
11
+ environmental elements that enhance the narrative and mood, whether natural (mountains,
12
+ waves, desert) or urban (concrete, structures, cityscape). The image should balance raw
13
+ athleticism with cinematic drama through specific details in the subject's gear, expression,
14
+ environment, and the physical forces at play. Make sure that the prompt is short, to the point,
15
+ and relevant to the user's prompt request.
16
+
17
+ Tips for Success:
18
+ - Make the prompt short
19
+ - Make it clear so the image generator does not hallucinate impossible features like combinations of roller skates and skateboards, and swimming on top of a swimming line
20
+ - Importantly, highlight the user's prompt request (e.g. if the user asks to be seen rollerblading, the rollerblades should be visible)
21
+ - Capture peak action moments and dynamic motion
22
+ - Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
23
+ - Include elements that convey danger and excitement
24
+ - Focus on athletic poses and expressions of intensity
25
+ - Include VERY cool gear
26
+
27
+ Examples:
28
+ Input: A picture of me as a desert biker
29
+ Input photo: White man
30
+ Output prompt:
31
+ A moody black-and-white portrait captures the silhouette of a dirt biker standing against a
32
+ hazy, light-splintered desert backdrop. The composition is tightly framed in portrait format,
33
+ showing the rider from just above the waist upward, centered in the shot and facing directly
34
+ into the camera. His posture is calm and unshaken — radiating confidence and defiance
35
+ beneath the helmet.
36
+ He wears a loose, oversized white T-shirt with visible holes and stains, heavily worn from
37
+ heat and dust, with the word "OHNEIS" boldly printed across the chest in cracked, industrial
38
+ lettering. The shirt hangs slightly off his shoulder, catching the soft ambient wind. Over his
39
+ face, a matte motocross helmet obscures his expression, but the eyes are just barely visible
40
+ through a clear, dust-specked motocross goggle. Across the top edge of the goggle lens, the
41
+ name "OHNEIS" is printed again — slightly curved with the lens contour, framed between
42
+ scattered reflections and dirt smudges.
43
+ Behind him, a cloud of lifted dust floats faintly in the air, and light from a high sun cuts
44
+ through the haze in harsh diagonal streaks, creating layered contrast and adding a cinematic
45
+ edge. Grain is prominent, especially in the midtones and background haze, while slight
46
+ motion blur in the particles gives the scene a sense of environmental motion despite the still
47
+ pose of the subject. The rider's dark gear stands in stark contrast to the pale light behind
48
+ him, with the overall tone raw, minimal, and visually arresting — a moment suspended in
49
+ dust and silence.
50
+
51
+ Why the output is good:
52
+ - This prompt creates powerful contrast between stillness (the posed rider) and subtle motion (dust in air).
53
+ - Clearly states Black and White stillness
54
+ - Makes sure it is clear so that the image generator does not hallucinate features
55
+
56
+ What can be better:
57
+ - The prompt is too long
58
+
59
+ Input: A picture of a drifting Porsche
60
+ Input photo: A car
61
+ Output prompt:
62
+ A high-contrast black-and-white photograph capturing an extreme close-up of the rear half of
63
+ a vintage Porsche 911 Carrera mid-drift through a desert curve. Shot tightly from a low
64
+ rear-three-quarter angle in portrait orientation, the frame focuses solely on the car's back
65
+ quarter panel, rear wheel, and the explosion of dust and smoke billowing behind it. The
66
+ vehicle's iconic curves, chrome bumper, and the number "911 OHNEIS" in Porsche's
67
+ signature font are clearly visible, slightly catching the harsh desert sunlight.
68
+ The composition centers on the raw chaos of the drift: the rear tire is kicking out violently to
69
+ the left, slicing into the sandy ground and throwing up a massive, high-reaching dust plume
70
+ that fills most of the upper half of the frame. This dust cloud appears dense, layered, and
71
+ almost sculptural — with illuminated outer edges catching bright light rays that streak
72
+ diagonally across the frame from the top right corner.
73
+ The motion blur is used selectively: the car's rear and wheel arch are mostly crisp, while the
74
+ tire and dust cloud blur dynamically to emphasize speed and torque. Grain is heavy
75
+ throughout the image, especially within the dust textures and darker shadows. The ground is
76
+ streaked with tire marks and disturbed sand, adding detail and context.
77
+ Shot with a shutter speed of approximately 1/40s using a panning technique, the image
78
+ retains key visual clarity while enhancing the sense of movement and kinetic energy. This
79
+ close-cropped perspective creates an intense, almost abstract portrait of the moment —
80
+ pure mechanical force meeting loose terrain in a visual blast of contrast and grit.
81
+
82
+ Why the output is good:
83
+ - This prompt excels in capturing dynamic motion through selective blur and focus.
84
+ - The prompt is relevant to the user's input as it describes exactly where the car is being driven
85
+ - Technical details like shutter speed and panning technique guide the AI toward realistic motion effects.
86
+
87
+ What can be better:
88
+ - The prompt is too long
89
+ - Does not specifically say Black and White
90
+
91
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
92
+ {{ user_prompt }}
templates/fashion_prompt.jinja ADDED
@@ -0,0 +1,90 @@
1
+ You are an expert prompt engineer for a striking fashion editorial image generator.
2
+
3
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
4
+
5
+ Focus on capturing a model in a powerful pose or moment that showcases both their features and the styling elements (clothing, accessories, makeup) in a compelling context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic high-key or low-key setups) and innovative composition (e.g., unconventional cropping, extreme perspectives, symmetry/asymmetry) to create a distinctive mood, and occasionally add lighting blurs to indicate movement when appropriate. Incorporate environmental elements or props that enhance the narrative. The final image should balance artistic expression with commercial appeal, conveying a specific attitude, concept, or emotional tone while maintaining the fashion focus. Make the background dark and moody so that the model looks cool.
6
+
7
+ Tips for Success:
8
+ ● Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
9
+ ● Detail the model's features and pose (expression, positioning, gesture)
10
+ ● Describe the makeup and hair with specificity (textures, colors, style)
11
+ ● Define the lighting setup (direction, quality, color, shadow effects)
12
+ ● Include props or environmental elements that enhance the concept
13
+ ● Suggest a brand or editorial reference for stylistic guidance
14
+ ● Add compositional directions (framing, cropping, perspective)
15
+
16
+ Examples:
17
+ Prompt 1 (OHNEIS Runner):
18
+ Input: A picture of me in a "OHNEIS" race bib
19
+ Input photo: Black man
20
+ Output prompt:
21
+ A stylized, cinematic portrait of a Black man captured from the chest up, set against a
22
+ glowing deep red background. The image is tightly framed in vertical format, emphasizing his
23
+ upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
24
+ rugged edges and a marathon race bib pinned to the front reading "69" with the word
25
+ "OHNEIS" printed boldly underneath. Around his neck hangs a thin silver chain. His hair is
26
+ styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, with
27
+ the word "ohneis" engraved across the lens — subtly visible in the reflections. The lighting is
28
+ soft but focused, casting strong shadow contours along his collarbone and highlighting the
29
+ reflective elements of both glasses and sweat on his skin. The mood is intense and editorial
30
+ — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
31
+ rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
32
+ eyewear and red glow turn the portrait into a branded fashion statement.
33
+
34
+ Why the output is good:
35
+ - Creates a brand identity (torn tank top, race bib, metallic sunglasses)
36
+ - Specific lighting directions (soft but focused, shadow contours) shape the mood.
37
+ - Specifies high fashion elements
38
+
39
+ Input: A picture of me sprinting
+ Input photo: Black man
+ Output prompt:
+ A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
+ down a sunlit urban street from an elevated angle. The camera looks slightly down at the
+ scene, placing the runner in the center-right of the frame, mid-stride with one leg extended
+ behind and arms pumping forward. He wears a reflective silver windbreaker, black running
+ shorts, white socks, and sleek performance shoes. A pair of dark sunglasses adds attitude
+ and edge to his motion.
+ The runner is in motion blur, especially on limbs and head, with only parts of the torso and
+ upper back lightly frozen by a directional rear-curtain sync flash. His movement arcs forward
+ across the frame, and the reflective jacket catches intense flashes of light, bouncing subtle
+ highlights across the scene. Below the asphalt road, a strip of green grass borders the
+ street at the bottom edge of the image, adding a clean contrasting base to the composition.
+ The background is dark asphalt, textured with faint painted lines and subtle shadows. The
+ elevated camera position allows for a sense of depth and rhythm as the runner cuts across
+ the frame from left to right, motion trailing behind. Warm natural light streaks or golden
+ ambient flares may bleed across the top of the image for added cinematic tension.
+
+ Why the output is good:
+ - Creates dynamic motion through specific technical directions
+ - It has composition instructions such as the elevated angle and the center-right placement
+ - It adds interesting and relevant environmental elements such as the grass strip and the asphalt texture
+
+ Prompt 3 (Track Athlete):
+
+ Input: A picture of me jumping off the starting blocks
+ Input photo: Black woman
+ Output prompt:
+ A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
+ female track runner launching from the starting blocks at night. The image is taken from a
+ low, side angle, tightly framed at ground level, with her silver sprinting spikes clearly
+ visible — one foot pushing forcefully into the rear block, the other caught mid-air in dramatic
+ motion. She wears white ankle-high performance socks, and her defined, muscular
+ calves are frozen in the peak of exertion.
+ The flash lighting from the front-left casts sharp highlights on her skin and the metallic
+ texture of the shoes, while the surrounding track surface — deep blue and textured —
+ catches scattered moisture droplets that shimmer in the light. The starting blocks behind
+ her blur slightly, and her trailing leg dissolves into motion streaks, captured using a slow
+ shutter speed with rear-curtain sync to enhance the sense of explosive movement.
+ The background is minimal and moody: abstract light streaks from stadium lighting stretch
+ diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
+ raw, and cinematic — focused on power, speed, and launch precision.
+
+ Why the output is good:
+ - Uses extreme close-up composition to transform a sports moment into fashion art
+ - Creates tension between static and dynamic elements
+ - Uses technical specifications such as flash from front-left, slow shutter, rear-curtain sync
+ - Adds texture details such as moisture droplets and metallic shoes
+
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
+ {{ user_prompt }}
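Each template in this commit ends with a `{{ user_prompt }}` placeholder, so rendering amounts to substituting the user's text into that slot. The following is a minimal sketch of that step, assuming the `jinja2` package is installed. The shortened template string and the `render_prompt` helper are hypothetical and are not taken from `app.py`. `StrictUndefined` is chosen here so that a forgotten variable raises an error instead of silently rendering as empty text.

```python
import jinja2

# Shortened stand-in for a template file; the real templates are much longer.
TEMPLATE_SOURCE = (
    "You need to enhance the following prompt according to the guide above. "
    "Only output the prompt, no other text.\n"
    "{{ user_prompt }}"
)

# StrictUndefined turns a missing variable into an error rather than "".
env = jinja2.Environment(undefined=jinja2.StrictUndefined)


def render_prompt(user_prompt: str) -> str:
    """Fill the {{ user_prompt }} slot of the (hypothetical) template."""
    return env.from_string(TEMPLATE_SOURCE).render(user_prompt=user_prompt)


if __name__ == "__main__":
    print(render_prompt("A picture of me sprinting"))
```

Building the environment once and reusing it keeps the undefined-variable policy in a single place if more templates are added later.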
templates/general_prompt.jinja ADDED
@@ -0,0 +1,56 @@
+ You are an expert prompt engineer for cinematic-style image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ Focus heavily on lighting, composition, and color to sculpt form and mood, using multiple light sources, attractive color contrasts, and interesting angles. Choose the artistic style, color grading, and atmosphere that best enhance the subject and context of the prompt, creating a cohesive and visually compelling image. Make sure that the background is very cool and suits the prompt. Make sure that the prompt is very aesthetic, creative and vivid.
+
+ Tips:
+ - Make sure the prompt is not too long.
+ - Only include facial features of the subject in the prompt from the photo. Ignore the background or the clothes of the subject in the photo.
+ - Use dynamic camera angles and poses if appropriate.
+ - **You are creating art.** There should be a distinct style and aesthetic to the prompt. The generated image should be something that could be printed on a poster. Have a surprise factor.
+
+ Examples:
+ Input: A photo of me in a race bib
+ Input photo: Black man
+ Output prompt: A stylized, cinematic portrait of a Black man captured from the chest up, set against a
+ glowing deep red background. The image is tightly framed in vertical format, emphasizing his
+ upper torso, neck, and face in moody, directional light. He wears a torn black tank top with
+ rugged edges and a marathon race bib pinned to the front. Around his neck hangs a thin silver chain. His hair is
+ styled in tight braids, and he wears futuristic wraparound sunglasses in metallic blue, with an engraving across the lens that is subtly visible in the reflections. The lighting is
+ soft but focused, casting strong shadow contours along his collarbone and highlighting the
+ reflective elements of both glasses and sweat on his skin. The mood is intense and editorial
+ — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
+ rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
+ eyewear and red glow turn the portrait into a branded fashion statement.
+
+ Why the output is good:
+ - The detailed styling (torn tank top, race bib, metallic sunglasses)
+ - Specific lighting directions (soft but focused, shadow contours) shape the mood.
+
+ Input: A photo of me in a pool
+ Input photo: A muscular man
+ Output prompt: A top-down editorial photo of a muscular man falling off a bright pink inflatable pool float,
+ mid-fall with his body twisting toward the water. He wears black swim shorts and silver
+ Oakley sunglasses. His arms are flailing slightly, and water droplets hang frozen in the air
+ around him, hit by harsh flash. The float is distorted by motion, and splash trails from his legs
+ as they hit the surface. The pool is a sunlit turquoise, with subtle tile reflection and lens
+ specks near the corners. There's bloom from the water highlights, and the entire shot has an
+ analog, fashion-campaign feel with no visible grain. Use a Photorealistic Style. Resolution
+ 1792x1024. Fisheye! Motion blur
+
+ Why the output is good:
+ - Unique perspective (top-down) combined with dynamic action (falling off,
+ mid-fall, twisting, flailing).
+ - Specifies analog, fashion-campaign feel but requests no visible grain, guiding the texture.
+ - Adding Fisheye and Motion blur at the end reinforces these key elements.
+
+ Input: A photo of me as Batman
+ Input photo: Asian man
+ Output prompt: Portrait of an Asian man as Batman in the style of Rembrandt, in black and white, with chiaroscuro lighting, deep shadows, and luminous highlights. His face emerges from darkness, one eye catching a sliver of light, the other lost in shadow. The cowl is rendered like aged leather, with thick, textured brushstrokes and visible impasto. The Bat symbol is faint, almost erased, as if worn by time. Background: void of form, only grain and darkness. Style: baroque oil painting translated to monochrome, dramatic and emotional.
+
+ Why the output is good:
+ - The overall style (monochrome chiaroscuro, deep shadows) fits the Batman theme.
+
+ Here is the user's prompt:
+ {{ user_prompt }}
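The template above states that the user's photo is provided alongside the text prompt, and `app.py` imports `base64` and `BytesIO`. One common way to ship image bytes inside a text payload is a base64 data URL. The sketch below shows that encoding only; the `photo_to_data_url` helper and the default MIME type are assumptions for illustration, and how `app.py` actually attaches the photo is not shown in this diff.

```python
import base64


def photo_to_data_url(photo_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    encoded = base64.b64encode(photo_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Placeholder bytes stand in for a real photo file.
    print(photo_to_data_url(b"abc", "image/png"))
```

Keeping the MIME type as a parameter avoids mislabeling PNG uploads as JPEG.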
templates/image_replication_prompt.jinja ADDED
@@ -0,0 +1,9 @@
+ You are an expert prompt engineer for image generation.
+
+ You will receive a user reference image alongside the subject's photo. Craft a prompt that:
+ - Faithfully recreates the reference image's composition, lighting, color palette, and styling.
+ - Replaces the primary character or subject in the reference image with the person from the user photo (match pose, clothing fit, expressions when appropriate).
+ - Preserves background elements and overall mood so the final image feels like a perfect replica featuring the user.
+
+ Only output the prompt text with no additional commentary.
+ {{ user_prompt }}
templates/modern_product_prompt.jinja ADDED
@@ -0,0 +1,8 @@
+ You are an expert prompt engineer for captivating image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
+
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
+ {{ user_prompt }}
templates/replicate_reference_image_prompt.jinja ADDED
@@ -0,0 +1,6 @@
+ You are an expert prompt engineer for image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
+ {{ user_prompt }}
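With the prompts split into separate files under `templates/`, the application needs some way to pick one per request. The sketch below shows one possible lookup, using only the standard library. The file names are the ones added in this commit, but the mode keys and the `template_path` helper are hypothetical, and the selection logic in `app.py` may differ.

```python
from pathlib import Path

TEMPLATE_DIR = Path("templates")

# Keys are hypothetical mode names; values are file names added in this commit.
TEMPLATE_FILES = {
    "general": "general_prompt.jinja",
    "image_replication": "image_replication_prompt.jinja",
    "modern_product": "modern_product_prompt.jinja",
    "replicate_reference_image": "replicate_reference_image_prompt.jinja",
}


def template_path(mode: str) -> Path:
    """Resolve a mode name to its template file, rejecting unknown modes."""
    try:
        return TEMPLATE_DIR / TEMPLATE_FILES[mode]
    except KeyError:
        raise ValueError(f"unknown prompt mode: {mode!r}") from None


if __name__ == "__main__":
    print(template_path("general"))
```

Raising on an unknown mode, rather than falling back to a default template, makes a mistyped mode name visible immediately.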