ShinnosukeU committed · Commit 6a9502c · verified · 1 Parent(s): bce9dbe

Upload folder using huggingface_hub

Files changed (1)
  1. app.py +294 -94
app.py CHANGED
@@ -68,40 +68,25 @@ HERE is the user's prompt:
68
  """)
69
 
70
 
71
- FASHION_PROMPT_TEMPLATE = jinja2.Template("""Generate a striking fashion editorial photograph with strong conceptual impact and visual
72
- distinction. Focus on capturing a model in a powerful pose or moment that showcases both
73
- their features and the styling elements (clothing, accessories, makeup) in a compelling context.
74
- context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic
75
- high-key or low-key setups) and innovative composition (e.g., unconventional cropping,
76
- extreme perspectives, symmetry/asymmetry) to create a distinctive mood. Incorporate
77
- environmental elements or props that enhance the narrative. The final image should balance
78
- artistic expression with commercial appeal, conveying a specific attitude, concept, or
79
- emotional tone while maintaining the fashion focus.
80
- Analysis:
81
- Fashion editorial prompts aim to create images with both artistic and commercial value.
82
- Success often comes from a precise balance of styling details, environmental context, and
83
- technical elements like lighting and composition. Strong concepts and bold visual choices
84
- typically yield the most compelling results. Describing makeup, accessories, and specific
85
- fashion elements in detail helps the AI create cohesive styling.
86
  Tips for Success:
87
- ● Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
88
  ● Detail the model's features and pose (expression, positioning, gesture)
89
  ● Describe the makeup and hair with specificity (textures, colors, style)
90
  ● Define the lighting setup (direction, quality, color, shadow effects)
91
  ● Include props or environmental elements that enhance the concept
92
  ● Suggest a brand or editorial reference for stylistic guidance
93
  ● Add compositional directions (framing, cropping, perspective)
94
- Key Keywords:
95
- fashion editorial, fashion photography, studio photography, location editorial, model pose,
96
- conceptual fashion, striking composition, dramatic lighting, hard light, soft light, colored gels,
97
- editorial makeup, fashion styling, haute couture, ready-to-wear, fashion concept, art
98
- direction, commercial appeal, [brand reference]
99
-
100
 
101
  Examples:
102
-
103
  Prompt 1 (OHNEIS Runner):
104
-
 
105
  Output prompt:
106
  A stylized, cinematic portrait of a Black man captured from the chest up, set against a
107
  glowing deep red background. The image is tightly framed in vertical format, emphasizing his
@@ -115,28 +100,14 @@ reflective elements of both glasses and sweat on his skin. The mood is intense a
115
  — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
116
  rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
117
  eyewear and red glow turn the portrait into a branded fashion statement.
118
- Sora Companion Prompt: A high-impact, vertically framed editorial portrait of a male
119
- runner against a rich red backdrop. The athlete wears a distressed black tank top with a torn
120
- collar and a large pinned-on race number reading "69" above the bold white word "OHNEIS"
121
- in a rectangular black bar. His face is partially shadowed, lit with cinematic precision that
122
- accentuates the contours of his skin and shoulders. He wears reflective blue OHNEIS sports
123
- sunglasses with the brand name clearly legible across the lens. His expression is stoic,
124
- exuding focus and control. The image balances a gritty, competitive energy with futuristic
125
- fashion elements and controlled studio lighting.
126
-
127
- ● Analysis: This prompt masterfully blends fashion and athletic elements. The detailed
128
- styling (torn tank top, race bib, metallic sunglasses) creates brand identity. Specific
129
- lighting directions (soft but focused, shadow contours) shape the mood. The Sora
130
- prompt maintains the exact styling while adding subtle emotional elements (stoic
131
- expression, focus and control) for video continuity.
132
-
133
- ● Variations & Keywords: Try different backgrounds (urban setting, abstract color
134
- field), lighting scenarios (harsh backlight, cool blue tones), model demographics, or
135
- accessories (compression sleeves, techwear). Keywords: athletic fashion, editorial
136
- portrait, directional lighting, race bib, futuristic eyewear, brand styling, reflective
137
- elements, torn fabric, cinematic portrait, color-blocking.
138
 
139
- Prompt 2 (Sprinting Runner):
 
140
  Output prompt:
141
  A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
142
  down a sunlit urban street from an elevated angle. The camera looks slightly down at the
@@ -153,20 +124,16 @@ The background is dark asphalt, textured with faint painted lines and subtle sha
153
  elevated camera position allows for a sense of depth and rhythm as the runner cuts across
154
  the frame from left to right, motion trailing behind. Warm natural light streaks or golden
155
  ambient flares may bleed across the top of the image for added cinematic tension.
156
- ● Analysis: This prompt creates dynamic motion through specific technical directions
157
- (rear-curtain sync flash, motion blur). The clothing details (reflective silver
158
- windbreaker) add visual interest through light interaction. The composition
159
- instructions (elevated angle, center-right placement) guide spatial understanding.
160
- Environmental elements (grass strip, asphalt texture) add realism and grounding.
161
-
162
- ● Variations & Keywords: Change the setting (beach, track, trail), lighting (twilight,
163
- rainy day), clothing (branded apparel, minimal gear), or technique (front-curtain sync,
164
- pan blur). Keywords: sports fashion, running, motion photography, rear-curtain sync,
165
- flash photography, reflective materials, urban environment, elevated angle, side
166
- profile, dynamic pose, directional movement.
167
 
168
  Prompt 3 (Track Athlete):
169
 
 
 
170
  Output prompt:
171
  A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
172
  female track runner launching from the starting blocks at night. The image is taken from a
@@ -183,48 +150,268 @@ The background is minimal and moody: abstract light streaks from stadium lightin
183
  diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
184
  raw, and cinematic β€” focused on power, speed, and launch precision.
185
 
186
- ● Analysis: This prompt uses extreme close-up composition to transform a sports
187
- moment into fashion art. The visual contrast between static elements (starting blocks,
188
- track surface) and dynamic elements (mid-air foot, motion streaks) creates tension.
189
- Technical specifications (flash from front-left, slow shutter, rear-curtain sync) guide
190
- the lighting and motion effects precisely. Texture details (moisture droplets, metallic
191
- shoes) add depth and realism.
192
-
193
- ● Variations & Keywords: Try different sports (swimming dive, basketball jump shot),
194
- perspectives (from front, from above), lighting (daylight, colored gels), or focusing on
195
- different body parts (hands, torso). Keywords: athletic footwear, sprinting spikes,
196
- track and field, starting blocks, flash photography, low angle, extreme close-up,
197
- rear-curtain sync, moisture droplets, motion blur, explosive movement, performance
198
- apparel.
199
 
200
  You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
201
  {{ user_prompt }}
202
  """)
203
 
 
 
 
 
 
204
 
205
- def process_prompt(image, target_label, user_prompt, style):
 
 
 
 
 
206
  image_url = None
 
 
 
 
 
 
 
 
 
 
 
 
 
207
 
208
- buffer = BytesIO()
209
- image.convert("RGB").save(buffer, format="JPEG", quality=90)
210
- b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
211
- image_url = f"data:image/jpeg;base64,{b64_image}"
212
-
213
- if style == "Chromatic Cinematic":
214
- system_content = """You are an expert prompt engineer for chromatic cinematic-style image generation. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with strong contrast and aesthetic color grading such as Wes Anderson. Frame close to the camera so the subject is immediately recognizable, emphasizing dynamic and exaggerated editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like architectural beams, diagonal staircases, waves, or shadows can inspire but do not need to be used literally. Focus heavily on lighting to sculpt the form and mood, using two lighting sources from different directions, attractive color combinations, and interesting lighting angles (e.g., dramatic diagonal or overhead from the top-left corner). When referencing a style like Wes Anderson, describe the scene, composition, or color grading (e.g., bold symmetry, saturated pastels) without simply copying his visuals. Use a photorealistic style. Resolution 1792x1024."""
215
-
216
- user_content = (
217
- f"Use the uploaded image to infer the subject's appearance attributes. Instead of referencing pronouns in the prompt (e.g. me/she sitting on a chair), use the attributes to describe the subject (e.g. the man with the glasses sitting on the chair). "
218
- f"Then transform this prompt into a detailed chromatic cinematic style description: User's prompt: {user_prompt}"
219
- )
220
- elif style == "Film Noir":
221
- system_content = "You are an expert prompt engineer for cinematic-style image generation in the film noir aesthetic. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with high contrast, deep shadows, and moody lighting characteristic of classic noir. Frame close to the camera so the subject is immediately recognizable, emphasizing tense, dramatic, or expressive editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject—examples like rain-slicked streets, lampposts casting long shadows, Venetian blinds, or fog can inspire but do not need to be used literally. Focus heavily on lighting to sculpt form and mood, using harsh key lights, soft fill lights, and strong directional shadows to create tension and depth. When referencing a style like film noir, describe the scene, composition, or tonal contrasts (e.g., stark black-and-white contrasts, smoky atmospheres, reflective wet surfaces) without simply copying existing visuals. Use a photorealistic style. Resolution 1792x1024."
222
- user_content = (
223
- "Use the uploaded image to infer the subject's appearance and incorporate accurate descriptors. "
224
- f"User's prompt: {user_prompt}"
225
- )
226
-
227
- elif style == "General":
228
  system_content = "You are expert prompt engineer"
229
  user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
230
 
@@ -232,6 +419,13 @@ def process_prompt(image, target_label, user_prompt, style):
232
  system_content = "You are expert prompt engineer"
233
  user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
234
 
 
 
 
 
 
 
 
235
 
236
  response = client.responses.create(
237
  model="gpt-5",
@@ -245,7 +439,8 @@ def process_prompt(image, target_label, user_prompt, style):
245
  "role": "user",
246
  "content": [
247
  {"type": "input_text", "text": user_content},
248
- {"type": "input_image", "image_url": image_url}
 
249
  ]
250
  }
251
  ],
@@ -255,9 +450,14 @@ def process_prompt(image, target_label, user_prompt, style):
255
  demo = gr.Interface(
256
  fn=process_prompt,
257
  inputs=[
 
 
 
 
 
258
  gr.Image(
259
- label="Upload reference image",
260
- type="pil",
261
  ),
262
  gr.Textbox(
263
  label="Enter target label",
@@ -268,7 +468,7 @@ demo = gr.Interface(
268
  placeholder="picture of me while sitting in a chair in the ocean",
269
  ),
270
  gr.Dropdown(
271
- choices=["General", "Fashion"],
272
  #choices=["Chromatic Cinematic", "Neon Noir", "General"],
273
  label="Style Selection",
274
  info="Choose the visual style for your enhanced prompt"
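Note on the removed image handling: the old `process_prompt` saved the uploaded PIL image as JPEG into a `BytesIO` buffer and wrapped the bytes in a base64 `data:` URL before sending it as `input_image`. A minimal stdlib-only sketch of that wrapping step follows. It assumes the JPEG bytes are already available, and `to_data_url` is a hypothetical helper name that does not appear in app.py.

```python
import base64


def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    """Wrap raw image bytes in a base64 data URL (hypothetical helper)."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{b64}"


# Placeholder bytes standing in for real JPEG data (assumption for this demo).
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 8
url = to_data_url(fake_jpeg)
```

The part after the comma decodes back to the original bytes, so the receiving side can recover the image without a separate upload.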
 
68
  """)
69
 
70
 
71
+ FASHION_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for a striking fashion editorial image generator.
72
+
73
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
74
+
75
+ Focus on capturing a model in a powerful pose or moment that showcases both their features and the styling elements (clothing, accessories, makeup) in a compelling context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic high-key or low-key setups) and innovative composition (e.g., unconventional cropping, extreme perspectives, symmetry/asymmetry) to create a distinctive mood, and occasionally add lighting blurs to indicate movement when appropriate. Incorporate environmental elements or props that enhance the narrative. The final image should balance artistic expression with commercial appeal, conveying a specific attitude, concept, or emotional tone while maintaining the fashion focus. Make the background dark and moody so that the model looks cool.
76
+
 
 
 
 
 
 
 
 
 
77
  Tips for Success:
78
+ ● Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
79
  ● Detail the model's features and pose (expression, positioning, gesture)
80
  ● Describe the makeup and hair with specificity (textures, colors, style)
81
  ● Define the lighting setup (direction, quality, color, shadow effects)
82
  ● Include props or environmental elements that enhance the concept
83
  ● Suggest a brand or editorial reference for stylistic guidance
84
  ● Add compositional directions (framing, cropping, perspective)
 
 
 
 
 
 
85
 
86
  Examples:
 
87
  Prompt 1 (OHNEIS Runner):
88
+ Input: A picture of me in a "OHNEIS" race bib
89
+ Input photo: Black man
90
  Output prompt:
91
  A stylized, cinematic portrait of a Black man captured from the chest up, set against a
92
  glowing deep red background. The image is tightly framed in vertical format, emphasizing his
 
100
  — a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
101
  rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
102
  eyewear and red glow turn the portrait into a branded fashion statement.
103
+
104
+ Why the output is good:
105
+ - Creates a brand identity (torn tank top, race bib, metallic sunglasses)
106
+ - Specific lighting directions (soft but focused, shadow contours) shape the mood.
107
+ - Specifies high fashion elements
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
108
 
109
+ Input: A picture of me sprinting
110
+ Input photo: Black man
111
  Output prompt:
112
  A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
113
  down a sunlit urban street from an elevated angle. The camera looks slightly down at the
 
124
  elevated camera position allows for a sense of depth and rhythm as the runner cuts across
125
  the frame from left to right, motion trailing behind. Warm natural light streaks or golden
126
  ambient flares may bleed across the top of the image for added cinematic tension.
127
+
128
+ Why the output is good:
129
+ - Creates dynamic motion through specific technical directions
130
+ - It has composition instructions such as the elevated angle and the center-right placement
131
+ - It adds interesting and relevant environmental elements such as the grass strip and the asphalt texture
 
 
 
 
 
 
132
 
133
  Prompt 3 (Track Athlete):
134
 
135
+ Input: A picture of me jumping off the starting blocks
136
+ Input photo: Black woman
137
  Output prompt:
138
  A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
139
  female track runner launching from the starting blocks at night. The image is taken from a
 
150
  diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
151
  raw, and cinematic — focused on power, speed, and launch precision.
152
 
153
+ Why the output is good:
154
+ - Uses extreme close-up composition to transform a sports moment into fashion art
155
+ - Creates tension between static and dynamic elements
156
+ - Uses technical specifications such as flash from front-left, slow shutter, rear-curtain sync
157
+ - Adds texture details such as moisture droplets and metallic shoes
158
+
159
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
160
+ {{ user_prompt }}
161
+ """)
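Note on the `{{ user_prompt }}` placeholder: each template ends with it, and it is filled in when the request is built. The sketch below mimics that substitution with the standard library only, as a stand-in for `jinja2.Template(...).render(user_prompt=...)`. The helper `render_placeholder` and the shortened template text are illustrative assumptions, and real Jinja2 supports far more syntax than this.

```python
import re

# Shortened stand-in for the tail of a prompt template (assumption).
TEMPLATE = (
    "You need to enhance the following prompt according to the guide above. "
    "Only output the prompt, no other text.\n"
    "{{ user_prompt }}\n"
)


def render_placeholder(template: str, **values: str) -> str:
    """Replace each {{ name }} with values[name]; unknown names raise KeyError."""
    def substitute(match):
        return values[match.group(1)]
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


rendered = render_placeholder(TEMPLATE, user_prompt="A picture of me sprinting")
```

Because the user text is substituted verbatim, anything the user types becomes part of the instruction sent to the model.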
162
+
163
+
164
+ EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for emotional lifestyle image generation.
165
+
166
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
167
+
168
+ Focus on creating a vivid lifestyle portrait that captures an authentic emotional moment or state within a visually compelling environment. Be attentive to portraying the subject in a way that reveals character, mood, or narrative through their expression, posture, and interaction with their surroundings. Use very cool contrasting colors to elevate the subject and utilize naturalistic lighting approaches (e.g., window light, ambient environmental lighting, soft golden hour) or stylized lighting that enhances the emotional tone. Incorporate environmental details that contribute to storytelling and provide context. The final image should feel intimate yet visually striking—balancing raw emotional authenticity with aesthetic sophistication through thoughtful composition, color treatment, and atmospheric elements.
169
+
170
+ Tips for Success:
171
+ - Make the prompt short
172
+ - Importantly, highlight the user's request (e.g., if the user asks to be seen rollerblading, the rollerblades should be visible)
173
+ - Define the emotional state or moment clearly (vulnerability, joy, contemplation)
174
+ - Specify the lighting and how it enhances the mood (soft window light, dramatic shadows)
175
+ - Include meaningful props or elements that tell the subject's story
176
+ - Describe subtle details in expression or posture that convey emotion
177
+ - Consider color treatment that reinforces the emotional tone
178
+ - Add atmospheric elements that enhance the mood (water droplets, steam, fabric texture)
179
+
180
+ Examples:
181
+ Input: A picture of me crying on the phone
182
+ Input photo: A black woman
183
+ Output prompt:
184
+ A hyperrealistic editorial-style fashion photograph in vertical format (1080x1350), heavily
185
+ stylized with retro lighting, saturated color, and cinematic imperfection. A Black woman sits
186
+ facing the camera in a white wicker chair, wearing a glossy hot pink satin robe with sharp
187
+ lapels and a vintage brooch pinned to her chest. Her hair is styled in soft, voluminous waves.
188
+
189
+ She holds a red corded landline receiver in one hand, and in the other presses a tissue to
190
+ her cheek, caught mid-tear with a melodramatic, frozen expression. Her eyelids shimmer
191
+ with green eyeshadow, slightly smudged, evoking a stylized soap opera mood.
192
+ Resting on her lap is a bright blue tissue box with bold white cloud graphics — and
193
+ prominently across the front, the word "ohneis" is printed in large, clean white letters in a
194
+ stylized, editorial font. A single tissue protrudes, loosely folded over the edge. To her left, a
195
+ small round table holds the red phone base, crumpled tissues, and a decorative vase with
196
+ pink plastic flowers. The background is a matte lavender surface, creating smooth contrast
197
+ with the glossy fabrics and vibrant tones.
198
+
199
+ The entire image is treated with subtle analog effects: soft bloom on the highlights, visible
200
+ film grain, faint vertical scratches, floating dust particles, and a few light leaks that enhance
201
+ the stylized, nostalgic mood. The scene feels like a surreal still from a hyper-aestheticized
202
+ 1980s commercial.
203
+
204
+ Why the output is good:
205
+ - The prompt uses props to tell the story of the subject (red phone, tissue box, tissues)
206
+ - Defines details that describe the emotional moment (mid-tear)
207
+ - Adds interesting and bold colors (hot pink robe, red phone, blue tissue box, lavender background)
208
+
209
+ What can be better:
210
+ - the prompt is too long
211
+ - does not define the emotion clearly enough
212
+
213
+ Input: A picture of water pouring on me
214
+ Input photo: Blue eyed white man
215
+ Output prompt:
216
+ A hyperrealistic flash photograph taken at eye level, capturing a half-body, front-facing
217
+ portrait of a young man standing shirtless against a sleek, modern white wall. A column of
218
+ water strikes him directly in the face at the moment of impact, caught mid-air in razor-sharp
219
+ detail — droplets frozen as they burst and scatter across his features. His right shoulder is
220
+ slightly raised and tensed, muscles subtly defined under the harsh lighting. His eyes are
221
+ half-closed in reaction, mouth neutral, giving the scene a raw, involuntary intensity. Around
222
+ his neck hangs a thin turquoise necklace, glinting faintly in the sun, its color vividly
223
+ contrasting with his sun-warmed skin. In the blurred background, a lone palm tree arcs
224
+ gently from the left edge of the frame, with the deep blue sea stretching toward a soft, hazy
225
+ horizon. The flash adds hard highlights to the water, the necklace, and the tension lines on
226
+ his body, while subtle analog textures — faint vertical lens scratches, fine grain, and
227
+ scattered dust β€” bring a tactile, editorial edge to the image.
228
+
229
+ Why the output is good:
230
+ - The prompt is relevant to the user's input as it describes exactly where the water is pouring
231
+ - The physical reaction (raised shoulder, half-closed eyes) describes the emotion
232
+ - Environmental hints (palm tree, sea) establish location context without overwhelming the portrait.
233
+
234
+ What can be better:
235
+ - His emotion is not clearly defined
236
+
237
+ Input: A picture of me in a helmet
238
+ Input photo: A white man with blue eyes
239
+ Output prompt:
240
+ A hyperrealistic macro flash photograph taken from a low frontal angle, capturing the
241
+ intense, close-up portrait of a tanned male model wearing a high-impact helmet with a
242
+ closed chin guard, resembling the design of a rugby or Formula 1 helmet. The camera is
243
+ positioned slightly beneath eye level, making the face appear dominant and imposing within
244
+ the vertical frame. His expression is calm but intense, with piercing clear blue eyes staring
245
+ directly into the lens, framed by the slightly open visor. The skin is bronzed and smooth, yet
246
+ visibly roughed by activity — fine scratches and a reddish abrasion across the nose give the
247
+ face a raw, lived-in quality. His sculpted features and symmetrical bone structure remain
248
+ visible beneath the helmet's padding. A small red carabiner is clipped casually to one of the
249
+ chin straps, functioning more like a fashion detail than gear. The flash harshly illuminates the
250
+ facial textures and helmet surface, producing sharp highlights and crisp shadows along the
251
+ cheeks and neck. The background is black and indistinct, fading away entirely. Fine analog
252
+ imperfections — vertical lens scratches, dust particles suspended midair in the flash cone,
253
+ and faint grain β€” lend the image a gritty, stylized realism.
254
+
255
+ Why the output is good:
256
+ - The prompt is relevant to the user's input as it describes exactly where the helmet is being worn
257
+ - The prompt defines his expression (His expression is calm but intense), and specifically adds details describing his expression (with piercing clear blue eyes staring
258
+ directly into the lens)
259
+ - the prompt is a good length (although it could be shorter)
260
+
261
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
262
+ {{ user_prompt }}
263
+ """)
264
+
265
+
266
+ EXTREME_SPORTS_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for extreme sports image generation.
267
+
268
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
269
+
270
+ Focus on creating a dynamic, high-impact photograph capturing an adventure sport athlete in
271
+ mid-action. Utilize dramatic camera angles (e.g., low angle, fisheye, aerial) and specialized
272
+ lighting techniques (e.g., backlit silhouettes, flash freezing motion, golden hour glow) to
273
+ emphasize the intensity and athleticism of the moment. The picture is Black and White (and gray) ONLY.
274
+ Focus on capturing peak action – the apex of a jump, the spray of water/dirt/snow,
275
+ or the tension in the athlete's body, to highlight the user's prompt request. Incorporate
276
+ environmental elements that enhance the narrative and mood, whether natural (mountains,
277
+ waves, desert) or urban (concrete, structures, cityscape). The image should balance raw
278
+ athleticism with cinematic drama through specific details in the subject's gear, expression,
279
+ environment, and the physical forces at play. Make sure that the prompt is short, to the point,
280
+ and relevant to the user's prompt request.
281
+
282
+ Tips for Success:
283
+ ● Capture peak action moments and dynamic motion
284
+ ● Include specialized sports equipment and gear
285
+ ● Detail the extreme environment and conditions
286
+ ● Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
287
+ ● Include elements that convey danger and excitement
288
+ ● Focus on athletic poses and expressions of intensity
289
+ ● Add compositional directions that emphasize scale and drama
290
+
291
+ Tips for Success:
292
+ - Make the prompt short
293
+ - Importantly, highlight the user's request (e.g., if the user asks to be seen rollerblading, the rollerblades should be visible)
294
+ - Capture peak action moments and dynamic motion
295
+ - Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
296
+ - Include elements that convey danger and excitement
297
+ - Focus on athletic poses and expressions of intensity
298
+ - Include VERY cool gear
299
+
300
+ Examples:
301
+ Input: A picture of me as a desert biker
302
+ Input photo: White man
303
+ Output prompt:
304
+ A moody black-and-white portrait captures the silhouette of a dirt biker standing against a
305
+ hazy, light-splintered desert backdrop. The composition is tightly framed in portrait format,
306
+ showing the rider from just above the waist upward, centered in the shot and facing directly
307
+ into the camera. His posture is calm and unshaken — radiating confidence and defiance
308
+ beneath the helmet.
309
+ He wears a loose, oversized white T-shirt with visible holes and stains, heavily worn from
310
+ heat and dust, with the word "OHNEIS" boldly printed across the chest in cracked, industrial
311
+ lettering. The shirt hangs slightly off his shoulder, catching the soft ambient wind. Over his
312
+ face, a matte motocross helmet obscures his expression, but the eyes are just barely visible
313
+ through a clear, dust-specked motocross goggle. Across the top edge of the goggle lens, the
314
+ name "OHNEIS" is printed again — slightly curved with the lens contour, framed between
315
+ scattered reflections and dirt smudges.
316
+ Behind him, a cloud of lifted dust floats faintly in the air, and light from a high sun cuts
317
+ through the haze in harsh diagonal streaks, creating layered contrast and adding a cinematic
318
+ edge. Grain is prominent, especially in the midtones and background haze, while slight
319
+ motion blur in the particles gives the scene a sense of environmental motion despite the still
320
+ pose of the subject. The rider's dark gear stands in stark contrast to the pale light behind
321
+ him, with the overall tone raw, minimal, and visually arresting — a moment suspended in
322
+ dust and silence.
323
+
324
+ Why the output is good:
325
+ - This prompt creates powerful contrast between stillness (the posed rider) and subtle motion (dust in air).
326
+ - Clearly states Black and White stillness
327
+
328
+ What can be better:
329
+ - The prompt is too long
330
+
331
+ Input: A picture of a drifting Porsche
332
+ Input photo: A car
333
+ Output prompt:
334
+ A high-contrast black-and-white photograph capturing an extreme close-up of the rear half of
335
+ a vintage Porsche 911 Carrera mid-drift through a desert curve. Shot tightly from a low
336
+ rear-three-quarter angle in portrait orientation, the frame focuses solely on the car's back
337
+ quarter panel, rear wheel, and the explosion of dust and smoke billowing behind it. The
338
+ vehicle's iconic curves, chrome bumper, and the number "911 OHNEIS" in Porsche's
339
+ signature font are clearly visible, slightly catching the harsh desert sunlight.
340
+ The composition centers on the raw chaos of the drift: the rear tire is kicking out violently to
341
+ the left, slicing into the sandy ground and throwing up a massive, high-reaching dust plume
342
+ that fills most of the upper half of the frame. This dust cloud appears dense, layered, and
343
+ almost sculptural — with illuminated outer edges catching bright light rays that streak
344
+ diagonally across the frame from the top right corner.
345
+ The motion blur is used selectively: the car's rear and wheel arch are mostly crisp, while the
346
+ tire and dust cloud blur dynamically to emphasize speed and torque. Grain is heavy
347
+ throughout the image, especially within the dust textures and darker shadows. The ground is
348
+ streaked with tire marks and disturbed sand, adding detail and context.
349
+ Shot with a shutter speed of approximately 1/40s using a panning technique, the image
350
+ retains key visual clarity while enhancing the sense of movement and kinetic energy. This
351
+ close-cropped perspective creates an intense, almost abstract portrait of the moment —
352
+ pure mechanical force meeting loose terrain in a visual blast of contrast and grit.
353
+
354
+ Why the output is good:
355
+ - This prompt excels in capturing dynamic motion through selective blur and focus.
356
+ - The prompt is relevant to the user's input as it describes exactly where the car is being driven
357
+ - Technical details like shutter speed and panning technique guide the AI toward realistic motion effects.
358
+
359
+ What can be better:
360
+ - The prompt is too long
361
+ - Does not specifically say Black and White
362
+
363
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
364
+ {{ user_prompt }}
365
+ """)
366
+ '''
+ CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
+
+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
+ {{ user_prompt }}
+ """)
+
+ MODERN_PRODUCT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.

  You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
  {{ user_prompt }}
  """)

+ CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
+
+ Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
+
+ Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.

+ You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
+ {{ user_prompt }}
+ """)
+ '''
+
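The style names offered in the UI correspond one-to-one to prompt templates, and `process_prompt` resolves them with an `if`/`elif` chain. As far as the visible hunks show, that chain has no fallback branch, so an unlisted style would leave `user_content` unassigned. A minimal sketch of a table-driven lookup follows. Everything named here is an assumption for illustration: `STYLE_TEMPLATES`, `render_user_content`, and the shortened template bodies are hypothetical, and `StubTemplate` is only a stdlib stand-in that mimics the `render(user_prompt=...)` call made on the `jinja2.Template` objects in `app.py`.

```python
from string import Template


class StubTemplate:
    """Stand-in for jinja2.Template so this sketch runs with the stdlib only."""

    def __init__(self, source: str) -> None:
        self._template = Template(source)

    def render(self, **context: str) -> str:
        # Same call shape as the templates in app.py: render(user_prompt=...).
        return self._template.substitute(**context)


# Hypothetical registry; in app.py the values would be the real templates
# (GENERAL_PROMPT_TEMPLATE, FASHION_PROMPT_TEMPLATE, and so on).
STYLE_TEMPLATES = {
    "General": StubTemplate("General guide.\n$user_prompt"),
    "Fashion": StubTemplate("Fashion guide.\n$user_prompt"),
    "Emotional Lifestyle": StubTemplate("Lifestyle guide.\n$user_prompt"),
    "Extreme Sports": StubTemplate("Extreme sports guide.\n$user_prompt"),
}


def render_user_content(style, user_prompt: str) -> str:
    """Render the template for `style`, falling back to "General"."""
    template = STYLE_TEMPLATES.get(style, STYLE_TEMPLATES["General"])
    return template.render(user_prompt=user_prompt)
```

With a registry like this, adding a style means adding one dictionary entry and one dropdown choice, and an unknown or missing style still produces a rendered prompt.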
+ def process_prompt(image, image2, target_label, user_prompt, style):
      image_url = None
+     image_url2 = None
+
+     if image is not None:
+         buffer = BytesIO()
+         image.convert("RGB").save(buffer, format="JPEG", quality=90)
+         b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
+         image_url = f"data:image/jpeg;base64,{b64_image}"
+
+     if image2 is not None:
+         buffer = BytesIO()
+         image2.convert("RGB").save(buffer, format="JPEG", quality=90)
+         b64_image2 = base64.b64encode(buffer.getvalue()).decode("utf-8")
+         image_url2 = f"data:image/jpeg;base64,{b64_image2}"

+     if style == "General":
          system_content = "You are an expert prompt engineer"
          user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)


          system_content = "You are an expert prompt engineer"
          user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)

+     elif style == "Emotional Lifestyle":
+         system_content = "You are an expert prompt engineer"
+         user_content = EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
+
+     elif style == "Extreme Sports":
+         system_content = "You are an expert prompt engineer"
+         user_content = EXTREME_SPORTS_PROMPT_TEMPLATE.render(user_prompt=user_prompt)

      response = client.responses.create(
          model="gpt-5",

                  "role": "user",
                  "content": [
                      {"type": "input_text", "text": user_content},
+                     {"type": "input_image", "image_url": image_url},
+                     {"type": "input_image", "image_url": image_url2}
                  ]
              }
          ],
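As committed, both `input_image` parts are always included in the user message, so a missing upload passes `image_url: None` to the API. A minimal sketch of building the `content` list only from images that were actually provided is shown here. The helper names `to_data_url` and `build_user_content` are hypothetical and not part of the commit; the part dictionaries copy the shapes used in the hunk, and the helper takes already-encoded JPEG bytes so the sketch runs without Pillow.

```python
import base64
from typing import Dict, List, Optional


def to_data_url(jpeg_bytes: bytes) -> str:
    """Wrap already-encoded JPEG bytes in a base64 data URL."""
    encoded = base64.b64encode(jpeg_bytes).decode("utf-8")
    return f"data:image/jpeg;base64,{encoded}"


def build_user_content(text: str, *image_urls: Optional[str]) -> List[Dict[str, str]]:
    """Return the user message parts, skipping images that were not uploaded."""
    content = [{"type": "input_text", "text": text}]
    for url in image_urls:
        if url is not None:
            # Only attach an image part when there is a real URL to send.
            content.append({"type": "input_image", "image_url": url})
    return content
```

In `process_prompt`, the literal list could then be replaced by something like `build_user_content(user_content, image_url, image_url2)`, which also removes the need for separate handling when only one reference image is uploaded.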
 
  demo = gr.Interface(
      fn=process_prompt,
      inputs=[
+
+         gr.Image(
+             label="Upload reference image",
+             type="pil"
+         ),
          gr.Image(
+             label="Upload 2nd reference image",
+             type="pil"
          ),
          gr.Textbox(
              label="Enter target label",

              placeholder="picture of me while sitting in a chair in the ocean",
          ),
          gr.Dropdown(
+             choices=["General", "Fashion", "Emotional Lifestyle", "Extreme Sports"],
              #choices=["Chromatic Cinematic", "Neon Noir", "General"],
              label="Style Selection",
              info="Choose the visual style for your enhanced prompt"