Upload folder using huggingface_hub
app.py
CHANGED
|
@@ -68,40 +68,25 @@ HERE is the user's prompt:
|
|
| 68 |
""")
|
| 69 |
|
| 70 |
|
| 71 |
-
FASHION_PROMPT_TEMPLATE = jinja2.Template("""
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
|
| 75 |
-
high-key or low-key setups) and innovative composition (e.g., unconventional cropping,
|
| 76 |
-
|
| 77 |
-
environmental elements or props that enhance the narrative. The final image should balance
|
| 78 |
-
artistic expression with commercial appeal, conveying a specific attitude, concept, or
|
| 79 |
-
emotional tone while maintaining the fashion focus.
|
| 80 |
-
Analysis:
|
| 81 |
-
Fashion editorial prompts aim to create images with both artistic and commercial value.
|
| 82 |
-
Success often comes from a precise balance of styling details, environmental context, and
|
| 83 |
-
technical elements like lighting and composition. Strong concepts and bold visual choices
|
| 84 |
-
typically yield the most compelling results. Describing makeup, accessories, and specific
|
| 85 |
-
fashion elements in detail helps the AI create cohesive styling.
|
| 86 |
Tips for Success:
|
| 87 |
-
- Specify the precise styling (clothing items, fabrics,
|
| 88 |
- Detail the model's features and pose (expression, positioning, gesture)
|
| 89 |
- Describe the makeup and hair with specificity (textures, colors, style)
|
| 90 |
- Define the lighting setup (direction, quality, color, shadow effects)
|
| 91 |
- Include props or environmental elements that enhance the concept
|
| 92 |
- Suggest a brand or editorial reference for stylistic guidance
|
| 93 |
- Add compositional directions (framing, cropping, perspective)
|
| 94 |
-
Key Keywords:
|
| 95 |
-
fashion editorial, fashion photography, studio photography, location editorial, model pose,
|
| 96 |
-
conceptual fashion, striking composition, dramatic lighting, hard light, soft light, colored gels,
|
| 97 |
-
editorial makeup, fashion styling, haute couture, ready-to-wear, fashion concept, art
|
| 98 |
-
direction, commercial appeal, [brand reference]
|
| 99 |
-
|
| 100 |
|
| 101 |
Examples:
|
| 102 |
-
|
| 103 |
Prompt 1 (OHNEIS Runner):
|
| 104 |
-
|
|
|
|
| 105 |
Output prompt:
|
| 106 |
A stylized, cinematic portrait of a Black man captured from the chest up, set against a
|
| 107 |
glowing deep red background. The image is tightly framed in vertical format, emphasizing his
|
|
@@ -115,28 +100,14 @@ reflective elements of both glasses and sweat on his skin. The mood is intense a
|
|
| 115 |
– a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
|
| 116 |
rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
|
| 117 |
eyewear and red glow turn the portrait into a branded fashion statement.
|
| 118 |
-
|
| 119 |
-
|
| 120 |
-
|
| 121 |
-
|
| 122 |
-
|
| 123 |
-
sunglasses with the brand name clearly legible across the lens. His expression is stoic,
|
| 124 |
-
exuding focus and control. The image balances a gritty, competitive energy with futuristic
|
| 125 |
-
fashion elements and controlled studio lighting.
|
| 126 |
-
|
| 127 |
-
- Analysis: This prompt masterfully blends fashion and athletic elements. The detailed
|
| 128 |
-
styling (torn tank top, race bib, metallic sunglasses) creates brand identity. Specific
|
| 129 |
-
lighting directions (soft but focused, shadow contours) shape the mood. The Sora
|
| 130 |
-
prompt maintains the exact styling while adding subtle emotional elements (stoic
|
| 131 |
-
expression, focus and control) for video continuity.
|
| 132 |
-
|
| 133 |
-
- Variations & Keywords: Try different backgrounds (urban setting, abstract color
|
| 134 |
-
field), lighting scenarios (harsh backlight, cool blue tones), model demographics, or
|
| 135 |
-
accessories (compression sleeves, techwear). Keywords: athletic fashion, editorial
|
| 136 |
-
portrait, directional lighting, race bib, futuristic eyewear, brand styling, reflective
|
| 137 |
-
elements, torn fabric, cinematic portrait, color-blocking.
|
| 138 |
|
| 139 |
-
|
|
|
|
| 140 |
Output prompt:
|
| 141 |
A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
|
| 142 |
down a sunlit urban street from an elevated angle. The camera looks slightly down at the
|
|
@@ -153,20 +124,16 @@ The background is dark asphalt, textured with faint painted lines and subtle sha
|
|
| 153 |
elevated camera position allows for a sense of depth and rhythm as the runner cuts across
|
| 154 |
the frame from left to right, motion trailing behind. Warm natural light streaks or golden
|
| 155 |
ambient flares may bleed across the top of the image for added cinematic tension.
|
| 156 |
-
|
| 157 |
-
|
| 158 |
-
|
| 159 |
-
instructions
|
| 160 |
-
|
| 161 |
-
|
| 162 |
-
- Variations & Keywords: Change the setting (beach, track, trail), lighting (twilight,
|
| 163 |
-
rainy day), clothing (branded apparel, minimal gear), or technique (front-curtain sync,
|
| 164 |
-
pan blur). Keywords: sports fashion, running, motion photography, rear-curtain sync,
|
| 165 |
-
flash photography, reflective materials, urban environment, elevated angle, side
|
| 166 |
-
profile, dynamic pose, directional movement.
|
| 167 |
|
| 168 |
Prompt 3 (Track Athlete):
|
| 169 |
|
|
|
|
|
|
|
| 170 |
Output prompt:
|
| 171 |
A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
|
| 172 |
female track runner launching from the starting blocks at night. The image is taken from a
|
|
@@ -183,48 +150,268 @@ The background is minimal and moody: abstract light streaks from stadium lightin
|
|
| 183 |
diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
|
| 184 |
raw, and cinematic – focused on power, speed, and launch precision.
|
| 185 |
|
| 186 |
-
|
| 187 |
-
|
| 188 |
-
|
| 189 |
-
|
| 190 |
-
|
| 191 |
-
|
| 192 |
-
|
| 193 |
-
|
| 194 |
-
|
| 195 |
-
|
| 196 |
-
|
| 197 |
-
|
| 198 |
-
|
| 199 |
|
| 200 |
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 201 |
{{ user_prompt }}
|
| 202 |
""")
|
| 203 |
|
| 204 |
|
| 205 |
-
|
| 206 |
image_url = None
|
| 207 |
|
| 208 |
-
|
| 209 |
-
image.convert("RGB").save(buffer, format="JPEG", quality=90)
|
| 210 |
-
b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
|
| 211 |
-
image_url = f"data:image/jpeg;base64,{b64_image}"
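The removed lines above JPEG-encode the uploaded image and wrap the bytes in a base64 data URL. A minimal standard-library sketch of just the wrapping step follows, assuming the JPEG bytes already exist. The helper name `to_data_url` and its `mime` parameter are hypothetical and are not part of app.py.

```python
import base64
from typing import Optional


def to_data_url(image_bytes: Optional[bytes], mime: str = "image/jpeg") -> Optional[str]:
    """Wrap raw image bytes in a base64 data URL; return None when no image was given."""
    if not image_bytes:
        return None
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime};base64,{encoded}"


# Illustrative placeholder bytes (JPEG start/end markers only), not a real photo.
print(to_data_url(b"\xff\xd8\xff\xd9"))
print(to_data_url(None))
```

Returning `None` for a missing image mirrors the `image_url = None` default that this commit introduces, so callers can decide whether to attach an image at all.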
|
| 212 |
-
|
| 213 |
-
if style == "Chromatic Cinematic":
|
| 214 |
-
system_content = """You are an expert prompt engineer for chromatic cinematic-style image generation. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with strong contrast and aesthetic color grading such as Wes Anderson. Frame close to the camera so the subject is immediately recognizable, emphasizing dynamic and exaggerated editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject – examples like architectural beams, diagonal staircases, waves, or shadows can inspire but do not need to be used literally. Focus heavily on lighting to sculpt the form and mood, using two lighting sources from different directions, attractive color combinations, and interesting lighting angles (e.g., dramatic diagonal or overhead from the top-left corner). When referencing a style like Wes Anderson, describe the scene, composition, or color grading (e.g., bold symmetry, saturated pastels) without simply copying his visuals. Use a photorealistic style. Resolution 1792x1024."""
|
| 215 |
-
|
| 216 |
-
user_content = (
|
| 217 |
-
f"Use the uploaded image to infer the subject's appearance attributes. Instead of referencing pronouns in the prompt (e.g., me/she sitting on a chair), use the attributes to describe the subject (e.g., the man with the glasses sitting on the chair). "
|
| 218 |
-
f"Then transform this prompt into a detailed chromatic cinematic style description: User's prompt: {user_prompt}"
|
| 219 |
-
)
|
| 220 |
-
elif style == "Film Noir":
|
| 221 |
-
system_content = "You are an expert prompt engineer for cinematic-style image generation in the film noir aesthetic. Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image with high contrast, deep shadows, and moody lighting characteristic of classic noir. Frame close to the camera so the subject is immediately recognizable, emphasizing tense, dramatic, or expressive editorial posing. Integrate secondary subjects, environmental elements, and leading lines naturally into the scene to direct attention toward the main subject – examples like rain-slicked streets, lampposts casting long shadows, Venetian blinds, or fog can inspire but do not need to be used literally. Focus heavily on lighting to sculpt form and mood, using harsh key lights, soft fill lights, and strong directional shadows to create tension and depth. When referencing a style like film noir, describe the scene, composition, or tonal contrasts (e.g., stark black-and-white contrasts, smoky atmospheres, reflective wet surfaces) without simply copying existing visuals. Use a photorealistic style. Resolution 1792x1024."
|
| 222 |
-
user_content = (
|
| 223 |
-
"Use the uploaded image to infer the subject's appearance and incorporate accurate descriptors. "
|
| 224 |
-
f"User's prompt: {user_prompt}"
|
| 225 |
-
)
|
| 226 |
-
|
| 227 |
-
elif style == "General":
|
| 228 |
system_content = "You are an expert prompt engineer"
|
| 229 |
user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 230 |
|
|
@@ -232,6 +419,13 @@ def process_prompt(image, target_label, user_prompt, style):
|
|
| 232 |
system_content = "You are an expert prompt engineer"
|
| 233 |
user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
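The `if`/`elif` chain in `process_prompt` selects one template per style. A table-driven lookup is one possible alternative, sketched below as an assumption rather than the code in this commit. `string.Template` stands in for `jinja2.Template` so the snippet runs with the standard library only, and `STYLE_TEMPLATES`, `build_user_content`, and the shortened template bodies are all hypothetical.

```python
from string import Template

# Shortened stand-ins for the long prompt templates in app.py (hypothetical text).
STYLE_TEMPLATES = {
    "General": Template("Enhance this prompt.\n$user_prompt"),
    "Fashion": Template("Enhance this prompt as a fashion editorial.\n$user_prompt"),
}


def build_user_content(style: str, user_prompt: str) -> str:
    """Look up the template for `style` and fill in the user's prompt."""
    try:
        template = STYLE_TEMPLATES[style]
    except KeyError:
        raise ValueError(f"Unknown style: {style!r}") from None
    # substitute() plays the role of jinja2's render(user_prompt=...).
    return template.substitute(user_prompt=user_prompt)


print(build_user_content("Fashion", "A picture of me sprinting"))
```

With a table like this, adding a style means adding one dictionary entry, and an unexpected dropdown value fails loudly instead of leaving `user_content` unset.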
|
| 234 |
|
| 235 |
|
| 236 |
response = client.responses.create(
|
| 237 |
model="gpt-5",
|
|
@@ -245,7 +439,8 @@ def process_prompt(image, target_label, user_prompt, style):
|
|
| 245 |
"role": "user",
|
| 246 |
"content": [
|
| 247 |
{"type": "input_text", "text": user_content},
|
| 248 |
-
{"type": "input_image", "image_url": image_url}
|
|
|
|
| 249 |
]
|
| 250 |
}
|
| 251 |
],
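This hunk replaces the unconditional `input_image` entry, and `image_url` now starts as `None`, so the image part presumably becomes optional. The replacement lines are not visible in this view. The sketch below shows one way to build the message under that assumption; `build_user_message` is a hypothetical helper, and the `input_text` and `input_image` part shapes are copied from the diff.

```python
from typing import Any, Dict, List, Optional


def build_user_message(user_content: str, image_url: Optional[str]) -> Dict[str, Any]:
    """Build the user message, attaching the image part only when an image exists."""
    content: List[Dict[str, str]] = [{"type": "input_text", "text": user_content}]
    if image_url is not None:
        content.append({"type": "input_image", "image_url": image_url})
    return {"role": "user", "content": content}


print(build_user_message("A picture of me sprinting", None))
```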
|
|
@@ -255,9 +450,14 @@ def process_prompt(image, target_label, user_prompt, style):
|
|
| 255 |
demo = gr.Interface(
|
| 256 |
fn=process_prompt,
|
| 257 |
inputs=[
|
| 258 |
gr.Image(
|
| 259 |
-
label="Upload reference image",
|
| 260 |
-
type="pil"
|
| 261 |
),
|
| 262 |
gr.Textbox(
|
| 263 |
label="Enter target label",
|
|
@@ -268,7 +468,7 @@ demo = gr.Interface(
|
|
| 268 |
placeholder="picture of me while sitting in a chair in the ocean",
|
| 269 |
),
|
| 270 |
gr.Dropdown(
|
| 271 |
-
choices=["General", "Fashion"],
|
| 272 |
#choices=["Chromatic Cinematic", "Neon Noir", "General"],
|
| 273 |
label="Style Selection",
|
| 274 |
info="Choose the visual style for your enhanced prompt"
|
|
|
|
| 68 |
""")
|
| 69 |
|
| 70 |
|
| 71 |
+
FASHION_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for a striking fashion editorial image generator.
|
| 72 |
+
|
| 73 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 74 |
+
|
| 75 |
+
Focus on capturing a model in a powerful pose or moment that showcases both their features and the styling elements (clothing, accessories, makeup) in a compelling context. Utilize bold lighting techniques (e.g., hard shadow play, colored gels, dramatic high-key or low-key setups) and innovative composition (e.g., unconventional cropping, extreme perspectives, symmetry/asymmetry) to create a distinctive mood, and occasionally add lighting blurs to indicate movement when appropriate. Incorporate environmental elements or props that enhance the narrative. The final image should balance artistic expression with commercial appeal, conveying a specific attitude, concept, or emotional tone while maintaining the fashion focus. Make the background dark and moody so that the model looks cool.
|
| 76 |
+
|
| 77 |
Tips for Success:
|
| 78 |
+
- Specify the precise styling (clothing items, fabrics, colors, fit, accessories)
|
| 79 |
- Detail the model's features and pose (expression, positioning, gesture)
|
| 80 |
- Describe the makeup and hair with specificity (textures, colors, style)
|
| 81 |
- Define the lighting setup (direction, quality, color, shadow effects)
|
| 82 |
- Include props or environmental elements that enhance the concept
|
| 83 |
- Suggest a brand or editorial reference for stylistic guidance
|
| 84 |
- Add compositional directions (framing, cropping, perspective)
|
| 85 |
|
| 86 |
Examples:
|
|
|
|
| 87 |
Prompt 1 (OHNEIS Runner):
|
| 88 |
+
Input: A picture of me in an "OHNEIS" race bib
|
| 89 |
+
Input photo: Black man
|
| 90 |
Output prompt:
|
| 91 |
A stylized, cinematic portrait of a Black man captured from the chest up, set against a
|
| 92 |
glowing deep red background. The image is tightly framed in vertical format, emphasizing his
|
|
|
|
| 100 |
– a blend of raw athleticism and streetwear elegance, evoking focus, style, and subtle
|
| 101 |
rebellion. The torn shirt and race bib hint at exertion and context, while the engraved
|
| 102 |
eyewear and red glow turn the portrait into a branded fashion statement.
|
| 103 |
+
|
| 104 |
+
Why the output is good:
|
| 105 |
+
- Creates a brand identity (torn tank top, race bib, metallic sunglasses)
|
| 106 |
+
- Specific lighting directions (soft but focused, shadow contours) shape the mood.
|
| 107 |
+
- Specifies high fashion elements
|
| 108 |
|
| 109 |
+
Input: A picture of me sprinting
|
| 110 |
+
Input photo: Black man
|
| 111 |
Output prompt:
|
| 112 |
A vertical-format, side-profile flash photograph capturing a Black male runner sprinting
|
| 113 |
down a sunlit urban street from an elevated angle. The camera looks slightly down at the
|
|
|
|
| 124 |
elevated camera position allows for a sense of depth and rhythm as the runner cuts across
|
| 125 |
the frame from left to right, motion trailing behind. Warm natural light streaks or golden
|
| 126 |
ambient flares may bleed across the top of the image for added cinematic tension.
|
| 127 |
+
|
| 128 |
+
Why the output is good:
|
| 129 |
+
- Creates dynamic motion through specific technical directions
|
| 130 |
+
- It has composition instructions such as the elevated angle and the center-right placement
|
| 131 |
+
- It adds interesting and relevant environmental elements such as the grass strip and the asphalt texture
|
| 132 |
|
| 133 |
Prompt 3 (Track Athlete):
|
| 134 |
|
| 135 |
+
Input: A picture of me jumping off the starting blocks
|
| 136 |
+
Input photo: Black woman
|
| 137 |
Output prompt:
|
| 138 |
A flash-illuminated, hyper-dynamic close-up photograph capturing the feet of a Black
|
| 139 |
female track runner launching from the starting blocks at night. The image is taken from a
|
|
|
|
| 150 |
diagonally behind her, forming a glowing contrast to the dark track. The overall tone is sleek,
|
| 151 |
raw, and cinematic – focused on power, speed, and launch precision.
|
| 152 |
|
| 153 |
+
Why the output is good:
|
| 154 |
+
- Uses extreme close-up composition to transform a sports moment into fashion art
|
| 155 |
+
- Creates tension between static and dynamic elements
|
| 156 |
+
- Uses technical specifications such as flash from front-left, slow shutter, rear-curtain sync
|
| 157 |
+
- Adds texture details such as moisture droplets and metallic shoes
|
| 158 |
+
|
| 159 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 160 |
+
{{ user_prompt }}
|
| 161 |
+
""")
|
| 162 |
+
|
| 163 |
+
|
| 164 |
+
EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for emotional lifestyle image generation.
|
| 165 |
+
|
| 166 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 167 |
+
|
| 168 |
+
Focus on creating a vivid lifestyle portrait that captures an authentic emotional moment or state within a visually compelling environment. Be attentive to portraying the subject in a way that reveals character, mood, or narrative through their expression, posture, and interaction with their surroundings. Use very cool contrasting colors to elevate the subject and utilize naturalistic lighting approaches (e.g., window light, ambient environmental lighting, soft golden hour) or stylized lighting that enhances the emotional tone. Incorporate environmental details that contribute to storytelling and provide context. The final image should feel intimate yet visually striking – balancing raw emotional authenticity with aesthetic sophistication through thoughtful composition, color treatment, and atmospheric elements.
|
| 169 |
+
|
| 170 |
+
Tips for Success:
|
| 171 |
+
- Make the prompt short
|
| 172 |
+
- Most importantly, highlight what the user asked for (e.g., if the user asks to be seen rollerblading, the rollerblades should be visible)
|
| 173 |
+
- Define the emotional state or moment clearly (vulnerability, joy, contemplation)
|
| 174 |
+
- Specify the lighting and how it enhances the mood (soft window light, dramatic shadows)
|
| 175 |
+
- Include meaningful props or elements that tell the subject's story
|
| 176 |
+
- Describe subtle details in expression or posture that convey emotion
|
| 177 |
+
- Consider color treatment that reinforces the emotional tone
|
| 178 |
+
- Add atmospheric elements that enhance the mood (water droplets, steam, fabric texture)
|
| 179 |
+
|
| 180 |
+
Examples:
|
| 181 |
+
Input: A picture of me crying on the phone
|
| 182 |
+
Input photo: A Black woman
|
| 183 |
+
Output prompt:
|
| 184 |
+
A hyperrealistic editorial-style fashion photograph in vertical format (1080x1350), heavily
|
| 185 |
+
stylized with retro lighting, saturated color, and cinematic imperfection. A Black woman sits
|
| 186 |
+
facing the camera in a white wicker chair, wearing a glossy hot pink satin robe with sharp
|
| 187 |
+
lapels and a vintage brooch pinned to her chest. Her hair is styled in soft, voluminous waves.
|
| 188 |
+
|
| 189 |
+
She holds a red corded landline receiver in one hand, and in the other presses a tissue to
|
| 190 |
+
her cheek, caught mid-tear with a melodramatic, frozen expression. Her eyelids shimmer
|
| 191 |
+
with green eyeshadow, slightly smudged, evoking a stylized soap opera mood.
|
| 192 |
+
Resting on her lap is a bright blue tissue box with bold white cloud graphics – and
|
| 193 |
+
prominently across the front, the word "ohneis" is printed in large, clean white letters in a
|
| 194 |
+
stylized, editorial font. A single tissue protrudes, loosely folded over the edge. To her left, a
|
| 195 |
+
small round table holds the red phone base, crumpled tissues, and a decorative vase with
|
| 196 |
+
pink plastic flowers. The background is a matte lavender surface, creating smooth contrast
|
| 197 |
+
with the glossy fabrics and vibrant tones.
|
| 198 |
+
|
| 199 |
+
The entire image is treated with subtle analog effects: soft bloom on the highlights, visible
|
| 200 |
+
film grain, faint vertical scratches, floating dust particles, and a few light leaks that enhance
|
| 201 |
+
the stylized, nostalgic mood. The scene feels like a surreal still from a hyper-aestheticized
|
| 202 |
+
1980s commercial.
|
| 203 |
+
|
| 204 |
+
Why the output is good:
|
| 205 |
+
- The prompt uses props to tell the story of the subject (red phone, tissue box, tissues)
|
| 206 |
+
- Defines details that describe the emotional moment (mid-tear)
|
| 207 |
+
- Adds interesting and bold colors (hot pink robe, red phone, blue tissue box, lavender background)
|
| 208 |
+
|
| 209 |
+
What can be better:
|
| 210 |
+
- The prompt is too long
|
| 211 |
+
- Does not define the emotion clearly enough
|
| 212 |
+
|
| 213 |
+
Input: A picture of water pouring on me
|
| 214 |
+
Input photo: Blue-eyed white man
|
| 215 |
+
Output prompt:
|
| 216 |
+
A hyperrealistic flash photograph taken at eye level, capturing a half-body, front-facing
|
| 217 |
+
portrait of a young man standing shirtless against a sleek, modern white wall. A column of
|
| 218 |
+
water strikes him directly in the face at the moment of impact, caught mid-air in razor-sharp
|
| 219 |
+
detail – droplets frozen as they burst and scatter across his features. His right shoulder is
|
| 220 |
+
slightly raised and tensed, muscles subtly defined under the harsh lighting. His eyes are
|
| 221 |
+
half-closed in reaction, mouth neutral, giving the scene a raw, involuntary intensity. Around
|
| 222 |
+
his neck hangs a thin turquoise necklace, glinting faintly in the sun, its color vividly
|
| 223 |
+
contrasting with his sun-warmed skin. In the blurred background, a lone palm tree arcs
|
| 224 |
+
gently from the left edge of the frame, with the deep blue sea stretching toward a soft, hazy
|
| 225 |
+
horizon. The flash adds hard highlights to the water, the necklace, and the tension lines on
|
| 226 |
+
his body, while subtle analog textures – faint vertical lens scratches, fine grain, and
|
| 227 |
+
scattered dust β bring a tactile, editorial edge to the image.
|
| 228 |
+
|
| 229 |
+
Why the output is good:
|
| 230 |
+
- The prompt is relevant to the user's input, as it describes exactly where the water is pouring
|
| 231 |
+
- The physical reaction (raised shoulder, half-closed eyes) describes the emotion
|
| 232 |
+
- Environmental hints (palm tree, sea) establish location context without overwhelming the portrait.
|
| 233 |
+
|
| 234 |
+
What can be better:
|
| 235 |
+
- His emotion is not clearly defined
|
| 236 |
+
|
| 237 |
+
Input: A picture of me in a helmet
|
| 238 |
+
Input photo: A white man with blue eyes
|
| 239 |
+
Output prompt:
|
| 240 |
+
A hyperrealistic macro flash photograph taken from a low frontal angle, capturing the
|
| 241 |
+
intense, close-up portrait of a tanned male model wearing a high-impact helmet with a
|
| 242 |
+
closed chin guard, resembling the design of a rugby or Formula 1 helmet. The camera is
|
| 243 |
+
positioned slightly beneath eye level, making the face appear dominant and imposing within
|
| 244 |
+
the vertical frame. His expression is calm but intense, with piercing clear blue eyes staring
|
| 245 |
+
directly into the lens, framed by the slightly open visor. The skin is bronzed and smooth, yet
|
| 246 |
+
visibly roughed by activity – fine scratches and a reddish abrasion across the nose give the
|
| 247 |
+
face a raw, lived-in quality. His sculpted features and symmetrical bone structure remain
|
| 248 |
+
visible beneath the helmet's padding. A small red carabiner is clipped casually to one of the
|
| 249 |
+
chin straps, functioning more like a fashion detail than gear. The flash harshly illuminates the
|
| 250 |
+
facial textures and helmet surface, producing sharp highlights and crisp shadows along the
|
| 251 |
+
cheeks and neck. The background is black and indistinct, fading away entirely. Fine analog
|
| 252 |
+
imperfections – vertical lens scratches, dust particles suspended midair in the flash cone,
|
| 253 |
+
and faint grain β lend the image a gritty, stylized realism.
|
| 254 |
+
|
| 255 |
+
Why the output is good:
|
| 256 |
+
- The prompt is relevant to the user's input, as it describes exactly where the helmet is being worn
|
| 257 |
+
- The prompt defines his expression (His expression is calm but intense), and specifically adds details describing his expression (with piercing clear blue eyes staring
|
| 258 |
+
directly into the lens)
|
| 259 |
+
- The prompt is a good length (although it could be shorter)
|
| 260 |
+
|
| 261 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 262 |
+
{{ user_prompt }}
|
| 263 |
+
""")
|
| 264 |
+
|
| 265 |
+
|
| 266 |
+
EXTREME_SPORTS_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for extreme sports image generation.
|
| 267 |
+
|
| 268 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 269 |
+
|
| 270 |
+
Focus on creating a dynamic, high-impact photograph capturing an adventure sport athlete in
|
| 271 |
+
mid-action. Utilize dramatic camera angles (e.g., low angle, fisheye, aerial) and specialized
|
| 272 |
+
lighting techniques (e.g., backlit silhouettes, flash freezing motion, golden hour glow) to
|
| 273 |
+
emphasize the intensity and athleticism of the moment. The picture is Black and White (and gray) ONLY.
|
| 274 |
+
Focus on capturing peak action – the apex of a jump, the spray of water/dirt/snow,
|
| 275 |
+
or the tension in the athlete's body, to highlight the user's prompt request. Incorporate
|
| 276 |
+
environmental elements that enhance the narrative and mood, whether natural (mountains,
|
| 277 |
+
waves, desert) or urban (concrete, structures, cityscape). The image should balance raw
|
| 278 |
+
athleticism with cinematic drama through specific details in the subject's gear, expression,
|
| 279 |
+
environment, and the physical forces at play. Make sure that the prompt is short, to the point,
|
| 280 |
+
and relevant to the user's prompt request.
|
| 281 |
+
|
| 282 |
+
Tips for Success:
|
| 283 |
+
- Capture peak action moments and dynamic motion
|
| 284 |
+
- Include specialized sports equipment and gear
|
| 285 |
+
- Detail the extreme environment and conditions
|
| 286 |
+
- Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
|
| 287 |
+
- Include elements that convey danger and excitement
|
| 288 |
+
- Focus on athletic poses and expressions of intensity
|
| 289 |
+
- Add compositional directions that emphasize scale and drama
|
| 290 |
+
|
| 291 |
+
Tips for Success:
|
| 292 |
+
- Make the prompt short
|
| 293 |
+
- Most importantly, highlight what the user asked for (e.g., if the user asks to be seen rollerblading, the rollerblades should be visible)
|
| 294 |
+
- Capture peak action moments and dynamic motion
|
| 295 |
+
- Describe dramatic lighting setups (harsh shadows, rim lighting, flash freeze)
|
| 296 |
+
- Include elements that convey danger and excitement
|
| 297 |
+
- Focus on athletic poses and expressions of intensity
|
| 298 |
+
- Include VERY cool gear
|
| 299 |
+
|
| 300 |
+
Examples:
|
| 301 |
+
Input: A picture of me as a desert biker
|
| 302 |
+
Input photo: White man
|
| 303 |
+
Output prompt:
|
| 304 |
+
A moody black-and-white portrait captures the silhouette of a dirt biker standing against a
|
| 305 |
+
hazy, light-splintered desert backdrop. The composition is tightly framed in portrait format,
|
| 306 |
+
showing the rider from just above the waist upward, centered in the shot and facing directly
|
| 307 |
+
into the camera. His posture is calm and unshaken – radiating confidence and defiance
|
| 308 |
+
beneath the helmet.
|
| 309 |
+
He wears a loose, oversized white T-shirt with visible holes and stains, heavily worn from
|
| 310 |
+
heat and dust, with the word "OHNEIS" boldly printed across the chest in cracked, industrial
|
| 311 |
+
lettering. The shirt hangs slightly off his shoulder, catching the soft ambient wind. Over his
|
| 312 |
+
face, a matte motocross helmet obscures his expression, but the eyes are just barely visible
|
| 313 |
+
through a clear, dust-specked motocross goggle. Across the top edge of the goggle lens, the
|
| 314 |
+
name "OHNEIS" is printed again – slightly curved with the lens contour, framed between
|
| 315 |
+
scattered reflections and dirt smudges.
|
| 316 |
+
Behind him, a cloud of lifted dust floats faintly in the air, and light from a high sun cuts
|
| 317 |
+
through the haze in harsh diagonal streaks, creating layered contrast and adding a cinematic
|
| 318 |
+
edge. Grain is prominent, especially in the midtones and background haze, while slight
|
| 319 |
+
motion blur in the particles gives the scene a sense of environmental motion despite the still
|
| 320 |
+
pose of the subject. The rider's dark gear stands in stark contrast to the pale light behind
|
| 321 |
+
him, with the overall tone raw, minimal, and visually arresting – a moment suspended in
|
| 322 |
+
dust and silence.
|
| 323 |
+
|
| 324 |
+
Why the output is good:
|
| 325 |
+
- This prompt creates powerful contrast between stillness (the posed rider) and subtle motion (dust in air).
|
| 326 |
+
- Clearly states that the image is black and white
|
| 327 |
+
|
| 328 |
+
What can be better:
|
| 329 |
+
- The prompt is too long
|
| 330 |
+
|
| 331 |
+
Input: A picture of a drifting Porsche
|
| 332 |
+
Input photo: A car
|
| 333 |
+
Output prompt:
|
| 334 |
+
A high-contrast black-and-white photograph capturing an extreme close-up of the rear half of
|
| 335 |
+
a vintage Porsche 911 Carrera mid-drift through a desert curve. Shot tightly from a low
|
| 336 |
+
rear-three-quarter angle in portrait orientation, the frame focuses solely on the car's back
|
| 337 |
+
quarter panel, rear wheel, and the explosion of dust and smoke billowing behind it. The
|
| 338 |
+
vehicle's iconic curves, chrome bumper, and the number "911 OHNEIS" in Porsche's
|
| 339 |
+
signature font are clearly visible, slightly catching the harsh desert sunlight.
|
| 340 |
+
The composition centers on the raw chaos of the drift: the rear tire is kicking out violently to
|
| 341 |
+
the left, slicing into the sandy ground and throwing up a massive, high-reaching dust plume
|
| 342 |
+
that fills most of the upper half of the frame. This dust cloud appears dense, layered, and
|
| 343 |
+
almost sculptural – with illuminated outer edges catching bright light rays that streak
|
| 344 |
+
diagonally across the frame from the top right corner.
|
| 345 |
+
The motion blur is used selectively: the car's rear and wheel arch are mostly crisp, while the
|
| 346 |
+
tire and dust cloud blur dynamically to emphasize speed and torque. Grain is heavy
|
| 347 |
+
throughout the image, especially within the dust textures and darker shadows. The ground is
|
| 348 |
+
streaked with tire marks and disturbed sand, adding detail and context.
|
| 349 |
+
Shot with a shutter speed of approximately 1/40s using a panning technique, the image
|
| 350 |
+
retains key visual clarity while enhancing the sense of movement and kinetic energy. This
|
| 351 |
+
close-cropped perspective creates an intense, almost abstract portrait of the moment:
|
| 352 |
+
pure mechanical force meeting loose terrain in a visual blast of contrast and grit.
|
| 353 |
+
|
| 354 |
+
Why the output is good:
|
| 355 |
+
- This prompt excels in capturing dynamic motion through selective blur and focus.
|
| 356 |
+
- The prompt is relevant to the user's input, as it describes exactly where the car is being driven.
|
| 357 |
+
- Technical details like shutter speed and panning technique guide the AI toward realistic motion effects.
|
| 358 |
+
|
| 359 |
+
What can be better:
|
| 360 |
+
- The prompt is too long
|
| 361 |
+
- Does not specifically say Black and White
|
| 362 |
+
|
| 363 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 364 |
+
{{ user_prompt }}
|
| 365 |
+
""")
|
| 366 |
+
'''
|
| 367 |
+
CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
|
| 368 |
+
|
| 369 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 370 |
+
|
| 371 |
+
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 372 |
+
|
| 373 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 374 |
+
{{ user_prompt }}
|
| 375 |
+
""")
|
| 376 |
+
|
| 377 |
+
MODERN_PRODUCT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
|
| 378 |
+
|
| 379 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 380 |
+
|
| 381 |
+
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 382 |
|
| 383 |
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 384 |
{{ user_prompt }}
|
| 385 |
""")
|
| 386 |
|
| 387 |
+
CAPTIVATING_PROMPT_TEMPLATE = jinja2.Template("""You are an expert prompt engineer for captivating image generation.
|
| 388 |
+
|
| 389 |
+
Transform the user's simple prompt into a highly descriptive paragraph that produces a visually striking image. The photo of the user will be provided to you, so you should use it to infer the subject's appearance and incorporate accurate descriptors.
|
| 390 |
+
|
| 391 |
+
Focus on creating a visually striking image that captures the subject's personality and style. Use dynamic camera angles and poses if appropriate. Use a photorealistic style. Resolution 1792x1024.
|
| 392 |
|
| 393 |
+
You need to enhance the following prompt according to the guide above. Only output the prompt, no other text.
|
| 394 |
+
{{ user_prompt }}
|
| 395 |
+
""")
|
| 396 |
+
'''
|
| 397 |
+
|
| 398 |
+
def process_prompt(image, image2, target_label, user_prompt, style):
|
| 399 |
image_url = None
|
| 400 |
+
image_url2 = None
|
| 401 |
+
|
| 402 |
+
if image is not None:
|
| 403 |
+
buffer = BytesIO()
|
| 404 |
+
image.convert("RGB").save(buffer, format="JPEG", quality=90)
|
| 405 |
+
b64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
|
| 406 |
+
image_url = f"data:image/jpeg;base64,{b64_image}"
|
| 407 |
+
|
| 408 |
+
if image2 is not None:
|
| 409 |
+
buffer = BytesIO()
|
| 410 |
+
image2.convert("RGB").save(buffer, format="JPEG", quality=90)
|
| 411 |
+
b64_image2 = base64.b64encode(buffer.getvalue()).decode("utf-8")
|
| 412 |
+
image_url2 = f"data:image/jpeg;base64,{b64_image2}"
|
| 413 |
|
| 414 |
+
if style == "General":
|
| 415 |
system_content = "You are an expert prompt engineer"
|
| 416 |
user_content = GENERAL_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 417 |
|
|
|
|
| 419 |
system_content = "You are an expert prompt engineer"
|
| 420 |
user_content = FASHION_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 421 |
|
| 422 |
+
elif style == "Emotional Lifestyle":
|
| 423 |
+
system_content = "You are an expert prompt engineer"
|
| 424 |
+
user_content = EMOTIONAL_LIFESTYLE_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 425 |
+
|
| 426 |
+
elif style == "Extreme Sports":
|
| 427 |
+
system_content = "You are an expert prompt engineer"
|
| 428 |
+
user_content = EXTREME_SPORTS_PROMPT_TEMPLATE.render(user_prompt=user_prompt)
|
| 429 |
|
| 430 |
response = client.responses.create(
|
| 431 |
model="gpt-5",
|
|
|
|
| 439 |
"role": "user",
|
| 440 |
"content": [
|
| 441 |
{"type": "input_text", "text": user_content},
|
| 442 |
+
*([{"type": "input_image", "image_url": image_url}] if image_url else []),  # skip when no image was uploaded
|
| 443 |
+
*([{"type": "input_image", "image_url": image_url2}] if image_url2 else [])  # skip when no 2nd image was uploaded
|
| 444 |
]
|
| 445 |
}
|
| 446 |
],
|
|
|
|
| 450 |
demo = gr.Interface(
|
| 451 |
fn=process_prompt,
|
| 452 |
inputs=[
|
| 453 |
+
|
| 454 |
+
gr.Image(
|
| 455 |
+
label="Upload reference image",
|
| 456 |
+
type="pil"
|
| 457 |
+
),
|
| 458 |
gr.Image(
|
| 459 |
+
label="Upload 2nd reference image",
|
| 460 |
+
type="pil"
|
| 461 |
),
|
| 462 |
gr.Textbox(
|
| 463 |
label="Enter target label",
|
|
|
|
| 468 |
placeholder="picture of me while sitting in a chair in the ocean",
|
| 469 |
),
|
| 470 |
gr.Dropdown(
|
| 471 |
+
choices=["General", "Fashion", "Emotional Lifestyle", "Extreme Sports"],
|
| 472 |
#choices=["Chromatic Cinematic", "Neon Noir", "General"],
|
| 473 |
label="Style Selection",
|
| 474 |
info="Choose the visual style for your enhanced prompt"
|