Commit 301ae18 by fancyfeast (1 parent: 40e61c6)
Tweak UI and guide
app.py
CHANGED
|
@@ -27,13 +27,9 @@ DESCRIPTION = """
|
|
| 27 |
<h2>Quick-start</h2>
|
| 28 |
<ol>
|
| 29 |
<li><strong>Upload or drop</strong> an image in the left-hand panel.</li>
|
| 30 |
-
<li>Pick a <strong>Caption Type</strong> and, if you wish, adjust the
|
| 31 |
-
|
| 32 |
-
<li>(Optional)
|
| 33 |
-
– these add or remove specific details in the caption.</li>
|
| 34 |
-
<li>(Optional) expand <em>Generation settings</em> to tune
|
| 35 |
-
<code>temperature</code>, <code>top-p</code>, or
|
| 36 |
-
<code>max tokens</code>.</li>
|
| 37 |
<li>Press <kbd>Caption</kbd>.
|
| 38 |
The prompt sent to the model appears in the <em>Prompt</em> box (editable),
|
| 39 |
and the resulting caption streams into the <em>Caption</em> box.</li>
|
|
@@ -50,21 +46,20 @@ DESCRIPTION = """
|
|
| 50 |
<tr><td><strong>Straightforward</strong></td>
|
| 51 |
<td>Objective, no fluff, and more succinct than Descriptive.</td></tr>
|
| 52 |
<tr><td><strong>Stable Diffusion Prompt</strong></td>
|
| 53 |
-
<td>Reverse-engineers a prompt that could have produced the image in a
|
| 54 |
-
SD/T2I model.</td></tr>
|
| 55 |
<tr><td><strong>MidJourney</strong></td>
|
| 56 |
-
<td>Same idea as above but tuned to MidJourney’s prompt style.</td></tr>
|
| 57 |
<tr><td><strong>Danbooru tag list</strong></td>
|
| 58 |
<td>Comma-separated tags strictly following Danbooru conventions
|
| 59 |
-
(artist:, copyright:, etc.). Lower-case underscores only
|
| 60 |
<tr><td><strong>e621 tag list</strong></td>
|
| 61 |
<td>Alphabetical, namespaced tags in e621 style – includes species/meta
|
| 62 |
-
tags when relevant
|
| 63 |
<tr><td><strong>rul34 tag list</strong></td>
|
| 64 |
<td>Rule34 style alphabetical tag dump; artist/copyright/character
|
| 65 |
-
prefixes first
|
| 66 |
<tr><td><strong>Booru-like tag list</strong></td>
|
| 67 |
-
<td>Looser tag list when you want labels but not a specific Booru format
|
| 68 |
<tr><td><strong>Art Critic</strong></td>
|
| 69 |
<td>Paragraph of art-historical commentary: composition, symbolism, style,
|
| 70 |
lighting, movement, etc.</td></tr>
|
|
@@ -74,6 +69,12 @@ DESCRIPTION = """
|
|
| 74 |
<td>Catchy caption aimed at platforms like Instagram or BlueSky.</td></tr>
|
| 75 |
</table>
|
| 76 |
|
| 77 |
<!-- ───────────────────── Extras + generation notes ───────────────── -->
|
| 78 |
<h3>Extra Options</h3>
|
| 79 |
<p>These check-boxes fine-tune what the model should or should not mention:
|
|
@@ -267,38 +268,39 @@ with gr.Blocks() as demo:
|
|
| 267 |
value="long",
|
| 268 |
)
|
| 269 |
|
| 270-301 |
-
[32 removed lines; their content was not captured in this view]
|
| 302 |
|
| 303 |
name_input = gr.Textbox(label="Person / Character Name")
|
| 304 |
|
|
|
|
| 27 |
<h2>Quick-start</h2>
|
| 28 |
<ol>
|
| 29 |
<li><strong>Upload or drop</strong> an image in the left-hand panel.</li>
|
| 30 |
+
<li>Pick a <strong>Caption Type</strong> and, if you wish, adjust the <strong>Caption Length</strong>.</li>
|
| 31 |
+
<li>(Optional) <em>expand the "Extra Options" accordion</em> and tick any boxes that should influence the caption.</li>
|
| 32 |
+
<li>(Optional) open <em>Generation settings</em> to adjust <code>temperature</code>, <code>top-p</code>, or <code>max tokens</code>.</li>
|
| 33 |
<li>Press <kbd>Caption</kbd>.
|
| 34 |
The prompt sent to the model appears in the <em>Prompt</em> box (editable),
|
| 35 |
and the resulting caption streams into the <em>Caption</em> box.</li>
|
|
|
|
| 46 |
<tr><td><strong>Straightforward</strong></td>
|
| 47 |
<td>Objective, no fluff, and more succinct than Descriptive.</td></tr>
|
| 48 |
<tr><td><strong>Stable Diffusion Prompt</strong></td>
|
| 49 |
+
<td>Reverse-engineers a prompt that could have produced the image in a SD/T2I model.<br><em>⚠︎ Experimental – can glitch ≈ 3 % of the time.</em></td></tr>
|
|
|
|
| 50 |
<tr><td><strong>MidJourney</strong></td>
|
| 51 |
+
<td>Same idea as above but tuned to MidJourney’s prompt style.<br><em>⚠︎ Experimental – can glitch ≈ 3 % of the time.</em></td></tr>
|
| 52 |
<tr><td><strong>Danbooru tag list</strong></td>
|
| 53 |
<td>Comma-separated tags strictly following Danbooru conventions
|
| 54 |
+
(artist:, copyright:, etc.). Lower-case underscores only.<br><em>⚠︎ Experimental – can glitch ≈ 3 %.</em></td></tr>
|
| 55 |
<tr><td><strong>e621 tag list</strong></td>
|
| 56 |
<td>Alphabetical, namespaced tags in e621 style – includes species/meta
|
| 57 |
+
tags when relevant.<br><em>⚠︎ Experimental – can glitch ≈ 3 %.</em></td></tr>
|
| 58 |
<tr><td><strong>rul34 tag list</strong></td>
|
| 59 |
<td>Rule34 style alphabetical tag dump; artist/copyright/character
|
| 60 |
+
prefixes first.<br><em>⚠︎ Experimental – can glitch ≈ 3 %.</em></td></tr>
|
| 61 |
<tr><td><strong>Booru-like tag list</strong></td>
|
| 62 |
+
<td>Looser tag list when you want labels but not a specific Booru format.<br><em>⚠︎ Experimental – can glitch ≈ 3 %.</em></td></tr>
|
| 63 |
<tr><td><strong>Art Critic</strong></td>
|
| 64 |
<td>Paragraph of art-historical commentary: composition, symbolism, style,
|
| 65 |
lighting, movement, etc.</td></tr>
|
|
|
|
| 69 |
<td>Catchy caption aimed at platforms like Instagram or BlueSky.</td></tr>
|
| 70 |
</table>
|
| 71 |
|
| 72 |
+
<p style="margin-top:0.6em">
|
| 73 |
+
<strong>Note on Booru modes:</strong> They’re tuned for
|
| 74 |
+
anime-style / illustration imagery; accuracy drops on real-world photographs
|
| 75 |
+
or highly abstract artwork.
|
| 76 |
+
</p>
|
| 77 |
+
|
| 78 |
<!-- ───────────────────── Extras + generation notes ───────────────── -->
|
| 79 |
<h3>Extra Options</h3>
|
| 80 |
<p>These check-boxes fine-tune what the model should or should not mention:
|
|
|
|
| 268 |
value="long",
|
| 269 |
)
|
| 270 |
|
| 271 |
+
with gr.Accordion("Extra Options", open=False):
|
| 272 |
+
extra_options = gr.CheckboxGroup(
|
| 273 |
+
choices=[
|
| 274 |
+
"If there is a person/character in the image you must refer to them as {name}.",
|
| 275 |
+
"Do NOT include information about people/characters that cannot be changed (like ethnicity, gender, etc), but do still include changeable attributes (like hair style).",
|
| 276 |
+
"Include information about lighting.",
|
| 277 |
+
"Include information about camera angle.",
|
| 278 |
+
"Include information about whether there is a watermark or not.",
|
| 279 |
+
"Include information about whether there are JPEG artifacts or not.",
|
| 280 |
+
"If it is a photo you MUST include information about what camera was likely used and details such as aperture, shutter speed, ISO, etc.",
|
| 281 |
+
"Do NOT include anything sexual; keep it PG.",
|
| 282 |
+
"Do NOT mention the image's resolution.",
|
| 283 |
+
"You MUST include information about the subjective aesthetic quality of the image from low to very high.",
|
| 284 |
+
"Include information on the image's composition style, such as leading lines, rule of thirds, or symmetry.",
|
| 285 |
+
"Do NOT mention any text that is in the image.",
|
| 286 |
+
"Specify the depth of field and whether the background is in focus or blurred.",
|
| 287 |
+
"If applicable, mention the likely use of artificial or natural lighting sources.",
|
| 288 |
+
"Do NOT use any ambiguous language.",
|
| 289 |
+
"Include whether the image is sfw, suggestive, or nsfw.",
|
| 290 |
+
"ONLY describe the most important elements of the image.",
|
| 291 |
+
"If it is a work of art, do not include the artist's name or the title of the work.",
|
| 292 |
+
"Identify the image orientation (portrait, landscape, or square) and aspect ratio if obvious.",
|
| 293 |
+
"""Use vulgar slang and profanity, such as (but not limited to) "fucking," "slut," "cock," etc.""",
|
| 294 |
+
"Do NOT use polite euphemisms—lean into blunt, casual phrasing.",
|
| 295 |
+
"Include information about the ages of any people/characters when applicable.",
|
| 296 |
+
"Mention whether the image depicts an extreme close-up, close-up, medium close-up, medium shot, cowboy shot, medium wide shot, wide shot, or extreme wide shot.",
|
| 297 |
+
"Do not mention the mood/feeling/etc of the image.",
|
| 298 |
+
"Explicitly specify the vantage height (eye-level, low-angle worm’s-eye, bird’s-eye, drone, rooftop, etc.).",
|
| 299 |
+
"If there is a watermark, you must mention it.",
|
| 300 |
+
"""Your response will be used by a text-to-image model, so avoid useless meta phrases like “This image shows…”, "You are looking at...", etc.""",
|
| 301 |
+
],
|
| 302 |
+
label="Select one or more",
|
| 303 |
+
)
|
| 304 |
|
| 305 |
name_input = gr.Textbox(label="Person / Character Name")
|
| 306 |
|
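The diff adds the "Extra Options" checkbox group but does not show how the ticked options reach the model. As a rough illustration only, the sketch below shows one way selected option strings could be appended to a base prompt, with the `{name}` placeholder filled from the "Person / Character Name" textbox. The function `build_prompt`, its parameters, and the base prompt text are hypothetical and are not taken from app.py; the real app may assemble its prompt differently.

```python
def build_prompt(base_prompt: str, extra_options: list[str], name: str = "") -> str:
    """Hypothetical helper: join a base prompt with the ticked extra options.

    Assumption: options are appended in order, separated by single spaces.
    """
    parts = [base_prompt.strip()]
    parts.extend(opt.strip() for opt in extra_options)
    prompt = " ".join(p for p in parts if p)

    # Assumption: when no name is given, the {name} placeholder is left as-is.
    # str.replace is used instead of str.format so that any other braces in
    # the option text are not interpreted as format fields.
    if name.strip():
        prompt = prompt.replace("{name}", name.strip())
    return prompt


if __name__ == "__main__":
    selected = [
        "If there is a person/character in the image you must refer to them as {name}.",
        "Include information about lighting.",
    ]
    print(build_prompt("Write a long descriptive caption for this image.", selected, name="Alice"))
```

In a Gradio app, a function like this would typically receive the `extra_options` CheckboxGroup value (a list of the selected strings) and the `name_input` Textbox value as inputs to an event handler.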