Got it—here’s a tight, production-oriented “master prompt + plan” for your icon-generation subsystem, built around **Gemini (to write the per-word prompts)** and **ComfyUI (to render the images)**. I’ve split it into discrete tasks so each can be delivered independently.
# Master Prompt (used by Gemini for each word)
Use this as your **system**/**instruction** prompt to Gemini. It outputs a structured JSON that your backend can feed into ComfyUI (or other providers). It follows Google’s prompt-design guidance: clear role, constraints, schema’d output, and stepwise reasoning. ([Google AI for Developers][1])
```
You are an icon prompt engineer. Produce a concise, high-signal prompt set for generating a single, realistic skeuomorphic 3D application icon.
### Input variables
- WORD_TR: Turkish headword (string), e.g., "ceket"
- WORD_AR: Arabic translation (string), e.g., "معطف"
- CATEGORY: semantic category (string), e.g., "Clothing"
- CEFR_LEVEL: A1–C2 (string)
- DIFFICULTY: 1–5 (integer)
- USAGE_TR: short Turkish example sentence (string)
- USAGE_AR: short Arabic example sentence (string)
- STYLE_HINTS: optional array of tags (strings), e.g., ["friendly", "kids", "polished"]
### Icon style requirements (skeuomorphic 3D)
- Visual style: realistic skeuomorphic app icon (polished, friendly, rounded proportions)
- Materials: glossy surfaces, soft gradients, subtle highlights, smooth shading, realistic textures
- Lighting: consistent top-left key light; soft shadow beneath; studio feel
- Composition: single centered object, sharp focus, no background clutter
- Background: plain very-light gray or white
- No text, no letters, no watermarks, no UI chrome, no borders
- Fit for small sizes (app icon); emphasize clean silhouette and recognizability
### Cultural/semantic correctness
- Depict the object that best represents WORD_TR in CATEGORY
- If ambiguity exists, choose the most common, school-appropriate meaning for CEFR_LEVEL
- Avoid religious/political symbols and unsafe content
### Output format (strict JSON)
Return a single JSON object with:
{
"positivePrompt": "string (<= 500 chars). Rich visual description tailored to the icon. Include material, lighting, camera, silhouette clarity. DO NOT include negative prompt terms here.",
"negativePrompt": "string (<= 400 chars). List visual errors to avoid (see defaults).",
"metadata": {
"word_tr": "WORD_TR",
"word_ar": "WORD_AR",
"category": "CATEGORY",
"cefr_level": "CEFR_LEVEL",
"difficulty": DIFFICULTY,
"usage_tr": "USAGE_TR",
"usage_ar": "USAGE_AR",
"style_tags": ["..."],
"sizing": {"width": 1024, "height": 1024},
"safety": {"nsfw": false, "brand_logos": false, "text": false}
}
}
### Negative prompt defaults (merge with any data-specific items)
blurry, noisy, low-res, jpeg artifacts, harsh rim light, busy background, text, captions, letters, watermark, logo, multiple objects, cut-off, cropped, extra limbs/parts, over-saturated, flat lighting, extreme perspective, dramatic tilt, grunge, matte painting, anime, lineart, isometric UI, pixel art
### Reasoning (internal, do NOT include in output)
1) Identify the concrete physical object that best teaches WORD_TR at CEFR_LEVEL.
2) Select materials & surface details that read clearly at small sizes.
3) Describe silhouette & key features to maximize recognizability.
4) Enforce constraints (lighting, background, no text/watermarks).
5) Keep the prompt compact and production-ready.
Return ONLY the JSON.
```
---
# ComfyUI Contract (what your renderer expects)
Your backend will take Gemini’s JSON and execute a saved **ComfyUI workflow** (exported in “API format”) by sending it with `POST http://127.0.0.1:8188/prompt`. The WebSocket endpoint `/ws` streams queue and progress events; it is used to track a job, not to submit one. Pass the `positivePrompt`/`negativePrompt`, sizing, and any LoRA/model settings as workflow inputs. ComfyUI workflows saved in API format maintain a stable node graph you can fill with variables. ([9elements][2])
**Optional:** If you prefer a schema-driven prompt node in-graph, consider the **PromptJSON** custom node to accept the Gemini JSON and map fields to Conditioning nodes. ([GitHub][3])
> Why this contract? ComfyUI’s API pattern is: **save a workflow in API format → send JSON to `/prompt` → poll/stream results**. This is the most reliable way to separate “prompt writing” (Gemini) from “image rendering” (ComfyUI). Multiple community write-ups document this workflow. ([9elements][2])
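The save, send, poll loop described above could be sketched as follows. This is a minimal sketch, not a definitive client: `renderWorkflow`, `extractImages`, and the option names are hypothetical, and the `/history/{prompt_id}` route and the shape of its response are assumptions based on ComfyUI's bundled API examples, so check them against your ComfyUI version.

```javascript
// Collect every image descriptor ({ filename, subfolder, type }) from one
// history entry. The entry shape is an assumption; verify it for your version.
function extractImages(historyEntry) {
  const images = [];
  for (const nodeOutput of Object.values(historyEntry.outputs ?? {})) {
    for (const image of nodeOutput.images ?? []) {
      images.push(image);
    }
  }
  return images;
}

// Queue an API-format workflow, then poll until ComfyUI records it in history.
async function renderWorkflow(workflow, options = {}) {
  const {
    baseUrl = "http://127.0.0.1:8188",
    clientId = "icon-gen-ui",
    fetchImpl = fetch, // injectable so the function can be exercised offline
    pollMs = 1000,
    maxPolls = 120,
  } = options;

  const queued = await fetchImpl(`${baseUrl}/prompt`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: workflow, client_id: clientId }),
  });
  if (!queued.ok) {
    throw new Error(`ComfyUI rejected the workflow (HTTP ${queued.status})`);
  }
  const { prompt_id: promptId } = await queued.json();

  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const response = await fetchImpl(`${baseUrl}/history/${promptId}`);
    const history = await response.json();
    if (history[promptId]) {
      return extractImages(history[promptId]);
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }
  throw new Error(`prompt ${promptId} did not finish after ${maxPolls} polls`);
}
```

Polling keeps the sketch short; a production adapter would more likely listen on `/ws` for completion events and fall back to polling only on disconnect.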
---
# Tasks (deliverable in independent slices)
**T1 — Icon Prompt Schema & Validator**
* Define the Gemini output JSON schema (above).
* Implement server-side validation (e.g., Zod/JSON Schema).
* Acceptance: invalid or oversized prompts are rejected with actionable errors.
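A minimal sketch of the T1 validator in plain JavaScript, assuming no schema library is installed. `validateIconPrompt` and its error strings are hypothetical; the field names and the 500/400 character limits come from the master prompt. A Zod or JSON Schema definition would express the same checks declaratively.

```javascript
const CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"];
const PROMPT_LIMITS = { positivePrompt: 500, negativePrompt: 400 };

// Returns a list of actionable error messages; an empty list means "valid".
function validateIconPrompt(payload) {
  if (payload === null || typeof payload !== "object" || Array.isArray(payload)) {
    return ["payload must be a JSON object"];
  }
  const errors = [];

  // Prompt strings must exist and respect the limits from the master prompt.
  for (const [field, max] of Object.entries(PROMPT_LIMITS)) {
    const value = payload[field];
    if (typeof value !== "string" || value.trim() === "") {
      errors.push(`${field} must be a non-empty string`);
    } else if (value.length > max) {
      errors.push(`${field} is ${value.length} chars; the limit is ${max}`);
    }
  }

  const meta = payload.metadata;
  if (meta === null || typeof meta !== "object") {
    errors.push("metadata must be an object");
    return errors;
  }
  if (!CEFR_LEVELS.includes(meta.cefr_level)) {
    errors.push(`metadata.cefr_level must be one of ${CEFR_LEVELS.join(", ")}`);
  }
  if (!Number.isInteger(meta.difficulty) || meta.difficulty < 1 || meta.difficulty > 5) {
    errors.push("metadata.difficulty must be an integer from 1 to 5");
  }
  const sizing = meta.sizing ?? {};
  for (const side of ["width", "height"]) {
    if (!Number.isInteger(sizing[side]) || sizing[side] <= 0) {
      errors.push(`metadata.sizing.${side} must be a positive integer`);
    }
  }
  return errors;
}
```

Returning every error at once, rather than throwing on the first, lets the caller feed the full list back to Gemini on a retry.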
**T2 — Gemini Prompting Module**
* Implement the **master system prompt** and per-word templating.
* Add temperature/top-p defaults and retries; log prompts & tokens.
* Acceptance: for a sample of 20 words, outputs valid JSON 100% of the time.
* (Uses Google’s prompt-strategy guidance: structure, role, constraints.) ([Google AI for Developers][1])
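The per-word templating and retry behaviour in T2 might look like the sketch below. It deliberately does not call any Gemini SDK: `callModel` is an injected, hypothetical async function `(systemPrompt, userInput) => string` that you would implement with whichever client you use. `buildWordInput`, `parseModelJson`, and `generatePrompt` are illustrative names as well.

```javascript
// Render the per-word input block that accompanies the master system prompt.
// JSON.stringify quotes and escapes each string value safely.
function buildWordInput(word) {
  return [
    `WORD_TR: ${JSON.stringify(word.word_tr)}`,
    `WORD_AR: ${JSON.stringify(word.word_ar)}`,
    `CATEGORY: ${JSON.stringify(word.category)}`,
    `CEFR_LEVEL: ${JSON.stringify(word.cefr_level)}`,
    `DIFFICULTY: ${word.difficulty}`,
    `USAGE_TR: ${JSON.stringify(word.usage_tr)}`,
    `USAGE_AR: ${JSON.stringify(word.usage_ar)}`,
    `STYLE_HINTS: ${JSON.stringify(word.style_tags ?? [])}`,
  ].join("\n");
}

// Parse model text as JSON, tolerating an optional Markdown code fence
// around the object (models sometimes add one despite instructions).
function parseModelJson(text) {
  const FENCE = "`".repeat(3);
  const fenced = text.match(new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE));
  const raw = (fenced ? fenced[1] : text).trim();
  return JSON.parse(raw);
}

// Retry until the model returns parseable JSON or attempts run out.
async function generatePrompt(callModel, systemPrompt, word, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const text = await callModel(systemPrompt, buildWordInput(word));
      return parseModelJson(text);
    } catch (err) {
      lastError = err;
    }
  }
  throw new Error(`no valid JSON after ${maxAttempts} attempts: ${lastError.message}`);
}
```

Schema validation (T1) would run on the parsed object inside the same retry loop, so a structurally invalid answer also triggers another attempt.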
**T3 — ComfyUI Workflow (API Format)**
* Build a single-image workflow (SDXL or your chosen model) with inputs for: positive/negative prompt, width/height, seed, steps, cfg, sampler, and output filename.
* Save in API format; document all input keys.
* Acceptance: a cURL `POST` to `/prompt` with test inputs queues the job, and the rendered PNG appears in ComfyUI’s output folder. ([9elements][2])
**T4 — Renderer Adapter (ComfyUI default)**
* Map Gemini JSON → ComfyUI inputs; call `POST /prompt`; capture image path/bytes.
* Support **seed control** and **determinism** toggles for regeneration.
* Acceptance: with a fixed seed and unchanged model, sampler, and settings, regeneration reproduces the same image on the same hardware and software stack. (Basic API usage covered in public guides.) ([DEV Community][4])
**T5 — Negative-Prompt Library for Icons**
* Centralize negative terms (from the master prompt) and allow per-category overrides (e.g., avoid “fabric folds” for non-textiles).
* Acceptance: swapping category adjusts the negative prompt set automatically.
* (Grounded in skeuomorphic icon guidance—focus on realism, lighting, texture.) ([Nielsen Norman Group][5])
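One possible shape for the T5 library, as a sketch. `BASE_NEGATIVE` is a subset of the master prompt's defaults; the `CATEGORY_EXTRAS` table and its category names are hypothetical placeholders, with “fabric folds” taken from the example above.

```javascript
// Subset of the negative-prompt defaults from the master prompt.
const BASE_NEGATIVE = [
  "blurry", "noisy", "low-res", "text", "watermark", "logo",
  "multiple objects", "cropped",
];

// Hypothetical per-category additions; non-textile categories avoid fabric folds.
const CATEGORY_EXTRAS = {
  Food: ["fabric folds"],
  Animals: ["fabric folds"],
};

// Merge base terms, category terms, and per-word extras; de-duplicate
// case-insensitively; trim from the end to stay within the length limit.
function buildNegativePrompt(category, extra = [], maxChars = 400) {
  const seen = new Set();
  const terms = [];
  const candidates = [...BASE_NEGATIVE, ...(CATEGORY_EXTRAS[category] ?? []), ...extra];
  for (const candidate of candidates) {
    const term = candidate.trim();
    const key = term.toLowerCase();
    if (key === "" || seen.has(key)) continue;
    seen.add(key);
    terms.push(term);
  }
  let prompt = terms.join(", ");
  while (prompt.length > maxChars && terms.length > 0) {
    terms.pop(); // later (more specific) terms are dropped first
    prompt = terms.join(", ");
  }
  return prompt;
}
```

Because the category is just a lookup key, swapping it changes the negative set without touching call sites, which is what the acceptance criterion asks for.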
**T6 — Provider Abstraction**
* Keep ComfyUI as default; define a common interface (`generateIcon(job)`) so you can later add OpenAI or others without touching UI/state.
* Guardrails: block queueing if provider credentials are missing/invalid.
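The provider interface and its credential guardrail could be sketched like this. Only `generateIcon(job)` comes from the task description; `createIconService`, `checkProvider`, and the `isConfigured`/`generate` provider methods are assumed names for illustration.

```javascript
// Guardrail: decide whether a job may be queued for the named provider.
function checkProvider(providers, name) {
  const known = Object.prototype.hasOwnProperty.call(providers, name);
  if (!known) {
    return { ok: false, reason: `unknown provider "${name}"` };
  }
  if (!providers[name].isConfigured()) {
    return { ok: false, reason: `provider "${name}" has missing or invalid credentials` };
  }
  return { ok: true, provider: providers[name] };
}

// Common entry point: UI and queue code only ever call generateIcon(job).
function createIconService(providers, defaultProvider = "comfyui") {
  return {
    async generateIcon(job) {
      const check = checkProvider(providers, job.provider ?? defaultProvider);
      if (!check.ok) {
        throw new Error(check.reason);
      }
      return check.provider.generate(job);
    },
  };
}
```

Adding another backend then means registering one more object with the same two methods; nothing that calls `generateIcon` changes.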
**T7 — Job Queue + Idempotency**
* Enqueue word/category icon jobs; sequential or small batches.
* Store a **content hash** of (word, style, size, seed) to skip duplicates unless “Regenerate” is requested.
* Acceptance: re-enqueueing same payload is a no-op unless forced.
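A sketch of the de-duplication logic in T7. To stay dependency-free it uses a canonical string as the key instead of a digest; in a real service you would likely hash that string (for example with SHA-256) before persisting it. `canonicalJobKey`, `IconJobQueue`, and the job field names are assumptions.

```javascript
// Build a canonical key from (word, style, size, seed). Tags are sorted so
// that ["a","b"] and ["b","a"] describe the same job.
function canonicalJobKey(job) {
  const tags = [...(job.style_tags ?? [])]
    .map((tag) => tag.trim())
    .filter(Boolean)
    .sort();
  return JSON.stringify([
    job.word_tr.trim().normalize("NFC"),
    tags,
    job.width,
    job.height,
    job.seed,
  ]);
}

// In-memory queue that skips payloads it has already accepted.
class IconJobQueue {
  constructor() {
    this.seen = new Set();
    this.pending = [];
  }

  // Returns true if the job was queued, false if it was a duplicate.
  enqueue(job, { force = false } = {}) {
    const key = canonicalJobKey(job);
    if (this.seen.has(key) && !force) {
      return false;
    }
    this.seen.add(key);
    this.pending.push({ key, job });
    return true;
  }
}
```

Passing `{ force: true }` corresponds to the “Regenerate” action: the same payload is queued again even though its key is already known.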
**T8 — Storage & Caching**
* On success, persist: `wordIconKey`, `wordIconUrl`, generation metadata (seed, model, sampler).
* Add a CDN-friendly path scheme: `/icons/{categorySlug}/{wordId}-{seed}.png`.
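The path scheme above can be generated with a small helper. This is a sketch: `slugify` and `iconPath` are hypothetical names, and the slug rules (ASCII lowercase, hyphen-separated) are an assumption rather than something the plan specifies.

```javascript
// Lowercase ASCII slug: strip combining accents, collapse other characters
// to hyphens. Letters with no ASCII decomposition (e.g. Turkish dotless i)
// become hyphens, so non-English category names may need a lookup table.
function slugify(text) {
  return text
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build /icons/{categorySlug}/{wordId}-{seed}.png
function iconPath(category, wordId, seed) {
  const slug = slugify(category);
  if (slug === "") {
    throw new Error(`category "${category}" produces an empty slug`);
  }
  return `/icons/${slug}/${encodeURIComponent(String(wordId))}-${seed}.png`;
}
```

Including the seed in the filename means a regenerated icon gets a new URL, so CDN caches never serve a stale image under the old path.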
**T9 — UI Hooks**
* **Words Table**: Generate/Regenerate/Download per row; bulk generate for filtered set; disabled if queue running or keys invalid.
* **Categories List**: Generate category icons; apply to all words in category.
* **Icon Preview**: Show image + metadata; copy URL; apply to word/category.
**T10 — Safety & Abuse Controls**
* Block text/watermarks/logos; filter forbidden terms before sending to ComfyUI.
* Note: LLM prompt-injection risks exist in broader pipelines—keep generation inputs controlled (don’t pass untrusted HTML). ([TechRadar][6])
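A minimal sketch of the pre-render screening in T10. The term list is a hypothetical starting point, not a vetted safety policy, and `screenPositivePrompt` and `stripMarkup` are illustrative names. The screen applies to the positive prompt only, since the negative prompt legitimately contains words such as “logo”.

```javascript
// Hypothetical deny-list for positive prompts; extend it to match your policy.
const FORBIDDEN_TERMS = ["logo", "watermark", "caption", "nsfw", "gore"];

// Escape regex metacharacters so terms are matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Report forbidden terms that appear as whole words (case-insensitive).
function screenPositivePrompt(prompt, forbidden = FORBIDDEN_TERMS) {
  const hits = forbidden.filter((term) =>
    new RegExp(`\\b${escapeRegExp(term)}\\b`, "i").test(prompt)
  );
  return { ok: hits.length === 0, hits };
}

// Crude tag stripper for form input, so markup never reaches the prompt.
// It is not a sanitizer for rendering HTML; it only removes angle-bracket tags.
function stripMarkup(input) {
  return input.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}
```

Whole-word matching avoids false positives such as “logotype”, at the cost of missing deliberate obfuscation; treat it as a first filter in front of the renderer rather than a complete defence.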
**T11 — Telemetry & Tracing**
* Log `request_id`, provider, latency, seed, errors; expose `/api/health` and job stats.
**T12 — Regression Pack**
* Golden prompts for ~30 words (varied categories).
* Snapshot tests compare SSIM/perceptual hash to detect major drift.
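The perceptual-hash half of the drift check might be sketched as below, using an average hash (aHash), one of the simplest perceptual hashes. The sketch assumes each image has already been decoded, converted to grayscale, and downscaled to a small fixed grid (for example 8×8 = 64 values) by an image library of your choice; the function names and the 10-bit threshold are hypothetical. SSIM would need a fuller image-processing dependency and is not shown.

```javascript
// Average hash: one bit per pixel, set when the pixel is brighter than the mean.
function averageHash(grayPixels) {
  const mean = grayPixels.reduce((sum, v) => sum + v, 0) / grayPixels.length;
  return grayPixels.map((v) => (v > mean ? 1 : 0));
}

// Number of differing bits between two equal-length hashes.
function hammingDistance(hashA, hashB) {
  if (hashA.length !== hashB.length) {
    throw new Error("hashes must have the same length");
  }
  let distance = 0;
  for (let i = 0; i < hashA.length; i++) {
    if (hashA[i] !== hashB[i]) distance++;
  }
  return distance;
}

// Flag a candidate render whose hash differs from the golden image by more
// than maxBits. The threshold is a tunable assumption, not a standard value.
function hasDrifted(goldenGray, candidateGray, maxBits = 10) {
  const distance = hammingDistance(averageHash(goldenGray), averageHash(candidateGray));
  return distance > maxBits;
}
```

A uniform brightness shift leaves the hash unchanged because every pixel and the mean move together, so the check reacts to changes in shape and composition rather than exposure.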
---
# Example: Gemini → ComfyUI payload mapping
**(A) Gemini output (for “ceket / معطف”, Clothing, A1):**
```json
{
"positivePrompt": "realistic skeuomorphic 3D app icon of a single white coat hanger holding a neatly folded light jacket, glossy fabric with subtle stitching, soft studio lighting from top-left, gentle specular highlights, smooth shading, centered object, sharp focus, plain light-gray background, friendly rounded proportions, polished look",
"negativePrompt": "text, letters, watermark, logos, multiple objects, cluttered background, blurry, noisy, low-res, harsh rim light, extreme perspective, dramatic tilt, grunge, matte painting, anime, lineart, isometric UI, pixel art, cut-off, cropped",
"metadata": {
"word_tr": "ceket",
"word_ar": "معطف",
"category": "Clothing",
"cefr_level": "A1",
"difficulty": 2,
"usage_tr": "Ceketimi dolaba astım.",
"usage_ar": "علّقتُ معطفي في الخزانة.",
"style_tags": ["friendly","polished"],
"sizing": {"width": 1024, "height": 1024},
"safety": {"nsfw": false, "brand_logos": false, "text": false}
}
}
```
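**(B) ComfyUI `/prompt` (sketch of inputs for your saved workflow; in a real API-format export each key is a node ID, each node also carries a `class_type` field, and the placeholder comment must be removed because JSON does not allow comments):**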
```json
{
"prompt": {
"CLIPTextEncodePositive": {
"inputs": {
"text": "{positivePrompt}"
}
},
"CLIPTextEncodeNegative": {
"inputs": {
"text": "{negativePrompt}"
}
},
"KSampler": {
"inputs": {
"seed": 123456789,
"steps": 30,
"cfg": 5.5,
"sampler_name": "dpmpp_2m",
"scheduler": "karras"
}
},
"EmptyLatentImage": {
"inputs": {
"width": 1024,
"height": 1024
}
}
/* ... your model/LoRA nodes and SaveImage node ... */
},
"client_id": "icon-gen-ui",
"extra_data": { "request_id": "..." }
}
```
> Save your workflow in **API format** and document the exact node keys you want to substitute—this is the stable bridge your server fills in at runtime. ([9elements][2])
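Filling the saved workflow at runtime could look like the sketch below. It assumes an API-format export in which each top-level key is a node ID holding `class_type` and `inputs`; `fillWorkflow` and the `bindings` table (which node ID plays which role) are hypothetical and would come from your own documented node keys.

```javascript
// Return a copy of the API-format template with Gemini's fields substituted.
// bindings maps roles to node IDs, e.g. { positive: "6", negative: "7", ... }.
function fillWorkflow(template, bindings, geminiJson, seed) {
  // Deep copy so the saved template is never mutated between jobs.
  const workflow = JSON.parse(JSON.stringify(template));

  const setInput = (nodeId, key, value) => {
    if (!workflow[nodeId] || !workflow[nodeId].inputs) {
      throw new Error(`workflow has no node "${nodeId}" with inputs`);
    }
    workflow[nodeId].inputs[key] = value;
  };

  setInput(bindings.positive, "text", geminiJson.positivePrompt);
  setInput(bindings.negative, "text", geminiJson.negativePrompt);
  setInput(bindings.latent, "width", geminiJson.metadata.sizing.width);
  setInput(bindings.latent, "height", geminiJson.metadata.sizing.height);
  setInput(bindings.sampler, "seed", seed);
  return workflow;
}
```

Failing loudly on a missing node ID catches the common case where the workflow was re-exported and its node numbering changed.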
---
# Notes on style correctness
Skeuomorphic icons benefit from: **material realism, clean silhouette, controlled lighting, and zero text**. The prompt above encodes those constraints so the outputs read crisply at small sizes. (General skeuomorphic/icon guidelines referenced.) ([Nielsen Norman Group][5])
---
If you want, I can also supply a **ready-to-import ComfyUI workflow (API format)** with plac
- index.html +43 -14
@@ -156,11 +156,21 @@
  <input type="text" id="style_tags" class="w-full px-4 py-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent" placeholder="e.g., friendly, kids, polished">
  </div>

+ <div>
+ <label class="block text-gray-700 font-medium mb-2">Turkish Example Sentence</label>
+ <input type="text" id="usage_tr" class="w-full px-4 py-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent" placeholder="e.g., Ceketimi dolaba astım.">
+ </div>
+
+ <div>
+ <label class="block text-gray-700 font-medium mb-2">Arabic Example Sentence</label>
+ <input type="text" id="usage_ar" class="w-full px-4 py-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent" placeholder="e.g., علّقتُ معطفي في الخزانة.">
+ </div>
+
  <button type="submit" class="w-full py-4 bg-gradient-to-r from-purple-600 to-indigo-600 text-white font-semibold rounded-lg shadow-md hover:from-purple-700 hover:to-indigo-700 transform hover:scale-[1.02] transition-all duration-300">
  Generate Icon
  </button>
  </form>
-
+ </div>

  <div class="md:w-1/2 bg-gradient-to-br from-purple-500 to-indigo-600 p-8 flex flex-col items-center justify-center">
  <div class="text-center mb-8">
@@ -296,7 +306,6 @@
  // Form submission handler
  document.getElementById('iconForm').addEventListener('submit', function(e) {
  e.preventDefault();
-
  // Get form values
  const word_tr = document.getElementById('word_tr').value;
  const word_ar = document.getElementById('word_ar').value;
@@ -304,22 +313,42 @@
  const cefr_level = document.getElementById('cefr_level').value;
  const difficulty = document.getElementById('difficulty').value;
  const style_tags = document.getElementById('style_tags').value.split(',').map(tag => tag.trim()).filter(tag => tag);
+ const usage_tr = document.getElementById('usage_tr').value;
+ const usage_ar = document.getElementById('usage_ar').value;
+
+ // Build request body
+ const body = {
+ word_tr,
+ word_ar,
+ category,
+ cefr_level,
+ difficulty: parseInt(difficulty),
+ usage_tr,
+ usage_ar,
+ style_tags
+ };

- // In a real app, this would send to your backend
- // For demo, we'll just show a loading state and then a placeholder
  const preview = document.getElementById('icon-preview');
  preview.innerHTML = '<div class="animate-pulse-slow text-gray-500">Generating...</div>';

- //
-
-
-
-
-
-
-
-
-
+ // Call backend
+ fetch('/api/generate-icon', {
+ method: 'POST',
+ headers: { 'Content-Type': 'application/json' },
+ body: JSON.stringify(body)
+ })
+ .then(r => r.json())
+ .then(data => {
+ if (data.url) {
+ preview.innerHTML = `<img src="${data.url}" class="w-full h-full object-contain rounded-xl" alt="Generated icon">`;
+ } else {
+ throw new Error(data.error || 'Unknown error');
+ }
+ })
+ .catch(err => {
+ preview.innerHTML = `<div class="text-red-500 text-sm text-center">${err.message}</div>`;
+ });
+ });

  // Scroll functions
  function scrollToGenerator() {