srab1980 committed on
Commit 290b059 · verified · 1 Parent(s): 79f13bd

Got it—here’s a tight, production-oriented “master prompt + plan” for your icon-generation subsystem, built around **Gemini (to write the per-word prompts)** and **ComfyUI (to render the images)**. I’ve split it into discrete tasks so each can be delivered independently.


# Master Prompt (used by Gemini for each word)

Use this as your **system**/**instruction** prompt to Gemini. It outputs a structured JSON that your backend can feed into ComfyUI (or other providers). It follows Google’s prompt-design guidance: clear role, constraints, schema’d output, and stepwise reasoning. ([Google AI for Developers][1])

```
You are an icon prompt engineer. Produce a concise, high-signal prompt set for generating a single, realistic skeuomorphic 3D application icon.

### Input variables
- WORD_TR: Turkish headword (string), e.g., "ceket"
- WORD_AR: Arabic translation (string), e.g., "معطف"
- CATEGORY: semantic category (string), e.g., "Clothing"
- CEFR_LEVEL: A1–C2 (string)
- DIFFICULTY: 1–5 (integer)
- USAGE_TR: short Turkish example sentence (string)
- USAGE_AR: short Arabic example sentence (string)
- STYLE_HINTS: optional array of tags (strings), e.g., ["friendly", "kids", "polished"]

### Icon style requirements (skeuomorphic 3D)
- Visual style: realistic skeuomorphic app icon (polished, friendly, rounded proportions)
- Materials: glossy surfaces, soft gradients, subtle highlights, smooth shading, realistic textures
- Lighting: consistent top-left key light; soft shadow beneath; studio feel
- Composition: single centered object, sharp focus, no background clutter
- Background: plain very-light gray or white
- No text, no letters, no watermarks, no UI chrome, no borders
- Fit for small sizes (app icon); emphasize clean silhouette and recognizability

### Cultural/semantic correctness
- Depict the object that best represents WORD_TR in CATEGORY
- If ambiguity exists, choose the most common, school-appropriate meaning for CEFR_LEVEL
- Avoid religious/political symbols and unsafe content

### Output format (strict JSON)
Return a single JSON object with:
{
"positivePrompt": "string (<= 500 chars). Rich visual description tailored to the icon. Include material, lighting, camera, silhouette clarity. DO NOT include negative prompt terms here.",
"negativePrompt": "string (<= 400 chars). List visual errors to avoid (see defaults).",
"metadata": {
"word_tr": "WORD_TR",
"word_ar": "WORD_AR",
"category": "CATEGORY",
"cefr_level": "CEFR_LEVEL",
"difficulty": DIFFICULTY,
"usage_tr": "USAGE_TR",
"usage_ar": "USAGE_AR",
"style_tags": ["..."],
"sizing": {"width": 1024, "height": 1024},
"safety": {"nsfw": false, "brand_logos": false, "text": false}
}
}

### Negative prompt defaults (merge with any data-specific items)
blurry, noisy, low-res, jpeg artifacts, harsh rim light, busy background, text, captions, letters, watermark, logo, multiple objects, cut-off, cropped, extra limbs/parts, over-saturated, flat lighting, extreme perspective, dramatic tilt, grunge, matte painting, anime, lineart, isometric UI, pixel art

### Reasoning (internal, do NOT include in output)
1) Identify the concrete physical object that best teaches WORD_TR at CEFR_LEVEL.
2) Select materials & surface details that read clearly at small sizes.
3) Describe silhouette & key features to maximize recognizability.
4) Enforce constraints (lighting, background, no text/watermarks).
5) Keep the prompt compact and production-ready.

Return ONLY the JSON.
```

---

# ComfyUI Contract (what your renderer expects)

Your backend will take Gemini’s JSON and execute a saved **ComfyUI workflow** (exported in “API format”) by sending it with `POST http://127.0.0.1:8188/prompt`; the WebSocket endpoint `/ws` can be used to follow execution progress. Write the `positivePrompt`/`negativePrompt`, sizing, and any LoRA/model settings into the corresponding node inputs before sending. A workflow saved in API format keeps a stable node graph whose inputs you can fill with variables. ([9elements][2])

**Optional:** If you prefer a schema-driven prompt node in-graph, consider the **PromptJSON** custom node to accept the Gemini JSON and map fields to Conditioning nodes. ([GitHub][3])
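
The queue-then-collect flow of this contract can be sketched as follows. This is a minimal sketch, assuming Node 18+ (global `fetch`), a ComfyUI server at its default address, and polling of `GET /history/{prompt_id}` rather than the WebSocket; `extractImages`, `viewUrl`, `queueAndWait`, and the polling defaults are illustrative names and values, not part of the plan.

```javascript
// Collect image descriptors from one ComfyUI history entry.
// Output nodes list their files as { filename, subfolder, type }.
function extractImages(historyEntry) {
  const images = [];
  for (const nodeOutput of Object.values(historyEntry.outputs || {})) {
    for (const img of nodeOutput.images || []) images.push(img);
  }
  return images;
}

// Build the URL that serves a rendered file through the /view endpoint.
function viewUrl(baseUrl, img) {
  const query = new URLSearchParams({
    filename: img.filename,
    subfolder: img.subfolder || '',
    type: img.type || 'output',
  });
  return `${baseUrl}/view?${query}`;
}

// Queue a filled workflow and poll the history until the job shows up there.
// `body` is the { prompt, client_id } object sent to /prompt.
async function queueAndWait(baseUrl, body, { intervalMs = 1000, maxTries = 120 } = {}) {
  const res = await fetch(`${baseUrl}/prompt`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`ComfyUI rejected the prompt: HTTP ${res.status}`);
  const { prompt_id } = await res.json();

  for (let attempt = 0; attempt < maxTries; attempt++) {
    const history = await (await fetch(`${baseUrl}/history/${prompt_id}`)).json();
    if (history[prompt_id]) {
      return extractImages(history[prompt_id]).map(img => viewUrl(baseUrl, img));
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Timed out waiting for prompt ${prompt_id}`);
}
```

Polling keeps the adapter simple; switching to the WebSocket for progress events would only change the waiting loop, not the request body.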

> Why this contract? ComfyUI’s API pattern is: **save a workflow in API format → send JSON to `/prompt` → poll/stream results**. This is the most reliable way to separate “prompt writing” (Gemini) from “image rendering” (ComfyUI). Multiple community write-ups document this workflow. ([9elements][2])

---

# Tasks (deliverable in independent slices)

**T1 — Icon Prompt Schema & Validator**

* Define the Gemini output JSON schema (above).
* Implement server-side validation (e.g., Zod/JSON Schema).
* Acceptance: invalid or oversized prompts are rejected with actionable errors.
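
A dependency-free version of the validator might look like the sketch below. It assumes the length limits and field names from the master prompt's output format; `validateIconPrompt` and the exact error wording are hypothetical, and a Zod or JSON Schema definition would express the same checks declaratively.

```javascript
const MAX_POSITIVE = 500;
const MAX_NEGATIVE = 400;
const CEFR_LEVELS = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2'];

// Returns a list of actionable error messages; an empty list means the payload is valid.
function validateIconPrompt(payload) {
  if (payload === null || typeof payload !== 'object') return ['payload must be a JSON object'];
  const errors = [];
  const isText = v => typeof v === 'string' && v.trim().length > 0;

  // Check presence and length of a prompt string.
  const checkPrompt = (key, limit) => {
    const value = payload[key];
    if (!isText(value)) errors.push(`${key} must be a non-empty string`);
    else if (value.length > limit) errors.push(`${key} is ${value.length} chars; limit is ${limit}`);
  };
  checkPrompt('positivePrompt', MAX_POSITIVE);
  checkPrompt('negativePrompt', MAX_NEGATIVE);

  const m = payload.metadata;
  if (m === null || typeof m !== 'object') {
    errors.push('metadata must be an object');
    return errors;
  }
  if (!isText(m.word_tr)) errors.push('metadata.word_tr must be a non-empty string');
  if (!isText(m.word_ar)) errors.push('metadata.word_ar must be a non-empty string');
  if (!CEFR_LEVELS.includes(m.cefr_level)) errors.push('metadata.cefr_level must be one of A1..C2');
  if (!Number.isInteger(m.difficulty) || m.difficulty < 1 || m.difficulty > 5) {
    errors.push('metadata.difficulty must be an integer from 1 to 5');
  }
  if (!Array.isArray(m.style_tags) || !m.style_tags.every(t => typeof t === 'string')) {
    errors.push('metadata.style_tags must be an array of strings');
  }
  const s = m.sizing;
  if (!s || !Number.isInteger(s.width) || !Number.isInteger(s.height) || s.width <= 0 || s.height <= 0) {
    errors.push('metadata.sizing must have positive integer width and height');
  }
  return errors;
}
```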

**T2 — Gemini Prompting Module**

* Implement the **master system prompt** and per-word templating.
* Add temperature/top-p defaults and retries; log prompts & tokens.
* Acceptance: for a sample of 20 words, outputs valid JSON 100% of the time.
* (Uses Google’s prompt-strategy guidance: structure, role, constraints.) ([Google AI for Developers][1])
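
Per-word templating and retry-until-valid-JSON could be sketched like this. The model call is deliberately left abstract: `callModel(systemPrompt, userMessage)` is a hypothetical async wrapper around whichever Gemini client you use, and `buildUserMessage`, `parseModelJson`, and `generatePromptJson` are illustrative names.

```javascript
// Render the per-word user message from the input variables named in the master prompt.
function buildUserMessage(word) {
  return [
    `WORD_TR: ${word.word_tr}`,
    `WORD_AR: ${word.word_ar}`,
    `CATEGORY: ${word.category}`,
    `CEFR_LEVEL: ${word.cefr_level}`,
    `DIFFICULTY: ${word.difficulty}`,
    `USAGE_TR: ${word.usage_tr}`,
    `USAGE_AR: ${word.usage_ar}`,
    `STYLE_HINTS: ${JSON.stringify(word.style_tags || [])}`,
  ].join('\n');
}

// Parse the model's reply as JSON, tolerating an optional Markdown code fence around it.
function parseModelJson(text) {
  const unfenced = text.trim().replace(/^`{3}(?:json)?\s*/i, '').replace(/\s*`{3}$/, '');
  return JSON.parse(unfenced);
}

// Ask the model up to `maxAttempts` times until the reply parses as JSON.
async function generatePromptJson(callModel, systemPrompt, word, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const reply = await callModel(systemPrompt, buildUserMessage(word));
      return parseModelJson(reply);
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw new Error(`No valid JSON after ${maxAttempts} attempts: ${lastError.message}`);
}
```

Running the T1 validator on the parsed object inside the same loop would let schema violations trigger a retry as well.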

**T3 — ComfyUI Workflow (API Format)**

* Build a single-image workflow (SDXL or your chosen model) with inputs for: positive/negative prompt, width/height, seed, steps, cfg, sampler, and output filename.
* Save in API format; document all input keys.
* Acceptance: cURL to `/prompt` with test inputs returns a PNG in the output folder. ([9elements][2])

**T4 — Renderer Adapter (ComfyUI default)**

* Map Gemini JSON → ComfyUI inputs; call `POST /prompt`; capture image path/bytes.
* Support **seed control** and **determinism** toggles for regeneration.
* Acceptance: with a fixed seed and otherwise identical inputs (same model, sampler settings, and machine), regeneration reproduces the same image. (Basic API usage covered in public guides.) ([DEV Community][4])
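
The mapping step of the adapter could look like the sketch below. The node ids in `NODE_IDS` are assumptions standing in for the ids of your own API-format export, and `fillWorkflow` is a hypothetical helper name.

```javascript
// Assumed node ids; replace them with the ids documented for your saved workflow.
const NODE_IDS = { positive: '6', negative: '7', sampler: '3', latent: '5' };

// Copy the workflow template and write the Gemini fields into the node inputs.
function fillWorkflow(template, promptJson, { seed, randomize = false } = {}) {
  if (!randomize && !Number.isInteger(seed)) {
    throw new Error('a fixed integer seed is required unless randomize is true');
  }
  const wf = JSON.parse(JSON.stringify(template)); // deep copy so the template stays reusable
  wf[NODE_IDS.positive].inputs.text = promptJson.positivePrompt;
  wf[NODE_IDS.negative].inputs.text = promptJson.negativePrompt;
  wf[NODE_IDS.latent].inputs.width = promptJson.metadata.sizing.width;
  wf[NODE_IDS.latent].inputs.height = promptJson.metadata.sizing.height;
  // Fixed seed for reproducible regeneration, fresh seed when the toggle is on.
  wf[NODE_IDS.sampler].inputs.seed = randomize ? Math.floor(Math.random() * 2 ** 32) : seed;
  return { prompt: wf };
}
```

Storing the seed that was actually used alongside the image makes a later "regenerate identically" request possible even when the first run was randomized.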

**T5 — Negative-Prompt Library for Icons**

* Centralize negative terms (from the master prompt) and allow per-category overrides (e.g., avoid “fabric folds” for non-textiles).
* Acceptance: swapping category adjusts the negative prompt set automatically.
* (Grounded in skeuomorphic icon guidance—focus on realism, lighting, texture.) ([Nielsen Norman Group][5])
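
One way to organize the library is a shared base list plus per-category additions and removals. The sketch below is illustrative: the category names and override terms are made-up examples, and `buildNegativePrompt` is a hypothetical helper; only the 400-character budget comes from the master prompt.

```javascript
// Shared defaults (a subset of the master prompt's list, for brevity).
const BASE_NEGATIVE = ['blurry', 'noisy', 'low-res', 'text', 'letters', 'watermark',
  'logo', 'multiple objects', 'cropped'];

// Hypothetical per-category adjustments.
const CATEGORY_OVERRIDES = {
  Clothing: { add: ['mannequin', 'human model'], remove: [] },
  Food: { add: ['fabric folds'], remove: [] },
  Groups: { add: [], remove: ['multiple objects'] },
};

// Merge base terms, category overrides, and job-specific extras into one prompt string.
function buildNegativePrompt(category, extra = [], maxChars = 400) {
  const override = CATEGORY_OVERRIDES[category] || { add: [], remove: [] };
  const removed = new Set(override.remove);
  const terms = [...BASE_NEGATIVE, ...override.add, ...extra]
    .map(t => t.trim().toLowerCase())
    .filter(t => t && !removed.has(t));
  const kept = [];
  for (const term of new Set(terms)) { // Set drops duplicates, keeps first-seen order
    if ([...kept, term].join(', ').length > maxChars) break; // stay within the budget
    kept.push(term);
  }
  return kept.join(', ');
}
```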

**T6 — Provider Abstraction**

* Keep ComfyUI as default; define a common interface (`generateIcon(job)`) so you can later add OpenAI or others without touching UI/state.
* Guardrails: block queueing if provider credentials are missing/invalid.
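
A small registry is one way to express the common interface and the credential guardrail. This is a sketch under assumptions: `ProviderRegistry` and the `isConfigured()` method are hypothetical; only `generateIcon(job)` is named in the plan.

```javascript
// Holds the available icon providers behind one interface:
//   generateIcon(job) -> result, isConfigured() -> boolean
class ProviderRegistry {
  constructor(defaultName = 'comfyui') {
    this.defaultName = defaultName;
    this.providers = new Map();
  }

  register(name, provider) {
    if (typeof provider.generateIcon !== 'function' || typeof provider.isConfigured !== 'function') {
      throw new TypeError(`Provider "${name}" must implement generateIcon(job) and isConfigured()`);
    }
    this.providers.set(name, provider);
  }

  // Guardrail: refuse to hand out a provider that is unknown or lacks credentials,
  // so the queue can block the job before it is enqueued.
  resolve(name = this.defaultName) {
    const provider = this.providers.get(name);
    if (!provider) throw new Error(`Unknown provider "${name}"`);
    if (!provider.isConfigured()) throw new Error(`Provider "${name}" is missing credentials`);
    return provider;
  }
}
```

The UI and job state only ever call `registry.resolve().generateIcon(job)`, so adding another provider later is a matter of one more `register` call.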

**T7 — Job Queue + Idempotency**

* Enqueue word/category icon jobs; sequential or small batches.
* Store a **content hash** of (word, style, size, seed) to skip duplicates unless “Regenerate” is requested.
* Acceptance: re-enqueueing same payload is a no-op unless forced.
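
The duplicate check can be sketched as below. To stay dependency-free the sketch uses a 32-bit FNV-1a hash over a canonical JSON string, which is an assumption made for brevity; a production system would more likely use SHA-256 to make collisions negligible. `jobKey` and `IconJobQueue` are hypothetical names, and the job fields are assumed to be flat (`word_tr`, `style_tags`, `width`, `height`, `seed`).

```javascript
// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex digits.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return (h >>> 0).toString(16).padStart(8, '0');
}

// Canonical key for (word, style, size, seed); tag order does not matter.
function jobKey(job) {
  const tags = [...(job.style_tags || [])].sort();
  return fnv1a(JSON.stringify([job.word_tr, tags, job.width, job.height, job.seed]));
}

class IconJobQueue {
  constructor() {
    this.seen = new Set();
    this.pending = [];
  }

  // Returns true when the job was queued, false when it was skipped as a duplicate.
  enqueue(job, { force = false } = {}) {
    const key = jobKey(job);
    if (this.seen.has(key) && !force) return false;
    this.seen.add(key);
    this.pending.push({ key, job });
    return true;
  }
}
```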

**T8 — Storage & Caching**

* On success, persist: `wordIconKey`, `wordIconUrl`, generation metadata (seed, model, sampler).
* Add a CDN-friendly path scheme: `/icons/{categorySlug}/{wordId}-{seed}.png`.
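
Building that path needs a slug for the category name. The sketch below is one possible approach; `slugify` and `iconPath` are hypothetical helpers, and the handling of Turkish dotless `ı` is an assumption about what category names may contain.

```javascript
// Lowercase, strip diacritics, and collapse everything else to single hyphens.
function slugify(name) {
  return name
    .toLowerCase()
    .replace(/ı/g, 'i')                // dotless i has no Unicode decomposition
    .normalize('NFD')
    .replace(/[\u0300-\u036f]/g, '')   // drop combining marks (ş -> s, ö -> o, ...)
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Path scheme from the plan: /icons/{categorySlug}/{wordId}-{seed}.png
function iconPath(category, wordId, seed) {
  return `/icons/${slugify(category)}/${wordId}-${seed}.png`;
}

console.log(iconPath('Clothing', 42, 123456789)); // /icons/clothing/42-123456789.png
```

Including the seed in the file name means a regenerated icon gets a new URL, so CDN caches never serve a stale image under the old one.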

**T9 — UI Hooks**

* **Words Table**: Generate/Regenerate/Download per row; bulk generate for filtered set; disabled if queue running or keys invalid.
* **Categories List**: Generate category icons; apply to all words in category.
* **Icon Preview**: Show image + metadata; copy URL; apply to word/category.

**T10 — Safety & Abuse Controls**

* Block text/watermarks/logos; filter forbidden terms before sending to ComfyUI.
* Note: LLM prompt-injection risks exist in broader pipelines—keep generation inputs controlled (don’t pass untrusted HTML). ([TechRadar][6])
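
A pre-flight check before the prompt reaches ComfyUI might look like this sketch. The term list is an illustrative assumption, as are the helper names `stripMarkup`, `findForbiddenTerms`, and `assertSafePrompt`; the `safety` flags are the ones defined in the master prompt's metadata.

```javascript
// Illustrative blocklist for the positive prompt (singular or plural forms).
const FORBIDDEN_TERMS = ['text', 'watermark', 'logo', 'caption', 'lettering', 'signature'];

// Remove anything that looks like an HTML tag from user-supplied fields.
function stripMarkup(value) {
  return String(value).replace(/<[^>]*>/g, '').trim();
}

// Return the forbidden terms that occur as whole words in the prompt.
function findForbiddenTerms(positivePrompt) {
  const lower = positivePrompt.toLowerCase();
  return FORBIDDEN_TERMS.filter(term => new RegExp(`\\b${term}s?\\b`).test(lower));
}

// Throw when the prompt or its safety flags violate the icon rules.
function assertSafePrompt(promptJson) {
  const hits = findForbiddenTerms(promptJson.positivePrompt);
  if (hits.length > 0) throw new Error(`positivePrompt contains forbidden terms: ${hits.join(', ')}`);
  const safety = (promptJson.metadata && promptJson.metadata.safety) || {};
  for (const flag of ['nsfw', 'brand_logos', 'text']) {
    if (safety[flag] !== false) throw new Error(`metadata.safety.${flag} must be false`);
  }
}
```

Whole-word matching keeps legitimate descriptions such as "realistic textures" from tripping the `text` rule.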

**T11 — Telemetry & Tracing**

* Log `request_id`, provider, latency, seed, errors; expose `/api/health` and job stats.

**T12 — Regression Pack**

* Golden prompts for ~30 words (varied categories).
* Snapshot tests compare SSIM/perceptual hash to detect major drift.
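
The perceptual-hash side of the snapshot test can be sketched with an average hash and a Hamming distance. The sketch assumes each image has already been decoded, converted to grayscale, and downscaled to a small fixed grid (for example 8×8, passed as a flat array of 0–255 values); that preprocessing, the `maxBits` threshold, and the function names are assumptions.

```javascript
// Average hash: one bit per pixel, set when the pixel is brighter than the image mean.
function averageHash(gray) {
  const mean = gray.reduce((sum, v) => sum + v, 0) / gray.length;
  return gray.map(v => (v > mean ? 1 : 0));
}

// Number of positions at which two equal-length hashes differ.
function hammingDistance(a, b) {
  if (a.length !== b.length) throw new Error('hash lengths differ');
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Flag a candidate render whose hash differs from the golden one by more than maxBits.
function hasDrifted(goldenGray, candidateGray, maxBits = 10) {
  return hammingDistance(averageHash(goldenGray), averageHash(candidateGray)) > maxBits;
}
```

A hash like this tolerates small brightness or noise changes while still catching a different object or composition, which is the "major drift" the regression pack is after.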

---

# Example: Gemini → ComfyUI payload mapping

**(A) Gemini output (for “ceket / معطف”, Clothing, A1):**

```json
{
"positivePrompt": "realistic skeuomorphic 3D app icon of a single white coat hanger holding a neatly folded light jacket, glossy fabric with subtle stitching, soft studio lighting from top-left, gentle specular highlights, smooth shading, centered object, sharp focus, plain light-gray background, friendly rounded proportions, polished look",
"negativePrompt": "text, letters, watermark, logos, multiple objects, cluttered background, blurry, noisy, low-res, harsh rim light, extreme perspective, dramatic tilt, grunge, matte painting, anime, lineart, isometric UI, pixel art, cut-off, cropped",
"metadata": {
"word_tr": "ceket",
"word_ar": "معطف",
"category": "Clothing",
"cefr_level": "A1",
"difficulty": 2,
"usage_tr": "Ceketimi dolaba astım.",
"usage_ar": "علّقتُ معطفي في الخزانة.",
"style_tags": ["friendly","polished"],
"sizing": {"width": 1024, "height": 1024},
"safety": {"nsfw": false, "brand_logos": false, "text": false}
}
}
```

**(B) ComfyUI `/prompt` (partial sketch of inputs for your saved workflow; the model/LoRA, VAE decode, and SaveImage nodes and the links between nodes are omitted, and the node keys are placeholders for the ids in your export):**

```json
{
  "prompt": {
    "CLIPTextEncodePositive": {
      "class_type": "CLIPTextEncode",
      "inputs": {
        "text": "{positivePrompt}"
      }
    },
    "CLIPTextEncodeNegative": {
      "class_type": "CLIPTextEncode",
      "inputs": {
        "text": "{negativePrompt}"
      }
    },
    "KSampler": {
      "class_type": "KSampler",
      "inputs": {
        "seed": 123456789,
        "steps": 30,
        "cfg": 5.5,
        "sampler_name": "dpmpp_2m",
        "scheduler": "karras"
      }
    },
    "EmptyLatentImage": {
      "class_type": "EmptyLatentImage",
      "inputs": {
        "width": 1024,
        "height": 1024
      }
    }
  },
  "client_id": "icon-gen-ui",
  "extra_data": { "request_id": "..." }
}
```

> Save your workflow in **API format** and document the exact node keys you want to substitute—this is the stable bridge your server fills in at runtime. ([9elements][2])

---

# Notes on style correctness

Skeuomorphic icons benefit from: **material realism, clean silhouette, controlled lighting, and zero text**. The prompt above encodes those constraints so the outputs read crisply at small sizes. (General skeuomorphic/icon guidelines referenced.) ([Nielsen Norman Group][5])

---

If you want, I can also supply a **ready-to-import ComfyUI workflow (API format)** with plac

Files changed (1)
  1. index.html +43 -14
index.html CHANGED
@@ -156,11 +156,21 @@
  <input type="text" id="style_tags" class="w-full px-4 py-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent" placeholder="e.g., friendly, kids, polished">
  </div>

+ <div>
+ <label class="block text-gray-700 font-medium mb-2">Turkish Example Sentence</label>
+ <input type="text" id="usage_tr" class="w-full px-4 py-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent" placeholder="e.g., Ceketimi dolaba astım.">
+ </div>
+
+ <div>
+ <label class="block text-gray-700 font-medium mb-2">Arabic Example Sentence</label>
+ <input type="text" id="usage_ar" class="w-full px-4 py-3 border border-gray-300 rounded-lg focus:ring-2 focus:ring-purple-500 focus:border-transparent" placeholder="e.g., علّقتُ معطفي في الخزانة.">
+ </div>
+
  <button type="submit" class="w-full py-4 bg-gradient-to-r from-purple-600 to-indigo-600 text-white font-semibold rounded-lg shadow-md hover:from-purple-700 hover:to-indigo-700 transform hover:scale-[1.02] transition-all duration-300">
  Generate Icon
  </button>
  </form>
- </div>
+ </div>

  <div class="md:w-1/2 bg-gradient-to-br from-purple-500 to-indigo-600 p-8 flex flex-col items-center justify-center">
  <div class="text-center mb-8">
@@ -296,7 +306,6 @@
  // Form submission handler
  document.getElementById('iconForm').addEventListener('submit', function(e) {
  e.preventDefault();
-
  // Get form values
  const word_tr = document.getElementById('word_tr').value;
  const word_ar = document.getElementById('word_ar').value;
@@ -304,22 +313,42 @@
  const cefr_level = document.getElementById('cefr_level').value;
  const difficulty = document.getElementById('difficulty').value;
  const style_tags = document.getElementById('style_tags').value.split(',').map(tag => tag.trim()).filter(tag => tag);
+ const usage_tr = document.getElementById('usage_tr').value;
+ const usage_ar = document.getElementById('usage_ar').value;
+
+ // Build request body
+ const body = {
+ word_tr,
+ word_ar,
+ category,
+ cefr_level,
+ difficulty: parseInt(difficulty),
+ usage_tr,
+ usage_ar,
+ style_tags
+ };

- // In a real app, this would send to your backend
- // For demo, we'll just show a loading state and then a placeholder
  const preview = document.getElementById('icon-preview');
  preview.innerHTML = '<div class="animate-pulse-slow text-gray-500">Generating...</div>';

- // Simulate API call delay
- setTimeout(() => {
- // In a real app, you would show the generated icon
- preview.innerHTML = `
- <div class="bg-gradient-to-br from-purple-400 to-indigo-500 w-full h-full rounded-xl flex items-center justify-center">
- <span class="text-white font-bold text-4xl">${word_tr.charAt(0).toUpperCase()}</span>
- </div>
- `;
- }, 2000);
- });
+ // Call backend
+ fetch('/api/generate-icon', {
+ method: 'POST',
+ headers: { 'Content-Type': 'application/json' },
+ body: JSON.stringify(body)
+ })
+ .then(r => r.json())
+ .then(data => {
+ if (data.url) {
+ preview.innerHTML = `<img src="${data.url}" class="w-full h-full object-contain rounded-xl" alt="Generated icon">`;
+ } else {
+ throw new Error(data.error || 'Unknown error');
+ }
+ })
+ .catch(err => {
+ preview.innerHTML = `<div class="text-red-500 text-sm text-center">${err.message}</div>`;
+ });
+ });

  // Scroll functions
  function scrollToGenerator() {