Upload 10 files

Files changed (11)

.gitattributes +3 -0
LICENSE +21 -0
README.md +142 -3
VisionHarvester-PoseStyleExtractor.json +1 -0
examples/example_01.png +3 -0
examples/example_01_output.txt +1 -0
examples/example_02.png +3 -0
examples/example_02_output.txt +1 -0
examples/example_03.png +3 -0
examples/example_03_output.txt +1 -0
visionharvester_v1_extractor.prompt.txt +10 -0

.gitattributes CHANGED

@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+examples/example_01.png filter=lfs diff=lfs merge=lfs -text
+examples/example_02.png filter=lfs diff=lfs merge=lfs -text
+examples/example_03.png filter=lfs diff=lfs merge=lfs -text

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 GritAI Solutions LLC
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,3 +1,142 @@
----
-license: mit
----

+# VisionHarvester v1 — Identity-Safe Image Style & Pose Extractor
+**By GritAI Solutions LLC**
+VisionHarvester v1 is a lightweight prompt-based extraction tool for creators working with Stable Diffusion, ComfyUI, SDXL, LoRA training, and Qwen-VL workflows.
+It converts a reference image into clean, modular, identity-safe text blocks you can reuse for character building, style replication, dataset generation, and scene reconstruction — without copying real people.
+---
+## 🚀 What VisionHarvester v1 Does
+VisionHarvester is built around five reusable components:
+### 1. Base Identity (Safe & Generic)
+A neutral description of the subject that includes:
+- General body type
+- Hair color and basic hairstyle
+- Clothing and fabric behavior
+- Broad, non-identifying facial description
+No pose, no emotion, no personality.
+### 2. Pose (Geometry Only)
+Short, comma-separated fragments describing:
+- Limb positions
+- Body orientation
+- Weight distribution
+- Head/hip angles
+No outfit, no style, no emotion.
+### 3. Outfit & Materials
+Details about:
+- Clothing type and cut
+- Colors
+- Fabric texture and behavior (matte, glossy, stretchy)
+- Accessories
+### 4. Camera & Lighting
+Information about:
+- Framing (close-up, half body, full body)
+- Camera angle / lens feel
+- Lighting direction and softness
+- Major shadows and highlights
+### 5. Style Tags
+Reusable tags such as:
+- studio fitness look
+- clean background
+- soft cinematic lighting
+- high-resolution texture
+These drop straight into Stable Diffusion prompts.
+---
+## 📂 Included in This Repository
+- README.md — this documentation
+- LICENSE — MIT license
+- visionharvester_v1_extractor.prompt.txt — main extraction prompt
+- VisionHarvester-PoseStyleExtractor.json — ComfyUI workflow (optional)
+- examples/ — sample images and their extracted outputs
+Example files:
+- examples/example_01.png
+- examples/example_01_output.txt
+- examples/example_02.png
+- examples/example_02_output.txt
+- examples/example_03.png
+- examples/example_03_output.txt
+---
+## 🟩 Main Extraction Prompt (v1)
+This is the core VisionHarvester v1 prompt shipped in visionharvester_v1_extractor.prompt.txt:
+```
+Extract a clean, neutral description of the woman in the image.
+Keep it simple:
+• No pose or body positioning
+• No emotions or personality
+• No unique facial identifiers
+• No NSFW content
+• Do describe hair, body type (general), clothing, colors, fabrics, and broad facial features
+Output 2–4 sentences that would work as a Stable Diffusion base identity block.
+```
+Use this in:
+- Qwen-VL custom prompt
+- Any Vision-LLM
+- ComfyUI Qwen nodes
+- Image-to-text or SD prompt pipelines
+---
+## 🖼 Example Outputs
+Example of the kind of identity-safe text this prompt produces:
+```
+An athletic woman with long dark hair, a medium tan complexion, and soft neutral facial features without distinctive identifiers. She is wearing a fitted black sports bra made from matte stretch fabric and high-waisted leggings. Her appearance is clean, simple, and suitable as a Stable Diffusion base identity.
+```
+---
+## 🔒 Identity & Safety
+VisionHarvester v1 is designed to:
+- Avoid 1:1 face cloning
+- Avoid unique facial markers
+- Avoid real-person or celebrity references
+- Avoid explicit or NSFW content
+It focuses on style, clothing, pose, and scene — not identity.
+---
+## 🧩 Use Cases
+- Character consistency
+- Pose and outfit reuse
+- LoRA dataset prep
+- Style transfer
+- Scene reconstruction
+- Visual prompt creation
+- Multi-lane ComfyUI pipelines
+---
+## 🧱 Author
+**GritAI Solutions LLC**
+Robert "BonusLockSmith" Lucyk
+Lawton, Oklahoma
+---
+MIT licensed. Free for personal and commercial use.
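
The five components described in the README can be combined into one Stable Diffusion prompt string. The following is a minimal sketch of that assembly, not code shipped in this repository: the `PromptBlocks` class, its field names, and the comma-joining rule are all assumptions made for illustration, and the sample text is adapted from the README and the workflow note.

```python
from dataclasses import dataclass


@dataclass
class PromptBlocks:
    """Hypothetical container for the five VisionHarvester lanes."""
    base_identity: str
    pose: str
    outfit: str
    camera_lighting: str
    style_tags: str

    def to_prompt(self) -> str:
        # Join the non-empty lanes with commas, identity first.
        lanes = [self.base_identity, self.pose, self.outfit,
                 self.camera_lighting, self.style_tags]
        cleaned = [lane.strip().rstrip(".") for lane in lanes if lane.strip()]
        return ", ".join(cleaned)


blocks = PromptBlocks(
    base_identity="an athletic woman with long dark hair, soft neutral facial features",
    pose="torso facing forward, arms relaxed by sides",
    outfit="fitted black sports bra, matte stretch fabric",
    camera_lighting="full body framing, eye-level camera, soft even lighting",
    style_tags="studio fitness look, clean background",
)
prompt = blocks.to_prompt()
print(prompt)
```

Keeping each lane in its own field means a pose or style block can be swapped without touching the identity text, which is the separation the README describes.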

VisionHarvester-PoseStyleExtractor.json ADDED

	@@ -0,0 +1 @@

+ {"id":"6e0f9792-1d56-4fd7-ad14-d1269803c658","revision":0,"last_node_id":21,"last_link_id":25,"nodes":[{"id":4,"type":"SaveImage","pos":[-1296.7420260001102,-3007.666190531094],"size":[270,270],"flags":{},"order":7,"mode":0,"inputs":[{"localized_name":"images","name":"images","type":"IMAGE","link":3},{"localized_name":"filename_prefix","name":"filename_prefix","type":"STRING","widget":{"name":"filename_prefix"},"link":null}],"outputs":[],"properties":{"cnr_id":"comfy-core","ver":"0.3.71","Node name for S&R":"SaveImage","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":["VisionHarvester/images/ComfyUI_"]},{"id":2,"type":"ImageResize+","pos":[-2589.521058680804,-2739.633891185613],"size":[270,218],"flags":{},"order":4,"mode":0,"inputs":[{"localized_name":"image","name":"image","type":"IMAGE","link":1},{"localized_name":"width","name":"width","type":"INT","widget":{"name":"width"},"link":null},{"localized_name":"height","name":"height","type":"INT","widget":{"name":"height"},"link":null},{"localized_name":"interpolation","name":"interpolation","type":"COMBO","widget":{"name":"interpolation"},"link":null},{"localized_name":"method","name":"method","type":"COMBO","widget":{"name":"method"},"link":null},{"localized_name":"condition","name":"condition","type":"COMBO","widget":{"name":"condition"},"link":null},{"localized_name":"multiple_of","name":"multiple_of","type":"INT","widget":{"name":"multiple_of"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[2,3]},{"localized_name":"width","name":"width","type":"INT","links":null},{"localized_name":"height","name":"height","type":"INT","links":null}],"properties":{"cnr_id":"comfyui_essentials","ver":"1.1.0","Node name for S&R":"ImageResize+","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":[832,1216,"lanczos","keep 
proportion","always",0]},{"id":1,"type":"LoadImage","pos":[-2983.0397680299106,-2730.082471465146],"size":[274.080078125,314],"flags":{},"order":0,"mode":0,"inputs":[{"localized_name":"image","name":"image","type":"COMBO","widget":{"name":"image"},"link":null},{"localized_name":"choose file to upload","name":"upload","type":"IMAGEUPLOAD","widget":{"name":"upload"},"link":null}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","links":[1]},{"localized_name":"MASK","name":"MASK","type":"MASK","links":null}],"properties":{"cnr_id":"comfy-core","ver":"0.3.71","Node name for S&R":"LoadImage","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":["v1_base_Paladin_01_00002_.png","image"]},{"id":15,"type":"VAEDecode","pos":[-1816.4155165225534,-1484.74221751387],"size":[210,46],"flags":{},"order":11,"mode":0,"inputs":[{"localized_name":"samples","name":"samples","type":"LATENT","link":20},{"localized_name":"vae","name":"vae","type":"VAE","link":21}],"outputs":[{"localized_name":"IMAGE","name":"IMAGE","type":"IMAGE","slot_index":0,"links":[22]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"VAEDecode","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[]},{"id":19,"type":"CheckpointLoaderSimple","pos":[-2731.415516522554,-1984.7422175138709],"size":[315,98],"flags":{},"order":1,"mode":0,"inputs":[{"localized_name":"ckpt_name","name":"ckpt_name","type":"COMBO","widget":{"name":"ckpt_name"},"link":null}],"outputs":[{"localized_name":"MODEL","name":"MODEL","type":"MODEL","slot_index":0,"links":[16]},{"localized_name":"CLIP","name":"CLIP","type":"CLIP","slot_index":1,"links":[23,24]},{"localized_name":"VAE","name":"VAE","type":"VAE","slot_index":2,"links":[21]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for 
S&R":"CheckpointLoaderSimple","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}},"models":[{"name":"v1-5-pruned-emaonly-fp16.safetensors","url":"https://huggingface.co/Comfy-Org/stable-diffusion-v1-5-archive/resolve/main/v1-5-pruned-emaonly-fp16.safetensors?download=true","directory":"checkpoints"}]},"widgets_values":["cyberrealistic_v90.safetensors"]},{"id":14,"type":"KSampler","pos":[-1921.4155165225534,-2014.7422175138709],"size":[315,474],"flags":{},"order":10,"mode":0,"inputs":[{"localized_name":"model","name":"model","type":"MODEL","link":16},{"localized_name":"positive","name":"positive","type":"CONDITIONING","link":17},{"localized_name":"negative","name":"negative","type":"CONDITIONING","link":18},{"localized_name":"latent_image","name":"latent_image","type":"LATENT","link":19},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":null},{"localized_name":"steps","name":"steps","type":"INT","widget":{"name":"steps"},"link":null},{"localized_name":"cfg","name":"cfg","type":"FLOAT","widget":{"name":"cfg"},"link":null},{"localized_name":"sampler_name","name":"sampler_name","type":"COMBO","widget":{"name":"sampler_name"},"link":null},{"localized_name":"scheduler","name":"scheduler","type":"COMBO","widget":{"name":"scheduler"},"link":null},{"localized_name":"denoise","name":"denoise","type":"FLOAT","widget":{"name":"denoise"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[20]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"KSampler","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[400782778445824,"randomize",20,8,"euler","normal",1]},{"id":5,"type":"Save Text 
File","pos":[-1338.6015503133458,-2578.2798589391555],"size":[306.1015625,202],"flags":{},"order":8,"mode":0,"inputs":[{"localized_name":"text","name":"text","type":"STRING","link":4},{"localized_name":"path","name":"path","type":"STRING","widget":{"name":"path"},"link":null},{"localized_name":"filename_prefix","name":"filename_prefix","type":"STRING","widget":{"name":"filename_prefix"},"link":null},{"localized_name":"filename_delimiter","name":"filename_delimiter","type":"STRING","widget":{"name":"filename_delimiter"},"link":null},{"localized_name":"filename_number_padding","name":"filename_number_padding","type":"INT","widget":{"name":"filename_number_padding"},"link":null},{"localized_name":"file_extension","name":"file_extension","shape":7,"type":"STRING","widget":{"name":"file_extension"},"link":null},{"localized_name":"encoding","name":"encoding","shape":7,"type":"STRING","widget":{"name":"encoding"},"link":null},{"localized_name":"filename_suffix","name":"filename_suffix","shape":7,"type":"STRING","widget":{"name":"filename_suffix"},"link":null}],"outputs":[],"properties":{"cnr_id":"was-ns","ver":"3.0.1","Node name for S&R":"Save Text File","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"}},"widgets_values":["VisionHarvester/descriptions/ComfyUI_","","_",4,".txt","utf-8",""]},{"id":16,"type":"SaveImage","pos":[-1571.4155165225534,-2014.7422175138709],"size":[470,560],"flags":{},"order":12,"mode":0,"inputs":[{"localized_name":"images","name":"images","type":"IMAGE","link":22},{"localized_name":"filename_prefix","name":"filename_prefix","type":"STRING","widget":{"name":"filename_prefix"},"link":null}],"outputs":[],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for 
S&R":"SaveImage","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":["VisionHarvester/rerun/ComfyUI_"]},{"id":18,"type":"EmptyLatentImage","pos":[-2271.4155165225534,-1494.74221751387],"size":[315,106],"flags":{},"order":2,"mode":0,"inputs":[{"localized_name":"width","name":"width","type":"INT","widget":{"name":"width"},"link":null},{"localized_name":"height","name":"height","type":"INT","widget":{"name":"height"},"link":null},{"localized_name":"batch_size","name":"batch_size","type":"INT","widget":{"name":"batch_size"},"link":null}],"outputs":[{"localized_name":"LATENT","name":"LATENT","type":"LATENT","slot_index":0,"links":[19]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"EmptyLatentImage","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[832,1216,1]},{"id":17,"type":"CLIPTextEncode","pos":[-2381.4155165225534,-1774.7422175138695],"size":[425.27801513671875,180.6060791015625],"flags":{},"order":5,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":23},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":null}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","slot_index":0,"links":[18]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"CLIPTextEncode","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":["bad anatomy, deformed face, asymmetry, wrong skin tone, blurry, cartoon, CGI, extra 
limbs"],"color":"#223","bgcolor":"#335"},{"id":20,"type":"CLIPTextEncode","pos":[-2387.706260435692,-1984.0581775706505],"size":[422.84503173828125,164.31304931640625],"flags":{},"order":9,"mode":0,"inputs":[{"localized_name":"clip","name":"clip","type":"CLIP","link":24},{"localized_name":"text","name":"text","type":"STRING","widget":{"name":"text"},"link":25}],"outputs":[{"localized_name":"CONDITIONING","name":"CONDITIONING","type":"CONDITIONING","slot_index":0,"links":[17]}],"properties":{"cnr_id":"comfy-core","ver":"0.3.65","Node name for S&R":"CLIPTextEncode","ue_properties":{"widget_ue_connectable":{},"version":"7.4.1","input_ue_unconnectable":{}}},"widgets_values":[""],"color":"#232","bgcolor":"#353"},{"id":21,"type":"Note","pos":[-3475.4594977571664,-2650.763830733268],"size":[417.7683774744314,547.9715107219222],"flags":{},"order":3,"mode":0,"inputs":[],"outputs":[],"title":"ootblock","properties":{},"widgets_values":["🟩 VisionHarvester Hybrid vNext — Chatbot Bootblock\n\nRole:\nYou are VisionHarvester, a pose, style, and prompt extraction assistant built on top of a hybrid Stable Diffusion workflow (SD1.5 → SDXL) with Qwen-VL for image understanding.\n\nYour job is to:\n\nExplain what this workflow does in clear, non-technical language to normal users.\n\nHelp users pull out styles, outfits, poses, camera angles, and lighting from reference images.\n\nReturn Stable Diffusion–ready prompt blocks that plug directly into a multi-lane ComfyUI workflow.\n\nKeep faces generic and non-identifying. Users can get everything except a detailed face description.\n\n1. 
High-Level Explanation (User-Facing)\n\nWhen a new user arrives, briefly explain yourself like this (paraphrase is okay, but keep the structure):\n\nI’m VisionHarvester.\nYou can give me reference images, and I’ll break them down into clear building blocks for image generation:\n\nPose (body position and angles)\n\nOutfit & materials (clothing, fabric types, accessories)\n\nCamera & lighting (framing, lens feel, light direction)\n\nStyle tags (e.g., studio fitness look, soft cinematic lighting)\n\nI’ll turn those into Stable Diffusion–ready prompt fragments that you or your workflow can reuse.\nI deliberately keep facial details vague to protect identity — you get the full style and body language without a 1:1 face copy.\n\nDo not mention internal implementation details like ControlNet, SD1.5 vs SDXL, or Qwen-VL unless the user explicitly asks.\n\n2. Core Capabilities (What You Do)\n\nYou support these main tasks:\n\nPose Extraction\n\nOutput short, comma-separated fragments describing only body geometry.\n\nNo mood, no style, no clothing, no face details.\n\nOutfit & Material Extraction\n\nDescribe clothing type, cut, length, fabric behavior (matte, glossy, stretchy), and colors.\n\nAvoid brand logos or real-world branding unless the user explicitly wants them.\n\nCamera & Lighting Extraction\n\nDescribe framing (close-up, half body, full body), camera angle, and approximate focal length feel.\n\nDescribe lighting direction, softness, and major shadows/highlights.\n\nStyle & Atmosphere Tags\n\nDescribe overall look: studio fitness, lifestyle, high-key, low-key, etc.\n\nThese should be usable as style tags in a prompt.\n\nPrompt Block Assembly\n\nOrganize outputs into separate lanes:\n\nBase Identity (public-safe, no pose)\n\nPose\n\nOutfit\n\nCamera & Lighting\n\nStyle/Detail\n\nOptionally suggest negative prompt fragments if user asks (e.g., “no extra limbs, no distorted hands”).\n\n3. 
Safety & Face Handling Rules\n\nYou must always follow these constraints:\n\nNo detailed face cloning.\n\nDescribe faces only in broad, non-identifying terms:\n\nAllowed: “soft neutral facial features”, “generic feminine face”, “short beard”, “light makeup”\n\nNot allowed: specific scars, moles, freckles patterns, celebrity likeness, or any unique marks.\n\nBase Identity is public-safe and generic.\n\nThe default body description should be something like:\n\n“an athletic woman with long chestnut hair, wearing a black halter cropped sports top made from smooth stretch fabric and fitted dark yoga shorts with clean seams and a matte finish, soft neutral facial features without distinctive marks”\n\nDo not include pose language (no “standing”, “arms lifted”, etc.) in this Base Identity block.\n\nNo emotions, no personality terms in Base Identity.\n\nNo explicit content.\n\nIf a user pushes for NSFW, refuse and gently redirect to safe, clothed, fitness or casual styles.\n\n4. Internal Prompt Structure (How You Format Outputs)\n\nWhenever you return SD-ready content, use this structure by default:\n\n[BASE IDENTITY — PUBLIC-SAFE]\nan athletic woman with long chestnut hair, wearing a black halter cropped sports top made from smooth stretch fabric and fitted dark yoga shorts with clean seams and a matte finish, soft neutral facial features without distinctive marks\n\n[POSE]\ntorso facing slight left, shoulders relaxed, left hip shifted, right leg straight, left leg slightly bent, arms hanging naturally at sides, head level, weight mostly on right leg\n\n[OUTFIT & MATERIALS]\nblack halter cropped sports top, smooth matte athletic fabric, fitted dark yoga shorts with clean seams, no logos, subtle fabric folds at waist and hips\n\n[CAMERA & LIGHTING]\nfull body framing, eye-level camera, approximately 50mm lens feel, soft even gym-style lighting, faint reflections on floor, slightly blurred background\n\n[STYLE & DETAIL TAGS]\nstudio fitness look, clean background, subtle 
skin sheen, realistic fabric folds, high-resolution, minimal clutter\n\n[OPTIONAL NEGATIVE PROMPT]\nextra limbs, deformed hands, distorted anatomy, text, watermarks, logos\n\n\nRules:\n\nNo pose terms in [BASE IDENTITY].\n\n[POSE] is always short, comma-separated geometry fragments.\n\n[OUTFIT & MATERIALS] focuses on clothing & fabric behavior, not anatomy.\n\n[CAMERA & LIGHTING] is practical: framing + lens feel + light description.\n\n[STYLE & DETAIL TAGS] are reusable tags, not full sentences.\n\n5. How Users “Pull Styles and Such” Out of Images\n\nYou must help users in plain language. Examples of supported requests:\n\n“Extract just the pose from this image.”\n\n“Give me the outfit and material description only.”\n\n“I want the lighting and camera angle from this shot, but nothing else.”\n\n“Turn this into SD prompt blocks I can reuse, with a generic safe face.”\n\n“Give me multiple style tags that describe this image’s vibe.”\n\n“Combine this pose with a different gym style — outline the blocks.”\n\nFor each, respond using the block structure above and only fill in relevant sections.\nIf they ask for “pose only,” you might reply:\n\n[POSE]\ntorso facing forward, shoulders slightly back, both legs straight, feet shoulder-width apart, arms relaxed by sides, head looking straight ahead\n\n6. Architectural Constraints (Don’t Break These)\n\nInternally you are aware of the following design rules and always respect them, even if the user isn’t talking in those terms:\n\nNo pose language in Base Identity.\nPose is handled entirely in [POSE] + ControlNet in the underlying workflow.\n\nNo abstract words in Base Identity.\nNo “confident”, “cinematic”, “hyper-realistic”, etc. 
Base Identity = literal anatomy + clothing.\n\nPose text must be narrow and geometric.\nShort fragments, no style or emotion.\n\nIdentity & outfit live in Base/Outfit lanes only.\nPose, camera, style, and detail must not change identity.\n\nFace is intentionally generic.\nYou never output detailed, unique facial identifiers.\n\n7. Example User Prompts & Expected Behavior\n\nExample 1 — Full Breakdown\n\nUser:\n\nHere’s an image. Break it into reusable prompt blocks, but keep the face generic and don’t include any real-world brand logos.\n\nAssistant:\n\nReturn all blocks: [BASE IDENTITY], [POSE], [OUTFIT & MATERIALS], [CAMERA & LIGHTING], [STYLE & DETAIL TAGS], optionally [NEGATIVE PROMPT].\n\nMake facial description generic.\n\nReplace specific brand names with generic terms like “running shoes”, “sports leggings”.\n\nExample 2 — Style Extraction Only\n\nUser:\n\nI like the style of this photo. Just give me the camera, lighting, and style tags.\n\nAssistant:\n\nFill only [CAMERA & LIGHTING] and [STYLE & DETAIL TAGS].\n\nLeave identity and pose out unless explicitly requested.\n\nExample 3 — Pose Reuse\n\nUser:\n\nGive me the pose from this image so I can reuse it in my own character workflow.\n\nAssistant:\n\nReturn only [POSE] with clean, geometric fragments.\n\nNo style, no emotion, no clothing details.\n\n8. Interaction Rules\n\nUse clear, direct language. 
The user shouldn’t need to know ComfyUI or SD internals.\n\nIf they want more control, suggest:\n\n“I can also split this into pose, outfit, lighting, and style tags if you’d like finer control.”\n\nIf they submit text-only (no image), you can still:\n\nHelp them draft blocks in the same [BASE IDENTITY] / [POSE] / [CAMERA & LIGHTING] / [STYLE & DETAIL TAGS] structure based on their description.\n\nIf they ask for something you can’t safely provide (exact face clone, explicit content), refuse and offer a safe alternative.\n\nEnd of Bootblock"],"color":"#432","bgcolor":"#653"},{"id":3,"type":"AILab_QwenVL","pos":[-2225.4773794667894,-2958.681206862904],"size":[510.3534331830017,445.77510957882987],"flags":{},"order":6,"mode":0,"inputs":[{"localized_name":"image","name":"image","shape":7,"type":"IMAGE","link":2},{"localized_name":"video","name":"video","shape":7,"type":"IMAGE","link":null},{"localized_name":"model_name","name":"model_name","type":"COMBO","widget":{"name":"model_name"},"link":null},{"localized_name":"quantization","name":"quantization","type":"COMBO","widget":{"name":"quantization"},"link":null},{"localized_name":"attention_mode","name":"attention_mode","type":"COMBO","widget":{"name":"attention_mode"},"link":null},{"localized_name":"preset_prompt","name":"preset_prompt","type":"COMBO","widget":{"name":"preset_prompt"},"link":null},{"localized_name":"custom_prompt","name":"custom_prompt","type":"STRING","widget":{"name":"custom_prompt"},"link":null},{"localized_name":"max_tokens","name":"max_tokens","type":"INT","widget":{"name":"max_tokens"},"link":null},{"localized_name":"keep_model_loaded","name":"keep_model_loaded","type":"BOOLEAN","widget":{"name":"keep_model_loaded"},"link":null},{"localized_name":"seed","name":"seed","type":"INT","widget":{"name":"seed"},"link":null}],"outputs":[{"localized_name":"RESPONSE","name":"RESPONSE","type":"STRING","links":[4,25]}],"properties":{"cnr_id":"ComfyUI-QwenVL","ver":"1f6af2528168650fdf2ee544572549d59dc2824a","
Node name for S&R":"AILab_QwenVL","ue_properties":{"widget_ue_connectable":{},"input_ue_unconnectable":{},"version":"7.4.1"},"aux_id":"1038lab/ComfyUI-QwenVL"},"widgets_values":["Qwen3-VL-2B-Instruct","None (FP16)","auto","🖼️ Tags","You are VisionHarvester.\n\nExtract ONLY a clean, neutral, identity-safe description of the woman in the image.\n\nFollow these strict rules:\n\n1. DO NOT describe pose, stance, limb position, or body angles.\n2. DO NOT describe emotion, attitude, personality, or expression.\n3. DO NOT describe unique facial identifiers (no freckles, moles, scars, specific face shape).\n4. DO NOT mention real people, celebrities, or names.\n5. DO NOT include NSFW content or anything suggestive.\n\n6. DO describe:\n - General body type (athletic, slim, curvy, average, etc.)\n - Hair color, length, and overall style (generic only)\n - Clothing type and materials (fabric texture, cut, color)\n - Accessories (if any)\n - Very broad face description (“soft neutral facial features,” “generic feminine face,” etc.)\n - Skin tone in broad terms (“light,” “medium,” “tan,” “deep”)\n\n7. Keep the output in ONE paragraph, 2–4 sentences max.\n\n8. Output ONLY the description. 
No commentary, no labels.\n\nThe final output must be suitable for a Stable Diffusion [BASE IDENTITY — PUBLIC-SAFE] block.\n",512,true,5,"fixed"],"color":"#28403f","bgcolor":"#374539"}],"links":[[1,1,0,2,0,"IMAGE"],[2,2,0,3,0,"IMAGE"],[3,2,0,4,0,"IMAGE"],[4,3,0,5,0,"STRING"],[16,19,0,14,0,"MODEL"],[17,20,0,14,1,"CONDITIONING"],[18,17,0,14,2,"CONDITIONING"],[19,18,0,14,3,"LATENT"],[20,14,0,15,0,"LATENT"],[21,19,2,15,1,"VAE"],[22,15,0,16,0,"IMAGE"],[23,19,1,17,0,"CLIP"],[24,19,1,20,0,"CLIP"],[25,3,0,20,1,"STRING"]],"groups":[],"config":{},"extra":{"ue_links":[],"ds":{"scale":1.3109994191500136,"offset":[3216.263730597563,2932.0583917462063]},"links_added_by_ue":[],"frontendVersion":"1.28.8","VHS_latentpreview":false,"VHS_latentpreviewrate":0,"VHS_MetadataImage":true,"VHS_KeepIntermediate":true},"version":0.4}
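
The `links` array at the end of the workflow shows how the nodes are wired. Judging from the JSON itself, each entry appears to read as `[link_id, from_node, from_slot, to_node, to_slot, data_type]`. That reading is an inference from this file, not a statement of the ComfyUI specification. The sketch below uses a trimmed excerpt (node ids, types, and four links copied from the workflow) to print the image-to-text path; `describe_links` is a hypothetical helper.

```python
import json

# Trimmed excerpt of the workflow: ids, types, and links copied from the JSON above.
workflow_json = """
{"nodes": [{"id": 1, "type": "LoadImage"},
           {"id": 2, "type": "ImageResize+"},
           {"id": 3, "type": "AILab_QwenVL"},
           {"id": 5, "type": "Save Text File"},
           {"id": 20, "type": "CLIPTextEncode"}],
 "links": [[1, 1, 0, 2, 0, "IMAGE"],
           [2, 2, 0, 3, 0, "IMAGE"],
           [4, 3, 0, 5, 0, "STRING"],
           [25, 3, 0, 20, 1, "STRING"]]}
"""


def describe_links(workflow: dict) -> list[str]:
    """Render each link as 'source type -> target type (data type)'."""
    node_types = {node["id"]: node["type"] for node in workflow["nodes"]}
    edges = []
    # Assumed layout: [link_id, from_node, from_slot, to_node, to_slot, data_type].
    for _link_id, src, _src_slot, dst, _dst_slot, data_type in workflow["links"]:
        edges.append(f"{node_types[src]} -> {node_types[dst]} ({data_type})")
    return edges


edges = describe_links(json.loads(workflow_json))
for edge in edges:
    print(edge)
```

Read this way, the Qwen-VL response goes both to the text file saver and to the `CLIPTextEncode` node whose output the KSampler takes as its positive conditioning.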

examples/example_01.png ADDED

Git LFS Details

SHA256: f1bb7199ba1912911c87d98f4d2360261af5ab762a6a2e7c7036ce1d0f8d290a
Pointer size: 131 Bytes
Size of remote file: 721 kB
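
The SHA256 value listed for a Git LFS object is the SHA-256 digest of the file's contents, so a downloaded copy can be checked against it. A minimal sketch using Python's standard `hashlib` follows; the local path is an assumption about where the repository is checked out, and `sha256_of` and `matches_lfs_digest` are illustrative helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with the value shown in the LFS details."""
    return sha256_of(path) == expected_hex.lower()


# Digest listed above for examples/example_01.png.
EXPECTED = "f1bb7199ba1912911c87d98f4d2360261af5ab762a6a2e7c7036ce1d0f8d290a"

target = Path("examples/example_01.png")  # assumed local checkout path
if target.exists():
    print(matches_lfs_digest(target, EXPECTED))
```

If the check prints `False`, the local file is most likely still the small LFS pointer rather than the image itself.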

examples/example_01_output.txt ADDED

	@@ -0,0 +1 @@

+ A woman with long dark brown wavy hair, medium skin tone, wearing a fitted olive green sports bra made from smooth stretch fabric, standing against a plain beige background. She has soft neutral facial features, an athletic build, and is accessorized with small silver earrings. Her posture is relaxed with one hand on her hip and the other behind her head. The lighting highlights her toned physique and natural appearance without emphasizing any particular detail beyond generic aesthetic elements.

examples/example_02.png ADDED

Git LFS Details

SHA256: f81de9fa633c371951ed4c5f321a69b7c33dd7a36b13dc69d66eebdc9920a390
Pointer size: 131 Bytes
Size of remote file: 896 kB

examples/example_02_output.txt ADDED

	@@ -0,0 +1 @@

+ A slender athletic woman with long brown hair styled straight down her back stands facing forward against a light gray background. She wears black performance sportswear consisting of an ergonomic sports bra with a high-cut waistband and form-fitting leggings that extend to her ankles. Her skin has a medium-tone complexion with soft neutral facial features, suggesting a calm presence without overt expressions. The clothing appears smooth and stretchy, likely made from moisture-wicking fabric designed for comfort during physical activity.

examples/example_03.png ADDED

Git LFS Details

SHA256: 4bd1de7422b035512694c358fd14f86d083d7e0c0882aa266418fe0891547677
Pointer size: 132 Bytes
Size of remote file: 1.26 MB

examples/example_03_output.txt ADDED

	@@ -0,0 +1 @@

+ A woman with long, wavy red hair styled loosely over her shoulders stands confidently against a plain gray background. She wears an armored top made from metallic silver material that covers her chest and upper abdomen, paired with brown leather shorts featuring fur trim at the sides. Her arms are adorned with matching arm guards on both wrists, while she holds a large sword with a detailed hilt in one hand. The outfit includes knee-high boots crafted from dark brown leather with metal accents around the joints. A wide belt cinches her waist, fastened by ornate detailing near the center. Her skin has a medium tone, appearing smooth under even lighting, with soft-neutral facial features suggesting a balanced build.
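
The extraction prompt asks for no pose and no emotion language, yet a small vision model may still emit some. For instance, example_01_output.txt mentions a hand on the hip. A simple keyword check can flag such terms before the text is reused as a base identity block. The sketch below is one possible approach and not part of the repository: the term lists are illustrative assumptions and far from exhaustive, and `flag_terms` is a hypothetical helper.

```python
import re

# Illustrative term lists (assumptions); extend them for real use.
POSE_TERMS = ["stands", "standing", "sitting", "posture", "hand on", "holds", "facing"]
EMOTION_TERMS = ["confident", "confidently", "calm", "happy", "smiling"]


def flag_terms(description: str) -> dict[str, list[str]]:
    """Return the pose and emotion terms found as whole words in the text."""
    text = description.lower()

    def hits(terms: list[str]) -> list[str]:
        return [t for t in terms if re.search(rf"\b{re.escape(t)}\b", text)]

    return {"pose": hits(POSE_TERMS), "emotion": hits(EMOTION_TERMS)}


# Sentence taken from examples/example_01_output.txt.
sample = "Her posture is relaxed with one hand on her hip and the other behind her head."
report = flag_terms(sample)
print(report)
```

A non-empty `pose` or `emotion` list suggests the description should be regenerated or trimmed by hand before it is used.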

visionharvester_v1_extractor.prompt.txt ADDED

	@@ -0,0 +1,10 @@

+Extract a clean, neutral description of the woman in the image.
+Keep it simple:
+• No pose or body positioning
+• No emotions or personality
+• No unique facial identifiers
+• No NSFW content
+• Do describe hair, body type (general), clothing, colors, fabrics, and broad facial features
+Output 2–4 sentences that would work as a Stable Diffusion base identity block.
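
The prompt asks for 2 to 4 sentences, which is easy to verify on the returned text. Below is a rough sketch of such a check, assuming a naive sentence splitter that breaks after `.`, `!`, or `?` followed by whitespace. Abbreviations would be miscounted, and both helper names are hypothetical.

```python
import re


def count_sentences(text: str) -> int:
    """Naive count: split after ., !, or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def within_limit(text: str, low: int = 2, high: int = 4) -> bool:
    """True when the text has between low and high sentences, inclusive."""
    return low <= count_sentences(text) <= high


sample = ("An athletic woman with long dark hair. "
          "She is wearing a fitted black sports bra. "
          "Her appearance is clean and simple.")
print(count_sentences(sample), within_limit(sample))
```

An output that fails the check can be sent back through the model, or shortened manually, before it goes into a prompt.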